But the Writables in each tuple include a vector, which could be
hundreds of doubles. Next to that, the classname overhead is not a big deal.
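
A rough way to check this is to serialize one record with and without a
classname in front of it and compare the sizes. The sketch below is only an
illustration under stated assumptions: it uses plain java.io streams rather
than Hadoop, RecordOverheadSketch and recordSize are hypothetical names, the
record layout (an int key, a length, then the doubles) is made up, and
writeUTF stands in for however TupleWritable actually encodes the classname.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RecordOverheadSketch {

    // Serialize one record (an int key plus a dense vector of doubles) and
    // return its size in bytes. When withClassName is true, the class name is
    // written in front of the payload, roughly mimicking a container that
    // tags every value with its type. The layout is an assumption, not the
    // real TupleWritable or VectorWritable wire format.
    static int recordSize(int numDoubles, String className, boolean withClassName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (withClassName) {
                out.writeUTF(className); // 2-byte length prefix + encoded name
            }
            out.writeInt(42);            // placeholder key
            out.writeInt(numDoubles);    // vector length
            for (int i = 0; i < numDoubles; i++) {
                out.writeDouble(i * 0.5); // placeholder values
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        String name = "org.apache.mahout.math.VectorWritable";
        int plain = recordSize(200, name, false);
        int tagged = recordSize(200, name, true);
        System.out.printf("plain=%d tagged=%d overhead=%.1f%%%n",
                plain, tagged, 100.0 * (tagged - plain) / plain);
    }
}
```

With a couple of hundred doubles per record the classname adds only a few
percent; with very short vectors the ratio would look much worse.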

On Sun, Dec 18, 2011 at 9:29 PM, Raphael Cendrillon
<cendrillon1...@gmail.com> wrote:
> Yes, but TupleWritable is pretty inefficient since it stores the classname 
> with every record.  This seems wasteful given that the class is always the 
> same.
>
> On 18 Dec, 2011, at 9:19 PM, Lance Norskog wrote:
>
>> JIRA is acting up, so posting here instead.
>>
>> You have already made RandomPermuteJob extend AbstractJob. Never mind.
>>
>> bq. Does this seem like a reasonable approach? It would require that a
>> class be created for each object type of interest which is somewhat
>> painful. However I can't see a simpler approach since
>> setMapOutputValueClass() needs to take a class that has a default
>> constructor (and PairWritable doesn't have a default constructor since
>> it doesn't know how to call new for first and second since it doesn't
>> know what class first and second belong to).
>>
>> TupleWritable handles this by writing the classname. Looking at this
>> again, can't this just use TupleWritable?
>>
>> http://grepcode.com/file/repo1.maven.org/maven2/org.jvnet.hudson.hadoop/hadoop-core/0.19.1-hudson-3/org/apache/hadoop/mapred/join/TupleWritable.java
>>
>> On Sun, Dec 18, 2011 at 7:48 PM, Raphael Cendrillon (Commented) (JIRA)
>> <j...@apache.org> wrote:
>>>
>>>    [ 
>>> https://issues.apache.org/jira/browse/MAHOUT-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172021#comment-13172021
>>>  ]
>>>
>>> Raphael Cendrillon commented on MAHOUT-904:
>>> -------------------------------------------
>>>
>>> Hi Lance. Is that a general comment, or specifically for the issue 
>>> regarding PairWritable/IntVectorWritable?
>>>
>>>> SplitInput should support randomizing the input
>>>> -----------------------------------------------
>>>>
>>>>                 Key: MAHOUT-904
>>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-904
>>>>             Project: Mahout
>>>>          Issue Type: Improvement
>>>>            Reporter: Grant Ingersoll
>>>>            Assignee: Raphael Cendrillon
>>>>              Labels: MAHOUT_INTRO_CONTRIBUTE
>>>>         Attachments: MAHOUT-904.patch, MAHOUT-904.patch, MAHOUT-904.patch
>>>>
>>>>
>>>> For some learning tasks, we need the input to be randomized (SGD) instead 
>>>> of blocks of labels all at once.  SplitInput is a useful tool for setting 
>>>> up train/test files but it currently doesn't support randomizing the input.
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com
