Heck if I know. Stick with what you wrote until you decide it does not work.
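
For what it's worth, here is a minimal sketch of the trade-off discussed below. It is not Mahout or Hadoop code: SimpleWritable is a hypothetical stand-in for Hadoop's Writable so the example compiles on its own, and IntVectorPair and PairSketch are made-up names. It shows a fixed-type pair with a no-argument constructor, and how many bytes a per-record class name would add on top of the payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Hadoop's Writable (assumption: same two-method shape).
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical fixed-type pair: an int plus a dense vector of doubles.
// Because both field types are known, no class name has to be stored.
final class IntVectorPair implements SimpleWritable {
    private int index;
    private double[] vector = new double[0];

    // No-argument constructor, so a framework could create an empty
    // instance and then fill it in with readFields().
    IntVectorPair() {}

    IntVectorPair(int index, double[] vector) {
        this.index = index;
        this.vector = vector.clone();
    }

    int getIndex() { return index; }

    double[] getVector() { return vector.clone(); }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(index);
        out.writeInt(vector.length);
        for (double v : vector) {
            out.writeDouble(v);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        index = in.readInt();
        int length = in.readInt();
        vector = new double[length];
        for (int i = 0; i < length; i++) {
            vector[i] = in.readDouble();
        }
    }
}

// Helpers for comparing record sizes with and without a class-name prefix.
final class PairSketch {
    private PairSketch() {}

    // Serializes w, optionally prefixed by its class name (a self-describing record).
    static byte[] toBytes(SimpleWritable w, boolean withClassName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            if (withClassName) {
                out.writeUTF(w.getClass().getName());
            }
            w.write(out);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back a record that was written without a class name.
    static IntVectorPair fromBytes(byte[] bytes) {
        try {
            IntVectorPair pair = new IntVectorPair();
            pair.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return pair;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a three-element vector the plain record is 32 bytes. The class-name prefix adds the length of the name plus two bytes to every record, a fixed cost that matters less as the vector grows.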

On Mon, Dec 19, 2011 at 12:08 AM, Raphael Cendrillon
<cendrillon1...@gmail.com> wrote:
> That's a very good point.  Using this type of framework will make things much 
> cleaner.
>
> This comment (from the top of the TupleWritable file) is what makes me a 
> little concerned:
>
> This is *not* a general-purpose tuple type. In almost all cases, users are 
> encouraged to implement their own serializable types, which can perform 
> better validation and provide more efficient encodings than this class is 
> capable. TupleWritable relies on the join framework for type safety and 
> assumes its instances will rarely be persisted, assumptions not only 
> incompatible with, but contrary to the general case.
>
> If we don't mind storing the class name, would it be better to use 
> ObjectWritable for the vector, or whatever else happens to be there?
>
>
> On 18 Dec, 2011, at 11:26 PM, Lance Norskog wrote:
>
>> But the Writables in each tuple include a vector which could be
>> hundreds of doubles. Next to that, the class name is not a big deal.
>>
>> On Sun, Dec 18, 2011 at 9:29 PM, Raphael Cendrillon
>> <cendrillon1...@gmail.com> wrote:
>>> Yes, but TupleWritable is pretty inefficient since it stores the class name 
>>> with every record.  This seems wasteful given that the class is always the 
>>> same.
>>>
>>> On 18 Dec, 2011, at 9:19 PM, Lance Norskog wrote:
>>>
>>>> JIRA is acting up, so posting here instead.
>>>>
>>>> You have already made RandomPermuteJob extend AbstractJob. Never mind.
>>>>
>>>> bq. Does this seem like a reasonable approach? It would require that a
>>>> class be created for each object type of interest which is somewhat
>>>> painful. However I can't see a simpler approach since
>>>> setMapOutputValueClass() needs to take a class that has a default
>>>> constructor (and PairWritable doesn't have a default constructor since
>>>> it doesn't know how to call new for first and second since it doesn't
>>>> know what class first and second belong to).
>>>>
>>>> TupleWritable handles this by writing the classname. Looking at this
>>>> again, can't this just use TupleWritable?
>>>>
>>>> http://grepcode.com/file/repo1.maven.org/maven2/org.jvnet.hudson.hadoop/hadoop-core/0.19.1-hudson-3/org/apache/hadoop/mapred/join/TupleWritable.java
>>>>
>>>> On Sun, Dec 18, 2011 at 7:48 PM, Raphael Cendrillon (Commented) (JIRA)
>>>> <j...@apache.org> wrote:
>>>>>
>>>>>    [ 
>>>>> https://issues.apache.org/jira/browse/MAHOUT-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172021#comment-13172021
>>>>>  ]
>>>>>
>>>>> Raphael Cendrillon commented on MAHOUT-904:
>>>>> -------------------------------------------
>>>>>
>>>>> Hi Lance. Is that a general comment, or specifically for the issue 
>>>>> regarding PairWritable/IntVectorWritable?
>>>>>
>>>>>> SplitInput should support randomizing the input
>>>>>> -----------------------------------------------
>>>>>>
>>>>>>                 Key: MAHOUT-904
>>>>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-904
>>>>>>             Project: Mahout
>>>>>>          Issue Type: Improvement
>>>>>>            Reporter: Grant Ingersoll
>>>>>>            Assignee: Raphael Cendrillon
>>>>>>              Labels: MAHOUT_INTRO_CONTRIBUTE
>>>>>>         Attachments: MAHOUT-904.patch, MAHOUT-904.patch, MAHOUT-904.patch
>>>>>>
>>>>>>
>>>>>> For some learning tasks, we need the input to be randomized (SGD) 
>>>>>> instead of blocks of labels all at once.  SplitInput is a useful tool 
>>>>>> for setting up train/test files but it currently doesn't support 
>>>>>> randomizing the input.
>>>>>
>>>>> --
>>>>> This message is automatically generated by JIRA.
>>>>> If you think it was sent incorrectly, please contact your JIRA 
>>>>> administrators: 
>>>>> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
>>>>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Lance Norskog
>>>> goks...@gmail.com
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com
