Thanks, Sean. Right now I'm thinking of reading the key class out of the 
SequenceFile and just propagating it through. Do you think that's 
reasonable?
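
For what it's worth, here is a rough sketch of where the space savings in 
Sean's comment (quoted below) come from. It is a generic zig-zag plus 7-bit 
variable-length encoding written in plain Java with no Hadoop or Mahout 
dependency. The class and method names (VarIntSketch, encode, decode) are made 
up for illustration, and the bytes are not necessarily identical to what 
VarIntWritable writes. The point is only that small values need fewer than the 
fixed 4 bytes an IntWritable always uses.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration class; not part of Mahout or Hadoop.
final class VarIntSketch {

    // Zig-zag maps ints of small magnitude (positive or negative)
    // to small unsigned values: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static int zigZag(int value) {
        return (value << 1) ^ (value >> 31);
    }

    // Emits 7 payload bits per byte, lowest bits first.
    // The high bit of each byte is set when more bytes follow.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int v = zigZag(value);
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
        return out.toByteArray();
    }

    // Reassembles the 7-bit groups, then undoes the zig-zag mapping.
    static int decode(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            shift += 7;
        }
        return (result >>> 1) ^ -(result & 1);
    }

    public static void main(String[] args) {
        int[] keys = {0, 42, 300, 1_000_000, Integer.MAX_VALUE};
        for (int key : keys) {
            System.out.println(key + ": fixed=4 bytes, varint="
                + encode(key).length + " bytes");
        }
    }
}
```

The trade-off is that values near the top of the int range take 5 bytes 
instead of 4, so this only pays off when most keys are small.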

On Dec 23, 2011, at 4:52 AM, "Sean Owen (Commented) (JIRA)" <j...@apache.org> 
wrote:

> 
> [ https://issues.apache.org/jira/browse/MAHOUT-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175408#comment-13175408 ]
> 
> Sean Owen commented on MAHOUT-904:
> ----------------------------------
> 
> (I don't know if this is a relevant comment, but we ought to be using 
> VarIntWritable and VarLongWritable, not IntWritable and LongWritable, for 
> better space savings.)
> 
>> SplitInput should support randomizing the input
>> -----------------------------------------------
>> 
>>                Key: MAHOUT-904
>>                URL: https://issues.apache.org/jira/browse/MAHOUT-904
>>            Project: Mahout
>>         Issue Type: Improvement
>>           Reporter: Grant Ingersoll
>>           Assignee: Raphael Cendrillon
>>             Labels: MAHOUT_INTRO_CONTRIBUTE
>>        Attachments: MAHOUT-904.patch, MAHOUT-904.patch, MAHOUT-904.patch, 
>> MAHOUT-904.patch, MAHOUT-904.patch, MAHOUT-904.patch
>> 
>> 
>> For some learning tasks, we need the input to be randomized (SGD) instead of 
>> blocks of labels all at once.  SplitInput is a useful tool for setting up 
>> train/test files but it currently doesn't support randomizing the input.
> 
