[ https://issues.apache.org/jira/browse/MAHOUT-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172012#comment-13172012 ]
Raphael Cendrillon commented on MAHOUT-904:
-------------------------------------------

Thanks Grant. I'll update to drop the Pair class in and integrate it into SplitInput.

By the way, did you notice that PairWritable needs to be extended for each object type (e.g. IntVectorWritable if the object is a Vector)? Does this seem like a reasonable approach? It requires creating a class for each object type of interest, which is somewhat painful. However, I can't see a simpler approach, since setMapOutputValueClass() needs to take a class that has a default constructor, and PairWritable doesn't have one: it can't call new for first and second because it doesn't know which classes they belong to.

> SplitInput should support randomizing the input
> -----------------------------------------------
>
>                 Key: MAHOUT-904
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-904
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>            Assignee: Raphael Cendrillon
>              Labels: MAHOUT_INTRO_CONTRIBUTE
>         Attachments: MAHOUT-904.patch, MAHOUT-904.patch, MAHOUT-904.patch
>
>
> For some learning tasks, we need the input to be randomized (SGD) instead of
> blocks of labels all at once. SplitInput is a useful tool for setting up
> train/test files but it currently doesn't support randomizing the input.
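
For illustration, here is a minimal sketch of the subclass-per-type approach described in the comment. It is not the code from the attached patch. To stay free of a Hadoop dependency it uses a local stand-in for the Writable interface, and IntBox, TextBox, IntTextPairWritable and roundTrip are hypothetical names made up for this example. The roundTrip helper is assumed to mirror how a framework creates a value from its Class object through a no-argument constructor and then deserializes into it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class PairWritableSketch {

  /** Local stand-in for Hadoop's Writable interface (assumption: same two methods). */
  public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
  }

  /** Hypothetical int holder, playing the role of a concrete Writable. */
  public static class IntBox implements Writable {
    public int value;
    public IntBox() { }
    public IntBox(int value) { this.value = value; }
    @Override public void write(DataOutput out) throws IOException { out.writeInt(value); }
    @Override public void readFields(DataInput in) throws IOException { value = in.readInt(); }
  }

  /** Hypothetical string holder, playing the role of a second concrete Writable. */
  public static class TextBox implements Writable {
    public String value = "";
    public TextBox() { }
    public TextBox(String value) { this.value = value; }
    @Override public void write(DataOutput out) throws IOException { out.writeUTF(value); }
    @Override public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
  }

  /**
   * Generic pair. F and S are erased at runtime, so this class cannot create
   * first and second on its own and therefore offers no no-argument constructor.
   */
  public abstract static class PairWritable<F extends Writable, S extends Writable>
      implements Writable {
    private final F first;
    private final S second;

    protected PairWritable(F first, S second) {
      this.first = first;
      this.second = second;
    }

    public F getFirst() { return first; }
    public S getSecond() { return second; }

    @Override public void write(DataOutput out) throws IOException {
      first.write(out);
      second.write(out);
    }

    @Override public void readFields(DataInput in) throws IOException {
      first.readFields(in);
      second.readFields(in);
    }
  }

  /** One concrete subclass per type combination; it supplies the no-argument constructor. */
  public static class IntTextPairWritable extends PairWritable<IntBox, TextBox> {
    public IntTextPairWritable() { super(new IntBox(), new TextBox()); }
    public IntTextPairWritable(int first, String second) {
      super(new IntBox(first), new TextBox(second));
    }
  }

  /** Serializes original, then rebuilds it from its Class via the no-argument constructor. */
  public static <T extends Writable> T roundTrip(T original, Class<T> clazz) throws Exception {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    original.write(new DataOutputStream(bytes));
    T copy = clazz.getDeclaredConstructor().newInstance();
    copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    return copy;
  }

  public static void main(String[] args) throws Exception {
    IntTextPairWritable copy =
        roundTrip(new IntTextPairWritable(7, "doc"), IntTextPairWritable.class);
    System.out.println(copy.getFirst().value + " " + copy.getSecond().value); // prints "7 doc"
  }
}
```

The cost is the one raised in the comment: each new combination of types needs its own small subclass, although each subclass is only a few lines long.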