See the suggestion on the review board (if I am using it correctly; I am still not sure what to do about it :)
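
For the custom writable route discussed in the quoted thread, here is a minimal sketch of what it could look like. `RowCountAndSum` is a hypothetical name, not an existing Mahout class. To stay self-contained it only mirrors the `write`/`readFields` signatures of Hadoop's `Writable` on top of `java.io` instead of implementing the interface; a real job would implement `org.apache.hadoop.io.Writable` and would probably delegate the vector part to Mahout's `VectorWritable`. The point it tries to show is that only raw values go on the wire, with no class name saved per record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical (row count, column sums) pair; an illustration only. */
final class RowCountAndSum {
  int count;      // number of rows folded into this record
  double[] sums;  // per-column sums over those rows (dense, for simplicity)

  RowCountAndSum() {
    this(0, new double[0]);
  }

  RowCountAndSum(int count, double[] sums) {
    this.count = count;
    this.sums = sums;
  }

  // Same shape as Writable.write: raw values only, no type information.
  void write(DataOutput out) throws IOException {
    out.writeInt(count);
    out.writeInt(sums.length);
    for (double s : sums) {
      out.writeDouble(s);
    }
  }

  // Same shape as Writable.readFields: the reader already knows the types.
  void readFields(DataInput in) throws IOException {
    count = in.readInt();
    sums = new double[in.readInt()];
    for (int i = 0; i < sums.length; i++) {
      sums[i] = in.readDouble();
    }
  }

  // Convenience helpers for trying the format without a Hadoop cluster.
  byte[] toBytes() {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      write(new DataOutputStream(buffer));
      return buffer.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static RowCountAndSum fromBytes(byte[] bytes) {
    try {
      RowCountAndSum result = new RowCountAndSum();
      result.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
      return result;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With two columns, each record in this layout takes 4 + 4 + 2 * 8 = 24 bytes, since nothing but the count, the length, and the values is written.
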
On Mon, Dec 12, 2011 at 12:28 AM, Raphael Cendrillon <cendrillon1...@gmail.com> wrote:
> Thanks Dmitry. I think I understand more clearly now. Are you saying I should
> make a map only job and then just use some post-processing to manually
> combine the map outputs?
>
> How many rows should I process per map job?
>
> On Dec 12, 2011, at 12:13 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
>>> A combiner is definitely the next step.
>>
>> It is definitely not. Why do you need to sort???
>>
>>> One question, is there already a writable for tuples of e.g. int and
>>> Vector, or should I just write one from scratch?
>>
>> From scratch.
>>
>> Or, you can save n as first element in the vector, why not. Your front
>> end code would know how to re-shuffle that.
>> But if not that, then custom writable. TupleWritable saves the class
>> with the value. That's exactly why they invented writables and not
>> using java serialization: you must not save type with each value.
>>
>> -d
>>
>>
>> On Sun, Dec 11, 2011 at 8:14 PM, Raphael Cendrillon (Commented) (JIRA)
>> <j...@apache.org> wrote:
>>>
>>> [
>>> https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167341#comment-13167341
>>> ]
>>>
>>> Raphael Cendrillon commented on MAHOUT-923:
>>> -------------------------------------------
>>>
>>> Thanks Lance. A combiner is definitely the next step. One question, is
>>> there already a writable for tuples of e.g. int and Vector, or should I
>>> just write one from scratch? I know there is TupleWritable, but from what
>>> I've read online it's better to avoid that unless you're doing a multiple
>>> input join.
>>>
>>> Regarding the class for the output vector, are you saying that instead of
>>> inheriting the class from the rows of the DistributedRowMatrix you'd rather
>>> be able to specify this manually?
>>>
>>>
>>>
>>>> Row mean job for PCA
>>>> --------------------
>>>>
>>>> Key: MAHOUT-923
>>>> URL: https://issues.apache.org/jira/browse/MAHOUT-923
>>>> Project: Mahout
>>>> Issue Type: Improvement
>>>> Components: Math
>>>> Affects Versions: 0.6
>>>> Reporter: Raphael Cendrillon
>>>> Assignee: Raphael Cendrillon
>>>> Fix For: Backlog
>>>>
>>>> Attachments: MAHOUT-923.patch
>>>>
>>>>
>>>> Add map reduce job for calculating mean row (column-wise mean) of a
>>>> Distributed Row Matrix for use in PCA.
>>>
>>> --
>>> This message is automatically generated by JIRA.
>>> If you think it was sent incorrectly, please contact your JIRA
>>> administrators:
>>> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
>>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>>
>>>
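
To make the "save n as first element in the vector" idea from the thread concrete, here is a rough sketch of how a map-only mean row computation might be combined on the front end. It is plain Java over `double[]` arrays with no Hadoop or Mahout types, and `MeanRowSketch`, `partialSum`, and `combine` are hypothetical names used only for illustration. The assumption is that each map task emits a single array laid out as `[n, sum_0, ..., sum_{d-1}]` for the rows it saw.

```java
import java.util.List;

/** Illustration only: column-wise mean from per-mapper partial sums. */
final class MeanRowSketch {

  // Map side: fold a block of rows into [n, sum_0, ..., sum_{d-1}].
  static double[] partialSum(double[][] rows, int numCols) {
    double[] acc = new double[numCols + 1];
    for (double[] row : rows) {
      acc[0] += 1;  // slot 0 carries the row count n
      for (int j = 0; j < numCols; j++) {
        acc[j + 1] += row[j];
      }
    }
    return acc;
  }

  // Front end: add up the partials, then divide the sums by the total count.
  static double[] combine(List<double[]> partials, int numCols) {
    double[] total = new double[numCols + 1];
    for (double[] partial : partials) {
      for (int j = 0; j <= numCols; j++) {
        total[j] += partial[j];
      }
    }
    double[] mean = new double[numCols];
    if (total[0] == 0) {
      return mean;  // no rows seen; return zeros instead of dividing by zero
    }
    for (int j = 0; j < numCols; j++) {
      mean[j] = total[j + 1] / total[0];
    }
    return mean;
  }
}
```

Because each map task emits only one small array, no sort or shuffle is needed for this step; the front end just reads the map outputs and calls something like `combine`.
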