The person running this job knows which vector class is right for the output. The job may read a lot of sparse vectors whose mean turns out to be a dense vector. Or the output may be a vector that writes to a database. Or something else. In fact, I may just want to convert a vector from dense to sparse, and I could do that with this job.
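To make the sparse-in, dense-out case concrete, here is a minimal sketch in plain Java. It does not use Mahout's Vector classes or Hadoop; the class name RowMeanSketch, the methods meanOfSparseRows and toSparse, and the map-based sparse representation are all hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: plain Java collections stand in for
// Mahout's sparse and dense vector classes.
public class RowMeanSketch {

    // Column-wise mean of sparse rows. Each row maps column index -> non-zero value.
    // The result is a plain array, i.e. a dense vector.
    static double[] meanOfSparseRows(List<Map<Integer, Double>> rows, int numCols) {
        double[] mean = new double[numCols];
        if (rows.isEmpty()) {
            return mean; // no rows: return all zeros rather than divide by zero
        }
        for (Map<Integer, Double> row : rows) {
            for (Map.Entry<Integer, Double> e : row.entrySet()) {
                mean[e.getKey()] += e.getValue();
            }
        }
        for (int i = 0; i < numCols; i++) {
            mean[i] /= rows.size();
        }
        return mean;
    }

    // Dense -> sparse conversion: keep only the non-zero entries.
    static Map<Integer, Double> toSparse(double[] dense) {
        Map<Integer, Double> sparse = new TreeMap<>();
        for (int i = 0; i < dense.length; i++) {
            if (dense[i] != 0.0) {
                sparse.put(i, dense[i]);
            }
        }
        return sparse;
    }

    public static void main(String[] args) {
        // Three rows, each with a single non-zero entry in a different column.
        List<Map<Integer, Double>> rows = new ArrayList<>();
        double[] values = {3.0, 6.0, 9.0};
        for (int i = 0; i < values.length; i++) {
            Map<Integer, Double> row = new TreeMap<>();
            row.put(i, values[i]);
            rows.add(row);
        }
        // Every column of the mean is non-zero, so the mean is dense.
        System.out.println(Arrays.toString(meanOfSparseRows(rows, 3)));
        // Converting a mostly-zero dense vector back to sparse form.
        System.out.println(toSparse(new double[] {0.0, 5.0, 0.0}));
    }
}
```

Each input row has one non-zero entry, yet no column of the mean is zero. That is why I would rather the caller choose the output class than have it inherited from the input rows.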

On Mon, Dec 12, 2011 at 12:06 AM, Lance Norskog <goks...@gmail.com> wrote:
> To use a combiner, TupleWritable should be fine. I have not used it.
>
> But it will copy the entire vector. You would want to minimize this.
> If this is a big problem, you can do an ugly trick: you store the
> counter as the key value, but make a custom Writable that always
> returns 'this equals the other'. So, all of your counters have the
> same key and thus all vectors go to the same reducer.
>
> On Sun, Dec 11, 2011 at 8:14 PM, Raphael Cendrillon (Commented) (JIRA)
> <j...@apache.org> wrote:
>>
>> [
>> https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167341#comment-13167341
>> ]
>>
>> Raphael Cendrillon commented on MAHOUT-923:
>> -------------------------------------------
>>
>> Thanks Lance. A combiner is definitely the next step. One question, is there
>> already a writable for tuples of e.g. int and Vector, or should I just write
>> one from scratch? I know there is TupleWritable, but from what I've read
>> online it's better to avoid that unless you're doing a multiple input join.
>>
>> Regarding the class for the output vector, are you saying that instead of
>> inheriting the class from the rows of the DistributedRowMatrix you'd rather
>> be able to specify this manually?
>>
>>> Row mean job for PCA
>>> --------------------
>>>
>>> Key: MAHOUT-923
>>> URL: https://issues.apache.org/jira/browse/MAHOUT-923
>>> Project: Mahout
>>> Issue Type: Improvement
>>> Components: Math
>>> Affects Versions: 0.6
>>> Reporter: Raphael Cendrillon
>>> Assignee: Raphael Cendrillon
>>> Fix For: Backlog
>>>
>>> Attachments: MAHOUT-923.patch
>>>
>>> Add map reduce job for calculating mean row (column-wise mean) of a
>>> Distributed Row Matrix for use in PCA.
>
> --
> Lance Norskog
> goks...@gmail.com

--
Lance Norskog
goks...@gmail.com