See the suggestion on the review board (if I am using it correctly; I am
still not sure what to do about it :)

On Mon, Dec 12, 2011 at 12:28 AM, Raphael Cendrillon
<cendrillon1...@gmail.com> wrote:
> Thanks Dmitriy. I think I understand more clearly now. Are you saying I should 
> make a map only job and then just use some post-processing to manually 
> combine the map outputs?
>
> How many rows should I process per map job?
>
> On Dec 12, 2011, at 12:13 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
>>> A combiner is definitely the next step.
>>
>> It is definitely not. Why do you need to sort???
>>
>>> One question, is there already a writable for tuples of e.g. int and 
>>> Vector, or should I just write one from scratch?
>>
>> From scratch.
>>
>> Or you can save n as the first element of the vector, why not? Your
>> front-end code would know how to re-shuffle that.
>> Otherwise, write a custom writable. TupleWritable saves the class
>> with each value. That is exactly why they invented writables instead of
>> using Java serialization: you must not save the type with each value.
>>
>> -d
>>
>>
>> On Sun, Dec 11, 2011 at 8:14 PM, Raphael Cendrillon (Commented) (JIRA)
>> <j...@apache.org> wrote:
>>>
>>>    [ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167341#comment-13167341 ]
>>>
>>> Raphael Cendrillon commented on MAHOUT-923:
>>> -------------------------------------------
>>>
>>> Thanks Lance. A combiner is definitely the next step. One question, is 
>>> there already a writable for tuples of e.g. int and Vector, or should I 
>>> just write one from scratch? I know there is TupleWritable, but from what 
>>> I've read online it's better to avoid that unless you're doing a multiple 
>>> input join.
>>>
>>> Regarding the class for the output vector, are you saying that instead of 
>>> inheriting the class from the rows of the DistributedRowMatrix you'd rather 
>>> be able to specify this manually?
>>>
>>>
>>>
>>>> Row mean job for PCA
>>>> --------------------
>>>>
>>>>                 Key: MAHOUT-923
>>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-923
>>>>             Project: Mahout
>>>>          Issue Type: Improvement
>>>>          Components: Math
>>>>    Affects Versions: 0.6
>>>>            Reporter: Raphael Cendrillon
>>>>            Assignee: Raphael Cendrillon
>>>>             Fix For: Backlog
>>>>
>>>>         Attachments: MAHOUT-923.patch
>>>>
>>>>
>>>> Add map reduce job for calculating mean row (column-wise mean) of a 
>>>> Distributed Row Matrix for use in PCA.
>>>
>>>
>>>
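
For illustration, the approach discussed in this thread (each map task emits a row count n plus a column-wise sum, the front end combines them, and the writable stores raw values without type information) might look roughly like the sketch below. PartialRowSum and all of its methods are hypothetical names, not Mahout or Hadoop classes. To stay self-contained it does not implement Hadoop's Writable interface; it only mirrors the write(DataOutput) / readFields(DataInput) method shapes, and it uses a plain double[] in place of a Mahout Vector.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper (an assumption, not part of Mahout or Hadoop): the partial
// result of one map task, i.e. a row count plus the column-wise sum of its rows.
class PartialRowSum {
    private long rowCount;
    private double[] columnSums;

    PartialRowSum(int numColumns) {
        this.columnSums = new double[numColumns];
    }

    // Accumulate one matrix row, as a map task would do per input record.
    void add(double[] row) {
        if (row.length != columnSums.length) {
            throw new IllegalArgumentException("row width mismatch");
        }
        for (int i = 0; i < row.length; i++) {
            columnSums[i] += row[i];
        }
        rowCount++;
    }

    // Fold in another task's partial result (the front-end combining step).
    void merge(PartialRowSum other) {
        if (other.columnSums.length != columnSums.length) {
            throw new IllegalArgumentException("column count mismatch");
        }
        for (int i = 0; i < columnSums.length; i++) {
            columnSums[i] += other.columnSums[i];
        }
        rowCount += other.rowCount;
    }

    // Column-wise mean over all rows accumulated or merged so far.
    double[] mean() {
        if (rowCount == 0) {
            throw new IllegalStateException("no rows accumulated");
        }
        double[] result = new double[columnSums.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = columnSums[i] / rowCount;
        }
        return result;
    }

    // Same shape as Writable.write: raw values only, no class names.
    void write(DataOutput out) throws IOException {
        out.writeLong(rowCount);
        out.writeInt(columnSums.length);
        for (double v : columnSums) {
            out.writeDouble(v);
        }
    }

    // Same shape as Writable.readFields: reads back exactly what write emitted.
    void readFields(DataInput in) throws IOException {
        rowCount = in.readLong();
        columnSums = new double[in.readInt()];
        for (int i = 0; i < columnSums.length; i++) {
            columnSums[i] = in.readDouble();
        }
    }

    // Convenience wrappers so callers need not handle checked IOExceptions.
    byte[] toBytes() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static PartialRowSum fromBytes(byte[] bytes) {
        try {
            PartialRowSum result = new PartialRowSum(0);
            result.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the serialized form is 8 bytes for the count, 4 bytes for the length and 8 bytes per column, with nothing spent on type names. The front end would deserialize each map output with fromBytes, merge them all into one object and call mean() once at the end.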
