[ 
https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171946#comment-13171946
 ] 

Dmitriy Lyubimov commented on MAHOUT-923:
-----------------------------------------

Raphael, thank you for seeing this through.

Questions:
1) Why do you need a vector class for the accumulator now? The mean is
expected to be dense in the end, if not in the mappers then certainly in the
reducer. Secondly, if you do want this, why doesn't your API accept a class
instance rather than a "short" name? That would be consistent with the Hadoop
Job and file format APIs, which take classes, not strings.
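
A minimal sketch of the class-based option, to make the suggestion in 1)
concrete. Everything here is hypothetical: Vec, DenseVec, SparseVec and
RowMeanOptions are stand-ins invented for illustration, not Mahout's Vector
classes and not the API in the patch. The only real API referred to is the
style of Hadoop's Job.setMapperClass(Class).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; these are not Mahout's Vector classes.
interface Vec {
    double get(int i);
    void set(int i, double v);
}

class DenseVec implements Vec {
    private final double[] values = new double[8]; // fixed size, for the sketch only
    public double get(int i) { return values[i]; }
    public void set(int i, double v) { values[i] = v; }
}

class SparseVec implements Vec {
    private final Map<Integer, Double> values = new HashMap<>();
    public double get(int i) { return values.getOrDefault(i, 0.0); }
    public void set(int i, double v) { values.put(i, v); }
}

class RowMeanOptions {
    // Dense by default, since the mean is expected to end up dense anyway.
    private Class<? extends Vec> accumulatorClass = DenseVec.class;

    // Class-typed setter: passing a non-Vec type fails at compile time,
    // whereas a misspelled "short" name would only fail at run time.
    void setAccumulatorClass(Class<? extends Vec> cls) {
        accumulatorClass = cls;
    }

    // Instantiate the configured accumulator through its no-arg constructor.
    Vec newAccumulator() {
        try {
            return accumulatorClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + accumulatorClass, e);
        }
    }
}
```

With this shape a caller writes opts.setAccumulatorClass(SparseVec.class), and
no string-to-class lookup table is needed.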

2) I know you have a unit test, but did you test it on simulated input, say
2G in size? If not, I will have to test it before you proceed.

As a next step, I guess I need to try it out to see if it works on various
kinds of inputs.
                
> Row mean job for PCA
> --------------------
>
>                 Key: MAHOUT-923
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-923
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Raphael Cendrillon
>            Assignee: Raphael Cendrillon
>             Fix For: Backlog
>
>         Attachments: MAHOUT-923.patch, MAHOUT-923.patch, MAHOUT-923.patch
>
>
> Add map reduce job for calculating mean row (column-wise mean) of a 
> Distributed Row Matrix for use in PCA.
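
For reference, the mean row described in the issue can be computed by merging
partial per-column sums and row counts, dividing only once at the end. The
sketch below assumes dense double[] rows and uses a hypothetical PartialMean
helper; it is not the code in the attached patch and does not use Hadoop or
Mahout types.

```java
// Hypothetical helper, not part of the patch: per-column sums plus the
// number of rows folded in so far.
class PartialMean {
    final double[] sums;
    long rows;

    PartialMean(int columns) {
        sums = new double[columns];
    }

    // Map side: fold one matrix row into the running sums.
    void addRow(double[] row) {
        for (int j = 0; j < sums.length; j++) {
            sums[j] += row[j];
        }
        rows++;
    }

    // Combine/reduce side: merge another partial result into this one.
    void merge(PartialMean other) {
        for (int j = 0; j < sums.length; j++) {
            sums[j] += other.sums[j];
        }
        rows += other.rows;
    }

    // Final step: divide once, after all partials have been merged.
    double[] mean() {
        double[] out = new double[sums.length];
        if (rows == 0) {
            return out; // no rows seen: return zeros rather than NaN
        }
        for (int j = 0; j < sums.length; j++) {
            out[j] = sums[j] / rows;
        }
        return out;
    }
}
```

Carrying the row count alongside the sums is what lets partial results from
different mappers be merged in any order.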


        
