[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168465#comment-13168465 ]
jirapos...@reviews.apache.org commented on MAHOUT-923:
------------------------------------------------------

bq. On 2011-12-13 13:08:20, Ted Dunning wrote:
bq. > /trunk/core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java, line 199
bq. > <https://reviews.apache.org/r/3147/diff/5/?file=64279#file64279line199>
bq. >
bq. > I would really rather use standard terminology here.
bq. >
bq. > A mean row is a row that is the average of all others, but a row mean would mean an average of the elements of a single row. The plural form, row means, indicates the means of all rows. What you are computing are the means of every column.
bq. >
bq. > In contrast, R, Octave and Matlab all use columnMeans as the name of the function being implemented here.

Sure. In Matlab/Octave I'm used to mean(A,1), which takes the mean along the 1st dimension (i.e. across rows, giving one value per column). I'll change this to colMeans(), which seems clearer.

bq. On 2011-12-13 13:08:20, Ted Dunning wrote:
bq. > /trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixRowMeanJob.java, lines 129-132
bq. > <https://reviews.apache.org/r/3147/diff/5/?file=64280#file64280line129>
bq. >
bq. > There are lots of lines with trailing white space. Isn't this easily suppressed?

I can use sed, or perhaps there's a better way?

- Raphael

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3147/#review3874
-----------------------------------------------------------

On 2011-12-13 04:46:47, Raphael Cendrillon wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3147/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-12-13 04:46:47)
bq.
bq. Review request for mahout, lancenorskog and Dmitriy Lyubimov.
bq.
bq. Summary
bq. -------
bq.
bq. Here's a patch with a simple job to calculate the row mean (column-wise mean). One outstanding issue is the combiner: it requires a writable class, IntVectorTupleWritable, where the Int stores the number of rows and the Vector stores the column-wise sum.
bq.
bq. This addresses bug MAHOUT-923.
bq. https://issues.apache.org/jira/browse/MAHOUT-923
bq.
bq. Diffs
bq. -----
bq.
bq. /trunk/core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java 1213474
bq. /trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixRowMeanJob.java PRE-CREATION
bq. /trunk/core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java 1213474
bq.
bq. Diff: https://reviews.apache.org/r/3147/diff
bq.
bq. Testing
bq. -------
bq.
bq. JUnit test
bq.
bq. Thanks,
bq.
bq. Raphael

> Row mean job for PCA
> --------------------
>
> Key: MAHOUT-923
> URL: https://issues.apache.org/jira/browse/MAHOUT-923
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Affects Versions: 0.6
> Reporter: Raphael Cendrillon
> Assignee: Raphael Cendrillon
> Fix For: Backlog
>
> Attachments: MAHOUT-923.patch, MAHOUT-923.patch
>
>
> Add map reduce job for calculating mean row (column-wise mean) of a
> Distributed Row Matrix for use in PCA.
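
For readers following the combiner discussion in the summary, here is a minimal plain-Java sketch of why the combiner needs a (row count, column-wise sum) pair: partial means cannot be merged directly, but counts and sums can. This is an illustration only, not the MatrixRowMeanJob code from the patch. The names ColumnMeansSketch, Partial, map, combine and columnMeans are hypothetical, double[] stands in for Mahout's Vector, and no Hadoop types are used.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not the code from the MAHOUT-923 patch. */
class ColumnMeansSketch {

    /** Hypothetical stand-in for IntVectorTupleWritable: a row count plus a column-wise sum. */
    static final class Partial {
        final int rows;
        final double[] sums;

        Partial(int rows, double[] sums) {
            this.rows = rows;
            this.sums = sums;
        }

        /** Merging is associative and commutative, so it can run in a combiner and again in the reducer. */
        Partial merge(Partial other) {
            double[] merged = sums.clone();
            for (int j = 0; j < merged.length; j++) {
                merged[j] += other.sums[j];
            }
            return new Partial(rows + other.rows, merged);
        }
    }

    /** "Map" step: one input row becomes a partial with a count of 1. */
    static Partial map(double[] row) {
        return new Partial(1, row.clone());
    }

    /** "Combine" / "reduce" step: fold a non-empty list of partials into one. */
    static Partial combine(List<Partial> partials) {
        Partial acc = partials.get(0);
        for (Partial p : partials.subList(1, partials.size())) {
            acc = acc.merge(p);
        }
        return acc;
    }

    /** Final step: divide each column sum by the total number of rows. */
    static double[] columnMeans(Partial total) {
        double[] means = new double[total.sums.length];
        for (int j = 0; j < means.length; j++) {
            means[j] = total.sums[j] / total.rows;
        }
        return means;
    }

    public static void main(String[] args) {
        // Two "splits" of a 3x2 matrix are combined separately, then reduced together.
        Partial split1 = combine(Arrays.asList(map(new double[] {1, 2}), map(new double[] {3, 4})));
        Partial split2 = combine(Arrays.asList(map(new double[] {5, 6})));
        Partial total = combine(Arrays.asList(split1, split2));
        System.out.println(Arrays.toString(columnMeans(total))); // prints [3.0, 4.0]
    }
}
```

Carrying the count alongside the sum is what lets splits of different sizes be merged correctly; averaging the per-split means ([2.0, 3.0] and [5.0, 6.0] above) would give the wrong answer.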