-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3147/#review3838
-----------------------------------------------------------


Hm. I hope I did not misread the code or miss something.

1 -- I am not sure this will actually work as intended unless the # of reducers
is coerced to 1, of which I see no mention in the code.
2 -- The mappers do nothing, passing all the row pressure on to the sort, which
is absolutely unnecessary, even if you use combiners. This is going to be
especially the case if you coerce 1 reducer and no combiners. IMO the mean
computation should be pushed up to the mappers to avoid the sort pressure of
MapReduce. Then the reduction becomes largely symbolic (but you do need to pass
the # of rows each mapper has seen on to the reducer in order for that
operation to apply correctly).
3 -- I am not sure: is NullWritable as a key legit? In my experience the
sequence file reader cannot instantiate it, because NullWritable is a singleton
and its creation is prohibited by making the constructor private.
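
To illustrate point 2, here is a minimal sketch in plain Java with no Hadoop
dependency. It is not the patch's code. The names RowMeanSketch, Partial,
mapSplit and reduce are hypothetical; Partial plays the role of the
IntVectorTupleWritable mentioned in the review request (row count plus
column-wise sum). Each "mapper" accumulates over its own split and emits a
single record, so the sort sees one record per mapper instead of one per row.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from the patch.
final class RowMeanSketch {

    /** What one mapper would emit: the # of rows seen and the column-wise sum. */
    static final class Partial {
        final long rowCount;
        final double[] columnSums;

        Partial(long rowCount, double[] columnSums) {
            this.rowCount = rowCount;
            this.columnSums = columnSums;
        }
    }

    /** Mapper side: accumulate over the whole split, emit one record at the end. */
    static Partial mapSplit(double[][] rows, int numCols) {
        double[] sums = new double[numCols];
        for (double[] row : rows) {
            for (int j = 0; j < numCols; j++) {
                sums[j] += row[j];
            }
        }
        return new Partial(rows.length, sums);
    }

    /** Reducer side: add the partial sums and counts, then divide once. */
    static double[] reduce(List<Partial> partials, int numCols) {
        long totalRows = 0;
        double[] sums = new double[numCols];
        for (Partial p : partials) {
            totalRows += p.rowCount;
            for (int j = 0; j < numCols; j++) {
                sums[j] += p.columnSums[j];
            }
        }
        if (totalRows == 0) {
            return sums; // no rows at all: return zeros rather than divide by zero
        }
        for (int j = 0; j < numCols; j++) {
            sums[j] /= totalRows;
        }
        return sums;
    }

    public static void main(String[] args) {
        double[][] splitA = {{1, 2}, {3, 4}, {5, 6}}; // rows seen by mapper A
        double[][] splitB = {{7, 8}};                 // rows seen by mapper B
        double[] mean = reduce(
            Arrays.asList(mapSplit(splitA, 2), mapSplit(splitB, 2)), 2);
        System.out.println(Arrays.toString(mean)); // prints [4.0, 5.0]
    }
}
```

Averaging the two per-mapper means directly would give [5.0, 6.0], which is
wrong because the splits have different sizes. That is why the row count has
to travel with the sum.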

- Dmitriy


On 2011-12-12 00:30:24, Raphael Cendrillon wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3147/
> -----------------------------------------------------------
> 
> (Updated 2011-12-12 00:30:24)
> 
> 
> Review request for mahout.
> 
> 
> Summary
> -------
> 
> Here's a patch with a simple job to calculate the row mean (column-wise 
> mean). One outstanding issue is the combiner: this requires a writable class 
> IntVectorTupleWritable, where the Int stores the number of rows and the 
> Vector stores the column-wise sum.
> 
> 
> This addresses bug MAHOUT-923.
>     https://issues.apache.org/jira/browse/MAHOUT-923
> 
> 
> Diffs
> -----
> 
>   /trunk/core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java 1213095 
>   /trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixRowMeanJob.java PRE-CREATION 
>   /trunk/core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java 1213095 
> 
> Diff: https://reviews.apache.org/r/3147/diff
> 
> 
> Testing
> -------
> 
> Junit test
> 
> 
> Thanks,
> 
> Raphael
> 
>
