[ https://issues.apache.org/jira/browse/MAHOUT-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852977#comment-15852977 ]

ASF GitHub Bot commented on MAHOUT-1936:
----------------------------------------

Github user andrewpalumbo commented on a diff in the pull request:

    https://github.com/apache/mahout/pull/278#discussion_r99477362
  
    --- Diff: math-scala/src/main/scala/org/apache/mahout/math/algorithms/preprocessing/AsFactor.scala ---
    @@ -38,11 +38,13 @@ class AsFactor extends PreprocessorFitter {
     
         import org.apache.mahout.math.function.VectorFunction
         val factorMap = input.allreduceBlock(
    -      { case (keys, block: Matrix) =>
    +      { case (keys, block: Matrix) => block },
    +      { case (oldM: Matrix, newM: Matrix) =>
             // someday we'll replace this with block.max: Vector
             // or better yet- block.distinct
    -        dense(block.aggregateColumns( new VectorFunction {
    -            def apply(f: Vector): Double = f.max
    +
    +        dense((oldM rbind newM).aggregateColumns( new VectorFunction {
    --- End diff --
    
    +1


> FactorMap finds column maximums incorrectly on large data sets
> --------------------------------------------------------------
>
>                 Key: MAHOUT-1936
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1936
>             Project: Mahout
>          Issue Type: Bug
>          Components: Algorithms
>    Affects Versions: 0.13.0
>            Reporter: Trevor Grant
>            Assignee: Trevor Grant
>             Fix For: 0.13.0
>
>
> FactorMap's fit method does not properly find the maximum of each column.
> This is likely due to an improper use of allreduceBlock here:
> https://github.com/apache/mahout/blob/master/math-scala/src/main/scala/org/apache/mahout/math/algorithms/preprocessing/AsFactor.scala#L40
> Also, factorMap in this instance might be more appropriately named "factorMax"
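
For readers following the diff above: a global column maximum over partitioned data needs two steps, a per-block maximum and a step that combines the partial results. The sketch below illustrates that idea with plain Scala collections only. It does not use the Mahout API; `ColumnMaxSketch`, `Block`, `blockColumnMax`, `combine` and `columnMax` are hypothetical names chosen for illustration, and the mapping onto the two functions passed to allreduceBlock is an assumption based on the diff, not a statement of how Mahout implements it.

```scala
object ColumnMaxSketch {
  // Assumption: a "block" is the slice of rows held by one partition,
  // modelled here as a plain rows-by-columns array (not a Mahout Matrix).
  type Block = Array[Array[Double]]

  // Map step: column maxima of a single block, returned as a one-row block.
  def blockColumnMax(block: Block): Block = {
    require(block.nonEmpty, "block must contain at least one row")
    Array(block.transpose.map(_.max))
  }

  // Reduce step: stack two partial results row-wise (the role rbind plays
  // in the diff) and take the column maxima of the stacked rows.
  def combine(a: Block, b: Block): Block = blockColumnMax(a ++ b)

  // Global column maxima across all blocks.
  def columnMax(blocks: Seq[Block]): Array[Double] = {
    require(blocks.nonEmpty, "need at least one block")
    blocks.map(blockColumnMax).reduce(combine).head
  }
}
```

With two blocks, `Array(Array(1.0, 9.0), Array(4.0, 2.0))` and `Array(Array(7.0, 3.0))`, the per-block maxima are (4.0, 9.0) and (7.0, 3.0). Only the combine step yields the global result (7.0, 9.0), which is why a missing or incorrect reduce function would only show up once the data spans more than one block.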



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
