[ https://issues.apache.org/jira/browse/MAHOUT-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852977#comment-15852977 ]
ASF GitHub Bot commented on MAHOUT-1936:
----------------------------------------
Github user andrewpalumbo commented on a diff in the pull request:
https://github.com/apache/mahout/pull/278#discussion_r99477362
--- Diff: math-scala/src/main/scala/org/apache/mahout/math/algorithms/preprocessing/AsFactor.scala ---
@@ -38,11 +38,13 @@ class AsFactor extends PreprocessorFitter {
import org.apache.mahout.math.function.VectorFunction
val factorMap = input.allreduceBlock(
- { case (keys, block: Matrix) =>
+ { case (keys, block: Matrix) => block },
+ { case (oldM: Matrix, newM: Matrix) =>
// someday we'll replace this with block.max: Vector
// or better yet- block.distinct
- dense(block.aggregateColumns( new VectorFunction {
- def apply(f: Vector): Double = f.max
+
+ dense((oldM rbind newM).aggregateColumns( new VectorFunction {
--- End diff --
+1
> FactorMap finds column maximums incorrectly on large data sets
> --------------------------------------------------------------
>
> Key: MAHOUT-1936
> URL: https://issues.apache.org/jira/browse/MAHOUT-1936
> Project: Mahout
> Issue Type: Bug
> Components: Algorithms
> Affects Versions: 0.13.0
> Reporter: Trevor Grant
> Assignee: Trevor Grant
> Fix For: 0.13.0
>
>
> FactorMap's fit method does not properly find the maximum of the column.
> Likely due to an improper allreduceBlock call here:
> https://github.com/apache/mahout/blob/master/math-scala/src/main/scala/org/apache/mahout/math/algorithms/preprocessing/AsFactor.scala#L40
> Also, factorMap in this instance might be more appropriately named "factorMax"
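
The issue and the diff above both turn on the same point: a column maximum computed inside one block is only a partial result, and it has to be merged with the results from every other block. The following is a minimal sketch of that map-then-reduce pattern in plain Scala collections. It does not use Mahout's DRM API. The names ColumnMaxSketch, Block, blockColumnMax, mergeMax and columnMax are hypothetical and chosen for illustration only, and every block is assumed to be non-empty with the same number of columns.

```scala
// Sketch only: plain Scala, not Mahout. All names here are hypothetical.
object ColumnMaxSketch {
  // One block holds the rows of a single partition (assumed non-empty).
  type Block = Array[Array[Double]]

  // Map step: column-wise maximum within a single block.
  def blockColumnMax(block: Block): Array[Double] =
    block.transpose.map(_.max)

  // Reduce step: element-wise maximum of two partial results.
  def mergeMax(a: Array[Double], b: Array[Double]): Array[Double] =
    a.zip(b).map { case (x, y) => math.max(x, y) }

  // Global column maximums: map every block, then merge the partial results.
  def columnMax(blocks: Seq[Block]): Array[Double] =
    blocks.map(blockColumnMax).reduce(mergeMax)

  def main(args: Array[String]): Unit = {
    val blocks: Seq[Block] = Seq(
      Array(Array(1.0, 5.0), Array(2.0, 0.0)), // block-local maxima: 2.0, 5.0
      Array(Array(4.0, 3.0), Array(0.0, 1.0))  // block-local maxima: 4.0, 3.0
    )
    // Neither block alone has the right answer for both columns.
    println(columnMax(blocks).mkString(", ")) // prints "4.0, 5.0"
  }
}
```

With a single block the map step alone already gives the right answer, which would be consistent with the problem only showing up on large data sets that span several blocks.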