[ https://issues.apache.org/jira/browse/MAHOUT-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070909#comment-14070909 ]
ASF GitHub Bot commented on MAHOUT-1597:
----------------------------------------

Github user dlyubimov commented on the pull request:

    https://github.com/apache/mahout/pull/33#issuecomment-49802148

    Lazy evaluation, i.e. if the element-wise scalar operation is never put into the physical plan, the fix is never evaluated either. Other conditions could probably be fixed the same way. The fix should also survive "masking" operators (such as mapBlock() or other unary operators sitting between the source RDD and the element-wise scalar operation).

> A + 1.0 (element-wise scalar operation) gives wrong result if rdd is missing rows, Spark side
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1597
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1597
>             Project: Mahout
>          Issue Type: Bug
>    Affects Versions: 0.9
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 1.0
>
> {code}
> // Concoct an rdd with missing rows
> val aRdd: DrmRdd[Int] = sc.parallelize(
>     0 -> dvec(1, 2, 3) ::
>     3 -> dvec(3, 4, 5) :: Nil
>   ).map { case (key, vec) => key -> (vec: Vector) }
> val drmA = drmWrap(rdd = aRdd)
>
> // In-core counterpart: rows 1 and 2, absent from the RDD, are implied all-zero rows.
> val inCoreA = dense((1, 2, 3), (0, 0, 0), (0, 0, 0), (3, 4, 5))
>
> val controlB = inCoreA + 1.0
> val drmB = drmA + 1.0
> (drmB -: controlB).norm should be < 1e-10
> {code}
> should not fail.
> It was failing because the element-wise scalar operator evaluates only the rows actually present in the dataset. In the case of Int-keyed row matrices, there are implied rows that may not yet be present in the RDD.
> Our goal is to detect this condition and evaluate the missing rows before physical operators that cannot handle missing implied rows.
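To illustrate the fix the description calls for, here is a minimal sketch of materializing the implied-but-missing rows of an Int-keyed row RDD before an element-wise scalar operator runs. The helper name fillMissingRows and its signature are hypothetical, not Mahout's actual API; the sketch only assumes plain Spark pair-RDD operations and Mahout's math vectors.

{code}
import org.apache.spark.SparkContext._   // pair-RDD implicits on older Spark versions
import org.apache.spark.rdd.RDD
import org.apache.mahout.math.{SequentialAccessSparseVector, Vector}

// Hypothetical helper: make every implied row key 0 until nrow physically
// present in the RDD, pairing absent keys with all-zero sparse vectors.
def fillMissingRows(rdd: RDD[(Int, Vector)], nrow: Int, ncol: Int): RDD[(Int, Vector)] = {
  // Every key the Int-keyed matrix geometry implies must exist.
  val allKeys = rdd.context.parallelize(0 until nrow).map(k => k -> ())
  // Rows present in the data keep their vector; implied-but-absent rows
  // get an all-zero sparse vector of the right cardinality.
  allKeys.leftOuterJoin(rdd).map { case (key, (_, maybeVec)) =>
    key -> maybeVec.getOrElse(new SequentialAccessSparseVector(ncol): Vector)
  }
}
{code}

With the rows materialized, something like drmWrap(rdd = fillMissingRows(aRdd, nrow = 4, ncol = 3)) + 1.0 would see rows 1 and 2 and add 1.0 to their zero values, matching the in-core control from the test above.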
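The comment's points about lazy evaluation and surviving "masking" operators can be sketched with a toy logical-plan walk. The node classes below are stand-ins assumed for illustration only; Mahout's real logical operators and its actual rewrite differ. The idea: the fix is injected only when an element-wise scalar op actually reaches the physical plan, and the check looks through intervening unary operators to find the Int-keyed source.

{code}
// Toy stand-ins for logical plan nodes; not Mahout's real operator classes.
sealed trait PlanNode
case class SourceDrm(intKeyed: Boolean, mayMissRows: Boolean) extends PlanNode
case class MapBlockOp(src: PlanNode) extends PlanNode            // a "masking" unary op
case class AewScalarOp(src: PlanNode, scalar: Double) extends PlanNode

// True only when an element-wise scalar op sits (possibly behind masking
// unary operators) over an Int-keyed source that may have implied missing
// rows -- the condition under which the missing-row fix must enter the plan.
def needsMissingRowFix(op: PlanNode): Boolean = op match {
  case AewScalarOp(src, _) => hasLooseIntSource(src)
  case _                   => false
}

// Look through unary operators all the way down to the source.
def hasLooseIntSource(op: PlanNode): Boolean = op match {
  case SourceDrm(intKeyed, mayMiss) => intKeyed && mayMiss
  case MapBlockOp(src)              => hasLooseIntSource(src)
  case AewScalarOp(src, _)          => hasLooseIntSource(src)
}
{code}

Because the check runs at physical-planning time, an expression like drmA + 1.0 that is never computed incurs no cost, which is the lazy-evaluation property the comment describes.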