[ 
https://issues.apache.org/jira/browse/MAHOUT-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071001#comment-14071001
 ] 

ASF GitHub Bot commented on MAHOUT-1597:
----------------------------------------

Github user avati commented on the pull request:

    https://github.com/apache/mahout/pull/33#issuecomment-49807438
  
    The missing rows seem to be a Spark-specific characteristic (for example, 
all matrices in h2o are fundamentally sparse; they are just called dense if 
they happen to have all rows).
    
    I think (not completely sure yet) that canHaveMissingRows could be moved 
into DrmRddInput instead of DrmLike, and be propagated recursively through the 
plan DAG as it is evaluated in tr2phys().
    
    Each operator, like At.scala, can inspect srcA.canHaveMissingRows instead 
of op.canHaveMissingRows. This way DrmLike would not be polluted.
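
    A rough sketch of that propagation, under loud assumptions: the types 
below (LogicalOp, Source, TimesScalar, PlusScalar, PhysInput, Planner) are 
hypothetical stand-ins and not Mahout's actual classes, and the "fill" step is 
only a string placeholder for whatever fix-up the real physical plan would 
apply. The only point is that the flag lives on the physical input and each 
operator reads it from its evaluated source.

```scala
// Hypothetical stand-ins for logical plan nodes; NOT Mahout's real operators.
sealed trait LogicalOp
final case class Source(name: String, canHaveMissingRows: Boolean) extends LogicalOp
final case class TimesScalar(a: LogicalOp, s: Double) extends LogicalOp
final case class PlusScalar(a: LogicalOp, s: Double) extends LogicalOp

// Hypothetical physical-side input that carries the flag (the DrmRddInput idea).
final case class PhysInput(desc: String, canHaveMissingRows: Boolean)

object Planner {
  // Evaluate the plan bottom-up; the flag is derived from the evaluated source.
  def tr2phys(op: LogicalOp): PhysInput = op match {
    case Source(name, missing) =>
      PhysInput(name, missing)

    case TimesScalar(a, s) =>
      val srcA = tr2phys(a)
      // Implied all-zero rows stay zero under scaling, so the flag passes through.
      PhysInput(s"(${srcA.desc} * $s)", srcA.canHaveMissingRows)

    case PlusScalar(a, s) =>
      val srcA = tr2phys(a)
      // Adding a scalar changes every row, so implied rows must exist first.
      val in = if (srcA.canHaveMissingRows) s"fill(${srcA.desc})" else srcA.desc
      PhysInput(s"($in + $s)", canHaveMissingRows = false)
  }
}
```

    With this shape the logical operators carry no flag of their own; only 
the source that wraps a possibly incomplete rdd sets it, and everything 
downstream inherits it during evaluation.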


> A + 1.0 (element-wise scala operation) gives wrong result if rdd is missing 
> rows, Spark side
> --------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1597
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1597
>             Project: Mahout
>          Issue Type: Bug
>    Affects Versions: 0.9
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 1.0
>
>
> {code}
>     // Concoct an rdd with missing rows
>     val aRdd: DrmRdd[Int] = sc.parallelize(
>       0 -> dvec(1, 2, 3) ::
>           3 -> dvec(3, 4, 5) :: Nil
>     ).map { case (key, vec) => key -> (vec: Vector)}
>     val drmA = drmWrap(rdd = aRdd)
>     val controlB = inCoreA + 1.0
>     val drmB = drmA + 1.0
>     (drmB -: controlB).norm should be < 1e-10
> {code}
> should not fail.
> It was failing because the elementwise scalar operator only evaluates rows 
> actually present in the dataset. 
> In the case of Int-keyed row matrices, there are implied rows that may not 
> yet be present in the RDD. 
> Our goal is to detect the condition and evaluate missing rows prior to 
> physical operators that don't work with missing implied rows.
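
The failure mode described in the issue can be reproduced without Spark. The 
following is a minimal sketch using plain Scala collections: MissingRows, 
plusScalar and fillMissing are hypothetical names, Vector here is Scala's own 
Vector rather than Mahout's, and the assumption is that Int keys run from 0 
until nrow with absent keys standing for all-zero rows.

```scala
object MissingRows {
  // Row-keyed matrix: only physically present rows are stored.
  type Rows = Map[Int, Vector[Double]]

  // Element-wise scalar add that only sees rows actually present.
  def plusScalar(rows: Rows, s: Double): Rows =
    rows.map { case (key, vec) => key -> vec.map(_ + s) }

  // Materialize implied rows as zero vectors for every key in 0 until nrow.
  def fillMissing(rows: Rows, nrow: Int, ncol: Int): Rows =
    (0 until nrow).map { key =>
      key -> rows.getOrElse(key, Vector.fill(ncol)(0.0))
    }.toMap
}
```

Applied to the two-row example from the issue (keys 0 and 3 of a 4 x 3 
matrix), plusScalar alone yields only two rows, whereas filling first also 
produces rows 1 and 2, each holding the scalar in every column.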



--
This message was sent by Atlassian JIRA
(v6.2#6252)
