[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968587#comment-13968587 ]

Dmitriy Lyubimov commented on MAHOUT-1464:
------------------------------------------

Running with the Spark Client (inside the cluster) is a new thing in 0.9. Even 
assuming it is stable, it is not supported at this point, and going this way 
will run into multiple hurdles. 

For one, the Mahout Spark context requires MAHOUT_HOME in order to set up all 
Mahout binaries properly. The assumption is that one needs Mahout's binaries 
only on the driver's side, but if the driver runs inside a remote cluster, this 
will fail. So our batches should really be started in one of the ways I 
described in an earlier email. 
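
To make that concrete, here is a hedged sketch of launching the driver from a client machine that has the Mahout binaries locally. All paths, hostnames, and the launcher invocation are placeholders and assumptions about a typical setup, not a supported recipe:

```shell
# Sketch: run the driver on the client, outside the cluster, so the Mahout
# Spark context can resolve the Mahout binaries via MAHOUT_HOME.
# All paths, hostnames, and the launcher invocation are assumptions.
export MAHOUT_HOME=/opt/mahout             # assumed Mahout install dir
export SPARK_HOME=/opt/spark               # assumed Spark install dir
export MASTER=spark://master-host:7077     # assumed Spark master URL
exec "$MAHOUT_HOME/bin/mahout" spark-shell # driver runs here, not in the cluster
```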

Second, I don't think the driver can load classes reliably, because it depends 
on Mahout artifacts such as mahout-math. That's another reason why using the 
Client seems problematic to me: it assumes one has one's _entire_ application 
within that jar, which is simply not true.

That said, your attempt doesn't exhibit any direct ClassNotFoundExceptions and 
looks more like Akka communication issues, i.e. Spark setup issues. One thing 
about Spark is that it requires direct port connectivity not only between 
cluster nodes but also back to the client. In particular, this means your 
client must not firewall incoming calls and must not be behind NAT (even port 
forwarding doesn't really solve networking issues here). So my first bet would 
be on Akka connectivity issues between the cluster and the client.
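
One way to sanity-check that connectivity is sketched below; hostnames, addresses, and port numbers are placeholders (assumptions), and the `spark.driver.host` / `spark.driver.port` properties are the standard Spark knobs for pinning the driver's callback endpoint:

```shell
# Sketch: verify cluster nodes can reach back to the client, and pin the
# driver's callback address/port so it can be opened in a firewall.
# Hostnames, addresses, and port numbers are placeholders (assumptions).

# From a cluster node, test reachability of the client's driver port:
nc -vz client-host 7078

# When launching the driver, fix the callback host/port via the standard
# spark.driver.host / spark.driver.port properties (0.9-era style):
export SPARK_JAVA_OPTS="-Dspark.driver.host=203.0.113.10 -Dspark.driver.port=7078"
```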




> Cooccurrence Analysis on Spark
> ------------------------------
>
>                 Key: MAHOUT-1464
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>         Environment: hadoop, spark
>            Reporter: Pat Ferrel
>            Assignee: Sebastian Schelter
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
