I have no trouble reading from HDFS using the spark-shell. I assume writing 
would work just as well, but that is with the basic shell that ships with 
Spark.

scala> val textFile = sc.textFile("xrsj/ratings_data.txt")
scala> textFile.count()
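A quick write check from the same shell would look something like the following (a sketch using the standard RDD `saveAsTextFile` call; the output path is hypothetical):

```scala
scala> textFile.saveAsTextFile("xrsj/ratings_out")
scala> sc.textFile("xrsj/ratings_out").count()
```

If that succeeds, plain HDFS writes from the shell are fine and the problem is specific to the RSJ code path.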

This works in local, pseudo-cluster, and even full-cluster mode. I just can't 
write using the RSJ code.

Are you using your custom Mahout+Spark Scala shell from GitHub, doing a writeDRM? 
At home are you using CDH 4.3.2 on a single-machine pseudo-cluster? Which 
versions of Hadoop and Spark are you running? Did you install Spark outside of 
CDH? What OS?

If nothing else, I can try to duplicate the environment. We know your writeDRM 
works, so if I can duplicate that, I can start debugging the RSJ stuff.

BTW, the data for the RSJ code is here:
https://cloud.occamsmachete.com/public.php?service=files&t=0011a9651691ee38e905a36e99a0f125

On Apr 17, 2014, at 1:23 PM, Dmitriy Lyubimov (JIRA) <[email protected]> wrote:


   [ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973347#comment-13973347 ]

Dmitriy Lyubimov commented on MAHOUT-1464:
------------------------------------------

Hm. At home I don't have any trouble reading/writing from/to HDFS.

There are some minor differences in configuration, plus I am running HDFS CDH 
4.3.2 at home vs. 4.3.0 on my work computer. That's the only difference.

(Something patch-level specific?)



> Cooccurrence Analysis on Spark
> ------------------------------
> 
>                Key: MAHOUT-1464
>                URL: https://issues.apache.org/jira/browse/MAHOUT-1464
>            Project: Mahout
>         Issue Type: Improvement
>         Components: Collaborative Filtering
>        Environment: hadoop, spark
>           Reporter: Pat Ferrel
>           Assignee: Sebastian Schelter
>            Fix For: 1.0
> 
>        Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
> 
> 
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 
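For reference, the LLR similarity the issue description refers to can be sketched in a few lines of Scala, along the lines of Mahout's LogLikelihood implementation. Here k11, k12, k21, k22 are the cells of the 2x2 cooccurrence contingency table (both items, A only, B only, neither); names are illustrative:

```scala
// xLogX(0) is defined as 0 so empty cells contribute nothing.
def xLogX(x: Long): Double = if (x == 0) 0.0 else x * math.log(x)

// Unnormalized Shannon entropy over raw counts.
def entropy(counts: Long*): Double =
  xLogX(counts.sum) - counts.map(xLogX).sum

def logLikelihoodRatio(k11: Long, k12: Long, k21: Long, k22: Long): Double = {
  val rowEntropy = entropy(k11 + k12, k21 + k22)
  val colEntropy = entropy(k11 + k21, k12 + k22)
  val matrixEntropy = entropy(k11, k12, k21, k22)
  // Guard against tiny negative results from floating-point rounding.
  if (rowEntropy + colEntropy < matrixEntropy) 0.0
  else 2.0 * (rowEntropy + colEntropy - matrixEntropy)
}
```

A perfectly correlated table such as (1, 0, 0, 1) gives a positive score (4·ln 2), while a uniform table such as (10, 10, 10, 10) scores (near) zero, which is why LLR works well for filtering anomalous cooccurrence.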



--
This message was sent by Atlassian JIRA
(v6.2#6252)
