I have no trouble reading from HDFS using the spark-shell. I assume I would
also have no trouble writing, but that is using the basic shell that comes
with Spark.
scala> val textFile = sc.textFile("xrsj/ratings_data.txt")
scala> textFile.count()
This works with local, pseudo-cluster, or even full cluster. I just can’t write
using the RSJ code.
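A minimal write check from the same shell session would be something like the
following (the output path here is just an example; saveAsTextFile is the
standard RDD call for writing text back to HDFS):
scala> val textFile = sc.textFile("xrsj/ratings_data.txt")
scala> textFile.count()
scala> textFile.saveAsTextFile("xrsj/ratings_data_copy")
If that succeeds but writeDRM from the RSJ code fails, the problem is in the
RSJ path rather than the HDFS setup.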
Are you using your custom Mahout+Spark Scala shell from GitHub and doing a
writeDRM? At home are you running CDH 4.3.2 on a single-machine
pseudo-cluster? Which versions of Hadoop and Spark are you running? Did you
install Spark outside of CDH? What OS?
If nothing else, I can try to duplicate the environment. We know your writeDRM
works, so if I can duplicate that I can start debugging the RSJ stuff.
BTW data for the RSJ code is here:
https://cloud.occamsmachete.com/public.php?service=files&t=0011a9651691ee38e905a36e99a0f125
On Apr 17, 2014, at 1:23 PM, Dmitriy Lyubimov (JIRA) <[email protected]> wrote:
[
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973347#comment-13973347
]
Dmitriy Lyubimov commented on MAHOUT-1464:
------------------------------------------
Hm. At home I don't have any trouble reading/writing from/to HDFS.
There are some minor differences in configuration, plus I am running HDFS CDH
4.3.2 at home vs. 4.3.0 on my work computer. That's the only difference.
(Something patch-level specific?)
> Cooccurrence Analysis on Spark
> ------------------------------
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Environment: hadoop, spark
> Reporter: Pat Ferrel
> Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch,
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM
> can be used as input.
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has
> several applications including cross-action recommendations.
--
This message was sent by Atlassian JIRA
(v6.2#6252)