[ https://issues.apache.org/jira/browse/MAHOUT-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903320#comment-15903320 ]
Pat Ferrel commented on MAHOUT-1951:
------------------------------------
A quick way to test this is:
1) Get Spark and HDFS running locally in pseudo-cluster mode (see the sketch just after this list).
2) Build the version of Mahout under test; I simply use "mvn clean install -DskipTests".
3) Run "hdfs dfs -rm -r test-result" to remove any old results.
4) Run the script below and look for exceptions in the output; they will look like the errors above.
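For step 1, a minimal sketch of bringing up the pseudo-cluster, assuming Spark and Hadoop are installed with SPARK_HOME and HADOOP_HOME set and HDFS is already configured for pseudo-distributed operation; the worker's master URL must match the --master flag in the test script:
#!/usr/bin/env bash
#begin script
# Start HDFS (assumes core-site.xml/hdfs-site.xml point at localhost and
# the namenode has been formatted once with "hdfs namenode -format").
$HADOOP_HOME/sbin/start-dfs.sh
# Start a standalone Spark master, then attach one worker to it.
# Replace the hostname with your own; newer Spark versions rename
# start-slave.sh to start-worker.sh.
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-slave.sh spark://Maclaurin.local:7077
#end script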
#!/usr/bin/env bash
#begin script
# Run the item-similarity driver against the remote Spark master.
# --filter1/--filter2 select the primary (purchase) and secondary (view)
# actions; the *Column flags give the 0-based positions of the user ID,
# action, and item ID fields in each CSV line.
mahout spark-itemsimilarity \
--input test.csv \
--output test-result \
--master spark://Maclaurin.local:7077 \
--filter1 purchase \
--filter2 view \
--itemIDColumn 2 \
--rowIDColumn 0 \
--filterColumn 1
#end script
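Since the driver reads --input from HDFS in this setup, copy the test file up before running the script (a one-liner sketch; adjust the paths if your input lives elsewhere):
hdfs dfs -put test.csv test.csv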
The test.csv file for the script:
u1,purchase,iphone
u1,purchase,ipad
u2,purchase,nexus
u2,purchase,galaxy
u3,purchase,surface
u4,purchase,iphone
u4,purchase,galaxy
u1,view,iphone
u1,view,ipad
u1,view,nexus
u1,view,galaxy
u2,view,iphone
u2,view,ipad
u2,view,nexus
u2,view,galaxy
u3,view,surface
u3,view,nexus
u4,view,iphone
u4,view,ipad
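Once the job finishes cleanly, a quick sanity check of the output (a sketch; I believe the driver writes similarity-matrix and cross-similarity-matrix directories under --output, but the exact names may vary by version):
#begin script
# A failed run typically leaves test-result empty or missing entirely.
hdfs dfs -ls test-result
# Dump the indicator matrices for a quick look.
hdfs dfs -cat 'test-result/similarity-matrix/part-*'
hdfs dfs -cat 'test-result/cross-similarity-matrix/part-*'
#end script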
> Drivers don't run with remote Spark
> -----------------------------------
>
> Key: MAHOUT-1951
> URL: https://issues.apache.org/jira/browse/MAHOUT-1951
> Project: Mahout
> Issue Type: Bug
> Components: Classification, CLI, Collaborative Filtering
> Affects Versions: 0.13.0
> Environment: The command line drivers spark-itemsimilarity and
> spark-naivebayes using a remote or pseudo-clustered Spark
> Reporter: Pat Ferrel
> Assignee: Pat Ferrel
> Priority: Blocker
> Fix For: 0.13.0
>
>
> Classes are missing when running these jobs because the dependency-reduced
> jar, passed to Spark for serialization purposes, does not contain all the
> needed classes.
> Found by a user.