[ https://issues.apache.org/jira/browse/MAHOUT-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903320#comment-15903320 ]
Pat Ferrel commented on MAHOUT-1951:
------------------------------------

A quick way to test this is:

1) Get Spark and HDFS running locally in pseudo-cluster mode (a minimal
   startup sketch appears at the end of this message).
2) Build the version of Mahout under test; I simply use
   "mvn clean install -DskipTests".
3) Run "hdfs dfs -rm -r test-result" to remove any old results.
4) Run the script below and look for exceptions in the output; they will
   look like the errors above.

#!/usr/bin/env bash
#begin script
mahout spark-itemsimilarity \
  --input test.csv \
  --output test-result \
  --master spark://Maclaurin.local:7077 \
  --filter1 purchase \
  --filter2 view \
  --itemIDColumn 2 \
  --rowIDColumn 0 \
  --filterColumn 1
#end-script

test.csv file for the script:

u1,purchase,iphone
u1,purchase,ipad
u2,purchase,nexus
u2,purchase,galaxy
u3,purchase,surface
u4,purchase,iphone
u4,purchase,galaxy
u1,view,iphone
u1,view,ipad
u1,view,nexus
u1,view,galaxy
u2,view,iphone
u2,view,ipad
u2,view,nexus
u2,view,galaxy
u3,view,surface
u3,view,nexus
u4,view,iphone
u4,view,ipad

> Drivers don't run with remote Spark
> -----------------------------------
>
>                 Key: MAHOUT-1951
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1951
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification, CLI, Collaborative Filtering
>    Affects Versions: 0.13.0
>         Environment: The command line drivers spark-itemsimilarity and
>                      spark-naivebayes using a remote or pseudo-clustered Spark
>            Reporter: Pat Ferrel
>            Assignee: Pat Ferrel
>            Priority: Blocker
>             Fix For: 0.13.0
>
> Missing classes when running these jobs, because the dependencies-reduced
> jar, passed to Spark for serialization purposes, does not contain all the
> needed classes.
> Found by a user.
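
To check the missing-classes hypothesis directly, a hedged sketch: list the
contents of the jar that gets shipped to Spark and grep for one of the classes
named in the stack trace. Both the jar path and the class name below are
hypothetical placeholders, not the actual artifact names.

#begin script
# List the classes bundled in the jar Spark receives and search for one of
# the classes the failing job reports as missing. The jar path and class
# name are placeholders -- substitute values from your build output and
# from the stack trace.
jar tf /path/to/mahout-spark-dependency-reduced.jar | grep 'SomeMissingClass'
#end-script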
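
Returning to step 1 of the recipe above, a minimal startup sketch, assuming
HADOOP_HOME and SPARK_HOME are set and HDFS has already been formatted; exact
script names can differ between Spark versions (start-slave.sh was later
renamed start-worker.sh).

#begin script
# Start HDFS in pseudo-distributed mode, then a standalone Spark master
# and a single worker attached to it.
$HADOOP_HOME/sbin/start-dfs.sh
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-slave.sh spark://$(hostname):7077
#end-script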
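
And after step 4, to confirm a run actually produced output rather than
failing silently, a sketch for inspecting the results in HDFS; the
subdirectory layout under test-result can vary by Mahout version, hence the
wildcard.

#begin script
# List everything the driver wrote, then print the part files.
hdfs dfs -ls -R test-result
hdfs dfs -cat 'test-result/*/part-*'
#end-script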