[ https://issues.apache.org/jira/browse/MAHOUT-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903320#comment-15903320 ]
Pat Ferrel commented on MAHOUT-1951:
------------------------------------
A quick way to test this is:
1) Get Spark and HDFS running locally in pseudo-cluster mode (see the sketch just after this list).
2) Build the version of Mahout under test; I simply use "mvn clean install -DskipTests".
3) Run "hdfs dfs -rm -r test-result" to remove any old results.
4) Run the script below and look for exceptions in the output; they will look like the errors above.
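For step 1, a minimal sketch of bringing up the pseudo-cluster, assuming Spark and Hadoop are installed with SPARK_HOME and HADOOP_HOME set and HDFS is already configured for pseudo-distributed operation; the worker's master URL must match the --master flag in the test script:
#!/usr/bin/env bash
#begin script
# Start HDFS (assumes core-site.xml/hdfs-site.xml point at localhost and
# the namenode has been formatted once with "hdfs namenode -format").
$HADOOP_HOME/sbin/start-dfs.sh
# Start a standalone Spark master, then attach one worker to it.
# Replace the hostname with your own; newer Spark versions rename
# start-slave.sh to start-worker.sh.
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-slave.sh spark://Maclaurin.local:7077
#end script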
#!/usr/bin/env bash
#begin script
# Run the item-similarity driver against the remote Spark master.
# --filter1/--filter2 select the primary (purchase) and secondary (view)
# actions; the *Column flags give the 0-based positions of the user ID,
# action, and item ID fields in each CSV line.
mahout spark-itemsimilarity \
--input test.csv \
--output test-result \
--master spark://Maclaurin.local:7077 \
--filter1 purchase \
--filter2 view \
--itemIDColumn 2 \
--rowIDColumn 0 \
--filterColumn 1
#end script
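Since the driver reads --input from HDFS in this setup, copy the test file up before running the script (a one-liner sketch; adjust the paths if your input lives elsewhere):
hdfs dfs -put test.csv test.csv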
The test.csv file for the script:
u1,purchase,iphone
u1,purchase,ipad
u2,purchase,nexus
u2,purchase,galaxy
u3,purchase,surface
u4,purchase,iphone
u4,purchase,galaxy
u1,view,iphone
u1,view,ipad
u1,view,nexus
u1,view,galaxy
u2,view,iphone
u2,view,ipad
u2,view,nexus
u2,view,galaxy
u3,view,surface
u3,view,nexus
u4,view,iphone
u4,view,ipad
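Once the job finishes cleanly, a quick sanity check of the output (a sketch; I believe the driver writes similarity-matrix and cross-similarity-matrix directories under --output, but the exact names may vary by version):
#begin script
# A failed run typically leaves test-result empty or missing entirely.
hdfs dfs -ls test-result
# Dump the indicator matrices for a quick look.
hdfs dfs -cat 'test-result/similarity-matrix/part-*'
hdfs dfs -cat 'test-result/cross-similarity-matrix/part-*'
#end script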
> Drivers don't run with remote Spark
> -----------------------------------
>
> Key: MAHOUT-1951
> URL: https://issues.apache.org/jira/browse/MAHOUT-1951
> Project: Mahout
> Issue Type: Bug
> Components: Classification, CLI, Collaborative Filtering
> Affects Versions: 0.13.0
> Environment: The command line drivers spark-itemsimilarity and
> spark-naivebayes using a remote or pseudo-clustered Spark
> Reporter: Pat Ferrel
> Assignee: Pat Ferrel
> Priority: Blocker
> Fix For: 0.13.0
>
>
> Classes are missing when running these jobs because the dependency-reduced
> jar, passed to Spark for serialization purposes, does not contain all the
> needed classes.
> Found by a user.