[ 
https://issues.apache.org/jira/browse/MAHOUT-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pat Ferrel updated MAHOUT-1951:
-------------------------------

A user hit the following error when running the spark-itemsimilarity driver 
(the NB driver is affected too) against a remote Spark master:

17/03/03 10:08:40 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, reco-master): java.io.IOException: org.apache.spark.SparkException: Failed to register classes with Kryo
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1212)
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
...
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$5.apply(KryoSerializer.scala:123)
        at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$5.apply(KryoSerializer.scala:123)
        at scala.Option.map(Option.scala:145)
        at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:123)

When I run exactly the same command with the 0.12.2 release distribution 
against the same Spark cluster, the command completes successfully.

My environment is:
* Ubuntu 14.04
* Oracle-JDK 1.8.0_121
* Spark standalone cluster using this distribution: http://d3kbcqa49mib13.cloudfront.net/spark-1.6.3-bin-hadoop2.6.tgz
* Mahout 0.13.0-RC: https://repository.apache.org/content/repositories/orgapachemahout-1034/org/apache/mahout/apache-mahout-distribution/0.13.0/apache-mahout-distribution-0.13.0.tar.gz


> Drivers don't run with remote Spark
> -----------------------------------
>
>                 Key: MAHOUT-1951
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1951
>             Project: Mahout
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 0.13.0
>         Environment: The command line drivers spark-itemsimilarity and 
> spark-naivebayes using a remote or pseudo-clustered Spark
>            Reporter: Pat Ferrel
>            Assignee: Pat Ferrel
>            Priority: Blocker
>             Fix For: 0.13.0
>
>
> These jobs fail with missing classes because the dependencies-reduced jar, 
> which is passed to Spark for serialization purposes, does not contain all 
> of the classes it needs.
> Found by a user. 
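
One way to confirm this kind of failure is to check whether the jar handed to Spark actually contains the class named in the ClassNotFoundException. The following is only a minimal sketch in Java, not part of Mahout: JarClassCheck and its method names are hypothetical, and the jar path is whatever jar your driver ships to the executors (supplied on the command line, since the exact file name is not given here).

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical diagnostic helper (not a Mahout or Spark class).
public class JarClassCheck {

    /** Converts a fully qualified class name to its entry path inside a jar. */
    public static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar at jarPath has an entry for className. */
    public static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryNameFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java JarClassCheck <path-to-jar> [class-name]");
            return;
        }
        // Default to the class reported in the stack trace above.
        String className = args.length > 1
                ? args[1]
                : "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator";
        boolean found = containsClass(args[0], className);
        System.out.println(className + (found ? " found in " : " MISSING from ") + args[0]);
    }
}
```

Running it against the jar that the driver passes to Spark shows whether the registrator class is packaged at all. It only checks that one entry is present, so it does not show that every class needed at runtime is included.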



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
