[ https://issues.apache.org/jira/browse/MAHOUT-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pat Ferrel updated MAHOUT-1951:
-------------------------------
User found the following error running the spark-itemsimilarity driver (it affects the NB driver too) on a remote Spark master:

17/03/03 10:08:40 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, reco-master): java.io.IOException: org.apache.spark.SparkException: Failed to register classes with Kryo
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1212)
	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
	...
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$5.apply(KryoSerializer.scala:123)
	at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$5.apply(KryoSerializer.scala:123)
	at scala.Option.map(Option.scala:145)
	at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:123)

When I run exactly the same command on the 0.12.2 release distribution against the same Spark cluster, the command completes successfully.
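The `ClassNotFoundException` means the executor running the task (on `reco-master`) could not find `org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator` when Spark's `KryoSerializer` tried to load it. A jar is a zip archive, so one quick way to check whether a given jar contains that class is to look for its `.class` entry. This is a minimal sketch; the helper names `class_to_entry` and `jar_has_class` are hypothetical and not part of Mahout or Spark, and it assumes you already know which jar the driver hands to Spark.

```python
import zipfile


def class_to_entry(class_name: str) -> str:
    """Map a fully qualified JVM class name to its entry path inside a jar.

    Only the package separators change; nested classes keep their '$'.
    """
    return class_name.replace(".", "/") + ".class"


def jar_has_class(jar, class_name: str) -> bool:
    """Return True if the jar (a zip archive) contains the compiled class.

    `jar` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(jar) as archive:
        return class_to_entry(class_name) in archive.namelist()


# The registrator class named in the ClassNotFoundException.
REGISTRATOR = "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator"
```

For example, `jar_has_class("/path/to/shipped.jar", REGISTRATOR)` (hypothetical path) returning `False` would be consistent with the stack trace above. A `True` result only shows the class is in that jar, not that the jar actually reached the executor's classpath.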
My environment is:
* Ubuntu 14.04
* Oracle-JDK 1.8.0_121
* Spark standalone cluster using this distribution: http://d3kbcqa49mib13.cloudfront.net/spark-1.6.3-bin-hadoop2.6.tgz
* Mahout 0.13.0-RC: https://repository.apache.org/content/repositories/orgapachemahout-1034/org/apache/mahout/apache-mahout-distribution/0.13.0/apache-mahout-distribution-0.13.0.tar.gz

> Drivers don't run with remote Spark
> -----------------------------------
>
>                 Key: MAHOUT-1951
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1951
>             Project: Mahout
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 0.13.0
>         Environment: The command line drivers spark-itemsimilarity and
> spark-naivebayes using a remote or pseudo-clustered Spark
>            Reporter: Pat Ferrel
>            Assignee: Pat Ferrel
>            Priority: Blocker
>             Fix For: 0.13.0
>
>
> Missing classes when running these jobs because the dependencies-reduced jar,
> passed to Spark for serialization purposes, does not contain all needed
> classes.
> Found by a user.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
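The issue attributes the failure to the dependencies-reduced jar, which is passed to Spark for serialization, not containing all needed classes. A rough way to see which classes are absent is to diff its `.class` entries against a jar known to contain them. This is a sketch under stated assumptions: the helper names and jar paths are hypothetical, and the default package prefix `org/apache/mahout/` is an assumption chosen to limit the comparison to Mahout's own classes.

```python
import zipfile
from typing import Iterable, List


def class_entries(jar) -> List[str]:
    """List the compiled .class entries in a jar (path or binary file-like object)."""
    with zipfile.ZipFile(jar) as archive:
        return [name for name in archive.namelist() if name.endswith(".class")]


def missing_classes(
    reference: Iterable[str],
    reduced: Iterable[str],
    prefix: str = "org/apache/mahout/",
) -> List[str]:
    """Return class entries under `prefix` present in `reference` but absent from `reduced`.

    The default prefix is an assumption that limits the diff to Mahout's own packages.
    """
    reduced_set = set(reduced)
    return sorted(
        name for name in set(reference)
        if name.startswith(prefix) and name not in reduced_set
    )
```

Usage, with hypothetical file names: `missing_classes(class_entries("reference.jar"), class_entries("dependencies-reduced.jar"))`. An empty result means every Mahout class in the reference jar is also in the reduced jar; it does not prove the reduced jar is sufficient, since third-party dependencies outside the prefix may still be missing.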