Trevor Grant created MAHOUT-1950:
------------------------------------

             Summary: Unread Block Data in Spark Shell Pseudo Cluster
                 Key: MAHOUT-1950
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1950
             Project: Mahout
          Issue Type: Bug
          Components: Mahout spark shell
    Affects Versions: 0.13.0
         Environment: Spark 1.6.3 Cluster / Pseudo Cluster / YARN Cluster (all 
observed)
            Reporter: Trevor Grant
            Assignee: Trevor Grant
            Priority: Blocker


When performing an operation in the Spark shell on a pseudo cluster, a 
`java.lang.IllegalStateException: unread block data` error is thrown.
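
For reference, an operation as simple as the following is enough to surface the error (a rough sketch in the Samsara DSL; the matrix values and partition count are illustrative, and the implicit distributed context is assumed to be the one the shell script normally sets up):

`import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._

// Parallelize a small dense matrix and force a distributed computation.
// The shuffle/collect requires Mahout classes to be deserialized on the
// executors, which is where the failure shows up in the affected setup.
val drmA = drmParallelize(dense((1, 2, 3), (4, 5, 6), (7, 8, 9)), numPartitions = 2)
val drmAtA = drmA.t %*% drmA
val inCoreAtA = drmAtA.collect   // java.lang.IllegalStateException: unread block data`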

Research and the stack trace imply there is some issue with serialization. Other 
reports of Spark failing this way in cluster mode hint that the Kryo jars aren't 
being shipped to the executors.
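
For illustration, the configuration at issue amounts to roughly the following when a context is built by hand (a sketch only, not taken from bin/mahout; the jar paths are the ones used in the working invocation below):

`import org.apache.spark.{SparkConf, SparkContext}

// Kryo settings plus an explicit jar list. setJars is what ships the Mahout
// jars to the executors so the MahoutKryoRegistrator and the Mahout classes
// it registers can be deserialized there.
val conf = new SparkConf()
  .setAppName("mahout-shell-repro")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")
  .set("spark.kryo.referenceTracking", "false")
  .set("spark.kryoserializer.buffer", "32k")
  .set("spark.kryoserializer.buffer.max", "600m")
  .setJars(Seq(
    "/opt/mahout/math-scala/target/mahout-math-scala_2.10-0.13.0-SNAPSHOT.jar",
    "/opt/mahout/math/target/mahout-math-0.13.0-SNAPSHOT.jar",
    "/opt/mahout/spark/target/mahout-spark_2.10-0.13.0-SNAPSHOT.jar",
    "/opt/mahout/spark/target/mahout-spark_2.10-0.13.0-SNAPSHOT-dependency-reduced.jar"))

val sc = new SparkContext(conf)`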

Experimentation has shown that the following invocation:
`$SPARK_HOME/bin/spark-shell \
  --jars "/opt/mahout/math-scala/target/mahout-math-scala_2.10-0.13.0-SNAPSHOT.jar,/opt/mahout/math/target/mahout-math-0.13.0-SNAPSHOT.jar,/opt/mahout/spark/target/mahout-spark_2.10-0.13.0-SNAPSHOT.jar,/opt/mahout/spark/target/mahout-spark_2.10-0.13.0-SNAPSHOT-dependency-reduced.jar" \
  -i $MAHOUT_HOME/bin/load-shell.scala \
  --conf spark.kryo.referenceTracking=false \
  --conf spark.kryo.registrator=org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator \
  --conf spark.kryoserializer.buffer=32k \
  --conf spark.kryoserializer.buffer.max=600m \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer`

works, and should be used in place of the invocation at:
https://github.com/apache/mahout/blob/master/bin/mahout#L294



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
