Trevor Grant created MAHOUT-1950:
------------------------------------

             Summary: Unread Block Data in Spark Shell Pseudo Cluster
                 Key: MAHOUT-1950
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1950
             Project: Mahout
          Issue Type: Bug
          Components: Mahout spark shell
    Affects Versions: 0.13.0
         Environment: Spark 1.6.3 Cluster / Pseudo Cluster / YARN Cluster (all observed)
            Reporter: Trevor Grant
            Assignee: Trevor Grant
            Priority: Blocker
When performing an operation in the Spark shell on a pseudo cluster, a `java.lang.IllegalStateException: unread block data` error is thrown. Research and the stack trace suggest a serialization problem, and similar reports of Spark issues in cluster mode hint that the Kryo jars are not being shipped to the executors. Experimentation has shown that the following invocation works:

    $SPARK_HOME/bin/spark-shell \
      --jars "/opt/mahout/math-scala/target/mahout-math-scala_2.10-0.13.0-SNAPSHOT.jar,/opt/mahout/math/target/mahout-math-0.13.0-SNAPSHOT.jar,/opt/mahout/spark/target/mahout-spark_2.10-0.13.0-SNAPSHOT.jar,/opt/mahout/spark/target/mahout-spark_2.10-0.13.0-SNAPSHOT-dependency-reduced.jar" \
      -i $MAHOUT_HOME/bin/load-shell.scala \
      --conf spark.kryo.referenceTracking=false \
      --conf spark.kryo.registrator=org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator \
      --conf spark.kryoserializer.buffer=32k \
      --conf spark.kryoserializer.buffer.max=600m \
      --conf spark.serializer=org.apache.spark.serializer.KryoSerializer

and should be used in place of:

https://github.com/apache/mahout/blob/master/bin/mahout#L294

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
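As a follow-up, a sketch of how the fix could be assembled inside a launcher script: the jar list and the Kryo settings are built into variables and then passed to spark-shell. The variable names (MAHOUT_JARS, KRYO_CONF) are illustrative, not the actual bin/mahout code; the script only prints the resulting command so it can be inspected without a Spark installation.

```shell
#!/bin/sh
# Sketch only: assemble the --jars list used in the working invocation above.
MAHOUT_JARS="/opt/mahout/math-scala/target/mahout-math-scala_2.10-0.13.0-SNAPSHOT.jar"
MAHOUT_JARS="$MAHOUT_JARS,/opt/mahout/math/target/mahout-math-0.13.0-SNAPSHOT.jar"
MAHOUT_JARS="$MAHOUT_JARS,/opt/mahout/spark/target/mahout-spark_2.10-0.13.0-SNAPSHOT.jar"
MAHOUT_JARS="$MAHOUT_JARS,/opt/mahout/spark/target/mahout-spark_2.10-0.13.0-SNAPSHOT-dependency-reduced.jar"

# The Kryo serializer settings from the working command, passed as --conf flags.
KRYO_CONF="--conf spark.serializer=org.apache.spark.serializer.KryoSerializer"
KRYO_CONF="$KRYO_CONF --conf spark.kryo.registrator=org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator"
KRYO_CONF="$KRYO_CONF --conf spark.kryo.referenceTracking=false"
KRYO_CONF="$KRYO_CONF --conf spark.kryoserializer.buffer=32k"
KRYO_CONF="$KRYO_CONF --conf spark.kryoserializer.buffer.max=600m"

# Print rather than execute, so the sketch is runnable without Spark.
echo "$SPARK_HOME/bin/spark-shell --jars \"$MAHOUT_JARS\" $KRYO_CONF -i $MAHOUT_HOME/bin/load-shell.scala"
```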