Aaron Defazio created SPARK-6520:
------------------------------------

             Summary: Kryo serialization broken in the shell
                 Key: SPARK-6520
                 URL: https://issues.apache.org/jira/browse/SPARK-6520
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.3.0
            Reporter: Aaron Defazio
If I start Spark as follows:
{quote}
~/spark-1.3.0-bin-hadoop2.4/bin/spark-shell --master local[1] --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer"
{quote}
then, using :paste, run:
{quote}
case class Example(foo : String, bar : String)
val ex = sc.parallelize(List(Example("foo1", "bar1"), Example("foo2", "bar2"))).collect()
{quote}
I get the error:
{quote}
	$VAL10 ($iwC)
	$outer ($iwC$$iwC)
	$outer ($iwC$$iwC$Example)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1140)
	at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:979)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1873)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1970)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1895)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:349)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
{quote}
As far as I can tell, when using :paste, Kryo serialization doesn't work for classes defined within
the same paste. It does work when the statements are entered without :paste.

This issue seems serious to me, since Kryo serialization is virtually mandatory for performance (20x slower with the default Java serialization on my problem), and I'm assuming feature parity between spark-shell and spark-submit is a goal.

Note that this is different from SPARK-6497, which covers the case where Kryo is set to require registration.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
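For reference, the working variant described above (entering the statements one at a time rather than via :paste) would look like the following spark-shell transcript. This is a sketch based on the report, not independently verified; the idea is that each REPL line gets its own interpreter wrapper ($iwC), so the case class is not nested inside the same wrapper as the closure that captures it.

```scala
// spark-shell transcript sketch: same statements as the failing :paste
// block, but entered one line at a time. Per the report, Kryo
// serialization then works for Example.
case class Example(foo: String, bar: String)

val ex = sc.parallelize(List(Example("foo1", "bar1"),
                             Example("foo2", "bar2"))).collect()
// Per the report, this collect succeeds instead of throwing the
// deserialization error shown above.
```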