Re: How to trace/debug serialization?
Currently, spark.closure.serializer must be org.apache.spark.serializer.JavaSerializer, so closure serialization does not go through Kryo. Kryo is only used to serialize the data.

Best Regards,
Shixiong Zhu

2014-11-07 12:27 GMT+08:00 nsareen nsar...@gmail.com:
> Will this work even with Kryo Serialization?
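A minimal sketch of that split in practice (the configuration keys are the standard Spark ones; MyRecord and the app name are made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical record type, only here to show Kryo class registration.
    case class MyRecord(id: Long, value: String)

    // Kryo affects how data (shuffled records, cached blocks, broadcast
    // values) is serialized. Closures shipped to executors still go through
    // JavaSerializer, and spark.closure.serializer cannot be set to Kryo.
    val conf = new SparkConf()
      .setAppName("kryo-data-serialization")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // registerKryoClasses is available from Spark 1.2 on; optional, but
      // it avoids Kryo writing full class names with every record.
      .registerKryoClasses(Array(classOf[MyRecord]))

    val sc = new SparkContext(conf)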
How to trace/debug serialization?
In my spark job, I have a loop something like this:

    bla.foreachRDD(rdd => {
      // init some vars
      rdd.foreachPartition(partition => {
        // init some vars
        partition.foreach(kv => {
          ...

I am seeing serialization errors (unread block data), because I think Spark is trying to serialize the whole containing class. But I have been careful not to reference instance vars in the block. Is there a way to see exactly which class is failing serialization, and maybe how Spark decided it needed to be serialized?
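For reference, a minimal sketch (all names hypothetical) of the kind of indirect capture that can trigger this even when no instance var appears in the block itself:

    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical job class; any reference reachable only through `this`
    // drags the whole instance into the closure.
    class MyJob(stream: DStream[(String, String)]) {
      val prefix = "job-"                    // instance field

      def run(): Unit = {
        stream.foreachRDD(rdd => {
          rdd.foreachPartition(partition => {
            partition.foreach { case (k, v) =>
              // `prefix` compiles to `this.prefix`, so the closure keeps a
              // reference to the whole MyJob instance and Spark must
              // serialize it. The same happens via private methods, loggers,
              // or anything else reached through `this`.
              println(prefix + k + "=" + v)
            }
          })
        })
      }
    }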
Re: How to trace/debug serialization?
From what I've observed, there are no debug logs while serialization takes place. You can look at the source code if you want; the TaskSetManager class has some functions related to serialization.
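One manual workaround in the meantime is to serialize the suspicious object yourself with plain Java serialization, roughly mimicking what the task serializer does. A sketch (checkSerializable is not a Spark API, just a helper):

    import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

    // Tries to Java-serialize an object (e.g. a closure or one of its
    // fields) so the failure can be pinned to a specific member.
    def checkSerializable(obj: AnyRef): Unit = {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try {
        out.writeObject(obj)
        println(s"OK: ${obj.getClass.getName}")
      } catch {
        case e: NotSerializableException =>
          println(s"FAILED at ${obj.getClass.getName}: ${e.getMessage}")
      } finally {
        out.close()
      }
    }

On a HotSpot JVM, running with -Dsun.io.serialization.extendedDebugInfo=true should also make a NotSerializableException trace show the chain of fields that led to the offending object.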
Re: How to trace/debug serialization?
This is more about the mechanics of the Scala compiler and Java serialization. By default, Java serializes an object deeply and recursively. Beyond that, how the Scala compiler generates the byte code matters. I'm not a Scala expert; here are just some observations:

1. If the function does not use any outer variable, it should be serializable.
2. If the function uses outer variables defined in a Scala `object`, the outer `object` and those variables do not need to be Serializable.
3. If the function uses outer variables defined in a Scala `class` instance, that class must be Serializable, because the function will have a field that refers to the outer class instance.
4. If the function uses outer variables defined in a method (local variables), those variables must be Serializable, because the function will have fields that refer to them.

Finally, javap is a good friend for diagnosing this kind of serialization error.

Best Regards,
Shixiong Zhu

2014-11-06 7:56 GMT+08:00 ankits ankitso...@gmail.com:
> In my spark job, I have a loop something like this: ...
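A compact sketch of cases 2 to 4 above, plus the usual local-val workaround (Lookup, Worker, and the numbers are hypothetical):

    import org.apache.spark.rdd.RDD

    object Lookup {                          // case 2: state lives in an `object`
      val factor = 10
    }

    class Worker(val factor: Int) {          // cases 3 and 4: a plain class
      def scale(rdd: RDD[Int]): RDD[Int] = {
        // Copying the field into a local val means the closure captures only
        // an Int (case 4), so Worker itself never needs to be Serializable.
        val localFactor = factor
        rdd.map(x => x * localFactor)
        // rdd.map(x => x * factor)          // case 3: would capture `this`,
        //                                   // so Worker would need Serializable
      }
    }

    // Case 2: the object is referenced statically, not captured as a field,
    // so neither Lookup nor `factor` needs to be Serializable.
    def scaleWithObject(rdd: RDD[Int]): RDD[Int] =
      rdd.map(x => x * Lookup.factor)

And as noted, javap -p -c on the generated closure classes (the anonymous function classes the Scala compiler emits) shows exactly which outer references ended up as fields.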