Re: How to trace/debug serialization?

2014-11-06 Thread Shixiong Zhu
Will this work even with Kryo Serialization ?

Currently, spark.closure.serializer must be
org.apache.spark.serializer.JavaSerializer, so closures are never
serialized with Kryo. Kryo is only used to serialize the data.
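For reference, a minimal configuration sketch of how the two settings relate (the app name is hypothetical); Kryo applies to data serialization only:

```scala
import org.apache.spark.SparkConf

// Sketch: Kryo serializes data (shuffle blocks, cached RDDs)...
val conf = new SparkConf()
  .setAppName("kryo-example") // hypothetical name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
// ...but closures are still serialized with JavaSerializer;
// spark.closure.serializer cannot be switched to Kryo.
```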

Best Regards,
Shixiong Zhu

2014-11-07 12:27 GMT+08:00 nsareen nsar...@gmail.com:

 Will this work even with Kryo Serialization ?



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/How-to-trace-debug-serialization-tp18230p18319.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




How to trace/debug serialization?

2014-11-05 Thread ankits
In my spark job, I have a loop something like this:

bla.foreachRDD(rdd => {
  // init some vars
  rdd.foreachPartition(partition => {
    // init some vars
    partition.foreach(kv => {
      ...

I am seeing serialization errors (unread block data); I think Spark is
trying to serialize the whole containing class, but I have been careful
not to reference instance vars in the block.

Is there a way to see exactly which class is failing serialization, and maybe
how Spark decided it needed to be serialized?
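One way to get more detail (a sketch, assuming an Oracle/OpenJDK JVM) is the JVM's extended serialization debug flag, passed to both driver and executors:

```scala
import org.apache.spark.SparkConf

// Sketch: -Dsun.io.serialization.extendedDebugInfo=true makes
// NotSerializableException stack traces include the chain of field
// references that led to the unserializable object.
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions",
       "-Dsun.io.serialization.extendedDebugInfo=true")
  .set("spark.executor.extraJavaOptions",
       "-Dsun.io.serialization.extendedDebugInfo=true")
```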



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-trace-debug-serialization-tp18230.html




Re: How to trace/debug serialization?

2014-11-05 Thread nsareen
From what I've observed, there are no debug logs while serialization takes
place. You can look at the source code if you want; the TaskSetManager class
has some functions for serialization.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-trace-debug-serialization-tp18230p18244.html




Re: How to trace/debug serialization?

2014-11-05 Thread Shixiong Zhu
This is more about the mechanics of the Scala compiler and Java serialization.

First, by default Java serializes an object deeply and recursively.
Second, how the Scala compiler generates the bytecode also matters. I'm not
a Scala expert; here are just some observations:

1. If the function does not use any outer variable, it should be
serializable.
2. If the function uses outer variables defined in a Scala `object`, the
outer `object` and those variables do not need to be Serializable.
3. If the function uses outer variables from a Scala `class` instance,
that class must be Serializable, because the function will have a field
that refers to the outer class instance.
4. If the function uses outer variables local to a method, those variables
must be Serializable, because the function will have fields that refer
to them.
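Points 3 and 4 can be demonstrated without Spark, using the same plain Java serialization that Spark applies to closures (a sketch; the class and method names are made up):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureDemo {
  // Hypothetical class for illustration; deliberately NOT Serializable.
  class Holder(val factor: Int) {
    // Point 3: referencing the field drags `this` (a Holder) into the closure.
    def badFn: Int => Int = x => x * factor
    // Common fix: copy the field into a local val so only an Int is captured.
    def goodFn: Int => Int = {
      val f = factor
      x => x * f
    }
  }

  // Attempt Java serialization, the mechanism Spark uses for closures.
  def trySerialize(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val h = new Holder(3)
    println(trySerialize(h.badFn))  // false: the closure holds a Holder reference
    println(trySerialize(h.goodFn)) // true: the closure only captures an Int
  }
}
```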

Finally, javap is a good friend for diagnosing such serialization errors.


Best Regards,
Shixiong Zhu

2014-11-06 7:56 GMT+08:00 ankits ankitso...@gmail.com:

 In my spark job, I have a loop something like this:

 bla.foreachRDD(rdd => {
   // init some vars
   rdd.foreachPartition(partition => {
     // init some vars
     partition.foreach(kv => {
       ...

 I am seeing serialization errors (unread block data), because I think spark
 is trying to serialize the whole containing class. But I have been careful
 not to reference instance vars in the block.

 Is there a way to see exactly what class is failing serialization, and
 maybe
 how spark decided it needs to be serialized?



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/How-to-trace-debug-serialization-tp18230.html
