John Chen created SPARK-9523:
--------------------------------

             Summary: Receiver for Spark Streaming does not naturally support 
kryo serializer
                 Key: SPARK-9523
                 URL: https://issues.apache.org/jira/browse/SPARK-9523
             Project: Spark
          Issue Type: Improvement
          Components: Streaming
    Affects Versions: 1.3.0
         Environment: Windows 7 local mode
            Reporter: John Chen
             Fix For: 1.3.2, 1.4.2


In some cases, some attributes in a class is not serializable, which you still 
want to use after serialization of the whole object, you'll have to customize 
your serialization codes. For example, you can declare those attributes as 
transient, which makes them ignored during serialization, and then you can 
reassign their values during deserialization.

Now, if you're using Java serialization, you'll have to implement Serializable, 
and write those codes in readObject() and writeObejct() methods; And if you're 
using kryo serialization, you'll have to implement KryoSerializable, and write 
these codes in read() and write() methods.

In Spark and Spark Streaming, you can set kryo as the serializer for speeding 
up. However, the functions taken by RDD or DStream operations are still 
serialized by Java serialization, which means you only need to write those 
custom serialization codes in readObject() and writeObejct() methods.

But when it comes to Spark Streaming's Receiver, things are different. When you 
wish to customize an InputDStream, you must extend the Receiver. However, it 
turns out, the Receiver will be serialized by kryo if you set kryo serializer 
in SparkConf, and will fall back to Java serialization if you didn't.

So here's comes the problems, if you want to change the serializer by 
configuration and make sure the Receiver runs perfectly for both Java and kryo, 
you'll have to write all the 4 methods above. First, it is redundant, since 
you'll have to write serialization/deserialization code almost twice; Secondly, 
there's nothing in the doc or in the code to inform users to implement the 
KryoSerializable interface. 

Since all other function parameters are serialized by Java only, I suggest you 
also make it so for the Receiver. It may be slower, but since the serialization 
will only be executed for each interval, it's durable. More importantly, it can 
cause fewer trouble



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to