[ https://issues.apache.org/jira/browse/SPARK-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968628#comment-14968628 ]
Yuhang Chen commented on SPARK-9523:
------------------------------------

So you mean closures also support Kryo? But I never added any Kryo code to them, and they worked just fine when the Kryo serializer was set in SparkConf, while the receivers didn't. That is what confused me.

> Receiver for Spark Streaming does not naturally support kryo serializer
> -----------------------------------------------------------------------
>
>                 Key: SPARK-9523
>                 URL: https://issues.apache.org/jira/browse/SPARK-9523
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.3.1
>         Environment: Windows 7 local mode
>            Reporter: Yuhang Chen
>            Priority: Minor
>              Labels: kryo, serialization
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> In some cases, some attributes of a class are not serializable, but you
> still want to use them after the whole object has been serialized, so you
> have to customize your serialization code. For example, you can declare
> those attributes as transient, which makes them ignored during
> serialization, and then reassign their values during deserialization.
> Now, if you're using Java serialization, you have to implement
> Serializable and write that code in the readObject() and writeObject()
> methods; if you're using Kryo serialization, you have to implement
> KryoSerializable and write that code in the read() and write() methods.
> In Spark and Spark Streaming, you can set Kryo as the serializer to speed
> things up. However, the functions taken by RDD or DStream operations are
> still serialized by Java serialization, which means you only need to write
> the custom serialization code in the readObject() and writeObject()
> methods.
> But when it comes to Spark Streaming's Receiver, things are different. When
> you want to customize an InputDStream, you must extend Receiver. However,
> it turns out that the Receiver is serialized by Kryo if you set the Kryo
> serializer in SparkConf, and falls back to Java serialization if you
> don't.
> So here is the problem: if you want to change the serializer by
> configuration and make sure the Receiver works with both Java and Kryo,
> you have to write all 4 methods above. First, this is redundant, since you
> have to write the serialization/deserialization code almost twice; second,
> nothing in the docs or in the code tells users to implement the
> KryoSerializable interface.
> Since all other function parameters are serialized by Java only, I suggest
> doing the same for the Receiver. It may be slower, but since the
> serialization is only executed once per interval, that is tolerable. More
> importantly, it causes less trouble.
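
For readers unfamiliar with the pattern the issue describes (a transient attribute that is skipped during serialization and reassigned during deserialization), here is a minimal sketch of the Java-serialization half only. The names TransientDemo, Source, and Connection are hypothetical and do not come from Spark; Connection simply stands in for any non-serializable attribute. The Kryo half would put the equivalent logic in the read() and write() methods of KryoSerializable and is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {

    // Hypothetical stand-in for a non-serializable attribute.
    static class Connection {
        final String host;
        Connection(String host) { this.host = host; }
    }

    // Hypothetical class holding one serializable and one transient field.
    static class Source implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String host;
        private transient Connection connection; // ignored by serialization

        Source(String host) {
            this.host = host;
            this.connection = new Connection(host);
        }

        // Custom hook: write the non-transient fields only.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
        }

        // Custom hook: restore the fields, then reassign the transient one.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            this.connection = new Connection(host);
        }

        String host() { return host; }

        boolean connectionRestored() {
            return connection != null && connection.host.equals(host);
        }
    }

    // Serialize to a byte array and read the object back.
    static Source roundTrip(Source source)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(source);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Source) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Source copy = roundTrip(new Source("localhost"));
        System.out.println(copy.host() + " " + copy.connectionRestored());
    }
}
```

Without the readObject() hook, the connection field would come back as null after deserialization, which is the situation the issue says has to be handled once per serializer.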