Hi All,

I am working on the Spark runner issue
https://issues.apache.org/jira/browse/BEAM-2669
"Kryo serialization exception when DStreams containing
non-Kryo-serializable data are cached"

Currently, the Spark runner enforces the Kryo serializer, and in shuffling
scenarios we always use coders to transfer bytes over the network. But
Spark can also serialize data for caching/persisting, and currently it uses
Kryo for this as well.
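
For context, the runner's behaviour is roughly equivalent to the following
sketch (the actual wiring lives in the runner's Spark context setup; the
class name here is made up):

    import org.apache.spark.SparkConf;

    public class RunnerSerializerSketch {
      public static SparkConf sparkConf() {
        // Roughly what the runner does today: pin Spark's internal
        // serializer (the one used when caching/persisting RDD/DStream
        // data) to Kryo.
        return new SparkConf()
            .setAppName("beam-on-spark")
            .set("spark.serializer",
                 "org.apache.spark.serializer.KryoSerializer");
      }
    }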

Today, when the user uses a class which is not Kryo-serializable, caching
fails, and we thought to open up the option of using Java serialization
as a fallback.
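
One possible shape for such a fallback, assuming we keep Kryo as the
top-level serializer and register Kryo's own JavaSerializer for the
problematic classes (a sketch only; SomeUserType and the registrator name
are placeholders):

    import com.esotericsoftware.kryo.Kryo;
    import com.esotericsoftware.kryo.serializers.JavaSerializer;
    import org.apache.spark.serializer.KryoRegistrator;

    // Placeholder for a user class that Kryo cannot handle natively.
    class SomeUserType implements java.io.Serializable {}

    public class JavaFallbackRegistrator implements KryoRegistrator {
      @Override
      public void registerClasses(Kryo kryo) {
        // Delegate this class to Java serialization inside Kryo. Note this
        // only works if the class implements java.io.Serializable, which is
        // exactly why the SDK classes mentioned below would need the marker
        // interface.
        kryo.register(SomeUserType.class, new JavaSerializer());
      }
    }

The registrator would then be wired in via the spark.kryo.registrator
setting.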

The RDDs/DStreams behind our PCollections are usually typed
RDD/DStream<WindowedValue<InputT>>, and when Spark tries to Java-serialize
them for caching purposes it fails because WindowedValue is not
Java-serializable.
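
A minimal standalone repro of the failure mode, using a stand-in class
instead of WindowedValue (assumes Spark is on the classpath):

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.storage.StorageLevel;

    public class SerializedCacheRepro {
      // Stand-in for WindowedValue: wraps a value, not java.io.Serializable.
      static class Wrapper {
        final String value;
        Wrapper(String v) { value = v; }
      }

      public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setMaster("local[1]")
            .setAppName("cache-repro")
            .set("spark.serializer",
                 "org.apache.spark.serializer.JavaSerializer");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
          JavaRDD<Wrapper> rdd =
              sc.parallelize(Arrays.asList("a", "b")).map(Wrapper::new);
          // A serialized storage level java-serializes each element, so
          // materializing the cache fails with a NotSerializableException,
          // which is what happens today with WindowedValue.
          rdd.persist(StorageLevel.MEMORY_ONLY_SER());
          rdd.count();
        }
      }
    }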

Is there any objection to having SDK classes like WindowedValue, PaneInfo,
and others implement Serializable?
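
Concretely, the change I have in mind is just the marker interface,
roughly like this (signatures abbreviated, not the SDK's actual
declarations):

    import java.io.Serializable;

    public abstract class WindowedValue<T> implements Serializable {
      public abstract T getValue();
    }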

Again, I want to emphasise that coders are a big part of the Spark runner
code, and we do use them whenever a shuffle is expected. Changing the types
of the RDDs/DStreams to byte[] would make the code pretty unreadable and
would weaken type-safety checks.
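
To make that concrete, the coder path around a shuffle is essentially a
byte[] round trip like this (a sketch using the SDK's CoderUtils; the
runner has its own helpers for the same idea):

    import org.apache.beam.sdk.coders.Coder;
    import org.apache.beam.sdk.coders.CoderException;
    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.util.CoderUtils;

    public class ShuffleCoderSketch {
      public static void main(String[] args) throws CoderException {
        Coder<String> coder = StringUtf8Coder.of();
        // Encode with the PCollection's coder before the shuffle...
        byte[] overTheWire = CoderUtils.encodeToByteArray(coder, "hello");
        // ...and decode on the other side. Typing the RDDs themselves as
        // byte[] would push this plumbing into every transform.
        String roundTripped =
            CoderUtils.decodeFromByteArray(coder, overTheWire);
        System.out.println(roundTripped);
      }
    }

Keeping the encode/decode localized to shuffle boundaries keeps the rest of
the pipeline strongly typed.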

Thanks
Kobi
