Hi All, I am working on Spark runner issue https://issues.apache.org/jira/browse/BEAM-2669 "Kryo serialization exception when DStreams containing non-Kryo-serializable data are cached"
Currently, the Spark runner enforces the Kryo serializer, and in shuffling scenarios we always use coders to transfer bytes over the network. But Spark also serializes data for caching/persisting, and today it uses Kryo for this as well. When a user works with a class that is not Kryo-serializable, caching fails, so we thought to open up the option of using Java serialization as a fallback.

Our RDDs/DStreams behind the PCollections are usually typed RDD/DStream<WindowedValue<InputT>>, and when Spark tries to Java-serialize them for caching purposes it fails because WindowedValue is not Java-serializable. Is there any objection to adding "implements Serializable" to SDK classes like WindowedValue, PaneInfo and others?

Again, I want to emphasise that coders are a big part of the Spark runner code and we do use them whenever a shuffle is expected. Changing the types of the RDDs/DStreams to byte[] would make the code pretty unreadable and would weaken type-safety checks.

Thanks,
Kobi
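To illustrate the failure mode outside of Spark: Java serialization refuses any object graph whose classes do not implement java.io.Serializable. The sketch below uses a hypothetical WindowedValueLike wrapper as a stand-in for the real SDK's WindowedValue (the class names and fields here are illustrative, not Beam code), showing that the same wrapper serializes fine once it opts in to Serializable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for WindowedValue<T> as it is today:
// a plain wrapper that does NOT implement Serializable.
class WindowedValueLike<T> {
    final T value;
    WindowedValueLike(T value) { this.value = value; }
}

// The same wrapper after the proposed change: opting in to Java serialization.
class SerializableWindowedValueLike<T> implements Serializable {
    final T value;
    SerializableWindowedValueLike(T value) { this.value = value; }
}

public class JavaSerializationDemo {
    // Returns true if the object survives Java serialization;
    // NotSerializableException is an IOException subclass, so it is caught here.
    static boolean canJavaSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Without Serializable the write fails, which is what Spark hits when
        // it falls back to Java serialization while persisting an RDD/DStream.
        System.out.println(canJavaSerialize(new WindowedValueLike<>("a")));
        System.out.println(canJavaSerialize(new SerializableWindowedValueLike<>("a")));
    }
}
```

Note that the payload itself (here a String) must also be Serializable, which is why marking only the outer SDK classes would still leave the user's element type as the user's responsibility under this fallback.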
