How does Kryo's FieldSerializer fail on WindowedValue/PaneInfo? Those seem like pretty simple types.
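A quick way to check would be to run a bare Kryo instance (FieldSerializer is its default serializer for unregistered classes) against a WindowedValue directly and see where it throws. A rough, untested sketch, nothing here is actual runner code:

    import java.io.ByteArrayOutputStream;

    import com.esotericsoftware.kryo.Kryo;
    import com.esotericsoftware.kryo.io.Output;
    import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
    import org.apache.beam.sdk.transforms.windowing.PaneInfo;
    import org.apache.beam.sdk.util.WindowedValue;
    import org.joda.time.Instant;

    public class KryoProbe {
      public static void main(String[] args) {
        // Unregistered classes fall back to Kryo's default serializer,
        // i.e. FieldSerializer (reflective field-by-field copy). Kryo
        // versions that require registration would additionally need
        // kryo.setRegistrationRequired(false).
        Kryo kryo = new Kryo();
        WindowedValue<String> wv = WindowedValue.of(
            "hello", Instant.now(), GlobalWindow.INSTANCE, PaneInfo.NO_FIRING);
        try (Output out = new Output(new ByteArrayOutputStream())) {
          kryo.writeClassAndObject(out, wv);
        }
        System.out.println("FieldSerializer write succeeded");
      }
    }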
Also, I don't see why they can't be tagged with Serializable, but I think the original reasoning was that coders should be used to guarantee a stable representation across JVM versions.

On Thu, Aug 24, 2017 at 5:38 AM, Kobi Salant <kobi.sal...@gmail.com> wrote:

> Hi All,
>
> I am working on the Spark runner issue
> https://issues.apache.org/jira/browse/BEAM-2669
> "Kryo serialization exception when DStreams containing
> non-Kryo-serializable data are cached".
>
> Currently, the Spark runner enforces the Kryo serializer, and in shuffling
> scenarios we always use coders to transfer bytes over the network. But
> Spark can also serialize data for caching/persisting, and currently it
> uses Kryo for this too.
>
> Today, when the user uses a class which is not Kryo-serializable, the
> caching fails, and we thought of opening up the option to use Java
> serialization as a fallback.
>
> The RDDs/DStreams behind our PCollections are usually typed
> RDD/DStream<WindowedValue<InputT>>, and when Spark tries to Java-serialize
> them for caching it fails because WindowedValue is not Java-serializable.
>
> Is there any objection to having SDK classes like WindowedValue, PaneInfo
> and others implement Serializable?
>
> Again, I want to emphasise that coders are a big part of the Spark runner
> code, and we do use them whenever a shuffle is expected. Changing the
> types of the RDDs/DStreams to byte[] would make the code pretty unreadable
> and would weaken type-safety checks.
>
> Thanks
> Kobi
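For concreteness, the byte[] route the quoted mail argues against, encoding through the coder before persisting and decoding afterwards, would look roughly like the untested sketch below. persistViaCoder is a made-up helper name and the String element type is just for illustration:

    import org.apache.beam.sdk.coders.Coder;
    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
    import org.apache.beam.sdk.util.CoderUtils;
    import org.apache.beam.sdk.util.WindowedValue;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.storage.StorageLevel;

    public class CoderBackedCaching {
      // Made-up helper: persist an RDD of WindowedValue<String> through its
      // coder so that Spark only ever has to serialize byte[] for the cache.
      static JavaRDD<WindowedValue<String>> persistViaCoder(
          JavaRDD<WindowedValue<String>> rdd) {
        Coder<WindowedValue<String>> coder = WindowedValue.getFullCoder(
            StringUtf8Coder.of(), GlobalWindow.Coder.INSTANCE);
        JavaRDD<byte[]> bytes =
            rdd.map(wv -> CoderUtils.encodeToByteArray(coder, wv));
        bytes.persist(StorageLevel.MEMORY_ONLY_SER());
        // Everything downstream has to decode again; this indirection is
        // the readability / type-safety cost the mail describes.
        return bytes.map(b -> CoderUtils.decodeFromByteArray(coder, b));
      }
    }

Even in this toy form the extra encode/decode hop is visible, and the WindowedValue type disappears at the cache boundary, which is the cost being weighed against tagging the SDK classes Serializable.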