How does Kryo's FieldSerializer fail on WindowedValue/PaneInfo? Those
seem like pretty simple types.
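
For what it's worth, a minimal probe of the failure mode (a sketch only,
assuming Beam's WindowedValue and a bare Kryo instance, whose default
serializer is FieldSerializer) would be something like:

  import com.esotericsoftware.kryo.Kryo;
  import com.esotericsoftware.kryo.io.Input;
  import com.esotericsoftware.kryo.io.Output;
  import org.apache.beam.sdk.util.WindowedValue;

  public class KryoProbe {
    public static void main(String[] args) {
      Kryo kryo = new Kryo();
      // FieldSerializer reflects over instance fields and, by default,
      // needs an instantiable concrete class on the read side, which is
      // where types without no-arg constructors tend to trip it up.
      Output output = new Output(1024, -1);
      kryo.writeClassAndObject(output, WindowedValue.valueInGlobalWindow("x"));
      Input input = new Input(output.toBytes());
      System.out.println(kryo.readClassAndObject(input));
    }
  }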

Also, I don't see why they can't be tagged with Serializable, but I
think the original reasoning was that coders should be used to guarantee
a stable representation across JVM versions.
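
That stable representation is what the full WindowedValue coder
provides. A minimal round-trip sketch (assuming the Beam SDK's
WindowedValue.getFullCoder and CoderUtils):

  import org.apache.beam.sdk.coders.Coder;
  import org.apache.beam.sdk.coders.StringUtf8Coder;
  import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
  import org.apache.beam.sdk.util.CoderUtils;
  import org.apache.beam.sdk.util.WindowedValue;

  public class CoderRoundTrip {
    public static void main(String[] args) throws Exception {
      // The full coder covers the value plus its window/timestamp/pane.
      Coder<WindowedValue<String>> coder = WindowedValue.getFullCoder(
          StringUtf8Coder.of(), GlobalWindow.Coder.INSTANCE);
      WindowedValue<String> original = WindowedValue.valueInGlobalWindow("hi");
      // The byte layout is defined by the coder, not by the JVM, so it
      // stays stable across JVM versions and processes.
      byte[] bytes = CoderUtils.encodeToByteArray(coder, original);
      WindowedValue<String> decoded = CoderUtils.decodeFromByteArray(coder, bytes);
      System.out.println(decoded.getValue()); // "hi"
    }
  }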

On Thu, Aug 24, 2017 at 5:38 AM, Kobi Salant <kobi.sal...@gmail.com> wrote:

> Hi All,
>
> I am working on Spark runner issue
> https://issues.apache.org/jira/browse/BEAM-2669
> "Kryo serialization exception when DStreams containing
> non-Kryo-serializable data are cached"
>
> Currently, the Spark runner enforces the Kryo serializer, and in
> shuffling scenarios we always use coders to transfer bytes over the
> network. But Spark also serializes data for caching/persisting, and
> currently it uses Kryo for this.
>
> Today, when the user uses a class which is not Kryo-serializable,
> caching fails, so we thought of opening up the option to use Java
> serialization as a fallback.
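>
> For illustration, a minimal sketch of such a fallback done per class
> rather than globally: a hypothetical Spark KryoRegistrator that routes
> WindowedValue through Kryo's JavaSerializer. This assumes the class
> implements java.io.Serializable, and in practice Kryo registration is
> per concrete class, so the concrete subclasses would need registering
> too:
>
>   import com.esotericsoftware.kryo.Kryo;
>   import com.esotericsoftware.kryo.serializers.JavaSerializer;
>   import org.apache.beam.sdk.util.WindowedValue;
>   import org.apache.spark.serializer.KryoRegistrator;
>
>   // Hypothetical registrator, enabled via
>   // conf.set("spark.kryo.registrator", BeamKryoRegistrator.class.getName())
>   public class BeamKryoRegistrator implements KryoRegistrator {
>     @Override
>     public void registerClasses(Kryo kryo) {
>       // Route this class through Java serialization; everything else
>       // keeps using Kryo's default FieldSerializer.
>       kryo.register(WindowedValue.class, new JavaSerializer());
>     }
>   }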
>
> The RDDs/DStreams behind our PCollections are usually typed
> RDD/DStream<WindowedValue<InputT>>, and when Spark tries to
> Java-serialize them for caching purposes it fails because WindowedValue
> is not Java-serializable.
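>
> A self-contained sketch of that failure (assuming a local Spark context
> and Spark's JavaSerializer forced via configuration):
>
>   import java.util.Arrays;
>   import org.apache.beam.sdk.util.WindowedValue;
>   import org.apache.spark.SparkConf;
>   import org.apache.spark.api.java.JavaRDD;
>   import org.apache.spark.api.java.JavaSparkContext;
>   import org.apache.spark.storage.StorageLevel;
>
>   public class CacheRepro {
>     public static void main(String[] args) {
>       SparkConf conf = new SparkConf()
>           .setMaster("local[2]").setAppName("cache-repro")
>           .set("spark.serializer",
>               "org.apache.spark.serializer.JavaSerializer");
>       try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
>         JavaRDD<WindowedValue<String>> rdd = jsc.parallelize(
>             Arrays.asList(WindowedValue.valueInGlobalWindow("a")));
>         // MEMORY_ONLY_SER stores serialized copies, so each element must
>         // be serializable by the configured serializer; WindowedValue is
>         // not java.io.Serializable, so the action below fails.
>         rdd.persist(StorageLevel.MEMORY_ONLY_SER());
>         rdd.count();
>       }
>     }
>   }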
>
> Is there any objection to having SDK classes like WindowedValue,
> PaneInfo and others implement Serializable?
>
> Again, I want to emphasise that coders are a big part of the Spark
> runner code, and we do use them whenever a shuffle is expected.
> Changing the types of the RDDs/DStreams to byte[] would make the code
> pretty unreadable and would weaken type-safety checks.
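>
> To make that concrete, the byte[] alternative would look roughly like
> the hypothetical helpers below (CoderUtils is from the Beam SDK): every
> cache boundary needs an explicit encode/decode pair, and the element
> type degrades from WindowedValue<T> to an opaque byte[]:
>
>   import org.apache.beam.sdk.coders.Coder;
>   import org.apache.beam.sdk.util.CoderUtils;
>   import org.apache.spark.api.java.JavaRDD;
>
>   public class ByteArrayCaching {
>     // Encode before persisting: the RDD type is now just byte[].
>     public static <T> JavaRDD<byte[]> encode(JavaRDD<T> rdd, Coder<T> coder) {
>       return rdd.map(value -> CoderUtils.encodeToByteArray(coder, value));
>     }
>
>     // Decode after reading back from the cache.
>     public static <T> JavaRDD<T> decode(JavaRDD<byte[]> rdd, Coder<T> coder) {
>       return rdd.map(bytes -> CoderUtils.decodeFromByteArray(coder, bytes));
>     }
>   }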
>
> Thanks
> Kobi
>
