For schemas it requires some design and discussion. One approach is to allow one-way evolution, the way protos and BigQuery do. Essentially this means we allow adding new fields and making existing fields optional, and any other change is disallowed.
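The one-way rule above can be sketched as a compatibility check. This is a hypothetical, self-contained illustration, not Beam's actual Schema API: the `Field` record and `isCompatible` method are stand-ins, assuming the rule is "old fields keep their names and types, required may relax to optional, and new fields must be optional".

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch of a one-way schema compatibility check (not Beam's API). */
public class SchemaEvolutionCheck {

    /** Minimal stand-in for a schema field; not a real Beam class. */
    public record Field(String name, String type, boolean optional) {}

    /**
     * Returns true if {@code next} is a legal one-way evolution of {@code prev}:
     * every old field still exists with the same type, fields may only go
     * from required to optional, and brand-new fields must be optional.
     */
    public static boolean isCompatible(List<Field> prev, List<Field> next) {
        Map<String, Field> byName =
                next.stream().collect(Collectors.toMap(Field::name, Function.identity()));
        for (Field old : prev) {
            Field updated = byName.remove(old.name());
            if (updated == null) {
                return false; // removing a field is disallowed
            }
            if (!updated.type().equals(old.type())) {
                return false; // changing a field's type is disallowed
            }
            if (old.optional() && !updated.optional()) {
                return false; // optional -> required would break old data
            }
        }
        // Any remaining fields are new; they must be optional so that
        // records written with the old schema can still be decoded.
        return byName.values().stream().allMatch(Field::optional);
    }
}
```

Under this rule, adding an optional `name` field to a schema with a required `id` passes the check, while the reverse direction (which would drop `name`) fails.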
*From:* Maximilian Michels <[email protected]>
*Date:* Fri, May 10, 2019 at 6:30 AM
*To:* <[email protected]>

Thanks for the references Luke! I thought that there may have been prior
discussions, so this thread could be a good place to consolidate.

> Dataflow also has an update feature, but it's limited by the fact that
> Beam does not have a good concept of Coder evolution. As a result we try
> very hard never to change Coder encodings.

Trying not to break Coders is a fair approach and could work fine for
Beam itself, if the Coders were designed really carefully. But what
about custom Coders users may have written? AvroCoder or ProtoCoder
would be good candidates for forwards-compatibility, but even these do
not have migration functionality built in.

Is schema evolution already part of SchemaCoder? It's definitely a good
candidate for evolution because a schema provides insight into a
Coder, but as for how to actually perform the evolution, it looks like
this is still an open question.

-Max

On 09.05.19 18:56, Reuven Lax wrote:
> Dataflow also has an update feature, but it's limited by the fact that
> Beam does not have a good concept of Coder evolution. As a result we try
> very hard never to change Coder encodings, which sometimes makes
> development of parts of Beam much more difficult. I think Beam would
> benefit greatly by having a first-class concept of Coder evolution.
>
> BTW for schemas there is a natural way of defining evolution that can be
> handled by SchemaCoder.
>
> On Wed, May 8, 2019 at 12:58 PM Lukasz Cwik <[email protected]> wrote:
>
>> There was a thread about coder update in the past here [1]. Also,
>> Reuven sent out a doc [2] about pipeline drain and update which was
>> discussed in this thread [3]. I believe there have been more
>> references to pipeline update in other threads when people tried to
>> change coder encodings in the past as well.
>>
>> Reuven/Dan are the best contacts about this on how this works inside
>> of Google, the limitations and other ideas that had been proposed.
>>
>> 1: https://lists.apache.org/thread.html/f3b2daa740075cc39dc04f08d1eaacfcc2d550cecc04e27be024cf52@%3Cdev.beam.apache.org%3E
>> 2: https://docs.google.com/document/d/1NExwHlj-2q2WUGhSO4jTu8XGhDPmm3cllSN8IMmWci8/edit#
>> 3: https://lists.apache.org/thread.html/37c9aa4aa5011801a7f060bf3f53e687e539fa6154cc9f1c544d4f7a@%3Cdev.beam.apache.org%3E
>>
>> On Wed, May 8, 2019 at 11:45 AM Maximilian Michels <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I'm looking into updating the Flink Runner to Flink version 1.8. Since
>>> version 1.7 Flink has a new optional interface for Coder evolution*.
>>>
>>> When a Flink pipeline is checkpointed, CoderSnapshots are written out
>>> alongside the checkpointed data. When the pipeline is restored from
>>> that checkpoint, the CoderSnapshots are restored and used to
>>> reinstantiate the Coders.
>>>
>>> Furthermore, there is a compatibility and migration check between the
>>> old and the new Coder. This allows determining whether
>>>
>>> - The serializer did not change or is compatible (ok)
>>> - The serialization format of the coder changed (ok after migration)
>>> - The coder needs to be reconfigured and we know how to do that based
>>>   on the old version (ok after reconfiguration)
>>> - The coder is incompatible (error)
>>>
>>> I was wondering about the Coder evolution story in Beam. The current
>>> state is that checkpointed Beam pipelines are only guaranteed to run
>>> with the same Beam version and pipeline version. A newer version of
>>> either might break the checkpoint format without any way to migrate
>>> the state.
>>>
>>> Should we start thinking about supporting Coder evolution in Beam?
>>>
>>> Thanks,
>>> Max
>>>
>>> * Coders are called TypeSerializers in Flink land. The interface is
>>> TypeSerializerSnapshot.
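The four compatibility outcomes Max lists can be modeled in plain Java. This is a hedged sketch loosely patterned after the result type Flink's snapshot check returns, not Flink's actual `TypeSerializerSnapshot` API; the `CoderSnapshot` record and the `resolve` logic (class name, encoding version, configuration string) are illustrative assumptions.

```java
/**
 * Plain-Java sketch of the four compatibility outcomes described in the
 * thread above. Modeled loosely after the result of Flink's serializer
 * snapshot check; CoderSnapshot and resolve() are hypothetical.
 */
public class CoderCompatibility {

    public enum Result {
        COMPATIBLE_AS_IS,                // ok
        COMPATIBLE_AFTER_MIGRATION,      // ok after migration
        COMPATIBLE_AFTER_RECONFIGURATION,// ok after reconfiguration
        INCOMPATIBLE                     // error
    }

    /** Hypothetical snapshot of a coder's configuration at checkpoint time. */
    public record CoderSnapshot(String coderClass, int encodingVersion, String config) {}

    /**
     * Decides what happens when a snapshot restored from a checkpoint meets
     * the coder registered in the newly submitted pipeline.
     */
    public static Result resolve(CoderSnapshot restored, CoderSnapshot current) {
        if (!restored.coderClass().equals(current.coderClass())) {
            return Result.INCOMPATIBLE; // a different coder entirely: error
        }
        if (restored.encodingVersion() != current.encodingVersion()) {
            // The wire format changed: old state must be re-encoded on restore.
            return Result.COMPATIBLE_AFTER_MIGRATION;
        }
        if (!restored.config().equals(current.config())) {
            // Same format, but the coder must be re-instantiated using the
            // old configuration before it can read the checkpointed data.
            return Result.COMPATIBLE_AFTER_RECONFIGURATION;
        }
        return Result.COMPATIBLE_AS_IS;
    }
}
```

A runner restoring a checkpoint would branch on the result: proceed directly, re-encode state, rebuild the coder from the snapshot, or fail the restore.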
