For schemas it requires some design and discussion. One approach is to allow one-way evolution, the way protos and BigQuery do. Essentially this means we allow adding new fields and making existing fields optional, and any other change is disallowed.
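The one-way rule above can be sketched as a compatibility check. This is a hypothetical, self-contained illustration, not Beam's actual Schema API: the `Field` record and `isCompatible` method are stand-ins, assuming the rule is "old fields keep their names and types, required may relax to optional, and new fields must be optional".

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch of a one-way schema compatibility check (not Beam's API). */
public class SchemaEvolutionCheck {

    /** Minimal stand-in for a schema field; not a real Beam class. */
    public record Field(String name, String type, boolean optional) {}

    /**
     * Returns true if {@code next} is a legal one-way evolution of {@code prev}:
     * every old field still exists with the same type, fields may only go
     * from required to optional, and brand-new fields must be optional.
     */
    public static boolean isCompatible(List<Field> prev, List<Field> next) {
        Map<String, Field> byName =
                next.stream().collect(Collectors.toMap(Field::name, Function.identity()));
        for (Field old : prev) {
            Field updated = byName.remove(old.name());
            if (updated == null) {
                return false; // removing a field is disallowed
            }
            if (!updated.type().equals(old.type())) {
                return false; // changing a field's type is disallowed
            }
            if (old.optional() && !updated.optional()) {
                return false; // optional -> required would break old data
            }
        }
        // Any remaining fields are new; they must be optional so that
        // records written with the old schema can still be decoded.
        return byName.values().stream().allMatch(Field::optional);
    }
}
```

Under this rule, adding an optional `name` field to a schema with a required `id` passes the check, while the reverse direction (which would drop `name`) fails.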
*From:* Maximilian Michels <[email protected]>
*Date:* Fri, May 10, 2019 at 6:30 AM
*To:* <[email protected]>

Thanks for the references Luke! I thought that there may have been prior
discussions, so this thread could be a good place to consolidate.

> Dataflow also has an update feature, but it's limited by the fact that
> Beam does not have a good concept of Coder evolution. As a result we try
> very hard never to change Coder encodings.

Trying not to break Coders is a fair approach and could work fine for
Beam itself, if the Coders were designed really carefully. But what
about custom Coders users may have written? AvroCoder or ProtoCoder
would be good candidates for forwards-compatibility, but even these do
not have migration functionality built in.

Is schema evolution already part of SchemaCoder? It's definitely a good
candidate for evolution because a schema provides insight into a
Coder, but as for how to actually perform the evolution, it looks like
this is still an open question.

-Max

On 09.05.19 18:56, Reuven Lax wrote:
> Dataflow also has an update feature, but it's limited by the fact that
> Beam does not have a good concept of Coder evolution. As a result we try
> very hard never to change Coder encodings, which sometimes makes
> development of parts of Beam much more difficult. I think Beam would
> benefit greatly by having a first-class concept of Coder evolution.
>
> BTW for schemas there is a natural way of defining evolution that can be
> handled by SchemaCoder.
>
> On Wed, May 8, 2019 at 12:58 PM Lukasz Cwik <[email protected]> wrote:
>
>> There was a thread about coder update in the past here [1]. Also,
>> Reuven sent out a doc [2] about pipeline drain and update which was
>> discussed in this thread [3]. I believe there have been more
>> references to pipeline update in other threads when people tried to
>> change coder encodings in the past as well.
>>
>> Reuven/Dan are the best contacts about this on how this works inside
>> of Google, the limitations and other ideas that had been proposed.
>>
>> 1: https://lists.apache.org/thread.html/f3b2daa740075cc39dc04f08d1eaacfcc2d550cecc04e27be024cf52@%3Cdev.beam.apache.org%3E
>> 2: https://docs.google.com/document/d/1NExwHlj-2q2WUGhSO4jTu8XGhDPmm3cllSN8IMmWci8/edit#
>> 3: https://lists.apache.org/thread.html/37c9aa4aa5011801a7f060bf3f53e687e539fa6154cc9f1c544d4f7a@%3Cdev.beam.apache.org%3E
>>
>> On Wed, May 8, 2019 at 11:45 AM Maximilian Michels <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I'm looking into updating the Flink Runner to Flink version 1.8. Since
>>> version 1.7 Flink has a new optional interface for Coder evolution*.
>>>
>>> When a Flink pipeline is checkpointed, CoderSnapshots are written out
>>> alongside the checkpointed data. When the pipeline is restored from
>>> that checkpoint, the CoderSnapshots are restored and used to
>>> reinstantiate the Coders.
>>>
>>> Furthermore, there is a compatibility and migration check between the
>>> old and the new Coder. This allows determining whether
>>>
>>> - The serializer did not change or is compatible (ok)
>>> - The serialization format of the coder changed (ok after migration)
>>> - The coder needs to be reconfigured and we know how to do that based
>>>   on the old version (ok after reconfiguration)
>>> - The coder is incompatible (error)
>>>
>>> I was wondering about the Coder evolution story in Beam. The current
>>> state is that checkpointed Beam pipelines are only guaranteed to run
>>> with the same Beam version and pipeline version. A newer version of
>>> either might break the checkpoint format without any way to migrate
>>> the state.
>>>
>>> Should we start thinking about supporting Coder evolution in Beam?
>>>
>>> Thanks,
>>> Max
>>>
>>> * Coders are called TypeSerializers in Flink land. The interface is
>>> TypeSerializerSnapshot.
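The four compatibility outcomes Max lists can be modeled in plain Java. This is a hedged sketch loosely patterned after the result type Flink's snapshot check returns, not Flink's actual `TypeSerializerSnapshot` API; the `CoderSnapshot` record and the `resolve` logic (class name, encoding version, configuration string) are illustrative assumptions.

```java
/**
 * Plain-Java sketch of the four compatibility outcomes described in the
 * thread above. Modeled loosely after the result of Flink's serializer
 * snapshot check; CoderSnapshot and resolve() are hypothetical.
 */
public class CoderCompatibility {

    public enum Result {
        COMPATIBLE_AS_IS,                // ok
        COMPATIBLE_AFTER_MIGRATION,      // ok after migration
        COMPATIBLE_AFTER_RECONFIGURATION,// ok after reconfiguration
        INCOMPATIBLE                     // error
    }

    /** Hypothetical snapshot of a coder's configuration at checkpoint time. */
    public record CoderSnapshot(String coderClass, int encodingVersion, String config) {}

    /**
     * Decides what happens when a snapshot restored from a checkpoint meets
     * the coder registered in the newly submitted pipeline.
     */
    public static Result resolve(CoderSnapshot restored, CoderSnapshot current) {
        if (!restored.coderClass().equals(current.coderClass())) {
            return Result.INCOMPATIBLE; // a different coder entirely: error
        }
        if (restored.encodingVersion() != current.encodingVersion()) {
            // The wire format changed: old state must be re-encoded on restore.
            return Result.COMPATIBLE_AFTER_MIGRATION;
        }
        if (!restored.config().equals(current.config())) {
            // Same format, but the coder must be re-instantiated using the
            // old configuration before it can read the checkpointed data.
            return Result.COMPATIBLE_AFTER_RECONFIGURATION;
        }
        return Result.COMPATIBLE_AS_IS;
    }
}
```

A runner restoring a checkpoint would branch on the result: proceed directly, re-encode state, rebuild the coder from the snapshot, or fail the restore.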
