Coders such as AvroCoder are translated to an intermediate JSON form called a CloudObject[1]. Dataflow only uses the serialized Java representation (embedded as bytes in ?base64? within the CloudObject) for coders which extend SerializableCoder[2]. Dataflow only cares that these CloudObject representations didn't change.
For Beam runners in general today, they could rely on the proto version of the coder (which is meant to replace the CloudObject that Dataflow uses eventually). Relying on not breaking Java serialization makes changing coders too strict. Eventually we could be using schemas for most things and then the representation will be completely definable and separate from the implementation. There might be more details in the snapshotting and update design doc[3] but I believe it was pretty light on how to deal with these kinds of changes beyond that we know its a problem. 1: https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/AvroCoderCloudObjectTranslator.java 2: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/SerializableCoder.java 3: https://docs.google.com/document/d/1UWhnYPgui0gUYOsuGcCjLuoOUlGA4QaY91n8p3wz9MYhttps://docs.google.com/document/d/1UWhnYPgui0gUYOsuGcCjLuoOUlGA4QaY91n8p3wz9MY <https://docs.google.com/document/d/1UWhnYPgui0gUYOsuGcCjLuoOUlGA4QaY91n8p3wz9MY> On Tue, Aug 13, 2019 at 1:13 PM Gleb Kanterov <[email protected]> wrote: > I'm looking into the code of AvroCoder, and I was wondering what happens > when users upgrade Beam for streaming pipelines? > > As I understand it, we should be able to deserialize coder from previous > Beam version. Looking into guava vendoring, it's going to break > serialization when we are going to switch guava version because the current > version is a part of the namespace: > > import > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier; > import > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Suppliers; > > We don't have tests for it, but probably we already broke compatibility > when we vendored guava. Can anybody clarify what would be the approach for > coders? >
