I'm not very familiar with this effort. Were any ITs / POCs created for these changes (to surface any obvious bugs)? Are these changes usable in the DirectRunner?

On Fri, Jun 12, 2020 at 8:50 AM Luke Cwik <lc...@google.com> wrote:

> A few months back there was a discussion[1] about performing work to
> stabilize the protos used for pipeline execution looking forward to cross
> language pipelines and runners who want to use them across SDK versions
> (Dataflow).
>
> All the proposed incompatible clean-up tasks were done and made it into
> 2.21 (there are some left related to documentation and cleaning up some
> stuff that can be removed in a backwards compatible way and general
> re-organization within the files to delineate what is stable and what
> isn't).
>
> Beyond documenting the versioning story (sketch below) in a more durable
> location than this ML, performing these last clean-up tasks and general
> re-organization within the files, is there anything else that should be
> done before we can vote and consider the protos to be stable (which would
> mean that 2.21 would contain the first stable version assuming no other
> incompatible changes are suggested)?
>
> The versioning story is around 3 parts and effectively occurs whenever
> there is an incompatible change such as:
> * adding a new field that didn't exist where it semantically changes what
>   is to be done
> * removing a field that was effectively required
> * requiring an SDK or runner to behave differently (e.g. support large
>   iterables, support a new API (such as a future map state for StatefulDoFns))
>
> The three ways of handling versioning for incompatible changes are:
> * many protos have URNs, when there is an incompatible change the URN
>   should be changed. If it is effectively the same thing then this should
>   lead to a version bump and update of the documentation reflecting what the
>   requirements of the new version are.
> * there is a capabilities section on each environment, this should
>   enumerate everything the SDK can support, protocols (e.g. large iterables,
>   ...), coders, well known transforms, ...
> * there is a requirements section on the pipeline proto, this is an
>   enumeration of everything the SDK needs the runner to know to be able to
>   interpret the pipeline (e.g. splittable dofn, requires time sorted input,
>   ...).
>
> Updating the URN of the transform/coder is typically the easiest way to
> handle incompatible changes followed by using the capabilities list to
> enable new things (used like an allowlist) and the requirements list to
> prevent runners from doing things they shouldn't (used like a denylist).
> Many features/APIs that are part of the initial version are implicitly not
> in either the capabilities or requirements lists to prevent a huge
> definition list and can be disabled in the future by relying on adding
> requirements that disable these currently unnamed features/APIs if it is
> ever necessary.
>
> 1:
> https://lists.apache.org/thread.html/rdf247cfa3a509f80578f03b2454ea1e50474ee3576a059486d58fdf4%40%3Cdev.beam.apache.org%3E
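
For readers following the quoted versioning story, here is a rough sketch of how a runner might treat the two lists: requirements as a denylist (reject a pipeline that names anything the runner does not understand) and capabilities as an allowlist (only use a feature the SDK environment has declared). This is not Beam code. The URN strings, the Environment and Pipeline classes, and the function names are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical URNs for illustration only; they are not the real Beam URNs.
REQ_SPLITTABLE_DOFN = "example:requirement:splittable_dofn:v1"
REQ_TIME_SORTED_INPUT = "example:requirement:time_sorted_input:v1"
CAP_LARGE_ITERABLES = "example:protocol:large_iterables:v1"


@dataclass(frozen=True)
class Environment:
    """Stand-in for an SDK environment and its declared capabilities."""
    capabilities: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Pipeline:
    """Stand-in for a pipeline proto and its declared requirements."""
    requirements: FrozenSet[str] = frozenset()


def unsupported_requirements(pipeline: Pipeline,
                             understood: FrozenSet[str]) -> List[str]:
    """Return the requirements the runner does not understand, sorted."""
    return sorted(pipeline.requirements - understood)


def validate(pipeline: Pipeline, understood: FrozenSet[str]) -> None:
    """Denylist behaviour: refuse to run if any requirement is unknown."""
    missing = unsupported_requirements(pipeline, understood)
    if missing:
        raise ValueError(f"runner does not support: {missing}")


def can_use(environment: Environment, capability: str) -> bool:
    """Allowlist behaviour: a feature is usable only if the SDK declared it."""
    return capability in environment.capabilities
```

Under these assumptions, a runner that only understands REQ_SPLITTABLE_DOFN would reject a pipeline that also lists REQ_TIME_SORTED_INPUT, and it would avoid the large iterables protocol for any environment that does not list CAP_LARGE_ITERABLES.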