Getting back to this: I think Luke has outlined a good implementation strategy. I have not followed the progress on getting this documented durably and voted on. Maybe a gdoc draft to vote on and then the website, since it should be *very* stable and also forms the new core of what Beam "is". It should explain the concepts clearly at a high level, and any changes to the protocol and documentation should get good PR review.
Kenn

On Fri, Jun 12, 2020 at 1:14 PM Udi Meiri <[email protected]> wrote:

> I'm not very familiar with this effort.
> Were there ITs / POCs created for these changes? (to surface any obvious
> bugs)
> Are these changes usable in DirectRunner?
>
>
> On Fri, Jun 12, 2020 at 8:50 AM Luke Cwik <[email protected]> wrote:
>
>> A few months back there was a discussion[1] about performing work to
>> stabilize the protos used for pipeline execution, looking forward to
>> cross-language pipelines and runners that want to use them across SDK
>> versions (Dataflow).
>>
>> All the proposed incompatible clean-up tasks were done and made it into
>> 2.21. Some tasks remain: documentation, cleaning up things that can be
>> removed in a backwards compatible way, and general re-organization
>> within the files to delineate what is stable and what isn't.
>>
>> Beyond documenting the versioning story (sketch below) in a more durable
>> location than this ML, performing these last clean-up tasks, and the
>> general re-organization within the files, is there anything else that
>> should be done before we can vote and consider the protos to be stable?
>> (That would mean 2.21 contains the first stable version, assuming no
>> other incompatible changes are suggested.)
>>
>> The versioning story has 3 parts and effectively comes into play whenever
>> there is an incompatible change, such as:
>> * adding a new field that didn't exist, where it semantically changes what
>>   is to be done
>> * removing a field that was effectively required
>> * requiring an SDK or runner to behave differently (e.g. support large
>>   iterables, support a new API such as a future map state for StatefulDoFns)
>>
>> The three ways of handling versioning for incompatible changes are:
>> * many protos have URNs; when there is an incompatible change, the URN
>>   should be changed. If it is effectively the same thing, this should
>>   lead to a version bump and an update of the documentation reflecting
>>   the requirements of the new version.
>> * there is a capabilities section on each environment; this should
>>   enumerate everything the SDK can support: protocols (e.g. large
>>   iterables, ...), coders, well-known transforms, ...
>> * there is a requirements section on the pipeline proto; this is an
>>   enumeration of everything the SDK needs the runner to know to be able
>>   to interpret the pipeline (e.g. splittable DoFn, requires time-sorted
>>   input, ...).
>>
>> Updating the URN of the transform/coder is typically the easiest way to
>> handle incompatible changes, followed by using the capabilities list to
>> enable new things (used like an allowlist) and the requirements list to
>> prevent runners from doing things they shouldn't (used like a denylist).
>> Many features/APIs that are part of the initial version are implicitly
>> left out of both the capabilities and requirements lists to avoid a huge
>> definition list. If it is ever necessary, they can be disabled in the
>> future by adding requirements that disable these currently unnamed
>> features/APIs.
>>
>> 1:
>> https://lists.apache.org/thread.html/rdf247cfa3a509f80578f03b2454ea1e50474ee3576a059486d58fdf4%40%3Cdev.beam.apache.org%3E
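To make the three versioning mechanisms concrete, here is a minimal sketch of the check a runner might perform before executing a pipeline. It is not Beam's API: the `Environment` and `Pipeline` classes, the `validate` and `may_enable` helpers, and every URN string are hypothetical stand-ins for the capabilities section on environments and the requirements section on the pipeline proto. Requirements are treated as fail-closed (any unrecognized entry rejects the pipeline), capabilities as opt-in, and a bumped URN version is simply an unrecognized URN to an older runner.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical URNs modeled on Beam's "beam:<kind>:<name>:v<N>" style;
# they are placeholders, not values copied from the protos.
SDF_REQUIREMENT = "beam:requirement:example_splittable_dofn:v1"
LARGE_ITERABLES = "beam:protocol:example_large_iterables:v1"


@dataclass(frozen=True)
class Environment:
    # What the SDK in this environment supports (protocols, coders,
    # well-known transforms). The runner uses it like an allowlist.
    capabilities: FrozenSet[str]


@dataclass(frozen=True)
class Pipeline:
    # What the runner must understand to interpret the pipeline. The runner
    # uses it like a denylist: any unrecognized entry blocks execution.
    requirements: FrozenSet[str]
    # URNs of the transforms/coders the pipeline actually uses.
    used_urns: FrozenSet[str]
    environment: Environment


def validate(pipeline: Pipeline,
             runner_requirements: FrozenSet[str],
             runner_urns: FrozenSet[str]) -> List[str]:
    """Return the reasons the runner must reject the pipeline (empty = OK)."""
    errors = [f"unknown requirement {r}"
              for r in sorted(pipeline.requirements - runner_requirements)]
    # An incompatible change bumps the URN (e.g. ":v1" -> ":v2"), so a runner
    # that only knows the old version does not recognize the new one.
    errors += [f"unknown transform/coder {u}"
               for u in sorted(pipeline.used_urns - runner_urns)]
    return errors


def may_enable(pipeline: Pipeline, capability: str) -> bool:
    """Opt into an optional protocol only if the SDK advertised it."""
    return capability in pipeline.environment.capabilities


if __name__ == "__main__":
    p = Pipeline(
        requirements=frozenset({SDF_REQUIREMENT}),
        used_urns=frozenset({"beam:coder:example_kv:v2"}),
        environment=Environment(capabilities=frozenset({LARGE_ITERABLES})),
    )
    # A runner that knows no requirements and only the v1 coder rejects it.
    print(validate(p, frozenset(), frozenset({"beam:coder:example_kv:v1"})))
    print(may_enable(p, LARGE_ITERABLES))  # True
```

The asymmetry is the point of the design sketched in the thread: an SDK that omits a capability only loses an optimization, while a runner that ignores an unknown requirement could silently run the pipeline incorrectly, so the sketch rejects in that case.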
