I'm not very familiar with this effort.
Were there ITs / POCs created for these changes? (to surface any obvious
bugs)
Are these changes usable in DirectRunner?


On Fri, Jun 12, 2020 at 8:50 AM Luke Cwik <lc...@google.com> wrote:

> A few months back there was a discussion[1] about stabilizing the protos
> used for pipeline execution, looking ahead to cross-language pipelines
> and to runners that want to use them across SDK versions (Dataflow).
>
> All the proposed incompatible clean-up tasks were done and made it into
> 2.21. Some tasks remain: documentation, removing things that can be
> dropped in a backwards-compatible way, and general re-organization within
> the files to delineate what is stable and what isn't.
>
> Beyond documenting the versioning story (sketch below) in a more durable
> location than this ML, performing these last clean-up tasks and general
> re-organization within the files, is there anything else that should be
> done before we can vote and consider the protos to be stable (which would
> mean that 2.21 would contain the first stable version assuming no other
> incompatible changes are suggested)?
>
> The versioning story has three parts and applies whenever there is an
> incompatible change, such as:
> * adding a new field that semantically changes what is to be done
> * removing a field that was effectively required
> * requiring an SDK or runner to behave differently (e.g. support large
> iterables, support a new API (such as a future map state for StatefulDoFns))
>
> The three ways of handling versioning for incompatible changes are:
> * many protos have URNs; when there is an incompatible change, the URN
> should be changed. If it is effectively the same thing, this should lead
> to a version bump and an update of the documentation reflecting the
> requirements of the new version.
> * there is a capabilities section on each environment; it should
> enumerate everything the SDK can support: protocols (e.g. large iterables,
> ...), coders, well known transforms, ...
> * there is a requirements section on the pipeline proto; it is an
> enumeration of everything the SDK needs the runner to know to be able to
> interpret the pipeline (e.g. splittable dofn, requires time sorted input,
> ...).
>
> Updating the URN of the transform/coder is typically the easiest way to
> handle incompatible changes, followed by using the capabilities list to
> enable new things (used like an allowlist) and the requirements list to
> prevent runners from doing things they shouldn't (used like a denylist).
> Many features/APIs that are part of the initial version are deliberately
> left out of both the capabilities and requirements lists, to avoid a huge
> definition list. If it is ever necessary, they can be disabled in the
> future by adding requirements that disable these currently unnamed
> features/APIs.
>
> 1:
> https://lists.apache.org/thread.html/rdf247cfa3a509f80578f03b2454ea1e50474ee3576a059486d58fdf4%40%3Cdev.beam.apache.org%3E
>
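
To make the three mechanisms in the quoted sketch concrete, here is a rough
illustration of how a runner might apply them. This is only a sketch under
assumptions, not how Beam implements it: the URN strings are placeholders
made up for the example, and the helper names (unsupported_requirements,
can_use, bump_version) are hypothetical.

```python
# Placeholder URNs for illustration only; these are NOT the identifiers
# that Beam defines.
SPLITTABLE_DOFN = "example:requirement:splittable_dofn:v1"
TIME_SORTED_INPUT = "example:requirement:time_sorted_input:v1"
LARGE_ITERABLES = "example:protocol:large_iterables:v1"

# Requirements this hypothetical runner knows how to honor.
RUNNER_UNDERSTOOD_REQUIREMENTS = frozenset({SPLITTABLE_DOFN})


def unsupported_requirements(pipeline_requirements,
                             understood=RUNNER_UNDERSTOOD_REQUIREMENTS):
    """Denylist-style check: return the requirements the runner does not
    understand. A non-empty result means the pipeline should be rejected."""
    return sorted(set(pipeline_requirements) - set(understood))


def can_use(feature_urn, environment_capabilities):
    """Allowlist-style check: a runner may rely on an optional feature only
    if the SDK environment declares it as a capability."""
    return feature_urn in set(environment_capabilities)


def bump_version(urn):
    """Return the URN with its trailing ':vN' suffix incremented, as one
    might do after an incompatible change to the same logical thing."""
    prefix, sep, version = urn.rpartition(":v")
    if not sep or not version.isdigit():
        raise ValueError(f"URN has no ':vN' version suffix: {urn!r}")
    return f"{prefix}:v{int(version) + 1}"


if __name__ == "__main__":
    missing = unsupported_requirements([SPLITTABLE_DOFN, TIME_SORTED_INPUT])
    if missing:
        print("Rejecting pipeline; unknown requirements:", missing)
    print("Large iterables usable:", can_use(LARGE_ITERABLES, [LARGE_ITERABLES]))
    print("Next version:", bump_version("example:coder:my_coder:v1"))
```

The asymmetry is the point: an unknown capability can be ignored safely,
while an unknown requirement cannot, so the runner has to refuse the
pipeline in that case.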
