Hi,

I like this idea of making it easier to push out improvements, and had a
look at the PR.

One question to better understand how it works today:

   1. The upgrades that the runners do, such as those not visible to the
   user, can they be initiated at any time, or do they only happen when the
   user updates the running pipeline, e.g. with new user code?

And, assuming the former, some reflections that came to mind when reviewing
the changes:

   1. Will the update_compatibility_version option be effective both when
   creating and when updating a pipeline? It is grouped with the update
   options in the Python SDK, but users may want to set the compatibility
   version already when launching the pipeline.
   2. Would it be possible to undo pinning to a fixed prior version, i.e.
   (re-)enable upgrades?
      1. If yes: in practice, would this motivate another option, or
      passing a value like "auto" or "latest" to update_compatibility_version?
   3. The option is being introduced to the Java and Python SDKs. Should
   this also be applicable to the Go SDK?
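
To make question 2 concrete, here is a minimal sketch of how a value like
"auto" or "latest" could be resolved. This is only an illustration of the
idea, not what the PR does: the sentinel values, the helper names
(parse_version, resolve_compat_version, uses_legacy_behavior) and the version
numbers are all hypothetical, and plain argparse stands in for the SDK's
option parsing.

```python
import argparse
from typing import Optional, Tuple

# Hypothetical sentinel values meaning "do not pin, follow the SDK".
# The PR does not define these; they only illustrate question 2.
UNPINNED_VALUES = {"auto", "latest"}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.50.0' into (2, 50, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve_compat_version(requested: Optional[str], sdk_version: str) -> str:
    """Return the version whose behavior the SDK should emulate.

    An unset option or a sentinel value resolves to the current SDK version,
    which amounts to (re-)enabling upgrades.
    """
    if requested is None or requested.lower() in UNPINNED_VALUES:
        return sdk_version
    if parse_version(requested) > parse_version(sdk_version):
        raise ValueError(
            "cannot pin to %s, which is newer than SDK %s"
            % (requested, sdk_version))
    return requested


def uses_legacy_behavior(effective_version: str, changed_in: str) -> bool:
    """True if a behavior changed in `changed_in` must keep its old form."""
    return parse_version(effective_version) < parse_version(changed_in)


# Stand-in for the SDK's option parsing (assumption: plain argparse).
parser = argparse.ArgumentParser()
parser.add_argument("--update_compatibility_version", default=None)
args = parser.parse_args(["--update_compatibility_version=2.50.0"])

# Illustrative version numbers only.
effective = resolve_compat_version(args.update_compatibility_version, "2.52.0")
print(effective, uses_legacy_behavior(effective, "2.51.0"))
```

With this shape, a single option covers both pinning and un-pinning, so a
second option would not be needed; whether that is preferable to a separate
flag is exactly the open question.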

Thanks,
Johanna

On Thu, Oct 26, 2023 at 2:25 AM Robert Bradshaw via dev <dev@beam.apache.org>
wrote:

> Dataflow (among other runners) has the ability to "upgrade" running
> pipelines with new code (e.g. capturing bug fixes, dependency updates,
> and limited topology changes). Unfortunately some improvements (e.g.
> new and improved ways of writing to BigQuery, optimized use of side
> inputs, a change in algorithm, sometimes completely internally and not
> visible to the user) are not sufficiently backwards compatible. To
> avoid breaking users, we either do not make these changes or guard
> them behind a parallel opt-in mode, which is a significant drain on
> developer productivity and causes new pipelines to run in obsolete
> modes by default.
>
> I created https://github.com/apache/beam/pull/29140 which adds a new
> pipeline option, update_compatibility_version, that allows the SDK to
> move forward while letting users with previously launched pipelines
> manually request the "old" way of doing things to preserve update
> compatibility. (We should still attempt backwards compatibility when
> it makes sense, and the old way would remain in code until such a time
> it's actually deprecated and removed, but this means we won't be
> constrained by it, especially when it comes to default settings.)
>
> Any objections or other thoughts on this approach?
>
> - Robert
>
> P.S. Separately I think it'd be valuable to elevate the vague notion
> of update compatibility to a first-class Beam concept and put it on
> firm footing, but that's a larger conversation outside the thread of
> this smaller (and I think still useful in such a future world) change.
>
