Re: Streaming update compatibility

Robert Burke Fri, 27 Oct 2023 09:32:52 -0700

On Fri, Oct 27, 2023, 9:09 AM Robert Bradshaw via dev <dev@beam.apache.org>
wrote:


> On Fri, Oct 27, 2023 at 7:50 AM Kellen Dye via dev <dev@beam.apache.org>
> wrote:
> >
> > > Auto is hard, because it would involve
> > > querying the runner before pipeline construction, and we may not even
> > > know what the runner is at this point
> >
> > At the point where pipeline construction will start, you should have
> > access to the pipeline arguments and be able to determine the runner. What
> > seems to be missing is a place to query the runner pre-construction. If
> > that query could return metadata about the currently running version of the
> > job, then that could be incorporated into graph construction as necessary.
>
> While this is the common case, it is not true in general. For example
> it's possible to cache the pipeline proto and submit it to a separate
> choice of runner later. We have Jobs API implementations that
> forward/proxy the job to other runners, and the Python interactive
> runner is another example where the runner is late-binding (e.g. one
> tries a sample locally, and if all looks good can execute remotely,
> and also in this case the graph that's submitted is often mutated
> before running).
>
> Also, in the spirit of the portability story, the pipeline definition
> itself should be runner-independent.
>
> > That same hook could be a place to for example return the
> > currently-running job graph for pre-submission compatibility checks.
>
> I suppose we could add something to the Jobs API to make "looking up a
> previous version of this pipeline" runner-agnostic, though that
> assumes it's available at construction time.


As I pointed out, we can access a given pipeline via the job management
API. It's already runner-agnostic, other than for Dataflow.

Operationally though, we'd need to provide the option to "dry run" an
update locally, or validate update compatibility against a given pipeline
proto.

> And +1 as Kellen says we
> should define (and be able to check) what pipeline compatibility means
> via a graph-to-graph comparison at the Beam level. I'll defer both
> of these as future work as part of the "make update a portable Beam
> concept" project.
>

Big +1 to that. Hard to know what to check for without defining it. This
would avoid needing to ask a given runner WRT dry run updates.
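
To make the graph-to-graph idea a bit more concrete, here is a rough sketch
of what such a check could look like. Everything in it is an assumption for
illustration: TransformInfo and check_update_compatibility are hypothetical
names, the transforms are plain Python objects rather than real pipeline
protos, the coder strings are made up, and the rules (match transforms by
name, then compare URN and output coder) are only one possible definition of
"compatible", not something Beam defines today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformInfo:
    """Hypothetical, simplified stand-in for one transform in a pipeline graph."""
    urn: str            # transform kind, e.g. "beam:transform:pardo:v1"
    output_coder: str   # made-up identifier for the coder on the output
    stateful: bool = False


def check_update_compatibility(old, new):
    """Return a list of human-readable incompatibilities (empty means OK).

    `old` and `new` map transform names to TransformInfo. The rules below
    are ASSUMPTIONS for illustration, not an agreed Beam definition.
    """
    problems = []
    for name, old_t in sorted(old.items()):
        new_t = new.get(name)
        if new_t is None:
            # In this sketch, dropping a stateless transform is allowed,
            # but dropping a stateful one would orphan its state.
            if old_t.stateful:
                problems.append(f"{name}: stateful transform removed")
            continue
        if new_t.urn != old_t.urn:
            problems.append(f"{name}: urn changed {old_t.urn} -> {new_t.urn}")
        if new_t.output_coder != old_t.output_coder:
            problems.append(
                f"{name}: output coder changed "
                f"{old_t.output_coder} -> {new_t.output_coder}")
    return problems


# Graph of the currently running job (hypothetical contents).
running = {
    "Read": TransformInfo("beam:transform:read:v1", "bytes"),
    "CountPerKey": TransformInfo(
        "beam:transform:pardo:v1", "kv<str,varint>", stateful=True),
}
# Graph of the proposed update: the stateful step changes its output coder.
candidate = {
    "Read": TransformInfo("beam:transform:read:v1", "bytes"),
    "CountPerKey": TransformInfo(
        "beam:transform:pardo:v1", "kv<str,double>", stateful=True),
}

for problem in check_update_compatibility(running, candidate):
    print(problem)
```

A check shaped like this only needs the two graphs, so it could run locally
as a "dry run" without asking any particular runner.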

It's part of a longer-term plan, but I intend to add Pipeline Update as a
feature to Prism. As Prism becomes more fully featured, it becomes a great
test bed for developing the definitions.
