Thomas, how were you planning to get the StreamingViewAs* PTransforms out
of the apply() in the Dataflow runner?

I'm asking because the Flink Runner works basically the same way and I've
run into a problem. My problem is that the TupleTag of a PCollectionView is
generated when the user applies, for example, the View.AsIterable
PTransform. When the override is in the runner apply() this is not a
problem because the tag of the PCollectionView that the user has matches
the tag that we internally have in the Pipeline. Now, if I later change the
View.AsIterable using Pipeline surgery a new PCollectionView will be
created that will have a different internal tag from the one that the user
PCollectionView has. Then , we the user creates a ParDo with side inputs I
cannot properly match the replaced View against the user View and I can't
properly stitch together an execution graph.

On Fri, 17 Feb 2017 at 00:45 Amit Sela <[email protected]> wrote:

> Awesome!
> First thing I'm gonna do:
>
>    1. traverse the pipeline to determine if streaming.
>    2. If streaming, replace Read.Bounded with an adapted Read.Unbounded.
>
> Current implementation forces translating bounded reads by the unbounded
> translator and it feels awkward, this makes it right again.
>
> Thanks Thomas!
>
> On Thu, Feb 16, 2017 at 4:12 PM Aljoscha Krettek <[email protected]>
> wrote:
>
> > I might just try and do that. ;-)
> >
> > On Thu, 16 Feb 2017 at 03:55 Thomas Groh <[email protected]>
> wrote:
> >
> > > As of Github PR #1998 (https://github.com/apache/beam/pull/1998), the
> > new
> > > Pipeline Surgery API is ready and available. There are a couple of
> > > refinements coming in PR #2006, but in general pipelines can now, post
> > > construction, have PTransforms swapped out to whatever the runner
> desires
> > > (standard "behavior-maintaining" caveats apply).
> > >
> > > Moving forwards, this will enable pipelines to be run on multiple
> runners
> > > without having to reconstruct the graph via repeated applications of
> > > PTransforms to the pipeline (this also includes being able to, for
> > example,
> > > read a pipeline from a serialized representation, and executing the
> > result
> > > on an arbitrary runner).
> > >
> > > Those of you who are runner authors (at least, those who I can easily
> > > identify as such) should expect a Pull Request from me sometime next
> week
> > > porting you off of intercepting calls to apply and to the new surgery
> > API.
> > > You are, of course, welcome to beat me to the punch.
> > >
> > > Thanks,
> > >
> > > Thomas
> > >
> >
>

Reply via email to