Re: Pipeline Surgery and an interception-free future

Thomas Groh Fri, 17 Feb 2017 08:34:33 -0800

That depends on how you determine which view you're trying to use. All of
the nodes that are visited in Pipeline#traverseTopologically are wired up
in such a way that their outputs correspond one-to-one with the pipeline
independent user graph. However, that means that the contents of any
override don't necessarily match up with the graph that they're present in
- that usually means that outputs have to be determined from the graph
nodes, and shouldn't be obtained from the view itself.


I think you can swap from `PCollectionView.getView()` to
`context.getOutput()` in FlinkStreamingTransformTranslators and everything
should work out. PR #2035 is what I think should be sufficient (pending
test runs, of course)

On Fri, Feb 17, 2017 at 3:29 AM, Aljoscha Krettek <[email protected]>
wrote:

> Thomas, how were you planning to get the StreamingViewAs* PTransforms out
> of the apply() in the Dataflow runner?
>
> I'm asking because the Flink Runner works basically the same way and I've
> run into a problem. My problem is that the TupleTag of a PCollectionView is
> generated when the user applies, for example, the View.AsIterable
> PTransform. When the override is in the runner apply() this is not a
> problem because the tag of the PCollectionView that the user has matches
> the tag that we internally have in the Pipeline. Now, if I later change the
> View.AsIterable using Pipeline surgery a new PCollectionView will be
> created that will have a different internal tag from the one that the user
> PCollectionView has. Then , we the user creates a ParDo with side inputs I
> cannot properly match the replaced View against the user View and I can't
> properly stitch together an execution graph.
>
> On Fri, 17 Feb 2017 at 00:45 Amit Sela <[email protected]> wrote:
>
> > Awesome!
> > First thing I'm gonna do:
> >
> >    1. traverse the pipeline to determine if streaming.
> >    2. If streaming, replace Read.Bounded with an adapted Read.Unbounded.
> >
> > Current implementation forces translating bounded reads by the unbounded
> > translator and it feels awkward, this makes it right again.
> >
> > Thanks Thomas!
> >
> > On Thu, Feb 16, 2017 at 4:12 PM Aljoscha Krettek <[email protected]>
> > wrote:
> >
> > > I might just try and do that. ;-)
> > >
> > > On Thu, 16 Feb 2017 at 03:55 Thomas Groh <[email protected]>
> > wrote:
> > >
> > > > As of Github PR #1998 (https://github.com/apache/beam/pull/1998),
> the
> > > new
> > > > Pipeline Surgery API is ready and available. There are a couple of
> > > > refinements coming in PR #2006, but in general pipelines can now,
> post
> > > > construction, have PTransforms swapped out to whatever the runner
> > desires
> > > > (standard "behavior-maintaining" caveats apply).
> > > >
> > > > Moving forwards, this will enable pipelines to be run on multiple
> > runners
> > > > without having to reconstruct the graph via repeated applications of
> > > > PTransforms to the pipeline (this also includes being able to, for
> > > example,
> > > > read a pipeline from a serialized representation, and executing the
> > > result
> > > > on an arbitrary runner).
> > > >
> > > > Those of you who are runner authors (at least, those who I can easily
> > > > identify as such) should expect a Pull Request from me sometime next
> > week
> > > > porting you off of intercepting calls to apply and to the new surgery
> > > API.
> > > > You are, of course, welcome to beat me to the punch.
> > > >
> > > > Thanks,
> > > >
> > > > Thomas
> > > >
> > >
> >
>

Re: Pipeline Surgery and an interception-free future

Reply via email to