Thomas, how were you planning to get the StreamingViewAs* PTransforms out of the apply() in the Dataflow runner?
I'm asking because the Flink Runner works basically the same way and I've run into a problem. My problem is that the TupleTag of a PCollectionView is generated when the user applies, for example, the View.AsIterable PTransform. When the override is in the runner apply() this is not a problem because the tag of the PCollectionView that the user has matches the tag that we internally have in the Pipeline. Now, if I later change the View.AsIterable using Pipeline surgery a new PCollectionView will be created that will have a different internal tag from the one that the user PCollectionView has. Then , we the user creates a ParDo with side inputs I cannot properly match the replaced View against the user View and I can't properly stitch together an execution graph. On Fri, 17 Feb 2017 at 00:45 Amit Sela <[email protected]> wrote: > Awesome! > First thing I'm gonna do: > > 1. traverse the pipeline to determine if streaming. > 2. If streaming, replace Read.Bounded with an adapted Read.Unbounded. > > Current implementation forces translating bounded reads by the unbounded > translator and it feels awkward, this makes it right again. > > Thanks Thomas! > > On Thu, Feb 16, 2017 at 4:12 PM Aljoscha Krettek <[email protected]> > wrote: > > > I might just try and do that. ;-) > > > > On Thu, 16 Feb 2017 at 03:55 Thomas Groh <[email protected]> > wrote: > > > > > As of Github PR #1998 (https://github.com/apache/beam/pull/1998), the > > new > > > Pipeline Surgery API is ready and available. There are a couple of > > > refinements coming in PR #2006, but in general pipelines can now, post > > > construction, have PTransforms swapped out to whatever the runner > desires > > > (standard "behavior-maintaining" caveats apply). > > > > > > Moving forwards, this will enable pipelines to be run on multiple > runners > > > without having to reconstruct the graph via repeated applications of > > > PTransforms to the pipeline (this also includes being able to, for > > example, > > > read a pipeline from a serialized representation, and executing the > > result > > > on an arbitrary runner). > > > > > > Those of you who are runner authors (at least, those who I can easily > > > identify as such) should expect a Pull Request from me sometime next > week > > > porting you off of intercepting calls to apply and to the new surgery > > API. > > > You are, of course, welcome to beat me to the punch. > > > > > > Thanks, > > > > > > Thomas > > > > > >
