Thanks Kenn. We already do the Runner API roundtripping (I believe Robert implemented this). With this change, we would start doing exactly what you're suggesting, where we apply overrides to a post-deserialization pipeline.
On Thu, Feb 1, 2018 at 6:45 PM Kenneth Knowles <[email protected]> wrote: > +1 for removing apply_* > > For the Java SDK, removing specialized intercepts was an important first > step towards the portability framework. I wonder if there is a way for the > Python SDK to leapfrog, taking advantage of some of the lessons that Java > learned a bit more painfully. Most pertinent I think is that if an SDK's > role is to construct a pipeline and ship the proto to a runner (service) > then overrides apply to a post-deserialization pipeline. The Java > DirectRunner does a proto round-trip to avoid accidentally depending on > things that are not really part of the pipeline. I would this crisp > abstraction enforcement would add even more value to Python. > > Kenn > > On Thu, Feb 1, 2018 at 5:21 PM, Charles Chen <[email protected]> wrote: > >> In the Python DirectRunner, we currently use apply_* overrides to >> override the operation of the default .expand() operation for certain >> transforms. For example, GroupByKey has a special implementation in the >> DirectRunner, so we use an apply_* override hook to replace the >> implementation of GroupByKey.expand(). >> >> However, this strategy has drawbacks. Because this override operation >> happens eagerly during graph construction, the pipeline graph is >> specialized and modified before a specific runner is bound to the >> pipeline's execution. This makes the pipeline graph non-portable and blocks >> full migration to using the Runner API pipeline representation in the >> DirectRunner. >> >> By contrast, the SDK's PTransformOverride mechanism allows the expression >> of matchers that operate on the unspecialized graph, replacing PTransforms >> as necessary to produce a DirectRunner-specialized pipeline graph for >> execution. >> >> I therefore propose to replace these eager apply_* overrides with >> PTransformOverrides that operate on the completely constructed graph. >> >> The JIRA issue is https://issues.apache.org/jira/browse/BEAM-3566, and >> I've prepared a candidate patch at >> https://github.com/apache/incubator-beam/pull/4529. >> >> Best, >> Charles >> > >
