In the Python DirectRunner, we currently use apply_* overrides to replace the default .expand() implementation of certain transforms. For example, GroupByKey has a special implementation in the DirectRunner, so we use an apply_* override hook to replace GroupByKey.expand().
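To make the current mechanism concrete, here is a toy sketch of eager hook dispatch. It is a simplified model, not the actual DirectRunner code: ToyRunner, ToyDirectRunner, and the stand-in PTransform, GroupByKey, and Map classes are hypothetical and only mimic the apply_* naming convention described above.

```python
# Toy model only: every class here is a hypothetical stand-in, not Beam's API.

class PTransform:
    def expand(self, pcoll):
        raise NotImplementedError


class GroupByKey(PTransform):
    def expand(self, pcoll):
        # Default, runner-agnostic expansion.
        return ("default GroupByKey", pcoll)


class Map(PTransform):
    def expand(self, pcoll):
        return ("default Map", pcoll)


class ToyRunner:
    def apply(self, transform, pcoll):
        # Look for an apply_<ClassName> hook along the transform's MRO and
        # fall back to the transform's own expand() if no hook is found.
        for cls in type(transform).__mro__:
            hook = getattr(self, "apply_%s" % cls.__name__, None)
            if hook is not None:
                return hook(transform, pcoll)
        return transform.expand(pcoll)


class ToyDirectRunner(ToyRunner):
    def apply_GroupByKey(self, transform, pcoll):
        # Runner-specific replacement, chosen while the graph is being built.
        return ("direct GroupByKey", pcoll)


runner = ToyDirectRunner()
gbk_result = runner.apply(GroupByKey(), "input")  # uses the apply_* hook
map_result = runner.apply(Map(), "input")         # falls back to expand()
```

In this sketch the replacement is picked at the moment the transform is applied, so whatever graph is being built already reflects the runner's choice.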
However, this strategy has drawbacks. Because the override happens eagerly during graph construction, the pipeline graph is specialized and modified before a specific runner is bound to the pipeline's execution. This makes the pipeline graph non-portable and blocks full migration to the Runner API pipeline representation in the DirectRunner.

By contrast, the SDK's PTransformOverride mechanism lets us express matchers that operate on the unspecialized graph, replacing PTransforms as necessary to produce a DirectRunner-specialized pipeline graph for execution.

I therefore propose replacing these eager apply_* overrides with PTransformOverrides that operate on the completely constructed graph.

The JIRA issue is https://issues.apache.org/jira/browse/BEAM-3566, and I've prepared a candidate patch at https://github.com/apache/incubator-beam/pull/4529.

Best,
Charles