Chamikara Jayalath created BEAM-1925:
----------------------------------------

             Summary: Make DoFn invocation logic of Python SDK more extensible
                 Key: BEAM-1925
                 URL: https://issues.apache.org/jira/browse/BEAM-1925
             Project: Beam
          Issue Type: Improvement
          Components: sdk-py
            Reporter: Chamikara Jayalath
            Assignee: Chamikara Jayalath

The DoFn invocation logic of the Python SDK currently lives in the DoFnRunner class.
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L54

At initialization, DoFnRunner parses a DoFn and creates local state. We use this state when invoking the DoFn methods process, start_bundle, and finish_bundle. For example, we store a list of ArgPlaceholder objects within the state of DoFnRunner to facilitate invocation of the process method.

We will need to extend this functionality when adding new features to the DoFn class (for example, to support Splittable DoFn [1]). So I think it's good to refactor this code to be more extensible.

I think a good approach for this is to add DoFnInvoker and DoFnSignature classes similar to the Java SDK [2].

In this approach:

* A DoFnSignature captures the signature of a DoFn, including its methods and arguments.
* A DoFnInvoker implements a particular way DoFn methods will be executed (initially we'll have simple and per-window invokers [3]).
* A runner uses DoFnRunner to execute methods of a given DoFn. At initialization, DoFnRunner creates a DoFnSignature and a DoFnInvoker for the given DoFn.

DoFnSignature and DoFnInvoker methods will be used by the SplittableDoFn implementation as well.

[1] https://docs.google.com/document/d/1h_zprJrOilivK2xfvl4L42vaX4DMYGfH1YDmi-s_ozM/edit#heading=h.e6patunrpiql
[2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignature.java
[3] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L200
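
To make the proposed split concrete, here is a rough sketch of how the three classes could relate. It is only an illustration under stated assumptions, not the actual refactoring: it targets Python 3, the DoFn base class is a stand-in so the sketch runs without Beam installed, and the method names (invoke_process, create, process_args) as well as the rule for choosing an invoker (presence of a 'window' argument) are hypothetical.

```python
import inspect


class DoFn(object):
    """Stand-in for the SDK's DoFn so this sketch runs without Beam."""

    def start_bundle(self):
        pass

    def process(self, element):
        raise NotImplementedError

    def finish_bundle(self):
        pass


class DoFnSignature(object):
    """Captures the methods of a DoFn and the argument names of process."""

    def __init__(self, do_fn):
        self.do_fn = do_fn
        self.start_bundle_method = do_fn.start_bundle
        self.process_method = do_fn.process
        self.finish_bundle_method = do_fn.finish_bundle
        # process is a bound method here, so 'self' is not in the parameters.
        self.process_args = list(inspect.signature(do_fn.process).parameters)


class DoFnInvoker(object):
    """Base class: one particular way of executing the methods of a DoFn."""

    def __init__(self, signature):
        self.signature = signature

    @staticmethod
    def create(signature):
        # Hypothetical rule: go per-window only if process asks for a window.
        if 'window' in signature.process_args:
            return PerWindowInvoker(signature)
        return SimpleInvoker(signature)

    def invoke_start_bundle(self):
        self.signature.start_bundle_method()

    def invoke_finish_bundle(self):
        self.signature.finish_bundle_method()

    def invoke_process(self, element, windows):
        raise NotImplementedError


class SimpleInvoker(DoFnInvoker):
    """Calls process once per element, ignoring windows."""

    def invoke_process(self, element, windows):
        return list(self.signature.process_method(element) or [])


class PerWindowInvoker(DoFnInvoker):
    """Calls process once per window the element belongs to."""

    def invoke_process(self, element, windows):
        outputs = []
        for window in windows:
            outputs.extend(
                self.signature.process_method(element, window=window) or [])
        return outputs


class DoFnRunner(object):
    """Builds a signature and an invoker at initialization, then delegates."""

    def __init__(self, do_fn):
        self.signature = DoFnSignature(do_fn)
        self.invoker = DoFnInvoker.create(self.signature)

    def start(self):
        self.invoker.invoke_start_bundle()

    def process(self, element, windows=('global',)):
        return self.invoker.invoke_process(element, windows)

    def finish(self):
        self.invoker.invoke_finish_bundle()


class AddOne(DoFn):
    def process(self, element):
        yield element + 1


class TagWithWindow(DoFn):
    def process(self, element, window=None):
        yield (window, element)
```

With this shape, DoFnRunner(AddOne()) would pick the simple invoker and DoFnRunner(TagWithWindow()) the per-window one. A Splittable DoFn implementation could reuse DoFnSignature and DoFnInvoker directly without going through DoFnRunner.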