Chamikara Jayalath created BEAM-1925:
----------------------------------------

             Summary: Make DoFn invocation logic of Python SDK more extensible
                 Key: BEAM-1925
                 URL: https://issues.apache.org/jira/browse/BEAM-1925
             Project: Beam
          Issue Type: Improvement
          Components: sdk-py
            Reporter: Chamikara Jayalath
            Assignee: Chamikara Jayalath


The DoFn invocation logic of the Python SDK currently lives in the DoFnRunner
class:

https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L54

When a DoFnRunner is initialized, we parse the given DoFn and create local
state, which we then use when invoking the DoFn methods process, start_bundle,
and finish_bundle. For example, we store a list of ArgPlaceholder objects in
the DoFnRunner's state to facilitate invocation of the process method.
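To make the parse-once pattern concrete, here is a simplified, hypothetical
sketch (it does not reproduce the actual common.py code). The runner inspects
the DoFn's extra arguments once at initialization, records which ones are
ArgPlaceholder instances, and fills them in on every call. PlaceholderRunner,
ScaleAndShift, and the `resolved` mapping are illustrative names only.

```python
class ArgPlaceholder(object):
  """Marks an extra argument whose value is only known at invocation time."""

  def __init__(self, key):
    self.key = key


class PlaceholderRunner(object):
  """Hypothetical runner: parse extra args once, fill placeholders per call."""

  def __init__(self, do_fn, args):
    self.do_fn = do_fn
    self.args = list(args)
    # Done once, at initialization: remember which positions must be filled.
    self.placeholder_positions = [
        i for i, arg in enumerate(self.args) if isinstance(arg, ArgPlaceholder)]

  def process(self, element, resolved):
    # `resolved` maps placeholder keys to values (e.g. side inputs for a window).
    args = list(self.args)
    for i in self.placeholder_positions:
      args[i] = resolved[self.args[i].key]
    return list(self.do_fn.process(element, *args))


class ScaleAndShift(object):
  def process(self, element, offset, scale):
    yield element * scale + offset


runner = PlaceholderRunner(ScaleAndShift(), [ArgPlaceholder('offset'), 10])
print(runner.process(2, {'offset': 1}))  # [21]
```

Because this state is tied to one runner class, every new kind of argument or
invocation strategy has to be added to the runner itself.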

We will need to extend this functionality when adding new features to the DoFn
class (for example, to support Splittable DoFn [1]), so I think it would be
good to refactor this code to make it more extensible.

I think a good approach is to add DoFnInvoker and DoFnSignature classes,
similar to those in the Java SDK [2].

In this approach:
A DoFnSignature captures the signature of a DoFn including methods and 
arguments.
A DoFnInvoker implements a particular way DoFn methods will be executed 
(initially we'll have simple and per-window invokers [3]).

A runner uses DoFnRunner to execute the methods of a given DoFn. At
initialization, DoFnRunner creates a DoFnSignature and a DoFnInvoker for the
given DoFn.
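A minimal sketch of how these pieces could fit together, assuming a simplified
DoFn model. DoFnSignature, DoFnInvoker, and DoFnRunner are the names from this
proposal; MethodWrapper, SimpleInvoker, PerWindowInvoker, accepts_window,
run_bundle, and the convention that a DoFn asks for its window through a
`process` parameter named `window` are hypothetical and are not the
apache_beam API.

```python
import inspect


class MethodWrapper(object):
  """A bound DoFn method plus the names of its parameters (hypothetical)."""

  def __init__(self, obj, method_name):
    self.method = getattr(obj, method_name)
    self.args = list(inspect.signature(self.method).parameters)


class DoFnSignature(object):
  """Captures a DoFn's methods and their arguments; built once per DoFn."""

  def __init__(self, do_fn):
    self.do_fn = do_fn
    self.process_method = MethodWrapper(do_fn, 'process')
    self.start_bundle_method = MethodWrapper(do_fn, 'start_bundle')
    self.finish_bundle_method = MethodWrapper(do_fn, 'finish_bundle')

  def accepts_window(self):
    # Assumption: a DoFn requests its window via a parameter named 'window'.
    return 'window' in self.process_method.args


class DoFnInvoker(object):
  """One particular way of executing the methods captured by a signature."""

  def __init__(self, signature):
    self.signature = signature

  def invoke_start_bundle(self):
    self.signature.start_bundle_method.method()

  def invoke_finish_bundle(self):
    self.signature.finish_bundle_method.method()

  def invoke_process(self, element, windows):
    raise NotImplementedError


class SimpleInvoker(DoFnInvoker):
  """Calls process once per element; windows are not passed to the DoFn."""

  def invoke_process(self, element, windows):
    return list(self.signature.process_method.method(element))


class PerWindowInvoker(DoFnInvoker):
  """Calls process once per window, passing each window to the DoFn."""

  def invoke_process(self, element, windows):
    outputs = []
    for window in windows:
      outputs.extend(self.signature.process_method.method(element, window=window))
    return outputs


class DoFnRunner(object):
  """Builds the signature and picks an invoker once, at initialization."""

  def __init__(self, do_fn):
    self.signature = DoFnSignature(do_fn)
    invoker_cls = (PerWindowInvoker if self.signature.accepts_window()
                   else SimpleInvoker)
    self.invoker = invoker_cls(self.signature)

  def run_bundle(self, elements_with_windows):
    self.invoker.invoke_start_bundle()
    outputs = []
    for element, windows in elements_with_windows:
      outputs.extend(self.invoker.invoke_process(element, windows))
    self.invoker.invoke_finish_bundle()
    return outputs


class NoOpBundleMethods(object):
  def start_bundle(self):
    pass

  def finish_bundle(self):
    pass


class Double(NoOpBundleMethods):
  def process(self, element):
    yield element * 2


class TagWithWindow(NoOpBundleMethods):
  def process(self, element, window):
    yield (element, window)


print(DoFnRunner(Double()).run_bundle([(1, ['w1']), (2, ['w1'])]))  # [2, 4]
print(DoFnRunner(TagWithWindow()).run_bundle([('a', ['w1', 'w2'])]))
# [('a', 'w1'), ('a', 'w2')]
```

With the invoker chosen inside DoFnRunner's constructor, a new execution
strategy could be added as another DoFnInvoker subclass without changing how
runners call DoFnRunner.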

DoFnSignature and DoFnInvoker methods will be used by the Splittable DoFn
implementation as well.


[1] https://docs.google.com/document/d/1h_zprJrOilivK2xfvl4L42vaX4DMYGfH1YDmi-s_ozM/edit#heading=h.e6patunrpiql

[2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignature.java

[3] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L200



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
