Hi Jincheng,

Thanks for bringing this up and capturing the ideas in the doc.

Intuitively, I would have also considered adding a new Proto message for the teardown, but I think the idea to trigger this logic when the SDK Harness evicts process bundle descriptors is more elegant.

Thanks,
Max

On 25.10.19 17:23, Luke Cwik wrote:
I like approach 3 since it doesn't add additional complexity to the API and individual SDKs can choose to implement any clean-up strategy they want or none at all which is the simplest.

On Thu, Oct 24, 2019 at 8:46 PM jincheng sun <[email protected] <mailto:[email protected]>> wrote:

    Hi,

    Thanks for your comments in doc, I have add Approach 3 which you
    mentioned! @Luke

    For now, we should do a decision for Approach 3 and Approach 1.
    Detail can be found in doc [1]

    Welcome anyone's feedback :)

    Regards,
    Jincheng

    [1]
    
https://docs.google.com/document/d/1sCgy9VQPf9zVXKRquK8P6N4x7aB62GEO8ozkujRSHZg/edit?usp=sharing

    jincheng sun <[email protected]
    <mailto:[email protected]>> 于2019年10月25日周五 上午10:40写道:

        Hi,

        Functionally capable of `abort`, but it will be called at the
        end of operator. So, I prefer `dispose` semantics. i.e., all
        normal logic has been executed.

        Best,
        Jincheng

        Harsh Vardhan <[email protected] <mailto:[email protected]>>
        于2019年10月23日周三 上午12:14写道:

            Would approach 1 be akin to abort semantics?

            On Mon, Oct 21, 2019 at 8:01 PM jincheng sun
            <[email protected] <mailto:[email protected]>>
            wrote:

                Hi Luke,

                Thanks a lot for your reply. Since it allows to share
                one SDK harness between multiple executable stages, the
                control service termination may occur much later than
                the completion of an executable stage. This is the main
                reason I prefer runners to control the teardown of DoFns.

                Regarding to "SDK harnesses can terminate instances any
                time they want and start new instances anytime as
                well.", personally I think it's not conflict with the
                proposed Approach 1 as the SDK harness could decide what
                to do when receiving the teardown request. It could do
                nothing if the DoFns has already been teared down and
                could also tear down the DoFns if needed.

                What do you think?

                Best,
                Jincheng

                Luke Cwik <[email protected] <mailto:[email protected]>>
                于2019年10月22日周二 上午2:05写道:

                    Approach 2 is currently the suggested approach[1]
                    for DoFn's to shutdown.
                    Note that SDK harnesses can terminate instances any
                    time they want and start new instances anytime as well.

                    Why do you want to expose this logic so that Runners
                    could control it?

                    1:
                    
https://docs.google.com/document/d/1n6s3BOxOPct3uF4UgbbI9O9rpdiKWFH9R6mtVmR7xp0/edit#

                    On Mon, Oct 21, 2019 at 4:27 AM jincheng sun
                    <[email protected]
                    <mailto:[email protected]>> wrote:

                        Hi,
                        I found that in `SdkHarness` do not  stop the
                        `SdkWorker` when finish.  We should add the
logic for stop the `SdkWorker` in `SdkHarness`. More detail can be found [1].

                        There are two approaches to solve this issue:

                        Approach 1:  We can add a Fn API for teardown
                        purpose and the runner will teardown a specific
                        bundle descriptor via this teardown Fn API
                        during disposing.
                        Approach 2: The control service termination
                        could be seen as a signal and once SDK harness
                        receives this signal, the teardown of the bundle
                        descriptor will be performed.

                        More detail can be found in [2].

                        As the Approach 2, SDK harness could be shared
                        between multiple executable stages. The control
                        service termination only occurs when all the
                        executable stages sharing the same SDK harness
                        finished. This means that the teardown of DoFns
                        may not be executed immediately after an
                        executable stage is finished.

                        So, I prefer Approach 1. Welcome any feedback :)

                        Best,
                        Jincheng

                        [1]
                        
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/sdk_worker.py
                        [2]
                        
https://docs.google.com/document/d/1sCgy9VQPf9zVXKRquK8P6N4x7aB62GEO8ozkujRSHZg/edit?usp=sharing

--
            Got feedback? go/harsh-feedback
            <https://goto.google.com/harsh-feedback>

Reply via email to