Hi Jincheng,
Thanks for bringing this up and capturing the ideas in the doc.
Intuitively, I would have also considered adding a new Proto message for
the teardown, but I think the idea to trigger this logic when the SDK
Harness evicts process bundle descriptors is more elegant.
Thanks,
Max
On 25.10.19 17:23, Luke Cwik wrote:
I like approach 3 since it doesn't add additional complexity to the API
and individual SDKs can choose to implement any clean-up strategy they
want or none at all which is the simplest.
On Thu, Oct 24, 2019 at 8:46 PM jincheng sun <[email protected]
<mailto:[email protected]>> wrote:
Hi,
Thanks for your comments in doc, I have add Approach 3 which you
mentioned! @Luke
For now, we should do a decision for Approach 3 and Approach 1.
Detail can be found in doc [1]
Welcome anyone's feedback :)
Regards,
Jincheng
[1]
https://docs.google.com/document/d/1sCgy9VQPf9zVXKRquK8P6N4x7aB62GEO8ozkujRSHZg/edit?usp=sharing
jincheng sun <[email protected]
<mailto:[email protected]>> 于2019年10月25日周五 上午10:40写道:
Hi,
Functionally capable of `abort`, but it will be called at the
end of operator. So, I prefer `dispose` semantics. i.e., all
normal logic has been executed.
Best,
Jincheng
Harsh Vardhan <[email protected] <mailto:[email protected]>>
于2019年10月23日周三 上午12:14写道:
Would approach 1 be akin to abort semantics?
On Mon, Oct 21, 2019 at 8:01 PM jincheng sun
<[email protected] <mailto:[email protected]>>
wrote:
Hi Luke,
Thanks a lot for your reply. Since it allows to share
one SDK harness between multiple executable stages, the
control service termination may occur much later than
the completion of an executable stage. This is the main
reason I prefer runners to control the teardown of DoFns.
Regarding to "SDK harnesses can terminate instances any
time they want and start new instances anytime as
well.", personally I think it's not conflict with the
proposed Approach 1 as the SDK harness could decide what
to do when receiving the teardown request. It could do
nothing if the DoFns has already been teared down and
could also tear down the DoFns if needed.
What do you think?
Best,
Jincheng
Luke Cwik <[email protected] <mailto:[email protected]>>
于2019年10月22日周二 上午2:05写道:
Approach 2 is currently the suggested approach[1]
for DoFn's to shutdown.
Note that SDK harnesses can terminate instances any
time they want and start new instances anytime as well.
Why do you want to expose this logic so that Runners
could control it?
1:
https://docs.google.com/document/d/1n6s3BOxOPct3uF4UgbbI9O9rpdiKWFH9R6mtVmR7xp0/edit#
On Mon, Oct 21, 2019 at 4:27 AM jincheng sun
<[email protected]
<mailto:[email protected]>> wrote:
Hi,
I found that in `SdkHarness` do not stop the
`SdkWorker` when finish. We should add the
logic for stop the `SdkWorker` in `SdkHarness`.
More detail can be found [1].
There are two approaches to solve this issue:
Approach 1: We can add a Fn API for teardown
purpose and the runner will teardown a specific
bundle descriptor via this teardown Fn API
during disposing.
Approach 2: The control service termination
could be seen as a signal and once SDK harness
receives this signal, the teardown of the bundle
descriptor will be performed.
More detail can be found in [2].
As the Approach 2, SDK harness could be shared
between multiple executable stages. The control
service termination only occurs when all the
executable stages sharing the same SDK harness
finished. This means that the teardown of DoFns
may not be executed immediately after an
executable stage is finished.
So, I prefer Approach 1. Welcome any feedback :)
Best,
Jincheng
[1]
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/sdk_worker.py
[2]
https://docs.google.com/document/d/1sCgy9VQPf9zVXKRquK8P6N4x7aB62GEO8ozkujRSHZg/edit?usp=sharing
--
Got feedback? go/harsh-feedback
<https://goto.google.com/harsh-feedback>