cozos commented on code in PR #26526:
URL: https://github.com/apache/beam/pull/26526#discussion_r1185941858
##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -2268,19 +2279,35 @@ def finish_bundle(self):
def teardown(self):
self._call_remote(self._remote_teardown)
- self._pool.shutdown()
- self._pool = None
+ self._terminate_pool()
def _call_remote(self, method, *args, **kwargs):
if self._pool is None:
self._pool = concurrent.futures.ProcessPoolExecutor(1)
Review Comment:
FYI the default process creation strategy in Linux (also COS in Dataflow) is
`fork` which actually causes problems in Python multiprocessing:
https://pythonspeed.com/articles/python-multiprocessing/
Personally I've ran into problems with fork copying threads and gRPC client
(i.e. network connections). In fact the Fn Logging handler doesn't work in the
subprocess I believe. I changed my pool to
`ProcessPoolExecutor(mp_context=get_context("spawn"))` which fixed some issues.
This brings other issues though like needing to reestablish env variables, fn
logging connection, lull logging thread.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]