Joar Wandborg created BEAM-7540:
-----------------------------------

             Summary: deadlock using save_main_session and logging
                 Key: BEAM-7540
                 URL: https://issues.apache.org/jira/browse/BEAM-7540
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core
         Environment: Python 3.5, Linux, apache-beam 2.12.0
            Reporter: Joar Wandborg

If you set {{save_main_session = True}} and have a logging.Logger instance in your __main__ module, then calling a logger method *after* Pipeline.run has been called makes the process hang and never exit.

Python 3 Pipeline that reproduces the error:

{code}
import logging

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

_log = logging.getLogger(__name__)


def main(argv=None):
    logging.basicConfig(level=logging.INFO)

    pipeline_options = PipelineOptions(argv)
    setup_options = pipeline_options.view_as(SetupOptions)  # type: SetupOptions
    setup_options.save_main_session = True

    _log.info("Running pipeline")

    with beam.Pipeline(runner="DirectRunner", options=pipeline_options) as p:
        p | beam.Create(["hello", "world"]) | beam.Map(lambda x: print(x))

    print("""
    Call to _log.info will now deadlock, since the logging handler's
    threading.RLock() has been passed through dill.

    When you press Ctrl-C, the traceback should confirm that the process is
    stuck at:

      File "/usr/lib/python3.5/logging/__init__.py", line 810, in acquire
        self.lock.acquire()
    """)
    _log.info("Pipeline done")
    print("Launching nukes")


if __name__ == '__main__':
    main()
{code}

I have opened an issue with {{dill}} as well: https://github.com/uqfoundation/dill/issues/321

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
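
A possible workaround sketch, assuming (as the description states) that the trigger is a logging.Logger instance bound to a global in __main__: look the logger up on demand so that no Logger object lives in the module namespace that save_main_session pickles. This has not been verified against apache-beam 2.12.0 or dill. The helper names get_log and logger_globals are hypothetical and not part of Beam or the logging module.

```python
import logging


def get_log():
    # Fetch the logger on demand. logging caches loggers by name internally,
    # so no Logger object has to be stored in a __main__ global.
    return logging.getLogger(__name__)


def logger_globals(namespace):
    # Return the names in `namespace` that are bound to Logger instances,
    # i.e. the kind of global the report identifies as the trigger.
    return sorted(name for name, value in namespace.items()
                  if isinstance(value, logging.Logger))


def main():
    logging.basicConfig(level=logging.INFO)
    get_log().info("Running pipeline")
    # ... build and run the Beam pipeline here, as in the reproduction ...
    get_log().info("Pipeline done")
```

Calling logger_globals(globals()) at the top level of the script before running the pipeline gives a quick check that no Logger instance is left in the main session.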