[ https://issues.apache.org/jira/browse/AIRFLOW-6529 ]
Kousuke Saruta updated AIRFLOW-6529:
------------------------------------
    Description: 
When we try to run the scheduler on macOS, we get a serialization error like the following.

{code}
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
[2020-01-10 19:54:41,974] {executor_loader.py:59} INFO - Using executor SequentialExecutor
[2020-01-10 19:54:41,983] {scheduler_job.py:1462} INFO - Starting the scheduler
[2020-01-10 19:54:41,984] {scheduler_job.py:1469} INFO - Processing each file at most -1 times
[2020-01-10 19:54:41,984] {scheduler_job.py:1472} INFO - Searching for files in /Users/sarutak/airflow/dags
[2020-01-10 19:54:42,025] {scheduler_job.py:1474} INFO - There are 27 files in /Users/sarutak/airflow/dags
[2020-01-10 19:54:42,025] {scheduler_job.py:1527} INFO - Resetting orphaned tasks for active dag runs
[2020-01-10 19:54:42,059] {scheduler_job.py:1500} ERROR - Exception when executing execute_helper
Traceback (most recent call last):
  File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1498, in _execute
    self._execute_helper()
  File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1531, in _execute_helper
    self.processor_agent.start()
  File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 348, in start
    self._process.start()
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'SchedulerJob._execute.<locals>.processor_factory'
{code}

The reason is that the scheduler tries to run its subprocesses using multiprocessing in spawn mode, and spawn mode pickles objects in order to hand them to the new process. In this case it is the inner function `processor_factory` that gets pickled, and local objects cannot be pickled. Note that as of Python 3.8, spawn is the default start method on macOS.
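For reference, here is a minimal sketch, outside of Airflow, of the same failure mode. The `processor_factory` below is a hypothetical stand-in that mirrors the inner function in `SchedulerJob._execute`; under the spawn start method, multiprocessing pickles the Process object before launching it, so a locally defined target cannot be serialized.

{code:python}
import multiprocessing


def _execute():
    # Local function, analogous to SchedulerJob._execute.<locals>.processor_factory.
    def processor_factory():
        print("processing a DAG file")

    # Request spawn explicitly; it is already the default on macOS as of Python 3.8.
    ctx = multiprocessing.get_context("spawn")
    proc = ctx.Process(target=processor_factory)
    # spawn serializes the Process object with pickle (the reduction.dump call in
    # the traceback above), and pickle cannot serialize local objects, so this raises:
    # AttributeError: Can't pickle local object '_execute.<locals>.processor_factory'
    proc.start()
    proc.join()


if __name__ == "__main__":
    _execute()
{code}

Defining the factory at module level (or as any callable pickle can import by its qualified name) makes the same `Process(...).start()` call succeed, which suggests the general shape of a fix for spawn mode.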
was:
When we try to run the scheduler on macOS, we get a serialization error like the following.

{code}
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
[2020-01-10 19:54:41,974] {executor_loader.py:59} INFO - Using executor SequentialExecutor
[2020-01-10 19:54:41,983] {scheduler_job.py:1462} INFO - Starting the scheduler
[2020-01-10 19:54:41,984] {scheduler_job.py:1469} INFO - Processing each file at most -1 times
[2020-01-10 19:54:41,984] {scheduler_job.py:1472} INFO - Searching for files in /Users/sarutak/airflow/dags
[2020-01-10 19:54:42,025] {scheduler_job.py:1474} INFO - There are 27 files in /Users/sarutak/airflow/dags
[2020-01-10 19:54:42,025] {scheduler_job.py:1527} INFO - Resetting orphaned tasks for active dag runs
[2020-01-10 19:54:42,059] {scheduler_job.py:1500} ERROR - Exception when executing execute_helper
Traceback (most recent call last):
  File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1498, in _execute
    self._execute_helper()
  File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1531, in _execute_helper
    self.processor_agent.start()
  File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 348, in start
    self._process.start()
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'SchedulerJob._execute.<locals>.processor_factory'
{code}

The reason is that the scheduler tries to run subprocesses using multiprocessing in spawn mode. As of Python 3.8, spawn is the default start method on macOS.


> Serialization error occurs when the scheduler tries to run on macOS.
> --------------------------------------------------------------------
>
>                 Key: AIRFLOW-6529
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6529
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10.8
>         Environment: macOS
> Python 3.8
> multiprocessing with spawn mode
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)