Hi all, We've fans of Airflow at Flexport and have adopted it for our (pretty basic) use case. However, we're seeing intermittent exit issues with BashOperators; specifically, we're seeing SIGTERMS being sent to our operator and can't track where they're coming from. We think it's due to *recycled PIDs*, per the note here: https://github.com/apache/incubator-airflow/blob/v1-9-stable/airflow/utils/helpers.py#L212
Is this likely, and has anyone encountered this before? If this is something you're familiar with, I've included a writeup below: *Description* We orchestrate a workflow of bash scripts via BashOperators. Every so often, we observe the operator exiting with signal -15. *Subsequent re-runs always succeed.* Since our bash script does not explicitly exit with -15 anywhere, we're not sure what the cause could be. Our BashOperators run a python script which launches a subprocess call to psql. Consistent symptoms are: * Task process receives SIGTERM from unknown source * the next time the Task runs, it succeeds Our environment: * Airflow v1.9.0 * running with CeleryExecutor, 64 concurrent workers on a single EC2 instance * scheduler & workers run on same instance * Ubuntu 16.04 *Logs* Here's the relevant logs. Observe the TaskRunner seems to be receiving a SIGTERM from an unknown source and getting killed off. ``` [2018-06-12 20:15:51,990] {models.py:1428} INFO - Executing <Task(BashOperator): 15min_bi_products> on 2018-06-12 20:00:00 [2018-06-12 20:15:51,990] {base_task_runner.py:115} INFO - Running: ['bash', '-c', u'airflow run 15min_build_bi_v1 15min_bi_products 2018-06-12T20:00:00 --job_id 254272 --raw -sd DAGS_FOLDER/15min_build_bi_v1_with_op.py'] [2018-06-12 20:15:52,304] {base_task_runner.py:98} INFO - Subtask: [2018-06-12 20:15:52,304] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt [2018-06-12 20:15:52,325] {base_task_runner.py:98} INFO - Subtask: [2018-06-12 20:15:52,325] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt [2018-06-12 20:15:52,549] {base_task_runner.py:98} INFO - Subtask: [2018-06-12 20:15:52,548] {configuration.py:206} WARNING - section/key [celery/celery_ssl_active] not found in config [2018-06-12 20:15:52,549] {base_task_runner.py:98} INFO - Subtask: [2018-06-12 20:15:52,549] {default_celery.py:41} WARNING - Celery Executor will run without SSL [2018-06-12 20:15:52,550] {base_task_runner.py:98} INFO - Subtask: [2018-06-12 20:15:52,550] {__init__.py:45} INFO - Using executor CeleryExecutor [2018-06-12 20:15:52,598] {base_task_runner.py:98} INFO - Subtask: [2018-06-12 20:15:52,598] {models.py:189} INFO - Filling up the DagBag from /home/bi_etl_user/bi-etl/airflow/dags/15min_build_bi_v1_with_op.py [2018-06-12 20:15:52,661] {base_task_runner.py:98} INFO - Subtask: [2018-06-12 20:15:52,661] {pagerduty_plugin.py:49} WARNING - Pagerduty Events API v1 will be deprecated on October 19, 2018. Consider upgrading to v2. [2018-06-12 20:15:52,772] {base_task_runner.py:98} INFO - Subtask: [2018-06-12 20:15:52,772] {bash_operator.py:70} INFO - Tmp dir root location: [2018-06-12 20:15:52,772] {base_task_runner.py:98} INFO - Subtask: /tmp [2018-06-12 20:15:52,773] {base_task_runner.py:98} INFO - Subtask: [2018-06-12 20:15:52,773] {bash_operator.py:80} INFO - Temporary script location: /tmp/airflowtmp3T0rTd//tmp/airflowtmp3T0rTd/15min_bi_productsetb4RO [2018-06-12 20:15:52,773] {base_task_runner.py:98} INFO - Subtask: [2018-06-12 20:15:52,773] {bash_operator.py:88} INFO - Running command: python $DEPLOY_PATH/bi-etl/scripts/build_table.py --index="$DEPLOY_PATH/bi-etl/load_bi/modules/bi_products_indexes.sql" --target-system="bi_data" --target-table="bi_products" --file="$DEPLOY_PATH/bi-etl/load_bi/modules/bi_products.sql" [2018-06-12 20:15:52,777] {base_task_runner.py:98} INFO - Subtask: [2018-06-12 20:15:52,777] {bash_operator.py:97} INFO - Output: [2018-06-12 20:16:22,830] {helpers.py:233} INFO - Terminating descendant processes of ['/usr/bin/python /usr/local/bin/airflow run 15min_build_bi_v1 15min_bi_products 2018-06-12T20:00:00 --job_id 254272 --raw -sd DAGS_FOLDER/15min_build_bi_v1_with_op.py'] PID: 10544 [2018-06-12 20:16:22,830] {helpers.py:237} INFO - Terminating descendant process ['bash', '/tmp/airflowtmp3T0rTd/15min_bi_productsetb4RO'] PID: 10624 [2018-06-12 20:16:22,836] {helpers.py:195} ERROR - [2018-06-12 20:16:22,837] {helpers.py:196} INFO - Killed process 10624 with signal 15 [2018-06-12 20:16:22,837] {helpers.py:237} INFO - Terminating descendant process ['python', '/home/bi_etl_user/bi-etl/scripts/build_table.py', '--index=/home/bi_etl_user/bi-etl/load_bi/modules/bi_products_indexes.sql', '--target-system=bi_data', '--target-table=bi_products', '--file=/home/bi_etl_user/bi-etl/load_bi/modules/bi_products.sql'] PID: 10625 [2018-06-12 20:16:22,842] {helpers.py:195} ERROR - [2018-06-12 20:16:22,842] {helpers.py:196} INFO - Killed process 10625 with signal 15 [2018-06-12 20:16:22,842] {helpers.py:237} INFO - Terminating descendant process ['/bin/sh', '-c', 'psql -h bi-data-master2.cdm3aqbdbqi4.us-east-1.rds.amazonaws.com -U bi_etl_user -p 5432 -d flexport_bi -v ON_ERROR_STOP=true -f "/data/sql_run/bi_products_index_schema_1528834552.sql" '] PID: 11299 [2018-06-12 20:16:22,845] {base_task_runner.py:98} INFO - Subtask: [2018-06-12 20:16:22,844] {bash_operator.py:105} INFO - Command exited with return code -15 [2018-06-12 20:16:22,847] {helpers.py:195} ERROR - [2018-06-12 20:16:22,848] {helpers.py:196} INFO - Killed process 11299 with signal 15 [2018-06-12 20:16:22,848] {helpers.py:237} INFO - Terminating descendant process ['/usr/lib/postgresql/9.5/bin/psql', '-h', 'bi-data-master2.cdm3aqbdbqi4.us-east-1.rds.amazonaws.com', '-U', 'bi_etl_user', '-p', '5432', '-d', 'flexport_bi', '-v', 'ON_ERROR_STOP=true', '-f', '/data/sql_run/bi_products_index_schema_1528834552.sql'] PID: 11300 [2018-06-12 20:16:22,853] {helpers.py:195} ERROR - [2018-06-12 20:16:22,853] {helpers.py:196} INFO - Killed process 11300 with signal 15 [2018-06-12 20:16:22,853] {helpers.py:242} INFO - Waiting up to 60s for processes to exit... [2018-06-12 20:16:22,854] {helpers.py:245} INFO - Done waiting [2018-06-12 20:16:22,865] {base_task_runner.py:98} INFO - Subtask: Traceback (most recent call last): [2018-06-12 20:16:22,866] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/bin/airflow", line 27, in <module> [2018-06-12 20:16:22,866] {base_task_runner.py:98} INFO - Subtask: args.func(args) [2018-06-12 20:16:22,866] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 392, in run [2018-06-12 20:16:22,867] {base_task_runner.py:98} INFO - Subtask: pool=args.pool, [2018-06-12 20:16:22,867] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 50, in wrapper [2018-06-12 20:16:22,867] {base_task_runner.py:98} INFO - Subtask: result = func(*args, **kwargs) [2018-06-12 20:16:22,868] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 1488, in _run_raw_task [2018-06-12 20:16:22,868] {base_task_runner.py:98} INFO - Subtask: result = task_copy.execute(context=context) [2018-06-12 20:16:22,869] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/dist-packages/airflow/operators/bash_operator.py", line 109, in execute [2018-06-12 20:16:22,869] {base_task_runner.py:98} INFO - Subtask: raise AirflowException("Bash command failed") [2018-06-12 20:16:22,869] {base_task_runner.py:98} INFO - Subtask: airflow.exceptions.AirflowException: Bash command failed Thanks! Jim ```