[ https://issues.apache.org/jira/browse/AIRFLOW-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289152#comment-16289152 ]
Ash Berlin-Taylor commented on AIRFLOW-1916:
--------------------------------------------

I've added some debugging, showing where each close was called from and what
the os args were. On Python 3 this is what I see. The raw shutdown happens
automatically from the logging system:

{noformat}
Closing log from pid 445 / ['/usr/local/bin/airflow', 'run', 'tests', 'test-logging', '2017-12-13T11:37:55.558651', '--job_id', '18', '--raw', '-sd', '/usr/local/airflow/dags/example/csv_to_parquet.py']
  File "/usr/local/lib/python3.6/logging/__init__.py", line 1919, in shutdown
    h.close()
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/log/s3_task_handler.py", line 72, in close
    traceback.print_stack(file=fh)
{noformat}

The non-raw shutdown is called by us:

{noformat}
Closing log from pid 440 / ['/usr/local/bin/airflow', 'run', 'tests', 'test-logging', '2017-12-13T11:37:55.558651', '--local', '-sd', '/usr/local/airflow/dags/example/csv_to_parquet.py']
  File "/usr/local/bin/airflow", line 27, in <module>
    args.func(args)
  File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 438, in run
    handler.close()
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/log/s3_task_handler.py", line 72, in close
    traceback.print_stack(file=fh)
{noformat}

> S3 Task logs end up duplicated in the file.
> -------------------------------------------
>
>                 Key: AIRFLOW-1916
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1916
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Ash Berlin-Taylor
>             Fix For: 1.9.1
>
>
> If using the S3TaskHandler logger, the contents of the log file in the S3
> bucket end up duplicated: once from when `airflow run --raw` finalizes the
> task, and again from when `airflow run --local` finalizes its logger.
> Log from the UI is included below. The file on disk does not have the repetition.
> There is a comment in `run()` in airflow.bin.cli implying that `--raw` is not
> meant to upload, but something is.
> {noformat}
> *** Reading remote log from s3://xxx/ash-test/tests/test-logging/2017-12-13T10:45:42.552705/1.log.
> [2017-12-13 10:45:49,764] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:49,895] {models.py:1197} INFO - Dependencies all met for <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,905] {models.py:1197} INFO - Dependencies all met for <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,906] {models.py:1407} INFO -
> --------------------------------------------------------------------------------
> Starting attempt 1 of 1
> --------------------------------------------------------------------------------
> [2017-12-13 10:45:49,923] {models.py:1428} INFO - Executing <Task(PythonOperator): test-logging> on 2017-12-13 10:45:42.552705
> [2017-12-13 10:45:49,924] {base_task_runner.py:115} INFO - Running: ['bash', '-c', 'airflow run tests test-logging 2017-12-13T10:45:42.552705 --job_id 5 --raw -sd /usr/local/airflow/dags/example/csv_to_parquet.py']
> [2017-12-13 10:45:53,622] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {configuration.py:206} WARNING - section/key [celery/celery_ssl_active] not found in config
> [2017-12-13 10:45:53,625] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {default_celery.py:41} WARNING - Celery Executor will run without SSL
> [2017-12-13 10:45:53,626] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,624] {__init__.py:45} INFO - Using executor CeleryExecutor
> [2017-12-13 10:45:53,755] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,754] {models.py:189} INFO - Filling up the DagBag from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,859] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,903] {csv_to_parquet.py:27} ERROR - Hello
> [2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,904] {python_operator.py:90} INFO - Done. Returned value was: None
> [2017-12-13 10:45:49,764] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:49,895] {models.py:1197} INFO - Dependencies all met for <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,905] {models.py:1197} INFO - Dependencies all met for <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,906] {models.py:1407} INFO -
> --------------------------------------------------------------------------------
> Starting attempt 1 of 1
> --------------------------------------------------------------------------------
> [2017-12-13 10:45:49,923] {models.py:1428} INFO - Executing <Task(PythonOperator): test-logging> on 2017-12-13 10:45:42.552705
> [2017-12-13 10:45:49,924] {base_task_runner.py:115} INFO - Running: ['bash', '-c', 'airflow run tests test-logging 2017-12-13T10:45:42.552705 --job_id 5 --raw -sd /usr/local/airflow/dags/example/csv_to_parquet.py']
> [2017-12-13 10:45:53,622] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {configuration.py:206} WARNING - section/key [celery/celery_ssl_active] not found in config
> [2017-12-13 10:45:53,625] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {default_celery.py:41} WARNING - Celery Executor will run without SSL
> [2017-12-13 10:45:53,626] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,624] {__init__.py:45} INFO - Using executor CeleryExecutor
> [2017-12-13 10:45:53,755] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,754] {models.py:189} INFO - Filling up the DagBag from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,859] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,903] {csv_to_parquet.py:27} ERROR - Hello
> [2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,904] {python_operator.py:90} INFO - Done. Returned value was: None
> [2017-12-13 10:45:54,923] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:54,922] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): sts.amazonaws.com
> [2017-12-13 10:45:55,498] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:55,498] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): s3.eu-west-1.amazonaws.com
> [2017-12-13 10:45:55,784] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:55,784] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): s3.eu-west-1.amazonaws.com
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
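The tracebacks above show the same handler's close() running twice: once via logging.shutdown() in the `--raw` process and once via the explicit handler.close() in cli.run in the `--local` process, so the file contents get uploaded twice. A minimal sketch of the usual remedy for a double-close of this kind, a guard flag that makes close() idempotent, is below. This is illustrative only, not Airflow's actual S3TaskHandler code; the class name, the guard attribute, and upload_to_remote are all made up for the example.

{noformat}
import logging


class IdempotentUploadHandler(logging.FileHandler):
    """File handler that uploads its log file exactly once on close."""

    def __init__(self, filename):
        super().__init__(filename)
        self.upload_count = 0      # visible for testing the sketch
        self._upload_done = False  # guard: has close() already run?

    def upload_to_remote(self):
        # Stand-in for the remote upload a handler like S3TaskHandler
        # performs when it is closed.
        self.upload_count += 1

    def close(self):
        # Both logging.shutdown() and an explicit handler.close() land
        # here; only the first caller should trigger the upload.
        if self._upload_done:
            return
        self._upload_done = True
        self.upload_to_remote()
        super().close()
{noformat}

With this guard, whichever of the two shutdown paths runs second becomes a no-op, so the remote copy matches the file on disk.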