saurabhladhe opened a new issue #14672:
URL: https://github.com/apache/airflow/issues/14672


   
   **Apache Airflow version**:2.0.1
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl 
version`):1.18.10
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**:Azure Kubernetes Service 
   - **OS** (e.g. from /etc/os-release):Debian GNU/Linux 10
   - **Kernel** (e.g. `uname -a`):Linux airflow-k8s-scheduler-blabla-xvv5f 
5.4.0-1039-azure #41~18.04.1-Ubuntu SMP Mon Jan 18 14:00:01 UTC 2021 x86_64 
GNU/Linux
   - **Install tools**:
   - **Others**:
   
   **What happened**: 
   tasks keep failing with sigterm
   <details>
   <summary>Task fails due to sigterm before it can finish- </summary>
   [2021-03-08 20:39:14,503] {taskinstance.py:570} DEBUG - Refreshing 
TaskInstance <TaskInstance: xxxxx_xxxxxx_xxx 2021-03-08T19:20:00+00:00 
[running]> from DB
   [2021-03-08 20:39:14,878] {taskinstance.py:605} DEBUG - Refreshed 
TaskInstance <TaskInstance: xxxxx_xxxxxx_xxx 2021-03-08T19:20:00+00:00 
[running]>
   [2021-03-08 20:39:14,880] {base_job.py:219} DEBUG - [heartbeat]
   [2021-03-08 20:39:21,241] {taskinstance.py:570} DEBUG - Refreshing 
TaskInstance <TaskInstance: xxxxx_xxxxxx_xxx 2021-03-08T19:20:00+00:00 
[running]> from DB
   [2021-03-08 20:39:21,670] {taskinstance.py:605} DEBUG - Refreshed 
TaskInstance <TaskInstance: xxxxx_xxxxxx_xxx 2021-03-08T19:20:00+00:00 
[running]>
   [2021-03-08 20:39:21,672] {base_job.py:219} DEBUG - [heartbeat]
   [2021-03-08 20:39:27,736] {taskinstance.py:570} DEBUG - Refreshing 
TaskInstance <TaskInstance: xxxxx_xxxxxx_xxx 2021-03-08T19:20:00+00:00 
[running]> from DB
   [2021-03-08 20:39:28,360] {taskinstance.py:605} DEBUG - Refreshed 
TaskInstance <TaskInstance: xxxxx_xxxxxx_xxx 2021-03-08T19:20:00+00:00 [None]>
   [2021-03-08 20:39:28,362] {local_task_job.py:188} WARNING - State of this 
instance has been externally set to None. Terminating instance.
   [2021-03-08 20:39:28,363] {process_utils.py:100} INFO - Sending 
Signals.SIGTERM to GPID 21
   [2021-03-08 20:39:50,119] {taskinstance.py:1239} ERROR - Received SIGTERM. 
Terminating subprocesses.
   [2021-03-08 20:39:50,119] {taskinstance.py:570} DEBUG - Refreshing 
TaskInstance <TaskInstance: xxxxx_xxxxxx_xxx 2021-03-08T19:20:00+00:00 
[running]> from DB
   [2021-03-08 20:39:51,328] {taskinstance.py:605} DEBUG - Refreshed 
TaskInstance <TaskInstance: xxxxx_xxxxxx_xxx 2021-03-08T19:20:00+00:00 [queued]>
   [2021-03-08 20:39:51,329] {taskinstance.py:1455} ERROR - Task received 
SIGTERM signal
   Traceback (most recent call last):
     File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py",
 line 1112, in _run_raw_task
       self._prepare_and_execute_task_with_callbacks(context, task)
     File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py",
 line 1285, in _prepare_and_execute_task_with_callbacks
       result = self._execute_task(context, task_copy)
     File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py",
 line 1315, in _execute_task
       result = task_copy.execute(context=context)
     File "/opt/airflow/mnt/dags/xyz/operators/xyz_data_sync_operator.py", line 
73, in execute
       XyzUpsertClass(self.dest_conn_id, self.table, self.schema, src_res, 
drop_missing_columns=self.drop_missing_columns).execute()
     File "/opt/airflow/mnt/dags/xyz/operators/xyz_data_sync_operator.py", line 
271, in execute
       cursor.execute( f"UPDATE {self.schema}.{self.table} SET {up_template} 
WHERE {pk_template}",row)
     File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py",
 line 1241, in signal_handler
       raise AirflowException("Task received SIGTERM signal")
   airflow.exceptions.AirflowException: Task received SIGTERM signal
   [2021-03-08 20:39:51,409] {taskinstance.py:1862} DEBUG - Task Duration set 
to 1055.564798
   [2021-03-08 20:39:51,410] {taskinstance.py:1503} INFO - Marking task as 
FAILED. dag_id=xyz_sync task_id=xxxxx, execution_date=20210308T192000, 
start_date=20210308T202215, end_date=20210308T203951
   </details>
   <details>
   <summary>This also happens with a successful task-</summary>
   [2021-03-08 21:46:01,466] {__init__.py:124} DEBUG - Preparing lineage inlets 
and outlets
   [2021-03-08 21:46:01,466] {__init__.py:168} DEBUG - inlets: [], outlets: []
   [2021-03-08 21:46:01,467] {logging_mixin.py:104} INFO - {'conf': 
<airflow.configuration.AirflowConfigParser object at 0x7fc0f7f2a910>, 'dag': 
<DAG: canary_dag>, 'dag_run': <DagRun canary_dag @ 2021-03-08 20:10:00+00:00: 
scheduled__2021-03-08T20:10:00+00:00, externally triggered: False>, 
'ds_nodash': '20210308', 'execution_date': DateTime(2021, 3, 8, 20, 10, 0, 
tzinfo=Timezone('+00:00')), 'inlets': [], 'macros': <module 'airflow.macros' 
from 
'/home/airflow/.local/lib/python3.7/site-packages/airflow/macros/__init__.py'>, 
'next_ds': '2021-03-08', 'next_ds_nodash': '20210308', 'next_execution_date': 
DateTime(2021, 3, 8, 21, 10, 0, tzinfo=Timezone('UTC')), 'outlets': [], 
'params': {}, 'prev_ds': '2021-03-08', 'prev_ds_nodash': '20210308', 
'prev_execution_date': DateTime(2021, 3, 8, 19, 10, 0, tzinfo=Timezone('UTC')), 
'prev_execution_date_success': <Proxy at 0x7fc0e3d0d780 with factory <function 
TaskInstance.get_template_context.<locals>.<lambda> at 0x7fc0e3d2ad40>>, 
'prev_start_date_su
 ccess': <Proxy at 0x7fc0e3d0d7d0 with factory <function 
TaskInstance.get_template_context.<locals>.<lambda> at 0x7fc0e3ff8830>>, 
'run_id': 'scheduled__2021-03-08T20:10:00+00:00', 'task': 
<Task(PythonOperator): print_the_context>, 'task_instance': <TaskInstance: 
canary_dag.print_the_context 2021-03-08T20:10:00+00:00 [running]>, 
'task_instance_key_str': 'canary_dag__print_the_context__20210308', 
'test_mode': False, 'ti': <TaskInstance: canary_dag.print_the_context 
2021-03-08T20:10:00+00:00 [running]>, 'tomorrow_ds': '2021-03-09', 
'tomorrow_ds_nodash': '20210309', 'ts': '2021-03-08T20:10:00+00:00', 
'ts_nodash': '20210308T201000', 'ts_nodash_with_tz': '20210308T201000+0000', 
'var': {'json': None, 'value': None}, 'yesterday_ds': '2021-03-07', 
'yesterday_ds_nodash': '20210307', 'templates_dict': None}
   [2021-03-08 21:46:01,467] {logging_mixin.py:104} INFO - 2021-03-08
   [2021-03-08 21:46:01,467] {python.py:118} INFO - Done. Returned value was: 
Whatever you return gets printed in the logs
   [2021-03-08 21:46:21,690] {__init__.py:88} DEBUG - Lineage called with 
inlets: [], outlets: []
   [2021-03-08 21:46:21,691] {taskinstance.py:570} DEBUG - Refreshing 
TaskInstance <TaskInstance: canary_dag.print_the_context 
2021-03-08T20:10:00+00:00 [running]> from DB
   [2021-03-08 21:46:32,042] {taskinstance.py:605} DEBUG - Refreshed 
TaskInstance <TaskInstance: canary_dag.print_the_context 
2021-03-08T20:10:00+00:00 [running]>
   [2021-03-08 21:46:32,051] {taskinstance.py:1166} INFO - Marking task as 
SUCCESS. dag_id=canary_dag, task_id=print_the_context, 
execution_date=20210308T201000, start_date=20210308T214431, 
end_date=20210308T214632
   [2021-03-08 21:46:32,051] {taskinstance.py:1862} DEBUG - Task Duration set 
to 120.361368
   [2021-03-08 21:46:38,577] {taskinstance.py:570} DEBUG - Refreshing 
TaskInstance <TaskInstance: canary_dag.print_the_context 
2021-03-08T20:10:00+00:00 [running]> from DB
   [2021-03-08 21:46:51,150] {taskinstance.py:605} DEBUG - Refreshed 
TaskInstance <TaskInstance: canary_dag.print_the_context 
2021-03-08T20:10:00+00:00 [success]>
   [2021-03-08 21:46:51,152] {local_task_job.py:188} WARNING - State of this 
instance has been externally set to success. Terminating instance.
   [2021-03-08 21:46:51,153] {process_utils.py:100} INFO - Sending 
Signals.SIGTERM to GPID 22
   [2021-03-08 21:46:59,668] {taskinstance.py:1239} ERROR - Received SIGTERM. 
Terminating subprocesses.
   [2021-03-08 21:46:59,669] {cli_action_loggers.py:84} DEBUG - Calling 
callbacks: []
   [2021-03-08 21:46:59,757] {process_utils.py:66} INFO - Process 
psutil.Process(pid=22, status='terminated', exitcode=1, started='21:44:41') 
(22) terminated with exit code 1
   [2021-03-08 21:46:59,758] {base_job.py:219} DEBUG - [heartbeat]
   </details>
   
   **What you expected to happen**:
   Tasks marked correctly instead of receiving sigterm
   
   **How to reproduce it**:
   mostly occurs when a large number of dags (100+) are running on AKS
   
   **Anything else we need to know**:
   DAGs that run longer or have more than 10-12 tasks (that may run long) seem 
to have a higher probability of this happening to them
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to