shaurya-sood opened a new issue, #28201:
URL: https://github.com/apache/airflow/issues/28201

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   Apache Airflow version: 2.4.3
   
   - Tasks get `SIGTERM` once a huge DAG (100+ parallel tasks) is triggered, and go into `UP_FOR_RETRY` and eventually `FAILED` after the retry. A minimal sketch of such a DAG is shown below.
   - The `scheduler_heartbeat` metric drops to nearly zero (0-0.05) during the same period.
   - CPU utilization of the metadata database spikes to 100%.
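   
   For illustration, a minimal sketch of the kind of wide DAG that triggers this (the DAG id, task ids, and sleep command here are made up, not our production DAG):
   
   ```python
   # Hypothetical wide DAG: 120 independent tasks that all become eligible to run
   # as soon as the DAG is triggered.
   from datetime import datetime, timedelta
   
   from airflow import DAG
   from airflow.operators.bash import BashOperator
   
   with DAG(
       dag_id="wide_fanout_example",
       start_date=datetime(2022, 12, 1),
       schedule=None,
       catchup=False,
       default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
   ) as dag:
       for i in range(120):
           BashOperator(task_id=f"task_{i}", bash_command="sleep 60")
   ```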
   
   ### Airflow Logs
   ```
   [2022-12-07, 15:37:49 UTC] {local_task_job.py:223} WARNING - State of this instance has been externally set to up_for_retry. Terminating instance.
   [2022-12-07, 15:37:49 UTC] {process_utils.py:133} INFO - Sending Signals.SIGTERM to group 89412. PIDs of all processes in the group: [89412]
   [2022-12-07, 15:37:49 UTC] {process_utils.py:84} INFO - Sending the signal Signals.SIGTERM to group 89412
   [2022-12-07, 15:37:49 UTC] {taskinstance.py:1562} ERROR - Received SIGTERM. Terminating subprocesses.
   ```
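   
   My understanding of where the `WARNING` above comes from (a simplified sketch of `LocalTaskJob.heartbeat_callback`, not the exact Airflow source): on each job heartbeat the task instance state is re-read from the metadata database, and if something else (here apparently the scheduler) has already moved it out of `RUNNING`, the local task runner is terminated, which sends `SIGTERM` to the task's process group.
   
   ```python
   # Simplified sketch of the heartbeat check, for illustration only.
   from airflow.utils.state import State
   
   def heartbeat_callback_sketch(job):
       ti = job.task_instance
       ti.refresh_from_db()           # extra DB round trip on every heartbeat
       if ti.state != State.RUNNING:
           # State was changed externally (e.g. set to up_for_retry), so the
           # still-running local process is terminated with SIGTERM.
           job.task_runner.terminate()
   ```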
   
   ### Message on the UI
   `The scheduler does not appear to be running. Last heartbeat was received 30 seconds ago. The DAGs list may not update, and new tasks will not be scheduled.`
   
   ### Meta database CPU Utilization
   ![Screenshot 2022-12-07 at 19 18 20](https://user-images.githubusercontent.com/19922777/206263962-78a31497-a5ba-4b51-909e-644ea973f870.png)
   
   
   
   ### What you think should happen instead
   
   Tasks should run to completion without receiving a `SIGTERM`.
   
   ### How to reproduce
   
   _No response_
   
   ### Operating System
   
   Linux
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Apache Airflow version: 2.4.3
   
   Executor: Celery
   
   Airflow metadatabase: Postgres (`db.r6g.large` RDS instance)
   
   Config:
   ```
   config:
       core:
         dag_discovery_safe_mode: false
         hostname_callable: airflow.utils.net.get_host_ip_address
         parallelism: 300
         max_active_tasks_per_dag: 30
         dagbag_import_timeout: 90
         killed_task_cleanup_time: 604800
         min_serialized_dag_update_interval: 300
       celery:
         sync_parallelism: 1
         worker_concurrency: 10
       scheduler:
         dag_dir_list_interval: 300
         min_file_process_interval: 300
         parsing_processes: 2
         schedule_after_task_execution: false
         job_heartbeat_sec: 20
   ```
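   
   For reference, the effective values can be checked from inside a scheduler/worker pod with something like this (just a sanity-check sketch; the section/key names are the ones from the block above):
   
   ```python
   # Print the effective values as Airflow sees them after the Helm chart has
   # rendered the overrides into airflow.cfg / AIRFLOW__* environment variables.
   from airflow.configuration import conf
   
   for section, key in [
       ("core", "parallelism"),
       ("core", "max_active_tasks_per_dag"),
       ("celery", "worker_concurrency"),
       ("scheduler", "job_heartbeat_sec"),
       ("scheduler", "schedule_after_task_execution"),
   ]:
       print(f"{section}.{key} = {conf.get(section, key)}")
   ```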
   
   ### Anything else
   
   ```
   [2022-12-06 14:35:11,715: ERROR/ForkPoolWorker-7] [9f2ecb02-09c0-40cf-bb6d-f1bef4abb879] Failed to execute task Hostname of job runner does not match.
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 130, in _execute_in_fork
       args.func(args)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 52, in command
       return func(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 103, in wrapper
       return f(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 382, in task_run
       _run_task_by_selected_method(args, dag, ti)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 189, in _run_task_by_selected_method
       _run_task_by_local_task_job(args, ti)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 247, in _run_task_by_local_task_job
       run_job.run()
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/base_job.py", line 247, in run
       self._execute()
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/local_task_job.py", line 135, in _execute
       self.heartbeat()
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/base_job.py", line 228, in heartbeat
       self.heartbeat_callback(session=session)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/session.py", line 72, in wrapper
       return func(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/local_task_job.py", line 189, in heartbeat_callback
       raise AirflowException("Hostname of job runner does not match")
   airflow.exceptions.AirflowException: Hostname of job runner does not match
   [2022-12-06 14:35:11,745: INFO/ForkPoolWorker-7] Using connection ID 'S3_default' for task execution.
   [2022-12-06 14:35:11,771: INFO/ForkPoolWorker-7] Using connection ID 'S3_default' for task execution.
   [2022-12-06 14:35:11,773: WARNING/ForkPoolWorker-7] /home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/utils/connection_wrapper.py:8: DeprecationWarning: AWS Connection (conn_id='S3_default', conn_type='s3') has connection type 's3', which has been replaced by connection type 'aws'. Please update your connection to have `conn_type='aws'`.
   [2022-12-06 14:35:11,774: INFO/ForkPoolWorker-7] AWS Connection (conn_id='S3_default', conn_type='s3') credentials retrieved from extra.
   [2022-12-06 14:35:12,025: ERROR/ForkPoolWorker-7] Could not verify previous log to append
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/log/s3_task_handler.py", line 167, in s3_write
       if append and self.s3_log_exists(remote_log_location):
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/log/s3_task_handler.py", line 133, in s3_log_exists
       return self.hook.check_for_key(remote_log_location)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 92, in wrapper
       return func(*bound_args.args, **bound_args.kwargs)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 64, in wrapper
       return func(*bound_args.args, **bound_args.kwargs)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 479, in check_for_key
       obj = self.head_object(key, bucket_name)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 92, in wrapper
       return func(*bound_args.args, **bound_args.kwargs)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 64, in wrapper
       return func(*bound_args.args, **bound_args.kwargs)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 466, in head_object
       raise e
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 461, in head_object
       return self.get_conn().head_object(Bucket=bucket_name, Key=key)
     File "/home/airflow/.local/lib/python3.7/site-packages/botocore/client.py", line 515, in _api_call
       return self._make_api_call(operation_name, kwargs)
     File "/home/airflow/.local/lib/python3.7/site-packages/botocore/client.py", line 934, in _make_api_call
       raise error_class(parsed_response, operation_name)
   botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
   [2022-12-06 14:35:12,057: INFO/ForkPoolWorker-7] Using connection ID 'S3_default' for task execution.
   [2022-12-06 14:35:12,120: ERROR/ForkPoolWorker-7] Task airflow.executors.celery_executor.execute_command[9f2ecb02-09c0-40cf-bb6d-f1bef4abb879] raised unexpected: AirflowException('Celery command failed on host:  with celery_task_id 9f2ecb02-09c0-40cf-bb6d-f1bef4abb879')
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.7/site-packages/celery/app/trace.py", line 451, in trace_task
       R = retval = fun(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.7/site-packages/celery/app/trace.py", line 734, in __protected_call__
       return self.run(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 96, in execute_command
       _execute_in_fork(command_to_exec, celery_task_id)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 111, in _execute_in_fork
       raise AirflowException(msg)
   airflow.exceptions.AirflowException: Celery command failed on host:  with celery_task_id 9f2ecb02-09c0-40cf-bb6d-f1bef4abb879
   2022-12-06 14:35:43.058 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
   [2022-12-06 14:35:46,130: INFO/ForkPoolWorker-7] Using connection ID 'S3_default' for task execution.
   [2022-12-06 14:35:46,188: INFO/ForkPoolWorker-7] Using connection ID 'S3_default' for task execution.
   [2022-12-06 14:35:46,190: WARNING/ForkPoolWorker-7] /home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/utils/connection_wrapper.py:8: DeprecationWarning: AWS Connection (conn_id='S3_default', conn_type='s3') has connection type 's3', which has been replaced by connection type 'aws'. Please update your connection to have `conn_type='aws'`.
   [2022-12-06 14:35:46,191: INFO/ForkPoolWorker-7] AWS Connection (conn_id='S3_default', conn_type='s3') credentials retrieved from extra.
   [2022-12-06 14:35:46,506: ERROR/ForkPoolWorker-7] Could not verify previous log to append
   Traceback (most recent call last):
     ... (same ClientError traceback as at 14:35:12,025 above)
   botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
   [2022-12-06 14:35:46,560: INFO/ForkPoolWorker-7] Using connection ID 'S3_default' for task execution.
   [2022-12-06 14:35:46,653: INFO/ForkPoolWorker-7] Task airflow.executors.celery_executor.execute_command[b5552af6-76cf-4d55-a300-ba0351bf7b45] succeeded in 42.28011583001353s: None
   ```
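   
   As far as I understand, the "Hostname of job runner does not match" error above comes from the same heartbeat callback: while the task instance is still `RUNNING`, it compares the hostname recorded on the task instance with the hostname resolved via `core.hostname_callable` (we use `airflow.utils.net.get_host_ip_address`). A simplified sketch, not the exact Airflow source:
   
   ```python
   # Illustration of the hostname check performed on each heartbeat.
   from airflow.exceptions import AirflowException
   from airflow.utils.net import get_hostname
   
   def check_runner_hostname(ti_hostname: str) -> None:
       current = get_hostname()  # resolved through core.hostname_callable
       if ti_hostname != current:
           raise AirflowException("Hostname of job runner does not match")
   ```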
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

