allenhaozi commented on issue #18041: URL: https://github.com/apache/airflow/issues/18041#issuecomment-1163818202
> > I have the same problem. I'm using Airflow 2.2.5 with SparkKubernetesOperator and SparkKubernetesSensor. The driver is running, but the sensor keeps emitting the following logs until the number of retries exceeds the threshold:
> >
> > ```
> > [2022-06-17, 18:05:52 CST] {spark_kubernetes.py:104} INFO - Poking: load-customer-data-init-1655486757.7793136
> > [2022-06-17, 18:05:52 CST] {spark_kubernetes.py:124} INFO - Spark application is still in state: RUNNING
> > [2022-06-17, 18:06:49 CST] {local_task_job.py:211} WARNING - State of this instance has been externally set to up_for_retry. Terminating instance.
> > [2022-06-17, 18:06:49 CST] {process_utils.py:120} INFO - Sending Signals.SIGTERM to group 84. PIDs of all processes in the group: [84]
> > [2022-06-17, 18:06:49 CST] {process_utils.py:75} INFO - Sending the signal Signals.SIGTERM to group 84
> > [2022-06-17, 18:06:49 CST] {taskinstance.py:1430} ERROR - Received SIGTERM. Terminating subprocesses.
> > [2022-06-17, 18:06:49 CST] {taskinstance.py:1774} ERROR - Task failed with exception
> > Traceback (most recent call last):
> >   File "/home/airflow/.local/lib/python3.8/site-packages/airflow/sensors/base.py", line 249, in execute
> >     time.sleep(self._get_next_poke_interval(started_at, run_duration, try_number))
> >   File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1432, in signal_handler
> >     raise AirflowException("Task received SIGTERM signal")
> > airflow.exceptions.AirflowException: Task received SIGTERM signal
> > [2022-06-17, 18:06:49 CST] {taskinstance.py:1278} INFO - Marking task as FAILED. dag_id=salesforecast-load-init, task_id=load-customer-data-init-sensor, execution_date=20220617T172033, start_date=20220617T175649, end_date=20220617T180649
> > [2022-06-17, 18:06:49 CST] {standard_task_runner.py:93} ERROR - Failed to execute job 24 for task load-customer-data-init-sensor (Task received SIGTERM signal; 84)
> > [2022-06-17, 18:06:49 CST] {process_utils.py:70} INFO - Process psutil.Process(pid=84, status='terminated', exitcode=1, started='17:56:48') (84) terminated with exit code 1
> > ```
>
> Did you try the earlier suggestions with dagrun_timeout? Do you know what is sending SIGTERM to this task?

Thank you @potiuk. I tried that parameter and it did not help. But in my environment, after commenting out these three settings, it has been working fine for now (a rough sketch of the DAG I'm running is below):

1. `AIRFLOW__SCHEDULER__JOB_HEARTBEAT_SEC: 600`
2. `AIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC: 200`
3. `AIRFLOW__SCHEDULER__SCHEDULER_HEALTH_CHECK_THRESHOLD: 600`
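For reference, the task pairing that hits this looks roughly like the sketch below. This is only illustrative, not my exact DAG; the namespace, connection id, application file name and timeout value are placeholders:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import SparkKubernetesOperator
from airflow.providers.cncf.kubernetes.sensors.spark_kubernetes import SparkKubernetesSensor

with DAG(
    dag_id="salesforecast-load-init",
    start_date=datetime(2022, 6, 1),
    schedule_interval=None,
    # the dagrun_timeout suggested earlier; it did not stop the SIGTERMs for me
    dagrun_timeout=timedelta(hours=2),
) as dag:
    # Submits the SparkApplication custom resource to Kubernetes
    submit = SparkKubernetesOperator(
        task_id="load-customer-data-init",
        namespace="airflow",                              # placeholder
        application_file="load-customer-data-init.yaml",  # placeholder manifest
        kubernetes_conn_id="kubernetes_default",          # placeholder connection
        do_xcom_push=True,
    )

    # Pokes the SparkApplication status; this is the task that keeps logging
    # "Spark application is still in state: RUNNING" and then receives SIGTERM
    monitor = SparkKubernetesSensor(
        task_id="load-customer-data-init-sensor",
        namespace="airflow",                              # placeholder
        application_name="{{ task_instance.xcom_pull(task_ids='load-customer-data-init')['metadata']['name'] }}",
        kubernetes_conn_id="kubernetes_default",          # placeholder connection
        attach_log=True,
    )

    submit >> monitor
```

Below is the relevant part of my Airflow config (from the Helm values):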
```yaml
airflow:
  config:
    # if other ns, u should config a new sa
    AIRFLOW__KUBERNETES__NAMESPACE: "airflow"
    AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: "false"
    AIRFLOW__WEBSERVER__LOG_FETCH_TIMEOUT_SEC: "15"
    AIRFLOW__LOGGING__LOGGING_LEVEL: "DEBUG"
    AIRFLOW__LOGGING__REMOTE_LOGGING: "True"
    AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "s3://airflow-logs/"
    AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "openaios_airflow_log"
    AIRFLOW__API__AUTH_BACKEND: "airflow.api.auth.backend.basic_auth"
    #AIRFLOW__SCHEDULER__JOB_HEARTBEAT_SEC: 600
    #AIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC: 200
    #AIRFLOW__SCHEDULER__SCHEDULER_HEALTH_CHECK_THRESHOLD: 600
    AIRFLOW__KUBERNETES__WORKER_PODS_QUEUED_CHECK_INTERVAL: "86400"
    AIRFLOW__WEBSERVER__DEFAULT_UI_TIMEZONE: "Asia/Shanghai"
    AIRFLOW__CORE__DEFAULT_TIMEZONE: "Asia/Shanghai"
    AIRFLOW__CORE__KILLED_TASK_CLEANUP_TIME: "604800"
    AIRFLOW__CORE__HOSTNAME_CALLABLE: socket.gethostname
    AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: "30"
    AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION: "False"
  ## a list of users to create
```
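In case it helps anyone else debugging this, a quick way to confirm which scheduler values are actually in effect (just a sketch using the stock `airflow.configuration` API, run inside the scheduler pod):

```python
# Print the effective values of the three scheduler settings I commented out,
# to confirm the overrides are really gone and the defaults are being used.
from airflow.configuration import conf

for key in (
    "job_heartbeat_sec",
    "scheduler_heartbeat_sec",
    "scheduler_health_check_threshold",
):
    print(key, "=", conf.getint("scheduler", key))
```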