allenhaozi commented on issue #18041:
URL: https://github.com/apache/airflow/issues/18041#issuecomment-1163818202

   > > I have the same problem. I'm using Airflow 2.2.5 with SparkKubernetesOperator and SparkKubernetesSensor.
   > > The driver is running, but the sensor keeps printing the following logs until the number of retries exceeds the threshold:
   > > ```
   > > [2022-06-17, 18:05:52 CST] {spark_kubernetes.py:104} INFO - Poking: load-customer-data-init-1655486757.7793136
   > > [2022-06-17, 18:05:52 CST] {spark_kubernetes.py:124} INFO - Spark application is still in state: RUNNING
   > > [2022-06-17, 18:06:49 CST] {local_task_job.py:211} WARNING - State of this instance has been externally set to up_for_retry. Terminating instance.
   > > [2022-06-17, 18:06:49 CST] {process_utils.py:120} INFO - Sending Signals.SIGTERM to group 84. PIDs of all processes in the group: [84]
   > > [2022-06-17, 18:06:49 CST] {process_utils.py:75} INFO - Sending the signal Signals.SIGTERM to group 84
   > > [2022-06-17, 18:06:49 CST] {taskinstance.py:1430} ERROR - Received SIGTERM. Terminating subprocesses.
   > > [2022-06-17, 18:06:49 CST] {taskinstance.py:1774} ERROR - Task failed with exception
   > > Traceback (most recent call last):
   > >   File "/home/airflow/.local/lib/python3.8/site-packages/airflow/sensors/base.py", line 249, in execute
   > >     time.sleep(self._get_next_poke_interval(started_at, run_duration, try_number))
   > >   File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1432, in signal_handler
   > >     raise AirflowException("Task received SIGTERM signal")
   > > airflow.exceptions.AirflowException: Task received SIGTERM signal
   > > [2022-06-17, 18:06:49 CST] {taskinstance.py:1278} INFO - Marking task as FAILED. dag_id=salesforecast-load-init, task_id=load-customer-data-init-sensor, execution_date=20220617T172033, start_date=20220617T175649, end_date=20220617T180649
   > > [2022-06-17, 18:06:49 CST] {standard_task_runner.py:93} ERROR - Failed to execute job 24 for task load-customer-data-init-sensor (Task received SIGTERM signal; 84)
   > > [2022-06-17, 18:06:49 CST] {process_utils.py:70} INFO - Process psutil.Process(pid=84, status='terminated', exitcode=1, started='17:56:48') (84) terminated with exit code 1
   > > ```
   > 
   > Did you try the earlier suggestions with dagrun_timeout? Do you know what is sending SIGTERM to this task?
   
   Thank you @potiuk.
   I tried that parameter and it didn't work.
   But in my environment, I commented out these three parameters and it works fine for now:
    1. AIRFLOW__SCHEDULER__JOB_HEARTBEAT_SEC: 600
    2. AIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC: 200
    3. AIRFLOW__SCHEDULER__SCHEDULER_HEALTH_CHECK_THRESHOLD: 600
   
   ```yaml
   airflow:
     config:
        # if using another namespace, you should configure a new ServiceAccount
       AIRFLOW__KUBERNETES__NAMESPACE: "airflow"
       AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: "false"
       AIRFLOW__WEBSERVER__LOG_FETCH_TIMEOUT_SEC: "15"
       AIRFLOW__LOGGING__LOGGING_LEVEL: "DEBUG"
       AIRFLOW__LOGGING__REMOTE_LOGGING: "True"
       AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "s3://airflow-logs/"
       AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "openaios_airflow_log"
       AIRFLOW__API__AUTH_BACKEND: "airflow.api.auth.backend.basic_auth"
       #AIRFLOW__SCHEDULER__JOB_HEARTBEAT_SEC: 600
       #AIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC: 200
       #AIRFLOW__SCHEDULER__SCHEDULER_HEALTH_CHECK_THRESHOLD: 600
       AIRFLOW__KUBERNETES__WORKER_PODS_QUEUED_CHECK_INTERVAL: "86400"
       AIRFLOW__WEBSERVER__DEFAULT_UI_TIMEZONE: "Asia/Shanghai"
       AIRFLOW__CORE__DEFAULT_TIMEZONE: "Asia/Shanghai"
       AIRFLOW__CORE__KILLED_TASK_CLEANUP_TIME: "604800"
       AIRFLOW__CORE__HOSTNAME_CALLABLE: socket.gethostname
       AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: "30"
       AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION: "False"
   
     ## a list of users to create
   ```
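   For context, the operator/sensor pairing from the logs above looks roughly like this. This is a minimal sketch based on the submit-and-monitor pattern documented for the cncf-kubernetes provider; the application_file name, connection id, schedule, and dagrun_timeout value are placeholders, not my exact DAG:

   ```python
   from datetime import datetime, timedelta

   from airflow import DAG
   from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import SparkKubernetesOperator
   from airflow.providers.cncf.kubernetes.sensors.spark_kubernetes import SparkKubernetesSensor

   with DAG(
       dag_id="salesforecast-load-init",
       start_date=datetime(2022, 6, 1),
       schedule_interval=None,
       catchup=False,
       # the earlier suggestion from this thread; it did not help in my case
       dagrun_timeout=timedelta(hours=2),
   ) as dag:
       # Submit the SparkApplication custom resource; the manifest file name is hypothetical.
       submit = SparkKubernetesOperator(
           task_id="load-customer-data-init",
           namespace="airflow",
           application_file="load-customer-data-init.yaml",
           kubernetes_conn_id="kubernetes_default",
           do_xcom_push=True,  # so the sensor can pull the generated application name
       )

       # Poke the SparkApplication status until it reaches a terminal state.
       monitor = SparkKubernetesSensor(
           task_id="load-customer-data-init-sensor",
           namespace="airflow",
           application_name="{{ task_instance.xcom_pull(task_ids='load-customer-data-init')['metadata']['name'] }}",
           kubernetes_conn_id="kubernetes_default",
           attach_log=True,
       )

       submit >> monitor
   ```

   It is the monitor task in this pairing that was being SIGTERMed mid-poke while the Spark driver was still RUNNING.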
