[ https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929517#comment-16929517 ]
Chris Wegrzyn commented on AIRFLOW-5447:
----------------------------------------

I just upgraded a deployment from 1.10.2 to 1.10.5 (I also had the same issue on 1.10.4; I haven't yet tried 1.10.3) and am experiencing the same behavior. I've tracked it down to the same line. Judging by the log messages, we get this log message:

[https://github.com/apache/airflow/blob/1.10.5/airflow/contrib/executors/kubernetes_executor.py#L762]

but we never get this log message:

[https://github.com/apache/airflow/blob/1.10.5/airflow/executors/base_executor.py#L135]

Here's a slightly sanitized log:

[2019-09-13 19:51:24,074] {scheduler_job.py:1438} DEBUG - Heartbeating the executor
[2019-09-13 19:51:24,074] {base_executor.py:124} DEBUG - 0 running task instances
[2019-09-13 19:51:24,074] {base_executor.py:125} DEBUG - 0 in queue
[2019-09-13 19:51:24,074] {base_executor.py:126} DEBUG - 32 open slots
[2019-09-13 19:51:24,075] {base_executor.py:135} DEBUG - Calling the <class 'airflow.contrib.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2019-09-13 19:51:24,083] {scheduler_job.py:1459} DEBUG - Ran scheduling loop in 0.01 seconds
[2019-09-13 19:51:24,083] {scheduler_job.py:1462} DEBUG - Sleeping for 1.00 seconds
[2019-09-13 19:51:24,087] {settings.py:54} INFO - Configured default timezone <Timezone [UTC]>
[2019-09-13 19:51:24,093] {settings.py:327} DEBUG - Failed to import airflow_local_settings.
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/settings.py", line 315, in import_local_settings
    import airflow_local_settings
ModuleNotFoundError: No module named 'airflow_local_settings'
[2019-09-13 19:51:24,094] {logging_config.py:59} DEBUG - Unable to load custom logging, using default config instead
[2019-09-13 19:51:24,109] {settings.py:170} DEBUG - Setting up DB connection pool (PID 49)
[2019-09-13 19:51:24,110] {settings.py:213} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=49
[2019-09-13 19:51:24,295] {settings.py:238} DEBUG - Disposing DB connection pool (PID 55)
[2019-09-13 19:51:24,380] {settings.py:238} DEBUG - Disposing DB connection pool (PID 59)
[2019-09-13 19:51:25,084] {scheduler_job.py:1474} DEBUG - Sleeping for 0.99 seconds to prevent excessive logging
[2019-09-13 19:51:25,117] {scheduler_job.py:257} DEBUG - Waiting for <Process(DagFileProcessor0-Process, stopped)>
[2019-09-13 19:51:25,118] {scheduler_job.py:257} DEBUG - Waiting for <Process(DagFileProcessor1-Process, stopped)>
[2019-09-13 19:51:25,226] {settings.py:238} DEBUG - Disposing DB connection pool (PID 69)
[2019-09-13 19:51:25,278] {settings.py:238} DEBUG - Disposing DB connection pool (PID 73)
[2019-09-13 19:51:26,076] {scheduler_job.py:1390} DEBUG - Starting Loop...
[2019-09-13 19:51:26,076] {scheduler_job.py:1401} DEBUG - Harvesting DAG parsing results
[2019-09-13 19:51:26,076] {dag_processing.py:637} DEBUG - Received message of type DagParsingStat
[2019-09-13 19:51:26,077] {dag_processing.py:637} DEBUG - Received message of type SimpleDag
[2019-09-13 19:51:26,077] {dag_processing.py:637} DEBUG - Received message of type DagParsingStat
[2019-09-13 19:51:26,078] {dag_processing.py:637} DEBUG - Received message of type DagParsingStat
[2019-09-13 19:51:26,078] {scheduler_job.py:1403} DEBUG - Harvested 1 SimpleDAGs
[2019-09-13 19:51:26,109] {scheduler_job.py:921} INFO - 1 tasks up for execution:
    <TaskInstance: parse_log.xyz_parse_log_2019-09-13 2019-09-12 00:00:00+00:00 [scheduled]>
[2019-09-13 19:51:26,122] {scheduler_job.py:953} INFO - Figuring out tasks to run in Pool(name=default_pool) with 128 open slots and 1 task instances ready to be queued
[2019-09-13 19:51:26,123] {scheduler_job.py:981} INFO - DAG parse_log has 0/16 running and queued tasks
[2019-09-13 19:51:26,132] {scheduler_job.py:257} DEBUG - Waiting for <Process(DagFileProcessor2-Process, stopped)>
[2019-09-13 19:51:26,133] {scheduler_job.py:257} DEBUG - Waiting for <Process(DagFileProcessor3-Process, stopped)>
[2019-09-13 19:51:26,142] {scheduler_job.py:1031} INFO - Setting the following tasks to queued state:
    <TaskInstance: parse_log.xyz_parse_log_2019-09-13 2019-09-12 00:00:00+00:00 [scheduled]>
[2019-09-13 19:51:26,157] {scheduler_job.py:1107} INFO - Setting the following 1 tasks to queued state:
    <TaskInstance: parse_log.xyz_parse_log_2019-09-13 2019-09-12 00:00:00+00:00 [queued]>
[2019-09-13 19:51:26,157] {scheduler_job.py:1143} INFO - Sending ('parse_log', 'xyz_parse_log_2019-09-13', datetime.datetime(2019, 9, 12, 0, 0, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1) to executor with priority 2 and queue default
[2019-09-13 19:51:26,158] {base_executor.py:59} INFO - Adding to queue: ['airflow', 'run', 'parse_log', 'xyz_parse_log_2019-09-13', '2019-09-12T00:00:00+00:00', '--local', '--pool', 'default_pool', '-sd', '/usr/local/airflow/dags/xyz.py']
[2019-09-13 19:51:26,158] {scheduler_job.py:1438} DEBUG - Heartbeating the executor
[2019-09-13 19:51:26,159] {base_executor.py:124} DEBUG - 0 running task instances
[2019-09-13 19:51:26,159] {base_executor.py:125} DEBUG - 1 in queue
[2019-09-13 19:51:26,167] {base_executor.py:126} DEBUG - 32 open slots
[2019-09-13 19:51:26,167] {kubernetes_executor.py:764} INFO - Add task ('parse_log', 'xyz_parse_log_2019-09-13', datetime.datetime(2019, 9, 12, 0, 0, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1) with command ['airflow', 'run', 'parse_log', 'xyz_parse_log_2019-09-13', '2019-09-12T00:00:00+00:00', '--local', '--pool', 'default_pool', '-sd', '/usr/local/airflow/dags/xyz.py'] with executor_config {}
[2019-09-13 19:51:26,248] {settings.py:238} DEBUG - Disposing DB connection pool (PID 83)
[2019-09-13 19:51:26,314] {settings.py:238} DEBUG - Disposing DB connection pool (PID 85)
[2019-09-13 19:51:27,149] {scheduler_job.py:257} DEBUG - Waiting for <Process(DagFileProcessor4-Process, stopped)>
[2019-09-13 19:51:27,150] {scheduler_job.py:257} DEBUG - Waiting for <Process(DagFileProcessor5-Process, stopped)>
[2019-09-13 19:51:27,289] {settings.py:238} DEBUG - Disposing DB connection pool (PID 101)

After that, the "Disposing DB connection pool" and "Waiting for <Process(DagFileProcessor" messages repeat endlessly. This is running on an Amazon EKS cluster, with no memory/CPU limits on the pod.

> KubernetesExecutor hangs on task queueing
> -----------------------------------------
>
>                 Key: AIRFLOW-5447
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5447
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executor-kubernetes
>    Affects Versions: 1.10.4, 1.10.5
>        Environment: Kubernetes version v1.14.3, Airflow version 1.10.4-1.10.5
>            Reporter: Henry Cohen
>            Assignee: Daniel Imberman
>            Priority: Blocker
>
> Starting in 1.10.4, and continuing in 1.10.5, when using the KubernetesExecutor, with the webserver and scheduler running in the kubernetes cluster, tasks are scheduled, but when added to the task queue, the executor process hangs indefinitely. Based on log messages, it appears to be stuck at this line: https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/executors/kubernetes_executor.py#L761

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
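A way to confirm which line the scheduler is actually blocked on, rather than inferring it from which log messages appear: have the scheduler process dump the tracebacks of all its threads on demand. This is a debugging sketch, not part of the report above; the idea of wiring it into the scheduler entrypoint, and the choice of SIGUSR1, are my assumptions. It uses only the standard-library `faulthandler` module.

```python
# Sketch: make a long-running process (e.g. the Airflow scheduler) dump
# Python tracebacks for all threads when poked with a signal, so a hang
# like the one in this log can be pinpointed to an exact line.
# Assumption: this snippet runs early in the process's entrypoint; the
# signal choice (SIGUSR1) is illustrative. Unix-only, since it registers
# a signal handler.
import faulthandler
import signal

# Always dump Python tracebacks on fatal errors (SIGSEGV, SIGABRT, ...).
faulthandler.enable()

# Dump tracebacks of every thread whenever the process receives SIGUSR1.
# From outside the pod, something like:
#   kubectl exec <scheduler-pod> -- kill -USR1 <scheduler-pid>
faulthandler.register(signal.SIGUSR1)
```

If the process is hung where the comment suggests, the dump should show a frame inside `kubernetes_executor.py` around the `Add task` call site. An alternative that needs no code change is attaching an external sampler such as `py-spy dump --pid <scheduler-pid>` to a live process.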