[https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929517#comment-16929517]

Chris Wegrzyn commented on AIRFLOW-5447:
----------------------------------------

I just upgraded a deployment from 1.10.2 to 1.10.5 (I saw the same behavior on 
1.10.4; I haven't yet tried 1.10.3) and am hitting the same issue.

 

I've tracked it down to the same line. Judging by the logs, we reach this log 
statement:

[https://github.com/apache/airflow/blob/1.10.5/airflow/contrib/executors/kubernetes_executor.py#L762]

 

But we never get this log message:

[https://github.com/apache/airflow/blob/1.10.5/airflow/executors/base_executor.py#L135]
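If I'm reading BaseExecutor.heartbeat() right, execute_async() (which emits the "Add task" line) runs before the "Calling the ... sync method" log, so seeing the first message without the second suggests the scheduler is blocking inside execute_async(), presumably on the task_queue.put() right after that log line. A runnable toy of that ordering (hypothetical names, not Airflow code; the bounded-queue timeout only stands in for whatever is actually blocking):

```python
import queue

class ToyExecutor:
    """Illustrative stand-in for the heartbeat ordering; not Airflow source."""

    def __init__(self):
        # Bounded queue that nothing drains: stands in for an executor
        # whose consumer side has stopped taking work.
        self.task_queue = queue.Queue(maxsize=1)
        self.log = []

    def execute_async(self, task):
        self.log.append(f"Add task {task}")          # ~ kubernetes_executor.py:762
        # The real code would block here indefinitely; a timeout lets the
        # toy terminate so we can inspect the truncated log.
        self.task_queue.put(task, timeout=0.1)

    def heartbeat(self, tasks):
        for task in tasks:
            self.execute_async(task)
        self.log.append("Calling the sync method")   # ~ base_executor.py:135


ex = ToyExecutor()
try:
    ex.heartbeat(["t1", "t2"])   # the second put() times out (would hang for real)
except queue.Full:
    pass

print(ex.log)  # "Add task" lines appear; "Calling the sync method" never does
```

That reproduces the shape of the log above: the "Add task" message is the last thing emitted, and the sync-method message never arrives.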

 

Here's a slightly sanitized log:

 

{code}
[2019-09-13 19:51:24,074] {scheduler_job.py:1438} DEBUG - Heartbeating the executor
[2019-09-13 19:51:24,074] {base_executor.py:124} DEBUG - 0 running task instances
[2019-09-13 19:51:24,074] {base_executor.py:125} DEBUG - 0 in queue
[2019-09-13 19:51:24,074] {base_executor.py:126} DEBUG - 32 open slots
[2019-09-13 19:51:24,075] {base_executor.py:135} DEBUG - Calling the <class 'airflow.contrib.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2019-09-13 19:51:24,083] {scheduler_job.py:1459} DEBUG - Ran scheduling loop in 0.01 seconds
[2019-09-13 19:51:24,083] {scheduler_job.py:1462} DEBUG - Sleeping for 1.00 seconds
[2019-09-13 19:51:24,087] {settings.py:54} INFO - Configured default timezone <Timezone [UTC]>
[2019-09-13 19:51:24,093] {settings.py:327} DEBUG - Failed to import airflow_local_settings.
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/settings.py", line 315, in import_local_settings
    import airflow_local_settings
ModuleNotFoundError: No module named 'airflow_local_settings'
[2019-09-13 19:51:24,094] {logging_config.py:59} DEBUG - Unable to load custom logging, using default config instead
[2019-09-13 19:51:24,109] {settings.py:170} DEBUG - Setting up DB connection pool (PID 49)
[2019-09-13 19:51:24,110] {settings.py:213} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=49
[2019-09-13 19:51:24,295] {settings.py:238} DEBUG - Disposing DB connection pool (PID 55)
[2019-09-13 19:51:24,380] {settings.py:238} DEBUG - Disposing DB connection pool (PID 59)
[2019-09-13 19:51:25,084] {scheduler_job.py:1474} DEBUG - Sleeping for 0.99 seconds to prevent excessive logging
[2019-09-13 19:51:25,117] {scheduler_job.py:257} DEBUG - Waiting for <Process(DagFileProcessor0-Process, stopped)>
[2019-09-13 19:51:25,118] {scheduler_job.py:257} DEBUG - Waiting for <Process(DagFileProcessor1-Process, stopped)>
[2019-09-13 19:51:25,226] {settings.py:238} DEBUG - Disposing DB connection pool (PID 69)
[2019-09-13 19:51:25,278] {settings.py:238} DEBUG - Disposing DB connection pool (PID 73)
[2019-09-13 19:51:26,076] {scheduler_job.py:1390} DEBUG - Starting Loop...
[2019-09-13 19:51:26,076] {scheduler_job.py:1401} DEBUG - Harvesting DAG parsing results
[2019-09-13 19:51:26,076] {dag_processing.py:637} DEBUG - Received message of type DagParsingStat
[2019-09-13 19:51:26,077] {dag_processing.py:637} DEBUG - Received message of type SimpleDag
[2019-09-13 19:51:26,077] {dag_processing.py:637} DEBUG - Received message of type DagParsingStat
[2019-09-13 19:51:26,078] {dag_processing.py:637} DEBUG - Received message of type DagParsingStat
[2019-09-13 19:51:26,078] {scheduler_job.py:1403} DEBUG - Harvested 1 SimpleDAGs
[2019-09-13 19:51:26,109] {scheduler_job.py:921} INFO - 1 tasks up for execution:
  <TaskInstance: parse_log.xyz_parse_log_2019-09-13 2019-09-12 00:00:00+00:00 [scheduled]>
[2019-09-13 19:51:26,122] {scheduler_job.py:953} INFO - Figuring out tasks to run in Pool(name=default_pool) with 128 open slots and 1 task instances ready to be queued
[2019-09-13 19:51:26,123] {scheduler_job.py:981} INFO - DAG parse_log has 0/16 running and queued tasks
[2019-09-13 19:51:26,132] {scheduler_job.py:257} DEBUG - Waiting for <Process(DagFileProcessor2-Process, stopped)>
[2019-09-13 19:51:26,133] {scheduler_job.py:257} DEBUG - Waiting for <Process(DagFileProcessor3-Process, stopped)>
[2019-09-13 19:51:26,142] {scheduler_job.py:1031} INFO - Setting the following tasks to queued state:
  <TaskInstance: parse_log.xyz_parse_log_2019-09-13 2019-09-12 00:00:00+00:00 [scheduled]>
[2019-09-13 19:51:26,157] {scheduler_job.py:1107} INFO - Setting the following 1 tasks to queued state:
  <TaskInstance: parse_log.xyz_parse_log_2019-09-13 2019-09-12 00:00:00+00:00 [queued]>
[2019-09-13 19:51:26,157] {scheduler_job.py:1143} INFO - Sending ('parse_log', 'xyz_parse_log_2019-09-13', datetime.datetime(2019, 9, 12, 0, 0, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1) to executor with priority 2 and queue default
[2019-09-13 19:51:26,158] {base_executor.py:59} INFO - Adding to queue: ['airflow', 'run', 'parse_log', 'xyz_parse_log_2019-09-13', '2019-09-12T00:00:00+00:00', '--local', '--pool', 'default_pool', '-sd', '/usr/local/airflow/dags/xyz.py']
[2019-09-13 19:51:26,158] {scheduler_job.py:1438} DEBUG - Heartbeating the executor
[2019-09-13 19:51:26,159] {base_executor.py:124} DEBUG - 0 running task instances
[2019-09-13 19:51:26,159] {base_executor.py:125} DEBUG - 1 in queue
[2019-09-13 19:51:26,167] {base_executor.py:126} DEBUG - 32 open slots
[2019-09-13 19:51:26,167] {kubernetes_executor.py:764} INFO - Add task ('parse_log', 'xyz_parse_log_2019-09-13', datetime.datetime(2019, 9, 12, 0, 0, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1) with command ['airflow', 'run', 'parse_log', 'xyz_parse_log_2019-09-13', '2019-09-12T00:00:00+00:00', '--local', '--pool', 'default_pool', '-sd', '/usr/local/airflow/dags/xyz.py'] with executor_config {}
[2019-09-13 19:51:26,248] {settings.py:238} DEBUG - Disposing DB connection pool (PID 83)
[2019-09-13 19:51:26,314] {settings.py:238} DEBUG - Disposing DB connection pool (PID 85)
[2019-09-13 19:51:27,149] {scheduler_job.py:257} DEBUG - Waiting for <Process(DagFileProcessor4-Process, stopped)>
[2019-09-13 19:51:27,150] {scheduler_job.py:257} DEBUG - Waiting for <Process(DagFileProcessor5-Process, stopped)>
[2019-09-13 19:51:27,289] {settings.py:238} DEBUG - Disposing DB connection pool (PID 101)
{code}

 

After that, the "Disposing DB connection pool" and "Waiting for 
<Process(DagFileProcessor...>" messages repeat endlessly.
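In case it helps anyone else digging into this: a generic way to see exactly where the stuck scheduler process is blocked (standard library only, nothing Airflow-specific) is to register a faulthandler signal dump early in the process, then signal it once it hangs:

```python
import faulthandler
import signal

# Dump every thread's Python stack trace to stderr on demand:
# run this early in the process, then `kill -USR1 <scheduler pid>`
# once it hangs to see the exact line it is blocked on.
faulthandler.enable()                  # also dumps on hard crashes (SIGSEGV etc.)
faulthandler.register(signal.SIGUSR1)  # on-demand stack dump
```

That should show definitively whether the hang is in task_queue.put() or somewhere else.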

 

This is running on an Amazon EKS cluster, with no memory/CPU limits on the pod.

> KubernetesExecutor hangs on task queueing
> -----------------------------------------
>
>                 Key: AIRFLOW-5447
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5447
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executor-kubernetes
>    Affects Versions: 1.10.4, 1.10.5
>         Environment: Kubernetes version v1.14.3, Airflow version 1.10.4-1.10.5
>            Reporter: Henry Cohen
>            Assignee: Daniel Imberman
>            Priority: Blocker
>
> Starting in 1.10.4, and continuing in 1.10.5, when using the 
> KubernetesExecutor, with the webserver and scheduler running in the 
> kubernetes cluster, tasks are scheduled, but when added to the task queue, 
> the executor process hangs indefinitely. Based on log messages, it appears to 
> be stuck at this line 
> https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/executors/kubernetes_executor.py#L761



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
