[ https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930644#comment-16930644 ]
Chris Wegrzyn commented on AIRFLOW-5447:
----------------------------------------

After a bit of wrestling with pyrasite and probably dumb luck, I managed to get what appears to be a telling stack trace:
{code:java}
Thread 0x7fb39d56d700
  File "/usr/local/airflow/.local/bin/airflow", line 32, in <module>
    args.func(args)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 74, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/bin/cli.py", line 1013, in scheduler
    job.run()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/base_job.py", line 213, in run
    self._execute()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1350, in _execute
    self._execute_helper()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1439, in _execute_helper
    self.executor.heartbeat()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/executors/base_executor.py", line 132, in heartbeat
    self.trigger_tasks(open_slots)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/executors/base_executor.py", line 156, in trigger_tasks
    executor_config=simple_ti.executor_config)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 767, in execute_async
    self.task_queue.put((key, command, kube_executor_config))
  File "<string>", line 2, in put
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "<string>", line 1, in <module>
  File "<string>", line 5, in <module>
{code}
It does seem like something is going wrong with the communication related to the put to the task_queue.

> KubernetesExecutor hangs on task queueing
> -----------------------------------------
>
> Key: AIRFLOW-5447
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5447
> Project: Apache Airflow
> Issue Type: Bug
> Components: executor-kubernetes
> Affects Versions: 1.10.4, 1.10.5
> Environment: Kubernetes version v1.14.3, Airflow version 1.10.4-1.10.5
> Reporter: Henry Cohen
> Assignee: Daniel Imberman
> Priority: Blocker
>
> Starting in 1.10.4, and continuing in 1.10.5, when using the
> KubernetesExecutor, with the webserver and scheduler running in the
> kubernetes cluster, tasks are scheduled, but when added to the task queue,
> the executor process hangs indefinitely. Based on log messages, it appears to
> be stuck at this line
> https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/executors/kubernetes_executor.py#L761

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
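
For anyone trying to capture a similar trace, here is a minimal standard-library sketch of the kind of per-thread stack dump shown above. It is not Airflow code and not what pyrasite does internally: `blocked_put` is a hypothetical stand-in for the stuck `task_queue.put()` call, and `format_thread_stacks` is an illustrative helper name. The sketch only shows that a thread waiting on a reply that never arrives appears in the dump at the blocking call.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a formatted stack for every live thread in this process."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    # sys._current_frames() maps each thread id to its current top frame.
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread 0x%x (%s)\n" % (ident, names.get(ident, "unknown")))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


def blocked_put(started, release):
    # Hypothetical stand-in for the hung task_queue.put(): signal that we
    # are running, then block until someone releases us.
    started.set()
    release.wait()


started = threading.Event()
release = threading.Event()
worker = threading.Thread(
    target=blocked_put, args=(started, release), name="fake-scheduler", daemon=True
)
worker.start()
started.wait()  # make sure the worker is inside blocked_put before dumping

dump = format_thread_stacks()
print(dump)

# Unblock the worker so the example exits cleanly.
release.set()
worker.join()
```

In a real scheduler process the dump would have to be triggered from inside that process (which is what injecting code with pyrasite achieves). The standard library's `faulthandler.dump_traceback(all_threads=True)` is another way to get the same information, assuming it can be called or registered inside the target process.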