[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930644#comment-16930644
 ] 

Chris Wegrzyn commented on AIRFLOW-5447:
----------------------------------------

After a bit of wrestling with pyrasite and probably dumb luck, I managed to get 
what appears to be a telling stack trace:

 
{code:java}
Thread 0x7fb39d56d700
  File "/usr/local/airflow/.local/bin/airflow", line 32, in <module>
    args.func(args)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 74, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/bin/cli.py", line 1013, in scheduler
    job.run()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/base_job.py", line 213, in run
    self._execute()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1350, in _execute
    self._execute_helper()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1439, in _execute_helper
    self.executor.heartbeat()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/executors/base_executor.py", line 132, in heartbeat
    self.trigger_tasks(open_slots)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/executors/base_executor.py", line 156, in trigger_tasks
    executor_config=simple_ti.executor_config)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 767, in execute_async
    self.task_queue.put((key, command, kube_executor_config))
  File "<string>", line 2, in put
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "<string>", line 1, in <module>
  File "<string>", line 5, in <module>
{code}
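The scheduler appears to be blocked inside self.task_queue.put(): the proxy's 
_callmethod is waiting in conn.recv() for a reply that never arrives. So it 
does seem like something is going wrong in the communication behind the put 
to the task_queue.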
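
To illustrate why a missing reply shows up as a hang rather than an error, here is a minimal sketch of the send-then-recv pattern visible in the _callmethod frame. It uses a plain multiprocessing Pipe in a single process; the helper name call_with_timeout, the message contents and the timeout values are all made up for illustration and are not Airflow or multiprocessing code. It is not meant as a fix, only as a demonstration of the blocking behaviour.

```python
from multiprocessing import Pipe


def call_with_timeout(conn, request, timeout):
    """Send a request and wait at most `timeout` seconds for a reply.

    Hypothetical helper: returns the reply, or None if the peer stays silent.
    """
    conn.send(request)
    # poll() returns False when nothing arrives in time. A bare conn.recv()
    # at this point would block indefinitely instead.
    if conn.poll(timeout):
        return conn.recv()
    return None


client_end, server_end = Pipe()

# Case 1: the peer has a reply ready, so the caller gets it back.
server_end.send(("#RETURN", None))
answered = call_with_timeout(client_end, ("put", ("key", "command", None)), 1.0)

# Case 2: the peer never answers. Without the timeout this call would hang.
silent = call_with_timeout(client_end, ("put", ("key", "command", None)), 0.1)

if __name__ == "__main__":
    print(answered, silent)
```

In the second case the request is sent but nothing ever comes back, which is the situation the trace above seems to show for the scheduler.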

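For anyone who wants a similar per-thread dump, the sketch below produces one with the standard library only. It is an assumption-laden illustration, not a description of what pyrasite does internally: the function name dump_thread_stacks is made up, and the code has to run inside the process being inspected (for example via an injected payload or a signal handler).

```python
import sys
import traceback


def dump_thread_stacks():
    """Return a text dump of the current stack of every thread."""
    lines = []
    # sys._current_frames() maps each thread id to that thread's topmost frame.
    for thread_id, frame in sys._current_frames().items():
        lines.append("Thread 0x%x" % thread_id)
        # format_stack() yields 'File "...", line N, in func' entries,
        # oldest call first.
        for entry in traceback.format_stack(frame):
            lines.append(entry.rstrip("\n"))
    return "\n".join(lines)


if __name__ == "__main__":
    print(dump_thread_stacks())
```
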
 

> KubernetesExecutor hangs on task queueing
> -----------------------------------------
>
>                 Key: AIRFLOW-5447
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5447
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executor-kubernetes
>    Affects Versions: 1.10.4, 1.10.5
>         Environment: Kubernetes version v1.14.3, Airflow version 1.10.4-1.10.5
>            Reporter: Henry Cohen
>            Assignee: Daniel Imberman
>            Priority: Blocker
>
> Starting in 1.10.4, and continuing in 1.10.5, when using the 
> KubernetesExecutor with the webserver and scheduler running in the 
> Kubernetes cluster, tasks are scheduled, but when they are added to the task 
> queue, the executor process hangs indefinitely. Based on log messages, it 
> appears to be stuck at this line: 
> https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/executors/kubernetes_executor.py#L761



