[ https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930224#comment-16930224 ]
Daniel Imberman edited comment on AIRFLOW-5447 at 9/16/19 4:21 AM:
-------------------------------------------------------------------

[~Yuval.Itzchakov] [~cwegrzyn] Thank you guys for getting this info to us. I THINK this might have to do with a bug in the k8s python client which requires "create" and "get" privileges for "pods/exec":

[https://stackoverflow.com/questions/53827345/airflow-k8s-operator-xcom-handshake-status-403-forbidden]
[https://github.com/kubernetes-client/python/issues/690]

The reason I believe this is that the lack of running/updating of pods points to a failure of the KubernetesJobWatcher. When we finally started seeing similar problems, we were seeing these failures from the JobWatcher: [https://user-images.githubusercontent.com/1036482/64914385-2f0eca80-d71e-11e9-8f8b-44a1c8620b92.png].

I'm going to look into this further tomorrow and get back ASAP.
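
If the "pods/exec" theory holds, a possible workaround is to grant those two verbs to the service account Airflow runs under. Below is a minimal sketch that only builds the RBAC Role manifest as JSON, which kubectl accepts. The role name "airflow-pods-exec" and the namespace "airflow" are made-up placeholders, not values from this issue, and the Role would still need a RoleBinding to the actual service account. Whether this resolves the hang is unconfirmed.

```python
import json


def build_pods_exec_role(name: str, namespace: str) -> dict:
    """Build a Role manifest granting "create" and "get" on "pods/exec".

    `name` and `namespace` are hypothetical placeholders; substitute the
    values used by your own deployment.
    """
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                # "" is the core API group, which is where pods live.
                "apiGroups": [""],
                # The subresource named in the comment above.
                "resources": ["pods/exec"],
                "verbs": ["create", "get"],
            }
        ],
    }


if __name__ == "__main__":
    # Print the manifest so it can be saved to a file and applied.
    role = build_pods_exec_role("airflow-pods-exec", "airflow")
    print(json.dumps(role, indent=2))
```

The output can be redirected to a file and applied with `kubectl apply -f <file>`. This Role covers only "pods/exec"; the permissions the executor needs on pods themselves are separate and are not shown here.
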

> KubernetesExecutor hangs on task queueing
> -----------------------------------------
>
> Key: AIRFLOW-5447
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5447
> Project: Apache Airflow
> Issue Type: Bug
> Components: executor-kubernetes
> Affects Versions: 1.10.4, 1.10.5
> Environment: Kubernetes version v1.14.3, Airflow version 1.10.4-1.10.5
> Reporter: Henry Cohen
> Assignee: Daniel Imberman
> Priority: Blocker
>
> Starting in 1.10.4, and continuing in 1.10.5, when using the
> KubernetesExecutor, with the webserver and scheduler running in the
> kubernetes cluster, tasks are scheduled, but when added to the task queue,
> the executor process hangs indefinitely. Based on log messages, it appears to
> be stuck at this line
> https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/executors/kubernetes_executor.py#L761

--
This message was sent by Atlassian Jira
(v8.3.2#803003)