[ https://issues.apache.org/jira/browse/AIRFLOW-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980540#comment-16980540 ]
ASF GitHub Bot commented on AIRFLOW-6040:
-----------------------------------------

maxirus commented on pull request #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error
URL: https://github.com/apache/airflow/pull/6643

### Jira

- [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues
  - [\[AIRFLOW-6040\]](https://issues.apache.org/jira/browse/AIRFLOW-6040)

### Description

- [ ] Here are some details about my PR, including screenshots of any UI changes:
  - Setting timeout_seconds=50 in the Watch() loop will cause a warning instead of an exception when a worker_uuid does not exist. timeout_seconds targets the list_namespaced_pod method as opposed to the underlying urllib3 library which throws an exception.
  - Adding worker_uuid to the log message so users know which label is being watched

### Tests

### Commits

### Documentation

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Airflow scheduler with kubernetes executor fails :- Unknown error in KubernetesJobWatcher
> -----------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-6040
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6040
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: contrib, executor-kubernetes, scheduler
>    Affects Versions: 1.10.6
>            Reporter: Ashutosh Srivastava
>            Assignee: Daniel Imberman
>            Priority: Major
>
> I am trying to set up airflow with the kubernetes executor. I have cloned
> airflow 1.10.6 and am building the docker image and then deploying it with
> kube. The pods are running, the service airflow also starts. The webserver is
> working fine. But when I check the logs for the scheduler I get the following
> error.
>
> {{ERROR - Error while health checking kube watcher process. Process died for unknown reasons
> INFO - Event: and now my watch begins starting at resource_version: 0
> ERROR - Unknown error in KubernetesJobWatcher. Failing
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/airflow/contrib/executors/kubernetes_executor.py", line 333, in run
>     self.worker_uuid, self.kube_config)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/contrib/executors/kubernetes_executor.py", line 358, in _run
>     **kwargs):
>   File "/usr/local/lib/python2.7/dist-packages/kubernetes/watch/watch.py", line 144, in stream
>     for line in iter_resp_lines(resp):
>   File "/usr/local/lib/python2.7/dist-packages/kubernetes/watch/watch.py", line 48, in iter_resp_lines
>     for seg in resp.read_chunked(decode_content=False):
>   File "/usr/local/lib/python2.7/dist-packages/urllib3/response.py", line 781, in read_chunked
>     self._original_response.close()
>   File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__
>     self.gen.throw(type, value, traceback)
>   File "/usr/local/lib/python2.7/dist-packages/urllib3/response.py", line 439, in _error_catcher
>     raise ReadTimeoutError(self._pool, None, "Read timed out.")
> ReadTimeoutError: HTTPSConnectionPool(host='10.0.0.1', port=443): Read timed out.}}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
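
For reference, the timeout_seconds change described in the pull request comment above might look roughly like the sketch below. It is not the code from PR #6643: the function name `watch_worker_pods`, its parameters, and the log wording are hypothetical, and the watcher and list call are passed in so the logic can be tried without a cluster. It assumes the `airflow-worker=<worker_uuid>` label selector and that the API server ends the watch once `timeout_seconds` has elapsed, so the loop finishes instead of waiting for the HTTP client's read timeout.

```python
import logging

log = logging.getLogger(__name__)


def watch_worker_pods(watcher, list_func, namespace, worker_uuid,
                      resource_version="0", timeout_seconds=50):
    """Stream pod events for one worker label and return the last
    resource_version seen (hypothetical helper, for illustration only).

    watcher   -- object exposing stream(func, *args, **kwargs),
                 for example kubernetes.watch.Watch()
    list_func -- the list call to watch, for example
                 kubernetes.client.CoreV1Api().list_namespaced_pod
    """
    kwargs = {
        # Assumed label scheme: pods are tagged with the scheduler's worker_uuid.
        "label_selector": "airflow-worker={}".format(worker_uuid),
        # Passed through to the list call, so the API server bounds the
        # duration of the watch rather than the HTTP client timing out.
        "timeout_seconds": timeout_seconds,
    }
    if resource_version:
        kwargs["resource_version"] = resource_version

    # Include worker_uuid so the log shows which label is being watched.
    log.info("Watching pods with label airflow-worker=%s from resource_version %s",
             worker_uuid, resource_version)

    last_resource_version = resource_version
    for event in watcher.stream(list_func, namespace, **kwargs):
        pod = event["object"]
        log.info("Event: %s had an event of type %s",
                 pod.metadata.name, event["type"])
        last_resource_version = pod.metadata.resource_version
    return last_resource_version
```

With the official Kubernetes Python client this would be called as `watch_worker_pods(watch.Watch(), client.CoreV1Api().list_namespaced_pod, namespace, worker_uuid)`, and the caller would re-enter the function with the returned resource_version after each timeout.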