[GitHub] [airflow] jingsong commented on issue #5915: [AIRFLOW-5312] Fix timeout issue in pod launcher / KubernetesPodOperator

2019-09-03 Thread GitBox
jingsong commented on issue #5915: [AIRFLOW-5312] Fix timeout issue in pod 
launcher / KubernetesPodOperator
URL: https://github.com/apache/airflow/pull/5915#issuecomment-527636364
 
 
   We investigated this further and concluded that adding a timeout would 
likely not solve the issue. Because `read_namespaced_pod_log` is called with 
`follow=True`, the client holds a `keepalive` connection to the Kubernetes 
API open in order to stream logs. Digging into the urllib3 code, we found 
that it uses a generator to "stream" the logs from the Kubernetes API back 
to the client. Adding a timeout here could cause the following:
   
   (1) Tasks that use the KubernetesPodOperator and run longer than the 
specified `kube_api_timeout_seconds` would always hit the timeout.
   (2) When the timeout fires, either:
       a. an exception is raised, the pod log read is retried, and logs 
are duplicated, or
       b. urllib3 does not respect the timeout on a `keepalive` connection.
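
   To illustrate case (2)a, here is a minimal, self-contained simulation. It 
does not call the Kubernetes client or urllib3. `ReadTimeout`, 
`stream_pod_logs`, and `read_logs_with_retry` are hypothetical stand-ins, and 
the sketch assumes that a retry re-opens the log stream from the beginning:

```python
class ReadTimeout(Exception):
    """Hypothetical stand-in for a client-side read timeout."""


# Pretend these are the lines the pod has written so far.
POD_LOG = ["line 1", "line 2", "line 3", "line 4", "line 5"]


def stream_pod_logs(lines_before_timeout=None):
    """Yield log lines from the start of the log, like a follow=True stream.

    If lines_before_timeout is set, raise ReadTimeout after that many lines
    to simulate the timeout firing in the middle of the stream.
    """
    for index, line in enumerate(POD_LOG):
        if lines_before_timeout is not None and index >= lines_before_timeout:
            raise ReadTimeout("simulated timeout while streaming")
        yield line


def read_logs_with_retry(max_attempts=3):
    """Naive retry: on timeout, re-open the stream and keep appending."""
    collected = []
    for attempt in range(max_attempts):
        # Only the first attempt times out (after 3 lines) in this simulation.
        limit = 3 if attempt == 0 else None
        try:
            for line in stream_pod_logs(limit):
                collected.append(line)
            return collected
        except ReadTimeout:
            # The lines already collected stay in place, so the retry
            # appends them a second time.
            continue
    return collected


logs = read_logs_with_retry()
print("\n".join(logs))
```

   In this simulation the first three lines show up twice in the collected 
output. Avoiding that would require the reader to track how far it had 
already read before retrying, which a plain timeout does not provide.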
   
   @rolanddb Replying to your comment above: yes, we observe the same 
behavior. However, having the worker pods retry ad infinitum may produce 
repeated and confusing logs, which defeats the purpose of having them. I'm 
also not sure what you mean by `poll indefinitely for the status of launched 
tasks`, since `read_namespaced_pod_log` reads logs, not the state of the task 
pod itself. If `read_namespaced_pod_log` is where the hang occurs, this PR 
will not address that specific issue.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] jingsong commented on issue #5915: [AIRFLOW-5312] Fix timeout issue in pod launcher / KubernetesPodOperator

2019-08-27 Thread GitBox
jingsong commented on issue #5915: [AIRFLOW-5312] Fix timeout issue in pod 
launcher / KubernetesPodOperator
URL: https://github.com/apache/airflow/pull/5915#issuecomment-525402878
 
 
   Was there a test plan here?

