[GitHub] [airflow] maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error

2020-03-05 Thread GitBox
maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read 
time out error
URL: https://github.com/apache/airflow/pull/6643#issuecomment-595579525
 
 
   @sbrandtb I think you should take another look at the PR and read the 
comments in this thread again. My PR doesn't change the `_request_timeout` in 
any way.
   
   > I still disagree with you setting the _request_timeout to [60, 60] by 
default
   
   Where am **I** setting this?
   
   > You just assume you are smarter than everyone else and that no one needs 
to set another timeout and just hard code it to some reasonable value (like, 10 
seconds - why not?)
   
   Nope... It's been 
[configurable](https://github.com/apache/airflow/blob/1.10.9/airflow/config_templates/default_airflow.cfg#L965)
 for a number of releases now and **I** didn't set this default value.
   
   > or you use whatever is set in Airflow's settings
   
   Yep.
   
   > or create a new setting for this
   
   Again, read the comments please. That is not how the maintainers wanted to 
handle it (see 
[here](https://github.com/apache/airflow/pull/6643#issuecomment-564987153))
   
   > ...because the first is the default in Airflow. Please, do not 
double-default.
   
   Where's the double default?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error

2020-03-04 Thread GitBox
maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read 
time out error
URL: https://github.com/apache/airflow/pull/6643#issuecomment-595017078
 
 
   @sbrandtb You shouldn't catch (and subsequently ignore) a 
connection/response timeout error. `_request_timeout` is setting the connection 
timeout and response timeout for the underlying urllib3 library. This timeout 
is being hit because Watcher sends no data back when it doesn't have any 
matches for it's query (it's a stream). This will hide the real error when 
there's a case of the API server not responding. 
   
   Setting `timeout_seconds` is telling the Watcher to just stop watching and 
allow processing to continue. This PR is the proper fix as it's gracefully 
handling the case where there are no active pods (airflow tasks) running with 
this particular uuid and the API server has no data about them.
   
   @ashb @dimberman I would suggest taking another look at PR #7616 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error

2020-02-03 Thread GitBox
maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read 
time out error
URL: https://github.com/apache/airflow/pull/6643#issuecomment-581732169
 
 
   Sooo the test framework has a high barrier to entry and there doesn't seem 
to be any existing tests for the `KubernetesJobWatcher` class to piggy-back on. 
I attempted to get the Test platform going, reading what documentation there 
is, but I just don't have the time to invest in it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error

2020-01-16 Thread GitBox
maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read 
time out error
URL: https://github.com/apache/airflow/pull/6643#issuecomment-575436903
 
 
   @mbelang No. Between the holidays & work I have not had time. Hoping to have 
some time this weekend.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error

2019-12-17 Thread GitBox
maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read 
time out error
URL: https://github.com/apache/airflow/pull/6643#issuecomment-566544011
 
 
   > If possible it would be nice to add some unit tests too.
   
   Will try to get to this later this week.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error

2019-12-12 Thread GitBox
maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read 
time out error
URL: https://github.com/apache/airflow/pull/6643#issuecomment-565297542
 
 
   I can implement it like this but do we really want the timeout to always be 
`1` for any integer >0?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error

2019-12-03 Thread GitBox
maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read 
time out error
URL: https://github.com/apache/airflow/pull/6643#issuecomment-561460460
 
 
   @ashb is this `kube_client_request_args.get('_request_timeout', [60,60])[0] 
- 1` the way you want to handle it?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error

2019-11-26 Thread GitBox
maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read 
time out error
URL: https://github.com/apache/airflow/pull/6643#issuecomment-558903525
 
 
   The `kube_client_request_args` config parameter is "global" to all of the 
client requests. Setting `timeout_seconds` here causes other methods, such as 
`create_namespaced_pod` to fail as this property is not recognized. If the ask 
here is to make this configurable, we'd need do so by adding another parameter 
to the `airflow.cfg` as to not break current functionality.
   
   My approach assumes that `_request_timeout` being set to 60 as the default 
in the config was deliberate so I wanted to be under this value by a 
considerable margin. That said, querying the API should realistically never 
take more than about 1s.
   
   If we didn't want to "hard-code" this value here, the other approaches I see 
are:
   
   1. Create a new config parameter, `watch_timeout_seconds`
   2. `if "_request_timeout" in kube_client_request_args then timeout_seconds = 
kube_client_request_args['_request_timeout'] - 1`
   3. Check if there are any events for the label/worker uuid first, before the 
watch. If so, then watch
   4. Leave it hard-coded
   5. Other ideas?
   
   P.S.: The Worker UUID seems to not persist and is created at runtime. If I 
follow correctly, this get generated each time the scheduler runs. How is this 
tracked across restarts?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services