[GitHub] [airflow] maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error
maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error URL: https://github.com/apache/airflow/pull/6643#issuecomment-595579525 @sbrandtb I think you should take another look at the PR and read the comments in this thread again. My PR doesn't change the `_request_timeout` in any way. > I still disagree with you setting the _request_timeout to [60, 60] by default Where am **I** setting this? > You just assume you are smarter than everyone else and that no one needs to set another timeout and just hard code it to some reasonable value (like, 10 seconds - why not?) Nope... It's been [configurable](https://github.com/apache/airflow/blob/1.10.9/airflow/config_templates/default_airflow.cfg#L965) for a number of releases now and **I** didn't set this default value. > or you use whatever is set in Airflow's settings Yep. > or create a new setting for this Again, read the comments please. That is not how the maintainers wanted to handle it (see [here](https://github.com/apache/airflow/pull/6643#issuecomment-564987153)) > ...because the first is the default in Airflow. Please, do not double-default. Where's the double default? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error
maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error URL: https://github.com/apache/airflow/pull/6643#issuecomment-595017078 @sbrandtb You shouldn't catch (and subsequently ignore) a connection/response timeout error. `_request_timeout` is setting the connection timeout and response timeout for the underlying urllib3 library. This timeout is being hit because Watcher sends no data back when it doesn't have any matches for it's query (it's a stream). This will hide the real error when there's a case of the API server not responding. Setting `timeout_seconds` is telling the Watcher to just stop watching and allow processing to continue. This PR is the proper fix as it's gracefully handling the case where there are no active pods (airflow tasks) running with this particular uuid and the API server has no data about them. @ashb @dimberman I would suggest taking another look at PR #7616 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error
maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error URL: https://github.com/apache/airflow/pull/6643#issuecomment-581732169 Sooo the test framework has a high barrier to entry and there doesn't seem to be any existing tests for the `KubernetesJobWatcher` class to piggy-back on. I attempted to get the Test platform going, reading what documentation there is, but I just don't have the time to invest in it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error
maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error URL: https://github.com/apache/airflow/pull/6643#issuecomment-575436903 @mbelang No. Between the holidays & work I have not had time. Hoping to have some time this weekend. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error
maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error URL: https://github.com/apache/airflow/pull/6643#issuecomment-566544011 > If possible it would be nice to add some unit tests too. Will try to get to this later this week. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error
maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error URL: https://github.com/apache/airflow/pull/6643#issuecomment-565297542 I can implement it like this but do we really want the timeout to always be `1` for any integer >0? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error
maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error URL: https://github.com/apache/airflow/pull/6643#issuecomment-561460460 @ashb is this `kube_client_request_args.get('_request_timeout', [60,60])[0] - 1` the way you want to handle it? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error
maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error URL: https://github.com/apache/airflow/pull/6643#issuecomment-558903525 The `kube_client_request_args` config parameter is "global" to all of the client requests. Setting `timeout_seconds` here causes other methods, such as `create_namespaced_pod` to fail as this property is not recognized. If the ask here is to make this configurable, we'd need do so by adding another parameter to the `airflow.cfg` as to not break current functionality. My approach assumes that `_request_timeout` being set to 60 as the default in the config was deliberate so I wanted to be under this value by a considerable margin. That said, querying the API should realistically never take more than about 1s. If we didn't want to "hard-code" this value here, the other approaches I see are: 1. Create a new config parameter, `watch_timeout_seconds` 2. `if "_request_timeout" in kube_client_request_args then timeout_seconds = kube_client_request_args['_request_timeout'] - 1` 3. Check if there are any events for the label/worker uuid first, before the watch. If so, then watch 4. Leave it hard-coded 5. Other ideas? P.S.: The Worker UUID seems to not persist and is created at runtime. If I follow correctly, this get generated each time the scheduler runs. How is this tracked across restarts? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services