hussein-awala commented on code in PR #36882:
URL: https://github.com/apache/airflow/pull/36882#discussion_r1463107125


##########
airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py:
##########
@@ -434,9 +434,9 @@ def sync(self) -> None:
                     )
                     self.fail(task[0], e)
                 except ApiException as e:
-                    # These codes indicate something is wrong with pod definition; otherwise we assume pod
-                    # definition is ok, and that retrying may work
-                    if e.status in (400, 422):
+                    # In case of the below error codes, fail the task and honor the task retires.
+                    # Otherwise, go for continuous/infinite retries.
+                    if e.status in (400, 403, 404, 422):

Review Comment:
   @jedcunningham Currently, when creating a worker pod fails because the 
namespace quota is exceeded, which is a temporary failure, the executor keeps 
retrying until enough resources are freed.
   
   This PR changes that: the task now fails as soon as pod creation fails due 
to an exceeded quota. That could be considered a breaking (or at least 
significant) change for users who have a quota set up and a small number of 
task attempts. In addition, it offers no way to opt back into retrying pod 
creation without failing the task, so the only workaround is to increase the 
number of attempts. Those attempts are consumed regardless of the type of 
failure, and each one adds latency (retry_delay, retry_exponential_backoff, 
...) to the task execution.
   
   > Even if we did do another retry counter for this, it still should 
eventually fail.
   
   The quota caps the total resources used by all the pods in a namespace, so 
when other pods terminate, resources are freed and creating a new pod becomes 
possible again.
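
   To make the distinction concrete, here is a rough sketch of the kind of 
opt-in behaviour I have in mind. It is not the executor's code: `FakeApiError` 
is a hypothetical stand-in for the Kubernetes client's `ApiException`, 
`should_fail_task` and its `retry_on_quota` flag are made-up names, and 
matching on the "exceeded quota" text in the response body is an assumption 
about how the API server reports quota rejections.

```python
# Status codes that the diff above treats as "fail the task" (taken from the PR hunk).
FAIL_TASK_STATUSES = frozenset({400, 403, 404, 422})


class FakeApiError(Exception):
    """Hypothetical stand-in for an API error carrying an HTTP status and a body."""

    def __init__(self, status, body=""):
        super().__init__(f"({status}) {body}")
        self.status = status
        self.body = body


def should_fail_task(err, retry_on_quota=True):
    """Return True to fail the task, False to keep retrying pod creation.

    Assumption: a quota rejection arrives as a 403 whose body mentions
    "exceeded quota". With retry_on_quota=True it is treated as temporary.
    """
    if retry_on_quota and err.status == 403 and "exceeded quota" in err.body:
        return False
    return err.status in FAIL_TASK_STATUSES


quota_err = FakeApiError(403, 'pods "worker" is forbidden: exceeded quota: compute')
print(should_fail_task(quota_err))                        # keep retrying
print(should_fail_task(quota_err, retry_on_quota=False))  # behaviour in this PR
```

   With something like the flag above, users with a quota could keep the 
current retry behaviour while everyone else gets the fail-fast path.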



