jedcunningham commented on code in PR #36882:
URL: https://github.com/apache/airflow/pull/36882#discussion_r1466816732


##########
airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py:
##########
@@ -434,9 +434,9 @@ def sync(self) -> None:
                     )
                     self.fail(task[0], e)
                 except ApiException as e:
-                    # These codes indicate something is wrong with pod definition; otherwise we assume pod
-                    # definition is ok, and that retrying may work
-                    if e.status in (400, 422):
+                    # In case of the below error codes, fail the task and honor the task retires.
+                    # Otherwise, go for continuous/infinite retries.
+                    if e.status in (400, 403, 404, 422):

Review Comment:
   > when creating a worker pod fails due to quota exceeding, which is a 
temporary failure, the executor retries again and again until some resources 
are free.
   
   I view that behavior as a bug.
   
   > The quota is the sum of resources used by all the pods in a namespace, so 
when other pods terminate, some resources will be free, and creating a new pod 
will be possible.
   
   Not necessarily. The pod could be too large to ever fit within the quota, 
in which case it stays stuck in the retry loop forever.
   
   I still think it's better to use the existing retry mechanism for this. I 
think this is the "right" behavior even if it's significantly different than 
what it is right now.
   
   That said, we do have precedent for the config approach with [Celery's 
task_publish_max_retries](https://github.com/apache/airflow/blob/6629e2bf7b80d07f2cf61895873521400cdb0d5b/airflow/providers/celery/provider.yaml#L287).
 If we go this route though, I really hope we can target quota failures 
specifically and have a default of 0.
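
A minimal sketch of what the targeted variant might look like, assuming quota rejections surface as a 403 whose body mentions "exceeded quota". The names `FakeApiException`, `is_quota_failure`, `should_requeue` and `max_quota_retries` are hypothetical and only illustrate the idea; none of them exist in the executor or in this PR.

```python
class FakeApiException(Exception):
    """Hypothetical stand-in for the Kubernetes client's ApiException."""

    def __init__(self, status: int, body: str = ""):
        super().__init__(f"({status}) {body}")
        self.status = status
        self.body = body


def is_quota_failure(exc) -> bool:
    # Assumption: a quota rejection is a 403 with "exceeded quota" in the body.
    return exc.status == 403 and "exceeded quota" in (exc.body or "").lower()


def should_requeue(exc, attempts_so_far: int, max_quota_retries: int = 0) -> bool:
    """Return True to put the task back on the executor queue.

    False means: fail the task and let its normal task retries apply.
    """
    if is_quota_failure(exc):
        # Bounded retries for quota failures only; the default of 0 disables them.
        return attempts_so_far < max_quota_retries
    if exc.status in (400, 403, 404, 422):
        # Status codes the diff above treats as non-retryable in the executor.
        return False
    # Anything else keeps the executor-level retry behavior.
    return True
```

With the default of 0, a quota failure fails the task right away and the task-level retries take over; raising the (hypothetical) limit opts in to a bounded number of executor-level retries for quota failures only.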
   


