dirrao commented on code in PR #36882:
URL: https://github.com/apache/airflow/pull/36882#discussion_r1459273598


##########
airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py:
##########
@@ -436,7 +436,7 @@ def sync(self) -> None:
                 except ApiException as e:
                     # These codes indicate something is wrong with pod definition; otherwise we assume pod
                     # definition is ok, and that retrying may work
-                    if e.status in (400, 422):
+                    if e.status in (400, 403, 404, 422):

Review Comment:
   > Do you think there is a valid use case where we want to keep retrying whenever we get a 403 Forbidden?
   > Here is an example use case I thought about:
   > Let's say you have a quota in your namespace, and while trying to run the task you fail with an exceeded-quota error.
   > Maybe the user will want the executor to retry until resources are freed up.
   > Let me know what you think about this use case.
   
   Yes, we thought about that. That is what is happening right now. Imagine all the tasks are scheduled at the same time: they will end up retrying for a long period, leading to degraded scheduler performance. And if Airflow is deployed for multi-tenant use cases, one tenant's retries will impact the other tenants. It makes sense to retry after 5 minutes instead of continuously.
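   To illustrate the behavior change being discussed, here is a minimal sketch (not the actual executor code) of the fail-fast decision. `ApiException` below is a hypothetical stand-in for `kubernetes.client.rest.ApiException`, reduced to the `status` attribute the check uses:
   
   ```python
   # Sketch: decide whether a pod-launch ApiException should fail the task
   # immediately (handing retry/backoff to Airflow's task-level retries)
   # instead of the executor hot-looping on pod creation.
   
   class ApiException(Exception):
       """Minimal stand-in for kubernetes.client.rest.ApiException."""
   
       def __init__(self, status: int, reason: str = ""):
           super().__init__(f"{status} {reason}")
           self.status = status
   
   # Before this PR the fail-fast set was (400, 422); the change adds
   # 403 (e.g. quota exceeded) and 404 so those stop retrying in a loop.
   FAIL_FAST_STATUSES = (400, 403, 404, 422)
   
   def should_fail_task(exc: ApiException) -> bool:
       """True if the error should fail the task now rather than be retried."""
       return exc.status in FAIL_FAST_STATUSES
   ```
   
   With this, a 403 from an exhausted namespace quota fails the task, and Airflow's own task retry delay (e.g. 5 minutes) governs when it is attempted again.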



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
