dirrao commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1459273598
########## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py:
##########
@@ -436,7 +436,7 @@ def sync(self) -> None:
             except ApiException as e:
                 # These codes indicate something is wrong with pod definition; otherwise we assume pod
                 # definition is ok, and that retrying may work
-                if e.status in (400, 422):
+                if e.status in (400, 403, 404, 422):

Review Comment:
   > Do you think that there is a valid use case where we want to keep retrying whenever we get 403 Forbidden?
   > Here is an example use case I thought about:
   > Say you have a quota in your namespace, and while trying to run a task you fail with an exceeded-quota error.
   > The user might want the executor to retry until resources are freed up.
   > Let me know what you think about this use case.

   Yes, we thought about that; it is what happens right now. If all the tasks are scheduled at the same time, they end up retrying for a long period, which degrades scheduler performance. And if Airflow is deployed for multi-tenant use cases, one tenant's retries end up impacting the other tenants. It makes more sense to retry after 5 minutes instead of continuously.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
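The decision being discussed can be sketched in isolation. The following is a minimal, hypothetical illustration (not the actual Airflow executor code): it classifies a Kubernetes `ApiException` status the way the patched condition does, treating 400, 403, 404, and 422 as pod-definition or permission problems where an immediate retry loop cannot help. The function name and the `FATAL_API_STATUSES` set are assumptions introduced for this sketch.

```python
# Hypothetical sketch of the retry classification discussed above.
# Assumption: mirrors the patched tuple (400, 403, 404, 422) from the diff.
FATAL_API_STATUSES = {400, 403, 404, 422}


def should_retry_pod_launch(status: int) -> bool:
    """Return True if relaunching the pod immediately may succeed.

    Statuses in FATAL_API_STATUSES indicate something is wrong with the
    pod definition or the request itself (bad spec, forbidden, not found,
    unprocessable), so retrying in a tight loop would only waste scheduler
    cycles; other statuses (e.g. transient server errors) may clear up.
    """
    return status not in FATAL_API_STATUSES


print(should_retry_pod_launch(500))  # transient server error -> True
print(should_retry_pod_launch(403))  # forbidden (e.g. exceeded quota) -> False
```

This matches the trade-off in the reply: a 403 caused by an exhausted namespace quota could in principle succeed later, but failing fast (and letting the task be rescheduled after a delay) avoids a thundering herd of retries that degrades the scheduler for every tenant.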