dirrao opened a new pull request, #36882: URL: https://github.com/apache/airflow/pull/36882
What happened

When the Kubernetes executor cannot launch the worker pod because of a permissions issue or an invalid namespace, it keeps retrying the launch even though the error persists. As a result, the task stays in the queued state for a very long time, possibly forever.

What you think should happen instead

The executor should not retry the worker pod launch continuously in case of persistent/transient errors. Instead, it should mark the task as failed and let the scheduler honor the task's retries with the retry delay (5 minutes by default), eventually failing the task if the error persists.

closes: #36403
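The proposed behavior can be illustrated with a minimal, self-contained sketch. It is not the code from this pull request and does not use Airflow or the Kubernetes client: `PodLaunchError`, `SketchExecutor`, and `forbidden_launch` are hypothetical names invented for illustration. The sketch only shows the idea that a failed pod launch is reported as a failed task rather than being put back on the queue.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class PodLaunchError(Exception):
    """Hypothetical stand-in for an API error raised while creating a worker pod."""

    def __init__(self, status: int, reason: str) -> None:
        super().__init__(f"{status}: {reason}")
        self.status = status
        self.reason = reason


@dataclass
class SketchExecutor:
    """Toy executor (an assumption, not Airflow's class) that never re-queues a failed launch."""

    launch_pod: Callable[[str], None]
    task_queue: List[str] = field(default_factory=list)
    results: Dict[str, str] = field(default_factory=dict)

    def sync(self) -> None:
        # Drain the queue once; nothing is appended back to it.
        pending, self.task_queue = self.task_queue, []
        for task_key in pending:
            try:
                self.launch_pod(task_key)
                self.results[task_key] = "running"
            except PodLaunchError:
                # Report the failure instead of retrying the launch here.
                # Retries and the retry delay are left to the scheduler.
                self.results[task_key] = "failed"


def forbidden_launch(task_key: str) -> None:
    # Simulates a persistent permissions error on every launch attempt.
    raise PodLaunchError(403, "Forbidden")


executor = SketchExecutor(launch_pod=forbidden_launch, task_queue=["dag.task.1"])
executor.sync()
print(executor.results, executor.task_queue)
```

In this sketch the task leaves the queue after a single failed launch attempt, so it cannot remain queued indefinitely. Whether and when it runs again would be decided by the scheduler's retry settings, which are outside the scope of the example.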