dirrao opened a new pull request, #36882:
URL: https://github.com/apache/airflow/pull/36882

   What happened
   
   When the Kubernetes executor cannot launch a worker pod because of a 
permissions issue or an invalid namespace, it keeps retrying the launch and 
the error persists. As a result, the task stays in the queued state for a very 
long time, possibly forever.
   
   What you think should happen instead
   
   The executor shouldn't retry the worker pod launch indefinitely when it 
hits persistent or transient errors. Instead, it should mark the task as 
failed and let the scheduler honor the task's retries with the retry delay 
(5 minutes by default), eventually failing the task if the error persists.
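
   The proposed flow can be illustrated with a minimal, self-contained sketch. 
It is not the Airflow implementation: `PodLaunchError`, `Task`, `executor_run`, 
`scheduler_handle_failure`, and `run_until_done` are hypothetical names made 
up for illustration, and `retries=2` is an arbitrary example value. Only the 
5 minute retry delay comes from the description above.

```python
from dataclasses import dataclass
from datetime import timedelta


class PodLaunchError(Exception):
    """Hypothetical stand-in for a Kubernetes API error (bad permissions, bad namespace)."""


@dataclass
class Task:
    retries: int = 2                               # illustrative value, not a default
    retry_delay: timedelta = timedelta(minutes=5)  # delay mentioned in the description
    state: str = "queued"
    try_number: int = 0


def executor_run(task, launch_pod):
    """Attempt the pod launch once; on error, report failure instead of re-queuing."""
    task.try_number += 1
    try:
        launch_pod()
        task.state = "running"
    except PodLaunchError:
        task.state = "failed"


def scheduler_handle_failure(task):
    """Return the delay before the next try, or None once retries are exhausted."""
    if task.state == "failed" and task.try_number <= task.retries:
        task.state = "up_for_retry"
        return task.retry_delay
    return None


def run_until_done(task, launch_pod):
    """Drive the executor/scheduler loop and return the total simulated wait time."""
    waited = timedelta()
    while True:
        executor_run(task, launch_pod)
        if task.state == "running":
            return waited
        delay = scheduler_handle_failure(task)
        if delay is None:
            return waited  # task stays in the "failed" state
        waited += delay
```

   With a launch function that always raises `PodLaunchError`, the task in 
this sketch is tried `retries + 1` times and then ends in the `failed` state 
instead of staying queued.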
   
   
   closes: #36403 


