timgriffiths opened a new issue #19038: URL: https://github.com/apache/airflow/issues/19038
### Apache Airflow version

2.2.0 (latest released)

### Operating System

Debian GNU/Linux 10 (buster)

### Versions of Apache Airflow Providers

apache-airflow-providers-amazon==2.3.0
apache-airflow-providers-celery==2.1.0
apache-airflow-providers-cncf-kubernetes==2.0.3
apache-airflow-providers-docker==2.2.0
apache-airflow-providers-elasticsearch==2.0.3
apache-airflow-providers-ftp==2.0.1
apache-airflow-providers-google==6.0.0
apache-airflow-providers-grpc==2.0.1
apache-airflow-providers-hashicorp==2.1.1
apache-airflow-providers-http==2.0.1
apache-airflow-providers-imap==2.0.1
apache-airflow-providers-microsoft-azure==3.2.0
apache-airflow-providers-mysql==2.1.1
apache-airflow-providers-odbc==2.0.1
apache-airflow-providers-postgres==2.3.0
apache-airflow-providers-redis==2.0.1
apache-airflow-providers-sendgrid==2.0.1
apache-airflow-providers-sftp==2.1.1
apache-airflow-providers-slack==4.1.0
apache-airflow-providers-sqlite==2.0.1
apache-airflow-providers-ssh==2.2.0

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

Helm deployed using the official Apache Airflow Helm chart

### What happened

We recently upgraded to 2.2.0 and have now noticed some jobs being killed by the scheduler not long after they start. We use KubernetesPodOperator to launch all our tasks. What I can see happening is:

- Scheduler 1 -> queue job ... it then launches the intermediate pod
- Scheduler 2 -> oh, a queued job that I haven't seen before ... let me re-schedule that for you
- Scheduler 1 -> I can't queue that again ... something's gone wrong, let me clean up what I was doing
- Scheduler 1 -> kill pod, which kills the successfully running pod
- Scheduler 1 -> let's queue that again ... it then launches the intermediate pod
- Scheduler 2 -> oh, a queued job that I haven't seen before ... let me re-schedule that for you

and repeat.

Tracking it back, it seems to have been introduced in https://github.com/apache/airflow/pull/18152. As this function is now run on a schedule, it looks like you can get into a situation where a job has been launched correctly but the scheduler that kicked it off hasn't had time to update the state from queued to scheduled.

### What you expected to happen

Tasks that have been scheduled shouldn't be killed.

### How to reproduce

- Start up at least 2 schedulers
- Launch a set of tasks using the KubernetesPodOperator (or something else that will cause a delay in a job moving from queued to scheduled)

### Anything else

The workaround at the moment seems to be to use just 1 scheduler, but it would be great if this could be patched.

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org