[GitHub] [airflow] timgriffiths opened a new issue #19038: I believe merge 18152 has introduced a race condition when running multiple schedulers

GitBox Sun, 17 Oct 2021 18:31:52 -0700


timgriffiths opened a new issue #19038:
URL: https://github.com/apache/airflow/issues/19038



   ### Apache Airflow version
   
   2.2.0 (latest released)
   
   ### Operating System
   
   Debian GNU/Linux 10 (buster)
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==2.3.0
   apache-airflow-providers-celery==2.1.0
   apache-airflow-providers-cncf-kubernetes==2.0.3
   apache-airflow-providers-docker==2.2.0
   apache-airflow-providers-elasticsearch==2.0.3  
   apache-airflow-providers-ftp==2.0.1
   apache-airflow-providers-google==6.0.0
   apache-airflow-providers-grpc==2.0.1
   apache-airflow-providers-hashicorp==2.1.1      
   apache-airflow-providers-http==2.0.1
   apache-airflow-providers-imap==2.0.1
   apache-airflow-providers-microsoft-azure==3.2.0
   apache-airflow-providers-mysql==2.1.1
   apache-airflow-providers-odbc==2.0.1
   apache-airflow-providers-postgres==2.3.0       
   apache-airflow-providers-redis==2.0.1
   apache-airflow-providers-sendgrid==2.0.1       
   apache-airflow-providers-sftp==2.1.1
   apache-airflow-providers-slack==4.1.0
   apache-airflow-providers-sqlite==2.0.1
   apache-airflow-providers-ssh==2.2.0
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Helm deployed using the official Apache Airflow Helm chart
   
   ### What happened
   
   We recently upgraded to 2.2.0, but we have since noticed that some of our jobs are being killed by the scheduler not long after they start.
   
   We use the KubernetesPodOperator to launch all of our tasks.
   
   What I can see happening is:
   - Scheduler 1 -> queues the job ... it then launches the intermediate pod
   - Scheduler 2 -> oh, a queued job that I haven't seen before ... let me re-schedule that for you
   - Scheduler 1 -> I can't queue that again ... something's gone wrong, let me clean up what I was doing
   - Scheduler 1 -> kills its pod, which was in fact running successfully
   - Scheduler 1 -> let's queue that again ... it then launches the intermediate pod
   - Scheduler 2 -> oh, a queued job that I haven't seen before ... let me re-schedule that for you
   - ... and the cycle repeats
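
   To make the loop above concrete, here is a toy, single-process Python model of it. It is a sketch under simplifying assumptions, not Airflow code: the class and method names (`SharedDB`, `ToyScheduler`, `reset_unknown_queued`, `reconcile`) are hypothetical, the metadata database is a plain dict, and "pods" are just entries in a set. The only behaviour modelled is the one described: scheduler 2 resets any queued task it did not launch itself, and scheduler 1 reacts by killing its pod and queueing the task again.

```python
from dataclasses import dataclass, field


@dataclass
class SharedDB:
    """Hypothetical stand-in for the metadata database shared by all schedulers."""
    states: dict = field(default_factory=dict)   # task id -> state string
    events: list = field(default_factory=list)   # global, ordered event log


@dataclass
class ToyScheduler:
    """Hypothetical scheduler model; not Airflow's SchedulerJob."""
    name: str
    db: SharedDB
    launched: set = field(default_factory=set)   # tasks this scheduler has a pod for

    def queue_and_launch(self, task_id: str) -> None:
        self.db.states[task_id] = "queued"
        self.launched.add(task_id)
        self.db.events.append(f"{self.name}: queued {task_id} and launched pod")

    def reset_unknown_queued(self) -> None:
        # Problematic check: any queued task this scheduler did not launch
        # itself is treated as stuck and sent back to "scheduled".
        for task_id, state in self.db.states.items():
            if state == "queued" and task_id not in self.launched:
                self.db.states[task_id] = "scheduled"
                self.db.events.append(f"{self.name}: reset {task_id}")

    def reconcile(self, task_id: str) -> None:
        # The launching scheduler sees its task was reset, so it cleans up
        # by deleting its (healthy) pod and queues the task again.
        if task_id in self.launched and self.db.states[task_id] != "queued":
            self.launched.discard(task_id)
            self.db.events.append(f"{self.name}: killed pod for {task_id}")
            self.queue_and_launch(task_id)


def simulate(rounds: int) -> SharedDB:
    db = SharedDB()
    s1 = ToyScheduler("scheduler-1", db)
    s2 = ToyScheduler("scheduler-2", db)
    s1.queue_and_launch("task_a")
    for _ in range(rounds):
        # Scheduler 2 runs its periodic check before task_a leaves "queued".
        s2.reset_unknown_queued()
        s1.reconcile("task_a")
    return db


if __name__ == "__main__":
    for event in simulate(2).events:
        print(event)
```

   Each round adds one reset, one pod kill and one re-queue, so in this model the task never gets past queued for as long as the second scheduler keeps winning the race.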
   
   Tracking it back, it seems to have been introduced in https://github.com/apache/airflow/pull/18152. Because this function now runs on a schedule, it looks like you can get into a situation where a job has been launched correctly but the scheduler that kicked it off hasn't yet had time to move it out of the queued state, so another scheduler treats it as stuck and re-schedules it.
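
   One possible shape for a guard, sketched in Python purely as an illustration: only let a scheduler reset queued tasks that it queued itself, and only after they have been queued for longer than a grace period. This assumes each queued task records which scheduler queued it and when; the `QueuedTask` type, the `tasks_safe_to_reset` function and the 5-minute default are hypothetical and are not taken from Airflow or from PR 18152.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class QueuedTask:
    """Hypothetical record of a task sitting in the queued state."""
    task_id: str
    queued_by: str          # assumed: id of the scheduler that queued the task
    queued_at: datetime     # assumed: when the task entered "queued"


def tasks_safe_to_reset(
    queued: Iterable[QueuedTask],
    my_scheduler_id: str,
    now: datetime,
    grace: timedelta = timedelta(minutes=5),  # hypothetical default
) -> List[str]:
    """Return ids of queued tasks this scheduler may reset.

    A task qualifies only if this scheduler queued it itself and it has been
    queued for longer than the grace period; tasks queued by another
    scheduler are left for that scheduler to handle.
    """
    return [
        t.task_id
        for t in queued
        if t.queued_by == my_scheduler_id and now - t.queued_at > grace
    ]
```

   With this rule, scheduler 2 in the sequence above would skip scheduler 1's task instead of re-scheduling it. Tasks queued by a scheduler that has since died would still need a separate adoption path, which this sketch does not cover.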
   
   
   ### What you expected to happen
   
   Tasks that have been scheduled shouldn't be killed
   
   ### How to reproduce
   
   Start up at least 2 schedulers.
   
   Launch a set of tasks using the KubernetesPodOperator (or anything else that delays a task moving out of the queued state).
   
   
   
   ### Anything else
   
   The workaround at the moment seems to be to run just 1 scheduler, but it would be great if this could be patched.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


