benrifkind opened a new issue, #28206:
URL: https://github.com/apache/airflow/issues/28206

   ### Apache Airflow version
   
   2.5.0
   
   ### What happened
   
   Tasks are getting stuck in the queued state
   
   ### What you think should happen instead
   
   Tasks should get scheduled and run
   
   ### How to reproduce
   
   I am using the CeleryExecutor and deploying Airflow on AWS's EKS. 
   
   I have 3 DAGs with 10 tasks. Each task is a simple KubernetesPodOperator 
which just exits when it starts. If I run the Airflow deploy with 
`CELERY__WORKER_CONCURRENCY` set to something high like 32, the celery worker 
will fail and the tasks that were queued up to run on it will enter into a bad 
state. Even once I set the concurrency lower (16), the tasks continue to not be 
scheduled. Note that if I set the worker concurrency to 16 on the initial 
deploy the tasks never get into a bad state and everything works fine.
   
   Clearing the tasks does not even fix the issue. I get this log line in the 
scheduler
   ```
   ERROR - could not queue task TaskInstanceKey(dag_id='batch_1', 
task_id='task_2', run_id='scheduled__2022-01-07T04:05:00+00:00', try_number=1, 
map_index=-1) (still running after 4 attempts)
   ```
   To me it seems like the scheduler thinks the task is still running even 
though it is not. 
   
   Clearing the task **and** restarting the scheduler seems to do the trick. 
   
   Happy to give any more information that would be needed. Tasks getting stuck 
in queued also sometimes happens in my production environment which is the 
impetus for this investigation. I'm not sure if it is the same problem but I 
would very really like to figure out if this is a bug or just a 
misconfiguration on my end. Thanks for your help.
   
   
   
   
   ### Operating System
   
   Debian GNU/Linux
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-celery==3.1.0
   apache-airflow-providers-cncf-kubernetes==5.0.0
   
   ### Deployment
   
   Other 3rd-party Helm chart
   
   ### Deployment details
   
   I am using this helm chart to deploy - 
https://github.com/airflow-helm/charts/tree/main/charts/airflow (v8.6.1)
   
   I know that chart is not supported by Apache Airflow but don't think it's 
related to the chart. Based on the logs and the solution it seems like an issue 
with Airflow/Celery. 
   
   
   ### Anything else
   
   This problem can be replicated each time following the steps I detailed 
above. Not sure if the way the celery worker fails matters.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to