dimon222 opened a new issue, #56959:
URL: https://github.com/apache/airflow/issues/56959
### Apache Airflow version

2.11.0

### If "Other Airflow 2/3 version" selected, which one?

_No response_

### What happened?

I recently switched to a standalone dag processor pod as part of the solution for #56294, but noticed new behavior: at a random time of day the dag processor hangs indefinitely with no subprocess activity (it hangs, and its set of 3 dag subprocesses is left making no progress). I tried forcefully killing a few of these dagfileprocessor threads to see if it recovers, but with no success. The logs also stop producing anything. The metrics graph in k8s still shows a bit of CPU activity, but nowhere near the utilization seen when it works correctly. There are no exceptions or anything meaningful in the logs: everything works correctly, and then at one moment it stops producing logs and processing dags.

### What you think should happen instead?

DagProcessor should recycle subprocesses that have run beyond the configured import timeout, or be able to self-recover, or at least crash gracefully instead of freezing.

### How to reproduce

Unable to determine the root cause, so I cannot reproduce it consistently.

### Operating System

UBI9 (RHEL9)

### Versions of Apache Airflow Providers

Latest as of constraints-3.12.txt for Airflow 2.11.0

### Deployment

Other

### Deployment details

OpenShift (k8s), pip virtualenv install of Airflow on Python 3.12 in a UBI9 image. Celery executor. The stack is split across pods:

1. webserver pod
2. celery scheduler pod
3. celery worker pod
4. dag processor pod
5. postgres pod
6. PVC (network mount) used to share the dag catalog between Airflow pods

### Anything else?

Seems to happen randomly, within 1-5 days after launch.

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

-- 
This is an automated message from the Apache Git Service.
