nookcreed opened a new issue, #36920: URL: https://github.com/apache/airflow/issues/36920
### Apache Airflow version

Other Airflow 2 version (please specify below)

### If "Other Airflow 2 version" selected, which one?

2.7.3

### What happened?

We are encountering an issue in our Apache Airflow setup where, after a few successful DagRuns, the scheduler stops scheduling new runs. The scheduler logs indicate:

`{scheduler_job_runner.py:1426} INFO - DAG dag-test scheduling was skipped, probably because the DAG record was locked.`

This problem persists despite running a single scheduler pod. Notably, reverting the changes from [PR #31414](https://github.com/apache/airflow/pull/31414) resolves the issue.

A similar issue has been discussed on Stack Overflow: [Airflow Kubernetes Executor Scheduling Skipped Because Dag Record Was Locked](https://stackoverflow.com/questions/77405009/airflow-kubernetes-executor-scheduling-skipped-because-dag-record-was-locked).

### What you think should happen instead?

The scheduler should consistently schedule new DagRuns according to each DAG's configuration, without interruption due to DAG record locks.

### How to reproduce

1. Run Airflow 2.7.3 on Kubernetes (HA is not required).
2. Trigger multiple DagRuns; we have about 10 DAGs that run every minute (a minimal sketch of one such DAG is included at the end of this report).
3. Observe scheduler behavior and logs. After a few minutes of successful runs, the error shows up.

### Operating System

CentOS 7

### Versions of Apache Airflow Providers

apache-airflow-providers-amazon==8.10.0
apache-airflow-providers-apache-hive==6.2.0
apache-airflow-providers-apache-livy==3.6.0
apache-airflow-providers-cncf-kubernetes==7.8.0
apache-airflow-providers-common-sql==1.8.0
apache-airflow-providers-ftp==3.6.0
apache-airflow-providers-google==10.11.0
apache-airflow-providers-http==4.6.0
apache-airflow-providers-imap==3.4.0
apache-airflow-providers-papermill==3.4.0
apache-airflow-providers-postgres==5.7.1
apache-airflow-providers-presto==5.2.1
apache-airflow-providers-salesforce==5.5.0
apache-airflow-providers-snowflake==5.1.0
apache-airflow-providers-sqlite==3.5.0
apache-airflow-providers-trino==5.4.0

### Deployment

Other

### Deployment details

We have wrappers around the official Airflow Helm chart and Docker images.

Environment:
- Airflow version: 2.7.3
- Kubernetes version: 1.24
- Executor: KubernetesExecutor
- Database: PostgreSQL (metadata database)
- Infrastructure: Kubernetes cluster running Airflow in Docker containers

### Anything else?

Actual behavior: the scheduler stops scheduling new runs after a few DagRuns, with log messages about the DAG record being locked.

Workaround: restarting the scheduler pod releases the lock and allows normal scheduling to resume, but this is not viable in production. Reverting the changes in [PR #31414](https://github.com/apache/airflow/pull/31414) also resolves the issue.

Questions/requests for information:

1. Under what scenarios is the lock on a DAG record typically not released? (A sketch of a query for inspecting the lock holder is included at the end of this report.)
2. Are there known issues in Airflow 2.7.3, or specific configurations, that might cause the DAG record to remain locked, thereby preventing new runs from being scheduled?
3. Could the changes made in [PR #31414](https://github.com/apache/airflow/pull/31414) be related to this issue?

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
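Addendum, for reference: a minimal sketch approximating one of the DAGs we run every minute. The `dag_id` matches the log line above; the task body and names are illustrative, not our production code.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="dag-test",
    schedule="* * * * *",        # one run per minute, as in our setup
    start_date=datetime(2024, 1, 1),
    catchup=False,               # only schedule runs going forward
) as dag:
    # A no-op task is enough to reproduce; the failure is in scheduling,
    # not in task execution.
    EmptyOperator(task_id="noop")
```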
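And a triage sketch for question 1: a query against the PostgreSQL metadata database to see which backend session holds (or waits on) locks against the `dag` / `dag_run` tables. An "idle in transaction" holder would be consistent with the "DAG record was locked" skip message. The connection string and helper name are placeholders; this assumes direct Postgres access.

```python
import psycopg2

# List sessions holding or awaiting locks on the scheduler's row-locked tables.
LOCK_QUERY = """
SELECT l.pid, l.mode, l.granted, a.state, a.xact_start, LEFT(a.query, 80)
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE c.relname IN ('dag', 'dag_run')
ORDER BY a.xact_start;
"""

def dump_dag_locks(dsn: str) -> None:
    """Print one line per lock held/requested on the dag and dag_run tables."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(LOCK_QUERY)
        for pid, mode, granted, state, since, query in cur.fetchall():
            print(f"pid={pid} mode={mode} granted={granted} "
                  f"state={state} since={since} query={query!r}")

if __name__ == "__main__":
    # Placeholder DSN -- point this at the Airflow metadata database.
    dump_dag_locks("postgresql://airflow:airflow@localhost:5432/airflow")
```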