8silvergun opened a new issue, #66763: URL: https://github.com/apache/airflow/issues/66763
### Under which category would you file this issue? Airflow Core ### Apache Airflow version 3.2.1 ### What happened and how to reproduce it? #### What happened On a production Airflow 3.2.1 + MySQL deployment with ~3.2M rows in `task_instance`, the scheduler stopped queueing tasks. `TaskInstance`s remained in `scheduled` and did not transition to `queued`. Worker pods that had already completed their work could not report their final state back through the execution API, because metadata DB lock waits piled up behind the scheduler's critical-section query. Worker-side symptom (Airflow 3 execution API client): ``` ReadTimeout: timed out airflow.sdk.api.client.Client.request PATCH /execution/task-instances/<...>/state EOFError: Request socket closed before length BrokenPipeError: [Errno 32] Broken pipe ``` Scheduler-side symptom: ``` sqlalchemy.exc.OperationalError: Query execution was interrupted SELECT task_instance ... LIMIT 16 FOR UPDATE OF task_instance SKIP LOCKED ``` #### Root cause The slow query is generated by `airflow.jobs.scheduler_job_runner.SchedulerJobRunner._executable_task_instances_to_queued`. PR #54103 (shipped in 3.2.0) reshaped this method into a two-step pattern: a `ranked_query` that picks candidate `SCHEDULED` task instances, followed by an **outer query** that rejoins the candidates back to `task_instance` by the composite identity `(dag_id, task_id, run_id, map_index)`. Both the inner ranked query **and** the outer rejoin currently apply the same MySQL index hint: ```python .with_hint(TI, "USE INDEX (ti_state)", dialect_name="mysql") ``` The hint is appropriate on the inner query — that query filters by `TI.state == SCHEDULED` and the `ti_state(state)` index is the right one. The hint is **harmful on the outer query**. The outer query is not a state lookup; it is an exact lookup keyed by `(dag_id, task_id, run_id, map_index)`. For that pattern, MySQL should use the `task_instance_composite_key(dag_id, task_id, run_id, map_index)` index. But the hint restricts the optimizer to `ti_state`, so the composite key is unreachable and the optimizer falls back to a full scan of `task_instance`. #### Source check (current `main`, file `airflow-core/src/airflow/jobs/scheduler_job_runner.py`) The function `_executable_task_instances_to_queued` contains **two** occurrences of `.with_hint(TI, "USE INDEX (ti_state)", dialect_name="mysql")`: 1. On the inner candidate query that filters `DR.state == RUNNING` and `TI.state == SCHEDULED` — appropriate. 2. On the outer query: `select(TI).with_hint(...).select_from(ranked_query).join(TI, (TI.dag_id == ranked_query.c.dag_id) & ...)` — **misplaced**. #### Evidence (production EXPLAIN on Aurora MySQL 8.0) Metadata table state at the time of incident (2026-05-11 KST): ``` task_instance rows: ~3,221,348 SCHEDULED task_instance: ~187 RUNNING/QUEUED task_instance: 0 RUNNING dag_run: ~141 ``` Relevant existing indexes on `task_instance`: ``` task_instance_composite_key(dag_id, task_id, run_id, map_index) ti_state(state) ti_dag_run(dag_id, run_id) ti_state_lkp(dag_id, task_id, run_id, state) ``` EXPLAIN of the unmodified scheduler query (outer hint present): ``` PRIMARY task_instance ALL rows=3,221,348 Using where; Using join buffer ``` EXPLAIN after removing only the **outer** `USE INDEX (ti_state)` hint: ``` PRIMARY task_instance eq_ref key = task_instance_composite_key rows = 1 Using index ``` Inner ranked query in both cases (kept as-is): ``` DERIVED task_instance ref key = ti_state rows = 187 ``` So the same scheduler loop reads either ~3.2M rows or 1 row per candidate, depending solely on the outer hint. #### Minimal reproduction 1. Stand up Airflow 3.2.0 or 3.2.1 against MySQL 8.0. 2. Populate `task_instance` so it has on the order of millions of total rows but only a small handful in `SCHEDULED` (this matches a long-lived production deployment). 3. Run the scheduler under load. 4. `EXPLAIN` the query emitted from `_executable_task_instances_to_queued`; the outer join to `task_instance` will be `type=ALL`. 5. Remove only the second `.with_hint(TI, "USE INDEX (ti_state)", dialect_name="mysql")` (the one immediately before `.select_from(ranked_query)`). Re-run and `EXPLAIN` — the outer join becomes `eq_ref` on `task_instance_composite_key`. ### What you think should happen instead? The outer query (the rejoin to `task_instance` keyed by the composite identity) should use `task_instance_composite_key`, not `ti_state`. The fix is to remove the second `.with_hint(TI, "USE INDEX (ti_state)", dialect_name="mysql")` only. The inner hint should stay. After the fix, on the same workload: ``` outer task_instance: task_instance_composite_key, eq_ref, rows=1 inner scheduled-TI scan: ti_state, ref/range over scheduled states ``` ### Operating System Linux (kernel 6.x), Amazon EKS nodes on Bottlerocket / AL2023. ### Deployment Other 3rd-party Helm chart ### Apache Airflow Provider(s) _No response_ ### Versions of Apache Airflow Providers _No response_ ### Official Helm Chart version 1.20.0 ### Kubernetes Version 1.33 ### Helm Chart configuration _No response_ ### Docker Image customizations _No response_ ### Anything else? - The original `USE INDEX (ti_state)` hint was added in apache/airflow#25627 (≈2022, Airflow 2.2.5) to fix a *separate* MySQL optimizer issue where the optimizer ignored `ti_state` on a single-step query. At the time, the hint placement was correct. - PR #54103 (merged 2026-01-28, ships in 3.2.0) refactored `_executable_task_instances_to_queued` into the ranked + outer-rejoin shape and replicated the hint onto the outer rejoin. **This is the regression introduction point.** - The bug is dialect-scoped (`dialect_name="mysql"`), so PostgreSQL deployments are unaffected. This may be why the regression has gone unreported by managed-service users (MWAA/Composer typically default to Postgres). - The bug surfaces only when `task_instance` is large enough that a full scan is materially slower than an `eq_ref` lookup. Small test environments will not reproduce visible symptoms even though the EXPLAIN is wrong. - Internal mitigation already deployed: a sitecustomize-based hotpatch that removes only the second hint at process start. Commit `9482f6f8 fix: hotpatch airflow scheduler ti state hint` in our internal repo. No other behavior changed. - Possibly related but **not the same** issue: apache/airflow#57210 (deferred-task priority in the scheduler selection query) — references the same code path but a different symptom. ### Are you willing to submit PR? - [x] Yes I am willing to submit a PR! ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
