steveahnahn opened a new pull request, #68595:
URL: https://github.com/apache/airflow/pull/68595

   Bounds the scheduler's cleanup of orphaned `asset_state_store` rows so it 
can no longer issue a single unbounded delete inside the scheduler loop.
   
   ### Why
   
   `SchedulerJobRunner._cleanup_orphaned_asset_state_store()` issued one bulk 
`DELETE` covering every `asset_state_store` row whose asset is no longer 
active. It runs from `_update_asset_orphanage`, a 
`timers.call_regular_interval(parsing_cleanup_interval, ...)` scheduler 
callback. With a large orphan backlog (bulk asset/Dag removal, a mass 
asset-identity change, or the first cleanup after a backlog accumulates), a 
single tick therefore did an unbounded amount of transaction and row-lock work, 
held those locks for the whole transaction, and stalled the scheduler main 
loop. This is the pattern the contributing guidelines call out: bulk 
`DELETE`/`UPDATE` in the scheduler loop must be bounded.
   
   ### What changed
   
   - Select up to `ORPHANED_ASSET_STATE_STORE_CLEANUP_BATCH_SIZE` (500, 
mirroring the neighbouring `MAX_PARTITION_DAG_RUNS_PER_LOOP`) distinct orphaned 
`asset_id`s and delete those assets' rows via a single-column `asset_id IN 
(...)`. Remaining orphaned assets drain on subsequent orphanage ticks.
   - The method keeps its managed session and does not commit internally, so 
it processes one bounded batch per tick instead of looping with intermediate 
commits.
   - The asset ids are materialised into a literal `IN` list rather than a 
`LIMIT` subquery, because MySQL rejects `LIMIT` inside an `IN` subquery. 
`asset_id` is the leading column of the `asset_state_store` primary key, so the 
filter is index-backed.
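
The select-then-delete shape described above can be sketched roughly as 
follows. This is an illustration only, not the code in this PR: it uses the 
standard-library `sqlite3` module instead of Airflow's SQLAlchemy session, and 
the `active_asset` table, the `state_key`/`value` columns, `BATCH_SIZE`, 
`cleanup_orphaned_batch`, and `make_demo_db` are all hypothetical names 
invented for the example.

```python
import sqlite3

# Hypothetical stand-in for ORPHANED_ASSET_STATE_STORE_CLEANUP_BATCH_SIZE.
BATCH_SIZE = 500


def cleanup_orphaned_batch(conn, batch_size=BATCH_SIZE):
    """Delete state rows for at most `batch_size` orphaned assets.

    Returns the asset ids whose rows were deleted. Does not commit:
    the caller owns the transaction, as the method in the PR does.
    """
    # Step 1: materialise a bounded list of distinct orphaned asset ids.
    # (Simplified schema: an asset is "active" if it is in active_asset.)
    rows = conn.execute(
        """
        SELECT DISTINCT s.asset_id
        FROM asset_state_store AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM active_asset AS a WHERE a.asset_id = s.asset_id
        )
        ORDER BY s.asset_id
        LIMIT ?
        """,
        (batch_size,),
    ).fetchall()
    asset_ids = [row[0] for row in rows]
    if not asset_ids:
        return []

    # Step 2: delete by a literal, single-column IN list. No LIMIT appears
    # inside a subquery, which is the form MySQL would reject.
    placeholders = ", ".join("?" * len(asset_ids))
    conn.execute(
        f"DELETE FROM asset_state_store WHERE asset_id IN ({placeholders})",
        asset_ids,
    )
    return asset_ids


def make_demo_db():
    """Build an in-memory database: asset 1 is active, assets 2-4 are orphaned."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE active_asset (asset_id INTEGER PRIMARY KEY);
        CREATE TABLE asset_state_store (
            asset_id INTEGER NOT NULL,
            state_key TEXT NOT NULL,
            value TEXT,
            PRIMARY KEY (asset_id, state_key)
        );
        """
    )
    conn.execute("INSERT INTO active_asset VALUES (1)")
    conn.executemany(
        "INSERT INTO asset_state_store VALUES (?, ?, ?)",
        [(1, "k", "v"), (2, "k", "v"), (2, "k2", "v"), (3, "k", "v"), (4, "k", "v")],
    )
    return conn
```

With `batch_size=2` against the demo database, the first call removes the rows 
for assets 2 and 3, the second removes asset 4, and the third finds nothing 
left to do. The row for active asset 1 is never selected.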
   
   ### Tests
   
   Adds `test_cleanup_orphaned_asset_state_store_batches_deletes` (first 
coverage for this method): with the per-tick cap patched to two, the first 
cleanup leaves one orphaned asset pending and the second drains it, while the 
active asset is never touched. Verified the test fails against the previous 
unbounded delete and passes with the bound in place.
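
To illustrate why such a test separates the bounded and unbounded behaviour, 
here is a small in-memory model. It is not the test added in this PR: 
`cleanup_tick` and `pending_after_first_tick` are hypothetical helpers, and 
state rows are modelled as plain `(asset_id, key)` tuples rather than database 
rows.

```python
def cleanup_tick(state_rows, active_ids, batch_size=None):
    """Remove state rows for orphaned assets in place; return the ids removed.

    batch_size=None models the old unbounded delete; an integer models
    the per-tick cap.
    """
    orphaned = sorted({asset_id for asset_id, _ in state_rows} - set(active_ids))
    batch = orphaned if batch_size is None else orphaned[:batch_size]
    doomed = set(batch)
    state_rows[:] = [row for row in state_rows if row[0] not in doomed]
    return batch


def pending_after_first_tick(batch_size):
    """Return the orphaned asset ids still present after a single tick."""
    # Asset 1 is active; assets 2, 3 and 4 are orphaned.
    rows = [(1, "a"), (2, "a"), (2, "b"), (3, "a"), (4, "a")]
    cleanup_tick(rows, active_ids={1}, batch_size=batch_size)
    return {asset_id for asset_id, _ in rows} - {1}


print(pending_after_first_tick(2))     # {4}: one orphan is left for the next tick
print(pending_after_first_tick(None))  # set(): the unbounded form drains everything
```

An assertion that one orphaned asset is still pending after the first tick 
holds only when the cap is applied, which is what lets the test fail against 
the unbounded delete.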
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes — Claude Code (Opus 4.8)
   
   Generated-by: Claude Code (Opus 4.8) following [the 
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
   

