dkranchii opened a new issue, #68243:
URL: https://github.com/apache/airflow/issues/68243

   ### Under which category would you file this issue?
   
   Providers
   
   ### Apache Airflow version
   
   3.3.0.dev — current development branch
   
   ### What happened and how to reproduce it?
   
   ## Context
   
   `airflow.models.trigger.Trigger.clean_unused()` is called on every triggerer 
tick from `TriggererJobRunner._run_trigger_loop` and currently issues an 
unbounded `DELETE FROM trigger ...` (with a MySQL / non-MySQL fork for the 
`DELETE ... JOIN` problem). It has the same anti-pattern that the 
scheduler-side `SchedulerJobRunner._remove_unreferenced_triggers` was just 
fixed for: it holds row locks on `trigger` for the full transaction and stalls 
the triggerer loop while many rows are removed.
   
   Follow-up to #68241, which fixed the equivalent scheduler-side path and 
introduced the `[scheduler] unreferenced_triggers_cleanup_batch_size` config.
   
   ## Proposed fix
   
   Apply the same LIMIT-bounded `select-IDs + delete-by-IDs + 
commit-between-batches` pattern that the scheduler-side fix uses, modelled on 
`airflow.utils.db_cleanup._do_delete` and `airflow.state.metastore.cleanup`. 
Either:
   
   - Reuse the existing `[scheduler] unreferenced_triggers_cleanup_batch_size` 
config, or
   - Add a triggerer-section twin (e.g. `[triggerer] 
unreferenced_triggers_cleanup_batch_size`) if reviewers prefer per-component 
tuning.
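
   The select-IDs, delete-by-IDs, commit-between-batches loop could look roughly 
like the sketch below. It is only an illustration: it uses the standard-library 
`sqlite3` module instead of Airflow's SQLAlchemy session so that it runs on its 
own, the two-table schema is a simplified stand-in for the real `trigger` and 
`task_instance` tables, and `delete_unused_in_batches` is a hypothetical name, 
not an existing Airflow function. The full set of reference conditions used by 
the real query is not reproduced here.

```python
import sqlite3


def delete_unused_in_batches(conn: sqlite3.Connection, batch_size: int) -> int:
    """Delete unreferenced trigger rows in bounded batches.

    Returns the number of commits issued (one per non-empty batch).
    Hypothetical helper; the schema below is a simplified assumption.
    """
    commits = 0
    while True:
        # Step 1: select at most `batch_size` IDs of triggers that no
        # task_instance row points at.
        ids = [
            row[0]
            for row in conn.execute(
                'SELECT t.id FROM "trigger" AS t '
                "LEFT JOIN task_instance AS ti ON ti.trigger_id = t.id "
                "WHERE ti.trigger_id IS NULL "
                "ORDER BY t.id LIMIT ?",
                (batch_size,),
            )
        ]
        if not ids:
            return commits

        # Step 2: delete exactly those IDs. No JOIN is needed in the DELETE,
        # so the same statement shape works regardless of backend dialect.
        placeholders = ",".join("?" * len(ids))
        conn.execute(f'DELETE FROM "trigger" WHERE id IN ({placeholders})', ids)

        # Step 3: commit so locks are released before the next batch.
        conn.commit()
        commits += 1
```

   With this shape each transaction touches at most `batch_size` rows. Ordering 
by `id` is an assumption of the sketch, made only so that batches are 
deterministic.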
   
   ## Acceptance criteria
   
   - The bulk `DELETE` in `Trigger.clean_unused()` is replaced by a batched 
loop.
   - A unit test asserts multiple commits when the matching set exceeds the 
batch size.
   - Existing `clean_unused` tests keep passing (semantics unchanged at default 
batch size).
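
   The multiple-commits criterion could be checked along the lines of the sketch 
below. Everything in it is a stand-in: `clean_unused_batched`, `FakeSession`, 
`select_unused_ids` and `delete_by_ids` are hypothetical names used only to 
show the shape of the assertion, and they are not part of SQLAlchemy or 
Airflow. A real test would presumably run against Airflow's test database and 
count calls to the session's `commit`.

```python
from unittest import mock


def clean_unused_batched(session, batch_size):
    """Hypothetical stand-in for a batched Trigger.clean_unused()."""
    # Keep going until a select returns no more candidate IDs.
    while ids := session.select_unused_ids(limit=batch_size):
        session.delete_by_ids(ids)
        session.commit()


class FakeSession:
    """In-memory fake that tracks remaining IDs and counts commits."""

    def __init__(self, ids):
        self.ids = list(ids)
        self.commit = mock.Mock()

    def select_unused_ids(self, limit):
        return self.ids[:limit]

    def delete_by_ids(self, ids):
        doomed = set(ids)
        self.ids = [i for i in self.ids if i not in doomed]


def test_clean_unused_commits_per_batch():
    # Five matching rows with a batch size of two: batches of 2, 2 and 1.
    session = FakeSession(range(5))
    clean_unused_batched(session, batch_size=2)
    assert session.ids == []
    assert session.commit.call_count == 3
```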
   
   ## References
   
   - AGENTS.md rule: [batched bulk DELETE/UPDATE in scheduler / interval 
callbacks](https://github.com/apache/airflow/blob/main/AGENTS.md#coding-standards)
   - Template pattern: `airflow-core/src/airflow/utils/db_cleanup.py` — 
`_do_delete`
   - Template pattern: `airflow-core/src/airflow/state/metastore.py` — `cleanup`
   
   
   ### What you think should happen instead?
   
   This is a follow-up tracking issue (not a bug report). It captures deferred 
work from PR #68241, which fixed the scheduler-side 
`_remove_unreferenced_triggers` cleanup but intentionally left the parallel 
`Trigger.clean_unused()` path in the triggerer loop for a separate PR.
   
   The triggerer's `airflow.models.trigger.Trigger.clean_unused()` runs on 
every triggerer tick and issues an unbounded `DELETE FROM trigger ...` (with a 
MySQL / non-MySQL fork for the `DELETE ... JOIN` problem). On busy installs 
this holds row locks on `trigger` for the full transaction and stalls the 
triggerer loop while many rows are removed. This is the anti-pattern that the 
AGENTS.md rule about batched bulk DELETE/UPDATE in scheduler/interval callbacks 
calls out.
   
   ### Operating System
   
   N/A — not OS-specific (server-side scheduler/triggerer behaviour, applies to 
all deployments)
   
   ### Deployment
   
   None
   
   ### Apache Airflow Provider(s)
   
   _No response_
   
   ### Versions of Apache Airflow Providers
   
   N/A — not provider-specific (issue is in airflow-core's triggerer cleanup 
path)
   
   ### Official Helm Chart version
   
   Not Applicable
   
   ### Kubernetes Version
   
   _No response_
   
   ### Helm Chart configuration
   
   _No response_
   
   ### Docker Image customizations
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
