dkranchii opened a new issue, #68243: URL: https://github.com/apache/airflow/issues/68243
### Under which category would you file this issue?

Providers

### Apache Airflow version

3.3.0.dev (current development branch)

### What happened and how to reproduce it?

## Context

`airflow.models.trigger.Trigger.clean_unused()` is called every triggerer tick from `TriggererJobRunner._run_trigger_loop` and currently issues an unbounded `DELETE FROM trigger ...` (with a MySQL / non-MySQL fork for the `DELETE ... JOIN` problem). It has the same anti-pattern that the scheduler-side `SchedulerJobRunner._remove_unreferenced_triggers` was just fixed for: it holds row locks on `trigger` for the full transaction and stalls the triggerer loop while many rows are removed.

Follow-up to #68241, which fixed the equivalent scheduler-side path and introduced the `[scheduler] unreferenced_triggers_cleanup_batch_size` config.

## Proposed fix

Apply the same LIMIT-bounded `select-IDs + delete-by-IDs + commit-between-batches` pattern that the scheduler-side fix uses, modelled on `airflow.utils.db_cleanup._do_delete` and `airflow.state.metastore.cleanup`. Either:

- Reuse the existing `[scheduler] unreferenced_triggers_cleanup_batch_size` config, or
- Add a triggerer-section twin (e.g. `[triggerer] unreferenced_triggers_cleanup_batch_size`) if reviewers prefer per-component tuning.

## Acceptance criteria

- The bulk `DELETE` in `Trigger.clean_unused()` is replaced by a batched loop.
- A unit test asserts multiple commits when the matching set exceeds the batch size.
- Existing `clean_unused` tests keep passing (semantics unchanged at default batch size).

## References

- AGENTS.md rule: [batched bulk DELETE/UPDATE in scheduler / interval callbacks](https://github.com/apache/airflow/blob/main/AGENTS.md#coding-standards)
- Template pattern: `airflow-core/src/airflow/utils/db_cleanup.py`, function `_do_delete`
- Template pattern: `airflow-core/src/airflow/state/metastore.py`, function `cleanup`

### What you think should happen instead?

This is a follow-up tracking issue (not a bug report).
It captures deferred work from PR #68241, which fixed the scheduler-side `_remove_unreferenced_triggers` cleanup but intentionally left the parallel `Trigger.clean_unused()` path in the triggerer loop for a separate PR.

The triggerer's `airflow.models.trigger.Trigger.clean_unused()` runs on every triggerer tick and issues an unbounded `DELETE FROM trigger ...` (with a MySQL / non-MySQL fork for the `DELETE ... JOIN` problem). On busy installs this holds row locks on `trigger` for the full transaction and stalls the triggerer loop while many rows are removed. This is the same anti-pattern that the AGENTS.md rule about batched bulk DELETE/UPDATE in scheduler/interval callbacks calls out.

### Operating System

N/A: not OS-specific (server-side scheduler/triggerer behaviour, applies to all deployments)

### Deployment

None

### Apache Airflow Provider(s)

_No response_

### Versions of Apache Airflow Providers

N/A: not provider-specific (issue is in airflow-core's triggerer cleanup path)

### Official Helm Chart version

Not Applicable

### Kubernetes Version

_No response_

### Helm Chart configuration

_No response_

### Docker Image customizations

_No response_

### Anything else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
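
For illustration only, here is a minimal sketch of the `select-IDs + delete-by-IDs + commit-between-batches` pattern named under "Proposed fix". It is not Airflow code: it uses the standard-library `sqlite3` module instead of SQLAlchemy, the helper name `clean_unused_batched` is hypothetical, and the schema is an assumed simplification in which a trigger counts as unused when no `task_instance` row references it (the real `clean_unused()` may check more than that).

```python
from __future__ import annotations

import sqlite3


def clean_unused_batched(conn: sqlite3.Connection, batch_size: int = 500) -> tuple[int, int]:
    """Delete unreferenced trigger rows in bounded batches.

    Hypothetical helper for illustration; returns (rows_deleted, commits_issued).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")

    deleted = 0
    commits = 0
    while True:
        # Step 1: select at most batch_size IDs of triggers that no
        # task_instance row references (assumed definition of "unused").
        ids = [
            row[0]
            for row in conn.execute(
                'SELECT t.id FROM "trigger" AS t '
                "LEFT JOIN task_instance AS ti ON ti.trigger_id = t.id "
                "WHERE ti.trigger_id IS NULL "
                "ORDER BY t.id LIMIT ?",
                (batch_size,),
            )
        ]
        if not ids:
            break

        # Step 2: delete exactly those IDs, so the statement touches a
        # bounded number of rows.
        placeholders = ",".join("?" * len(ids))
        conn.execute(f'DELETE FROM "trigger" WHERE id IN ({placeholders})', ids)

        # Step 3: commit between batches so locks are released before
        # the next batch starts.
        conn.commit()
        deleted += len(ids)
        commits += 1

        # A short batch means nothing else matched; skip the extra SELECT.
        if len(ids) < batch_size:
            break

    return deleted, commits
```

Selecting the IDs first and deleting by primary key keeps each statement bounded and sidesteps dialect differences around `DELETE ... JOIN` and `DELETE ... LIMIT`. Returning the commit count is one possible way to let a unit test assert that multiple commits happen when the matching set exceeds the batch size.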
