(airflow) branch main updated: Document selective-checks algorithm and heuristics (#68116)

shahar Fri, 05 Jun 2026 21:35:48 -0700

This is an automated email from the ASF dual-hosted git repository.

shahar1 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git



The following commit(s) were added to refs/heads/main by this push:
     new 97e90643dbf Document selective-checks algorithm and heuristics (#68116)
97e90643dbf is described below

commit 97e90643dbf1ae12a4411c4bc822cbd54eeb4d8d
Author: Jarek Potiuk <[email protected]>
AuthorDate: Sat Jun 6 06:35:25 2026 +0200

    Document selective-checks algorithm and heuristics (#68116)
    
    Expand dev/breeze/doc/ci/04_selective_checks.md with a "why and how"
    explanation of selective checks: the optimisation goal, the design
    assumptions (changed files as a risk proxy, conservative-when-in-doubt,
    trust earned on main), and a mental model of events -> run mode ->
    file-group matches -> outputs.
    
    Add Mermaid diagrams for the decision pipeline (run-mode selection, the
    full_tests_needed short-circuit ladder, mode -> jobs, test-type
    selection), a table of individually-simple file->effect rules, worked
    examples for pull requests / pushes / release-branch pushes / canary
    runs, and a troubleshooting section for "my CI run does too much".
    
    Refresh the existing decision-rules list to match current behaviour
    (release-branch and only-text PUSH exceptions, API-contract vs API-source
    distinction). Add an AGENTS.md rule to keep selective_checks.py and this
    doc in sync.
    
    Co-authored-by: Claude Opus 4.8 (1M context) <[email protected]>
---
 AGENTS.md                                |   1 +
 dev/breeze/doc/ci/04_selective_checks.md | 306 ++++++++++++++++++++++++++++++-
 2 files changed, 303 insertions(+), 4 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index 0e51013c2cf..10ee55a9ee9 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -141,6 +141,7 @@ reported as such are described in "What is NOT considered a 
security vulnerabili
 - Bulk `DELETE`/`UPDATE` in the scheduler loop or any synchronous interval 
task (e.g. `call_regular_interval` callbacks) must be batched with `LIMIT` and 
committed between batches — never issue a single unbounded bulk write against a 
user-driven table. Unbounded bulk writes hold row locks for the entire 
transaction (blocking concurrent writers) and stall the scheduler main loop. 
Filter columns used by the cleanup must be indexed. Follow the batching pattern 
in `airflow-core/src/airflow/u [...]
 - Name functions and methods with action verbs: `get_`, `extract_`, `find_`, 
`compute_`, `build_`, etc. Avoid noun-only names like `_serialize_keys` or 
`_base_names` — they read as attributes, not callables. Predicates (`is_`, 
`has_`) are the one exception.
 - Apache License header on all new files (prek enforces this).
+- **Keep selective-checks behaviour and its documentation in sync.** The CI 
optimisation logic lives in 
`dev/breeze/src/airflow_breeze/utils/selective_checks.py` (run-mode decisions, 
file-group matching, test-type selection, prek-hook skipping). Whenever you 
change a rule there — add/rename a file group, change what forces 
`full_tests_needed`/`all_versions`, alter how providers or test types are 
selected, or change which prek hooks are skipped — update 
[`dev/breeze/doc/ci/04_selective_ch [...]
 - Newsfragments are only used by distributions whose release process consumes 
them via towncrier — currently `airflow-core/newsfragments/`, 
`chart/newsfragments/`, and `dev/mypy/newsfragments/` — and only for major or 
breaking changes. **Golden rule: never create a newsfragment unless you are 
certain the change is user-facing.** If you are not sure the change is visible 
to users — build/release tooling, CI, packaging, internal refactors with no 
behavior change, dev-only scripts, and test [...]
 
 ## Testing Standards
diff --git a/dev/breeze/doc/ci/04_selective_checks.md 
b/dev/breeze/doc/ci/04_selective_checks.md
index fb0f9ef2e11..2076aa280cd 100644
--- a/dev/breeze/doc/ci/04_selective_checks.md
+++ b/dev/breeze/doc/ci/04_selective_checks.md
@@ -22,6 +22,12 @@
 **Table of Contents**  *generated with 
[DocToc](https://github.com/thlorenz/doctoc)*
 
 - [Selective CI Checks](#selective-ci-checks)
+  - [Why selective checks exist (the optimisation 
goal)](#why-selective-checks-exist-the-optimisation-goal)
+  - [Mental model: events, modes, file groups, 
outputs](#mental-model-events-modes-file-groups-outputs)
+  - [The decision pipeline](#the-decision-pipeline)
+  - [Individually simple rules](#individually-simple-rules)
+  - [Worked examples: pull requests, pushes, and canary 
runs](#worked-examples-pull-requests-pushes-and-canary-runs)
+  - [Troubleshooting: my CI run does too 
much](#troubleshooting-my-ci-run-does-too-much)
   - [Groups of files that selective check make decisions 
on](#groups-of-files-that-selective-check-make-decisions-on)
   - [Selective check decision rules](#selective-check-decision-rules)
   - [Skipping prek hooks (Static checks)](#skipping-prek-hooks-static-checks)
@@ -39,6 +45,290 @@ kind of changes. The logic implemented reflects the 
internal architecture of Air
 and it helps to keep down both the usage of jobs in GitHub Actions and CI 
feedback time to
 contributors in case of simpler changes.
 
+> [!NOTE]
+> All the logic described here lives in a single file:
+> 
[`dev/breeze/src/airflow_breeze/utils/selective_checks.py`](../../src/airflow_breeze/utils/selective_checks.py).
+> When you change that file, **update this document in the same PR** so the 
behaviour and the
+> documentation stay in sync.
+
+## Why selective checks exist (the optimisation goal)
+
+Airflow's full CI matrix is large: multiple Python versions, multiple 
databases (Postgres, MySQL,
+SQLite), Kubernetes versions, Helm tests, 100+ provider packages, UI tests, 
doc builds, image builds
+for two architectures, and dozens of static checks. Running *everything* for 
*every* change would burn
+an enormous amount of GitHub Actions minutes and make contributors wait an 
hour or more for feedback on
+a one-line doc fix.
+
+Selective checks exist to answer one question, cheaply and conservatively:
+
+> **Given exactly which files changed (plus the event, the branch and the 
labels), what is the
+> smallest set of CI work that still gives us confidence the change is 
correct?**
+
+Three design assumptions drive everything:
+
+1. **The changed file list is a reliable proxy for risk.** A change to 
`airflow-core/.../scheduler`
+   can break anything, so it must run broadly. A change confined to 
`providers/amazon` can only
+   plausibly break Amazon (and its direct dependents), so only those need to 
run. A change to a
+   `.md` file can't break runtime behaviour at all.
+2. **Be conservative when in doubt — never trade correctness for speed.** 
Every rule is biased toward
+   *running more*. If a change touches something foundational (the 
environment, the CI scripts, the
+   dependency set, the API contract, core test utilities), selective checks 
fall back to the **full
+   matrix**. Skipping is only ever done when a rule can *prove* the skipped 
work is irrelevant.
+3. **Trust must be earned on `main`.** Pull-request runs are optimised 
aggressively for fast feedback,
+   but the *merge* to `main` (and the nightly **canary** run) always runs the 
full matrix on all
+   versions. So even if a PR-time optimisation was too aggressive, the canary 
catches it before a
+   release.
+
+## Mental model: events, modes, file groups, outputs
+
+Selective checks take a handful of **inputs** and produce a set of GitHub 
Actions **outputs** that the
+workflows read to decide what to run. Everything in between is pure, 
deterministic computation — same
+inputs always give the same outputs, and you can reproduce it locally (see
+[Troubleshooting](#troubleshooting-my-ci-run-does-too-much)).
+
+```mermaid
+flowchart LR
+    subgraph IN[Inputs]
+        F[changed files]
+        EV[github event<br/>PR / PUSH / SCHEDULE / DISPATCH]
+        BR[target branch<br/>main / v3-x-test]
+        LB[PR labels]
+        RP[repository<br/>apache/airflow or fork]
+    end
+    IN --> SC[SelectiveChecks]
+    SC --> MODE[Run 
MODE<br/>full_tests_needed?<br/>all_versions?<br/>is_canary_run?]
+    SC --> MATCH[FILE-GROUP matches<br/>regex per area]
+    MODE --> OUT[GitHub Actions outputs]
+    MATCH --> OUT
+    OUT --> JOBS[which jobs run<br/>which versions<br/>which test 
types<br/>which prek hooks]
+```
+
+The two halves that matter are:
+
+* **The run MODE** — a few booleans (`full_tests_needed`, `all_versions`, 
`is_canary_run`) computed
+  mostly from the *event*, the *branch* and a handful of high-impact 
files/labels. The mode decides
+  whether we take the "run everything" shortcut or the "be selective" path.
+* **The FILE-GROUP matches** — every area of the codebase (UI, Helm, docs, 
Kubernetes, each provider,
+  each mypy target, …) has a list of regexps in `CI_FILE_GROUP_MATCHES`. A 
group is "active" when at
+  least one changed file matches it. In selective mode, active groups are what 
turn individual jobs on.
+
+The key helper that ties the two together is `_should_be_run(group)`:
+
+```text
+_should_be_run(group):
+    if full_tests_needed:  return True          # full mode → every area is 
"on"
+    return (any changed file matches group)     # selective mode → only 
matched areas are "on"
+```
+
+Almost every `run_*` / `*_build` output is just `_should_be_run(<some 
group>)`, so once you understand
+those two halves you understand the whole system.
+
+## The decision pipeline
+
+The computation happens in three conceptual stages. Each individual rule is 
tiny; the apparent
+complexity is only the *number* of rules, not their difficulty.
+
+### Stage 1 — pick the run mode
+
+First selective checks decide whether this run takes the **full-matrix 
shortcut**. The foundational
+helper is `_should_run_all_tests_and_versions()`, which is driven by the 
*event* and a few
+high-blast-radius signals:
+
+```mermaid
+flowchart TD
+    A([_should_run_all_tests_and_versions]) --> B{event is<br/>PUSH / SCHEDULE 
/ DISPATCH?}
+    B -->|yes| C{PUSH and only<br/>.txt/.md files changed?}
+    C -->|yes| NO1[False<br/>skip full tests]
+    C -->|no| D{PUSH and<br/>branch != main?}
+    D -->|yes - release branch push| NO2[False<br/>selective only]
+    D -->|no| YES1[True<br/>run everything]
+    B -->|no - PULL_REQUEST| E{commit_ref missing?}
+    E -->|yes| YES2[True]
+    E -->|no| Fp{pyproject.toml changed?}
+    Fp -->|yes| YES3[True]
+    Fp -->|no| G{generated provider<br/>dependencies changed?}
+    G -->|yes| YES4[True]
+    G -->|no| NO3[False]
+```
+
+`full_tests_needed` then layers a short-circuit ladder on top: it is `True` if
+`_should_run_all_tests_and_versions()` is true **or** any one of a small list 
of high-impact file
+groups changed **or** the `full tests needed` label is set. The first matching 
rule wins — order does
+not matter because any single hit forces the full matrix:
+
+```mermaid
+flowchart TD
+    A([full_tests_needed]) --> M{_should_run_all_tests_and_versions?}
+    M -->|yes| T[TRUE - full matrix]
+    M -->|no| E1{environment files?<br/>.github/workflows, 
dev/breeze/src,<br/>Dockerfile, scripts/ci, ...}
+    E1 -->|yes| T
+    E1 -->|no| E2{API contract / codegen?<br/>generated OpenAPI spec or 
generator}
+    E2 -->|yes| T
+    E2 -->|no| E3{git or standard<br/>provider files?}
+    E3 -->|yes| T
+    E3 -->|no| E4{core test utils?<br/>tests/utils}
+    E4 -->|yes| T
+    E4 -->|no| E5{'full tests needed' label?}
+    E5 -->|yes| T
+    E5 -->|no| Fa[FALSE - be selective]
+```
+
+Two related booleans share this machinery:
+
+* **`all_versions`** — whether to expand the Python/Kubernetes/DB version axis 
to *all* supported
+  versions instead of just the defaults. Labels (`default versions only`, 
`latest versions only`,
+  `all versions`) override; otherwise it follows 
`_should_run_all_tests_and_versions()`.
+* **`is_canary_run`** — `True` for SCHEDULE/PUSH/DISPATCH events **on 
`apache/airflow`** (not forks)
+  when more than just `.txt`/`.md` changed, or whenever the `canary` label is 
set. Canary runs always
+  run unit tests even when `full_tests_needed` is `False` (this is what makes 
a *release-branch push*
+  still run its relevant unit tests — see the examples).
+
+### Stage 2 — decide which jobs run
+
+In **full mode** every area is "on". In **selective mode**, each job is gated 
on its file group via
+`_should_be_run`. The image builds are derived (they turn on only because 
something that needs them
+turned on):
+
+```mermaid
+flowchart TD
+    FTN{full_tests_needed?} -->|yes| ALL[every run_* area = ON<br/>all 
versions, all test types]
+    FTN -->|no| SEL[selective: gate each area on its file group]
+    SEL --> RUT{run_unit_tests?}
+    RUT -->|canary OR any source file matched| UNIT[unit tests ON<br/>test 
types chosen by matched files]
+    RUT -->|only new-UI files, or no source files| NOUNIT[unit tests OFF]
+    SEL --> AREAS[run_ui_tests, run_kubernetes_tests,<br/>run_helm_tests*, 
docs_build, run_api_tests,<br/>run_python_scans, run_task_sdk_tests, ...]
+    UNIT --> IMG
+    AREAS --> IMG
+    IMG[ci_image_build = unit OR docs OR k8s OR ui<br/>OR 
pyproject/provider-yaml changed OR prod_image_build]
+    AREAS --> PROD[prod_image_build = k8s OR helm OR<br/>integration/e2e tests]
+    PROD --> IMG
+```
+
+`*` Helm tests only run on the `main` branch.
+
+### Stage 3 — choose which test types
+
+When unit tests run, selective checks narrow *which* test types execute, 
separately for **core** and
+**providers**:
+
+* **Core test types** (`_get_core_test_types_to_run`): `Always` is always 
included. Each specific type
+  (`API`, `CLI`, `Serialization`, …) is added if its files matched. Then the 
**escape hatch**: if any
+  changed source file is left over after removing provider files, test files, 
UI files, etc. — i.e.
+  something "core/other" changed — *all* core test types are added, because a 
core change can affect
+  anything. In full mode, all core types run unconditionally.
+* **Provider test types** (`_get_providers_test_types_to_run`): empty on 
non-`main` branches. In full
+  mode (or when dependencies were upgraded) → `Providers` (all). Otherwise 
selective checks compute the
+  **affected providers** from the changed files and add their **direct 
upstream and downstream
+  dependents** (not the whole transitive closure). Changes to *common* 
provider code (tests/utils that
+  don't belong to a single provider) escalate to *all* providers. Suspended 
providers are excluded (and
+  a PR that touches one fails unless it carries the `allow suspended provider 
changes` label).
+
+The same matched-file approach drives the **prek hook skip list** 
(`skip_prek_hooks`): each mypy /
+compile / lint hook is skipped when nothing in its area changed. See
+[Skipping prek hooks](#skipping-prek-hooks-static-checks).
+
+## Individually simple rules
+
+The list of rules is long, but each rule is a one-liner you can reason about 
in isolation. A few
+representative examples (file → effect):
+
+| A change to …                                         | … turns on           
                                                 | because |
+|-------------------------------------------------------|-----------------------------------------------------------------------|---------|
+| `README.md` only                                      | (almost) nothing     
                                                 | `.md`/`.txt` are 
text-non-doc; on a push this even skips full tests |
+| `airflow-core/docs/...rst`                            | `docs_build` (+ CI 
image)                                             | matches `DOC_FILES`; docs 
need the CI image to render |
+| `providers/amazon/.../s3.py`                          | unit tests with 
`Providers[amazon, …dependents]` + `Always`          | a provider file → only 
that provider and its direct dependents |
+| `airflow-core/src/airflow/jobs/scheduler_job_runner.py` | unit tests with 
**all** core test types                            | a core/other file → escape 
hatch runs all core types |
+| `pyproject.toml`                                       | **full matrix, all 
versions**                                        | dependency surface changed 
→ `_should_run_all_tests_and_versions` |
+| `.github/workflows/ci-amd.yml` or `dev/breeze/src/...` | **full matrix**     
                                                 | environment files → can 
change the whole CI environment |
+| the generated OpenAPI spec                             | **full matrix**     
                                                 | the API *contract* ripples 
to UI codegen + every client |
+| `chart/templates/...yaml` (on `main`)                  | `run_helm_tests` (+ 
PROD image)                                      | matches `HELM_FILES`; Helm 
tests only on `main` |
+| `airflow-core/src/airflow/ui/...tsx` only              | `run_ui_tests`, 
**no** unit tests                                    | "only new-UI files" 
short-circuit skips Python unit tests |
+
+The "complexity" you feel reading the code is just *many* such rules stacked 
up — each one on its own
+is simple, conservative, and independently testable (see 
`dev/breeze/tests/test_selective_checks.py`).
+
+## Worked examples: pull requests, pushes, and canary runs
+
+The same engine produces very different results depending on the **event**, 
the **branch** and the
+**files**. These four scenarios cover the common cases:
+
+```mermaid
+flowchart TD
+    subgraph PR[Pull request - small provider change]
+        PR1[event=PULL_REQUEST<br/>files=providers/amazon/.../s3.py] --> 
PR2[full_tests_needed=False<br/>all_versions=False<br/>is_canary_run=False]
+        PR2 --> PR3[unit tests: Providers amazon + Always<br/>default versions 
only<br/>amazon prek/mypy only]
+    end
+    subgraph PUSHMAIN[Push / merge to main]
+        PM1[event=PUSH, branch=main] --> 
PM2[full_tests_needed=True<br/>all_versions=True<br/>is_canary_run=True]
+        PM2 --> PM3[full matrix, all versions,<br/>all canary-only jobs]
+    end
+    subgraph RELPUSH[Push to release branch v3-x-test]
+        RP1[event=PUSH, branch!=main] --> 
RP2[full_tests_needed=False<br/>is_canary_run=True]
+        RP2 --> RP3[unit tests run via canary<br/>but selective test 
types<br/>default versions]
+    end
+    subgraph CANARY[Scheduled nightly canary]
+        CN1[event=SCHEDULE on apache/airflow] --> 
CN2[full_tests_needed=True<br/>all_versions=True<br/>is_canary_run=True]
+        CN2 --> CN3[everything, all versions]
+    end
+```
+
+1. **Pull request, small provider change** (`providers/amazon/.../s3.py`). Not 
an environment/contract
+   file, `commit_ref` present, no special label → `full_tests_needed=False`, 
`all_versions=False`.
+   Result: unit tests run only `Providers[amazon]` (plus amazon's direct 
dependents) and `Always`, on
+   default versions only; only the amazon-related prek/mypy hooks run. Fast PR 
feedback.
+2. **Pull request, core change** (`scheduler_job_runner.py`). Still 
`full_tests_needed=False`, but the
+   core/other escape hatch adds **all core test types**. Providers are not 
pulled in (no provider files
+   changed). Default versions.
+3. **Pull request that changes `pyproject.toml`** (or `.github/workflows/...`, 
or the OpenAPI spec).
+   `full_tests_needed=True` (and for `pyproject.toml` also 
`all_versions=True`). The PR runs the **full
+   matrix** — same as a canary — because the change can affect everything.
+4. **Push / merge to `main`.** `_should_run_all_tests_and_versions()` is true 
for PUSH on `main` →
+   `full_tests_needed=True`, `all_versions=True`, `is_canary_run=True`. The 
full matrix plus all
+   canary-only jobs run. This is the safety net that backstops aggressive 
PR-time optimisation.
+5. **Push to a release branch** (`v3-1-test`). PUSH but `branch != main` → 
`full_tests_needed=False`
+   (selective), yet `is_canary_run=True`, so `run_unit_tests=True`. The branch 
runs its *relevant*
+   unit tests on default versions rather than the entire matrix — release 
branches don't need the full
+   cross-version sweep on every push.
+6. **Scheduled nightly canary** (SCHEDULE on `apache/airflow`). Full matrix on 
all versions, every
+   canary-only job. The most thorough run; it is what gives us confidence that 
the PR-time shortcuts
+   never let a regression through.
+
+## Troubleshooting: my CI run does too much
+
+If a CI run is slower or broader than you expect (full matrix on a small 
change, all providers running,
+all versions), the cause is almost always a single rule that fired. To find it:
+
+1. **Read the `selective-checks` job output.** Selective checks print a 
`[warning] …` line for every
+   decision, e.g. *"Running full set of tests because env files changed"* or 
*"Running everything with
+   all versions: changed pyproject.toml"*. That line names the exact rule (and 
usually the file group)
+   that escalated the run. This is the fastest way to diagnose.
+2. **Reproduce locally** with Breeze, pointing at the squashed commit of your 
change:
+
+   ```bash
+   breeze selective-checks --commit-ref <commit_sha>
+   ```
+
+   It prints the same outputs and the same `[warning]` reasons CI uses, so you 
can iterate without
+   pushing.
+3. **Check the usual escalation triggers** (any one of these forces the full 
matrix):
+   * an **environment file** changed — `.github/workflows/*`, 
`dev/breeze/src/*`, `Dockerfile*`,
+     `scripts/ci/*`, `scripts/docker/*`, 
`generated/provider_dependencies.json` (often this is the
+     surprise: editing CI/breeze itself runs everything);
+   * **`pyproject.toml`** or generated provider dependencies changed (also 
forces `all_versions`);
+   * the **generated OpenAPI spec** or the client generator changed (the API 
contract);
+   * **`tests/utils`** or **git/standard provider** files changed;
+   * the **`full tests needed`** or **`all versions`** label is set on the PR.
+4. **All providers running?** That means selective checks decided *all* 
providers are affected — usually
+   because *common* provider code (shared tests/utils not owned by one 
provider) changed, or because
+   dependencies were upgraded, or `full_tests_needed` is on. The reason is 
printed in the
+   provider-selection `[warning]` lines.
+5. **Want to confirm an optimisation is safe?** Remember the canary on `main` 
always runs everything —
+   the worst case of an over-aggressive *skip* is caught at merge time, not in 
production.
+
+The authoritative, exhaustive rule list (kept in sync with the code) is in
+[Selective check decision rules](#selective-check-decision-rules) below; the 
sections above are the
+"why" and the shape, this is the "what" in full detail.
+
 ## Groups of files that selective check make decisions on
 
 We have the following Groups of files for CI that determine which tests are 
run:
@@ -95,11 +385,19 @@ together using `pytest-xdist` (pytest-xdist distributes 
the tests among parallel
 
 ## Selective check decision rules
 
-* `Full tests` case is enabled when the event is PUSH, or SCHEDULE or we miss 
commit info or any of the
-  important environment files (`pyproject.toml`, `Dockerfile`, `scripts`,
-  `generated/provider_dependencies.json` etc.) changed or  when `full tests 
needed` label is set.
+* `Full tests` case is enabled when the event is PUSH **to `main`**, SCHEDULE 
or WORKFLOW_DISPATCH, or we
+  miss commit info, or any of the important environment files 
(`pyproject.toml`, `Dockerfile`, `scripts`,
+  `generated/provider_dependencies.json` etc.) changed, or the API *contract* 
changed (the generated
+  OpenAPI spec or the client generator — plain API source/test edits that 
leave the committed spec
+  untouched do **not** force full tests), or `tests/utils` / git / standard 
provider files changed, or
+  when the `full tests needed` label is set.
   That enables all matrix combinations of variables (representative) and all 
possible test type. No further
-  checks are performed. See also [1] note below.
+  checks are performed. See also [1] note below. Two exceptions narrow this: a 
PUSH that changed **only**
+  `.txt`/`.md` files skips full tests, and a PUSH to a **release branch** 
(`v3-X-test`, i.e. not `main`)
+  runs selective tests only (it is still a `canary` run, so its *relevant* 
unit tests run on default
+  versions). The high-level flow and worked examples for these cases are in
+  [The decision pipeline](#the-decision-pipeline) and
+  [Worked examples](#worked-examples-pull-requests-pushes-and-canary-runs) 
above.
 * Python, Kubernetes, Backend, Kind, Helm versions are limited to "defaults" 
only unless `Full tests` mode
   is enabled.
 * `Python scans`, `Javascript scans`, `API tests/codegen`, `UI`, `WWW`, 
`Kubernetes` tests and `DOC builds`

(airflow) branch main updated: Document selective-checks algorithm and heuristics (#68116)

Reply via email to