shahar1 opened a new pull request, #67972:
URL: https://github.com/apache/airflow/pull/67972
### What
On `pull_request`, scan only the CodeQL languages whose files actually
changed, instead of always running all five (`python`, `javascript`, `actions`,
`go`, `java`). A small `detect-languages` job inspects the PR's changed files
and builds the analysis matrix dynamically. `push` (to `main`) and `schedule`
runs are unchanged: they still scan **every** language, so coverage of the
`main` branch is identical to today.
Result per PR:
- docs-only PR → CodeQL runs **nothing** (just the tiny detect job)
- the common python-only PR → **1** analysis job instead of 5
- multi-language PRs → only the languages they touch
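As a rough illustration of the detection step, the sketch below maps a list of changed paths to the languages that should be analyzed. It is not the workflow code from this PR. The `fnmatch` path patterns in `LANGUAGE_PATTERNS`, the helper names `detect_languages` and `to_matrix_json`, and the `{"language": [...]}` matrix shape are all hypothetical. How the changed-file list is obtained (for example, `git diff --name-only` against the base branch) is left out.

```python
import fnmatch
import json

# Languages currently scanned on every run (from the PR description).
ALL_LANGUAGES = ["python", "javascript", "actions", "go", "java"]

# Hypothetical path rules for illustration only; the real workflow's filters may differ.
# fnmatch's "*" also matches "/", so "*.py" matches Python files in any directory.
LANGUAGE_PATTERNS = {
    "python": ["*.py", "*.pyi"],
    "javascript": ["*.js", "*.jsx", "*.mjs", "*.cjs", "*.ts", "*.tsx"],
    "actions": [".github/workflows/*", ".github/actions/*"],
    "go": ["*.go"],
    "java": ["java-sdk/*"],
}


def detect_languages(event_name: str, changed_files: list[str]) -> list[str]:
    """Return the CodeQL languages to analyze, in the order of ALL_LANGUAGES."""
    if event_name != "pull_request":
        # push (to main) and schedule: keep scanning every language.
        return list(ALL_LANGUAGES)
    return [
        language
        for language in ALL_LANGUAGES
        if any(
            fnmatch.fnmatchcase(path, pattern)
            for path in changed_files
            for pattern in LANGUAGE_PATTERNS[language]
        )
    ]


def to_matrix_json(languages: list[str]) -> str:
    """Serialize the selection as a JSON matrix object (assumed shape)."""
    return json.dumps({"language": languages})


if __name__ == "__main__":
    print(to_matrix_json(detect_languages("pull_request", ["docs/index.rst"])))
    # → {"language": []}
    print(to_matrix_json(detect_languages("pull_request", ["airflow-core/src/airflow/models/dag.py"])))
    # → {"language": ["python"]}
```

GitHub Actions rejects a matrix with no values, so a workflow built this way would presumably guard the analyze job with an `if:` on the detect job's output. A docs-only PR would then skip analysis entirely.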
### Why
CodeQL on PRs is **by far the most frequently triggered workflow in the
repo**, at roughly **1,300+ runs/week** (≈ 87% of all CodeQL runs are
`pull_request`). Each of those runs currently fans out one job per
language regardless of what changed, so the workflow puts constant, high-volume
pressure on runners and concurrency in the shared Actions pool.
In a sample of recent PRs:
- ~67% are **python-only**, and ~12% touch **no scannable code at all**
- `javascript` is touched by ~20%, `actions` by ~2%, `go` by ~1%, `java` by ~0%
So most of the language jobs we run on PRs scan code that did
not change. Gating the matrix cuts roughly **55–60% of CodeQL PR minutes** and
**~80% of CodeQL job-starts**, while keeping full per-language coverage on
`main`.
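The ~80% job-start figure can be checked roughly against the sample shares above. The calculation below makes three assumptions. First, each percentage is the share of PRs that touch that language. Second, only the per-language `Analyze` jobs count as job-starts; the detect job is excluded. Third, the description gives python only as "python-only ~67%", so python's share is bounded between 67% and 88% (every PR except the ~12% with no scannable code). The expected number of analysis jobs per PR is the sum of the per-language shares, which is compared against the 5 jobs every PR runs today.

```python
# Sample shares from the PR description.
# Assumption: each value is the share of PRs that touch that language.
OTHER_LANGUAGE_SHARES = {"javascript": 0.20, "actions": 0.02, "go": 0.01, "java": 0.0}
PYTHON_ONLY_SHARE = 0.67  # lower bound on the share of PRs touching python
NO_CODE_SHARE = 0.12      # share of PRs touching no scannable code
ALWAYS_ON_JOBS = 5        # one Analyze job per language on every PR today


def job_start_reduction(python_share: float) -> float:
    """Fractional drop in Analyze job-starts per PR (detect job not counted)."""
    expected_jobs = python_share + sum(OTHER_LANGUAGE_SHARES.values())
    return 1 - expected_jobs / ALWAYS_ON_JOBS


# Python touches at least the python-only PRs and at most every PR with scannable code.
worst = job_start_reduction(1 - NO_CODE_SHARE)  # python in 88% of PRs
best = job_start_reduction(PYTHON_ONLY_SHARE)   # python in 67% of PRs
print(f"job-start reduction: {worst:.0%} to {best:.0%}")  # → job-start reduction: 78% to 82%
```

Both bounds fall close to the quoted ~80%. The minutes estimate also depends on per-language job durations, notably the Java Gradle build, and those durations are not given here.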
### Relationship to #45541
#45541 ("CodeQL scanning can run always on all code") deliberately removed
conditional CodeQL logic, on the basis that *"CodeQL scanning is fast and
having custom configuration … makes it unnecessarily complex."* That was true
at the time: CodeQL then scanned only **3 fast languages** (`python`, `javascript`,
`actions`).
Two things have changed since:
1. **`go` and especially `java` were added afterwards.** `java` came in with the
"Add Java SDK" change, and its analysis runs a full `setup-java` + `./gradlew classes
testClasses` **Gradle build on every PR**, even though `java-sdk` files change
in well under 1% of PRs. That undermines the "CodeQL is fast" premise
the always-on decision rested on.
2. The repo is now hitting **Actions capacity limits**, so the *frequency*
of this workflow (not just per-run cost) matters: trimming ~80% of its
job-starts directly relieves the shared concurrency pool.
The added complexity is intentionally small and confined to one
workflow (a single detect job plus a dynamic matrix), and it only affects PR runs;
`main` scanning stays exactly as it is.
### Note for reviewers / branch protection
If `Analyze (…)` CodeQL contexts are configured as **required status
checks** in branch protection, conditionally skipped matrix entries won't
report and could leave PRs pending. CI gating in this repo is driven by the
selective-checks Tests workflow rather than CodeQL, so this should be fine,
but please confirm that the CodeQL checks are not in the required set before merging.
---
##### Was generative AI tooling used to co-author this PR?
- [X] Yes — Claude Code (Opus 4.8)
Generated-by: Claude Code (Opus 4.8) following [the
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)