Rani367 commented on issue #45192:
URL: https://github.com/apache/airflow/issues/45192#issuecomment-4179512237

   I've been following this discussion. I built a tool called
[`affected`](https://github.com/Rani367/affected) that addresses the core
problem here: detecting which providers (and their transitive dependents) are
impacted by a change, and running only those tests.
   
   I cloned this repo and ran `affected` against it. It auto-detected
**128 packages** (core, task-sdk, all 80+ providers, shared modules, test
suites):
   
   ```
   Ecosystem: python
   Detected: python
   Packages (128 found):
     ● apache-airflow-core                      airflow-core
     ● apache-airflow-providers-amazon          providers/amazon
     ● apache-airflow-providers-google          providers/google
     ● apache-airflow-providers-common-compat   providers/common/compat
     ... (128 total)
   ```
   
   Here's a sample of what it finds across the last 15 commits on `main`:
   
   | Commit | Affected | Total | Reduction |
   |--------|----------|-------|-----------|
   | `a7bfdf4` Fix grammar in dag model docstring | 107 | 128 | **16%** |
   | `6339d09` Fix group/extra bug in virtualenv | 108 | 128 | **16%** |
   | `01d0df1` Add OpenLineage to GlueJobOperator | 111 | 128 | **13%** |
   | `9cad4ac` Make common-ai provider to ready state | 113 | 128 | **12%** |
   
   The reduction is modest when core changes, because those changes cascade to
almost every package, but the key value is **for provider-only PRs**: a change
to just `providers/amazon` would test only the amazon provider and its
dependents, not all 128 packages.
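
For reference, the percentages in the table above are consistent with "share of packages that can be skipped", rounded to a whole percent. A minimal check of that arithmetic (the function name `reduction_percent` is only illustrative, not part of `affected`):

```python
def reduction_percent(affected: int, total: int) -> int:
    """Share of packages that do not need testing, as a rounded whole percent."""
    return round(100 * (total - affected) / total)


# Affected counts taken from the table; 128 is the total package count.
for affected in (107, 108, 111, 113):
    print(affected, reduction_percent(affected, 128))
```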
   
   It also shows **why** each package is affected:
   
   ```
   ● apache-airflow-providers-amazon    (directly changed: providers/amazon/src/.../glue.py)
   ● apache-airflow-providers-common-compat (directly changed: providers/common/compat/.../spark.py)
   ● apache-airflow-providers-airbyte   (depends on: ...airbyte → ...common-compat)
   ● apache-airflow-providers-google    (depends on: ...google → ...common-compat)
   ```
   
   ### How it works
   
   1. Scans `pyproject.toml` files to build a dependency graph
   2. Diffs changed files against a base ref (branch, commit, or merge-base)
   3. Computes the transitive closure of affected packages
   4. Outputs CI variables: `affected ci --format github` produces a JSON 
matrix directly consumable by GitHub Actions
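
Steps 2 and 3 above can be sketched roughly as follows. This is not the tool's actual code (which is Rust), just a minimal Python illustration of the idea. It skips step 1 by hand-writing the dependency map instead of parsing `pyproject.toml`, and the shortened package names, the `providers/airbyte` directory, the file paths, and the dependency edges are toy data chosen for the example, not Airflow's real graph.

```python
from collections import defaultdict, deque


def reverse_graph(deps):
    """Invert 'package -> packages it depends on' into 'package -> its dependents'."""
    dependents = defaultdict(set)
    for pkg, requires in deps.items():
        for req in requires:
            dependents[req].add(pkg)
    return dependents


def owning_package(path, package_dirs):
    """Return the package whose directory is the longest prefix of path, or None."""
    best = None
    for name, directory in package_dirs.items():
        prefix = directory.rstrip("/") + "/"
        if path.startswith(prefix):
            if best is None or len(directory) > len(package_dirs[best]):
                best = name
    return best


def affected_packages(changed_files, package_dirs, deps):
    """Directly changed packages plus everything that transitively depends on them."""
    owners = (owning_package(f, package_dirs) for f in changed_files)
    directly_changed = {pkg for pkg in owners if pkg is not None}
    dependents = reverse_graph(deps)

    # Breadth-first walk over reverse edges yields the transitive closure.
    seen = set(directly_changed)
    queue = deque(directly_changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical toy workspace (names shortened, edges assumed for illustration).
package_dirs = {
    "amazon": "providers/amazon",
    "google": "providers/google",
    "airbyte": "providers/airbyte",
    "common-compat": "providers/common/compat",
}
deps = {
    "amazon": set(),
    "google": {"common-compat"},
    "airbyte": {"common-compat"},
    "common-compat": set(),
}

print(sorted(affected_packages(["providers/common/compat/src/example.py"], package_dirs, deps)))
print(sorted(affected_packages(["providers/amazon/src/example.py"], package_dirs, deps)))
```

In this toy graph, a change under `providers/common/compat` pulls in `google` and `airbyte` as dependents, while a change under `providers/amazon` stays scoped to `amazon` alone, which is the provider-only case where the savings are largest.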
   
   It's a single 5 MB binary (Rust, MIT licensed). It needs no configuration:
it auto-detected this repo's Python workspace structure with no setup.
   
   This could complement or simplify parts of the selective-checks system for
provider-scoped PRs. Would it be worth evaluating, or does the existing system
already cover this well enough?

