Rani367 commented on issue #45192:
URL: https://github.com/apache/airflow/issues/45192#issuecomment-4179512237
I've been following this discussion. I built a tool called [`affected`](https://github.com/Rani367/affected) that addresses the core problem here: detecting which providers (and their transitive dependents) are impacted by a change, and running only those tests.

I cloned this repo and ran it. It auto-detected **128 packages** (core, task-sdk, all 80+ providers, shared modules, test suites):

```
Ecosystem: python
Detected: python
Packages (128 found):
  ● apache-airflow-core                      airflow-core
  ● apache-airflow-providers-amazon          providers/amazon
  ● apache-airflow-providers-google          providers/google
  ● apache-airflow-providers-common-compat   providers/common/compat
  ... (128 total)
```

Here's what it finds for the last 15 commits on `main`:

| Commit | Affected | Total | Reduction |
|--------|----------|-------|-----------|
| `a7bfdf4` Fix grammar in dag model docstring | 107 | 128 | **16%** |
| `6339d09` Fix group/extra bug in virtualenv | 108 | 128 | **16%** |
| `01d0df1` Add OpenLineage to GlueJobOperator | 111 | 128 | **13%** |
| `9cad4ac` Make common-ai provider to ready state | 113 | 128 | **12%** |

The reduction is modest when core changes (which cascade to almost everything), but the key value is **for provider-only PRs**: a change to just `providers/amazon` would only test the amazon provider and its dependents, not all 128 packages.

It also shows **why** each package is affected:

```
● apache-airflow-providers-amazon (directly changed: providers/amazon/src/.../glue.py)
● apache-airflow-providers-common-compat (directly changed: providers/common/compat/.../spark.py)
● apache-airflow-providers-airbyte (depends on: ...airbyte → ...common-compat)
● apache-airflow-providers-google (depends on: ...google → ...common-compat)
```

### How it works

1. Scans `pyproject.toml` files to build a dependency graph.
2. Diffs changed files against a base ref (branch, commit, or merge-base).
3. Computes the transitive closure of affected packages.
4. Outputs CI variables: `affected ci --format github` produces a JSON matrix directly consumable by GitHub Actions.

It's a single 5 MB binary (Rust, MIT licensed). Zero config: it auto-detected this repo's Python workspace structure with no setup.

This could potentially complement or simplify parts of the selective-checks system for provider-scoped PRs. Would it be worth evaluating, or is the existing system covering this well enough?
