Re: [I] Implement tiered CI approach [datafusion-comet]

via GitHub Thu, 21 May 2026 12:42:18 -0700


andygrove commented on issue #4389:
URL: 
https://github.com/apache/datafusion-comet/issues/4389#issuecomment-4512172121


   Proposing a minimal first step toward tiered CI: split the existing 
`spark_sql_test.yml` matrix so PRs run a 2-version subset and the GitHub merge 
queue covers the remaining versions. Push to `main` and `workflow_dispatch` 
continue to run all four versions.
   
   ### Proposed split
   
   | Trigger | Spark versions | Jobs (modules x Spark versions) |
   |---|---|---|
   | `pull_request` | 3.5.8, 4.1.1 | 7 x 2 = 14 |
   | `merge_group` | 3.4.3, 4.0.2 | 7 x 2 = 14 |
   | `push` to `main` | 3.4.3, 3.5.8, 4.0.2, 4.1.1 | 7 x 4 = 28 (unchanged) |
   
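   Net effect per PR: 14 fewer `spark-sql-test/*` jobs from this workflow on 
each `pull_request` run. The merge queue adds 14 jobs, but each PR triggers 
`merge_group` only once, when it enters the queue, whereas `pull_request` runs 
repeat on every push to the PR branch.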
   
   ### Mechanism
   
   1. Add `merge_group:` to the workflow `on:` triggers (no `paths:` filter so 
queue runs always execute).
   2. Replace the static `config:` matrix with an event-conditional 
`${{ ... }}` expression that selects the Spark version subset based on 
`github.event_name`.
   3. Add a single rollup job `spark-sql-test-status` (`needs: spark-sql-test`, 
`if: always()`) that becomes the sole required status check. This decouples 
branch protection from matrix shape so future reshapes do not require 
re-editing required checks.
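   
   The three steps above can be sketched roughly as follows. This is an 
illustrative fragment, not the contents of the branch: the `config:` entries 
are reduced to bare version strings (the real matrix entries presumably carry 
more fields), the runner label and placeholder step are assumptions, and 
treating any result other than `success` as a failure in the rollup is one 
possible policy rather than a confirmed choice.
   
   ```yaml
   on:
     pull_request:      # existing trigger; any existing filters stay as they are
     merge_group:       # step 1: new trigger, deliberately without a paths: filter
     push:
       branches: [main]
     workflow_dispatch:

   jobs:
     spark-sql-test:
       runs-on: ubuntu-latest   # assumed runner
       strategy:
         fail-fast: false
         matrix:
           # Step 2: choose the Spark subset from the triggering event.
           # Bare version strings are a simplification for this sketch.
           config: ${{ fromJSON(github.event_name == 'pull_request' && '["3.5.8", "4.1.1"]' || github.event_name == 'merge_group' && '["3.4.3", "4.0.2"]' || '["3.4.3", "3.5.8", "4.0.2", "4.1.1"]') }}
       steps:
         - run: echo "Spark ${{ matrix.config }}"   # placeholder for the real steps

     # Step 3: single rollup job intended as the only required status check.
     spark-sql-test-status:
       name: Spark SQL Tests Status
       needs: spark-sql-test
       if: always()             # run even when a matrix job fails or is cancelled
       runs-on: ubuntu-latest
       steps:
         - name: Fail unless every matrix job succeeded
           if: needs.spark-sql-test.result != 'success'
           run: exit 1
   ```
   
   Because the rollup reads only the aggregate `needs.spark-sql-test.result`, 
its check name stays the same no matter how the matrix is reshaped later.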
   
   ### Branch protection coordination
   
   The per-version status check names 
(`spark-sql-{module}/spark-{full}-jdk{java}`) will no longer be produced on PRs 
after this change, so branch protection must switch to requiring 
`Spark SQL Tests Status` instead, and the per-version requirements must be 
dropped before the workflow change merges. Suggested sequence: remove 
per-version required checks, merge the workflow change, verify a `merge_group` 
run, then add the rollup as the sole required check.
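   
   For illustration only, the end state of that sequence might look like the 
fragment below if branch protection is managed through `.asf.yaml`. That is an 
assumption: the required checks may instead be configured directly in the 
repository settings, in which case the same single context would be entered 
there.
   
   ```yaml
   # .asf.yaml (assumed to be where branch protection is configured)
   github:
     protected_branches:
       main:
         required_status_checks:
           contexts:
             # Rollup job name only; no per-version check names remain.
             - Spark SQL Tests Status
   ```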
   
   ### Scope
   
   In scope: one file only, `.github/workflows/spark_sql_test.yml`.
   
   Out of scope (future follow-ups): `pr_build_linux.yml` (Comet's own tests 
still run all five Spark profiles on PR), `iceberg_spark_test.yml`, macOS, 
benchmarks, fuzz, nightly cron tier. Each of these can be tackled as its own 
follow-up.
   
   ### Branch
   
   The proposed change is on 
[andygrove:ci/tiered-spark-sql-test-matrix](https://github.com/andygrove/datafusion-comet/tree/ci/tiered-spark-sql-test-matrix).
 Not opening a PR yet, pending feedback on the approach and the 
branch-protection coordination plan above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

