andygrove opened a new pull request, #4126:
URL: https://github.com/apache/datafusion-comet/pull/4126
## Which issue does this PR close?
Part of #4113.
> **WIP / draft**: this PR is layered on top of #4119 and will not be
merge-ready until that lands. Until then the diff against `main` includes
#4119's changes too.
## Rationale for this change
#4119 added a build-only `spark-4.2` Maven profile targeting Spark
4.2.0-preview4. To start exercising Comet against 4.2 in CI (rather than
discovering everything at once when 4.2 GA lands), this PR turns on the
existing PR test matrices for Spark 4.2 and adds dedicated TPC-DS
plan-stability goldens.
This mirrors the approach previously used to bring Spark 4.1 online, which was
later reverted (see commits `622e851e1` and `75e3b3116` on the `spark-4.1.1` branch).
## What changes are included in this PR?
- `.github/workflows/pr_build_linux.yml`: add `Spark 4.2, JDK 17` to the
`linux-test` matrix and a comment explaining why 4.1/4.2 are skipped from the
`lint-java` matrix (semanticdb-scalac is not yet published for Scala
2.13.17/2.13.18).
- `.github/workflows/pr_build_macos.yml`: add `Spark 4.2, JDK 17, Scala
2.13` to the `macos-aarch64-test` matrix.
- `spark/pom.xml`: wire iceberg/jetty test dependencies into the `spark-4.2`
profile (Iceberg falls back to the 4.0 runtime since 4.2 is not yet published;
Jetty pinned at 11.0.26).
- `spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala`:
add `isSpark42Plus` helper.
- `spark/src/test/scala/org/apache/spark/sql/comet/CometPlanStabilitySuite.scala`:
route `isSpark42Plus` to the new `approved-plans-{v1_4,v2_7}-spark4_2`
directories.
- `dev/regenerate-golden-files.sh`: accept `--spark-version 4.2` and include
4.2 in the default version list.
- `spark/src/test/resources/tpcds-plan-stability/approved-plans-{v1_4,v2_7}-spark4_2/`:
regenerated golden files. 22 of the generated files differ from the `spark4_0`
directory (`q2`, `q5`, `q33`, `q49`, `q54`, `q56`, `q60`, `q66` in v1_4 and
`q5a`, `q14a`, `q49` in v2_7, each in both the `native_datafusion` and
`native_iceberg_compat` variants); the rest are byte-identical.
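The version gate and golden-directory routing can be sketched as below. This is a minimal, self-contained illustration: in Comet itself the helper compares against `org.apache.spark.SPARK_VERSION` rather than taking the version as a parameter, and `PlanStabilityRoutingSketch`, `majorMinor`, and `goldenDir` are hypothetical names introduced only for this sketch.

```scala
// Hypothetical sketch of an isSpark42Plus-style gate and the golden-file
// directory routing. The real helper in CometSparkSessionExtensions reads
// org.apache.spark.SPARK_VERSION; here the version is passed in so the
// sketch compiles standalone.
object PlanStabilityRoutingSketch {
  private def majorMinor(v: String): (Int, Int) = {
    val parts = v.split("\\.")
    (parts(0).toInt, parts(1).toInt)
  }

  def isSpark42Plus(sparkVersion: String): Boolean = {
    val (major, minor) = majorMinor(sparkVersion)
    major > 4 || (major == 4 && minor >= 2)
  }

  // Route a TPC-DS plan-stability suite ("v1_4" or "v2_7") to a
  // per-version golden directory; earlier 4.x versions keep using the
  // existing spark4_0 goldens (an assumption for this sketch).
  def goldenDir(sparkVersion: String, suite: String): String =
    if (isSpark42Plus(sparkVersion)) s"approved-plans-$suite-spark4_2"
    else s"approved-plans-$suite-spark4_0"
}
```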
This PR does not attempt to fix any 4.2-specific runtime/test failures the
new matrix entries surface; those will be tracked and addressed in follow-up
PRs as we did for Spark 4.1.
## How are these changes tested?
- Local: built the project end-to-end with `-Pspark-4.2` on JDK 17.
- Local: ran `CometTPCDSV1_4_PlanStabilitySuite` (194 tests) and
`CometTPCDSV2_7_PlanStabilitySuite` (64 tests) against `-Pspark-4.2` with
`SPARK_GENERATE_GOLDEN_FILES` unset; both pass with 0 failures.
- CI: this PR will exercise the new Linux and macOS matrix entries.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]