xiangfu0 opened a new pull request, #18261: URL: https://github.com/apache/pinot/pull/18261
## Summary

- New top-level `pinot-spark-4/` umbrella module (connector + batch-ingestion), gated to JDK 21+ via a root-pom profile with `<jdk>[21,)</jdk>` activation. Spark 3 modules stay as-is and continue to build on JDK 11/17/21.
- Migrates `PinotWriteBuilder` from Spark 3's `SupportsOverwrite`/`Filter` to Spark 4's `SupportsOverwriteV2`/`Predicate`; all other source files are verbatim ports from the Spark 3 modules with the package rename `spark3 → spark4` / `v3 → v4`.
- Adds a JDK-21-gated end-to-end integration test (`SparkSegmentMetadataPushIntegrationTest4`) in `pinot-integration-tests`. It lives under `src/test/java-spark4/` and is wired up under a new `pinot-spark-4-integration-tests` profile that pins `spark-core`/`spark-sql`/`spark-launcher` to Spark 4 at test scope (needed because the plugin declares Spark as `provided` and the root pom's dependencyManagement otherwise pins those coordinates to Spark 3).
- Adds a `PinotDataSourceRegistrationTest` to both the Spark 3 and Spark 4 connectors. It verifies the `DataSourceRegister` ServiceLoader wiring so `spark.read.format("pinot")` keeps resolving after any future META-INF/services touch-up.
- Updates `pinot-distribution/pinot-assembly.xml` to include the Spark 4 plugin jar via a `<fileSet>` that silently no-ops on JDK 11 builds (where the jar was never built): single descriptor, no forked variants.
- Adds `pinot-spark-4` to the `dependency-verifier` `skipModules` list, matching how the other plugin trees opt out.

## Test plan

- [x] `./mvnw -pl pinot-spark-4/pinot-batch-ingestion-spark-4,pinot-spark-4/pinot-spark-4-connector -am test` on JDK 21: 3 junit cases + 25 scalatest cases, all green.
- [x] `./mvnw -pl pinot-integration-tests -am test-compile` on JDK 21: the JDK-21 profile activates, `SparkSegmentMetadataPushIntegrationTest4.class` is produced.
- [x] `./mvnw -pl pinot-connectors/pinot-spark-3-connector test` on JDK 21: Spark 3 tests still pass (25/25), including the newly added `PinotDataSourceRegistrationTest`.
- [x] `./mvnw spotless:apply checkstyle:check license:check`: clean on all modified modules.
- [ ] CI needs to verify `bin-dist` end-to-end on both JDK 11 and JDK 21 (the new `<fileSet>` should be a no-op on JDK 11 and include the shaded jar on JDK 21).
- [ ] CI needs to verify the Spark 4 integration test actually runs under Spark 4 at runtime (the profile's dependencyManagement override is the load-bearing bit here).

## Notes for reviewers

- Scala 2.13 only for Spark 4 (Apache Spark 4 does not publish 2.12 artifacts); the umbrella pom has an enforcer rule that fails fast with a clear message if someone invokes `-pl pinot-spark-4` under `-Pscala-2.12`.
- The `pinot-spark-4-connector` pom enumerates the full `--add-opens` list Spark 4 needs on JDK 17+ (via the scalatest-maven-plugin argLine).
- The `pinot-spark-4/pinot-batch-ingestion-spark-4/pom.xml` pulls in `jakarta.servlet-api:5.0.0` at test scope because Spark 4 uses Jakarta EE 9 namespaces and the root pom otherwise pins javax-namespace 4.0.4 for the rest of the build.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
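
For readers unfamiliar with JDK-gated Maven profiles, the gating described in the summary might look roughly like the fragment below. This is a minimal sketch, not the PR's actual pom: only the `<jdk>[21,)</jdk>` activation and the `pinot-spark-4` module name come from the description above. The profile id and the choice to attach the module through the profile's `<modules>` element are assumptions.

```xml
<!-- Sketch only. The profile id is hypothetical; the PR text confirms just the
     <jdk>[21,)</jdk> activation and the pinot-spark-4 module name. -->
<profiles>
  <profile>
    <id>pinot-spark-4</id>
    <activation>
      <!-- Maven version range: active on JDK 21 and any later JDK -->
      <jdk>[21,)</jdk>
    </activation>
    <modules>
      <!-- Assumed wiring: the module joins the reactor only when the profile is active -->
      <module>pinot-spark-4</module>
    </modules>
  </profile>
</profiles>
```

With a layout like this, running `./mvnw help:active-profiles` on JDK 11 and on JDK 21 is one way to check whether the profile activates as intended.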
