andygrove opened a new pull request, #32:
URL: https://github.com/apache/datafusion-java/pull/32
## Which issue does this PR close?
- Closes #.
## Rationale for this change
The repository has no runnable end-to-end examples. Code snippets in the docs
easily drift out of sync with the API, because no build step fails when a
public method is renamed or removed.
Adding an `examples/` Maven module that depends on the library makes the
reactor compile every example on each build, so an API change that breaks an
example also breaks the build. This requires turning the repo into a
multi-module Maven project; while we're at it, the parent POM gains shared
`dependencyManagement` and plugin versions so child modules stay terse.
## What changes are included in this PR?
**Multi-module restructure:**
- Root `pom.xml` becomes the parent (`datafusion-java-parent`,
`packaging=pom`) with `dependencyManagement` for arrow, protobuf, junit, and
the library itself.
- `core/` is a new directory holding the existing library
(`datafusion-java`). `src/` moves to `core/src/`.
- `examples/` is a new module (`datafusion-java-examples`) depending on the
library via `${project.version}`. It wires `exec-maven-plugin` so each example
launches with the right `java.library.path` and `--add-opens` flags.
- `native/`, `proto/`, `Makefile`, and `mvnw` stay at the repo root
unchanged.
- Surefire's `java.library.path` now uses
`${maven.multiModuleProjectDirectory}` so it resolves under the reactor
regardless of which module Maven is invoked from.
- `apache-rat-plugin` runs only at the root
(`<inherited>false</inherited>`); the rat exclude list is unchanged.
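A rough plain-Java sketch of why the launch paths resolve the same way from any module. It assumes the repo root contains a `.mvn/` directory (as Maven Wrapper setups usually do) and mimics how the `mvn` launcher derives `maven.multiModuleProjectDirectory`: it walks up from the working directory until it finds `.mvn/`. A library path built from that root therefore points at the root-level `native/` directory whether Maven runs from the root, `core/`, or `examples/`. The class `ExampleLauncher`, the library directory `native/target/release`, and the exact `--add-opens` value are hypothetical placeholders, not copied from the PR's POM.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; the real module configures exec-maven-plugin in its POM instead. */
public class ExampleLauncher {

    /**
     * Walks up from {@code start} until a directory containing {@code .mvn/} is found,
     * approximating how the Maven launcher computes maven.multiModuleProjectDirectory.
     * Falls back to {@code start} itself if no ancestor has a {@code .mvn/} directory.
     */
    static Path findReactorRoot(Path start) {
        Path dir = start.toAbsolutePath().normalize();
        for (Path p = dir; p != null; p = p.getParent()) {
            if (Files.isDirectory(p.resolve(".mvn"))) {
                return p;
            }
        }
        return dir;
    }

    /** Builds the java command line for one example (paths and flags are assumptions). */
    static List<String> launchCommand(Path reactorRoot, String classpath, String mainClass) {
        // Hypothetical location of the compiled JNI library under the root-level native/ dir.
        Path nativeDir = reactorRoot.resolve("native").resolve("target").resolve("release");
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Djava.library.path=" + nativeDir);
        // Assumed flag: Arrow's memory module needs java.nio opened on recent JDKs.
        cmd.add("--add-opens=java.base/java.nio=ALL-UNNAMED");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) throws IOException {
        // Simulate the repo layout in a temp dir: root/.mvn and root/examples.
        // deleteOnExit runs in reverse registration order, so children go before root.
        Path root = Files.createTempDirectory("reactor");
        root.toFile().deleteOnExit();
        Files.createDirectories(root.resolve(".mvn")).toFile().deleteOnExit();
        Path examples = Files.createDirectories(root.resolve("examples"));
        examples.toFile().deleteOnExit();

        // Starting from the examples module still lands on the reactor root.
        Path found = findReactorRoot(examples);
        List<String> cmd = launchCommand(found, "examples/target/classes",
                "org.apache.datafusion.examples.SqlQueryExample");
        System.out.println(String.join(" ", cmd));
    }
}
```

Anchoring on the reactor root rather than `${project.basedir}` is the point: `${project.basedir}` differs per module, while the root-relative path stays stable.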
**Three runnable examples** under
`examples/src/main/java/org/apache/datafusion/examples/`:
- `SqlQueryExample` — `registerCsv` + a SQL `GROUP BY` aggregation.
- `DataFrameExample` — `readCsv` → `filter` / `select` / `withColumnRenamed`
/ `distinct` → `writeParquet(singleFileOutput)` → `readParquet` round-trip.
- `ProtoPlanExample` — build a `LogicalPlanNode` directly via the generated
protobuf classes and execute it through `SessionContext.fromProto`.
Each example creates its own throwaway data in a temp dir and cleans up, so
no external fixtures (TPC-H, etc.) are required.
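The throwaway-data pattern can be sketched with `java.nio.file` alone: create a temp directory, write a small CSV, run the example body, and delete the tree in a `finally` block so cleanup happens even if the query throws. The class name `ThrowawayData`, the file name `data.csv`, and its columns are hypothetical; the actual examples define their own data and queries.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper illustrating the create-use-clean-up lifecycle. */
public class ThrowawayData {

    /** Writes a small CSV (hypothetical columns) into a fresh temp dir and returns the dir. */
    static Path createFixture() throws IOException {
        Path dir = Files.createTempDirectory("datafusion-example");
        List<String> lines = List.of("id,label", "1,a", "2,b");
        Files.write(dir.resolve("data.csv"), lines, StandardCharsets.UTF_8);
        return dir;
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse lexicographic order puts every file before the directory containing it.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = createFixture();
        try {
            // A real example would hand this path to the library's CSV reader here.
            Path csv = dir.resolve("data.csv");
            System.out.println(csv + " has " + Files.readAllLines(csv).size() + " lines");
        } finally {
            deleteRecursively(dir);
        }
    }
}
```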
**Docs:** `docs/source/contributor-guide/development.md` is updated with the
new repo layout and a "Running an example" section that documents the
`./mvnw install -DskipTests` + `exec:exec` flow.
## Are these changes tested?
- The full JVM test suite (`./mvnw test`) still passes against the relocated
`core/src/test/` sources — 61 tests run, 0 failures (12 skipped, same skip
pattern as `main` when TPC-H data is absent).
- Each example was executed end-to-end via
  `./mvnw -pl :datafusion-java-examples exec:exec` and produces expected
  output:
  - `SqlQueryExample` prints `HIGH 3 215 / MEDIUM 1 60 / LOW 1 25`.
  - `DataFrameExample` prints a 3-row deduped table and
    `Round-tripped row count: 3`.
  - `ProtoPlanExample` prints `42 7`.
- `spotless:check` and the reactor build are clean across all three modules.
## Are there any user-facing changes?
- The published library artifact (`org.apache.datafusion:datafusion-java`)
is unchanged — same `groupId`, `artifactId`, `version`, and package contents.
- The repo layout changes: source paths move from `src/main/java/...` to
`core/src/main/java/...`. IDE projects pointing at the old location need to
re-import.
- A new `datafusion-java-examples` artifact is added, but it sets
  `maven.install.skip=true` and `maven.deploy.skip=true` and is not intended
  for distribution.