andygrove opened a new pull request, #32:
URL: https://github.com/apache/datafusion-java/pull/32

   ## Which issue does this PR close?
   
   - Closes #.
   
   ## Rationale for this change
   
   The repository lacks runnable end-to-end examples. Code snippets in the docs 
easily drift out of sync with the API, because no build step fails when a 
public method is renamed or removed.
   
   Adding an `examples/` Maven module that depends on the library lets the 
reactor compile every example on each build, so a rename or removal that 
breaks an example fails the build instead of going unnoticed. Doing this 
requires the repo to be a multi-module Maven project; while we're there, the 
parent POM gets shared `dependencyManagement` and plugin versions so child 
modules stay terse.
   
   ## What changes are included in this PR?
   
   **Multi-module restructure:**
   
   - Root `pom.xml` becomes the parent (`datafusion-java-parent`, 
`packaging=pom`) with `dependencyManagement` for arrow, protobuf, junit, and 
the library itself.
   - `core/` is a new directory holding the existing library 
(`datafusion-java`). `src/` moves to `core/src/`.
   - `examples/` is a new module (`datafusion-java-examples`) depending on the 
library via `${project.version}`. It wires `exec-maven-plugin` so each example 
launches with the right `java.library.path` and `--add-opens` flags.
   - `native/`, `proto/`, `Makefile`, and `mvnw` stay at the repo root 
unchanged.
   - Surefire's `java.library.path` now uses 
`${maven.multiModuleProjectDirectory}` so it resolves under the reactor 
regardless of which module Maven is invoked from.
   - `apache-rat-plugin` runs only at the root 
(`<inherited>false</inherited>`); the rat exclude list is unchanged.
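
The launch setup described above comes down to starting a JVM with the native library directory on `java.library.path`, resolved from the reactor root rather than the current module, plus the `--add-opens` flags Arrow needs. The plain-Java sketch below assembles such a command line. It is not the PR's POM configuration: the `ExampleLauncher` class, the native output directory `native/target/release`, and the specific `--add-opens=java.base/java.nio=ALL-UNNAMED` value are assumptions for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ExampleLauncher {

    /**
     * Builds a java command line for one example class. rootDir plays the role
     * that ${maven.multiModuleProjectDirectory} plays in the build: paths are
     * resolved against the reactor root, not the module Maven was started from.
     */
    static List<String> buildCommand(Path rootDir, String classpath, String mainClass) {
        // Hypothetical location of the compiled native library.
        Path nativeDir = rootDir.resolve("native").resolve("target").resolve("release");
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Djava.library.path=" + nativeDir.toAbsolutePath());
        // Assumed flag: Arrow's memory code needs reflective access to java.nio on newer JDKs.
        cmd.add("--add-opens=java.base/java.nio=ALL-UNNAMED");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical checkout location and classpath.
        List<String> cmd = buildCommand(
                Paths.get("/work/datafusion-java"),
                "examples/target/classes",
                "org.apache.datafusion.examples.SqlQueryExample");
        System.out.println(String.join(" ", cmd));
    }
}
```

Passing the root directory in explicitly means the resulting `java.library.path` is the same no matter which module directory the build was launched from.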
   
   **Three runnable examples** under 
`examples/src/main/java/org/apache/datafusion/examples/`:
   
   - `SqlQueryExample` — `registerCsv` + a SQL `GROUP BY` aggregation.
   - `DataFrameExample` — `readCsv` → `filter` / `select` / `withColumnRenamed` 
/ `distinct` → `writeParquet(singleFileOutput)` → `readParquet` round-trip.
   - `ProtoPlanExample` — build a `LogicalPlanNode` directly via the generated 
protobuf classes and execute it through `SessionContext.fromProto`.
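
The `GROUP BY` aggregation in `SqlQueryExample` yields one row per group with a count and a sum. For comparison, the plain-Java sketch below computes the same shape of result with `Collectors.groupingBy` over in-memory rows. It does not use the datafusion-java API; the `Order` type, its `priority` and `amount` columns, and the sample values are hypothetical, since the PR does not show the example's schema or data.

```java
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupByAnalogue {

    /** Hypothetical row type standing in for one CSV record. */
    static final class Order {
        final String priority;
        final long amount;

        Order(String priority, long amount) {
            this.priority = priority;
            this.amount = amount;
        }
    }

    /**
     * In-memory equivalent of:
     *   SELECT priority, COUNT(*), SUM(amount) FROM orders GROUP BY priority
     * Returns priority -> {count, sum}, sorted by key for stable output.
     */
    static Map<String, long[]> countAndSum(List<Order> rows) {
        Map<String, LongSummaryStatistics> stats = rows.stream()
                .collect(Collectors.groupingBy(
                        o -> o.priority,
                        TreeMap::new,
                        Collectors.summarizingLong(o -> o.amount)));
        Map<String, long[]> out = new TreeMap<>();
        stats.forEach((k, s) -> out.put(k, new long[] {s.getCount(), s.getSum()}));
        return out;
    }

    public static void main(String[] args) {
        List<Order> rows = List.of(
                new Order("HIGH", 10), new Order("HIGH", 20),
                new Order("LOW", 5), new Order("MEDIUM", 7));
        // Prints one "<priority> <count> <sum>" line per group.
        countAndSum(rows).forEach((k, v) -> System.out.println(k + " " + v[0] + " " + v[1]));
    }
}
```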
   
   Each example creates its own throwaway data in a temp dir and cleans up, so 
no external fixtures (TPC-H, etc.) are required.
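
The temp-dir pattern can be sketched with `java.nio.file` alone: create a directory, write the input data, run the example body, and delete everything in a `finally` block so a failing example still cleans up. The class name, file name, and CSV columns below are hypothetical, and the library calls a real example would make are left as a placeholder comment.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

public class TempDataExample {

    /** Writes a small CSV file (hypothetical columns) into dir and returns its path. */
    static Path writeSampleCsv(Path dir) throws IOException {
        Path csv = dir.resolve("orders.csv");
        Files.write(csv, List.of("priority,amount", "HIGH,10", "LOW,5"));
        return csv;
    }

    /** Deletes dir and everything under it, deepest paths first. */
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("datafusion-example-");
        try {
            Path csv = writeSampleCsv(dir);
            // Placeholder: an example would hand `csv` to the library at this point.
            System.out.println("Wrote " + Files.readAllLines(csv).size() + " lines to " + csv);
        } finally {
            // Runs even if the body throws, so no temp data is left behind.
            deleteRecursively(dir);
        }
    }
}
```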
   
   **Docs:** `docs/source/contributor-guide/development.md` is updated with the 
new repo layout and a "Running an example" section that documents the `./mvnw 
install -DskipTests` + `exec:exec` flow.
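
The documented flow is two Maven invocations run in order. Installing first presumably matters because, when `-pl` selects only the examples module, Maven 3 resolves the `datafusion-java` dependency from the local repository instead of building it in the same run. The plain-Java sketch below shows that sequencing; the command strings come from this PR, while the `RunExampleFlow` class and the stop-on-first-failure behavior are illustrative assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

public class RunExampleFlow {

    /** The two Maven invocations, in the order they must run. */
    static List<List<String>> steps() {
        return List.of(
                List.of("./mvnw", "install", "-DskipTests"),
                List.of("./mvnw", "-pl", ":datafusion-java-examples", "exec:exec"));
    }

    /** Runs each step from repoRoot with inherited stdout/stderr; stops at the first failure. */
    static int run(File repoRoot) throws IOException, InterruptedException {
        for (List<String> step : steps()) {
            Process p = new ProcessBuilder(step)
                    .directory(repoRoot)
                    .inheritIO()
                    .start();
            int code = p.waitFor();
            if (code != 0) {
                System.err.println("Step failed (" + code + "): " + String.join(" ", step));
                return code;
            }
        }
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // Assumes the current directory is the repository root.
        System.exit(run(new File(".")));
    }
}
```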
   
   ## Are these changes tested?
   
   - The full JVM test suite (`./mvnw test`) still passes against the relocated 
`core/src/test/` sources — 61 tests run, 0 failures (12 skipped, same skip 
pattern as `main` when TPC-H data is absent).
   - Each example was executed end-to-end via `./mvnw -pl 
:datafusion-java-examples exec:exec` and produces expected output:
     - `SqlQueryExample` prints `HIGH 3 215 / MEDIUM 1 60 / LOW 1 25`.
     - `DataFrameExample` prints a 3-row deduped table and `Round-tripped row 
count: 3`.
     - `ProtoPlanExample` prints `42  7`.
   - `spotless:check` and the reactor build are clean across all three modules.
   
   ## Are there any user-facing changes?
   
   - The published library artifact (`org.apache.datafusion:datafusion-java`) 
is unchanged — same `groupId`, `artifactId`, `version`, and package contents.
   - The repo layout changes: source paths move from `src/main/java/...` to 
`core/src/main/java/...`. IDE projects pointing at the old location need to 
re-import.
   - A new `datafusion-java-examples` artifact exists but is marked 
`maven.install.skip=true` / `maven.deploy.skip=true` and is not intended for 
distribution.


-- 
This is an automated message from the Apache Git Service.