andygrove opened a new pull request, #13: URL: https://github.com/apache/datafusion-java/pull/13
Stacked on #9. Closes #8.

> Note: This PR is stacked on #9 and includes its commits in the diff. The 3 new commits unique to this PR live at the tip of the branch (the eight commits since `proto-build`). Once #9 merges, this PR's diff will narrow automatically.

Adds the wiring to make the generated `datafusion.LogicalPlanNode` classes from #9 executable. A JVM caller constructs a `LogicalPlanNode` with the generated builders, hands its serialized bytes to `SessionContext.fromProto(byte[])`, and gets back a `DataFrame` that streams Arrow batches via the existing `DataFrame.collect()` path.

To make plans that reference parquet files practical, this PR also ships `SessionContext.tableSchema(String)` (Arrow schema of a registered table, transferred via Arrow IPC) and `org.apache.datafusion.proto.SchemaConverter` (Arrow Schema ↔ `datafusion_common.Schema` proto), so the caller can populate `ListingTableScanNode.schema` without hand-coding it.

## What's in this PR (on top of #9)

- `native/Cargo.toml`: `datafusion-proto = "53.1.0"`, `prost = "0.14"` (the version `datafusion-proto 53.1.0` requires).
- `native/src/proto.rs`: two JNI methods, `createDataFrameFromProto` (decode + `try_into_logical_plan` + `execute_logical_plan`) and `tableSchemaIpc` (writes the schema via `arrow::ipc::StreamWriter` and returns the bytes).
- `SessionContext.fromProto(byte[])` and `SessionContext.tableSchema(String)`.
- `SchemaConverter`: pure Java, supports Bool / signed+unsigned Int 8..64 / Float32/64 / Utf8 / Utf8View / LargeUtf8 / Date32 / Decimal128 plus field/schema metadata; anything else raises `UnsupportedOperationException` with a message naming the type.
- Tests: `SchemaConverterTest` (3 tests, no DataFusion), `SessionContextProtoTest` (smoke test with `Projection(literal 1) over EmptyRelation`, `tableSchema` against lineitem, integration test that builds a `ListingTableScanNode` and compares its output to identical SQL).
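
For readers who want a feel for the "supported types or a clear exception" behaviour described for `SchemaConverter`, here is a minimal, self-contained sketch. It is a toy model, not the code in this PR: the class name `TypeAllowListSketch`, the use of plain strings for type names, and the exact exception message are all assumptions made for illustration. It assumes the type list above maps to the usual Arrow names (`Int8` through `Int64`, `UInt8` through `UInt64`, and so on).

```java
import java.util.Set;

/** Toy allow-list converter; hypothetical, not the PR's SchemaConverter. */
final class TypeAllowListSketch {
    // Type names taken from the list in the PR description.
    private static final Set<String> SUPPORTED = Set.of(
            "Bool",
            "Int8", "Int16", "Int32", "Int64",
            "UInt8", "UInt16", "UInt32", "UInt64",
            "Float32", "Float64",
            "Utf8", "Utf8View", "LargeUtf8",
            "Date32", "Decimal128");

    /**
     * Returns the type name unchanged when it is supported. A real converter
     * would build a proto message here; this sketch only models the check.
     */
    static String convert(String arrowTypeName) {
        if (!SUPPORTED.contains(arrowTypeName)) {
            // Message wording is an assumption; the point is that it names the type.
            throw new UnsupportedOperationException(
                    "Unsupported Arrow type: " + arrowTypeName);
        }
        return arrowTypeName;
    }

    public static void main(String[] args) {
        System.out.println(convert("Int64"));
        try {
            convert("Struct");
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast with the offending type in the message means a caller who hits an uncovered nested or temporal type learns which one is missing, rather than getting a silently incomplete schema.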
## Not in this PR

- Physical-plan submission (`PhysicalPlanNode`).
- Custom `LogicalExtensionCodec` for JVM-defined UDFs.
- JVM-side fluent plan builder.
- Nested + temporal type coverage in `SchemaConverter` (raises a clear exception until extended).

## Design note

`datafusion-proto`'s plan deserializer is portable: it reconstructs a fresh `TableProvider` (here, `ListingTable`) from the proto's `paths` + format + schema. It does NOT look up `tableName` against the SessionContext's registered tables; that field is purely a label for query-plan display. The integration test happens to call `registerParquet` first only so `tableSchema("lineitem")` can fetch the schema; the proto plan itself would also execute on a fresh context that never registered `lineitem`.
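
The portability point in the design note can be illustrated with a small toy model. Everything in it is hypothetical (the `ScanNode` and `Provider` classes, the `resolve` method, the example path and column); it does not use `datafusion-proto` or this PR's API. It only shows the shape of the behaviour: the provider is rebuilt from the node's own paths, format, and schema, and the table registry is never consulted.

```java
import java.util.List;
import java.util.Map;

/** Toy model of a self-describing scan node; not the datafusion-proto API. */
final class PortableScanSketch {
    /** Stand-in for the scan-node fields mentioned in the design note. */
    static final class ScanNode {
        final String tableName; // display label only
        final List<String> paths;
        final String format;
        final List<String> schema;

        ScanNode(String tableName, List<String> paths, String format, List<String> schema) {
            this.tableName = tableName;
            this.paths = paths;
            this.format = format;
            this.schema = schema;
        }
    }

    /** Stand-in for a table provider reconstructed during deserialization. */
    static final class Provider {
        final List<String> paths;
        final String format;
        final List<String> schema;

        Provider(List<String> paths, String format, List<String> schema) {
            this.paths = paths;
            this.format = format;
            this.schema = schema;
        }
    }

    /** Builds a provider from the node alone; registeredTables is deliberately unused. */
    static Provider resolve(ScanNode node, Map<String, Provider> registeredTables) {
        return new Provider(node.paths, node.format, node.schema);
    }

    public static void main(String[] args) {
        // Hypothetical path and column, for illustration only.
        ScanNode node = new ScanNode("lineitem",
                List.of("/data/lineitem.parquet"), "parquet", List.of("l_orderkey: Int64"));
        Provider onFreshContext = resolve(node, Map.of());
        System.out.println(onFreshContext.paths + " as " + onFreshContext.format);
    }
}
```

In this model, resolving the same node against an empty registry and against one that already contains `lineitem` gives the same provider, which mirrors the claim that the plan would also run on a context that never registered the table.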
