andygrove opened a new issue, #8:
URL: https://github.com/apache/datafusion-java/issues/8
Follow-up to #4.
#4 generates Java classes for the `datafusion-proto` schema. This issue covers
the end-to-end wiring that makes those classes useful: Java code constructs a
`LogicalPlanNode`, hands it to `SessionContext`, and gets back a `DataFrame`
that streams Arrow batches the same way `ctx.sql(...)` does today.
## Sketch
Three pieces sit between Java-built protobuf and execution:
1. **Java surface.** A new method on `SessionContext`:
```java
public DataFrame fromProto(byte[] planBytes);
```
No other Java work is needed — the protobuf builders generated by #4 are
the public construction API.
2. **JNI bridge.** A single new native method that takes the byte array and
returns a `DataFrame` pointer, modeled on the existing `sql()` JNI shim. Arrow
FFI stays on the result path; the plan crosses JNI as a primitive `byte[]`.
3. **Rust deserialization + execution.** Add `datafusion-proto = "53"` to
`native/Cargo.toml`, then in the JNI implementation:
```rust
use datafusion_proto::logical_plan::{AsLogicalPlan, DefaultLogicalExtensionCodec};
use datafusion_proto::protobuf::LogicalPlanNode;
use prost::Message; // brings `decode` into scope

// Decode the raw bytes handed across JNI into the protobuf plan node.
let node = LogicalPlanNode::decode(&bytes[..])?;
// Resolve table references, functions, etc. against the session state.
let plan = node.try_into_logical_plan(&ctx.state(), &DefaultLogicalExtensionCodec {})?;
// Execute through the session, which runs DataFusion's own optimizer.
let df = runtime.block_on(ctx.execute_logical_plan(plan))?;
```
Wrap the resulting `DataFrame` the same way `sql()` does and return its
pointer.
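
Seen from the JVM side, the three pieces compose into a call path like the
sketch below. Everything here is hypothetical until #4's generated classes
land: the `org.apache.datafusion.protobuf` package name, the builder shape,
and `fromProto` itself are proposals, not existing API.

```java
// Hypothetical end-to-end usage; none of these classes exist yet.
import org.apache.datafusion.DataFrame;
import org.apache.datafusion.SessionContext;
import org.apache.datafusion.protobuf.LogicalPlanNode; // generated by #4

SessionContext ctx = SessionContexts.create();
byte[] planBytes = LogicalPlanNode.newBuilder()
        // ... populate scan / projection / filter nodes here ...
        .build()
        .toByteArray(); // standard protobuf-java serialization
DataFrame df = ctx.fromProto(planBytes);
// From here the DataFrame streams Arrow batches exactly like ctx.sql(...).
```

The byte array is the only thing that crosses JNI on the way in, which keeps
the bridge surface to a single native method.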
## Open design questions for this issue
- **Logical vs. physical plan first.** Recommend logical so DataFusion's own
optimizer runs; physical is a follow-up.
- **Schema discovery.** A JVM caller building a `TableScanNode` needs the
schema of registered tables. Likely a new JNI shim like
`SessionContext.tableSchema(name) ->
org.apache.arrow.vector.types.pojo.Schema`. Without this, the API is not usable
for anything but trivial plans.
- **Extension codec.** `DefaultLogicalExtensionCodec` is fine for now; a
real codec arrives with Java-defined UDFs (the fourth roadmap item).
- **Round-trip tests.** Build a plan in Java, execute it, and compare
results against the same query run via `ctx.sql(...)` over the TPC-H
integration data.
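
The round-trip test above could take roughly this shape. Again a sketch:
`fromProto` and `tableSchema` are the proposed, not-yet-existing methods,
and `buildScanAndFilterPlan` / `compareBatches` are hypothetical test helpers.

```java
// Sketch of the proposed round-trip test against the TPC-H data.
Schema schema = ctx.tableSchema("lineitem");        // proposed JNI shim
byte[] planBytes = buildScanAndFilterPlan(schema);  // hypothetical builder code
DataFrame viaProto = ctx.fromProto(planBytes);
DataFrame viaSql = ctx.sql("SELECT ... FROM lineitem WHERE ...").join();
// Both paths should yield identical Arrow batches.
compareBatches(viaProto.collect(allocator), viaSql.collect(allocator));
```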
## Out of scope here
- Java-defined UDFs and their custom extension codec.
- Physical plan submission.
- Any builder helpers / fluent API on top of the raw protobuf builders.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]