andygrove opened a new issue, #8:
URL: https://github.com/apache/datafusion-java/issues/8

   Follow-up to #4.
   
   #4 generates Java classes for the `datafusion-proto` schema. This issue covers 
the end-to-end wiring that makes those classes useful: Java code constructs a 
`LogicalPlanNode`, hands it to `SessionContext`, and gets back a `DataFrame` 
that streams Arrow batches the same way `ctx.sql(...)` does today.
   
   ## Sketch
   
   Three pieces sit between Java-built protobuf and execution:
   
   1. **Java surface.** A new method on `SessionContext`:
      ```java
      public DataFrame fromProto(byte[] planBytes);
      ```
      No other Java work is needed; the protobuf builders generated by #4 serve 
as the public construction API.
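
      As a usage sketch (the builder and message names below are assumptions about what protoc emits for the `datafusion-proto` schema, and `fromProto` is the method proposed above):
      ```java
      // Sketch only: generated builder/message names are assumptions based on
      // the datafusion-proto schema; fromProto(...) is the method proposed above.
      byte[] planBytes = LogicalPlanNode.newBuilder()
          .setListingScan(ListingTableScanNode.newBuilder().setTableName("lineitem"))
          .build()
          .toByteArray();

      try (SessionContext ctx = SessionContexts.create()) {
          DataFrame df = ctx.fromProto(planBytes);
          // consume Arrow batches exactly as with ctx.sql(...)
      }
      ```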
   
   2. **JNI bridge.** A single new native method that takes the byte array and 
returns a `DataFrame` pointer, modeled on the existing `sql()` JNI shim. Arrow 
FFI stays on the result path; the plan crosses JNI as a primitive `byte[]`.
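
      The Java half of that bridge might look like the following (a sketch; everything except `fromProto` is an assumption about how the existing shims are structured):
      ```java
      // Sketch: a single native entry point, mirroring the existing sql() shim.
      // The pointer-plumbing method names are assumptions, not existing API.
      static native long fromProtoPlan(long runtimePointer, long contextPointer, byte[] planBytes);

      public DataFrame fromProto(byte[] planBytes) {
          long dataFramePointer = fromProtoPlan(getRuntime().getPointer(), getPointer(), planBytes);
          return new DefaultDataFrame(this, dataFramePointer);
      }
      ```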
   
   3. **Rust deserialization + execution.** Add `datafusion-proto = "53"` to 
`native/Cargo.toml`, then in the JNI implementation:
      ```rust
      // decode() comes from the prost Message trait
      use prost::Message;
      use datafusion_proto::logical_plan::{AsLogicalPlan, DefaultLogicalExtensionCodec};

      let node = datafusion_proto::protobuf::LogicalPlanNode::decode(&bytes[..])?;
      let plan = node.try_into_logical_plan(&ctx.state(), &DefaultLogicalExtensionCodec {})?;
      let df = runtime.block_on(ctx.execute_logical_plan(plan))?;
      ```
      Wrap the resulting `DataFrame` the same way `sql()` does and return its 
pointer.
   
   ## Open design questions for this issue
   
   - **Logical vs. physical plan first.** Recommend starting with logical plans 
so DataFusion's own optimizer still runs; physical plan submission is a 
follow-up.
   - **Schema discovery.** A JVM caller building a `TableScanNode` needs the 
schema of registered tables. Likely a new JNI shim like 
`SessionContext.tableSchema(name) -> 
org.apache.arrow.vector.types.pojo.Schema`. Without this, the API is not usable 
for anything but trivial plans.
   - **Extension codec.** `DefaultLogicalExtensionCodec` is fine for now; a 
real codec arrives with Java-defined UDFs (the fourth roadmap item).
   - **Round-trip tests.** Build a plan in Java, execute it, and compare 
results against the same query run via `ctx.sql(...)` over the TPC-H 
integration data.
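
   The round-trip test could be sketched as follows (assumptions: `fromProto` and the generated builders from above, plus hypothetical helpers for registering the TPC-H data and collecting/comparing batches):
   ```java
   // Sketch: execute the same scan via protobuf and via SQL, then compare.
   // registerTpchTables(...), collect(...), and assertBatchesEqual(...) are
   // hypothetical helpers; fromProto(...) is the method proposed above.
   try (SessionContext ctx = SessionContexts.create()) {
       registerTpchTables(ctx);

       byte[] planBytes = LogicalPlanNode.newBuilder()
           .setListingScan(ListingTableScanNode.newBuilder().setTableName("nation"))
           .build()
           .toByteArray();

       List<VectorSchemaRoot> viaProto = collect(ctx.fromProto(planBytes));
       List<VectorSchemaRoot> viaSql = collect(ctx.sql("SELECT * FROM nation").join());
       assertBatchesEqual(viaSql, viaProto);
   }
   ```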
   
   ## Out of scope here
   
   - Java-defined UDFs and their custom extension codec.
   - Physical plan submission.
   - Any builder helpers / fluent API on top of the raw protobuf builders.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

