LantaoJin opened a new issue, #75: URL: https://github.com/apache/datafusion-java/issues/75
### Is your feature request related to a problem or challenge?

`SessionContext.fromProto(byte[])` accepts only DataFusion's *own* `LogicalPlanNode` proto. [Substrait](https://substrait.io/), the cross-engine logical-plan standard that DataFusion already supports through the [`datafusion-substrait`](https://crates.io/crates/datafusion-substrait) crate, has no Java-side entry point.

### Describe the solution you'd like

Add a single new entry point on `SessionContext`:

```java
public DataFrame fromSubstrait(byte[] planBytes);
```

`planBytes` is a serialized `substrait.proto.Plan` message. The implementation deserialises with `prost`, hands the resulting `Plan` to `datafusion_substrait::logical_plan::consumer::from_substrait_plan(state, &plan).await`, and wraps the resulting `LogicalPlan` in a `DataFrame` exactly the way `fromProto` does for DataFusion's native plan format. From there callers compose the usual `DataFrame` API:

```java
byte[] plan = compileWithCalcite(...); // any Substrait-emitting compiler
try (DataFrame df = ctx.fromSubstrait(plan);
     ArrowReader reader = df.executeStream(allocator)) {
    while (reader.loadNextBatch()) {
        /* ... */
    }
}
```

To avoid bloating builds for users who don't need it, the `datafusion-substrait` Cargo dependency lives behind a `substrait` Cargo feature on the `datafusion-jni` crate. The feature is on by default, so Maven users get Substrait support out of the box, but downstream Rust embedders who strip features can opt out, and the JNI handler returns a clear error if invoked in a build that compiled the feature off.

### Describe alternatives you've considered

_No response_

### Additional context

- DataFusion 53.1 pulls `substrait = 0.62.2` (proto schema version). The Rust API is stable for the consumer side: `from_substrait_plan(state: &SessionState, plan: &Plan) -> Result<LogicalPlan>` plus `serializer::deserialize_bytes(Vec<u8>) -> Result<Box<Plan>>` for the wire-format decode.
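To make the feature-off behaviour concrete, here is a minimal, self-contained sketch of how the Java side might surface that error. Everything in it is hypothetical and for illustration only: `SubstraitEntrySketch`, `NativeBridge`, `FEATURE_ON`, `FEATURE_OFF`, the exception type, and the message text are not part of datafusion-java. `NativeBridge` stands in for the JNI call so the example runs without the native library, and the `long` return value stands in for whatever plan handle the real binding would produce.

```java
import java.util.Objects;

/** Hypothetical sketch only; none of these names exist in datafusion-java. */
public class SubstraitEntrySketch {

    /** Stand-in for the JNI call. The real binding would be a native method. */
    public interface NativeBridge {
        /** Returns an opaque plan handle, or throws if Substrait support is unavailable. */
        long planFromSubstrait(byte[] planBytes);
    }

    /** Mimics a native build compiled without the `substrait` Cargo feature. */
    public static final NativeBridge FEATURE_OFF = planBytes -> {
        throw new UnsupportedOperationException(
            "Substrait support is not compiled into this build; "
                + "rebuild datafusion-jni with the `substrait` feature enabled");
    };

    /** Mimics a build with the feature on; returns a dummy handle, not a real plan. */
    public static final NativeBridge FEATURE_ON = planBytes -> 1L;

    /** Validates the argument on the Java side, then delegates to the bridge. */
    public static long fromSubstrait(NativeBridge bridge, byte[] planBytes) {
        Objects.requireNonNull(bridge, "bridge");
        Objects.requireNonNull(planBytes, "planBytes");
        return bridge.planFromSubstrait(planBytes);
    }

    public static void main(String[] args) {
        byte[] plan = new byte[] {0x0a, 0x00}; // placeholder bytes, not a meaningful plan

        System.out.println("handle = " + fromSubstrait(FEATURE_ON, plan));

        try {
            fromSubstrait(FEATURE_OFF, plan);
        } catch (UnsupportedOperationException e) {
            // Callers see a descriptive message instead of an unresolved-symbol failure.
            System.out.println("error = " + e.getMessage());
        }
    }
}
```

Failing with a descriptive exception when the native side lacks the feature is one possible design; whether the real handler maps the error to this exception type or to an existing DataFusion exception is left open here.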
