andygrove opened a new issue, #45:
URL: https://github.com/apache/datafusion-java/issues/45

   ### Is your feature request related to a problem or challenge?
   
   DataFrame introspection is currently limited to `count()`, `show()`,
   and `show(int)`. Users wanting to inspect schema, see the planned
   query, or materialize an intermediate result have no Java entry point.
   
   ### Describe the solution you'd like
   
   - **`DataFrame.schema()`** — return an Arrow `Schema`. Reuse the IPC
     round-trip already established for `SessionContext.tableSchema`
     (`tableSchemaIpc` pattern in `SessionContext.java`). Non-consuming;
     the DataFrame remains usable.
   - **`DataFrame.explain(boolean verbose, boolean analyze)`** — wraps
     DataFusion's Rust `DataFrame::explain`, returning a `DataFrame` whose
     rows are the plan-explanation strings. The caller then calls `show()`
     or `collect()` on the result. Matches DataFusion's own semantics.
   - **`DataFrame.cache()`** — materializes the plan into an in-memory
     table and returns a new DataFrame. Async on the Rust side; the Java
     call blocks on the Tokio runtime, same pattern as `collect`. The
     caller closes the returned DataFrame.
   - **`DataFrame.describe()`** — async, returns a DataFrame with summary
     stats (count, mean, stddev, min, max) per numeric column. Same
     pattern as `cache`.
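   
   The ownership rules in the list above can be sketched with a small
   stand-in class. This is only an illustration: `SketchFrame` and
   `IntrospectionSketch` are hypothetical names, the schema is modeled as a
   list of column names instead of an Arrow `Schema`, and the plan text is a
   placeholder, not real DataFusion output. It assumes the proposed
   `DataFrame` would be `AutoCloseable`, which matches the "caller closes"
   wording but is not confirmed here.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   // Hypothetical stand-in for the proposed API; not the real datafusion-java classes.
   public class IntrospectionSketch {
   
       // Stand-in for DataFrame. The real class would wrap a native handle, and
       // schema() would return an Arrow Schema rather than a list of names.
       static final class SketchFrame implements AutoCloseable {
           private final List<String> columns;
           private final List<List<String>> rows;
           private boolean closed;
   
           SketchFrame(List<String> columns, List<List<String>> rows) {
               this.columns = List.copyOf(columns);
               this.rows = List.copyOf(rows);
           }
   
           // Non-consuming: the receiver stays usable afterwards.
           List<String> schema() {
               ensureOpen();
               return columns;
           }
   
           // Returns a new frame whose rows are plan-explanation strings.
           // The plan text is a placeholder, not real DataFusion output.
           SketchFrame explain(boolean verbose, boolean analyze) {
               ensureOpen();
               String plan = "placeholder plan (verbose=" + verbose
                       + ", analyze=" + analyze + ")";
               return new SketchFrame(
                       List.of("plan_type", "plan"),
                       List.of(List.of("logical_plan", plan)));
           }
   
           // Copies the rows into a new frame that the caller must close.
           SketchFrame cache() {
               ensureOpen();
               return new SketchFrame(columns, new ArrayList<>(rows));
           }
   
           List<List<String>> collect() {
               ensureOpen();
               return rows;
           }
   
           boolean isClosed() {
               return closed;
           }
   
           @Override
           public void close() {
               closed = true;
           }
   
           private void ensureOpen() {
               if (closed) {
                   throw new IllegalStateException("DataFrame is closed");
               }
           }
       }
   
       public static void main(String[] args) {
           try (SketchFrame df = new SketchFrame(
                   List.of("id", "name"), List.of(List.of("1", "a")))) {
               System.out.println(df.schema());
               // Derived frames are owned by the caller and closed independently.
               try (SketchFrame plan = df.explain(false, false);
                    SketchFrame cached = df.cache()) {
                   System.out.println(plan.collect());
                   System.out.println(cached.collect());
               }
               // The source frame is still open after the derived frames close.
               System.out.println(df.schema());
           }
       }
   }
   ```
   
   Using try-with-resources for the derived frames keeps the "caller closes"
   contract visible at the call site while the source frame stays usable.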
   
   ### Describe alternatives you've considered
   
   For `schema`: `ctx.tableSchema(name)` works only for *registered*
   tables. A user who built a DataFrame via `sql("SELECT …")` or chained
   transformations has no schema accessor.
   
   For `explain`: `ctx.sql("EXPLAIN <query>")` works but only against a
   SQL string.
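   
   To make the limitation concrete, here is a minimal sketch of what the
   SQL-string workaround amounts to. `ExplainSql` and `explainSql` are
   hypothetical helpers, not part of datafusion-java; the keyword order
   follows DataFusion's `EXPLAIN [ANALYZE] [VERBOSE] statement` syntax. The
   helper needs the query as text, so it has nothing to work with when the
   DataFrame was built through chained transformations.
   
   ```java
   public class ExplainSql {
   
       // Hypothetical helper: prefixes a SQL query with the EXPLAIN keywords
       // that correspond to the proposed verbose/analyze flags.
       static String explainSql(String query, boolean verbose, boolean analyze) {
           StringBuilder sb = new StringBuilder("EXPLAIN ");
           if (analyze) {
               sb.append("ANALYZE ");
           }
           if (verbose) {
               sb.append("VERBOSE ");
           }
           return sb.append(query).toString();
       }
   
       public static void main(String[] args) {
           // prints "EXPLAIN SELECT 1"
           System.out.println(explainSql("SELECT 1", false, false));
           // prints "EXPLAIN ANALYZE VERBOSE SELECT 1"
           System.out.println(explainSql("SELECT 1", true, true));
       }
   }
   ```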
   
   ### Additional context
   
   All four are non-consuming except `cache` and `describe` (which return
   new DataFrames the caller owns and closes). The schema/explain pair are
   the most-requested and could land first as a smaller PR; `cache` and
   `describe` are independent and can follow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

