andygrove opened a new issue, #43: URL: https://github.com/apache/datafusion-java/issues/43
### Is your feature request related to a problem or challenge?

DataFusion's DataFrame API offers eight set-operation methods (union, intersect, except, and their `*_by_name` / `*_distinct` variants), and none of them are reachable from Java today.

### Describe the solution you'd like

Expose the following on `DataFrame`, each taking another `DataFrame`:

- `union(DataFrame other)`: by-position, keeps duplicates
- `unionDistinct(DataFrame other)`: by-position, deduplicated
- `unionByName(DataFrame other)`: by-name, keeps duplicates
- `unionByNameDistinct(DataFrame other)`: by-name, deduplicated
- `intersect(DataFrame other)`: `INTERSECT ALL`
- `intersectDistinct(DataFrame other)`: `INTERSECT`
- `except(DataFrame other)`: `EXCEPT ALL`
- `exceptDistinct(DataFrame other)`: `EXCEPT`

Lifecycle question worth deciding up front: do these consume the right-hand DataFrame? DataFusion's Rust API takes `dataframe: DataFrame` (owned), so the Java side will need to either consume `other`'s native handle (and forbid further use, like `collect`) or clone the underlying `LogicalPlan` on the native side. Suggest cloning: the caller contract is simpler, and a `LogicalPlan` clone is cheap.

Add tests in `DataFrameTransformationsTest` covering each variant against small fixtures.

### Describe alternatives you've considered

`UNION` / `INTERSECT` / `EXCEPT` via SQL. This works but requires registering both sides as tables.

### Additional context

All eight share one JNI entry point per operation kind plus boolean flags (by-name, distinct). Could plausibly land as one PR.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
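To make the suggested "clone, don't consume" contract from the issue concrete, here is a minimal pure-Java sketch. It is not the datafusion-java implementation: `SetOpsSketch` and its nested `DataFrame` are hypothetical stand-ins, an immutable list of strings replaces the native `LogicalPlan` handle, and only four of the eight proposed methods are modeled. Row order is preserved purely for readability; a real engine makes no ordering guarantee for set operations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

final class SetOpsSketch {

    /** Hypothetical stand-in for the Java wrapper; rows replace the native plan handle. */
    static final class DataFrame {
        private final List<String> rows;

        DataFrame(List<String> rows) {
            this.rows = List.copyOf(rows); // immutable snapshot, analogous to cloning the plan
        }

        /** By-position union, duplicates kept. Reads `other` but never invalidates it. */
        DataFrame union(DataFrame other) {
            List<String> out = new ArrayList<>(rows);
            out.addAll(other.rows);
            return new DataFrame(out);
        }

        /** By-position union, deduplicated (first occurrence wins). */
        DataFrame unionDistinct(DataFrame other) {
            return new DataFrame(new ArrayList<>(new LinkedHashSet<>(union(other).rows)));
        }

        /** INTERSECT ALL: a row is kept min(left count, right count) times. */
        DataFrame intersect(DataFrame other) {
            Map<String, Integer> remaining = counts(other.rows);
            List<String> out = new ArrayList<>();
            for (String row : rows) {
                int n = remaining.getOrDefault(row, 0);
                if (n > 0) {
                    out.add(row);
                    remaining.put(row, n - 1);
                }
            }
            return new DataFrame(out);
        }

        /** EXCEPT ALL: each right-hand row cancels at most one matching left-hand row. */
        DataFrame except(DataFrame other) {
            Map<String, Integer> remaining = counts(other.rows);
            List<String> out = new ArrayList<>();
            for (String row : rows) {
                int n = remaining.getOrDefault(row, 0);
                if (n > 0) {
                    remaining.put(row, n - 1);
                } else {
                    out.add(row);
                }
            }
            return new DataFrame(out);
        }

        List<String> collect() {
            return rows;
        }

        private static Map<String, Integer> counts(List<String> rows) {
            Map<String, Integer> m = new HashMap<>();
            for (String r : rows) {
                m.merge(r, 1, Integer::sum);
            }
            return m;
        }
    }

    public static void main(String[] args) {
        DataFrame left = new DataFrame(List.of("a", "a", "b"));
        DataFrame right = new DataFrame(List.of("a", "c"));

        System.out.println(left.union(right).collect());         // [a, a, b, a, c]
        System.out.println(left.unionDistinct(right).collect()); // [a, b, c]
        System.out.println(left.intersect(right).collect());     // [a]
        System.out.println(left.except(right).collect());        // [a, b]

        // Under the cloning contract, the right-hand side stays usable afterwards.
        System.out.println(right.collect());                     // [a, c]
    }
}
```

The point of the sketch is the last call: because no operation takes ownership of `other`, the same right-hand DataFrame can feed several set operations and still be collected. Under the consuming alternative, that call would have to fail after the first use.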
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
