andygrove opened a new issue, #42:
URL: https://github.com/apache/datafusion-java/issues/42
### Is your feature request related to a problem or challenge?
Two ordering / layout primitives are missing from the DataFrame API:
- `sort` — no way to order a DataFrame today without dropping to SQL.
- `repartition` — no way to control parallelism / partitioning of a
DataFrame.
### Describe the solution you'd like
**`sort`**. Two ergonomic options are worth considering:
1. SQL-string flavour matching `filter(String)` / proposed
`withColumn`: `df.sort("a ASC, b DESC NULLS FIRST")`. Parsed via
`parse_sql_expr` plus an `ORDER BY` shim, or via the SQL parser's
`parse_order_by`. Cheapest to implement; no Java-side model.
2. Typed: a `SortExpr` Java record (column, ascending, nullsFirst) and
`df.sort(SortExpr... exprs)`. Discoverable, IDE-friendly.
Suggest starting with (1) for consistency with `filter`, then layering
(2) on top if/when an `Expr` builder lands for joins.
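A rough sketch of how option (2) could sit on top of option (1): a hypothetical `SortExpr` record renders itself to the same `ORDER BY`-style string that `df.sort(String)` would accept. The helper names `toSql` and `toOrderBy` are assumptions for illustration, not existing API. Double-quoting column names is also just one possible choice; quoted identifiers are matched case-sensitively, which may or may not be what callers want.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical typed sort key; the components follow the (column, ascending, nullsFirst)
// shape suggested in this issue.
record SortExpr(String column, boolean ascending, boolean nullsFirst) {

    // Render one key as SQL, always spelling out direction and null ordering
    // so the result does not depend on engine defaults.
    String toSql() {
        // Quote the identifier and double any embedded quotes (assumed convention).
        String quoted = "\"" + column.replace("\"", "\"\"") + "\"";
        return quoted
                + (ascending ? " ASC" : " DESC")
                + (nullsFirst ? " NULLS FIRST" : " NULLS LAST");
    }

    // Join several keys into the string form that the SQL-string flavour would take.
    static String toOrderBy(SortExpr... exprs) {
        return Arrays.stream(exprs)
                .map(SortExpr::toSql)
                .collect(Collectors.joining(", "));
    }
}
```

With this shape, `df.sort(SortExpr... exprs)` could in principle delegate to `df.sort(SortExpr.toOrderBy(exprs))`, so the typed overload would need no extra native plumbing.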
**`repartition`**. DataFusion's `Partitioning` enum has three variants:
- `RoundRobinBatch(usize)` — `df.repartitionRoundRobin(n)`
- `Hash(Vec<Expr>, usize)` — `df.repartitionHash(int n, String... columns)`
(column-name flavour to start; expression variant later)
- `UnknownPartitioning(usize)` — not user-facing.
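For the partitioning constructor shape, one alternative to two separate methods is a small Java-side value type, sketched below. Everything here is hypothetical: the `Partitioning` interface, its `RoundRobin` and `Hash` records, the factory names, and the argument validation are illustrative assumptions rather than anything in the current bindings.

```java
import java.util.List;

// Hypothetical Java-side model of the two user-facing partitioning schemes.
sealed interface Partitioning {

    int partitions();

    // Round-robin distribution of batches across n partitions.
    record RoundRobin(int partitions) implements Partitioning {
        public RoundRobin {
            requirePositive(partitions);
        }
    }

    // Hash distribution on one or more column names across n partitions.
    record Hash(int partitions, List<String> columns) implements Partitioning {
        public Hash {
            requirePositive(partitions);
            if (columns.isEmpty()) {
                throw new IllegalArgumentException("hash partitioning needs at least one column");
            }
            columns = List.copyOf(columns); // defensive, immutable copy
        }
    }

    static Partitioning roundRobin(int n) {
        return new RoundRobin(n);
    }

    static Partitioning hash(int n, String... columns) {
        return new Hash(n, List.of(columns));
    }

    // Validation choice made for this sketch: reject non-positive partition counts early.
    private static void requirePositive(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("partition count must be positive, got " + n);
        }
    }
}
```

A single `df.repartition(Partitioning p)` could then switch on the record type, while `repartitionRoundRobin(n)` and `repartitionHash(n, columns...)` remain available as thin convenience wrappers if preferred.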
Tests go in `DataFrameTransformationsTest`: a sort round-trip, and a
partition-count assertion via `collect_partitioned` once that's wired
(or via plan inspection until then).
### Describe alternatives you've considered
`ORDER BY` / `DISTRIBUTE BY` via SQL. Works but loses the lazy
DataFrame composition.
### Additional context
Each carries a small Java-side design choice (sort-expression shape,
partitioning constructor shape); fine to land them as two separate PRs
under this issue if that's cleaner than one batched PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]