avantgardnerio opened a new pull request, #23090:
URL: https://github.com/apache/datafusion/pull/23090

   ## Which issue does this PR close?
   
   Implements the proposal in #23089. (Not using `Closes #23089` so the 
discussion thread can stay open for the broader API conversation if reviewers 
want.)
   
   ## Rationale for this change
   
   See #23089 for the full design rationale, including the dual-semantics 
motivation, alternatives considered, and coexistence with 
`Partitioning::Range`. Short version: range-aware operators (parallel window 
functions, future dynamic-range repartitioning, range-elimination 
optimizations) want to ask an `ExecutionPlan` for the lex-min / lex-max of a 
partition's output along its declared ordering. Today there's no way to ask.
   
   ## What changes are included in this PR?
   
   Pure addition; zero behavior change in any path that doesn't call the new 
method.
   
   - `ExecutionPlan::runtime_partition_extrema(&self, partition) -> 
Result<Option<PartitionExtrema>>` — default `Ok(None)`.
   - `PartitionExtrema { kind, min, max, row_count }` and `enum ExtremaKind { 
Observed, Expanded }`. `Observed` (the only kind any operator in this PR 
returns) means the reported range literally bounds the partition's data. 
`Expanded` is reserved for future operators that deliberately route rows 
outside the reported range as a "halo" for a downstream filter to strip. The 
dual semantics live on the enum so passthroughs that don't care don't have to 
`match`.
   - `SortExec` override: a per-partition slot is populated inside the sort 
code path (each `sort_batch_chunked` call folds first/last sorted rows into the 
slot, zero-copy via `RecordBatch::slice`). Once execution has consumed the 
input, the slot holds the lex-min / lex-max along the declared ordering.
   - `BoundedWindowAggExec` override: passthrough. BWAG extends its input's 
equivalence properties and appends new window-result columns on the right of 
the schema, so the leading sort exprs remain stable in the output along the 
same column indices.
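
The slot-folding idea in the `SortExec` bullet can be sketched roughly as follows. This is a simplified illustration, not the PR's code: `Row` (a `Vec<i64>` of sort-key values), the field types, the way `row_count` is accumulated, and the helper name `fold_sorted_chunk` are all assumptions made for the example, and it assumes an ascending ordering. The PR itself works on Arrow `RecordBatch` slices.

```rust
/// Hypothetical stand-in for one row of sort-key values.
/// `Vec<i64>` compares lexicographically, which mirrors a multi-column ASC ordering.
type Row = Vec<i64>;

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExtremaKind {
    /// The reported range literally bounds the partition's data.
    Observed,
    /// Reserved: rows may deliberately fall outside the reported range.
    Expanded,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct PartitionExtrema {
    kind: ExtremaKind,
    min: Row,
    max: Row,
    row_count: usize, // assumption: total rows folded so far
}

/// Fold one already-sorted chunk into the per-partition slot.
/// Only the first and last rows of the chunk are inspected.
fn fold_sorted_chunk(slot: &mut Option<PartitionExtrema>, chunk: &[Row]) {
    let (Some(first), Some(last)) = (chunk.first(), chunk.last()) else {
        return; // an empty chunk contributes nothing
    };
    match slot {
        None => {
            *slot = Some(PartitionExtrema {
                kind: ExtremaKind::Observed,
                min: first.clone(),
                max: last.clone(),
                row_count: chunk.len(),
            });
        }
        Some(e) => {
            if *first < e.min {
                e.min = first.clone();
            }
            if *last > e.max {
                e.max = last.clone();
            }
            e.row_count += chunk.len();
        }
    }
}
```

Because each chunk is already sorted, looking at its two end rows is enough; the slot only holds the true lex-min / lex-max once every chunk of the partition has been folded in.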
   
   Skipped in this PR:
   
   - `CoalesceBatchesExec`: deprecated since 52.0.0. Coalescing is now folded 
into other operators' streams via arrow-rs `BatchCoalescer`, so there is no 
dedicated plan node to override.
   - `ProjectionExec`: a conditional passthrough, left for a follow-up.
   - `SortPreservingMergeExec`: an N→1 reducer rather than a passthrough. It 
needs min-of-mins / max-of-maxs across its inputs, left for a follow-up.
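
For reference, the min-of-mins / max-of-maxs reduction that an N→1 operator would need could look something like the sketch below. It is not part of this PR. The simplified types, the helper name `merge_extrema`, and the rule that the merged result is `Expanded` as soon as any input is `Expanded` are assumptions for illustration only; how to treat inputs that report `None` is left open here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExtremaKind {
    Observed,
    Expanded,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct PartitionExtrema {
    kind: ExtremaKind,
    min: Vec<i64>, // hypothetical row representation, compared lexicographically
    max: Vec<i64>,
    row_count: usize,
}

/// Combine the extrema of N input partitions into one (ascending ordering assumed).
/// Returns `None` when there are no inputs.
fn merge_extrema(parts: &[PartitionExtrema]) -> Option<PartitionExtrema> {
    let mut iter = parts.iter();
    let mut acc = iter.next()?.clone();
    for p in iter {
        if p.min < acc.min {
            acc.min = p.min.clone(); // min of mins
        }
        if p.max > acc.max {
            acc.max = p.max.clone(); // max of maxs
        }
        acc.row_count += p.row_count;
        // Assumption: one input with rows outside its range taints the merged result.
        if p.kind == ExtremaKind::Expanded {
            acc.kind = ExtremaKind::Expanded;
        }
    }
    Some(acc)
}
```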
   
   ## Are these changes tested?
   
   7 unit tests in `datafusion/physical-plan/src/sorts/sort.rs::tests`:
   
   - `test_runtime_partition_extrema_before_execute_is_none` — caller contract: 
reading without a poll returns `Ok(None)`.
   - `test_runtime_partition_extrema_after_full_sort` — two batches, in-memory 
merge path; extrema match expected lex-min / lex-max with `kind = Observed`.
   - `test_runtime_partition_extrema_descending_swaps_min_max` — DESC sort: 
`min` is the largest value, `max` is the smallest.
   - `test_runtime_partition_extrema_per_partition` — two input partitions with 
`preserve_partitioning=true`: each output partition's extrema track its own 
range.
   - `test_runtime_partition_extrema_default_is_none` — default trait impl 
returns `Ok(None)` on a non-overriding operator (`EmptyExec`).
   - Plus two more under earlier commits, covering the chunk-fold path.
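
The DESC case is easy to misread, so here is a small standalone sketch of what "min / max along the declared ordering" means. The helper names and the `Vec<i64>` row representation are hypothetical, and null ordering is ignored; the point is only that `min` is the row that sorts first and `max` the row that sorts last.

```rust
use std::cmp::Ordering;

/// Compare two rows along a declared ordering; `descending[i]` flips column `i`.
fn cmp_along_ordering(a: &[i64], b: &[i64], descending: &[bool]) -> Ordering {
    for ((x, y), desc) in a.iter().zip(b).zip(descending) {
        let ord = if *desc { y.cmp(x) } else { x.cmp(y) };
        if ord != Ordering::Equal {
            return ord;
        }
    }
    Ordering::Equal
}

/// Return `(min, max)` along the ordering: the rows that sort first and last.
fn extrema_along_ordering(
    rows: &[Vec<i64>],
    descending: &[bool],
) -> Option<(Vec<i64>, Vec<i64>)> {
    let min = rows
        .iter()
        .min_by(|a, b| cmp_along_ordering(a, b, descending))?;
    let max = rows
        .iter()
        .max_by(|a, b| cmp_along_ordering(a, b, descending))?;
    Some((min.clone(), max.clone()))
}
```

With a single DESC column and values 3, 1, 2, the row that sorts first holds 3, so `min` carries the largest value and `max` the smallest.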
   
   `cargo clippy --all-features --all-targets -- -D warnings --no-deps` and 
`cargo fmt --all` are both clean. `cargo test -p datafusion-physical-plan 
--lib`: 1500 tests pass.
   
   ## Are there any user-facing changes?
   
   - New public types in `datafusion::physical_plan`: `PartitionExtrema`, 
`ExtremaKind`.
   - New trait method `ExecutionPlan::runtime_partition_extrema` with a default 
`Ok(None)`. Existing custom `ExecutionPlan` implementations are not required to 
change.
   - No SQL surface changes.
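
To illustrate why existing implementations are unaffected, here is a toy model of the default-method pattern. The `Plan` trait, the three operator structs, and the `String` error type are hypothetical stand-ins, not DataFusion's `ExecutionPlan` or its `Result` type; only the shape (a default `Ok(None)`, an override, and a passthrough) follows the description above.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
struct PartitionExtrema {
    min: i64, // simplified: a single sort key
    max: i64,
    row_count: usize,
}

trait Plan {
    /// Default: this operator has nothing to report.
    fn runtime_partition_extrema(
        &self,
        _partition: usize,
    ) -> Result<Option<PartitionExtrema>, String> {
        Ok(None)
    }
}

/// An existing operator that predates the method: no changes required.
struct LegacyExec;
impl Plan for LegacyExec {}

/// An operator that records extrema per partition while it runs.
struct RecordingExec {
    slots: Vec<Option<PartitionExtrema>>,
}
impl Plan for RecordingExec {
    fn runtime_partition_extrema(
        &self,
        partition: usize,
    ) -> Result<Option<PartitionExtrema>, String> {
        self.slots
            .get(partition)
            .cloned()
            .ok_or_else(|| format!("no partition {partition}"))
    }
}

/// A passthrough that forwards its input's answer unchanged.
struct PassthroughExec {
    input: Box<dyn Plan>,
}
impl Plan for PassthroughExec {
    fn runtime_partition_extrema(
        &self,
        partition: usize,
    ) -> Result<Option<PartitionExtrema>, String> {
        self.input.runtime_partition_extrema(partition)
    }
}
```

Callers have to handle `Ok(None)` in any case, since it is both the default answer and what an overriding operator returns before it has produced anything.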


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

