mailmindlin opened a new pull request, #22157:
URL: https://github.com/apache/datafusion/pull/22157

   ## Which issue does this PR close?
   
   - Closes #22152 
   
   ## Rationale for this change
   
   `ExecutionPlan::partition_statistics` and `TableProvider::statistics` are 
not currently transported across the DataFusion FFI boundary, so foreign plans 
and providers always report `Statistics::new_unknown` / `None`. This blocks 
optimizer rules that depend on statistics (e.g. join reordering, partition 
pruning) from working with out-of-process plugins, which defeats the point of 
exposing those hooks to plugin authors.
   
   `Statistics` contains `Precision<ScalarValue>` for column min/max/sum. 
`ScalarValue` is a large enum that's impractical to mirror in `#[repr(C)]`, so 
I reuse the existing `datafusion_proto_common::Statistics` prost encoding — the 
same pattern this crate already uses for filter expressions.
   
   ## What changes are included in this PR?
   
   - New `datafusion_ffi::statistics` module with `[de]serialize_statistics` 
helpers wrapping the`datafusion_proto_common::Statistics` round-trip.
   - New `partition_statistics` field on `FFI_ExecutionPlan` and corresponding 
`ExecutionPlan::partition_statistics` impl on `ForeignExecutionPlan`
   - New `statistics` field on `FFI_TableProvider` and corresponding 
`TableProvider::statistics` impl on `ForeignTableProvider`. Since the trait 
returns `Option<Statistics>`, the implementation cannot propagate decode 
errors, it logs a `log::warn!` and triggers a `debug_assert!`.
   
   This PR is expected to be merged after #22136 so it includes those changes.
   
   ## Are these changes tested?
   
   Yes:
   
   - Unit tests in `statistics.rs` cover three round-trip cases: 
`Statistics::new_unknown`, fully-exact statistics with 
`ScalarValue::Int32`/`Int64`/`Utf8` min/max/sum, and mixed 
`Precision::Exact`/`Inexact`/`Absent` values.
   - A new round-trip integration test in `execution_plan.rs` exercises 
`ForeignExecutionPlan::partition_statistics` with both `None` and `Some(idx)` 
partitions, against a plan with no statistics (returns 
`Statistics::new_unknown`) and a plan with concrete statistics.
   - A new round-trip integration test in `table_provider.rs` uses a thin 
`TableWithStats` wrapper over `MemTable` to verify both the `None` path and the 
concrete `Statistics` path through `ForeignTableProvider::statistics`.
   
   ## Are there any user-facing changes?
   
   This is a breaking ABI change for the `datafusion-ffi` crate:
   
   - `FFI_ExecutionPlan` gains a `partition_statistics` field.
   - `FFI_TableProvider` gains a `statistics` field.
   
   Plugins compiled against earlier versions of `datafusion-ffi` will need to 
be recompiled. There are no breaking changes to the Rust trait surface or to 
`Statistics` itself; downstream `ExecutionPlan` / `TableProvider` 
implementations require no changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to