etiennepelissier opened a new pull request, #21112: URL: https://github.com/apache/datafusion/pull/21112
## Which issue does this PR close?

Closes #8698

## Rationale for this change

Substrait's `RelCommon.hint.stats.row_count` carries row count statistics as an advisory hint. DataFusion did not read or write this field, so statistics were silently dropped when logical plans were round-tripped through Substrait. Preserving the hint matters for downstream optimizer rules that rely on row count estimates.

## What changes are included in this PR?

**Producer** (`producer/rel/read_rel.rs`): when serializing a `TableScan`, attempt to downcast the `TableSource` to a `TableProvider` and read its `statistics()`. If `num_rows` is `Exact(n)` or `Inexact(n)`, populate `RelCommon.hint.stats.row_count` with `n as f64`.

**Consumer** (`consumer/rel/read_rel.rs`): extract `row_count` from `RelCommon.hint.stats` on any `ReadRel`. When the resolved `TableProvider` has no statistics of its own, wrap it in a new private `StatisticsOverrideTableProvider` that reports the Substrait hint as `Precision::Inexact(n)`, making it available to DataFusion's optimizer and physical planning. Local provider statistics always take precedence over the hint.

## Are these changes tested?

Two new integration tests in `roundtrip_logical_plan.rs`:

- `producer_sets_row_count_hint`: registers a `TableWithStatistics` (exact row count = 100), converts the plan to Substrait, and asserts `ReadRel.common.hint.stats.row_count == 100.0`.
- `consumer_injects_row_count_hint`: produces a Substrait plan from a provider with row count 42, consumes it against a `MemTable` (no statistics), and asserts that the resulting provider exposes `Precision::Inexact(42)`.

## Are there any user-facing changes?

No breaking API changes. The behavior is additive: Substrait plans produced by DataFusion now carry row count hints, and plans consumed by DataFusion now surface those hints through `TableProvider::statistics()` when no local statistics are present.
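The producer/consumer contract described in this PR can be modeled as a small std-only sketch, without DataFusion or substrait-rs. Everything in it (`Precision`, `Provider`, `Table`, `StatisticsOverride`, `row_count_hint`, `apply_row_count_hint`) is a hypothetical simplification, not the PR's code: `Precision` only mirrors the shape of DataFusion's `Precision<usize>` for `num_rows`, `Provider::statistics()` stands in for `TableProvider::statistics()`, treating `Some(Absent)` like "no statistics" is an assumption, and so is ignoring negative or non-finite hints.

```rust
/// Hypothetical, simplified stand-in for DataFusion's `Precision<usize>` (num_rows).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

/// Hypothetical stand-in for the parts of `TableProvider` used here.
trait Provider {
    fn name(&self) -> String;
    /// `None` means the provider has no statistics at all.
    fn statistics(&self) -> Option<Precision>;
}

/// Producer side: only a known row count (exact or inexact) becomes a hint.
fn row_count_hint(stats: Option<Precision>) -> Option<f64> {
    match stats {
        Some(Precision::Exact(n)) | Some(Precision::Inexact(n)) => Some(n as f64),
        _ => None,
    }
}

/// A plain table; `rows: None` plays the role of a `MemTable` without statistics.
struct Table {
    name: &'static str,
    rows: Option<Precision>,
}

impl Provider for Table {
    fn name(&self) -> String {
        self.name.to_string()
    }
    fn statistics(&self) -> Option<Precision> {
        self.rows
    }
}

/// Consumer side: delegates to `inner` for everything except statistics,
/// which report the Substrait hint as an inexact row count.
struct StatisticsOverride {
    inner: Box<dyn Provider>,
    hint_rows: usize,
}

impl Provider for StatisticsOverride {
    fn name(&self) -> String {
        self.inner.name()
    }
    fn statistics(&self) -> Option<Precision> {
        Some(Precision::Inexact(self.hint_rows))
    }
}

/// Wrap only when the provider has no row count of its own, so local statistics win.
/// Assumption: negative or non-finite hints are ignored.
fn apply_row_count_hint(inner: Box<dyn Provider>, hint: Option<f64>) -> Box<dyn Provider> {
    let has_local = matches!(
        inner.statistics(),
        Some(Precision::Exact(_)) | Some(Precision::Inexact(_))
    );
    if let Some(h) = hint {
        if !has_local && h.is_finite() && h >= 0.0 {
            // `as` truncates any fractional part of the hint.
            return Box::new(StatisticsOverride { inner, hint_rows: h as usize });
        }
    }
    inner
}
```

Wrapping the resolved provider, rather than mutating it, keeps its other behavior (here, `name()`) intact while surfacing only the row count. A provider that already reports `Exact(_)` or `Inexact(_)` is returned unchanged, which models the rule that local statistics take precedence over the hint.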
