xudong963 commented on code in PR #21815:
URL: https://github.com/apache/datafusion/pull/21815#discussion_r3216860768
##########
datafusion/physical-plan/benches/compute_statistics.rs:
##########
Review Comment:
also worth measuring on a deep `FilterExec` chain queried at `Some(0)`. (The
context is if you ask `compute_statistics(plan, Some(0))` on a deep filter
chain, the framework first walks the *entire* tree computing `None` stats, then
each filter turns around and asks for `Some(0)` stats on demand (which triggers
another cached walk). The shared cache makes the second walk cheap, but for
partition-preserving plans we end up populating both `None` and `Some(p)`
entries for every node
##########
datafusion/physical-plan/src/filter.rs:
##########
@@ -576,9 +577,17 @@ impl ExecutionPlan for FilterExec {
/// The output statistics of a filtering operation can be estimated if the
/// predicate's selectivity value can be determined for the incoming data.
- fn partition_statistics(&self, partition: Option<usize>) ->
Result<Arc<Statistics>> {
- let input_stats =
- Arc::unwrap_or_clone(self.input.partition_statistics(partition)?);
+ fn partition_statistics_with_context(
+ &self,
+ partition: Option<usize>,
+ ctx: &StatisticsContext,
+ ) -> Result<Arc<Statistics>> {
+ let input_stats = match partition {
+ Some(_) => Arc::unwrap_or_clone(
+ ctx.compute_child_statistics(self.input.as_ref(), partition)?,
+ ),
+ None => Arc::unwrap_or_clone(Arc::clone(&ctx.child_stats()[0])),
+ };
Review Comment:
Per-operator boilerplate is repetitive and bug-prone. Almost every
partition-preserving operator now contains:
```rust
let stats = match partition {
Some(_) => ctx.compute_child_statistics(self.input.as_ref(), partition)?,
None => Arc::clone(&ctx.child_stats()[0]),
};
```
This should be a helper on the context: `ctx.child_stats_for(0,
self.input.as_ref(), partition)` or similar. Five identical match blocks across
`FilterExec`, `CoalesceBatchesExec`, `BufferExec`, `CooperativeExec`,
`OutputRequirementExec` is five places to make the same mistake when the
contract evolves.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]