adriangb commented on code in PR #16014:
URL: https://github.com/apache/datafusion/pull/16014#discussion_r2085377284
##########
datafusion/physical-optimizer/src/pruning.rs:
##########
@@ -995,6 +996,184 @@ fn build_statistics_record_batch<S: PruningStatistics>(
})
}
+/// Prune a set of containers represented by their statistics.
Review Comment:
> Pruning on statistics during plan time would potentially be redundant with
also trying to prune again during opening, but it would reduce the files
earlier in the plan
Yeah, I don't think it's redundant: you either prune or you don't. If we
prune earlier, the files don't make it this far. If we don't, we may now be able
to prune them. The only redundant case is when the filters haven't changed
(i.e. no dynamic filters), but detecting that sounds both hard to track and
like a possible future optimization 😄
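
To make that concrete, here is a rough sketch of the idea. It does not use the actual DataFusion APIs: `FileStats`, `GreaterThan`, `prune` and `example_files` are hypothetical names made up for illustration, and the filter is assumed to be a simple `col > threshold` whose threshold a dynamic filter may tighten between plan time and file open.

```rust
/// Hypothetical per-file statistics (not a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    name: &'static str,
    /// Largest value of the filtered column in this file.
    max: i64,
}

/// Hypothetical filter of the form `col > threshold`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct GreaterThan {
    threshold: i64,
}

impl GreaterThan {
    /// A file may contain matching rows only if its max exceeds the threshold.
    fn may_match(&self, stats: &FileStats) -> bool {
        stats.max > self.threshold
    }
}

/// Keep only the files whose statistics say they might contain matches.
fn prune(files: Vec<FileStats>, filter: GreaterThan) -> Vec<FileStats> {
    files.into_iter().filter(|f| filter.may_match(f)).collect()
}

/// Made-up input used only for this illustration.
fn example_files() -> Vec<FileStats> {
    vec![
        FileStats { name: "a.parquet", max: 10 },
        FileStats { name: "b.parquet", max: 50 },
        FileStats { name: "c.parquet", max: 100 },
    ]
}
```

Pruning `example_files()` with `threshold: 20` at plan time drops `a.parquet`, so it never reaches the opener. Pruning the survivors again with the same filter changes nothing, which is the redundant case. If a dynamic filter has tightened the threshold to 60 by the time the files are opened, the second pass also drops `b.parquet`.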
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]