alamb commented on PR #19609: URL: https://github.com/apache/datafusion/pull/19609#issuecomment-3729712558
> > If we do want to proceed, I think the first thing we should do is figure out how we will eventually unify the existing APIs that have overlap (specifically `PruningPredicate` and `PhysicalExpr::evaluate_bounds`, possibly the propagate statistics code too)
>
> Regarding `PruningPredicate`, I think its major API can eventually remain the same, while its implementation should be replaced with this statistics-propagation-based pruning. Some minor API changes are inevitable.

Yes, this sounds like a great plan.

I still really feel that we can unify these APIs somehow. Starting with `vectorized_evaluate_bounds`, as suggested by @ozankabak, seems like a good step in that direction.

Also, I should be clear: my concern isn't just that multiple implementations are harder to maintain. It is also that we already have significant code and test coverage for single-row range analysis, so replicating it again in vectorized fashion entirely separately won't leverage the past experience, and may result in different behaviors between the two paths.
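
To make the "two paths" concern concrete, here is a minimal sketch, not DataFusion code, of a vectorized bounds routine that delegates to the single-row logic instead of reimplementing it. The names `Bounds`, `add_bounds`, and `vectorized_add_bounds` are hypothetical stand-ins invented for illustration; the real `Interval` type and any eventual `vectorized_evaluate_bounds` signature may look quite different.

```rust
/// Hypothetical closed interval [lo, hi]; a simplified stand-in for an
/// interval type, NOT DataFusion's actual `Interval`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Bounds {
    lo: i64,
    hi: i64,
}

/// Single-row range analysis for `a + b` (assumed, simplified rule):
/// the result lies in [a.lo + b.lo, a.hi + b.hi]. Saturating arithmetic
/// is used only to keep the sketch panic-free on overflow.
fn add_bounds(a: Bounds, b: Bounds) -> Bounds {
    Bounds {
        lo: a.lo.saturating_add(b.lo),
        hi: a.hi.saturating_add(b.hi),
    }
}

/// Vectorized variant: one `Bounds` per container (e.g. per row group).
/// It reuses `add_bounds` for every element, so both paths share one
/// rule and cannot drift apart in behavior.
fn vectorized_add_bounds(a: &[Bounds], b: &[Bounds]) -> Vec<Bounds> {
    assert_eq!(a.len(), b.len(), "inputs must cover the same containers");
    a.iter()
        .zip(b)
        .map(|(x, y)| add_bounds(*x, *y))
        .collect()
}
```

With this shape, existing tests for the single-row rule continue to cover the vectorized path, and a real implementation would be free to swap the per-element loop for columnar kernels as long as it is checked against the same scalar rule.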
