adriangb commented on PR #17452: URL: https://github.com/apache/datafusion/pull/17452#issuecomment-3271833195
> @adriangb I think there is opportunity to simplify the bounds collection for each partition. That is, we can probably just track the min/max across all partitions and build a single `AND` binary expr once we have the final min/max (i.e. all partition bounds have been reported).
>
> Aside from one less mutex, I think it'll help reduce output in `EXPLAIN` as well. Happy to tackle in a follow-up PR

I think that will regress performance: imagine partition 1 has bounds (0, 1) and partition 2 has bounds (999998, 999999). With bounds per partition the value 1234 is filtered out. The merged bounds of (0, 999999) would include that value.
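To make the example concrete, here is a minimal standalone sketch of the two strategies. It is not DataFusion's actual code: the `Bounds` alias and the function names are hypothetical, and it assumes the per-partition filter keeps a value when it falls inside at least one partition's inclusive (min, max) range.

```rust
/// Inclusive (min, max) bounds reported by one partition.
/// Hypothetical alias for illustration, not a DataFusion type.
type Bounds = (i64, i64);

/// Per-partition filter: a value passes if it lies inside at least one
/// partition's range (assumed here to be an OR of per-partition range checks).
fn passes_per_partition(bounds: &[Bounds], v: i64) -> bool {
    bounds.iter().any(|&(lo, hi)| lo <= v && v <= hi)
}

/// Collapse all partition bounds into a single global (min, max).
/// Returns `None` when no partition has reported bounds.
fn merge_bounds(bounds: &[Bounds]) -> Option<Bounds> {
    let lo = bounds.iter().map(|b| b.0).min()?;
    let hi = bounds.iter().map(|b| b.1).max()?;
    Some((lo, hi))
}

/// Merged filter: a value passes if it lies inside the single global range.
fn passes_merged(bounds: &[Bounds], v: i64) -> bool {
    match merge_bounds(bounds) {
        Some((lo, hi)) => lo <= v && v <= hi,
        None => false,
    }
}
```

With bounds `[(0, 1), (999998, 999999)]`, `passes_per_partition` rejects 1234 while `passes_merged` accepts it, because the merged range (0, 999999) also covers the gap between the two partitions.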
