asolimando commented on PR #21654: URL: https://github.com/apache/datafusion/pull/21654#issuecomment-4267234333
> Can't seem to get some good performance - closing for now. Though I think the idea is sound / should help in certain cases.

@Dandandan I agree that this is sound. Most probably what is lacking is the current accuracy of statistics propagation (~5% according to https://github.com/apache/datafusion/pull/20292).

FYI: there is active work on improving accuracy (https://github.com/apache/datafusion/issues/21120 and https://github.com/apache/datafusion/issues/21443 from me, https://github.com/apache/datafusion/issues/8227 in general, and https://github.com/apache/datafusion/issues/20766 for NDV specifically). I think we can revisit this once accuracy improves.

Re. https://github.com/apache/datafusion/pull/20292: we spoke offline with @gabotechs, and statistics aggregation/propagation results seem numerically unstable (e.g., different results in different environments, probably due to approximations in the computation and different order of execution). We are currently trying to figure out a better way to frame this so that we can measure accuracy and help improve it.
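To make the order-of-execution point concrete, here is a minimal, generic sketch of how combining the same floating-point values in a different order can yield a different result. It is not DataFusion code and is not confirmed as the cause of the instability discussed above. The function name `sum_in_order` is hypothetical and exists only for illustration.

```rust
/// Hypothetical helper (not part of DataFusion): sums the values strictly
/// left to right, so the result depends on the order of the slice.
fn sum_in_order(values: &[f64]) -> f64 {
    values.iter().fold(0.0, |acc, v| acc + v)
}

/// Returns the sums of the same three values taken in two different orders.
fn order_dependent_sums() -> (f64, f64) {
    // 1.0 is smaller than half the spacing between adjacent f64 values
    // near 1e17, so adding it to 1e17 rounds back to 1e17 and it is lost.
    let large_first = [1e17, 1.0, -1e17];
    // Here the two large values cancel first, so the 1.0 survives.
    let cancel_first = [1e17, -1e17, 1.0];
    (sum_in_order(&large_first), sum_in_order(&cancel_first))
}
```

Calling `order_dependent_sums()` gives `(0.0, 1.0)`: the inputs are identical as a multiset, and only the combination order differs. If partial estimates are merged in a non-deterministic order (for example, across partitions), an effect of this kind is one possible way to get different results in different environments.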
