asolimando commented on PR #21654:
URL: https://github.com/apache/datafusion/pull/21654#issuecomment-4267234333

   > Can't seem to get some good performance - closing for now. Though I think 
the idea is sound / should help in certain cases.
   
   @Dandandan I agree that the idea is sound; what is most probably lacking 
is the current accuracy of statistics propagation (~5% according to 
https://github.com/apache/datafusion/pull/20292). 
   
   FYI: there is active work on getting better accuracy 
(https://github.com/apache/datafusion/issues/21120 and 
https://github.com/apache/datafusion/issues/21443 from myself, 
https://github.com/apache/datafusion/issues/8227 in general and 
https://github.com/apache/datafusion/issues/20766 for NDV specifically).
   
   I think we can revisit when we get better accuracy.
   
   Re. https://github.com/apache/datafusion/pull/20292: we spoke offline with 
@gabotechs. The results of statistics aggregation/propagation seem numerically 
unstable (e.g., different results in different environments, probably due to 
approximations in the computation and a different order of execution). We are 
currently trying to find a better way to frame this so that we can 
measure accuracy and help improve it.
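
   To illustrate the order-of-execution point, here is a minimal sketch. It 
is not DataFusion code: `sum_in_order` and `relative_diff` are hypothetical 
helpers, and the assumption is only that some statistics are combined with 
floating-point additions whose order can vary between runs.

```rust
/// Hypothetical helper (not a DataFusion API): sums the values left to
/// right, in exactly the order given.
fn sum_in_order(values: &[f64]) -> f64 {
    values.iter().fold(0.0, |acc, v| acc + v)
}

/// Hypothetical helper: relative difference between two estimates, so that
/// results can be compared with a tolerance instead of exact equality.
fn relative_diff(a: f64, b: f64) -> f64 {
    if a == b {
        return 0.0;
    }
    (a - b).abs() / a.abs().max(b.abs())
}
```

   Summing `[0.1, 0.2, 0.3]` and `[0.3, 0.2, 0.1]` with `sum_in_order` gives 
two results that are not bit-identical, because floating-point addition is 
not associative. Their `relative_diff` is tiny, which suggests comparing 
propagated statistics with a tolerance rather than for exact equality.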


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

