[GitHub] [arrow-datafusion] comphead commented on issue #5276: Update benchmmarks on clickbench

via GitHub Thu, 16 Feb 2023 12:01:21 -0800


comphead commented on issue #5276:
URL: https://github.com/apache/arrow-datafusion/issues/5276#issuecomment-1433639200


   @alamb I'm running this on my machine now. I'm not sure yet whether it's the 
DISTINCT; once I have more details I'll file a ticket. 
   
   In any case, `COUNT(DISTINCT X)` is known to be slow in most environments. If 
the data is sorted by X, you can jump between distinct values without scanning 
every row, a kind of skip scan. Otherwise, people use approximate distinct 
functions, which are faster but give up some accuracy.
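
   A rough sketch of the skip-scan idea, assuming the values are already sorted 
ascending and held in memory. The function names are made up for illustration, 
and this is not how DataFusion implements `COUNT(DISTINCT X)`:

```python
from bisect import bisect_right


def count_distinct_sorted(values):
    """Count distinct values in an ascending-sorted list.

    Instead of visiting every element, binary-search for the end of each
    run of equal values and jump straight to the next distinct value.
    """
    count = 0
    i = 0
    n = len(values)
    while i < n:
        count += 1
        # First index after the run of values[i]; always > i, so the loop ends.
        i = bisect_right(values, values[i], i, n)
    return count


def count_distinct_hashed(values):
    """Baseline for unsorted input: hash every value into a set."""
    return len(set(values))
```

   With few distinct values and long runs, the sorted version does roughly one 
binary search per distinct value, while the hashed version has to touch every 
row.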


