[ https://issues.apache.org/jira/browse/IMPALA-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923660#comment-16923660 ]
Peter Ebert commented on IMPALA-8759:
-------------------------------------

Yes, I am sure that was a thought-out choice. However, even if 32-bit is 2x faster than 64-bit, the computation is a single loop at the end of the finalize, so the cost is fixed, and I doubt it would add even a millisecond, more probably nanoseconds.

I will say, though, that I don't have much to prove that double is significantly better in estimation. I did some +very+ basic testing (using TPC-DS generated data), and although double was more accurate in my very non-scientific tests, the difference was in the 7th digit and my data set wasn't large enough for that to be significant. This definitely needs more testing, but initially it does not seem like a large difference.

> Use double precision for HLL
> ----------------------------
>
>                 Key: IMPALA-8759
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8759
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>   Affects Versions: Impala 3.2.0
>            Reporter: Peter Ebert
>            Priority: Major
>              Labels: perf, ramp-up
>
> For /be/src/exprs/aggregate-functions-ir.cc the finalize function uses a
> float, which is only capable of 6-9 digits of precision. More accurate
> estimates for larger cardinalities (beyond 999,999) should be possible with
> double precision. Another C++ implementation uses double as well:
> [https://github.com/dialtr/libcount]
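
To make the float-versus-double question above concrete, here is a minimal sketch. It is not Impala's finalize code; it assumes the standard HyperLogLog raw estimate, alpha_m * m^2 / sum(2^-register), with alpha_m = 0.7213 / (1 + 1.079 / m), which is valid for m >= 128. The names RawEstimate and IsExactInFloat are hypothetical helpers introduced only for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Impala code): the standard HLL raw estimate,
// computed entirely in type T so float and double can be compared.
// Assumes regs.size() >= 128 so the alpha_m approximation applies.
template <typename T>
T RawEstimate(const std::vector<uint8_t>& regs) {
  const T m = static_cast<T>(regs.size());
  const T alpha = static_cast<T>(0.7213) /
                  (static_cast<T>(1) + static_cast<T>(1.079) / m);
  T harmonic_sum = 0;
  for (uint8_t r : regs) {
    // Adds 2^-r; this is the single loop whose cost is discussed above.
    harmonic_sum += std::ldexp(static_cast<T>(1), -static_cast<int>(r));
  }
  return alpha * m * m / harmonic_sum;
}

// Hypothetical helper: true if the integer n survives a round trip through
// a 32-bit float. A float has a 24-bit significand, so not every integer
// above 2^24 = 16,777,216 is representable.
bool IsExactInFloat(int64_t n) {
  return static_cast<int64_t>(static_cast<float>(n)) == n;
}
```

Calling RawEstimate<float> and RawEstimate<double> on the same registers should give results that agree to roughly one part in a million, which would be in line with the 7th-digit difference reported in the comment. IsExactInFloat(16777217) returns false, which illustrates where a float result starts to lose whole counts.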