[ https://issues.apache.org/jira/browse/IMPALA-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong updated IMPALA-7604:
----------------------------------
    Target Version: Impala 3.1.0

> In AggregationNode.computeStats, handle cardinality overflow better
> -------------------------------------------------------------------
>
>                 Key: IMPALA-7604
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7604
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.12.0
>            Reporter: Paul Rogers
>            Assignee: Tim Armstrong
>            Priority: Minor
>
> Consider the cardinality overflow logic in
> [{{AggregationNode.computeStats()}}|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/AggregationNode.java].
> Current code:
> {noformat}
>     // if we ended up with an overflow, the estimate is certain to be wrong
>     if (cardinality_ < 0) cardinality_ = -1;
> {noformat}
> This code has a number of issues.
> * The check is done after looping over all conjuncts. It could be that, as a
> result, the number overflowed twice. The check should be done after each
> multiplication.
> * Since we know that the number overflowed, a better estimate of the total
> count is {{Long.MAX_VALUE}}.
> * The code later checks for the -1 value and, if found, uses the cardinality
> of the first child. This is a worse estimate than using the max value, since
> the first child might have a low cardinality (it could be the later children
> that caused the overflow.)
> * If we really do expect overflow, then we are dealing with very large
> numbers. Being accurate to the row is not needed. Better to use a {{double}}
> which can handle the large values.
> Since overflow probably seldom occurs, this is not an urgent issue. Though,
> if overflow does occur, the query is huge, and having at least some estimate
> of the hugeness is better than none. Also, seems that this code probably
> evolved; this newbie is looking at it fresh and seeing that the accumulated
> fixes could be tidied up.
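
The suggestions in the issue above (check after each multiplication, clamp to {{Long.MAX_VALUE}}, or accumulate in a {{double}}) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Impala code or the eventual fix: the class {{CardinalitySketch}}, its methods, and the {{factors}} array are hypothetical names, and the sketch assumes all inputs are non-negative counts.

```java
// Hypothetical helper for illustration only; none of these names exist in Impala.
final class CardinalitySketch {
  private CardinalitySketch() {}

  /** Multiplies two non-negative counts, clamping to Long.MAX_VALUE on overflow. */
  static long saturatingMultiply(long a, long b) {
    if (a < 0 || b < 0) {
      throw new IllegalArgumentException("counts must be non-negative");
    }
    try {
      // Math.multiplyExact throws ArithmeticException if the result overflows a long.
      return Math.multiplyExact(a, b);
    } catch (ArithmeticException overflow) {
      return Long.MAX_VALUE;
    }
  }

  /** Product of all factors, with the overflow check applied after every multiplication. */
  static long estimateProduct(long[] factors) {
    long product = 1;
    for (long factor : factors) {
      product = saturatingMultiply(product, factor);
    }
    return product;
  }

  /** Alternative: accumulate in a double, trading row-level accuracy for range. */
  static double estimateProductApprox(long[] factors) {
    double product = 1.0;
    for (long factor : factors) {
      product *= (double) factor;
    }
    return product;
  }
}
```

One reason to check per multiplication rather than testing the sign once at the end: with two factors of 2^40, plain {{long}} arithmetic wraps to exactly 0, which a {{< 0}} test would not catch, whereas the clamped version returns {{Long.MAX_VALUE}}.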