[ https://issues.apache.org/jira/browse/IMPALA-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers reassigned IMPALA-7944:
-----------------------------------

    Assignee:     (was: Paul Rogers)

> count(*) correctly has NDV=1 via being labeled as constant
> ----------------------------------------------------------
>
>                 Key: IMPALA-7944
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7944
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> The {{count\(*)}} function has an NDV of 1: the function always returns a
> single value.
> Presently, {{count\(*)}} gets an NDV of 1 via a broken process, while
> {{sum(id)}} incorrectly has the NDV of {{id}} when it should be 1.
> This is important because it tells us that the query:
> {code:sql}
> SELECT COUNT(*) FROM foo
> {code}
> returns just one row. All good.
> In the analyzer, we set a value of NDV=1 via an incorrect process: by
> labeling {{count\(*)}} as constant:
> * For historical reasons, NDV calculations occur before a node is analyzed.
> * We use the default NDV calc: if the node is constant, set NDV = 1, else
> compute it.
> * Since the function node for {{count\(*)}} is not analyzed, we determine
> constant-ness from an inspection.
> * All checks for non-constantness fail, leaving the final check: a function
> is constant if either a) it has no arguments, or b) all its arguments are
> constant.
> * Since {{count\(*)}} has no expression arguments, and is not marked as
> non-deterministic, we infer it must be constant.
> * Therefore, its NDV is set to 1.
> This is, of course, highly unstable for multiple reasons:
> * NDV calculations are done before the node is analyzed. This means NDV
> calculations for a {{SlotRef}} would fail because the ref has not yet been
> resolved to a column. (The {{SlotRef}} has special code to work around this
> fact.)
> * The "treat zero-argument functions as constants and so use NDV=1" rule
> works for {{count\(*)}}, but not for {{count(c)}} or {{sum(c)}}, both of
> which should have NDV=1.
> * {{count\(*)}} is not really a constant; its NDV=1 setting should not
> rely on (benignly) assuming it is.
> * The const-ness seen by the NDV check is temporary; once the node is
> analyzed, it is correctly marked as non-const. So, the calcs rely on one
> path saying the function is const and another path saying it is not.
> The current NDV for a {{sum(id)}} function is the NDV of {{id}}, which is
> 7300 in this particular query. The NDV of {{sum(id)}} should be 1.
> This should be cleaned up to provide a more reliable, understandable way of
> achieving the goal of NDV=1.
> As it turns out, this seems to have been a known issue in the code:
> {code:java}
>     // TODO: we can't correctly determine const-ness before analyzing 'fn_'. We should
>     // rework logic so that we do not call this function on unanalyzed exprs.
>
>     // Aggregate functions are never constant.
>
> {code}
> This defect affects memory estimates. With the correct NDV estimates, several
> {{PlannerTest}} cases change to include a lower memory reservation.
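
The behavior described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Impala's frontend code: the class `NdvSketch`, its `Expr` type, and the methods `looksConstant`, `defaultNdv`, and `explicitNdv` are all hypothetical names invented for illustration. The "else compute it" branch is assumed here to take the largest NDV among the arguments, which matches the issue's observation that {{sum(id)}} ends up with the NDV of {{id}}. The `explicitNdv` method shows one possible direction for a fix, not the change that was actually made.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model; none of these names are Impala's actual classes.
final class NdvSketch {
    /** A tiny expression node: either a column reference or a function call. */
    static final class Expr {
        final String name;
        final boolean isFunction;
        final boolean isAggregate;
        final long columnNdv;   // only meaningful for column references
        final List<Expr> args;

        Expr(String name, boolean isFunction, boolean isAggregate,
             long columnNdv, List<Expr> args) {
            this.name = name;
            this.isFunction = isFunction;
            this.isAggregate = isAggregate;
            this.columnNdv = columnNdv;
            this.args = args;
        }
    }

    static Expr column(String name, long ndv) {
        return new Expr(name, false, false, ndv, Collections.<Expr>emptyList());
    }

    static Expr aggregate(String name, Expr... args) {
        return new Expr(name, true, true, 0, Arrays.asList(args));
    }

    // Pre-analysis const-ness "by inspection", as the issue describes it:
    // a function is constant if it has no arguments or all are constant.
    // Note that the aggregate flag is deliberately ignored here.
    static boolean looksConstant(Expr e) {
        if (!e.isFunction) return false;   // a column reference is not constant
        for (Expr arg : e.args) {
            if (!looksConstant(arg)) return false;
        }
        return true;
    }

    // Default rule: constant => NDV 1, otherwise (assumed) max NDV of the arguments.
    static long defaultNdv(Expr e) {
        if (looksConstant(e)) return 1;
        if (!e.isFunction) return e.columnNdv;
        long max = 1;
        for (Expr arg : e.args) {
            max = Math.max(max, defaultNdv(arg));
        }
        return max;
    }

    // Hypothetical alternative: aggregates report NDV=1 directly,
    // without relying on being mislabeled as constant.
    static long explicitNdv(Expr e) {
        return e.isAggregate ? 1 : defaultNdv(e);
    }

    public static void main(String[] args) {
        Expr id = column("id", 7300);
        Expr countStar = aggregate("count");   // count(*): no expression arguments
        Expr sumId = aggregate("sum", id);

        System.out.println(defaultNdv(countStar));   // prints 1 (only because it looks constant)
        System.out.println(defaultNdv(sumId));       // prints 7300 (the NDV of id)
        System.out.println(explicitNdv(sumId));      // prints 1
    }
}
```

In this model the default rule yields 1 for {{count\(*)}} only by accident of the zero-argument check, while {{sum(id)}} inherits 7300 from {{id}}; the explicit rule gives 1 for both without treating either as constant.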