Paul Rogers created IMPALA-7602:
-----------------------------------

             Summary: Definition of NDV differs between planner and stats mechanism
                 Key: IMPALA-7602
                 URL: https://issues.apache.org/jira/browse/IMPALA-7602
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
            Reporter: Paul Rogers

See IMPALA-7310, which says that the Impala NDV function is implemented as "number of non-null distinct values." IMPALA-7310 also says that the stats gathering mechanism uses the same definition.

Down in the comments, we point to [{{ExprNdvTest}}|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java], which shows that the planner itself, when working with constant expressions, considers NULL a distinct value.

In the case described in IMPALA-7310, this means that a column containing only nulls has NDV=0 if stats are used, but NDV=1 if constants are used.

This is a minor point, but it would be good to use a single definition everywhere. If we adopt the "number of non-null distinct values" rule, the "adjusted NDV" is always one more than the "raw" NDV. As it is now, we can't be sure when to add the null adjustment because we don't know whether NULL is already included.
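
To make the mismatch concrete, here is a minimal sketch in Python of the two definitions and of what happens when the null adjustment is applied to each. This is not Impala code: the function names ({{ndv_non_null}}, {{ndv_null_as_value}}, {{adjusted_ndv}}) are hypothetical, and exact set-based counting stands in for whatever estimation the real NDV function and planner use.

```python
from typing import Hashable, Iterable, Optional


def ndv_non_null(values: Iterable[Optional[Hashable]]) -> int:
    """NDV as "number of non-null distinct values" (the stats-side rule)."""
    return len({v for v in values if v is not None})


def ndv_null_as_value(values: Iterable[Optional[Hashable]]) -> int:
    """NDV where NULL (None here) counts as one more distinct value,
    mirroring what the ticket describes for constant expressions."""
    return len(set(values))


def adjusted_ndv(raw_ndv: int) -> int:
    """Hypothetical null adjustment: one extra slot for NULL.
    Only correct if raw_ndv follows the non-null definition."""
    return raw_ndv + 1


all_nulls = [None, None, None]

stats_ndv = ndv_non_null(all_nulls)          # 0
constant_ndv = ndv_null_as_value(all_nulls)  # 1

# Adjusting the non-null count gives the intended result.
print(adjusted_ndv(stats_ndv))     # 1
# Adjusting a count that already includes NULL counts it twice.
print(adjusted_ndv(constant_ndv))  # 2
```

With a single definition, a caller of the adjustment would not need to know where the raw NDV came from; with two, the same adjustment is right for one source and off by one for the other.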