Paul Rogers created IMPALA-7603:
-----------------------------------
Summary: Incorrect NDV expression for col1 op col2
Key: IMPALA-7603
URL: https://issues.apache.org/jira/browse/IMPALA-7603
Project: IMPALA
Issue Type: Bug
Components: Frontend
Reporter: Paul Rogers
Consider the {{ExprNdvTest}} test case. It contains tests for the CASE expression. Add tests for simple arithmetic expressions:

{noformat}
verifyNdv("id + 2", 7300);
verifyNdv("id * 2", 7300);
{noformat}

These results suggest that the NDV of "column op constant" is:

{noformat}
max(NDV(column), NDV(const)) = max(NDV(column), 1) = NDV(column)
{noformat}

This is good and as expected. Now try two columns:

{noformat}
verifyNdv("id + int_col", 7300);
verifyNdv("id * int_col", 7300);
{noformat}

The estimate is again 7300, that is, NDV(id), and this is *not* expected. Although the two columns come from the same table, they are not correlated: in the general case there is no reason to believe that the value of "id" determines the value of "int_col". (The table could, for example, be the Cartesian product of the two columns.) In this case, the calculation should be:

{noformat}
NDV(a op b) = NDV(a) * NDV(b)
{noformat}
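To make the difference concrete, here is a minimal, hypothetical Java sketch of the two rules: the max rule the tests above appear to exercise, and the product rule proposed here for uncorrelated inputs. It is not Impala's frontend code. The class name {{NdvRules}}, the {{UNKNOWN = -1}} marker for a missing NDV, and the choice to saturate at {{Long.MAX_VALUE}} on overflow are all assumptions made for illustration.

```java
/**
 * Hypothetical sketch of two ways to estimate NDV(a op b) from NDV(a) and
 * NDV(b). Not Impala's actual implementation.
 */
final class NdvRules {
  private NdvRules() {}

  /** Assumed marker for "NDV unknown" in this sketch. */
  static final long UNKNOWN = -1;

  /** Rule the tests above appear to observe: the larger input NDV. */
  static long maxRule(long ndvA, long ndvB) {
    if (ndvA < 0 || ndvB < 0) return UNKNOWN;
    return Math.max(ndvA, ndvB);
  }

  /**
   * Rule proposed in this issue for uncorrelated inputs: NDV(a) * NDV(b).
   * Saturates at Long.MAX_VALUE instead of silently overflowing.
   */
  static long productRule(long ndvA, long ndvB) {
    if (ndvA < 0 || ndvB < 0) return UNKNOWN;
    try {
      return Math.multiplyExact(ndvA, ndvB);
    } catch (ArithmeticException overflow) {
      return Long.MAX_VALUE;
    }
  }

  public static void main(String[] args) {
    // column op constant: a constant has NDV 1, so both rules give NDV(column).
    System.out.println(maxRule(7300, 1));      // 7300
    System.out.println(productRule(7300, 1));  // 7300
    // column op column, using a hypothetical second column with NDV 10.
    System.out.println(maxRule(7300, 10));     // 7300
    System.out.println(productRule(7300, 10)); // 73000
  }
}
```

Because a constant has NDV 1, the product rule reproduces the expected "column op constant" result, so it differs from the max rule only when both operands are columns. Under the same no-correlation assumption, an estimator would presumably still need to bound the product by the input's row count, since a result cannot have more distinct values than rows.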