Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11528

to look at the new patch set (#5).

Change subject: IMPALA-7310: All-null columns give wrong estimates in planner
......................................................................

IMPALA-7310: All-null columns give wrong estimates in planner

Modified the planner to handle low-value NDVs by adjusting them upward
by one to account for null values. Thus, even an all-null column, which
has an NDV of 0 in stats, will have an NDV of 1 in the planner. (The
planner already expects NDV to include nulls.)

Modified the front end to allow capturing the full plan for use in a
unit test.

Added unit tests that verify estimated cardinality for a plan as a way
to verify that the fix solved the scenario in IMPALA-7310.

Testing required a new table, similar to the existing nulltable, but
which has multiple rows and has stats calculated.

The change was limited to a very narrow range of cases:

* Table column (not an internal column such as COUNT(*))
* Column is nullable
* Column has stats
* Column does not provide a null count, or null count > 0
* Reported NDV <= 1

In this narrow case, we add one to NDV to account for nulls. (Any
larger adjustment throws off the TPC-H tests which have multiple
columns, marked as non-null, with low NDV, but which actually include no
nulls.)

The change minimized impact on PlannerTest, but still some memory
numbers needed adjusting for a test in which one column hit the criteria
listed above and had its NDV adjusted.
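
For readers of this notification, the five criteria above can be sketched as a single predicate. This is only an illustration and is not the code in this patch set: the class name NdvAdjustmentSketch, the method adjustNdv, its boolean parameters, and the use of -1 to mean "no null count provided" are all assumptions made for the example.

```java
/**
 * Illustrative sketch only; not the code from this change.
 * Every name in this class is hypothetical.
 */
final class NdvAdjustmentSketch {
  /** Assumed sentinel meaning "stats do not provide a null count". */
  static final long UNKNOWN_NULL_COUNT = -1;

  /**
   * Returns the NDV the planner would use under the criteria listed in the
   * commit message: add one only for a nullable table column that has stats,
   * may contain nulls, and reports an NDV of at most 1.
   */
  static long adjustNdv(boolean isTableColumn, boolean isNullable,
      boolean hasStats, long numNulls, long ndv) {
    // "Column does not provide a null count, or null count > 0"
    boolean mayContainNulls = numNulls == UNKNOWN_NULL_COUNT || numNulls > 0;
    if (isTableColumn && isNullable && hasStats && mayContainNulls && ndv <= 1) {
      return ndv + 1;  // count NULL as one extra distinct value
    }
    return ndv;  // outside the narrow case: leave the reported NDV alone
  }

  public static void main(String[] args) {
    // All-null column: stats report NDV 0 and a positive null count.
    System.out.println(adjustNdv(true, true, true, 100, 0));  // prints 1
    // Column with a known null count of zero is not adjusted.
    System.out.println(adjustNdv(true, true, true, 0, 1));    // prints 1
    // Higher NDVs are never adjusted.
    System.out.println(adjustNdv(true, true, true, 100, 5));  // prints 5
  }
}
```

Under these assumptions, an all-null column moves from NDV 0 to NDV 1, while a column whose stats report zero nulls keeps its reported NDV, which matches the intent of keeping the adjustment narrow.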
Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
---
M .gitignore
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullTable/large_data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
12 files changed, 449 insertions(+), 23 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11528/5
--
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 5
Gerrit-Owner: Paul Rogers <par0...@yahoo.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>