Paul Rogers has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12192
Change subject: IMPALA-8058: Fallback for HBase key scan range estimation ...................................................................... IMPALA-8058: Fallback for HBase key scan range estimation HBase provides keys. Impala supports "pushing" of key range predicates to HBase to read only rows within the target key range. The planner estimates the cardinality of such scans by sampling the rows within the range. However, we have seen cases where the predicates are so selective that no keys fall within the sampling range, and we end up without a good cardinality estimate. (Specifically, the code does a division by zero and produces a huge estimate. See the ticket for details.) This fix: * Creates a fall-back strategy that uses table cardinality from HMS and the selectivity of the key predicates to estimate cardinality when the sampling approach fails. * The fall-back strategy requires tracking the predicates used for HBase keys so that they can be applied during fall-back calculations. * Moved HBase key calculation out of the SingleNodePlanner into the HBase scan node as suggested by a "TO DO" in the code. Doing so simplified the new code. * In the spirit of IMPALA-7919, adds the key predicates to the HBase scan node in the EXPLAIN output. Testing: * Adds a query context option to disable the normal key sampling to force the use of the fall-back. Used for testing. * Adds a new set of HBase test cases that use the new feature to check plans with the fall-back approach. * Reran all existing tests. Testing will be improved once IMPALA-8021 is available: we will have the estimated cardinality in both sets of HBase tests and can compare the two approaches. Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce --- M common/thrift/ImpalaInternalService.thrift M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test A testdata/workloads/functional-planner/queries/PlannerTest/hbase-no-key-est.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test 10 files changed, 485 insertions(+), 124 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/12192/1 -- To view, visit http://gerrit.cloudera.org:8080/12192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce Gerrit-Change-Number: 12192 Gerrit-PatchSet: 1 Gerrit-Owner: Paul Rogers <prog...@cloudera.com>