Paul Rogers has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12192


Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
......................................................................

IMPALA-8058: Fallback for HBase key scan range estimation

HBase provides keys. Impala supports "pushing" of key range predicates
to HBase to read only rows within the target key range. The planner
estimates the cardinality of such scans by sampling the rows within the
range. However, we have seen cases where the predicates are so selective
that no keys fall within the sampling range, and we end up without a
good cardinality estimate. (Specifically, the code does a division by
zero and produces a huge estimate. See the ticket for details.)

This fix:

* Creates a fall-back strategy that uses table cardinality from HMS and
  the selectivity of the key predicates to estimate cardinality when the
  sampling approach fails.
* The fall-back strategy requires tracking the predicates used for HBase
  keys so that they can be applied during fall-back calculations.
* Moved HBase key calculation out of the SingleNodePlanner into the
  HBase scan node as suggested by a "TO DO" in the code. Doing so
  simplified the new code.
* In the spirit of IMPALA-7919, adds the key predicates to the HBase
  scan node in the EXPLAIN output.

Testing:

* Adds a query context option to disable the normal key sampling to
  force the use of the fall-back. Used for testing.
* Adds a new set of HBase test cases that use the new feature to check
  plans with the fall-back approach.
* Reran all existing tests.

Testing will be improved once IMPALA-8021 is available: we will have the
estimated cardinality in both sets of HBase tests and can compare the
two approaches.

Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
---
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/hbase-no-key-est.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
10 files changed, 485 insertions(+), 124 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/12192/1
--
To view, visit http://gerrit.cloudera.org:8080/12192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers <prog...@cloudera.com>

Reply via email to