[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/12192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Fri, 25 Jan 2019 20:45:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..

IMPALA-8058: Fallback for HBase key scan range estimation

Impala supports "pushing" of HBase key range predicates to HBase so that
Impala reads only rows within the target key range. The planner
estimates the cardinality of such scans by sampling the rows within the
range. However, we have seen cases where sampling returns no rows, for
unknown reasons. The planner then ends up without a good cardinality
estimate. (Specifically, the code does a division by zero and produces
a huge estimate. See the ticket for details.)
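The failure mode described above can be illustrated with a small, hypothetical Java sketch. It assumes the sampling estimate is the byte size of the key range divided by the average row size seen in the sample. The class and method names are illustrative only; this is not Impala's FeHBaseTable code. Instead of dividing by zero, the guard returns -1, which the planner uses to mean "unknown cardinality".

```java
/**
 * Hypothetical sketch, not Impala's FeHBaseTable code: estimate the rows in
 * an HBase key range as (bytes in range) / (average bytes per sampled row).
 */
final class SampledRowEstimate {
  /** The planner's marker for "cardinality unknown". */
  static final long UNKNOWN = -1;

  /**
   * @param sampledRows  rows read while sampling the key range
   * @param sampledBytes total size of the sampled rows, in bytes
   * @param rangeBytes   estimated size of the whole key range, in bytes
   * @return the estimated row count, or UNKNOWN if the sample is unusable
   */
  static long estimateRows(long sampledRows, long sampledBytes, long rangeBytes) {
    // Without this guard, a zero-size sample makes bytesPerRow 0.0 (or NaN).
    // Floating-point division by zero does not throw in Java: rangeBytes / 0.0
    // is Infinity, and Math.round(Infinity) is Long.MAX_VALUE, a "huge estimate".
    if (sampledRows <= 0 || sampledBytes <= 0) {
      return UNKNOWN;
    }
    double bytesPerRow = (double) sampledBytes / sampledRows;
    return Math.round(rangeBytes / bytesPerRow);
  }
}
```

Returning an explicit "unknown" value, rather than whatever the arithmetic produces, lets the caller detect the failure and switch to another strategy.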

Impala appears to use the sampling strategy to compute cardinality
because HBase users generally do not gather table stats. The resulting
estimates are often off by 2x or more. This is a problem in tests as it
causes cardinality numbers to vary greatly from the expected values.
Fortunately, tests do gather HMS stats. There may be cases where users
do as well. This fix exploits that fact.

This fix:

* Creates a fall-back strategy that uses table cardinality from HMS and
  the selectivity of the key predicates to estimate cardinality when the
  sampling approach fails.
* The fall-back strategy requires tracking the predicates used for HBase
  keys so that their selectivity can be applied during fall-back
  calculations.
* Moved HBase key calculation out of the SingleNodePlanner into the
  HBase scan node as suggested by a "TO DO" in the code. Doing so
  simplified the new code.
* In the spirit of IMPALA-7919, adds the key predicates to the HBase
  scan node in the EXPLAIN output.
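Below is a minimal sketch of the fall-back calculation. It rests on several assumptions:

- The HMS row count is -1 when stats were never computed.
- Each key predicate carries a selectivity in [0, 1], or -1 if no estimate exists.
- Selectivities are combined by multiplication. This is an independence assumption, and the planner's actual combination rule may differ.

All names are hypothetical.

```java
import java.util.List;

/**
 * Hypothetical sketch, not Impala's HBaseScanNode code: scale the HMS table
 * row count by the selectivity of the HBase key predicates.
 */
final class HmsFallbackEstimate {
  /** Marker for "cardinality unknown". */
  static final long UNKNOWN = -1;
  /** Assumed marker for a predicate whose selectivity could not be estimated. */
  static final double UNKNOWN_SELECTIVITY = -1.0;

  static long estimate(long hmsRowCount, List<Double> keySelectivities) {
    // No HMS stats: the fall-back has nothing to scale, so it cannot help.
    if (hmsRowCount < 0) {
      return UNKNOWN;
    }
    double combined = 1.0;
    for (double sel : keySelectivities) {
      // Skip predicates with no estimate (for example, inequalities), so the
      // result errs high rather than low.
      if (sel == UNKNOWN_SELECTIVITY) {
        continue;
      }
      // Independence assumption: selectivities multiply.
      combined *= sel;
    }
    return Math.round(hmsRowCount * combined);
  }
}
```

This sketch skips predicates that have no estimate. As a result, a key range bounded only by inequalities gets the full table row count. That is one way the missing inequality selectivity could make this approach disagree with sampling.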

Testing:

* Adds a query context option to disable the normal key sampling to
  force the use of the fall-back. Used for testing.
* Adds a new set of HBase test cases that use the new feature to check
  plans with the fall-back approach.
* Reran all existing tests.
* Compared cardinality numbers for the two modes (sampling and HMS) using
  the cardinality features of IMPALA-8021. The two approaches provide
  different results, but this is mostly due to the missing selectivity
  estimates for inequality operators. (That's a fix for another time.)
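The following hypothetical sketch shows how a test-only switch could force the fall-back. The boolean parameter stands in for the query context option that disables key sampling; the option's actual Thrift field name is not reproduced here. A negative sampling result is assumed to mean "no usable estimate". The class and method names are illustrative.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: pick the sampling estimate when it is enabled and
 * usable, otherwise use the HMS-based fall-back.
 */
final class HBaseCardinalityChooser {
  static long choose(boolean disableKeySampling,
                     LongSupplier samplingEstimate,
                     LongSupplier hmsFallbackEstimate) {
    if (!disableKeySampling) {
      long sampled = samplingEstimate.getAsLong();
      // A negative value means sampling produced no usable estimate.
      if (sampled >= 0) {
        return sampled;
      }
    }
    // Sampling was disabled (testing) or failed: use the fall-back.
    return hmsFallbackEstimate.getAsLong();
  }
}
```

Passing the estimates as suppliers means the sampling work is skipped entirely when the switch is set. That keeps test plans independent of what the sample happens to return.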

Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Reviewed-on: http://gerrit.cloudera.org:8080/12192
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
A testdata/workloads/functional-planner/queries/PlannerTest/hbase-no-key-est.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
10 files changed, 511 insertions(+), 116 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 5
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3675/ 
DRY_RUN=false


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Fri, 25 Jan 2019 16:40:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..


Patch Set 4: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Fri, 25 Jan 2019 16:40:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-25 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..


Patch Set 3: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Fri, 25 Jan 2019 16:39:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1869/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 23 Jan 2019 23:21:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-23 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..


Patch Set 2:

(2 comments)

Hi Bharath, Thanks much for the review. Addressed your two review comments.

http://gerrit.cloudera.org:8080/#/c/12192/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12192/2//COMMIT_MSG@22
PS2, Line 22: exploints
> typo
Done


http://gerrit.cloudera.org:8080/#/c/12192/2/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
File fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java:

http://gerrit.cloudera.org:8080/#/c/12192/2/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java@412
PS2, Line 412: getTableName().toSql(),
> you can use tbl.getFullName()
Done



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 23 Jan 2019 22:47:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-23 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Zoram Thanga, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12192

to look at the new patch set (#3).

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..

IMPALA-8058: Fallback for HBase key scan range estimation

Impala supports "pushing" of HBase key range predicates to HBase so that
Impala reads only rows within the target key range. The planner
estimates the cardinality of such scans by sampling the rows within the
range. However, we have seen cases where sampling returns no rows, for
unknown reasons. The planner then ends up without a good cardinality
estimate. (Specifically, the code does a division by zero and produces
a huge estimate. See the ticket for details.)

Impala appears to use the sampling strategy to compute cardinality
because HBase users generally do not gather table stats. The resulting
estimates are often off by 2x or more. This is a problem in tests as it
causes cardinality numbers to vary greatly from the expected values.
Fortunately, tests do gather HMS stats. There may be cases where users
do as well. This fix exploits that fact.

This fix:

* Creates a fall-back strategy that uses table cardinality from HMS and
  the selectivity of the key predicates to estimate cardinality when the
  sampling approach fails.
* The fall-back strategy requires tracking the predicates used for HBase
  keys so that their selectivity can be applied during fall-back
  calculations.
* Moved HBase key calculation out of the SingleNodePlanner into the
  HBase scan node as suggested by a "TO DO" in the code. Doing so
  simplified the new code.
* In the spirit of IMPALA-7919, adds the key predicates to the HBase
  scan node in the EXPLAIN output.

Testing:

* Adds a query context option to disable the normal key sampling to
  force the use of the fall-back. Used for testing.
* Adds a new set of HBase test cases that use the new feature to check
  plans with the fall-back approach.
* Reran all existing tests.
* Compared cardinality numbers for the two modes (sampling and HMS) using
  the cardinality features of IMPALA-8021. The two approaches provide
  different results, but this is mostly due to the missing selectivity
  estimates for inequality operators. (That's a fix for another time.)

Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
---
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
A testdata/workloads/functional-planner/queries/PlannerTest/hbase-no-key-est.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
10 files changed, 511 insertions(+), 116 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/12192/3
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-23 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..


Patch Set 2: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12192/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12192/2//COMMIT_MSG@22
PS2, Line 22: exploints
typo


http://gerrit.cloudera.org:8080/#/c/12192/2/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
File fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java:

http://gerrit.cloudera.org:8080/#/c/12192/2/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java@412
PS2, Line 412: getTableName().toSql(),
you can use tbl.getFullName()



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 23 Jan 2019 21:12:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1832/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Sat, 19 Jan 2019 03:02:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-18 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..


Patch Set 2:

(3 comments)

Addressed code review comments. Rebased on latest master, which now shows 
cardinality estimates in the EXPLAIN plan, so updated the new test file 
accordingly.

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
File fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java:

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java@409
PS1, Line 409:
> nit: add tablename, startKey and endKey too?
Done


http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
File fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java:

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java@304
PS1, Line 304: // No useful estimate. Rely on HMS row count stats.
> Probably mention here that this doesn't work as expected if the stats are m
Done


http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java@304
PS1, Line 304: // No useful estimate. Rely on HMS row count stats.
> Probably mention here that this doesn't work as expected if the stats are m
Done



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Sat, 19 Jan 2019 02:29:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-18 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Zoram Thanga, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12192

to look at the new patch set (#2).

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..

IMPALA-8058: Fallback for HBase key scan range estimation

Impala supports "pushing" of HBase key range predicates to HBase so that
Impala reads only rows within the target key range. The planner
estimates the cardinality of such scans by sampling the rows within the
range. However, we have seen cases where sampling returns no rows, for
unknown reasons. The planner then ends up without a good cardinality
estimate. (Specifically, the code does a division by zero and produces
a huge estimate. See the ticket for details.)

Impala appears to use the sampling strategy to compute cardinality
because HBase users generally do not gather table stats. The resulting
estimates are often off by 2x or more. This is a problem in tests as it
causes cardinality numbers to vary greatly from the expected values.
Fortunately, tests do gather HMS stats. There may be cases where users
do as well. This fix exploits that fact.

This fix:

* Creates a fall-back strategy that uses table cardinality from HMS and
  the selectivity of the key predicates to estimate cardinality when the
  sampling approach fails.
* The fall-back strategy requires tracking the predicates used for HBase
  keys so that their selectivity can be applied during fall-back
  calculations.
* Moved HBase key calculation out of the SingleNodePlanner into the
  HBase scan node as suggested by a "TO DO" in the code. Doing so
  simplified the new code.
* In the spirit of IMPALA-7919, adds the key predicates to the HBase
  scan node in the EXPLAIN output.

Testing:

* Adds a query context option to disable the normal key sampling to
  force the use of the fall-back. Used for testing.
* Adds a new set of HBase test cases that use the new feature to check
  plans with the fall-back approach.
* Reran all existing tests.
* Compared cardinality numbers for the two modes (sampling and HMS) using
  the cardinality features of IMPALA-8021. The two approaches provide
  different results, but this is mostly due to the missing selectivity
  estimates for inequality operators. (That's a fix for another time.)

Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
---
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
A testdata/workloads/functional-planner/queries/PlannerTest/hbase-no-key-est.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
10 files changed, 512 insertions(+), 116 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/12192/2
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-17 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..


Patch Set 1: Code-Review+1

(2 comments)

The patch generally lgtm. While the big question is if any users run stats on 
tables backed by hbase, my opinion is that we can still get this patch in since 
it doesn't regress anything and adds general supportability logging. I'll let 
Zoram take another pass.

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
File fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java:

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java@409
PS1, Line 409: );
nit: add tablename, startKey and endKey too?


http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
File fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java:

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java@304
PS1, Line 304: // No useful estimate. Rely on HMS row count stats
Probably mention here that this doesn't work as expected if the stats are 
missing and if there is any TODO for future?



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 17 Jan 2019 22:05:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1751/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 09 Jan 2019 19:41:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-09 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..


Patch Set 1:

Passed pre-review tests: https://jenkins.impala.io/job/pre-review-test/277/


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 09 Jan 2019 17:35:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-09 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12192


Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..

IMPALA-8058: Fallback for HBase key scan range estimation

HBase stores rows sorted by row key. Impala supports "pushing" of key range predicates
to HBase to read only rows within the target key range. The planner
estimates the cardinality of such scans by sampling the rows within the
range. However, we have seen cases where the predicates are so selective
that no keys fall within the sampling range, and we end up without a
good cardinality estimate. (Specifically, the code does a division by
zero and produces a huge estimate. See the ticket for details.)
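To make "pushing key range predicates" concrete, here is a hypothetical Java sketch of how a single comparison on the row key might map to an HBase scan range. It relies on standard HBase scan semantics:

- The start row is inclusive and the stop row is exclusive.
- An empty row means unbounded.
- Appending a 0x00 byte to a key gives the smallest key greater than it.

The sketch uses ASCII Strings in place of byte arrays and is not Impala's implementation.

```java
/**
 * Hypothetical sketch: map one comparison on the row key to an HBase scan
 * range [start, stop). Empty strings mean "unbounded". Strings stand in for
 * byte[] keys; String order matches HBase's unsigned byte order for ASCII keys.
 */
final class HBaseKeyRangeSketch {
  final String start; // inclusive
  final String stop;  // exclusive

  HBaseKeyRangeSketch(String start, String stop) {
    this.start = start;
    this.stop = stop;
  }

  /** Smallest key strictly greater than k: k followed by a 0x00 byte. */
  static String successor(String k) {
    return k + '\0';
  }

  static HBaseKeyRangeSketch fromPredicate(String op, String key) {
    switch (op) {
      case "=":  return new HBaseKeyRangeSketch(key, successor(key));
      case ">":  return new HBaseKeyRangeSketch(successor(key), "");
      case ">=": return new HBaseKeyRangeSketch(key, "");
      case "<":  return new HBaseKeyRangeSketch("", key);
      case "<=": return new HBaseKeyRangeSketch("", successor(key));
      default:   throw new IllegalArgumentException("unsupported operator: " + op);
    }
  }
}
```

An equality predicate yields a range that covers at most one row key. Ranges that narrow are exactly where a sample of the range can come back empty.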

This fix:

* Creates a fall-back strategy that uses table cardinality from HMS and
  the selectivity of the key predicates to estimate cardinality when the
  sampling approach fails.
* The fall-back strategy requires tracking the predicates used for HBase
  keys so that they can be applied during fall-back calculations.
* Moved HBase key calculation out of the SingleNodePlanner into the
  HBase scan node as suggested by a "TO DO" in the code. Doing so
  simplified the new code.
* In the spirit of IMPALA-7919, adds the key predicates to the HBase
  scan node in the EXPLAIN output.

Testing:

* Adds a query context option to disable the normal key sampling to
  force the use of the fall-back. Used for testing.
* Adds a new set of HBase test cases that use the new feature to check
  plans with the fall-back approach.
* Reran all existing tests.

Testing will be improved once IMPALA-8021 is available: we will have the
estimated cardinality in both sets of HBase tests and can compare the
two approaches.

Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
---
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
A testdata/workloads/functional-planner/queries/PlannerTest/hbase-no-key-est.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
10 files changed, 485 insertions(+), 124 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/12192/1
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers