[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12192 ) Change subject: IMPALA-8058: Fallback for HBase key scan range estimation .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/12192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce Gerrit-Change-Number: 12192 Gerrit-PatchSet: 4 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 25 Jan 2019 20:45:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..

IMPALA-8058: Fallback for HBase key scan range estimation

Impala supports "pushing" of HBase key range predicates to HBase so that Impala reads only rows within the target key range. The planner estimates the cardinality of such scans by sampling the rows within the range. However, we have seen cases where sampling returns no rows for unknown reasons. The planner then ends up without a good cardinality estimate. (Specifically, the code does a division by zero and produces a huge estimate. See the ticket for details.)

Impala appears to use the sampling strategy to compute cardinality because HBase users generally do not gather table stats. The resulting estimates are often off by 2x or more. This is a problem in tests as it causes cardinality numbers to vary greatly from the expected values.

Fortunately, tests do gather HMS stats. There may be cases where users do as well. This fix exploits that fact.

This fix:

* Creates a fall-back strategy that uses table cardinality from HMS and the selectivity of the key predicates to estimate cardinality when the sampling approach fails.
* Tracks the predicates used for HBase keys, as the fall-back strategy requires, so that their selectivity can be applied during fall-back calculations.
* Moves HBase key calculation out of the SingleNodePlanner into the HBase scan node as suggested by a "TO DO" in the code. Doing so simplified the new code.
* In the spirit of IMPALA-7919, adds the key predicates to the HBase scan node in the EXPLAIN output.

Testing:

* Adds a query context option to disable the normal key sampling to force the use of the fall-back. Used for testing.
* Adds a new set of HBase test cases that use the new feature to check plans with the fall-back approach.
* Reran all existing tests.
* Compared cardinality numbers for the two modes (sampling and HMS) using the cardinality features of IMPALA-8021. The two approaches provide different results, but this is mostly due to the missing selectivity estimates for inequality operators. (That's a fix for another time.)

Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Reviewed-on: http://gerrit.cloudera.org:8080/12192
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
A testdata/workloads/functional-planner/queries/PlannerTest/hbase-no-key-est.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
10 files changed, 511 insertions(+), 116 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/12192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 5
Gerrit-Owner: Paul Rogers
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Paul Rogers
Gerrit-Reviewer: Zoram Thanga
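As an illustration of the fall-back strategy described in the commit message above, here is a minimal Java sketch. It is not the code from HBaseScanNode.java: the class name, the method name, the use of -1 as an "unknown" marker, and the assumption that key-predicate selectivities are simply multiplied together are all hypothetical choices made for this example.

```java
/**
 * Hypothetical sketch of a sampling-first, HMS-fallback cardinality estimate.
 * Names and conventions here are assumptions, not Impala's actual API.
 */
final class FallbackCardinalitySketch {
  /** Assumed marker for "no usable value" (sampling failed or stats missing). */
  static final long UNKNOWN = -1;

  /**
   * @param sampledEstimate  row estimate from key-range sampling, or UNKNOWN
   * @param hmsRowCount      table row count from HMS stats, or UNKNOWN
   * @param keySelectivities selectivity (0.0 to 1.0) of each key predicate
   */
  static long estimate(long sampledEstimate, long hmsRowCount,
      double... keySelectivities) {
    // Prefer the sampled estimate whenever sampling produced one.
    if (sampledEstimate >= 0) return sampledEstimate;
    // Without HMS stats the fall-back has nothing to work with.
    if (hmsRowCount < 0) return UNKNOWN;
    // Assume independent predicates: combined selectivity is the product.
    double selectivity = 1.0;
    for (double s : keySelectivities) selectivity *= s;
    return Math.round(hmsRowCount * selectivity);
  }

  public static void main(String[] args) {
    // Sampling failed; 10,000 rows in HMS; two key predicates.
    System.out.println(estimate(UNKNOWN, 10_000, 0.1, 0.5));
    // Sampling succeeded; HMS stats are ignored.
    System.out.println(estimate(42, 10_000, 0.1, 0.5));
  }
}
```

Note that the sketch returns the "unknown" marker when HMS stats are missing, which matches the reviewer's observation in this thread that the fall-back does not work as expected without stats.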
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12192 ) Change subject: IMPALA-8058: Fallback for HBase key scan range estimation .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3675/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/12192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce Gerrit-Change-Number: 12192 Gerrit-PatchSet: 4 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 25 Jan 2019 16:40:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12192 ) Change subject: IMPALA-8058: Fallback for HBase key scan range estimation .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce Gerrit-Change-Number: 12192 Gerrit-PatchSet: 4 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 25 Jan 2019 16:40:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12192 ) Change subject: IMPALA-8058: Fallback for HBase key scan range estimation .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce Gerrit-Change-Number: 12192 Gerrit-PatchSet: 3 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 25 Jan 2019 16:39:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12192 ) Change subject: IMPALA-8058: Fallback for HBase key scan range estimation .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1869/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce Gerrit-Change-Number: 12192 Gerrit-PatchSet: 3 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 23 Jan 2019 23:21:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..

Patch Set 2:

(2 comments)

Hi Bharath,

Thanks much for the review. Addressed your two review comments.

http://gerrit.cloudera.org:8080/#/c/12192/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12192/2//COMMIT_MSG@22
PS2, Line 22: exploints
> typo
Done

http://gerrit.cloudera.org:8080/#/c/12192/2/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
File fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java:

http://gerrit.cloudera.org:8080/#/c/12192/2/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java@412
PS2, Line 412: getTableName().toSql(),
> you can use tbl.getFullName()
Done

--
To view, visit http://gerrit.cloudera.org:8080/12192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Paul Rogers
Gerrit-Reviewer: Zoram Thanga
Gerrit-Comment-Date: Wed, 23 Jan 2019 22:47:24 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Hello Bharath Vissapragada, Zoram Thanga, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12192

to look at the new patch set (#3).

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..

IMPALA-8058: Fallback for HBase key scan range estimation

Impala supports "pushing" of HBase key range predicates to HBase so that Impala reads only rows within the target key range. The planner estimates the cardinality of such scans by sampling the rows within the range. However, we have seen cases where sampling returns no rows for unknown reasons. The planner then ends up without a good cardinality estimate. (Specifically, the code does a division by zero and produces a huge estimate. See the ticket for details.)

Impala appears to use the sampling strategy to compute cardinality because HBase users generally do not gather table stats. The resulting estimates are often off by 2x or more. This is a problem in tests as it causes cardinality numbers to vary greatly from the expected values.

Fortunately, tests do gather HMS stats. There may be cases where users do as well. This fix exploits that fact.

This fix:

* Creates a fall-back strategy that uses table cardinality from HMS and the selectivity of the key predicates to estimate cardinality when the sampling approach fails.
* Tracks the predicates used for HBase keys, as the fall-back strategy requires, so that their selectivity can be applied during fall-back calculations.
* Moves HBase key calculation out of the SingleNodePlanner into the HBase scan node as suggested by a "TO DO" in the code. Doing so simplified the new code.
* In the spirit of IMPALA-7919, adds the key predicates to the HBase scan node in the EXPLAIN output.

Testing:

* Adds a query context option to disable the normal key sampling to force the use of the fall-back. Used for testing.
* Adds a new set of HBase test cases that use the new feature to check plans with the fall-back approach.
* Reran all existing tests.
* Compared cardinality numbers for the two modes (sampling and HMS) using the cardinality features of IMPALA-8021. The two approaches provide different results, but this is mostly due to the missing selectivity estimates for inequality operators. (That's a fix for another time.)

Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
---
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
A testdata/workloads/functional-planner/queries/PlannerTest/hbase-no-key-est.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
10 files changed, 511 insertions(+), 116 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/12192/3

--
To view, visit http://gerrit.cloudera.org:8080/12192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Paul Rogers
Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..

Patch Set 2: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12192/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12192/2//COMMIT_MSG@22
PS2, Line 22: exploints
typo

http://gerrit.cloudera.org:8080/#/c/12192/2/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
File fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java:

http://gerrit.cloudera.org:8080/#/c/12192/2/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java@412
PS2, Line 412: getTableName().toSql(),
you can use tbl.getFullName()

--
To view, visit http://gerrit.cloudera.org:8080/12192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Paul Rogers
Gerrit-Reviewer: Zoram Thanga
Gerrit-Comment-Date: Wed, 23 Jan 2019 21:12:46 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12192 ) Change subject: IMPALA-8058: Fallback for HBase key scan range estimation .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1832/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce Gerrit-Change-Number: 12192 Gerrit-PatchSet: 2 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Sat, 19 Jan 2019 03:02:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..

Patch Set 2:

(3 comments)

Addressed code review comments. Rebased on latest master, which now shows cardinality estimates in the EXPLAIN plan, so updated the new test file accordingly.

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
File fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java:

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java@409
PS1, Line 409:
> nit: add tablename, startKey and endKey too?
Done

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
File fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java:

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java@304
PS1, Line 304: // No useful estimate. Rely on HMS row count stats.
> Probably mention here that this doesn't work as expected if the stats are m
Done

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java@304
PS1, Line 304: // No useful estimate. Rely on HMS row count stats.
> Probably mention here that this doesn't work as expected if the stats are m
Done

--
To view, visit http://gerrit.cloudera.org:8080/12192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Paul Rogers
Gerrit-Reviewer: Zoram Thanga
Gerrit-Comment-Date: Sat, 19 Jan 2019 02:29:52 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Hello Bharath Vissapragada, Zoram Thanga, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12192

to look at the new patch set (#2).

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..

IMPALA-8058: Fallback for HBase key scan range estimation

Impala supports "pushing" of HBase key range predicates to HBase so that Impala reads only rows within the target key range. The planner estimates the cardinality of such scans by sampling the rows within the range. However, we have seen cases where sampling returns no rows for unknown reasons. The planner then ends up without a good cardinality estimate. (Specifically, the code does a division by zero and produces a huge estimate. See the ticket for details.)

Impala appears to use the sampling strategy to compute cardinality because HBase users generally do not gather table stats. The resulting estimates are often off by 2x or more. This is a problem in tests as it causes cardinality numbers to vary greatly from the expected values.

Fortunately, tests do gather HMS stats. There may be cases where users do as well. This fix exploints that fact.

This fix:

* Creates a fall-back strategy that uses table cardinality from HMS and the selectivity of the key predicates to estimate cardinality when the sampling approach fails.
* Tracks the predicates used for HBase keys, as the fall-back strategy requires, so that their selectivity can be applied during fall-back calculations.
* Moves HBase key calculation out of the SingleNodePlanner into the HBase scan node as suggested by a "TO DO" in the code. Doing so simplified the new code.
* In the spirit of IMPALA-7919, adds the key predicates to the HBase scan node in the EXPLAIN output.

Testing:

* Adds a query context option to disable the normal key sampling to force the use of the fall-back. Used for testing.
* Adds a new set of HBase test cases that use the new feature to check plans with the fall-back approach.
* Reran all existing tests.
* Compared cardinality numbers for the two modes (sampling and HMS) using the cardinality features of IMPALA-8021. The two approaches provide different results, but this is mostly due to the missing selectivity estimates for inequality operators. (That's a fix for another time.)

Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
---
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
A testdata/workloads/functional-planner/queries/PlannerTest/hbase-no-key-est.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
10 files changed, 512 insertions(+), 116 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/12192/2

--
To view, visit http://gerrit.cloudera.org:8080/12192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Paul Rogers
Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..

Patch Set 1: Code-Review+1

(2 comments)

The patch generally lgtm. While the big question is whether any users run stats on tables backed by HBase, my opinion is that we can still get this patch in since it doesn't regress anything and adds general supportability logging. I'll let Zoram take another pass.

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
File fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java:

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java@409
PS1, Line 409: );
nit: add tablename, startKey and endKey too?

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
File fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java:

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java@304
PS1, Line 304: // No useful estimate. Rely on HMS row count stats
Probably mention here that this doesn't work as expected if the stats are missing, and note whether there is any TODO for the future?

--
To view, visit http://gerrit.cloudera.org:8080/12192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Paul Rogers
Gerrit-Reviewer: Zoram Thanga
Gerrit-Comment-Date: Thu, 17 Jan 2019 22:05:20 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12192 ) Change subject: IMPALA-8058: Fallback for HBase key scan range estimation .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1751/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce Gerrit-Change-Number: 12192 Gerrit-PatchSet: 1 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 09 Jan 2019 19:41:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12192 ) Change subject: IMPALA-8058: Fallback for HBase key scan range estimation .. Patch Set 1: Passed pre-review tests: https://jenkins.impala.io/job/pre-review-test/277/ -- To view, visit http://gerrit.cloudera.org:8080/12192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce Gerrit-Change-Number: 12192 Gerrit-PatchSet: 1 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 09 Jan 2019 17:35:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Paul Rogers has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..

IMPALA-8058: Fallback for HBase key scan range estimation

HBase provides keys. Impala supports "pushing" of key range predicates to HBase to read only rows within the target key range. The planner estimates the cardinality of such scans by sampling the rows within the range. However, we have seen cases where the predicates are so selective that no keys fall within the sampling range, and we end up without a good cardinality estimate. (Specifically, the code does a division by zero and produces a huge estimate. See the ticket for details.)

This fix:

* Creates a fall-back strategy that uses table cardinality from HMS and the selectivity of the key predicates to estimate cardinality when the sampling approach fails.
* Tracks the predicates used for HBase keys, as the fall-back strategy requires, so that they can be applied during fall-back calculations.
* Moves HBase key calculation out of the SingleNodePlanner into the HBase scan node as suggested by a "TO DO" in the code. Doing so simplified the new code.
* In the spirit of IMPALA-7919, adds the key predicates to the HBase scan node in the EXPLAIN output.

Testing:

* Adds a query context option to disable the normal key sampling to force the use of the fall-back. Used for testing.
* Adds a new set of HBase test cases that use the new feature to check plans with the fall-back approach.
* Reran all existing tests.

Testing will be improved once IMPALA-8021 is available: we will have the estimated cardinality in both sets of HBase tests and can compare the two approaches.

Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
---
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
A testdata/workloads/functional-planner/queries/PlannerTest/hbase-no-key-est.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
10 files changed, 485 insertions(+), 124 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/12192/1

--
To view, visit http://gerrit.cloudera.org:8080/12192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers
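The commit message above mentions that an empty sample leads to a division by zero and a huge estimate. The ticket has the actual details; the following Java sketch only illustrates one way this can happen in floating-point arithmetic, and how a guard can signal "no estimate" instead. The class, the method names, the parameters, and the size-over-average-row-size formula are all assumptions made for this example, not the planner's real code.

```java
/**
 * Hypothetical illustration of how an empty sample can produce a huge
 * cardinality estimate. Not taken from Impala's planner.
 */
final class EmptySampleSketch {
  /** Unguarded: estimated rows = bytes in range / average sampled row size. */
  static long naiveEstimate(long rangeBytes, long sampledBytes, long sampledRows) {
    // With no sampled rows the average row size is taken as zero.
    double avgRowSize = sampledRows > 0 ? (double) sampledBytes / sampledRows : 0.0;
    // A positive value divided by 0.0 is +Infinity in Java, and casting
    // +Infinity to long saturates at Long.MAX_VALUE.
    return (long) (rangeBytes / avgRowSize);
  }

  /** Guarded: returns -1 (assumed "unknown" marker) when the sample is unusable. */
  static long guardedEstimate(long rangeBytes, long sampledBytes, long sampledRows) {
    if (sampledRows <= 0 || sampledBytes <= 0) return -1;
    double avgRowSize = (double) sampledBytes / sampledRows;
    return (long) (rangeBytes / avgRowSize);
  }

  public static void main(String[] args) {
    System.out.println(naiveEstimate(1_000_000, 0, 0));   // saturates at Long.MAX_VALUE
    System.out.println(guardedEstimate(1_000_000, 0, 0)); // unknown marker
    System.out.println(guardedEstimate(1_000_000, 500, 5));
  }
}
```

A caller that sees the "unknown" marker can then switch to a different source of cardinality, such as the HMS row count used by the fall-back strategy in this change.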