Fang-Yu Rao has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/12974 )

Change subject: IMPALA-7608: Estimate row count from file size when no stats available
......................................................................

IMPALA-7608: Estimate row count from file size when no stats available

Adds a feature that estimates the number of rows in an HDFS table when
cardinality statistics for that table are not available. Also adds a
query option to revert the change in case of regression.

Testing:
(1) In CardinalityTest.java, replaced the original statement
    verifyCardinality("SELECT a FROM functional.tinytable", -1);
    in the method testBasicsWithoutStats() with
    verifyCardinality("SELECT a FROM functional.tinytable", 2);
(2) In CardinalityTest.java, added more tests to check the cardinality
    of most PlanNode implementations. For each tested PlanNode, the
    behavior is tested both with the feature enabled and with it
    disabled.
(3) In set.test, modified three related test cases to make sure that
    the added query option is included after executing "set all" in
    various scenarios.
(4) Eight JUnit tests in PlannerTest.java produce different distributed
    query plans when this feature is enabled. Added a JUnit test for
    each of those eight affected tests that runs with the feature
    enabled. Each query in the newly added test files involves at
    least one HDFS table without available statistics.

Change-Id: Ic414121c8df0d5222e4aeea096b5365beb04568a
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A testdata/workloads/functional-planner/queries/PlannerTest/default-join-distr-mode-shuffle-hdfs-num-rows-est-enabled.test
A testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection-hdfs-num-rows-est-enabled.test
A testdata/workloads/functional-planner/queries/PlannerTest/joins-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
A testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
A testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation-hdfs-num-rows-est-enabled.test
A testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements-hdfs-num-rows-est-enabled.test
A testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
A testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/set.test
21 files changed, 2,940 insertions(+), 22 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/12974/14
--
To view, visit http://gerrit.cloudera.org:8080/12974
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic414121c8df0d5222e4aeea096b5365beb04568a
Gerrit-Change-Number: 12974
Gerrit-PatchSet: 14
Gerrit-Owner: Fang-Yu Rao <fangyu....@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fangyu....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <prog...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
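
The idea described in the commit message, estimating a row count from file size when no stats are available, can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in HdfsScanNode.java: the class name, method name, parameters, and the per-row byte size input are all hypothetical, and the patch's actual formula is not shown in this notification. The -1 return value mirrors the "unknown cardinality" value that appears in the CardinalityTest statement quoted in the commit message.

```java
// Hypothetical sketch; names and formula are assumptions, not Impala's API.
public final class RowCountEstimateSketch {
    private RowCountEstimateSketch() {}

    /**
     * Estimates a row count from the total size of a table's files.
     *
     * @param totalFileBytes     sum of the file sizes, in bytes
     * @param estimatedRowBytes  assumed average size of one row, in bytes
     * @param estimationDisabled stands in for the query option that reverts
     *                           to the old behavior (hypothetical name)
     * @return the estimated row count, or -1 if no estimate can be made
     */
    public static long estimateNumRows(
            long totalFileBytes, double estimatedRowBytes, boolean estimationDisabled) {
        // With the feature turned off, report "unknown" as before.
        if (estimationDisabled) return -1;
        // Without a usable size or row width there is nothing to divide.
        if (totalFileBytes < 0 || estimatedRowBytes <= 0) return -1;
        if (totalFileBytes == 0) return 0;
        // A non-empty table is assumed to hold at least one row.
        return Math.max(1L, Math.round(totalFileBytes / estimatedRowBytes));
    }
}
```

In this sketch, passing `true` for `estimationDisabled` restores the pre-patch outcome of an unknown cardinality, which is the kind of fallback the added query option is meant to provide.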