Taras Bobrovytsky has uploaded a new patch set (#5). Change subject: IMPALA-5036: Parquet count star optimization ......................................................................
IMPALA-5036: Parquet count star optimization Instead of materializing empty rows when computing count star, we use the data stored in the Parquet RowGroup.num_rows field. The Parquet scanner tuple is modified to have one slot into which we will write the num rows statistic. The aggregate function is changed from count to a special sum function that gets initialized to 0. We also add a rewrite rule so that count(<literal>) is rewritten to count(*) in order to make sure that this optimization is applied in all cases. Testing: - Added functional and planner tests Change-Id: I536b85c014821296aed68a0c68faadae96005e62 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/FunctionParams.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java A fe/src/main/java/org/apache/impala/rewrite/NormalizeCountStarRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test A testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-query/queries/QueryTest/aggregation.test A testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test R testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test M tests/query_test/test_aggregation.py M tests/query_test/test_parquet_stats.py 26 files changed, 883 insertions(+), 79 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6812/5 -- To view, visit http://gerrit.cloudera.org:8080/6812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-Reviewer: Zach Amsden <zams...@cloudera.com>