Norbert Luksa has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/14347 )
Change subject: IMPALA-6501: Optimize count(star) for Kudu scans
......................................................................

IMPALA-6501: Optimize count(star) for Kudu scans

IMPALA-5036 added an optimization for count(star) in Parquet scans
that avoids materializing dummy rows. This change provides a similar
optimization for Kudu tables.

Instead of materializing empty rows when computing count(star), we use
the NumRows field from the Kudu API. The Kudu scanner tuple is
modified to have one slot into which we write the num rows
statistic. The aggregate function is changed from count to a special
sum function that gets initialized to 0.

Tests:
 * Added end-to-end tests
 * Added planner tests
 * Ran performance tests on the tpch.lineitem Kudu table with scale
   factor 25, on 1 node, with mt_dop set to 1, to measure the
   speedup gained when scanning. Counting the rows took around 400ms
   before the optimization and around 170ms after.

Change-Id: Ic99e0f954d0ca65779bd531ca79ace1fcb066fb9
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
A testdata/workloads/functional-planner/queries/PlannerTest/kudu-stats-agg.test
M testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
A testdata/workloads/functional-query/queries/QueryTest/kudu-stats-agg.test
M tests/query_test/test_aggregation.py
18 files changed, 580 insertions(+), 91 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/14347/10
--
To view, visit http://gerrit.cloudera.org:8080/14347
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic99e0f954d0ca65779bd531ca79ace1fcb066fb9
Gerrit-Change-Number: 14347
Gerrit-PatchSet: 10
Gerrit-Owner: Norbert Luksa <norbert.lu...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <norbert.lu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
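
For readers skimming this notification, the rewrite described in the commit message (count replaced by a sum over per-batch row counts, initialized to 0) can be sketched roughly as follows. This is only an illustrative model, not the code in the patch: ScanBatch, scan_count_star and sum_init_zero are hypothetical names, and the batch-level granularity is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ScanBatch:
    """Hypothetical stand-in for one batch returned by a Kudu scanner."""
    num_rows: int  # row count reported for the batch; no rows are materialized


def scan_count_star(batches: Iterable[ScanBatch]) -> List[int]:
    """Emit one single-slot tuple per batch, holding only the row count."""
    return [batch.num_rows for batch in batches]


def sum_init_zero(slots: Iterable[int]) -> int:
    """Sum that starts at 0, so an empty input yields 0 as count(*) would."""
    total = 0
    for value in slots:
        total += value
    return total


if __name__ == "__main__":
    batches = [ScanBatch(3), ScanBatch(0), ScanBatch(5)]
    print(sum_init_zero(scan_count_star(batches)))
```

The sum needs the explicit zero start because a plain SQL sum over no input rows is NULL, whereas count(star) over no rows is 0.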