Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#2).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
......................................................................

IMPALA-12631: Improve count star performance for parquet scans

The backend function HdfsParquetScanner::GetNextInternal() computes
count star from the Parquet RowGroup.num_rows field, but it still has
to find every row group and sum each RowGroup.num_rows.

This patch uses the 'num_rows' field in the Parquet file metadata
instead, which avoids the NextRowGroup() function calls, so each file
only needs to be processed once. The planner is also modified to
generate only one scan range per file.

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Ran query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
3 files changed, 70 insertions(+), 35 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/2
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang <chinazhangyi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
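
For illustration only, the difference between the two counting strategies in the commit message can be sketched as below. This is not the code from the patch. The structs RowGroup and FileMetaData are simplified, hypothetical stand-ins for the Parquet footer structures, and the function names CountBySummingRowGroups and CountFromFileMetadata are invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a Parquet row group's metadata.
struct RowGroup {
  int64_t num_rows;  // rows stored in this row group
};

// Hypothetical, simplified stand-in for the Parquet file footer.
struct FileMetaData {
  int64_t num_rows;                  // total rows in the whole file
  std::vector<RowGroup> row_groups;  // per-row-group metadata
};

// Approach before the patch, as described in the commit message:
// visit every row group and sum its num_rows.
int64_t CountBySummingRowGroups(const FileMetaData& md) {
  int64_t total = 0;
  for (const RowGroup& rg : md.row_groups) total += rg.num_rows;
  return total;
}

// Approach after the patch: read the file-level total directly,
// with no per-row-group iteration.
int64_t CountFromFileMetadata(const FileMetaData& md) {
  // Sanity check for this sketch only (compiled out when NDEBUG is set):
  // the footer total is assumed to equal the sum over row groups.
  assert(md.num_rows == CountBySummingRowGroups(md));
  return md.num_rows;
}
```

Because the file-level count does not depend on which row groups fall inside a scan range, a single scan range per file is enough to produce it, which fits the planner change mentioned in the commit message.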