Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#2).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
......................................................................

IMPALA-12631: Improve count star performance for parquet scans

The backend function HdfsParquetScanner::GetNextInternal() already uses
the Parquet RowGroup.num_rows field to compute count star, but it still
has to locate every row group and sum up all RowGroup.num_rows values.
This patch instead uses the file-level 'num_rows' field in the Parquet
file metadata, which avoids the NextRowGroup() calls, so each file only
needs to be processed once. The planner is also modified to generate
only one scan range per file.
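The C++ sketch below is a minimal illustration of the idea, not the
patch itself. It contrasts summing RowGroup.num_rows across row groups
with reading the file-level FileMetaData.num_rows once. It also shows a
planner that emits one scan range per file. The structs are simplified,
hypothetical stand-ins for the Parquet Thrift metadata. Only the
num_rows fields mirror the Parquet format. ScanRange,
CountStarPerRowGroup(), CountStarFromFooter() and PlanCountStarRanges()
are invented names for illustration, not Impala APIs.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the Parquet Thrift structs.
// Only the num_rows fields correspond to the Parquet format spec.
struct RowGroup {
  int64_t num_rows;  // rows in this row group
};

struct FileMetaData {
  int64_t num_rows;                 // total rows in the whole file
  std::vector<RowGroup> row_groups;
};

// Previous behaviour (sketch): walk every row group and sum its row count.
int64_t CountStarPerRowGroup(const FileMetaData& md) {
  int64_t total = 0;
  for (const RowGroup& rg : md.row_groups) total += rg.num_rows;
  return total;
}

// New behaviour (sketch): read the file-level count directly from the
// footer, with no per-row-group iteration.
int64_t CountStarFromFooter(const FileMetaData& md) { return md.num_rows; }

// Hypothetical scan range description used by the planner sketch.
struct ScanRange {
  std::string path;
  int64_t offset;
  int64_t length;
};

// Planner sketch: when only count star is needed, emit a single scan
// range covering the whole file, so each file footer is read only once.
std::vector<ScanRange> PlanCountStarRanges(
    const std::vector<std::pair<std::string, int64_t>>& files) {
  std::vector<ScanRange> ranges;
  for (const auto& file : files) {
    ranges.push_back({file.first, 0, file.second});
  }
  return ranges;
}
```

For a file whose footer records num_rows = 300 across row groups of
100, 120 and 80 rows, both functions return 300. The footer version
touches no row group metadata at all.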

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Ran query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
3 files changed, 70 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/2
--
To view, visit http://gerrit.cloudera.org:8080/20804

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang <chinazhangyi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
