Yifan Zhang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20804 )


Change subject: IMPALA-12631: Improve count star performance for parquet scans
......................................................................

IMPALA-12631: Improve count star performance for parquet scans

Backend function HdfsParquetScanner::GetNextInternal() uses the data
stored in the Parquet RowGroup.num_rows field to compute count star,
so it still has to find every row group and sum all RowGroup.num_rows
values. This patch uses the 'num_rows' field in the Parquet file
metadata instead, which avoids the NextRowGroup() function calls, so
each file only needs to be processed once. The planner is also
modified to generate only one scan range per file.
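
The difference between the two approaches can be sketched roughly as
follows. This is an illustration only, not the code in the patch: the
structs are simplified, hypothetical stand-ins for the Parquet metadata
types, and the two function names are made up for this example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the Parquet metadata structs.
// Only the fields relevant to count star are modeled here.
struct RowGroup {
  int64_t num_rows;
};

struct FileMetaData {
  int64_t num_rows;                  // total rows in the file
  std::vector<RowGroup> row_groups;  // per-row-group metadata
};

// Old approach (illustrative): visit each row group and sum its num_rows.
int64_t CountStarFromRowGroups(const FileMetaData& md) {
  int64_t total = 0;
  for (const RowGroup& rg : md.row_groups) total += rg.num_rows;
  return total;
}

// New approach (illustrative): read the file-level num_rows once,
// without iterating over row groups.
int64_t CountStarFromFileMetadata(const FileMetaData& md) {
  return md.num_rows;
}
```

For a well-formed file both functions are expected to return the same
value; the second one does it with a single field read per file.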

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Ran query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
3 files changed, 68 insertions(+), 35 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/1
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang <chinazhangyi...@163.com>
