Hello Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/11575 to look at the new patch set (#17). Change subject: IMPALA-6964: Track stats about column and page sizes in Parquet reader ...................................................................... IMPALA-6964: Track stats about column and page sizes in Parquet reader Adds the following new stats: * ParquetCompressedPageSize - a summary (average, min, max) counter that tracks the size of compressed pages read, if no compressed pages are read then this counter is empty * ParquetUncompressedPageSize - a summary counter that tracks the size of uncompressed pages read, it is updated in two places: (1) when a compressed page is de-compressed, and (2) when a page that is not compressed is read * ParquetCompressedDataReadPerColumn - a summary counter that tracks the amount of compressed data read per column for a scan node * ParquetUncompressedDataReadPerColumn - a summary counter that tracks the amount of uncompressed data read per column for a scan node The PerColumn counters are calculated by aggregating the number of bytes read for each column across all scan ranges processed by a scan node. Each sample in the counter is the size of a single column. Here is an example of what the updated HDFS scan profile looks like: - ParquetCompressedDataReadPerColumn: (Avg: 227.56 KB (233018) ; Min: 225.14 KB (230540) ; Max: 229.98 KB (235496) ; Number of samples: 2) - ParquetUncompressedDataReadPerColumn: (Avg: 227.96 KB (233426) ; Min: 224.91 KB (230306) ; Max: 231.00 KB (236547) ; Number of samples: 2) - ParquetCompressedPageSize: (Avg: 4.46 KB (4568) ; Min: 3.86 KB (3955) ; Max: 5.19 KB (5315) ; Number of samples: 102) - ParquetDecompressedPageSize: (Avg: 4.47 KB (4576) ; Min: 3.86 KB (3950) ; Max: 5.22 KB (5349) ; Number of samples: 102) Testing: * Added new tests to test_scanners.py that do some basic validation of the new counters above Change-Id: I322f9b324b6828df28e5caf79529085c43d7c817 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-readers.cc M be/src/util/runtime-profile.cc M tests/infra/test_utils.py M tests/query_test/test_scanners.py A tests/util/counters.py M tests/util/parse_util.py 10 files changed, 320 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/11575/17 -- To view, visit http://gerrit.cloudera.org:8080/11575 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I322f9b324b6828df28e5caf79529085c43d7c817 Gerrit-Change-Number: 11575 Gerrit-PatchSet: 17 Gerrit-Owner: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>