Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12065 )
Change subject: IMPALA-5843: Use page index in Parquet files to skip pages ...................................................................... Patch Set 8: (4 comments) http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/hdfs-parquet-scanner.cc@119 PS7, Line 119: num_stats_filtered_row_groups_by_page_index_counter_ = > I am not sure how to track this, but it is also possible that some columns Should these be NumStatsFilteredRowGroups and NumStatsFilteredPages? http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/hdfs-parquet-scanner.cc@813 PS7, Line 813: > Now that ComputeCandidatePages() has an output parameter, setting scalar_re You could pass the scalar reader into ComputeCandidatePages() to call the setter and move the result there to avoid the copy. If you don't want to have that dependency, you could pass a callback lambda to ComputeCandidatePages which you can define here and call the setter from there. http://gerrit.cloudera.org:8080/#/c/12065/8/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/12065/8/be/src/exec/parquet/hdfs-parquet-scanner.cc@644 PS8, Line 644: if (filter_pages && candidate_ranges_.empty()) { Do we expect this to happen in reality if the row group also has stats? If so, can we add it to NumStatsFilteredRowGroups and change that counter's meaning to "Row group that has not been read because of stats"? http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/parquet-common.cc File be/src/exec/parquet/parquet-common.cc: http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/parquet-common.cc@123 PS7, Line 123: } > I take it as a parameter now, and call it 'candidate_pages'. I think "candidate_pages" is ok as long as it's only ever used as a positive list, i.e. candidates to be read. -- To view, visit http://gerrit.cloudera.org:8080/12065 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a Gerrit-Change-Number: 12065 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Pooja Nilangekar <pooja.nilange...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Wed, 03 Apr 2019 21:52:31 +0000 Gerrit-HasComments: Yes