Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12065 )

Change subject: IMPALA-5843: Use page index in Parquet files to skip pages
......................................................................


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/hdfs-parquet-scanner.cc@119
PS7, Line 119:   num_stats_filtered_row_groups_by_page_index_counter_ =
> I am not sure how to track this, but it is also possible that some columns
Should these be NumStatsFilteredRowGroups and NumStatsFilteredPages?


http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/hdfs-parquet-scanner.cc@813
PS7, Line 813:
> Now that ComputeCandidatePages() has an output parameter, setting scalar_re
You could pass the scalar reader into ComputeCandidatePages() to call the 
setter and move the result there to avoid the copy. If you don't want to have 
that dependency, you could pass a callback lambda to ComputeCandidatePages 
which you can define here and call the setter from there.


http://gerrit.cloudera.org:8080/#/c/12065/8/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12065/8/be/src/exec/parquet/hdfs-parquet-scanner.cc@644
PS8, Line 644:       if (filter_pages && candidate_ranges_.empty()) {
Do we expect this to happen in reality if the row group also has stats? If so, 
can we add it to NumStatsFilteredRowGroups and change that counter's meaning to 
"Row group that has not been read because of stats"?


http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/parquet-common.cc
File be/src/exec/parquet/parquet-common.cc:

http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/parquet-common.cc@123
PS7, Line 123:     }
> I take it as a parameter now, and call it 'candidate_pages'.
I think "candidate_pages" is ok as long as it's only ever used as a positive 
list, i.e. candidates to be read.



--
To view, visit http://gerrit.cloudera.org:8080/12065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
Gerrit-Change-Number: 12065
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 03 Apr 2019 21:52:31 +0000
Gerrit-HasComments: Yes

Reply via email to