Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12065 )

Change subject: IMPALA-5843: Use page index in Parquet files to skip pages
......................................................................


Patch Set 18: Code-Review+1

(8 comments)

I mostly checked the changes that pertained to my comments on an older patch 
set. I'm a bit short on time currently but I also don't want to hold up this 
change, so I'd like to defer to Csaba for the +2.

http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/hdfs-parquet-scanner.cc@464
PS17, Line 464: bool GetRequiredStatsField(const string& fn_name,
I think this could go into parquet-column-stats.{h,cc}.


http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/hdfs-parquet-scanner.cc@546
PS17, Line 546:     ColumnStatsReader::StatsField stats_field = 
ColumnStatsReader::StatsField::MIN;
I think we usually don't initialize output parameters to make it clear that 
they're expected to be set by the function they're passed into.


http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/hdfs-parquet-scanner.cc@643
PS17, Line 643:       // We don't need the raw page index buffers anymore.
              :       page_index_.Release();
Can this go to ProcessPageIndex? It already touches a bunch of other state.


http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/parquet-column-readers.h
File be/src/exec/parquet/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/parquet-column-readers.h@400
PS17, Line 400:   /// True, if we are using NextLevels() to readahead the next 
def and rep levels. In this
I feel that this field needs more explanation. From just looking at the comment 
I cannot tell when one would set this, and why it is needed.


http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/parquet-column-readers.h@403
PS17, Line 403:   bool levels_readahead_ = false;
Would it simplify the code to make this levels_read_ahead_offset_ (being -1 or 
0 (or 0 and 1)) and then just add it where needed?


http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/parquet-page-index-test.cc
File be/src/exec/parquet/parquet-page-index-test.cc:

http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/parquet-page-index-test.cc@58
PS17, Line 58: void ValidatePageIndexRange(const RowGroupRanges& 
row_group_ranges,
Add a comment for this one, too?


http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/parquet-page-index.h
File be/src/exec/parquet/parquet-page-index.h:

http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/parquet-page-index.h@56
PS17, Line 56: if there's at least parts of the page index are present
nit: grammar


http://gerrit.cloudera.org:8080/#/c/12065/17/tests/query_test/test_parquet_stats.py
File tests/query_test/test_parquet_stats.py:

http://gerrit.cloudera.org:8080/#/c/12065/17/tests/query_test/test_parquet_stats.py@87
PS17, Line 87:     for batch_size in [0, 1]:
Should we use a proper test dimension for the batch size, e.g. like in 
tests/query_test/test_scanners.py:168 ?



--
To view, visit http://gerrit.cloudera.org:8080/12065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
Gerrit-Change-Number: 12065
Gerrit-PatchSet: 18
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Sun, 28 Apr 2019 22:14:41 +0000
Gerrit-HasComments: Yes

Reply via email to