Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17860 )

Change subject: [WIP] IMPALA-9873: Avoid materilization of columns for filtered 
out rows in Parquet table.
......................................................................


Patch Set 3:

(17 comments)

Just had time to go through about 2/3 of the changes. Will resume next week.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-chunk-reader.h
File be/src/exec/parquet/parquet-column-chunk-reader.h:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-chunk-reader.h@127
PS3, Line 127: should
             :     // be
nit. format


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h
File be/src/exec/parquet/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@427
PS3, Line 427: ReadNextPageHeader
Name it as ReadNextDataPageHeader()?


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@439
PS3, Line 439: would
will?


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@439
PS3, Line 439: .
nit. to insert "regardless of the page type" to make it clear.


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@513
PS3, Line 513: virtual bool SetRowGroupAtEnd() override;
Missing comment?


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@526
PS3, Line 526:   int64_t LastRowIdxInCurrentPage() const {
missing comment?


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@527
PS3, Line 527:     DCHECK(!candidate_data_pages_.empty());
May add a DCHECK(candidate_page_idx_ <= candidate_data_pages_.size() - 1) here.


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@553
PS3, Line 553: bool DoesPageFiltering() const { return 
!candidate_data_pages_.empty(); }
nit. Noticed that this change and all the ones following this one are due to 
format. Is it possible to restore the original format to avoid these changes?


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc
File be/src/exec/parquet/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1151
PS3, Line 1151: Status::OK();
We should return a non Status::OK() object here.


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1160
PS3, Line 1160: num_values < 0
nit. UNLIKELY()


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1210
PS3, Line 1210: remaining
init to 0.


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1372
PS3, Line 1372: && status
This is redundant.


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1479
PS3, Line 1479: JumpNextPageHeader
AdvanceNextPageHeader() sounds better.


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1484
PS3, Line 1484: num_buffered_values_ == 0
I wonder if we could skip this empty page and advance to the next one.


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1509
PS3, Line 1509: bool BaseScalarColumnReader::SkipRows(int64_t num_rows, int64_t 
skip_row_id) {
              :   if (max_rep_level() > 0) {
              :     return SkipRowsInternal<true>(num_rows, skip_row_id);
              :   } else {
              :     return SkipRowsInternal<false>(num_rows, skip_row_id);
              :   }
              : };
This probably should be inlined.


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1517
PS3, Line 1517: /// Wrapper around 'SkipTopLevelRows' to skip across multiple 
pages.
Please add a description for the algorithm used in the method, as well as ones 
for each data block inside the method.


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1538
PS3, Line 1538: result =
Do we need to check the status here?



--
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 3
Gerrit-Owner: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Oct 2021 19:24:48 +0000
Gerrit-HasComments: Yes

Reply via email to