Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17860 )
Change subject: [WIP] IMPALA-9873: Avoid materilization of columns for filtered out rows in Parquet table. ...................................................................... Patch Set 3: (17 comments) Just had time to go through about 2/3 of the changes. Will resume next week. http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-chunk-reader.h File be/src/exec/parquet/parquet-column-chunk-reader.h: http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-chunk-reader.h@127 PS3, Line 127: should : // be nit. format http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h File be/src/exec/parquet/parquet-column-readers.h: http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@427 PS3, Line 427: ReadNextPageHeader Name it as ReadNextDataPageHeader()? http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@439 PS3, Line 439: would will? http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@439 PS3, Line 439: . nit. to insert "regardless of the page type" to make it clear. http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@513 PS3, Line 513: virtual bool SetRowGroupAtEnd() override; Missing comment? http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@526 PS3, Line 526: int64_t LastRowIdxInCurrentPage() const { missing comment? http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@527 PS3, Line 527: DCHECK(!candidate_data_pages_.empty()); May add a DCHECK(candidate_page_idx_ <= candidate_data_pages_.size() - 1) here. http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@553 PS3, Line 553: bool DoesPageFiltering() const { return !candidate_data_pages_.empty(); } nit. Noticed that this change and all the ones following this one are due to format. Is it possible to restore the original format to avoid these changes? http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc File be/src/exec/parquet/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1151 PS3, Line 1151: Status::OK(); We should return a non Status::OK() object here. http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1160 PS3, Line 1160: num_values < 0 nit. UNLIKELY() http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1210 PS3, Line 1210: remaining init to 0. http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1372 PS3, Line 1372: && status This is redundant. http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1479 PS3, Line 1479: JumpNextPageHeader AdvanceNextPageHeader() sounds better. http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1484 PS3, Line 1484: num_buffered_values_ == 0 I wonder if we could skip this empty page and advance to the next one. http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1509 PS3, Line 1509: bool BaseScalarColumnReader::SkipRows(int64_t num_rows, int64_t skip_row_id) { : if (max_rep_level() > 0) { : return SkipRowsInternal<true>(num_rows, skip_row_id); : } else { : return SkipRowsInternal<false>(num_rows, skip_row_id); : } : }; This probably should be inlined. http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1517 PS3, Line 1517: /// Wrapper around 'SkipTopLevelRows' to skip across multiple pages. Please add a description for the algorithm used in the method, as well as ones for each data block inside the method. http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1538 PS3, Line 1538: result = Do we need to check the status here? -- To view, visit http://gerrit.cloudera.org:8080/17860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 Gerrit-Change-Number: 17860 Gerrit-PatchSet: 3 Gerrit-Owner: Amogh Margoor <amarg...@gmail.com> Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Fri, 01 Oct 2021 19:24:48 +0000 Gerrit-HasComments: Yes