Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12065 )
Change subject: IMPALA-5843: Use page index in Parquet files to skip pages ...................................................................... Patch Set 15: (3 comments) http://gerrit.cloudera.org:8080/#/c/12065/15/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/12065/15/be/src/exec/parquet/hdfs-parquet-scanner.cc@1170 PS15, Line 1170: // Scalar readers that read collections can be ahead of other readers by 1. > :( Invented BaseScalarColumnReader::LastProcessedRow(), with that I could simplify this check back to the original complexity. http://gerrit.cloudera.org:8080/#/c/12065/15/be/src/exec/parquet/parquet-column-readers.h File be/src/exec/parquet/parquet-column-readers.h: http://gerrit.cloudera.org:8080/#/c/12065/15/be/src/exec/parquet/parquet-column-readers.h@533 PS15, Line 533: isn't anymore > pedantry: aren't any more Done http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/parquet-column-readers.cc File be/src/exec/parquet/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/parquet-column-readers.cc@512 PS7, Line 512: if (IsLastCandidateRange()) { > Yeah exactly, the bit-packed runs in Parquet's RLE encoding can have spurio Since then I moved these conditions to ConsumedCurrentCandidateRange() and removed 'num_buffered_values != 0' because it prevented jumping to the next page. It wouldn't produce wrong result sets, but it caused problems in HdfsParquetScanner::CheckPageFiltering() because the 'current rows' could get out-of-sync for a moment. I guard against invalid peeking by moving the num_buffered_values check just before the PeekLevel() == 0 condition. -- To view, visit http://gerrit.cloudera.org:8080/12065 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a Gerrit-Change-Number: 12065 Gerrit-PatchSet: 15 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Pooja Nilangekar <pooja.nilange...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Tue, 23 Apr 2019 13:16:01 +0000 Gerrit-HasComments: Yes