Hello Tamas Mate, lipeng...@sensorsdata.cn, Csaba Ringhofer, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19328

to look at the new patch set (#2).

Change subject: IMPALA-11780: Wrong FILE__POSITION values for multi row group Parquet files when page filtering is used
......................................................................

IMPALA-11780: Wrong FILE__POSITION values for multi row group Parquet files when page filtering is used

Impala generated wrong values for the FILE__POSITION column when the
Parquet file contained multiple row groups and page filtering was also
used.

We were using the value of 'current_row_' in the Parquet column readers
to populate the file position slot. The problem is that 'current_row_'
denotes the index of the row within the row group, not within the file.
We cannot change the meaning of 'current_row_' because page filtering
depends on it: the page index also refers to rows by their
row-group-based indexes, not by their file indexes.

It also turned out that FILE__POSITION was not set correctly in the
Parquet late materialization code, because
BaseScalarColumnReader::SkipRowsInternal() did not update 'current_row_'
on some code paths.

The value of FILE__POSITION is critical for Iceberg V2 tables, as
position delete files store the file positions of the deleted rows.
Testing:
 * added e2e tests
 * the tests now run without PARQUET_READ_STATISTICS to exercise more
   code paths

Change-Id: I5ef37a1aa731eb54930d6689621cd6169fed6605
(cherry picked from commit b71a18bc82629c71aba8d5a55fe91fb04c975ae1)
---
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
M testdata/data/README
A testdata/data/customer_nested_multiblock_multipage.parquet
M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-parquet.test
M tests/query_test/test_scanners.py
6 files changed, 93 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/19328/2
--
To view, visit http://gerrit.cloudera.org:8080/19328

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5ef37a1aa731eb54930d6689621cd6169fed6605
Gerrit-Change-Number: 19328
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <lipeng...@sensorsdata.cn>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>