Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19328 )

Change subject: IMPALA-11780: Wrong FILE__POSITION values for multi row group Parquet files when page filtering is used
......................................................................

IMPALA-11780: Wrong FILE__POSITION values for multi row group Parquet files when page filtering is used

Impala generated wrong values for the FILE__POSITION column when the
Parquet file contained multiple row groups and page filtering was used.

We use the value of 'current_row_' in the Parquet column readers to
populate the file position slot. The problem is that 'current_row_'
denotes the index of the row within the row group, not within the
file. We cannot change 'current_row_' because page filtering depends on
its value: the page index also uses row group-based row indexes, not
file-based ones.

While working on this, it also turned out that FILE__POSITION was not
set correctly in the Parquet late materialization code, because
BaseScalarColumnReader::SkipRowsInternal() didn't update 'current_row_'
in some code paths.

The value of FILE__POSITION is critical for Iceberg V2 tables, because
position delete files store the file positions of the deleted rows.

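The distinction between the row group-based index and the file position can be illustrated with a minimal sketch. This is not the code in the patch: the helper names FirstRowIndexes() and FilePosition() are hypothetical, and the sketch only assumes that the file position of a row equals the file-level index of the first row of its row group plus the row's index within that row group.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Impala's actual code): given the number of rows in
// each row group, returns the file-level index of the first row of every row
// group, i.e. the running sum of the preceding row counts.
std::vector<int64_t> FirstRowIndexes(const std::vector<int64_t>& rows_per_group) {
  std::vector<int64_t> first_rows;
  first_rows.reserve(rows_per_group.size());
  int64_t offset = 0;
  for (int64_t num_rows : rows_per_group) {
    assert(num_rows >= 0);
    first_rows.push_back(offset);
    offset += num_rows;
  }
  return first_rows;
}

// Hypothetical helper: converts a row group-relative row index (the role that
// 'current_row_' plays in the description above) into a file-level position.
// The row group-relative value itself is left untouched, so anything that
// relies on it (such as page filtering) keeps working.
int64_t FilePosition(int64_t row_group_first_row, int64_t current_row) {
  assert(row_group_first_row >= 0 && current_row >= 0);
  return row_group_first_row + current_row;
}
```

With row groups of 3, 5 and 2 rows, the second row of the third row group has row group-based index 1 but file position 8 + 1 = 9. Using the row group-based index directly is only correct for the first row group, which matches the symptom described above.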
Testing:
 * added e2e tests
 * the tests are now running w/o PARQUET_READ_STATISTICS to exercise
   more code paths

Change-Id: I5ef37a1aa731eb54930d6689621cd6169fed6605
(cherry picked from commit b71a18bc82629c71aba8d5a55fe91fb04c975ae1)
---
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
M testdata/data/README
A testdata/data/customer_nested_multiblock_multipage.parquet
M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-parquet.test
M tests/query_test/test_scanners.py
6 files changed, 93 insertions(+), 8 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/19328/1
--
To view, visit http://gerrit.cloudera.org:8080/19328
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5ef37a1aa731eb54930d6689621cd6169fed6605
Gerrit-Change-Number: 19328
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>