Hello Tamas Mate, lipeng...@sensorsdata.cn, Csaba Ringhofer, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19328

to look at the new patch set (#2).

Change subject: IMPALA-11780: Wrong FILE__POSITION values for multi row group 
Parquet files when page filtering is used
......................................................................

IMPALA-11780: Wrong FILE__POSITION values for multi row group Parquet files 
when page filtering is used

Impala generated wrong values for the FILE__POSITION virtual column
when a Parquet file contained multiple row groups and page filtering
was in use.

The Parquet column readers use the value of 'current_row_' to populate
the file position slot. The problem is that 'current_row_' denotes the
index of the row within the row group, not within the file. We cannot
change the meaning of 'current_row_' because page filtering depends on
it: the page index also uses row group-based row indexes, not
file-level indexes.
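
To illustrate the distinction, a file-level position can be derived from
a row group-relative index by adding the number of rows in all preceding
row groups. The sketch below is not the code in this patch; RowGroupInfo,
FirstRowIndexes and FilePosition are hypothetical names, and only
'current_row_' comes from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-row-group metadata of a Parquet file.
struct RowGroupInfo {
  int64_t num_rows;
};

// Returns, for each row group, the file-level index of its first row,
// i.e. the running sum of the row counts of all preceding row groups.
std::vector<int64_t> FirstRowIndexes(const std::vector<RowGroupInfo>& row_groups) {
  std::vector<int64_t> first_rows;
  first_rows.reserve(row_groups.size());
  int64_t rows_so_far = 0;
  for (const RowGroupInfo& rg : row_groups) {
    first_rows.push_back(rows_so_far);
    rows_so_far += rg.num_rows;
  }
  return first_rows;
}

// Converts a row group-relative index (the role 'current_row_' plays) into
// a file-level position. The relative index itself is left untouched, so
// anything that relies on it (such as page filtering) is unaffected.
int64_t FilePosition(int64_t row_group_first_row, int64_t current_row) {
  assert(row_group_first_row >= 0 && current_row >= 0);
  return row_group_first_row + current_row;
}
```

With row groups of 100, 50 and 25 rows, the row at index 7 of the second
row group has file position 107, whereas the row group-relative index
alone would report 7.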

It also turned out that FILE__POSITION was not set correctly in the
Parquet late materialization code, because
BaseScalarColumnReader::SkipRowsInternal() didn't update 'current_row_'
in some code paths.

The value of FILE__POSITION is critical for Iceberg V2 tables because
position delete files store the file positions of the deleted rows.
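
The sketch below shows why a wrong position matters here. It is a
simplified model, not Impala's implementation: SurvivingRows is a
hypothetical helper, and the delete set stands in for the positions that
position delete files record for one data file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Returns the indexes (in scan order) of the rows that survive the deletes.
// 'reported_positions[i]' is the file position the scanner reported for the
// i-th scanned row. 'deleted_positions' models the positions recorded for
// this data file in position delete files.
std::vector<size_t> SurvivingRows(
    const std::vector<int64_t>& reported_positions,
    const std::unordered_set<int64_t>& deleted_positions) {
  std::vector<size_t> kept;
  for (size_t i = 0; i < reported_positions.size(); ++i) {
    if (deleted_positions.count(reported_positions[i]) == 0) kept.push_back(i);
  }
  return kept;
}
```

For a file with two row groups of three rows each and a delete on
position 4, correct positions {0, 1, 2, 3, 4, 5} drop exactly one row.
If the positions restart in the second row group, {0, 1, 2, 0, 1, 2},
no row matches position 4 and the deleted row stays in the result.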

Testing:
 * added e2e tests
 * the tests now run without PARQUET_READ_STATISTICS to exercise
   more code paths

Change-Id: I5ef37a1aa731eb54930d6689621cd6169fed6605
(cherry picked from commit b71a18bc82629c71aba8d5a55fe91fb04c975ae1)
---
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
M testdata/data/README
A testdata/data/customer_nested_multiblock_multipage.parquet
M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-parquet.test
M tests/query_test/test_scanners.py
6 files changed, 93 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/19328/2
--
To view, visit http://gerrit.cloudera.org:8080/19328
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5ef37a1aa731eb54930d6689621cd6169fed6605
Gerrit-Change-Number: 19328
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <lipeng...@sensorsdata.cn>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
