Impala Public Jenkins has submitted this change and it was merged.
(http://gerrit.cloudera.org:8080/18700)

Change subject: IMPALA-11414: Off-by-one error in Parquet late materialization
......................................................................

IMPALA-11414: Off-by-one error in Parquet late materialization

With PARQUET_LATE_MATERIALIZATION we can set the minimum number of
consecutive rows that, if filtered out, let us skip materializing the
corresponding rows in the other columns of a Parquet file.

E.g. if PARQUET_LATE_MATERIALIZATION is 10 and a filtered column
contains at least 10 consecutive rows that don't pass the predicates,
we avoid materializing the corresponding rows in the other columns.

But due to an off-by-one error we actually only needed
(PARQUET_LATE_MATERIALIZATION - 1) consecutive filtered-out rows. This
means that if we set PARQUET_LATE_MATERIALIZATION to 1, we need zero
consecutive filtered-out rows, which leads to a crash/DCHECK failure.
The bug is in the GetMicroBatches() algorithm, which produces the micro
batches based on the selected rows.
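The batching rule described above can be sketched roughly as follows.
This is a minimal illustration, not Impala's GetMicroBatches(): the
function name ComputeMicroBatches, its signature (a std::vector<bool> of
per-row selection flags in, inclusive [first, last] row ranges out) and
the choice to drop leading and trailing filtered-out rows entirely are
all assumptions made for the example. The key point is the comparison:
a run of filtered-out rows splits a batch only when its length is at
least the threshold. Comparing against (threshold - 1) instead would
reproduce the kind of off-by-one described above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical sketch, not Impala's actual GetMicroBatches().
// Returns inclusive [first, last] row ranges that must be materialized.
// A run of unselected rows is skipped only if it contains at least
// `skip_length` consecutive rows; shorter runs stay inside the batch.
// Leading and trailing unselected rows are never materialized here
// (an assumption of this sketch).
std::vector<std::pair<int, int>> ComputeMicroBatches(
    const std::vector<bool>& selected, int skip_length) {
  // A threshold of 0 is meaningless: it would mean "skip after zero
  // filtered-out rows".
  assert(skip_length >= 1);
  std::vector<std::pair<int, int>> batches;
  const int n = static_cast<int>(selected.size());
  int batch_start = -1;    // First row of the open batch, -1 if none.
  int last_selected = -1;  // Last selected row seen so far.
  for (int i = 0; i < n; ++i) {
    if (!selected[i]) continue;
    if (batch_start == -1) {
      batch_start = i;
    } else {
      // Number of unselected rows between the previous selected row and i.
      const int gap = i - last_selected - 1;
      // Correct check: the gap must be at least skip_length rows long.
      // Using `gap >= skip_length - 1` would split on one row too few.
      if (gap >= skip_length) {
        batches.emplace_back(batch_start, last_selected);
        batch_start = i;
      }
    }
    last_selected = i;
  }
  if (batch_start != -1) batches.emplace_back(batch_start, last_selected);
  return batches;
}
```

For example, with a threshold of 3 and selection flags
{1,0,0,1,1,0,0,0,1}, the two filtered-out rows at positions 1-2 are
absorbed into the first batch, while the three at positions 5-7 split
it, giving the batches [0,4] and [8,8].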

Setting PARQUET_LATE_MATERIALIZATION to 0 doesn't make sense, so it
shouldn't be allowed.

Testing
 * e2e test with PARQUET_LATE_MATERIALIZATION=1
 * e2e test for checking SET PARQUET_LATE_MATERIALIZATION=N

Change-Id: I38f95ad48c4ac8c1e06651565ab5c496283b29fa
Reviewed-on: http://gerrit.cloudera.org:8080/18700
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
---
M be/src/exec/scratch-tuple-batch-test.cc
M be/src/exec/scratch-tuple-batch.h
M be/src/service/query-options.cc
A testdata/workloads/functional-query/queries/QueryTest/parquet-late-materialization-unique-db.test
M testdata/workloads/functional-query/queries/QueryTest/set.test
M tests/query_test/test_parquet_late_materialization.py
6 files changed, 46 insertions(+), 8 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I38f95ad48c4ac8c1e06651565ab5c496283b29fa
Gerrit-Change-Number: 18700
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
