Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18700 )
Change subject: IMPALA-11414: Off-by-one error in Parquet late materialization
......................................................................

IMPALA-11414: Off-by-one error in Parquet late materialization

With PARQUET_LATE_MATERIALIZATION we can set the minimum number of consecutive filtered-out rows that is required before we skip materializing the corresponding rows in the other Parquet columns. E.g. if PARQUET_LATE_MATERIALIZATION is 10 and in a filtered column we find at least 10 consecutive rows that don't pass the predicates, we avoid materializing the corresponding rows in the other columns.

But due to an off-by-one error we actually only needed (PARQUET_LATE_MATERIALIZATION - 1) consecutive elements. This means that if we set PARQUET_LATE_MATERIALIZATION to one, we need zero consecutive filtered-out elements, which leads to a crash/DCHECK.

The bug is in the GetMicroBatches() algorithm, where we produce the micro batches based on the selected rows.

Setting PARQUET_LATE_MATERIALIZATION to 0 doesn't make sense, so it shouldn't be allowed.
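
The off-by-one described above can be illustrated with a small Python sketch. This is not the GetMicroBatches() code from be/src/exec/scratch-tuple-batch.h; the function name micro_batches, the buggy flag and the (start, end) tuple representation are hypothetical, chosen only to show how comparing against (N - 1) instead of N changes where batches are split.

```python
from typing import List, Optional, Tuple


def micro_batches(selected: List[bool], skip_length: int,
                  buggy: bool = False) -> List[Tuple[int, int]]:
    """Group selected row indexes into inclusive (start, end) ranges.

    A new range is started only when at least `skip_length` consecutive
    rows were filtered out since the previous selected row. With
    buggy=True the threshold is (skip_length - 1), mimicking the
    off-by-one described in the commit message (illustrative only).
    """
    if skip_length < 1:
        # A threshold of 0 is meaningless: there is always a run of
        # zero filtered-out rows between two selected rows.
        raise ValueError("skip_length must be at least 1")

    threshold = skip_length - 1 if buggy else skip_length
    batches: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last: Optional[int] = None

    for i, keep in enumerate(selected):
        if not keep:
            continue
        if start is None:
            # First selected row opens the first range.
            start = last = i
            continue
        gap = i - last - 1  # filtered-out rows between `last` and `i`
        if gap >= threshold:
            # Gap is long enough to skip: close the range, open a new one.
            batches.append((start, last))
            start = i
        last = i

    if start is not None:
        batches.append((start, last))
    return batches


if __name__ == "__main__":
    rows = [True, True, False, True, False, False, True]
    print(micro_batches(rows, 2))              # [(0, 3), (6, 6)]
    print(micro_batches(rows, 2, buggy=True))  # [(0, 1), (3, 3), (6, 6)]
    print(micro_batches(rows, 1, buggy=True))  # [(0, 0), (1, 1), (3, 3), (6, 6)]
```

In this sketch the buggy variant with a threshold of 2 splits after a single filtered-out row, and with a threshold of 1 it splits between adjacent selected rows, i.e. after zero filtered-out rows, which is the degenerate case the commit message associates with the crash/DCHECK.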

Testing
 * e2e test with PARQUET_LATE_MATERIALIZATION=1
 * e2e test for checking SET PARQUET_LATE_MATERIALIZATION=N

Change-Id: I38f95ad48c4ac8c1e06651565ab5c496283b29fa
Reviewed-on: http://gerrit.cloudera.org:8080/18700
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
---
M be/src/exec/scratch-tuple-batch-test.cc
M be/src/exec/scratch-tuple-batch.h
M be/src/service/query-options.cc
A testdata/workloads/functional-query/queries/QueryTest/parquet-late-materialization-unique-db.test
M testdata/workloads/functional-query/queries/QueryTest/set.test
M tests/query_test/test_parquet_late_materialization.py
6 files changed, 46 insertions(+), 8 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/18700
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I38f95ad48c4ac8c1e06651565ab5c496283b29fa
Gerrit-Change-Number: 18700
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>