Hello Tamas Mate, Gabor Kaszab, Gergely Fürnstáhl, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/20295
to look at the new patch set (#4).
Change subject: IMPALA-12327: Iceberg V2 operator wrong results in PARTITIONED mode
......................................................................
IMPALA-12327: Iceberg V2 operator wrong results in PARTITIONED mode
The Iceberg delete node tries to do mini merge-joins between data
records and delete records. This works in BROADCAST mode, and most of
the time in PARTITIONED mode as well. However, the Iceberg delete node
wrongly assumed that rows in a row batch that belong to the same file
always arrive in ascending order. Based on that assumption, it relied on
the previous delete having advanced IcebergDeleteState to the next
deleted row id, and skipped the binary search whenever that id was
greater than or equal to the current probe row id.
In PARTITIONED mode we cannot rely on ascending row order, not even
inside a row batch, and not even when the previous file path is the
same as the current one. Files with multiple blocks can be processed by
multiple hosts in parallel, and the rows are then hash-exchanged based
on their file paths. The exchange receiver at the LHS coalesces the row
batches from multiple senders, so the row IDs can arrive unordered.
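To illustrate the ordering problem, here is a minimal toy model of the
receiver side. It is not Impala code: Row, RowBatch,
CoalesceInArrivalOrder and PositionsAscending are hypothetical names,
and the model assumes only that each sender emits ascending positions
for its own block while batches from different senders can arrive in
any order.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical row shape: only the fields relevant to position deletes.
struct Row {
  std::string file_path;
  int64_t pos;  // row position inside file_path
};
using RowBatch = std::vector<Row>;

// Concatenates batches in the order they happen to arrive at the
// receiver. Each batch is internally ascending, the result need not be.
std::vector<Row> CoalesceInArrivalOrder(const std::vector<RowBatch>& arrivals) {
  std::vector<Row> out;
  for (const RowBatch& batch : arrivals) {
    out.insert(out.end(), batch.begin(), batch.end());
  }
  return out;
}

// True if positions never decrease across the whole stream.
bool PositionsAscending(const std::vector<Row>& rows) {
  return std::is_sorted(rows.begin(), rows.end(),
      [](const Row& a, const Row& b) { return a.pos < b.pos; });
}
```

If the sender that scanned the second block of a file delivers its
batch before the sender that scanned the first block, the coalesced
stream has the same file path throughout but descending positions at
the batch boundary, so PositionsAscending returns false.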
This patch fixes the issue by dropping that assumption in PARTITIONED
mode: when the position of the current row is not exactly one greater
than the position of the previous row, the node does a binary search
instead of relying on the state left by the previous delete.
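The probe logic described above can be sketched roughly as follows. This
is a simplified illustration, not the code in iceberg-delete-node.cc:
DeleteCursor, IsDeleted and ProbeAll are hypothetical names, and the
sketch assumes the delete positions of one data file are available as a
sorted vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for per-file delete state (not Impala's
// actual IcebergDeleteState).
struct DeleteCursor {
  const std::vector<int64_t>* deleted_positions = nullptr;  // sorted ascending
  std::size_t next_idx = 0;     // index of the next candidate delete position
  int64_t prev_probe_pos = -1;  // position of the previously probed row
  bool has_prev = false;
};

// Returns true if the row at probe_pos is deleted.
bool IsDeleted(DeleteCursor& c, int64_t probe_pos, bool partitioned_mode) {
  const std::vector<int64_t>& dels = *c.deleted_positions;
  const bool consecutive = c.has_prev && probe_pos - c.prev_probe_pos == 1;
  if (partitioned_mode && !consecutive) {
    // Row order is not trustworthy: re-seek with a binary search.
    c.next_idx = static_cast<std::size_t>(
        std::lower_bound(dels.begin(), dels.end(), probe_pos) - dels.begin());
  } else {
    // Merge-join step: only move the cursor forward.
    while (c.next_idx < dels.size() && dels[c.next_idx] < probe_pos) ++c.next_idx;
  }
  c.prev_probe_pos = probe_pos;
  c.has_prev = true;
  return c.next_idx < dels.size() && dels[c.next_idx] == probe_pos;
}

// Probes a stream of row positions against one file's delete positions.
std::vector<bool> ProbeAll(const std::vector<int64_t>& deleted,
    const std::vector<int64_t>& probes, bool partitioned_mode) {
  assert(std::is_sorted(deleted.begin(), deleted.end()));
  DeleteCursor cursor;
  cursor.deleted_positions = &deleted;
  std::vector<bool> result;
  for (int64_t pos : probes) {
    result.push_back(IsDeleted(cursor, pos, partitioned_mode));
  }
  return result;
}
```

With deleted positions {2, 5, 9} and probe positions arriving as
8, 9, 2, 3, calling ProbeAll with partitioned_mode set to true reports
rows 9 and 2 as deleted. With partitioned_mode set to false the cursor
only moves forward, so row 2 is missed, which is the kind of wrong
result this change addresses.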
Tests:
* added e2e tests
Change-Id: Ib89a53e812af8c3b8ec5bc27bca0a50dcac5d924
---
M be/src/exec/iceberg-delete-node.cc
M testdata/bin/create-load-data.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
5 files changed, 70 insertions(+), 5 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/20295/4
--
To view, visit http://gerrit.cloudera.org:8080/20295
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib89a53e812af8c3b8ec5bc27bca0a50dcac5d924
Gerrit-Change-Number: 20295
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>