Gergely Fürnstáhl has uploaded a new patch set (#27). (
http://gerrit.cloudera.org:8080/19850 )
Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg
Position Delete operator
......................................................................
IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete
operator
The IcebergDeleteNode and IcebergDeleteBuild classes are based on their
PartitionedHashJoin counterparts. Only the actual "join" part of the node
is optimized; the rest is kept very similar so that features of
PartitionedHashJoin (partitioning, spilling) can be integrated if needed.
ICEBERG_DELETE_JOIN is added as a join operator that is used only by
IcebergDeleteNode.
IcebergDeleteBuild processes the data from the relevant delete files and
stores it in a {file_path: ordered row id vector} hash map.
IcebergDeleteNode tracks the file being processed and advances through
its row id vector in parallel with the probe batch to check whether a row
is deleted. When the check requires it, the node instead hashes the probe
row's file path and uses binary search to find the closest row id.
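The build and probe steps described above can be illustrated with a minimal
standalone sketch. It is not the code from this patch: DeleteRecord,
BuildDeleteMap and FilterBatch are hypothetical names, rows are reduced to
bare positions, and the probe positions of one batch are assumed to be
ascending and to belong to a single data file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one row of a position delete file.
struct DeleteRecord {
  std::string file_path;
  int64_t pos;
};

// {file_path: ordered row id vector}, as described in the commit message.
using DeletedRows = std::unordered_map<std::string, std::vector<int64_t>>;

// Build side: group deleted positions by data file, then sort and
// deduplicate each vector so the probe side can scan or binary-search it.
DeletedRows BuildDeleteMap(const std::vector<DeleteRecord>& records) {
  DeletedRows map;
  for (const DeleteRecord& r : records) map[r.file_path].push_back(r.pos);
  for (auto& entry : map) {
    std::vector<int64_t>& v = entry.second;
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
  }
  return map;
}

// Probe side: returns the positions of the batch that are NOT deleted.
// Assumes probe_positions is ascending and all rows come from file_path.
std::vector<int64_t> FilterBatch(const DeletedRows& map,
    const std::string& file_path, const std::vector<int64_t>& probe_positions) {
  auto it = map.find(file_path);  // hash the file path once per batch
  if (it == map.end()) return probe_positions;  // no deletes for this file
  const std::vector<int64_t>& deleted = it->second;

  std::vector<int64_t> kept;
  if (probe_positions.empty()) return kept;

  // Binary search for the closest deleted row id at or after the first
  // probe position, then advance both sequences in parallel.
  auto cursor = std::lower_bound(
      deleted.begin(), deleted.end(), probe_positions.front());
  for (int64_t pos : probe_positions) {
    while (cursor != deleted.end() && *cursor < pos) ++cursor;
    if (cursor != deleted.end() && *cursor == pos) continue;  // row is deleted
    kept.push_back(pos);
  }
  return kept;
}
```

In this sketch the binary search runs once per batch to position the cursor;
the remaining rows are checked with a linear merge, so the per-row cost
stays constant as long as the batch positions are ascending.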
Still TODO:
- revisit resource profile/cost calculations
- multi split testing
Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
35 files changed, 3,138 insertions(+), 149 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/27
--
To view, visit http://gerrit.cloudera.org:8080/19850
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 27
Gerrit-Owner: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>