Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20753 )

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
......................................................................


Patch Set 1:

(17 comments)

Left a few comments, but the change looks great!

http://gerrit.cloudera.org:8080/#/c/20753/1/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

http://gerrit.cloudera.org:8080/#/c/20753/1/be/src/exec/partitioned-hash-join-builder.h@87
PS1, Line 87: treat_nulls_equal_
IS NOT DISTINCT FROM is a well-known SQL term, so I think it would be better to 
keep that name, but also add the additional comments about NULL handling.


http://gerrit.cloudera.org:8080/#/c/20753/1/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/20753/1/common/thrift/CatalogObjects.thrift@625
PS1, Line 625: equality_ids
This might have a better place in TIcebergTable. Though I see this is probably 
temporary, and later we might have the equality_ids in THdfsFileDesc.


http://gerrit.cloudera.org:8080/#/c/20753/1/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/20753/1/common/thrift/PlanNodes.thrift@404
PS1, Line 404: treat_nulls_equal
Again, I think we should keep the SQL terminology here, but keep the additional 
comment about NULLs.


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@58
PS1, Line 58: except it returns True if the rhs is NULL
Can we update this comment to: "except it returns True if both sides are NULL"?
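
For reference, a minimal sketch of the semantics the comment should describe. 
The class and method names are hypothetical and are not Impala code; it only 
contrasts SQL '=' with IS NOT DISTINCT FROM on nullable integers, using a Java 
null to stand in for SQL NULL.

```java
import java.util.Objects;

// Hypothetical illustration class; not part of the Impala code base.
final class NullSafeEquality {
  private NullSafeEquality() {}

  // SQL '=': the result is unknown (modelled here as a null Boolean)
  // whenever either operand is NULL.
  static Boolean sqlEquals(Integer lhs, Integer rhs) {
    if (lhs == null || rhs == null) return null;
    return lhs.equals(rhs);
  }

  // IS NOT DISTINCT FROM: never returns NULL. Two NULLs compare as equal,
  // a NULL and a non-NULL value compare as not equal.
  static boolean isNotDistinctFrom(Integer lhs, Integer rhs) {
    return Objects.equals(lhs, rhs);
  }
}
```

So the two operators only differ when at least one side is NULL, and the 
result is True only when both sides are NULL.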


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java:

http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@106
PS1, Line 106: TODO
Could you please add IMPALA-12598?


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@108
PS1, Line 108: column
nit: columns


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@160
PS1, Line 160: || isDeleteRowsJoin_
This wouldn't be needed if we passed Operator.NOT_DISTINCT in the equality 
predicates.


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@197
PS1, Line 197: positionDeleteFiles_.isEmpty() && equalityDeleteFiles_.isEmpty()
nit: for readability, it might be worth extracting this condition into a 
'noDeleteFiles()' method.
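
Something along these lines, as a rough sketch. The surrounding class is a 
hypothetical stand-in for the planner, and the String element type and the 
add*() helpers are assumptions made only to keep the example self-contained; 
only the two field names and the condition come from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the planner class; only the two field names
// and the extracted condition are taken from the patch under review.
final class ScanPlannerSketch {
  // Assumption: file descriptors are modelled as plain Strings here.
  private final List<String> positionDeleteFiles_ = new ArrayList<>();
  private final List<String> equalityDeleteFiles_ = new ArrayList<>();

  // Hypothetical helpers so the sketch can be exercised on its own.
  void addPositionDeleteFile(String path) { positionDeleteFiles_.add(path); }
  void addEqualityDeleteFile(String path) { equalityDeleteFiles_.add(path); }

  // True if the scan has neither position nor equality delete files.
  boolean noDeleteFiles() {
    return positionDeleteFiles_.isEmpty() && equalityDeleteFiles_.isEmpty();
  }
}
```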


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@258
PS1, Line 258: addVirtualDataSeqNumSlot(tblRef_);
nit: this is always needed for equality deletes, so this method call could be 
moved to addEqualityColumnSlots().
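
To spell out why the slot is always needed: per the Iceberg spec, an equality 
delete only applies to data files whose data sequence number is strictly lower 
than that of the delete file. Below is a simplified, standalone sketch of that 
rule. All names are hypothetical, it uses a single nullable integer key, and it 
is not how the Impala join is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the Impala implementation.
final class EqualityDeleteSketch {
  private EqualityDeleteSketch() {}

  // A row reduced to one nullable equality column plus the data sequence
  // number of the file it came from.
  static final class Row {
    final Integer key;
    final long dataSeqNum;
    Row(Integer key, long dataSeqNum) {
      this.key = key;
      this.dataSeqNum = dataSeqNum;
    }
  }

  // Returns the data rows that survive the given equality delete rows.
  static List<Row> applyEqualityDeletes(List<Row> dataRows, List<Row> deleteRows) {
    // Build side: highest delete sequence number seen per key. HashMap
    // permits a null key, so a NULL delete key matches a NULL data key.
    Map<Integer, Long> maxDeleteSeq = new HashMap<>();
    for (Row d : deleteRows) {
      maxDeleteSeq.merge(d.key, d.dataSeqNum, Long::max);
    }
    // Probe side: a data row is deleted only if a matching delete row has a
    // strictly higher sequence number.
    List<Row> survivors = new ArrayList<>();
    for (Row r : dataRows) {
      Long deleteSeq = maxDeleteSeq.get(r.key);
      boolean deleted = deleteSeq != null && r.dataSeqNum < deleteSeq;
      if (!deleted) survivors.add(r);
    }
    return survivors;
  }
}
```

Without the sequence number on both sides of the join, rows written after the 
delete could not be told apart from the rows it was meant to remove.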


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@323
PS1, Line 323:     tblRef.getDesc().getSlots().stream()
             :         .filter(s -> s.getVirtualColumnType() ==
             :             TVirtualColumnType.ICEBERG_DATA_SEQUENCE_NUMBER)
             :         .findFirst()
Maybe SingleNodePlanner.addSlotRefToDesc() could return the slot desc.


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@405
PS1, Line 405: dataSlotDesc.getColumn() instanceof IcebergColumn
This must always be true for non-virtual columns, right?


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@424
PS1, Line 424: data file
table?


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@429
PS1, Line 429: Operator.EQ
Could we have Operator.NOT_DISTINCT here?


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@464
PS1, Line 464:     if (getIceTable().getIcebergApiTable().schemas().size() > 1) {
             :       throw new ImpalaRuntimeException("Equality delete files are not supported for " +
             :           "tables with schema evolution");
             :     }
Why do we have this restriction? We throw an error if there are files with 
different delete columns, or if a delete column is not present in the table. 
Other than these cases, what problems can happen with schema evolution?


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@486
PS1, Line 486:     IcebergEqualityDeleteTable deleteTable =
             :         new IcebergEqualityDeleteTable(getIceTable(),
             :             getIceTable().getName() + "-EQUALITY-DELETE-" + deleteScanNodeId.toString(),
             :             equalityDeleteFiles_, equalityIds_, equalityDeletesRecordCount_);
             :     analyzer_.addVirtualTable(deleteTable);
             :
             :     TableRef deleteTblRef = TableRef.newTableRef(analyzer_,
             :         Arrays.asList(deleteTable.getDb().getName(), deleteTable.getName()),
             :         tblRef_.getUniqueAlias() + "-equality-delete-" + deleteScanNodeId.toString());
             :     addVirtualDataSeqNumSlot(deleteTblRef);
nit: This is similar to what we have for position delete tables at L249. Can we 
create a helper method for this?


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@501
PS1, Line 501: Collections.emptyList(),
Maybe we could have a TODO+Jira about adding conjuncts that could be applied to 
the delete columns.


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@517
PS1, Line 517: joinNode.setIsDeleteRowsJoin();
If we passed Operator.NOT_DISTINCT, we wouldn't need to set this, so the hash 
eq conjuncts would appear in the plans, i.e. it would be easier to verify the 
correctness of the plans.



--
To view, visit http://gerrit.cloudera.org:8080/20753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
Gerrit-Change-Number: 20753
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 07 Dec 2023 11:12:00 +0000
Gerrit-HasComments: Yes
