Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/20295 )

Change subject: IMPALA-12327: Iceberg V2 operator wrong results in PARTITIONED mode
......................................................................


Patch Set 4:

(7 comments)

Thanks for the comments, Gergo!

http://gerrit.cloudera.org:8080/#/c/20295/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20295/2//COMMIT_MSG@10
PS2, Line 10: BROADCAST m
> BROADCAST
Done


http://gerrit.cloudera.org:8080/#/c/20295/2//COMMIT_MSG@13
PS2, Line 13: we rely on the previous delete
            : updating IcebergDeleteState to the next deleted row id and s
> we rely on the previous delete updating IcebergDeleteState to the next dele
Done


http://gerrit.cloudera.org:8080/#/c/20295/2//COMMIT_MSG@22
PS2, Line 22: LHS coa
> being
Done


http://gerrit.cloudera.org:8080/#/c/20295/2//COMMIT_MSG@24
PS2, Line 24:
> It's not really a reset, maybe:
Done


http://gerrit.cloudera.org:8080/#/c/20295/2/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/20295/2/be/src/exec/iceberg-delete-node.cc@332
PS2, Line 332:   const int64_t step = current_probe_pos_ - *probe_pos;
> nit: could be const
Done


http://gerrit.cloudera.org:8080/#/c/20295/2/be/src/exec/iceberg-delete-node.cc@348
PS2, Line 348: So we are always doing a binary
             :       // search here to find the proper delete row id.
> We could improve this case if we store the deleted row ids in an std::unord
Yeah, I think we should focus on the DIRECTED distribution mode, which won't have 
these problems.


http://gerrit.cloudera.org:8080/#/c/20295/2/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/20295/2/testdata/datasets/functional/functional_schema_template.sql@3701
PS2, Line 3701: iceberg_lineitem_multiblock
> I meant to test this case with iceberg_lineitem_multiblock, seems like I me
The missing parts are in create-load-data.sh, which sets dfs.block.size for 
the file. Without that we only get lots of row groups in the Parquet file, but 
only a single block at the HDFS level.

Anyway, I think it's OK to keep it, as it doesn't hurt, and might still add 
some extra coverage for the case when there are lots of row groups in the file.



--
To view, visit http://gerrit.cloudera.org:8080/20295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib89a53e812af8c3b8ec5bc27bca0a50dcac5d924
Gerrit-Change-Number: 20295
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 01 Aug 2023 16:10:40 +0000
Gerrit-HasComments: Yes
