Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/20295 )
Change subject: IMPALA-12327: Iceberg V2 operator wrong results in PARTITIONED mode ...................................................................... Patch Set 4: (7 comments) Thanks for the comments, Gergo! http://gerrit.cloudera.org:8080/#/c/20295/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20295/2//COMMIT_MSG@10 PS2, Line 10: BROADCAST m > BROADCAST Done http://gerrit.cloudera.org:8080/#/c/20295/2//COMMIT_MSG@13 PS2, Line 13: we rely on the previous delete : updating IcebergDeleteState to the next deleted row id and s > we rely on the previous delete updating IcebergDeleteState to the next dele Done http://gerrit.cloudera.org:8080/#/c/20295/2//COMMIT_MSG@22 PS2, Line 22: LHS coa > being Done http://gerrit.cloudera.org:8080/#/c/20295/2//COMMIT_MSG@24 PS2, Line 24: > It's not really a reset, maybe: Done http://gerrit.cloudera.org:8080/#/c/20295/2/be/src/exec/iceberg-delete-node.cc File be/src/exec/iceberg-delete-node.cc: http://gerrit.cloudera.org:8080/#/c/20295/2/be/src/exec/iceberg-delete-node.cc@332 PS2, Line 332: const int64_t step = current_probe_pos_ - *probe_pos; > nit: could be const Done http://gerrit.cloudera.org:8080/#/c/20295/2/be/src/exec/iceberg-delete-node.cc@348 PS2, Line 348: So we are always doing a binary : // search here to find the proper delete row id. > We could improve this case if we store the deleted row ids in an std::unord Yeah I think we should focus on the DIRECTED distribution mode which won't have these problems. http://gerrit.cloudera.org:8080/#/c/20295/2/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/20295/2/testdata/datasets/functional/functional_schema_template.sql@3701 PS2, Line 3701: iceberg_lineitem_multiblock > I meant to test this case with iceberg_lineitem_multiblock, seems like I me The missing parts are in create-load-data.sh which sets the dfs.block.size for the file. Without that we only have lots of row groups in the Parquet file, but at HDFS level only one block. Anyway, I think it's OK to keep it, as it doesn't hurt, and might still add some extra coverage for the case when there are lots of row groups in the file. -- To view, visit http://gerrit.cloudera.org:8080/20295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib89a53e812af8c3b8ec5bc27bca0a50dcac5d924 Gerrit-Change-Number: 20295 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Gergely Fürnstáhl <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tamas Mate <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 01 Aug 2023 16:10:40 +0000 Gerrit-HasComments: Yes
