Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/20677 )
Change subject: IMPALA-12313: (part 2) Limited UPDATE support for Iceberg tables ...................................................................... Patch Set 9: (15 comments) Thanks for the comments. I'll look at the profile counters. Maybe we can fix them in a separete patch. http://gerrit.cloudera.org:8080/#/c/20677/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20677/7//COMMIT_MSG@49 PS7, Line 49: The tuples in the rowbatch of MultiDataSink contain slots for all the > Idea for the future: in case of a data and a delete sink, couldn't we only Inside fragments we are just passing pointers. Across fragments we are serializing and shuffling the row batches. To send only the releveant fields, we would need a fork exchange operator. In the design doc I had a note about this (see "Fork the transmission..."): https://docs.google.com/document/d/1GuRiJ3jjqkwINsSCKYaWwcfXHzbMrsd3WEMDOB11Xqw/edit#heading=h.5bmfhbmb4qdk http://gerrit.cloudera.org:8080/#/c/20677/7//COMMIT_MSG@66 PS7, Line 66: hav > Nit: have. Done http://gerrit.cloudera.org:8080/#/c/20677/7//COMMIT_MSG@75 PS7, Line 75: Part > nit: Part Done http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/iceberg-delete-sink.cc File be/src/exec/iceberg-delete-sink.cc: http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/iceberg-delete-sink.cc@120 PS7, Line 120: Scal > I think the concrete type (ScalarExpr*) would be more readable. Applies to Done http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/iceberg-delete-sink.cc@125 PS7, Line 125: ScalarExprEvaluator* filepath_eval = output_expr_evals_[0]; > I think the concrete type (ScalarExprEvaluator*) would be more readable. Ap Done http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/iceberg-delete-sink.cc@131 PS7, Line 131: > Why do we take the 'val' field? It is an int64_t which is then implicitly c Done http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/iceberg-delete-sink.cc@133 PS7, Line 133: string filepath(reinterpret_cast<char*>(filepath_sv.ptr), filepath_sv.len); > Couldn't we store the StringVal instance in 'prev_file_path_' instead of an We need prev_file_path_ to be valid across invocations, e.g. when there is a duplicate between row batches. http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/iceberg-delete-sink.cc@137 PS7, Line 137: UPDATE statement > We could write "UPDATE statement with a JOIN". Done http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/iceberg-delete-sink.cc@140 PS7, Line 140: prev_file_path_ = filepath; > 'prev_file_path_' and 'prev_position_' are only used in this function, they We need them to keep their states across invocations, e.g. when the duplicates are at row batch boundaries. http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/table-sink-base.cc File be/src/exec/table-sink-base.cc: http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/table-sink-base.cc@430 PS7, Line 430: DCHECK(IsIceberg()); > Optional: we could still add DCHECK(IsIceberg()) here. Done http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/runtime/dml-exec-state.h File be/src/runtime/dml-exec-state.h: http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/runtime/dml-exec-state.h@82 PS7, Line 82: /// can only be called for Iceberg tables. > Could mention that it has to be Iceberg. Done http://gerrit.cloudera.org:8080/#/c/20677/7/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java: http://gerrit.cloudera.org:8080/#/c/20677/7/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java@59 PS7, Line 59: // > Nit: unneeded spaces. Done http://gerrit.cloudera.org:8080/#/c/20677/7/fe/src/main/java/org/apache/impala/analysis/IcebergModifyImpl.java File fe/src/main/java/org/apache/impala/analysis/IcebergModifyImpl.java: http://gerrit.cloudera.org:8080/#/c/20677/7/fe/src/main/java/org/apache/impala/analysis/IcebergModifyImpl.java@74 PS7, Line 74: @Override > Could add @Override. Done http://gerrit.cloudera.org:8080/#/c/20677/7/fe/src/main/java/org/apache/impala/planner/MultiDataSink.java File fe/src/main/java/org/apache/impala/planner/MultiDataSink.java: http://gerrit.cloudera.org:8080/#/c/20677/7/fe/src/main/java/org/apache/impala/planner/MultiDataSink.java@132 PS7, Line 132: MAX_FS_WRITERS > I think we should also mention that 'maxNumberOfMultiSinks_' is set to 'MAX I added a comment to the variable. http://gerrit.cloudera.org:8080/#/c/20677/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test: http://gerrit.cloudera.org:8080/#/c/20677/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test@200 PS3, Line 200: insert into ref_table values (0, 111, 'IMPALA', '2023-11-07'), (3, 222, 'ICEBERG', '2023-11-08'); > Optional: we could insert reminder TODOs about that for part 3. Added a new negative test with comments. -- To view, visit http://gerrit.cloudera.org:8080/20677 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff0ef6075a2b6ebe130d15daa389ac1a505a7a08 Gerrit-Change-Number: 20677 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com> Gerrit-Reviewer: Tamas Mate <tma...@apache.org> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Wed, 06 Dec 2023 14:21:48 +0000 Gerrit-HasComments: Yes