Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20677 )

Change subject: IMPALA-12313: (part 2) Limited UPDATE support for Iceberg tables
......................................................................


Patch Set 9:

(15 comments)

Thanks for the comments.
I'll look at the profile counters. Maybe we can fix them in a separete patch.

http://gerrit.cloudera.org:8080/#/c/20677/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20677/7//COMMIT_MSG@49
PS7, Line 49: The tuples in the rowbatch of MultiDataSink contain slots for all 
the
> Idea for the future: in case of a data and a delete sink, couldn't we only
Inside fragments we are just passing pointers.
Across fragments we are serializing and shuffling the row batches. To send only 
the releveant fields, we would need a fork exchange operator.
In the design doc I had a note about this (see "Fork the transmission..."): 
https://docs.google.com/document/d/1GuRiJ3jjqkwINsSCKYaWwcfXHzbMrsd3WEMDOB11Xqw/edit#heading=h.5bmfhbmb4qdk


http://gerrit.cloudera.org:8080/#/c/20677/7//COMMIT_MSG@66
PS7, Line 66: hav
> Nit: have.
Done


http://gerrit.cloudera.org:8080/#/c/20677/7//COMMIT_MSG@75
PS7, Line 75: Part
> nit: Part
Done


http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/iceberg-delete-sink.cc
File be/src/exec/iceberg-delete-sink.cc:

http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/iceberg-delete-sink.cc@120
PS7, Line 120: Scal
> I think the concrete type (ScalarExpr*) would be more readable. Applies to
Done


http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/iceberg-delete-sink.cc@125
PS7, Line 125:   ScalarExprEvaluator* filepath_eval = output_expr_evals_[0];
> I think the concrete type (ScalarExprEvaluator*) would be more readable. Ap
Done


http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/iceberg-delete-sink.cc@131
PS7, Line 131:
> Why do we take the 'val' field? It is an int64_t which is then implicitly c
Done


http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/iceberg-delete-sink.cc@133
PS7, Line 133:     string filepath(reinterpret_cast<char*>(filepath_sv.ptr), 
filepath_sv.len);
> Couldn't we store the StringVal instance in 'prev_file_path_' instead of an
We need prev_file_path_ to be valid across invocations, e.g. when there is a 
duplicate between row batches.


http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/iceberg-delete-sink.cc@137
PS7, Line 137: UPDATE statement
> We could write "UPDATE statement with a JOIN".
Done


http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/iceberg-delete-sink.cc@140
PS7, Line 140:     prev_file_path_ = filepath;
> 'prev_file_path_' and 'prev_position_' are only used in this function, they
We need them to keep their states across invocations, e.g. when the duplicates 
are at row batch boundaries.


http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/table-sink-base.cc
File be/src/exec/table-sink-base.cc:

http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/exec/table-sink-base.cc@430
PS7, Line 430:       DCHECK(IsIceberg());
> Optional: we could still add DCHECK(IsIceberg()) here.
Done


http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/runtime/dml-exec-state.h
File be/src/runtime/dml-exec-state.h:

http://gerrit.cloudera.org:8080/#/c/20677/7/be/src/runtime/dml-exec-state.h@82
PS7, Line 82:   /// can only be called for Iceberg tables.
> Could mention that it has to be Iceberg.
Done


http://gerrit.cloudera.org:8080/#/c/20677/7/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

http://gerrit.cloudera.org:8080/#/c/20677/7/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java@59
PS7, Line 59: //
> Nit: unneeded spaces.
Done


http://gerrit.cloudera.org:8080/#/c/20677/7/fe/src/main/java/org/apache/impala/analysis/IcebergModifyImpl.java
File fe/src/main/java/org/apache/impala/analysis/IcebergModifyImpl.java:

http://gerrit.cloudera.org:8080/#/c/20677/7/fe/src/main/java/org/apache/impala/analysis/IcebergModifyImpl.java@74
PS7, Line 74:   @Override
> Could add @Override.
Done


http://gerrit.cloudera.org:8080/#/c/20677/7/fe/src/main/java/org/apache/impala/planner/MultiDataSink.java
File fe/src/main/java/org/apache/impala/planner/MultiDataSink.java:

http://gerrit.cloudera.org:8080/#/c/20677/7/fe/src/main/java/org/apache/impala/planner/MultiDataSink.java@132
PS7, Line 132: MAX_FS_WRITERS
> I think we should also mention that 'maxNumberOfMultiSinks_' is set to 'MAX
I added a comment to the variable.


http://gerrit.cloudera.org:8080/#/c/20677/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test:

http://gerrit.cloudera.org:8080/#/c/20677/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test@200
PS3, Line 200: insert into ref_table values (0, 111, 'IMPALA', '2023-11-07'), 
(3, 222, 'ICEBERG', '2023-11-08');
> Optional: we could insert reminder TODOs about that for part 3.
Added a new negative test with comments.



--
To view, visit http://gerrit.cloudera.org:8080/20677
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff0ef6075a2b6ebe130d15daa389ac1a505a7a08
Gerrit-Change-Number: 20677
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Dec 2023 14:21:48 +0000
Gerrit-HasComments: Yes

Reply via email to