Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/20760 )
Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables. ...................................................................... Patch Set 5: (26 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/20760/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20760/4//COMMIT_MSG@9 PS4, Line 9: most > nit: I would say "mainly" or "most importantly" Done http://gerrit.cloudera.org:8080/#/c/20760/4//COMMIT_MSG@23 PS4, Line 23: patch int > nit: just patch Done http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h File be/src/exec/iceberg-buffered-delete-sink.h: http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@48 PS3, Line 48: to the temporary Hdfs files > It can be misunderstood, one may thing that we write _into_ Hdfs files, lik Sorry, this was copy-pasted, updated the comments. http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@69 PS3, Line 69: class FilePositionsIterator { > Can this be put into the .cc file? This way I can keep the typedefs private. http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@74 PS3, Line 74: file_level_end_ = file_pos.end(); > Shouldn't we check whether file_pos is empty? Added DCHECK that file_pos is not empty. We should only invoke this if we have positions. http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@78 PS3, Line 78: bool HasNext() { return file_level_it_ != file_level_end_; } > We should check for emptiness. At this point 'file_level_it_' should always point to a valid entry. I'll merge Get() and Next(), so the semantics will be clearer. http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@83 PS3, Line 83: StringValue filepath = file_level_it_->first; > Does HasNext() mean we can call Get() or that there is a valid item after t I merged the Get() and Next(), so the semantics are clearer. http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@96 PS3, Line 96: } > A comment on Next() could indicate that Next() shouldn't be called when ite Done http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@98 PS3, Line 98: > After incrementing 'file_level_it_' on the previous line, it may now point Done http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@121 PS3, Line 121: /// Sorts the buffered records. > We could include in the comment that it should be called after SortBuffered Done http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@151 PS3, Line 151: > Nit: serve? Done http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc File be/src/exec/iceberg-buffered-delete-sink.cc: http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@124 PS3, Line 124: DCHECK_NOTN > We could DCHECK that filepath_sv is not NULL. Done http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@149 PS3, Line 149: D_TI > Optional: to ease understanding, we could use actual types instead of auto In the for-loop we have std::pair<..., ...> entries, that might necessarily improve readability that much, I tried to use self-explanatory names. In the bodies where it really matters I changed the types. http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@161 PS3, Line 161: gstr > See L149. I kept entry as is, but introduced a new variable 'PartitionInfo& part_info' in the body. http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@184 PS3, Line 184: > See L149. See above. http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@196 PS3, Line 196: v_pos); > We could use 'file' for 'file_and_pos.first' here and with the Len() too. Done http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@218 PS3, Line 218: i > Isn't it only one partition? Maybe 'partition_encoded' would be more unders Done http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@273 PS3, Line 273: // With partition evolution it's possible that we have the same partition names > How is this comment connected to the next line? Could you explain? Added additional sentence. http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@288 PS3, Line 288: > Shouldn't it be '*buffer'? Good catch, thanks. http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@320 PS3, Line 320: > Just curious, can filepath_ref->GetTupleIdx() be different from position_re It can't, as it comes from the same SCAN node. Added DCHECK. http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h File be/src/exec/iceberg-delete-sink-base.h: http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h@57 PS4, Line 57: Status ConstructPartitionInfo(int32_t spec_id, const std::string& partitions, > nit: empty line Done http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-delete-sink-base.cc File be/src/exec/iceberg-delete-sink-base.cc: http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-delete-sink-base.cc@39 PS3, Line 39: Status IcebergDeleteSinkBase::Prepare(RuntimeState* state, > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/20760/4/fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java File fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java: http://gerrit.cloudera.org:8080/#/c/20760/4/fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java@71 PS4, Line 71: abstract public TSortingOrder getSortingOrder(); > nit: empty line. Done http://gerrit.cloudera.org:8080/#/c/20760/3/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java File fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java: http://gerrit.cloudera.org:8080/#/c/20760/3/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java@43 PS3, Line 43: this(targetTable, partitionKeyExprs, outputExprs, 0); > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/20760/3/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java@48 PS3, Line 48: int deleteTableId) { > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/20760/4/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java File fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java: http://gerrit.cloudera.org:8080/#/c/20760/4/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java@67 PS4, Line 67: // per partition. So assuming 0.5 GB > We could use org.apache.impala.common.ByteUnits. Done -- To view, visit http://gerrit.cloudera.org:8080/20760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315 Gerrit-Change-Number: 20760 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tamas Mate <tma...@apache.org> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Mon, 11 Dec 2023 15:41:05 +0000 Gerrit-HasComments: Yes