Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
......................................................................


Patch Set 5:

(26 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/20760/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20760/4//COMMIT_MSG@9
PS4, Line 9: most
> nit: I would say "mainly" or "most importantly"
Done


http://gerrit.cloudera.org:8080/#/c/20760/4//COMMIT_MSG@23
PS4, Line 23: patch int
> nit: just patch
Done


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h
File be/src/exec/iceberg-buffered-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@48
PS3, Line 48: to the temporary Hdfs files
> It can be misunderstood, one may thing that we write _into_ Hdfs files, lik
Sorry, this was copy-pasted, updated the comments.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@69
PS3, Line 69:   class FilePositionsIterator {
> Can this be put into the .cc file?
This way I can keep the typedefs private.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@74
PS3, Line 74:       file_level_end_ = file_pos.end();
> Shouldn't we check whether file_pos is empty?
Added DCHECK that file_pos is not empty. We should only invoke this if we have 
positions.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@78
PS3, Line 78:     bool HasNext() { return file_level_it_ != file_level_end_; }
> We should check for emptiness.
At this point 'file_level_it_' should always point to a valid entry. I'll merge 
Get() and Next(), so the semantics will be clearer.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@83
PS3, Line 83:       StringValue filepath = file_level_it_->first;
> Does HasNext() mean we can call Get() or that there is a valid item after t
I merged the Get() and Next(), so the semantics are clearer.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@96
PS3, Line 96:       }
> A comment on Next() could indicate that Next() shouldn't be called when ite
Done


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@98
PS3, Line 98:
> After incrementing 'file_level_it_' on the previous line, it may now point
Done


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@121
PS3, Line 121:   /// Sorts the buffered records.
> We could include in the comment that it should be called after SortBuffered
Done


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@151
PS3, Line 151:
> Nit: serve?
Done


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc
File be/src/exec/iceberg-buffered-delete-sink.cc:

http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@124
PS3, Line 124: DCHECK_NOTN
> We could DCHECK that filepath_sv is not NULL.
Done


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@149
PS3, Line 149: D_TI
> Optional: to ease understanding, we could use actual types instead of auto
In the for-loop we have std::pair<..., ...> entries, that might necessarily 
improve readability that much, I tried to use self-explanatory names.
In the bodies where it really matters I changed the types.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@161
PS3, Line 161: gstr
> See L149.
I kept entry as is, but introduced a new variable 'PartitionInfo& part_info' in 
the body.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@184
PS3, Line 184:
> See L149.
See above.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@196
PS3, Line 196: v_pos);
> We could use 'file' for 'file_and_pos.first' here and with the Len() too.
Done


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@218
PS3, Line 218: i
> Isn't it only one partition? Maybe 'partition_encoded' would be more unders
Done


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@273
PS3, Line 273:   // With partition evolution it's possible that we have the 
same partition names
> How is this comment connected to the next line? Could you explain?
Added additional sentence.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@288
PS3, Line 288:
> Shouldn't it be '*buffer'?
Good catch, thanks.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.cc@320
PS3, Line 320:
> Just curious, can filepath_ref->GetTupleIdx() be different from position_re
It can't, as it comes from the same SCAN node. Added DCHECK.


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h
File be/src/exec/iceberg-delete-sink-base.h:

http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h@57
PS4, Line 57:   Status ConstructPartitionInfo(int32_t spec_id, const 
std::string& partitions,
> nit: empty line
Done


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-delete-sink-base.cc
File be/src/exec/iceberg-delete-sink-base.cc:

http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-delete-sink-base.cc@39
PS3, Line 39: Status IcebergDeleteSinkBase::Prepare(RuntimeState* state,
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/20760/4/fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java
File fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java:

http://gerrit.cloudera.org:8080/#/c/20760/4/fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java@71
PS4, Line 71:   abstract public TSortingOrder getSortingOrder();
> nit: empty line.
Done


http://gerrit.cloudera.org:8080/#/c/20760/3/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java
File fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java:

http://gerrit.cloudera.org:8080/#/c/20760/3/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java@43
PS3, Line 43:     this(targetTable, partitionKeyExprs, outputExprs, 0);
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/20760/3/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java@48
PS3, Line 48:       int deleteTableId) {
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/20760/4/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java
File fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java:

http://gerrit.cloudera.org:8080/#/c/20760/4/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java@67
PS4, Line 67:     // per partition. So assuming 0.5 GB
> We could use org.apache.impala.common.ByteUnits.
Done



--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Dec 2023 15:41:05 +0000
Gerrit-HasComments: Yes

Reply via email to