Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20548 )

Change subject: IMPALA-12308: DIRECTED distribution mode for V2 Iceberg tables
......................................................................


Patch Set 4:

(22 comments)

I don't fully understand it, but here are some comments.

http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/coordinator-backend-state.cc@330
PS4, Line 330:     // Copy the "filename to hosts" mapping from 'exec_params' 
into 'request'.
Could extract this to a function.


http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/coordinator-backend-state.cc@331
PS4, Line 331: auto
Could we use the concrete types instead of auto here and on L337? It would be 
easier to understand.


http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/fragment-state.cc
File be/src/runtime/fragment-state.cc:

http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/fragment-state.cc@112
PS4, Line 112: auto
Could write out the type to make it easier to understand.


http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/fragment-state.cc@116
PS4, Line 116: file_to_hosts_it->second.second
Could extract this to a variable named "path_is_relative" or something similar 
- it would make it easier to understand the code.


http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/krpc-data-stream-sender.h
File be/src/runtime/krpc-data-stream-sender.h:

http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/krpc-data-stream-sender.h@297
PS4, Line 297: *
Could it be a reference?


http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/krpc-data-stream-sender.cc@1085
PS4, Line 1085: row
Shouldn't we call it 'tuple' instead of 'row'?


http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/krpc-data-stream-sender.cc@1091
PS4, Line 1091: channel
'channels'? It seems there may be multiple channels we send to.


http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/krpc-data-stream-sender.cc@1097
PS4, Line 1097: DebugString
I'm a bit unsure if using DebugString() is good, the name suggests it should be 
used for debugging.


http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/krpc-data-stream-sender.cc@1142
PS4, Line 1142: content
Nit: contents.


http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/krpc-data-stream-sender.cc@1143
PS4, Line 1143: == nullptr
It should never be nullptr right? It could be a DCHECK.


http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/query-state.cc@271
PS4, Line 271:   // Making a copy of the "filename to fragment instance" 
mapping into std library types.
Could extract this into a function.


http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/scheduling/scheduler.h
File be/src/scheduling/scheduler.h:

http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/scheduling/scheduler.h@86
PS4, Line 86: that
Nit: those.


http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/scheduling/scheduler.h@425
PS4, Line 425: duplicate_check
Mention in the comment what 'duplicate_check' is used for.


http://gerrit.cloudera.org:8080/#/c/20548/4/common/thrift/Partitions.thrift
File common/thrift/Partitions.thrift:

http://gerrit.cloudera.org:8080/#/c/20548/4/common/thrift/Partitions.thrift@43
PS4, Line 43: content
Nit: contents?


http://gerrit.cloudera.org:8080/#/c/20548/4/common/thrift/Partitions.thrift@44
PS4, Line 44: instance
Can it be multiple instances if the file is big? If so, we should mention it 
here.


http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@684
PS4, Line 684: in
Nit: on.


http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@24
PS4, Line 24: import org.apache.impala.catalog.IcebergPositionDeleteTable;
Do we use this import?


http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@93
PS4, Line 93: fragment_
Is there a reason why you use 'fragment_' here and 'getFragment()' on L87?


http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@68
PS4, Line 68: has
Nit: have.


http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@272
PS4, Line 272: tblRef_.getSampleParams() != null
A comment could be useful explaining why we have this condition here. I know 
it's explained in the commit message but when browsing the code the commit msg 
is not readily available.


http://gerrit.cloudera.org:8080/#/c/20548/4/testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
File testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test:

http://gerrit.cloudera.org:8080/#/c/20548/4/testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test@339
PS4, Line 339: # don't return rows that are deleted.
Optional: could mention that we can't use ICEBERG DELETE JOIN for sampling and 
have to use a hash left anti join.


http://gerrit.cloudera.org:8080/#/c/20548/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test:

http://gerrit.cloudera.org:8080/#/c/20548/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test@753
PS4, Line 753: # Same as above but here there is a separate fragment for the 
join build.
Can we assert that there is indeed a separate fragment here (and that there 
isn't for the previous query)? For example from the plan?



--
To view, visit http://gerrit.cloudera.org:8080/20548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I212afd7c9e94551a1c50a40ccb0e3c1f7ecdf3d2
Gerrit-Change-Number: 20548
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pro...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Oct 2023 14:57:14 +0000
Gerrit-HasComments: Yes

Reply via email to