Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20548 )
Change subject: IMPALA-12308: DIRECTED distribution mode for V2 Iceberg tables ...................................................................... Patch Set 4: (22 comments) I don't fully understand it, but here are some comments. http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/coordinator-backend-state.cc@330 PS4, Line 330: // Copy the "filename to hosts" mapping from 'exec_params' into 'request'. Could extract this to a function. http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/coordinator-backend-state.cc@331 PS4, Line 331: auto Could we use the concrete types instead of auto here and on L337? It would be easier to understand. http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/fragment-state.cc File be/src/runtime/fragment-state.cc: http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/fragment-state.cc@112 PS4, Line 112: auto Could write out the type to make it easier to understand. http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/fragment-state.cc@116 PS4, Line 116: file_to_hosts_it->second.second Could extract this to a variable named "path_is_relative" or something similar - it would make it easier to understand the code. http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/krpc-data-stream-sender.h File be/src/runtime/krpc-data-stream-sender.h: http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/krpc-data-stream-sender.h@297 PS4, Line 297: * Could it be a reference? http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/krpc-data-stream-sender.cc@1085 PS4, Line 1085: row Shouldn't we call it 'tuple' instead of 'row'? http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/krpc-data-stream-sender.cc@1091 PS4, Line 1091: channel 'channels'? It seems there may be multiple channels we send to. http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/krpc-data-stream-sender.cc@1097 PS4, Line 1097: DebugString I'm a bit unsure if using DebugString() is good, the name suggests it should be used for debugging. http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/krpc-data-stream-sender.cc@1142 PS4, Line 1142: content Nit: contents. http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/krpc-data-stream-sender.cc@1143 PS4, Line 1143: == nullptr It should never be nullptr right? It could be a DCHECK. http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/runtime/query-state.cc@271 PS4, Line 271: // Making a copy of the "filename to fragment instance" mapping into std library types. Could extract this into a function. http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/scheduling/scheduler.h File be/src/scheduling/scheduler.h: http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/scheduling/scheduler.h@86 PS4, Line 86: that Nit: those. http://gerrit.cloudera.org:8080/#/c/20548/4/be/src/scheduling/scheduler.h@425 PS4, Line 425: duplicate_check Mention in the comment what 'duplicate_check' is used for. http://gerrit.cloudera.org:8080/#/c/20548/4/common/thrift/Partitions.thrift File common/thrift/Partitions.thrift: http://gerrit.cloudera.org:8080/#/c/20548/4/common/thrift/Partitions.thrift@43 PS4, Line 43: content Nit: contents? http://gerrit.cloudera.org:8080/#/c/20548/4/common/thrift/Partitions.thrift@44 PS4, Line 44: instance Can it be multiple instances if the file is big? If so, we should mention it here. http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@684 PS4, Line 684: in Nit: on. http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java: http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@24 PS4, Line 24: import org.apache.impala.catalog.IcebergPositionDeleteTable; Do we use this import? http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@93 PS4, Line 93: fragment_ Is there a reason why you use 'fragment_' here and 'getFragment()' on L87? http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java: http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@68 PS4, Line 68: has Nit: have. http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/20548/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@272 PS4, Line 272: tblRef_.getSampleParams() != null A comment could be useful explaining why we have this condition here. I know it's explained in the commit message but when browsing the code the commit msg is not readily available. http://gerrit.cloudera.org:8080/#/c/20548/4/testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test File testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test: http://gerrit.cloudera.org:8080/#/c/20548/4/testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test@339 PS4, Line 339: # don't return rows that are deleted. Optional: could mention that we can't use ICEBERG DELETE JOIN for sampling and have to use a hash left anti join. http://gerrit.cloudera.org:8080/#/c/20548/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test: http://gerrit.cloudera.org:8080/#/c/20548/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test@753 PS4, Line 753: # Same as above but here there is a separate fragment for the join build. Can we assert that there is indeed a separate fragment here (and that there isn't for the previous query)? For example from the plan? -- To view, visit http://gerrit.cloudera.org:8080/20548 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I212afd7c9e94551a1c50a40ccb0e3c1f7ecdf3d2 Gerrit-Change-Number: 20548 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Peter Rozsa <pro...@cloudera.com> Gerrit-Reviewer: Tamas Mate <tma...@apache.org> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Wed, 18 Oct 2023 14:57:14 +0000 Gerrit-HasComments: Yes