Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 )
Change subject: WIP: IMPALA-9156: share broadcast join builds ...................................................................... Patch Set 7: (6 comments) Went through the java side again, just some minor comments. http://gerrit.cloudera.org:8080/#/c/15096/7/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java File fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java: http://gerrit.cloudera.org:8080/#/c/15096/7/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java@362 PS7, Line 362: // Assume that each executor in the cluster gets a scan range, unless there are fewer : // scan ranges than nodes. Could be updated to mention instances. http://gerrit.cloudera.org:8080/#/c/15096/7/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java@371 PS7, Line 371: LOG.trace("computeStats HbaseScan: #nodes=" + numNodes_); numInstances could be also added to the log http://gerrit.cloudera.org:8080/#/c/15096/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/15096/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1243 PS7, Line 1243: g nit: could reuse maxInstancesPerNode http://gerrit.cloudera.org:8080/#/c/15096/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1269 PS7, Line 1269: dummyHostIndex.size() using totalNodes would work the same bit seems a bit clearer. http://gerrit.cloudera.org:8080/#/c/15096/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1271 PS7, Line 1271: + "TotalNodes %d, Local Ranges %d", Maybe add totalInstances? http://gerrit.cloudera.org:8080/#/c/15096/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1313 PS7, Line 1313: // Exit early if we have maxed out our estimate of hosts/instances, to avoid : // extraneous work in case the number of scan ranges dominates the number of : // nodes. This is out of scope, but this whole loop seems more complicated to me than it should be. A map like localRangeCounts (but not capped at maxInstancesPerNode) could be built when scanRangeSpecs_.concrete_ranges is collected, end here we could just loop through that, compute totalLocalParallelism / numLocalRanges / numRemoteRanges and compute totalNodes / totalInstances once. -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Wed, 26 Feb 2020 19:18:17 +0000 Gerrit-HasComments: Yes
