Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15096 )

Change subject: WIP: IMPALA-9156: share broadcast join builds
......................................................................


Patch Set 7:

(6 comments)

Went through the java side again, just some minor comments.

http://gerrit.cloudera.org:8080/#/c/15096/7/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
File fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java:

http://gerrit.cloudera.org:8080/#/c/15096/7/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java@362
PS7, Line 362:     // Assume that each executor in the cluster gets a scan 
range, unless there are fewer
             :     // scan ranges than nodes.
Could  be updated to mention instances.


http://gerrit.cloudera.org:8080/#/c/15096/7/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java@371
PS7, Line 371:       LOG.trace("computeStats HbaseScan: #nodes=" + numNodes_);
numInstances could be also added to the log


http://gerrit.cloudera.org:8080/#/c/15096/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/15096/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1243
PS7, Line 1243: g
nit: could reuse maxInstancesPerNode


http://gerrit.cloudera.org:8080/#/c/15096/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1269
PS7, Line 1269: dummyHostIndex.size()
using totalNodes would work the same bit seems a bit clearer.


http://gerrit.cloudera.org:8080/#/c/15096/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1271
PS7, Line 1271:                 + "TotalNodes %d, Local Ranges %d",
Maybe add totalInstances?


http://gerrit.cloudera.org:8080/#/c/15096/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1313
PS7, Line 1313:           // Exit early if we have maxed out our estimate of 
hosts/instances, to avoid
              :           // extraneous work in case the number of scan ranges 
dominates the number of
              :           // nodes.
This is out of scope, but this whole loop seems more complicated to me than it 
should be. A map like localRangeCounts (but not capped at maxInstancesPerNode) 
could be built when scanRangeSpecs_.concrete_ranges is collected, end here we 
could just loop through that, compute totalLocalParallelism / numLocalRanges / 
numRemoteRanges and compute totalNodes / totalInstances once.



--
To view, visit http://gerrit.cloudera.org:8080/15096
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7
Gerrit-Change-Number: 15096
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Wed, 26 Feb 2020 19:18:17 +0000
Gerrit-HasComments: Yes

Reply via email to