Yingyi Bu has posted comments on this change.

Change subject: ASTERIXDB-1407: let the build branch to broadcast for NestedLoopJoin.
......................................................................

Patch Set 4:

(4 comments)

>> For HashJoin, we are broadcasting the build part, right?

HashJoin doesn't need to broadcast an input branch. But it follows a similar user-level convention: the inner (build, or right) input branch should be smaller than the outer (probe, or left) input branch. The smaller the inner branch is, the more likely the join can stay an in-memory hash join instead of degrading into a GRACE hash join.

https://asterix-gerrit.ics.uci.edu/#/c/828/4/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/NLJoinPOperator.java
File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/NLJoinPOperator.java:

Line 61:  * Right input can be partitioned in any way.
> Update this comments? We broadcast the right side, right?

Done

Line 117: pv[0] = new StructuralPropertiesVector(null, null);
> Maybe add a TODO here for the statistical patch to optimize which part to b

Done

https://asterix-gerrit.ics.uci.edu/#/c/828/4/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/jobgen/impl/JobBuilder.java
File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/jobgen/impl/JobBuilder.java:

Line 82: // might not be able to execute correctly, i.e.,
> would it be the reason for the *temporary file not found* problem?

Probably, in some cases, but I don't think it is the root cause of ASTERIXDB-1336, because hash join doesn't use a count constraint.

Line 87: new String[] { clusterLocations.getLocations()[Math.abs(jobSpec.hashCode() % nPartitions)] });
> Just curious, why do we need this random number `Math.abs(jobSpec.hashCode(

clusterLocations.getLocations()[Math.abs(jobSpec.hashCode() % nPartitions)] makes sure that the Nested-Loop-Join and the two Aggregates deterministically run on the same node/partition.

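To illustrate the indexing expression above, here is a minimal standalone sketch. It is not the JobBuilder code: the class name, the pickLocation helper, and the node names are hypothetical, and the job hash is passed in as a plain int rather than taken from a real job specification. It only shows that every operator that derives its location from the same hash ends up on the same node.

```java
public class DeterministicLocation {

    /**
     * Picks one location from the list using the same arithmetic as the
     * expression discussed above: Math.abs(hash % nPartitions).
     * The remainder lies strictly between -nPartitions and nPartitions,
     * so Math.abs cannot overflow here, even for Integer.MIN_VALUE.
     */
    static String pickLocation(String[] locations, int jobHash) {
        int nPartitions = locations.length;
        return locations[Math.abs(jobHash % nPartitions)];
    }

    public static void main(String[] args) {
        // Hypothetical node names, for illustration only.
        String[] locations = { "nc1", "nc2", "nc3" };
        // Stand-in for jobSpec.hashCode().
        int jobHash = "some-job".hashCode();

        // Three operators ask for a location independently but use the
        // same hash, so all three get the same node.
        String join = pickLocation(locations, jobHash);
        String lowerAggregate = pickLocation(locations, jobHash);
        String upperAggregate = pickLocation(locations, jobHash);

        System.out.println(join + " " + lowerAggregate + " " + upperAggregate);
    }
}
```

The choice still varies from job to job, since it depends on the hash, but within one job it is fixed.
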
With AlgebricksCountPartitionConstraint(1), by contrast, the Nested-Loop-Join and the two Aggregates each run on one arbitrary node/partition, chosen independently. The SuperActivity rewriting then makes the lower Aggregate (build branch) and the join-build activity run on the same node/partition, but there is no guarantee that the upper Aggregate and the join-probe activity will run on the same node/partition.

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/828
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0988624406d2f7460f0ee5ac7b4829d81e48c652
Gerrit-PatchSet: 4
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Yingyi Bu
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Till Westmann
Gerrit-Reviewer: Yingyi Bu
Gerrit-HasComments: Yes
