Yingyi Bu has posted comments on this change.

Change subject: ASTERIXDB-1407: let the build branch to broadcast for 
NestedLoopJoin.
......................................................................


Patch Set 4:

(4 comments)

>> For HashJoin, we are broadcasting the build part, right?

HashJoin doesn't need to broadcast an input branch.
But it follows a similar user-level convention: the inner (build or right) 
input branch should be smaller than the outer (probe or left) input branch. 
The smaller the inner branch is, the better the chance that the join 
algorithm can stay an in-memory hash join instead of degrading into a GRACE 
hash join.
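
To illustrate the convention, here is a minimal sketch. It is not the Hyracks 
implementation: the class, the method names, and the row-count memory budget 
are all hypothetical. It only shows why a smaller build side is more likely to 
stay in memory.

```java
import java.util.HashMap;
import java.util.Map;

public class HashJoinSketch {
    enum Strategy { IN_MEMORY, GRACE }

    // Hypothetical rule: stay in memory only if the build side fits the budget.
    static Strategy choose(long buildRows, long memoryBudgetRows) {
        return buildRows <= memoryBudgetRows ? Strategy.IN_MEMORY : Strategy.GRACE;
    }

    // In-memory equi-join on integer keys; returns the number of matching pairs.
    static long inMemoryJoinCount(int[] build, int[] probe) {
        // Build phase: hash the inner (build) input, counting duplicates per key.
        Map<Integer, Integer> table = new HashMap<>();
        for (int key : build) {
            table.merge(key, 1, Integer::sum);
        }
        // Probe phase: stream the outer (probe) input against the hash table.
        long matches = 0;
        for (int key : probe) {
            matches += table.getOrDefault(key, 0);
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] inner = { 1, 2, 2 };        // smaller input, used as build side
        int[] outer = { 2, 2, 3, 1, 5 };  // larger input, used as probe side
        System.out.println(choose(inner.length, 4));
        System.out.println(choose(outer.length, 4));
        System.out.println(inMemoryJoinCount(inner, outer));
    }
}
```

With a budget of 4 rows, building on the 3-row input stays in memory, while 
building on the 5-row input would not.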

https://asterix-gerrit.ics.uci.edu/#/c/828/4/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/NLJoinPOperator.java
File 
hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/NLJoinPOperator.java:

Line 61:  * Right input can be partitioned in any way.
> Update this comments? We broadcast the right side, right?
Done


Line 117:         pv[0] = new StructuralPropertiesVector(null, null);
> Maybe add a TODO here for the statistical patch to optimize which part to b
Done


https://asterix-gerrit.ics.uci.edu/#/c/828/4/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/jobgen/impl/JobBuilder.java
File 
hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/jobgen/impl/JobBuilder.java:

Line 82:         // might not be able to execute correctly, i.e.,
> would it be the reason for the *temporary file not found* problem?
Probably, in some cases, but I don't think it is the root cause of 
ASTERIXDB-1336, because hash join doesn't use a count constraint.


Line 87:                 new String[] { 
clusterLocations.getLocations()[Math.abs(jobSpec.hashCode() % nPartitions)] });
> Just curious, why do we need this random number `Math.abs(jobSpec.hashCode(
clusterLocations.getLocations()[Math.abs(jobSpec.hashCode() % nPartitions)] 
makes sure that the Nested-Loop-Join and the two Aggregates deterministically 
run on the same node/partition.

AlgebricksCountPartitionConstraint(1), by contrast, means that the 
Nested-Loop-Join and the two Aggregates each run on one randomly chosen 
node/partition. With the SuperActivity rewriting, we then make the lower 
Aggregate (build branch) and the join-build activity run on the same 
node/partition, but there is no guarantee that the upper Aggregate and the 
join-probe activity will run on the same node/partition.
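
A minimal sketch of the deterministic choice, for illustration only. The 
class name, the helper pickLocation, and the node names are hypothetical and 
are not part of JobBuilder. Only the index expression 
Math.abs(hash % nPartitions) is taken from the line under review.

```java
public class DeterministicLocationSketch {
    // Hypothetical helper: maps a job hash to exactly one location.
    // The modulo is taken before Math.abs, so the index stays in range
    // even when jobHash is Integer.MIN_VALUE.
    static String pickLocation(String[] locations, int jobHash) {
        int nPartitions = locations.length;
        return locations[Math.abs(jobHash % nPartitions)];
    }

    public static void main(String[] args) {
        String[] locations = { "nc1", "nc2", "nc3", "nc4" }; // made-up node names
        int jobHash = 12345; // stands in for jobSpec.hashCode()

        // Every operator of the same job resolves to the same location.
        String join = pickLocation(locations, jobHash);
        String lowerAggregate = pickLocation(locations, jobHash);
        String upperAggregate = pickLocation(locations, jobHash);
        System.out.println(join.equals(lowerAggregate) && join.equals(upperAggregate));
    }
}
```

Because all three operators derive their location from the same job hash, they 
land on the same node/partition without relying on the scheduler's choice.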


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/828
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0988624406d2f7460f0ee5ac7b4829d81e48c652
Gerrit-PatchSet: 4
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Yingyi Bu <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Jianfeng Jia <[email protected]>
Gerrit-Reviewer: Till Westmann <[email protected]>
Gerrit-Reviewer: Yingyi Bu <[email protected]>
Gerrit-HasComments: Yes
