Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14859 )

Change subject: IMPALA-4224: execute separate join builds fragments
......................................................................


Patch Set 35:

(3 comments)

Replying to a few comments, need to make time to go through and make the 
requested changes (they made sense to me).

http://gerrit.cloudera.org:8080/#/c/14859/35/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/14859/35/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@272
PS35, Line 272: maxRowBufferSize + bufferSize
> Don't we need to multiply by 2? Or is it because the probe will also use th
BufferedTupleStream was designed so that you can write to N streams with N -1 
regular sized buffers and 1 max sized buffer - the trick is that when you write 
a row to larger-than-regular-sized buffer, it immediately advances to the next 
(regular-sized) buffer so that the reservation is immediately available to 
write to a different stream.

This is kinda complicated, but we decided to do it this way to improve the 
tradeoff between the maximum row size that could be processed and the memory 
reservation required.


http://gerrit.cloudera.org:8080/#/c/14859/35/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/14859/35/fe/src/main/java/org/apache/impala/planner/Planner.java@115
PS35, Line 115: createDistributedPlan
> AFAICT it can still return single-node plans, so why the name change?
Yeah the naming is a bit confusing. I think there are two sets of concepts here:

* The end-result of the plans - single-node/distributed/parallel, which are 
distinguished by which features they use
* The representations - a tree of plan nodes with no fragments, a list of plan 
fragments, and a list of plans.

So maybe I should rename these functions after the representations:

createSingleNodePlan(), createPlanFragments(), createPlans()


http://gerrit.cloudera.org:8080/#/c/14859/35/fe/src/main/java/org/apache/impala/planner/Planner.java@283
PS35, Line 283: !ctx_.isSingleNodeExec();
> Why can't we create a parallel plan on a single node?
The naming is a bit confusing - the single node plan here essentially means a 
plan with all the nodes in a single fragment (i.e. no exchanges). In most cases 
you can't run these plans in parallel cause there's no exchange to shuffle or 
aggregate results. (DMLs are an exception, I guess, since they all just write 
output separately, but I don't know if it's an important exception).



--
To view, visit http://gerrit.cloudera.org:8080/14859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58
Gerrit-Change-Number: 14859
Gerrit-PatchSet: 35
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 06 Feb 2020 01:30:50 +0000
Gerrit-HasComments: Yes

Reply via email to