Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14859 )
Change subject: IMPALA-4224: execute separate join builds fragments ...................................................................... Patch Set 35: (3 comments) Replying to a few comments, need to make time to go through and make the requested changes (they made sense to me). http://gerrit.cloudera.org:8080/#/c/14859/35/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: http://gerrit.cloudera.org:8080/#/c/14859/35/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@272 PS35, Line 272: maxRowBufferSize + bufferSize > Don't we need to multiply by 2? Or is it because the probe will also use th BufferedTupleStream was designed so that you can write to N streams with N -1 regular sized buffers and 1 max sized buffer - the trick is that when you write a row to larger-than-regular-sized buffer, it immediately advances to the next (regular-sized) buffer so that the reservation is immediately available to write to a different stream. This is kinda complicated, but we decided to do it this way to improve the tradeoff between the maximum row size that could be processed and the memory reservation required. http://gerrit.cloudera.org:8080/#/c/14859/35/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/14859/35/fe/src/main/java/org/apache/impala/planner/Planner.java@115 PS35, Line 115: createDistributedPlan > AFAICT it can still return single-node plans, so why the name change? Yeah the naming is a bit confusing. I think there are two sets of concepts here: * The end-result of the plans - single-node/distributed/parallel, which are distinguished by which features they use * The representations - a tree of plan nodes with no fragments, a list of plan fragments, and a list of plans. So maybe I should rename these functions after the representations: createSingleNodePlan(), createPlanFragments(), createPlans() http://gerrit.cloudera.org:8080/#/c/14859/35/fe/src/main/java/org/apache/impala/planner/Planner.java@283 PS35, Line 283: !ctx_.isSingleNodeExec(); > Why can't we create a parallel plan on a single node? The naming is a bit confusing - the single node plan here essentially means a plan with all the nodes in a single fragment (i.e. no exchanges). In most cases you can't run these plans in parallel cause there's no exchange to shuffle or aggregate results. (DMLs are an exception, I guess, since they all just write output separately, but I don't know if it's an important exception). -- To view, visit http://gerrit.cloudera.org:8080/14859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58 Gerrit-Change-Number: 14859 Gerrit-PatchSet: 35 Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Thu, 06 Feb 2020 01:30:50 +0000 Gerrit-HasComments: Yes