Alex Behm has posted comments on this change. Change subject: IMPALA-4862: make resource profile consistent with backend behaviour ......................................................................
Patch Set 7: (6 comments) The new approach makes sense to me - I Like it. Focusing on high-level comments first since this is a very meaty patch. One thing to keep in mind is that union-node.cc somewhat violates your Open()/Close() assumptions, mostly to satisfy our "guarantee" that row s will be available to fetch immediately after Open() succeeds. Take a look at: test_rows_availability.py I think we should get rid of that guarantee, though it needs some thought to decide whether it's too dangerous for now or not. http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java File fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java: Line 82: private void recomputeResourceProfiles() { This seems weird to me and I believe we can potentially get rid of this if we don't we move computeResourceProfile() out of PlanFragment.finalize(). We'd defer the core computations to Planner.computeResourceReqs() which gets called on the final plan. http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java File fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java: Line 44 Good riddance! Though I did like having the core logic in a single place. http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/PlanVisitor.java File fe/src/main/java/org/apache/impala/planner/PlanVisitor.java: Line 6: public interface PlanVisitor { We already have a Visitor and a corresponding accept() function. Can we consolidate them? http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: Line 352: private static class ResourceAggregator implements PlanVisitor { Do we really need this separate visitor that only sums over things we have computed in a previous pass over the plan? Why not have a Visitor that does a pass over the plan to compute the per-fragment peak resources and then simultaneously sum them up? Line 392: planRoots.get(0).walkPlan(aggregator); The new approach makes sense to me - I like it. The code/class structure could use some more thought because I found the code and flow very difficult to follow. In particular, there are now several classes and functions that deal with estimating resources, but it's not always clear what they abstract and how they are intended to work together. Here are a few examples that confused me, and I'll post some more throughout the code in other comments: PlanTreeResources vs. ResourceProfile: It's not clear to me what a ResourceProfile really is. The class comment says "The resources that will be consumed by a set of plan nodes" which seems to me more fitting for PlanTreeResources (at least based on the name). At a high-level it feels like your approach needs the following things: 1. Peak resources consumed by a single PlanNode 2. Peak resources consumed by PlanFragment 3. Then we sum the peak resources from all fragments Visitor and recursion: I'm a fan of visitors, but the mix of visitors and recursion confuses me. One benefit of the visitor pattern is that all the logic for a visitor can be encoded in the visitor itself, but it seems that the core logic is implemented in the various classes by overriding functions. For example, what do you think of a PeakResourcesVisitor instead of the recursive PlanNode.computePeakResources()? The intended logic might be captured in a more straightforward way with a FragmentVisitor and a PlanNodeVisitor. The first goes all over fragments and then uses a PlanNodeVisitor to compute the per-fragment peak. The FragmentVisitor keeps track of the various sums. Not sure if you like the suggestions, my main point is the logic/flow is hard to follow today. Entry point of logic: Maybe I'm seeing it wrong, but to me it looks like the new code is invoked through PlanFragment.finalize() which is called way before Planner.computeResourceReqs(). The latter really only walks the fragments and plan nodes again to compute sums. It seems more intuitive to have Planner.computeResourceReqs() be the driver and not just an aggregator. The current code flow is not easy to follow, and it's not clear whether the division of labor between PlanFragment.finalize() and Planner.computeResourceReqs() even makes sense. http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/ResourceProfile.java File fe/src/main/java/org/apache/impala/planner/ResourceProfile.java: Line 23: * The resources that will be consumed by a set of plan nodes. Does this really represent the resources of a single PlanNode instance or PlanFragment instance? What are these "set of plan nodes"? Surely not an arbitrary collection of them. -- To view, visit http://gerrit.cloudera.org:8080/7223 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes