Alex Behm has posted comments on this change.

Change subject: IMPALA-4862: make resource profile consistent with backend 
behaviour
......................................................................


Patch Set 7:

(6 comments)

The new approach makes sense to me - I Like it.

Focusing on high-level comments first since this is a very meaty patch.

One thing to keep in mind is that union-node.cc somewhat violates your 
Open()/Close() assumptions, mostly to satisfy our "guarantee" that row s will 
be available to fetch immediately after Open() succeeds. Take a look at: 
test_rows_availability.py

I think we should get rid of that guarantee, though it needs some thought to 
decide whether it's too dangerous for now or not.

http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java
File fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java:

Line 82:   private void recomputeResourceProfiles() {
This seems weird to me and I believe we can potentially get rid of this if we 
don't we move computeResourceProfile() out of PlanFragment.finalize(). We'd 
defer the core computations to Planner.computeResourceReqs() which gets called 
on the final plan.


http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java
File fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java:

Line 44
Good riddance! Though I did like having the core logic in a single place.


http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/PlanVisitor.java
File fe/src/main/java/org/apache/impala/planner/PlanVisitor.java:

Line 6: public interface PlanVisitor {
We already have a Visitor and a corresponding accept() function. Can we 
consolidate them?


http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

Line 352:   private static class ResourceAggregator implements PlanVisitor {
Do we really need this separate visitor that only sums over things we have 
computed in a previous pass over the plan? Why not have a Visitor that does a 
pass over the plan to compute the per-fragment peak resources and then 
simultaneously sum them up?


Line 392:     planRoots.get(0).walkPlan(aggregator);
The new approach makes sense to me - I like it.

The code/class structure could use some more thought because I found the code 
and flow very difficult to follow. In particular, there are now several classes 
and functions that deal with estimating resources, but it's not always clear 
what they abstract and how they are intended to work together. Here are a few 
examples that confused me, and I'll post some more throughout the code in other 
comments:

PlanTreeResources vs. ResourceProfile:
It's not clear to me what a ResourceProfile really is.
The class comment says "The resources that will be consumed by a set of plan 
nodes" which seems to me more fitting for PlanTreeResources (at least based on 
the name).

At a high-level it feels like your approach needs the following things:
1. Peak resources consumed by a single PlanNode
2. Peak resources consumed by PlanFragment
3. Then we sum the peak resources from all fragments

Visitor and recursion:
I'm a fan of visitors, but the mix of visitors and recursion confuses me. One 
benefit of the visitor pattern is that all the logic for a visitor can be 
encoded in the visitor itself, but it seems that the core logic is implemented 
in the various classes by overriding functions.

For example, what do you think of a PeakResourcesVisitor instead of the 
recursive PlanNode.computePeakResources()?

The intended logic might be captured in a more straightforward way with a 
FragmentVisitor and a PlanNodeVisitor. The first goes all over fragments and 
then uses a PlanNodeVisitor to compute the per-fragment peak. The 
FragmentVisitor keeps track of the various sums.

Not sure if you like the suggestions, my main point is the logic/flow is hard 
to follow today.

Entry point of logic:
Maybe I'm seeing it wrong, but to me it looks like the new code is invoked 
through PlanFragment.finalize() which is called way before 
Planner.computeResourceReqs(). The latter really only walks the fragments and 
plan nodes again to compute sums. It seems more intuitive to have 
Planner.computeResourceReqs() be the driver and not just an aggregator. The 
current code flow is not easy to follow, and it's not clear whether the 
division of labor between PlanFragment.finalize() and 
Planner.computeResourceReqs() even makes sense.


http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
File fe/src/main/java/org/apache/impala/planner/ResourceProfile.java:

Line 23:  * The resources that will be consumed by a set of plan nodes.
Does this really represent the resources of a single PlanNode instance or 
PlanFragment instance?

What are these "set of plan nodes"? Surely not an arbitrary collection of them.


-- 
To view, visit http://gerrit.cloudera.org:8080/7223
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to