Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14103 )

Change subject: IMPALA-8858: Add observability of the query load on executor 
groups
......................................................................


Patch Set 7:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/14103/6/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/14103/6/be/src/scheduling/admission-controller.h@943
PS6, Line 943:   /// Updates the list of executor groups for which we maintain 
the query load metrics.
> "groups for which we maintain..."
Done


http://gerrit.cloudera.org:8080/#/c/14103/6/be/src/scheduling/admission-controller.h@944
PS6, Line 944: from the metric gr
> Should we consider keeping groups with >0 executors around, even if they're
With how the current cancellation logic works, the query is still counted as 
running against a backend by the impala-server till it gets un-registered, 
which means that it would be cancelled regardless of whether it has completed 
its fragments on a particular backend.

However, I still agree with keeping non-zero groups around since they can still 
have an intermediate non zero value for query load which can prove useful.


http://gerrit.cloudera.org:8080/#/c/14103/6/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/14103/6/be/src/scheduling/admission-controller.cc@320
PS6, Line 320:   dequeue_thread_->Join();
> Can the argument be a const&?
changed this mechanism, switched to sending a snapshotPtr instead.


http://gerrit.cloudera.org:8080/#/c/14103/6/be/src/scheduling/admission-controller.cc@1852
PS6, Line 1852:   for (const auto& group : snapshot->executor_groups) {
> I feel that this function's body could benefit from a comment explaining wh
Done


http://gerrit.cloudera.org:8080/#/c/14103/6/be/src/scheduling/cluster-membership-mgr.h
File be/src/scheduling/cluster-membership-mgr.h:

http://gerrit.cloudera.org:8080/#/c/14103/6/be/src/scheduling/cluster-membership-mgr.h@135
PS6, Line 135:   /// events. They may be called at any time before or after 
calling Init() and are
> The comment should include what gets passed into the callback, or alternati
changed the update mechanism


http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.cc
File be/src/scheduling/cluster-membership-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.cc@342
PS3, Line 342:     base_snapshot = current_membership_.get();
> I'm still in favor of this.
Done.
Note: this adds a little extra work to the critical path since now we cant 
optimize work for specific calls:
- NotifyLocalServerForDeletedBackend called every time and not just when 
update_local_server is true
- Admission controller needs to separately create a set a  groups instead of 
leveraging the set previously created during metric updates.

These are pretty minor + cluster changes would be way less frequent in practice 
so shouldn't matter much.


http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.cc@346
PS3, Line 346:   // copying the Snapshot if it isn't.
> ?
Done


http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.cc@551
PS3, Line 551:
             :
             :
> What was the reason for this change?
just seemed consistent with other update calls to send the whole snapshotPtr


http://gerrit.cloudera.org:8080/#/c/14103/6/be/src/util/metrics.h
File be/src/util/metrics.h:

http://gerrit.cloudera.org:8080/#/c/14103/6/be/src/util/metrics.h@455
PS6, Line 455:   typedef std::unordered_map<std::string, 
std::shared_ptr<Metric>> MetricMap;
> Is there a reason this map needs to be ordered? If so, can you add it to th
oops, didnt notice that a map was being used instead of an unordered_map. Not 
sure what the initial intention was but I dont see any reason why it would 
require an ordered map. changing it to unordered_map. Same goes for 
ChildGroupMap below.


http://gerrit.cloudera.org:8080/#/c/14103/6/common/thrift/metrics.json
File common/thrift/metrics.json:

http://gerrit.cloudera.org:8080/#/c/14103/6/common/thrift/metrics.json@2495
PS6, Line 2495:   "description": "Total number of queries running on executor 
group: $0",
> I'd suggest moving the $0 to the back of the string to make it slightly eas
makes sense to me. Moving it to the end.


http://gerrit.cloudera.org:8080/#/c/14103/6/common/thrift/metrics.json@2502
PS6, Line 2502: ad.$0"
do you think num-running-queries is more appropriate here?



--
To view, visit http://gerrit.cloudera.org:8080/14103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539
Gerrit-Change-Number: 14103
Gerrit-PatchSet: 7
Gerrit-Owner: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Sep 2019 21:48:22 +0000
Gerrit-HasComments: Yes

Reply via email to