Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14104 )

Change subject: IMPALA-8803: Coordinator should release admitted memory 
per-backend
......................................................................


Patch Set 2:

(14 comments)

Address Tim's and Michael's comments, still working through Bikram's.

Some other notable changes:

I originally thought all core tests passed, but turns out some are consistently 
failing. I fixed test_auto_scaling.py and test_executor_groups.py in this 
update, but am investigating a few more. The reason these tests started failing 
is that they assume a max number of queries can be run within an executor group 
and enforce this by setting -max_concurrent_queries. The issue is that they 
only set -max_concurrent_queries for executors, and set a much higher value for 
coordinators. That causes problems with this patch because backends might be 
released independently (e.g. executor backends might be released first, and 
then coordinator fragments). This breaks the assertions in 
https://github.com/apache/impala/blob/master/tests/custom_cluster/test_auto_scaling.py#L173).

I added some more state validation into the AdmissionController. Essentially, I 
wanted to add a DCHECK to ensure ReleaseQueryBackends is called for all 
Backends before ReleaseQuery is called. This requires some extra internal 
book-keeping.

http://gerrit.cloudera.org:8080/#/c/14104/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14104/1//COMMIT_MSG@24
PS1, Line 24: * Resources are released at most once every 1 second, this 
prevents
> at most once
Done


http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h@481
PS1, Line 481: nother value, such as 10, wo
> Please add _MS suffix
Done


http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h@505
PS1, Line 505: 
> nits: double-negative may not be very comprehensible.
Done


http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h@527
PS1, Line 527:   /// QuerySchedule associated with the Coordinator that owns 
this BackendResourceState.
> What owns the object that is being referenced? The schedule?
It's owned by the Coordinator, updated the comments with the ownership info.


http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.cc@810
PS1, Line 810: Start the 'Timed Re
> I feel like the name is a little confusing in isolation, I'm not sure peopl
Changed it to NumCompletedBackends and moved it out of this class and into the 
Coordinator since it should really determine when a Backend is "Completed".


http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.cc@823
PS1, Line 823:
> Coding convention for Impala recommends pre-increment
Done


http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.cc@865
PS1, Line 865: as
> nit: as
Done


http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator.h@530
PS1, Line 530:   /// to satisfy the above preconditions. If the query has an 
expensive finalization
> nit: needs comment
Done


http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.h@339
PS1, Line 339:   Status SubmitForAdmission(const AdmissionRequest& request,
> Needs a comment. We may also want to document the lifecycle of resource rel
Done

While writing up the class comments, it occurred to me that we want to ensure 
that ReleaseQueryBackends is called for all Backends in a query before 
ReleaseQuery is called. I added some book-keeping and DCHECKs to ensure this.


http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.h@489
PS1, Line 489:       : name_(name), parent_(parent), agg_num_running_(0), 
agg_num_queued_(0),
> Needs a comment
Done


http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.h@816
PS1, Line 816:   static bool CanAccommodateMaxInitialReservation(const 
QuerySchedule& schedule,
> Needs a comment
Done


http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.cc@370
PS1, Line 370:  for (auto host_addr : host_addrs) {
             :     auto backend_exec_params = 
schedule.per_backend_exec_params().find(host_addr);
             :     if (backend_exec_params == schedule.
> What happens in non-debug builds ? Should we log here instead and add a con
Done


http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.cc@375
PS1, Line 375:               PrintThrift(host_addr), 
PrintId(schedule.query_id()));
> This pattern appears in a few places - might be worth making a helper funct
Done


http://gerrit.cloudera.org:8080/#/c/14104/1/tests/query_test/test_result_spooling.py
File tests/query_test/test_result_spooling.py:

http://gerrit.cloudera.org:8080/#/c/14104/1/tests/query_test/test_result_spooling.py@82
PS1, Line 82:   def get_workload(cls):
> Let's move this into the custom_cluster directory. run-custom-cluster-tests
Done



--
To view, visit http://gerrit.cloudera.org:8080/14104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88bb11e0ede7574568020e0277dd8ac8d2586dc9
Gerrit-Change-Number: 14104
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Aug 2019 19:10:02 +0000
Gerrit-HasComments: Yes

Reply via email to