Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/14104 )
Change subject: IMPALA-8803: Coordinator should release admitted memory per-backend ...................................................................... Patch Set 1: (19 comments) Took a pass over the code, havent looked at the tests yet. Looks good, mostly nits http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h@63 PS1, Line 63: const BackendExecParams& exec_params); nit: we can get the query_id and BackendExecParams from the QuerySchedule http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h@403 PS1, Line 403: UNRELEASED > Will "IN_USE" be slightly clear ? +1 http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h@413 PS1, Line 413: are considered to nit: transition to? http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h@469 PS1, Line 469: BackendFinished nit: MarkBackendFinished http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h@470 PS1, Line 470: std::vector<BackendState*>* releasable_backend_states nit: I usually try to avoid (input,output) type params if possible. Maybe consider adding a GetReleasableBackends() method. http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h@474 PS1, Line 474: released_backend_states nit: how about backends_to_release? http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h@474 PS1, Line 474: BackendsReleased nit: how about ReleaseBackends? http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h@477 PS1, Line 477: Close nit: how about call this CloseAndGetUnreleasedBackends() then return the unreleased_backend_states instead of having an output param. http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h@492 PS1, Line 492: enum ResourceState { UNRELEASED, PENDING, RELEASABLE, RELEASED}; nit: space http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.cc@827 PS1, Line 827: backend_states_.size() nit: use num_backends_ here and below http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.cc@827 PS1, Line 827: last_backend nit: how about is_last_unrelease_backend http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.cc@831 PS1, Line 831: coordinator_last_backend nit: how about is_coord_the_last_unreleased_backend http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.cc@836 PS1, Line 836: released_timer_.ElapsedTime() > RELEASE_BACKEND_STATES_DELAY / 1000000 should be released_timer_.ElapsedTime() > RELEASE_BACKEND_STATES_DELAY * 1000000 since released_timer_.ElapsedTime() is nano-sec and RELEASE_BACKEND_STATES_DELAY is milli-sec Alternatively we can store RELEASE_BACKEND_STATES_DELAY in nano-sec to avoid the multiplication everytime. http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.cc@865 PS1, Line 865: are nit: as http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator.h@530 PS1, Line 530: nit: needs comment http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator.cc@764 PS1, Line 764: vector<BackendState*> releasable_backends; : backend_resource_state_->BackendFinished(backend_state, &releasable_backends); : if (!releasable_backends.empty()) { : ReleaseBackendAdmissionControlResources(releasable_backends); : backend_resource_state_->BackendsReleased(releasable_backends); : } is it possible that both this thread and a cancellation thread call ReleaseBackendAdmissionControlResources on the same backend? eg. BackendFinished() returns a set of releasable_backends here and the query got cancelled so that thread called Close() and got a set of all unrelease backends, then both called ReleaseBackendAdmissionControlResources() http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.cc@367 PS1, Line 367: for (auto host_addr : host_addrs) { : auto backend_exec_params = schedule.per_backend_exec_params().find(host_addr); : if (backend_exec_params == schedule.per_ nit: we can probably calculate this in UpdateHostStatsForQueryBackends and directly pass the mem to release. http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.cc@410 PS1, Line 410: UpdateHostStatsForQueryBackends maybe worth mentioning in its comment that this only used to release the query and hence decrements the stats http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.cc@420 PS1, Line 420: int64_t mem_to_release = backend_exec_params->second.is_coord_backend ? : schedule.coord_backend_mem_to_admit() : : per_backend_mem_to_admit; : const string host = TNetworkAddressToString(host_addr); : VLOG_ROW << "Update admitted mem reserved for host=" << host : << " prev=" << PrintBytes(host_stats_[host].mem_admitted) : << " new=" << PrintBytes(host_stats_[host].mem_admitted - mem_to_release); : host_stats_[host].mem_admitted -= mem_to_release; : DCHECK_GE(host_stats_[host].mem_admitted, 0); : VLOG_ROW << "Update admitted queries for host=" << host : << " prev=" << host_stats_[host].num_admitted : << " new=" << host_stats_[host].num_admitted - 1; : host_stats_[host].num_admitted -= 1; : DCHECK_GE(host_stats_[host].num_admitted, 0); nit: this is the same case as in UpdateHostStats() maybe refactor it out and resuse -- To view, visit http://gerrit.cloudera.org:8080/14104 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88bb11e0ede7574568020e0277dd8ac8d2586dc9 Gerrit-Change-Number: 14104 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Thu, 22 Aug 2019 01:43:01 +0000 Gerrit-HasComments: Yes