Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
......................................................................


Patch Set 6:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/16412/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16412/5//COMMIT_MSG@12
PS5, Line 12: This patch adds some simple configuration flags that make it 
possible
> not sure I understand the plan here. this patch adds flags is --is_admissio
My intention with IMPALA-9975 is that there will be a whole new binary, called 
'admissiond', which will be what actually exposes the AdmissionControlService, 
rather than having an impalad do it.

At that point, we won't need the --is_admission_controller flag because there 
will be two possible setups: traditional distributed admission control mode 
(which impalads detect by checking if admission_control_service_addr is empty), 
and new admission control service mode (where the admissiond will know to 
always export the AdmissionControlService).

I think this will be cleaner than having the admission control service run as 
an impalad with --is_admission_controller=true because:
- There are a bunch of things that impalads need that the admissiond won't 
(DataStreamService, ControlService, the catalog client, the hs2 and beeswax 
interfaces, etc.) and explicitly separating out an admissiond makes it more 
natural to exclude these things from it
- I think it makes integration with external deployment systems more natural, 
eg. in a kubernetes setup you might have an istio policy for exposing various 
ports for impalads and you'll need a separate policy for the admissiond anyways 
since the ports that it needs exposed are different. It seems nicer to just 
have admissiond be its own separate thing, rather than always special-casing 
the handling of impalads depending on whether they expose 
AdmissionControlService.

I'll be utilizing the same functionality we already use for the statestored and 
catalogd where its really all the same binary behind the scenes to limit 
linking time (see daemon-main.cc), so in a way the admissiond will just be an 
impalad with --is_admission_controller=true always set.

I can certainly see some downsides to this approach (eg. changes to legacy 
deployment systems will be larger, the need for generating another docker 
container image, etc.), so I'm open to the possibility this is not the right 
approach, depending on what others think.

I already have the admissiond stuff basically written and tested if you want me 
to submit it as a WIP (though its rough at the moment)


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.h
File be/src/scheduling/admission-control-service.h:

http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.h@76
PS5, Line 76: AdmitQueryRequestP
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.h@105
PS5, Line 105:     RuntimeProfile* summary_profile;
> might be worth mentioning that this is passed to AdmissionController::Submi
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.h@118
PS5, Line 118: to
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc
File be/src/scheduling/admission-control-service.cc:

http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@85
PS5, Line 85: Admission
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@142
PS5, Line 142:   shared_ptr<AdmissionState> admission_state;
> might be too verbose to log.
I moved it to a higher VLOG level.


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@155
PS5, Line 155:
> why wait at all? won't waiting tie up one of the RPC threads? I think the c
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@159
PS5, Line 159:         } else {
> is this expected? do want to DCHECK if the admit_status is not Status::OK()
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@197
PS5, Line 197:     const ReleaseQueryBackendsRequestPB* req, 
ReleaseQueryBackendsResponsePB* resp,
> why does the lock need to be acquired here?
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@209
PS5, Line 209:       req->query_id(), host_addrs);
> might be too verbose to log
Moved to a higher vlog level


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@219
PS5, Line 219:   
admission_state->admit_outcome.Set(AdmissionOutcome::CANCELLED);
> why does the lock need to be acquired here?
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@249
PS5, Line 249: }
             :
             : template <typename ResponsePBType>
> might make the code easier to read if this just created a AdmissionRequest
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-controller.h@316
PS5, Line 316: (usually they
             :   /// are owned by the ClientRequestState o
> might need to revise this now.
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-controller.h@345
PS5, Line 345:   /// will by OK if 'block' was false and no decision was made 
yet.
> not clear to me what the returned Status is suppose to indicate.
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/remote-admission-control-client.h
File be/src/scheduling/remote-admission-control-client.h:

http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/remote-admission-control-client.h@53
PS5, Line 53:   /// Protects 'pending_admit_' and 'cancelled_'.
> what is the lock used to protect?
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/remote-admission-control-client.cc
File be/src/scheduling/remote-admission-control-client.cc:

http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/remote-admission-control-client.cc@63
PS5, Line 63:   KrpcSerializer serializer;
            :   int sidecar_idx1;
            :   RETURN_IF_ERROR(
            :       serializer.SerializeToSidecar(&request.request, 
&rpc_controller, &sidecar_idx1));
            :   req.set_query_exec_request_sidecar_idx(sidecar_idx1);
            :
            :   for (const NetworkAddressPB& address : 
request.blacklisted_executor_addresses) {
            :     *req.add_blacklisted_executor_addresses() = address;
            :   }
            :
            :   query_events->MarkEvent(QUERY_EVENT_SUBMIT_FOR_A
> isn't TQueryOptions accessible from a TQueryExecRequest?
I think you're right that the two copies of TQueryOptions are equivalent, 
though its a bit of a mess:

The TQueryOptions in the TClientRequest is originally the values passed in by 
the client which are then given to the FE. The FE assigns the TExecRequest's 
TQueryOptions to the same object, then makes changes to some of the query 
option values, then when the TExecRequest is serialized to return to the BE we 
end up with two identical copies of the TQueryOptions.

More generally, its a planned follow up task to evaluate the size of the 
parameters being sent here and determine to what extent it would be valuable to 
separate out the things actually needed for scheduling from the 
TQueryExecRequest/TQueryOptions. (I just filed IMPALA-10204 as a reminder). 
Maybe when this work is done I'll be able to clean up some of the above.

Sending the whole thing as sidecars was an easy way to get stuff working, and 
I've done some preliminary perf testing and have found it to not be a big 
problem yet, but its definitely important to know for sure before we start 
recommending this setup to users.


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/remote-admission-control-client.cc@86
PS5, Line 86:         proxy->AdmitQuery(req, &resp, &rpc_controller), 
"AdmitQuery rpc failed");
            :     Status admit_status(resp.status());
            :     RETURN_IF_ERROR(admit_status);
            :
> nvm, seems there is a separate client per query.
Added a comment that should clarify the motivation here


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/service/impala-http-handler.cc@1057
PS5, Line 1057: void ImpalaHttpHandler::AdmissionStateHandler(
> perhaps this will done in another patch, but will the /admission endpoint o
Yeah, what I've done here in this patch is a bit hack-y to get 
TestAdmissionController to be happy. In the next set of patches where I add the 
admissiond (assuming that actually happens, see my very long comment responding 
to you in the commit message), these endpoints will be moved to only appear on 
the admissiond when the AdmissionControlService is in use.

Then, once everything is working e2e there will probably be another patch or 
two around observability that considers what the impalads can/should expose vs. 
what the admissiond can in a more holistic way.


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/util/sharded-query-map-util.cc
File be/src/util/sharded-query-map-util.cc:

http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/util/sharded-query-map-util.cc@21
PS5, Line 21: #include "scheduling/admission-control-service.h"
> seems odd that this needs to be imported here. can the templates be moved i
I'm a little fuzzy on how C++ templates really work, so I could be wrong, but 
it's my understanding that the only way to define template functions in .cc 
files and have it still work is to also enumerate the needed template 
specializations in the same .cc file (because the compiler needs to know what 
specializations to generate at the same time it processes the definitions, and 
.cc files are processed separately)

See for example this explanation: 
https://isocpp.org/wiki/faq/templates#templates-defn-vs-decl


http://gerrit.cloudera.org:8080/#/c/16412/5/common/protobuf/admission_control_service.proto
File common/protobuf/admission_control_service.proto:

http://gerrit.cloudera.org:8080/#/c/16412/5/common/protobuf/admission_control_service.proto@162
PS5, Line 162: TQueryExecRequest
> nit: typo
Done



--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 6
Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Comment-Date: Fri, 02 Oct 2020 22:55:28 +0000
Gerrit-HasComments: Yes

Reply via email to