Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10060 )
Change subject: IMPALA-5216: Make admission control queuing async ...................................................................... Patch Set 11: (29 comments) That looks good. My review comments are mostly to clarify some of the code comments. http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/scheduling/admission-controller.h File be/src/scheduling/admission-controller.h: http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/scheduling/admission-controller.h@44 PS9, Line 44: if add "... or the caller wants to cancel" or something like that about cancellation. http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/scheduling/admission-controller.h@204 PS9, Line 204: Returns immediately if rejected let's say what the return value and promise value are for each case, i.e. something like: If rejected, returns immediately with an error status indicating the reason and sets 'admit_outcome' to REJECTED_OR_TIMED_OUT. ... and same type of explanation for timed out, cancel and admitted case. That will also make it explicit that by the time this returns, the promise is always set. http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/scheduling/admission-controller.h@206 PS9, Line 206: ANCELLED i.e. explain status in this case. http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/scheduling/admission-controller.h@212 PS9, Line 212: Promise<AdmissionOutcome, PromiseMode::MULTIPLE_PRODUCER> could create a typedef for that since it appears in several places and is kinda long, but okay to ignore. http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/scheduling/admission-controller.cc@581 PS11, Line 581: result is set or the query is cancelled or it : // times out. the "result is set" also happens in the cancel case. To clarify, how about something like: Block in Get() up to the time out waiting for the promise to be set when the query is admitted or cancelled. http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/scheduling/admission-controller.cc@590 PS11, Line 590: SleepIfSetInDebugOptions(schedule->query_options(), SLEEP_AFTER_ADMISSION_OUTCOME_MS); what's interesting about this time interval? Is it to give the dequeue loop time to notice the cancellation? http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/scheduling/admission-controller.cc@607 PS11, Line 607: stats->Dequeue(*schedule, true); nit: it was a bit hard to see that this block is really the same as the else case (which just different arguments) because things are done in a slightly different order. for consistency, how about moving this to after line 604? http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.h File be/src/service/client-request-state.h: http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.h@70 PS11, Line 70: /// the query to the Admission controller for asynchronous admission control. is there some post condition around the operation state we can document? like that on success, this sets operation state to RUNNING_STATE or PENDING_STATE. http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.h@131 PS11, Line 131: and sets the : /// query status to CANCELLE I thought you decided to hold off on that change. http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.h@153 PS11, Line 153: T missing space before http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.h@286 PS11, Line 286: dequeuing thread from this context, not clear what thread this is talking about. How about: ... set asynchronously by the admission controller (to admit). Also the "admission control thread" is probably referring to what's now called the async exec thread? This comment has a lot of admission controller details, rather than the high level overview of what this promise is. How about something like: Promise used by the admission controller. AdmissionController:AdmitQuery() will block on this promise until the query is either rejected, admitted, times out, or is cancelled. Can be set to CANCELLED by the ClientRequestState in order to cancel, but otherwise is set by AdmissionController with the admission decision. http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.h@442 PS11, Line 442: rocessing a query or dml execution request and submitting it to the : /// admission controller this is confusing since it doesn't actually do that work (it happens asynchronously.). I think this comment could use more of a rewrite given the new code structure. http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.h@508 PS11, Line 508: : /// Submits the QuerySchedule to the admission controller and on successful admission, : /// starts up the coordinator execution, makes it accessible by setting : /// 'coord_exec_started_' to true and advances operation_state_ to RUNNING. Handles : /// async cancellation of queries and cleans up state if needed. : void FinishExecQueryOrDmlRequest(); let's move that to be right after ExecAsyncQueryOrDmlRequest() since it's so closely related. http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/service/client-request-state.h File be/src/service/client-request-state.h: http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/service/client-request-state.h@153 PS9, Line 153: GetCoord I think you ended up calling it GetCoordinator(), either update the comment or the method name. http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/service/client-request-state.h@173 PS9, Line 173: /// Only set for 'QUERY' and 'DML' statement types and is available only after Exec() little hard to understand what "set" and "available" mean from just the interface. how about: Returns the Coordinator for 'QUERY' and 'DML' requests once Coordinator::Exec() completes successfully. Otherwise returns null. http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/service/client-request-state.h@176 PS9, Line 176: coord_exec_started_ I think I was a bit confused about the "started" in the name until I read the code since it kinda sounds like this is true once Exec() starts, but i think the name really means once async execution has been started. From the perspective of Exec(), this should be coord_exec_done_, right? but I think that's also confusing since execution is potentially still in progress. So I guess I don't have a great suggestion. http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/service/client-request-state.h@182 PS9, Line 182: uses_admission_control how about: has_async_exec_thread()? does it need to be public? http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/service/client-request-state.h@325 PS9, Line 325: : /// Not set for ddl queries. It should only be acce I think phrasing this in terms of why may help clarify this: ... accessed through GetCoordinator() since most Coordinator methods can be called only after Coordinator::Exec() returns success. http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/service/client-request-state.h@327 PS9, Line 327: makes sure that it can onl FinishExecQueryOrDmlRequest() http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/service/client-request-state.h@330 PS9, Line 330: rted_' to true, and : /// UpdateBacken I think clearer to say: ... can be called before Coordinator::Exec() returns. http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.cc@233 PS11, Line 233: UpdateNonErrorOperationState(TOperationState::PENDING_STATE); The async-exec thread could race ahead and set this to RUNNING before we get here. But I guess UpdateNonErrorOperationState() ensures we don't go "backwards" in states? We could consider doing this as the first thing in the async-exec thread, though I suppose having it here is also nice because it means that we're guaranteeing that ExecStmt() RPC will always have transitioned away from INITIALIZING before returning. Just thinking aloud. http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.cc@751 PS11, Line 751: operation_state_ < new_state at this point, is it clearer to just list the possible states? it's just INITIALIZED, PENDING, RUNNING right? And you can dcheck that new_state != operation_state_ if you wanted to preserve that check. http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.cc@915 PS11, Line 915: uses_admission_control() once renamed, this won't make as much sense. I think we should just unconditionally do this Set(). That should be fine, right? http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.cc@1151 PS11, Line 1151: void ClientRequestState::FinishExecQueryOrDmlRequest() { like the decl, would be nice to move this up to be adjacent to AsyncExecQueryOrDmlRequest() since it's closely related. http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.cc@1176 PS11, Line 1176: that coord_ is : // visible to the cancellation thread if the query is marked cancelled after exiting : // this critical section. How about being explicit about why that's important rather than stating just the fact that it must be visible: Once the lock is dropped, any future calls to Cancel() are responsible for calling Coordinator::Cancel(), so while holding the lock we need to both perform a check for cancellation and make the coord_ visible. http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.cc@1199 PS11, Line 1199: updates are : // only received after Exec() has been called successfully on it that's not quite accurate. It's more that we these are safe to call concurrently with Coordinator::Exec(). http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/impala-server.h@429 PS11, Line 429: It represents the set of queries that : /// can be cancelled: either the query is queued or has started executing. That's kind of secondary. The primary purpose is to have a list of queries that needs to be closed if the session is closed or expires. maybe just revert that comment. http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/util/debug-util.h File be/src/util/debug-util.h: http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/util/debug-util.h@117 PS9, Line 117: sleep_option this is really the code location label or identifier, rather than the option for the sleep, right? maybe call it 'sleep_label' or 'sleep_location'? http://gerrit.cloudera.org:8080/#/c/10060/11/www/query_backends.tmpl File www/query_backends.tmpl: http://gerrit.cloudera.org:8080/#/c/10060/11/www/query_backends.tmpl@90 PS11, Line 90: has completed, or has no backends or has not started : any backends, yet. the comma locations are kinda weird in this user facing message. rather than tacking something to the end, let's word it in a way that makes sense. -- To view, visit http://gerrit.cloudera.org:8080/10060 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I989cf5b259afb8f5bc5c35590c94961c81ce88bf Gerrit-Change-Number: 10060 Gerrit-PatchSet: 11 Gerrit-Owner: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Mon, 21 May 2018 22:32:11 +0000 Gerrit-HasComments: Yes