Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10060 )

Change subject: [Preview]IMPALA-5216: Make admission control queuing async
......................................................................


Patch Set 2:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/runtime/coordinator.h@329
PS2, Line 329:   bool execution_started_ = false;
leftover?


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.h@45
PS2, Line 45: AdmissionStatus
Maybe it should be AdmissionOutcome? This is really the final outcome of 
admission rather than the intermediate states, right?


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.h@199
PS2, Line 199:   /// returns an OK status, schedule->is_admitted() is true and 
admit_status is ADMITTED.
Is (admit_status.IsSet() && admit_status.Get() == ADMITTED) equivalent to 
schedule->is_admitted()? If so we should just remove the redundancy somehow.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.h@395
PS2, Line 395:     Promise<AdmissionStatus>* admit_status;
What owns the memory of admit_status?


http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/scheduling/admission-controller.cc@518
PS1, Line 518:       
admit_status->TrySet(AdmissionStatus::REJECTED_OR_TIMED_OUT);
> This was my initial intuition too, but like you mentioned, I saw that sched
Yeah, I actually think it would be good to remove the is_admitted_ flag out of 
QuerySchedule to make it more consistent.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc@505
PS2, Line 505:     lock_guard<mutex> lock(admission_ctrl_lock_);
Do we need to worry about races between admit_status being set to CANCELLED and 
it being set below? What happens if the TrySet(ADMITTED) call below fails 
because the query was cancelled? Are the admission results, etc correct?


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc@576
PS2, Line 576:         
admit_status->TrySet(AdmissionStatus::REJECTED_OR_TIMED_OUT);
It would be nice to log the outcome of admission in all cases. Maybe we should 
have a class that is a wrapper around the promise and guarantees that the 
outcome of admission is logged?


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc@595
PS2, Line 595:       if (queue->Remove(&queue_node)) {
One line?


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc@885
PS2, Line 885:         // TODO: Maybe dont even check cancelled here, just try 
admitting it and let the
+1 to removing a code path if it isn't totally necessary. As long as we catch 
it before starting fragments it seems ok to detect the cancellation slightly 
later.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc@901
PS2, Line 901: DCHECK
We should really log the actual value if the DCHECK fails. I wish we have a 
convenience macro for comparing two strongly-typed enums. Might be worth adding?


http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/scheduling/query-schedule.h
File be/src/scheduling/query-schedule.h:

http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/scheduling/query-schedule.h@274
PS1, Line 274:
> the only reason I kept this was because the coordinator has a const referen
Can the coordinator have a reference to Promise<AdmitStatus> then?  Having the 
redundant state and inconsistency between them seems confusing.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.h@279
PS2, Line 279:   /// admit or reject or mark timed out) or on cancellation of 
query (to mark cancelled).
Maybe slightly expand on how cancellation works. I think a few important things 
are:
* It can be called from client-RPC-handling threads.
* Cancellation is performed by AdmitQuery() or DequeueLoop() asynchronously.
* TrySet(CANCELLED) ensures async cancellation of the query is queued or about 
to be queued or is a no-op if the query has already been admitted.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.h@282
PS2, Line 282:   Promise<AdmissionStatus> admit_status_;
After reading through the rest of the patch I feel like it might be beneficial 
to have a lightweight abstraction around admit_status_ to force callers to use 
it in the right way.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.h@318
PS2, Line 318: Exec
Exec()


http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/service/client-request-state.h@316
PS1, Line 316:   /// after Exec() has been successfully called on it (that is, 
if 'coord_exec_started_'
> I see. would you suggest I create an AtomicBool class like what we have for
I think that would work. It would also be ok if it was a wrapper around 
std::atomic<bool> that only exposed a subset of methods.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@906
PS2, Line 906:    if(uses_admission_control()){
Nit: missing whitespace. Fits on one line?


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@909
PS2, Line 909:    coordinator = coord();
Does this line need to be in the critical section? It seems like we don't need 
the coordinator until l920 anyway so we could initialise the variable down 
there.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1121
PS2, Line 1121: nullptr
nullptr arg not needed


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1146
PS2, Line 1146:   if (exec_env_->admission_controller() != nullptr) {
Let's remove the exec_env_ member and use ExecEnv::GetInstance() while we're 
here.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1147
PS2, Line 1147: status
It's a bit hard to follow the control flow around 'status'. I wonder if it 
would be clearer to have separate variables for the outcome of each 
status-returning operation, since it seems like the result of AdmitQuery() is 
handled differently from the result of coord->Exec().


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1151
PS2, Line 1151:     if (int sleep_ms = GetSleepTimeFromDebugOptions(
We generally avoid assigning variables inside if statements. This isn't too bad 
but it seems better just to initialize the variable in a separate statement.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1159
PS2, Line 1159:       // TODO: this is a very small window in practice, not 
sure if its worth the
So it's very unlikely that this would save us work? It looks like it could save 
us a lot of work in coord->Exec() but if it's very unlikely to hit it then we 
can just remove it. I think it's more important to keep the code simple and 
correct than it is to minimise the latency of cancellation.

Maybe we can think about this in terms of latency of cancellation? Cancellation 
generally would be user or admin-initiated and I think having a 1s latency 
between cancellation being initiated and it taking effect would be acceptable. 
Lower is of course better.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1178
PS2, Line 1178:   if (int sleep_ms = GetSleepTimeFromDebugOptions(
Same comment about if statement.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1198
PS2, Line 1198:   if(cancelled) {
whitespace


http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/service/impala-hs2-server.cc@465
PS1, Line 465:   // TODO: Does it make sense to change "in_flight_queries" to 
"initialized_queries".
> Agreed. "submitted_queries" sounds perfect. However I will make changes in
I'm ok with deferring the work to a later cleanup patch too so long as we 
document inflight_queries_ with a comment that explains what it means :)



--
To view, visit http://gerrit.cloudera.org:8080/10060
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I989cf5b259afb8f5bc5c35590c94961c81ce88bf
Gerrit-Change-Number: 10060
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Apr 2018 23:09:59 +0000
Gerrit-HasComments: Yes

Reply via email to