Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15154 )

Change subject: IMPALA-8712: Make ExecQueryFInstances async
......................................................................


Patch Set 3:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/15154/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15154/2//COMMIT_MSG@10
PS2, Line 10: Previously, Impala would issue all the Exec()s, wait for all of 
them
            : to complete, and then check if any of them resulted in an error. 
We
            : now stop issuing Exec()s and cancel any that are still in flight 
as
            : soon as an error occurs.
> so this covers IMPALA-6788 as well, right?
Not completely - there's still the case where we get an error report from a 
running fragment while still doing startup. With this patch, we'll still wait 
for all Exec()s to complete before processing the error.

I added a TODO with this JIRA to call this out.


http://gerrit.cloudera.org:8080/#/c/15154/2//COMMIT_MSG@23
PS2, Line 23: Removing this thread pool has potential performance implications, 
as
            : it means that the Exec() parameters are serialized in serialize 
rather
            : than in parallel (with the level of parallelism determined by the 
size
            : of the thread pool, which was configurable by an Advanced flag and
            : defaulted to 12).
> do we have a runtime counter to measure this overhead?
Yes, we record the amount of time query startup takes, the time between when we 
mark the event "Ready to start on X backends" to when we mark "All X execution 
backends started"


http://gerrit.cloudera.org:8080/#/c/15154/2/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/15154/2/be/src/runtime/coordinator-backend-state.h@82
PS2, Line 82: Return true
> nit: "Returns true"
Done


http://gerrit.cloudera.org:8080/#/c/15154/2/be/src/runtime/coordinator-backend-state.h@96
PS2, Line 96: after WaitOnExecRpc(
> nit: needs to be updated
Done


http://gerrit.cloudera.org:8080/#/c/15154/2/be/src/runtime/coordinator-backend-state.h@102
PS2, Line 102: Exec
> nit: should the docs of this method be updated to indicate that this method
Done


http://gerrit.cloudera.org:8080/#/c/15154/2/be/src/runtime/coordinator-backend-state.h@108
PS2, Line 108: imes and f
> might want to mention some more docs about thread safety - e.g. is a client
Done


http://gerrit.cloudera.org:8080/#/c/15154/2/be/src/runtime/coordinator-backend-state.h@369
PS2, Line 369:
             :
> same as below
Done


http://gerrit.cloudera.org:8080/#/c/15154/2/be/src/runtime/coordinator-backend-state.h@375
PS2, Line 375:
             :   int64_t last_report_time_ms_ = 0;
> i believe this isn't thread safe. even if it is no longer set after Exec()
Went ahead and protected this with 'lock_' always.

It does mean I removed some DCHECKs on it in places where we're not taking 
'lock_' otherwise, but I don't think that's a big deal.


http://gerrit.cloudera.org:8080/#/c/15154/2/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/15154/2/be/src/runtime/coordinator-backend-state.cc@a452
PS2, Line 452:
             :
             :
             :
             :
             :
             :
             :
             :
             :
             :
             :
             :
> looks like dead code, but are any of the TODOs still relevant?
Sort of. I basically replaced this with the large comment I added earlier in 
this function, which addresses the same basic issue but better reflects the 
actual current state of things.


http://gerrit.cloudera.org:8080/#/c/15154/2/be/src/runtime/coordinator-backend-state.cc@173
PS2, Line 173:
> i believe we generally try to notify any waiting threads after the lock has
Yes, there can potentially be multiple calls to WaitOnExec(), eg. if there are 
multiple calls to Coordinator::WaitOnExecRpcs(), as discussed in another 
comment.


http://gerrit.cloudera.org:8080/#/c/15154/2/be/src/runtime/coordinator-backend-state.cc@193
PS2, Line 193:
> nit: won't this be called twice, once in SetExecError and again when method
Done


http://gerrit.cloudera.org:8080/#/c/15154/2/be/src/runtime/coordinator-backend-state.cc@228
PS2, Line 228: if (IsEmptyBackend
> is this thread safe? this gets set without holding onto the lock, and its p
So I went ahead and fixed this by holding 'lock_' for all of the Exec() 
function, rather than just taking it right before sending the rpc.

Its nice, because since we also hold lock_ for all of Cancel(), it 
significantly simplifies reasoning about possible interleavings between those 
functions.

The downside is that we can't stop ourselves from sending an exec rpc on to 
backend if we've already started executing BackendState::Exec(), eg. if we 
realize we hit an error from a different backend while we're serializing the 
exec rpc params (which was possible in my original patch). That's not something 
we ever do now anyways. You could imagine doing it when we implement 
IMPALA-6788, but I think the performance benefits would be minimal vs. the 
additional code complexity.


http://gerrit.cloudera.org:8080/#/c/15154/2/be/src/runtime/coordinator-backend-state.cc@236
PS2, Line 236: Status get_p
> same issue as above, this sets exec_done_ without holding the lock
Done


http://gerrit.cloudera.org:8080/#/c/15154/2/be/src/runtime/coordinator-backend-state.cc@853
PS2, Line 853: value->AddMember("
> should be run after acquiring the lock, or made atomic
Done


http://gerrit.cloudera.org:8080/#/c/15154/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/15154/2/be/src/runtime/coordinator.cc@427
PS2, Line 427:   if (exec_rpcs_complete_.Load()) return;
> is the idea here that this method should only ever be executed once? whats
No, this is potentially called many times, eg. its called in UpdateFilter, 
which is called once per runtime filter per backend producing the filter. So, 
the point of this branch is to provide a fast path so that we don't have to 
acquire BackendState::lock_ and check BackendState::exec_done_ for every 
backend every time its called.

This function is also thread safe, whether or not exec_rpcs_complete_ has been 
set to true already, since exec_rpcs_complete_ is atomic and the function is 
idempotent. I added a mention of this in the class comment.



--
To view, visit http://gerrit.cloudera.org:8080/15154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I33ec96e5885af094c294cd3a76c242995263ba32
Gerrit-Change-Number: 15154
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Feb 2020 23:02:36 +0000
Gerrit-HasComments: Yes

Reply via email to