Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/4896 to look at the new patch set (#4). Change subject: IMPALA-4409: respect lock order in QueryExecState::CancelInternal() ...................................................................... IMPALA-4409: respect lock order in QueryExecState::CancelInternal() The code previously violated the (partially documented) lock order in ImpalaServer. An example of a possible cycle in the dependency graph is: * SetQueryInFlight() holds SessionState::lock_ and waits for 'query_expiration_lock_' * ExpireQueries() holds 'query_expiration_lock_' and waits for 'query_exec_state_map_lock_' * GetQueryExecState() holds 'query_exec_state_map_lock_' and waits for QueryExecState::lock_ * QES::Cancel() holds QueryExecState::lock_ and waits for SessionState::lock It's not clear how likely the above scenario is, but it's hard to rule it out. We have not seen this hang in the wild but have seen similar ones. Testing: Ran local stress test on 3-node minicluster with TPC-H 20 and 50% of queries being cancelled. Change-Id: I785fea0163a90d0633fb6ed77ec7c6882ab5c110 --- M be/src/runtime/coordinator.h M be/src/service/impala-server.h M be/src/service/query-exec-state.cc M be/src/service/query-exec-state.h 4 files changed, 50 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/4896/4 -- To view, visit http://gerrit.cloudera.org:8080/4896 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I785fea0163a90d0633fb6ed77ec7c6882ab5c110 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Henry Robinson <he...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>