Wenzhe Zhou has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18439 )

Change subject: IMPALA-11263: Coordinator hang when cancelling a query
......................................................................

IMPALA-11263: Coordinator hang when cancelling a query

In a rare case, the callback Coordinator::BackendState::ExecCompleteCb()
was not called for the corresponding ExecQueryFInstances RPC. This caused
the coordinator to wait indefinitely when calling
Coordinator::BackendState::Cancel() to cancel one fragment instance.

This patch adds a timeout to BackendState::WaitOnExecLocked() so that the
coordinator is not blocked indefinitely when cancelling a query.

Testing:
 - Added a test case that simulates the missing callback when a query
   fails. Verified that the coordinator would hang without the fix.
 - Passed core tests.

Change-Id: I915511afe2df3017cbbf37f6aff3c5ff7f5473be
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M tests/custom_cluster/test_rpc_timeout.py
3 files changed, 154 insertions(+), 96 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/18439/1
--
To view, visit http://gerrit.cloudera.org:8080/18439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I915511afe2df3017cbbf37f6aff3c5ff7f5473be
Gerrit-Change-Number: 18439
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
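
For readers unfamiliar with the pattern, the idea in the commit message above (bounding the wait for Exec RPC completion so that a missing callback cannot block cancellation forever) can be sketched roughly as follows. This is not the code from the patch, which is not shown in this message: the class BackendStateSketch, the WaitOnExec() signature, and the use of std::condition_variable are assumptions made for illustration only.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a per-backend state object. It is NOT Impala's
// Coordinator::BackendState; names and signatures are illustrative only.
class BackendStateSketch {
 public:
  // Invoked by the RPC layer when the Exec RPC finishes. In the failure
  // mode described above, this callback is never invoked.
  void ExecCompleteCb() {
    {
      std::lock_guard<std::mutex> l(lock_);
      exec_done_ = true;
    }
    exec_done_cv_.notify_all();
  }

  // Waits until the Exec RPC has completed or 'timeout' has elapsed.
  // Returns true if the RPC completed, false if the wait timed out.
  bool WaitOnExec(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(lock_);
    // The predicate overload handles spurious wakeups and returns the
    // final value of the predicate.
    return exec_done_cv_.wait_for(l, timeout, [this] { return exec_done_; });
  }

 private:
  std::mutex lock_;
  std::condition_variable exec_done_cv_;
  bool exec_done_ = false;
};
```

A caller on the cancellation path would check the return value and proceed with cleanup when it is false, rather than waiting again. How the real patch chooses the timeout and reacts to it is not stated in this message.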