Wenzhe Zhou has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18439 )


Change subject: IMPALA-11263: Coordinator hang when cancelling a query
......................................................................

IMPALA-11263: Coordinator hang when cancelling a query

In rare cases, the callback Coordinator::BackendState::ExecCompleteCb()
was not called for the corresponding ExecQueryFInstances RPC. This
caused the coordinator to wait indefinitely when calling
Coordinator::BackendState::Cancel() to cancel one fragment instance.

This patch adds a timeout to BackendState::WaitOnExecLocked() so that
the coordinator is not blocked indefinitely when cancelling a query.
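
For illustration only, the timed wait described above can be sketched with
standard C++ primitives. This is a minimal sketch under stated assumptions,
not the code in this patch: the class BackendStateSketch, its members, and
the signatures shown are hypothetical, and the real BackendState uses
Impala's own locking and condition-variable types rather than the std:: ones.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for Coordinator::BackendState (assumption, not
// Impala's real class). It models only the "exec done" handshake.
class BackendStateSketch {
 public:
  // Completion callback for the Exec RPC. In the failure mode described
  // in the commit message, this is never invoked.
  void ExecCompleteCb() {
    {
      std::lock_guard<std::mutex> l(lock_);
      exec_done_ = true;
    }
    exec_done_cv_.notify_all();
  }

  // Waits until the Exec RPC has completed or 'timeout' has elapsed.
  // The caller must already hold 'l', which must wrap lock_.
  // Returns true if the RPC completed, false if the wait timed out.
  bool WaitOnExecLocked(std::unique_lock<std::mutex>* l,
                        std::chrono::milliseconds timeout) {
    assert(l->owns_lock());
    // wait_for() with a predicate handles spurious wakeups and returns
    // the predicate's value once the timeout expires.
    return exec_done_cv_.wait_for(*l, timeout, [this] { return exec_done_; });
  }

  // Cancel path: waits a bounded time for the Exec RPC instead of
  // blocking forever. Returns whether the Exec RPC had completed.
  bool Cancel(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(lock_);
    return WaitOnExecLocked(&l, timeout);
  }

 private:
  std::mutex lock_;
  std::condition_variable exec_done_cv_;
  bool exec_done_ = false;  // Protected by lock_.
};
```

With this shape, a caller of Cancel() gets control back after at most
'timeout' even if ExecCompleteCb() never runs, and can then decide how to
treat the backend whose callback went missing.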

Testing:
 - Added a test case that simulates the missing callback when a query
   fails. Verified that the coordinator hangs without the fix.
 - Passed core tests.

Change-Id: I915511afe2df3017cbbf37f6aff3c5ff7f5473be
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M tests/custom_cluster/test_rpc_timeout.py
3 files changed, 154 insertions(+), 96 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/18439/1
--
To view, visit http://gerrit.cloudera.org:8080/18439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I915511afe2df3017cbbf37f6aff3c5ff7f5473be
Gerrit-Change-Number: 18439
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
