Thomas Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12299 )

Change subject: IMPALA-2990: timeout unresponsive queries in coordinator
......................................................................

IMPALA-2990: timeout unresponsive queries in coordinator

The coordinator currently waits indefinitely if it does not receive a
status report from a backend. This could cause a query to hang
indefinitely in certain situations, for example if the backend decides
to cancel itself as a result of failed status report RPCs.

This patch adds a thread to ImpalaServer which periodically iterates
over all queries for which that server is the coordinator and cancels
any that haven't had a report from a backend for longer than a
configurable time.

It introduces two new flags:
--hung_query_check_interval_s: how often the thread wakes up to do
  the checking
--max_report_lag_s: the amount of time to wait for a report from a
  backend before cancelling the query

TODO:
- Run real cluster tests to determine appropriate default values for
  the flags and how scalable this approach is (e.g. should we use a
  thread pool instead of a single thread?)
- Write functional tests once the appropriate mechanisms are in place
  to simulate errors (IMPALA-8138)

Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987
---
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
5 files changed, 64 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/12299/1
--
To view, visit http://gerrit.cloudera.org:8080/12299

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987
Gerrit-Change-Number: 12299
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall <thomasmarsh...@cmu.edu>
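
For readers skimming the change, one pass of the check described in the commit message might look roughly like the sketch below. It is not taken from the patch: the types and function names (BackendReportState, QueryState, HasLaggingBackend, CancelHungQueries) are hypothetical, locking is omitted, and the 60-second lag in the usage note is a placeholder, since the commit message leaves the default flag values as a TODO.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch; none of these names come from the Impala code base.
using Clock = std::chrono::steady_clock;

struct BackendReportState {
  Clock::time_point last_report_time;  // when the last status report arrived
};

struct QueryState {
  std::vector<BackendReportState> backends;
  bool cancelled = false;
};

// Returns true if any backend of 'query' has gone longer than
// 'max_report_lag' without delivering a status report.
bool HasLaggingBackend(const QueryState& query, Clock::time_point now,
                       std::chrono::seconds max_report_lag) {
  for (const BackendReportState& backend : query.backends) {
    if (now - backend.last_report_time > max_report_lag) return true;
  }
  return false;
}

// One pass of the periodic check: marks every not-yet-cancelled query with a
// lagging backend as cancelled and returns the ids of those queries.
std::vector<std::string> CancelHungQueries(
    std::unordered_map<std::string, QueryState>& queries,
    Clock::time_point now, std::chrono::seconds max_report_lag) {
  assert(max_report_lag.count() > 0);
  std::vector<std::string> newly_cancelled;
  for (auto& entry : queries) {
    QueryState& query = entry.second;
    if (query.cancelled) continue;  // already handled on an earlier pass
    if (HasLaggingBackend(query, now, max_report_lag)) {
      query.cancelled = true;
      newly_cancelled.push_back(entry.first);
    }
  }
  return newly_cancelled;
}
```

In this sketch, a background thread would call CancelHungQueries(queries, Clock::now(), std::chrono::seconds(60)) once every --hung_query_check_interval_s seconds, passing the value of --max_report_lag_s in place of the placeholder 60. A monotonic clock is assumed so that wall-clock adjustments cannot make a healthy backend look stale.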