Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12299 )
Change subject: IMPALA-2990: timeout unresponsive queries in coordinator ...................................................................... Patch Set 1: (5 comments) Cool. http://gerrit.cloudera.org:8080/#/c/12299/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12299/1//COMMIT_MSG@29 PS1, Line 29: - Write functional tests once the appropriate mechanisms are in place I think you can "kill -STOP" an impalad running a query, and it will end up looking like this to the coordinator. Of course, maybe something else will timeout. http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/runtime/coordinator.cc@213 PS1, Line 213: backend_states_init_ = true; This could probably just use backend_states_.empty() instead. I don't mind the extra variable, though. We have .empty() usages elsewhere. http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/runtime/coordinator.cc@744 PS1, Line 744: int64_t current_time = MonotonicMillis(); It's confusing, as be/src/runtime/coordinator-backend-state.h has similarly named variables that are in UnixMillis() and MonotonicMillis(). I think this is correct at the moment, which is good. http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/service/impala-server.cc@2212 PS1, Line 2212: true Do we have a style to write TerminatedByServer(...., true /* expire */) ? http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/service/impala-server.cc@2209 PS1, Line 2209: int64_t lag_time_ms = coord->GetMaxBackendStateLagMs(); : if (lag_time_ms > FLAGS_max_report_lag_s * 1000) { : cancellation_thread_pool_->Offer(CancellationWork::TerminatedByServer( : request_state->query_id(), Status("hung query"), true)); If I were a user receiving this error, I'd immediately want to know which backend failed. Let's include more information here, including the values of the flags, the max lag, and where the max lag was observed. -- To view, visit http://gerrit.cloudera.org:8080/12299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987 Gerrit-Change-Number: 12299 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Marshall <thomasmarsh...@cmu.edu> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com> Gerrit-Comment-Date: Tue, 29 Jan 2019 23:31:01 +0000 Gerrit-HasComments: Yes