Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12299 )
Change subject: IMPALA-2990: timeout unresponsive queries in coordinator ...................................................................... Patch Set 7: (5 comments) http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc@215 PS4, Line 215: DEFINE_int32(status_report_interval_ms, 5000, > I think you're saying to set this to 1.1 * GetMaxReportRetryMs()? right, I think I meant 0.1 * GetMaxReportRetryMs(). ie we need to check much more frequently than the timeout, or else we could overshoot it by a factor of 2 http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc@398 PS4, Line 398: > Of course, even in this configuration, backends will report their final sta k. We could deprecate it in a separate patch I suppose. Fine with this as is. http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc@2288 PS4, Line 2288: return Status::OK(); : } : : void ImpalaServer::ExpireQuery(ClientRequestState* crs, const Status& status) { : DCHECK(!status.ok()); : cancellation_thread_pool_->Offer( > Yes, all of these flags are documented and settable. k, would be interested what Michael thinks about that idea http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/service/impala-server.cc@218 PS7, Line 218: DEFINE_int32(status_report_max_retries, 3, Is our timeout too aggressive here? If one of the backends goes into some kind of 15s process-wide pause (eg as we've seen in the past with tcmalloc releasing memory in https://issues.apache.org/jira/browse/IMPALA-2800) we'll end up cancelling. Am afraid that being too aggressive will end up cancelling queries incorrectly. Curious for Michael's thoughts http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/service/impala-server.cc@2258 PS7, Line 2258: max_lag_ms * 1.1 per comment elsewhere: don't we need to wake up more often than the timeout? otherwise we can overshoot by a factor of 2 here. -- To view, visit http://gerrit.cloudera.org:8080/12299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987 Gerrit-Change-Number: 12299 Gerrit-PatchSet: 7 Gerrit-Owner: Thomas Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com> Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Tue, 26 Mar 2019 23:16:03 +0000 Gerrit-HasComments: Yes