[ https://issues.apache.org/jira/browse/IMPALA-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Ho updated IMPALA-2990:
-------------------------------
    Summary: Coordinator should timeout and cancel queries with unresponsive / stuck executors  (was: Coordinator should timeout a connection for an unresponsive backend)

> Coordinator should timeout and cancel queries with unresponsive / stuck executors
> ---------------------------------------------------------------------------------
>
>                 Key: IMPALA-2990
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2990
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 2.3.0
>            Reporter: Sailesh Mukil
>            Assignee: Thomas Tauber-Marshall
>            Priority: Critical
>              Labels: hang, observability, supportability
>
> The coordinator currently waits indefinitely if it does not hear back from a
> backend. This could cause a query to hang indefinitely in case of a network
> error, etc.
> We should add logic for determining when a backend is unresponsive and kill
> the query. The logic should mostly revolve around Coordinator::Wait() and
> Coordinator::UpdateFragmentExecStatus() based on whether it receives periodic
> updates from a backend (via FragmentExecState::ReportStatusCb()).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
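
The detection logic proposed in the issue description (treat a backend as unresponsive once its periodic status reports stop arriving) could be sketched roughly as follows. This is only an illustration under stated assumptions, not Impala's actual code: the class BackendLivenessTracker, its methods, and the timeout value are all hypothetical, and the issue does not say how the real fix is structured.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of Impala): remembers when each backend last
// sent a status report and flags backends that have been silent for too long.
class BackendLivenessTracker {
 public:
  using Clock = std::chrono::steady_clock;

  explicit BackendLivenessTracker(std::chrono::milliseconds timeout)
      : timeout_(timeout) {
    assert(timeout.count() > 0);
  }

  // Call when a fragment starts on a backend and on every status report,
  // e.g. from a handler in the spirit of UpdateFragmentExecStatus().
  void RecordReport(const std::string& backend,
                    Clock::time_point now = Clock::now()) {
    std::lock_guard<std::mutex> l(lock_);
    last_report_[backend] = now;
  }

  // Call once a backend has reported completion; it is no longer monitored.
  void MarkDone(const std::string& backend) {
    std::lock_guard<std::mutex> l(lock_);
    last_report_.erase(backend);
  }

  // Returns the backends whose most recent report is older than the timeout.
  std::vector<std::string> FindUnresponsive(
      Clock::time_point now = Clock::now()) const {
    std::lock_guard<std::mutex> l(lock_);
    std::vector<std::string> stuck;
    for (const auto& entry : last_report_) {
      if (now - entry.second > timeout_) stuck.push_back(entry.first);
    }
    return stuck;
  }

 private:
  const std::chrono::milliseconds timeout_;
  mutable std::mutex lock_;
  std::unordered_map<std::string, Clock::time_point> last_report_;
};
```

Under these assumptions, a coordinator-side wait loop would call FindUnresponsive() periodically and cancel the query if the returned list is non-empty, instead of blocking indefinitely. A monotonic clock is used so that wall-clock adjustments cannot trigger or mask a timeout.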