[ https://issues.apache.org/jira/browse/MESOS-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benjamin Mahler updated MESOS-4658:
-----------------------------------

    Assignee: Benjamin Mahler  (was: Anand Mazumdar)
 Description: 
The {{Connection}} abstraction is prone to deadlocks that arise when the last reference to a {{Connection}} is destroyed in the {{ConnectionProcess}} execution context, at which point {{ConnectionProcess}} waits on itself (deadlock).

Consider this example:

{code}
Option<Connection> connection = process::http::connect(...).get();

// When the ConnectionProcess completes the Future, if 'connection'
// is the last copy of the Connection it will wait on itself!
connection->disconnected()
  .onAny(defer(self(), &SomeFunc, connection));

connection->disconnect();
connection = None();
{code}

In the above snippet, deadlock can occur as follows:

1. {{connection = None()}} executes, so the last copy of the {{Connection}} remains within the disconnected Future.
2. {{ConnectionProcess::disconnect}} completes the disconnection Future and executes SomeFunc. The Future then clears its callbacks, which destroys the last copy of the {{Connection}}.
3. {{Connection::~Data}} waits on the {{ConnectionProcess}} from within the {{ConnectionProcess}} execution context. Deadlock.

We do have a snippet in our existing code that alludes to such occurrences happening:

https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/http.cpp#L1325

{code}
// This is a one time request which will close the connection when
// the response is received. Since 'Connection' is reference-counted,
// we must keep a copy around until the disconnection occurs. Note
// that in order to avoid a deadlock (Connection destruction occurring
// from the ConnectionProcess execution context), we use 'async'.
{code}

  was:
The {{Connection}} abstraction is prone to deadlocks arising from the object being destroyed inside the same execution context.
Consider this example:

{code}
Option<Connection> connection = process::http::connect(...).get();

connection->disconnected()
  .onAny(defer(self(), &SomeFunc, connection));

connection->disconnect();
connection = None();
{code}

In the above snippet, if {{connection = None()}} executes before the dispatch to {{ConnectionProcess}} actually happens, the only remaining reference to the {{Connection}} object may be released inside {{ConnectionProcess::disconnect}}. This would lead to the destruction of the {{Connection}} object in the {{ConnectionProcess}} execution context.

We do have a snippet in our existing code that alludes to such occurrences happening:

https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/http.cpp#L1325

{code}
// This is a one time request which will close the connection when
// the response is received. Since 'Connection' is reference-counted,
// we must keep a copy around until the disconnection occurs. Note
// that in order to avoid a deadlock (Connection destruction occurring
// from the ConnectionProcess execution context), we use 'async'.
{code}

AFAICT, for scenarios where we need to hold on to the {{Connection}} object for later, this approach does not suffice.

     Summary: process::Connection can lead to process::wait deadlock  (was: process::Connection can lead to deadlock around execution in the same context.)

> process::Connection can lead to process::wait deadlock
> ------------------------------------------------------
>
>                 Key: MESOS-4658
>                 URL: https://issues.apache.org/jira/browse/MESOS-4658
>             Project: Mesos
>          Issue Type: Bug
>          Components: HTTP API, libprocess
>            Reporter: Anand Mazumdar
>            Assignee: Benjamin Mahler
>              Labels: mesosphere
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
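As a rough illustration of the self-wait described in this issue, here is a standalone C++14 sketch that does not use libprocess at all. {{Worker}} (a stand-in for {{ConnectionProcess}}), {{ConnectionData}} (a stand-in for {{Connection::Data}}), {{releaseOnWorker}} and {{releaseOffWorker}} are hypothetical names chosen for this example. Instead of hanging, {{Worker::wait()}} detects and reports the self-wait, and the workaround simply moves the last copy to a helper thread, which only approximates the 'async' approach mentioned in the http.cpp comment.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in for ConnectionProcess: one thread that runs
// dispatched callbacks in order. That thread is its execution context.
class Worker {
 public:
  Worker() : thread_([this] { loop(); }) { id_ = thread_.get_id(); }

  void dispatch(std::function<void()> f) {
    { std::lock_guard<std::mutex> lock(mutex_); queue_.push(std::move(f)); }
    cv_.notify_one();
  }

  // Ask the loop to exit once the queue is drained.
  void terminate() {
    { std::lock_guard<std::mutex> lock(mutex_); done_ = true; }
    cv_.notify_one();
  }

  // Analogue of process::wait(): join the worker thread. From the worker
  // thread itself that join could never return, so report it instead.
  bool wait() {
    if (std::this_thread::get_id() == id_) return false;  // Self-wait.
    thread_.join();
    return true;
  }

 private:
  void loop() {
    for (;;) {
      std::function<void()> f;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
        if (queue_.empty()) return;  // Terminated and drained.
        f = std::move(queue_.front());
        queue_.pop();
      }
      f();  // Runs (and is later destroyed) on the worker thread.
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool done_ = false;
  std::thread::id id_;
  std::thread thread_;  // Declared last: loop() uses the members above.
};

// Hypothetical analogue of Connection::Data: destroying the last copy
// terminates the worker and then waits on it.
struct ConnectionData {
  ConnectionData(std::shared_ptr<Worker> w, std::atomic<bool>* flag)
      : worker(std::move(w)), selfWait(flag) {}
  ~ConnectionData() {
    worker->terminate();
    if (!worker->wait()) *selfWait = true;  // Would have deadlocked.
  }
  std::shared_ptr<Worker> worker;
  std::atomic<bool>* selfWait;
};

// The bug: the only copy lives in a dispatched callback (like the
// disconnected() callback) and is released on the worker thread.
bool releaseOnWorker() {
  auto worker = std::make_shared<Worker>();
  std::atomic<bool> selfWait{false};
  auto conn = std::make_shared<ConnectionData>(worker, &selfWait);
  worker->dispatch([c = std::move(conn)]() mutable { c.reset(); });
  worker->wait();  // Joining from this thread is fine.
  return selfWait.load();
}

// The workaround: the callback hands the last copy to another thread, so
// ~ConnectionData (and its wait) runs outside the worker's context.
bool releaseOffWorker() {
  auto worker = std::make_shared<Worker>();
  std::atomic<bool> selfWait{false};
  auto conn = std::make_shared<ConnectionData>(worker, &selfWait);
  auto released = std::make_shared<std::promise<void>>();
  std::future<void> done = released->get_future();
  worker->dispatch([c = std::move(conn), released]() mutable {
    std::thread([c = std::move(c), released]() mutable {
      c.reset();  // Joins the worker from the helper thread.
      released->set_value();
    }).detach();
  });
  done.wait();
  return selfWait.load();
}
```

With this sketch, {{releaseOnWorker()}} should return true (the destructor ran on the worker thread and tried to wait on itself), while {{releaseOffWorker()}} should return false. A plain {{std::async}} would not be a safe substitute for the detached helper: the future returned by {{std::async(std::launch::async, ...)}} blocks in its destructor, so the worker thread would end up waiting on a helper that is itself waiting on the worker.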