[ 
https://issues.apache.org/jira/browse/MESOS-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-4658:
-----------------------------------
       Assignee: Benjamin Mahler  (was: Anand Mazumdar)
    Description: 
The {{Connection}} abstraction is prone to deadlock: if the last reference 
to a {{Connection}} is destroyed within the {{ConnectionProcess}} execution 
context, the {{ConnectionProcess}} ends up waiting on itself.

Consider this example:

{code}
Option<Connection> connection = process::http::connect(...).get();

// When the ConnectionProcess completes the Future, if the copy of
// 'connection' bound into the callback is the last copy of the
// Connection, the ConnectionProcess will wait on itself!
connection.get().disconnected()
  .onAny(defer(self(), &SomeFunc, connection));

connection.get().disconnect();
connection = None();
{code}

In the above snippet, deadlock can occur as follows:

1. {{connection = None()}} executes, so the last copy of the {{Connection}} 
is the one bound into the callback registered on the disconnection Future.
2. {{ConnectionProcess::disconnect}} completes the disconnection Future and 
executes {{SomeFunc}}. The Future then clears its callbacks, which destroys 
the last copy of the {{Connection}}.
3. {{Connection::Data::~Data}} waits on the {{ConnectionProcess}} from within 
the {{ConnectionProcess}} execution context. Deadlock.
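
The sequence above can be modelled with the standard library alone. The following is a minimal sketch, not libprocess code. {{Data}}, {{inProcessContext}} and {{destroyedInProcessContext}} are hypothetical names, a plain thread stands in for the {{ConnectionProcess}} execution context, and the destructor only records where it ran instead of actually waiting.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <utility>

// True only on the thread standing in for the ConnectionProcess context.
thread_local bool inProcessContext = false;

// Hypothetical stand-in for Connection::Data.
struct Data
{
  explicit Data(bool& flag) : destroyedInContext(flag) {}

  // The real destructor would wait for the process here. Waiting from the
  // process's own context cannot finish, so this model only records it.
  ~Data() { destroyedInContext = inProcessContext; }

  bool& destroyedInContext;
};

// Returns true if the last reference was dropped in the process context.
bool destroyedInProcessContext(bool keepCopyOutside)
{
  bool result = false;
  std::shared_ptr<Data> connection = std::make_shared<Data>(result);

  std::shared_ptr<Data> outside;
  if (keepCopyOutside) {
    outside = connection;
  }

  // 'held' plays the copy bound into the onAny() callback. Moving from
  // 'connection' models 'connection = None()'.
  std::thread process([held = std::move(connection)]() mutable {
    inProcessContext = true;
    held.reset(); // The Future clears its callbacks.
  });
  process.join();

  outside.reset(); // Any remaining copy is dropped outside the context.
  return result;
}
```

With no copy kept outside, {{destroyedInProcessContext(false)}} reports that the destructor ran in the process context, which is where the real destructor would wait on itself. With {{destroyedInProcessContext(true)}}, the final copy is released by the caller and the destructor runs outside that context.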

Our existing code already contains a comment that alludes to this problem: 
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/http.cpp#L1325

{code}
  // This is a one time request which will close the connection when
  // the response is received. Since 'Connection' is reference-counted,
  // we must keep a copy around until the disconnection occurs. Note
  // that in order to avoid a deadlock (Connection destruction occurring
  // from the ConnectionProcess execution context), we use 'async'.
{code}
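
The idea in that comment, releasing the last copy from a different execution context, can be sketched as follows. This is a standard-library analogy under stated assumptions, not the code at the link above: {{Handle}}, {{inWorkerContext}} and {{releasedOutsideContext}} are hypothetical names, and {{std::async}} stands in for libprocess's {{async}}.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// True only on the thread standing in for the ConnectionProcess context.
thread_local bool inWorkerContext = false;

// Hypothetical stand-in for the reference-counted connection state.
struct Handle
{
  explicit Handle(bool& flag) : destroyedInContext(flag) {}

  // Records whether destruction happened in the worker context.
  ~Handle() { destroyedInContext = inWorkerContext; }

  bool& destroyedInContext;
};

// Returns true if the last copy was destroyed outside the worker context.
bool releasedOutsideContext()
{
  bool destroyedInContext = true;
  std::future<void> released;
  std::shared_ptr<Handle> handle =
    std::make_shared<Handle>(destroyedInContext);

  std::thread worker([&released, held = std::move(handle)]() mutable {
    inWorkerContext = true;

    // Instead of dropping the last copy here, hand it to another thread.
    released = std::async(
        std::launch::async,
        [last = std::move(held)]() mutable { last.reset(); });
  });

  worker.join();
  released.wait(); // The asynchronous task has destroyed the Handle.

  return !destroyedInContext;
}
```

The worker never waits for the asynchronous task itself. The returned future is handed back to the caller, so the destructor runs on a thread that is free to wait for the worker.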

  was:
The {{Connection}} abstraction is prone to deadlocks arising from the object 
being destroyed inside the same execution context.

Consider this example:

{code}
Option<Connection> connection = process::http::connect(...).get();
connection.get().disconnected()
  .onAny(defer(self(), &SomeFunc, connection));

connection.get().disconnect();
connection = None();
{code}

In the above snippet, if {{connection = None()}} gets executed before the 
actual dispatch to {{ConnectionProcess}} happens, you might lose the only 
existing reference to the {{Connection}} object inside 
{{ConnectionProcess::disconnect}}. This would lead to the destruction of the 
{{Connection}} object in the {{ConnectionProcess}} execution context.

We do have a snippet in our existing code that alludes to such occurrences 
happening: 
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/http.cpp#L1325

{code}
  // This is a one time request which will close the connection when
  // the response is received. Since 'Connection' is reference-counted,
  // we must keep a copy around until the disconnection occurs. Note
  // that in order to avoid a deadlock (Connection destruction occurring
  // from the ConnectionProcess execution context), we use 'async'.
{code}

AFAICT, for scenarios where we need to hold on to the {{Connection}} object for 
later, this approach does not suffice.


        Summary: process::Connection can lead to process::wait deadlock  (was: 
process::Connection can lead to deadlock around execution in the same context.)

> process::Connection can lead to process::wait deadlock
> ------------------------------------------------------
>
>                 Key: MESOS-4658
>                 URL: https://issues.apache.org/jira/browse/MESOS-4658
>             Project: Mesos
>          Issue Type: Bug
>          Components: HTTP API, libprocess
>            Reporter: Anand Mazumdar
>            Assignee: Benjamin Mahler
>              Labels: mesosphere



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
