Andrei Sekretenko created MESOS-9808:
----------------------------------------

             Summary: libprocess can deadlock on termination (cleanup() vs use() + terminate())
                 Key: MESOS-9808
                 URL: https://issues.apache.org/jira/browse/MESOS-9808
             Project: Mesos
          Issue Type: Bug
            Reporter: Andrei Sekretenko
         Attachments: deadlock_stacks.txt, deadlock_stacks_filtered.txt

Using process::loop() together with the common libprocess pattern (a Process 
wrapper plus dispatching) is prone to deadlocking libprocess on termination if 
the code does not wait for the loop to exit before terminating.

*The deadlock itself is not directly caused by process::loop(), though.*
It occurs in the following setup with two processes (call them A and B).

Thread 1 tries to clean up process A. It locks processes_mutex and hangs here, 
waiting for process A to have no strong references:
 
[https://github.com/apache/mesos/blob/663bfa68b6ab68f4c28ed6a01ac42ac2ad23ac07/3rdparty/libprocess/src/process.cpp#L3079]

Thread 2 begins by creating a ProcessReference in 
ProcessManager::deliver(UPID&) called for process A: 
[https://github.com/apache/mesos/blob/663bfa68b6ab68f4c28ed6a01ac42ac2ad23ac07/3rdparty/libprocess/src/process.cpp#L2799]

and ends up waiting for processes_mutex in ProcessManager::terminate() for 
process B:
 
[https://github.com/apache/mesos/blob/663bfa68b6ab68f4c28ed6a01ac42ac2ad23ac07/3rdparty/libprocess/src/process.cpp#L3155]

-----------------
 In the observed case, terminate() for process B was triggered by a destructor 
of a process-wrapping object owned by a libprocess loop executing on A.

I'm attaching the stacks captured at the deadlock. The stacks of the threads 
that block one another are in deadlock_stacks_filtered.txt. Note frame #1 in 
Thread 5 (waiting for all references to expire) and frames #48 and #8 in 
Thread 19 (creating a reference and waiting for processes_mutex).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
