On 2013-02-18 5:12 PM, Zack Weinberg wrote:
To fix things, we need to change abstract behavior in two places. First,
when you event_del() an event while the event loop to which it belongs
is running, it does not actually get disassociated from the event loop
until the next time the loop is about to either block or exit (this is
called the "stabilization point"). It is not safe to deallocate the
event object before then. Second, we consider each event object to
_own_ its context data (if any), and therefore to be responsible for
deallocating it at the appropriate time, via an application-provided
callback (henceforth the "deallocation hook").
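To make the ownership rule concrete, here is a minimal sketch of what
application code might look like under this proposal.  The
hook-registration call, event_set_dealloc_hook(), is an invented name
used only for illustration; it is not an existing libevent function:

  #include <stdlib.h>
  #include <event2/event.h>

  struct conn_ctx {
      int id;
      /* ... other per-connection state ... */
  };

  /* Deallocation hook: run by the library when the event is finally torn
     down, at a point where no handler can still be running for it. */
  static void conn_ctx_free(struct event *ev, void *arg)
  {
      (void)ev;
      free(arg);
  }

  static void read_cb(evutil_socket_t fd, short what, void *arg)
  {
      struct conn_ctx *ctx = arg;
      (void)fd; (void)what; (void)ctx;
      /* ... handle I/O using ctx ... */
  }

  struct event *setup_conn(struct event_base *base, evutil_socket_t fd)
  {
      struct conn_ctx *ctx = calloc(1, sizeof(*ctx));
      struct event *ev = event_new(base, fd, EV_READ | EV_PERSIST,
                                   read_cb, ctx);

      /* Hypothetical call from the proposal: the event now owns ctx and
         will run conn_ctx_free for it when it is safe to do so. */
      event_set_dealloc_hook(ev, conn_ctx_free);

      event_add(ev, NULL);
      return ev;
  }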
Concretely, when you call event_free, it _may_ synchronously deallocate
the event and call your deallocation hook to free the context data, but
if the event loop is running, it will just event_del the event and
return, and the actual deallocation will be queued to occur when the
event loop reaches the stabilization point. Similarly for
bufferevent_free and all other libevent objects that are wrapped around
an event.
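Internally, the deferred path might look roughly like the simplified
model below.  All of the names and fields are invented purely for
illustration; this is not libevent's actual source:

  #include <stdlib.h>

  struct fake_event {
      void (*dealloc_hook)(void *ctx);   /* application-provided hook       */
      void *ctx;                         /* context data owned by the event */
      struct fake_event *free_next;      /* link on the pending-free list   */
  };

  struct fake_base {
      int loop_running;                  /* nonzero while the loop is active */
      struct fake_event *pending_free;   /* freed events awaiting the
                                            stabilization point             */
  };

  static void really_free(struct fake_event *ev)
  {
      if (ev->dealloc_hook)
          ev->dealloc_hook(ev->ctx);     /* free the context data first */
      free(ev);
  }

  void fake_event_free(struct fake_base *base, struct fake_event *ev)
  {
      if (!base->loop_running) {
          really_free(ev);               /* safe to deallocate synchronously */
          return;
      }
      /* Loop is running: the event_del step would happen here, then the
         actual deallocation is queued for the next stabilization point. */
      ev->free_next = base->pending_free;
      base->pending_free = ev;
  }

  /* Called at the stabilization point, just before the loop blocks or
     exits. */
  void drain_pending_free(struct fake_base *base)
  {
      while (base->pending_free) {
          struct fake_event *ev = base->pending_free;
          base->pending_free = ev->free_next;
          really_free(ev);
      }
  }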
I have realized that this is only good enough for applications in which
each thread has its own event_base. If more than one thread uses the
same event_base, then we can have this race:
thread 1           thread 2
select             select
wakeup             .
evmap: A           .
handler(A)         .
.                  wakeup
.                  evmap: A
.                  handler(A)
event_free(A)      .
return             .
deallocate A       .
select             .
.                  A->thing [CRASH]
I see two possible fixes for this race. The first is to use a counting
semaphore to figure out when a thread is the _last_ thread to be about
to block. Continuing with the above example, that would look like this:
semaphore   thread 1           thread 2
0           select             select
1           wakeup, sem++      .
1           evmap: A           .
1           handler(A)         .
2           .                  wakeup, sem++
2           .                  evmap: A
2           .                  handler(A)
2           event_free(A)      .
2           return             .
1           --sem > 0          .
1           select             .
1           .                  A->thing
1           .                  .
1           .                  return
0           .                  --sem == 0
0           .                  deallocate A
0           .                  select
I'm reasonably confident this is sound at a high level, but there might
still be raceable gaps between exiting select and bumping the
semaphore, or between dropping the semaphore and re-entering select.
Perhaps worse, it depends on there being regular idle points at which
all threads but one are blocked.  Under load, it might well be that at
least two threads are always active, in which case deallocations will
be delayed indefinitely.
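For concreteness, here is roughly what the per-thread bookkeeping could
look like, modelling the counter with a mutex-protected integer rather
than a true semaphore; every name here is invented for the sketch.  The
raceable gaps mentioned above are exactly the instructions that fall
between the blocking call and these two functions:

  #include <pthread.h>

  struct loop_state {
      pthread_mutex_t lock;     /* initialized elsewhere                  */
      int active;               /* threads currently between wakeup and
                                   their next block ("sem" in the diagram) */
      /* ... pending-free list as in the earlier sketch ...               */
  };

  /* Run immediately after returning from select/epoll. */
  void loop_enter_active(struct loop_state *st)
  {
      pthread_mutex_lock(&st->lock);
      st->active++;                      /* sem++ in the diagram */
      pthread_mutex_unlock(&st->lock);
  }

  /* Run just before blocking again.  Only the last active thread drains
     the pending deallocations, so nothing is freed while another thread
     may still be running handlers. */
  void loop_about_to_block(struct loop_state *st)
  {
      pthread_mutex_lock(&st->lock);
      if (--st->active == 0) {
          /* drain_pending_free(...) would go here */
      }
      pthread_mutex_unlock(&st->lock);
  }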
The other possibility is to give every event object a reference count
(currently, as far as I can tell, bufferevents and evbuffers are
refcounted, but events themselves aren't).  Having been event_add()ed
counts as one reference, and every thread that is currently running
handlers for an event also holds a reference.  Then we have:
A refcnt    thread 1           thread 2
1           select             select
1           wakeup             .
2           evmap: A           .
2           handler(A)         .
2           .                  wakeup
3           .                  evmap: A
3           .                  handler(A)
2           event_free(A)      .
2           return             .
1           --ref(A) > 0       .
1           select             .
1           .                  A->thing
1           .                  .
1           .                  return
0           .                  --ref(A) == 0
0           .                  deallocate A
0           .                  select
This way, deallocations cannot get starved, and there's also no need for
a pending-deallocations queue, *provided* that each thread is careful
not to drop any reference count until it is completely done running
handlers for that event. (The problem I alluded to in my original
message, involving bufferevent use-after-free in a *single*-threaded
application, appeared to be a case where the read callback got invoked
immediately after the event callback had called bufferevent_free,
probably because the kernel posted "data available" and "connection
closed" simultaneously. I never did track that down with certainty,
though.)
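In code, the discipline would look something like the sketch below
(names invented, C11 atomics used just to make the counting explicit);
the essential point is that the dispatching thread's reference is
dropped only after its handler has completely returned:

  #include <stdatomic.h>
  #include <stdlib.h>

  struct rc_event {
      atomic_int refcnt;        /* 1 while event_add()ed, plus one per
                                   thread currently running its handlers */
      void (*handler)(struct rc_event *ev, void *ctx);
      void (*dealloc_hook)(void *ctx);
      void *ctx;
  };

  static void rc_event_decref(struct rc_event *ev)
  {
      if (atomic_fetch_sub(&ev->refcnt, 1) == 1) {
          /* Last reference gone: no thread can still be touching ev. */
          if (ev->dealloc_hook)
              ev->dealloc_hook(ev->ctx);
          free(ev);
      }
  }

  /* What a dispatching thread does for one activation of ev. */
  static void dispatch_one(struct rc_event *ev)
  {
      atomic_fetch_add(&ev->refcnt, 1); /* taken when ev is pulled out of
                                           the evmap                      */
      ev->handler(ev, ev->ctx);         /* may call the event_free
                                           equivalent, which only drops
                                           the event_add reference        */
      rc_event_decref(ev);              /* dropped only after the handler
                                           has completely returned        */
  }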
My worry with this one is that we might have unavoidable races inside
evmap. I haven't really looked at that layer at all, but I'm imagining
a situation where thread 2 gets a wakeup on a fd that is about to become
stale, looks up the event object, and is about to incref it when thread
1 deallocates it.
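To pin down where that window would be: in a toy model like the one
below (names invented), the danger is the gap between the evmap lookup
and the refcount increment.  Taking the reference while still inside
the lookup's critical section would close it, but whether the real
evmap can be arranged that way is exactly the open question:

  #include <pthread.h>
  #include <stdatomic.h>

  struct map_entry {
      atomic_int refcnt;
      /* ... */
  };

  static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
  static struct map_entry *fd_to_entry[1024];   /* stand-in for the evmap */

  struct map_entry *lookup_and_ref(int fd)
  {
      pthread_mutex_lock(&map_lock);
      struct map_entry *e = fd_to_entry[fd];
      /* If the increment happened after dropping map_lock, another
         thread could drop the last reference and free e in between.
         Taking the reference under the same lock means a refcount can
         never be raised after it has already hit zero. */
      if (e)
          atomic_fetch_add(&e->refcnt, 1);
      pthread_mutex_unlock(&map_lock);
      return e;
  }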
zw