Re: [Qemu-devel] [PATCH v2 2/2] QEMUBH: make AioContext's bh re-entrant
On Wed, Jun 19, 2013 at 11:27:25AM +0200, Paolo Bonzini wrote:
> On 19/06/2013 00:26, mdroth wrote:
> > On Tue, Jun 18, 2013 at 09:20:26PM +0200, Paolo Bonzini wrote:
> >> On 18/06/2013 17:14, mdroth wrote:
> >>> Could we possibly simplify this by introducing a recursive mutex that we
> >>> could use to protect the whole list loop and hold even during the cb?
> >>
> >> If it is possible, we should avoid recursive locks. It makes it impossible
> >> to establish a lock hierarchy. For example:
> >>
> >>> I assume we can't hold the lock during the cb currently since we might
> >>> try to reschedule, but if it's a recursive mutex would that simplify
> >>> things?
> >>
> >> If you have two callbacks in two different AioContexts, both of which do
> >> bdrv_drain_all(), you get an AB-BA deadlock.
> >
> > I think I see what you mean. That problem exists regardless of whether we
> > introduce a recursive mutex though, right?
>
> Without a recursive mutex, you only hold one lock at a time in each thread.
>
> > I guess the main issue is the
> > fact that we'd be encouraging sloppy locking practices without
> > addressing the root problem?
>
> Yeah. We're basically standing where the Linux kernel stood 10 years
> ago (let's say the 2.2 timeframe). If Linux got this far without recursive
> mutexes, we can at least try. :)

FWIW I was also looking into recursive mutexes for the block layer. What
scared me a little is that they make it tempting to stop thinking about
locks, since you know you'll be able to reacquire locks you already hold.

Especially when converting existing code, I think we need to be rigorous
about exploring every function, thinking about the locks it needs and
which child functions it calls. Otherwise we'll have code paths hidden
away somewhere that were never truly thought through.

Stefan
Re: [Qemu-devel] [PATCH v2 2/2] QEMUBH: make AioContext's bh re-entrant
On 19/06/2013 00:26, mdroth wrote:
> On Tue, Jun 18, 2013 at 09:20:26PM +0200, Paolo Bonzini wrote:
>> On 18/06/2013 17:14, mdroth wrote:
>>> Could we possibly simplify this by introducing a recursive mutex that we
>>> could use to protect the whole list loop and hold even during the cb?
>>
>> If it is possible, we should avoid recursive locks. It makes it impossible
>> to establish a lock hierarchy. For example:
>>
>>> I assume we can't hold the lock during the cb currently since we might
>>> try to reschedule, but if it's a recursive mutex would that simplify
>>> things?
>>
>> If you have two callbacks in two different AioContexts, both of which do
>> bdrv_drain_all(), you get an AB-BA deadlock.
>
> I think I see what you mean. That problem exists regardless of whether we
> introduce a recursive mutex though, right?

Without a recursive mutex, you only hold one lock at a time in each thread.

> I guess the main issue is the
> fact that we'd be encouraging sloppy locking practices without
> addressing the root problem?

Yeah. We're basically standing where the Linux kernel stood 10 years
ago (let's say the 2.2 timeframe). If Linux got this far without recursive
mutexes, we can at least try. :)

> I'm just worried what other subtle problems pop up if we instead rely
> heavily on memory barriers and inevitably forget one here or there, but
> maybe that's just me not having a good understanding of when to use them.

That's true. I hope that the docs in patch 1 help, and (much) more
thorough docs are available in the Linux kernel.

> But doesn't RCU provide higher-level interfaces for these kinds of things?
> Is it possible to hide any of this behind our list interfaces?

My atomics header file provides higher-level interfaces, but in most
cases barriers are not that much harder to use, and the docs help with
converting one style to the other. So far I've not used RCU for lists,
only for entire data structures.

Paolo
Re: [Qemu-devel] [PATCH v2 2/2] QEMUBH: make AioContext's bh re-entrant
On Tue, Jun 18, 2013 at 09:20:26PM +0200, Paolo Bonzini wrote:
> On 18/06/2013 17:14, mdroth wrote:
> > Could we possibly simplify this by introducing a recursive mutex that we
> > could use to protect the whole list loop and hold even during the cb?
>
> If it is possible, we should avoid recursive locks. It makes it impossible
> to establish a lock hierarchy. For example:
>
> > I assume we can't hold the lock during the cb currently since we might
> > try to reschedule, but if it's a recursive mutex would that simplify
> > things?
>
> If you have two callbacks in two different AioContexts, both of which do
> bdrv_drain_all(), you get an AB-BA deadlock.

I think I see what you mean. That problem exists regardless of whether we
introduce a recursive mutex though, right? I guess the main issue is the
fact that we'd be encouraging sloppy locking practices without addressing
the root problem?

I'm just worried what other subtle problems pop up if we instead rely
heavily on memory barriers and inevitably forget one here or there, but
maybe that's just me not having a good understanding of when to use them.
But doesn't RCU provide higher-level interfaces for these kinds of things?
Is it possible to hide any of this behind our list interfaces?

> Also, the barriers for qemu_bh_schedule are needed anyway. It's only
> those that order bh->next vs. ctx->first_bh that would go away.

I see, I guess it's unavoidable for some cases.

> Paolo
>
> > I've been doing something similar with IOHandlers for the QContext
> > stuff, and that's the approach I took. This patch introduces the
> > recursive mutex:
> >
> > https://github.com/mdroth/qemu/commit/c7ee0844da62283c9466fcb10ddbfadd0b8bfc53
Re: [Qemu-devel] [PATCH v2 2/2] QEMUBH: make AioContext's bh re-entrant
On 18/06/2013 17:14, mdroth wrote:
> Could we possibly simplify this by introducing a recursive mutex that we
> could use to protect the whole list loop and hold even during the cb?

If it is possible, we should avoid recursive locks. It makes it impossible
to establish a lock hierarchy. For example:

> I assume we can't hold the lock during the cb currently since we might
> try to reschedule, but if it's a recursive mutex would that simplify
> things?

If you have two callbacks in two different AioContexts, both of which do
bdrv_drain_all(), you get an AB-BA deadlock.

Also, the barriers for qemu_bh_schedule are needed anyway. It's only
those that order bh->next vs. ctx->first_bh that would go away.

Paolo

> I've been doing something similar with IOHandlers for the QContext
> stuff, and that's the approach I took. This patch introduces the
> recursive mutex:
>
> https://github.com/mdroth/qemu/commit/c7ee0844da62283c9466fcb10ddbfadd0b8bfc53
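[Editor's note: Paolo's lock-hierarchy point can be made concrete. One standard discipline that rules out AB-BA interleavings is to acquire any pair of locks in a fixed global order, e.g. by address. The sketch below uses hypothetical helper names, not QEMU code:]

```c
#include <pthread.h>
#include <stdint.h>

/* Acquire two mutexes in a fixed global order (by address), so two
 * threads that each need both locks can never deadlock AB-BA style.
 * Illustrative helpers only, not QEMU API. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_t *first  = (uintptr_t)a < (uintptr_t)b ? a : b;
    pthread_mutex_t *second = (uintptr_t)a < (uintptr_t)b ? b : a;
    pthread_mutex_lock(first);
    if (second != first) {
        pthread_mutex_lock(second);
    }
}

void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_t *first  = (uintptr_t)a < (uintptr_t)b ? a : b;
    pthread_mutex_t *second = (uintptr_t)a < (uintptr_t)b ? b : a;
    if (second != first) {
        pthread_mutex_unlock(second);
    }
    pthread_mutex_unlock(first);
}

/* Lock and unlock the same two mutexes from both argument orders;
 * both calls acquire in the same global order, so no deadlock.
 * Returns 0 on success. */
int lock_pair_demo(void)
{
    static pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t m2 = PTHREAD_MUTEX_INITIALIZER;
    lock_pair(&m1, &m2);
    unlock_pair(&m1, &m2);
    lock_pair(&m2, &m1);   /* opposite call order, same acquisition order */
    unlock_pair(&m2, &m1);
    return 0;
}
```

This is the kind of hierarchy that recursive locks make hard to enforce: once re-acquisition is allowed, nothing forces callers to think about the global order at all.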
Re: [Qemu-devel] [PATCH v2 2/2] QEMUBH: make AioContext's bh re-entrant
On Tue, Jun 18, 2013 at 10:14:38AM -0500, mdroth wrote:
> On Sun, Jun 16, 2013 at 07:21:21PM +0800, Liu Ping Fan wrote:
> > BH will be used outside big lock, so introduce lock to protect
> > between the writers, ie, bh's adders and deleter.
> > Note that the lock only affects the writers and bh's callback does
> > not take this extra lock.
> >
> > Signed-off-by: Liu Ping Fan
> > ---
> >  async.c             | 21 +++++++++++++++++++++
> >  include/block/aio.h |  3 +++
> >  2 files changed, 24 insertions(+)
> >
> > diff --git a/async.c b/async.c
> > index 90fe906..6a3269f 100644
> > --- a/async.c
> > +++ b/async.c
> > @@ -47,8 +47,12 @@ QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
> >      bh->ctx = ctx;
> >      bh->cb = cb;
> >      bh->opaque = opaque;
> > +    qemu_mutex_lock(&ctx->bh_lock);
> >      bh->next = ctx->first_bh;
> > +    /* Make sure the memebers ready before putting bh into list */
> > +    smp_wmb();
> >      ctx->first_bh = bh;
> > +    qemu_mutex_unlock(&ctx->bh_lock);
> >      return bh;
> >  }
> >
> > @@ -61,12 +65,18 @@ int aio_bh_poll(AioContext *ctx)
> >
> >      ret = 0;
> >      for (bh = ctx->first_bh; bh; bh = next) {
> > +        /* Make sure fetching bh before accessing its members */
> > +        smp_read_barrier_depends();
> >          next = bh->next;
> >          if (!bh->deleted && bh->scheduled) {
> >              bh->scheduled = 0;
> >              if (!bh->idle)
> >                  ret = 1;
> >              bh->idle = 0;
> > +            /* Paired with write barrier in bh schedule to ensure reading for
> > +             * callbacks coming after bh's scheduling.
> > +             */
> > +            smp_rmb();
> >              bh->cb(bh->opaque);
>
> Could we possibly simplify this by introducing a recursive mutex that we
> could use to protect the whole list loop and hold even during the cb?
>
> I assume we can't hold the lock during the cb currently since we might
> try to reschedule, but if it's a recursive mutex would that simplify
> things?
>
> I've been doing something similar with IOHandlers for the QContext
> stuff, and that's the approach I took. This patch introduces the
> recursive mutex:
>
> https://github.com/mdroth/qemu/commit/c7ee0844da62283c9466fcb10ddbfadd0b8bfc53

Doh, missing this fix:

diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index 0623eca..a4d8170 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -81,7 +81,7 @@ int qemu_mutex_trylock(QemuMutex *mutex)
 void qemu_mutex_unlock(QemuMutex *mutex)
 {
     assert(mutex->owner == GetCurrentThreadId());
-    if (!mutex->recursive || mutex->recursion_depth == 0) {
+    if (!mutex->recursive || --mutex->recursion_depth == 0) {
         mutex->owner = 0;
     }
     LeaveCriticalSection(&mutex->lock);

> >          }
> >      }
> > @@ -75,6 +85,7 @@ int aio_bh_poll(AioContext *ctx)
> >
> >      /* remove deleted bhs */
> >      if (!ctx->walking_bh) {
> > +        qemu_mutex_lock(&ctx->bh_lock);
> >          bhp = &ctx->first_bh;
> >          while (*bhp) {
> >              bh = *bhp;
> > @@ -85,6 +96,7 @@ int aio_bh_poll(AioContext *ctx)
> >                  bhp = &bh->next;
> >              }
> >          }
> > +        qemu_mutex_unlock(&ctx->bh_lock);
> >      }
> >
> >      return ret;
> > @@ -94,6 +106,10 @@ void qemu_bh_schedule_idle(QEMUBH *bh)
> >  {
> >      if (bh->scheduled)
> >          return;
> > +    /* Make sure any writes that are needed by the callback are done
> > +     * before the locations are read in the aio_bh_poll.
> > +     */
> > +    smp_wmb();
> >      bh->scheduled = 1;
> >      bh->idle = 1;
> >  }
> > @@ -102,6 +118,10 @@ void qemu_bh_schedule(QEMUBH *bh)
> >  {
> >      if (bh->scheduled)
> >          return;
> > +    /* Make sure any writes that are needed by the callback are done
> > +     * before the locations are read in the aio_bh_poll.
> > +     */
> > +    smp_wmb();
> >      bh->scheduled = 1;
> >      bh->idle = 0;
> >      aio_notify(bh->ctx);
> > @@ -211,6 +231,7 @@ AioContext *aio_context_new(void)
> >      ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
> >      ctx->pollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
> >      ctx->thread_pool = NULL;
> > +    qemu_mutex_init(&ctx->bh_lock);
> >      event_notifier_init(&ctx->notifier, false);
> >      aio_set_event_notifier(ctx, &ctx->notifier,
> >                             (EventNotifierHandler *)
> > diff --git a/include/block/aio.h b/include/block/aio.h
> > index 1836793..971fbef 100644
> > --- a/include/block/aio.h
> > +++ b/include/block/aio.h
> > @@ -17,6 +17,7 @@
> >  #include "qemu-common.h"
> >  #include "qemu/queue.h"
> >  #include "qemu/event_notifier.h"
> > +#include "qemu/thread.h"
> >
> >  typedef struct BlockDriverAIOCB BlockDriverAIOCB;
> >  typedef void BlockDriverCompletionFunc(void *opaque, int ret);
> >
> > @@ -53,6 +54,8 @@ typedef struct AioContext {
> >      */
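[Editor's note: the one-character fix above is easy to miss, so here is a sketch of how a pthread-based recursive mutex of this shape works. Names are illustrative only; QEMU's QemuMutex differs in detail. The unlock path must decrement the depth *before* comparing with zero, which is exactly the bug the diff fixes:]

```c
#include <pthread.h>
#include <assert.h>

/* Sketch of a recursive mutex like the one in the linked commit.
 * Hypothetical names, for illustration only. */
typedef struct {
    pthread_mutex_t lock;
    pthread_t owner;
    int owned;                  /* is `owner` valid? */
    unsigned recursion_depth;
} RecMutex;

void rec_mutex_init(RecMutex *m)
{
    pthread_mutex_init(&m->lock, NULL);
    m->owned = 0;
    m->recursion_depth = 0;
}

void rec_mutex_lock(RecMutex *m)
{
    /* Only the owning thread can observe owner == self, so this
     * unlocked check is safe for the re-entry case. */
    if (m->owned && pthread_equal(m->owner, pthread_self())) {
        m->recursion_depth++;
        return;
    }
    pthread_mutex_lock(&m->lock);
    m->owner = pthread_self();
    m->owned = 1;
    m->recursion_depth = 1;
}

void rec_mutex_unlock(RecMutex *m)
{
    assert(m->owned && pthread_equal(m->owner, pthread_self()));
    /* Decrement *before* comparing -- the bug the diff above fixes.
     * Without the pre-decrement, the first unlock releases the mutex
     * while the caller still logically holds it. */
    if (--m->recursion_depth == 0) {
        m->owned = 0;
        pthread_mutex_unlock(&m->lock);
    }
}

/* Lock twice, unlock twice; returns the depth seen after the first
 * unlock (1 when the decrement is done correctly). */
unsigned rec_mutex_demo(void)
{
    RecMutex m;
    rec_mutex_init(&m);
    rec_mutex_lock(&m);
    rec_mutex_lock(&m);          /* re-entry: depth becomes 2 */
    rec_mutex_unlock(&m);        /* depth back to 1, still held */
    unsigned depth = m.recursion_depth;
    rec_mutex_unlock(&m);        /* depth 0, mutex released */
    return depth;
}
```

With the original buggy comparison, the first unlock would have released the underlying mutex while the caller believed it still held the lock, silently breaking mutual exclusion.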
Re: [Qemu-devel] [PATCH v2 2/2] QEMUBH: make AioContext's bh re-entrant
On Sun, Jun 16, 2013 at 07:21:21PM +0800, Liu Ping Fan wrote:
> BH will be used outside big lock, so introduce lock to protect
> between the writers, ie, bh's adders and deleter.
> Note that the lock only affects the writers and bh's callback does
> not take this extra lock.
>
> Signed-off-by: Liu Ping Fan
> ---
>  async.c             | 21 +++++++++++++++++++++
>  include/block/aio.h |  3 +++
>  2 files changed, 24 insertions(+)
>
> diff --git a/async.c b/async.c
> index 90fe906..6a3269f 100644
> --- a/async.c
> +++ b/async.c
> @@ -47,8 +47,12 @@ QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
>      bh->ctx = ctx;
>      bh->cb = cb;
>      bh->opaque = opaque;
> +    qemu_mutex_lock(&ctx->bh_lock);
>      bh->next = ctx->first_bh;
> +    /* Make sure the memebers ready before putting bh into list */
> +    smp_wmb();
>      ctx->first_bh = bh;
> +    qemu_mutex_unlock(&ctx->bh_lock);
>      return bh;
>  }
>
> @@ -61,12 +65,18 @@ int aio_bh_poll(AioContext *ctx)
>
>      ret = 0;
>      for (bh = ctx->first_bh; bh; bh = next) {
> +        /* Make sure fetching bh before accessing its members */
> +        smp_read_barrier_depends();
>          next = bh->next;
>          if (!bh->deleted && bh->scheduled) {
>              bh->scheduled = 0;
>              if (!bh->idle)
>                  ret = 1;
>              bh->idle = 0;
> +            /* Paired with write barrier in bh schedule to ensure reading for
> +             * callbacks coming after bh's scheduling.
> +             */
> +            smp_rmb();
>              bh->cb(bh->opaque);

Could we possibly simplify this by introducing a recursive mutex that we
could use to protect the whole list loop and hold even during the cb?

I assume we can't hold the lock during the cb currently since we might
try to reschedule, but if it's a recursive mutex would that simplify
things?

I've been doing something similar with IOHandlers for the QContext
stuff, and that's the approach I took. This patch introduces the
recursive mutex:

https://github.com/mdroth/qemu/commit/c7ee0844da62283c9466fcb10ddbfadd0b8bfc53

>          }
>      }
> @@ -75,6 +85,7 @@ int aio_bh_poll(AioContext *ctx)
>
>      /* remove deleted bhs */
>      if (!ctx->walking_bh) {
> +        qemu_mutex_lock(&ctx->bh_lock);
>          bhp = &ctx->first_bh;
>          while (*bhp) {
>              bh = *bhp;
> @@ -85,6 +96,7 @@ int aio_bh_poll(AioContext *ctx)
>                  bhp = &bh->next;
>              }
>          }
> +        qemu_mutex_unlock(&ctx->bh_lock);
>      }
>
>      return ret;
> @@ -94,6 +106,10 @@ void qemu_bh_schedule_idle(QEMUBH *bh)
>  {
>      if (bh->scheduled)
>          return;
> +    /* Make sure any writes that are needed by the callback are done
> +     * before the locations are read in the aio_bh_poll.
> +     */
> +    smp_wmb();
>      bh->scheduled = 1;
>      bh->idle = 1;
>  }
> @@ -102,6 +118,10 @@ void qemu_bh_schedule(QEMUBH *bh)
>  {
>      if (bh->scheduled)
>          return;
> +    /* Make sure any writes that are needed by the callback are done
> +     * before the locations are read in the aio_bh_poll.
> +     */
> +    smp_wmb();
>      bh->scheduled = 1;
>      bh->idle = 0;
>      aio_notify(bh->ctx);
> @@ -211,6 +231,7 @@ AioContext *aio_context_new(void)
>      ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
>      ctx->pollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
>      ctx->thread_pool = NULL;
> +    qemu_mutex_init(&ctx->bh_lock);
>      event_notifier_init(&ctx->notifier, false);
>      aio_set_event_notifier(ctx, &ctx->notifier,
>                             (EventNotifierHandler *)
> diff --git a/include/block/aio.h b/include/block/aio.h
> index 1836793..971fbef 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -17,6 +17,7 @@
>  #include "qemu-common.h"
>  #include "qemu/queue.h"
>  #include "qemu/event_notifier.h"
> +#include "qemu/thread.h"
>
>  typedef struct BlockDriverAIOCB BlockDriverAIOCB;
>  typedef void BlockDriverCompletionFunc(void *opaque, int ret);
>
> @@ -53,6 +54,8 @@ typedef struct AioContext {
>      */
>      int walking_handlers;
>
> +    /* lock to protect between bh's adders and deleter */
> +    QemuMutex bh_lock;
>      /* Anchor of the list of Bottom Halves belonging to the context */
>      struct QEMUBH *first_bh;
>
> --
> 1.8.1.4
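[Editor's note: for readers more familiar with C11 atomics than Linux-style barriers, the publish/traverse pattern in the patch can be sketched as follows. This is an illustration, not QEMU code: a `memory_order_release` store stands in for smp_wmb() on the publishing side, and an acquire load stands in for smp_read_barrier_depends() on the reading side. The real patch additionally serializes writers with ctx->bh_lock, which is omitted here:]

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Sketch of aio_bh_new/aio_bh_poll's lock-free publication using C11
 * atomics in place of explicit barriers.  Names are simplified. */
typedef struct BH {
    struct BH *next;
    int scheduled;
} BH;

typedef struct {
    _Atomic(BH *) first_bh;
} Ctx;

BH *bh_new(Ctx *ctx)
{
    BH *bh = calloc(1, sizeof(*bh));
    bh->next = atomic_load_explicit(&ctx->first_bh, memory_order_relaxed);
    /* Release ordering plays the role of smp_wmb(): bh's fields are
     * fully initialized before the pointer becomes visible to readers. */
    atomic_store_explicit(&ctx->first_bh, bh, memory_order_release);
    return bh;
}

int bh_poll(Ctx *ctx)
{
    int ret = 0;
    /* The acquire load pairs with the release store above, standing in
     * for smp_read_barrier_depends() in the patch. */
    for (BH *bh = atomic_load_explicit(&ctx->first_bh, memory_order_acquire);
         bh != NULL; bh = bh->next) {
        if (bh->scheduled) {
            bh->scheduled = 0;
            ret++;
        }
    }
    return ret;
}

/* Single-threaded demo: publish two handlers, schedule one, poll.
 * Returns the number of scheduled handlers the poll observed. */
int bh_demo(void)
{
    Ctx ctx;
    atomic_init(&ctx.first_bh, NULL);
    BH *a = bh_new(&ctx);
    bh_new(&ctx);          /* second handler, never scheduled */
    a->scheduled = 1;
    return bh_poll(&ctx);
}
```

The simplification cuts corners the real code cannot: the `next` pointers of interior nodes are plain loads here, and nothing reschedules or frees handlers; the point is only the release/acquire pairing that replaces the two explicit barriers.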
Re: [Qemu-devel] [PATCH v2 2/2] QEMUBH: make AioContext's bh re-entrant
On Tue, Jun 18, 2013 at 10:19:40AM +0800, liu ping fan wrote:
> On Mon, Jun 17, 2013 at 11:28 PM, Stefan Hajnoczi wrote:
> > On Sun, Jun 16, 2013 at 07:21:21PM +0800, Liu Ping Fan wrote:
> >
> > Why lock bh_lock before assigning bh->next? Could you lock the mutex
> > here and then drop the smp_wmb() since the pthread function already does
> > that?
>
> As Paolo explained, it would open a race window for the writers.

Right, thanks!
Re: [Qemu-devel] [PATCH v2 2/2] QEMUBH: make AioContext's bh re-entrant
On Mon, Jun 17, 2013 at 11:28 PM, Stefan Hajnoczi wrote:
> On Sun, Jun 16, 2013 at 07:21:21PM +0800, Liu Ping Fan wrote:
>> @@ -47,8 +47,12 @@ QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
>>      bh->ctx = ctx;
>>      bh->cb = cb;
>>      bh->opaque = opaque;
>> +    qemu_mutex_lock(&ctx->bh_lock);
>>      bh->next = ctx->first_bh;
>> +    /* Make sure the memebers ready before putting bh into list */
>
> s/memebers/members/

Will fix, thanks.

>> +    smp_wmb();
>
> Why lock bh_lock before assigning bh->next? Could you lock the mutex
> here and then drop the smp_wmb() since the pthread function already does
> that?
>
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11

As Paolo explained, it would open a race window for the writers.

Thanks and regards,
Pingfan
Re: [Qemu-devel] [PATCH v2 2/2] QEMUBH: make AioContext's bh re-entrant
On 17/06/2013 17:28, Stefan Hajnoczi wrote:
>> +    qemu_mutex_lock(&ctx->bh_lock);
>>      bh->next = ctx->first_bh;
>> +    /* Make sure the memebers ready before putting bh into list */
>
> s/memebers/members/
>
>> +    smp_wmb();
>
> Why lock bh_lock before assigning bh->next? Could you lock the mutex
> here and then drop the smp_wmb() since the pthread function already does
> that?
>
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11

Not sure I understand, ctx->first_bh is read here and that's what the
lock protects:

    thread 1                         thread 2
    ---------------------------------------------------------
    bh->next = ctx->first_bh;
                                     bh->next = ctx->first_bh;
                                     lock
                                     ctx->first_bh = bh;
                                     unlock
    lock
    ctx->first_bh = bh;
    unlock

and thread 2's bottom half is gone. There is also a similar race that
leaves a dangling pointer if aio_bh_new races against aio_bh_poll.

Paolo
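[Editor's note: the interleaving above loses thread 2's bottom half because the read of first_bh and the store back are not atomic as a pair. The sketch below (hypothetical names, not QEMU code) shows the fixed pattern, with both the read and the write of the list head inside one critical section; with two concurrent pushers, no insertion can be lost:]

```c
#include <pthread.h>
#include <stdlib.h>

/* Miniature of aio_bh_new's insertion under ctx->bh_lock: the read of
 * the list head and the publication of the new head happen inside one
 * critical section, closing the window shown in the diagram above. */
typedef struct Node {
    struct Node *next;
} Node;

typedef struct {
    pthread_mutex_t lock;
    Node *head;
} List;

void list_push(List *l, Node *n)
{
    pthread_mutex_lock(&l->lock);
    n->next = l->head;          /* the read of head is under the lock... */
    l->head = n;                /* ...and so is the store back */
    pthread_mutex_unlock(&l->lock);
}

static void *pusher(void *arg)
{
    List *l = arg;
    for (int i = 0; i < 10000; i++) {
        list_push(l, calloc(1, sizeof(Node)));
    }
    return NULL;
}

/* Two threads push 10000 nodes each; returns how many nodes are
 * reachable afterwards (20000 with the lock held -- none lost). */
long list_push_demo(void)
{
    List l = { PTHREAD_MUTEX_INITIALIZER, NULL };
    pthread_t a, b;
    pthread_create(&a, NULL, pusher, &l);
    pthread_create(&b, NULL, pusher, &l);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    long n = 0;
    for (Node *p = l.head; p; p = p->next) {
        n++;
    }
    return n;
}
```

Moving the `n->next = l->head` read outside the critical section reintroduces exactly the lost-update interleaving in Paolo's diagram: both threads snapshot the same head, and whichever publishes second discards the other's node.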
Re: [Qemu-devel] [PATCH v2 2/2] QEMUBH: make AioContext's bh re-entrant
On Sun, Jun 16, 2013 at 07:21:21PM +0800, Liu Ping Fan wrote:
> @@ -47,8 +47,12 @@ QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
>      bh->ctx = ctx;
>      bh->cb = cb;
>      bh->opaque = opaque;
> +    qemu_mutex_lock(&ctx->bh_lock);
>      bh->next = ctx->first_bh;
> +    /* Make sure the memebers ready before putting bh into list */

s/memebers/members/

> +    smp_wmb();

Why lock bh_lock before assigning bh->next? Could you lock the mutex
here and then drop the smp_wmb() since the pthread function already does
that?

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11