Re: [Qemu-devel] [PATCH] thread-pool: fix deadlock when callbacks depend on each other

2014-06-02 Thread Marcin Gibuła

I'll test it tomorrow. I assume you want to avoid calling
event_notifier_set() until the function is re-entered via aio_poll()?


Yes.  But actually, I need to check if it's possible to fix
bdrv_drain_all.  If you're in coroutine context, you can defer the
draining to a safe point using a bottom half.  If you're not in
coroutine context, perhaps bdrv_drain_all has to be made illegal.  Which
means a bunch of code auditing...


For what it's worth, your solution also works fine; I couldn't recreate the
hang with it. An updated patch proposal was posted earlier today.


--
mg



Re: [Qemu-devel] [PATCH] thread-pool: fix deadlock when callbacks depend on each other

2014-06-02 Thread Paolo Bonzini

On 01/06/2014 21:02, Marcin Gibuła wrote:

Good catch!  The main problem with the patch is that you need to use
atomic_inc/atomic_dec to increment and decrement
pool->pending_completions.


Ok.


Secondarily, event_notifier_set is pretty heavy-weight; does it work if
you wrap the loop like this?

restart:
    QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
        ...
    }
    if (pool->pending_completions) {
        goto restart;
    }
    event_notifier_test_and_clear(notifier);
    if (pool->pending_completions) {
        event_notifier_set(notifier);
        goto restart;
    }


I'll test it tomorrow. I assume you want to avoid calling
event_notifier_set() until the function is re-entered via aio_poll()?


Yes.  But actually, I need to check if it's possible to fix 
bdrv_drain_all.  If you're in coroutine context, you can defer the 
draining to a safe point using a bottom half.  If you're not in 
coroutine context, perhaps bdrv_drain_all has to be made illegal.  Which 
means a bunch of code auditing...
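
Something along these lines, maybe -- completely untested, and the
DrainAllData / drain_all_bh_cb / bdrv_co_drain_all names are just made up
for the sketch (aio_bh_new, qemu_bh_schedule and friends are the existing
APIs):

/* Sketch: defer bdrv_drain_all() from coroutine context to a bottom half,
 * then yield until the bottom half has finished the drain. */
typedef struct {
    Coroutine *co;      /* coroutine waiting for the drain to finish */
    QEMUBH *bh;
} DrainAllData;

static void drain_all_bh_cb(void *opaque)
{
    DrainAllData *data = opaque;

    bdrv_drain_all();                      /* runs outside any completion */
    qemu_bh_delete(data->bh);
    qemu_coroutine_enter(data->co, NULL);  /* wake the waiting coroutine */
}

static void coroutine_fn bdrv_co_drain_all(AioContext *ctx)
{
    DrainAllData data = { .co = qemu_coroutine_self() };

    data.bh = aio_bh_new(ctx, drain_all_bh_cb, &data);
    qemu_bh_schedule(data.bh);
    qemu_coroutine_yield();                /* resumed by drain_all_bh_cb() */
}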


Paolo


Finally, the same bug is also in block/linux-aio.c and
block/win32-aio.c.


I can try with linux-aio, but my knowledge of the Windows API is zero...






Re: [Qemu-devel] [PATCH] thread-pool: fix deadlock when callbacks depend on each other

2014-06-01 Thread Marcin Gibuła

Good catch!  The main problem with the patch is that you need to use
atomic_inc/atomic_dec to increment and decrement pool->pending_completions.


Ok.


Secondarily, event_notifier_set is pretty heavy-weight; does it work if
you wrap the loop like this?

restart:
    QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
        ...
    }
    if (pool->pending_completions) {
        goto restart;
    }
    event_notifier_test_and_clear(notifier);
    if (pool->pending_completions) {
        event_notifier_set(notifier);
        goto restart;
    }


I'll test it tomorrow. I assume you want to avoid calling
event_notifier_set() until the function is re-entered via aio_poll()?


> Finally, the same bug is also in block/linux-aio.c and
> block/win32-aio.c.

I can try with linux-aio, but my knowledge of the Windows API is zero...

--
mg



Re: [Qemu-devel] [PATCH] thread-pool: fix deadlock when callbacks depend on each other

2014-06-01 Thread Paolo Bonzini

On 31/05/2014 20:29, Marcin Gibuła wrote:


@@ -185,6 +191,14 @@ restart:
         }
         if (elem->state == THREAD_DONE && elem->common.cb) {
             QLIST_REMOVE(elem, all);
+            /* If more completed requests are waiting, notifier needs
+             * to be rearmed so callback can progress with aio_poll().
+             */
+            pool->pending_completions--;
+            if (pool->pending_completions) {
+                event_notifier_set(notifier);
+            }
+
             /* Read state before ret.  */
             smp_rmb();
             elem->common.cb(elem->common.opaque, elem->ret);


Good catch!  The main problem with the patch is that you need to use 
atomic_inc/atomic_dec to increment and decrement pool->pending_completions.
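
Untested, but the intent would simply be (atomic_inc/atomic_dec being the
helpers from qemu/atomic.h):

    #include "qemu/atomic.h"

    /* worker thread, instead of pool->pending_completions++ */
    if (req->common.cb) {
        atomic_inc(&pool->pending_completions);
    }

    /* completion path, instead of pool->pending_completions-- */
    atomic_dec(&pool->pending_completions);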


Secondarily, event_notifier_set is pretty heavy-weight; does it work if
you wrap the loop like this?


restart:
    QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
        ...
    }
    if (pool->pending_completions) {
        goto restart;
    }
    event_notifier_test_and_clear(notifier);
    if (pool->pending_completions) {
        event_notifier_set(notifier);
        goto restart;
    }
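
The second check is the important one: a completion can land in the small
window just before the clear, and without re-arming, a callback that then
nests aio_poll() would wait forever:

    event_notifier_test_and_clear(notifier);
    if (pool->pending_completions) {
        /* A completion raced with the clear above: re-arm so that a
         * nested aio_poll() from a callback still has something to
         * wait on, then retry the loop. */
        event_notifier_set(notifier);
        goto restart;
    }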

Finally, the same bug is also in block/linux-aio.c and block/win32-aio.c.

Paolo



[Qemu-devel] [PATCH] thread-pool: fix deadlock when callbacks depend on each other

2014-05-31 Thread Marcin Gibuła
When two coroutines submit I/O and the first coroutine depends on the second
one completing (by calling bdrv_drain_all), a deadlock may occur.


This happens because both requests may have completed before the thread pool
notifier was called. Then, when the notifier runs and the first coroutine
calls aio_poll() to make progress, it hangs forever, because the notifier's
descriptor has already been cleared.
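
As a standalone illustration of the underlying behaviour (plain Linux
eventfd, not QEMU code): one read consumes every pending signal at once,
so a second, nested wait never returns.

    /* gcc -o eventfd-demo eventfd-demo.c && ./eventfd-demo */
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/eventfd.h>

    int main(void)
    {
        int fd = eventfd(0, 0);        /* non-semaphore mode, counter = 0 */
        uint64_t one = 1, val;

        write(fd, &one, sizeof(one));  /* request A completes */
        write(fd, &one, sizeof(one));  /* request B completes */

        read(fd, &val, sizeof(val));   /* a single clear consumes both */
        printf("first read consumed %llu signals\n",
               (unsigned long long)val);

        /* a second blocking read here would never return -- this is the
         * nested aio_poll() that hangs */
        return 0;
    }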


This patch fixes the problem by rearming the thread pool notifier when more
than one completed request is on the list.


Without this patch, I could reproduce the bug with snapshot-commit in about
1 out of 10 tries. With this patch, I could not reproduce it any more.


Signed-off-by: Marcin Gibula 
---

--- thread-pool.c   2014-04-17 15:44:45.0 +0200
+++ thread-pool.c   2014-05-31 20:20:26.083011514 +0200
@@ -76,6 +76,8 @@ struct ThreadPool {
     int new_threads;     /* backlog of threads we need to create */
     int pending_threads; /* threads created but not running yet */
     int pending_cancellations; /* whether we need a cond_broadcast */
+    int pending_completions; /* whether we need to rearm notifier when
+                                executing callback */
     bool stopping;
 };

@@ -110,6 +112,10 @@ static void *worker_thread(void *opaque)
         ret = req->func(req->arg);

         req->ret = ret;
+        if (req->common.cb) {
+            pool->pending_completions++;
+        }
+
         /* Write ret before state.  */
         smp_wmb();
         req->state = THREAD_DONE;
@@ -185,6 +191,14 @@ restart:
         }
         if (elem->state == THREAD_DONE && elem->common.cb) {
             QLIST_REMOVE(elem, all);
+            /* If more completed requests are waiting, notifier needs
+             * to be rearmed so callback can progress with aio_poll().
+             */
+            pool->pending_completions--;
+            if (pool->pending_completions) {
+                event_notifier_set(notifier);
+            }
+
             /* Read state before ret.  */
             smp_rmb();
             elem->common.cb(elem->common.opaque, elem->ret);