From: Michael Qiu <qiud...@huayun.com>

Currently, if the guest has workloads, the IO thread acquires the aio_context
lock before doing io_submit. This leads to a segmentation fault when doing a
block commit after a snapshot, as shown below:
[Switching to thread 2 (Thread 0x7f046c312700 (LWP 108791))]
#0  0x00005573f57930db in bdrv_mirror_top_pwritev ... at block/mirror.c:1420
1420    in block/mirror.c
(gdb) p s->job
$17 = (MirrorBlockJob *) 0x0
(gdb) p s->stop
$18 = false
(gdb)
(gdb) bt
#0  0x00005573f57930db in bdrv_mirror_top_pwritev ... at block/mirror.c:1420
#1  0x00005573f5798ceb in bdrv_driver_pwritev ... at block/io.c:1183
#2  0x00005573f579ae7a in bdrv_aligned_pwritev ... at block/io.c:1980
#3  0x00005573f579b667 in bdrv_co_pwritev_part ... at block/io.c:2137
#4  0x00005573f57886c8 in blk_do_pwritev_part ... at block/block-backend.c:1231
#5  0x00005573f578879d in blk_aio_write_entry ... at block/block-backend.c:1439
#6  0x00005573f58317cb in coroutine_trampoline ... at util/coroutine-ucontext.c:115
#7  0x00007f047414a0d0 in __start_context () at /lib64/libc.so.6
#8  0x00007f046c310e60 in  ()
#9  0x0000000000000000 in  ()

Switch to qmp:
#0  0x00007f04744dd4ed in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f04744d8de6 in _L_lock_941 () at /lib64/libpthread.so.0
#2  0x00007f04744d8cdf in pthread_mutex_lock () at /lib64/libpthread.so.0
#3  0x00005573f581de89 in qemu_mutex_lock_impl ... at util/qemu-thread-posix.c:78
#4  0x00005573f575789e in block_job_add_bdrv ... at blockjob.c:223
#5  0x00005573f5757ebd in block_job_create ... at blockjob.c:441
#6  0x00005573f5792430 in mirror_start_job ... at block/mirror.c:1604
#7  0x00005573f5794b6f in commit_active_start ... at block/mirror.c:1789

In the IO thread, when bdrv_mirror_top_pwritev() runs, s->job is NULL and the
stop field is false. This means the s object has not been initialized yet: it
is initialized by block_job_create(), but that initialization is stuck trying
to acquire the lock.

The root cause is that QEMU releases and re-acquires the AioContext lock while
holding it. During the window after the release, the IO thread takes the lock
and submits I/O through the not-yet-initialized mirror top filter, and the
crash occurs.
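
Roughly, the sequence of events can be sketched as the following simplified
timeline (call chains taken from the two backtraces above):

    QMP thread (holds ctx lock)           IO thread
    ---------------------------           ---------
    commit_active_start()
      mirror_start_job()
        block_job_create()
          block_job_add_bdrv()
            aio_context_release(ctx)
                                          aio_context_acquire(ctx)
                                          io_submit
                                            -> bdrv_mirror_top_pwritev()
                                               /* s->job == NULL,
                                                  s->stop == false: crash */
            bdrv_root_attach_child()
            aio_context_acquire(ctx)      <- blocks in __lll_lock_wait()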

Actually, in this situation job->job.aio_context is not equal to
qemu_get_aio_context(), and it is the same as bs->aio_context.
Thus, there is no need to release the lock, because bdrv_root_attach_child()
will not change the context.

This patch fixes this issue.

Signed-off-by: Michael Qiu <qiud...@huayun.com>
---
 blockjob.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index c6e20e2f..e1d41db9 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -214,12 +214,14 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
     BdrvChild *c;
 
     bdrv_ref(bs);
-    if (job->job.aio_context != qemu_get_aio_context()) {
+    if (bdrv_get_aio_context(bs) != job->job.aio_context &&
+        job->job.aio_context != qemu_get_aio_context()) {
         aio_context_release(job->job.aio_context);
     }
     c = bdrv_root_attach_child(bs, name, &child_job, job->job.aio_context,
                                perm, shared_perm, job, errp);
-    if (job->job.aio_context != qemu_get_aio_context()) {
+    if (bdrv_get_aio_context(bs) != job->job.aio_context &&
+        job->job.aio_context != qemu_get_aio_context()) {
         aio_context_acquire(job->job.aio_context);
     }
     if (c == NULL) {
-- 
2.22.0
