On Fri, Apr 7, 2017 at 7:33 PM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> On Fri, Apr 07, 2017 at 09:30:33AM +0800, 858585 jemmy wrote:
>> On Thu, Apr 6, 2017 at 10:02 PM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
>> > On Wed, Apr 05, 2017 at 05:27:58PM +0800, jemmy858...@gmail.com wrote:
>> >> From: Lidong Chen <lidongc...@tencent.com>
>> >>
>> >> When migrating at high speed, mig_save_device_bulk invokes
>> >> bdrv_is_allocated too frequently, causing the VNC session to respond
>> >> slowly. This patch limits the time spent in bdrv_is_allocated.
>> >
>> > bdrv_is_allocated() is supposed to yield back to the event loop if it
>> > needs to block.  If your VNC session is experiencing jitter then it's
>> > probably because a system call in the bdrv_is_allocated() code path is
>> > synchronous when it should be asynchronous.
>> >
>> > You could try to identify the system call using strace -f -T.  In the
>> > output you'll see the duration of each system call.  I guess there is a
>> > file I/O system call that is taking noticeable amounts of time.
>>
>> Yes, I found where bdrv_is_allocated needs to block.
>>
>> The main cause is the qemu_co_mutex_lock invoked by
>> qcow2_co_get_block_status:
>>
>>     qemu_co_mutex_lock(&s->lock);
>>     ret = qcow2_get_cluster_offset(bs, sector_num << 9, &bytes,
>>                                    &cluster_offset);
>>     qemu_co_mutex_unlock(&s->lock);
>>
>> The other cause is l2_load, invoked by qcow2_get_cluster_offset:
>>
>>     /* load the l2 table in memory */
>>     ret = l2_load(bs, l2_offset, &l2_table);
>>     if (ret < 0) {
>>         return ret;
>>     }
>
> The migration thread is holding the QEMU global mutex, the AioContext,
> and the qcow2 s->lock while the L2 table is read from disk.
>
> The QEMU global mutex is needed for block layer operations that touch
> the global drives list.  bdrv_is_allocated() can be called without the
> global mutex.
>
> The VNC server's file descriptor is not in the BDS AioContext.
> Therefore it can be processed while the migration thread holds the
> AioContext and qcow2 s->lock.
>
> Does the following patch solve the problem?
>
> diff --git a/migration/block.c b/migration/block.c
> index 7734ff7..072fc20 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -276,6 +276,7 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>      if (bmds->shared_base) {
>          qemu_mutex_lock_iothread();
>          aio_context_acquire(blk_get_aio_context(bb));
> +        qemu_mutex_unlock_iothread();
>          /* Skip unallocated sectors; intentionally treats failure as
>           * an allocated sector */
>          while (cur_sector < total_sectors &&
> @@ -283,6 +284,7 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>                 MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) {
>              cur_sector += nr_sectors;
>          }
> +        qemu_mutex_lock_iothread();
>          aio_context_release(blk_get_aio_context(bb));
>          qemu_mutex_unlock_iothread();
>      }
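For reference, here is the locking sequence that mig_save_device_bulk ends up with after applying the patch above (reconstructed from the diff, abbreviated; the comments are mine). Note that the iothread mutex is re-taken while the AioContext is still held, and that ordering is exactly where things go wrong below:

    if (bmds->shared_base) {
        qemu_mutex_lock_iothread();
        aio_context_acquire(blk_get_aio_context(bb));
        qemu_mutex_unlock_iothread();       /* drop the BQL during the scan */
        while (cur_sector < total_sectors &&
               !bdrv_is_allocated(blk_bs(bb), cur_sector,
                                  MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) {
            cur_sector += nr_sectors;
        }
        qemu_mutex_lock_iothread();         /* re-taken while still holding the AioContext */
        aio_context_release(blk_get_aio_context(bb));
        qemu_mutex_unlock_iothread();
    }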
This patch doesn't work; QEMU locks up.

The stack of the main thread:

(gdb) bt
#0  0x00007f4256c89264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f4256c84523 in _L_lock_892 () from /lib64/libpthread.so.0
#2  0x00007f4256c84407 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000949f47 in qemu_mutex_lock (mutex=0x1b04a60) at util/qemu-thread-posix.c:60
#4  0x00000000009424cf in aio_context_acquire (ctx=0x1b04a00) at util/async.c:484
#5  0x0000000000942b86 in thread_pool_completion_bh (opaque=0x1b25a10) at util/thread-pool.c:168
#6  0x0000000000941610 in aio_bh_call (bh=0x1b1d570) at util/async.c:90
#7  0x00000000009416bb in aio_bh_poll (ctx=0x1b04a00) at util/async.c:118
#8  0x0000000000946baa in aio_dispatch (ctx=0x1b04a00) at util/aio-posix.c:429
#9  0x0000000000941b30 in aio_ctx_dispatch (source=0x1b04a00, callback=0, user_data=0x0) at util/async.c:261
#10 0x00007f4257670f0e in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#11 0x0000000000945282 in glib_pollfds_poll () at util/main-loop.c:213
#12 0x00000000009453a3 in os_host_main_loop_wait (timeout=754229747) at util/main-loop.c:261
#13 0x000000000094546e in main_loop_wait (nonblocking=0) at util/main-loop.c:517
#14 0x00000000005c7664 in main_loop () at vl.c:1898
#15 0x00000000005ceb27 in main (argc=49, argv=0x7fff7907ab28, envp=0x7fff7907acb8) at vl.c:4709

The stack of the migration thread:

(gdb) bt
#0  0x00007f4256c89264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f4256c84508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f4256c843d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000949f47 in qemu_mutex_lock (mutex=0xfc5200) at util/qemu-thread-posix.c:60
#4  0x0000000000459e08 in qemu_mutex_lock_iothread () at /root/qemu/cpus.c:1516
#5  0x00000000007d2e04 in mig_save_device_bulk (f=0x2489720, bmds=0x7f42500008f0) at migration/block.c:287
#6  0x00000000007d3579 in blk_mig_save_bulked_block (f=0x2489720) at migration/block.c:484
#7  0x00000000007d3ebf in block_save_iterate (f=0x2489720, opaque=0xfd3e20) at migration/block.c:773
#8  0x000000000049e840 in qemu_savevm_state_iterate (f=0x2489720, postcopy=false) at /root/qemu/migration/savevm.c:1044
#9  0x00000000007c635d in migration_thread (opaque=0xf7d160) at migration/migration.c:1976
#10 0x00007f4256c829d1 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f42569cf8fd in clone () from /lib64/libc.so.6

The stack of the vcpu thread:
(gdb) bt
#0  0x00007f4256c89264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f4256c84508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f4256c843d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000949f47 in qemu_mutex_lock (mutex=0xfc5200) at util/qemu-thread-posix.c:60
#4  0x0000000000459e08 in qemu_mutex_lock_iothread () at /root/qemu/cpus.c:1516
#5  0x00000000004146bb in prepare_mmio_access (mr=0x39010f0) at /root/qemu/exec.c:2703
#6  0x0000000000414ad3 in address_space_read_continue (as=0xf9c520, addr=1018, attrs=..., buf=0x7f4259464000 "%\001", len=1, addr1=2, l=1, mr=0x39010f0) at /root/qemu/exec.c:2827
#7  0x0000000000414d81 in address_space_read_full (as=0xf9c520, addr=1018, attrs=..., buf=0x7f4259464000 "%\001", len=1) at /root/qemu/exec.c:2895
#8  0x0000000000414e4b in address_space_read (as=0xf9c520, addr=1018, attrs=..., buf=0x7f4259464000 "%\001", len=1, is_write=false) at /root/qemu/include/exec/memory.h:1671
#9  address_space_rw (as=0xf9c520, addr=1018, attrs=..., buf=0x7f4259464000 "%\001", len=1, is_write=false) at /root/qemu/exec.c:2909
#10 0x00000000004753c9 in kvm_handle_io (port=1018, attrs=..., data=0x7f4259464000, direction=0, size=1, count=1) at /root/qemu/kvm-all.c:1803
#11 0x0000000000475c15 in kvm_cpu_exec (cpu=0x1b827b0) at /root/qemu/kvm-all.c:2032
#12 0x00000000004591c8 in qemu_kvm_cpu_thread_fn (arg=0x1b827b0) at /root/qemu/cpus.c:1087
#13 0x00007f4256c829d1 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f42569cf8fd in clone () from /lib64/libc.so.6

The main thread takes qemu_mutex_lock_iothread first and then calls
aio_context_acquire; the migration thread takes aio_context_acquire first
and then qemu_mutex_lock_iothread. The two threads take the same pair of
locks in opposite order, so each ends up waiting for the lock the other
holds.
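For what it's worth, the inversion can be reproduced outside QEMU. Below is a minimal standalone sketch (my own illustration, not QEMU code; the two pthread mutexes are hypothetical stand-ins for the iothread mutex and the AioContext lock). Each thread takes the two locks in opposite order, so both block forever on their second lock and the program hangs, matching the backtraces above:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Stand-ins: iothread_mutex plays the BQL, aio_ctx_lock plays the AioContext. */
static pthread_mutex_t iothread_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t aio_ctx_lock = PTHREAD_MUTEX_INITIALIZER;

/* Main loop: already holds the iothread mutex when a bottom half
 * (cf. thread_pool_completion_bh) tries to acquire the AioContext. */
static void *main_loop_fn(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&iothread_mutex);
    sleep(1);                            /* let the other thread take aio_ctx_lock */
    printf("main loop: waiting for AioContext...\n");
    pthread_mutex_lock(&aio_ctx_lock);   /* blocks forever: ABBA deadlock */
    printf("main loop: unreachable\n");
    return NULL;
}

/* Migration thread after the patch: holds the AioContext, then re-takes
 * the iothread mutex before releasing the AioContext. */
static void *migration_fn(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&aio_ctx_lock);
    sleep(1);                            /* let the other thread take iothread_mutex */
    printf("migration: waiting for iothread mutex...\n");
    pthread_mutex_lock(&iothread_mutex); /* blocks forever: ABBA deadlock */
    printf("migration: unreachable\n");
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, main_loop_fn, NULL);
    pthread_create(&b, NULL, migration_fn, NULL);
    pthread_join(a, NULL);               /* never returns; compile with -pthread */
    pthread_join(b, NULL);
    return 0;
}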