Re: [Question] qemu-img convert block alignment

2021-08-04 Thread Zhenyu Ye
On 2021/8/3 23:03, Eric Blake wrote:
> On Fri, Apr 02, 2021 at 11:52:25AM +0800, Zhenyu Ye wrote:
>> Hi all,
>>
>> commit 8dcd3c9b91 ("qemu-img: align result of is_allocated_sectors")
>> introduces block alignment when doing qemu-img convert. However, the
>> alignment is:
>>
>>     s.alignment = MAX(pow2floor(s.min_sparse),
>>                       DIV_ROUND_UP(out_bs->bl.request_alignment,
>>                                    BDRV_SECTOR_SIZE));
>>
>> (where the default s.min_sparse is 8)
>> When the target device's bl.request_alignment is smaller than 4K, this
>> causes additional write-zero overhead and makes the target file larger.
>>
>> Is this as expected?  Should we change the MAX() to MIN()?
> 
> Yes, it is expected, and no, we shouldn't change it.  Even when a target
> advertises a bl.request_alignment of 512, our goal is to avoid needing
> read-modify-write cycles when that target is really on top of a 4k
> sector disk.  Writing extra 0s out to the 4k boundaries does not
> change the fact that allocation is in 4k chunks anyway, regardless of
> whether the disk supports smaller 512-byte reads.
> 
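To make the arithmetic concrete, here is a minimal standalone sketch (the MAX/pow2floor/DIV_ROUND_UP helpers below are simplified stand-ins rather than QEMU's own macros, so the numbers can be checked outside of qemu-img):

/* Standalone sketch of the alignment computation discussed above.
 * The helpers are simplified stand-ins, not QEMU's own definitions. */
#include <stdio.h>

#define BDRV_SECTOR_SIZE 512
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

static long pow2floor(long value)
{
    long r = 1;
    while (r * 2 <= value) {
        r *= 2;
    }
    return r;
}

int main(void)
{
    long min_sparse = 8;            /* default: 8 sectors = 4K */
    long request_alignment = 512;   /* target advertises 512-byte alignment */

    long alignment = MAX(pow2floor(min_sparse),
                         DIV_ROUND_UP(request_alignment, BDRV_SECTOR_SIZE));

    /* Prints 8: even with a 512-byte request_alignment, qemu-img keeps
     * aligning sparse/zero decisions to 8 sectors (4K). */
    printf("alignment = %ld sectors\n", alignment);
    return 0;
}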

Thanks for your reply.

Zhenyu



Re: [Question] qemu-img convert block alignment

2021-08-02 Thread Zhenyu Ye
ping?

On 2021/4/2 11:52, Zhenyu Ye wrote:
> Hi all,
> 
> commit 8dcd3c9b91 ("qemu-img: align result of is_allocated_sectors")
> introduces block alignment when doing qemu-img convert. However, the
> alignment is:
> 
>     s.alignment = MAX(pow2floor(s.min_sparse),
>                       DIV_ROUND_UP(out_bs->bl.request_alignment,
>                                    BDRV_SECTOR_SIZE));
> 
> (where the default s.min_sparse is 8)
> When the target device's bl.request_alignment is smaller than 4K, this
> causes additional write-zero overhead and makes the target file larger.
> 
> Is this as expected?  Should we change the MAX() to MIN()?
> 
> 
> Thanks,
> zhenyu
> 



[Question] qemu-img convert block alignment

2021-04-01 Thread Zhenyu Ye
Hi all,

commit 8dcd3c9b91 ("qemu-img: align result of is_allocated_sectors")
introduces block alignment when doing qemu-img convert. However, the
alignment is:

    s.alignment = MAX(pow2floor(s.min_sparse),
                      DIV_ROUND_UP(out_bs->bl.request_alignment,
                                   BDRV_SECTOR_SIZE));

(where the default s.min_sparse is 8)
When the target device's bl.request_alignment is smaller than 4K, this
causes additional write-zero overhead and makes the target file larger.

Is this as expected?  Should we change the MAX() to MIN()?


Thanks,
zhenyu



Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-12-21 Thread Zhenyu Ye
On 2020/12/15 0:33, Stefan Hajnoczi wrote:
> On Tue, Dec 08, 2020 at 08:47:42AM -0500, Glauber Costa wrote:
>> The work we did at the time was in fixing those things in the kernel
>> as much as we could.
>> But the API is just like that...
> 

The best way for us is to replace io_submit with io_uring.

Thanks for the discussion!

Zhenyu





Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-10-19 Thread Zhenyu Ye
On 2020/10/19 21:25, Paolo Bonzini wrote:
> On 19/10/20 14:40, Zhenyu Ye wrote:
>> The kernel backtrace for io_submit in GUEST is:
>>
>>  guest# ./offcputime -K -p `pgrep -nx fio`
>>  b'finish_task_switch'
>>  b'__schedule'
>>  b'schedule'
>>  b'io_schedule'
>>  b'blk_mq_get_tag'
>>  b'blk_mq_get_request'
>>  b'blk_mq_make_request'
>>  b'generic_make_request'
>>  b'submit_bio'
>>  b'blkdev_direct_IO'
>>  b'generic_file_read_iter'
>>  b'aio_read'
>>  b'io_submit_one'
>>  b'__x64_sys_io_submit'
>>  b'do_syscall_64'
>>  b'entry_SYSCALL_64_after_hwframe'
>>  -fio (1464)
>>  40031912
>>
>> And Linux io_uring can avoid the latency problem.
> 
> What filesystem are you using?
> 

On the host, the VM image and disk images are on an ext4 filesystem.
In the guest, '/' uses an xfs filesystem, and the disks are raw devices.

guest# df -hT
Filesystem              Type      Size  Used Avail Use% Mounted on
devtmpfs                devtmpfs   16G     0   16G   0% /dev
tmpfs                   tmpfs      16G     0   16G   0% /dev/shm
tmpfs                   tmpfs      16G  976K   16G   1% /run
/dev/mapper/fedora-root xfs       8.0G  3.2G  4.9G  40% /
tmpfs                   tmpfs      16G     0   16G   0% /tmp
/dev/sda1               xfs      1014M  181M  834M  18% /boot
tmpfs                   tmpfs     3.2G     0  3.2G   0% /run/user/0

guest# lsblk
NAME            MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda               8:0    0  10G  0 disk
├─sda1            8:1    0   1G  0 part /boot
└─sda2            8:2    0   9G  0 part
  ├─fedora-root 253:0    0   8G  0 lvm  /
  └─fedora-swap 253:1    0   1G  0 lvm  [SWAP]
vda             252:0    0  10G  0 disk
vdb             252:16   0  10G  0 disk
vdc             252:32   0  10G  0 disk
vdd             252:48   0  10G  0 disk

Thanks,
Zhenyu



Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-10-19 Thread Zhenyu Ye
Hi Stefan,

On 2020/10/13 18:00, Stefan Hajnoczi wrote:
> 
> Sorry, I lost track of this on-going email thread.
> 
> Thanks for the backtrace. It shows the io_submit call is done while the
> AioContext lock is held. The monitor thread is waiting for the
> IOThread's AioContext lock. vcpus threads can get stuck waiting on the
> big QEMU lock (BQL) that is held by the monitor in the meantime.
> 
> Please collect the kernel backtrace for io_submit so we can understand
> why multi-second io_submit latencies happen.
> 
> I also suggest trying aio=io_uring to check if Linux io_uring avoids the
> latency problem.
> 

The kernel backtrace for io_submit in GUEST is:

guest# ./offcputime -K -p `pgrep -nx fio`
b'finish_task_switch'
b'__schedule'
b'schedule'
b'io_schedule'
b'blk_mq_get_tag'
b'blk_mq_get_request'
b'blk_mq_make_request'
b'generic_make_request'
b'submit_bio'
b'blkdev_direct_IO'
b'generic_file_read_iter'
b'aio_read'
b'io_submit_one'
b'__x64_sys_io_submit'
b'do_syscall_64'
b'entry_SYSCALL_64_after_hwframe'
-fio (1464)
40031912

And Linux io_uring can avoid the latency problem.

Thanks,
Zhenyu



Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-18 Thread Zhenyu Ye
On 2020/9/18 22:06, Fam Zheng wrote:
> 
> I can see how blocking in a slow io_submit can cause trouble for the main
> thread. I think one way to fix it (until it's made truly async in new
> kernels) is moving the io_submit call to the thread pool, wrapped in a
> coroutine, perhaps.
>

I'm not sure if any other operation will block the main thread, other
than io_submit().
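A rough sketch of that thread-pool approach might look like the following (assuming QEMU's thread_pool_submit_co()/aio_get_thread_pool() interfaces; the wrapper names and the LaioSubmitData struct are invented here for illustration, not existing code):

/* Hypothetical sketch: offload io_submit() to the thread pool so a slow
 * submission no longer stalls the iothread holding the AioContext lock.
 * thread_pool_submit_co() and aio_get_thread_pool() follow QEMU's existing
 * thread-pool API; everything else here is made up for illustration. */
#include "qemu/osdep.h"
#include "qemu/coroutine.h"
#include "block/aio.h"
#include "block/thread-pool.h"
#include <libaio.h>

typedef struct LaioSubmitData {
    io_context_t io_ctx;    /* libaio context */
    int nr;                 /* number of iocbs to submit */
    struct iocb **iocbs;
} LaioSubmitData;

/* Runs in a worker thread, so a blocking io_submit() only ties up
 * that worker, not the event loop. */
static int laio_submit_worker(void *opaque)
{
    LaioSubmitData *data = opaque;
    return io_submit(data->io_ctx, data->nr, data->iocbs);
}

/* Called from a coroutine in the iothread: yields until the worker
 * completes, keeping the event loop responsive in the meantime. */
static int coroutine_fn laio_co_submit_offloaded(AioContext *ctx,
                                                 LaioSubmitData *data)
{
    ThreadPool *pool = aio_get_thread_pool(ctx);
    return thread_pool_submit_co(pool, laio_submit_worker, data);
}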

> I'm not sure qmp timeout is a complete solution because we would still
> suffer from a blocked state for a period, in this exact situation before
> the timeout.

Anyway, the QMP timeout may be a last resort to prevent the VM
soft lockup.  Ideally, after your fix of io_submit, this mechanism would
never be triggered.

Thanks,
Zhenyu



Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-18 Thread Zhenyu Ye
Hi Stefan, Fam,

On 2020/9/18 0:01, Fam Zheng wrote:
> On 2020-09-17 16:44, Stefan Hajnoczi wrote:
>> On Thu, Sep 17, 2020 at 03:36:57PM +0800, Zhenyu Ye wrote:
>>> When the hang occurs, QEMU is blocked at:
>>>
>>> #0  0x95762b64 in ?? () from target:/usr/lib64/libpthread.so.0
>>> #1  0x9575bd88 in pthread_mutex_lock () from 
>>> target:/usr/lib64/libpthread.so.0
>>> #2  0xbb1f5948 in qemu_mutex_lock_impl (mutex=0xcc8e1860,
>>> file=0xbb4e1bd0 
>>> "/Images/eillon/CODE/5-opensource/qemu/util/async.c", line=605)
>>> #3  0xbb20acd4 in aio_context_acquire (ctx=0xcc8e1800)
>>> #4  0xbb105e90 in bdrv_query_image_info (bs=0xcc934620,
>>> p_info=0xccc41e18, errp=0xca669118)
>>> #5  0xbb105968 in bdrv_block_device_info (blk=0xcdca19f0, 
>>> bs=0xcc934620,
>>> flat=false, errp=0xca6692b8)
>>> #6  0xbb1063dc in bdrv_query_info (blk=0xcdca19f0, 
>>> p_info=0xcd29c9a8,
>>> errp=0xca6692b8)
>>> #7  0xbb106c14 in qmp_query_block (errp=0x0)
>>> #8  0xbacb8e6c in hmp_info_block (mon=0xca6693d0, 
>>> qdict=0xcd089790)
>>
>> Great, this shows that the main loop thread is stuck waiting for the
>> AioContext lock.
>>
>> Please post backtraces from all QEMU threads ((gdb) thread apply all bt)
>> so we can figure out which thread is holding up the main loop.
> 
> I think that is reflected in the perf backtrace posted by Zhenyu already:
> 
> And in the host, the information of sys_enter_io_submit() is:
> 
> Samples: 3K of event 'syscalls:sys_enter_io_submit', Event count
> (approx.): 3150
>Children  Self  Trace output
>-   66.70%  66.70%  ctx_id: 0x9c044000,
>nr: 0x0001, iocbpp: 0x9f7fad28
>0xae7f871c
>0xae8a27c4
>qemu_thread_start
>iothread_run
>aio_poll
>aio_dispatch_ready_handlers
>aio_dispatch_handler
>virtio_queue_host_notifier_aio_read
>virtio_queue_notify_aio_vq
>virtio_blk_data_plane_handle_output
>virtio_blk_handle_vq
>blk_io_unplug
>bdrv_io_unplug
>bdrv_io_unplug
>raw_aio_unplug
>laio_io_unplug
>syscall
> 
> So the iothread is blocked by a slow io_submit holding the AioContext
> lock.
> 
> It would be interesting to know what in the kernel is blocking io_submit
> from returning.
> 

Sorry, I only got the backtraces of all QEMU threads, shown below:

(gdb) thread apply all bt

Thread 35 (LWP 49700):
#0  0x83148fe4 in pthread_cond_timedwait () at 
/lib64/libpthread.so.0
#1  0xe85f2080 in qemu_sem_timedwait (sem=0xef12e748, 
ms=1)
at ../util/qemu-thread-posix.c:282
#2  0xe861c600 in worker_thread (opaque=0xef12e6c0)
at ../util/thread-pool.c:91
#3  0xe85f269c in qemu_thread_start (args=0xf0204be0)
at ../util/qemu-thread-posix.c:521
#4  0x831428bc in  () at /lib64/libpthread.so.0
#5  0x8308aa1c in  () at /lib64/libc.so.6

Thread 34 (LWP 45337):
#0  0x83148fe4 in pthread_cond_timedwait () at 
/lib64/libpthread.so.0
#1  0xe85f2080 in qemu_sem_timedwait (sem=0xef12e748, 
ms=1)
at ../util/qemu-thread-posix.c:282
#2  0xe861c600 in worker_thread (opaque=0xef12e6c0)
at ../util/thread-pool.c:91
#3  0xe85f269c in qemu_thread_start (args=0xfffcc8000b20)
at ../util/qemu-thread-posix.c:521
#4  0x831428bc in  () at /lib64/libpthread.so.0
#5  0x8308aa1c in  () at /lib64/libc.so.6

Thread 33 (LWP 45336):
#0  0x83148fe4 in pthread_cond_timedwait () at 
/lib64/libpthread.so.0
#1  0xe85f2080 in qemu_sem_timedwait (sem=0xef12e748, 
ms=1)
at ../util/qemu-thread-posix.c:282
#2  0xe861c600 in worker_thread (opaque=0xef12e6c0)
at ../util/thread-pool.c:91
#3  0xe85f269c in qemu_thread_start (args=0xfffd14000b20)
at ../util/qemu-thread-posix.c:521
#4  0x831428bc in  () at /lib64/libpthread.so.0
#5  0x8308aa1c in  () at /lib64/libc.so.6

Thread 32 (LWP 45335):
#0  0x83148fe4 in pthread_cond_timedwait () at 
/lib64/libpthread.so.0
#1  0xe85f2080 in qemu_sem_timedwait (sem=0xef12e748, 
ms=1)
at ../util/qemu-thread-posix.c:282
#2  0xe861c600 in worker_thread (opaque=0xef12e6c0)

Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-17 Thread Zhenyu Ye
Hi Daniel,

On 2020/9/14 22:42, Daniel P. Berrangé wrote:
> On Tue, Aug 11, 2020 at 09:54:08PM +0800, Zhenyu Ye wrote:
>> Hi Kevin,
>>
>> On 2020/8/10 23:38, Kevin Wolf wrote:
>>> Am 10.08.2020 um 16:52 hat Zhenyu Ye geschrieben:
>>>> Before doing QMP actions, we need to lock the qemu_global_mutex,
>>>> so the QMP actions should not take too long.
>>>>
>>>> Unfortunately, some QMP actions need to acquire the AioContext and
>>>> this may take a long time.  The VM will soft lockup if this time
>>>> is too long.
>>>
>>> Do you have a specific situation in mind where getting the lock of an
>>> AioContext can take a long time? I know that the main thread can
>>> block for considerable time, but QMP commands run in the main thread, so
>>> this patch doesn't change anything for this case. It would be effective
>>> if an iothread blocks, but shouldn't everything running in an iothread
>>> be asynchronous and therefore keep the AioContext lock only for a short
>>> time?
>>>
>>
>> Theoretically, everything running in an iothread is asynchronous. However,
>> some 'asynchronous' actions are not entirely non-blocking, such as
>> io_submit().  It will block when the iodepth is too large and I/O pressure
>> is too high.  Doing some QMP actions, such as 'info block', at that point
>> may cause a VM soft lockup.  This series can make these QMP actions safer.
>>
>> I constructed the scenario as follows:
>> 1. Create a VM with 4 disks, using iothreads.
>> 2. Add pressure to the CPU on the host.  In my scenario, the CPU usage exceeds 95%.
>> 3. Add pressure to the 4 disks in the VM at the same time.  I used fio with
>> parameters such as:
>>
>>   fio -rw=randrw -bs=1M -size=1G -iodepth=512 -ioengine=libaio -numjobs=4
>>
>> 4. do block query actions, for example, by virsh:
>>
>>  virsh qemu-monitor-command [vm name] --hmp info block
>>
>> Then the VM will soft lockup; the calltrace is:
> 
> [snip]
> 
>> This problem can be avoided after this series is applied.
> 
> At what cost, though?  With this timeout, QEMU is going to start
> reporting bogus failures for various QMP commands when running
> under high load, even if those commands would actually run
> successfully.  This will turn into an error report from libvirt
> which will in turn probably cause an error in the mgmt application
> using libvirt, and in turn could break the user's automation.
> 

I think it's worth reporting an error to avoid the VM soft lockup.
The VM may even crash if kernel.softlockup_panic is configured!

We can increase the timeout value (to something close to the VM's soft
lockup threshold) to avoid unnecessary errors.

Thanks,
Zhenyu



Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-17 Thread Zhenyu Ye
Hi Stefan,

On 2020/9/14 21:27, Stefan Hajnoczi wrote:
>>
>> Theoretically, everything running in an iothread is asynchronous. However,
>> some 'asynchronous' actions are not entirely non-blocking, such as
>> io_submit().  It will block when the iodepth is too large and I/O pressure
>> is too high.  Doing some QMP actions, such as 'info block', at that point
>> may cause a VM soft lockup.  This series can make these QMP actions safer.
>>
>> I constructed the scenario as follows:
>> 1. Create a VM with 4 disks, using iothreads.
>> 2. Add pressure to the CPU on the host.  In my scenario, the CPU usage exceeds 95%.
>> 3. Add pressure to the 4 disks in the VM at the same time.  I used fio with
>> parameters such as:
>>
>>   fio -rw=randrw -bs=1M -size=1G -iodepth=512 -ioengine=libaio -numjobs=4
>>
>> 4. do block query actions, for example, by virsh:
>>
>>  virsh qemu-monitor-command [vm name] --hmp info block
>>
>> Then the VM will soft lockup; the calltrace is:
>>
>> [  192.311393] watchdog: BUG: soft lockup - CPU#1 stuck for 42s! 
>> [kworker/1:1:33]
> 
> Hi,
> Sorry I haven't had time to investigate this myself yet.
> 
> Do you also have a QEMU backtrace when the hang occurs?
> 
> Let's find out if QEMU is stuck in the io_submit(2) syscall or whether
> there's an issue in QEMU itself that causes the softlockup (for example,
> aio_poll() with the global mutex held).

Sorry for the delay.

I traced the io_submit() within the guest. The maximum execution time exceeds 
16s:

guest# perf trace -p `pidof -s fio` --duration 1
14.808 (3843.082 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
  3861.336 (11470.608 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 15341.998 (479.283 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 15831.380 (3704.997 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 19547.869 (3412.839 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 22966.953 (1080.854 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 24062.689 (6939.813 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 31019.219 (532.147 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 31556.610 (3469.920 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 35038.885 (9007.175 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 44053.578 (16006.405 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 60068.944 (3068.748 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 63138.474 (13012.499 ms): io_getevents(ctx_id: 281472924459008, 
min_nr: 511, nr: 511, events: 0x19a83150) = 511
 76191.073 (2872.657 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
...

And in the host, the information of sys_enter_io_submit() is:

Samples: 3K of event 'syscalls:sys_enter_io_submit', Event count 
(approx.): 3150
Children  Self  Trace output
-   66.70%  66.70%  ctx_id: 0x9c044000, nr: 0x0001, iocbpp: 
0x9f7fad28
0xae7f871c
0xae8a27c4
qemu_thread_start
iothread_run
aio_poll
aio_dispatch_ready_handlers
aio_dispatch_handler
virtio_queue_host_notifier_aio_read
virtio_queue_notify_aio_vq
virtio_blk_data_plane_handle_output
virtio_blk_handle_vq
blk_io_unplug
bdrv_io_unplug
bdrv_io_unplug
raw_aio_unplug
laio_io_unplug
syscall


When the hang occurs, QEMU is blocked at:

#0  0x95762b64 in ?? () from target:/usr/lib64/libpthread.so.0
#1  0x9575bd88 in pthread_mutex_lock () from 
target:/usr/lib64/libpthread.so.0
#2  0xbb1f5948 in qemu_mutex_lock_impl (mutex=0xcc8e1860,
file=0xbb4e1bd0 
"/Images/eillon/CODE/5-opensource/qemu/util/async.c", line=605)
#3  0xbb20acd4 in aio_context_acquire (ctx=0xcc8e1800)
#4  0xbb105e90 in bdrv_query_image_info (bs=0xcc934620,
p_info=0xccc41e18, errp=0xca669118)
#5  0xbb105968 in bdrv_block_device_info (blk=0xcdca19f0, 
bs=0xcc934620,
flat=false, errp=0xca6692b8)
#6  0xbb1063dc in bdrv_query_info (blk=0xcdca19f0, 
p_info=0xcd29c9a8,
errp=0xca6692b8)
#7  0xbb106c14 in qmp_query_block (errp=0x0)
#8  0xbacb8e6c in hmp_info_block (mon=0xca6693d0, 
qdict=0xcd089790)
#9  0xbb0068f0 in handle_hmp_command (mon=0xca6693d0,
cmdline

Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-08-12 Thread Zhenyu Ye
Hi Stefan,

On 2020/8/12 21:51, Stefan Hajnoczi wrote:
> On Mon, Aug 10, 2020 at 10:52:44PM +0800, Zhenyu Ye wrote:
>> Before doing QMP actions, we need to lock the qemu_global_mutex,
>> so the QMP actions should not take too long.
>>
>> Unfortunately, some QMP actions need to acquire the AioContext and
>> this may take a long time.  The VM will soft lockup if this time
>> is too long.
>>
>> So add a timeout mechanism for QMP actions.
> 
> aio_context_acquire_timeout() is a workaround for code that blocks the
> event loop. Ideally there should be no code that blocks the event loop.
> 
> Which cases have you found where the event loop is blocked?
> 

Currently I have only found that io_submit() will block while I/O pressure
is too high; for details, see:


https://lore.kernel.org/qemu-devel/c6d75e49-3e36-6a76-fdc8-cdf09e7c3...@huawei.com/

io_submit() cannot be guaranteed to be non-blocking at all times.

> I think they should be investigated and fixed (if possible) before
> introducing an API like aio_context_acquire_timeout().
> 

We cannot ensure that everything in an iothread is non-blocking, because
some actions look asynchronous but can block at times (such as
io_submit()).  Anyway, the _timeout() API can make these QMP commands
(which need to acquire the AioContext) safer.

Thanks,
Zhenyu




Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-08-11 Thread Zhenyu Ye
Hi Kevin,

On 2020/8/10 23:38, Kevin Wolf wrote:
> Am 10.08.2020 um 16:52 hat Zhenyu Ye geschrieben:
>> Before doing QMP actions, we need to lock the qemu_global_mutex,
>> so the QMP actions should not take too long.
>>
>> Unfortunately, some QMP actions need to acquire the AioContext and
>> this may take a long time.  The VM will soft lockup if this time
>> is too long.
> 
> Do you have a specific situation in mind where getting the lock of an
> AioContext can take a long time? I know that the main thread can
> block for considerable time, but QMP commands run in the main thread, so
> this patch doesn't change anything for this case. It would be effective
> if an iothread blocks, but shouldn't everything running in an iothread
> be asynchronous and therefore keep the AioContext lock only for a short
> time?
> 

Theoretically, everything running in an iothread is asynchronous. However,
some 'asynchronous' actions are not entirely non-blocking, such as
io_submit().  It will block when the iodepth is too large and I/O pressure
is too high.  Doing some QMP actions, such as 'info block', at that point
may cause a VM soft lockup.  This series can make these QMP actions safer.

I constructed the scenario as follows:
1. Create a VM with 4 disks, using iothreads.
2. Add pressure to the CPU on the host.  In my scenario, the CPU usage exceeds 95%.
3. Add pressure to the 4 disks in the VM at the same time.  I used fio with
parameters such as:

 fio -rw=randrw -bs=1M -size=1G -iodepth=512 -ioengine=libaio -numjobs=4

4. do block query actions, for example, by virsh:

virsh qemu-monitor-command [vm name] --hmp info block

Then the VM will soft lockup; the calltrace is:

[  192.311393] watchdog: BUG: soft lockup - CPU#1 stuck for 42s! 
[kworker/1:1:33]
[  192.314241] Kernel panic - not syncing: softlockup: hung tasks
[  192.316370] CPU: 1 PID: 33 Comm: kworker/1:1 Kdump: loaded Tainted: G  OE L   4.19.36+ #16
[  192.318765] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[  192.325638] Workqueue: events drm_fb_helper_dirty_work
[  192.327238] Call trace:
[  192.331528]  dump_backtrace+0x0/0x198
[  192.332695]  show_stack+0x24/0x30
[  192.334219]  dump_stack+0xa4/0xcc
[  192.335578]  panic+0x12c/0x314
[  192.336565]  watchdog_timer_fn+0x3e4/0x3e8
[  192.339984]  __hrtimer_run_queues+0x114/0x358
[  192.341576]  hrtimer_interrupt+0x104/0x2d8
[  192.343247]  arch_timer_handler_virt+0x38/0x58
[  192.345074]  handle_percpu_devid_irq+0x90/0x248
[  192.347238]  generic_handle_irq+0x34/0x50
[  192.349536]  __handle_domain_irq+0x68/0xc0
[  192.351193]  gic_handle_irq+0x6c/0x150
[  192.352639]  el1_irq+0xb8/0x140
[  192.353855]  vp_notify+0x28/0x38 [virtio_pci]
[  192.355646]  virtqueue_kick+0x3c/0x78 [virtio_ring]
[  192.357539]  virtio_gpu_queue_ctrl_buffer_locked+0x180/0x248 [virtio_gpu]
[  192.359869]  virtio_gpu_queue_ctrl_buffer+0x50/0x78 [virtio_gpu]
[  192.361456]  virtio_gpu_cmd_resource_flush+0x8c/0xb0 [virtio_gpu]
[  192.363422]  virtio_gpu_surface_dirty+0x60/0x110 [virtio_gpu]
[  192.365215]  virtio_gpu_framebuffer_surface_dirty+0x34/0x48 [virtio_gpu]
[  192.367452]  drm_fb_helper_dirty_work+0x178/0x1c0
[  192.368912]  process_one_work+0x1b4/0x3f8
[  192.370192]  worker_thread+0x54/0x470
[  192.371370]  kthread+0x134/0x138
[  192.379241]  ret_from_fork+0x10/0x18
[  192.382688] kernel fault(0x5) notification starting on CPU 1
[  192.385059] kernel fault(0x5) notification finished on CPU 1
[  192.387311] SMP: stopping secondary CPUs
[  192.391024] Kernel Offset: disabled
[  192.392111] CPU features: 0x0,a1806008
[  192.393306] Memory Limit: none
[  192.396701] Starting crashdump kernel...
[  192.397821] Bye!

This problem can be avoided after this series is applied.

Thanks,
Zhenyu




[PATCH v1 1/2] util: introduce aio_context_acquire_timeout

2020-08-10 Thread Zhenyu Ye
Currently we only have aio_context_acquire() to take
ownership of the AioContext.  If the mutex is locked by other
threads, this function will wait until the mutex is released.

Sometimes we want to give up and return an error after waiting
for some time.  So introduce aio_context_acquire_timeout(),
which returns ETIMEDOUT if the lock is not acquired within @t seconds.

This will be used in the next patch.

Signed-off-by: Zhenyu Ye 
---
 include/block/aio.h |  5 +
 include/qemu/thread-posix.h |  1 +
 include/qemu/thread.h   |  1 +
 util/async.c| 10 ++
 util/qemu-thread-posix.c|  6 ++
 5 files changed, 23 insertions(+)

diff --git a/include/block/aio.h b/include/block/aio.h
index b2f703fa3f..cb178cdf5a 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -287,6 +287,11 @@ void aio_context_unref(AioContext *ctx);
  */
 void aio_context_acquire(AioContext *ctx);
 
+/* Add timeout to aio_context_acquire().  If the time for obtaining
+ * the lock exceeds @t, return ETIMEDOUT.
+ */
+int aio_context_acquire_timeout(AioContext *ctx, int t);
+
 /* Relinquish ownership of the AioContext. */
 void aio_context_release(AioContext *ctx);
 
diff --git a/include/qemu/thread-posix.h b/include/qemu/thread-posix.h
index c903525062..2e62181959 100644
--- a/include/qemu/thread-posix.h
+++ b/include/qemu/thread-posix.h
@@ -9,6 +9,7 @@ typedef QemuMutex QemuRecMutex;
 #define qemu_rec_mutex_lock_impl    qemu_mutex_lock_impl
 #define qemu_rec_mutex_trylock_impl qemu_mutex_trylock_impl
 #define qemu_rec_mutex_unlock qemu_mutex_unlock
+#define qemu_rec_mutex_timed_lock qemu_mutex_timed_lock
 
 struct QemuMutex {
 pthread_mutex_t lock;
diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index 4baf4d1715..7a96a5df2a 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -27,6 +27,7 @@ void qemu_mutex_destroy(QemuMutex *mutex);
 int qemu_mutex_trylock_impl(QemuMutex *mutex, const char *file, const int 
line);
 void qemu_mutex_lock_impl(QemuMutex *mutex, const char *file, const int line);
 void qemu_mutex_unlock_impl(QemuMutex *mutex, const char *file, const int 
line);
+int qemu_mutex_timed_lock(QemuMutex *mutex, const struct timespec *tsptr);
 
 typedef void (*QemuMutexLockFunc)(QemuMutex *m, const char *f, int l);
 typedef int (*QemuMutexTrylockFunc)(QemuMutex *m, const char *f, int l);
diff --git a/util/async.c b/util/async.c
index 1319eee3bc..c478d3dbf5 100644
--- a/util/async.c
+++ b/util/async.c
@@ -605,6 +605,16 @@ void aio_context_acquire(AioContext *ctx)
 qemu_rec_mutex_lock(&ctx->lock);
 }
 
+int aio_context_acquire_timeout(AioContext *ctx, int t)
+{
+struct timespec tout;
+
+clock_gettime(CLOCK_REALTIME, &tout);
+tout.tv_sec += t;
+
+return qemu_rec_mutex_timed_lock(&ctx->lock, &tout);
+}
+
 void aio_context_release(AioContext *ctx)
 {
 qemu_rec_mutex_unlock(&ctx->lock);
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index b4c2359272..49fb98a2e8 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -82,6 +82,12 @@ void qemu_mutex_lock_impl(QemuMutex *mutex, const char 
*file, const int line)
 qemu_mutex_post_lock(mutex, file, line);
 }
 
+int qemu_mutex_timed_lock(QemuMutex *mutex, const struct timespec *tsptr)
+{
+assert(mutex->initialized);
+return pthread_mutex_timedlock(&mutex->lock, tsptr);
+}
+
 int qemu_mutex_trylock_impl(QemuMutex *mutex, const char *file, const int line)
 {
 int err;
-- 
2.22.0.windows.1
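For context, a hedged sketch of the caller pattern this new API is meant for (mirroring what patch 2/2 does; the handler name below is only an example, and LOCK_TIMEOUT is the constant that patch introduces):

/* Sketch of how a QMP handler is expected to use the new API; the
 * function name is an example, LOCK_TIMEOUT comes from patch 2/2. */
#include "qemu/osdep.h"
#include "block/aio.h"
#include "block/block.h"
#include "qapi/error.h"

static void example_qmp_handler(BlockDriverState *bs, Error **errp)
{
    AioContext *aio_context = bdrv_get_aio_context(bs);
    int ret;

    ret = aio_context_acquire_timeout(aio_context, LOCK_TIMEOUT);
    if (ret) {
        /* Typically ETIMEDOUT: report an error instead of blocking
         * the monitor (and the BQL) indefinitely. */
        error_setg_errno(errp, ret, "acquire aio context failed");
        return;
    }

    /* ... inspect or modify the block graph under the AioContext lock ... */

    aio_context_release(aio_context);
}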





[PATCH v1 2/2] qmp: use aio_context_acquire_timeout replace aio_context_acquire

2020-08-10 Thread Zhenyu Ye
Before doing QMP actions, we need to lock the qemu_global_mutex,
so the QMP actions should not take too long.

Unfortunately, some QMP actions need to acquire the AioContext and
this may take a long time.  So replace aio_context_acquire()
with aio_context_acquire_timeout() and return ETIMEDOUT if the
waiting time exceeds LOCK_TIMEOUT (which defaults to 3 seconds).

Signed-off-by: Zhenyu Ye 
---
 block/qapi-sysemu.c |  7 ++-
 block/qapi.c|  6 +-
 blockdev.c  | 35 ++-
 include/block/aio.h |  1 +
 4 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
index 8498402ad4..e7cd7d3c70 100644
--- a/block/qapi-sysemu.c
+++ b/block/qapi-sysemu.c
@@ -439,6 +439,7 @@ void qmp_block_set_io_throttle(BlockIOThrottle *arg, Error 
**errp)
 BlockDriverState *bs;
 BlockBackend *blk;
 AioContext *aio_context;
+int ret;
 
 blk = qmp_get_blk(arg->has_device ? arg->device : NULL,
   arg->has_id ? arg->id : NULL,
@@ -448,7 +449,11 @@ void qmp_block_set_io_throttle(BlockIOThrottle *arg, Error 
**errp)
 }
 
 aio_context = blk_get_aio_context(blk);
-aio_context_acquire(aio_context);
+ret = aio_context_acquire_timeout(aio_context, LOCK_TIMEOUT);
+if (ret) {
+error_setg_errno(errp, ret, "acquire aio context failed");
+return;
+}
 
 bs = blk_bs(blk);
 if (!bs) {
diff --git a/block/qapi.c b/block/qapi.c
index afd9f3b4a7..ff2454daff 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -271,7 +271,11 @@ void bdrv_query_image_info(BlockDriverState *bs,
 Error *err = NULL;
 ImageInfo *info;
 
-aio_context_acquire(bdrv_get_aio_context(bs));
+ret = aio_context_acquire_timeout(bdrv_get_aio_context(bs), LOCK_TIMEOUT);
+if (ret) {
+error_setg_errno(errp, ret, "acquire aio context failed");
+return;
+}
 
 size = bdrv_getlength(bs);
 if (size < 0) {
diff --git a/blockdev.c b/blockdev.c
index 3848a9c8ab..ce133c89e2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2431,6 +2431,7 @@ void qmp_block_resize(bool has_device, const char *device,
 BlockBackend *blk = NULL;
 BlockDriverState *bs;
 AioContext *aio_context;
+int ret;
 
 bs = bdrv_lookup_bs(has_device ? device : NULL,
 has_node_name ? node_name : NULL,
@@ -2441,7 +2442,11 @@ void qmp_block_resize(bool has_device, const char 
*device,
 }
 
 aio_context = bdrv_get_aio_context(bs);
-aio_context_acquire(aio_context);
+ret = aio_context_acquire_timeout(aio_context, LOCK_TIMEOUT);
+if (ret) {
+error_setg_errno(errp, ret, "acquire aio context failed");
+return;
+}
 
 if (size < 0) {
 error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "size", "a >0 size");
@@ -3016,7 +3021,11 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
 }
 
 aio_context = bdrv_get_aio_context(bs);
-aio_context_acquire(aio_context);
+ret = aio_context_acquire_timeout(aio_context, LOCK_TIMEOUT);
+if (ret) {
+error_setg_errno(errp, ret, "acquire aio context failed");
+return;
+}
 
 if (!arg->has_mode) {
 arg->mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
@@ -3184,12 +3193,20 @@ void qmp_blockdev_mirror(bool has_job_id, const char 
*job_id,
 /* Honor bdrv_try_set_aio_context() context acquisition requirements. */
 old_context = bdrv_get_aio_context(target_bs);
 aio_context = bdrv_get_aio_context(bs);
-aio_context_acquire(old_context);
+ret = aio_context_acquire_timeout(old_context, LOCK_TIMEOUT);
+if (ret) {
+error_setg_errno(errp, ret, "acquire aio context failed");
+return;
+}
 
 ret = bdrv_try_set_aio_context(target_bs, aio_context, errp);
 
 aio_context_release(old_context);
-aio_context_acquire(aio_context);
+ret = aio_context_acquire_timeout(aio_context, LOCK_TIMEOUT);
+if (ret) {
+error_setg_errno(errp, ret, "acquire aio context failed");
+return;
+}
 
 if (ret < 0) {
 goto out;
@@ -3603,6 +3620,7 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 {
 BlockJobInfoList *head = NULL, **p_next = &head;
 BlockJob *job;
+int ret;
 
 for (job = block_job_next(NULL); job; job = block_job_next(job)) {
 BlockJobInfoList *elem;
@@ -3613,7 +3631,14 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 }
 elem = g_new0(BlockJobInfoList, 1);
 aio_context = blk_get_aio_context(job->blk);
-aio_context_acquire(aio_context);
+ret = aio_context_acquire_timeout(aio_context, LOCK_TIMEOUT);
+if (ret) {
+g_free(elem);
+qapi_free_BlockJobInfoList(head);
+error_setg_errno(errp, ret, "acquire aio context failed");
+return NULL;
+

[PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-08-10 Thread Zhenyu Ye
Before doing QMP actions, we need to lock the qemu_global_mutex,
so the QMP actions should not take too long.

Unfortunately, some QMP actions need to acquire the AioContext and
this may take a long time.  The VM will soft lockup if this time
is too long.

So add a timeout mechanism for QMP actions.

Zhenyu Ye (2):
  util: introduce aio_context_acquire_timeout
  qmp: use aio_context_acquire_timeout replace aio_context_acquire

 block/qapi-sysemu.c |  7 ++-
 block/qapi.c|  6 +-
 blockdev.c  | 35 ++-
 include/block/aio.h |  6 ++
 include/qemu/thread-posix.h |  1 +
 include/qemu/thread.h   |  1 +
 util/async.c| 10 ++
 util/qemu-thread-posix.c|  6 ++
 8 files changed, 65 insertions(+), 7 deletions(-)

-- 
2.22.0.windows.1





[PATCH v1] qmp: don't hold ctx lock while querying blockstats

2020-07-10 Thread Zhenyu Ye
QMP commands run in the main thread, and changes to an
iothread's AioContext are also only executed in the main
thread (they cannot run in parallel), so there is no need
for lock protection while querying blockstats.

If we hold the lock here while the I/O pressure in the VM is
high and I/O returns slowly, the main thread will be stuck
until the lock is released, which will affect vCPU operation
and finally cause the VM to become stuck.

Signed-off-by: Zhenyu Ye 
---
 block/qapi.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index afd9f3b4a7..fa56bc145d 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -609,11 +609,8 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
 if (has_query_nodes && query_nodes) {
 for (bs = bdrv_next_node(NULL); bs; bs = bdrv_next_node(bs)) {
 BlockStatsList *info = g_malloc0(sizeof(*info));
-AioContext *ctx = bdrv_get_aio_context(bs);
 
-aio_context_acquire(ctx);
 info->value = bdrv_query_bds_stats(bs, false);
-aio_context_release(ctx);
 
 *p_next = info;
 p_next = &info->next;
@@ -621,7 +618,6 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
 } else {
 for (blk = blk_all_next(NULL); blk; blk = blk_all_next(blk)) {
 BlockStatsList *info;
-AioContext *ctx = blk_get_aio_context(blk);
 BlockStats *s;
 char *qdev;
 
@@ -629,7 +625,6 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
 continue;
 }
 
-aio_context_acquire(ctx);
 s = bdrv_query_bds_stats(blk_bs(blk), true);
 s->has_device = true;
 s->device = g_strdup(blk_name(blk));
@@ -643,7 +638,6 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
 }
 
 bdrv_query_blk_stats(s->stats, blk);
-aio_context_release(ctx);
 
 info = g_malloc0(sizeof(*info));
 info->value = s;
-- 
2.19.1