On Thu, Dec 02, 2021 at 01:14:47PM +0800, Rao, Lei wrote:
> Sorry, resending with correct indentation and quoting.
> 
> On 12/1/2021 10:27 PM, Vladimir Sementsov-Ogievskiy wrote:
> > 01.12.2021 12:48, Rao, Lei wrote:
> > >
> > > -----Original Message-----
> > > From: Daniel P. Berrangé <berra...@redhat.com>
> > > Sent: Wednesday, December 1, 2021 5:11 PM
> > > To: Rao, Lei <lei....@intel.com>
> > > Cc: Zhang, Chen <chen.zh...@intel.com>; ebl...@redhat.com;
> > > vsement...@virtuozzo.com; kw...@redhat.com; hre...@redhat.com;
> > > qemu-bl...@nongnu.org; qemu-devel@nongnu.org
> > > Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO
> > > exits cleanly in some corner case
> > >
> > > On Wed, Dec 01, 2021 at 03:54:27PM +0800, Rao, Lei wrote:
> > > > We found that the QIO channel coroutine could not be awakened in
> > > > some corner cases during our stress test for COLO.
> > > > The hang looks like this:
> > > > #0  0x00007fad72e24bf6 in __ppoll (fds=0x5563d75861f0, nfds=1,
> > > >     timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
> > > > #1  0x00005563d6977c68 in qemu_poll_ns (fds=0x5563d75861f0, nfds=1,
> > > >     timeout=599999697630) at util/qemu-timer.c:348
> > > > #2  0x00005563d697ac44 in aio_poll (ctx=0x5563d755dfa0, blocking=true)
> > > >     at util/aio-posix.c:669
> > > > #3  0x00005563d68ba24f in bdrv_do_drained_begin (bs=0x5563d75ea0c0,
> > > >     recursive=false, parent=0x0, ignore_bds_parents=false, poll=true)
> > > >     at block/io.c:432
> > > > #4  0x00005563d68ba338 in bdrv_drained_begin (bs=0x5563d75ea0c0)
> > > >     at block/io.c:438
> > > > #5  0x00005563d689c6d2 in quorum_del_child (bs=0x5563d75ea0c0,
> > > >     child=0x5563d7908600, errp=0x7fff3cf5b960) at block/quorum.c:1063
> > > > #6  0x00005563d684328f in bdrv_del_child (parent_bs=0x5563d75ea0c0,
> > > >     child=0x5563d7908600, errp=0x7fff3cf5b960) at block.c:6568
> > > > #7  0x00005563d658499e in qmp_x_blockdev_change (parent=0x5563d80ec4c0
> > > >     "colo-disk0", has_child=true, child=0x5563d7577d30 "children.1",
> > > >     has_node=false, node=0x0, errp=0x7fff3cf5b960) at blockdev.c:4494
> > > > #8  0x00005563d67e8b4e in qmp_marshal_x_blockdev_change
> > > >     (args=0x7fad6400ada0, ret=0x7fff3cf5b9f8, errp=0x7fff3cf5b9f0)
> > > >     at qapi/qapi-commands-block-core.c:1538
> > > > #9  0x00005563d691cd9e in do_qmp_dispatch (cmds=0x5563d719a4f0
> > > >     <qmp_commands>, request=0x7fad64009d80, allow_oob=true,
> > > >     errp=0x7fff3cf5ba98) at qapi/qmp-dispatch.c:132
> > > > #10 0x00005563d691cfab in qmp_dispatch (cmds=0x5563d719a4f0
> > > >     <qmp_commands>, request=0x7fad64009d80, allow_oob=true)
> > > >     at qapi/qmp-dispatch.c:175
> > > > #11 0x00005563d67b7787 in monitor_qmp_dispatch (mon=0x5563d7605d40,
> > > >     req=0x7fad64009d80) at monitor/qmp.c:145
> > > > #12 0x00005563d67b7b24 in monitor_qmp_bh_dispatcher (data=0x0)
> > > >     at monitor/qmp.c:234
> > > > #13 0x00005563d69754c2 in aio_bh_call (bh=0x5563d7473230) at util/async.c:89
> > > > #14 0x00005563d697555e in aio_bh_poll (ctx=0x5563d7471da0) at util/async.c:117
> > > > #15 0x00005563d697a423 in aio_dispatch (ctx=0x5563d7471da0)
> > > >     at util/aio-posix.c:459
> > > > #16 0x00005563d6975917 in aio_ctx_dispatch (source=0x5563d7471da0,
> > > >     callback=0x0, user_data=0x0) at util/async.c:260
> > > > #17 0x00007fad730e1fbd in g_main_context_dispatch ()
> > > >     from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> > > > #18 0x00005563d6978cda in glib_pollfds_poll () at util/main-loop.c:219
> > > > #19 0x00005563d6978d58 in os_host_main_loop_wait (timeout=977650)
> > > >     at util/main-loop.c:242
> > > > #20 0x00005563d6978e69 in main_loop_wait (nonblocking=0)
> > > >     at util/main-loop.c:518
> > > > #21 0x00005563d658f5ed in main_loop () at vl.c:1814
> > > > #22 0x00005563d6596bb7 in main (argc=61, argv=0x7fff3cf5c0c8,
> > > >     envp=0x7fff3cf5c2b8) at vl.c:450
> > > >
> > > > From the call trace, we can see that the QEMU main thread is
> > > > waiting for in_flight to drop to zero, but in_flight never
> > > > reaches 0.
> > > > After analysis, the root cause is that the NBD request coroutine was
> > > > not awakened. Although this bug was found in the COLO test, I think
> > > > it is a universal problem in the QIO module that also affects other
> > > > modules built on QIO, such as NBD. We dumped the following data:
> > > > $2 = {
> > > >   in_flight = 2,
> > > >   state = NBD_CLIENT_QUIT,
> > > >   connect_status = 0,
> > > >   connect_err = 0x0,
> > > >   wait_in_flight = false,
> > > >   requests = {{
> > > >       coroutine = 0x0,
> > > >       offset = 273077071872,
> > > >       receiving = false,
> > > >     }, {
> > > >       coroutine = 0x7f1164571df0,
> > > >       offset = 498792529920,
> > > >       receiving = false,
> > > >     }, {
> > > >       coroutine = 0x0,
> > > >       offset = 500663590912,
> > > >       receiving = false,
> > > >     }, {
> > > >       ...
> > > >     } },
> > > >   reply = {
> > > >     simple = {
> > > >       magic = 1732535960,
> > > >       error = 0,
> > > >       handle = 0
> > > >     },
> > > >     structured = {
> > > >       magic = 1732535960,
> > > >       flags = 0,
> > > >       type = 0,
> > > >       handle = 0,
> > > >       length = 0
> > > >     },
> > > >     {
> > > >       magic = 1732535960,
> > > >       _skip = 0,
> > > >       handle = 0
> > > >     }
> > > >   },
> > > >   bs = 0x7f11640e2140,
> > > >   reconnect_delay = 0,
> > > >   saddr = 0x7f11640e1f80,
> > > >   export = 0x7f11640dc560 "parent0",
> > > > }
> > > > From the data, we can see that the NBD request coroutine does not
> > > > exit normally when the NBD server is killed (we kill the Secondary
> > > > VM in the COLO stress test). It therefore never executes
> > > > in_flight--, so the QEMU main thread hangs here. On further
> > > > analysis, I found that the coroutine yields in nbd_send_request()
> > > > or qio_channel_write_all() inside nbd_co_send_request(); the yield
> > > > happens because the kernel returns EAGAIN (under the stress test).
> > > > However, NBD did not know it would yield there.
> > > > So, nbd_recv_coroutines_wake() will not wake it up, because
> > > > req->receiving is false. If the NBD server is terminated abnormally
> > > > at the same time, the G_IO_OUT source becomes invalid and the
> > > > coroutine will never be awakened. In addition, s->ioc is released
> > > > immediately; if we forcibly wake up the NBD coroutine, accessing
> > > > ioc->xxx causes a segmentation fault. Finally, I added a state
> > > > named force_quit to the QIOChannel to ensure that QIO exits
> > > > immediately, and added the function qio_channel_coroutines_wake()
> > > > to wake the coroutine up.
> >
> > If qio_channel_shutdown() can't kill a request that is in
> > qio_channel_write_all(), that's not the responsibility of the nbd
> > driver; the qio channel layer should take care of this..
> >
> > Or, you probably need some keep-alive settings set up.
> >
> > > >
> > > > Signed-off-by: Lei Rao <lei....@intel.com>
> > > > Signed-off-by: Zhang Chen <chen.zh...@intel.com>
> > > > ---
> > > >  block/nbd.c          |  5 +++++
> > > >  include/io/channel.h | 19 +++++++++++++++++++
> > > >  io/channel.c         | 22 ++++++++++++++++++++++
> > > >  3 files changed, 46 insertions(+)
> > > >
> > > > diff --git a/block/nbd.c b/block/nbd.c
> > > > index 5ef462db1b..5ee4eaaf57 100644
> > > > --- a/block/nbd.c
> > > > +++ b/block/nbd.c
> > > > @@ -208,6 +208,8 @@ static void nbd_teardown_connection(BlockDriverState *bs)
> > > >      assert(!s->in_flight);
> > > >      if (s->ioc) {
> > > > +        qio_channel_set_force_quit(s->ioc, true);
> > > > +        qio_channel_coroutines_wake(s->ioc);
> > > >          qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> > >
> > > Calling shutdown here should already be causing qio_channel_readv_all
> > > to wake up and break out of its poll() loop. We shouldn't need to add
> > > a second way to break out of the poll().
> > >
> > > Calling shutdown can wake up the coroutines which are polling, but I
> > > think it's not enough.
> > > I tried forcibly waking up the NBD coroutine; it can cause a
> > > segmentation fault. The root cause is that it continues to access
> > > ioc->xxx in qio_channel_yield(), but the ioc has already been
> > > released and set to NULL, for example in
> > > nbd_co_do_establish_connection(). I think calling shutdown will have
> > > the same result. So, I added force_quit; once it is set to true, the
> > > coroutine exits immediately without accessing the ioc.
> >
> > What do you mean by "the NBD coroutine" and by "forcibly wake up"?
> >
> > In recent Qemu there is no specific NBD coroutine; only request
> > coroutines should exist.
> >
> > Forcibly waking up any coroutine is generally unsafe in Qemu. Most of
> > the code is not prepared for it; a crash is the normal result of such
> > an attempt.
> >
> > Also, there is a good mechanism to debug coroutines in gdb:
> >
> >   source scripts/qemu-gdb.py
> >   qemu coroutine <coroutine pointer>
> >
> > - it will dump the stack trace of a coroutine, which may help.
> >
> > Also, to find coroutines, look at the bs->tracked_requests list; all
> > requests of bs are in the list, with coroutine pointers in the .co
> > field.
> 
> I am sorry for the unclear description. "The NBD coroutine" means the
> request coroutines.
> 
> About "forcibly wake up": I just set receiving to true before
> qio_channel_writev_all() in nbd_co_send_request() to ensure that the
> request coroutines can be awakened by nbd_recv_coroutine_wake_one(). But
> that was just a test.
> 
> The issue is that only waking up the request coroutine, or shutting down
> the QIO channel, is not enough, because there is a synchronization
> issue. For example, when the NBD request coroutine is awakened in
> qio_channel_writev_full_all(), it continues to access ioc->xxx, but the
> ioc has been set to NULL in nbd_co_do_establish_connection(); this
> causes a segmentation fault.
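The stuck state described above reduces to a small model: a wake routine
that only scans for receiving == true will miss a coroutine that yielded in
the send path, so in_flight never drops. The sketch below is illustrative
only; MiniRequest and wake_receiving() are hypothetical names modeled on the
dumped fields, not the real QEMU structures or nbd_recv_coroutines_wake():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the per-request bookkeeping shown in the
 * gdb dump: a live coroutine pointer plus the receiving flag. */
typedef struct {
    void *coroutine;   /* non-NULL while a request coroutine is alive */
    bool receiving;    /* true only while yielded in the receive path */
    bool woken;
} MiniRequest;

/* Mirrors the wake logic under discussion: only coroutines that yielded
 * while receiving a reply are considered.  A coroutine that yielded
 * inside the send path (receiving == false, as in slot 1 of the dump)
 * is skipped, which is exactly the hang being reported. */
static int wake_receiving(MiniRequest *reqs, size_t n)
{
    int woken = 0;
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].coroutine && reqs[i].receiving) {
            reqs[i].woken = true;
            woken++;
        }
    }
    return woken;
}
```

With one receiver and one sender pending, only the receiver is woken; the
sender stays parked and its in_flight decrement never runs.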
That is a serious bug. Nothing is permitted to free the QIOChannel while a
qio_channel_writev_full_all() API call (or any other I/O call) is in
progress. If multiple threads are involved, then each needs to hold a
reference to the QIOChannel to stop it being freed while in use.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
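The rule Daniel states, that every concurrent user of a channel holds its
own reference so the object cannot be freed while an I/O call is in
progress, can be illustrated with a toy refcount model. MiniChannel,
channel_ref() and channel_unref() are hypothetical stand-ins, not QEMU's
actual object_ref()/object_unref() API:

```c
#include <assert.h>

/* Toy model of reference-counted object lifetime: the storage is only
 * released when the last reference is dropped, so an in-flight writer
 * that took its own reference can never observe a freed channel. */
typedef struct {
    int refcnt;
    int freed;
} MiniChannel;

static void channel_ref(MiniChannel *c)
{
    c->refcnt++;
}

static void channel_unref(MiniChannel *c)
{
    if (--c->refcnt == 0) {
        c->freed = 1;   /* stands in for actually freeing the object */
    }
}
```

The owner dropping its reference (the moral equivalent of `s->ioc = NULL`)
no longer frees the object while a request coroutine still holds one; the
free happens only when the writer finishes and drops the last reference.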