>> I won't be able to reply until November 2nd. Maybe Kevin or Hanna can>> >> discuss this with you in the meantime. Thanks @Kevin, @Hanna, Do you have any oponions on this issue and my solution ?
Original Mail Sender: StefanHajnoczi To: lv mengzhao10286442; CC: stefa...@redhat.com;m...@redhat.com;kw...@redhat.com;hre...@redhat.com;qemu-devel@nongnu.org;hu jian10269307;cv11...@126.com; Date: 2023/10/19 20:16 Subject: Re: Re: [PATCH] virtio-blk: don't start dataplane during the stop of dataplane On Thu, 19 Oct 2023 at 00:31, <lv.mengz...@zte.com.cn> wrote: > > On Tue, Oct 17, 2023 at 10:04PM +0800, stefa...@redhat.com wrote: > > > > From: hujian <hu.j...@zte.com.cn> > > > > > > > > During the stop of dataplane for virtio-blk, > > > virtio_bus_cleanup_host_notifier() is be > > > > called to clean up notifier at the end, if polled ioeventfd, > > > virtio_blk_handle_output() > > > > is used to handle io request. But due to s->dataplane_disabled is false, > > > it will be > > > > returned directly, which drops io request. > > > > Backtrace: > > > > ->virtio_blk_data_plane_stop > > > > ->virtio_bus_cleanup_host_notifier > > > > ->virtio_queue_host_notifier_read > > > > ->virtio_queue_notify_vq > > > > ->vq->handle_output > > > > ->virtio_blk_handle_output > > > > ->if (s->dataplane && !s->dataplane_stoped) > > > > ->if (!s->dataplane_disabled) > > > > ->return * > > > > ->virtio_blk_handle_output_do > > > > The above problem can occur when using "virsh reset" cmdline to reset > > > guest, while > > > > guest does io. > > > > To fix this problem, don't try to start dataplane if s->stopping is true, > > > and io would > > > > be handled by virtio_blk_handle_vq(). > > > > > > > > Signed-off-by: hujian <hu.j...@zte.com.cn> > > > > --- > > > > hw/block/virtio-blk.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > I have dropped this patch again after Fiona pointed out it does not > > > compile and Kevin noticed that handling requests from the main loop > > > thread while the I/O is still being processed in the IOThread is going > > > to cause thread-safety issues. > > > > > > Can you explain the problem you are seeing in more detail? You run > > > "virsh reset" while the guest is doing I/O. Then what happens? > > > > > > Stefan > > > 1 Compilation issues > > I'm sorry to be in such a hurry to submit the patch that I forgot to compile > it locally. > > Compilable patches are at the bottom. > > > 2 Troubleshooting I won't be able to reply until November 2nd. Maybe Kevin or Hanna can discuss this with you in the meantime. Stefan > We have done a lifecycle test for the VM (QEMU version: 4.1.0, Host kernel > version: 4.18), > > which is loop execution: virsh create -> virsh suspend -> virsh resume -> > virsh reset -> > > virsh shutdown, and io stress test inside the virtual machine. After the loop > is executed > > about 200 times, after "virsh reset" is executed, the virtual machine goes > into emergency > > mode and fails to start normally. Theoretically, "virsh reset" may causes > data loss of > > virtual machine, but not enough to put it into emergency mode. > > > Coincidentally, I happen to be fixing another different fault with the same > phenomenon, > > which is caused by a our private patch, this patch calls > virtio_blk_data_plane_ [stop|start] > > to operate the dataplane, if QEMU is processing io requests at same time, it > may cause the > > loss of io requests. > > > Analyzing virtio_blk_data_plane_stop(), virtio_bus_cleanup_host_notifier() is > used to > > clean up notifiers, and my understanding is that it would handle the > remaining IO requests. > > The stack is as follows, I add the print at the star line and find that > virtio_blk_handle_output() > > returned directly instead of continuing to call virtio_blk_handle_vq() to > handle io. So I modify > > the code here to make it don't return during the stop of dataplane, and our > internal private patch > > works normally. > > > Backtrace: > > ->virtio_blk_data_plane_stop > > ->virtio_bus_cleanup_host_notifier > > ->virtio_queue_host_notifier_read > > ->virtio_queue_notify_vq > > ->vq->handle_output > > ->virtio_blk_handle_output > > ->if (s->dataplane && !s->dataplane_stoped) > > ->if (!s->dataplane_disabled) > > ->return * > > ->virtio_blk_handle_vq > > > Back to the problem caused by the virsh reset, libvirt sends the > "system_reset" QEMU > > monitor command to QEMU, and QEMU calls qmp_system_reset(), and eventually > calls > > virtio_blk_data_plane_[stop|start] to reset devices. I suspect that io loss > will > > also occur if QEMU still has io to process during the stop of dataplane. > > > 3 Thread-safety issues > > virtio_blk_data_plane_stop() calls blk_set_aio_context() to switch bs back to > the QEMU > > main loop after virtio_bus_cleanup_host_notifier(), so the remaining IO > requests > > are still handling by iothread(if configured). I'm a little confused as to > why there > > is thread-safety issues. > > > Lastly, please CC to cv11...@126.com, this is my private email, so I can > contact with > > you in my free time, Thanks. > > > 4 Compilable patches > > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c > > index 39e7f23..d3a719c 100644 > > --- a/hw/block/virtio-blk.c > > +++ b/hw/block/virtio-blk.c > > @@ -1165,8 +1165,9 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq) > > static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq) > > { > > VirtIOBlock *s = (VirtIOBlock *)vdev; > > + VirtIOBlockDataPlane *dp = s->dataplane; > > > - if (s->dataplane && !s->dataplane_started) { > > + if (s->dataplane && !s->dataplane_started && !dp->stopping) { > > /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start > > * dataplane here instead of waiting for .set_status(). > > */ > > > > >