>> I won't be able to reply until November 2nd. Maybe Kevin or Hanna can>> 
>> discuss this with you in the meantime.
Thanks
 @Kevin, @Hanna, Do you have any oponions on this issue and my solution ?







Original Mail


Sender: StefanHajnoczi
To: lv mengzhao10286442;
CC: 
stefa...@redhat.com;m...@redhat.com;kw...@redhat.com;hre...@redhat.com;qemu-devel@nongnu.org;hu
 jian10269307;cv11...@126.com;
Date: 2023/10/19 20:16
Subject: Re: Re: [PATCH] virtio-blk: don't start dataplane during the stop of 
dataplane

On Thu, 19 Oct 2023 at 00:31, <lv.mengz...@zte.com.cn> wrote:
> 
> On Tue, Oct 17, 2023 at 10:04PM +0800, stefa...@redhat.com wrote:
> 
> > > From: hujian <hu.j...@zte.com.cn> 
> 
> > > 
> 
> > > During the stop of dataplane for virtio-blk, 
> > > virtio_bus_cleanup_host_notifier() is be
> 
> > > called to clean up notifier at the end, if polled ioeventfd, 
> > > virtio_blk_handle_output()
> 
> > > is used to handle io request. But due to s->dataplane_disabled is false, 
> > > it will be
> 
> > > returned directly, which drops io request.
> 
> > > Backtrace:
> 
> > > ->virtio_blk_data_plane_stop
> 
> > >   ->virtio_bus_cleanup_host_notifier
> 
> > >     ->virtio_queue_host_notifier_read
> 
> > >       ->virtio_queue_notify_vq
> 
> > >         ->vq->handle_output
> 
> > >           ->virtio_blk_handle_output
> 
> > >             ->if (s->dataplane  && !s->dataplane_stoped)
> 
> > >               ->if (!s->dataplane_disabled)
> 
> > >                 ->return *
> 
> > >             ->virtio_blk_handle_output_do
> 
> > > The above problem can occur when using "virsh reset" cmdline to reset 
> > > guest, while
> 
> > > guest does io.
> 
> > > To fix this problem, don't try to start dataplane if s->stopping is true, 
> > > and io would
> 
> > > be handled by virtio_blk_handle_vq().
> 
> > > 
> 
> > > Signed-off-by: hujian <hu.j...@zte.com.cn> 
> 
> > > ---
> 
> > >  hw/block/virtio-blk.c | 2 +-
> 
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> > 
> 
> > I have dropped this patch again after Fiona pointed out it does not
> 
> > compile and Kevin noticed that handling requests from the main loop
> 
> > thread while the I/O is still being processed in the IOThread is going
> 
> > to cause thread-safety issues.
> 
> > 
> 
> > Can you explain the problem you are seeing in more detail? You run
> 
> > "virsh reset" while the guest is doing I/O. Then what happens?
> 
> > 
> 
> > Stefan
> 
> 
> 1 Compilation issues
> 
> I'm sorry to be in such a hurry to submit the patch that I forgot to compile 
> it locally.
> 
> Compilable patches are at the bottom.
> 
> 
> 2 Troubleshooting
 
I won't be able to reply until November 2nd. Maybe Kevin or Hanna can
discuss this with you in the meantime.
 
Stefan
 
> We have done a lifecycle test for the VM (QEMU version: 4.1.0, Host kernel 
> version: 4.18),
> 
> which is loop execution: virsh create -> virsh suspend -> virsh resume -> 
> virsh reset -> 
> 
> virsh shutdown, and io stress test inside the virtual machine. After the loop 
> is executed
> 
> about 200 times, after "virsh reset" is executed, the virtual machine goes 
> into emergency
> 
> mode and fails to start normally. Theoretically, "virsh reset" may causes 
> data loss of
> 
> virtual machine, but not enough to put it into emergency mode.
> 
> 
> Coincidentally, I happen to be fixing another different fault with the same 
> phenomenon,
> 
> which is caused by a our private patch, this patch calls 
> virtio_blk_data_plane_ [stop|start]
> 
> to operate the dataplane, if QEMU is processing io requests at same time, it 
> may cause the
> 
> loss of io requests.
> 
> 
> Analyzing virtio_blk_data_plane_stop(), virtio_bus_cleanup_host_notifier() is 
> used to
> 
> clean up notifiers, and my understanding is that it would handle the 
> remaining IO requests.
> 
> The stack is as follows, I add the print at the star line and find that 
> virtio_blk_handle_output()
> 
> returned directly instead of continuing to call virtio_blk_handle_vq() to 
> handle io. So I modify
> 
> the code here to make it don't return during the stop of dataplane, and our 
> internal private patch
> 
> works normally.
> 
> 
> Backtrace:
> 
> ->virtio_blk_data_plane_stop
> 
>   ->virtio_bus_cleanup_host_notifier
> 
>     ->virtio_queue_host_notifier_read
> 
>       ->virtio_queue_notify_vq
> 
>         ->vq->handle_output
> 
>           ->virtio_blk_handle_output
> 
>             ->if (s->dataplane  && !s->dataplane_stoped)
> 
>               ->if (!s->dataplane_disabled)
> 
>                 ->return *
> 
>             ->virtio_blk_handle_vq
> 
> 
> Back to the problem caused by the virsh reset, libvirt sends the 
> "system_reset" QEMU
> 
> monitor command to QEMU, and QEMU calls qmp_system_reset(), and eventually 
> calls
> 
> virtio_blk_data_plane_[stop|start] to reset devices. I suspect that io loss 
> will
> 
> also occur if QEMU still has io to process during the stop of dataplane.
> 
> 
> 3 Thread-safety issues
> 
> virtio_blk_data_plane_stop() calls blk_set_aio_context() to switch bs back to 
> the QEMU
> 
> main loop after virtio_bus_cleanup_host_notifier(), so the remaining IO 
> requests
> 
> are still handling by iothread(if configured). I'm a little confused as to 
> why there
> 
> is thread-safety issues.
> 
> 
> Lastly, please CC to cv11...@126.com, this is my private email, so I can 
> contact with
> 
> you in my free time, Thanks.
> 
> 
> 4 Compilable patches
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> 
> index 39e7f23..d3a719c 100644
> 
> --- a/hw/block/virtio-blk.c
> 
> +++ b/hw/block/virtio-blk.c
> 
> @@ -1165,8 +1165,9 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
> 
>  static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> 
>  {
> 
>      VirtIOBlock *s = (VirtIOBlock *)vdev;
> 
> +    VirtIOBlockDataPlane *dp = s->dataplane;
> 
> 
> -    if (s->dataplane && !s->dataplane_started) {
> 
> +    if (s->dataplane && !s->dataplane_started && !dp->stopping) {
> 
>          /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
> 
>           * dataplane here instead of waiting for .set_status().
> 
>           */
> 
> 
> 
> 
>

Reply via email to