On Thu, 19 Oct 2023 at 00:31, <lv.mengz...@zte.com.cn> wrote:
>
> On Tue, Oct 17, 2023 at 10:04PM +0800, stefa...@redhat.com wrote:
>
> > > From: hujian <hu.j...@zte.com.cn>
>
> > >
>
> > > During the stop of dataplane for virtio-blk, 
> > > virtio_bus_cleanup_host_notifier() is be
>
> > > called to clean up notifier at the end, if polled ioeventfd, 
> > > virtio_blk_handle_output()
>
> > > is used to handle io request. But due to s->dataplane_disabled is false, 
> > > it will be
>
> > > returned directly, which drops io request.
>
> > > Backtrace:
>
> > > ->virtio_blk_data_plane_stop
>
> > >   ->virtio_bus_cleanup_host_notifier
>
> > >     ->virtio_queue_host_notifier_read
>
> > >       ->virtio_queue_notify_vq
>
> > >         ->vq->handle_output
>
> > >           ->virtio_blk_handle_output
>
> > >             ->if (s->dataplane  && !s->dataplane_stoped)
>
> > >               ->if (!s->dataplane_disabled)
>
> > >                 ->return *
>
> > >             ->virtio_blk_handle_output_do
>
> > > The above problem can occur when using "virsh reset" cmdline to reset 
> > > guest, while
>
> > > guest does io.
>
> > > To fix this problem, don't try to start dataplane if s->stopping is true, 
> > > and io would
>
> > > be handled by virtio_blk_handle_vq().
>
> > >
>
> > > Signed-off-by: hujian <hu.j...@zte.com.cn>
>
> > > ---
>
> > >  hw/block/virtio-blk.c | 2 +-
>
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
>
> >
>
> > I have dropped this patch again after Fiona pointed out it does not
>
> > compile and Kevin noticed that handling requests from the main loop
>
> > thread while the I/O is still being processed in the IOThread is going
>
> > to cause thread-safety issues.
>
> >
>
> > Can you explain the problem you are seeing in more detail? You run
>
> > "virsh reset" while the guest is doing I/O. Then what happens?
>
> >
>
> > Stefan
>
>
> 1 Compilation issues
>
> I'm sorry to be in such a hurry to submit the patch that I forgot to compile 
> it locally.
>
> Compilable patches are at the bottom.
>
>
> 2 Troubleshooting

I won't be able to reply until November 2nd. Maybe Kevin or Hanna can
discuss this with you in the meantime.

Stefan

> We have done a lifecycle test for the VM (QEMU version: 4.1.0, Host kernel 
> version: 4.18),
>
> which is loop execution: virsh create -> virsh suspend -> virsh resume -> 
> virsh reset ->
>
> virsh shutdown, and io stress test inside the virtual machine. After the loop 
> is executed
>
> about 200 times, after "virsh reset" is executed, the virtual machine goes 
> into emergency
>
> mode and fails to start normally. Theoretically, "virsh reset" may causes 
> data loss of
>
> virtual machine, but not enough to put it into emergency mode.
>
>
> Coincidentally, I happen to be fixing another different fault with the same 
> phenomenon,
>
> which is caused by a our private patch, this patch calls 
> virtio_blk_data_plane_ [stop|start]
>
> to operate the dataplane, if QEMU is processing io requests at same time, it 
> may cause the
>
> loss of io requests.
>
>
> Analyzing virtio_blk_data_plane_stop(), virtio_bus_cleanup_host_notifier() is 
> used to
>
> clean up notifiers, and my understanding is that it would handle the 
> remaining IO requests.
>
> The stack is as follows, I add the print at the star line and find that 
> virtio_blk_handle_output()
>
> returned directly instead of continuing to call virtio_blk_handle_vq() to 
> handle io. So I modify
>
> the code here to make it don't return during the stop of dataplane, and our 
> internal private patch
>
> works normally.
>
>
> Backtrace:
>
> ->virtio_blk_data_plane_stop
>
>   ->virtio_bus_cleanup_host_notifier
>
>     ->virtio_queue_host_notifier_read
>
>       ->virtio_queue_notify_vq
>
>         ->vq->handle_output
>
>           ->virtio_blk_handle_output
>
>             ->if (s->dataplane  && !s->dataplane_stoped)
>
>               ->if (!s->dataplane_disabled)
>
>                 ->return *
>
>             ->virtio_blk_handle_vq
>
>
> Back to the problem caused by the virsh reset, libvirt sends the 
> "system_reset" QEMU
>
> monitor command to QEMU, and QEMU calls qmp_system_reset(), and eventually 
> calls
>
> virtio_blk_data_plane_[stop|start] to reset devices. I suspect that io loss 
> will
>
> also occur if QEMU still has io to process during the stop of dataplane.
>
>
> 3 Thread-safety issues
>
> virtio_blk_data_plane_stop() calls blk_set_aio_context() to switch bs back to 
> the QEMU
>
> main loop after virtio_bus_cleanup_host_notifier(), so the remaining IO 
> requests
>
> are still handling by iothread(if configured). I'm a little confused as to 
> why there
>
> is thread-safety issues.
>
>
> Lastly, please CC to cv11...@126.com, this is my private email, so I can 
> contact with
>
> you in my free time, Thanks.
>
>
> 4 Compilable patches
>
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>
> index 39e7f23..d3a719c 100644
>
> --- a/hw/block/virtio-blk.c
>
> +++ b/hw/block/virtio-blk.c
>
> @@ -1165,8 +1165,9 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
>
>  static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>
>  {
>
>      VirtIOBlock *s = (VirtIOBlock *)vdev;
>
> +    VirtIOBlockDataPlane *dp = s->dataplane;
>
>
> -    if (s->dataplane && !s->dataplane_started) {
>
> +    if (s->dataplane && !s->dataplane_started && !dp->stopping) {
>
>          /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
>
>           * dataplane here instead of waiting for .set_status().
>
>           */
>
>
>
>
>

Reply via email to