On Thu, 20 Apr 2023, Mark Syms wrote:
> Updated patch to address intermittent SIGSEGV on domain disconnect/shutdown.
> 
> Mark Syms (1):
>   Ensure the PV ring is drained on disconnect
> 
>  hw/block/dataplane/xen-block.c | 31 +++++++++++++++++++++++++------
>  1 file changed, 25 insertions(+), 6 deletions(-)
> 
> -- 
> 2.40.0
> 
> From 21724baa15a72534d98aa2653e9ec39e83559319 Mon Sep 17 00:00:00 2001
> From: Mark Syms <mark.s...@citrix.com>
> Date: Thu, 20 Apr 2023 11:08:34 +0100
> Subject: [PATCH 1/1] Ensure the PV ring is drained on disconnect
> 
> Also ensure all pending AIO is complete.

Hi Mark, can you please add more detail on the problem you are trying to
solve? Please also include a stack trace from the SIGSEGV if you have one.


> Signed-off-by: Mark Syms <mark.s...@citrix.com>
> ---
>  hw/block/dataplane/xen-block.c | 31 +++++++++++++++++++++++++------
>  1 file changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
> index 734da42ea7..d9da4090bf 100644
> --- a/hw/block/dataplane/xen-block.c
> +++ b/hw/block/dataplane/xen-block.c
> @@ -523,6 +523,10 @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
>  
>      dataplane->more_work = 0;
>  
> +    if (dataplane->sring == 0) {
> +        return done_something;

done_something cannot have changed by this point, so I would just do

    return false;
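
i.e. the whole early-return would then be (untested, keeping your sring
check exactly as posted):

    if (dataplane->sring == 0) {
        return false;
    }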


> +    }
> +
>      rc = dataplane->rings.common.req_cons;
>      rp = dataplane->rings.common.sring->req_prod;
>      xen_rmb(); /* Ensure we see queued requests up to 'rp'. */
> @@ -666,14 +670,35 @@ void xen_block_dataplane_destroy(XenBlockDataPlane *dataplane)
>  void xen_block_dataplane_stop(XenBlockDataPlane *dataplane)
>  {
>      XenDevice *xendev;
> +    XenBlockRequest *request, *next;
>  
>      if (!dataplane) {
>          return;
>      }
>  
> +    /* We're about to drain the ring. We can cancel the scheduling of any
> +     * bottom half now */
> +    qemu_bh_cancel(dataplane->bh);
> +
> +    /* Ensure we have drained the ring */
> +    aio_context_acquire(dataplane->ctx);

Would it make sense to move the two loops below under the existing
aio_context_acquire further down? (See the rough sketch below, after "move
the new code here".)


> +    do {
> +        xen_block_handle_requests(dataplane);
> +    } while (dataplane->more_work);
> +    aio_context_release(dataplane->ctx);
> +
> +    /* Now ensure that all inflight requests are complete */
> +    while (!QLIST_EMPTY(&dataplane->inflight)) {
> +        QLIST_FOREACH_SAFE(request, &dataplane->inflight, list, next) {
> +            blk_aio_flush(request->dataplane->blk, xen_block_complete_aio,
> +                        request);
> +        }
> +    }

especially because I would think that blk_aio_flush() needs to be called
with the AioContext acquired?



>      xendev = dataplane->xendev;
>  
>      aio_context_acquire(dataplane->ctx);
> +

move the new code here
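
Roughly, what I have in mind (rough, untested sketch; it only moves the two
loops from your patch under the acquire/release pair that is already there,
the rest of the function stays as it is):

    void xen_block_dataplane_stop(XenBlockDataPlane *dataplane)
    {
        XenDevice *xendev;
        XenBlockRequest *request, *next;

        if (!dataplane) {
            return;
        }

        /* We're about to drain the ring, no more bottom half scheduling */
        qemu_bh_cancel(dataplane->bh);

        xendev = dataplane->xendev;

        aio_context_acquire(dataplane->ctx);

        /* Drain the ring with the AioContext held */
        do {
            xen_block_handle_requests(dataplane);
        } while (dataplane->more_work);

        /* Flush the remaining in-flight requests, still under the same
         * acquire, so blk_aio_flush() runs against the right context */
        while (!QLIST_EMPTY(&dataplane->inflight)) {
            QLIST_FOREACH_SAFE(request, &dataplane->inflight, list, next) {
                blk_aio_flush(request->dataplane->blk,
                              xen_block_complete_aio, request);
            }
        }

        /* ... then the existing code: switch the event channel and the
         * block backend back to qemu_get_aio_context(),
         * aio_context_release(), unbind the event channel, etc. ... */
    }

Whether the in-flight loop can still make progress while we keep holding the
context is something I haven't checked, but at least the drain and the flush
submissions would then run against the right AioContext.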


>      if (dataplane->event_channel) {
>          /* Only reason for failure is a NULL channel */
>          xen_device_set_event_channel_context(xendev, dataplane->event_channel,
> @@ -684,12 +709,6 @@ void xen_block_dataplane_stop(XenBlockDataPlane *dataplane)
>      blk_set_aio_context(dataplane->blk, qemu_get_aio_context(), &error_abort);
>      aio_context_release(dataplane->ctx);
>  
> -    /*
> -     * Now that the context has been moved onto the main thread, cancel
> -     * further processing.
> -     */
> -    qemu_bh_cancel(dataplane->bh);
> -
>      if (dataplane->event_channel) {
>          Error *local_err = NULL;
>  
> -- 
> 2.40.0
> 
