Am 23.10.2020 um 12:41 hat Greg Kurz geschrieben:
> If a BDS gets deleted during blk_drain_all(), it might miss a
> call to bdrv_do_drained_end(). This means missing a call to
> aio_enable_external() and the AIO context remains disabled for
> ever. This can cause a device to become irresponsive and to
> disrupt the guest execution, ie. hang, loop forever or worse.
> 
> This scenario is quite easy to encounter with virtio-scsi
> on POWER when punching multiple blockdev-create QMP commands
> while the guest is booting and it is still running the SLOF
> firmware. This happens because SLOF disables/re-enables PCI
> devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND
> register after the initial probe/feature negotiation, as it
> tends to work with a single device at a time at various stages
> like probing and running block/network bootloaders without
> doing a full reset in-between. This naturally generates many
> dataplane stops and starts, and thus many drain sections that
> can race with blockdev_create_run(). In the end, SLOF bails
> out.
> 
> It is somehow reproducible on x86 but it requires to generate
> articial dataplane start/stop activity with stop/cont QMP
> commands. In this case, seabios ends up looping for ever,
> waiting for the virtio-scsi device to send a response to
> a command it never received.
> 
> Add a helper that pairs all previously called bdrv_do_drained_begin()
> with a bdrv_do_drained_end() and call it from bdrv_close().
> While at it, update the "/bdrv-drain/graph-change/drain_all"
> test in test-bdrv-drain so that it can catch the issue.
> 
> BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441
> Signed-off-by: Greg Kurz <gr...@kaod.org>
> ---
>  block.c                 |    9 +++++++++
>  block/io.c              |   13 +++++++++++++
>  include/block/block.h   |    6 ++++++
>  tests/test-bdrv-drain.c |    1 +
>  4 files changed, 29 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 430edf79bb10..ddcb36dd48a8 100644
> --- a/block.c
> +++ b/block.c
> @@ -4458,6 +4458,15 @@ static void bdrv_close(BlockDriverState *bs)
>      }
>      QLIST_INIT(&bs->aio_notifiers);
>      bdrv_drained_end(bs);
> +
> +    /*
> +     * If we're still inside some bdrv_drain_all_begin()/end() sections, end
> +     * them now since this BDS won't exist anymore when bdrv_drain_all_end()
> +     * gets called.
> +     */
> +    if (bs->quiesce_counter) {
> +        bdrv_drained_end_quiesce(bs);
> +    }
>  }
>  
>  void bdrv_close_all(void)
> diff --git a/block/io.c b/block/io.c
> index 54f0968aee27..8a0da06bbb14 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -633,6 +633,19 @@ void bdrv_drain_all_begin(void)
>      }
>  }
>  
> +void bdrv_drained_end_quiesce(BlockDriverState *bs)

I think the name should make clear that this is meant as a counterpart
for bdrv_drain_all_begin(), so maybe bdrv_drain_all_end_quiesce()?

(The function is not suitable for any other kinds of drain because the
parameters it passes to bdrv_do_drained_end() are only the same as for
bdrv_drain_all_begin().)

> +{
> +    int drained_end_counter = 0;
> +
> +    g_assert_cmpint(bs->quiesce_counter, >, 0);
> +    g_assert_cmpint(bs->refcnt, ==, 0);

By the way, I didn't know about the problem with these macros either.

> +    while (bs->quiesce_counter) {
> +        bdrv_do_drained_end(bs, false, NULL, true, &drained_end_counter);
> +    }
> +    BDRV_POLL_WHILE(bs, qatomic_read(&drained_end_counter) > 0);
> +}
> +
>  void bdrv_drain_all_end(void)
>  {
>      BlockDriverState *bs = NULL;
> diff --git a/include/block/block.h b/include/block/block.h
> index d16c401cb44e..c0ce6e690081 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -779,6 +779,12 @@ void bdrv_drained_end(BlockDriverState *bs);
>   */
>  void bdrv_drained_end_no_poll(BlockDriverState *bs, int 
> *drained_end_counter);
>  
> +/**
> + * End all quiescent sections started by bdrv_drain_all_begin(). This is
> + * only needed when deleting a BDS before bdrv_drain_all_end() is called.
> + */
> +void bdrv_drained_end_quiesce(BlockDriverState *bs);
> +
>  /**
>   * End a quiescent section started by bdrv_subtree_drained_begin().
>   */
> diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
> index 1595bbc92e9e..8a29e33e004a 100644
> --- a/tests/test-bdrv-drain.c
> +++ b/tests/test-bdrv-drain.c
> @@ -594,6 +594,7 @@ static void test_graph_change_drain_all(void)
>  
>      g_assert_cmpint(bs_b->quiesce_counter, ==, 0);
>      g_assert_cmpint(b_s->drain_count, ==, 0);
> +    g_assert_cmpint(qemu_get_aio_context()->external_disable_cnt, ==, 0);

But here in the test case we should keep g_assert_cmpint() because it
gives better error messages when it fails (and checkpatch doesn't warn
about it in tests).

Apart from the naming and checkpatch, this looks good to me.

Kevin


Reply via email to