Sorry for the long delay, I was out of action for a week.

Ryan Harper <ry...@us.ibm.com> writes:

> When removing a drive from the host-side via drive_del we currently have the
> following path:
>
> drive_del
> qemu_aio_flush()
> bdrv_close()
> drive_uninit()
> bdrv_delete()
>
> When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer
> however, the block devices retain a copy of this pointer, see
> hw/virtio-blk.c:virtio_blk_init() where we s->bs = conf->bs.
>
> We now have a use-after-free situation.  If the guest continues to issue IO
> against the device, and we've reallocated the memory that the BlockDriverState
> pointed at, then we will fail the bs->drv checks in the various bdrv_ methods.

"we will fail the bs->drv checks" is misleading, in my opinion.  Here's
what happens:

1. bdrv_close(bs) zaps bs->drv, which makes any subsequent I/O get
   dropped.  Works as designed.

2. drive_uninit() frees the bs.  Since the device is still connected to
   bs, any subsequent I/O is a use-after-free.

   The value of bs->drv becomes unpredictable on free.  As long as it
   remains null, I/O still gets dropped.  I/O crashes or worse once that
   changed.  Could be right on free, could be much later.

If you respin anyway, please clarify your description.

> To resolve this issue as simply as possible, we can chose to not actually
> delete the BlockDriverState pointer.  Since bdrv_close() handles setting the 
> drv
> pointer to NULL, we just need to remove the BlockDriverState from the QLIST
> that is used to enumerate the block devices.  This is currently handled within
> bdrv_delete, so move this into it's own function, bdrv_remove().

Why do we remove the BlockDriverState from bdrv_states?  Because we want
drive_del make its *name* go away.

Begs the question: is the code prepared for a BlockDriverState object
that isn't on bdrv_states?  Turns out we're in luck: bdrv_new() already
creates such objects when the device_name is empty.  This is used for
internal BlockDriverStates such as COW backing files.  Your code makes
device_name empty when taking the object off bdrv_states, so we're good.

Begs yet another question: how does the behavior of a BlockDriverState
change when it's taken off bdrv_states, and is that the behavior we
want?  Changes:

* bdrv_delete() no longer takes it off bdrv_states.  Good.

* bdrv_close_all(), bdrv_commit_all() and bdrv_flush_all() no longer
  cover it.  Okay, because bdrv_close(), bdrv_commit() and bdrv_flush()
  do nothing anyway for closed BlockDriverStates.

* "info block" and "info blockstats" no longer show it, because
  bdrv_info() and bdrv_info_stats() no longer see it.  Okay.

* bdrv_find(), bdrv_next(), bdrv_iterate() no longer see it.  Impact?
  Please check their uses and report.

> The result is that we can now invoke drive_del, this closes the file 
> descriptors
> and sets BlockDriverState->drv to NULL which prevents futher IO to the device,
> and since we do not free BlockDriverState, we don't have to worry about the 
> copy
> retained in the block devices.

Yep.  But there's one more question: is the BlockDriverState freed when
the device using it gets destroyed?

qdev_free() runs prop->info->free() for all properties.  The drive
property's free() is free_drive():

  static void free_drive(DeviceState *dev, Property *prop)
  {
      BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop);

      if (*ptr) {
          bdrv_detach(*ptr, dev);
          blockdev_auto_del(*ptr);
      }
  }

This should indeed delete the drive.  But only if the property still
points to it.  See below.

> Reported-by: Marcus Armbruster <arm...@redhat.com>
> Signed-off-by: Ryan Harper <ry...@us.ibm.com>
> ---
> v1->v2
>  - NULL bs->device_name after removing from list to prevent
>    second removal.
>
>  block.c    |   12 +++++++++---
>  block.h    |    1 +
>  blockdev.c |    2 +-
>  3 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/block.c b/block.c
> index 1544d81..0df9942 100644
> --- a/block.c
> +++ b/block.c
> @@ -697,14 +697,20 @@ void bdrv_close_all(void)
>      }
>  }
>  
> +void bdrv_remove(BlockDriverState *bs)
> +{
> +    if (bs->device_name[0] != '\0') {
> +        QTAILQ_REMOVE(&bdrv_states, bs, list);
> +    }
> +    bs->device_name[0] = '\0';
> +}
> +

I don't like this name.  What's the difference between "delete" and
"remove"?

The function zaps the device name.  bdrv_make_anon()?

>  void bdrv_delete(BlockDriverState *bs)
>  {
>      assert(!bs->peer);
>  
>      /* remove from list, if necessary */
> -    if (bs->device_name[0] != '\0') {
> -        QTAILQ_REMOVE(&bdrv_states, bs, list);
> -    }
> +    bdrv_remove(bs);
>  
>      bdrv_close(bs);
>      if (bs->file != NULL) {
> diff --git a/block.h b/block.h
> index 5d78fc0..8447397 100644
> --- a/block.h
> +++ b/block.h
> @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
>      QEMUOptionParameter *options);
>  int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
>  BlockDriverState *bdrv_new(const char *device_name);
> +void bdrv_remove(BlockDriverState *bs);
>  void bdrv_delete(BlockDriverState *bs);
>  int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
>  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
> diff --git a/blockdev.c b/blockdev.c
> index 0690cc8..1f57b50 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, 
> QObject **ret_data)

Let me add a bit more context:

      bdrv_flush(bs);
      bdrv_close(bs);

      /* clean up guest state from pointing to host resource by
       * finding and removing DeviceState "drive" property */
      if (bs->peer) {
          for (prop = bs->peer->info->props; prop && prop->name; prop++) {
              if (prop->info->type == PROP_TYPE_DRIVE) {
                  ptr = qdev_get_prop_ptr(bs->peer, prop);
                  if (*ptr == bs) {
                      bdrv_detach(bs, bs->peer);
                      *ptr = NULL;
                      break;
                  }
              }
          }
>      }

This zaps the drive property.  A subsequent free_drive() will do
nothing.  We leak the BlockDriverState on device unplug, I'm afraid.

Any reason why we still want to zap the drive property?

>  
>      /* clean up host side */
> -    drive_uninit(drive_get_by_blockdev(bs));
> +    bdrv_remove(bs);
>  
>      return 0;
>  }

Reply via email to