* Markus Armbruster <arm...@redhat.com> [2011-03-24 07:27]:
> Whoops, almost missed this.  Best to cc: me to avoid that.
> 

It was sent directly to you:

>   Sender: qemu-devel-bounces+ryanh=us.ibm....@nongnu.org
>   From: Ryan Harper <ry...@us.ibm.com>
>   Subject: Re: [Qemu-devel] [PATCH v2] Do not delete BlockDriverState when 
> deleting the drive
>   Date: Tue, 22 Mar 2011 20:53:47 -0500
>   Message-ID: <20110323015347.ga20...@us.ibm.com>
>   User-Agent: Mutt/1.5.6+20040907i
>   To: Markus Armbruster <arm...@redhat.com>
>   Cc: Kevin Wolf <kw...@redhat.com>, Ryan Harper <ry...@us.ibm.com>,
>       qemu-devel@nongnu.org


> Ryan Harper <ry...@us.ibm.com> writes:
> 
> > * Markus Armbruster <arm...@redhat.com> [2011-03-15 04:48]:
> >> Sorry for the long delay, I was out of action for a week.
> >> 
> >> Ryan Harper <ry...@us.ibm.com> writes:
> >> 
> >> > When removing a drive from the host-side via drive_del we currently have 
> >> > the
> >> > following path:
> >> >
> >> > drive_del
> >> > qemu_aio_flush()
> >> > bdrv_close()
> >> > drive_uninit()
> >> > bdrv_delete()
> >> >
> >> > When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer
> >> > however, the block devices retain a copy of this pointer, see
> >> > hw/virtio-blk.c:virtio_blk_init() where we s->bs = conf->bs.
> >> >
> >> > We now have a use-after-free situation.  If the guest continues to issue 
> >> > IO
> >> > against the device, and we've reallocated the memory that the 
> >> > BlockDriverState
> >> > pointed at, then we will fail the bs->drv checks in the various bdrv_ 
> >> > methods.
> >> 
> >> "we will fail the bs->drv checks" is misleading, in my opinion.  Here's
> >> what happens:
> >> 
> >> 1. bdrv_close(bs) zaps bs->drv, which makes any subsequent I/O get
> >>    dropped.  Works as designed.
> >> 
> >> 2. drive_uninit() frees the bs.  Since the device is still connected to
> >>    bs, any subsequent I/O is a use-after-free.
> >> 
> >>    The value of bs->drv becomes unpredictable on free.  As long as it
> >>    remains null, I/O still gets dropped.  I/O crashes or worse once that
> >>    changed.  Could be right on free, could be much later.
> >> 
> >> If you respin anyway, please clarify your description.
> >
> > Sure.  I wasn't planning a new version, but I'll update and send anyhow
> > as I didn't see it get included in pull from the block branch.
> >> 
> >> > To resolve this issue as simply as possible, we can chose to not actually
> >> > delete the BlockDriverState pointer.  Since bdrv_close() handles setting 
> >> > the drv
> >> > pointer to NULL, we just need to remove the BlockDriverState from the 
> >> > QLIST
> >> > that is used to enumerate the block devices.  This is currently handled 
> >> > within
> >> > bdrv_delete, so move this into it's own function, bdrv_remove().
> >> 
> >> Why do we remove the BlockDriverState from bdrv_states?  Because we want
> >> drive_del make its *name* go away.
> >> 
> >> Begs the question: is the code prepared for a BlockDriverState object
> >> that isn't on bdrv_states?  Turns out we're in luck: bdrv_new() already
> >> creates such objects when the device_name is empty.  This is used for
> >> internal BlockDriverStates such as COW backing files.  Your code makes
> >> device_name empty when taking the object off bdrv_states, so we're good.
> >> 
> >> Begs yet another question: how does the behavior of a BlockDriverState
> >> change when it's taken off bdrv_states, and is that the behavior we
> >> want?  Changes:
> >> 
> >> * bdrv_delete() no longer takes it off bdrv_states.  Good.
> >> 
> >> * bdrv_close_all(), bdrv_commit_all() and bdrv_flush_all() no longer
> >>   cover it.  Okay, because bdrv_close(), bdrv_commit() and bdrv_flush()
> >>   do nothing anyway for closed BlockDriverStates.
> >> 
> >> * "info block" and "info blockstats" no longer show it, because
> >>   bdrv_info() and bdrv_info_stats() no longer see it.  Okay.
> >> 
> >> * bdrv_find(), bdrv_next(), bdrv_iterate() no longer see it.  Impact?
> >>   Please check their uses and report.
> >
> >    1    664  block-migration.c <<block_load>>
> >              bs = bdrv_find(device_name);
> >
> >     - no longer see it.  This is fine since we can't migrate a block
> >     device that has been removed
> >
> >    2    562  blockdev.c <<do_commit>>
> >              bs = bdrv_find(device);
> >
> >     - do_commit won't see it in either when calling bdrv_commit_all()
> >     Fine as you mention above.  If user specifies the device name
> >     we won't find it, that's OK because we can't commit data against
> >     a closed BlockDriverState.
> >
> >    3    587  blockdev.c <<do_snapshot_blkdev>>
> >              bs = bdrv_find(device);
> >
> >    - OK, cannot take a snapshot against a deleted BlockDriverState
> >
> >    4    662  blockdev.c <<do_eject>>
> >              bs = bdrv_find(filename);
> >
> >    - OK, cannot eject a deleted BlockDriverState; 
> >
> >    5    676  blockdev.c <<do_block_set_passwd>>
> >              bs = bdrv_find(qdict_get_str(qdict, "device"));
> >
> >    - OK, cannot set password a deleted BlockDriverState; 
> >
> >    6    701  blockdev.c <<do_change_block>>
> >              bs = bdrv_find(device);
> >
> >    - OK, cannot change the file/device of a deleted BlockDriverState; 
> >
> >    7    732  blockdev.c <<do_drive_del>>
> >              bs = bdrv_find(id);
> >
> >    - OK, cannot delete an already deleted Drive
> >
> >    8    783  blockdev.c <<do_block_resize>>
> >              bs = bdrv_find(device);
> >
> >    - OK, cannot resize a deleted Drive
> >
> >    9    312  hw/qdev-properties.c <<parse_drive>>
> >              bs = bdrv_find(str);
> >
> >    - Used when invoking qdev_prop_drive .parse method;  parse method is 
> > invoked via
> >     qdev_device_add() which calls set_property() which invokes parse.  
> > AFAICT, this is OK
> >     since we won't be going down the device add path worrying about a
> >     deleted block device.
> 
> Thanks for checking!
> 
> >> > The result is that we can now invoke drive_del, this closes the file 
> >> > descriptors
> >> > and sets BlockDriverState->drv to NULL which prevents futher IO to the 
> >> > device,
> >> > and since we do not free BlockDriverState, we don't have to worry about 
> >> > the copy
> >> > retained in the block devices.
> >> 
> >> Yep.  But there's one more question: is the BlockDriverState freed when
> >> the device using it gets destroyed?
> >> 
> >> qdev_free() runs prop->info->free() for all properties.  The drive
> >> property's free() is free_drive():
> >> 
> >>   static void free_drive(DeviceState *dev, Property *prop)
> >>   {
> >>       BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop);
> >> 
> >>       if (*ptr) {
> >>           bdrv_detach(*ptr, dev);
> >>           blockdev_auto_del(*ptr);
> >>       }
> >>   }
> >> 
> >> This should indeed delete the drive.  But only if the property still
> >> points to it.  See below.
> >> 
> >> > Reported-by: Marcus Armbruster <arm...@redhat.com>
> >> > Signed-off-by: Ryan Harper <ry...@us.ibm.com>
> >> > ---
> >> > v1->v2
> >> >  - NULL bs->device_name after removing from list to prevent
> >> >    second removal.
> >> >
> >> >  block.c    |   12 +++++++++---
> >> >  block.h    |    1 +
> >> >  blockdev.c |    2 +-
> >> >  3 files changed, 11 insertions(+), 4 deletions(-)
> >> >
> >> > diff --git a/block.c b/block.c
> >> > index 1544d81..0df9942 100644
> >> > --- a/block.c
> >> > +++ b/block.c
> >> > @@ -697,14 +697,20 @@ void bdrv_close_all(void)
> >> >      }
> >> >  }
> >> >  
> >> > +void bdrv_remove(BlockDriverState *bs)
> >> > +{
> >> > +    if (bs->device_name[0] != '\0') {
> >> > +        QTAILQ_REMOVE(&bdrv_states, bs, list);
> >> > +    }
> >> > +    bs->device_name[0] = '\0';
> >> > +}
> >> > +
> >> 
> >> I don't like this name.  What's the difference between "delete" and
> >> "remove"?
> >> 
> >> The function zaps the device name.  bdrv_make_anon()?
> >
> > bdrv_delist? bdrv_hide?  I'm also fine with make_anon.
> 
> Any of the three works for me.  "delist" seems the least obvious, but
> that's a matter of taste.
> 
> >> >  void bdrv_delete(BlockDriverState *bs)
> >> >  {
> >> >      assert(!bs->peer);
> >> >  
> >> >      /* remove from list, if necessary */
> >> > -    if (bs->device_name[0] != '\0') {
> >> > -        QTAILQ_REMOVE(&bdrv_states, bs, list);
> >> > -    }
> >> > +    bdrv_remove(bs);
> >> >  
> >> >      bdrv_close(bs);
> >> >      if (bs->file != NULL) {
> >> > diff --git a/block.h b/block.h
> >> > index 5d78fc0..8447397 100644
> >> > --- a/block.h
> >> > +++ b/block.h
> >> > @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
> >> >      QEMUOptionParameter *options);
> >> >  int bdrv_create_file(const char* filename, QEMUOptionParameter 
> >> > *options);
> >> >  BlockDriverState *bdrv_new(const char *device_name);
> >> > +void bdrv_remove(BlockDriverState *bs);
> >> >  void bdrv_delete(BlockDriverState *bs);
> >> >  int bdrv_file_open(BlockDriverState **pbs, const char *filename, int 
> >> > flags);
> >> >  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
> >> > diff --git a/blockdev.c b/blockdev.c
> >> > index 0690cc8..1f57b50 100644
> >> > --- a/blockdev.c
> >> > +++ b/blockdev.c
> >> > @@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, 
> >> > QObject **ret_data)
> >> 
> >> Let me add a bit more context:
> >> 
> >>       bdrv_flush(bs);
> >>       bdrv_close(bs);
> >> 
> >>       /* clean up guest state from pointing to host resource by
> >>        * finding and removing DeviceState "drive" property */
> >>       if (bs->peer) {
> >>           for (prop = bs->peer->info->props; prop && prop->name; prop++) {
> >>               if (prop->info->type == PROP_TYPE_DRIVE) {
> >>                   ptr = qdev_get_prop_ptr(bs->peer, prop);
> >>                   if (*ptr == bs) {
> >>                       bdrv_detach(bs, bs->peer);
> >>                       *ptr = NULL;
> >>                       break;
> >>                   }
> >>               }
> >>           }
> >> >      }
> >> 
> >> This zaps the drive property.  A subsequent free_drive() will do
> >> nothing.  We leak the BlockDriverState on device unplug, I'm afraid.
> >> 
> >> Any reason why we still want to zap the drive property?
> >
> > IIRC, it was to prevent qdev from keeping a ptr around to the bs; but if
> > we're keeping the bs around anyhow, then I don't think we need to remove
> > the property.
> 
> Agree.
> 
> >                One last check would be to see what if the device still
> > shows up in qtree if we don't remove the property.
> 
> I expect it to show up as "drive = " instead of "drive = <null>",
> because:
> 
>     static int print_drive(DeviceState *dev, Property *prop, char *dest, 
> size_t len)
>     {
>         BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop);
>         return snprintf(dest, len, "%s",
>                         *ptr ? bdrv_get_device_name(*ptr) : "<null>");
>     }
> 
> and
> 
>     const char *bdrv_get_device_name(BlockDriverState *bs)
>     {
>         return bs->device_name;
>     }
> 
> If the empty right hand side bothers you, you can change print_drive()
> to print something else then, say "<anon>", or "<gone>".
> 
> Looking forward to v3 :)

After our chat: here's what I've got in v3:

- Update drive_del use after free description
- s/bdrv_remove/bdrv_make_anon/g
- Don't remove qdev property since we don't delete bs any more
- If (bs->peer) bdrv_make_anon else bdrv_delete to handle removing
  drives with no device.

Sending out v3 as a top-post now.


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Reply via email to