Whoops, almost missed this.  Best to cc: me to avoid that.

Ryan Harper <ry...@us.ibm.com> writes:

> * Markus Armbruster <arm...@redhat.com> [2011-03-15 04:48]:
>> Sorry for the long delay, I was out of action for a week.
>> 
>> Ryan Harper <ry...@us.ibm.com> writes:
>> 
>> > When removing a drive from the host-side via drive_del we currently have 
>> > the
>> > following path:
>> >
>> > drive_del
>> > qemu_aio_flush()
>> > bdrv_close()
>> > drive_uninit()
>> > bdrv_delete()
>> >
>> > When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer
>> > however, the block devices retain a copy of this pointer, see
>> > hw/virtio-blk.c:virtio_blk_init() where we s->bs = conf->bs.
>> >
>> > We now have a use-after-free situation.  If the guest continues to issue IO
>> > against the device, and we've reallocated the memory that the 
>> > BlockDriverState
>> > pointed at, then we will fail the bs->drv checks in the various bdrv_ 
>> > methods.
>> 
>> "we will fail the bs->drv checks" is misleading, in my opinion.  Here's
>> what happens:
>> 
>> 1. bdrv_close(bs) zaps bs->drv, which makes any subsequent I/O get
>>    dropped.  Works as designed.
>> 
>> 2. drive_uninit() frees the bs.  Since the device is still connected to
>>    bs, any subsequent I/O is a use-after-free.
>> 
>>    The value of bs->drv becomes unpredictable on free.  As long as it
>>    remains null, I/O still gets dropped.  I/O crashes or worse once that
>>    changed.  Could be right on free, could be much later.
>> 
>> If you respin anyway, please clarify your description.
>
> Sure.  I wasn't planning a new version, but I'll update and send anyhow
> as I didn't see it get included in pull from the block branch.
>> 
>> > To resolve this issue as simply as possible, we can chose to not actually
>> > delete the BlockDriverState pointer.  Since bdrv_close() handles setting 
>> > the drv
>> > pointer to NULL, we just need to remove the BlockDriverState from the QLIST
>> > that is used to enumerate the block devices.  This is currently handled 
>> > within
>> > bdrv_delete, so move this into it's own function, bdrv_remove().
>> 
>> Why do we remove the BlockDriverState from bdrv_states?  Because we want
>> drive_del make its *name* go away.
>> 
>> Begs the question: is the code prepared for a BlockDriverState object
>> that isn't on bdrv_states?  Turns out we're in luck: bdrv_new() already
>> creates such objects when the device_name is empty.  This is used for
>> internal BlockDriverStates such as COW backing files.  Your code makes
>> device_name empty when taking the object off bdrv_states, so we're good.
>> 
>> Begs yet another question: how does the behavior of a BlockDriverState
>> change when it's taken off bdrv_states, and is that the behavior we
>> want?  Changes:
>> 
>> * bdrv_delete() no longer takes it off bdrv_states.  Good.
>> 
>> * bdrv_close_all(), bdrv_commit_all() and bdrv_flush_all() no longer
>>   cover it.  Okay, because bdrv_close(), bdrv_commit() and bdrv_flush()
>>   do nothing anyway for closed BlockDriverStates.
>> 
>> * "info block" and "info blockstats" no longer show it, because
>>   bdrv_info() and bdrv_info_stats() no longer see it.  Okay.
>> 
>> * bdrv_find(), bdrv_next(), bdrv_iterate() no longer see it.  Impact?
>>   Please check their uses and report.
>
>    1    664  block-migration.c <<block_load>>
>              bs = bdrv_find(device_name);
>
>     - no longer see it.  This is fine since we can't migrate a block
>     device that has been removed
>
>    2    562  blockdev.c <<do_commit>>
>              bs = bdrv_find(device);
>
>     - do_commit won't see it in either when calling bdrv_commit_all()
>     Fine as you mention above.  If user specifies the device name
>     we won't find it, that's OK because we can't commit data against
>     a closed BlockDriverState.
>
>    3    587  blockdev.c <<do_snapshot_blkdev>>
>              bs = bdrv_find(device);
>
>    - OK, cannot take a snapshot against a deleted BlockDriverState
>
>    4    662  blockdev.c <<do_eject>>
>              bs = bdrv_find(filename);
>
>    - OK, cannot eject a deleted BlockDriverState; 
>
>    5    676  blockdev.c <<do_block_set_passwd>>
>              bs = bdrv_find(qdict_get_str(qdict, "device"));
>
>    - OK, cannot set password a deleted BlockDriverState; 
>
>    6    701  blockdev.c <<do_change_block>>
>              bs = bdrv_find(device);
>
>    - OK, cannot change the file/device of a deleted BlockDriverState; 
>
>    7    732  blockdev.c <<do_drive_del>>
>              bs = bdrv_find(id);
>
>    - OK, cannot delete an already deleted Drive
>
>    8    783  blockdev.c <<do_block_resize>>
>              bs = bdrv_find(device);
>
>    - OK, cannot resize a deleted Drive
>
>    9    312  hw/qdev-properties.c <<parse_drive>>
>              bs = bdrv_find(str);
>
>    - Used when invoking qdev_prop_drive .parse method;  parse method is 
> invoked via
>     qdev_device_add() which calls set_property() which invokes parse.  
> AFAICT, this is OK
>     since we won't be going down the device add path worrying about a
>     deleted block device.

Thanks for checking!

>> > The result is that we can now invoke drive_del, this closes the file 
>> > descriptors
>> > and sets BlockDriverState->drv to NULL which prevents futher IO to the 
>> > device,
>> > and since we do not free BlockDriverState, we don't have to worry about 
>> > the copy
>> > retained in the block devices.
>> 
>> Yep.  But there's one more question: is the BlockDriverState freed when
>> the device using it gets destroyed?
>> 
>> qdev_free() runs prop->info->free() for all properties.  The drive
>> property's free() is free_drive():
>> 
>>   static void free_drive(DeviceState *dev, Property *prop)
>>   {
>>       BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop);
>> 
>>       if (*ptr) {
>>           bdrv_detach(*ptr, dev);
>>           blockdev_auto_del(*ptr);
>>       }
>>   }
>> 
>> This should indeed delete the drive.  But only if the property still
>> points to it.  See below.
>> 
>> > Reported-by: Marcus Armbruster <arm...@redhat.com>
>> > Signed-off-by: Ryan Harper <ry...@us.ibm.com>
>> > ---
>> > v1->v2
>> >  - NULL bs->device_name after removing from list to prevent
>> >    second removal.
>> >
>> >  block.c    |   12 +++++++++---
>> >  block.h    |    1 +
>> >  blockdev.c |    2 +-
>> >  3 files changed, 11 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/block.c b/block.c
>> > index 1544d81..0df9942 100644
>> > --- a/block.c
>> > +++ b/block.c
>> > @@ -697,14 +697,20 @@ void bdrv_close_all(void)
>> >      }
>> >  }
>> >  
>> > +void bdrv_remove(BlockDriverState *bs)
>> > +{
>> > +    if (bs->device_name[0] != '\0') {
>> > +        QTAILQ_REMOVE(&bdrv_states, bs, list);
>> > +    }
>> > +    bs->device_name[0] = '\0';
>> > +}
>> > +
>> 
>> I don't like this name.  What's the difference between "delete" and
>> "remove"?
>> 
>> The function zaps the device name.  bdrv_make_anon()?
>
> bdrv_delist? bdrv_hide?  I'm also fine with make_anon.

Any of the three works for me.  "delist" seems the least obvious, but
that's a matter of taste.

>> >  void bdrv_delete(BlockDriverState *bs)
>> >  {
>> >      assert(!bs->peer);
>> >  
>> >      /* remove from list, if necessary */
>> > -    if (bs->device_name[0] != '\0') {
>> > -        QTAILQ_REMOVE(&bdrv_states, bs, list);
>> > -    }
>> > +    bdrv_remove(bs);
>> >  
>> >      bdrv_close(bs);
>> >      if (bs->file != NULL) {
>> > diff --git a/block.h b/block.h
>> > index 5d78fc0..8447397 100644
>> > --- a/block.h
>> > +++ b/block.h
>> > @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
>> >      QEMUOptionParameter *options);
>> >  int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
>> >  BlockDriverState *bdrv_new(const char *device_name);
>> > +void bdrv_remove(BlockDriverState *bs);
>> >  void bdrv_delete(BlockDriverState *bs);
>> >  int bdrv_file_open(BlockDriverState **pbs, const char *filename, int 
>> > flags);
>> >  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>> > diff --git a/blockdev.c b/blockdev.c
>> > index 0690cc8..1f57b50 100644
>> > --- a/blockdev.c
>> > +++ b/blockdev.c
>> > @@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, 
>> > QObject **ret_data)
>> 
>> Let me add a bit more context:
>> 
>>       bdrv_flush(bs);
>>       bdrv_close(bs);
>> 
>>       /* clean up guest state from pointing to host resource by
>>        * finding and removing DeviceState "drive" property */
>>       if (bs->peer) {
>>           for (prop = bs->peer->info->props; prop && prop->name; prop++) {
>>               if (prop->info->type == PROP_TYPE_DRIVE) {
>>                   ptr = qdev_get_prop_ptr(bs->peer, prop);
>>                   if (*ptr == bs) {
>>                       bdrv_detach(bs, bs->peer);
>>                       *ptr = NULL;
>>                       break;
>>                   }
>>               }
>>           }
>> >      }
>> 
>> This zaps the drive property.  A subsequent free_drive() will do
>> nothing.  We leak the BlockDriverState on device unplug, I'm afraid.
>> 
>> Any reason why we still want to zap the drive property?
>
> IIRC, it was to prevent qdev from keeping a ptr around to the bs; but if
> we're keeping the bs around anyhow, then I don't think we need to remove
> the property.

Agree.

>                One last check would be to see what if the device still
> shows up in qtree if we don't remove the property.

I expect it to show up as "drive = " instead of "drive = <null>",
because:

    static int print_drive(DeviceState *dev, Property *prop, char *dest, size_t 
len)
    {
        BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop);
        return snprintf(dest, len, "%s",
                        *ptr ? bdrv_get_device_name(*ptr) : "<null>");
    }

and

    const char *bdrv_get_device_name(BlockDriverState *bs)
    {
        return bs->device_name;
    }

If the empty right hand side bothers you, you can change print_drive()
to print something else then, say "<anon>", or "<gone>".

Looking forward to v3 :)

Reply via email to