Whoops, almost missed this. Best to cc: me to avoid that. Ryan Harper <ry...@us.ibm.com> writes:
> * Markus Armbruster <arm...@redhat.com> [2011-03-15 04:48]: >> Sorry for the long delay, I was out of action for a week. >> >> Ryan Harper <ry...@us.ibm.com> writes: >> >> > When removing a drive from the host-side via drive_del we currently have >> > the >> > following path: >> > >> > drive_del >> > qemu_aio_flush() >> > bdrv_close() >> > drive_uninit() >> > bdrv_delete() >> > >> > When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer >> > however, the block devices retain a copy of this pointer, see >> > hw/virtio-blk.c:virtio_blk_init() where we s->bs = conf->bs. >> > >> > We now have a use-after-free situation. If the guest continues to issue IO >> > against the device, and we've reallocated the memory that the >> > BlockDriverState >> > pointed at, then we will fail the bs->drv checks in the various bdrv_ >> > methods. >> >> "we will fail the bs->drv checks" is misleading, in my opinion. Here's >> what happens: >> >> 1. bdrv_close(bs) zaps bs->drv, which makes any subsequent I/O get >> dropped. Works as designed. >> >> 2. drive_uninit() frees the bs. Since the device is still connected to >> bs, any subsequent I/O is a use-after-free. >> >> The value of bs->drv becomes unpredictable on free. As long as it >> remains null, I/O still gets dropped. I/O crashes or worse once that >> changed. Could be right on free, could be much later. >> >> If you respin anyway, please clarify your description. > > Sure. I wasn't planning a new version, but I'll update and send anyhow > as I didn't see it get included in pull from the block branch. >> >> > To resolve this issue as simply as possible, we can chose to not actually >> > delete the BlockDriverState pointer. Since bdrv_close() handles setting >> > the drv >> > pointer to NULL, we just need to remove the BlockDriverState from the QLIST >> > that is used to enumerate the block devices. This is currently handled >> > within >> > bdrv_delete, so move this into it's own function, bdrv_remove(). >> >> Why do we remove the BlockDriverState from bdrv_states? Because we want >> drive_del make its *name* go away. >> >> Begs the question: is the code prepared for a BlockDriverState object >> that isn't on bdrv_states? Turns out we're in luck: bdrv_new() already >> creates such objects when the device_name is empty. This is used for >> internal BlockDriverStates such as COW backing files. Your code makes >> device_name empty when taking the object off bdrv_states, so we're good. >> >> Begs yet another question: how does the behavior of a BlockDriverState >> change when it's taken off bdrv_states, and is that the behavior we >> want? Changes: >> >> * bdrv_delete() no longer takes it off bdrv_states. Good. >> >> * bdrv_close_all(), bdrv_commit_all() and bdrv_flush_all() no longer >> cover it. Okay, because bdrv_close(), bdrv_commit() and bdrv_flush() >> do nothing anyway for closed BlockDriverStates. >> >> * "info block" and "info blockstats" no longer show it, because >> bdrv_info() and bdrv_info_stats() no longer see it. Okay. >> >> * bdrv_find(), bdrv_next(), bdrv_iterate() no longer see it. Impact? >> Please check their uses and report. > > 1 664 block-migration.c <<block_load>> > bs = bdrv_find(device_name); > > - no longer see it. This is fine since we can't migrate a block > device that has been removed > > 2 562 blockdev.c <<do_commit>> > bs = bdrv_find(device); > > - do_commit won't see it in either when calling bdrv_commit_all() > Fine as you mention above. If user specifies the device name > we won't find it, that's OK because we can't commit data against > a closed BlockDriverState. > > 3 587 blockdev.c <<do_snapshot_blkdev>> > bs = bdrv_find(device); > > - OK, cannot take a snapshot against a deleted BlockDriverState > > 4 662 blockdev.c <<do_eject>> > bs = bdrv_find(filename); > > - OK, cannot eject a deleted BlockDriverState; > > 5 676 blockdev.c <<do_block_set_passwd>> > bs = bdrv_find(qdict_get_str(qdict, "device")); > > - OK, cannot set password a deleted BlockDriverState; > > 6 701 blockdev.c <<do_change_block>> > bs = bdrv_find(device); > > - OK, cannot change the file/device of a deleted BlockDriverState; > > 7 732 blockdev.c <<do_drive_del>> > bs = bdrv_find(id); > > - OK, cannot delete an already deleted Drive > > 8 783 blockdev.c <<do_block_resize>> > bs = bdrv_find(device); > > - OK, cannot resize a deleted Drive > > 9 312 hw/qdev-properties.c <<parse_drive>> > bs = bdrv_find(str); > > - Used when invoking qdev_prop_drive .parse method; parse method is > invoked via > qdev_device_add() which calls set_property() which invokes parse. > AFAICT, this is OK > since we won't be going down the device add path worrying about a > deleted block device. Thanks for checking! >> > The result is that we can now invoke drive_del, this closes the file >> > descriptors >> > and sets BlockDriverState->drv to NULL which prevents futher IO to the >> > device, >> > and since we do not free BlockDriverState, we don't have to worry about >> > the copy >> > retained in the block devices. >> >> Yep. But there's one more question: is the BlockDriverState freed when >> the device using it gets destroyed? >> >> qdev_free() runs prop->info->free() for all properties. The drive >> property's free() is free_drive(): >> >> static void free_drive(DeviceState *dev, Property *prop) >> { >> BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop); >> >> if (*ptr) { >> bdrv_detach(*ptr, dev); >> blockdev_auto_del(*ptr); >> } >> } >> >> This should indeed delete the drive. But only if the property still >> points to it. See below. >> >> > Reported-by: Marcus Armbruster <arm...@redhat.com> >> > Signed-off-by: Ryan Harper <ry...@us.ibm.com> >> > --- >> > v1->v2 >> > - NULL bs->device_name after removing from list to prevent >> > second removal. >> > >> > block.c | 12 +++++++++--- >> > block.h | 1 + >> > blockdev.c | 2 +- >> > 3 files changed, 11 insertions(+), 4 deletions(-) >> > >> > diff --git a/block.c b/block.c >> > index 1544d81..0df9942 100644 >> > --- a/block.c >> > +++ b/block.c >> > @@ -697,14 +697,20 @@ void bdrv_close_all(void) >> > } >> > } >> > >> > +void bdrv_remove(BlockDriverState *bs) >> > +{ >> > + if (bs->device_name[0] != '\0') { >> > + QTAILQ_REMOVE(&bdrv_states, bs, list); >> > + } >> > + bs->device_name[0] = '\0'; >> > +} >> > + >> >> I don't like this name. What's the difference between "delete" and >> "remove"? >> >> The function zaps the device name. bdrv_make_anon()? > > bdrv_delist? bdrv_hide? I'm also fine with make_anon. Any of the three works for me. "delist" seems the least obvious, but that's a matter of taste. >> > void bdrv_delete(BlockDriverState *bs) >> > { >> > assert(!bs->peer); >> > >> > /* remove from list, if necessary */ >> > - if (bs->device_name[0] != '\0') { >> > - QTAILQ_REMOVE(&bdrv_states, bs, list); >> > - } >> > + bdrv_remove(bs); >> > >> > bdrv_close(bs); >> > if (bs->file != NULL) { >> > diff --git a/block.h b/block.h >> > index 5d78fc0..8447397 100644 >> > --- a/block.h >> > +++ b/block.h >> > @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename, >> > QEMUOptionParameter *options); >> > int bdrv_create_file(const char* filename, QEMUOptionParameter *options); >> > BlockDriverState *bdrv_new(const char *device_name); >> > +void bdrv_remove(BlockDriverState *bs); >> > void bdrv_delete(BlockDriverState *bs); >> > int bdrv_file_open(BlockDriverState **pbs, const char *filename, int >> > flags); >> > int bdrv_open(BlockDriverState *bs, const char *filename, int flags, >> > diff --git a/blockdev.c b/blockdev.c >> > index 0690cc8..1f57b50 100644 >> > --- a/blockdev.c >> > +++ b/blockdev.c >> > @@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, >> > QObject **ret_data) >> >> Let me add a bit more context: >> >> bdrv_flush(bs); >> bdrv_close(bs); >> >> /* clean up guest state from pointing to host resource by >> * finding and removing DeviceState "drive" property */ >> if (bs->peer) { >> for (prop = bs->peer->info->props; prop && prop->name; prop++) { >> if (prop->info->type == PROP_TYPE_DRIVE) { >> ptr = qdev_get_prop_ptr(bs->peer, prop); >> if (*ptr == bs) { >> bdrv_detach(bs, bs->peer); >> *ptr = NULL; >> break; >> } >> } >> } >> > } >> >> This zaps the drive property. A subsequent free_drive() will do >> nothing. We leak the BlockDriverState on device unplug, I'm afraid. >> >> Any reason why we still want to zap the drive property? > > IIRC, it was to prevent qdev from keeping a ptr around to the bs; but if > we're keeping the bs around anyhow, then I don't think we need to remove > the property. Agree. > One last check would be to see what if the device still > shows up in qtree if we don't remove the property. I expect it to show up as "drive = " instead of "drive = <null>", because: static int print_drive(DeviceState *dev, Property *prop, char *dest, size_t len) { BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop); return snprintf(dest, len, "%s", *ptr ? bdrv_get_device_name(*ptr) : "<null>"); } and const char *bdrv_get_device_name(BlockDriverState *bs) { return bs->device_name; } If the empty right hand side bothers you, you can change print_drive() to print something else then, say "<anon>", or "<gone>". Looking forward to v3 :)