* Markus Armbruster <arm...@redhat.com> [2010-11-11 04:48]: > Ryan Harper <ry...@us.ibm.com> writes: > > > * Markus Armbruster <arm...@redhat.com> [2010-11-10 11:40]: > >> Ryan Harper <ry...@us.ibm.com> writes: > >> > >> > * Markus Armbruster <arm...@redhat.com> [2010-11-10 06:48]: > >> >> One real question, and a couple of nits. > >> >> > >> >> Ryan Harper <ry...@us.ibm.com> writes: > >> >> > >> >> > Block hot unplug is racy since the guest is required to acknowlege > >> >> > the ACPI > >> >> > unplug event; this may not happen synchronously with the device > >> >> > removal command > >> >> > >> >> Well, I wouldn't call unplug "racy". It just takes an unpredictable > >> >> length of time, possibly forever. To make a race, you need to throw in > >> >> a client assuming (incorrectly) that unplug is instantaneous, as > >> >> described in your next paragraph. > >> >> > >> >> Moreover, all PCI unplug is that way, not just block. > >> >> > >> >> > This series aims to close a gap where by mgmt applications that > >> >> > assume the > >> >> > block resource has been removed without confirming that the guest has > >> >> > acknowledged the removal may re-assign the underlying device to a > >> >> > second guest > >> >> > leading to data leakage. > >> >> > >> >> Yes, the incorrect assumption is a problem. But with that fixed (in the > >> >> management application), we run right into the next problem: there is no > >> >> way for the management application to reliably disconnect the guest from > >> >> a block device. And that's the problem you're fixing. > >> > > >> > Yeah, that's the right way to word it; providing a method to forcibly > >> > disconnect the guest from the host device. > >> >> > >> >> > This series introduces a new montor command to decouple asynchornous > >> >> > device > >> >> > >> >> Typos "montor" and "asynchornous". You might want to use a spell > >> >> checker :) > >> >> > >> >> Lines are a bit long. Recommend wrap at column 70. > >> >> > >> >> > removal from restricting guest access to a block device. We do this > >> >> > by creating > >> >> > a new monitor command drive_del which maps to a bdrv_unplug() command > >> >> > which > >> >> > does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, > >> >> > subsequent > >> >> > IO is rejected from the device and the guest will get IO errors but > >> >> > continue to > >> >> > function. In addition to preventing further IO, we clean up state > >> >> > pointers > >> >> > between host (BlockDriverState) and guest (DeviceInfo). > >> >> > > >> >> > A subsequent device removal command can be issued to remove the > >> >> > device, to which > >> >> > the guest may or maynot respond, but as long as the unplugged bit is > >> >> > set, no IO > >> >> > >> >> "maynot" is not a word. > >> >> > >> >> > will be sumbitted. > >> >> > >> >> This suggests to drive_del before device_del, which makes the device > >> >> goes through a "broken device" state on its way to unplug. If the guest > >> >> accesses the device in that state, it gets I/O errors. Not nice. > >> >> > >> >> Instead, I'd recommend device_del, wait for the device to go away, > >> >> drive_del on time out. If the guest reacts to the ACPI unplug promptly, > >> >> it's never exposed to the "broken device" state. Note: if the drive_del > >> >> fails because the device doesn't exist, we lost the race with the > >> >> automatic destruction, which is harmless. Ignore that error. > >> > > >> > Honestly, other than describing what happens if you sever the connection > >> > when the guest isn't aware of it; I don't want to try to capture how the > >> > mgmt layer implements the removal. > >> > > >> > One may want to force the disconnect before attempting to remove the > >> > device; or the other way around; that's really the mgmt layer's call. > >> > >> Fair enough. > >> > >> >> > Signed-off-by: Ryan Harper <ry...@us.ibm.com> > >> >> > --- > >> >> > block.c | 7 +++++++ > >> >> > block.h | 1 + > >> >> > blockdev.c | 36 ++++++++++++++++++++++++++++++++++++ > >> >> > blockdev.h | 1 + > >> >> > hmp-commands.hx | 18 ++++++++++++++++++ > >> >> > 5 files changed, 63 insertions(+), 0 deletions(-) > >> >> > > >> >> > diff --git a/block.c b/block.c > >> >> > index 6b505fb..c76a796 100644 > >> >> > --- a/block.c > >> >> > +++ b/block.c > >> >> > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, > >> >> > int removable) > >> >> > } > >> >> > } > >> >> > > >> >> > +void bdrv_unplug(BlockDriverState *bs) > >> >> > +{ > >> >> > + qemu_aio_flush(); > >> >> > + bdrv_flush(bs); > >> >> > + bdrv_close(bs); > >> >> > +} > >> >> > + > >> >> > >> >> Unless we expect more users, I'd inline this into its only caller. > >> >> Matter of taste. > >> > > >> > Works for me. > >> > > >> >> > >> >> > int bdrv_is_removable(BlockDriverState *bs) > >> >> > { > >> >> > return bs->removable; > >> >> > diff --git a/block.h b/block.h > >> >> > index 78ecfac..581414c 100644 > >> >> > --- a/block.h > >> >> > +++ b/block.h > >> >> > @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, > >> >> > BlockErrorAction on_read_error, > >> >> > BlockErrorAction on_write_error); > >> >> > BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int > >> >> > is_read); > >> >> > void bdrv_set_removable(BlockDriverState *bs, int removable); > >> >> > +void bdrv_unplug(BlockDriverState *bs); > >> >> > int bdrv_is_removable(BlockDriverState *bs); > >> >> > int bdrv_is_read_only(BlockDriverState *bs); > >> >> > int bdrv_is_sg(BlockDriverState *bs); > >> >> > diff --git a/blockdev.c b/blockdev.c > >> >> > index 6cb179a..ee8c2ec 100644 > >> >> > --- a/blockdev.c > >> >> > +++ b/blockdev.c > >> >> > @@ -14,6 +14,8 @@ > >> >> > #include "qemu-option.h" > >> >> > #include "qemu-config.h" > >> >> > #include "sysemu.h" > >> >> > +#include "hw/qdev.h" > >> >> > +#include "block_int.h" > >> >> > > >> >> > static QTAILQ_HEAD(drivelist, DriveInfo) drives = > >> >> > QTAILQ_HEAD_INITIALIZER(drives); > >> >> > > >> >> > @@ -597,3 +599,37 @@ int do_change_block(Monitor *mon, const char > >> >> > *device, > >> >> > } > >> >> > return monitor_read_bdrv_key_start(mon, bs, NULL, NULL); > >> >> > } > >> >> > + > >> >> > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject > >> >> > **ret_data) > >> >> > +{ > >> >> > + const char *id = qdict_get_str(qdict, "id"); > >> >> > + BlockDriverState *bs; > >> >> > + Property *prop; > >> >> > + > >> >> > + bs = bdrv_find(id); > >> >> > + if (!bs) { > >> >> > + qerror_report(QERR_DEVICE_NOT_FOUND, id); > >> >> > + return -1; > >> >> > + } > >> >> > + > >> >> > + /* quiesce block driver; prevent further io */ > >> >> > + bdrv_unplug(bs); > >> >> > + > >> >> > + /* clean up guest state from pointing to host resource by > >> >> > + * finding and removing DeviceState "drive" property */ > >> >> > + for (prop = bs->peer->info->props; prop && prop->name; prop++) { > >> >> > + if ((prop->info->type == PROP_TYPE_DRIVE) && > >> >> > + (*(BlockDriverState **)qdev_get_prop_ptr(bs->peer, prop) > >> >> > == bs)) { > >> >> > + if (prop->info->free) { > >> >> > + prop->info->free(bs->peer, prop); > >> >> > + } > >> > >> Your use of prop->info->free() in this context is wrong. More below. > >> > >> >> > >> >> Does this null the drive property? I doubt it. Quick check in the > >> >> debugger? > >> >> > >> >> The free callbacks generally don't zap the properties, because they run > >> >> from qdev_free(). > >> > > >> > To be honest; I didn't see anything that looked like "remove this > >> > property" in the qdev api. Any pointers? > >> > >> The closest we have is indeed the Property method free(), but that's not > >> quite right. It's really only for use by qdev_free(). > >> > >> > should I be calling qdev_free() on the dev? > >> > >> No, because then the whole device is gone, not just the property :) > >> > >> > I don't quite understand > >> > the distinction between the info list of properties and the device > >> > itself, nor specifically what we need to remove in the drive_del() > >> > operation versus the device_del() portion. > >> > >> device_del / qdev_free() destroy a qdev, such as a "virtio-blk-pci" > >> device (C type VirtIOPCIProxy). > >> > >> drive_del destroys something else, namely the block device host part > >> (BlockDriverState + DeviceInfo). Obviously, it needs to zap all > >> pointers to the host part along with it. Specifically, it needs to zap > >> the device's pointer to it. > >> > >> Example: if a "virtio-blk-pci" device is using drive "foo", then > >> "drive_del foo" needs to zap its member block.bs. > >> > >> Complication: we don't (want to) know what kind of device exactly is > >> using the drive. But we do know that a drive property must be > >> describing it. > >> > >> So we search the properties (for (prop...)) for a drive property > >> (prop->info->type == PROP_TYPE_DRIVE) that points to this drive (... == > >> bs). > >> > >> Result: > >> > >> BlockDriverState *bs; > >> Property *prop; > >> BlockDriverState **ptr; > >> [...] > >> for (prop = bs->peer->info->props; prop && prop->name; prop++) { > >> if ((prop->info->type == PROP_TYPE_DRIVE)) { > >> ptr = qdev_get_prop_ptr(dev, prop); > >> if (*ptr == bs) { > >> bdrv_detach(bs, bs->peer); > > > > Invoking the free method on the drive property does do detach: > > > > free_drive > > { > > BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop); > > > > if (*ptr) { > > bdrv_detach(*ptr, dev); > > blockdev_auto_del(*ptr); > > } > > } > > > > and the bdrv_delete() > > > > takes out the bs pointer. > > Which pointer? Which bdrv_delete()?
I suppose it's the BlockDriverState returned from bdrv_find() since I'm invoking bdrv_delete(bs); And I suppose qdev_get_prop_ptr() is returning a different ptr to the same bs; in which case we'll still need the null you had suggested? > > >> Only then are we ready to destroy the host part: > >> > >> drive_uninit(drive_get_by_blockdev(bs)); > > > > And if auto-deletion it set, then it handles the drive_uninit(). Do you > > think > > we should explicitly invoke drive_uninit() ? > > Actually, blockdev_auto_del() deletes the block device only if DriveInfo > has auto_del set. Why is that? Quote blockdev.c: > > /* > * We automatically delete the drive when a device using it gets > * unplugged. Questionable feature, but we can't just drop it. > * Device models call blockdev_mark_auto_del() to schedule the > * automatic deletion, and generic qdev code calls blockdev_auto_del() > * when deletion is actually safe. > */ > > Thus, you need to blockdev_mark_auto_del() before blockdev_auto_del(). > > However, my blockdev_add work-in-progress changes these two functions to > *only* delete block devices created the old way (-drive, drive_add). > You want them deleted regardless of how they were created. That's why I > asked you to use drive_uninit() directly. Gotcha. > > You could argue that Property method free() *should* work here. Fair > point. If you want to clean that up, you're quite welcome. But I don't > want to burden your fix with that, so feel free to add a suitable > comment instead. -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com