On Mon, Aug 8, 2011 at 4:16 PM, Kevin Wolf <kw...@redhat.com> wrote: > Am 08.08.2011 16:49, schrieb Stefan Hajnoczi: >> On Fri, Aug 5, 2011 at 10:48 AM, Kevin Wolf <kw...@redhat.com> wrote: >>> Am 05.08.2011 11:29, schrieb Stefan Hajnoczi: >>>> On Fri, Aug 5, 2011 at 10:07 AM, Kevin Wolf <kw...@redhat.com> wrote: >>>>> Am 05.08.2011 10:40, schrieb Stefan Hajnoczi: >>>>>> We've discussed safe methods for reopening image files (e.g. useful for >>>>>> changing the hostcache parameter). The problem is that closing the file >>>>>> first >>>>>> and then opening it again exposes us to the error case where the open >>>>>> fails. >>>>>> At that point we cannot get to the file anymore and our options are to >>>>>> terminate QEMU, pause the VM, or offline the block device. >>>>>> >>>>>> This window of vulnerability can be eliminated by keeping the file >>>>>> descriptor >>>>>> around and falling back to it should the open fail. >>>>>> >>>>>> The challenge for the file descriptor approach is that image formats, >>>>>> like >>>>>> VMDK, can span multiple files. Therefore the solution is not as simple >>>>>> as >>>>>> stashing a single file descriptor and reopening from it. >>>>> >>>>> So far I agree. The rest I believe is wrong because you can't assume >>>>> that every backend uses file descriptors. The qemu block layer is based >>>>> on BlockDriverStates, not fds. They are a concept that should be hidden >>>>> in raw-posix. >>>>> >>>>> I think something like this could do: >>>>> >>>>> struct BDRVReopenState { >>>>> BlockDriverState *bs; >>>>> /* can be extended by block drivers */ >>>>> }; >>>>> >>>>> .bdrv_reopen(BlockDriverState *bs, BDRVReopenState **reopen_state, int >>>>> flags); >>>>> .bdrv_reopen_commit(BDRVReopenState *reopen_state); >>>>> .bdrv_reopen_abort(BDRVReopenState *reopen_state); >>>>> >>>>> raw-posix would store the old file descriptor in its reopen_state. On >>>>> commit, it closes the old descriptors, on abort it reverts to the old >>>>> one and closes the newly opened one. >>>>> >>>>> Makes things a bit more complicated than the simple bdrv_reopen I had in >>>>> mind before, but it allows VMDK to get an all-or-nothing semantics. >>>> >>>> Can you show how bdrv_reopen() would use these new interfaces? I'm >>>> not 100% clear on the idea. >>> >>> Well, you wouldn't only call bdrv_reopen, but also either >>> bdrv_reopen_commit/abort (for the top-level caller we can have a wrapper >>> function that does both, but that's syntactic sugar). >>> >>> For example we would have: >>> >>> int vmdk_reopen() >> >> .bdrv_reopen() is a confusing name for this operation because it does >> not reopen anything. bdrv_prepare_reopen() might be clearer. > > Makes sense. > >> >>> { >>> *((VMDKReopenState**) rs) = malloc(); >>> >>> foreach (extent in s->extents) { >>> ret = bdrv_reopen(extent->file, &extent->reopen_state) >>> if (ret < 0) >>> goto fail; >>> } >>> return 0; >>> >>> fail: >>> foreach (extent in rs->already_reopened) { >>> bdrv_reopen_abort(extent->reopen_state); >>> } >>> return ret; >>> } >> >>> void vmdk_reopen_commit() >>> { >>> foreach (extent in s->extents) { >>> bdrv_reopen_commit(extent->reopen_state); >>> } >>> free(rs); >>> } >>> >>> void vmdk_reopen_abort() >>> { >>> foreach (extent in s->extents) { >>> bdrv_reopen_abort(extent->reopen_state); >>> } >>> free(rs); >>> } >> >> Does the caller invoke bdrv_close(bs) after bdrv_prepare_reopen(bs, >> &rs)? > > No. Closing the old backend would be part of bdrv_reopen_commit. > > Do you have a use case where it would be helpful if the caller invoked > bdrv_close?
When the caller does bdrv_close() two BlockDriverStates are never open for the same image file. I thought this was a property we wanted. Also, in the block_set_hostcache case we need to reopen without switching to a new BlockDriverState instance. That means the reopen needs to be in-place with respect to the BlockDriverState *bs pointer. We cannot create a new instance. Stefan