On 16.09.2014 at 14:35, Paolo Bonzini wrote:
> On 16/09/2014 14:34, Kevin Wolf wrote:
> > I think bdrv_invalidate_cache() really needs to call bdrv_drain_all()
> > before starting to reopen stuff. There could be requests in flight
> > without holding the lock and if you can indeed reopen their BDS under
> > their feet without breaking things (I doubt it), that would be pure
> > luck.
>
> But even that's not enough without a lock if .bdrv_invalidate_cache (the
> callback) is called from a coroutine. As soon as it yields, another
> request can come in, for example from the NBD server.

Yes, that's true. We can't fix this problem in qcow2, though, because
it's a more general one. I think we must make sure that
bdrv_invalidate_cache() doesn't yield.

We can do that either by forbidding bdrv_invalidate_cache() from running
in a coroutine, which moves the problem to the caller (where and why is
it even called from a coroutine?), or possibly by creating a new
coroutine for the driver callback and running that in a nested event
loop that only handles bdrv_invalidate_cache() callbacks, so that the
NBD server doesn't get a chance to process new requests in this thread.

Forbidding it from running in a coroutine sounds easier, but I don't yet
see which caller would have to be fixed.

Kevin
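
P.S. To make the "forbid it in a coroutine" option concrete, here is a
rough, standalone sketch. It is a toy model, not QEMU code: ToyBDS,
toy_in_coroutine and toy_invalidate_cache() are hypothetical names made
up for illustration. In the real tree the check would presumably be
built on qemu_in_coroutine(), and whether it should be an assertion or
an error return is an open question.

```c
#include <stdbool.h>

/* Hypothetical stand-in for "are we currently inside a coroutine?".
 * In QEMU this would presumably be qemu_in_coroutine(). */
static bool toy_in_coroutine = false;

/* Hypothetical, heavily reduced stand-in for a BlockDriverState. */
typedef struct ToyBDS {
    const char *name;
    int invalidate_count;   /* how often the cache was invalidated */
} ToyBDS;

/*
 * Refuse to run in coroutine context, so the function can never yield
 * and let another request (e.g. from the NBD server) slip in halfway.
 *
 * Returns 0 on success, -1 if called from coroutine context.
 */
static int toy_invalidate_cache(ToyBDS *bs)
{
    if (toy_in_coroutine) {
        /* The problem is pushed to the caller, who must arrange to
         * call us from outside any coroutine. */
        return -1;
    }

    /* The actual invalidate/reopen work would go here; it runs to
     * completion without yielding. */
    bs->invalidate_count++;
    return 0;
}
```

With toy_in_coroutine cleared, the call succeeds and bumps the counter;
with it set, the call is rejected and the state is left untouched. This
only shows where the guard would sit. It says nothing about which
existing caller would trip it.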