On 12.05.2020 at 12:57, Kevin Wolf wrote:
> On 11.05.2020 at 18:50, Stefan Reiter wrote:
> > Just because we're in a coroutine doesn't imply ownership of the context
> > of the flushed drive. In such a case use the slow path which explicitly
> > enters bdrv_flush_co_entry in the correct AioContext.
> >
> > Signed-off-by: Stefan Reiter <s.rei...@proxmox.com>
> > ---
> >
> > We've experienced some lockups in this codepath when taking snapshots of VMs
> > with drives that have IO-Threads enabled (we have an async 'savevm'
> > implementation running from a coroutine).
> >
> > Currently no reproducer for upstream versions I could find, but in testing
> > this patch fixes all issues we're seeing and I think the logic checks out.
> >
> > The fast path pattern is repeated a few times in this file, so if this
> > change makes sense, it's probably worth evaluating the other occurrences
> > as well.
>
> What do you mean by "owning" the context? If it's about taking the
> AioContext lock, isn't the problem more with calling bdrv_flush() from
> code that doesn't take the locks?
>
> Though I think we have some code that doesn't only rely on holding the
> AioContext locks, but that actually depends on running in the right
> thread, so the change looks right anyway.

Well, the idea is right, but the change itself isn't, of course. If
we're already in coroutine context, we must not busy wait with
BDRV_POLL_WHILE(). I'll see if I can put something together after lunch.

Kevin
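
To make the three cases in this thread concrete, here is a minimal toy model in C. It is not QEMU code: ToyAioContext, ToyFlushStrategy and toy_pick_flush_strategy() are hypothetical names made up for illustration, and the "reschedule and yield" branch is only an assumption about what a non-polling alternative could look like. The thread itself does not say what the eventual fix will be.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an AioContext; not a QEMU type. */
typedef struct {
    int id;
} ToyAioContext;

/* The three ways a flush request could be handled in this toy model. */
typedef enum {
    FLUSH_DIRECT,           /* fast path: run the flush body in place */
    FLUSH_RESCHEDULE_YIELD, /* assumed: hand off to the drive's context, yield */
    FLUSH_SPAWN_AND_POLL    /* slow path: spawn a coroutine, poll until done */
} ToyFlushStrategy;

/*
 * Pick a strategy from the caller's situation.
 * - Outside a coroutine, busy waiting (polling) is acceptable.
 * - Inside a coroutine that already runs in the drive's context,
 *   the fast path is fine.
 * - Inside a coroutine in a different context, neither the fast path
 *   nor polling is appropriate, so the caller has to yield instead.
 */
static ToyFlushStrategy toy_pick_flush_strategy(bool in_coroutine,
                                                const ToyAioContext *current,
                                                const ToyAioContext *drive)
{
    if (!in_coroutine) {
        return FLUSH_SPAWN_AND_POLL;
    }
    if (current == drive) {
        return FLUSH_DIRECT;
    }
    return FLUSH_RESCHEDULE_YIELD;
}
```

The point of the third branch is only that "in a coroutine" and "in the right context" are separate questions; how the hand-off is actually done is left open here.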