Am 12.05.2020 um 12:57 hat Kevin Wolf geschrieben:
> Am 11.05.2020 um 18:50 hat Stefan Reiter geschrieben:
> > Just because we're in a coroutine doesn't imply ownership of the context
> > of the flushed drive. In such a case use the slow path which explicitly
> > enters bdrv_flush_co_entry in the correct AioContext.
> > 
> > Signed-off-by: Stefan Reiter <s.rei...@proxmox.com>
> > ---
> > 
> > We've experienced some lockups in this codepath when taking snapshots of VMs
> > with drives that have IO-Threads enabled (we have an async 'savevm'
> > implementation running from a coroutine).
> > 
> > Currently no reproducer for upstream versions I could find, but in testing 
> > this
> > patch fixes all issues we're seeing and I think the logic checks out.
> > 
> > The fast path pattern is repeated a few times in this file, so if this 
> > change
> > makes sense, it's probably worth evaluating the other occurences as well.
> 
> What do you mean by "owning" the context? If it's about taking the
> AioContext lock, isn't the problem more with calling bdrv_flush() from
> code that doesn't take the locks?
> 
> Though I think we have some code that doesn't only rely on holding the
> AioContext locks, but that actually depends on running in the right
> thread, so the change looks right anyway.

Well, the idea is right, but the change itself isn't, of course. If
we're already in coroutine context, we must not busy wait with
BDRV_POLL_WHILE(). I'll see if I can put something together after lunch.

Kevin


Reply via email to