On Thu, Apr 13, 2017 at 09:45:49AM -0500, Eric Blake wrote:
> On 04/13/2017 09:39 AM, Stefan Hajnoczi wrote:
> > On Thu, Apr 13, 2017 at 01:45:55PM +0800, Paolo Bonzini wrote:
> >>
> >>
> >> On 13/04/2017 09:11, Jeff Cody wrote:
> >>>> It didn't make it into 2.9-rc4 because of limited time. :(
> >>>>
> >>>> Looks like there is no -rc5, we'll have to document this as a known
> >>>> issue. Users should "block-job-complete/cancel" as soon as possible to
> >>>> avoid such a hang.
> >>>
> >>> I'd argue for including a fix for 2.9, since this is both a regression,
> >>> and a hard lock without possible recovery short of restarting the QEMU
> >>> process.
> >>
> >> It is a bit of a corner case (and jobs on an I/O thread are relatively
> >> rare too), so maybe it's not worth delaying 2.9.  It has been delayed
> >> already quite a bit.  Another reason I think I prefer to wait is to ensure
> >> that we have an entry in qemu-iotests to avoid a future regression.
> > 
> > I also think this does not require delaying the release:
> > 
> > 1. It needs to be marked as a known issue in the release notes.
> > 2. Let's roll the 2.9.1 stable release within a month of 2.9.0.
> > 
> > If both conditions are met then very few end users will be exposed to
> > the problem.  I hope libvirt will create IOThreads by default soon but
> > for the time being it is not a widely used configuration.
> 
> Also, is it something that can be avoided by not doing a system_reset
> while a block job is still running? Libvirt can be taught to block reset
> while a job has still not been finished, if needs be.
>

No - if the guest initiates a reboot itself, we still end up deadlocked.

-Jeff

