I tried bisecting as well, and I wound up at:

1a423896 -- five out of five boot attempts succeeded.
d759c951 -- five out of five boot attempts failed.


d759c951f3287fad04210a52f2dc93f94cf58c7f is the first bad commit
commit d759c951f3287fad04210a52f2dc93f94cf58c7f
Author: Alex Bennée <alex.ben...@linaro.org>
Date:   Tue Feb 27 12:52:48 2018 +0300

    replay: push replay_mutex_lock up the call tree



My methodology was to boot QEMU like this:

./x86_64-softmmu/qemu-system-x86_64 -m 4096 -cpu host -M q35 -enable-kvm \
    -smp 4 -drive id=sda,if=none,file=/home/bos/jhuston/windows_10.qcow \
    -device ide-hd,drive=sda -qmp tcp::4444,server,nowait

and run it three times with -snapshot to see whether it hung during
boot; if it did, I marked the commit bad. If it did not, I booted and
attempted to log in and run CrystalDiskMark (CDM). If it froze before I
even launched CDM, I marked it bad.

Interestingly enough, on a subsequent (presumably bad) commit (6dc0f529)
which hangs fairly reliably on bootup (66%) I can occasionally get into
Windows 10 and run CDM -- and that unfortunately does not seem to
trigger the error again, so CDM doesn't look like a reliable way to
trigger the hangs.
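
As a rough sanity check on how far a handful of boot attempts can be
trusted, here is a small sketch. It assumes each boot hangs independently
with a fixed probability, which may not hold in practice, and it borrows
the 66% figure above purely as an illustrative rate. The function name is
made up for this example.

```python
def false_good_probability(hang_rate: float, attempts: int) -> float:
    """Chance that a commit which hangs with probability `hang_rate` on
    each boot gets through `attempts` independent boots without hanging,
    and so would wrongly be marked good."""
    if not 0.0 <= hang_rate <= 1.0:
        raise ValueError("hang_rate must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return (1.0 - hang_rate) ** attempts


if __name__ == "__main__":
    # 0.66 is only the illustrative rate mentioned for 6dc0f529.
    for attempts in (3, 5):
        p = false_good_probability(0.66, attempts)
        print(f"{attempts} clean boots in a row: {p:.4f}")
```

Under those assumptions, three clean boots in a row would happen by
chance about 4% of the time and five in a row about 0.5% of the time, so
a commit with a much lower hang rate would need more attempts per step.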



Anyway, d759c951 definitely appears to change the odds of AHCI locking up
during boot for me, and I suppose it might have something to do with how
it changes the BQL acquisition/release in main-loop.c, but I am not sure
why or how yet.

Before this patch, we only unlock the iothread and re-lock it if there
was a timeout, and after this patch we *always* unlock and re-lock the
iothread. This is probably just exposing some latent bug in the AHCI
emulator that has always existed, but now the odds of seeing it are much
higher.
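
To make the difference concrete, here is a toy model of the two locking
patterns as I read them. This is not QEMU code: CountingLock,
wait_before_patch, wait_after_patch and count_release_windows are all
hypothetical names, and the sketch assumes a timeout of 0 stands for a
non-blocking poll. It only counts how many times the main loop gives
other threads a chance to take the lock.

```python
import threading


class CountingLock:
    """Stand-in for the BQL that counts how often it is released."""

    def __init__(self):
        self._lock = threading.Lock()
        self.releases = 0

    def acquire(self):
        self._lock.acquire()

    def release(self):
        self.releases += 1
        self._lock.release()


def wait_before_patch(bql, timeout, poll):
    # Assumed old pattern: drop the lock only when the poll may block.
    if timeout:
        bql.release()
    poll(timeout)
    if timeout:
        bql.acquire()


def wait_after_patch(bql, timeout, poll):
    # Assumed new pattern: always drop the lock around the poll.
    bql.release()
    poll(timeout)
    bql.acquire()


def count_release_windows(wait_fn, timeouts):
    """Run one simulated main-loop iteration per timeout value and return
    how many times the lock was dropped around a poll."""
    bql = CountingLock()
    bql.acquire()  # the main loop holds the lock on entry
    for timeout in timeouts:
        wait_fn(bql, timeout, lambda _timeout: None)  # no-op poll
    windows = bql.releases
    bql.release()  # final cleanup, not counted
    return windows
```

With timeouts of [0, 0, 10, 0], the old pattern opens one window for
other threads and the new pattern opens four. More windows means more
chances for a pre-existing race to fire, which would fit the "latent bug
now exposed" theory without proving it.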

I'll have to dig into what the race is -- I'm not sure just yet.


If those of you who are seeing this bug too could confirm for me that
d759c951 appears to be the guilty party, that probably wouldn't hurt.

Thanks!
--js

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1769189

Title:
  Issue with qemu 2.12.0 + SATA

Status in QEMU:
  New

Bug description:
  [EDIT: I first thought that OVMF was the issue, but it turns out to be
  SATA]

  I had a Windows 10 VM running perfectly fine with a SATA drive. Since
  I upgraded to qemu 2.12, the guest hangs for a couple of minutes,
  works for a few seconds, and hangs again, etc. By "hang" I mean it
  doesn't freeze, but it looks like it's waiting on IO or something: I
  can move the mouse, but everything needing disk access is unresponsive.

  What doesn't work: qemu 2.12 with SATA
  What works: using VirtIO-SCSI with qemu 2.12, or downgrading qemu to
  2.11.1 and continuing to use SATA.

  Platform is Arch Linux 4.16.7 on Skylake and Haswell. I have attached
  the VM XML file.
