On Thu, Apr 09, 2015 at 11:21:10AM +0200, Paolo Bonzini wrote:
> There are two problems with memory barriers in async.c. The fix is
> to use atomic_xchg in order to achieve sequential consistency between
> the scheduling of a bottom half and the corresponding execution.
>
> First, if bh->scheduled is already 1 in qemu_bh_schedule, QEMU does
> not execute a memory barrier to order any writes needed by the callback
> before the read of bh->scheduled. If the other side sees req->state as
> THREAD_ACTIVE, the callback is not invoked and you get deadlock.
>
> Second, the memory barrier in aio_bh_poll is too weak. Without this
> patch, it is possible that bh->scheduled = 0 is not "published" until
> after the callback has returned. Another thread wants to schedule the
> bottom half, but it sees bh->scheduled = 1 and does nothing. This causes
> a lost wakeup. The memory barrier should have been changed to smp_mb()
> in commit 924fe12 (aio: fix qemu_bh_schedule() bh->ctx race condition,
> 2014-06-03) together with qemu_bh_schedule()'s. Guess who reviewed
> that patch?
>
> Both of these involve a store and a load, so they are reproducible
> on x86_64 as well. Paul Leveille however reported how to trigger the
> problem within 15 minutes on x86_64 as well. His (untested) recipe,
> reproduced here for reference, is the following:
>
> 1) Qcow2 (or 3) is critical – raw files alone seem to avoid the problem.
>
> 2) Use “cache=directsync” rather than the default of
> “cache=none” to make it happen easier.
>
> 3) Use a server with a write-back RAID controller to allow for rapid
> IO rates.
>
> 4) Run a random-access load that (mostly) writes chunks to various
> files on the virtual block device.
>
> a. I use ‘diskload.exe c:25’, a Microsoft HCT load
> generator, on Windows VMs.
>
> b. Iometer can probably be configured to generate a similar load.
>
> 5) Run multiple VMs in parallel, against the same storage device,
> to shake the failure out sooner.
>
> 6) IvyBridge and Haswell processors for certain; not sure about others.
>
> A similar patch survived over 12 hours of testing, where an unpatched
> QEMU would fail within 15 minutes.
>
> This bug is, most likely, also involved in the failures in the libguestfs
> testsuite on AArch64 (reported by Laszlo Ersek and Richard Jones). However,
> the patch is not enough to fix that.
>
> Thanks to Stefan Hajnoczi for suggesting closer examination of
> qemu_bh_schedule, and to Paul for providing test input and a prototype
> patch.
>
> Cc: qemu-sta...@nongnu.org
> Reported-by: Laszlo Ersek <ler...@redhat.com>
> Reported-by: Paul Leveille <paul.levei...@stratus.com>
> Reported-by: John Snow <js...@redhat.com>
> Suggested-by: Paul Leveille <paul.levei...@stratus.com>
> Suggested-by: Stefan Hajnoczi <stefa...@redhat.com>
> Tested-by: Paul Leveille <paul.levei...@stratus.com>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---
>  async.c | 28 ++++++++++++----------------
>  1 file changed, 12 insertions(+), 16 deletions(-)
Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan