[Bug 1761798] Re: live migration intermittently fails in CI with "VQ 0 size 0x80 Guest index 0x12c inconsistent with Host index 0x134: delta 0xfff8"

Kashyap Chamarthy Mon, 23 Nov 2020 09:12:41 -0800

[Copy/pasting my comment from here:
https://bugs.launchpad.net/nova/+bug/1737625/comments/4]


I just talked to Dave Gilbert from upstream QEMU. Overall, as I implied
in comment#2, this gnarly issue requires specialized debugging, digging
deep into the bowels of QEMU, 'virtio-blk' and 'virtio.

That said, Dave notes that we get this "guest index inconsistent" error
when the migrated RAM is inconsistent with the migrated 'virtio' device
state. And a common case is where a 'virtio' device does an operation
after the vCPU is stopped and after RAM has been transmitted.

Dave makes some guesswork of a potential scenario where this can occur:

  - Guest is running
  - ... live migration starts
  - ... a "block read" request gets submitteed
  - ... live migration stops the vCPUs, finishes transmitting RAM
  - ... the "block read" completes, 'virtio-blk' updates pointers
  - ... live migration "serializes" the 'virito-blk' state

So the "guest index inconsistent" state would only happen if you got
unlucky with the timing of that read.

Another possibility, Dave points out, is that the guest has screwed up
the device state somehow; the migration code in 'virtio' checks the
state a lot. We have ruled this possibility out becausethe guest is just
a garden-variety CirrOS instance idling; nothing special about it.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1761798

Title:
  live migration intermittently fails in CI with "VQ 0 size 0x80 Guest
  index 0x12c inconsistent with Host index 0x134: delta 0xfff8"

Status in OpenStack Compute (nova):
  Confirmed
Status in QEMU:
  New

Bug description:
  Seen here:

  http://logs.openstack.org/37/522537/20/check/legacy-tempest-dsvm-
  multinode-live-
  migration/8de6e74/logs/subnode-2/libvirt/qemu/instance-00000002.txt.gz

  2018-04-05T21:48:38.205752Z qemu-system-x86_64: -chardev 
pty,id=charserial0,logfile=/dev/fdset/1,logappend=on: char device redirected to 
/dev/pts/0 (label charserial0)
  warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
  2018-04-05T21:48:43.153268Z qemu-system-x86_64: VQ 0 size 0x80 Guest index 
0x12c inconsistent with Host index 0x134: delta 0xfff8
  2018-04-05T21:48:43.153288Z qemu-system-x86_64: Failed to load 
virtio-blk:virtio
  2018-04-05T21:48:43.153292Z qemu-system-x86_64: error while loading state for 
instance 0x0 of device '0000:00:04.0/virtio-blk'
  2018-04-05T21:48:43.153347Z qemu-system-x86_64: load of migration failed: 
Operation not permitted
  2018-04-05 21:48:43.198+0000: shutting down, reason=crashed

  And in the n-cpu logs on the other host:

  http://logs.openstack.org/37/522537/20/check/legacy-tempest-dsvm-
  multinode-live-
  migration/8de6e74/logs/screen-n-cpu.txt.gz#_Apr_05_21_48_43_257541

  There is a related Red Hat bug:

  https://bugzilla.redhat.com/show_bug.cgi?id=1450524

  The CI job failures are at present using the Pike UCA:

  ii  libvirt-bin                         3.6.0-1ubuntu6.2~cloud0

  ii  qemu-system-x86                     1:2.10+dfsg-0ubuntu3.5~cloud0

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1761798/+subscriptions

[Bug 1761798] Re: live migration intermittently fails in CI with "VQ 0 size 0x80 Guest index 0x12c inconsistent with Host index 0x134: delta 0xfff8"

Reply via email to