On Tue, Oct 27, 2020 at 01:02:59PM +0000, Felipe Franciosi wrote: > > > > On Oct 27, 2020, at 12:56 PM, Michael S. Tsirkin <m...@redhat.com> wrote: > > > > On Tue, Oct 27, 2020 at 12:53:29PM +0000, Felipe Franciosi wrote: > >> > >> > >>> On Oct 27, 2020, at 12:25 PM, Michael S. Tsirkin <m...@redhat.com> wrote: > >>> > >>> On Tue, Oct 27, 2020 at 11:30:49AM +0000, Stefan Hajnoczi wrote: > >>>> On Mon, Oct 26, 2020 at 03:13:32PM +0000, Felipe Franciosi wrote: > >>>>> QEMU must be careful when loading device state off migration streams to > >>>>> prevent a malicious source from exploiting the emulator. Overdoing these > >>>>> checks has the side effect of allowing a guest to "pin itself" in cloud > >>>>> environments by messing with state which is entirely in its control. > >>>>> > >>>>> Similarly to what f3081539 achieved in usb_device_post_load(), this > >>>>> commit removes such a check from virtio_load(). Worth noting, the result > >>>>> of a load without this check is the same as if a guest enables a VQ with > >>>>> invalid indexes to begin with. That is, the virtual device is set in a > >>>>> broken state (by the datapath handler) and must be reset. > >>>>> > >>>>> Signed-off-by: Felipe Franciosi <fel...@nutanix.com> > >>>>> --- > >>>>> hw/virtio/virtio.c | 12 ------------ > >>>>> 1 file changed, 12 deletions(-) > >>>>> > >>>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > >>>>> index 6f8f865aff..0561bdb857 100644 > >>>>> --- a/hw/virtio/virtio.c > >>>>> +++ b/hw/virtio/virtio.c > >>>>> @@ -3136,8 +3136,6 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, > >>>>> int version_id) > >>>>> RCU_READ_LOCK_GUARD(); > >>>>> for (i = 0; i < num; i++) { > >>>>> if (vdev->vq[i].vring.desc) { > >>>>> - uint16_t nheads; > >>>>> - > >>>>> /* > >>>>> * VIRTIO-1 devices migrate desc, used, and avail ring > >>>>> addresses so > >>>>> * only the region cache needs to be set up. Legacy devices > >>>>> need > >>>>> @@ -3157,16 +3155,6 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, > >>>>> int version_id) > >>>>> continue; > >>>>> } > >>>>> > >>>>> - nheads = vring_avail_idx(&vdev->vq[i]) - > >>>>> vdev->vq[i].last_avail_idx; > >>>>> - /* Check it isn't doing strange things with descriptor > >>>>> numbers. */ > >>>>> - if (nheads > vdev->vq[i].vring.num) { > >>>>> - error_report("VQ %d size 0x%x Guest index 0x%x " > >>>>> - "inconsistent with Host index 0x%x: delta > >>>>> 0x%x", > >>>>> - i, vdev->vq[i].vring.num, > >>>>> - vring_avail_idx(&vdev->vq[i]), > >>>>> - vdev->vq[i].last_avail_idx, nheads); > >>>>> - return -1; > >>>>> - } > >>>> > >>>> Michael, the commit that introduced this check seems to have been for > >>>> debugging rather than to prevent a QEMU crash, so this removing the > >>>> check may be safe: > >>>> > >>>> commit 258dc7c96bb4b7ca71d5bee811e73933310e168c > >>>> Author: Michael S. Tsirkin <m...@redhat.com> > >>>> Date: Sun Oct 17 20:23:48 2010 +0200 > >>>> > >>>> virtio: sanity-check available index > >>>> > >>>> Checking available index upon load instead of > >>>> only when vm is running makes is easier to > >>>> debug failures. > >>> > >>> Agreed. Given this, let's keep the message around, just with > >>> LOG_GUEST_ERROR ? > >> > >> I thought about it. Happy to send a v2 which keeps the check and logs > >> without throwing an error. > >> > >> Separately, I'm thinking of hooking up QEMU with VRING_ERR so datapath > >> handlers can notify QEMU that something went broken and NEEDS_RESET > >> can be flipped on the status register, possibly along a configuration > >> interrupt. I can see libvhost-user supports that, but are there any > >> reasons QEMU doesn't do this already? > > > > Mostly because guest support isn't there. That in turn isn't easy, > > lots of synchronization is needed within guests. > > Do you mean guest support to reset when seeing that bit in status > following the configuration interrupt? > > It should be safe, though. I can have a stab to see if it breaks > Windows/Linux at least, and share an RFC if I get anywhere. > > Unless you think it's a waste of time. Ideally guests shouldn't find > themselves in this situation to begin with, and if they did, resetting > would arguably just lead them into corruption again (for example). But > it does provide a mechanism for QEMU to find out that the vhost > backend stopped. That would help in the context of this patch. > > F.
It's not a waste of time, it's just a lot of work within guests. > > > > > >>> > >>>> Felipe: Did you audit the code to make sure the invalid avail_idx value > >>>> and the fields it is propagated to (e.g. shadow_avail_idx) are always > >>>> used in a safe way? > >>> > >> > >> I did it briefly. I also wrote a mock userspace driver that creates > >> this condition in a very controlled way (so I can step half-way > >> through setting up VQs and trigger a migration, for example). But you > >> know how manual tests are... I may have missed something. > >> Your expert eyes are most welcome. :) > >> > >> F. > >