David Gibson <da...@gibson.dropbear.id.au> writes: > Linux kernel commits 1a87228f5f1d316002c7c161316f5524592be766 > "virtio_balloon: Fix endian bug" and > 3ccc9372ed0fab33d20f10be3c1efd5776ff5913 "virtio_balloon: fix handling > of PAGE_SIZE != 4k" fixed two serious bugs in their (guest side) > handling of the virtio balloon. In practice, these bugs only affected > powerpc guests, which is big-endian and frequently configured for 64k > base page size. Attempting to use the balloon with the buggy guest > would usually result in an immediate guest crash.
You should create a new feature VIRTIO_BALLOON_F_ENDIAN_SAFE, advertise it in the host, and add a guest kernel patch to ack it in newer kernels. Older kernels won't ack this feature which gives you a safe way to to disable the driver on a big endian host. You won't get support for 3.4 kernels but it's much nicer to handle it this way. Regards, Anthony Liguori > > The bugs are fixed now, of course and the balloon works fine with > kernels v3.4 and later, but unfortunately there are many distro > releases still in use which still have buggy kernels. > > The nature of the page size bug makes it impossible to work around > from the host side (there simply isn't enough information supplied to > operate the balloon correctly). However, it *is* possible with some > fiddling to safely detect the endian bug in the guest kernel, and > disable the balloon in this case. The two fixes were applied to the > mainline kernel almost consecutively, so there are no released kernels > with one fix but not the other, meaning we can use the presence of the > endian bug as a proxy for the presence of the page size bug. > > This patch implements such a test, working as follows. > > For a fixed guest kernel: > 1. qemu sets a state variable to "TESTING" and the initial balloon > target size to 16 (4k pages). > 2. When the guest balloon driver starts, it sees the target, finds > either 16 unused 4k pages or 1 unused 64k page (depending on > guest config) and submits the 16 resulting 4k pfns to the > host. qemu, in TESTING state, ignores the submitted pages for > now. > 4. The guest then updates the 'actual' field in the balloon config > space to 16. qemu sees this and determines that the guest is not > buggy, it moves to CLEANUP state, and sets the target balloon > size back to 0. > 5. The guest sees the target go back to 0, and reclaims its page(s) > from the balloon. qemu continues to ignore the page addresses > for now in CLEANUP state. > 6. The guest updates the actual field to 0. qemu now considers > cleanup complete and moves to GOOD state. The balloon now > operates normally. > > For a buggy kernel: > 1. qemu sets a state variable to "TESTING" and the initial balloon > target size to 16 (4k pages). > 2. When the guest balloon driver starts it sees the non-zero target, > and misinterprets it as 268435456 (endian bug). It starts trying > to find pages to free. > 3. The guest is probably newly booted, so it almost certainly finds > 256 pages easily and submits incorrect addresses for them (page > size bug) to the host. qemu, in TESTING state ignores the (wrong) > addresses for now. > 4. The guest updates the actual field in config space to 256. qemu > sees this, and determines that the guest is buggy. It moves to > BUGGY state and sets the balloon target back to zero. > 5. The guest, before attempting to find the next batch of pages to > free, rechecks the target and discovers it is now zero. It > reclaims the pages by submitting more wrong addresses, which qemu > ignores. > 6. The balloon is now disabled, if the user attempts to change the > balloon size, qemu print an error message and otherwise ignore > it. > > > I'm aware that this patch needs a bunch more comments (largely based > on the info above), but otherwise do people think this is a reasonable > approach. It doesn't (and can't) fix the balloon for buggy kernels, > but it does make the failure mode a lot less ugly. > > From dbc721f5e50a39ca3b40d81f060d8bb0e6312995 Mon Sep 17 00:00:00 2001 > From: David Gibson <da...@gibson.dropbear.id.au> > Date: Thu, 8 Nov 2012 14:49:38 +1100 > Subject: [PATCH] Detection of buggy guest balloon > > --- > hw/virtio-balloon.c | 77 > +++++++++++++++++++++++++++++++++++++++++++++++---- > 1 file changed, 72 insertions(+), 5 deletions(-) > > diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c > index dd1a650..6a9cd3f 100644 > --- a/hw/virtio-balloon.c > +++ b/hw/virtio-balloon.c > @@ -37,8 +37,16 @@ typedef struct VirtIOBalloon > VirtQueueElement stats_vq_elem; > size_t stats_vq_offset; > DeviceState *qdev; > + > + int guest_bug_state; > +#define GUEST_BUG_TESTING 0 > +#define GUEST_BUG_CLEANUP 1 > +#define GUEST_BUG_BUGGY 2 > +#define GUEST_BUG_GOOD 3 > } VirtIOBalloon; > > +#define GUEST_BUG_TARGET 16 > + > static VirtIOBalloon *to_virtio_balloon(VirtIODevice *vdev) > { > return (VirtIOBalloon *)vdev; > @@ -84,6 +92,11 @@ static void virtio_balloon_handle_output(VirtIODevice > *vdev, VirtQueue *vq) > pa = (ram_addr_t)ldl_p(&pfn) << VIRTIO_BALLOON_PFN_SHIFT; > offset += 4; > > + if (s->guest_bug_state != GUEST_BUG_GOOD) { > + /* Still bug testing, not ready to use balloon yet */ > + continue; > + } > + > /* FIXME: remove get_system_memory(), but how? */ > section = memory_region_find(get_system_memory(), pa, 1); > if (!section.size || !memory_region_is_ram(section.mr)) > @@ -134,8 +147,23 @@ static void virtio_balloon_get_config(VirtIODevice > *vdev, uint8_t *config_data) > { > VirtIOBalloon *dev = to_virtio_balloon(vdev); > struct virtio_balloon_config config; > + uint32_t num_pages; > + > + switch (dev->guest_bug_state) { > + case GUEST_BUG_TESTING: > + num_pages = GUEST_BUG_TARGET; > + break; > > - config.num_pages = cpu_to_le32(dev->num_pages); > + case GUEST_BUG_CLEANUP: > + case GUEST_BUG_BUGGY: > + num_pages = 0; > + break; > + > + default: > + num_pages = dev->num_pages; > + } > + > + config.num_pages = cpu_to_le32(num_pages); > config.actual = cpu_to_le32(dev->actual); > > memcpy(config_data, &config, 8); > @@ -147,11 +175,41 @@ static void virtio_balloon_set_config(VirtIODevice > *vdev, > VirtIOBalloon *dev = to_virtio_balloon(vdev); > struct virtio_balloon_config config; > uint32_t oldactual = dev->actual; > + > memcpy(&config, config_data, 8); > dev->actual = le32_to_cpu(config.actual); > - if (dev->actual != oldactual) { > - qemu_balloon_changed(ram_size - > - (dev->actual << VIRTIO_BALLOON_PFN_SHIFT)); > + > + switch (dev->guest_bug_state) { > + case GUEST_BUG_TESTING: > + if (dev->actual == 0) { > + /* Both buggy and non-buggy guests write a 0 before going > + * on to write a meaningful value */ > + break; > + } > + > + if (dev->actual > GUEST_BUG_TARGET) { > + fprintf(stderr, "virtio-balloon: Buggy guest detected, disabling > balloon\n"); > + dev->guest_bug_state = GUEST_BUG_BUGGY; > + } else { > + dev->guest_bug_state = GUEST_BUG_CLEANUP; > + } > + /* Changing bug state implicitly alters the config */ > + virtio_notify_config(&dev->vdev); > + break; > + > + case GUEST_BUG_CLEANUP: > + if (dev->actual == 0) { > + /* Cleanup completed, proceed with normal operation */ > + dev->guest_bug_state = GUEST_BUG_GOOD; > + virtio_notify_config(&dev->vdev); > + } > + break; > + > + default: > + if (dev->actual != oldactual) { > + qemu_balloon_changed(ram_size - > + (dev->actual << VIRTIO_BALLOON_PFN_SHIFT)); > + } > } > } > > @@ -194,12 +252,21 @@ static void virtio_balloon_to_target(void *opaque, > ram_addr_t target) > { > VirtIOBalloon *dev = opaque; > > + if (dev->guest_bug_state == GUEST_BUG_BUGGY) { > + fprintf(stderr, "Guest is buggy, cannot use balloon\n"); > + } > + > if (target > ram_size) { > target = ram_size; > } > if (target) { > dev->num_pages = (ram_size - target) >> VIRTIO_BALLOON_PFN_SHIFT; > - virtio_notify_config(&dev->vdev); > + > + /* If we're still testing for guest bugs, delay the change > + * interrupt until we've finished that */ > + if (dev->guest_bug_state == GUEST_BUG_GOOD) { > + virtio_notify_config(&dev->vdev); > + } > } > } > > -- > 1.7.10.4 > > > > -- > David Gibson | I'll have my music baroque, and my code > david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ > _other_ > | _way_ _around_! > http://www.ozlabs.org/~dgibson