David Gibson <da...@gibson.dropbear.id.au> writes:

> Linux kernel commits 1a87228f5f1d316002c7c161316f5524592be766
> "virtio_balloon: Fix endian bug" and
> 3ccc9372ed0fab33d20f10be3c1efd5776ff5913 "virtio_balloon: fix handling
> of PAGE_SIZE != 4k" fixed two serious bugs in their (guest side)
> handling of the virtio balloon.  In practice, these bugs only affected
> powerpc guests, which is big-endian and frequently configured for 64k
> base page size.  Attempting to use the balloon with the buggy guest
> would usually result in an immediate guest crash.

You should create a new feature VIRTIO_BALLOON_F_ENDIAN_SAFE,
advertise it in the host, and add a guest kernel patch to ack it in
newer kernels.

Older kernels won't ack this feature which gives you a safe way to to
disable the driver on a big endian host.

You won't get support for 3.4 kernels but it's much nicer to handle it
this way.

Regards,

Anthony Liguori

>
> The bugs are fixed now, of course and the balloon works fine with
> kernels v3.4 and later, but unfortunately there are many distro
> releases still in use which still have buggy kernels.
>
> The nature of the page size bug makes it impossible to work around
> from the host side (there simply isn't enough information supplied to
> operate the balloon correctly).  However, it *is* possible with some
> fiddling to safely detect the endian bug in the guest kernel, and
> disable the balloon in this case.  The two fixes were applied to the
> mainline kernel almost consecutively, so there are no released kernels
> with one fix but not the other, meaning we can use the presence of the
> endian bug as a proxy for the presence of the page size bug.
>
> This patch implements such a test, working as follows.
>
> For a fixed guest kernel:
>   1. qemu sets a state variable to "TESTING" and the initial balloon
>      target size to 16 (4k pages).
>   2. When the guest balloon driver starts, it sees the target, finds
>      either 16 unused 4k pages or 1 unused 64k page (depending on
>      guest config) and submits the 16 resulting 4k pfns to the
>      host. qemu, in TESTING state, ignores the submitted pages for
>      now.
>   4. The guest then updates the 'actual' field in the balloon config
>      space to 16.  qemu sees this and determines that the guest is not
>      buggy, it moves to CLEANUP state, and sets the target balloon
>      size back to 0.
>   5. The guest sees the target go back to 0, and reclaims its page(s)
>      from the balloon.  qemu continues to ignore the page addresses
>      for now in CLEANUP state.
>   6. The guest updates the actual field to 0.  qemu now considers
>      cleanup complete and moves to GOOD state.  The balloon now
>      operates normally.
>
> For a buggy kernel:
>   1. qemu sets a state variable to "TESTING" and the initial balloon
>      target size to 16 (4k pages).
>   2. When the guest balloon driver starts it sees the non-zero target,
>      and misinterprets it as 268435456 (endian bug).  It starts trying
>      to find pages to free.
>   3. The guest is probably newly booted, so it almost certainly finds
>      256 pages easily and submits incorrect addresses for them (page
>      size bug) to the host. qemu, in TESTING state ignores the (wrong)
>      addresses for now.
>   4. The guest updates the actual field in config space to 256.  qemu
>      sees this, and determines that the guest is buggy.  It moves to
>      BUGGY state and sets the balloon target back to zero.
>   5. The guest, before attempting to find the next batch of pages to
>      free, rechecks the target and discovers it is now zero.  It
>      reclaims the pages by submitting more wrong addresses, which qemu
>      ignores.
>   6. The balloon is now disabled, if the user attempts to change the
>      balloon size, qemu print an error message and otherwise ignore
>      it.
>
>
> I'm aware that this patch needs a bunch more comments (largely based
> on the info above), but otherwise do people think this is a reasonable
> approach.  It doesn't (and can't) fix the balloon for buggy kernels,
> but it does make the failure mode a lot less ugly.
>
> From dbc721f5e50a39ca3b40d81f060d8bb0e6312995 Mon Sep 17 00:00:00 2001
> From: David Gibson <da...@gibson.dropbear.id.au>
> Date: Thu, 8 Nov 2012 14:49:38 +1100
> Subject: [PATCH] Detection of buggy guest balloon
>
> ---
>  hw/virtio-balloon.c |   77 
> +++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 72 insertions(+), 5 deletions(-)
>
> diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
> index dd1a650..6a9cd3f 100644
> --- a/hw/virtio-balloon.c
> +++ b/hw/virtio-balloon.c
> @@ -37,8 +37,16 @@ typedef struct VirtIOBalloon
>      VirtQueueElement stats_vq_elem;
>      size_t stats_vq_offset;
>      DeviceState *qdev;
> +
> +    int guest_bug_state;
> +#define GUEST_BUG_TESTING       0
> +#define GUEST_BUG_CLEANUP       1
> +#define GUEST_BUG_BUGGY         2
> +#define GUEST_BUG_GOOD          3
>  } VirtIOBalloon;
>  
> +#define GUEST_BUG_TARGET        16
> +
>  static VirtIOBalloon *to_virtio_balloon(VirtIODevice *vdev)
>  {
>      return (VirtIOBalloon *)vdev;
> @@ -84,6 +92,11 @@ static void virtio_balloon_handle_output(VirtIODevice 
> *vdev, VirtQueue *vq)
>              pa = (ram_addr_t)ldl_p(&pfn) << VIRTIO_BALLOON_PFN_SHIFT;
>              offset += 4;
>  
> +            if (s->guest_bug_state != GUEST_BUG_GOOD) {
> +                /* Still bug testing, not ready to use balloon yet */
> +                continue;
> +            }
> +
>              /* FIXME: remove get_system_memory(), but how? */
>              section = memory_region_find(get_system_memory(), pa, 1);
>              if (!section.size || !memory_region_is_ram(section.mr))
> @@ -134,8 +147,23 @@ static void virtio_balloon_get_config(VirtIODevice 
> *vdev, uint8_t *config_data)
>  {
>      VirtIOBalloon *dev = to_virtio_balloon(vdev);
>      struct virtio_balloon_config config;
> +    uint32_t num_pages;
> +
> +    switch (dev->guest_bug_state) {
> +    case GUEST_BUG_TESTING:
> +        num_pages = GUEST_BUG_TARGET;
> +        break;
>  
> -    config.num_pages = cpu_to_le32(dev->num_pages);
> +    case GUEST_BUG_CLEANUP:
> +    case GUEST_BUG_BUGGY:
> +        num_pages = 0;
> +        break;
> +
> +    default:
> +        num_pages = dev->num_pages;
> +    }
> +
> +    config.num_pages = cpu_to_le32(num_pages);
>      config.actual = cpu_to_le32(dev->actual);
>  
>      memcpy(config_data, &config, 8);
> @@ -147,11 +175,41 @@ static void virtio_balloon_set_config(VirtIODevice 
> *vdev,
>      VirtIOBalloon *dev = to_virtio_balloon(vdev);
>      struct virtio_balloon_config config;
>      uint32_t oldactual = dev->actual;
> +
>      memcpy(&config, config_data, 8);
>      dev->actual = le32_to_cpu(config.actual);
> -    if (dev->actual != oldactual) {
> -        qemu_balloon_changed(ram_size -
> -                             (dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
> +
> +    switch (dev->guest_bug_state) {
> +    case GUEST_BUG_TESTING:
> +        if (dev->actual == 0) {
> +            /* Both buggy and non-buggy guests write a 0 before going
> +             * on to write a meaningful value */
> +            break;
> +        }
> +
> +        if (dev->actual > GUEST_BUG_TARGET) {
> +            fprintf(stderr, "virtio-balloon: Buggy guest detected, disabling 
> balloon\n");
> +            dev->guest_bug_state = GUEST_BUG_BUGGY;
> +        } else {
> +            dev->guest_bug_state = GUEST_BUG_CLEANUP;
> +        }
> +        /* Changing bug state implicitly alters the config */
> +        virtio_notify_config(&dev->vdev);
> +        break;
> +
> +    case GUEST_BUG_CLEANUP:
> +        if (dev->actual == 0) {
> +            /* Cleanup completed, proceed with normal operation */
> +            dev->guest_bug_state = GUEST_BUG_GOOD;
> +            virtio_notify_config(&dev->vdev);
> +        }
> +        break;
> +
> +    default:
> +        if (dev->actual != oldactual) {
> +            qemu_balloon_changed(ram_size -
> +                                 (dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
> +        }
>      }
>  }
>  
> @@ -194,12 +252,21 @@ static void virtio_balloon_to_target(void *opaque, 
> ram_addr_t target)
>  {
>      VirtIOBalloon *dev = opaque;
>  
> +    if (dev->guest_bug_state == GUEST_BUG_BUGGY) {
> +        fprintf(stderr, "Guest is buggy, cannot use balloon\n");
> +    }
> +
>      if (target > ram_size) {
>          target = ram_size;
>      }
>      if (target) {
>          dev->num_pages = (ram_size - target) >> VIRTIO_BALLOON_PFN_SHIFT;
> -        virtio_notify_config(&dev->vdev);
> +
> +        /* If we're still testing for guest bugs, delay the change
> +         * interrupt until we've finished that */
> +        if (dev->guest_bug_state == GUEST_BUG_GOOD) {
> +            virtio_notify_config(&dev->vdev);
> +        }
>      }
>  }
>  
> -- 
> 1.7.10.4
>
>
>
> -- 
> David Gibson                  | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au        | minimalist, thank you.  NOT _the_ 
> _other_
>                               | _way_ _around_!
> http://www.ozlabs.org/~dgibson

Reply via email to