On Thu, Jun 30, 2016 at 10:39 AM, Li, Liang Z <liang.z...@intel.com> wrote: >> On Thu, Jun 30, 2016 at 9:31 AM, Liang Li <liang.z...@intel.com> wrote: >> > After live migration, 'guest-stats' can't get the expected memory >> > status in the guest. This issue is caused by commit 4eae2a657d. >> > The value of 's->stats_vq_elem' will be NULL after live migration, and >> > the check in the function 'balloon_stats_poll_cb()' will prevent the >> > 'virtio_notify()' from executing. So guest will not update the memory >> > status. >> > >> > Signed-off-by: Liang Li <liang.z...@intel.com> >> > Cc: Michael S. Tsirkin <m...@redhat.com> >> > Cc: Ladi Prosek <lpro...@redhat.com> >> > Cc: Paolo Bonzini <pbonz...@redhat.com> >> > --- >> > hw/virtio/virtio-balloon.c | 8 +++++++- >> > 1 file changed, 7 insertions(+), 1 deletion(-) >> > >> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c >> > index 557d3f9..cc6947f 100644 >> > --- a/hw/virtio/virtio-balloon.c >> > +++ b/hw/virtio/virtio-balloon.c >> > @@ -98,13 +98,19 @@ static void balloon_stats_poll_cb(void *opaque) { >> > VirtIOBalloon *s = opaque; >> > VirtIODevice *vdev = VIRTIO_DEVICE(s); >> > + VirtQueueElement elem = {0}; >> > >> > - if (s->stats_vq_elem == NULL || !balloon_stats_supported(s)) { >> > + if (!balloon_stats_supported(s)) { >> > /* re-schedule */ >> > balloon_stats_change_timer(s, s->stats_poll_interval); >> > return; >> > } >> > >> > + if (s->stats_vq_elem == NULL) { >> > + virtqueue_push(s->svq, &elem, 0); >> > + virtio_notify(vdev, s->svq); >> > + return; >> > + } >> >> What if we are not in the just-after-live-migration situation though? >> If the guest simply didn't add a buffer to the queue for some reason, >> wouldn't this newly added push/notify break the balloon protocol? >> > Could you elaborate how it happens? > The added code only works for the situation just after live migration. right?
I don't believe so. As eloquently stated in the spec (5.5.6.3 Memory Statistics), the stats queue is "atypical". If everything goes well, the stats_vq_elem field is indeed expected to be non-NULL here in balloon_stats_poll_cb. But it may as well be NULL, for example if step 3. "The driver collects memory statistics and writes them into a new buffer." takes a long time and the driver doesn't make it by the time the callback fires. Basically, the driver and device play ping-pong and the value of stats_vq_elem is NULL or non-NULL depending on which side the ball is. It looks like stats_vq_elem should be part of the QEMU state that is preserved across migrations, although I'll admit that I am unfamiliar with how migrations work and may be missing something important. Thanks! Ladi > Thanks! > Liang