Re: [PATCH v10 13/14] vfio-user: handle device interrupts

2022-06-06 Thread Alexander Duyck
On Mon, Jun 6, 2022 at 12:29 PM Jag Raman  wrote:
>
>
>
> > On Jun 6, 2022, at 2:32 PM, Alexander Duyck  
> > wrote:
> >
> > On Tue, May 24, 2022 at 9:11 AM Jagannathan Raman  
> > wrote:
> >>
> >> Forward remote device's interrupts to the guest
> >>
> >> Signed-off-by: Elena Ufimtseva 
> >> Signed-off-by: John G Johnson 
> >> Signed-off-by: Jagannathan Raman 
> >> ---
> >> include/hw/pci/pci.h  |  13 
> >> include/hw/remote/vfio-user-obj.h |   6 ++
> >> hw/pci/msi.c  |  16 ++--
> >> hw/pci/msix.c |  10 ++-
> >> hw/pci/pci.c  |  13 
> >> hw/remote/machine.c   |  14 +++-
> >> hw/remote/vfio-user-obj.c | 123 ++
> >> stubs/vfio-user-obj.c |   6 ++
> >> MAINTAINERS   |   1 +
> >> hw/remote/trace-events|   1 +
> >> stubs/meson.build |   1 +
> >> 11 files changed, 193 insertions(+), 11 deletions(-)
> >> create mode 100644 include/hw/remote/vfio-user-obj.h
> >> create mode 100644 stubs/vfio-user-obj.c
> >>
> >
> > So I had a question about a few bits below. Specifically I ran into
> > issues when I had setup two devices to be assigned to the same VM via
> > two vfio-user-pci/x-vfio-user-server interfaces.
>
> Thanks for the heads up, Alexander!
>
> >
> > What I am hitting is an assert(irq_num < bus->nirq) in
> > pci_bus_change_irq_level in the server.
>
> That is issue is because we are only initializing only one
> IRQ for the PCI bus - but it should be more. We will update
> it and when the bus initializes interrupts and get back to you.
>
> We only tested MSI/x devices for the multi-device config. We
> should also test INTx - could you please confirm which device
> you’re using so we could verify that it works before posting
> the next revision.
>
> Thank you!
> --
> Jag

I was testing an MSI-X capable NIC, so you can reproduce with e1000e.
During the device enumeration before the driver was even loaded it
threw the error:
qemu-system-x86_64: ../hw/pci/pci.c:276: pci_bus_change_irq_level:
Assertion `irq_num < bus->nirq' failed.

All it really takes is attaching to the second device, that is enough
to trigger the error since the irq_num will be non-zero.

If I recall, all the kernel is doing is unmasking the interrupt via
the config space and it is triggering the error which then shuts down
the server.



Re: [PATCH v10 13/14] vfio-user: handle device interrupts

2022-06-06 Thread Alexander Duyck
On Tue, May 24, 2022 at 9:11 AM Jagannathan Raman  wrote:
>
> Forward remote device's interrupts to the guest
>
> Signed-off-by: Elena Ufimtseva 
> Signed-off-by: John G Johnson 
> Signed-off-by: Jagannathan Raman 
> ---
>  include/hw/pci/pci.h  |  13 
>  include/hw/remote/vfio-user-obj.h |   6 ++
>  hw/pci/msi.c  |  16 ++--
>  hw/pci/msix.c |  10 ++-
>  hw/pci/pci.c  |  13 
>  hw/remote/machine.c   |  14 +++-
>  hw/remote/vfio-user-obj.c | 123 ++
>  stubs/vfio-user-obj.c |   6 ++
>  MAINTAINERS   |   1 +
>  hw/remote/trace-events|   1 +
>  stubs/meson.build |   1 +
>  11 files changed, 193 insertions(+), 11 deletions(-)
>  create mode 100644 include/hw/remote/vfio-user-obj.h
>  create mode 100644 stubs/vfio-user-obj.c
>

So I had a question about a few bits below. Specifically I ran into
issues when I had setup two devices to be assigned to the same VM via
two vfio-user-pci/x-vfio-user-server interfaces.

What I am hitting is an assert(irq_num < bus->nirq) in
pci_bus_change_irq_level in the server.

> diff --git a/hw/remote/machine.c b/hw/remote/machine.c
> index 645b54343d..75d550daae 100644
> --- a/hw/remote/machine.c
> +++ b/hw/remote/machine.c
> @@ -23,6 +23,8 @@
>  #include "hw/remote/iommu.h"
>  #include "hw/qdev-core.h"
>  #include "hw/remote/iommu.h"
> +#include "hw/remote/vfio-user-obj.h"
> +#include "hw/pci/msi.h"
>
>  static void remote_machine_init(MachineState *machine)
>  {
> @@ -54,12 +56,16 @@ static void remote_machine_init(MachineState *machine)
>
>  if (s->vfio_user) {
>  remote_iommu_setup(pci_host->bus);
> -}
>
> -remote_iohub_init(>iohub);
> +msi_nonbroken = true;
> +
> +vfu_object_set_bus_irq(pci_host->bus);
> +} else {
> +remote_iohub_init(>iohub);
>
> -pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
> - >iohub, REMOTE_IOHUB_NB_PIRQS);
> +pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, 
> remote_iohub_map_irq,
> + >iohub, REMOTE_IOHUB_NB_PIRQS);
> +}
>
>  qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
>  }

If I am reading the code right this limits us to one legacy interrupt
in the vfio_user case, irq 0, correct? Is this intentional? Just
wanted to verify as this seems to limit us to supporting only one
device based on the mapping below.

> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index ee28a93782..eeb165a805 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -53,6 +53,9 @@
>  #include "hw/pci/pci.h"
>  #include "qemu/timer.h"
>  #include "exec/memory.h"
> +#include "hw/pci/msi.h"
> +#include "hw/pci/msix.h"
> +#include "hw/remote/vfio-user-obj.h"
>
>  #define TYPE_VFU_OBJECT "x-vfio-user-server"
>  OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
> @@ -96,6 +99,10 @@ struct VfuObject {
>  Error *unplug_blocker;
>
>  int vfu_poll_fd;
> +
> +MSITriggerFunc *default_msi_trigger;
> +MSIPrepareMessageFunc *default_msi_prepare_message;
> +MSIxPrepareMessageFunc *default_msix_prepare_message;
>  };
>
>  static void vfu_object_init_ctx(VfuObject *o, Error **errp);
> @@ -520,6 +527,111 @@ static void vfu_object_register_bars(vfu_ctx_t 
> *vfu_ctx, PCIDevice *pdev)
>  }
>  }
>
> +static int vfu_object_map_irq(PCIDevice *pci_dev, int intx)
> +{
> +int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
> +pci_dev->devfn);
> +
> +return pci_bdf;
> +}
> +

This bit ends up mapping it so that the BDF ends up setting the IRQ
number. So for example device 0, function 0 will be IRQ 0, and device
1, function 0 will be IRQ 8. Just wondering why it is implemented this
way if we only intend to support one device. Also I am wondering if we
should support some sort of IRQ sharing?

> +static int vfu_object_setup_irqs(VfuObject *o, PCIDevice *pci_dev)
> +{
> +vfu_ctx_t *vfu_ctx = o->vfu_ctx;
> +int ret;
> +
> +ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_INTX_IRQ, 1);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +if (msix_nr_vectors_allocated(pci_dev)) {
> +ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSIX_IRQ,
> +   msix_nr_vectors_allocated(pci_dev));
> +} else if (msi_nr_vectors_allocated(pci_dev)) {
> +ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSI_IRQ,
> +   msi_nr_vectors_allocated(pci_dev));
> +}
> +
> +if (ret < 0) {
> +return ret;
> +}
> +
> +vfu_object_setup_msi_cbs(o);
> +
> +pci_dev->irq_opaque = vfu_ctx;
> +
> +return 0;
> +}
> +
> +void vfu_object_set_bus_irq(PCIBus *pci_bus)
> +{
> +pci_bus_irqs(pci_bus, vfu_object_set_irq, vfu_object_map_irq, pci_bus, 
> 1);
> +}

Re: [RFC PATCH] mpqemu: Remove unlock/lock of iothread in mpqemu-link send and recv functions

2022-05-24 Thread Alexander Duyck
On Mon, May 23, 2022 at 3:56 PM Jag Raman  wrote:
>
>
>
> > On May 23, 2022, at 11:09 AM, Alexander Duyck  
> > wrote:
> >
> > From: Alexander Duyck 
> >
> > When I run Multi-process QEMU with an e1000 as the remote device and SMP
> > enabled I see the combination lock up and become unresponsive. The QEMU 
> > build
> > is a fairly standard x86_64-softmmu setup. After doing some digging I 
> > tracked
> > the lockup down to the what appears to be a race with the mpqemu-link 
> > msg_send
> > and msg_receive functions and the reacquisition of the lock.
> >
> > I am assuming the issue is some sort of lock inversion though I haven't
> > identified exactly what the other lock involved is yet. For now removing
> > the logic to unlock the iothread and then reacquire the lock seems to
> > resolve the issue. I am assuming the releasing of the lock was some form of
> > optimization but I am not certain so I am submitting this as an RFC.
>
> Hi Alexander,
>
> We are working on moving away from Multi-process QEMU and to using vfio-user
> based approach. The vfio-user patches are under review. I believe we would 
> drop
> the Multi-process support once vfio-user is merged.
>
> We release the lock here while communicating with the remote process via the
> QIOChannel. It is to prevent lockup of the VM in case the QIOChannel hangs.
>
> I was able to reproduce this issue at my end. There is a deadlock between
> "mpqemu_msg_send() -> qemu_mutex_lock_iothread()" and
> "mpqemu_msg_send_and_await_reply() -> QEMU_LOCK_GUARD(>io_mutex)”.
>
> From what I can tell, as soon as one vcpu thread drops the iothread lock, 
> another
> thread running mpqemu_msg_send_and_await_reply() holds on to it. That prevents
> the first thread from completing. Attaching backtrace below.
>
> To avoid the deadlock, I think we should drop both the iothread lock and 
> io_mutex
> and reacquire them in the correct order - first iothread and then io_mutex. 
> Given
> multiprocess QEMU would be dropped in the near future, I suppose we don’t have
> to proceed further along these lines.
>
> I tested your patch, and that fixes the e1000 issue at my end also. I believe 
> we
> could adopt it.

Hi Jag,

I will go take a look at the vfio-user patches. I hadn't been
following the list lately so I didn't realize things were headed in
that direction. It may work out better for me depending on what is
enabled, as I was looking at working with PCIe config and MSI-X anyway
so if the vfio-user supports that already then it may save me some
work.

What you describe as being the issue doesn't sound too surprising, as
I had mentioned disabling SMP also solved the problem for me. It is
likely due to the fact that the e1000 has separate threads running for
handling stats and such and then the main thread handling the
networking. I would think we cannot drop the io_mutex though as it is
needed to enforce the ordering of the request send and the reply
receive. Rather if we need to drop the iothread lock I would think it
might be better to drop it before acquiring the io_mutex, or to at
least wait until after we release the io_mutex before reacquiring it.

If you want I can resubmit my patch for acceptance, but it sounds like
the code will be deprecated soon so I am not sure there is much point.
I will turn my focus to vfio-user and see if I can get it up and
running for my use case.

Thanks,

- Alex



[RFC PATCH] mpqemu: Remove unlock/lock of iothread in mpqemu-link send and recv functions

2022-05-23 Thread Alexander Duyck
From: Alexander Duyck 

When I run Multi-process QEMU with an e1000 as the remote device and SMP
enabled I see the combination lock up and become unresponsive. The QEMU build
is a fairly standard x86_64-softmmu setup. After doing some digging I tracked
the lockup down to the what appears to be a race with the mpqemu-link msg_send
and msg_receive functions and the reacquisition of the lock.

I am assuming the issue is some sort of lock inversion though I haven't
identified exactly what the other lock involved is yet. For now removing
the logic to unlock the iothread and then reacquire the lock seems to
resolve the issue. I am assuming the releasing of the lock was some form of
optimization but I am not certain so I am submitting this as an RFC.

Signed-off-by: Alexander Duyck 
---
 hw/remote/mpqemu-link.c |   25 -
 1 file changed, 25 deletions(-)

diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
index 9bd98e82197e..3e7818f54a63 100644
--- a/hw/remote/mpqemu-link.c
+++ b/hw/remote/mpqemu-link.c
@@ -33,7 +33,6 @@
  */
 bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
 {
-bool iolock = qemu_mutex_iothread_locked();
 bool iothread = qemu_in_iothread();
 struct iovec send[2] = {};
 int *fds = NULL;
@@ -57,16 +56,6 @@ bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error 
**errp)
  */
 assert(qemu_in_coroutine() || !iothread);
 
-/*
- * Skip unlocking/locking iothread lock when the IOThread is running
- * in co-routine context. Co-routine context is asserted above
- * for IOThread case.
- * Also skip lock handling while in a co-routine in the main context.
- */
-if (iolock && !iothread && !qemu_in_coroutine()) {
-qemu_mutex_unlock_iothread();
-}
-
 if (!qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send),
 fds, nfds, 0, errp)) {
 ret = true;
@@ -74,11 +63,6 @@ bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error 
**errp)
 trace_mpqemu_send_io_error(msg->cmd, msg->size, nfds);
 }
 
-if (iolock && !iothread && !qemu_in_coroutine()) {
-/* See above comment why skip locking here. */
-qemu_mutex_lock_iothread();
-}
-
 return ret;
 }
 
@@ -96,7 +80,6 @@ static ssize_t mpqemu_read(QIOChannel *ioc, void *buf, size_t 
len, int **fds,
size_t *nfds, Error **errp)
 {
 struct iovec iov = { .iov_base = buf, .iov_len = len };
-bool iolock = qemu_mutex_iothread_locked();
 bool iothread = qemu_in_iothread();
 int ret = -1;
 
@@ -106,16 +89,8 @@ static ssize_t mpqemu_read(QIOChannel *ioc, void *buf, 
size_t len, int **fds,
  */
 assert(qemu_in_coroutine() || !iothread);
 
-if (iolock && !iothread && !qemu_in_coroutine()) {
-qemu_mutex_unlock_iothread();
-}
-
 ret = qio_channel_readv_full_all_eof(ioc, , 1, fds, nfds, errp);
 
-if (iolock && !iothread && !qemu_in_coroutine()) {
-qemu_mutex_lock_iothread();
-}
-
 return (ret <= 0) ? ret : iov.iov_len;
 }
 





Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT

2021-07-07 Thread Alexander Duyck
On Wed, Jul 7, 2021 at 1:08 PM Peter Xu  wrote:
>
> On Wed, Jul 07, 2021 at 08:57:29PM +0200, David Hildenbrand wrote:
> > On 07.07.21 20:02, Peter Xu wrote:
> > > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
> > > > As it never worked properly, let's disable it via the postcopy notifier 
> > > > on
> > > > the destination. Trying to set "migrate_set_capability postcopy-ram on"
> > > > on the destination now results in "virtio-balloon: 'free-page-hint' does
> > > > not support postcopy Error: Postcopy is not supported".
> > >
> > > Would it be possible to do this in reversed order?  Say, dynamically 
> > > disable
> > > free-page-hinting if postcopy capability is set when migration starts? 
> > > Perhaps
> > > it can also be re-enabled automatically when migration completes?
> >
> > I remember that this might be quite racy. We would have to make sure that no
> > hinting happens before we enable the capability.
> >
> > As soon as we messed with the dirty bitmap (during precopy), postcopy is no
> > longer safe. As noted in the patch, the only runtime alternative is to
> > disable postcopy as soon as we actually do clear a bit. Alternatively, we
> > could ignore any hints if the postcopy capability was enabled.
>
> Logically migration capabilities are applied at VM starts, and these
> capabilities should be constant during migration (I didn't check if there's a
> hard requirement; easy to add that if we want to assure it), and in most cases
> for the lifecycle of the vm.

Would it make sense to maybe just look at adding a postcopy value to
the PrecopyNotifyData that you could populate with
migration_in_postcopy() in precopy_notify()?

Then all you would need to do is check for that value and if it is set
you shut down the page hinting or don't start it since I suspect it
wouldn't likely add any value anyway since I would think flagging
unused pages doesn't add much value in a postcopy environment anyway.

> >
> > Whatever we do, we have to make sure that a user cannot trick the system
> > into an inconsistent state. Like enabling hinting, starting migration, then
> > enabling the postcopy capability and kicking of postcopy. I did not check if
> > we allow for that, though.
>
> We could turn free page hinting off when migration starts with 
> postcopy-ram=on,
> then re-enable it after migration finishes.  That looks very safe to me.  And 
> I
> don't even worry on user trying to mess it up - as that only put their own VM
> at risk; that's mostly fine to me.

We wouldn't necessarily even need to really turn it off, just don't
start it. I wonder if we couldn't just get away with adding a check to
the existing virtio_balloon_free_page_hint_notify to see if we are in
the postcopy state there and just shut things down or not start them.



Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting

2020-12-23 Thread Alexander Duyck
On Tue, Dec 22, 2020 at 7:39 PM Liang Li  wrote:
>
> > > +hugepage_reporting_cycle(struct page_reporting_dev_info *prdev,
> > > +struct hstate *h, unsigned int nid,
> > > +struct scatterlist *sgl, unsigned int *offset)
> > > +{
> > > +   struct list_head *list = >hugepage_freelists[nid];
> > > +   unsigned int page_len = PAGE_SIZE << h->order;
> > > +   struct page *page, *next;
> > > +   long budget;
> > > +   int ret = 0, scan_cnt = 0;
> > > +
> > > +   /*
> > > +* Perform early check, if free area is empty there is
> > > +* nothing to process so we can skip this free_list.
> > > +*/
> > > +   if (list_empty(list))
> > > +   return ret;
> > > +
> > > +   spin_lock_irq(_lock);
> > > +
> > > +   if (huge_page_order(h) > MAX_ORDER)
> > > +   budget = HUGEPAGE_REPORTING_CAPACITY;
> > > +   else
> > > +   budget = HUGEPAGE_REPORTING_CAPACITY * 32;
> >
> > Wouldn't huge_page_order always be more than MAX_ORDER? Seems like we
> > don't even really need budget since this should probably be pulling
> > out no more than one hugepage at a time.
>
> I want to disting a 2M page and 1GB page here. The order of 1GB page is 
> greater
> than MAX_ORDER while 2M page's order is less than MAX_ORDER.

The budget here is broken. When I put the budget in page reporting it
was so that we wouldn't try to report all of the memory in a given
region. It is meant to hold us to no more than one pass through 1/16
of the free memory. So essentially we will be slowly processing all of
memory and it will take 16 calls (32 seconds) for us to process a
system that is sitting completely idle. It is meant to pace us so we
don't spend a ton of time doing work that will be undone, not to
prevent us from burying a CPU which is what seems to be implied here.

Using HUGEPAGE_REPORTING_CAPACITY makes no sense here. I was using it
in the original definition because it was how many pages we could
scoop out at a time and then I was aiming for a 16th of that. Here you
are arbitrarily squaring HUGEPAGE_REPORTING_CAPACITY in terms of the
amount of work you will doo since you are using it as a multiple
instead of a divisor.

> >
> > > +   /* loop through free list adding unreported pages to sg list */
> > > +   list_for_each_entry_safe(page, next, list, lru) {
> > > +   /* We are going to skip over the reported pages. */
> > > +   if (PageReported(page)) {
> > > +   if (++scan_cnt >= MAX_SCAN_NUM) {
> > > +   ret = scan_cnt;
> > > +   break;
> > > +   }
> > > +   continue;
> > > +   }
> > > +
> >
> > It would probably have been better to place this set before your new
> > set. I don't see your new set necessarily being the best use for page
> > reporting.
>
> I haven't really latched on to what you mean, could you explain it again?

It would be better for you to spend time understanding how this patch
set works before you go about expanding it to do other things.
Mistakes like the budget one above kind of point out the fact that you
don't understand how this code was supposed to work and just kind of
shoehorned you page zeroing code onto it.

It would be better to look at trying to understand this code first
before you extend it to support your zeroing use case. So adding huge
pages first might make more sense than trying to zero and push the
order down. The fact is the page reporting extension should be minimal
for huge pages since they are just passed as a scatterlist so you
should only need to add a small bit to page_reporting.c to extend it
to support this use case.

> >
> > > +   /*
> > > +* If we fully consumed our budget then update our
> > > +* state to indicate that we are requesting additional
> > > +* processing and exit this list.
> > > +*/
> > > +   if (budget < 0) {
> > > +   atomic_set(>state, 
> > > PAGE_REPORTING_REQUESTED);
> > > +   next = page;
> > > +   break;
> > > +   }
> > > +
> >
> > If budget is only ever going to be 1 then we probably could just look
> > at making this the default case for any time we find a non-reported
> > page.
>
> and here again.

It comes down to the fact that the changes you made have a significant
impact on how this is supposed to function. Reducing the scatterlist
to a size of one makes the whole point of doing batching kind of
pointless. Basically the code should be rewritten with the assumption
that if you find a page you report it.

The old code would batch things up because there is significant
overhead to be addressed when going to the hypervisor to report said
memory. Your code doesn't seem to really take anything like that into
account 

Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting

2020-12-22 Thread Alexander Duyck
On Mon, Dec 21, 2020 at 11:47 PM Liang Li  wrote:
>
> Free page reporting only supports buddy pages, it can't report the
> free pages reserved for hugetlbfs case. On the other hand, hugetlbfs
> is a good choice for a system with a huge amount of RAM, because it
> can help to reduce the memory management overhead and improve system
> performance.
> This patch add the support for reporting hugepages in the free list
> of hugetlb, it canbe used by virtio_balloon driver for memory
> overcommit and pre zero out free pages for speeding up memory population.
>
> Cc: Alexander Duyck 
> Cc: Mel Gorman 
> Cc: Andrea Arcangeli 
> Cc: Dan Williams 
> Cc: Dave Hansen 
> Cc: David Hildenbrand 
> Cc: Michal Hocko 
> Cc: Andrew Morton 
> Cc: Alex Williamson 
> Cc: Michael S. Tsirkin 
> Cc: Jason Wang 
> Cc: Mike Kravetz 
> Cc: Liang Li 
> Signed-off-by: Liang Li 
> ---
>  include/linux/hugetlb.h|   3 +
>  include/linux/page_reporting.h |   5 +
>  mm/hugetlb.c   |  29 
>  mm/page_reporting.c| 287 +
>  mm/page_reporting.h|  34 
>  5 files changed, 358 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index ebca2ef02212..a72ad25501d3 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -11,6 +11,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  struct ctl_table;
>  struct user_struct;
> @@ -114,6 +115,8 @@ int hugetlb_treat_movable_handler(struct ctl_table *, 
> int, void *, size_t *,
>  int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int, void *, size_t 
> *,
> loff_t *);
>
> +bool isolate_free_huge_page(struct page *page, struct hstate *h, int nid);
> +void putback_isolate_huge_page(struct hstate *h, struct page *page);
>  int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct 
> vm_area_struct *);
>  long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
>  struct page **, struct vm_area_struct **,
> diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
> index 63e1e9fbcaa2..0da3d1a6f0cc 100644
> --- a/include/linux/page_reporting.h
> +++ b/include/linux/page_reporting.h
> @@ -7,6 +7,7 @@
>
>  /* This value should always be a power of 2, see page_reporting_cycle() */
>  #define PAGE_REPORTING_CAPACITY32
> +#define HUGEPAGE_REPORTING_CAPACITY1
>
>  struct page_reporting_dev_info {
> /* function that alters pages to make them "reported" */
> @@ -26,4 +27,8 @@ struct page_reporting_dev_info {
>  /* Tear-down and bring-up for page reporting devices */
>  void page_reporting_unregister(struct page_reporting_dev_info *prdev);
>  int page_reporting_register(struct page_reporting_dev_info *prdev);
> +
> +/* Tear-down and bring-up for hugepage reporting devices */
> +void hugepage_reporting_unregister(struct page_reporting_dev_info *prdev);
> +int hugepage_reporting_register(struct page_reporting_dev_info *prdev);
>  #endif /*_LINUX_PAGE_REPORTING_H */
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index cbf32d2824fd..de6ce147dfe2 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -41,6 +41,7 @@
>  #include 
>  #include 
>  #include 
> +#include "page_reporting.h"
>  #include "internal.h"
>
>  int hugetlb_max_hstate __read_mostly;
> @@ -1028,6 +1029,11 @@ static void enqueue_huge_page(struct hstate *h, struct 
> page *page)
> list_move(>lru, >hugepage_freelists[nid]);
> h->free_huge_pages++;
> h->free_huge_pages_node[nid]++;
> +   if (hugepage_reported(page)) {
> +   __ClearPageReported(page);
> +   pr_info("%s, free_huge_pages=%ld\n", __func__, 
> h->free_huge_pages);
> +   }
> +   hugepage_reporting_notify_free(h->order);
>  }
>
>  static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> @@ -5531,6 +5537,29 @@ follow_huge_pgd(struct mm_struct *mm, unsigned long 
> address, pgd_t *pgd, int fla
> return pte_page(*(pte_t *)pgd) + ((address & ~PGDIR_MASK) >> 
> PAGE_SHIFT);
>  }
>
> +bool isolate_free_huge_page(struct page *page, struct hstate *h, int nid)
> +{
> +   bool ret = true;
> +
> +   VM_BUG_ON_PAGE(!PageHead(page), page);
> +
> +   list_move(>lru, >hugepage_activelist);
> +   set_page_refcounted(page);
> +   h->free_huge_pages--;
> +   h->free_huge_pages_node[nid]--;
> +
> +   return ret;
> +}
> +
> +void putback_isolate_huge_page(struct hstate *h, struct page *page)
> 

Re: [PATCH v2] e1000e: Added ICR clearing by corresponding IMS bit.

2020-12-14 Thread Alexander Duyck
Okay. That sounds reasonable. You should repost this with your more
thorough explanation of the problem and how this solves it in the
patch description.

Thanks.

- Alex

On Mon, Dec 14, 2020 at 3:09 AM Andrew Melnichenko  wrote:
>
> Hi,
> The issue is in LSC clearing. So, after "link up"(during initialization), the 
> next LSC event is masked and can't be processed.
> Technically, the event should be 'cleared' during ICR read.
> On Windows guest, everything works well, mostly because of different 
> interrupt routines(ICR clears during register write).
> So, I've added ICR clearing during read, according to the note by
> section 13.3.27 of the 8257X developers manual.
>
> On Thu, Dec 3, 2020 at 7:57 PM Alexander Duyck  
> wrote:
>>
>> On Thu, Dec 3, 2020 at 5:00 AM Andrew Melnychenko  wrote:
>> >
>> > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1707441
>>
>> So the bugzilla seems to be reporting that the NIC operstate is being
>> misreported when qemu has configured the link down. Based on the
>> description it isn't clear to me how this patch addresses that. Some
>> documentation on how you reproduced the issue, and what was seen
>> before and after this patch would be useful.
>>
>> > Added ICR clearing if there is IMS bit - according to the note by
>>
>> Should probably be "Add" instead of "Added". Same for the title of the patch.
>>
>> > section 13.3.27 of the 8257X developers manual.
>> >
>> > Signed-off-by: Andrew Melnychenko 
>> > ---
>> >  hw/net/e1000e_core.c | 10 ++
>> >  hw/net/trace-events  |  1 +
>> >  2 files changed, 11 insertions(+)
>> >
>> > diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
>> > index 095c01ebc6..9705f5c52e 100644
>> > --- a/hw/net/e1000e_core.c
>> > +++ b/hw/net/e1000e_core.c
>> > @@ -2624,6 +2624,16 @@ e1000e_mac_icr_read(E1000ECore *core, int index)
>> >  e1000e_clear_ims_bits(core, core->mac[IAM]);
>> >  }
>> >
>> > +/*
>> > + * PCIe* GbE Controllers Open Source Software Developer's Manual
>> > + * 13.3.27 Interrupt Cause Read Register
>> > + */
>> > +if ((core->mac[ICR] & E1000_ICR_ASSERTED) &&
>> > +(core->mac[ICR] & core->mac[IMS])) {
>> > +trace_e1000e_irq_icr_clear_icr_bit_ims(core->mac[ICR], 
>> > core->mac[IMS]);
>> > +core->mac[ICR] = 0;
>> > +}
>> > +
>> >  trace_e1000e_irq_icr_read_exit(core->mac[ICR]);
>> >  e1000e_update_interrupt_state(core);
>> >  return ret;
>>
>> Changes like this have historically been problematic. I am curious
>> what testing had been done on this and with what drivers? Keep in mind
>> that we have to support several flavors of BSD, Windows, and Linux
>> with this.
>>
>> > diff --git a/hw/net/trace-events b/hw/net/trace-events
>> > index 5db45456d9..2c3521a19c 100644
>> > --- a/hw/net/trace-events
>> > +++ b/hw/net/trace-events
>> > @@ -237,6 +237,7 @@ e1000e_irq_icr_read_entry(uint32_t icr) "Starting ICR 
>> > read. Current ICR: 0x%x"
>> >  e1000e_irq_icr_read_exit(uint32_t icr) "Ending ICR read. Current ICR: 
>> > 0x%x"
>> >  e1000e_irq_icr_clear_zero_ims(void) "Clearing ICR on read due to zero IMS"
>> >  e1000e_irq_icr_clear_iame(void) "Clearing ICR on read due to IAME"
>> > +e1000e_irq_icr_clear_icr_bit_ims(uint32_t icr, uint32_t ims) "Clearing 
>> > ICR on read due corresponding IMS bit: 0x%x & 0x%x"
>> >  e1000e_irq_iam_clear_eiame(uint32_t iam, uint32_t cause) "Clearing IMS 
>> > due to EIAME, IAM: 0x%X, cause: 0x%X"
>> >  e1000e_irq_icr_clear_eiac(uint32_t icr, uint32_t eiac) "Clearing ICR bits 
>> > due to EIAC, ICR: 0x%X, EIAC: 0x%X"
>> >  e1000e_irq_ims_clear_set_imc(uint32_t val) "Clearing IMS bits due to IMC 
>> > write 0x%x"
>> > --
>> > 2.29.2
>> >



Re: [PATCH v2] e1000e: Added ICR clearing by corresponding IMS bit.

2020-12-03 Thread Alexander Duyck
On Thu, Dec 3, 2020 at 5:00 AM Andrew Melnychenko  wrote:
>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1707441

So the bugzilla seems to be reporting that the NIC operstate is being
misreported when qemu has configured the link down. Based on the
description it isn't clear to me how this patch addresses that. Some
documentation on how you reproduced the issue, and what was seen
before and after this patch would be useful.

> Added ICR clearing if there is IMS bit - according to the note by

Should probably be "Add" instead of "Added". Same for the title of the patch.

> section 13.3.27 of the 8257X developers manual.
>
> Signed-off-by: Andrew Melnychenko 
> ---
>  hw/net/e1000e_core.c | 10 ++
>  hw/net/trace-events  |  1 +
>  2 files changed, 11 insertions(+)
>
> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
> index 095c01ebc6..9705f5c52e 100644
> --- a/hw/net/e1000e_core.c
> +++ b/hw/net/e1000e_core.c
> @@ -2624,6 +2624,16 @@ e1000e_mac_icr_read(E1000ECore *core, int index)
>  e1000e_clear_ims_bits(core, core->mac[IAM]);
>  }
>
> +/*
> + * PCIe* GbE Controllers Open Source Software Developer's Manual
> + * 13.3.27 Interrupt Cause Read Register
> + */
> +if ((core->mac[ICR] & E1000_ICR_ASSERTED) &&
> +(core->mac[ICR] & core->mac[IMS])) {
> +trace_e1000e_irq_icr_clear_icr_bit_ims(core->mac[ICR], 
> core->mac[IMS]);
> +core->mac[ICR] = 0;
> +}
> +
>  trace_e1000e_irq_icr_read_exit(core->mac[ICR]);
>  e1000e_update_interrupt_state(core);
>  return ret;

Changes like this have historically been problematic. I am curious
what testing had been done on this and with what drivers? Keep in mind
that we have to support several flavors of BSD, Windows, and Linux
with this.

> diff --git a/hw/net/trace-events b/hw/net/trace-events
> index 5db45456d9..2c3521a19c 100644
> --- a/hw/net/trace-events
> +++ b/hw/net/trace-events
> @@ -237,6 +237,7 @@ e1000e_irq_icr_read_entry(uint32_t icr) "Starting ICR 
> read. Current ICR: 0x%x"
>  e1000e_irq_icr_read_exit(uint32_t icr) "Ending ICR read. Current ICR: 0x%x"
>  e1000e_irq_icr_clear_zero_ims(void) "Clearing ICR on read due to zero IMS"
>  e1000e_irq_icr_clear_iame(void) "Clearing ICR on read due to IAME"
> +e1000e_irq_icr_clear_icr_bit_ims(uint32_t icr, uint32_t ims) "Clearing ICR 
> on read due corresponding IMS bit: 0x%x & 0x%x"
>  e1000e_irq_iam_clear_eiame(uint32_t iam, uint32_t cause) "Clearing IMS due 
> to EIAME, IAM: 0x%X, cause: 0x%X"
>  e1000e_irq_icr_clear_eiac(uint32_t icr, uint32_t eiac) "Clearing ICR bits 
> due to EIAC, ICR: 0x%X, EIAC: 0x%X"
>  e1000e_irq_ims_clear_set_imc(uint32_t val) "Clearing IMS bits due to IMC 
> write 0x%x"
> --
> 2.29.2
>



Re: [PATCH v2] virtio-balloon: always indicate S_DONE when migration fails

2020-07-22 Thread Alexander Duyck
On Wed, Jul 22, 2020 at 5:12 AM David Hildenbrand  wrote:
>
> On 22.07.20 14:05, David Hildenbrand wrote:
> > On 22.07.20 14:04, Michael S. Tsirkin wrote:
> >> On Mon, Jun 29, 2020 at 10:06:15AM +0200, David Hildenbrand wrote:
> >>> If something goes wrong during precopy, before stopping the VM, we will
> >>> never send a S_DONE indication to the VM, resulting in the hinted pages
> >>> not getting released to be used by the guest OS (e.g., Linux).
> >>>
> >>> Easy to reproduce:
> >>> 1. Start migration (e.g., HMP "migrate -d 'exec:gzip -c > STATEFILE.gz'")
> >>> 2. Cancel migration (e.g., HMP "migrate_cancel")
> >>> 3. Oberve in the guest (e.g., cat /proc/meminfo) that there is basically
> >>>no free memory left.
> >>>
> >>> While at it, add similar locking to virtio_balloon_free_page_done() as
> >>> done in virtio_balloon_free_page_stop. Locking is still weird, but that
> >>> has to be sorted out separately.
> >>>
> >>> There is nothing to do in the PRECOPY_NOTIFY_COMPLETE case. Add some
> >>> comments regarding S_DONE handling.
> >>>
> >>> Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
> >>> Reviewed-by: Alexander Duyck 
> >>> Cc: Wei Wang 
> >>> Cc: Alexander Duyck 
> >>> Signed-off-by: David Hildenbrand 
> >>
> >> IIUC this is superceded by Alexander's patches right?
> >
> > Not that I know ... @Alex?
> >
>
> Okay, I'm confused, that patch is already upstream (via your tree)?
>
> dd8eeb9671fc ("virtio-balloon: always indicate S_DONE when migration fails")
>
> Did you stumble over this mail by mistake again?

I agree. David's patches should be upstream, and if they aren't they
should be applied before mine. Mine are meant to be a follow-on as I
end the set with the "report"->"hint" rename for several fields. The
idea is the fixes go first and the rename comes last.

Thanks.

- Alex



[PATCH v3 QEMU 3/3] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-07-20 Thread Alexander Duyck
From: Alexander Duyck 

Recently a feature named Free Page Reporting was added to the virtio
balloon. In order to avoid any confusion we should drop the use of the word
'report' when referring to Free Page Hinting. So what this patch does is go
through and replace all instances of 'report' with 'hint" when we are
referring to free page hinting.

Acked-by: David Hildenbrand 
Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   76 ++--
 include/hw/virtio/virtio-balloon.h |   20 +
 2 files changed, 48 insertions(+), 48 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 6e2d1293402f..22cb5df717b1 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -526,22 +526,22 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 ret = false;
 goto out;
 }
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED &&
-id == dev->free_page_report_cmd_id) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_REQUESTED &&
+id == dev->free_page_hint_cmd_id) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_START;
 } else {
 /*
  * Stop the optimization only when it has started. This
  * avoids a stale stop sign for the previous command.
  */
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 }
 }
 }
 
 if (elem->in_num) {
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
 qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
   elem->in_sg[0].iov_len);
 }
@@ -567,11 +567,11 @@ static void virtio_ballloon_get_free_page_hints(void 
*opaque)
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify(vdev, vq);
   /*
-   * Start to poll the vq once the reporting started. Otherwise, continue
+   * Start to poll the vq once the hinting started. Otherwise, continue
* only when there are entries on the vq, which need to be given back.
*/
 } while (continue_to_get_hints ||
- dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
+ dev->free_page_hint_status == FREE_PAGE_HINT_S_START);
 virtio_queue_set_notification(vq, 1);
 }
 
@@ -594,14 +594,14 @@ static void virtio_balloon_free_page_start(VirtIOBalloon 
*s)
 
 qemu_mutex_lock(>free_page_lock);
 
-if (s->free_page_report_cmd_id == UINT_MAX) {
-s->free_page_report_cmd_id =
-   VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
+if (s->free_page_hint_cmd_id == UINT_MAX) {
+s->free_page_hint_cmd_id =
+   VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
 } else {
-s->free_page_report_cmd_id++;
+s->free_page_hint_cmd_id++;
 }
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
+s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED;
 qemu_mutex_unlock(>free_page_lock);
 
 virtio_notify_config(vdev);
@@ -611,18 +611,18 @@ static void virtio_balloon_free_page_stop(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) {
+if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) {
 /*
  * The lock also guarantees us that the
  * virtio_ballloon_get_free_page_hints exits after the
- * free_page_report_status is set to S_STOP.
+ * free_page_hint_status is set to S_STOP.
  */
 qemu_mutex_lock(>free_page_lock);
 /*
- * The guest hasn't done the reporting, so host sends a notification
- * to the guest to actively stop the reporting.
+ * The guest isn't done hinting, so send a notification
+ * to the guest to actively stop the hinting.
  */
-s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+s->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify_config(vdev);
 }
@@ -632,20 +632,20 @@ static void virtio_balloon_free_page_done(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-if (s->free_page_report_status != FREE_PAGE_REPORT_S_DONE) {
+if (s->free_page_hint_status != FREE_PAGE_HINT_S_DONE) {
 /* See virtio_balloon_free_page_stop() */
 q

[PATCH v3 QEMU 1/3] virtio-balloon: Prevent guest from starting a report when we didn't request one

2020-07-20 Thread Alexander Duyck
From: Alexander Duyck 

Based on code review it appears possible for the driver to force the device
out of a stopped state when hinting by repeating the last ID it was
provided.

Prevent this by only allowing a transition to the start state when we are
in the requested state. This way the driver is only allowed to send one
descriptor that will transition the device into the start state. All others
will leave it in the stop state once it has finished.

Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Acked-by: David Hildenbrand 
Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index e670f1e59534..ce70adcc6925 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -526,7 +526,8 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 ret = false;
 goto out;
 }
-if (id == dev->free_page_report_cmd_id) {
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED &&
+id == dev->free_page_report_cmd_id) {
 dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
 } else {
 /*




[PATCH v3 QEMU 2/3] virtio-balloon: Add locking to prevent possible race when starting hinting

2020-07-20 Thread Alexander Duyck
From: Alexander Duyck 

There is already locking in place when we are stopping free page hinting
but there is not similar protections in place when we start. I can only
assume this was overlooked as in most cases the page hinting should not be
occurring when we are starting the hinting, however there is still a chance
we could be processing hints by the time we get back around to restarting
the hinting so we are better off making sure to protect the state with the
mutex lock rather than just updating the value with no protections.

Based on feedback from Peter Maydell this issue had also been spotted by
Coverity: CID 1430269

Acked-by: David Hildenbrand 
Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |4 
 1 file changed, 4 insertions(+)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index ce70adcc6925..6e2d1293402f 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -592,6 +592,8 @@ static void virtio_balloon_free_page_start(VirtIOBalloon *s)
 return;
 }
 
+qemu_mutex_lock(>free_page_lock);
+
 if (s->free_page_report_cmd_id == UINT_MAX) {
 s->free_page_report_cmd_id =
VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
@@ -600,6 +602,8 @@ static void virtio_balloon_free_page_start(VirtIOBalloon *s)
 }
 
 s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
+qemu_mutex_unlock(>free_page_lock);
+
 virtio_notify_config(vdev);
 }
 




[PATCH v3 QEMU 0/3] virtio-balloon: Free page hinting clean-ups

2020-07-20 Thread Alexander Duyck
This series contains a couple minor cleanups related to free page hinting.

The first patch addresses what I believe is a possible issue in which the
driver could potentially force the device out of the stop state and back
into the running state if it were to replay an earlier virtqueue element
containing the same ID it had submitted earlier.

The second patch takes care of a possible race due to a mutex lock not being
used when starting the hinting from the device-side.

The final patch takes care of renaming various hinting objects that were
using "reporting" in the name to try and clarify which objects are for free
page reporting and which are for free page hinting.

Changes from v1:
Split first patch into two patches as each addresses a separate issue.
Added acked-by for first patch.

Changes from v2:
Added Acked-by for patch 2
Added comment to patch 2 about it fixing Coverity issue
Rebased on latest pull of QEMU

---

Alexander Duyck (3):
  virtio-balloon: Prevent guest from starting a report when we didn't 
request one
  virtio-balloon: Add locking to prevent possible race when starting hinting
  virtio-balloon: Replace free page hinting references to 'report' with 
'hint'


 hw/virtio/virtio-balloon.c |   79 +++-
 include/hw/virtio/virtio-balloon.h |   20 +
 2 files changed, 52 insertions(+), 47 deletions(-)

--



[PATCH v2 3/3] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-07-06 Thread Alexander Duyck
From: Alexander Duyck 

Recently a feature named Free Page Reporting was added to the virtio
balloon. In order to avoid any confusion we should drop the use of the word
'report' when referring to Free Page Hinting. So what this patch does is go
through and replace all instances of 'report' with 'hint" when we are
referring to free page hinting.

Acked-by: David Hildenbrand 
Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   74 ++--
 include/hw/virtio/virtio-balloon.h |   20 +-
 2 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index b3e96a822b4d..a21e7c3db538 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -527,22 +527,22 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 ret = false;
 goto out;
 }
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED &&
-id == dev->free_page_report_cmd_id) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_REQUESTED &&
+id == dev->free_page_hint_cmd_id) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_START;
 } else {
 /*
  * Stop the optimization only when it has started. This
  * avoids a stale stop sign for the previous command.
  */
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 }
 }
 }
 
 if (elem->in_num) {
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
 qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
   elem->in_sg[0].iov_len);
 }
@@ -568,11 +568,11 @@ static void virtio_ballloon_get_free_page_hints(void 
*opaque)
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify(vdev, vq);
   /*
-   * Start to poll the vq once the reporting started. Otherwise, continue
+   * Start to poll the vq once the hinting started. Otherwise, continue
* only when there are entries on the vq, which need to be given back.
*/
 } while (continue_to_get_hints ||
- dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
+ dev->free_page_hint_status == FREE_PAGE_HINT_S_START);
 virtio_queue_set_notification(vq, 1);
 }
 
@@ -595,14 +595,14 @@ static void virtio_balloon_free_page_start(VirtIOBalloon 
*s)
 
 qemu_mutex_lock(>free_page_lock);
 
-if (s->free_page_report_cmd_id == UINT_MAX) {
-s->free_page_report_cmd_id =
-   VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
+if (s->free_page_hint_cmd_id == UINT_MAX) {
+s->free_page_hint_cmd_id =
+   VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
 } else {
-s->free_page_report_cmd_id++;
+s->free_page_hint_cmd_id++;
 }
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
+s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED;
 qemu_mutex_unlock(>free_page_lock);
 
 virtio_notify_config(vdev);
@@ -612,18 +612,18 @@ static void virtio_balloon_free_page_stop(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) {
+if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) {
 /*
  * The lock also guarantees us that the
  * virtio_ballloon_get_free_page_hints exits after the
- * free_page_report_status is set to S_STOP.
+ * free_page_hint_status is set to S_STOP.
  */
 qemu_mutex_lock(>free_page_lock);
 /*
- * The guest hasn't done the reporting, so host sends a notification
- * to the guest to actively stop the reporting.
+ * The guest isn't done hinting, so send a notification
+ * to the guest to actively stop the hinting.
  */
-s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+s->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify_config(vdev);
 }
@@ -633,15 +633,15 @@ static void virtio_balloon_free_page_done(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_DONE;
+s->free_page_hint_status = FREE_PAGE_HINT_S_DONE;
 virtio_notify_config(vdev);
 }
 
 static int
-virtio_balloon_f

[PATCH v2 2/3] virtio-balloon: Add locking to prevent possible race when starting hinting

2020-07-06 Thread Alexander Duyck
From: Alexander Duyck 

There is already locking in place when we are stopping free page hinting
but there is not similar protections in place when we start. I can only
assume this was overlooked as in most cases the page hinting should not be
occurring when we are starting the hinting, however there is still a chance
we could be processing hints by the time we get back around to restarting
the hinting so we are better off making sure to protect the state with the
mutex lock rather than just updating the value with no protections.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |4 
 1 file changed, 4 insertions(+)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 0c0fd7114799..b3e96a822b4d 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -593,6 +593,8 @@ static void virtio_balloon_free_page_start(VirtIOBalloon *s)
 return;
 }
 
+qemu_mutex_lock(>free_page_lock);
+
 if (s->free_page_report_cmd_id == UINT_MAX) {
 s->free_page_report_cmd_id =
VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
@@ -601,6 +603,8 @@ static void virtio_balloon_free_page_start(VirtIOBalloon *s)
 }
 
 s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
+qemu_mutex_unlock(>free_page_lock);
+
 virtio_notify_config(vdev);
 }
 




[PATCH v2 0/3] virtio-balloon: Free page hinting clean-ups

2020-07-06 Thread Alexander Duyck
This series contains a couple minor cleanups related to free page hinting.

The first patch addresses what I believe is a possible issue in which the
driver could potentially force the device out of the stop state and back
into the running state if it were to replay an earlier virtqueue element
containing the same ID it had submitted earlier.

The second patch takes care of a possible race due to a mutex lock not being
used when starting the hinting from the device-side.

The final patch takes care of renaming various hinting objects that were
using "reporting" in the name to try and clarify which objects are for free
page reporting and which are for free page hinting.

Changes from v1:
Split first patch into two patches as each addresses a separate issue.
Added acked-by for first patch.

---

Alexander Duyck (3):
  virtio-balloon: Prevent guest from starting a report when we didn't 
request one
  virtio-balloon: Add locking to prevent possible race when starting hinting
  virtio-balloon: Replace free page hinting references to 'report' with 
'hint'


 hw/virtio/virtio-balloon.c |   77 +++-
 include/hw/virtio/virtio-balloon.h |   20 +
 2 files changed, 51 insertions(+), 46 deletions(-)

--



[PATCH v2 1/3] virtio-balloon: Prevent guest from starting a report when we didn't request one

2020-07-06 Thread Alexander Duyck
From: Alexander Duyck 

Based on code review it appears possible for the driver to force the device
out of a stopped state when hinting by repeating the last ID it was
provided.

Prevent this by only allowing a transition to the start state when we are
in the requested state. This way the driver is only allowed to send one
descriptor that will transition the device into the start state. All others
will leave it in the stop state once it has finished.

Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Acked-by: David Hildenbrand 
Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 10507b2a430a..0c0fd7114799 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -527,7 +527,8 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 ret = false;
 goto out;
 }
-if (id == dev->free_page_report_cmd_id) {
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED &&
+id == dev->free_page_report_cmd_id) {
 dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
 } else {
 /*




Re: [PATCH 1/2] virtio-balloon: Prevent guest from starting a report when we didn't request one

2020-06-22 Thread Alexander Duyck
On Mon, Jun 22, 2020 at 1:10 AM David Hildenbrand  wrote:
>
> On 19.06.20 23:53, Alexander Duyck wrote:
> > From: Alexander Duyck 
> >
> > Based on code review it appears possible for the driver to force the device
> > out of a stopped state when hinting by repeating the last ID it was
> > provided.
>
> Indeed, thanks for noticing.
>
> >
> > Prevent this by only allowing a transition to the start state when we are
> > in the requested state. This way the driver is only allowed to send one
> > descriptor that will transition the device into the start state. All others
> > will leave it in the stop state once it has finished.
> >
> > In addition add the necessary locking to provent any potential races
>
> s/provent/prevent/

Thanks for catching that. I will fix the typo.

> > between the accesses of the cmd_id and the status.
> >
> > Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
> > Signed-off-by: Alexander Duyck 
> > ---
> >  hw/virtio/virtio-balloon.c |   11 +++
> >  1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index 10507b2a430a..7f3af266f674 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -527,7 +527,8 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
> >  ret = false;
> >  goto out;
> >  }
> > -if (id == dev->free_page_report_cmd_id) {
> > +if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED &&
> > +id == dev->free_page_report_cmd_id) {
> >  dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
> >  } else {
> >  /*
>
> But doesn't that mean that, after the first hint, all further ones will
> be discarded and we'll enter the STOP state in the else case? Or am I
> missing something?
>
> Shouldn't this be something like
>
> if (id == dev->free_page_report_cmd_id) {
> if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
> dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
> }
> /* Stay in FREE_PAGE_REPORT_S_START as long as the cmd_id match .*/
> } else { ...

There should only be one element containing an outbuf at the start of
the report. Once that is processed we should not see the driver
sending additional outbufs unless it is sending the STOP command ID.

> > @@ -592,14 +593,16 @@ static void 
> > virtio_balloon_free_page_start(VirtIOBalloon *s)
> >  return;
> >  }
> >
> > -if (s->free_page_report_cmd_id == UINT_MAX) {
> > +qemu_mutex_lock(>free_page_lock);
> > +
> > +if (s->free_page_report_cmd_id++ == UINT_MAX) {
> >  s->free_page_report_cmd_id =
> > VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> > -} else {
> > -s->free_page_report_cmd_id++;
> >  }
>
> Somewhat unrelated cleanup.

Agreed. I can drop it if preferred. I just took care of it because I
was adding the lock above and below to prevent us from getting into
any wierd states where the command ID might be updated but the report
status was not.

> >
> >  s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
> > +qemu_mutex_unlock(>free_page_lock);
> > +
> >  virtio_notify_config(vdev);
> >  }
> >
> >
>
>
> --
> Thanks,
>
> David / dhildenb
>



[PATCH 2/2] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-06-19 Thread Alexander Duyck
From: Alexander Duyck 

Recently a feature named Free Page Reporting was added to the virtio
balloon. In order to avoid any confusion we should drop the use of the word
'report' when referring to Free Page Hinting. So what this patch does is go
through and replace all instances of 'report' with 'hint" when we are
referring to free page hinting.

Acked-by: David Hildenbrand 
Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   72 ++--
 include/hw/virtio/virtio-balloon.h |   20 +-
 2 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 7f3af266f674..eea53078a73f 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -527,22 +527,22 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 ret = false;
 goto out;
 }
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED &&
-id == dev->free_page_report_cmd_id) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_REQUESTED &&
+id == dev->free_page_hint_cmd_id) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_START;
 } else {
 /*
  * Stop the optimization only when it has started. This
  * avoids a stale stop sign for the previous command.
  */
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 }
 }
 }
 
 if (elem->in_num) {
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
 qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
   elem->in_sg[0].iov_len);
 }
@@ -568,11 +568,11 @@ static void virtio_ballloon_get_free_page_hints(void 
*opaque)
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify(vdev, vq);
   /*
-   * Start to poll the vq once the reporting started. Otherwise, continue
+   * Start to poll the vq once the hinting started. Otherwise, continue
* only when there are entries on the vq, which need to be given back.
*/
 } while (continue_to_get_hints ||
- dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
+ dev->free_page_hint_status == FREE_PAGE_HINT_S_START);
 virtio_queue_set_notification(vq, 1);
 }
 
@@ -595,12 +595,12 @@ static void virtio_balloon_free_page_start(VirtIOBalloon 
*s)
 
 qemu_mutex_lock(>free_page_lock);
 
-if (s->free_page_report_cmd_id++ == UINT_MAX) {
-s->free_page_report_cmd_id =
-   VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
+if (s->free_page_hint_cmd_id++ == UINT_MAX) {
+s->free_page_hint_cmd_id =
+   VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
 }
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
+s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED;
 qemu_mutex_unlock(>free_page_lock);
 
 virtio_notify_config(vdev);
@@ -610,18 +610,18 @@ static void virtio_balloon_free_page_stop(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) {
+if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) {
 /*
  * The lock also guarantees us that the
  * virtio_ballloon_get_free_page_hints exits after the
- * free_page_report_status is set to S_STOP.
+ * free_page_hint_status is set to S_STOP.
  */
 qemu_mutex_lock(>free_page_lock);
 /*
- * The guest hasn't done the reporting, so host sends a notification
- * to the guest to actively stop the reporting.
+ * The guest isn't done hinting, so send a notification
+ * to the guest to actively stop the hinting.
  */
-s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+s->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify_config(vdev);
 }
@@ -631,15 +631,15 @@ static void virtio_balloon_free_page_done(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_DONE;
+s->free_page_hint_status = FREE_PAGE_HINT_S_DONE;
 virtio_notify_config(vdev);
 }
 
 static int
-virtio_balloon_free_page_report_notify(NotifierWithReturn *n, void *data)
+virtio_

[PATCH 1/2] virtio-balloon: Prevent guest from starting a report when we didn't request one

2020-06-19 Thread Alexander Duyck
From: Alexander Duyck 

Based on code review it appears possible for the driver to force the device
out of a stopped state when hinting by repeating the last ID it was
provided.

Prevent this by only allowing a transition to the start state when we are
in the requested state. This way the driver is only allowed to send one
descriptor that will transition the device into the start state. All others
will leave it in the stop state once it has finished.

In addition add the necessary locking to provent any potential races
between the accesses of the cmd_id and the status.

Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 10507b2a430a..7f3af266f674 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -527,7 +527,8 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 ret = false;
 goto out;
 }
-if (id == dev->free_page_report_cmd_id) {
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED &&
+id == dev->free_page_report_cmd_id) {
 dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
 } else {
 /*
@@ -592,14 +593,16 @@ static void virtio_balloon_free_page_start(VirtIOBalloon 
*s)
 return;
 }
 
-if (s->free_page_report_cmd_id == UINT_MAX) {
+qemu_mutex_lock(>free_page_lock);
+
+if (s->free_page_report_cmd_id++ == UINT_MAX) {
 s->free_page_report_cmd_id =
VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
-} else {
-s->free_page_report_cmd_id++;
 }
 
 s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
+qemu_mutex_unlock(>free_page_lock);
+
 virtio_notify_config(vdev);
 }
 




[PATCH 0/2] virtio-balloon: Free page hinting clean-ups

2020-06-19 Thread Alexander Duyck
This series contains a couple minor cleanups related to free page hinting.

The first patch addresses what I believe is a possible issue in which the
driver could potentially force the device out of the stop state and back
into the running state if it were to replay an earlier virtqueue element
containing the same ID it had submitted earlier.

The second patch takes care of renaming various hinting objects that were
using "reporting" in the name to try and clarify which objects are for free
page reporting and which are for free page hinting.

---

Alexander Duyck (2):
  virtio-balloon: Prevent guest from starting a report when we didn't 
request one
  virtio-balloon: Replace free page hinting references to 'report' with 
'hint'


 hw/virtio/virtio-balloon.c |   77 +++-
 include/hw/virtio/virtio-balloon.h |   20 +
 2 files changed, 50 insertions(+), 47 deletions(-)

--



Re: [PATCH v25 QEMU 3/3] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-06-18 Thread Alexander Duyck
On Thu, Jun 18, 2020 at 10:10 AM David Hildenbrand  wrote:
>
> >>
> >> Ugh, ...
> >>
> >> @MST, you might have missed that in another discussion, what's your
> >> general opinion about removing free page hinting in QEMU (and Linux)? We
> >> keep finding issues in the QEMU implementation, including non-trivial
> >> ones, and have to speculate about the actual semantics. I can see that
> >> e.g., libvirt does not support it yet.
> >
> > Not maintaining two similar features sounds attractive.

Agreed. Just to make sure we are all on the same page I am adding Wei
Wang since he was the original author for page hinting.

> I consider free page hinting (in QEMU) to be in an unmaintainable state
> (and it looks like Alex and I are fixing a feature we don't actually
> intend to use / not aware of users). In contrast to that, the free page
> reporting functionality/implementation is a walk in the park.
>
> >
> > I'm still trying to get my head around the list of issues.  So far they
> > all look kind of minor to me.  Would you like to summarize them
> > somewhere?
>
> Some things I still have in my mind
>
>
> 1. If migration fails during RAM precopy, the guest will never receive a
> DONE notification. Probably easy to fix.

Agreed. It is just a matter of finding the right point to add a hook
so that if we abort the migration we can report DONE.

> 2. Unclear semantics. Alex tried to document what the actual semantics
> of hinted pages are. Assume the following in the guest to a previously
> hinted page
>
> /* page was hinted and is reused now */
> if (page[x] != Y)
> page[x] == Y;
> /* migration ends, we now run on the destination */
> BUG_ON(page[x] != Y);
> /* BUG, because the content chan
>
> A guest can observe that. And that could be a random driver that just
> allocated a page.
>
> (I *assume* in Linux we might catch that using kasan, but I am not 100%
> sure, also, the actual semantics to document are unclear - e.g., for
> other guests)
>
> As Alex mentioned, it is not even guaranteed in QEMU that we receive a
> zero page on the destination, it could also be something else (e.g.,
> previously migrated values).

So this is only an issue for pages that are pushed out of the balloon
as a part of the shrinker process though. So fixing it would be pretty
straightforward as we would just have to initialize or at least dirty
pages that are leaked as a part of the shrinker. That may have an
impact on performance though as it would result in us dirtying pages
that are freed as a result of the shrinker being triggered.

> 3. If I am not wrong, the iothread works in
> virtio_ballloon_get_free_page_hints() on the virtqueue only with holding
> the free_page_lock (no BQL).
>
> Assume we're migrating, the iothread is active, and the guest triggers a
> device reset.
>
> virtio_balloon_device_reset() will trigger a
> virtio_balloon_free_page_stop(s). That won't actually wait for the
> iothread to stop, it will only temporarily lock free_page_lock and
> update s->free_page_report_status.
>
> I think there can be a race between the device reset and the iothread.
> Once virtio_balloon_free_page_stop() returned,
> virtio_ballloon_get_free_page_hints() can still call
> - virtio_queue_set_notification(vq, 0);
> - virtio_queue_set_notification(vq, 1);
> - virtio_notify(vdev, vq);
> - virtqueue_pop()
>
> I doubt this is very nice.

And our conversation had me start looking though reference to
virtio_balloon_free_page_stop. It looks like we call it for when we
unrealize the device or reset the device. It might make more sense for
us to look at pushing the status to DONE and forcing the iothread to
be flushed out.

> There are other concerns I had regarding the iothread (e.g., while
> reporting is active, virtio_ballloon_get_free_page_hints() is
> essentially a busy loop, in contrast to documented -
> continue_to_get_hints will always be true).
>
> > The appeal of hinting is that it's 0 overhead outside migration,
> > and pains were taken to avoid keeping pages locked while
> > hypervisor is busy.
> >
> > If we are to drop hinting completely we need to show that reporting
> > can be comparable, and we'll probably want to add a mode for
> > reporting that behaves somewhat similarly.
>
> Depends on the actual users. If we're dropping a feature that nobody is
> actively using, I don't think we have to show anything.
>
> This feature obviously saw no proper review.

I'm pretty sure it had some, as it went through several iterations as
I recall. However I don't think the review of the virtio interface was
very detailed as I think most of the attention was on the kernel
interface.

As far as trying to do this with page reporting it would be doable,
but I would need to use something like the command interface so that I
would have a way to tell the driver when to drop the reported bit from
the pages and when to stop/resume hinting. However it still wouldn't
resolve the issue of copy on write style pages where the page is only
read and 

Re: [PATCH v25 QEMU 3/3] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-06-18 Thread Alexander Duyck
On Thu, Jun 18, 2020 at 5:54 AM David Hildenbrand  wrote:
>
> On 13.06.20 22:07, Alexander Duyck wrote:
> > On Tue, May 26, 2020 at 9:14 PM Alexander Duyck
> >  wrote:
> >>
> >> From: Alexander Duyck 
> >>
> >> In an upcoming patch a feature named Free Page Reporting is about to be
> >> added. In order to avoid any confusion we should drop the use of the word
> >> 'report' when referring to Free Page Hinting. So what this patch does is go
> >> through and replace all instances of 'report' with 'hint" when we are
> >> referring to free page hinting.
> >>
> >> Acked-by: David Hildenbrand 
> >> Signed-off-by: Alexander Duyck 
> >> ---
> >>  hw/virtio/virtio-balloon.c |   78 
> >> ++--
> >>  include/hw/virtio/virtio-balloon.h |   20 +
> >>  2 files changed, 49 insertions(+), 49 deletions(-)
> >>
> >> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> >> index 3e2ac1104b5f..dc15409b0bb6 100644
> >> --- a/hw/virtio/virtio-balloon.c
> >> +++ b/hw/virtio/virtio-balloon.c
> >
> > ...
> >
> >> @@ -817,14 +817,14 @@ static int virtio_balloon_post_load_device(void 
> >> *opaque, int version_id)
> >>  return 0;
> >>  }
> >>
> >> -static const VMStateDescription vmstate_virtio_balloon_free_page_report = 
> >> {
> >> +static const VMStateDescription vmstate_virtio_balloon_free_page_hint = {
> >>  .name = "virtio-balloon-device/free-page-report",
> >>  .version_id = 1,
> >>  .minimum_version_id = 1,
> >>  .needed = virtio_balloon_free_page_support,
> >>  .fields = (VMStateField[]) {
> >> -VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
> >> -VMSTATE_UINT32(free_page_report_status, VirtIOBalloon),
> >> +VMSTATE_UINT32(free_page_hint_cmd_id, VirtIOBalloon),
> >> +VMSTATE_UINT32(free_page_hint_status, VirtIOBalloon),
> >>  VMSTATE_END_OF_LIST()
> >>  }
> >>  };
> >
> > So I noticed this patch wasn't in the list of patches pulled, but that
> > is probably for the best since I believe the change above might have
> > broken migration as VMSTATE_UINT32 does a stringify on the first
> > parameter.
>
> Indeed, it's the name of the vmstate field. But I don't think it is
> relevant for migration. It's and indicator if a field is valid and it's
> used in traces/error messages.
>
> See git grep "field->name"
>
> I don't think renaming this is problematic. Can you rebase and resent?
> Thanks!

Okay, I will.

> > Any advice on how to address it, or should I just give up on renaming
> > free_page_report_cmd_id and free_page_report_status?
> >
> > Looking at this I wonder why we even need to migrate these values? It
> > seems like if we are completing a migration the cmd_id should always
> > be "DONE" shouldn't it? It isn't as if we are going to migrate the
>
> The *status* should be DONE IIUC. The cmd_id might be relevant, no? It's
> always incremented until it wraps.

The thing is, the cmd_id visible to the driver if the status is DONE
is the cmd_id value for DONE. So as long as the driver acknowledges
the value we could essentially start over the cmd_id without any
negative effect. The driver would have to put down a new descriptor to
start a block of hinting in order to begin reporting again so there
shouldn't be any risk of us falsely hinting pages that were in a
previous epoch.

Ugh, although now looking at it I think we might have a bug in the
QEMU code in that the driver could in theory force its way past a
"STOP" by just replaying the last command_id descriptor and then keep
going. Should be a pretty easy fix though as we should only allow a
transition to S_START if the status is S_REQUESTED/

> > hinting from one host to another. We will have to start over which is
> > essentially the signal that the "DONE" value provides. Same thing for
> > the status. We shouldn't be able to migrate unless both of these are
> > already in the "DONE" state so if anything I wonder if we shouldn't
> > have that as the initial state for the device and just drop the
> > migration info.
>
> We'll have to glue that to a compat machine unfortunately, so we can
> just keep migrating it ... :(

Yeah, I kind of figured that would be the case. However if the name
change is not an issue then it should not be a problem.

Thanks.

- Alex



Re: [PATCH v25 QEMU 3/3] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-06-13 Thread Alexander Duyck
On Tue, May 26, 2020 at 9:14 PM Alexander Duyck
 wrote:
>
> From: Alexander Duyck 
>
> In an upcoming patch a feature named Free Page Reporting is about to be
> added. In order to avoid any confusion we should drop the use of the word
> 'report' when referring to Free Page Hinting. So what this patch does is go
> through and replace all instances of 'report' with 'hint" when we are
> referring to free page hinting.
>
> Acked-by: David Hildenbrand 
> Signed-off-by: Alexander Duyck 
> ---
>  hw/virtio/virtio-balloon.c |   78 
> ++--
>  include/hw/virtio/virtio-balloon.h |   20 +
>  2 files changed, 49 insertions(+), 49 deletions(-)
>
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 3e2ac1104b5f..dc15409b0bb6 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c

...

> @@ -817,14 +817,14 @@ static int virtio_balloon_post_load_device(void 
> *opaque, int version_id)
>  return 0;
>  }
>
> -static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
> +static const VMStateDescription vmstate_virtio_balloon_free_page_hint = {
>  .name = "virtio-balloon-device/free-page-report",
>  .version_id = 1,
>  .minimum_version_id = 1,
>  .needed = virtio_balloon_free_page_support,
>  .fields = (VMStateField[]) {
> -VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
> -VMSTATE_UINT32(free_page_report_status, VirtIOBalloon),
> +VMSTATE_UINT32(free_page_hint_cmd_id, VirtIOBalloon),
> +VMSTATE_UINT32(free_page_hint_status, VirtIOBalloon),
>  VMSTATE_END_OF_LIST()
>  }
>  };

So I noticed this patch wasn't in the list of patches pulled, but that
is probably for the best since I believe the change above might have
broken migration as VMSTATE_UINT32 does a stringify on the first
parameter.
Any advice on how to address it, or should I just give up on renaming
free_page_report_cmd_id and free_page_report_status?

Looking at this I wonder why we even need to migrate these values? It
seems like if we are completing a migration the cmd_id should always
be "DONE" shouldn't it? It isn't as if we are going to migrate the
hinting from one host to another. We will have to start over which is
essentially the signal that the "DONE" value provides. Same thing for
the status. We shouldn't be able to migrate unless both of these are
already in the "DONE" state so if anything I wonder if we shouldn't
have that as the initial state for the device and just drop the
migration info.

Thanks.

- Alex



Re: [PATCH v25 QEMU 0/3] virtio-balloon: add support for page poison and free page reporting

2020-06-08 Thread Alexander Duyck
It's been almost 2 weeks since I submitted this. Just thought I would
follow up and see if there is any ETA on when this might be applied,
or if I missed the need to fix something and resubmit.

Thanks.

- Alex


On Tue, May 26, 2020 at 9:13 PM Alexander Duyck
 wrote:
>
> This series provides an asynchronous means of reporting free guest pages
> to QEMU through virtio-balloon so that the memory associated with those
> pages can be dropped and reused by other processes and/or guests on the
> host. Using this it is possible to avoid unnecessary I/O to disk and
> greatly improve performance in the case of memory overcommit on the host.
>
> I originally submitted this patch series back on February 11th 2020[1],
> but at that time I was focused primarily on the kernel portion of this
> patch set. However as of April 7th those patches are now included in
> Linus's kernel tree[2] and so I am submitting the QEMU pieces for
> inclusion.
>
> [1]: 
> https://lore.kernel.org/lkml/20200211224416.29318.44077.stgit@localhost.localdomain/
> [2]: 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0c504f154718904ae49349147e3b7e6ae91ffdc
>
> Changes from v17:
> Fixed typo in patch 1 title
> Addressed white-space issues reported via checkpatch
> Added braces {} for two if statements to match expected coding style
>
> Changes from v18:
> Updated patches 2 and 3 based on input from dhildenb
> Added comment to patch 2 describing what keeps us from reporting a bad page
> Added patch to address issue with ROM devices being directly writable
>
> Changes from v19:
> Added std-headers change to match changes pushed for linux kernel headers
> Added patch to remove "report" from page hinting code paths
> Updated comment to better explain why we disable hints w/ page poisoning
> Removed code that was modifying config size for poison vs hinting
> Dropped x-page-poison property
> Added code to bounds check the reported region vs the RAM block
> Dropped patch for ROM devices as that was already pulled in by Paolo
>
> Changes from v20:
> Rearranged patches to push Linux header sync patches to front
> Removed association between free page hinting and VIRTIO_BALLOON_F_PAGE_POISON
> Added code to enable VIRTIO_BALLOON_F_PAGE_POISON if page reporting is enabled
> Fixed possible resource leak if poison or qemu_balloon_is_inhibited return 
> true
>
> Changes from v21:
> Added ack for patch 3
> Rewrote patch description for page poison reporting feature
> Made page-poison independent property and set to enabled by default
> Added logic to migrate poison_val
> Added several comments in code to better explain features
> Switched free-page-reporting property to disabled by default
>
> Changes from v22:
> Added ack for patches 4 & 5
> Added additional comment fixes in patch 3 to remove "reporting" reference
> Renamed rvq in patch 5 to reporting_vq to better match linux kernel
> Moved call adding reporting_vq to after free_page_vq
>
> Changes from v23:
> Rebased on latest QEMU
> Dropped patches 1 & 2 as Linux kernel headers were synced
> Added compat machine code for page-poison feature
>
> Changes from v24:
> Moved free page hinting rename to end of set as feature may be removed 
> entirely
> Added code to cleanup reporting_vq
>
> ---
>
> Alexander Duyck (3):
>   virtio-balloon: Implement support for page poison reporting feature
>   virtio-balloon: Provide an interface for free page reporting
>   virtio-balloon: Replace free page hinting references to 'report' with 
> 'hint'
>
>
>  hw/core/machine.c  |4 +
>  hw/virtio/virtio-balloon.c |  179 
> 
>  include/hw/virtio/virtio-balloon.h |   23 ++---
>  3 files changed, 155 insertions(+), 51 deletions(-)
>
> --



[PATCH v25 QEMU 1/3] virtio-balloon: Implement support for page poison reporting feature

2020-05-26 Thread Alexander Duyck
From: Alexander Duyck 

We need to make certain to advertise support for page poison reporting if
we want to actually get data on if the guest will be poisoning pages.

Add a value for reporting the poison value being used if page poisoning is
enabled in the guest. With this we can determine if we will need to skip
free page reporting when it is enabled in the future.

The value currently has no impact on existing balloon interfaces. In the
case of existing balloon interfaces the onus is on the guest driver to
reapply whatever poison is in place.

When we add free page reporting the poison value is used to determine if
we can perform in-place page reporting. The expectation is that a reported
page will already contain the value specified by the poison, and the
reporting of the page should not change that value.

Acked-by: David Hildenbrand 
Signed-off-by: Alexander Duyck 
---
 hw/core/machine.c  |4 +++-
 hw/virtio/virtio-balloon.c |   29 +
 include/hw/virtio/virtio-balloon.h |1 +
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index bb3a7b18b193..9eca7d8c9bfe 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -28,7 +28,9 @@
 #include "hw/mem/nvdimm.h"
 #include "migration/vmstate.h"
 
-GlobalProperty hw_compat_5_0[] = {};
+GlobalProperty hw_compat_5_0[] = {
+{ "virtio-balloon-device", "page-poison", "false" },
+};
 const size_t hw_compat_5_0_len = G_N_ELEMENTS(hw_compat_5_0);
 
 GlobalProperty hw_compat_4_2[] = {
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 065cd450f10f..26f6a7ca2e35 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -634,6 +634,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, 
uint8_t *config_data)
 
 config.num_pages = cpu_to_le32(dev->num_pages);
 config.actual = cpu_to_le32(dev->actual);
+config.poison_val = cpu_to_le32(dev->poison_val);
 
 if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
 config.free_page_report_cmd_id =
@@ -683,6 +684,14 @@ static ram_addr_t get_current_ram_size(void)
 return size;
 }
 
+static bool virtio_balloon_page_poison_support(void *opaque)
+{
+VirtIOBalloon *s = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON);
+}
+
 static void virtio_balloon_set_config(VirtIODevice *vdev,
   const uint8_t *config_data)
 {
@@ -697,6 +706,10 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 qapi_event_send_balloon_change(vm_ram_size -
 ((ram_addr_t) dev->actual << 
VIRTIO_BALLOON_PFN_SHIFT));
 }
+dev->poison_val = 0;
+if (virtio_balloon_page_poison_support(dev)) {
+dev->poison_val = le32_to_cpu(config.poison_val);
+}
 trace_virtio_balloon_set_config(dev->actual, oldactual);
 }
 
@@ -755,6 +768,17 @@ static const VMStateDescription 
vmstate_virtio_balloon_free_page_report = {
 }
 };
 
+static const VMStateDescription vmstate_virtio_balloon_page_poison = {
+.name = "vitio-balloon-device/page-poison",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = virtio_balloon_page_poison_support,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(poison_val, VirtIOBalloon),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription vmstate_virtio_balloon_device = {
 .name = "virtio-balloon-device",
 .version_id = 1,
@@ -767,6 +791,7 @@ static const VMStateDescription 
vmstate_virtio_balloon_device = {
 },
 .subsections = (const VMStateDescription * []) {
 _virtio_balloon_free_page_report,
+_virtio_balloon_page_poison,
 NULL
 }
 };
@@ -854,6 +879,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
 g_free(s->stats_vq_elem);
 s->stats_vq_elem = NULL;
 }
+
+s->poison_val = 0;
 }
 
 static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
@@ -916,6 +943,8 @@ static Property virtio_balloon_properties[] = {
 VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
 DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
 VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
+DEFINE_PROP_BIT("page-poison", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_PAGE_POISON, true),
 /* QEMU 4.0 accidentally changed the config size even when free-page-hint
  * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
  * property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index d1c968d2376e..7fe78e5c14d7 100644
--- a/include/hw/virtio/virtio-balloon.h
+

[PATCH v25 QEMU 3/3] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-05-26 Thread Alexander Duyck
From: Alexander Duyck 

In an upcoming patch a feature named Free Page Reporting is about to be
added. In order to avoid any confusion we should drop the use of the word
'report' when referring to Free Page Hinting. So what this patch does is go
through and replace all instances of 'report' with 'hint" when we are
referring to free page hinting.

Acked-by: David Hildenbrand 
Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   78 ++--
 include/hw/virtio/virtio-balloon.h |   20 +
 2 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 3e2ac1104b5f..dc15409b0bb6 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -527,21 +527,21 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 ret = false;
 goto out;
 }
-if (id == dev->free_page_report_cmd_id) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+if (id == dev->free_page_hint_cmd_id) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_START;
 } else {
 /*
  * Stop the optimization only when it has started. This
  * avoids a stale stop sign for the previous command.
  */
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 }
 }
 }
 
 if (elem->in_num) {
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
 qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
   elem->in_sg[0].iov_len);
 }
@@ -567,11 +567,11 @@ static void virtio_ballloon_get_free_page_hints(void 
*opaque)
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify(vdev, vq);
   /*
-   * Start to poll the vq once the reporting started. Otherwise, continue
+   * Start to poll the vq once the hinting started. Otherwise, continue
* only when there are entries on the vq, which need to be given back.
*/
 } while (continue_to_get_hints ||
- dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
+ dev->free_page_hint_status == FREE_PAGE_HINT_S_START);
 virtio_queue_set_notification(vq, 1);
 }
 
@@ -592,14 +592,14 @@ static void virtio_balloon_free_page_start(VirtIOBalloon 
*s)
 return;
 }
 
-if (s->free_page_report_cmd_id == UINT_MAX) {
-s->free_page_report_cmd_id =
-   VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
+if (s->free_page_hint_cmd_id == UINT_MAX) {
+s->free_page_hint_cmd_id =
+   VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
 } else {
-s->free_page_report_cmd_id++;
+s->free_page_hint_cmd_id++;
 }
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
+s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED;
 virtio_notify_config(vdev);
 }
 
@@ -607,18 +607,18 @@ static void virtio_balloon_free_page_stop(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) {
+if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) {
 /*
  * The lock also guarantees us that the
  * virtio_ballloon_get_free_page_hints exits after the
- * free_page_report_status is set to S_STOP.
+ * free_page_hint_status is set to S_STOP.
  */
 qemu_mutex_lock(>free_page_lock);
 /*
- * The guest hasn't done the reporting, so host sends a notification
- * to the guest to actively stop the reporting.
+ * The guest isn't done hinting, so send a notification
+ * to the guest to actively stop the hinting.
  */
-s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+s->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify_config(vdev);
 }
@@ -628,15 +628,15 @@ static void virtio_balloon_free_page_done(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_DONE;
+s->free_page_hint_status = FREE_PAGE_HINT_S_DONE;
 virtio_notify_config(vdev);
 }
 
 static int
-virtio_balloon_free_page_report_notify(NotifierWithReturn *n, void *data)
+virtio_balloon_free_page_hint_notify(NotifierWithReturn *n, void *data)
 {
 VirtIOBalloon *dev = container_of(n, VirtIOBalloon,
-  free_page_r

[PATCH v25 QEMU 2/3] virtio-balloon: Provide an interface for free page reporting

2020-05-26 Thread Alexander Duyck
From: Alexander Duyck 

Add support for free page reporting. The idea is to function very similar
to how the balloon works in that we basically end up madvising the page as
not being used. However we don't really need to bother with any deflate
type logic since the page will be faulted back into the guest when it is
read or written to.

This provides a new way of letting the guest proactively report free
pages to the hypervisor, so the hypervisor can reuse them. In contrast to
inflate/deflate that is triggered via the hypervisor explicitly.

Acked-by: David Hildenbrand 
Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   72 
 include/hw/virtio/virtio-balloon.h |2 +
 2 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 26f6a7ca2e35..3e2ac1104b5f 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -321,6 +321,67 @@ static void balloon_stats_set_poll_interval(Object *obj, 
Visitor *v,
 balloon_stats_change_timer(s, 0);
 }
 
+static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
+{
+VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+VirtQueueElement *elem;
+
+while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement {
+unsigned int i;
+
+/*
+ * When we discard the page it has the effect of removing the page
+ * from the hypervisor itself and causing it to be zeroed when it
+ * is returned to us. So we must not discard the page if it is
+ * accessible by another device or process, or if the guest is
+ * expecting it to retain a non-zero value.
+ */
+if (qemu_balloon_is_inhibited() || dev->poison_val) {
+goto skip_element;
+}
+
+for (i = 0; i < elem->in_num; i++) {
+void *addr = elem->in_sg[i].iov_base;
+size_t size = elem->in_sg[i].iov_len;
+ram_addr_t ram_offset;
+RAMBlock *rb;
+
+/*
+ * There is no need to check the memory section to see if
+ * it is ram/readonly/romd like there is for handle_output
+ * below. If the region is not meant to be written to then
+ * address_space_map will have allocated a bounce buffer
+ * and it will be freed in address_space_unmap and trigger
+ * and unassigned_mem_write before failing to copy over the
+ * buffer. If more than one bad descriptor is provided it
+ * will return NULL after the first bounce buffer and fail
+ * to map any resources.
+ */
+rb = qemu_ram_block_from_host(addr, false, _offset);
+if (!rb) {
+trace_virtio_balloon_bad_addr(elem->in_addr[i]);
+continue;
+}
+
+/*
+ * For now we will simply ignore unaligned memory regions, or
+ * regions that overrun the end of the RAMBlock.
+ */
+if (!QEMU_IS_ALIGNED(ram_offset | size, qemu_ram_pagesize(rb)) ||
+(ram_offset + size) > qemu_ram_get_used_length(rb)) {
+continue;
+}
+
+ram_block_discard_range(rb, ram_offset, size);
+}
+
+skip_element:
+virtqueue_push(vq, elem, 0);
+virtio_notify(vdev, vq);
+g_free(elem);
+}
+}
+
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -841,6 +902,12 @@ static void virtio_balloon_device_realize(DeviceState 
*dev, Error **errp)
 virtio_error(vdev, "iothread is missing");
 }
 }
+
+if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
+s->reporting_vq = virtio_add_queue(vdev, 32,
+   virtio_balloon_handle_report);
+}
+
 reset_stats(s);
 }
 
@@ -863,6 +930,9 @@ static void virtio_balloon_device_unrealize(DeviceState 
*dev)
 if (s->free_page_vq) {
 virtio_delete_queue(s->free_page_vq);
 }
+if (s->reporting_vq) {
+virtio_delete_queue(s->reporting_vq);
+}
 virtio_cleanup(vdev);
 }
 
@@ -945,6 +1015,8 @@ static Property virtio_balloon_properties[] = {
 VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
 DEFINE_PROP_BIT("page-poison", VirtIOBalloon, host_features,
 VIRTIO_BALLOON_F_PAGE_POISON, true),
+DEFINE_PROP_BIT("free-page-reporting", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_REPORTING, false),
 /* QEMU 4.0 accidentally changed the config size even when free-page-hint
  * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
  * property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h 
b/inclu

[PATCH v25 QEMU 0/3] virtio-balloon: add support for page poison and free page reporting

2020-05-26 Thread Alexander Duyck
This series provides an asynchronous means of reporting free guest pages
to QEMU through virtio-balloon so that the memory associated with those
pages can be dropped and reused by other processes and/or guests on the
host. Using this it is possible to avoid unnecessary I/O to disk and
greatly improve performance in the case of memory overcommit on the host.

I originally submitted this patch series back on February 11th 2020[1],
but at that time I was focused primarily on the kernel portion of this
patch set. However as of April 7th those patches are now included in
Linus's kernel tree[2] and so I am submitting the QEMU pieces for
inclusion.

[1]: 
https://lore.kernel.org/lkml/20200211224416.29318.44077.stgit@localhost.localdomain/
[2]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0c504f154718904ae49349147e3b7e6ae91ffdc

Changes from v17:
Fixed typo in patch 1 title
Addressed white-space issues reported via checkpatch
Added braces {} for two if statements to match expected coding style

Changes from v18:
Updated patches 2 and 3 based on input from dhildenb
Added comment to patch 2 describing what keeps us from reporting a bad page
Added patch to address issue with ROM devices being directly writable

Changes from v19:
Added std-headers change to match changes pushed for linux kernel headers
Added patch to remove "report" from page hinting code paths
Updated comment to better explain why we disable hints w/ page poisoning
Removed code that was modifying config size for poison vs hinting
Dropped x-page-poison property
Added code to bounds check the reported region vs the RAM block
Dropped patch for ROM devices as that was already pulled in by Paolo

Changes from v20:
Rearranged patches to push Linux header sync patches to front
Removed association between free page hinting and VIRTIO_BALLOON_F_PAGE_POISON
Added code to enable VIRTIO_BALLOON_F_PAGE_POISON if page reporting is enabled
Fixed possible resource leak if poison or qemu_balloon_is_inhibited return true

Changes from v21:
Added ack for patch 3
Rewrote patch description for page poison reporting feature
Made page-poison independent property and set to enabled by default
Added logic to migrate poison_val
Added several comments in code to better explain features
Switched free-page-reporting property to disabled by default

Changes from v22:
Added ack for patches 4 & 5
Added additional comment fixes in patch 3 to remove "reporting" reference
Renamed rvq in patch 5 to reporting_vq to better match linux kernel
Moved call adding reporting_vq to after free_page_vq

Changes from v23:
Rebased on latest QEMU
Dropped patches 1 & 2 as Linux kernel headers were synced
Added compat machine code for page-poison feature

Changes from v24:
Moved free page hinting rename to end of set as feature may be removed entirely
Added code to cleanup reporting_vq

---

Alexander Duyck (3):
  virtio-balloon: Implement support for page poison reporting feature
  virtio-balloon: Provide an interface for free page reporting
  virtio-balloon: Replace free page hinting references to 'report' with 
'hint'


 hw/core/machine.c  |4 +
 hw/virtio/virtio-balloon.c |  179 
 include/hw/virtio/virtio-balloon.h |   23 ++---
 3 files changed, 155 insertions(+), 51 deletions(-)

--



Re: [PATCH v24 QEMU 3/3] virtio-balloon: Provide an interface for free page reporting

2020-05-18 Thread Alexander Duyck
On Fri, May 8, 2020 at 2:30 PM Alexander Duyck
 wrote:
>
> From: Alexander Duyck 
>
> Add support for free page reporting. The idea is to function very similar
> to how the balloon works in that we basically end up madvising the page as
> not being used. However we don't really need to bother with any deflate
> type logic since the page will be faulted back into the guest when it is
> read or written to.
>
> This provides a new way of letting the guest proactively report free
> pages to the hypervisor, so the hypervisor can reuse them. In contrast to
> inflate/deflate that is triggered via the hypervisor explicitly.
>
> Acked-by: David Hildenbrand 
> Signed-off-by: Alexander Duyck 


I just realized that the patch below added the code to add the
reporting_vq but I never cleaned it up. I will submit a v25 in the
next couple of days that contains a fix for that.

> ---
>  hw/virtio/virtio-balloon.c |   69 
> 
>  include/hw/virtio/virtio-balloon.h |2 +
>  2 files changed, 70 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 1666132a24c1..53abba290274 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -321,6 +321,67 @@ static void balloon_stats_set_poll_interval(Object *obj, 
> Visitor *v,
>  balloon_stats_change_timer(s, 0);
>  }
>
> +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
> +{
> +VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> +VirtQueueElement *elem;
> +
> +while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement {
> +unsigned int i;
> +
> +/*
> + * When we discard the page it has the effect of removing the page
> + * from the hypervisor itself and causing it to be zeroed when it
> + * is returned to us. So we must not discard the page if it is
> + * accessible by another device or process, or if the guest is
> + * expecting it to retain a non-zero value.
> + */
> +if (qemu_balloon_is_inhibited() || dev->poison_val) {
> +goto skip_element;
> +}
> +
> +for (i = 0; i < elem->in_num; i++) {
> +void *addr = elem->in_sg[i].iov_base;
> +size_t size = elem->in_sg[i].iov_len;
> +ram_addr_t ram_offset;
> +RAMBlock *rb;
> +
> +/*
> + * There is no need to check the memory section to see if
> + * it is ram/readonly/romd like there is for handle_output
> + * below. If the region is not meant to be written to then
> + * address_space_map will have allocated a bounce buffer
> + * and it will be freed in address_space_unmap and trigger
> + * and unassigned_mem_write before failing to copy over the
> + * buffer. If more than one bad descriptor is provided it
> + * will return NULL after the first bounce buffer and fail
> + * to map any resources.
> + */
> +rb = qemu_ram_block_from_host(addr, false, _offset);
> +if (!rb) {
> +trace_virtio_balloon_bad_addr(elem->in_addr[i]);
> +continue;
> +}
> +
> +/*
> + * For now we will simply ignore unaligned memory regions, or
> + * regions that overrun the end of the RAMBlock.
> + */
> +if (!QEMU_IS_ALIGNED(ram_offset | size, qemu_ram_pagesize(rb)) ||
> +(ram_offset + size) > qemu_ram_get_used_length(rb)) {
> +continue;
> +}
> +
> +ram_block_discard_range(rb, ram_offset, size);
> +}
> +
> +skip_element:
> +virtqueue_push(vq, elem, 0);
> +virtio_notify(vdev, vq);
> +g_free(elem);
> +}
> +}
> +
>  static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>  {
>  VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
> @@ -841,6 +902,12 @@ static void virtio_balloon_device_realize(DeviceState 
> *dev, Error **errp)
>  virtio_error(vdev, "iothread is missing");
>  }
>  }
> +
> +if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
> +s->reporting_vq = virtio_add_queue(vdev, 32,
> +   virtio_balloon_handle_report);
> +}
> +
>  reset_stats(s);
>  }
>
> @@ -945,6 +1012,8 @@ static Property virtio_balloon_properties[] = {
>  VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
>  DEFINE_PROP_BIT("page-poison&quo

Re: [PATCH v1 3/3] virtio-balloon: unref the iothread when unrealizing

2020-05-18 Thread Alexander Duyck
On Mon, May 18, 2020 at 8:42 AM David Hildenbrand  wrote:
>
> On 18.05.20 17:35, Alexander Duyck wrote:
> > On Mon, May 18, 2020 at 1:37 AM David Hildenbrand  wrote:
> >>
> >> We took a reference when realizing, so let's drop that reference when
> >> unrealizing.
> >>
> >> Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
> >> Cc: Wei Wang 
> >> Cc: Alexander Duyck 
> >> Cc: Michael S. Tsirkin 
> >> Cc: Philippe Mathieu-Daudé 
> >> Signed-off-by: David Hildenbrand 
> >> ---
> >>  hw/virtio/virtio-balloon.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> >> index a4fcf2d777..3f8fc50be0 100644
> >> --- a/hw/virtio/virtio-balloon.c
> >> +++ b/hw/virtio/virtio-balloon.c
> >> @@ -820,6 +820,7 @@ static void 
> >> virtio_balloon_device_unrealize(DeviceState *dev)
> >>
> >>  if (s->free_page_bh) {
> >>  qemu_bh_delete(s->free_page_bh);
> >> +object_unref(OBJECT(s->iothread));
> >>  virtio_balloon_free_page_stop(s);
> >>  precopy_remove_notifier(>free_page_report_notify);
> >>  }
> >
> > I'm not entirely sure about this order of operations. It seems like it
> > would make more sense to remove the notifier, stop the hinting, delete
> > the bh, and then release the IO thread.
>
> This is the reverse order of the steps in
> virtio_balloon_device_realize(). And I guess it should be fine. The
> notifier cannot really be active/trigger while we are removing devices
> (cannot happen with concurrent migration). After qemu_bh_delete(), the
> iothread is effectively unused.
>
> I am unsure about many things regarding free page hinting (e.g., if the
> virtio_balloon_free_page_stop() is of any use while we are ripping out
> the device and it will be gone in a second).

Agreed. This is probably fine as is.

Reviewed-by: Alexander Duyck 



Re: [PATCH v1 3/3] virtio-balloon: unref the iothread when unrealizing

2020-05-18 Thread Alexander Duyck
On Mon, May 18, 2020 at 1:37 AM David Hildenbrand  wrote:
>
> We took a reference when realizing, so let's drop that reference when
> unrealizing.
>
> Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
> Cc: Wei Wang 
> Cc: Alexander Duyck 
> Cc: Michael S. Tsirkin 
> Cc: Philippe Mathieu-Daudé 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/virtio/virtio-balloon.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index a4fcf2d777..3f8fc50be0 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -820,6 +820,7 @@ static void virtio_balloon_device_unrealize(DeviceState 
> *dev)
>
>  if (s->free_page_bh) {
>  qemu_bh_delete(s->free_page_bh);
> +object_unref(OBJECT(s->iothread));
>  virtio_balloon_free_page_stop(s);
>  precopy_remove_notifier(>free_page_report_notify);
>  }

I'm not entirely sure about this order of operations. It seems like it
would make more sense to remove the notifier, stop the hinting, delete
the bh, and then release the IO thread.



Re: [PATCH v1 2/3] virtio-balloon: fix free page hinting check on unrealize

2020-05-18 Thread Alexander Duyck
On Mon, May 18, 2020 at 1:37 AM David Hildenbrand  wrote:
>
> Checking against guest features is wrong. We allocated data structures
> based on host features. We can rely on "free_page_bh" as an indicator
> whether to un-do stuff instead.
>
> Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
> Cc: Wei Wang 
> Cc: Michael S. Tsirkin 
> Cc: Philippe Mathieu-Daudé 
> Cc: Alexander Duyck 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/virtio/virtio-balloon.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index dc3b1067ab..a4fcf2d777 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -818,7 +818,7 @@ static void virtio_balloon_device_unrealize(DeviceState 
> *dev)
>  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>  VirtIOBalloon *s = VIRTIO_BALLOON(dev);
>
> -if (virtio_balloon_free_page_support(s)) {
> +if (s->free_page_bh) {
>  qemu_bh_delete(s->free_page_bh);
>  virtio_balloon_free_page_stop(s);
>  precopy_remove_notifier(>free_page_report_notify);

Would it make sense to apply the same change to
virtio_balloon_device_reset and virtio_balloon_set_status? At least in
the case of virtio_balloon_set_status it seems like you could possibly
exploit it somehow as clearing the feature in the guest will prevent
the toggling of the block_iothread value.

Reviewed-by: Alexander Duyck 



Re: [PATCH v1 1/3] virtio-balloon: fix free page hinting without an iothread

2020-05-18 Thread Alexander Duyck
On Mon, May 18, 2020 at 1:37 AM David Hildenbrand  wrote:
>
> In case we don't have an iothread, we mark the feature as abscent but
> still add the queue. 'free_page_bh' remains set to NULL.
>
> qemu-system-i386 \
> -M microvm \
> -nographic \
> -device virtio-balloon-device,free-page-hint=true \
> -nographic \
> -display none \
> -monitor none \
> -serial none \
> -qtest stdio
>
> Doing a "write 0xce30 0x24
> 0x030003000300030003000300030003000300"
>
> We will trigger a SEGFAULT. Let's move the check and bail out.
>
> While at it, move the static initializations to instance_initialize().
>
> Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
> Reported-by: Alexander Bulekov 
> Cc: Wei Wang 
> Cc: Michael S. Tsirkin 
> Cc: Philippe Mathieu-Daudé 
> Cc: Alexander Duyck 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/virtio/virtio-balloon.c | 35 ++-
>  1 file changed, 18 insertions(+), 17 deletions(-)
>
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 065cd450f1..dc3b1067ab 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -789,6 +789,13 @@ static void virtio_balloon_device_realize(DeviceState 
> *dev, Error **errp)
>  return;
>  }
>
> +if (virtio_has_feature(s->host_features,
> +VIRTIO_BALLOON_F_FREE_PAGE_HINT) && !s->iothread) {
> +error_setg(errp, "'free-page-hint' requires 'iothread' to be set");
> +virtio_cleanup(vdev);
> +return;
> +}
> +
>  s->ivq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>  s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>  s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
> @@ -797,24 +804,11 @@ static void virtio_balloon_device_realize(DeviceState 
> *dev, Error **errp)
> VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>  s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
> 
> virtio_balloon_handle_free_page_vq);
> -s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> -s->free_page_report_cmd_id =
> -   VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> -s->free_page_report_notify.notify =
> -   
> virtio_balloon_free_page_report_notify;
>  precopy_add_notifier(>free_page_report_notify);
> -if (s->iothread) {
> -object_ref(OBJECT(s->iothread));
> -s->free_page_bh = 
> aio_bh_new(iothread_get_aio_context(s->iothread),
> -   virtio_ballloon_get_free_page_hints, 
> s);
> -qemu_mutex_init(>free_page_lock);
> -qemu_cond_init(>free_page_cond);
> -s->block_iothread = false;
> -} else {
> -/* Simply disable this feature if the iothread wasn't created. */
> -s->host_features &= ~(1 << VIRTIO_BALLOON_F_FREE_PAGE_HINT);
> -virtio_error(vdev, "iothread is missing");
> -}
> +
> +object_ref(OBJECT(s->iothread));
> +s->free_page_bh = aio_bh_new(iothread_get_aio_context(s->iothread),
> + virtio_ballloon_get_free_page_hints, s);
>  }
>  reset_stats(s);
>  }
> @@ -892,6 +886,13 @@ static void virtio_balloon_instance_init(Object *obj)
>  {
>  VirtIOBalloon *s = VIRTIO_BALLOON(obj);
>
> +qemu_mutex_init(>free_page_lock);
> +qemu_cond_init(>free_page_cond);
> +s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +s->free_page_report_cmd_id = VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> +s->free_page_report_notify.notify = 
> virtio_balloon_free_page_report_notify;
> +s->block_iothread = false;
> +
>  object_property_add(obj, "guest-stats", "guest statistics",
>  balloon_stats_get_all, NULL, NULL, s);
>

So the one nit I have is that I am not sure you need to bother with
initializing block_iothread since it should already be initialized to
false/0 shouldn't it? Otherwise this all looks good to me.

Reviewed-by: Alexander Duyck 



[PATCH v24 QEMU 1/3] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-05-08 Thread Alexander Duyck
From: Alexander Duyck 

In an upcoming patch a feature named Free Page Reporting is about to be
added. In order to avoid any confusion we should drop the use of the word
'report' when referring to Free Page Hinting. So what this patch does is go
through and replace all instances of 'report' with 'hint" when we are
referring to free page hinting.

Acked-by: David Hildenbrand 
Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   78 ++--
 include/hw/virtio/virtio-balloon.h |   20 +
 2 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a4729f7fc930..a209706b4d8d 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -466,21 +466,21 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 ret = false;
 goto out;
 }
-if (id == dev->free_page_report_cmd_id) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+if (id == dev->free_page_hint_cmd_id) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_START;
 } else {
 /*
  * Stop the optimization only when it has started. This
  * avoids a stale stop sign for the previous command.
  */
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 }
 }
 }
 
 if (elem->in_num) {
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
 qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
   elem->in_sg[0].iov_len);
 }
@@ -506,11 +506,11 @@ static void virtio_ballloon_get_free_page_hints(void 
*opaque)
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify(vdev, vq);
   /*
-   * Start to poll the vq once the reporting started. Otherwise, continue
+   * Start to poll the vq once the hinting started. Otherwise, continue
* only when there are entries on the vq, which need to be given back.
*/
 } while (continue_to_get_hints ||
- dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
+ dev->free_page_hint_status == FREE_PAGE_HINT_S_START);
 virtio_queue_set_notification(vq, 1);
 }
 
@@ -531,14 +531,14 @@ static void virtio_balloon_free_page_start(VirtIOBalloon 
*s)
 return;
 }
 
-if (s->free_page_report_cmd_id == UINT_MAX) {
-s->free_page_report_cmd_id =
-   VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
+if (s->free_page_hint_cmd_id == UINT_MAX) {
+s->free_page_hint_cmd_id =
+   VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
 } else {
-s->free_page_report_cmd_id++;
+s->free_page_hint_cmd_id++;
 }
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
+s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED;
 virtio_notify_config(vdev);
 }
 
@@ -546,18 +546,18 @@ static void virtio_balloon_free_page_stop(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) {
+if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) {
 /*
  * The lock also guarantees us that the
  * virtio_ballloon_get_free_page_hints exits after the
- * free_page_report_status is set to S_STOP.
+ * free_page_hint_status is set to S_STOP.
  */
 qemu_mutex_lock(>free_page_lock);
 /*
- * The guest hasn't done the reporting, so host sends a notification
- * to the guest to actively stop the reporting.
+ * The guest isn't done hinting, so send a notification
+ * to the guest to actively stop the hinting.
  */
-s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+s->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify_config(vdev);
 }
@@ -567,15 +567,15 @@ static void virtio_balloon_free_page_done(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_DONE;
+s->free_page_hint_status = FREE_PAGE_HINT_S_DONE;
 virtio_notify_config(vdev);
 }
 
 static int
-virtio_balloon_free_page_report_notify(NotifierWithReturn *n, void *data)
+virtio_balloon_free_page_hint_notify(NotifierWithReturn *n, void *data)
 {
 VirtIOBalloon *dev = container_of(n, VirtIOBalloon,
-  free_page_r

[PATCH v24 QEMU 3/3] virtio-balloon: Provide an interface for free page reporting

2020-05-08 Thread Alexander Duyck
From: Alexander Duyck 

Add support for free page reporting. The idea is to function very similar
to how the balloon works in that we basically end up madvising the page as
not being used. However we don't really need to bother with any deflate
type logic since the page will be faulted back into the guest when it is
read or written to.

This provides a new way of letting the guest proactively report free
pages to the hypervisor, so the hypervisor can reuse them. In contrast to
inflate/deflate that is triggered via the hypervisor explicitly.

Acked-by: David Hildenbrand 
Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   69 
 include/hw/virtio/virtio-balloon.h |2 +
 2 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1666132a24c1..53abba290274 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -321,6 +321,67 @@ static void balloon_stats_set_poll_interval(Object *obj, 
Visitor *v,
 balloon_stats_change_timer(s, 0);
 }
 
+static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
+{
+VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+VirtQueueElement *elem;
+
+while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement {
+unsigned int i;
+
+/*
+ * When we discard the page it has the effect of removing the page
+ * from the hypervisor itself and causing it to be zeroed when it
+ * is returned to us. So we must not discard the page if it is
+ * accessible by another device or process, or if the guest is
+ * expecting it to retain a non-zero value.
+ */
+if (qemu_balloon_is_inhibited() || dev->poison_val) {
+goto skip_element;
+}
+
+for (i = 0; i < elem->in_num; i++) {
+void *addr = elem->in_sg[i].iov_base;
+size_t size = elem->in_sg[i].iov_len;
+ram_addr_t ram_offset;
+RAMBlock *rb;
+
+/*
+ * There is no need to check the memory section to see if
+ * it is ram/readonly/romd like there is for handle_output
+ * below. If the region is not meant to be written to then
+ * address_space_map will have allocated a bounce buffer
+ * and it will be freed in address_space_unmap and trigger
+ * and unassigned_mem_write before failing to copy over the
+ * buffer. If more than one bad descriptor is provided it
+ * will return NULL after the first bounce buffer and fail
+ * to map any resources.
+ */
+rb = qemu_ram_block_from_host(addr, false, _offset);
+if (!rb) {
+trace_virtio_balloon_bad_addr(elem->in_addr[i]);
+continue;
+}
+
+/*
+ * For now we will simply ignore unaligned memory regions, or
+ * regions that overrun the end of the RAMBlock.
+ */
+if (!QEMU_IS_ALIGNED(ram_offset | size, qemu_ram_pagesize(rb)) ||
+(ram_offset + size) > qemu_ram_get_used_length(rb)) {
+continue;
+}
+
+ram_block_discard_range(rb, ram_offset, size);
+}
+
+skip_element:
+virtqueue_push(vq, elem, 0);
+virtio_notify(vdev, vq);
+g_free(elem);
+}
+}
+
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -841,6 +902,12 @@ static void virtio_balloon_device_realize(DeviceState 
*dev, Error **errp)
 virtio_error(vdev, "iothread is missing");
 }
 }
+
+if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
+s->reporting_vq = virtio_add_queue(vdev, 32,
+   virtio_balloon_handle_report);
+}
+
 reset_stats(s);
 }
 
@@ -945,6 +1012,8 @@ static Property virtio_balloon_properties[] = {
 VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
 DEFINE_PROP_BIT("page-poison", VirtIOBalloon, host_features,
 VIRTIO_BALLOON_F_PAGE_POISON, true),
+DEFINE_PROP_BIT("free-page-reporting", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_REPORTING, false),
 /* QEMU 4.0 accidentally changed the config size even when free-page-hint
  * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
  * property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index 3ca2a78e1aca..28fd2b396087 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -42,7 +42,7 @@ enum virtio_balloon_free_page_hint_status {
 
 typedef struct VirtIOBalloon {
 VirtIODevice parent_obj;
-VirtQ

[PATCH v24 QEMU 2/3] virtio-balloon: Implement support for page poison reporting feature

2020-05-08 Thread Alexander Duyck
From: Alexander Duyck 

We need to make certain to advertise support for page poison reporting if
we want to actually get data on if the guest will be poisoning pages.

Add a value for reporting the poison value being used if page poisoning is
enabled in the guest. With this we can determine if we will need to skip
free page reporting when it is enabled in the future.

The value currently has no impact on existing balloon interfaces. In the
case of existing balloon interfaces the onus is on the guest driver to
reapply whatever poison is in place.

When we add free page reporting the poison value is used to determine if
we can perform in-place page reporting. The expectation is that a reported
page will already contain the value specified by the poison, and the
reporting of the page should not change that value.

Acked-by: David Hildenbrand 
Signed-off-by: Alexander Duyck 
---
 hw/core/machine.c  |4 +++-
 hw/virtio/virtio-balloon.c |   29 +
 include/hw/virtio/virtio-balloon.h |1 +
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 7a50dd518f4c..da26d58634dc 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -28,7 +28,9 @@
 #include "hw/mem/nvdimm.h"
 #include "migration/vmstate.h"
 
-GlobalProperty hw_compat_5_0[] = {};
+GlobalProperty hw_compat_5_0[] = {
+{ "virtio-balloon-device", "page-poison", "false" },
+};
 const size_t hw_compat_5_0_len = G_N_ELEMENTS(hw_compat_5_0);
 
 GlobalProperty hw_compat_4_2[] = {
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a209706b4d8d..1666132a24c1 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -634,6 +634,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, 
uint8_t *config_data)
 
 config.num_pages = cpu_to_le32(dev->num_pages);
 config.actual = cpu_to_le32(dev->actual);
+config.poison_val = cpu_to_le32(dev->poison_val);
 
 if (dev->free_page_hint_status == FREE_PAGE_HINT_S_REQUESTED) {
 config.free_page_hint_cmd_id =
@@ -683,6 +684,14 @@ static ram_addr_t get_current_ram_size(void)
 return size;
 }
 
+static bool virtio_balloon_page_poison_support(void *opaque)
+{
+VirtIOBalloon *s = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON);
+}
+
 static void virtio_balloon_set_config(VirtIODevice *vdev,
   const uint8_t *config_data)
 {
@@ -697,6 +706,10 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 qapi_event_send_balloon_change(vm_ram_size -
 ((ram_addr_t) dev->actual << 
VIRTIO_BALLOON_PFN_SHIFT));
 }
+dev->poison_val = 0;
+if (virtio_balloon_page_poison_support(dev)) {
+dev->poison_val = le32_to_cpu(config.poison_val);
+}
 trace_virtio_balloon_set_config(dev->actual, oldactual);
 }
 
@@ -755,6 +768,17 @@ static const VMStateDescription 
vmstate_virtio_balloon_free_page_hint = {
 }
 };
 
+static const VMStateDescription vmstate_virtio_balloon_page_poison = {
+.name = "vitio-balloon-device/page-poison",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = virtio_balloon_page_poison_support,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(poison_val, VirtIOBalloon),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription vmstate_virtio_balloon_device = {
 .name = "virtio-balloon-device",
 .version_id = 1,
@@ -767,6 +791,7 @@ static const VMStateDescription 
vmstate_virtio_balloon_device = {
 },
 .subsections = (const VMStateDescription * []) {
 _virtio_balloon_free_page_hint,
+_virtio_balloon_page_poison,
 NULL
 }
 };
@@ -854,6 +879,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
 g_free(s->stats_vq_elem);
 s->stats_vq_elem = NULL;
 }
+
+s->poison_val = 0;
 }
 
 static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
@@ -916,6 +943,8 @@ static Property virtio_balloon_properties[] = {
 VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
 DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
 VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
+DEFINE_PROP_BIT("page-poison", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_PAGE_POISON, true),
 /* QEMU 4.0 accidentally changed the config size even when free-page-hint
  * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
  * property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index 108cff97e71a..3ca2a78e1aca 100644
--- a/include/hw/virtio/virtio-balloon.h
+

[PATCH v24 QEMU 0/3] virtio-balloon: add support for page poison reporting and free page reporting

2020-05-08 Thread Alexander Duyck
This series provides an asynchronous means of reporting free guest pages
to QEMU through virtio-balloon so that the memory associated with those
pages can be dropped and reused by other processes and/or guests on the
host. Using this it is possible to avoid unnecessary I/O to disk and
greatly improve performance in the case of memory overcommit on the host.

I originally submitted this patch series back on February 11th 2020[1],
but at that time I was focused primarily on the kernel portion of this
patch set. However as of April 7th those patches are now included in
Linus's kernel tree[2] and so I am submitting the QEMU pieces for
inclusion.

[1]: 
https://lore.kernel.org/lkml/20200211224416.29318.44077.stgit@localhost.localdomain/
[2]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0c504f154718904ae49349147e3b7e6ae91ffdc

Changes from v17:
Fixed typo in patch 1 title
Addressed white-space issues reported via checkpatch
Added braces {} for two if statements to match expected coding style

Changes from v18:
Updated patches 2 and 3 based on input from dhildenb
Added comment to patch 2 describing what keeps us from reporting a bad page
Added patch to address issue with ROM devices being directly writable

Changes from v19:
Added std-headers change to match changes pushed for linux kernel headers
Added patch to remove "report" from page hinting code paths
Updated comment to better explain why we disable hints w/ page poisoning
Removed code that was modifying config size for poison vs hinting
Dropped x-page-poison property
Added code to bounds check the reported region vs the RAM block
Dropped patch for ROM devices as that was already pulled in by Paolo

Changes from v20:
Rearranged patches to push Linux header sync patches to front
Removed association between free page hinting and VIRTIO_BALLOON_F_PAGE_POISON
Added code to enable VIRTIO_BALLOON_F_PAGE_POISON if page reporting is enabled
Fixed possible resource leak if poison or qemu_balloon_is_inhibited return true

Changes from v21:
Added ack for patch 3
Rewrote patch description for page poison reporting feature
Made page-poison independent property and set to enabled by default
Added logic to migrate poison_val
Added several comments in code to better explain features
Switched free-page-reporting property to disabled by default

Changes from v22:
Added ack for patches 4 & 5
Added additional comment fixes in patch 3 to remove "reporting" reference
Renamed rvq in patch 5 to reporting_vq to better match linux kernel
Moved call adding reporting_vq to after free_page_vq

Changes from v23:
Rebased on latest QEMU
Dropped patches 1 & 2 as Linux kernel headers were synced
Added compat machine bits for page-poison feature

---

Alexander Duyck (3):
  virtio-balloon: Replace free page hinting references to 'report' with 
'hint'
  virtio-balloon: Implement support for page poison reporting feature
  virtio-balloon: Provide an interface for free page reporting


 hw/core/machine.c  |4 +
 hw/virtio/virtio-balloon.c |  176 
 include/hw/virtio/virtio-balloon.h |   23 ++---
 3 files changed, 152 insertions(+), 51 deletions(-)

--



Re: [PATCH v23 QEMU 0/5] virtio-balloon: add support for page poison reporting and free page reporting

2020-05-08 Thread Alexander Duyck
I just wanted to follow up since it has been a little over a week
since I submitted this and I haven't heard anything back. It looks
like the linux-headers patches can be dropped since the headers appear
to have been synced. I was wondering if I should resubmit with just
the 3 patches that are adding the functionality, or if this patch-set
is good as-is?

Thanks.

- Alex

On Mon, Apr 27, 2020 at 5:53 PM Alexander Duyck
 wrote:
>
> This series provides an asynchronous means of reporting free guest pages
> to QEMU through virtio-balloon so that the memory associated with those
> pages can be dropped and reused by other processes and/or guests on the
> host. Using this it is possible to avoid unnecessary I/O to disk and
> greatly improve performance in the case of memory overcommit on the host.
>
> I originally submitted this patch series back on February 11th 2020[1],
> but at that time I was focused primarily on the kernel portion of this
> patch set. However as of April 7th those patches are now included in
> Linus's kernel tree[2] and so I am submitting the QEMU pieces for
> inclusion.
>
> [1]: 
> https://lore.kernel.org/lkml/20200211224416.29318.44077.stgit@localhost.localdomain/
> [2]: 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0c504f154718904ae49349147e3b7e6ae91ffdc
>
> Changes from v17:
> Fixed typo in patch 1 title
> Addressed white-space issues reported via checkpatch
> Added braces {} for two if statements to match expected coding style
>
> Changes from v18:
> Updated patches 2 and 3 based on input from dhildenb
> Added comment to patch 2 describing what keeps us from reporting a bad page
> Added patch to address issue with ROM devices being directly writable
>
> Changes from v19:
> Added std-headers change to match changes pushed for linux kernel headers
> Added patch to remove "report" from page hinting code paths
> Updated comment to better explain why we disable hints w/ page poisoning
> Removed code that was modifying config size for poison vs hinting
> Dropped x-page-poison property
> Added code to bounds check the reported region vs the RAM block
> Dropped patch for ROM devices as that was already pulled in by Paolo
>
> Changes from v20:
> Rearranged patches to push Linux header sync patches to front
> Removed association between free page hinting and VIRTIO_BALLOON_F_PAGE_POISON
> Added code to enable VIRTIO_BALLOON_F_PAGE_POISON if page reporting is enabled
> Fixed possible resource leak if poison or qemu_balloon_is_inhibited return 
> true
>
> Changes from v21:
> Added ack for patch 3
> Rewrote patch description for page poison reporting feature
> Made page-poison independent property and set to enabled by default
> Added logic to migrate poison_val
> Added several comments in code to better explain features
> Switched free-page-reporting property to disabled by default
>
> Changes from v22:
> Added ack for patches 4 & 5
> Added additional comment fixes in patch 3 to remove "reporting" reference
> Renamed rvq in patch 5 to reporting_vq to improve readability
> Moved call adding reporting_vq to after free_page_vq to fix VQ ordering
>
> ---
>
> Alexander Duyck (5):
>   linux-headers: Update to allow renaming of free_page_report_cmd_id
>   linux-headers: update to contain virito-balloon free page reporting
>   virtio-balloon: Replace free page hinting references to 'report' with 
> 'hint'
>   virtio-balloon: Implement support for page poison reporting feature
>   virtio-balloon: Provide an interface for free page reporting
>
>
>  hw/virtio/virtio-balloon.c  |  176 
> ++-
>  include/hw/virtio/virtio-balloon.h  |   23 ++-
>  include/standard-headers/linux/virtio_balloon.h |   12 +-
>  3 files changed, 159 insertions(+), 52 deletions(-)
>
> --



[PATCH v23 QEMU 5/5] virtio-balloon: Provide an interface for free page reporting

2020-04-27 Thread Alexander Duyck
From: Alexander Duyck 

Add support for free page reporting. The idea is to function very similar
to how the balloon works in that we basically end up madvising the page as
not being used. However we don't really need to bother with any deflate
type logic since the page will be faulted back into the guest when it is
read or written to.

This provides a new way of letting the guest proactively report free
pages to the hypervisor, so the hypervisor can reuse them. In contrast to
inflate/deflate that is triggered via the hypervisor explicitly.

Acked-by: David Hildenbrand 
Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   69 
 include/hw/virtio/virtio-balloon.h |2 +
 2 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1666132a24c1..53abba290274 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -321,6 +321,67 @@ static void balloon_stats_set_poll_interval(Object *obj, 
Visitor *v,
 balloon_stats_change_timer(s, 0);
 }
 
+static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
+{
+VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+VirtQueueElement *elem;
+
+while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement {
+unsigned int i;
+
+/*
+ * When we discard the page it has the effect of removing the page
+ * from the hypervisor itself and causing it to be zeroed when it
+ * is returned to us. So we must not discard the page if it is
+ * accessible by another device or process, or if the guest is
+ * expecting it to retain a non-zero value.
+ */
+if (qemu_balloon_is_inhibited() || dev->poison_val) {
+goto skip_element;
+}
+
+for (i = 0; i < elem->in_num; i++) {
+void *addr = elem->in_sg[i].iov_base;
+size_t size = elem->in_sg[i].iov_len;
+ram_addr_t ram_offset;
+RAMBlock *rb;
+
+/*
+ * There is no need to check the memory section to see if
+ * it is ram/readonly/romd like there is for handle_output
+ * below. If the region is not meant to be written to then
+ * address_space_map will have allocated a bounce buffer
+ * and it will be freed in address_space_unmap and trigger
+ * and unassigned_mem_write before failing to copy over the
+ * buffer. If more than one bad descriptor is provided it
+ * will return NULL after the first bounce buffer and fail
+ * to map any resources.
+ */
+rb = qemu_ram_block_from_host(addr, false, _offset);
+if (!rb) {
+trace_virtio_balloon_bad_addr(elem->in_addr[i]);
+continue;
+}
+
+/*
+ * For now we will simply ignore unaligned memory regions, or
+ * regions that overrun the end of the RAMBlock.
+ */
+if (!QEMU_IS_ALIGNED(ram_offset | size, qemu_ram_pagesize(rb)) ||
+(ram_offset + size) > qemu_ram_get_used_length(rb)) {
+continue;
+}
+
+ram_block_discard_range(rb, ram_offset, size);
+}
+
+skip_element:
+virtqueue_push(vq, elem, 0);
+virtio_notify(vdev, vq);
+g_free(elem);
+}
+}
+
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -841,6 +902,12 @@ static void virtio_balloon_device_realize(DeviceState 
*dev, Error **errp)
 virtio_error(vdev, "iothread is missing");
 }
 }
+
+if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
+s->reporting_vq = virtio_add_queue(vdev, 32,
+   virtio_balloon_handle_report);
+}
+
 reset_stats(s);
 }
 
@@ -945,6 +1012,8 @@ static Property virtio_balloon_properties[] = {
 VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
 DEFINE_PROP_BIT("page-poison", VirtIOBalloon, host_features,
 VIRTIO_BALLOON_F_PAGE_POISON, true),
+DEFINE_PROP_BIT("free-page-reporting", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_REPORTING, false),
 /* QEMU 4.0 accidentally changed the config size even when free-page-hint
  * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
  * property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index 3ca2a78e1aca..28fd2b396087 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -42,7 +42,7 @@ enum virtio_balloon_free_page_hint_status {
 
 typedef struct VirtIOBalloon {
 VirtIODevice parent_obj;
-VirtQ

[PATCH v23 QEMU 4/5] virtio-balloon: Implement support for page poison reporting feature

2020-04-27 Thread Alexander Duyck
From: Alexander Duyck 

We need to make certain to advertise support for page poison reporting if
we want to actually get data on if the guest will be poisoning pages.

Add a value for reporting the poison value being used if page poisoning is
enabled in the guest. With this we can determine if we will need to skip
free page reporting when it is enabled in the future.

The value currently has no impact on existing balloon interfaces. In the
case of existing balloon interfaces the onus is on the guest driver to
reapply whatever poison is in place.

When we add free page reporting the poison value is used to determine if
we can perform in-place page reporting. The expectation is that a reported
page will already contain the value specified by the poison, and the
reporting of the page should not change that value.

Acked-by: David Hildenbrand 
Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   29 +
 include/hw/virtio/virtio-balloon.h |1 +
 2 files changed, 30 insertions(+)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a209706b4d8d..1666132a24c1 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -634,6 +634,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, 
uint8_t *config_data)
 
 config.num_pages = cpu_to_le32(dev->num_pages);
 config.actual = cpu_to_le32(dev->actual);
+config.poison_val = cpu_to_le32(dev->poison_val);
 
 if (dev->free_page_hint_status == FREE_PAGE_HINT_S_REQUESTED) {
 config.free_page_hint_cmd_id =
@@ -683,6 +684,14 @@ static ram_addr_t get_current_ram_size(void)
 return size;
 }
 
+static bool virtio_balloon_page_poison_support(void *opaque)
+{
+VirtIOBalloon *s = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON);
+}
+
 static void virtio_balloon_set_config(VirtIODevice *vdev,
   const uint8_t *config_data)
 {
@@ -697,6 +706,10 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 qapi_event_send_balloon_change(vm_ram_size -
 ((ram_addr_t) dev->actual << 
VIRTIO_BALLOON_PFN_SHIFT));
 }
+dev->poison_val = 0;
+if (virtio_balloon_page_poison_support(dev)) {
+dev->poison_val = le32_to_cpu(config.poison_val);
+}
 trace_virtio_balloon_set_config(dev->actual, oldactual);
 }
 
@@ -755,6 +768,17 @@ static const VMStateDescription 
vmstate_virtio_balloon_free_page_hint = {
 }
 };
 
+static const VMStateDescription vmstate_virtio_balloon_page_poison = {
+.name = "vitio-balloon-device/page-poison",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = virtio_balloon_page_poison_support,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(poison_val, VirtIOBalloon),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription vmstate_virtio_balloon_device = {
 .name = "virtio-balloon-device",
 .version_id = 1,
@@ -767,6 +791,7 @@ static const VMStateDescription 
vmstate_virtio_balloon_device = {
 },
 .subsections = (const VMStateDescription * []) {
 _virtio_balloon_free_page_hint,
+_virtio_balloon_page_poison,
 NULL
 }
 };
@@ -854,6 +879,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
 g_free(s->stats_vq_elem);
 s->stats_vq_elem = NULL;
 }
+
+s->poison_val = 0;
 }
 
 static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
@@ -916,6 +943,8 @@ static Property virtio_balloon_properties[] = {
 VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
 DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
 VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
+DEFINE_PROP_BIT("page-poison", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_PAGE_POISON, true),
 /* QEMU 4.0 accidentally changed the config size even when free-page-hint
  * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
  * property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index 108cff97e71a..3ca2a78e1aca 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -70,6 +70,7 @@ typedef struct VirtIOBalloon {
 uint32_t host_features;
 
 bool qemu_4_0_config_size;
+uint32_t poison_val;
 } VirtIOBalloon;
 
 #endif




[PATCH v23 QEMU 2/5] linux-headers: update to contain virito-balloon free page reporting

2020-04-27 Thread Alexander Duyck
From: Alexander Duyck 

Sync the latest upstream changes for free page reporting. To be
replaced by a full linux header sync.

Signed-off-by: Alexander Duyck 
---
 include/standard-headers/linux/virtio_balloon.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/include/standard-headers/linux/virtio_balloon.h 
b/include/standard-headers/linux/virtio_balloon.h
index af0a6b59dab2..af3b7a1fa263 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -36,6 +36,7 @@
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT3 /* VQ to report free pages */
 #define VIRTIO_BALLOON_F_PAGE_POISON   4 /* Guest is using page poisoning */
+#define VIRTIO_BALLOON_F_REPORTING 5 /* Page reporting virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12




[PATCH v23 QEMU 3/5] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-04-27 Thread Alexander Duyck
From: Alexander Duyck 

In an upcoming patch a feature named Free Page Reporting is about to be
added. In order to avoid any confusion we should drop the use of the word
'report' when referring to Free Page Hinting. So what this patch does is go
through and replace all instances of 'report' with 'hint" when we are
referring to free page hinting.

Acked-by: David Hildenbrand 
Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   78 ++--
 include/hw/virtio/virtio-balloon.h |   20 +
 2 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a4729f7fc930..a209706b4d8d 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -466,21 +466,21 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 ret = false;
 goto out;
 }
-if (id == dev->free_page_report_cmd_id) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+if (id == dev->free_page_hint_cmd_id) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_START;
 } else {
 /*
  * Stop the optimization only when it has started. This
  * avoids a stale stop sign for the previous command.
  */
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 }
 }
 }
 
 if (elem->in_num) {
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
 qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
   elem->in_sg[0].iov_len);
 }
@@ -506,11 +506,11 @@ static void virtio_ballloon_get_free_page_hints(void 
*opaque)
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify(vdev, vq);
   /*
-   * Start to poll the vq once the reporting started. Otherwise, continue
+   * Start to poll the vq once the hinting started. Otherwise, continue
* only when there are entries on the vq, which need to be given back.
*/
 } while (continue_to_get_hints ||
- dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
+ dev->free_page_hint_status == FREE_PAGE_HINT_S_START);
 virtio_queue_set_notification(vq, 1);
 }
 
@@ -531,14 +531,14 @@ static void virtio_balloon_free_page_start(VirtIOBalloon 
*s)
 return;
 }
 
-if (s->free_page_report_cmd_id == UINT_MAX) {
-s->free_page_report_cmd_id =
-   VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
+if (s->free_page_hint_cmd_id == UINT_MAX) {
+s->free_page_hint_cmd_id =
+   VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
 } else {
-s->free_page_report_cmd_id++;
+s->free_page_hint_cmd_id++;
 }
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
+s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED;
 virtio_notify_config(vdev);
 }
 
@@ -546,18 +546,18 @@ static void virtio_balloon_free_page_stop(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) {
+if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) {
 /*
  * The lock also guarantees us that the
  * virtio_ballloon_get_free_page_hints exits after the
- * free_page_report_status is set to S_STOP.
+ * free_page_hint_status is set to S_STOP.
  */
 qemu_mutex_lock(>free_page_lock);
 /*
- * The guest hasn't done the reporting, so host sends a notification
- * to the guest to actively stop the reporting.
+ * The guest isn't done hinting, so send a notification
+ * to the guest to actively stop the hinting.
  */
-s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+s->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify_config(vdev);
 }
@@ -567,15 +567,15 @@ static void virtio_balloon_free_page_done(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_DONE;
+s->free_page_hint_status = FREE_PAGE_HINT_S_DONE;
 virtio_notify_config(vdev);
 }
 
 static int
-virtio_balloon_free_page_report_notify(NotifierWithReturn *n, void *data)
+virtio_balloon_free_page_hint_notify(NotifierWithReturn *n, void *data)
 {
 VirtIOBalloon *dev = container_of(n, VirtIOBalloon,
-  free_page_r

[PATCH v23 QEMU 0/5] virtio-balloon: add support for page poison reporting and free page reporting

2020-04-27 Thread Alexander Duyck
This series provides an asynchronous means of reporting free guest pages
to QEMU through virtio-balloon so that the memory associated with those
pages can be dropped and reused by other processes and/or guests on the
host. Using this it is possible to avoid unnecessary I/O to disk and
greatly improve performance in the case of memory overcommit on the host.

I originally submitted this patch series back on February 11th 2020[1],
but at that time I was focused primarily on the kernel portion of this
patch set. However as of April 7th those patches are now included in
Linus's kernel tree[2] and so I am submitting the QEMU pieces for
inclusion.

[1]: 
https://lore.kernel.org/lkml/20200211224416.29318.44077.stgit@localhost.localdomain/
[2]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0c504f154718904ae49349147e3b7e6ae91ffdc

Changes from v17:
Fixed typo in patch 1 title
Addressed white-space issues reported via checkpatch
Added braces {} for two if statements to match expected coding style

Changes from v18:
Updated patches 2 and 3 based on input from dhildenb
Added comment to patch 2 describing what keeps us from reporting a bad page
Added patch to address issue with ROM devices being directly writable

Changes from v19:
Added std-headers change to match changes pushed for linux kernel headers
Added patch to remove "report" from page hinting code paths
Updated comment to better explain why we disable hints w/ page poisoning
Removed code that was modifying config size for poison vs hinting
Dropped x-page-poison property
Added code to bounds check the reported region vs the RAM block
Dropped patch for ROM devices as that was already pulled in by Paolo

Changes from v20:
Rearranged patches to push Linux header sync patches to front
Removed association between free page hinting and VIRTIO_BALLOON_F_PAGE_POISON
Added code to enable VIRTIO_BALLOON_F_PAGE_POISON if page reporting is enabled
Fixed possible resource leak if poison or qemu_balloon_is_inhibited return true

Changes from v21:
Added ack for patch 3
Rewrote patch description for page poison reporting feature
Made page-poison independent property and set to enabled by default
Added logic to migrate poison_val
Added several comments in code to better explain features
Switched free-page-reporting property to disabled by default

Changes from v22:
Added ack for patches 4 & 5
Added additional comment fixes in patch 3 to remove "reporting" reference
Renamed rvq in patch 5 to reporting_vq to improve readability
Moved call adding reporting_vq to after free_page_vq to fix VQ ordering

---

Alexander Duyck (5):
  linux-headers: Update to allow renaming of free_page_report_cmd_id
  linux-headers: update to contain virito-balloon free page reporting
  virtio-balloon: Replace free page hinting references to 'report' with 
'hint'
  virtio-balloon: Implement support for page poison reporting feature
  virtio-balloon: Provide an interface for free page reporting


 hw/virtio/virtio-balloon.c  |  176 ++-
 include/hw/virtio/virtio-balloon.h  |   23 ++-
 include/standard-headers/linux/virtio_balloon.h |   12 +-
 3 files changed, 159 insertions(+), 52 deletions(-)

--



[PATCH v23 QEMU 1/5] linux-headers: Update to allow renaming of free_page_report_cmd_id

2020-04-27 Thread Alexander Duyck
From: Alexander Duyck 

Sync to the latest upstream changes for free page hinting. To be
replaced by a full linux header sync.

Signed-off-by: Alexander Duyck 
---
 include/standard-headers/linux/virtio_balloon.h |   11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/standard-headers/linux/virtio_balloon.h 
b/include/standard-headers/linux/virtio_balloon.h
index 9375ca2a70de..af0a6b59dab2 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -47,8 +47,15 @@ struct virtio_balloon_config {
uint32_t num_pages;
/* Number of pages we've actually got in balloon. */
uint32_t actual;
-   /* Free page report command id, readonly by guest */
-   uint32_t free_page_report_cmd_id;
+   /*
+* Free page hint command id, readonly by guest.
+* Was previously name free_page_report_cmd_id so we
+* need to carry that name for legacy support.
+*/
+   union {
+   uint32_t free_page_hint_cmd_id;
+   uint32_t free_page_report_cmd_id;   /* deprecated */
+   };
/* Stores PAGE_POISON if page poisoning is in use */
uint32_t poison_val;
 };




Re: [PATCH v22 QEMU 3/5] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-04-27 Thread Alexander Duyck
On Mon, Apr 27, 2020 at 8:11 AM David Hildenbrand  wrote:
>
> On 27.04.20 17:08, Alexander Duyck wrote:
> > On Mon, Apr 27, 2020 at 1:15 AM David Hildenbrand  wrote:
> >>
> >> There is only one wrong comment remaining I think. Something like
> >>
> >> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> >> index a1d6fb52c8..1b2127c04c 100644
> >> --- a/hw/virtio/virtio-balloon.c
> >> +++ b/hw/virtio/virtio-balloon.c
> >> @@ -554,8 +554,8 @@ static void 
> >> virtio_balloon_free_page_stop(VirtIOBalloon *s)
> >>   */
> >>  qemu_mutex_lock(>free_page_lock);
> >>  /*
> >> - * The guest hasn't done the reporting, so host sends a 
> >> notification
> >> - * to the guest to actively stop the reporting.
> >> + * The guest isn't done with hinting, so the host sends a 
> >> notification
> >> + * to the guest to actively stop the hinting.
> >
> > I'll probably tweak it slightly and drop the "with". So the comment will 
> > read:
> > /*
> >  * The guest isn't done hinting, so host sends a notification
>
> I always feel like "so host sends" sounds wrong ("the host"). But I am
> not a native speaker.

Actually it might read better to get rid of "the host" entirely to
make it more of an imperative statement rather than a declarative one.
Maybe something more like:
/*
 * The guest isn't done hinting, so send a notification
 * to the guest to actively stop the hinting.
 */



Re: [virtio-dev] Re: [PATCH v22 QEMU 0/5] virtio-balloon: add support for page poison reporting and free page reporting

2020-04-27 Thread Alexander Duyck
On Mon, Apr 27, 2020 at 1:21 AM David Hildenbrand  wrote:
>
> Except one minor nit, looks good to me. We'll have to take care of
> compat handling regarding patch #4 as soon as we have 5.0 compat
> machines in place.

I will clean up the one comment and submit later today if there is no
other follow-up.

Thanks.

- Alex



Re: [PATCH v22 QEMU 3/5] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-04-27 Thread Alexander Duyck
On Mon, Apr 27, 2020 at 1:15 AM David Hildenbrand  wrote:
>
> There is only one wrong comment remaining I think. Something like
>
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index a1d6fb52c8..1b2127c04c 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -554,8 +554,8 @@ static void virtio_balloon_free_page_stop(VirtIOBalloon 
> *s)
>   */
>  qemu_mutex_lock(>free_page_lock);
>  /*
> - * The guest hasn't done the reporting, so host sends a notification
> - * to the guest to actively stop the reporting.
> + * The guest isn't done with hinting, so the host sends a 
> notification
> + * to the guest to actively stop the hinting.

I'll probably tweak it slightly and drop the "with". So the comment will read:
/*
 * The guest isn't done hinting, so host sends a notification
 * to the guest to actively stop the hinting.
 */

There is one other spot left which is support for migration. The name
for the VMStateDescription is
"virtio-balloon-device/free-page-report". I am assuming I cannot
rename that. Otherwise all other references to report on the balloon
interface refer to reporting errors from what I can tell.



[PATCH v22 QEMU 5/5] virtio-balloon: Provide an interface for free page reporting

2020-04-24 Thread Alexander Duyck
From: Alexander Duyck 

Add support for free page reporting. The idea is to function very similar
to how the balloon works in that we basically end up madvising the page as
not being used. However we don't really need to bother with any deflate
type logic since the page will be faulted back into the guest when it is
read or written to.

This provides a new way of letting the guest proactively report free
pages to the hypervisor, so the hypervisor can reuse them. In contrast to
inflate/deflate that is triggered via the hypervisor explicitly.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   67 
 include/hw/virtio/virtio-balloon.h |2 +
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index c1c76ec09c95..2ce56c6c0794 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -321,6 +321,67 @@ static void balloon_stats_set_poll_interval(Object *obj, 
Visitor *v,
 balloon_stats_change_timer(s, 0);
 }
 
+static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
+{
+VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+VirtQueueElement *elem;
+
+while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement {
+unsigned int i;
+
+/*
+ * When we discard the page it has the effect of removing the page
+ * from the hypervisor itself and causing it to be zeroed when it
+ * is returned to us. So we must not discard the page if it is
+ * accessible by another device or process, or if the guest is
+ * expecting it to retain a non-zero value.
+ */
+if (qemu_balloon_is_inhibited() || dev->poison_val) {
+goto skip_element;
+}
+
+for (i = 0; i < elem->in_num; i++) {
+void *addr = elem->in_sg[i].iov_base;
+size_t size = elem->in_sg[i].iov_len;
+ram_addr_t ram_offset;
+RAMBlock *rb;
+
+/*
+ * There is no need to check the memory section to see if
+ * it is ram/readonly/romd like there is for handle_output
+ * below. If the region is not meant to be written to then
+ * address_space_map will have allocated a bounce buffer
+ * and it will be freed in address_space_unmap and trigger
+ * and unassigned_mem_write before failing to copy over the
+ * buffer. If more than one bad descriptor is provided it
+ * will return NULL after the first bounce buffer and fail
+ * to map any resources.
+ */
+rb = qemu_ram_block_from_host(addr, false, _offset);
+if (!rb) {
+trace_virtio_balloon_bad_addr(elem->in_addr[i]);
+continue;
+}
+
+/*
+ * For now we will simply ignore unaligned memory regions, or
+ * regions that overrun the end of the RAMBlock.
+ */
+if (!QEMU_IS_ALIGNED(ram_offset | size, qemu_ram_pagesize(rb)) ||
+(ram_offset + size) > qemu_ram_get_used_length(rb)) {
+continue;
+}
+
+ram_block_discard_range(rb, ram_offset, size);
+}
+
+skip_element:
+virtqueue_push(vq, elem, 0);
+virtio_notify(vdev, vq);
+g_free(elem);
+}
+}
+
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -818,6 +879,10 @@ static void virtio_balloon_device_realize(DeviceState 
*dev, Error **errp)
 s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
 s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
 
+if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
+s->rvq = virtio_add_queue(vdev, 32, virtio_balloon_handle_report);
+}
+
 if (virtio_has_feature(s->host_features,
VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
 s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
@@ -945,6 +1010,8 @@ static Property virtio_balloon_properties[] = {
 VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
 DEFINE_PROP_BIT("page-poison", VirtIOBalloon, host_features,
 VIRTIO_BALLOON_F_PAGE_POISON, true),
+DEFINE_PROP_BIT("free-page-reporting", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_REPORTING, false),
 /* QEMU 4.0 accidentally changed the config size even when free-page-hint
  * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
  * property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index 3ca2a78e1aca..ac4013d51010 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ 

[PATCH v22 QEMU 3/5] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-04-24 Thread Alexander Duyck
From: Alexander Duyck 

In an upcoming patch a feature named Free Page Reporting is about to be
added. In order to avoid any confusion we should drop the use of the word
'report' when referring to Free Page Hinting. So what this patch does is go
through and replace all instances of 'report' with 'hint" when we are
referring to free page hinting.

Acked-by: David Hildenbrand 
Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   74 ++--
 include/hw/virtio/virtio-balloon.h |   20 +-
 2 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a4729f7fc930..a1d6fb52c876 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -466,21 +466,21 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 ret = false;
 goto out;
 }
-if (id == dev->free_page_report_cmd_id) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+if (id == dev->free_page_hint_cmd_id) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_START;
 } else {
 /*
  * Stop the optimization only when it has started. This
  * avoids a stale stop sign for the previous command.
  */
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 }
 }
 }
 
 if (elem->in_num) {
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
 qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
   elem->in_sg[0].iov_len);
 }
@@ -506,11 +506,11 @@ static void virtio_ballloon_get_free_page_hints(void 
*opaque)
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify(vdev, vq);
   /*
-   * Start to poll the vq once the reporting started. Otherwise, continue
+   * Start to poll the vq once the hinting started. Otherwise, continue
* only when there are entries on the vq, which need to be given back.
*/
 } while (continue_to_get_hints ||
- dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
+ dev->free_page_hint_status == FREE_PAGE_HINT_S_START);
 virtio_queue_set_notification(vq, 1);
 }
 
@@ -531,14 +531,14 @@ static void virtio_balloon_free_page_start(VirtIOBalloon 
*s)
 return;
 }
 
-if (s->free_page_report_cmd_id == UINT_MAX) {
-s->free_page_report_cmd_id =
-   VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
+if (s->free_page_hint_cmd_id == UINT_MAX) {
+s->free_page_hint_cmd_id =
+   VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
 } else {
-s->free_page_report_cmd_id++;
+s->free_page_hint_cmd_id++;
 }
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
+s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED;
 virtio_notify_config(vdev);
 }
 
@@ -546,18 +546,18 @@ static void virtio_balloon_free_page_stop(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) {
+if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) {
 /*
  * The lock also guarantees us that the
  * virtio_ballloon_get_free_page_hints exits after the
- * free_page_report_status is set to S_STOP.
+ * free_page_hint_status is set to S_STOP.
  */
 qemu_mutex_lock(>free_page_lock);
 /*
  * The guest hasn't done the reporting, so host sends a notification
  * to the guest to actively stop the reporting.
  */
-s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+s->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify_config(vdev);
 }
@@ -567,15 +567,15 @@ static void virtio_balloon_free_page_done(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_DONE;
+s->free_page_hint_status = FREE_PAGE_HINT_S_DONE;
 virtio_notify_config(vdev);
 }
 
 static int
-virtio_balloon_free_page_report_notify(NotifierWithReturn *n, void *data)
+virtio_balloon_free_page_hint_notify(NotifierWithReturn *n, void *data)
 {
 VirtIOBalloon *dev = container_of(n, VirtIOBalloon,
-  free_page_report_notify);
+  free_page_hint_notify);
 VirtIODevice *vdev = VIRTIO_DEVICE(dev

[PATCH v22 QEMU 1/5] linux-headers: Update to allow renaming of free_page_report_cmd_id

2020-04-24 Thread Alexander Duyck
From: Alexander Duyck 

Sync to the latest upstream changes for free page hinting. To be
replaced by a full linux header sync.

Signed-off-by: Alexander Duyck 
---
 include/standard-headers/linux/virtio_balloon.h |   11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/standard-headers/linux/virtio_balloon.h 
b/include/standard-headers/linux/virtio_balloon.h
index 9375ca2a70de..af0a6b59dab2 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -47,8 +47,15 @@ struct virtio_balloon_config {
uint32_t num_pages;
/* Number of pages we've actually got in balloon. */
uint32_t actual;
-   /* Free page report command id, readonly by guest */
-   uint32_t free_page_report_cmd_id;
+   /*
+* Free page hint command id, readonly by guest.
+* Was previously name free_page_report_cmd_id so we
+* need to carry that name for legacy support.
+*/
+   union {
+   uint32_t free_page_hint_cmd_id;
+   uint32_t free_page_report_cmd_id;   /* deprecated */
+   };
/* Stores PAGE_POISON if page poisoning is in use */
uint32_t poison_val;
 };




[PATCH v22 QEMU 4/5] virtio-balloon: Implement support for page poison reporting feature

2020-04-24 Thread Alexander Duyck
From: Alexander Duyck 

We need to make certain to advertise support for page poison reporting if
we want to actually get data on if the guest will be poisoning pages.

Add a value for reporting the poison value being used if page poisoning is
enabled in the guest. With this we can determine if we will need to skip
free page reporting when it is enabled in the future.

The value currently has no impact on existing balloon interfaces. In the
case of existing balloon interfaces the onus is on the guest driver to
reapply whatever poison is in place.

When we add free page reporting the poison value is used to determine if
we can perform in-place page reporting. The expectation is that a reported
page will already contain the value specified by the poison, and the
reporting of the page should not change that value.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   29 +
 include/hw/virtio/virtio-balloon.h |1 +
 2 files changed, 30 insertions(+)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a1d6fb52c876..c1c76ec09c95 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -634,6 +634,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, 
uint8_t *config_data)
 
 config.num_pages = cpu_to_le32(dev->num_pages);
 config.actual = cpu_to_le32(dev->actual);
+config.poison_val = cpu_to_le32(dev->poison_val);
 
 if (dev->free_page_hint_status == FREE_PAGE_HINT_S_REQUESTED) {
 config.free_page_hint_cmd_id =
@@ -683,6 +684,14 @@ static ram_addr_t get_current_ram_size(void)
 return size;
 }
 
+static bool virtio_balloon_page_poison_support(void *opaque)
+{
+VirtIOBalloon *s = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON);
+}
+
 static void virtio_balloon_set_config(VirtIODevice *vdev,
   const uint8_t *config_data)
 {
@@ -697,6 +706,10 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 qapi_event_send_balloon_change(vm_ram_size -
 ((ram_addr_t) dev->actual << 
VIRTIO_BALLOON_PFN_SHIFT));
 }
+dev->poison_val = 0;
+if (virtio_balloon_page_poison_support(dev)) {
+dev->poison_val = le32_to_cpu(config.poison_val);
+}
 trace_virtio_balloon_set_config(dev->actual, oldactual);
 }
 
@@ -755,6 +768,17 @@ static const VMStateDescription 
vmstate_virtio_balloon_free_page_hint = {
 }
 };
 
+static const VMStateDescription vmstate_virtio_balloon_page_poison = {
+.name = "vitio-balloon-device/page-poison",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = virtio_balloon_page_poison_support,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(poison_val, VirtIOBalloon),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription vmstate_virtio_balloon_device = {
 .name = "virtio-balloon-device",
 .version_id = 1,
@@ -767,6 +791,7 @@ static const VMStateDescription 
vmstate_virtio_balloon_device = {
 },
 .subsections = (const VMStateDescription * []) {
 _virtio_balloon_free_page_hint,
+_virtio_balloon_page_poison,
 NULL
 }
 };
@@ -854,6 +879,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
 g_free(s->stats_vq_elem);
 s->stats_vq_elem = NULL;
 }
+
+s->poison_val = 0;
 }
 
 static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
@@ -916,6 +943,8 @@ static Property virtio_balloon_properties[] = {
 VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
 DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
 VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
+DEFINE_PROP_BIT("page-poison", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_PAGE_POISON, true),
 /* QEMU 4.0 accidentally changed the config size even when free-page-hint
  * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
  * property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index 108cff97e71a..3ca2a78e1aca 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -70,6 +70,7 @@ typedef struct VirtIOBalloon {
 uint32_t host_features;
 
 bool qemu_4_0_config_size;
+uint32_t poison_val;
 } VirtIOBalloon;
 
 #endif




[PATCH v22 QEMU 2/5] linux-headers: update to contain virito-balloon free page reporting

2020-04-24 Thread Alexander Duyck
From: Alexander Duyck 

Sync the latest upstream changes for free page reporting. To be
replaced by a full linux header sync.

Signed-off-by: Alexander Duyck 
---
 include/standard-headers/linux/virtio_balloon.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/include/standard-headers/linux/virtio_balloon.h 
b/include/standard-headers/linux/virtio_balloon.h
index af0a6b59dab2..af3b7a1fa263 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -36,6 +36,7 @@
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT3 /* VQ to report free pages */
 #define VIRTIO_BALLOON_F_PAGE_POISON   4 /* Guest is using page poisoning */
+#define VIRTIO_BALLOON_F_REPORTING 5 /* Page reporting virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12




[PATCH v22 QEMU 0/5] virtio-balloon: add support for page poison reporting and free page reporting

2020-04-24 Thread Alexander Duyck
This series provides an asynchronous means of reporting free guest pages
to QEMU through virtio-balloon so that the memory associated with those
pages can be discarded and reused by other processes and/or guests on the
host. Using this it is possible to avoid unnecessary I/O to disk and
greatly improve performance in the case of memory overcommit on the host.

As a part of enabling this feature it was necessary to implement the page
poison reporting feature which had been added to the kernel, but was not
available in QEMU. This patch set adds it as a new device property
"page-poison" which is enabled by default, and adds support for migrating
the reported value.

I originally submitted this patch series back on February 11th 2020[1],
but at that time I was focused primarily on the kernel portion of this
patch set. However as of April 7th those patches are now included in
Linus's kernel tree[2] and so I am submitting the QEMU pieces for
inclusion.

[1]: 
https://lore.kernel.org/lkml/20200211224416.29318.44077.stgit@localhost.localdomain/
[2]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0c504f154718904ae49349147e3b7e6ae91ffdc

Changes from v17:
Fixed typo in patch 1 title
Addressed white-space issues reported via checkpatch
Added braces {} for two if statements to match expected coding style

Changes from v18:
Updated patches 2 and 3 based on input from dhildenb
Added comment to patch 2 describing what keeps us from reporting a bad page
Added patch to address issue with ROM devices being directly writable

Changes from v19:
Added std-headers change to match changes pushed for linux kernel headers
Added patch to remove "report" from page hinting code paths
Updated comment to better explain why we disable hints w/ page poisoning
Removed code that was modifying config size for poison vs hinting
Dropped x-page-poison property
Added code to bounds check the reported region vs the RAM block
Dropped patch for ROM devices as that was already pulled in by Paolo

Changes from v20:
Rearranged patches to push Linux header sync patches to front
Removed association between free page hinting and VIRTIO_BALLOON_F_PAGE_POISON
Added code to enable VIRTIO_BALLOON_F_PAGE_POISON if page reporting is enabled
Fixed possible resource leak if poison or qemu_balloon_is_inhibited return true

Changes from v21:
Added ack for patch 3
Rewrote patch description for page poison reporting feature
Made page-poison independent property and set to enabled by default
Added logic to migrate poison_val
Added several comments in code to better explain features
Switched free-page-reporting property to disabled by default

---

Alexander Duyck (5):
  linux-headers: Update to allow renaming of free_page_report_cmd_id
  linux-headers: update to contain virito-balloon free page reporting
  virtio-balloon: Replace free page hinting references to 'report' with 
'hint'
  virtio-balloon: Implement support for page poison reporting feature
  virtio-balloon: Provide an interface for free page reporting


 hw/virtio/virtio-balloon.c  |  170 ++-
 include/hw/virtio/virtio-balloon.h  |   23 ++-
 include/standard-headers/linux/virtio_balloon.h |   12 +-
 3 files changed, 155 insertions(+), 50 deletions(-)

--



Re: [PATCH v21 QEMU 5/5] virtio-balloon: Provide an interface for free page reporting

2020-04-24 Thread Alexander Duyck
On Fri, Apr 24, 2020 at 8:34 AM David Hildenbrand  wrote:
>
>
> >> Also, I do wonder if we want to default-enable it. It can still have a
> >> negative performance impact and some people might not want that.
> >
> > The negative performance impact should be minimal. At this point the
> > hinting process ends up being very light since it only processes a
> > small chunk of the memory before it shuts down for a couple seconds.
> > In my original data the only time any significant performance
> > regression was seen was with page shuffling enabled, and that was
> > before I put limits on how many pages we could process per pass and
> > how frequently we would make a pass through memory. My thought is that
> > we are much better having it enabled by default rather than having it
> > disabled by default. In the worst case scenario we have a little bit
> > of extra thread noise from it popping up running for a few
> > milliseconds and then sleeping for about two seconds, but the benefit
> > side is that the VMs will do us a favor and restrict themselves to a
> > much smaller idle footprint until such time as the memory is actually
> > needed in the guest.
>
> While I agree that the impact is small, there *is* a noticeable
> performance impact. One of the main users is memory overcommit.
>
> Also, imagine the following scenarios:
> - RT workload in the guest. Just because you have a virtio-balloon
>   device defined would mean something is suddenly active and
>   discarding/trying to discard pages.
> - vfio: free page reporting is completely useless right now and
>   *only* overhead.
> - prealloc does not expect that pages get suddenly discarded
> - s390x, which has CMM and might not benefit at all (except when using
>   huge pages as backing storage)
>
> No, I don't think it's acceptable to enable this as default. I remember
> that it is quite common to just define a balloon device but never use it.
>
> E.g., "A virtual memory balloon device is added to all Xen and KVM/QEMU
> guest virtual machines. It will be seen as  element" [1].
>
> I think we should let upper layers decide (just as we do for free page
> hinting, for example).

Okay. I guess there are a number of other cases I hand't thought of. I
will switch the default to false.

Thanks.

- Alex



Re: [PATCH v21 QEMU 5/5] virtio-balloon: Provide an interface for free page reporting

2020-04-24 Thread Alexander Duyck
On Fri, Apr 24, 2020 at 4:20 AM David Hildenbrand  wrote:
>
> On 22.04.20 20:21, Alexander Duyck wrote:
> > From: Alexander Duyck 
> >
> > Add support for free page reporting. The idea is to function very similar
> > to how the balloon works in that we basically end up madvising the page as
> > not being used. However we don't really need to bother with any deflate
> > type logic since the page will be faulted back into the guest when it is
> > read or written to.
> >
> > This provides a new way of letting the guest proactively report free
> > pages to the hypervisor, so the hypervisor can reuse them. In contrast to
> > inflate/deflate that is triggered via the hypervisor explicitly.
> >
> > Signed-off-by: Alexander Duyck 
> > ---
> >  hw/virtio/virtio-balloon.c |   70 
> > 
> >  include/hw/virtio/virtio-balloon.h |2 +
> >  2 files changed, 71 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index 5effc8b4653b..b473ff7f4b88 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -321,6 +321,60 @@ static void balloon_stats_set_poll_interval(Object 
> > *obj, Visitor *v,
> >  balloon_stats_change_timer(s, 0);
> >  }
> >
> > +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > +VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> > +VirtQueueElement *elem;
> > +
> > +while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement {
> > +unsigned int i;
> > +
>
> Maybe add a comment like
>
> /*
>  * As discarded pages will be zero when re-accessed, all pages either
>  * have the old value, or were zeroed out. In case the guest expects
>  * another value, make sure to never discard.
>  */
>
> Whatever you think is best.

Okay I will add the following comment:
/*
 * When we discard the page it has the effect of removing the page
 * from the hypervisor itself and causing it to be zeroed when it
 * is returned to us. So we must not discard the page if it is
 * accessible by another device or process, or if the guest is
 * expecting it to retain a non-zero value.
 */


> > +if (qemu_balloon_is_inhibited() || dev->poison_val) {
> > +goto skip_element;
> > +}
> > +
> > +for (i = 0; i < elem->in_num; i++) {
> > +void *addr = elem->in_sg[i].iov_base;
> > +size_t size = elem->in_sg[i].iov_len;
> > +ram_addr_t ram_offset;
> > +RAMBlock *rb;
> > +
> > +/*
> > + * There is no need to check the memory section to see if
> > + * it is ram/readonly/romd like there is for handle_output
> > + * below. If the region is not meant to be written to then
> > + * address_space_map will have allocated a bounce buffer
> > + * and it will be freed in address_space_unmap and trigger
> > + * and unassigned_mem_write before failing to copy over the
> > + * buffer. If more than one bad descriptor is provided it
> > + * will return NULL after the first bounce buffer and fail
> > + * to map any resources.
> > + */
> > +rb = qemu_ram_block_from_host(addr, false, _offset);
> > +if (!rb) {
> > +trace_virtio_balloon_bad_addr(elem->in_addr[i]);
> > +continue;
> > +}
> > +
> > +/*
> > + * For now we will simply ignore unaligned memory regions, or
> > + * regions that overrun the end of the RAMBlock.
> > + */
> > +if (!QEMU_IS_ALIGNED(ram_offset | size, qemu_ram_pagesize(rb)) 
> > ||
> > +(ram_offset + size) > qemu_ram_get_used_length(rb)) {
> > +continue;
> > +}
> > +
> > +ram_block_discard_range(rb, ram_offset, size);
> > +}
> > +
> > +skip_element:
> > +virtqueue_push(vq, elem, 0);
> > +virtio_notify(vdev, vq);
> > +g_free(elem);
> > +}
> > +}
> > +
> >  static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> >  {
> >  VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
> > @@ -782,6 +836,16 @@ static void virtio_balloon_device_realize(DeviceState 
> > *dev, Error **errp)
> >

Re: [PATCH v21 QEMU 3/5] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-04-24 Thread Alexander Duyck
On Fri, Apr 24, 2020 at 4:23 AM David Hildenbrand  wrote:
>
> On 22.04.20 20:21, Alexander Duyck wrote:
> > From: Alexander Duyck 
> >
> > In an upcoming patch a feature named Free Page Reporting is about to be
> > added. In order to avoid any confusion we should drop the use of the word
> > 'report' when referring to Free Page Hinting. So what this patch does is go
> > through and replace all instances of 'report' with 'hint" when we are
> > referring to free page hinting.
> >
> > Signed-off-by: Alexander Duyck 
> > ---
> >  hw/virtio/virtio-balloon.c |   74 
> > ++--
> >  include/hw/virtio/virtio-balloon.h |   20 +-
> >  2 files changed, 47 insertions(+), 47 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index a4729f7fc930..a1d6fb52c876 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -466,21 +466,21 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
> >  ret = false;
> >  goto out;
> >  }
> > -if (id == dev->free_page_report_cmd_id) {
> > -dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
> > +if (id == dev->free_page_hint_cmd_id) {
> > +dev->free_page_hint_status = FREE_PAGE_HINT_S_START;
> >  } else {
> >  /*
> >   * Stop the optimization only when it has started. This
> >   * avoids a stale stop sign for the previous command.
> >   */
> > -if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
> > -dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> > +if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
> > +dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
> >  }
> >  }
> >  }
> >
> >  if (elem->in_num) {
> > -if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
> > +if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
> >  qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
> >elem->in_sg[0].iov_len);
> >  }
> > @@ -506,11 +506,11 @@ static void virtio_ballloon_get_free_page_hints(void 
> > *opaque)
> >  qemu_mutex_unlock(>free_page_lock);
> >  virtio_notify(vdev, vq);
> >/*
> > -   * Start to poll the vq once the reporting started. Otherwise, 
> > continue
> > +   * Start to poll the vq once the hinting started. Otherwise, continue
> > * only when there are entries on the vq, which need to be given 
> > back.
> > */
> >  } while (continue_to_get_hints ||
> > - dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
> > + dev->free_page_hint_status == FREE_PAGE_HINT_S_START);
> >  virtio_queue_set_notification(vq, 1);
> >  }
> >
> > @@ -531,14 +531,14 @@ static void 
> > virtio_balloon_free_page_start(VirtIOBalloon *s)
> >  return;
> >  }
> >
> > -if (s->free_page_report_cmd_id == UINT_MAX) {
> > -s->free_page_report_cmd_id =
> > -   VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> > +if (s->free_page_hint_cmd_id == UINT_MAX) {
> > +s->free_page_hint_cmd_id =
> > +   VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
> >  } else {
> > -s->free_page_report_cmd_id++;
> > +s->free_page_hint_cmd_id++;
> >  }
> >
> > -s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
> > +s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED;
> >  virtio_notify_config(vdev);
> >  }
> >
> > @@ -546,18 +546,18 @@ static void 
> > virtio_balloon_free_page_stop(VirtIOBalloon *s)
> >  {
> >  VirtIODevice *vdev = VIRTIO_DEVICE(s);
> >
> > -if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) {
> > +if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) {
> >  /*
> >   * The lock also guarantees us that the
> >   * virtio_ballloon_get_free_page_hints exits after the
> > - * free_page_report_status is set to S_STOP.
> > + * free_page_hint_status is set to S_STOP.
> >   */
> >  qemu_mutex_lock(>free_page_lock);
&g

Re: [PATCH v21 QEMU 4/5] virtio-balloon: Implement support for page poison tracking feature

2020-04-23 Thread Alexander Duyck
On Thu, Apr 23, 2020 at 9:02 AM David Hildenbrand  wrote:
>
> On 23.04.20 16:46, Alexander Duyck wrote:
> > On Thu, Apr 23, 2020 at 1:11 AM David Hildenbrand  wrote:
> >>
> >> On 22.04.20 20:21, Alexander Duyck wrote:
> >>> From: Alexander Duyck 
> >>>
> >>> We need to make certain to advertise support for page poison tracking if
> >>> we want to actually get data on if the guest will be poisoning pages.
> >>>
> >>> Add a value for tracking the poison value being used if page poisoning is
> >>> enabled. With this we can determine if we will need to skip page reporting
> >>> when it is enabled in the future.
> >>
> >> Maybe add something about the semantics
> >>
> >> "VIRTIO_BALLOON_F_PAGE_POISON will not change the behavior of free page
> >> hinting or ordinary balloon inflation/deflation."
> >>
> >> I do wonder if we should just unconditionally enable
> >> VIRTIO_BALLOON_F_PAGE_POISON here, gluing it to the QEMU compat machine
> >> (via a property that is default-enabled, and disabled from compat 
> >> machines).
> >>
> >> Because, as Michael said, knowing that the guest is using page poisoning
> >> might be interesting even if free page reporting is not around.
> >
> > I could do that. So if that is the case though would I disable page
> > reporting if it isn't enabled, or would I be enabling page poison if
> > page reporting is enabled?
>
>
> So, I would suggest this the following as a diff to this patch (the TODO part 
> as a
> separate patch - we will have these compat properties briefly after the 5.0
> release)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index c1a444cb75..2e96cba4ff 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -28,6 +28,12 @@
>  #include "hw/mem/nvdimm.h"
>  #include "migration/vmstate.h"
>
> +// TODO: After 5.0 release
> +GlobalProperty hw_compat_5_0[] = {
> +{ "virtio-balloon-device", "page-hint", "false"},
> +};
> +const size_t hw_compat_5_0_len = G_N_ELEMENTS(hw_compat_5_0);
> +
>  GlobalProperty hw_compat_4_2[] = {
>  { "virtio-blk-device", "queue-size", "128"},
>  { "virtio-scsi-device", "virtqueue_size", "128"},

Okay, so the bit above is for after 5_0 is released then? Is there a
way to queue up a reminder or something so we get to it when the time
comes, or I just need to watch for 5.0 to come out and submit a patch
then?

> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index a4729f7fc9..5ff8a669cf 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -924,6 +924,8 @@ static Property virtio_balloon_properties[] = {
>   qemu_4_0_config_size, false),
>  DEFINE_PROP_LINK("iothread", VirtIOBalloon, iothread, TYPE_IOTHREAD,
>   IOThread *),
> +DEFINE_PROP_BIT("page-poison", VirtIOBalloon, host_features,
> +VIRTIO_BALLOON_F_PAGE_POISON, true),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>
>
> What to do with page reporting depends: I would not implicitly enable 
> features.
> I think we are talking about
>
> -device virtio-balloon-pci,...,page-poison=false,free-page-reporting=true
>
> a) Valid scenario. Fix Linux guests as we discussed to not use reporting if 
> they rely on page poisoning.

Okay I will probably go this route and resubmit the patch I had
submitted before that only allows us to do page reporting if poisoning
is disabled on the guest kernel, or the page-poison is enabled on the
hypervisor.

> b) Invalid scenario. E.g., bail out when trying to realize that device 
> ("'free-page-reporting' requires 'page-poison'").
>
> With new QEMU machines, this should not happen with
>
> -device virtio-balloon-pci,...,free-page-reporting=true
>
> as page-poison=true is the new default.
>
> What's your opinion?

I will just clean up and resubmit my earlier kernel patch, this time
without it messing with free page hinting.

> >
> >>>
> >>> Signed-off-by: Alexander Duyck 
> >>> ---
> >>>  hw/virtio/virtio-balloon.c |7 +++
> >>>  include/hw/virtio/virtio-balloon.h |1 +
> >>>  2 files changed, 8 insertions(+)
> >>>
> >>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> >>> index a1d6fb52c876..5effc8b4653b 100644
> >>> --- a/hw/virtio/virtio-balloon.c
> >>> +++ b/hw/virtio/vir

Re: [PATCH v21 QEMU 4/5] virtio-balloon: Implement support for page poison tracking feature

2020-04-23 Thread Alexander Duyck
On Thu, Apr 23, 2020 at 1:11 AM David Hildenbrand  wrote:
>
> On 22.04.20 20:21, Alexander Duyck wrote:
> > From: Alexander Duyck 
> >
> > We need to make certain to advertise support for page poison tracking if
> > we want to actually get data on if the guest will be poisoning pages.
> >
> > Add a value for tracking the poison value being used if page poisoning is
> > enabled. With this we can determine if we will need to skip page reporting
> > when it is enabled in the future.
>
> Maybe add something about the semantics
>
> "VIRTIO_BALLOON_F_PAGE_POISON will not change the behavior of free page
> hinting or ordinary balloon inflation/deflation."
>
> I do wonder if we should just unconditionally enable
> VIRTIO_BALLOON_F_PAGE_POISON here, gluing it to the QEMU compat machine
> (via a property that is default-enabled, and disabled from compat machines).
>
> Because, as Michael said, knowing that the guest is using page poisoning
> might be interesting even if free page reporting is not around.

I could do that. So if that is the case though would I disable page
reporting if it isn't enabled, or would I be enabling page poison if
page reporting is enabled?

> >
> > Signed-off-by: Alexander Duyck 
> > ---
> >  hw/virtio/virtio-balloon.c |7 +++
> >  include/hw/virtio/virtio-balloon.h |1 +
> >  2 files changed, 8 insertions(+)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index a1d6fb52c876..5effc8b4653b 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -634,6 +634,7 @@ static void virtio_balloon_get_config(VirtIODevice 
> > *vdev, uint8_t *config_data)
> >
> >  config.num_pages = cpu_to_le32(dev->num_pages);
> >  config.actual = cpu_to_le32(dev->actual);
> > +config.poison_val = cpu_to_le32(dev->poison_val);
> >
> >  if (dev->free_page_hint_status == FREE_PAGE_HINT_S_REQUESTED) {
> >  config.free_page_hint_cmd_id =
> > @@ -697,6 +698,10 @@ static void virtio_balloon_set_config(VirtIODevice 
> > *vdev,
> >  qapi_event_send_balloon_change(vm_ram_size -
> >  ((ram_addr_t) dev->actual << 
> > VIRTIO_BALLOON_PFN_SHIFT));
> >  }
> > +dev->poison_val = 0;
> > +if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
> > +dev->poison_val = le32_to_cpu(config.poison_val);
> > +}
> >  trace_virtio_balloon_set_config(dev->actual, oldactual);
> >  }
> >
> > @@ -854,6 +859,8 @@ static void virtio_balloon_device_reset(VirtIODevice 
> > *vdev)
> >  g_free(s->stats_vq_elem);
> >  s->stats_vq_elem = NULL;
> >  }
> > +
> > +s->poison_val = 0;
> >  }
> >
> >  static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
> > diff --git a/include/hw/virtio/virtio-balloon.h 
> > b/include/hw/virtio/virtio-balloon.h
> > index 108cff97e71a..3ca2a78e1aca 100644
> > --- a/include/hw/virtio/virtio-balloon.h
> > +++ b/include/hw/virtio/virtio-balloon.h
> > @@ -70,6 +70,7 @@ typedef struct VirtIOBalloon {
> >  uint32_t host_features;
> >
> >  bool qemu_4_0_config_size;
> > +uint32_t poison_val;
> >  } VirtIOBalloon;
> >
> >  #endif
> >
>
> You still have to migrate poison_val if I am not wrong, otherwise you
> would lose it during migration if I am not mistaking.

So what are the requirements to migrate a value? Would I need to add a
property so it can be retrieved/set?



[PATCH v21 QEMU 2/5] linux-headers: update to contain virito-balloon free page reporting

2020-04-22 Thread Alexander Duyck
From: Alexander Duyck 

Sync the latest upstream changes for free page reporting. To be
replaced by a full linux header sync.

Signed-off-by: Alexander Duyck 
---
 include/standard-headers/linux/virtio_balloon.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/include/standard-headers/linux/virtio_balloon.h 
b/include/standard-headers/linux/virtio_balloon.h
index af0a6b59dab2..af3b7a1fa263 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -36,6 +36,7 @@
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT3 /* VQ to report free pages */
 #define VIRTIO_BALLOON_F_PAGE_POISON   4 /* Guest is using page poisoning */
+#define VIRTIO_BALLOON_F_REPORTING 5 /* Page reporting virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12




[PATCH v21 QEMU 5/5] virtio-balloon: Provide an interface for free page reporting

2020-04-22 Thread Alexander Duyck
From: Alexander Duyck 

Add support for free page reporting. The idea is to function very similar
to how the balloon works in that we basically end up madvising the page as
not being used. However we don't really need to bother with any deflate
type logic since the page will be faulted back into the guest when it is
read or written to.

This provides a new way of letting the guest proactively report free
pages to the hypervisor, so the hypervisor can reuse them. In contrast to
inflate/deflate that is triggered via the hypervisor explicitly.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   70 
 include/hw/virtio/virtio-balloon.h |2 +
 2 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 5effc8b4653b..b473ff7f4b88 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -321,6 +321,60 @@ static void balloon_stats_set_poll_interval(Object *obj, 
Visitor *v,
 balloon_stats_change_timer(s, 0);
 }
 
+static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
+{
+VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+VirtQueueElement *elem;
+
+while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement {
+unsigned int i;
+
+if (qemu_balloon_is_inhibited() || dev->poison_val) {
+goto skip_element;
+}
+
+for (i = 0; i < elem->in_num; i++) {
+void *addr = elem->in_sg[i].iov_base;
+size_t size = elem->in_sg[i].iov_len;
+ram_addr_t ram_offset;
+RAMBlock *rb;
+
+/*
+ * There is no need to check the memory section to see if
+ * it is ram/readonly/romd like there is for handle_output
+ * below. If the region is not meant to be written to then
+ * address_space_map will have allocated a bounce buffer
+ * and it will be freed in address_space_unmap and trigger
+ * and unassigned_mem_write before failing to copy over the
+ * buffer. If more than one bad descriptor is provided it
+ * will return NULL after the first bounce buffer and fail
+ * to map any resources.
+ */
+rb = qemu_ram_block_from_host(addr, false, _offset);
+if (!rb) {
+trace_virtio_balloon_bad_addr(elem->in_addr[i]);
+continue;
+}
+
+/*
+ * For now we will simply ignore unaligned memory regions, or
+ * regions that overrun the end of the RAMBlock.
+ */
+if (!QEMU_IS_ALIGNED(ram_offset | size, qemu_ram_pagesize(rb)) ||
+(ram_offset + size) > qemu_ram_get_used_length(rb)) {
+continue;
+}
+
+ram_block_discard_range(rb, ram_offset, size);
+}
+
+skip_element:
+virtqueue_push(vq, elem, 0);
+virtio_notify(vdev, vq);
+g_free(elem);
+}
+}
+
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -782,6 +836,16 @@ static void virtio_balloon_device_realize(DeviceState 
*dev, Error **errp)
 VirtIOBalloon *s = VIRTIO_BALLOON(dev);
 int ret;
 
+/*
+ * Page reporting is dependant on page poison to make sure we can
+ * report a page without changing the state of the internal data.
+ * We need to set the flag before we call virtio_init as it will
+ * affect the config size of the vdev.
+ */
+if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
+s->host_features |= 1 << VIRTIO_BALLOON_F_PAGE_POISON;
+}
+
 virtio_init(vdev, "virtio-balloon", VIRTIO_ID_BALLOON,
 virtio_balloon_config_size(s));
 
@@ -798,6 +862,10 @@ static void virtio_balloon_device_realize(DeviceState 
*dev, Error **errp)
 s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
 s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
 
+if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
+s->rvq = virtio_add_queue(vdev, 32, virtio_balloon_handle_report);
+}
+
 if (virtio_has_feature(s->host_features,
VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
 s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
@@ -923,6 +991,8 @@ static Property virtio_balloon_properties[] = {
 VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
 DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
 VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
+DEFINE_PROP_BIT("free-page-reporting", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_REPORTING, true),
 /* QEMU 4.0 accidentally changed the config size even when fr

[PATCH v21 QEMU 4/5] virtio-balloon: Implement support for page poison tracking feature

2020-04-22 Thread Alexander Duyck
From: Alexander Duyck 

We need to make certain to advertise support for page poison tracking if
we want to actually get data on if the guest will be poisoning pages.

Add a value for tracking the poison value being used if page poisoning is
enabled. With this we can determine if we will need to skip page reporting
when it is enabled in the future.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |7 +++
 include/hw/virtio/virtio-balloon.h |1 +
 2 files changed, 8 insertions(+)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a1d6fb52c876..5effc8b4653b 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -634,6 +634,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, 
uint8_t *config_data)
 
 config.num_pages = cpu_to_le32(dev->num_pages);
 config.actual = cpu_to_le32(dev->actual);
+config.poison_val = cpu_to_le32(dev->poison_val);
 
 if (dev->free_page_hint_status == FREE_PAGE_HINT_S_REQUESTED) {
 config.free_page_hint_cmd_id =
@@ -697,6 +698,10 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 qapi_event_send_balloon_change(vm_ram_size -
 ((ram_addr_t) dev->actual << 
VIRTIO_BALLOON_PFN_SHIFT));
 }
+dev->poison_val = 0;
+if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+dev->poison_val = le32_to_cpu(config.poison_val);
+}
 trace_virtio_balloon_set_config(dev->actual, oldactual);
 }
 
@@ -854,6 +859,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
 g_free(s->stats_vq_elem);
 s->stats_vq_elem = NULL;
 }
+
+s->poison_val = 0;
 }
 
 static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index 108cff97e71a..3ca2a78e1aca 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -70,6 +70,7 @@ typedef struct VirtIOBalloon {
 uint32_t host_features;
 
 bool qemu_4_0_config_size;
+uint32_t poison_val;
 } VirtIOBalloon;
 
 #endif




[PATCH v21 QEMU 3/5] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-04-22 Thread Alexander Duyck
From: Alexander Duyck 

In an upcoming patch a feature named Free Page Reporting is about to be
added. In order to avoid any confusion we should drop the use of the word
'report' when referring to Free Page Hinting. So what this patch does is go
through and replace all instances of 'report' with 'hint" when we are
referring to free page hinting.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   74 ++--
 include/hw/virtio/virtio-balloon.h |   20 +-
 2 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a4729f7fc930..a1d6fb52c876 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -466,21 +466,21 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 ret = false;
 goto out;
 }
-if (id == dev->free_page_report_cmd_id) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+if (id == dev->free_page_hint_cmd_id) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_START;
 } else {
 /*
  * Stop the optimization only when it has started. This
  * avoids a stale stop sign for the previous command.
  */
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 }
 }
 }
 
 if (elem->in_num) {
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
 qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
   elem->in_sg[0].iov_len);
 }
@@ -506,11 +506,11 @@ static void virtio_ballloon_get_free_page_hints(void 
*opaque)
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify(vdev, vq);
   /*
-   * Start to poll the vq once the reporting started. Otherwise, continue
+   * Start to poll the vq once the hinting started. Otherwise, continue
* only when there are entries on the vq, which need to be given back.
*/
 } while (continue_to_get_hints ||
- dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
+ dev->free_page_hint_status == FREE_PAGE_HINT_S_START);
 virtio_queue_set_notification(vq, 1);
 }
 
@@ -531,14 +531,14 @@ static void virtio_balloon_free_page_start(VirtIOBalloon 
*s)
 return;
 }
 
-if (s->free_page_report_cmd_id == UINT_MAX) {
-s->free_page_report_cmd_id =
-   VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
+if (s->free_page_hint_cmd_id == UINT_MAX) {
+s->free_page_hint_cmd_id =
+   VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
 } else {
-s->free_page_report_cmd_id++;
+s->free_page_hint_cmd_id++;
 }
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
+s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED;
 virtio_notify_config(vdev);
 }
 
@@ -546,18 +546,18 @@ static void virtio_balloon_free_page_stop(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) {
+if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) {
 /*
  * The lock also guarantees us that the
  * virtio_ballloon_get_free_page_hints exits after the
- * free_page_report_status is set to S_STOP.
+ * free_page_hint_status is set to S_STOP.
  */
 qemu_mutex_lock(>free_page_lock);
 /*
  * The guest hasn't done the reporting, so host sends a notification
  * to the guest to actively stop the reporting.
  */
-s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+s->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify_config(vdev);
 }
@@ -567,15 +567,15 @@ static void virtio_balloon_free_page_done(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_DONE;
+s->free_page_hint_status = FREE_PAGE_HINT_S_DONE;
 virtio_notify_config(vdev);
 }
 
 static int
-virtio_balloon_free_page_report_notify(NotifierWithReturn *n, void *data)
+virtio_balloon_free_page_hint_notify(NotifierWithReturn *n, void *data)
 {
 VirtIOBalloon *dev = container_of(n, VirtIOBalloon,
-  free_page_report_notify);
+  free_page_hint_notify);
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 PrecopyNo

[PATCH v21 QEMU 1/5] linux-headers: Update to allow renaming of free_page_report_cmd_id

2020-04-22 Thread Alexander Duyck
From: Alexander Duyck 

Sync to the latest upstream changes for free page hinting. To be
replaced by a full linux header sync.

Signed-off-by: Alexander Duyck 
---
 include/standard-headers/linux/virtio_balloon.h |   11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/standard-headers/linux/virtio_balloon.h 
b/include/standard-headers/linux/virtio_balloon.h
index 9375ca2a70de..af0a6b59dab2 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -47,8 +47,15 @@ struct virtio_balloon_config {
uint32_t num_pages;
/* Number of pages we've actually got in balloon. */
uint32_t actual;
-   /* Free page report command id, readonly by guest */
-   uint32_t free_page_report_cmd_id;
+   /*
+* Free page hint command id, readonly by guest.
+* Was previously name free_page_report_cmd_id so we
+* need to carry that name for legacy support.
+*/
+   union {
+   uint32_t free_page_hint_cmd_id;
+   uint32_t free_page_report_cmd_id;   /* deprecated */
+   };
/* Stores PAGE_POISON if page poisoning is in use */
uint32_t poison_val;
 };




[PATCH v21 QEMU 0/5] virtio-balloon: add support for free page reporting

2020-04-22 Thread Alexander Duyck
This series provides an asynchronous means of reporting free guest pages
to QEMU through virtio-balloon so that the memory associated with those
pages can be dropped and reused by other processes and/or guests on the
host. Using this it is possible to avoid unnecessary I/O to disk and
greatly improve performance in the case of memory overcommit on the host.

As of April 7th this functionality has been enabled in Linus's kernel
tree[1] and so I am submitting the QEMU pieces for inclusion.

[1]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0c504f154718904ae49349147e3b7e6ae91ffdc

Changes from v17:
Fixed typo in patch 1 title
Addressed white-space issues reported via checkpatch
Added braces {} for two if statements to match expected coding style

Changes from v18:
Updated patches 2 and 3 based on input from dhildenb
Added comment to patch 2 describing what keeps us from reporting a bad page
Added patch to address issue with ROM devices being directly writable

Changes from v19:
Added std-headers change to match changes pushed for linux kernel headers
Added patch to remove "report" from page hinting code paths
Updated comment to better explain why we disable hints w/ page poisoning
Removed code that was modifying config size for poison vs hinting
Dropped x-page-poison property
Added code to bounds check the reported region vs the RAM block
Dropped patch for ROM devices as that was already pulled in by Paolo

Changes from v20:
Rearranged patches to push Linux header sync patches to front
Removed association between free page hinting and VIRTIO_BALLOON_F_PAGE_POISON
Added code to enable VIRTIO_BALLOON_F_PAGE_POISON if page reporting is enabled
Fixed possible resource leak if poison or qemu_balloon_is_inhibited return true
Updated cover page comments

---

Alexander Duyck (5):
  linux-headers: Update to allow renaming of free_page_report_cmd_id
  linux-headers: update to contain virito-balloon free page reporting
  virtio-balloon: Replace free page hinting references to 'report' with 
'hint'
  virtio-balloon: Implement support for page poison tracking feature
  virtio-balloon: Provide an interface for free page reporting


 hw/virtio/virtio-balloon.c  |  151 +--
 include/hw/virtio/virtio-balloon.h  |   23 ++--
 include/standard-headers/linux/virtio_balloon.h |   12 ++
 3 files changed, 136 insertions(+), 50 deletions(-)

--



[PATCH v20 QEMU 5/5] virtio-balloon: Provide an interface for free page reporting

2020-04-16 Thread Alexander Duyck
From: Alexander Duyck 

Add support for free page reporting. The idea is to function very similar
to how the balloon works in that we basically end up madvising the page as
not being used. However we don't really need to bother with any deflate
type logic since the page will be faulted back into the guest when it is
read or written to.

This provides a new way of letting the guest proactively report free
pages to the hypervisor, so the hypervisor can reuse them. In contrast to
inflate/deflate that is triggered via the hypervisor explicitly.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   62 +++-
 include/hw/virtio/virtio-balloon.h |2 +
 2 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 19d587fd05cb..e5c9317921a1 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -321,6 +321,59 @@ static void balloon_stats_set_poll_interval(Object *obj, 
Visitor *v,
 balloon_stats_change_timer(s, 0);
 }
 
+static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
+{
+VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+VirtQueueElement *elem;
+
+while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement {
+unsigned int i;
+
+if (qemu_balloon_is_inhibited() || dev->poison_val) {
+continue;
+}
+
+for (i = 0; i < elem->in_num; i++) {
+void *addr = elem->in_sg[i].iov_base;
+size_t size = elem->in_sg[i].iov_len;
+ram_addr_t ram_offset;
+RAMBlock *rb;
+
+/*
+ * There is no need to check the memory section to see if
+ * it is ram/readonly/romd like there is for handle_output
+ * below. If the region is not meant to be written to then
+ * address_space_map will have allocated a bounce buffer
+ * and it will be freed in address_space_unmap and trigger
+ * and unassigned_mem_write before failing to copy over the
+ * buffer. If more than one bad descriptor is provided it
+ * will return NULL after the first bounce buffer and fail
+ * to map any resources.
+ */
+rb = qemu_ram_block_from_host(addr, false, _offset);
+if (!rb) {
+trace_virtio_balloon_bad_addr(elem->in_addr[i]);
+continue;
+}
+
+/*
+ * For now we will simply ignore unaligned memory regions, or
+ * regions that overrun the end of the RAMBlock.
+ */
+if (!QEMU_IS_ALIGNED(ram_offset | size, qemu_ram_pagesize(rb)) ||
+   (ram_offset + size) > qemu_ram_get_used_length(rb)) {
+continue;
+}
+
+ram_block_discard_range(rb, ram_offset, size);
+}
+
+virtqueue_push(vq, elem, 0);
+virtio_notify(vdev, vq);
+g_free(elem);
+}
+}
+
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -728,7 +781,8 @@ static uint64_t virtio_balloon_get_features(VirtIODevice 
*vdev, uint64_t f,
 VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
 f |= dev->host_features;
 virtio_add_feature(, VIRTIO_BALLOON_F_STATS_VQ);
-if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
+virtio_has_feature(f, VIRTIO_BALLOON_F_REPORTING)) {
 virtio_add_feature(, VIRTIO_BALLOON_F_PAGE_POISON);
 }
 
@@ -818,6 +872,10 @@ static void virtio_balloon_device_realize(DeviceState 
*dev, Error **errp)
 s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
 s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
 
+if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
+s->rvq = virtio_add_queue(vdev, 32, virtio_balloon_handle_report);
+}
+
 if (virtio_has_feature(s->host_features,
VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
 s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
@@ -949,6 +1007,8 @@ static Property virtio_balloon_properties[] = {
  */
 DEFINE_PROP_BOOL("qemu-4-0-config-size", VirtIOBalloon,
  qemu_4_0_config_size, false),
+DEFINE_PROP_BIT("free-page-reporting", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_REPORTING, true),
 DEFINE_PROP_LINK("iothread", VirtIOBalloon, iothread, TYPE_IOTHREAD,
  IOThread *),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index 3ca2a78e1aca..ac4013d51010 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -42,7 +42,7 @@ enu

[PATCH v20 QEMU 4/5] linux-headers: update to contain virito-balloon free page reporting

2020-04-16 Thread Alexander Duyck
From: Alexander Duyck 

Sync the latest upstream changes for free page reporting. To be
replaced by a full linux header sync.

Signed-off-by: Alexander Duyck 
---
 include/standard-headers/linux/virtio_balloon.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/include/standard-headers/linux/virtio_balloon.h 
b/include/standard-headers/linux/virtio_balloon.h
index af0a6b59dab2..af3b7a1fa263 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -36,6 +36,7 @@
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT3 /* VQ to report free pages */
 #define VIRTIO_BALLOON_F_PAGE_POISON   4 /* Guest is using page poisoning */
+#define VIRTIO_BALLOON_F_REPORTING 5 /* Page reporting virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12




[PATCH v20 QEMU 2/5] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-04-16 Thread Alexander Duyck
From: Alexander Duyck 

In an upcoming patch a feature named Free Page Reporting is about to be
added. In order to avoid any confusion we should drop the use of the word
'report' when referring to Free Page Hinting. So what this patch does is go
through and replace all instances of 'report' with 'hint" when we are
referring to free page hinting.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   74 ++--
 include/hw/virtio/virtio-balloon.h |   20 +-
 2 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a4729f7fc930..a1d6fb52c876 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -466,21 +466,21 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 ret = false;
 goto out;
 }
-if (id == dev->free_page_report_cmd_id) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+if (id == dev->free_page_hint_cmd_id) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_START;
 } else {
 /*
  * Stop the optimization only when it has started. This
  * avoids a stale stop sign for the previous command.
  */
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 }
 }
 }
 
 if (elem->in_num) {
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
 qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
   elem->in_sg[0].iov_len);
 }
@@ -506,11 +506,11 @@ static void virtio_ballloon_get_free_page_hints(void 
*opaque)
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify(vdev, vq);
   /*
-   * Start to poll the vq once the reporting started. Otherwise, continue
+   * Start to poll the vq once the hinting started. Otherwise, continue
* only when there are entries on the vq, which need to be given back.
*/
 } while (continue_to_get_hints ||
- dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
+ dev->free_page_hint_status == FREE_PAGE_HINT_S_START);
 virtio_queue_set_notification(vq, 1);
 }
 
@@ -531,14 +531,14 @@ static void virtio_balloon_free_page_start(VirtIOBalloon 
*s)
 return;
 }
 
-if (s->free_page_report_cmd_id == UINT_MAX) {
-s->free_page_report_cmd_id =
-   VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
+if (s->free_page_hint_cmd_id == UINT_MAX) {
+s->free_page_hint_cmd_id =
+   VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
 } else {
-s->free_page_report_cmd_id++;
+s->free_page_hint_cmd_id++;
 }
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
+s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED;
 virtio_notify_config(vdev);
 }
 
@@ -546,18 +546,18 @@ static void virtio_balloon_free_page_stop(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) {
+if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) {
 /*
  * The lock also guarantees us that the
  * virtio_ballloon_get_free_page_hints exits after the
- * free_page_report_status is set to S_STOP.
+ * free_page_hint_status is set to S_STOP.
  */
 qemu_mutex_lock(>free_page_lock);
 /*
  * The guest hasn't done the reporting, so host sends a notification
  * to the guest to actively stop the reporting.
  */
-s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+s->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify_config(vdev);
 }
@@ -567,15 +567,15 @@ static void virtio_balloon_free_page_done(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_DONE;
+s->free_page_hint_status = FREE_PAGE_HINT_S_DONE;
 virtio_notify_config(vdev);
 }
 
 static int
-virtio_balloon_free_page_report_notify(NotifierWithReturn *n, void *data)
+virtio_balloon_free_page_hint_notify(NotifierWithReturn *n, void *data)
 {
 VirtIOBalloon *dev = container_of(n, VirtIOBalloon,
-  free_page_report_notify);
+  free_page_hint_notify);
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 PrecopyNo

[PATCH v20 QEMU 0/5] virtio-balloon: add support for free page reporting

2020-04-16 Thread Alexander Duyck
This series provides an asynchronous means of reporting free guest pages
to QEMU through virtio-balloon so that the memory associated with those
pages can be dropped and reused by other processes and/or guests on the
host. Using this it is possible to avoid unnecessary I/O to disk and
greatly improve performance in the case of memory overcommit on the host.

I originally submitted this patch series back on February 11th 2020[1],
but at that time I was focused primarily on the kernel portion of this
patch set. However as of April 7th those patches are now included in
Linus's kernel tree[2] and so I am submitting the QEMU pieces for
inclusion.

[1]: 
https://lore.kernel.org/lkml/20200211224416.29318.44077.stgit@localhost.localdomain/
[2]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0c504f154718904ae49349147e3b7e6ae91ffdc

Changes from v17:
Fixed typo in patch 1 title
Addressed white-space issues reported via checkpatch
Added braces {} for two if statements to match expected coding style

Changes from v18:
Updated patches 2 and 3 based on input from dhildenb
Added comment to patch 2 describing what keeps us from reporting a bad page
Added patch to address issue with ROM devices being directly writable

Changes from v19:
Added std-headers change to match changes pushed for linux kernel headers
Added patch to remove "report" from page hinting code paths
Updated comment to better explain why we disable hints w/ page poisoning
Removed code that was modifying config size for poison vs hinting
Dropped x-page-poison property
Added code to bounds check the reported region vs the RAM block
Dropped patch for ROM devices as that was already pulled in by Paolo

---

Alexander Duyck (5):
  linux-headers: Update to allow renaming of free_page_report_cmd_id
  virtio-balloon: Replace free page hinting references to 'report' with 
'hint'
  virtio-balloon: Implement support for page poison tracking feature
  linux-headers: update to contain virito-balloon free page reporting
  virtio-balloon: Provide an interface for free page reporting


 hw/virtio/virtio-balloon.c  |  161 ++-
 include/hw/virtio/virtio-balloon.h  |   23 ++-
 include/standard-headers/linux/virtio_balloon.h |   12 +-
 3 files changed, 146 insertions(+), 50 deletions(-)

--



[PATCH v20 QEMU 3/5] virtio-balloon: Implement support for page poison tracking feature

2020-04-16 Thread Alexander Duyck
From: Alexander Duyck 

We need to make certain to advertise support for page poison tracking if
we want to actually get data on if the guest will be poisoning pages. So
if free page hinting is active we should add page poisoning support and
let the guest disable it if it isn't using it.

Page poisoning will result in a page being dirtied on free. As such we
cannot really avoid having to copy the page at least one more time since
we will need to write the poison value to the destination. As such we can
just ignore free page hinting if page poisoning is enabled as it will
actually reduce the work we have to do.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   27 +++
 include/hw/virtio/virtio-balloon.h |1 +
 2 files changed, 28 insertions(+)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a1d6fb52c876..19d587fd05cb 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -531,6 +531,23 @@ static void virtio_balloon_free_page_start(VirtIOBalloon 
*s)
 return;
 }
 
+/*
+ * If page poisoning is enabled then we probably shouldn't bother with
+ * the hinting since the poisoning will dirty the page and invalidate
+ * the work we are doing anyway.
+ *
+ * If at some point in the future the implementation is changed in the
+ * guest we would still have issues as the poison would need to be
+ * applied to the last state of the page on the remote end.
+ *
+ * To do that we could improve upon this current implementation by
+ * migrating a poison/zero page if the page is flagged as dirty instead
+ * of simply skipping it.
+ */
+if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+return;
+}
+
 if (s->free_page_hint_cmd_id == UINT_MAX) {
 s->free_page_hint_cmd_id =
VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
@@ -634,6 +651,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, 
uint8_t *config_data)
 
 config.num_pages = cpu_to_le32(dev->num_pages);
 config.actual = cpu_to_le32(dev->actual);
+config.poison_val = cpu_to_le32(dev->poison_val);
 
 if (dev->free_page_hint_status == FREE_PAGE_HINT_S_REQUESTED) {
 config.free_page_hint_cmd_id =
@@ -697,6 +715,10 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 qapi_event_send_balloon_change(vm_ram_size -
 ((ram_addr_t) dev->actual << 
VIRTIO_BALLOON_PFN_SHIFT));
 }
+dev->poison_val = 0;
+if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+dev->poison_val = le32_to_cpu(config.poison_val);
+}
 trace_virtio_balloon_set_config(dev->actual, oldactual);
 }
 
@@ -706,6 +728,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice 
*vdev, uint64_t f,
 VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
 f |= dev->host_features;
 virtio_add_feature(, VIRTIO_BALLOON_F_STATS_VQ);
+if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+virtio_add_feature(, VIRTIO_BALLOON_F_PAGE_POISON);
+}
 
 return f;
 }
@@ -854,6 +879,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
 g_free(s->stats_vq_elem);
 s->stats_vq_elem = NULL;
 }
+
+s->poison_val = 0;
 }
 
 static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index 108cff97e71a..3ca2a78e1aca 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -70,6 +70,7 @@ typedef struct VirtIOBalloon {
 uint32_t host_features;
 
 bool qemu_4_0_config_size;
+uint32_t poison_val;
 } VirtIOBalloon;
 
 #endif




[PATCH v20 QEMU 1/5] linux-headers: Update to allow renaming of free_page_report_cmd_id

2020-04-16 Thread Alexander Duyck
From: Alexander Duyck 

Sync to the latest upstream changes for free page hinting. To be
replaced by a full linux header sync.

Signed-off-by: Alexander Duyck 
---
 include/standard-headers/linux/virtio_balloon.h |   11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/standard-headers/linux/virtio_balloon.h 
b/include/standard-headers/linux/virtio_balloon.h
index 9375ca2a70de..af0a6b59dab2 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -47,8 +47,15 @@ struct virtio_balloon_config {
uint32_t num_pages;
/* Number of pages we've actually got in balloon. */
uint32_t actual;
-   /* Free page report command id, readonly by guest */
-   uint32_t free_page_report_cmd_id;
+   /*
+* Free page hint command id, readonly by guest.
+* Was previously name free_page_report_cmd_id so we
+* need to carry that name for legacy support.
+*/
+   union {
+   uint32_t free_page_hint_cmd_id;
+   uint32_t free_page_report_cmd_id;   /* deprecated */
+   };
/* Stores PAGE_POISON if page poisoning is in use */
uint32_t poison_val;
 };




Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature

2020-04-16 Thread Alexander Duyck
On Thu, Apr 16, 2020 at 7:55 AM David Hildenbrand  wrote:
>
> >> We should document our result of page poisoning, free page hinting, and
> >> free page reporting there as well. I hope you'll have time for the latter.
> >>
> >> -
> >> Semantics of VIRTIO_BALLOON_F_PAGE_POISON
> >> -
> >>
> >> "The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
> >> guest is using page poisoning. Guest writes to the poison_val config
> >> field to tell host about the page poisoning value that is in use."
> >> -> Very little information, no signs about what has to be done.
> >
> > I think it's an informational field. Knowing that free pages
> > are full of a specific pattern can be handy for the hypervisor
> > for a variety of reasons. E.g. compression/deduplication?
>
> I was referring to the documentation of the feature and what we
> (hypervisor) are expected to do (in regards to inflation/deflation).
>
> Yes, it might be valuable to know that the guest is using poisoning. I
> assume compression/deduplication (IOW KSM) will figure out themselves
> that such pages are equal.

The other thing to keep in mind is that the poison value only really
comes into play with hinting/reporting. In the case of the standard
balloon the pages are considered allocated from the guest's
perspective until the balloon is deflated. Then any poison/init will
occur over again anyway so I don't think the standard balloon should
really care.

For hinting it somewhat depends. Currently the implementation is
inflating a balloon so having poisoning or init_on_free means it is
written to immediately after it is freed so it defeats the purpose of
the hinting. However that is a Linux implementation issue, not
necessarily an issue with the QEMU implementation. As such may be I
should fix that in the Linux driver since that has been ignored in
QEMU up until now anyway. The more interesting bit is what should the
behavior be from the hypervisor when a page is marked as being hinted.
I think right now the behavior is to just not migrate the page. I
wonder though if we shouldn't instead just consider the page a zero
page, and then maybe modify the zero page behavior for the case where
we know page poisoning is enabled.

For reporting it is a matter of tracking the contents. We don't want
to modify the contents in any way as we are attempting to essentially
do in-place tracking of the page. So if it is poisoned or initialized
it needs to stay in that state so we cannot invalidate the page if
doing so will cause it to lose state information.

> >> "Let the hypervisor know that we are expecting a specific value to be
> >> written back in balloon pages."
> >
> >
> >
> >> -> Okay, that talks about "balloon pages", which would include right now
> >> -- pages "inflated" and then "deflated" using free page hinting
> >> -- pages "inflated" and then "deflated" using oridnary inflate/deflate
> >>queue
> >
> > ATM, in this case driver calls "free" and that fills page with the
> > poison value.
>
> Yes, that's what I mentioned somehwere, it's currently done by Linux and ...
>
> >
> > It might be a valid optimization to allow driver to skip
> > poisoning of freed pages in this case.
>
> ... we should prepare for that :)

Agreed.

> >
> >> And I would add
> >>
> >> "However, if the inflated page was not filled with "poison_val" when
> >> inflating, it's not predictable if the original page or a page filled
> >> with "poison_val" is returned."
> >>
> >> Which would cover the "we did not discard the page in the hypervisor, so
> >> the original page is still there".
> >>
> >>
> >> We should also document what is expected to happen if "poison_val" is
> >> suddenly changed by the guest at one point in time again. (e.g., not
> >> supported, unexpected things can happen, etc.)
> >
> > Right. I think we should require that this can only be changed
> > before features have been negotiated.
> > That is the only point where hypervisor can still fail
> > gracefully (i.e. fail FEATURES_OK).
>
> Agreed.

I believe that is the current behavior. Essentially if poisoning
enabled then the feature flag is left set. I think the one change I
will make in the driver is that if poisoning is enabled in the kernel,
but PAGE_POISON is not available as a feature, I am going to disable
both the reporting and hinting features in virtballoon_validate.

> I can totally understand if Alex would want to stop working on
> VIRTIO_BALLOON_F_PAGE_POISON at this point and only fix the guest to not
> enable free page reporting in case we don't have
> VIRTIO_BALLOON_F_PAGE_POISON (unless that's already done), lol. :)

I already have a patch for that.

The bigger issue is how to deal with the PAGE_POISON being enabled
with FREE_PAGE_HINTING. The legacy code at this point is just broken
and is plowing through with FREE_PAGE_HINTING while it is 

Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature

2020-04-15 Thread Alexander Duyck
On Wed, Apr 15, 2020 at 12:46 PM David Hildenbrand  wrote:
>
>
>
> > Am 15.04.2020 um 21:29 schrieb Alexander Duyck :
> >
> > On Wed, Apr 15, 2020 at 11:17 AM David Hildenbrand  
> > wrote:
> >>
> >>>
> >>> The comment above explains the "why". Basically poisoning a page will
> >>> dirty it. So why hint a page as free when that will drop it back into
> >>> the guest and result in it being dirtied again. What you end up with
> >>> is all the pages that were temporarily placed in the balloon are dirty
> >>> after the hinting report is finished at which point you made things
> >>> worse instead of helping to improve them.
> >>
> >> Let me think this through. What happens on free page hinting is that
> >>
> >> a) we tell the guest that migration starts
> >> b) it allocates pages (alloc_pages()), sends them to us and adds them to
> >>   a list
> >> b) we exclude these pages on migration
> >> c) we tell the guest that migration is over
> >> d) the guest frees all allocated pages
> >>
> >> The "issue" with VIRTIO_BALLOON_F_PAGE_POISON is, that in d), the guest
> >> will fill all pages again with a pattern (or zero).
> >
> > They should have already been filled with the poison pattern before we
> > got to d). A bigger worry is that we at some point in the future
> > update the kernel so that d) doesn't trigger re-poisoning, in which
> > case the pages won't be marked as dirty, we will have skipped the
> > migration, and we have no idea what will be in the pages at the end.
> >
> >> AFAIKs, it's either
> >>
> >> 1) Not performing free page hinting, migrating pages filled with a
> >> poison value (or zero).
> >> 2) Performing free page hinting, not migrating pages filled with a
> >> poison value (or zero), letting only the guest fill them again.
> >>
> >> I have the feeling, that 2) is still better for migration, because we
> >> don't migrate useless pages and let the guest reconstruct the content?
> >> (having a poison value of zero might be debatable)
> >>
> >> Can you tell me what I am missing? :)
> >
> > The goal of the free page hinting was to reduce the number of pages
> > that have to be migrated, in the second case you point out is is
> > basically deferring it and will make the post-copy quite more
> > expensive since all of the free memory will be pushed to the post-copy
> > which I would think would be undesirable since it means the VM would
> > have to be down for a greater amount of time with the poisoning
> > enabled.
>
> Postcopy is a very good point, bought!
>
> But (what you wrote above) it sounds like that this is really what we *have 
> to* do, not an optimization. I‘ll double check the spec tomorrow (hopefully 
> it was documented). We should rephrase the comment then.

Do you have a link to the spec that I could look at? I am not hopeful
that this will be listed there since the poison side of QEMU was never
implemented. The flag is only here because it was copied over in the
kernel header.

> I could imagine that we also have to care about ordinary page 
> inflation/deflation when handling page poisoning. (Iow, don‘t inflate/deflate 
> if set for now).

The problem is this was broken from the start for that. The issue is
that the poison feature was wrapped up within the page hinting
feature. So as a result enabling it for a VM that doesn't recognize
the feature independently would likely leave the poison value
uninitialized and flagging as though a value of 0 is used. In addition
the original poison checking wasn't making sure that the page wasn't
init_on_free which has the same effect as page poisoning but isn't
page poisoning.

> >
> > The worst case scenario would be one in which the VM was suspended for
> > migration while there were still pages in the balloon that needed to
> > be drained. In such a case you would have them in an indeterminate
> > state since the last page poisoning for them might have been ignored
> > since they were marked as "free", so they are just going to be
> > whatever value they were if they had been migrated at all.
> >
> >>>
> >>>>
> >>>>> +return;
> >>>>> +}
> >>>>> +
> >>>>> if (s->free_page_report_cmd_id == UINT_MAX) {
> >>>>> s->free_page_report_cmd_id =
> >>>>>VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> >>>>
> >>&g

Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature

2020-04-15 Thread Alexander Duyck
On Wed, Apr 15, 2020 at 11:17 AM David Hildenbrand  wrote:
>
> >
> > The comment above explains the "why". Basically poisoning a page will
> > dirty it. So why hint a page as free when that will drop it back into
> > the guest and result in it being dirtied again. What you end up with
> > is all the pages that were temporarily placed in the balloon are dirty
> > after the hinting report is finished at which point you made things
> > worse instead of helping to improve them.
>
> Let me think this through. What happens on free page hinting is that
>
> a) we tell the guest that migration starts
> b) it allocates pages (alloc_pages()), sends them to us and adds them to
>a list
> b) we exclude these pages on migration
> c) we tell the guest that migration is over
> d) the guest frees all allocated pages
>
> The "issue" with VIRTIO_BALLOON_F_PAGE_POISON is, that in d), the guest
> will fill all pages again with a pattern (or zero).

They should have already been filled with the poison pattern before we
got to d). A bigger worry is that we at some point in the future
update the kernel so that d) doesn't trigger re-poisoning, in which
case the pages won't be marked as dirty, we will have skipped the
migration, and we have no idea what will be in the pages at the end.

> AFAIKs, it's either
>
> 1) Not performing free page hinting, migrating pages filled with a
> poison value (or zero).
> 2) Performing free page hinting, not migrating pages filled with a
> poison value (or zero), letting only the guest fill them again.
>
> I have the feeling, that 2) is still better for migration, because we
> don't migrate useless pages and let the guest reconstruct the content?
> (having a poison value of zero might be debatable)
>
> Can you tell me what I am missing? :)

The goal of the free page hinting was to reduce the number of pages
that have to be migrated, in the second case you point out is is
basically deferring it and will make the post-copy quite more
expensive since all of the free memory will be pushed to the post-copy
which I would think would be undesirable since it means the VM would
have to be down for a greater amount of time with the poisoning
enabled.

The worst case scenario would be one in which the VM was suspended for
migration while there were still pages in the balloon that needed to
be drained. In such a case you would have them in an indeterminate
state since the last page poisoning for them might have been ignored
since they were marked as "free", so they are just going to be
whatever value they were if they had been migrated at all.

> >
> >>
> >>> +return;
> >>> +}
> >>> +
> >>>  if (s->free_page_report_cmd_id == UINT_MAX) {
> >>>  s->free_page_report_cmd_id =
> >>> VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> >>
> >> We should rename all "free_page_report" stuff we can to
> >> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
> >> side :) ) before adding free page reporting .
> >>
> >> (looking at the virtio-balloon linux header, it's also confusing but
> >> we're stuck with that - maybe we should add better comments)
> >
> > Are we stuck? Couldn't we just convert it to an anonymous union with
> > free_page_hint_cmd_id and then use that where needed?
>
> Saw your patch already :)
>
> >
> >>> @@ -618,12 +627,10 @@ static size_t 
> >>> virtio_balloon_config_size(VirtIOBalloon *s)
> >>>  if (s->qemu_4_0_config_size) {
> >>>  return sizeof(struct virtio_balloon_config);
> >>>  }
> >>> -if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> >>> +if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
> >>> +virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>>  return sizeof(struct virtio_balloon_config);
> >>>  }
> >>> -if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>> -return offsetof(struct virtio_balloon_config, poison_val);
> >>> -}
> >>
> >> I am not sure this change is completely sane. Why is that necessary at all?
> >
> > The poison_val is stored at the end of the structure and is required
> > in order to make free page hinting to work. What this change is doing
>
> Required to make it work? In the kernel,
>
> commit 2e991629bcf55a43681aec1ee096eeb03cf81709
> Author: Wei Wang 
> Date:   Mon Aug 27 09:32:19 2018 +0800
>
> virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
>
> was merged after
>
> commit 86a559787e6f5cf662c081363f64a20cad654195
> Author: Wei Wang 
> Date:   Mon Aug 27 09:32:17 2018 +0800
>
> virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
>
> So I assume it's perfectly fine to not have it.
>
> I'd say it's the responsibility of the guest to *not* use
> VIRTIO_BALLOON_F_FREE_PAGE_HINT in case it is using page poisoning
> without VIRTIO_BALLOON_F_PAGE_POISON. Otherwise it will shoot itself
> into the foot.

Based on the timing I am guessing the page poisoning was in the same
patch set 

Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature

2020-04-15 Thread Alexander Duyck
On Wed, Apr 15, 2020 at 1:08 AM David Hildenbrand  wrote:
>
> On 10.04.20 05:41, Alexander Duyck wrote:
> > From: Alexander Duyck 
> >
> > We need to make certain to advertise support for page poison tracking if
> > we want to actually get data on if the guest will be poisoning pages. So
> > if free page hinting is active we should add page poisoning support and
> > let the guest disable it if it isn't using it.
> >
> > Page poisoning will result in a page being dirtied on free. As such we
> > cannot really avoid having to copy the page at least one more time since
> > we will need to write the poison value to the destination. As such we can
> > just ignore free page hinting if page poisoning is enabled as it will
> > actually reduce the work we have to do.
> >
> > Signed-off-by: Alexander Duyck 
> > ---
> >  hw/virtio/virtio-balloon.c |   26 ++
> >  include/hw/virtio/virtio-balloon.h |1 +
> >  2 files changed, 23 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index a4729f7fc930..1c6d36a29a04 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -531,6 +531,15 @@ static void 
> > virtio_balloon_free_page_start(VirtIOBalloon *s)
> >  return;
> >  }
> >
> > +/*
> > + * If page poisoning is enabled then we probably shouldn't bother with
> > + * the hinting since the poisoning will dirty the page and invalidate
> > + * the work we are doing anyway.
> > + */
> > +if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
>
> Why not check for the poison value instead? (as you do in patch #3) ?

So if I recall correctly the vdev has feature requires the host to
indicate that the feature is in use. If page poisoning is not enabled
on the host then it clears the flag on its end and we can proceed with
the feature.

The comment above explains the "why". Basically poisoning a page will
dirty it. So why hint a page as free when that will drop it back into
the guest and result in it being dirtied again. What you end up with
is all the pages that were temporarily placed in the balloon are dirty
after the hinting report is finished at which point you made things
worse instead of helping to improve them.

>
> > +return;
> > +}
> > +
> >  if (s->free_page_report_cmd_id == UINT_MAX) {
> >  s->free_page_report_cmd_id =
> > VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
>
> We should rename all "free_page_report" stuff we can to
> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
> side :) ) before adding free page reporting .
>
> (looking at the virtio-balloon linux header, it's also confusing but
> we're stuck with that - maybe we should add better comments)

Are we stuck? Couldn't we just convert it to an anonymous union with
free_page_hint_cmd_id and then use that where needed?

> > @@ -618,12 +627,10 @@ static size_t 
> > virtio_balloon_config_size(VirtIOBalloon *s)
> >  if (s->qemu_4_0_config_size) {
> >  return sizeof(struct virtio_balloon_config);
> >  }
> > -if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> > +if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
> > +virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >  return sizeof(struct virtio_balloon_config);
> >  }
> > -if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> > -return offsetof(struct virtio_balloon_config, poison_val);
> > -}
>
> I am not sure this change is completely sane. Why is that necessary at all?

The poison_val is stored at the end of the structure and is required
in order to make free page hinting to work. What this change is doing
is forcing it so that we report the config size as the full size if
either poisoning or hinting are enabled since the poison val is the
last member of the config structure.

If the question is why bother reducing the size if free page hinting
is not present then I guess we could simplify this and just report
report the size of the config for all cases.

> >  return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
> >  }
> >
> > @@ -634,6 +641,7 @@ static void virtio_balloon_get_config(VirtIODevice 
> > *vdev, uint8_t *config_data)
> >
> >  config.num_pages = cpu_to_le32(dev->num_pages);
> >  config.actual = cpu_to_le32(dev->actual);
> > +config.poi

Re: [PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for free page reporting

2020-04-15 Thread Alexander Duyck
On Wed, Apr 15, 2020 at 1:17 AM David Hildenbrand  wrote:
>
> On 10.04.20 05:41, Alexander Duyck wrote:
> > From: Alexander Duyck 
> >
> > Add support for free page reporting. The idea is to function very similar
> > to how the balloon works in that we basically end up madvising the page as
> > not being used. However we don't really need to bother with any deflate
> > type logic since the page will be faulted back into the guest when it is
> > read or written to.
> >
> > This provides a new way of letting the guest proactively report free
> > pages to the hypervisor, so the hypervisor can reuse them. In contrast to
> > inflate/deflate that is triggered via the hypervisor explicitly.
>
> Much better, thanks!
>
> >
> > Signed-off-by: Alexander Duyck 
> > ---
> >  hw/virtio/virtio-balloon.c |   63 
> > +++-
> >  include/hw/virtio/virtio-balloon.h |2 +
> >  2 files changed, 62 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index 1c6d36a29a04..86d8b48a8e3a 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -321,6 +321,57 @@ static void balloon_stats_set_poll_interval(Object 
> > *obj, Visitor *v,
> >  balloon_stats_change_timer(s, 0);
> >  }
> >
> > +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > +VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> > +VirtQueueElement *elem;
> > +
> > +while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement {
> > +unsigned int i;
> > +
> > +for (i = 0; i < elem->in_num; i++) {
> > +void *addr = elem->in_sg[i].iov_base;
> > +size_t size = elem->in_sg[i].iov_len;
> > +ram_addr_t ram_offset;
> > +size_t rb_page_size;
> > +RAMBlock *rb;
> > +
> > +if (qemu_balloon_is_inhibited() || dev->poison_val) {
> > +continue;
>
> actually, you want to do that in the outer loop, no?

I'll move that. Odds are compiler was doing that anyway.

> > +}
> > +
> > +/*
> > + * There is no need to check the memory section to see if
> > + * it is ram/readonly/romd like there is for handle_output
> > + * below. If the region is not meant to be written to then
> > + * address_space_map will have allocated a bounce buffer
> > + * and it will be freed in address_space_unmap and trigger
> > + * and unassigned_mem_write before failing to copy over the
> > + * buffer. If more than one bad descriptor is provided it
> > + * will return NULL after the first bounce buffer and fail
> > + * to map any resources.
> > + */
> > +rb = qemu_ram_block_from_host(addr, false, _offset);
> > +if (!rb) {
> > +trace_virtio_balloon_bad_addr(elem->in_addr[i]);
> > +continue;
> > +}
> > +
> > +/* For now we will simply ignore unaligned memory regions */
> > +rb_page_size = qemu_ram_pagesize(rb);
> > +if (!QEMU_IS_ALIGNED(ram_offset | size, rb_page_size)) {
>
> /me thinks you can drop rb_page_size

You mean just fold it into the statement? I guess I can do that.

> I *think* there is still one remaining case to handle: Crossing RAM blocks.
>
> Most probably you should check
>
> /* For now, ignore crossing RAM blocks. */
> if (ram_offset + size >= qemu_ram_get_used_length()) {
> continue;
> }
>
> otherwise ram_block_discard_range() will report an error.

Makes sense. I can add that into the QEMU_IS_ALIGNED check.



Re: [PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions

2020-04-13 Thread Alexander Duyck
On Fri, Apr 10, 2020 at 3:50 AM Paolo Bonzini  wrote:
>
> On 10/04/20 05:41, Alexander Duyck wrote:
> > From: Alexander Duyck 
> >
> > According to the documentation in memory.h a ROM memory region will be
> > backed by RAM for reads, but is supposed to go through a callback for
> > writes. Currently we were not checking for the existence of the rom_device
> > flag when determining if we could perform a direct write or not.
> >
> > To correct that add a check to memory_region_is_direct so that if the
> > memory region has the rom_device flag set we will return false for all
> > checks where is_write is set.
> >
> > Signed-off-by: Alexander Duyck 
> > ---
> >  include/exec/memory.h |4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 1614d9a02c0c..e000bd2f97b2 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -2351,8 +2351,8 @@ void 
> > address_space_write_cached_slow(MemoryRegionCache *cache,
> >  static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
> >  {
> >  if (is_write) {
> > -return memory_region_is_ram(mr) &&
> > -   !mr->readonly && !memory_region_is_ram_device(mr);
> > +return memory_region_is_ram(mr) && !mr->readonly &&
> > +   !mr->rom_device && !memory_region_is_ram_device(mr);
> >  } else {
> >  return (memory_region_is_ram(mr) && 
> > !memory_region_is_ram_device(mr)) ||
> > memory_region_is_romd(mr);
> >
>
> Good catch.  I queued this up for 5.0.
>
> Thanks,
>
> Paolo

Thanks Paolo,

It looks like you only pulled this patch correct?

If so, David & Michael, do I need to resubmit the first 3 in this
series or can those be pulled separately?

Thanks.

Alex



[PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature

2020-04-09 Thread Alexander Duyck
From: Alexander Duyck 

We need to make certain to advertise support for page poison tracking if
we want to actually get data on if the guest will be poisoning pages. So
if free page hinting is active we should add page poisoning support and
let the guest disable it if it isn't using it.

Page poisoning will result in a page being dirtied on free. As such we
cannot really avoid having to copy the page at least one more time since
we will need to write the poison value to the destination. As such we can
just ignore free page hinting if page poisoning is enabled as it will
actually reduce the work we have to do.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   26 ++
 include/hw/virtio/virtio-balloon.h |1 +
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a4729f7fc930..1c6d36a29a04 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -531,6 +531,15 @@ static void virtio_balloon_free_page_start(VirtIOBalloon 
*s)
 return;
 }
 
+/*
+ * If page poisoning is enabled then we probably shouldn't bother with
+ * the hinting since the poisoning will dirty the page and invalidate
+ * the work we are doing anyway.
+ */
+if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+return;
+}
+
 if (s->free_page_report_cmd_id == UINT_MAX) {
 s->free_page_report_cmd_id =
VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
@@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
 if (s->qemu_4_0_config_size) {
 return sizeof(struct virtio_balloon_config);
 }
-if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
+if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
+virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
 return sizeof(struct virtio_balloon_config);
 }
-if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
-return offsetof(struct virtio_balloon_config, poison_val);
-}
 return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
 }
 
@@ -634,6 +641,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, 
uint8_t *config_data)
 
 config.num_pages = cpu_to_le32(dev->num_pages);
 config.actual = cpu_to_le32(dev->actual);
+config.poison_val = cpu_to_le32(dev->poison_val);
 
 if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
 config.free_page_report_cmd_id =
@@ -697,6 +705,9 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 qapi_event_send_balloon_change(vm_ram_size -
 ((ram_addr_t) dev->actual << 
VIRTIO_BALLOON_PFN_SHIFT));
 }
+dev->poison_val = virtio_vdev_has_feature(vdev,
+  VIRTIO_BALLOON_F_PAGE_POISON) ?
+  le32_to_cpu(config.poison_val) : 0;
 trace_virtio_balloon_set_config(dev->actual, oldactual);
 }
 
@@ -706,6 +717,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice 
*vdev, uint64_t f,
 VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
 f |= dev->host_features;
 virtio_add_feature(, VIRTIO_BALLOON_F_STATS_VQ);
+if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+virtio_add_feature(, VIRTIO_BALLOON_F_PAGE_POISON);
+}
 
 return f;
 }
@@ -854,6 +868,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
 g_free(s->stats_vq_elem);
 s->stats_vq_elem = NULL;
 }
+
+s->poison_val = 0;
 }
 
 static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
@@ -916,6 +932,8 @@ static Property virtio_balloon_properties[] = {
 VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
 DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
 VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
+DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_PAGE_POISON, false),
 /* QEMU 4.0 accidentally changed the config size even when free-page-hint
  * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
  * property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index d1c968d2376e..7fe78e5c14d7 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -70,6 +70,7 @@ typedef struct VirtIOBalloon {
 uint32_t host_features;
 
 bool qemu_4_0_config_size;
+uint32_t poison_val;
 } VirtIOBalloon;
 
 #endif




[PATCH v19 QEMU 0/4] virtio-balloon: add support for free page reporting

2020-04-09 Thread Alexander Duyck
This series provides an asynchronous means of reporting free guest pages
to QEMU through virtio-balloon so that the memory associated with those
pages can be dropped and reused by other processes and/or guests on the
host. Using this it is possible to avoid unnecessary I/O to disk and
greatly improve performance in the case of memory overcommit on the host.

I originally submitted this patch series back on February 11th 2020[1],
but at that time I was focused primarily on the kernel portion of this
patch set. However as of April 7th those patches are now included in
Linus's kernel tree[2] and so I am submitting the QEMU pieces for
inclusion.

[1]: 
https://lore.kernel.org/lkml/20200211224416.29318.44077.stgit@localhost.localdomain/
[2]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0c504f154718904ae49349147e3b7e6ae91ffdc

Changes from v17:
Fixed typo in patch 1 title
Addressed white-space issues reported via checkpatch
Added braces {} for two if statements to match expected coding style

Changes from v18:
Updated patches 2 and 3 based on input from dhildenb
Added comment to patch 2 describing what keeps us from reporting a bad page
Added patch to address issue with ROM devices being directly writable

---

Alexander Duyck (4):
  virtio-balloon: Implement support for page poison tracking feature
  linux-headers: update to contain virito-balloon free page reporting
  virtio-balloon: Provide an interface for free page reporting
  memory: Do not allow direct write access to rom_device regions


 hw/virtio/virtio-balloon.c  |   85 ++-
 include/exec/memory.h   |4 +
 include/hw/virtio/virtio-balloon.h  |3 +
 include/standard-headers/linux/virtio_balloon.h |1 
 4 files changed, 86 insertions(+), 7 deletions(-)

--



[PATCH v19 QEMU 2/4] linux-headers: update to contain virito-balloon free page reporting

2020-04-09 Thread Alexander Duyck
From: Alexander Duyck 

Sync the latest upstream changes for free page reporting. To be
replaced by a full linux header sync.

Signed-off-by: Alexander Duyck 
---
 include/standard-headers/linux/virtio_balloon.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/include/standard-headers/linux/virtio_balloon.h 
b/include/standard-headers/linux/virtio_balloon.h
index 9375ca2a70de..1c5f6d6f2de6 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -36,6 +36,7 @@
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT3 /* VQ to report free pages */
 #define VIRTIO_BALLOON_F_PAGE_POISON   4 /* Guest is using page poisoning */
+#define VIRTIO_BALLOON_F_REPORTING 5 /* Page reporting virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12




[PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions

2020-04-09 Thread Alexander Duyck
From: Alexander Duyck 

According to the documentation in memory.h a ROM memory region will be
backed by RAM for reads, but is supposed to go through a callback for
writes. Currently we were not checking for the existence of the rom_device
flag when determining if we could perform a direct write or not.

To correct that add a check to memory_region_is_direct so that if the
memory region has the rom_device flag set we will return false for all
checks where is_write is set.

Signed-off-by: Alexander Duyck 
---
 include/exec/memory.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 1614d9a02c0c..e000bd2f97b2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2351,8 +2351,8 @@ void address_space_write_cached_slow(MemoryRegionCache 
*cache,
 static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
 {
 if (is_write) {
-return memory_region_is_ram(mr) &&
-   !mr->readonly && !memory_region_is_ram_device(mr);
+return memory_region_is_ram(mr) && !mr->readonly &&
+   !mr->rom_device && !memory_region_is_ram_device(mr);
 } else {
 return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) 
||
memory_region_is_romd(mr);




[PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for free page reporting

2020-04-09 Thread Alexander Duyck
From: Alexander Duyck 

Add support for free page reporting. The idea is to function very similar
to how the balloon works in that we basically end up madvising the page as
not being used. However we don't really need to bother with any deflate
type logic since the page will be faulted back into the guest when it is
read or written to.

This provides a new way of letting the guest proactively report free
pages to the hypervisor, so the hypervisor can reuse them. In contrast to
inflate/deflate that is triggered via the hypervisor explicitly.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   63 +++-
 include/hw/virtio/virtio-balloon.h |2 +
 2 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1c6d36a29a04..86d8b48a8e3a 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -321,6 +321,57 @@ static void balloon_stats_set_poll_interval(Object *obj, 
Visitor *v,
 balloon_stats_change_timer(s, 0);
 }
 
+static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
+{
+VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+VirtQueueElement *elem;
+
+while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement {
+unsigned int i;
+
+for (i = 0; i < elem->in_num; i++) {
+void *addr = elem->in_sg[i].iov_base;
+size_t size = elem->in_sg[i].iov_len;
+ram_addr_t ram_offset;
+size_t rb_page_size;
+RAMBlock *rb;
+
+if (qemu_balloon_is_inhibited() || dev->poison_val) {
+continue;
+}
+
+/*
+ * There is no need to check the memory section to see if
+ * it is ram/readonly/romd like there is for handle_output
+ * below. If the region is not meant to be written to then
+ * address_space_map will have allocated a bounce buffer
+ * and it will be freed in address_space_unmap and trigger
+ * and unassigned_mem_write before failing to copy over the
+ * buffer. If more than one bad descriptor is provided it
+ * will return NULL after the first bounce buffer and fail
+ * to map any resources.
+ */
+rb = qemu_ram_block_from_host(addr, false, _offset);
+if (!rb) {
+trace_virtio_balloon_bad_addr(elem->in_addr[i]);
+continue;
+}
+
+/* For now we will simply ignore unaligned memory regions */
+rb_page_size = qemu_ram_pagesize(rb);
+if (!QEMU_IS_ALIGNED(ram_offset | size, rb_page_size)) {
+continue;
+}
+
+ram_block_discard_range(rb, ram_offset, size);
+}
+
+virtqueue_push(vq, elem, 0);
+virtio_notify(vdev, vq);
+g_free(elem);
+}
+}
+
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -628,7 +679,8 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
 return sizeof(struct virtio_balloon_config);
 }
 if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
-virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
+virtio_has_feature(features, VIRTIO_BALLOON_F_REPORTING)) {
 return sizeof(struct virtio_balloon_config);
 }
 return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
@@ -717,7 +769,8 @@ static uint64_t virtio_balloon_get_features(VirtIODevice 
*vdev, uint64_t f,
 VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
 f |= dev->host_features;
 virtio_add_feature(, VIRTIO_BALLOON_F_STATS_VQ);
-if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
+virtio_has_feature(f, VIRTIO_BALLOON_F_REPORTING)) {
 virtio_add_feature(, VIRTIO_BALLOON_F_PAGE_POISON);
 }
 
@@ -807,6 +860,10 @@ static void virtio_balloon_device_realize(DeviceState 
*dev, Error **errp)
 s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
 s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
 
+if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
+s->rvq = virtio_add_queue(vdev, 32, virtio_balloon_handle_report);
+}
+
 if (virtio_has_feature(s->host_features,
VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
 s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
@@ -940,6 +997,8 @@ static Property virtio_balloon_properties[] = {
  */
 DEFINE_PROP_BOOL("qemu-4-0-config-size", VirtIOBalloon,
  qemu_4_0_config_size, false),
+DEFINE_PROP_BIT(&q

Re: [PATCH v18 QEMU 3/3] virtio-balloon: Provide a interface for free page reporting

2020-04-09 Thread Alexander Duyck
On Thu, Apr 9, 2020 at 10:35 AM David Hildenbrand  wrote:
> On 09.04.20 19:34, Alexander Duyck wrote:
> >>>  hw/virtio/virtio-balloon.c |   48 
> >>> +++-
> >>>  include/hw/virtio/virtio-balloon.h |2 +-
> >>>  2 files changed, 47 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> >>> index 1c6d36a29a04..297b267198ac 100644
> >>> --- a/hw/virtio/virtio-balloon.c
> >>> +++ b/hw/virtio/virtio-balloon.c
> >>> @@ -321,6 +321,42 @@ static void balloon_stats_set_poll_interval(Object 
> >>> *obj, Visitor *v,
> >>>  balloon_stats_change_timer(s, 0);
> >>>  }
> >>>
> >>> +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue 
> >>> *vq)
> >>> +{
> >>> +VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> >>> +VirtQueueElement *elem;
> >>> +
> >>> +while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement {
> >>> +unsigned int i;
> >>> +
> >>> +for (i = 0; i < elem->in_num; i++) {
> >>> +void *addr = elem->in_sg[i].iov_base;
> >>> +size_t size = elem->in_sg[i].iov_len;
> >>> +ram_addr_t ram_offset;
> >>> +size_t rb_page_size;
> >>> +RAMBlock *rb;
> >>> +
> >>> +if (qemu_balloon_is_inhibited() || dev->poison_val) {
> >>> +continue;
> >>> +}
> >>
> >> These checks are not sufficient. See virtio_balloon_handle_output(),
> >> where we e.g., check that somebody doesn't try to discard the bios.
> >>
> >> Maybe we can factor our/unify the handling in both code paths.
> >
> > I am going to have to look at this further. If I understand correctly
> > you are asking me to add all of the memory section checks that are in
> > virtio_balloon_handle_output correct?
> >
> > I'm not sure it makes sense with the scatterlist approach I have taken
> > here. All of the mappings were created as a scatterlist of writable
> > DMA mappings unlike the regular balloon which is just stuffing PFN
> > numbers into an array and then sending the array across. As such I
> > would think there are already such protections in place. For instance,
> > you wouldn't want to let virtio-net map the BIOS and request data be
> > written into that either, correct? So I am assuming there is something
> > there to prevent that already isn't there?
>
> Good point, let's find out :)

Okay, so I believe I figured it out. From what I can tell there is a
call in address_space_map that will determine if we can directly write
to the page or not. However it looks like there might be one minor
issue as it is assuming it can write directly to ROM devices and that
isn't correct. I will add a patch to my set to address that.

Other than that the behavior seems to be that a bounce buffer will be
allocated on the first instance of a page backed by ROM or anything
other than RAM, and after that it will return NULL until the bounce
buffer is freed. If we start getting NULLs the mapping will be aborted
and we wouldn't even get into this code path. After we unmap the
region it will attempt to use address_space_write which should fail
for any region that isn't meant to be written to, or it will copy
zeros out of the bounce buffer into the region if it is writable via
address_space_write.

So the DMA mapping API in virtqueue_pop/virtqueue_push will take care
of doing the right stuff for the mappings in the case that the guest
is trying to do something really stupid, at least after I address the
direct write access to rom_device regions.

- Alex



Re: [PATCH v18 QEMU 3/3] virtio-balloon: Provide a interface for free page reporting

2020-04-09 Thread Alexander Duyck
On Thu, Apr 9, 2020 at 12:44 AM David Hildenbrand  wrote:
>
> On 09.04.20 00:55, Alexander Duyck wrote:
> > From: Alexander Duyck 
> >
> > Add support for what I am referring to as "free page reporting".
>
> "Add support for "free page reporting".
>
> > Basically the idea is to function very similar to how the balloon works
> > in that we basically end up madvising the page as not being used. However
>
> I'd get rid of one "basically".
>
> > we don't really need to bother with any deflate type logic since the page
> > will be faulted back into the guest when it is read or written to.
> >
> > This is meant to be a simplification of the existing balloon interface
> > to use for providing hints to what memory needs to be freed. I am assuming
>
> It's not really a simplification, it's something new. It's a new way of
> letting the guest automatically report free pages to the hypervisor, so
> the hypervisor can reuse them. In contrast to inflate/deflate, that's
> triggered via the hypervisor explicitly.
>
> > this is safe to do as the deflate logic does not actually appear to do very
> > much other than tracking what subpages have been released and which ones
> > haven't.
>
> "I assume this is safe" does not sound very confident. Can we just drop
> the last sentence?

Agreed. I will make the requested changes to the patch description.

>
> >
> > Signed-off-by: Alexander Duyck 
> > ---
> >  hw/virtio/virtio-balloon.c |   48 
> > +++-
> >  include/hw/virtio/virtio-balloon.h |2 +-
> >  2 files changed, 47 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index 1c6d36a29a04..297b267198ac 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -321,6 +321,42 @@ static void balloon_stats_set_poll_interval(Object 
> > *obj, Visitor *v,
> >  balloon_stats_change_timer(s, 0);
> >  }
> >
> > +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > +VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> > +VirtQueueElement *elem;
> > +
> > +while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement {
> > +unsigned int i;
> > +
> > +for (i = 0; i < elem->in_num; i++) {
> > +void *addr = elem->in_sg[i].iov_base;
> > +size_t size = elem->in_sg[i].iov_len;
> > +ram_addr_t ram_offset;
> > +size_t rb_page_size;
> > +RAMBlock *rb;
> > +
> > +if (qemu_balloon_is_inhibited() || dev->poison_val) {
> > +continue;
> > +}
>
> These checks are not sufficient. See virtio_balloon_handle_output(),
> where we e.g., check that somebody doesn't try to discard the bios.
>
> Maybe we can factor our/unify the handling in both code paths.

I am going to have to look at this further. If I understand correctly
you are asking me to add all of the memory section checks that are in
virtio_balloon_handle_output correct?

I'm not sure it makes sense with the scatterlist approach I have taken
here. All of the mappings were created as a scatterlist of writable
DMA mappings unlike the regular balloon which is just stuffing PFN
numbers into an array and then sending the array across. As such I
would think there are already such protections in place. For instance,
you wouldn't want to let virtio-net map the BIOS and request data be
written into that either, correct? So I am assuming there is something
there to prevent that already isn't there?

> > +
> > +rb = qemu_ram_block_from_host(addr, false, _offset);
> > +rb_page_size = qemu_ram_pagesize(rb);
> > +
> > +/* For now we will simply ignore unaligned memory regions */
> > +if ((ram_offset | size) & (rb_page_size - 1)) {
>
> "!QEMU_IS_ALIGNED()" please to make this easier to read.

done.

> > +continue;
> > +}
> > +
> > +ram_block_discard_range(rb, ram_offset, size);
> > +}
> > +
> > +virtqueue_push(vq, elem, 0);
> > +virtio_notify(vdev, vq);
> > +g_free(elem);
> > +}
> > +}
> > +
>
> [...]
>
> >  if (virtio_has_feature(s->host_features,
> > VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >  s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
> > @@ -940,6 +982,8 @@ static Property virtio_balloon_properties[] = {
> >   */
> >  DEFINE_PROP_BOOL("qemu-4-0-config-size", VirtIOBalloon,
> >   qemu_4_0_config_size, false),
> > +DEFINE_PROP_BIT("unused-page-reporting", VirtIOBalloon, host_features,
>
> "free-page-reporting"

Thanks for catching that. It has been a while since that was renamed
and I must have let it slip through the cracks.

- Alex



Re: [PATCH v18 QEMU 2/3] virtio-balloon: Add support for providing free page reports to host

2020-04-09 Thread Alexander Duyck
On Thu, Apr 9, 2020 at 12:36 AM David Hildenbrand  wrote:
>
> On 09.04.20 00:55, Alexander Duyck wrote:
> > From: Alexander Duyck 
> >
> > Add support for the page reporting feature provided by virtio-balloon.
> > Reporting differs from the regular balloon functionality in that is is
> > much less durable than a standard memory balloon. Instead of creating a
> > list of pages that cannot be accessed the pages are only inaccessible
> > while they are being indicated to the virtio interface. Once the
> > interface has acknowledged them they are placed back into their respective
> > free lists and are once again accessible by the guest system.
> >
> > Unlike a standard balloon we don't inflate and deflate the pages. Instead
> > we perform the reporting, and once the reporting is completed it is
> > assumed that the page has been dropped from the guest and will be faulted
> > back in the next time the page is accessed.
> >
> > This patch is a subset of the UAPI patch that was submitted for the Linux
> > kernel. The original patch can be found at:
> > https://lore.kernel.org/lkml/20200211224657.29318.68624.stgit@localhost.localdomain/
>
> You don't need all these comments.

Sorry about that. Those are basically the same comments from the
original upstream patch. I just wasn't aware of the process so I just
copied that over and added the comment/link to the original patch.

> Usually we do
>
> "linux-headers: update to contain virito-balloon free page reporting
>
> Let's sync the latest upstream changes for free page reporting. To be
> replaced by a full linux header sync.
>
> Signed-off-by: Alexander Duyck 
> "
>
> mst will replace this by a full header sync (if necessary) when sending
> it upstream

I will update the patch description.

Thanks.

- Alex



[PATCH v18 QEMU 2/3] virtio-balloon: Add support for providing free page reports to host

2020-04-08 Thread Alexander Duyck
From: Alexander Duyck 

Add support for the page reporting feature provided by virtio-balloon.
Reporting differs from the regular balloon functionality in that is is
much less durable than a standard memory balloon. Instead of creating a
list of pages that cannot be accessed the pages are only inaccessible
while they are being indicated to the virtio interface. Once the
interface has acknowledged them they are placed back into their respective
free lists and are once again accessible by the guest system.

Unlike a standard balloon we don't inflate and deflate the pages. Instead
we perform the reporting, and once the reporting is completed it is
assumed that the page has been dropped from the guest and will be faulted
back in the next time the page is accessed.

This patch is a subset of the UAPI patch that was submitted for the Linux
kernel. The original patch can be found at:
https://lore.kernel.org/lkml/20200211224657.29318.68624.stgit@localhost.localdomain/

Signed-off-by: Alexander Duyck 
---
 include/standard-headers/linux/virtio_balloon.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/include/standard-headers/linux/virtio_balloon.h 
b/include/standard-headers/linux/virtio_balloon.h
index 9375ca2a70de..1c5f6d6f2de6 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -36,6 +36,7 @@
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT3 /* VQ to report free pages */
 #define VIRTIO_BALLOON_F_PAGE_POISON   4 /* Guest is using page poisoning */
+#define VIRTIO_BALLOON_F_REPORTING 5 /* Page reporting virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12




[PATCH v18 QEMU 0/3] virtio-balloon: add support for providing free page reporting

2020-04-08 Thread Alexander Duyck
This series provides an asynchronous means of reporting free guest pages
to QEMU through virtio-balloon so that the memory associated with those
pages can be dropped and reused by other processes and/or guests on the
host. Using this it is possible to avoid unnecessary I/O to disk and
greatly improve performance in the case of memory overcommit on the host.

I originally submitted this patch series[1] back on February 11th 2020,
but at that time I was focused primarily on the kernel portion of this
patch set. However as of April 7th those patches are now included in
Linus's kernel tree[2] and so I thought I should resubmit the QEMU side of
this for inclusion so that the feature will be available in QEMU hopefully
by the time the 5.7 kernel tree is released.

[1]: 
https://lore.kernel.org/lkml/20200211224416.29318.44077.stgit@localhost.localdomain/
[2]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0c504f154718904ae49349147e3b7e6ae91ffdc

Changes from v17:
Fixed typo in patch 1 title
Addressed white-space issues reported via checkpatch
Added braces {} for two if statements to match expected coding style

---

Alexander Duyck (3):
  virtio-balloon: Implement support for page poison tracking feature
  virtio-balloon: Add support for providing free page reports to host
  virtio-balloon: Provide a interface for free page reporting


 hw/virtio/virtio-balloon.c  |   70 ++-
 include/hw/virtio/virtio-balloon.h  |3 +
 include/standard-headers/linux/virtio_balloon.h |1 
 3 files changed, 69 insertions(+), 5 deletions(-)

--



[PATCH v18 QEMU 1/3] virtio-balloon: Implement support for page poison tracking feature

2020-04-08 Thread Alexander Duyck
From: Alexander Duyck 

We need to make certain to advertise support for page poison tracking if
we want to actually get data on if the guest will be poisoning pages. So
if free page hinting is active we should add page poisoning support and
let the guest disable it if it isn't using it.

Page poisoning will result in a page being dirtied on free. As such we
cannot really avoid having to copy the page at least one more time since
we will need to write the poison value to the destination. As such we can
just ignore free page hinting if page poisoning is enabled as it will
actually reduce the work we have to do.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   26 ++
 include/hw/virtio/virtio-balloon.h |1 +
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a4729f7fc930..1c6d36a29a04 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -531,6 +531,15 @@ static void virtio_balloon_free_page_start(VirtIOBalloon 
*s)
 return;
 }
 
+/*
+ * If page poisoning is enabled then we probably shouldn't bother with
+ * the hinting since the poisoning will dirty the page and invalidate
+ * the work we are doing anyway.
+ */
+if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+return;
+}
+
 if (s->free_page_report_cmd_id == UINT_MAX) {
 s->free_page_report_cmd_id =
VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
@@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
 if (s->qemu_4_0_config_size) {
 return sizeof(struct virtio_balloon_config);
 }
-if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
+if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
+virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
 return sizeof(struct virtio_balloon_config);
 }
-if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
-return offsetof(struct virtio_balloon_config, poison_val);
-}
 return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
 }
 
@@ -634,6 +641,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, 
uint8_t *config_data)
 
 config.num_pages = cpu_to_le32(dev->num_pages);
 config.actual = cpu_to_le32(dev->actual);
+config.poison_val = cpu_to_le32(dev->poison_val);
 
 if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
 config.free_page_report_cmd_id =
@@ -697,6 +705,9 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 qapi_event_send_balloon_change(vm_ram_size -
 ((ram_addr_t) dev->actual << 
VIRTIO_BALLOON_PFN_SHIFT));
 }
+dev->poison_val = virtio_vdev_has_feature(vdev,
+  VIRTIO_BALLOON_F_PAGE_POISON) ?
+  le32_to_cpu(config.poison_val) : 0;
 trace_virtio_balloon_set_config(dev->actual, oldactual);
 }
 
@@ -706,6 +717,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice 
*vdev, uint64_t f,
 VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
 f |= dev->host_features;
 virtio_add_feature(, VIRTIO_BALLOON_F_STATS_VQ);
+if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+virtio_add_feature(, VIRTIO_BALLOON_F_PAGE_POISON);
+}
 
 return f;
 }
@@ -854,6 +868,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
 g_free(s->stats_vq_elem);
 s->stats_vq_elem = NULL;
 }
+
+s->poison_val = 0;
 }
 
 static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
@@ -916,6 +932,8 @@ static Property virtio_balloon_properties[] = {
 VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
 DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
 VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
+DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_PAGE_POISON, false),
 /* QEMU 4.0 accidentally changed the config size even when free-page-hint
  * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
  * property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index d1c968d2376e..7fe78e5c14d7 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -70,6 +70,7 @@ typedef struct VirtIOBalloon {
 uint32_t host_features;
 
 bool qemu_4_0_config_size;
+uint32_t poison_val;
 } VirtIOBalloon;
 
 #endif




[PATCH v18 QEMU 3/3] virtio-balloon: Provide a interface for free page reporting

2020-04-08 Thread Alexander Duyck
From: Alexander Duyck 

Add support for what I am referring to as "free page reporting".
Basically the idea is to function very similar to how the balloon works
in that we basically end up madvising the page as not being used. However
we don't really need to bother with any deflate type logic since the page
will be faulted back into the guest when it is read or written to.

This is meant to be a simplification of the existing balloon interface
to use for providing hints to what memory needs to be freed. I am assuming
this is safe to do as the deflate logic does not actually appear to do very
much other than tracking what subpages have been released and which ones
haven't.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   48 +++-
 include/hw/virtio/virtio-balloon.h |2 +-
 2 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1c6d36a29a04..297b267198ac 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -321,6 +321,42 @@ static void balloon_stats_set_poll_interval(Object *obj, 
Visitor *v,
 balloon_stats_change_timer(s, 0);
 }
 
+static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
+{
+VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+VirtQueueElement *elem;
+
+while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement {
+unsigned int i;
+
+for (i = 0; i < elem->in_num; i++) {
+void *addr = elem->in_sg[i].iov_base;
+size_t size = elem->in_sg[i].iov_len;
+ram_addr_t ram_offset;
+size_t rb_page_size;
+RAMBlock *rb;
+
+if (qemu_balloon_is_inhibited() || dev->poison_val) {
+continue;
+}
+
+rb = qemu_ram_block_from_host(addr, false, _offset);
+rb_page_size = qemu_ram_pagesize(rb);
+
+/* For now we will simply ignore unaligned memory regions */
+if ((ram_offset | size) & (rb_page_size - 1)) {
+continue;
+}
+
+ram_block_discard_range(rb, ram_offset, size);
+}
+
+virtqueue_push(vq, elem, 0);
+virtio_notify(vdev, vq);
+g_free(elem);
+}
+}
+
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -628,7 +664,8 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
 return sizeof(struct virtio_balloon_config);
 }
 if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
-virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
+virtio_has_feature(features, VIRTIO_BALLOON_F_REPORTING)) {
 return sizeof(struct virtio_balloon_config);
 }
 return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
@@ -717,7 +754,8 @@ static uint64_t virtio_balloon_get_features(VirtIODevice 
*vdev, uint64_t f,
 VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
 f |= dev->host_features;
 virtio_add_feature(, VIRTIO_BALLOON_F_STATS_VQ);
-if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
+virtio_has_feature(f, VIRTIO_BALLOON_F_REPORTING)) {
 virtio_add_feature(, VIRTIO_BALLOON_F_PAGE_POISON);
 }
 
@@ -807,6 +845,10 @@ static void virtio_balloon_device_realize(DeviceState 
*dev, Error **errp)
 s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
 s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
 
+if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
+s->rvq = virtio_add_queue(vdev, 32, virtio_balloon_handle_report);
+}
+
 if (virtio_has_feature(s->host_features,
VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
 s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
@@ -940,6 +982,8 @@ static Property virtio_balloon_properties[] = {
  */
 DEFINE_PROP_BOOL("qemu-4-0-config-size", VirtIOBalloon,
  qemu_4_0_config_size, false),
+DEFINE_PROP_BIT("unused-page-reporting", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_REPORTING, true),
 DEFINE_PROP_LINK("iothread", VirtIOBalloon, iothread, TYPE_IOTHREAD,
  IOThread *),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index 7fe78e5c14d7..db5bf7127112 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -42,7 +42,7 @@ enum virtio_balloon_free_page_report_status {
 
 typedef struct VirtIOBalloon {
 VirtIODevice parent_obj;
-VirtQueue *ivq, *dvq, *svq, *f

[PATCH v17 RESUBMIT QEMU 3/3] virtio-balloon: Provide a interface for free page reporting

2020-04-08 Thread Alexander Duyck
From: Alexander Duyck 

Add support for what I am referring to as "free page reporting".
Basically the idea is to function very similar to how the balloon works
in that we basically end up madvising the page as not being used. However
we don't really need to bother with any deflate type logic since the page
will be faulted back into the guest when it is read or written to.

This is meant to be a simplification of the existing balloon interface
to use for providing hints to what memory needs to be freed. I am assuming
this is safe to do as the deflate logic does not actually appear to do very
much other than tracking what subpages have been released and which ones
haven't.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   46 ++--
 include/hw/virtio/virtio-balloon.h |2 +-
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 455d85b7082f..5faafd2f62ac 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -321,6 +321,40 @@ static void balloon_stats_set_poll_interval(Object *obj, 
Visitor *v,
 balloon_stats_change_timer(s, 0);
 }
 
+static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
+{
+VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+VirtQueueElement *elem;
+
+while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement {
+   unsigned int i;
+
+for (i = 0; i < elem->in_num; i++) {
+void *addr = elem->in_sg[i].iov_base;
+size_t size = elem->in_sg[i].iov_len;
+ram_addr_t ram_offset;
+size_t rb_page_size;
+RAMBlock *rb;
+
+if (qemu_balloon_is_inhibited() || dev->poison_val)
+continue;
+
+rb = qemu_ram_block_from_host(addr, false, _offset);
+rb_page_size = qemu_ram_pagesize(rb);
+
+/* For now we will simply ignore unaligned memory regions */
+if ((ram_offset | size) & (rb_page_size - 1))
+continue;
+
+ram_block_discard_range(rb, ram_offset, size);
+}
+
+virtqueue_push(vq, elem, 0);
+virtio_notify(vdev, vq);
+g_free(elem);
+}
+}
+
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -628,7 +662,8 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
 return sizeof(struct virtio_balloon_config);
 }
 if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
-virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
+virtio_has_feature(features, VIRTIO_BALLOON_F_REPORTING)) {
 return sizeof(struct virtio_balloon_config);
 }
 return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
@@ -716,7 +751,8 @@ static uint64_t virtio_balloon_get_features(VirtIODevice 
*vdev, uint64_t f,
 VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
 f |= dev->host_features;
 virtio_add_feature(, VIRTIO_BALLOON_F_STATS_VQ);
-if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
+virtio_has_feature(f, VIRTIO_BALLOON_F_REPORTING)) {
 virtio_add_feature(, VIRTIO_BALLOON_F_PAGE_POISON);
 }
 
@@ -806,6 +842,10 @@ static void virtio_balloon_device_realize(DeviceState 
*dev, Error **errp)
 s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
 s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
 
+if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
+s->rvq = virtio_add_queue(vdev, 32, virtio_balloon_handle_report);
+}
+
 if (virtio_has_feature(s->host_features,
VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
 s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
@@ -939,6 +979,8 @@ static Property virtio_balloon_properties[] = {
  */
 DEFINE_PROP_BOOL("qemu-4-0-config-size", VirtIOBalloon,
  qemu_4_0_config_size, false),
+DEFINE_PROP_BIT("unused-page-reporting", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_REPORTING, true),
 DEFINE_PROP_LINK("iothread", VirtIOBalloon, iothread, TYPE_IOTHREAD,
  IOThread *),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index 7fe78e5c14d7..db5bf7127112 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -42,7 +42,7 @@ enum virtio_balloon_free_page_report_status {
 
 typedef struct VirtIOBalloon {
 VirtIODevice parent_obj;
-VirtQueue *ivq, *dvq, *svq, *free_page_vq;
+VirtQueue *ivq, *dvq, *svq, *fr

[PATCH v17 RESUBMIT QEMU 1/3] virtio-balloon: Implement support for page poison tracking feature

2020-04-08 Thread Alexander Duyck
From: Alexander Duyck 

We need to make certain to advertise support for page poison tracking if
we want to actually get data on if the guest will be poisoning pages. So
if free page hinting is active we should add page poisoning support and
let the guest disable it if it isn't using it.

Page poisoning will result in a page being dirtied on free. As such we
cannot really avoid having to copy the page at least one more time since
we will need to write the poison value to the destination. As such we can
just ignore free page hinting if page poisoning is enabled as it will
actually reduce the work we have to do.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   25 +
 include/hw/virtio/virtio-balloon.h |1 +
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a4729f7fc930..455d85b7082f 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -531,6 +531,15 @@ static void virtio_balloon_free_page_start(VirtIOBalloon 
*s)
 return;
 }
 
+/*
+ * If page poisoning is enabled then we probably shouldn't bother with
+ * the hinting since the poisoning will dirty the page and invalidate
+ * the work we are doing anyway.
+ */
+if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+return;
+}
+
 if (s->free_page_report_cmd_id == UINT_MAX) {
 s->free_page_report_cmd_id =
VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
@@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
 if (s->qemu_4_0_config_size) {
 return sizeof(struct virtio_balloon_config);
 }
-if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
+if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
+virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
 return sizeof(struct virtio_balloon_config);
 }
-if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
-return offsetof(struct virtio_balloon_config, poison_val);
-}
 return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
 }
 
@@ -634,6 +641,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, 
uint8_t *config_data)
 
 config.num_pages = cpu_to_le32(dev->num_pages);
 config.actual = cpu_to_le32(dev->actual);
+config.poison_val = cpu_to_le32(dev->poison_val);
 
 if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
 config.free_page_report_cmd_id =
@@ -697,6 +705,8 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 qapi_event_send_balloon_change(vm_ram_size -
 ((ram_addr_t) dev->actual << 
VIRTIO_BALLOON_PFN_SHIFT));
 }
+dev->poison_val = virtio_vdev_has_feature(vdev, 
VIRTIO_BALLOON_F_PAGE_POISON) ? 
+  le32_to_cpu(config.poison_val) : 0;
 trace_virtio_balloon_set_config(dev->actual, oldactual);
 }
 
@@ -706,6 +716,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice 
*vdev, uint64_t f,
 VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
 f |= dev->host_features;
 virtio_add_feature(, VIRTIO_BALLOON_F_STATS_VQ);
+if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+virtio_add_feature(, VIRTIO_BALLOON_F_PAGE_POISON);
+}
 
 return f;
 }
@@ -854,6 +867,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
 g_free(s->stats_vq_elem);
 s->stats_vq_elem = NULL;
 }
+
+s->poison_val = 0;
 }
 
 static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
@@ -916,6 +931,8 @@ static Property virtio_balloon_properties[] = {
 VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
 DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
 VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
+DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_PAGE_POISON, false),
 /* QEMU 4.0 accidentally changed the config size even when free-page-hint
  * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
  * property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index d1c968d2376e..7fe78e5c14d7 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -70,6 +70,7 @@ typedef struct VirtIOBalloon {
 uint32_t host_features;
 
 bool qemu_4_0_config_size;
+uint32_t poison_val;
 } VirtIOBalloon;
 
 #endif




[PATCH v17 RESUBMIT QEMU 2/3] virtio-balloon: Add support for providing free page reports to host

2020-04-08 Thread Alexander Duyck
From: Alexander Duyck 

Add support for the page reporting feature provided by virtio-balloon.
Reporting differs from the regular balloon functionality in that is is
much less durable than a standard memory balloon. Instead of creating a
list of pages that cannot be accessed the pages are only inaccessible
while they are being indicated to the virtio interface. Once the
interface has acknowledged them they are placed back into their respective
free lists and are once again accessible by the guest system.

Unlike a standard balloon we don't inflate and deflate the pages. Instead
we perform the reporting, and once the reporting is completed it is
assumed that the page has been dropped from the guest and will be faulted
back in the next time the page is accessed.

This patch is a subset of the UAPI patch that was submitted for the Linux
kernel. The original patch can be found at:
https://lore.kernel.org/lkml/20200211224657.29318.68624.stgit@localhost.localdomain/

Signed-off-by: Alexander Duyck 
---
 include/standard-headers/linux/virtio_balloon.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/include/standard-headers/linux/virtio_balloon.h 
b/include/standard-headers/linux/virtio_balloon.h
index 9375ca2a70de..1c5f6d6f2de6 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -36,6 +36,7 @@
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT3 /* VQ to report free pages */
 #define VIRTIO_BALLOON_F_PAGE_POISON   4 /* Guest is using page poisoning */
+#define VIRTIO_BALLOON_F_REPORTING 5 /* Page reporting virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12




[PATCH v17 RESUBMIT QEMU 0/3] virtio-balloon: add support for providing free page reporting

2020-04-08 Thread Alexander Duyck
This series provides an asynchronous means of reporting free guest pages
to QEMU through virtio-balloon so that the memory associated with those
pages can be dropped and reused by other processes and/or guests on the
host. Using this it is possible to avoid unnecessary I/O to disk and
greatly improve performance in the case of memory overcommit on the host.

I originally submitted this patch series[1] back on February 11th 2020,
but at that time I was focused primarily on the kernel portion of this
patch set. However as of April 7th those patches are now included in
Linus's kernel tree[2] and so I thought I should resubmit the QEMU side of
this for inclusion so that the feature will be available in QEMU hopefully
by the time the 5.7 kernel tree is released.

[1]: 
https://lore.kernel.org/lkml/20200211224416.29318.44077.stgit@localhost.localdomain/
[2]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0c504f154718904ae49349147e3b7e6ae91ffdc

---

Alexander Duyck (3):
  virtio-balloon: Implement support for page poison tracking feature
  virtio-balloon: Add support for providing free page reports to host
  virtio-balloon: Provide a interface for free page reporting


 hw/virtio/virtio-balloon.c  |   67 ++-
 include/hw/virtio/virtio-balloon.h  |3 +
 include/standard-headers/linux/virtio_balloon.h |1 
 3 files changed, 66 insertions(+), 5 deletions(-)

--



Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support

2018-02-07 Thread Alexander Duyck
On Wed, Feb 7, 2018 at 8:43 AM, Michael S. Tsirkin <m...@redhat.com> wrote:
> On Sun, Feb 04, 2018 at 01:49:46PM -0800, Alexander Duyck wrote:
>> On Thu, Jan 25, 2018 at 9:57 PM, Tiwei Bie <tiwei@intel.com> wrote:
>> > On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote:
>> >> On 2018年01月26日 07:59, Michael S. Tsirkin wrote:
>> >> > > The virtual IOMMU isn't supported by the accelerators for now.
>> >> > > Because vhost-user currently lacks of an efficient way to share
>> >> > > the IOMMU table in VM to vhost backend. That's why the software
>> >> > > implementation of virtual IOMMU support in vhost-user backend
>> >> > > can't support dynamic mapping well.
>> >> > What exactly is meant by that? vIOMMU seems to work for people,
>> >> > it's not that fast if you change mappings all the time,
>> >> > but e.g. dpdk within guest doesn't.
>> >>
>> >> Yes, software implementation support dynamic mapping for sure. I think the
>> >> point is, current vhost-user backend can not program hardware IOMMU. So it
>> >> can not let hardware accelerator to cowork with software vIOMMU.
>> >
>> > Vhost-user backend can program hardware IOMMU. Currently
>> > vhost-user backend (or more precisely the vDPA driver in
>> > vhost-user backend) will use the memory table (delivered
>> > by the VHOST_USER_SET_MEM_TABLE message) to program the
>> > IOMMU via vfio, and that's why accelerators can use the
>> > GPA (guest physical address) in descriptors directly.
>> >
>> > Theoretically, we can use the IOVA mapping info (delivered
>> > by the VHOST_USER_IOTLB_MSG message) to program the IOMMU,
>> > and accelerators will be able to use IOVA. But the problem
>> > is that in vhost-user QEMU won't push all the IOVA mappings
>> > to backend directly. Backend needs to ask for those info
>> > when it meets a new IOVA. Such design and implementation
>> > won't work well for dynamic mappings anyway and couldn't
>> > be supported by hardware accelerators.
>> >
>> >> I think
>> >> that's another call to implement the offloaded path inside qemu which has
>> >> complete support for vIOMMU co-operated VFIO.
>> >
>> > Yes, that's exactly what we want. After revisiting the
>> > last paragraph in the commit message, I found it's not
>> > really accurate. The practicability of dynamic mappings
>> > support is a common issue for QEMU. It also exists for
>> > vfio (hw/vfio in QEMU). If QEMU needs to trap all the
>> > map/unmap events, the data path performance couldn't be
>> > high. If we want to thoroughly fix this issue especially
>> > for vfio (hw/vfio in QEMU), we need to have the offload
>> > path Jason mentioned in QEMU. And I think accelerators
>> > could use it too.
>> >
>> > Best regards,
>> > Tiwei Bie
>>
>> I wonder if we couldn't look at coming up with an altered security
>> model for the IOMMU drivers to address some of the performance issues
>> seen with typical hardware IOMMU?
>>
>> In the case of most network devices, we seem to be moving toward a
>> model where the Rx pages are mapped for an extended period of time and
>> see a fairly high rate of reuse. As such pages mapped as being
>> writable or read/write by the device are left mapped for an extended
>> period of time while Tx pages, which are read only, are often
>> mapped/unmapped since they are coming from some other location in the
>> kernel beyond the driver's control.
>>
>> If we were to somehow come up with a model where the read-only(Tx)
>> pages had access to a pre-allocated memory mapped address, and the
>> read/write(descriptor rings), write-only(Rx) pages were provided with
>> dynamic addresses we might be able to come up with a solution that
>> would allow for fairly high network performance while at least
>> protecting from memory corruption. The only issue it would open up is
>> that the device would have the ability to read any/all memory on the
>> guest. I was wondering about doing something like this with the vIOMMU
>> with VFIO for the Intel NICs this way since an interface like igb,
>> ixgbe, ixgbevf, i40e, or i40evf would probably show pretty good
>> performance under such a model and as long as the writable pages were
>> being tracked by the vIOMMU. It could even allow for live migration
>> support if the vIOMMU provided the info needed for migratable/dirty
>&g

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support

2018-02-04 Thread Alexander Duyck
On Thu, Jan 25, 2018 at 9:57 PM, Tiwei Bie  wrote:
> On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote:
>> On 2018年01月26日 07:59, Michael S. Tsirkin wrote:
>> > > The virtual IOMMU isn't supported by the accelerators for now.
>> > > Because vhost-user currently lacks of an efficient way to share
>> > > the IOMMU table in VM to vhost backend. That's why the software
>> > > implementation of virtual IOMMU support in vhost-user backend
>> > > can't support dynamic mapping well.
>> > What exactly is meant by that? vIOMMU seems to work for people,
>> > it's not that fast if you change mappings all the time,
>> > but e.g. dpdk within guest doesn't.
>>
>> Yes, software implementation support dynamic mapping for sure. I think the
>> point is, current vhost-user backend can not program hardware IOMMU. So it
>> can not let hardware accelerator to cowork with software vIOMMU.
>
> Vhost-user backend can program hardware IOMMU. Currently
> vhost-user backend (or more precisely the vDPA driver in
> vhost-user backend) will use the memory table (delivered
> by the VHOST_USER_SET_MEM_TABLE message) to program the
> IOMMU via vfio, and that's why accelerators can use the
> GPA (guest physical address) in descriptors directly.
>
> Theoretically, we can use the IOVA mapping info (delivered
> by the VHOST_USER_IOTLB_MSG message) to program the IOMMU,
> and accelerators will be able to use IOVA. But the problem
> is that in vhost-user QEMU won't push all the IOVA mappings
> to backend directly. Backend needs to ask for those info
> when it meets a new IOVA. Such design and implementation
> won't work well for dynamic mappings anyway and couldn't
> be supported by hardware accelerators.
>
>> I think
>> that's another call to implement the offloaded path inside qemu which has
>> complete support for vIOMMU co-operated VFIO.
>
> Yes, that's exactly what we want. After revisiting the
> last paragraph in the commit message, I found it's not
> really accurate. The practicability of dynamic mappings
> support is a common issue for QEMU. It also exists for
> vfio (hw/vfio in QEMU). If QEMU needs to trap all the
> map/unmap events, the data path performance couldn't be
> high. If we want to thoroughly fix this issue especially
> for vfio (hw/vfio in QEMU), we need to have the offload
> path Jason mentioned in QEMU. And I think accelerators
> could use it too.
>
> Best regards,
> Tiwei Bie

I wonder if we couldn't look at coming up with an altered security
model for the IOMMU drivers to address some of the performance issues
seen with typical hardware IOMMU?

In the case of most network devices, we seem to be moving toward a
model where the Rx pages are mapped for an extended period of time and
see a fairly high rate of reuse. As such pages mapped as being
writable or read/write by the device are left mapped for an extended
period of time while Tx pages, which are read only, are often
mapped/unmapped since they are coming from some other location in the
kernel beyond the driver's control.

If we were to somehow come up with a model where the read-only(Tx)
pages had access to a pre-allocated memory mapped address, and the
read/write(descriptor rings), write-only(Rx) pages were provided with
dynamic addresses we might be able to come up with a solution that
would allow for fairly high network performance while at least
protecting from memory corruption. The only issue it would open up is
that the device would have the ability to read any/all memory on the
guest. I was wondering about doing something like this with the vIOMMU
with VFIO for the Intel NICs this way since an interface like igb,
ixgbe, ixgbevf, i40e, or i40evf would probably show pretty good
performance under such a model and as long as the writable pages were
being tracked by the vIOMMU. It could even allow for live migration
support if the vIOMMU provided the info needed for migratable/dirty
page tracking and we held off on migrating any of the dynamically
mapped pages until after they were either unmapped or an FLR reset the
device.

Thanks.

- Alex



Re: [Qemu-devel] [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-06-12 Thread Alexander Duyck
On Sat, Jun 11, 2016 at 8:03 PM, Zhou Jie <zhoujie2...@cn.fujitsu.com> wrote:
> Hi, Alex
>
>
> On 2016/6/9 23:39, Alexander Duyck wrote:
>>
>> On Thu, Jun 9, 2016 at 3:14 AM, Zhou Jie <zhoujie2...@cn.fujitsu.com>
>> wrote:
>>>
>>> TO Alex
>>> TO Michael
>>>
>>>In your solution you add a emulate PCI bridge to act as
>>>a bridge between direct assigned devices and the host bridge.
>>>Do you mean put all direct assigned devices to
>>>one emulate PCI bridge?
>>>If yes, this maybe bring some problems.
>>>
>>>We are writing a patchset to support aer feature in qemu.
>>>When assigning a vfio device with AER enabled, we must check whether
>>>the device supports a host bus reset (ie. hot reset) as this may be
>>>used by the guest OS in order to recover the device from an AER
>>>error.
>>>QEMU must therefore have the ability to perform a physical
>>>host bus reset using the existing vfio APIs in response to a virtual
>>>bus reset in the VM.
>>>A physical bus reset affects all of the devices on the host bus.
>>>Therefore all physical devices affected by a bus reset must be
>>>configured on the same virtual bus in the VM.
>>>And no devices unaffected by the bus reset,
>>>be configured on the same virtual bus.
>>>
>>>http://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg02989.html
>>>
>>> Sincerely,
>>> Zhou Jie
>>
>>
>> That makes sense, but I don't think you have to worry much about this
>> at this point at least on my side as this was mostly just theory and I
>> haven't had a chance to put any of it into practice as of yet.
>>
>> My idea has been evolving on this for a while.  One thought I had is
>> that we may want to have something like an emulated IOMMU and if
>> possible we would want to split it up over multiple domains just so we
>> can be certain that the virtual interfaces and the physical ones
>> existed in separate domains.  In regards to your concerns perhaps what
>> we could do is put each assigned device into its own domain to prevent
>> them from affecting each other.  To that end we could probably break
>> things up so that each device effectively lives in its own PCIe slot
>> in the emulated system.  Then when we start a migration of the guest
>> the assigned device domains would then have to be tracked for unmap
>> and sync calls when the direction is from the device.
>>
>> I will keep your concerns in mind in the future when I get some time
>> to look at exploring this solution further.
>>
>> - Alex
>
>
> I am thinking about the practice of migration of passthrough device.
>
> In your solution, you use a vendor specific configuration space to
> negotiate with guest.
> If you put each assigned device into its own domain,
> how can qemu negotiate with guest?
> Add the vendor specific configuration space to every pci bus which
> is assigned a passthrough device?

This is kind of the direction I was thinking of heading in, so yes.
Basically in my mind we should be emulating a PCIe hierarchy if we
want so support device assignment.  That way we can already make use
of things like hot-plug and AER natively.  So if we have a root port
assigned to each assigned device we should be able to place some extra
logic there to handle things like this.

- Alex



Re: [Qemu-devel] [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-06-09 Thread Alexander Duyck
On Thu, Jun 9, 2016 at 3:14 AM, Zhou Jie  wrote:
> TO Alex
> TO Michael
>
>In your solution you add a emulate PCI bridge to act as
>a bridge between direct assigned devices and the host bridge.
>Do you mean put all direct assigned devices to
>one emulate PCI bridge?
>If yes, this maybe bring some problems.
>
>We are writing a patchset to support aer feature in qemu.
>When assigning a vfio device with AER enabled, we must check whether
>the device supports a host bus reset (ie. hot reset) as this may be
>used by the guest OS in order to recover the device from an AER
>error.
>QEMU must therefore have the ability to perform a physical
>host bus reset using the existing vfio APIs in response to a virtual
>bus reset in the VM.
>A physical bus reset affects all of the devices on the host bus.
>Therefore all physical devices affected by a bus reset must be
>configured on the same virtual bus in the VM.
>And no devices unaffected by the bus reset,
>be configured on the same virtual bus.
>
>http://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg02989.html
>
> Sincerely,
> Zhou Jie

That makes sense, but I don't think you have to worry much about this
at this point at least on my side as this was mostly just theory and I
haven't had a chance to put any of it into practice as of yet.

My idea has been evolving on this for a while.  One thought I had is
that we may want to have something like an emulated IOMMU and if
possible we would want to split it up over multiple domains just so we
can be certain that the virtual interfaces and the physical ones
existed in separate domains.  In regards to your concerns perhaps what
we could do is put each assigned device into its own domain to prevent
them from affecting each other.  To that end we could probably break
things up so that each device effectively lives in its own PCIe slot
in the emulated system.  Then when we start a migration of the guest
the assigned device domains would then have to be tracked for unmap
and sync calls when the direction is from the device.

I will keep your concerns in mind in the future when I get some time
to look at exploring this solution further.

- Alex



Re: [Qemu-devel] [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-01-05 Thread Alexander Duyck
On Tue, Jan 5, 2016 at 1:40 AM, Michael S. Tsirkin <m...@redhat.com> wrote:
> On Mon, Jan 04, 2016 at 07:11:25PM -0800, Alexander Duyck wrote:
>> >> The two mechanisms referenced above would likely require coordination with
>> >> QEMU and as such are open to discussion.  I haven't attempted to address
>> >> them as I am not sure there is a consensus as of yet.  My personal
>> >> preference would be to add a vendor-specific configuration block to the
>> >> emulated pci-bridge interfaces created by QEMU that would allow us to
>> >> essentially extend shpc to support guest live migration with pass-through
>> >> devices.
>> >
>> > shpc?
>>
>> That is kind of what I was thinking.  We basically need some mechanism
>> to allow for the host to ask the device to quiesce.  It has been
>> proposed to possibly even look at something like an ACPI interface
>> since I know ACPI is used by QEMU to manage hot-plug in the standard
>> case.
>>
>> - Alex
>
>
> Start by using hot-unplug for this!
>
> Really use your patch guest side, and write host side
> to allow starting migration with the device, but
> defer completing it.

Yeah, I'm fully on board with this idea, though I'm not really working
on this right now since last I knew the folks on this thread from
Intel were working on it.  My patches were mostly meant to be a nudge
in this direction so that we could get away from the driver specific
code.

> So
>
> 1.- host tells guest to start tracking memory writes
> 2.- guest acks
> 3.- migration starts
> 4.- most memory is migrated
> 5.- host tells guest to eject device
> 6.- guest acks
> 7.- stop vm and migrate rest of state
>

Sounds about right.  The only way this differs from what I see as the
final solution for this is that instead of fully ejecting the device
in step 5 the driver would instead pause the device and give the host
something like 10 seconds to stop the VM and resume with the same
device connected if it is available.  We would probably also need to
look at a solution that would force the device to be ejected or abort
prior to starting the migration if it doesn't give us the ack in step
2.

> It will already be a win since hot unplug after migration starts and
> most memory has been migrated is better than hot unplug before migration
> starts.

Right.  Generally the longer the VF can be maintained as a part of the
guest the longer the network performance is improved versus using a
purely virtual interface.

> Then measure downtime and profile. Then we can look at ways
> to quiesce device faster which really means step 5 is replaced
> with "host tells guest to quiesce device and dirty (or just unmap!)
> all memory mapped for write by device".

Step 5 will be the spot where we really need to start modifying
drivers.  Specifically we probably need to go through and clean-up
things so that we can reduce as many of the delays in the driver
suspend/resume path as possible.  I suspect there is quite a bit that
can be done there that would probably also improve boot and shutdown
times since those are also impacted by the devices.

- Alex



Re: [Qemu-devel] [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-01-04 Thread Alexander Duyck
On Mon, Jan 4, 2016 at 12:41 PM, Konrad Rzeszutek Wilk
<konrad.w...@oracle.com> wrote:
> On Sun, Dec 13, 2015 at 01:28:09PM -0800, Alexander Duyck wrote:
>> This patch set is meant to be the guest side code for a proof of concept
>> involving leaving pass-through devices in the guest during the warm-up
>> phase of guest live migration.  In order to accomplish this I have added a
>
> What does that mean? 'warm-up-phase'?

It is the first phase in a pre-copy migration.
https://en.wikipedia.org/wiki/Live_migration

Basically in this phase all the memory is marked as dirty and then
copied.  Any memory that changes gets marked as dirty as well.
Currently DMA circumvents this as the user space dirty page tracking
isn't able to track DMA.

>> new function called dma_mark_dirty that will mark the pages associated with
>> the DMA transaction as dirty in the case of either an unmap or a
>> sync_.*_for_cpu where the DMA direction is either DMA_FROM_DEVICE or
>> DMA_BIDIRECTIONAL.  The pass-through device must still be removed before
>> the stop-and-copy phase, however allowing the device to be present should
>> significantly improve the performance of the guest during the warm-up
>> period.
>
> .. if the warm-up phase is short I presume? If the warm-up phase takes
> a long time (busy guest that is of 1TB size) it wouldn't help much as the
> tracking of these DMA's may be quite long?
>
>>
>> This current implementation is very preliminary and there are number of
>> items still missing.  Specifically in order to make this a more complete
>> solution we need to support:
>> 1.  Notifying hypervisor that drivers are dirtying DMA pages received
>
> .. And somehow giving the hypervisor the GPFN so it can retain the PFN in
> the VT-d as long as possible.

Yes, what has happened is that the host went through and marked all
memory as read-only.  So trying to do any operation that requires
write access triggers a page fault which is then used by the host to
track pages that were dirtied.

>> 2.  Bypassing page dirtying when it is not needed.
>
> How would this work with with device doing DMA operations _after_ the 
> migration?
> That is the driver submits and DMA READ.. migrates away, device is unplugged,
> VT-d context is torn down - device does the DMA READ gets an VT-d error...
>
> and what then? How should the device on the other host replay the DMA READ?

The device has to quiesce before the migration can occur.  We cannot
have any DMA mappings still open when we reach the stop-and-copy phase
of the migration.  The solution I have proposed here works for
streaming mappings but doesn't solve the case for things like
dma_alloc_coherent where a bidirectional mapping is maintained between
the CPU and the device.

>>
>> The two mechanisms referenced above would likely require coordination with
>> QEMU and as such are open to discussion.  I haven't attempted to address
>> them as I am not sure there is a consensus as of yet.  My personal
>> preference would be to add a vendor-specific configuration block to the
>> emulated pci-bridge interfaces created by QEMU that would allow us to
>> essentially extend shpc to support guest live migration with pass-through
>> devices.
>
> shpc?

That is kind of what I was thinking.  We basically need some mechanism
to allow for the host to ask the device to quiesce.  It has been
proposed to possibly even look at something like an ACPI interface
since I know ACPI is used by QEMU to manage hot-plug in the standard
case.

- Alex



  1   2   >