Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages

2012-06-26 Thread Konrad Rzeszutek Wilk
> +#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
> +/*
> + * Balloon pages special page->mapping.
> + * users must properly allocate and initiliaze an instance of 
> balloon_mapping,

initialize

> + * and set it as the page->mapping for balloon enlisted page instances.
> + *
> + * address_space_operations necessary methods for ballooned pages:
> + *   .migratepage- used to perform balloon's page migration (as is)
> + *   .invalidatepage - used to isolate a page from balloon's page list
> + *   .freepage   - used to reinsert an isolated page to balloon's page 
> list
> + */
> +struct address_space *balloon_mapping;
> +EXPORT_SYMBOL(balloon_mapping);

Why don't you call this kvm_balloon_mapping - and when other balloon
drivers use it, then change it to something more generic. Also at that
future point the other balloon drivers might do it a bit differently so
it might be that will be reworked completly.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] Add a page cache-backed balloon device driver.

2012-06-26 Thread Frank Swiderski
On Tue, Jun 26, 2012 at 2:45 PM, Rik van Riel  wrote:
> On 06/26/2012 05:31 PM, Frank Swiderski wrote:
>>
>> On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel  wrote:
>
>
>>> The code looks good to me, my only worry is the
>>> code duplication. We now have 5 balloon drivers,
>>> for 4 hypervisors, all implementing everything
>>> from scratch...
>>
>>
>> Do you have any recommendations on this?  I could (I think reasonably
>> so) modify the existing virtio_balloon.c and have it change behavior
>> based on a feature bit or other configuration.  I'm not sure that
>> really addresses the root of what you're pointing out--it's still
>> adding a different implementation, but doing so as an extension of an
>> existing one.
>
>
> Ideally, I believe we would have two balloon
> top parts in a guest (one classical balloon,
> one on the LRU), and four bottom parts (kvm,
> xen, vmware & s390).
>
> That way the virt specific bits of a balloon
> driver would be essentially a ->balloon_page
> and ->release_page callback for pages, as well
> as methods to communicate with the host.
>
> All the management of pages, including stuff
> like putting them on the LRU, or isolating
> them for migration, would be done with the
> same common code, regardless of what virt
> software we are running on.
>
> Of course, that is a substantial amount of
> work and I feel it would be unreasonable to
> block anyone's code on that kind of thing
> (especially considering that your code is good),
> but I do believe the explosion of balloon
> code is a little worrying.
>

Hm, that makes a lot of sense.  That would be a few patches definitely
worth doing, IMHO.  I'm not entirely sure how I feel about inflating
the balloon drivers in the meantime.  Sigh, and I didn't even mean
that as a pun.

fes
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] Add a page cache-backed balloon device driver.

2012-06-26 Thread Frank Swiderski
On Tue, Jun 26, 2012 at 2:47 PM, Michael S. Tsirkin  wrote:
> On Tue, Jun 26, 2012 at 02:31:26PM -0700, Frank Swiderski wrote:
>> On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel  wrote:
>> > On 06/26/2012 04:32 PM, Frank Swiderski wrote:
>> >>
>> >> This implementation of a virtio balloon driver uses the page cache to
>> >> "store" pages that have been released to the host.  The communication
>> >> (outside of target counts) is one way--the guest notifies the host when
>> >> it adds a page to the page cache, allowing the host to madvise(2) with
>> >> MADV_DONTNEED.  Reclaim in the guest is therefore automatic and implicit
>> >> (via the regular page reclaim).  This means that inflating the balloon
>> >> is similar to the existing balloon mechanism, but the deflate is
>> >> different--it re-uses existing Linux kernel functionality to
>> >> automatically reclaim.
>> >>
>> >> Signed-off-by: Frank Swiderski
>> >
>> >
>> > It is a great idea, but how can this memory balancing
>> > possibly work if someone uses memory cgroups inside a
>> > guest?
>>
>> Thanks and good point--this isn't something that I considered in the
>> implementation.
>>
>> > Having said that, we currently do not have proper
>> > memory reclaim balancing between cgroups at all, so
>> > requiring that of this balloon driver would be
>> > unreasonable.
>> >
>> > The code looks good to me, my only worry is the
>> > code duplication. We now have 5 balloon drivers,
>> > for 4 hypervisors, all implementing everything
>> > from scratch...
>>
>> Do you have any recommendations on this?  I could (I think reasonably
>> so) modify the existing virtio_balloon.c and have it change behavior
>> based on a feature bit or other configuration.  I'm not sure that
>> really addresses the root of what you're pointing out--it's still
>> adding a different implementation, but doing so as an extension of an
>> existing one.
>>
>> fes
>
> Let's assume it's a feature bit: how would you
> formulate what the feature does *from host point of view*?
>
> --
> MST

In this implementation, the host doesn't keep track of pages in the
balloon, as there is no explicit deflate path.  The host device for
this implementation should merely, for example, MADV_DONTNEED on the
pages sent in an inflate.  Thus, the inflate becomes a notification
that the guest doesn't need those pages mapped in, but that they
should be available if the guest touches them.  In that sense, it's
not a rigid shrink of guest memory.  I'm not sure what I'd call the
feature bit though.

Was that the question you were asking, or did I misread?

fes
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages

2012-06-26 Thread Rafael Aquini
Mel,

First and foremost, thank you for taking the time to review these bits and
provide such valuable feedback.

On Tue, Jun 26, 2012 at 11:17:29AM +0100, Mel Gorman wrote:
> > +/* return 1 if page is part of a guest's memory balloon, 0 otherwise */
> > +static inline int PageBalloon(struct page *page)
> > +{
> > +   return is_balloon_page(page);
> > +}
> 
> bool
> 
> Why is there both is_balloon_page and PageBalloon? 
> 
> is_ballon_page is so simple it should just be a static inline here
> 
> extern struct address_space *balloon_mapping;
> static inline bool is_balloon_page(page)
> {
>   return page->mapping == balloon_mapping;
> }
>   
I was thinking about sustain the same syntax other page tests utilize,
but I rather stick to your suggestion on this one.

 
> >  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> > @@ -312,6 +313,14 @@ isolate_migratepages_range(struct zone *zone, struct 
> > compact_control *cc,
> > continue;
> > }
> >  
> > +   /*
> > +* For ballooned pages, we need to isolate them before testing
> > +* for PageLRU, as well as skip the LRU page isolation steps.
> > +*/
> 
> This says what, but not why.
> 
> I didn't check the exact mechanics of a balloon page but I expect it's that
> balloon pages are not on the LRU. If they are on the LRU, that's pretty dumb.
> 
> 
> /*
>  * Balloon pages can be migrated but are not on the LRU. Isolate
>  * them before LRU checks.
>  */
> 
> 
> It would be nicer to do this without gotos
> 
> /*
>  * It is possible to migrate LRU pages and balloon pages. Skip
>  * any other type of page
>  */
> if (is_balloon_page(page)) {
>   if (!isolate_balloon_page(page))
>   continue;
> } else if (PageLRU(page)) {
>   
> }
> 
> You will need to shuffle things around a little to make it work properly
> but if we handle other page types in the future it will be neater
> overall.
>
I'm glad you've put things this way on this one. Despite I was thinking on 
doing it
the way you suggested, I took the goto approach because I was afraid of doing
otherwise could be considered as an unnecessary radical surgery on established 
code.
Will do it, certainly.


> > +struct address_space *balloon_mapping;
> > +EXPORT_SYMBOL(balloon_mapping);
> > +
> 
> EXPORT_SYMBOL_GPL?
> 
> I don't mind how it is exported as such. I'm idly curious if there are
> external closed modules that use the driver.
> 
To be honest with you, that was picked with no particular case in mind. And, 
since
you've raised this question, I'm also curious. However, after giving a thought
on your feedback, I believe EXPORT_SYMBOL_GPL suits far better.


> > +/* ballooned page id check */
> > +int is_balloon_page(struct page *page)
> > +{
> > +   struct address_space *mapping = page->mapping;
> > +   if (mapping == balloon_mapping)
> > +   return 1;
> > +   return 0;
> > +}
> > +
> > +/* __isolate_lru_page() counterpart for a ballooned page */
> > +int isolate_balloon_page(struct page *page)
> > +{
> > +   struct address_space *mapping = page->mapping;
> 
> This is a publicly visible function and while your current usage looks
> correct it would not hurt to do something like this;
> 
> if (WARN_ON(!is_page_ballon(page))
>   return 0;
>
Excellent point!
 

> > +   if (mapping->a_ops->invalidatepage) {
> > +   /*
> > +* We can race against move_to_new_page() and stumble across a
> > +* locked 'newpage'. If we succeed on isolating it, the result
> > +* tends to be disastrous. So, we sanely skip PageLocked here.
> > +*/
> > +   if (likely(!PageLocked(page) && get_page_unless_zero(page))) {
> 
> But the page can get locked after this point.
> 
> Would it not be better to do a trylock_page() and unlock the page on
> exit after the isolation completes?
> 
Far better, for sure! thanks (again)


> > @@ -78,7 +78,10 @@ void putback_lru_pages(struct list_head *l)
> > list_del(&page->lru);
> > dec_zone_page_state(page, NR_ISOLATED_ANON +
> > page_is_file_cache(page));
> > -   putback_lru_page(page);
> > +   if (unlikely(PageBalloon(page)))
> > +   VM_BUG_ON(!putback_balloon_page(page));
> 
> Why not BUG_ON?
> 
> What shocked me actually is that VM_BUG_ON code is executed on
> !CONFIG_DEBUG_VM builds and has been since 2.6.36 due to commit [4e60c86bd:
> gcc-4.6: mm: fix unused but set warnings]. I thought the whole point of
> VM_BUG_ON was to avoid expensive and usually unnecessary checks. Andi,
> was this deliberate?
> 
> Either way, you always want to call putback_ballon_page() so BUG_ON is
> more appropriate although gracefully recovering from the situation and a
> WARN would be better.
> 
Shame on me!
 I was lazy enough to not carefully read VM_BUG_ON's definition and get its
original purpose. Will change it, for sure.


Once more, thank you!

Re: [PATCH] Add a page cache-backed balloon device driver.

2012-06-26 Thread Michael S. Tsirkin
On Tue, Jun 26, 2012 at 02:31:26PM -0700, Frank Swiderski wrote:
> On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel  wrote:
> > On 06/26/2012 04:32 PM, Frank Swiderski wrote:
> >>
> >> This implementation of a virtio balloon driver uses the page cache to
> >> "store" pages that have been released to the host.  The communication
> >> (outside of target counts) is one way--the guest notifies the host when
> >> it adds a page to the page cache, allowing the host to madvise(2) with
> >> MADV_DONTNEED.  Reclaim in the guest is therefore automatic and implicit
> >> (via the regular page reclaim).  This means that inflating the balloon
> >> is similar to the existing balloon mechanism, but the deflate is
> >> different--it re-uses existing Linux kernel functionality to
> >> automatically reclaim.
> >>
> >> Signed-off-by: Frank Swiderski
> >
> >
> > It is a great idea, but how can this memory balancing
> > possibly work if someone uses memory cgroups inside a
> > guest?
> 
> Thanks and good point--this isn't something that I considered in the
> implementation.
> 
> > Having said that, we currently do not have proper
> > memory reclaim balancing between cgroups at all, so
> > requiring that of this balloon driver would be
> > unreasonable.
> >
> > The code looks good to me, my only worry is the
> > code duplication. We now have 5 balloon drivers,
> > for 4 hypervisors, all implementing everything
> > from scratch...
> 
> Do you have any recommendations on this?  I could (I think reasonably
> so) modify the existing virtio_balloon.c and have it change behavior
> based on a feature bit or other configuration.  I'm not sure that
> really addresses the root of what you're pointing out--it's still
> adding a different implementation, but doing so as an extension of an
> existing one.
> 
> fes

Let's assume it's a feature bit: how would you
formulate what the feature does *from host point of view*?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] Add a page cache-backed balloon device driver.

2012-06-26 Thread Rik van Riel

On 06/26/2012 05:31 PM, Frank Swiderski wrote:

On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel  wrote:



The code looks good to me, my only worry is the
code duplication. We now have 5 balloon drivers,
for 4 hypervisors, all implementing everything
from scratch...


Do you have any recommendations on this?  I could (I think reasonably
so) modify the existing virtio_balloon.c and have it change behavior
based on a feature bit or other configuration.  I'm not sure that
really addresses the root of what you're pointing out--it's still
adding a different implementation, but doing so as an extension of an
existing one.


Ideally, I believe we would have two balloon
top parts in a guest (one classical balloon,
one on the LRU), and four bottom parts (kvm,
xen, vmware & s390).

That way the virt specific bits of a balloon
driver would be essentially a ->balloon_page
and ->release_page callback for pages, as well
as methods to communicate with the host.

All the management of pages, including stuff
like putting them on the LRU, or isolating
them for migration, would be done with the
same common code, regardless of what virt
software we are running on.

Of course, that is a substantial amount of
work and I feel it would be unreasonable to
block anyone's code on that kind of thing
(especially considering that your code is good),
but I do believe the explosion of balloon
code is a little worrying.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] Add a page cache-backed balloon device driver.

2012-06-26 Thread Michael S. Tsirkin
On Tue, Jun 26, 2012 at 01:32:58PM -0700, Frank Swiderski wrote:
> This implementation of a virtio balloon driver uses the page cache to
> "store" pages that have been released to the host.  The communication
> (outside of target counts) is one way--the guest notifies the host when
> it adds a page to the page cache, allowing the host to madvise(2) with
> MADV_DONTNEED.  Reclaim in the guest is therefore automatic and implicit
> (via the regular page reclaim).  This means that inflating the balloon
> is similar to the existing balloon mechanism, but the deflate is
> different--it re-uses existing Linux kernel functionality to
> automatically reclaim.
> 
> Signed-off-by: Frank Swiderski 

I'm pondering this:

Should it really be a separate driver/device ID?
If it behaves the same from host POV, maybe it
should be up to the guest how to inflate/deflate
the balloon internally?

> ---
>  drivers/virtio/Kconfig  |   13 +
>  drivers/virtio/Makefile |1 +
>  drivers/virtio/virtio_fileballoon.c |  636 
> +++
>  include/linux/virtio_balloon.h  |9 +
>  include/linux/virtio_ids.h  |1 +
>  5 files changed, 660 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/virtio/virtio_fileballoon.c
> 
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index f38b17a..cffa2a7 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -35,6 +35,19 @@ config VIRTIO_BALLOON
>  
>If unsure, say M.
>  
> +config VIRTIO_FILEBALLOON
> + tristate "Virtio page cache-backed balloon driver"
> + select VIRTIO
> + select VIRTIO_RING
> + ---help---
> +  This driver supports decreasing and automatically reclaiming the
> +  memory within a guest VM.  Unlike VIRTIO_BALLOON, this driver instead
> +  tries to maintain a specific target balloon size using the page cache.
> +  This allows the guest to implicitly deflate the balloon by flushing
> +  pages from the cache and touching the page.
> +
> +  If unsure, say N.
> +
>   config VIRTIO_MMIO
>   tristate "Platform bus driver for memory mapped virtio devices 
> (EXPERIMENTAL)"
>   depends on HAS_IOMEM && EXPERIMENTAL
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 5a4c63c..7ca0a3f 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -3,3 +3,4 @@ obj-$(CONFIG_VIRTIO_RING) += virtio_ring.o
>  obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
>  obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> +obj-$(CONFIG_VIRTIO_FILEBALLOON) += virtio_fileballoon.o
> diff --git a/drivers/virtio/virtio_fileballoon.c 
> b/drivers/virtio/virtio_fileballoon.c
> new file mode 100644
> index 000..ff252ec
> --- /dev/null
> +++ b/drivers/virtio/virtio_fileballoon.c
> @@ -0,0 +1,636 @@
> +/* Virtio file (page cache-backed) balloon implementation, inspired by
> + * Dor Loar and Marcelo Tosatti's implementations, and based on Rusty 
> Russel's
> + * implementation.
> + *
> + * This implementation of the virtio balloon driver re-uses the page cache to
> + * allow memory consumed by inflating the balloon to be reclaimed by linux.  
> It
> + * creates and mounts a bare-bones filesystem containing a single inode.  
> When
> + * the host requests the balloon to inflate, it does so by "reading" pages at
> + * offsets into the inode mapping's page_tree.  The host is notified when the
> + * pages are added to the page_tree, allowing it (the host) to madvise(2) the
> + * corresponding host memory, reducing the RSS of the virtual machine.  In 
> this
> + * implementation, the host is only notified when a page is added to the
> + * balloon.  Reclaim happens under the existing TTFP logic, which flushes 
> unused
> + * pages in the page cache.  If the host used MADV_DONTNEED, then when the 
> guest
> + * uses the page, the zero page will be mapped in, allowing automatic (and 
> fast,
> + * compared to requiring a host notification via a virtio queue to get memory
> + * back) reclaim.
> + *
> + *  Copyright 2008 Rusty Russell IBM Corporation
> + *  Copyright 2011 Frank Swiderski Google Inc
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; either version 2 of the License, or
> + *  (at your option) any later version.
> + *
> + *  This program is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *  GNU General Public License for more details.
> + *
> + *  You should have received a copy of the GNU General Public License
> + *  along with this program; if not, write to the Free Software
> + *  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  
> USA
> + */
> +#include 
> +#include 
> 

Re: [PATCH] Add a page cache-backed balloon device driver.

2012-06-26 Thread Frank Swiderski
On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel  wrote:
> On 06/26/2012 04:32 PM, Frank Swiderski wrote:
>>
>> This implementation of a virtio balloon driver uses the page cache to
>> "store" pages that have been released to the host.  The communication
>> (outside of target counts) is one way--the guest notifies the host when
>> it adds a page to the page cache, allowing the host to madvise(2) with
>> MADV_DONTNEED.  Reclaim in the guest is therefore automatic and implicit
>> (via the regular page reclaim).  This means that inflating the balloon
>> is similar to the existing balloon mechanism, but the deflate is
>> different--it re-uses existing Linux kernel functionality to
>> automatically reclaim.
>>
>> Signed-off-by: Frank Swiderski
>
>
> It is a great idea, but how can this memory balancing
> possibly work if someone uses memory cgroups inside a
> guest?

Thanks and good point--this isn't something that I considered in the
implementation.

> Having said that, we currently do not have proper
> memory reclaim balancing between cgroups at all, so
> requiring that of this balloon driver would be
> unreasonable.
>
> The code looks good to me, my only worry is the
> code duplication. We now have 5 balloon drivers,
> for 4 hypervisors, all implementing everything
> from scratch...

Do you have any recommendations on this?  I could (I think reasonably
so) modify the existing virtio_balloon.c and have it change behavior
based on a feature bit or other configuration.  I'm not sure that
really addresses the root of what you're pointing out--it's still
adding a different implementation, but doing so as an extension of an
existing one.

fes
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH] Add a page cache-backed balloon device driver.

2012-06-26 Thread Frank Swiderski
This implementation of a virtio balloon driver uses the page cache to
"store" pages that have been released to the host.  The communication
(outside of target counts) is one way--the guest notifies the host when
it adds a page to the page cache, allowing the host to madvise(2) with
MADV_DONTNEED.  Reclaim in the guest is therefore automatic and implicit
(via the regular page reclaim).  This means that inflating the balloon
is similar to the existing balloon mechanism, but the deflate is
different--it re-uses existing Linux kernel functionality to
automatically reclaim.

Signed-off-by: Frank Swiderski 
---
 drivers/virtio/Kconfig  |   13 +
 drivers/virtio/Makefile |1 +
 drivers/virtio/virtio_fileballoon.c |  636 +++
 include/linux/virtio_balloon.h  |9 +
 include/linux/virtio_ids.h  |1 +
 5 files changed, 660 insertions(+), 0 deletions(-)
 create mode 100644 drivers/virtio/virtio_fileballoon.c

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index f38b17a..cffa2a7 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -35,6 +35,19 @@ config VIRTIO_BALLOON
 
 If unsure, say M.
 
+config VIRTIO_FILEBALLOON
+   tristate "Virtio page cache-backed balloon driver"
+   select VIRTIO
+   select VIRTIO_RING
+   ---help---
+This driver supports decreasing and automatically reclaiming the
+memory within a guest VM.  Unlike VIRTIO_BALLOON, this driver instead
+tries to maintain a specific target balloon size using the page cache.
+This allows the guest to implicitly deflate the balloon by flushing
+pages from the cache and touching the page.
+
+If unsure, say N.
+
  config VIRTIO_MMIO
tristate "Platform bus driver for memory mapped virtio devices 
(EXPERIMENTAL)"
depends on HAS_IOMEM && EXPERIMENTAL
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 5a4c63c..7ca0a3f 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -3,3 +3,4 @@ obj-$(CONFIG_VIRTIO_RING) += virtio_ring.o
 obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
+obj-$(CONFIG_VIRTIO_FILEBALLOON) += virtio_fileballoon.o
diff --git a/drivers/virtio/virtio_fileballoon.c 
b/drivers/virtio/virtio_fileballoon.c
new file mode 100644
index 000..ff252ec
--- /dev/null
+++ b/drivers/virtio/virtio_fileballoon.c
@@ -0,0 +1,636 @@
+/* Virtio file (page cache-backed) balloon implementation, inspired by
+ * Dor Loar and Marcelo Tosatti's implementations, and based on Rusty Russel's
+ * implementation.
+ *
+ * This implementation of the virtio balloon driver re-uses the page cache to
+ * allow memory consumed by inflating the balloon to be reclaimed by linux.  It
+ * creates and mounts a bare-bones filesystem containing a single inode.  When
+ * the host requests the balloon to inflate, it does so by "reading" pages at
+ * offsets into the inode mapping's page_tree.  The host is notified when the
+ * pages are added to the page_tree, allowing it (the host) to madvise(2) the
+ * corresponding host memory, reducing the RSS of the virtual machine.  In this
+ * implementation, the host is only notified when a page is added to the
+ * balloon.  Reclaim happens under the existing TTFP logic, which flushes 
unused
+ * pages in the page cache.  If the host used MADV_DONTNEED, then when the 
guest
+ * uses the page, the zero page will be mapped in, allowing automatic (and 
fast,
+ * compared to requiring a host notification via a virtio queue to get memory
+ * back) reclaim.
+ *
+ *  Copyright 2008 Rusty Russell IBM Corporation
+ *  Copyright 2011 Frank Swiderski Google Inc
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define VIRTBALLOON_PFN_ARRAY_SIZE 256
+
+struct virtio_balloon {
+   struct virtio_device *vdev;
+   struct virtqueue *inflate_vq;
+
+   /* Where the ballooning thread waits for config to change. */
+   wait_queue_head_t config_change;
+
+   /* The thread servicing the balloon. */
+   

Re: [PATCH] Add a page cache-backed balloon device driver.

2012-06-26 Thread Rik van Riel

On 06/26/2012 04:32 PM, Frank Swiderski wrote:

This implementation of a virtio balloon driver uses the page cache to
"store" pages that have been released to the host.  The communication
(outside of target counts) is one way--the guest notifies the host when
it adds a page to the page cache, allowing the host to madvise(2) with
MADV_DONTNEED.  Reclaim in the guest is therefore automatic and implicit
(via the regular page reclaim).  This means that inflating the balloon
is similar to the existing balloon mechanism, but the deflate is
different--it re-uses existing Linux kernel functionality to
automatically reclaim.

Signed-off-by: Frank Swiderski


It is a great idea, but how can this memory balancing
possibly work if someone uses memory cgroups inside a
guest?

Having said that, we currently do not have proper
memory reclaim balancing between cgroups at all, so
requiring that of this balloon driver would be
unreasonable.

The code looks good to me, my only worry is the
code duplication. We now have 5 balloon drivers,
for 4 hypervisors, all implementing everything
from scratch...
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages

2012-06-26 Thread Andi Kleen
> How is the compiler meant to optimise away "cond" if it's a function
> call?

Inlines can be optimized away.  These tests are usually inlines.

> What did I miss? If nothing, then I will revert this particular change
> and Rafael will need to be sure his patch is not using VM_BUG_ON to call
> a function with side-effects.

Do you have an example where the code is actually different,
or are you just speculating?

FWIW for my config both generates the same code:

size vmlinux-andi-vmbug vmlinux-vmbug-nothing 
   textdata bss dec hex filename
118097041457352 1159168 14426224 dc2070 vmlinux-andi-vmbug
118097041457352 1159168 14426224 dc2070 vmlinux-vmbug-nothing

-Andi
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages

2012-06-26 Thread Mel Gorman
On Tue, Jun 26, 2012 at 06:52:58PM +0200, Andi Kleen wrote:
> > 
> > What shocked me actually is that VM_BUG_ON code is executed on
> > !CONFIG_DEBUG_VM builds and has been since 2.6.36 due to commit [4e60c86bd:
> > gcc-4.6: mm: fix unused but set warnings]. I thought the whole point of
> > VM_BUG_ON was to avoid expensive and usually unnecessary checks. Andi,
> > was this deliberate?
> 
> The idea was that the compiler optimizes it away anyways.
> 
> I'm not fully sure what putback_balloon_page does, but if it just tests
> a bit (without non variable test_bit) it should be ok.
> 

This was the definition before

#ifdef CONFIG_DEBUG_VM
#define VM_BUG_ON(cond) BUG_ON(cond)
#else
#define VM_BUG_ON(cond) do { } while (0)
#endif

and now it's

#ifdef CONFIG_DEBUG_VM
#define VM_BUG_ON(cond) BUG_ON(cond)
#else
#define VM_BUG_ON(cond) do { (void)(cond); } while (0)
#endif

How is the compiler meant to optimise away "cond" if it's a function
call?

In the old definition VM_BUG_ON did nothing and the intention was that the
"cond" should never had any side-effects. It was to be used for potentially
expensive tests to catch additional issues in DEBUG_VM kernels. My concern
is that after commit 4e60c86bd that the VM doing these additional checks
unnecesarily with a performance hit. In most cases the checks are small
but in others such as free_pages we are calling virt_addr_valid() which
is heavier.

What did I miss? If nothing, then I will revert this particular change
and Rafael will need to be sure his patch is not using VM_BUG_ON to call
a function with side-effects.

-- 
Mel Gorman
SUSE Labs
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages

2012-06-26 Thread Andi Kleen
> I'm not fully sure what putback_balloon_page does, but if it just tests
> a bit (without non variable test_bit) it should be ok.
 ^

should be "without variable ..."

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages

2012-06-26 Thread Andi Kleen
> 
> What shocked me actually is that VM_BUG_ON code is executed on
> !CONFIG_DEBUG_VM builds and has been since 2.6.36 due to commit [4e60c86bd:
> gcc-4.6: mm: fix unused but set warnings]. I thought the whole point of
> VM_BUG_ON was to avoid expensive and usually unnecessary checks. Andi,
> was this deliberate?

The idea was that the compiler optimizes it away anyways.

I'm not fully sure what putback_balloon_page does, but if it just tests
a bit (without non variable test_bit) it should be ok.

-Andi
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages

2012-06-26 Thread Mel Gorman
On Tue, Jun 26, 2012 at 09:17:43AM -0400, Rik van Riel wrote:
> On 06/25/2012 07:25 PM, Rafael Aquini wrote:
> >This patch introduces helper functions that teach compaction and migration 
> >bits
> >how to cope with pages which are part of a guest memory balloon, in order to
> >make them movable by memory compaction procedures.
> >
> >Signed-off-by: Rafael Aquini
> 
> The function fill_balloon in drivers/virtio/virtio_balloon.c
> should probably add __GFP_MOVABLE to the gfp mask for alloc_pages,
> to keep the pageblock where balloon pages are allocated marked
> as movable.
> 

You're right, but patch 3 is already doing that at the same time the
migration primitives are introduced. It does mean the full series has to
be applied to do anything but I think it's bisect safe.

-- 
Mel Gorman
SUSE Labs
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages

2012-06-26 Thread Rik van Riel

On 06/25/2012 07:25 PM, Rafael Aquini wrote:

This patch introduces helper functions that teach compaction and migration bits
how to cope with pages which are part of a guest memory balloon, in order to
make them movable by memory compaction procedures.

Signed-off-by: Rafael Aquini


The function fill_balloon in drivers/virtio/virtio_balloon.c
should probably add __GFP_MOVABLE to the gfp mask for alloc_pages,
to keep the pageblock where balloon pages are allocated marked
as movable.

--
All rights reversed
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages

2012-06-26 Thread Mel Gorman

(apologies if there are excessive typos. I damaged my left hand and
typing is painful).

Adding Andi to cc for question on VM_BUG_ON.

On Mon, Jun 25, 2012 at 08:25:56PM -0300, Rafael Aquini wrote:
> This patch introduces helper functions that teach compaction and migration 
> bits
> how to cope with pages which are part of a guest memory balloon, in order to
> make them movable by memory compaction procedures.
> 
> Signed-off-by: Rafael Aquini 
> ---
>  include/linux/mm.h |   17 +
>  mm/compaction.c|   72 
> 
>  mm/migrate.c   |   30 +-
>  3 files changed, 118 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b36d08c..360656e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1629,5 +1629,22 @@ static inline unsigned int 
> debug_guardpage_minorder(void) { return 0; }
>  static inline bool page_is_guard(struct page *page) { return false; }
>  #endif /* CONFIG_DEBUG_PAGEALLOC */
>  
> +#if (defined(CONFIG_VIRTIO_BALLOON) || \
> + defined(CONFIG_VIRTIO_BALLOON_MODULE)) && defined(CONFIG_COMPACTION)
> +extern int is_balloon_page(struct page *);
> +extern int isolate_balloon_page(struct page *);
> +extern int putback_balloon_page(struct page *);
> +
> +/* return 1 if page is part of a guest's memory balloon, 0 otherwise */
> +static inline int PageBalloon(struct page *page)
> +{
> + return is_balloon_page(page);
> +}

bool

Why is there both is_balloon_page and PageBalloon? 

is_ballon_page is so simple it should just be a static inline here

extern struct address_space *balloon_mapping;
static inline bool is_balloon_page(page)
{
return page->mapping == balloon_mapping;
}



> +#else
> +static inline int PageBalloon(struct page *page) { return 0; }
> +static inline int isolate_balloon_page(struct page *page){ return 0; }
> +static inline int putback_balloon_page(struct page *page){ return 0; }
> +#endif /* (VIRTIO_BALLOON || VIRTIO_BALLOON_MODULE) && COMPACTION */
> +
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_MM_H */
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 7ea259d..8835d55 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "internal.h"
>  
>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> @@ -312,6 +313,14 @@ isolate_migratepages_range(struct zone *zone, struct 
> compact_control *cc,
>   continue;
>   }
>  
> + /*
> +  * For ballooned pages, we need to isolate them before testing
> +  * for PageLRU, as well as skip the LRU page isolation steps.
> +  */

This says what, but not why.

I didn't check the exact mechanics of a balloon page but I expect it's that
balloon pages are not on the LRU. If they are on the LRU, that's pretty dumb.


/*
 * Balloon pages can be migrated but are not on the LRU. Isolate
 * them before LRU checks.
 */


It would be nicer to do this without gotos

/*
 * It is possible to migrate LRU pages and balloon pages. Skip
 * any other type of page
 */
if (is_balloon_page(page)) {
if (!isolate_balloon_page(page))
continue;
} else if (PageLRU(page)) {

}

You will need to shuffle things around a little to make it work properly
but if we handle other page types in the future it will be neater
overall.


> + if (PageBalloon(page))
> + if (isolate_balloon_page(page))
> + goto isolated_balloon_page;
> +
>   if (!PageLRU(page))
>   continue;
>  
> @@ -338,6 +347,7 @@ isolate_migratepages_range(struct zone *zone, struct 
> compact_control *cc,
>  
>   /* Successfully isolated */
>   del_page_from_lru_list(page, lruvec, page_lru(page));
> +isolated_balloon_page:
>   list_add(&page->lru, migratelist);
>   cc->nr_migratepages++;
>   nr_isolated++;
> @@ -903,4 +913,66 @@ void compaction_unregister_node(struct node *node)
>  }
>  #endif /* CONFIG_SYSFS && CONFIG_NUMA */
>  
> +#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
> +/*
> + * Balloon pages special page->mapping.
> + * users must properly allocate and initiliaze an instance of 
> balloon_mapping,
> + * and set it as the page->mapping for balloon enlisted page instances.
> + *
> + * address_space_operations necessary methods for ballooned pages:
> + *   .migratepage- used to perform balloon's page migration (as is)
> + *   .invalidatepage - used to isolate a page from balloon's page list
> + *   .freepage   - used to reinsert an isolated page to balloon's page 
> list
> + */
> +struct address_space *balloon_mapping;
> +EXPORT_SYMBOL(balloon_mapping);
> +

EXPORT_SYMBOL_GPL?

I don't mind how it is exported as such. I'm idly cur

[rfc] virtio-spec: introduce VIRTIO_NET_F_MULTIQUEUE

2012-06-26 Thread Jason Wang
This patch introduces the multiqueue capabilities to virtio net devices. The
number of tx/rx queue pairs available in the device were exposed through config
space, and driver could negotiate the number of pairs it wish to use through
ctrl vq.

Signed-off-by: Jason Wang 
---
 virtio-0.9.5.lyx |  180 --
 1 file changed, 176 insertions(+), 4 deletions(-)

diff --git a/virtio-0.9.5.lyx b/virtio-0.9.5.lyx
index 3c80ecf..480e9c7 100644
--- a/virtio-0.9.5.lyx
+++ b/virtio-0.9.5.lyx
@@ -56,6 +56,7 @@
 \html_math_output 0
 \html_css_as_file 0
 \html_be_strict false
+\author 2090695081 "Jason Wang" 
 \end_header
 
 \begin_body
@@ -3854,11 +3855,22 @@ ID 1
 \end_layout
 
 \begin_layout Description
-Virtqueues 0:receiveq.
+Virtqueues 
+\change_inserted 2090695081 1340693104
+
+\end_layout
+
+\begin_deeper
+\begin_layout Description
+
+\change_inserted 2090695081 1340693118
+When VIRTIO_NET_F_MULTIQUEUE is not set: 
+\change_unchanged
+0:receiveq.
  1:transmitq.
  2:controlq
 \begin_inset Foot
-status open
+status collapsed
 
 \begin_layout Plain Layout
 Only if VIRTIO_NET_F_CTRL_VQ set
@@ -3867,9 +3879,60 @@ Only if VIRTIO_NET_F_CTRL_VQ set
 \end_inset
 
 
+\change_inserted 2090695081 1340693122
+
 \end_layout
 
 \begin_layout Description
+
+\change_inserted 2090695081 1340693866
+When VIRTIO_NET_F_MULTIQUEUE is set and there's N tx/rx queue pairs: 
0:receiveq1.
+ 1:transmitq1.
+ 2:controlq
+\begin_inset Foot
+status collapsed
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1340693141
+Only if VIRTIO_NET_F_CTRL_VQ set
+\end_layout
+
+\end_inset
+
+ ...
+ 2N-1
+\begin_inset Foot
+status collapsed
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1340693284
+2N-2 If VIRTIO_NET_F_CTRL_VQ not set
+\end_layout
+
+\end_inset
+
+:receiveqN.
+ 2N
+\begin_inset Foot
+status collapsed
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1340693302
+2N-1 If VIRTIO_NET_F_CTRL_VQ is not set
+\end_layout
+
+\end_inset
+
+: transmitqN
+\change_unchanged
+
+\end_layout
+
+\end_deeper
+\begin_layout Description
 Feature
 \begin_inset space ~
 \end_inset
@@ -4027,6 +4090,16 @@ VIRTIO_NET_F_CTRL_VLAN
 
 \begin_layout Description
 VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous packets.
+\change_inserted 2090695081 1340692965
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 2090695081 1340693017
+VIRTIO_NET_F_MULTIQUEUE (22) Device has multiple tx/rx queues.
+\change_unchanged
+
 \end_layout
 
 \end_deeper
@@ -4039,11 +4112,22 @@ configuration
 \begin_inset space ~
 \end_inset
 
-layout Two configuration fields are currently defined.
+layout T
+\change_inserted 2090695081 1340693345
+hree
+\change_deleted 2090695081 1340693344
+wo
+\change_unchanged
+ configuration fields are currently defined.
  The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC
  is set), and the status field only exists if VIRTIO_NET_F_STATUS is set.
  Two read-only bits are currently defined for the status field: 
VIRTIO_NET_S_LIN
 K_UP and VIRTIO_NET_S_ANNOUNCE.
+
+\change_inserted 2090695081 1340693398
+ The num queue pairs fields only exist if VIRTIO_NET_F_MULTIQUEUE is set.
+
+\change_unchanged
  
 \begin_inset listings
 inline false
@@ -4076,6 +4160,17 @@ struct virtio_net_config {
 \begin_layout Plain Layout
 
 u16 status;
+\change_inserted 2090695081 1340692955
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1340692962
+
+   u16 num_queue_pairs;
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -4527,7 +4622,7 @@ O features are used, the Guest will need to accept 
packets of up to 65550
  So unless VIRTIO_NET_F_MRG_RXBUF is negotiated, every buffer in the receive
  queue needs to be at least this length 
 \begin_inset Foot
-status open
+status collapsed
 
 \begin_layout Plain Layout
 Obviously each one can be split across multiple descriptor elements.
@@ -4980,6 +5075,83 @@ Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through 
control vq.
 \begin_layout Enumerate
 .
  
+\change_inserted 2090695081 1340693446
+
+\end_layout
+
+\begin_layout Subsection*
+
+\change_inserted 2090695081 1340693500
+Negotiating the number of queue pairs
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 2090695081 1340693733
+If the driver negotiates the VIRTIO_NET_F_MULTIQUEUE (depends on 
VIRTIO_NET_F_CT
+RL_VQ), it can then negotiate the number of queue pairs it wish to use by
+ placing the number in num_queue_pairs field of virtio_net_ctrl_multiqueue
+ through VIRTIO_NET_CTRL_MULTIQUEUE_NUM command.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 2090695081 1340693782
+If the driver doesn't negotiate the number, all tx/rx queues were enabled
+ by default.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 2090695081 1340693616
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1340693620
+
+struct virtio_net