Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
> +#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE) > +/* > + * Balloon pages special page->mapping. > + * users must properly allocate and initiliaze an instance of > balloon_mapping, initialize > + * and set it as the page->mapping for balloon enlisted page instances. > + * > + * address_space_operations necessary methods for ballooned pages: > + * .migratepage- used to perform balloon's page migration (as is) > + * .invalidatepage - used to isolate a page from balloon's page list > + * .freepage - used to reinsert an isolated page to balloon's page > list > + */ > +struct address_space *balloon_mapping; > +EXPORT_SYMBOL(balloon_mapping); Why don't you call this kvm_balloon_mapping - and when other balloon drivers use it, then change it to something more generic. Also at that future point the other balloon drivers might do it a bit differently so it might be that will be reworked completly. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] Add a page cache-backed balloon device driver.
On Tue, Jun 26, 2012 at 2:45 PM, Rik van Riel wrote: > On 06/26/2012 05:31 PM, Frank Swiderski wrote: >> >> On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel wrote: > > >>> The code looks good to me, my only worry is the >>> code duplication. We now have 5 balloon drivers, >>> for 4 hypervisors, all implementing everything >>> from scratch... >> >> >> Do you have any recommendations on this? I could (I think reasonably >> so) modify the existing virtio_balloon.c and have it change behavior >> based on a feature bit or other configuration. I'm not sure that >> really addresses the root of what you're pointing out--it's still >> adding a different implementation, but doing so as an extension of an >> existing one. > > > Ideally, I believe we would have two balloon > top parts in a guest (one classical balloon, > one on the LRU), and four bottom parts (kvm, > xen, vmware & s390). > > That way the virt specific bits of a balloon > driver would be essentially a ->balloon_page > and ->release_page callback for pages, as well > as methods to communicate with the host. > > All the management of pages, including stuff > like putting them on the LRU, or isolating > them for migration, would be done with the > same common code, regardless of what virt > software we are running on. > > Of course, that is a substantial amount of > work and I feel it would be unreasonable to > block anyone's code on that kind of thing > (especially considering that your code is good), > but I do believe the explosion of balloon > code is a little worrying. > Hm, that makes a lot of sense. That would be a few patches definitely worth doing, IMHO. I'm not entirely sure how I feel about inflating the balloon drivers in the meantime. Sigh, and I didn't even mean that as a pun. fes ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] Add a page cache-backed balloon device driver.
On Tue, Jun 26, 2012 at 2:47 PM, Michael S. Tsirkin wrote: > On Tue, Jun 26, 2012 at 02:31:26PM -0700, Frank Swiderski wrote: >> On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel wrote: >> > On 06/26/2012 04:32 PM, Frank Swiderski wrote: >> >> >> >> This implementation of a virtio balloon driver uses the page cache to >> >> "store" pages that have been released to the host. The communication >> >> (outside of target counts) is one way--the guest notifies the host when >> >> it adds a page to the page cache, allowing the host to madvise(2) with >> >> MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit >> >> (via the regular page reclaim). This means that inflating the balloon >> >> is similar to the existing balloon mechanism, but the deflate is >> >> different--it re-uses existing Linux kernel functionality to >> >> automatically reclaim. >> >> >> >> Signed-off-by: Frank Swiderski >> > >> > >> > It is a great idea, but how can this memory balancing >> > possibly work if someone uses memory cgroups inside a >> > guest? >> >> Thanks and good point--this isn't something that I considered in the >> implementation. >> >> > Having said that, we currently do not have proper >> > memory reclaim balancing between cgroups at all, so >> > requiring that of this balloon driver would be >> > unreasonable. >> > >> > The code looks good to me, my only worry is the >> > code duplication. We now have 5 balloon drivers, >> > for 4 hypervisors, all implementing everything >> > from scratch... >> >> Do you have any recommendations on this? I could (I think reasonably >> so) modify the existing virtio_balloon.c and have it change behavior >> based on a feature bit or other configuration. I'm not sure that >> really addresses the root of what you're pointing out--it's still >> adding a different implementation, but doing so as an extension of an >> existing one. >> >> fes > > Let's assume it's a feature bit: how would you > formulate what the feature does *from host point of view*? > > -- > MST In this implementation, the host doesn't keep track of pages in the balloon, as there is no explicit deflate path. The host device for this implementation should merely, for example, MADV_DONTNEED on the pages sent in an inflate. Thus, the inflate becomes a notification that the guest doesn't need those pages mapped in, but that they should be available if the guest touches them. In that sense, it's not a rigid shrink of guest memory. I'm not sure what I'd call the feature bit though. Was that the question you were asking, or did I misread? fes ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
Mel, First and foremost, thank you for taking the time to review these bits and provide such valuable feedback. On Tue, Jun 26, 2012 at 11:17:29AM +0100, Mel Gorman wrote: > > +/* return 1 if page is part of a guest's memory balloon, 0 otherwise */ > > +static inline int PageBalloon(struct page *page) > > +{ > > + return is_balloon_page(page); > > +} > > bool > > Why is there both is_balloon_page and PageBalloon? > > is_ballon_page is so simple it should just be a static inline here > > extern struct address_space *balloon_mapping; > static inline bool is_balloon_page(page) > { > return page->mapping == balloon_mapping; > } > I was thinking about sustain the same syntax other page tests utilize, but I rather stick to your suggestion on this one. > > #if defined CONFIG_COMPACTION || defined CONFIG_CMA > > @@ -312,6 +313,14 @@ isolate_migratepages_range(struct zone *zone, struct > > compact_control *cc, > > continue; > > } > > > > + /* > > +* For ballooned pages, we need to isolate them before testing > > +* for PageLRU, as well as skip the LRU page isolation steps. > > +*/ > > This says what, but not why. > > I didn't check the exact mechanics of a balloon page but I expect it's that > balloon pages are not on the LRU. If they are on the LRU, that's pretty dumb. > > > /* > * Balloon pages can be migrated but are not on the LRU. Isolate > * them before LRU checks. > */ > > > It would be nicer to do this without gotos > > /* > * It is possible to migrate LRU pages and balloon pages. Skip > * any other type of page > */ > if (is_balloon_page(page)) { > if (!isolate_balloon_page(page)) > continue; > } else if (PageLRU(page)) { > > } > > You will need to shuffle things around a little to make it work properly > but if we handle other page types in the future it will be neater > overall. > I'm glad you've put things this way on this one. Despite I was thinking on doing it the way you suggested, I took the goto approach because I was afraid of doing otherwise could be considered as an unnecessary radical surgery on established code. Will do it, certainly. > > +struct address_space *balloon_mapping; > > +EXPORT_SYMBOL(balloon_mapping); > > + > > EXPORT_SYMBOL_GPL? > > I don't mind how it is exported as such. I'm idly curious if there are > external closed modules that use the driver. > To be honest with you, that was picked with no particular case in mind. And, since you've raised this question, I'm also curious. However, after giving a thought on your feedback, I believe EXPORT_SYMBOL_GPL suits far better. > > +/* ballooned page id check */ > > +int is_balloon_page(struct page *page) > > +{ > > + struct address_space *mapping = page->mapping; > > + if (mapping == balloon_mapping) > > + return 1; > > + return 0; > > +} > > + > > +/* __isolate_lru_page() counterpart for a ballooned page */ > > +int isolate_balloon_page(struct page *page) > > +{ > > + struct address_space *mapping = page->mapping; > > This is a publicly visible function and while your current usage looks > correct it would not hurt to do something like this; > > if (WARN_ON(!is_page_ballon(page)) > return 0; > Excellent point! > > + if (mapping->a_ops->invalidatepage) { > > + /* > > +* We can race against move_to_new_page() and stumble across a > > +* locked 'newpage'. If we succeed on isolating it, the result > > +* tends to be disastrous. So, we sanely skip PageLocked here. > > +*/ > > + if (likely(!PageLocked(page) && get_page_unless_zero(page))) { > > But the page can get locked after this point. > > Would it not be better to do a trylock_page() and unlock the page on > exit after the isolation completes? > Far better, for sure! thanks (again) > > @@ -78,7 +78,10 @@ void putback_lru_pages(struct list_head *l) > > list_del(&page->lru); > > dec_zone_page_state(page, NR_ISOLATED_ANON + > > page_is_file_cache(page)); > > - putback_lru_page(page); > > + if (unlikely(PageBalloon(page))) > > + VM_BUG_ON(!putback_balloon_page(page)); > > Why not BUG_ON? > > What shocked me actually is that VM_BUG_ON code is executed on > !CONFIG_DEBUG_VM builds and has been since 2.6.36 due to commit [4e60c86bd: > gcc-4.6: mm: fix unused but set warnings]. I thought the whole point of > VM_BUG_ON was to avoid expensive and usually unnecessary checks. Andi, > was this deliberate? > > Either way, you always want to call putback_ballon_page() so BUG_ON is > more appropriate although gracefully recovering from the situation and a > WARN would be better. > Shame on me! I was lazy enough to not carefully read VM_BUG_ON's definition and get its original purpose. Will change it, for sure. Once more, thank you!
Re: [PATCH] Add a page cache-backed balloon device driver.
On Tue, Jun 26, 2012 at 02:31:26PM -0700, Frank Swiderski wrote: > On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel wrote: > > On 06/26/2012 04:32 PM, Frank Swiderski wrote: > >> > >> This implementation of a virtio balloon driver uses the page cache to > >> "store" pages that have been released to the host. The communication > >> (outside of target counts) is one way--the guest notifies the host when > >> it adds a page to the page cache, allowing the host to madvise(2) with > >> MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit > >> (via the regular page reclaim). This means that inflating the balloon > >> is similar to the existing balloon mechanism, but the deflate is > >> different--it re-uses existing Linux kernel functionality to > >> automatically reclaim. > >> > >> Signed-off-by: Frank Swiderski > > > > > > It is a great idea, but how can this memory balancing > > possibly work if someone uses memory cgroups inside a > > guest? > > Thanks and good point--this isn't something that I considered in the > implementation. > > > Having said that, we currently do not have proper > > memory reclaim balancing between cgroups at all, so > > requiring that of this balloon driver would be > > unreasonable. > > > > The code looks good to me, my only worry is the > > code duplication. We now have 5 balloon drivers, > > for 4 hypervisors, all implementing everything > > from scratch... > > Do you have any recommendations on this? I could (I think reasonably > so) modify the existing virtio_balloon.c and have it change behavior > based on a feature bit or other configuration. I'm not sure that > really addresses the root of what you're pointing out--it's still > adding a different implementation, but doing so as an extension of an > existing one. > > fes Let's assume it's a feature bit: how would you formulate what the feature does *from host point of view*? -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] Add a page cache-backed balloon device driver.
On 06/26/2012 05:31 PM, Frank Swiderski wrote: On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel wrote: The code looks good to me, my only worry is the code duplication. We now have 5 balloon drivers, for 4 hypervisors, all implementing everything from scratch... Do you have any recommendations on this? I could (I think reasonably so) modify the existing virtio_balloon.c and have it change behavior based on a feature bit or other configuration. I'm not sure that really addresses the root of what you're pointing out--it's still adding a different implementation, but doing so as an extension of an existing one. Ideally, I believe we would have two balloon top parts in a guest (one classical balloon, one on the LRU), and four bottom parts (kvm, xen, vmware & s390). That way the virt specific bits of a balloon driver would be essentially a ->balloon_page and ->release_page callback for pages, as well as methods to communicate with the host. All the management of pages, including stuff like putting them on the LRU, or isolating them for migration, would be done with the same common code, regardless of what virt software we are running on. Of course, that is a substantial amount of work and I feel it would be unreasonable to block anyone's code on that kind of thing (especially considering that your code is good), but I do believe the explosion of balloon code is a little worrying. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] Add a page cache-backed balloon device driver.
On Tue, Jun 26, 2012 at 01:32:58PM -0700, Frank Swiderski wrote: > This implementation of a virtio balloon driver uses the page cache to > "store" pages that have been released to the host. The communication > (outside of target counts) is one way--the guest notifies the host when > it adds a page to the page cache, allowing the host to madvise(2) with > MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit > (via the regular page reclaim). This means that inflating the balloon > is similar to the existing balloon mechanism, but the deflate is > different--it re-uses existing Linux kernel functionality to > automatically reclaim. > > Signed-off-by: Frank Swiderski I'm pondering this: Should it really be a separate driver/device ID? If it behaves the same from host POV, maybe it should be up to the guest how to inflate/deflate the balloon internally? > --- > drivers/virtio/Kconfig | 13 + > drivers/virtio/Makefile |1 + > drivers/virtio/virtio_fileballoon.c | 636 > +++ > include/linux/virtio_balloon.h |9 + > include/linux/virtio_ids.h |1 + > 5 files changed, 660 insertions(+), 0 deletions(-) > create mode 100644 drivers/virtio/virtio_fileballoon.c > > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig > index f38b17a..cffa2a7 100644 > --- a/drivers/virtio/Kconfig > +++ b/drivers/virtio/Kconfig > @@ -35,6 +35,19 @@ config VIRTIO_BALLOON > >If unsure, say M. > > +config VIRTIO_FILEBALLOON > + tristate "Virtio page cache-backed balloon driver" > + select VIRTIO > + select VIRTIO_RING > + ---help--- > + This driver supports decreasing and automatically reclaiming the > + memory within a guest VM. Unlike VIRTIO_BALLOON, this driver instead > + tries to maintain a specific target balloon size using the page cache. > + This allows the guest to implicitly deflate the balloon by flushing > + pages from the cache and touching the page. > + > + If unsure, say N. > + > config VIRTIO_MMIO > tristate "Platform bus driver for memory mapped virtio devices > (EXPERIMENTAL)" > depends on HAS_IOMEM && EXPERIMENTAL > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile > index 5a4c63c..7ca0a3f 100644 > --- a/drivers/virtio/Makefile > +++ b/drivers/virtio/Makefile > @@ -3,3 +3,4 @@ obj-$(CONFIG_VIRTIO_RING) += virtio_ring.o > obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o > obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o > obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o > +obj-$(CONFIG_VIRTIO_FILEBALLOON) += virtio_fileballoon.o > diff --git a/drivers/virtio/virtio_fileballoon.c > b/drivers/virtio/virtio_fileballoon.c > new file mode 100644 > index 000..ff252ec > --- /dev/null > +++ b/drivers/virtio/virtio_fileballoon.c > @@ -0,0 +1,636 @@ > +/* Virtio file (page cache-backed) balloon implementation, inspired by > + * Dor Loar and Marcelo Tosatti's implementations, and based on Rusty > Russel's > + * implementation. > + * > + * This implementation of the virtio balloon driver re-uses the page cache to > + * allow memory consumed by inflating the balloon to be reclaimed by linux. > It > + * creates and mounts a bare-bones filesystem containing a single inode. > When > + * the host requests the balloon to inflate, it does so by "reading" pages at > + * offsets into the inode mapping's page_tree. The host is notified when the > + * pages are added to the page_tree, allowing it (the host) to madvise(2) the > + * corresponding host memory, reducing the RSS of the virtual machine. In > this > + * implementation, the host is only notified when a page is added to the > + * balloon. Reclaim happens under the existing TTFP logic, which flushes > unused > + * pages in the page cache. If the host used MADV_DONTNEED, then when the > guest > + * uses the page, the zero page will be mapped in, allowing automatic (and > fast, > + * compared to requiring a host notification via a virtio queue to get memory > + * back) reclaim. > + * > + * Copyright 2008 Rusty Russell IBM Corporation > + * Copyright 2011 Frank Swiderski Google Inc > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 > USA > + */ > +#include > +#include >
Re: [PATCH] Add a page cache-backed balloon device driver.
On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel wrote: > On 06/26/2012 04:32 PM, Frank Swiderski wrote: >> >> This implementation of a virtio balloon driver uses the page cache to >> "store" pages that have been released to the host. The communication >> (outside of target counts) is one way--the guest notifies the host when >> it adds a page to the page cache, allowing the host to madvise(2) with >> MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit >> (via the regular page reclaim). This means that inflating the balloon >> is similar to the existing balloon mechanism, but the deflate is >> different--it re-uses existing Linux kernel functionality to >> automatically reclaim. >> >> Signed-off-by: Frank Swiderski > > > It is a great idea, but how can this memory balancing > possibly work if someone uses memory cgroups inside a > guest? Thanks and good point--this isn't something that I considered in the implementation. > Having said that, we currently do not have proper > memory reclaim balancing between cgroups at all, so > requiring that of this balloon driver would be > unreasonable. > > The code looks good to me, my only worry is the > code duplication. We now have 5 balloon drivers, > for 4 hypervisors, all implementing everything > from scratch... Do you have any recommendations on this? I could (I think reasonably so) modify the existing virtio_balloon.c and have it change behavior based on a feature bit or other configuration. I'm not sure that really addresses the root of what you're pointing out--it's still adding a different implementation, but doing so as an extension of an existing one. fes ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH] Add a page cache-backed balloon device driver.
This implementation of a virtio balloon driver uses the page cache to "store" pages that have been released to the host. The communication (outside of target counts) is one way--the guest notifies the host when it adds a page to the page cache, allowing the host to madvise(2) with MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit (via the regular page reclaim). This means that inflating the balloon is similar to the existing balloon mechanism, but the deflate is different--it re-uses existing Linux kernel functionality to automatically reclaim. Signed-off-by: Frank Swiderski --- drivers/virtio/Kconfig | 13 + drivers/virtio/Makefile |1 + drivers/virtio/virtio_fileballoon.c | 636 +++ include/linux/virtio_balloon.h |9 + include/linux/virtio_ids.h |1 + 5 files changed, 660 insertions(+), 0 deletions(-) create mode 100644 drivers/virtio/virtio_fileballoon.c diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig index f38b17a..cffa2a7 100644 --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -35,6 +35,19 @@ config VIRTIO_BALLOON If unsure, say M. +config VIRTIO_FILEBALLOON + tristate "Virtio page cache-backed balloon driver" + select VIRTIO + select VIRTIO_RING + ---help--- +This driver supports decreasing and automatically reclaiming the +memory within a guest VM. Unlike VIRTIO_BALLOON, this driver instead +tries to maintain a specific target balloon size using the page cache. +This allows the guest to implicitly deflate the balloon by flushing +pages from the cache and touching the page. + +If unsure, say N. + config VIRTIO_MMIO tristate "Platform bus driver for memory mapped virtio devices (EXPERIMENTAL)" depends on HAS_IOMEM && EXPERIMENTAL diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile index 5a4c63c..7ca0a3f 100644 --- a/drivers/virtio/Makefile +++ b/drivers/virtio/Makefile @@ -3,3 +3,4 @@ obj-$(CONFIG_VIRTIO_RING) += virtio_ring.o obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o +obj-$(CONFIG_VIRTIO_FILEBALLOON) += virtio_fileballoon.o diff --git a/drivers/virtio/virtio_fileballoon.c b/drivers/virtio/virtio_fileballoon.c new file mode 100644 index 000..ff252ec --- /dev/null +++ b/drivers/virtio/virtio_fileballoon.c @@ -0,0 +1,636 @@ +/* Virtio file (page cache-backed) balloon implementation, inspired by + * Dor Loar and Marcelo Tosatti's implementations, and based on Rusty Russel's + * implementation. + * + * This implementation of the virtio balloon driver re-uses the page cache to + * allow memory consumed by inflating the balloon to be reclaimed by linux. It + * creates and mounts a bare-bones filesystem containing a single inode. When + * the host requests the balloon to inflate, it does so by "reading" pages at + * offsets into the inode mapping's page_tree. The host is notified when the + * pages are added to the page_tree, allowing it (the host) to madvise(2) the + * corresponding host memory, reducing the RSS of the virtual machine. In this + * implementation, the host is only notified when a page is added to the + * balloon. Reclaim happens under the existing TTFP logic, which flushes unused + * pages in the page cache. If the host used MADV_DONTNEED, then when the guest + * uses the page, the zero page will be mapped in, allowing automatic (and fast, + * compared to requiring a host notification via a virtio queue to get memory + * back) reclaim. + * + * Copyright 2008 Rusty Russell IBM Corporation + * Copyright 2011 Frank Swiderski Google Inc + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define VIRTBALLOON_PFN_ARRAY_SIZE 256 + +struct virtio_balloon { + struct virtio_device *vdev; + struct virtqueue *inflate_vq; + + /* Where the ballooning thread waits for config to change. */ + wait_queue_head_t config_change; + + /* The thread servicing the balloon. */ +
Re: [PATCH] Add a page cache-backed balloon device driver.
On 06/26/2012 04:32 PM, Frank Swiderski wrote: This implementation of a virtio balloon driver uses the page cache to "store" pages that have been released to the host. The communication (outside of target counts) is one way--the guest notifies the host when it adds a page to the page cache, allowing the host to madvise(2) with MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit (via the regular page reclaim). This means that inflating the balloon is similar to the existing balloon mechanism, but the deflate is different--it re-uses existing Linux kernel functionality to automatically reclaim. Signed-off-by: Frank Swiderski It is a great idea, but how can this memory balancing possibly work if someone uses memory cgroups inside a guest? Having said that, we currently do not have proper memory reclaim balancing between cgroups at all, so requiring that of this balloon driver would be unreasonable. The code looks good to me, my only worry is the code duplication. We now have 5 balloon drivers, for 4 hypervisors, all implementing everything from scratch... ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
> How is the compiler meant to optimise away "cond" if it's a function > call? Inlines can be optimized away. These tests are usually inlines. > What did I miss? If nothing, then I will revert this particular change > and Rafael will need to be sure his patch is not using VM_BUG_ON to call > a function with side-effects. Do you have an example where the code is actually different, or are you just speculating? FWIW for my config both generates the same code: size vmlinux-andi-vmbug vmlinux-vmbug-nothing textdata bss dec hex filename 118097041457352 1159168 14426224 dc2070 vmlinux-andi-vmbug 118097041457352 1159168 14426224 dc2070 vmlinux-vmbug-nothing -Andi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
On Tue, Jun 26, 2012 at 06:52:58PM +0200, Andi Kleen wrote: > > > > What shocked me actually is that VM_BUG_ON code is executed on > > !CONFIG_DEBUG_VM builds and has been since 2.6.36 due to commit [4e60c86bd: > > gcc-4.6: mm: fix unused but set warnings]. I thought the whole point of > > VM_BUG_ON was to avoid expensive and usually unnecessary checks. Andi, > > was this deliberate? > > The idea was that the compiler optimizes it away anyways. > > I'm not fully sure what putback_balloon_page does, but if it just tests > a bit (without non variable test_bit) it should be ok. > This was the definition before #ifdef CONFIG_DEBUG_VM #define VM_BUG_ON(cond) BUG_ON(cond) #else #define VM_BUG_ON(cond) do { } while (0) #endif and now it's #ifdef CONFIG_DEBUG_VM #define VM_BUG_ON(cond) BUG_ON(cond) #else #define VM_BUG_ON(cond) do { (void)(cond); } while (0) #endif How is the compiler meant to optimise away "cond" if it's a function call? In the old definition VM_BUG_ON did nothing and the intention was that the "cond" should never had any side-effects. It was to be used for potentially expensive tests to catch additional issues in DEBUG_VM kernels. My concern is that after commit 4e60c86bd that the VM doing these additional checks unnecesarily with a performance hit. In most cases the checks are small but in others such as free_pages we are calling virt_addr_valid() which is heavier. What did I miss? If nothing, then I will revert this particular change and Rafael will need to be sure his patch is not using VM_BUG_ON to call a function with side-effects. -- Mel Gorman SUSE Labs ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
> I'm not fully sure what putback_balloon_page does, but if it just tests > a bit (without non variable test_bit) it should be ok. ^ should be "without variable ..." -Andi -- a...@linux.intel.com -- Speaking for myself only. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
> > What shocked me actually is that VM_BUG_ON code is executed on > !CONFIG_DEBUG_VM builds and has been since 2.6.36 due to commit [4e60c86bd: > gcc-4.6: mm: fix unused but set warnings]. I thought the whole point of > VM_BUG_ON was to avoid expensive and usually unnecessary checks. Andi, > was this deliberate? The idea was that the compiler optimizes it away anyways. I'm not fully sure what putback_balloon_page does, but if it just tests a bit (without non variable test_bit) it should be ok. -Andi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
On Tue, Jun 26, 2012 at 09:17:43AM -0400, Rik van Riel wrote: > On 06/25/2012 07:25 PM, Rafael Aquini wrote: > >This patch introduces helper functions that teach compaction and migration > >bits > >how to cope with pages which are part of a guest memory balloon, in order to > >make them movable by memory compaction procedures. > > > >Signed-off-by: Rafael Aquini > > The function fill_balloon in drivers/virtio/virtio_balloon.c > should probably add __GFP_MOVABLE to the gfp mask for alloc_pages, > to keep the pageblock where balloon pages are allocated marked > as movable. > You're right, but patch 3 is already doing that at the same time the migration primitives are introduced. It does mean the full series has to be applied to do anything but I think it's bisect safe. -- Mel Gorman SUSE Labs ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
On 06/25/2012 07:25 PM, Rafael Aquini wrote: This patch introduces helper functions that teach compaction and migration bits how to cope with pages which are part of a guest memory balloon, in order to make them movable by memory compaction procedures. Signed-off-by: Rafael Aquini The function fill_balloon in drivers/virtio/virtio_balloon.c should probably add __GFP_MOVABLE to the gfp mask for alloc_pages, to keep the pageblock where balloon pages are allocated marked as movable. -- All rights reversed ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
(apologies if there are excessive typos. I damaged my left hand and typing is painful). Adding Andi to cc for question on VM_BUG_ON. On Mon, Jun 25, 2012 at 08:25:56PM -0300, Rafael Aquini wrote: > This patch introduces helper functions that teach compaction and migration > bits > how to cope with pages which are part of a guest memory balloon, in order to > make them movable by memory compaction procedures. > > Signed-off-by: Rafael Aquini > --- > include/linux/mm.h | 17 + > mm/compaction.c| 72 > > mm/migrate.c | 30 +- > 3 files changed, 118 insertions(+), 1 deletion(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index b36d08c..360656e 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1629,5 +1629,22 @@ static inline unsigned int > debug_guardpage_minorder(void) { return 0; } > static inline bool page_is_guard(struct page *page) { return false; } > #endif /* CONFIG_DEBUG_PAGEALLOC */ > > +#if (defined(CONFIG_VIRTIO_BALLOON) || \ > + defined(CONFIG_VIRTIO_BALLOON_MODULE)) && defined(CONFIG_COMPACTION) > +extern int is_balloon_page(struct page *); > +extern int isolate_balloon_page(struct page *); > +extern int putback_balloon_page(struct page *); > + > +/* return 1 if page is part of a guest's memory balloon, 0 otherwise */ > +static inline int PageBalloon(struct page *page) > +{ > + return is_balloon_page(page); > +} bool Why is there both is_balloon_page and PageBalloon? is_ballon_page is so simple it should just be a static inline here extern struct address_space *balloon_mapping; static inline bool is_balloon_page(page) { return page->mapping == balloon_mapping; } > +#else > +static inline int PageBalloon(struct page *page) { return 0; } > +static inline int isolate_balloon_page(struct page *page){ return 0; } > +static inline int putback_balloon_page(struct page *page){ return 0; } > +#endif /* (VIRTIO_BALLOON || VIRTIO_BALLOON_MODULE) && COMPACTION */ > + > #endif /* __KERNEL__ */ > #endif /* _LINUX_MM_H */ > diff --git a/mm/compaction.c b/mm/compaction.c > index 7ea259d..8835d55 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -14,6 +14,7 @@ > #include > #include > #include > +#include > #include "internal.h" > > #if defined CONFIG_COMPACTION || defined CONFIG_CMA > @@ -312,6 +313,14 @@ isolate_migratepages_range(struct zone *zone, struct > compact_control *cc, > continue; > } > > + /* > + * For ballooned pages, we need to isolate them before testing > + * for PageLRU, as well as skip the LRU page isolation steps. > + */ This says what, but not why. I didn't check the exact mechanics of a balloon page but I expect it's that balloon pages are not on the LRU. If they are on the LRU, that's pretty dumb. /* * Balloon pages can be migrated but are not on the LRU. Isolate * them before LRU checks. */ It would be nicer to do this without gotos /* * It is possible to migrate LRU pages and balloon pages. Skip * any other type of page */ if (is_balloon_page(page)) { if (!isolate_balloon_page(page)) continue; } else if (PageLRU(page)) { } You will need to shuffle things around a little to make it work properly but if we handle other page types in the future it will be neater overall. > + if (PageBalloon(page)) > + if (isolate_balloon_page(page)) > + goto isolated_balloon_page; > + > if (!PageLRU(page)) > continue; > > @@ -338,6 +347,7 @@ isolate_migratepages_range(struct zone *zone, struct > compact_control *cc, > > /* Successfully isolated */ > del_page_from_lru_list(page, lruvec, page_lru(page)); > +isolated_balloon_page: > list_add(&page->lru, migratelist); > cc->nr_migratepages++; > nr_isolated++; > @@ -903,4 +913,66 @@ void compaction_unregister_node(struct node *node) > } > #endif /* CONFIG_SYSFS && CONFIG_NUMA */ > > +#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE) > +/* > + * Balloon pages special page->mapping. > + * users must properly allocate and initiliaze an instance of > balloon_mapping, > + * and set it as the page->mapping for balloon enlisted page instances. > + * > + * address_space_operations necessary methods for ballooned pages: > + * .migratepage- used to perform balloon's page migration (as is) > + * .invalidatepage - used to isolate a page from balloon's page list > + * .freepage - used to reinsert an isolated page to balloon's page > list > + */ > +struct address_space *balloon_mapping; > +EXPORT_SYMBOL(balloon_mapping); > + EXPORT_SYMBOL_GPL? I don't mind how it is exported as such. I'm idly cur
[rfc] virtio-spec: introduce VIRTIO_NET_F_MULTIQUEUE
This patch introduces the multiqueue capabilities to virtio net devices. The number of tx/rx queue pairs available in the device were exposed through config space, and driver could negotiate the number of pairs it wish to use through ctrl vq. Signed-off-by: Jason Wang --- virtio-0.9.5.lyx | 180 -- 1 file changed, 176 insertions(+), 4 deletions(-) diff --git a/virtio-0.9.5.lyx b/virtio-0.9.5.lyx index 3c80ecf..480e9c7 100644 --- a/virtio-0.9.5.lyx +++ b/virtio-0.9.5.lyx @@ -56,6 +56,7 @@ \html_math_output 0 \html_css_as_file 0 \html_be_strict false +\author 2090695081 "Jason Wang" \end_header \begin_body @@ -3854,11 +3855,22 @@ ID 1 \end_layout \begin_layout Description -Virtqueues 0:receiveq. +Virtqueues +\change_inserted 2090695081 1340693104 + +\end_layout + +\begin_deeper +\begin_layout Description + +\change_inserted 2090695081 1340693118 +When VIRTIO_NET_F_MULTIQUEUE is not set: +\change_unchanged +0:receiveq. 1:transmitq. 2:controlq \begin_inset Foot -status open +status collapsed \begin_layout Plain Layout Only if VIRTIO_NET_F_CTRL_VQ set @@ -3867,9 +3879,60 @@ Only if VIRTIO_NET_F_CTRL_VQ set \end_inset +\change_inserted 2090695081 1340693122 + \end_layout \begin_layout Description + +\change_inserted 2090695081 1340693866 +When VIRTIO_NET_F_MULTIQUEUE is set and there's N tx/rx queue pairs: 0:receiveq1. + 1:transmitq1. + 2:controlq +\begin_inset Foot +status collapsed + +\begin_layout Plain Layout + +\change_inserted 2090695081 1340693141 +Only if VIRTIO_NET_F_CTRL_VQ set +\end_layout + +\end_inset + + ... + 2N-1 +\begin_inset Foot +status collapsed + +\begin_layout Plain Layout + +\change_inserted 2090695081 1340693284 +2N-2 If VIRTIO_NET_F_CTRL_VQ not set +\end_layout + +\end_inset + +:receiveqN. + 2N +\begin_inset Foot +status collapsed + +\begin_layout Plain Layout + +\change_inserted 2090695081 1340693302 +2N-1 If VIRTIO_NET_F_CTRL_VQ is not set +\end_layout + +\end_inset + +: transmitqN +\change_unchanged + +\end_layout + +\end_deeper +\begin_layout Description Feature \begin_inset space ~ \end_inset @@ -4027,6 +4090,16 @@ VIRTIO_NET_F_CTRL_VLAN \begin_layout Description VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous packets. +\change_inserted 2090695081 1340692965 + +\end_layout + +\begin_layout Description + +\change_inserted 2090695081 1340693017 +VIRTIO_NET_F_MULTIQUEUE (22) Device has multiple tx/rx queues. +\change_unchanged + \end_layout \end_deeper @@ -4039,11 +4112,22 @@ configuration \begin_inset space ~ \end_inset -layout Two configuration fields are currently defined. +layout T +\change_inserted 2090695081 1340693345 +hree +\change_deleted 2090695081 1340693344 +wo +\change_unchanged + configuration fields are currently defined. The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC is set), and the status field only exists if VIRTIO_NET_F_STATUS is set. Two read-only bits are currently defined for the status field: VIRTIO_NET_S_LIN K_UP and VIRTIO_NET_S_ANNOUNCE. + +\change_inserted 2090695081 1340693398 + The num queue pairs fields only exist if VIRTIO_NET_F_MULTIQUEUE is set. + +\change_unchanged \begin_inset listings inline false @@ -4076,6 +4160,17 @@ struct virtio_net_config { \begin_layout Plain Layout u16 status; +\change_inserted 2090695081 1340692955 + +\end_layout + +\begin_layout Plain Layout + +\change_inserted 2090695081 1340692962 + + u16 num_queue_pairs; +\change_unchanged + \end_layout \begin_layout Plain Layout @@ -4527,7 +4622,7 @@ O features are used, the Guest will need to accept packets of up to 65550 So unless VIRTIO_NET_F_MRG_RXBUF is negotiated, every buffer in the receive queue needs to be at least this length \begin_inset Foot -status open +status collapsed \begin_layout Plain Layout Obviously each one can be split across multiple descriptor elements. @@ -4980,6 +5075,83 @@ Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through control vq. \begin_layout Enumerate . +\change_inserted 2090695081 1340693446 + +\end_layout + +\begin_layout Subsection* + +\change_inserted 2090695081 1340693500 +Negotiating the number of queue pairs +\end_layout + +\begin_layout Standard + +\change_inserted 2090695081 1340693733 +If the driver negotiates the VIRTIO_NET_F_MULTIQUEUE (depends on VIRTIO_NET_F_CT +RL_VQ), it can then negotiate the number of queue pairs it wish to use by + placing the number in num_queue_pairs field of virtio_net_ctrl_multiqueue + through VIRTIO_NET_CTRL_MULTIQUEUE_NUM command. +\end_layout + +\begin_layout Standard + +\change_inserted 2090695081 1340693782 +If the driver doesn't negotiate the number, all tx/rx queues were enabled + by default. +\end_layout + +\begin_layout Standard + +\change_inserted 2090695081 1340693616 +\begin_inset listings +inline false +status open + +\begin_layout Plain Layout + +\change_inserted 2090695081 1340693620 + +struct virtio_net