Re: [PATCH v6 1/3] mm: introduce compaction and migration for virtio ballooned pages

2012-08-08 Thread Minchan Kim
Hi Rafael,

On Wed, Aug 08, 2012 at 07:53:19PM -0300, Rafael Aquini wrote:
> Memory fragmentation introduced by ballooning might significantly reduce
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
> 
> This patch introduces the helper functions as well as the necessary changes
> to teach compaction and migration bits how to cope with pages which are
> part of a guest memory balloon, in order to make them movable by memory
> compaction procedures.
> 
> Signed-off-by: Rafael Aquini 
> ---
>  include/linux/mm.h |  17 +++
>  mm/compaction.c| 131 +
>  mm/migrate.c   |  30 +++-
>  3 files changed, 158 insertions(+), 20 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 311be90..18f978b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1662,5 +1662,22 @@ static inline unsigned int debug_guardpage_minorder(void) { return 0; }
>  static inline bool page_is_guard(struct page *page) { return false; }
>  #endif /* CONFIG_DEBUG_PAGEALLOC */
>  
> +#if (defined(CONFIG_VIRTIO_BALLOON) || \
> + defined(CONFIG_VIRTIO_BALLOON_MODULE)) && defined(CONFIG_COMPACTION)
> +extern bool isolate_balloon_page(struct page *);
> +extern bool putback_balloon_page(struct page *);
> +extern struct address_space *balloon_mapping;
> +
> +static inline bool movable_balloon_page(struct page *page)
> +{
> + return (page->mapping && page->mapping == balloon_mapping);
> +}
> +
> +#else
> +static inline bool isolate_balloon_page(struct page *page) { return false; }
> +static inline bool putback_balloon_page(struct page *page) { return false; }
> +static inline bool movable_balloon_page(struct page *page) { return false; }
> +#endif /* (VIRTIO_BALLOON || VIRTIO_BALLOON_MODULE) && CONFIG_COMPACTION */
> +
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_MM_H */
> diff --git a/mm/compaction.c b/mm/compaction.c
> index e78cb96..7372592 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "internal.h"
>  
>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> @@ -21,6 +22,90 @@
>  #define CREATE_TRACE_POINTS
>  #include 
>  
> +#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
> +/*
> + * Balloon pages special page->mapping.
> + * Users must properly allocate and initialize an instance of balloon_mapping,
> + * and set it as the page->mapping for balloon enlisted page instances.
> + * There is no need on utilizing struct address_space locking schemes for
> + * balloon_mapping as, once it gets initialized at balloon driver, it will
> + * remain just like a static reference that helps us on identifying a guest
> + * ballooned page by its mapping, as well as it will keep the 'a_ops' callback
> + * pointers to the functions that will execute the balloon page mobility tasks.
> + *
> + * address_space_operations necessary methods for ballooned pages:
> + *   .migratepage    - used to perform balloon's page migration (as is)
> + *   .invalidatepage - used to isolate a page from balloon's page list
> + *   .freepage       - used to reinsert an isolated page to balloon's page list
> + */
> +struct address_space *balloon_mapping;
> +EXPORT_SYMBOL_GPL(balloon_mapping);
> +
> +static inline void __isolate_balloon_page(struct page *page)
> +{
> + page->mapping->a_ops->invalidatepage(page, 0);
> +}
> +
> +static inline void __putback_balloon_page(struct page *page)
> +{
> + page->mapping->a_ops->freepage(page);
> +}
> +
> +/* __isolate_lru_page() counterpart for a ballooned page */
> +bool isolate_balloon_page(struct page *page)
> +{
> + if (WARN_ON(!movable_balloon_page(page)))
> + return false;
> +
> + if (likely(get_page_unless_zero(page))) {
> + /*
> +  * As balloon pages are not isolated from LRU lists, concurrent
> +  * compaction threads can race against page migration functions
> +  * move_to_new_page() & __unmap_and_move().
> +  * In order to avoid having an already isolated balloon page
> +  * being (wrongly) re-isolated while it is under migration,
> +  * lets be sure we have the page lock before proceeding with
> +  * the balloon page isolation steps.
> +  */
> + if (likely(trylock_page(page))) {
> + /*
> +  * A ballooned page, by default, has just one refcount.
> +  * Prevent concurrent compaction threads from isolating
> +  * an already isolated balloon page.
> +  */
> + if (movable_balloon_page(page) &&
> + (page_count(page) == 2)) {
> +  

Re: [PATCH v6 3/3] mm: add vm event counters for balloon pages compaction

2012-08-08 Thread Rik van Riel

On 08/08/2012 06:53 PM, Rafael Aquini wrote:

This patch is only for test-reporting purposes and shall be dropped if the
rest of this patchset is accepted for merging.


I wonder if it would make sense to just keep these statistics, so
if a change breaks balloon migration in the future, we'll be able
to see it...


Signed-off-by: Rafael Aquini


Acked-by: Rik van Riel 

--
All rights reversed


Re: [PATCH v6 2/3] virtio_balloon: introduce migration primitives to balloon pages

2012-08-08 Thread Rik van Riel

On 08/08/2012 06:53 PM, Rafael Aquini wrote:

Memory fragmentation introduced by ballooning might significantly reduce
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

Besides making balloon pages movable at allocation time and introducing
the necessary primitives to perform balloon page migration/compaction,
this patch also introduces the following locking scheme to provide the
proper synchronization and protection for struct virtio_balloon elements
against concurrent accesses due to parallel operations introduced by
memory compaction / page migration.
  - balloon_lock (mutex) : synchronizes access to the elements of
  struct virtio_balloon and to its queue operations;
  - pages_lock (spinlock): protects the balloon pages list against
  concurrent list handling operations;

Signed-off-by: Rafael Aquini


Reviewed-by: Rik van Riel 

--
All rights reversed


Re: [PATCH v6 1/3] mm: introduce compaction and migration for virtio ballooned pages

2012-08-08 Thread Rik van Riel

On 08/08/2012 06:53 PM, Rafael Aquini wrote:

Memory fragmentation introduced by ballooning might significantly reduce
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch introduces the helper functions as well as the necessary changes
to teach compaction and migration bits how to cope with pages which are
part of a guest memory balloon, in order to make them movable by memory
compaction procedures.

Signed-off-by: Rafael Aquini


Reviewed-by: Rik van Riel 

--
All rights reversed


[PATCH v6 2/3] virtio_balloon: introduce migration primitives to balloon pages

2012-08-08 Thread Rafael Aquini
Memory fragmentation introduced by ballooning might significantly reduce
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

Besides making balloon pages movable at allocation time and introducing
the necessary primitives to perform balloon page migration/compaction,
this patch also introduces the following locking scheme to provide the
proper synchronization and protection for struct virtio_balloon elements
against concurrent accesses due to parallel operations introduced by
memory compaction / page migration.
 - balloon_lock (mutex) : synchronizes access to the elements of
  struct virtio_balloon and to its queue operations;
 - pages_lock (spinlock): protects the balloon pages list against
  concurrent list handling operations;
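
To make the nesting concrete, here is a minimal sketch of the inflate side
under this scheme (inflate_one is a hypothetical helper distilled from the
fill_balloon() changes below; PFN bookkeeping and error handling omitted):

    static void inflate_one(struct virtio_balloon *vb, struct page *page)
    {
        mutex_lock(&balloon_lock);       /* serialize vb state + vq access */
        spin_lock(&pages_lock);          /* guard the vb->pages list itself */
        list_add(&page->lru, &vb->pages);
        page->mapping = balloon_mapping; /* tag the page as ballooned */
        spin_unlock(&pages_lock);
        tell_host(vb, vb->inflate_vq);   /* still under balloon_lock */
        mutex_unlock(&balloon_lock);
    }

Note that pages_lock nests inside balloon_lock and is dropped before the
host is notified, so the compaction/migration callbacks only contend on the
short list operations, not on a whole inflate/deflate cycle.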

Signed-off-by: Rafael Aquini 
---
 drivers/virtio/virtio_balloon.c | 138 +---
 include/linux/virtio_balloon.h  |   4 ++
 2 files changed, 134 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0908e60..7c937a0 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -35,6 +36,12 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
 
+/* Synchronizes accesses/updates to the struct virtio_balloon elements */
+DEFINE_MUTEX(balloon_lock);
+
+/* Protects 'virtio_balloon->pages' list against concurrent handling */
+DEFINE_SPINLOCK(pages_lock);
+
 struct virtio_balloon
 {
struct virtio_device *vdev;
@@ -51,6 +58,7 @@ struct virtio_balloon
 
/* Number of balloon pages we've told the Host we're not using. */
unsigned int num_pages;
+
/*
 * The pages we've told the Host we're not using.
 * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
@@ -125,10 +133,12 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));
 
+   mutex_lock(&balloon_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
 vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-   struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
-   __GFP_NOMEMALLOC | __GFP_NOWARN);
+   struct page *page = alloc_page(GFP_HIGHUSER_MOVABLE |
+   __GFP_NORETRY | __GFP_NOWARN |
+   __GFP_NOMEMALLOC);
if (!page) {
if (printk_ratelimit())
dev_printk(KERN_INFO, &vb->vdev->dev,
@@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
set_page_pfns(vb->pfns + vb->num_pfns, page);
vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
totalram_pages--;
+   spin_lock(&pages_lock);
list_add(&page->lru, &vb->pages);
+   page->mapping = balloon_mapping;
+   spin_unlock(&pages_lock);
}
 
/* Didn't get any?  Oh well. */
@@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
return;
 
tell_host(vb, vb->inflate_vq);
+   mutex_unlock(&balloon_lock);
 }
 
 static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));
 
+   mutex_lock(&balloon_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
 vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
+   /*
+* We can race against virtballoon_isolatepage() and end up
+* stumbling across a _temporarily_ empty 'pages' list.
+*/
+   spin_lock(&pages_lock);
+   if (unlikely(list_empty(&vb->pages))) {
+   spin_unlock(&pages_lock);
+   break;
+   }
page = list_first_entry(&vb->pages, struct page, lru);
+   page->mapping = NULL;
list_del(&page->lru);
+   spin_unlock(&pages_lock);
set_page_pfns(vb->pfns + vb->num_pfns, page);
vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
}
@@ -182,8 +208,11 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
 * is true, we *have* to 

[PATCH v6 0/3] make balloon pages movable by compaction

2012-08-08 Thread Rafael Aquini
Memory fragmentation introduced by ballooning might significantly reduce
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch-set follows the main idea discussed at the 2012 LSFMMS session
"Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/
to introduce the required changes to the virtio_balloon driver, as well as
the changes to the core compaction & migration bits, in order to make those
subsystems aware of ballooned pages and to allow memory balloon pages to become
movable within a guest, thus avoiding the aforementioned fragmentation issue.

Rafael Aquini (3):
  mm: introduce compaction and migration for virtio ballooned pages
  virtio_balloon: introduce migration primitives to balloon pages
  mm: add vm event counters for balloon pages compaction

 drivers/virtio/virtio_balloon.c | 139 +---
 include/linux/mm.h  |  17 +
 include/linux/virtio_balloon.h  |   4 ++
 include/linux/vm_event_item.h   |   2 +
 mm/compaction.c | 132 --
 mm/migrate.c|  32 -
 mm/vmstat.c |   4 ++
 7 files changed, 302 insertions(+), 28 deletions(-)

Change log:
v6:
 * rename 'is_balloon_page()' to 'movable_balloon_page()' (Rik);
v5:
 * address Andrew Morton's review comments on the patch series;
 * address a couple extra nitpick suggestions on PATCH 01 (Minchan);
v4: 
 * address Rusty Russell's review comments on PATCH 02;
 * re-base virtio_balloon patch on 9c378abc5c0c6fc8e3acf5968924d274503819b3;
V3: 
 * address reviewers' nitpick suggestions on PATCH 01 (Mel, Minchan);
V2: 
 * address Mel Gorman's review comments on PATCH 01;


Preliminary test results:
(2 VCPU 1024MB RAM KVM guest running 3.6.0_rc1+ -- after a reboot)

* 64MB balloon:
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 0
compact_pages_moved 0
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 0
compact_balloon_failed 0
compact_balloon_isolated 0
compact_balloon_freed 0
[root@localhost ~]#
[root@localhost ~]# for i in $(seq 1 6); do echo 1 > /proc/sys/vm/compact_memory & done &>/dev/null
[1]   Done    echo 1 > /proc/sys/vm/compact_memory
[2]   Done    echo 1 > /proc/sys/vm/compact_memory
[3]   Done    echo 1 > /proc/sys/vm/compact_memory
[4]   Done    echo 1 > /proc/sys/vm/compact_memory
[5]-  Done    echo 1 > /proc/sys/vm/compact_memory
[6]+  Done    echo 1 > /proc/sys/vm/compact_memory
[root@localhost ~]# 
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 3520
compact_pages_moved 47548
compact_pagemigrate_failed 120
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 16378
compact_balloon_failed 0
compact_balloon_isolated 16378
compact_balloon_freed 16378

* 128MB balloon:
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 0
compact_pages_moved 0
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 0
compact_balloon_failed 0
compact_balloon_isolated 0
compact_balloon_freed 0
[root@localhost ~]#
[root@localhost ~]# for i in $(seq 1 6); do echo 1 > /proc/sys/vm/compact_memory & done &>/dev/null
[1]   Done    echo 1 > /proc/sys/vm/compact_memory
[2]   Done    echo 1 > /proc/sys/vm/compact_memory
[3]   Done    echo 1 > /proc/sys/vm/compact_memory
[4]   Done    echo 1 > /proc/sys/vm/compact_memory
[5]-  Done    echo 1 > /proc/sys/vm/compact_memory
[6]+  Done    echo 1 > /proc/sys/vm/compact_memory
[root@localhost ~]# 
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 3356
compact_pages_moved 47099
compact_pagemigrate_failed 158
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 26275
compact_balloon_failed 42
compact_balloon_isolated 26317
compact_balloon_freed 26275

-- 
1.7.11.2



[PATCH v6 3/3] mm: add vm event counters for balloon pages compaction

2012-08-08 Thread Rafael Aquini
This patch is only for test-reporting purposes and shall be dropped if the
rest of this patchset is accepted for merging.

Signed-off-by: Rafael Aquini 
---
 drivers/virtio/virtio_balloon.c | 1 +
 include/linux/vm_event_item.h   | 2 ++
 mm/compaction.c | 1 +
 mm/migrate.c| 6 --
 mm/vmstat.c | 4 
 5 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 7c937a0..b8f7ea5 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -414,6 +414,7 @@ int virtballoon_migratepage(struct address_space *mapping,
 
mutex_unlock(&balloon_lock);
 
+   count_vm_event(COMPACTBALLOONMIGRATED);
return 0;
 }
 
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 57f7b10..a632a5d 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -41,6 +41,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_COMPACTION
COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
+   COMPACTBALLOONMIGRATED, COMPACTBALLOONFAILED,
+   COMPACTBALLOONISOLATED, COMPACTBALLOONFREED,
 #endif
 #ifdef CONFIG_HUGETLB_PAGE
HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
diff --git a/mm/compaction.c b/mm/compaction.c
index 7372592..5d6a344 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -77,6 +77,7 @@ bool isolate_balloon_page(struct page *page)
(page_count(page) == 2)) {
__isolate_balloon_page(page);
unlock_page(page);
+   count_vm_event(COMPACTBALLOONISOLATED);
return true;
}
unlock_page(page);
diff --git a/mm/migrate.c b/mm/migrate.c
index 871a304..4115875 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -79,9 +79,10 @@ void putback_lru_pages(struct list_head *l)
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
-   if (unlikely(movable_balloon_page(page)))
+   if (unlikely(movable_balloon_page(page))) {
+   count_vm_event(COMPACTBALLOONFAILED);
WARN_ON(!putback_balloon_page(page));
-   else
+   } else
putback_lru_page(page);
}
 }
@@ -872,6 +873,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
page_is_file_cache(page));
put_page(page);
__free_page(page);
+   count_vm_event(COMPACTBALLOONFREED);
return rc;
}
 out:
diff --git a/mm/vmstat.c b/mm/vmstat.c
index df7a674..8d80f60 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -768,6 +768,10 @@ const char * const vmstat_text[] = {
"compact_stall",
"compact_fail",
"compact_success",
+   "compact_balloon_migrated",
+   "compact_balloon_failed",
+   "compact_balloon_isolated",
+   "compact_balloon_freed",
 #endif
 
 #ifdef CONFIG_HUGETLB_PAGE
-- 
1.7.11.2



[PATCH v6 1/3] mm: introduce compaction and migration for virtio ballooned pages

2012-08-08 Thread Rafael Aquini
Memory fragmentation introduced by ballooning might significantly reduce
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch introduces the helper functions as well as the necessary changes
to teach compaction and migration bits how to cope with pages which are
part of a guest memory balloon, in order to make them movable by memory
compaction procedures.
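
As a rough sketch of the setup this implies on the driver side (the actual
wiring lands in the virtio_balloon patch of this series; the virtballoon_*
callback names below follow the driver's convention, but the snippet itself
is illustrative, not code from the series):

    static const struct address_space_operations virtio_balloon_aops = {
        .migratepage    = virtballoon_migratepage,  /* move a ballooned page */
        .invalidatepage = virtballoon_isolatepage,  /* unlist it for migration */
        .freepage       = virtballoon_putbackpage,  /* relist an isolated page */
    };

    /* done once at device probe; balloon_mapping then stays effectively
     * read-only, so no address_space locking is needed afterwards */
    balloon_mapping = kzalloc(sizeof(*balloon_mapping), GFP_KERNEL);
    if (!balloon_mapping)
        return -ENOMEM;
    balloon_mapping->a_ops = &virtio_balloon_aops;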

Signed-off-by: Rafael Aquini 
---
 include/linux/mm.h |  17 +++
 mm/compaction.c| 131 +
 mm/migrate.c   |  30 +++-
 3 files changed, 158 insertions(+), 20 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 311be90..18f978b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1662,5 +1662,22 @@ static inline unsigned int debug_guardpage_minorder(void) { return 0; }
 static inline bool page_is_guard(struct page *page) { return false; }
 #endif /* CONFIG_DEBUG_PAGEALLOC */
 
+#if (defined(CONFIG_VIRTIO_BALLOON) || \
+   defined(CONFIG_VIRTIO_BALLOON_MODULE)) && defined(CONFIG_COMPACTION)
+extern bool isolate_balloon_page(struct page *);
+extern bool putback_balloon_page(struct page *);
+extern struct address_space *balloon_mapping;
+
+static inline bool movable_balloon_page(struct page *page)
+{
+   return (page->mapping && page->mapping == balloon_mapping);
+}
+
+#else
+static inline bool isolate_balloon_page(struct page *page) { return false; }
+static inline bool putback_balloon_page(struct page *page) { return false; }
+static inline bool movable_balloon_page(struct page *page) { return false; }
+#endif /* (VIRTIO_BALLOON || VIRTIO_BALLOON_MODULE) && CONFIG_COMPACTION */
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/compaction.c b/mm/compaction.c
index e78cb96..7372592 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
@@ -21,6 +22,90 @@
 #define CREATE_TRACE_POINTS
 #include 
 
+#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
+/*
+ * Balloon pages special page->mapping.
+ * Users must properly allocate and initialize an instance of balloon_mapping,
+ * and set it as the page->mapping for balloon enlisted page instances.
+ * There is no need on utilizing struct address_space locking schemes for
+ * balloon_mapping as, once it gets initialized at balloon driver, it will
+ * remain just like a static reference that helps us on identifying a guest
+ * ballooned page by its mapping, as well as it will keep the 'a_ops' callback
+ * pointers to the functions that will execute the balloon page mobility tasks.
+ *
+ * address_space_operations necessary methods for ballooned pages:
+ *   .migratepage    - used to perform balloon's page migration (as is)
+ *   .invalidatepage - used to isolate a page from balloon's page list
+ *   .freepage       - used to reinsert an isolated page to balloon's page list
+ */
+struct address_space *balloon_mapping;
+EXPORT_SYMBOL_GPL(balloon_mapping);
+
+static inline void __isolate_balloon_page(struct page *page)
+{
+   page->mapping->a_ops->invalidatepage(page, 0);
+}
+
+static inline void __putback_balloon_page(struct page *page)
+{
+   page->mapping->a_ops->freepage(page);
+}
+
+/* __isolate_lru_page() counterpart for a ballooned page */
+bool isolate_balloon_page(struct page *page)
+{
+   if (WARN_ON(!movable_balloon_page(page)))
+   return false;
+
+   if (likely(get_page_unless_zero(page))) {
+   /*
+* As balloon pages are not isolated from LRU lists, concurrent
+* compaction threads can race against page migration functions
+* move_to_new_page() & __unmap_and_move().
+* In order to avoid having an already isolated balloon page
+* being (wrongly) re-isolated while it is under migration,
+* lets be sure we have the page lock before proceeding with
+* the balloon page isolation steps.
+*/
+   if (likely(trylock_page(page))) {
+   /*
+* A ballooned page, by default, has just one refcount.
+* Prevent concurrent compaction threads from isolating
+* an already isolated balloon page.
+*/
+   if (movable_balloon_page(page) &&
+   (page_count(page) == 2)) {
+   __isolate_balloon_page(page);
+   unlock_page(page);
+   return true;
+   }
+   unlock_page(page);
+   }
+   /* Drop refcount t

[PATCH V7 2/2] virtio-blk: Add REQ_FLUSH and REQ_FUA support to bio path

2012-08-08 Thread Asias He
We need to support both REQ_FLUSH and REQ_FUA for the bio-based path, since
it does not get the sequencing of REQ_FUA into REQ_FLUSH that request-based
drivers get from the block layer.

REQ_FLUSH is emulated by:
A) If the bio has no data to write:
1. Send VIRTIO_BLK_T_FLUSH to device,
2. In the flush I/O completion handler, finish the bio

B) If the bio has data to write:
1. Send VIRTIO_BLK_T_FLUSH to device
2. In the flush I/O completion handler, send the actual write data to device
3. In the write I/O completion handler, finish the bio

REQ_FUA is emulated by:
1. Send the actual write data to device
2. In the write I/O completion handler, send VIRTIO_BLK_T_FLUSH to device
3. In the flush I/O completion handler, finish the bio
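
A minimal sketch of that completion chaining, assuming a single merged done
handler (the patch itself splits this across the flush and data completion
paths, driven by the vbr->flags bits it introduces):

    static void virtblk_bio_done(struct virtblk_req *vbr)
    {
        if (vbr->flags & VBLK_IS_FLUSH) {
            /* a flush just completed */
            if (vbr->flags & VBLK_REQ_DATA)
                virtblk_bio_send_data(vbr);   /* B) step 2: send the write */
            else
                bio_endio(vbr->bio, virtblk_result(vbr)); /* A) or FUA done */
        } else {
            /* a write just completed */
            if (vbr->flags & VBLK_REQ_FUA) {
                vbr->flags &= ~VBLK_REQ_DATA; /* don't resend the data */
                virtblk_bio_send_flush(vbr);  /* FUA step 2: send the flush */
            } else {
                bio_endio(vbr->bio, virtblk_result(vbr));
            }
        }
    }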

Changes in v7:
- Using vbr->flags to trace request type
- Dropped unnecessary struct virtio_blk *vblk parameter
- Reuse struct virtblk_req in bio done function

Changes in v6:
- Reworked REQ_FLUSH and REQ_FUA emulatation order

Cc: Rusty Russell 
Cc: Jens Axboe 
Cc: Christoph Hellwig 
Cc: Tejun Heo 
Cc: Shaohua Li 
Cc: "Michael S. Tsirkin" 
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Asias He 
---
 drivers/block/virtio_blk.c | 272 +++--
 1 file changed, 188 insertions(+), 84 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 95cfeed..2edfb5c 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -58,10 +58,20 @@ struct virtblk_req
struct bio *bio;
struct virtio_blk_outhdr out_hdr;
struct virtio_scsi_inhdr in_hdr;
+   struct work_struct work;
+   struct virtio_blk *vblk;
+   int flags;
u8 status;
struct scatterlist sg[];
 };
 
+enum {
+   VBLK_IS_FLUSH   = 1,
+   VBLK_REQ_FLUSH  = 2,
+   VBLK_REQ_DATA   = 4,
+   VBLK_REQ_FUA= 8,
+};
+
 static inline int virtblk_result(struct virtblk_req *vbr)
 {
switch (vbr->status) {
@@ -74,9 +84,133 @@ static inline int virtblk_result(struct virtblk_req *vbr)
}
 }
 
-static inline void virtblk_request_done(struct virtio_blk *vblk,
-   struct virtblk_req *vbr)
+static inline struct virtblk_req *virtblk_alloc_req(struct virtio_blk *vblk,
+   gfp_t gfp_mask)
+{
+   struct virtblk_req *vbr;
+
+   vbr = mempool_alloc(vblk->pool, gfp_mask);
+   if (vbr && use_bio)
+   sg_init_table(vbr->sg, vblk->sg_elems);
+
+   vbr->vblk = vblk;
+
+   return vbr;
+}
+
+static void virtblk_add_buf_wait(struct virtio_blk *vblk,
+struct virtblk_req *vbr,
+unsigned long out,
+unsigned long in)
+{
+   DEFINE_WAIT(wait);
+
+   for (;;) {
+   prepare_to_wait_exclusive(&vblk->queue_wait, &wait,
+ TASK_UNINTERRUPTIBLE);
+
+   spin_lock_irq(vblk->disk->queue->queue_lock);
+   if (virtqueue_add_buf(vblk->vq, vbr->sg, out, in, vbr,
+ GFP_ATOMIC) < 0) {
+   spin_unlock_irq(vblk->disk->queue->queue_lock);
+   io_schedule();
+   } else {
+   virtqueue_kick(vblk->vq);
+   spin_unlock_irq(vblk->disk->queue->queue_lock);
+   break;
+   }
+
+   }
+
+   finish_wait(&vblk->queue_wait, &wait);
+}
+
+static inline void virtblk_add_req(struct virtblk_req *vbr,
+  unsigned int out, unsigned int in)
+{
+   struct virtio_blk *vblk = vbr->vblk;
+
+   spin_lock_irq(vblk->disk->queue->queue_lock);
+   if (unlikely(virtqueue_add_buf(vblk->vq, vbr->sg, out, in, vbr,
+   GFP_ATOMIC) < 0)) {
+   spin_unlock_irq(vblk->disk->queue->queue_lock);
+   virtblk_add_buf_wait(vblk, vbr, out, in);
+   return;
+   }
+   virtqueue_kick(vblk->vq);
+   spin_unlock_irq(vblk->disk->queue->queue_lock);
+}
+
+static int virtblk_bio_send_flush(struct virtblk_req *vbr)
+{
+   unsigned int out = 0, in = 0;
+
+   vbr->flags |= VBLK_IS_FLUSH;
+   vbr->out_hdr.type = VIRTIO_BLK_T_FLUSH;
+   vbr->out_hdr.sector = 0;
+   vbr->out_hdr.ioprio = 0;
+   sg_set_buf(&vbr->sg[out++], &vbr->out_hdr, sizeof(vbr->out_hdr));
+   sg_set_buf(&vbr->sg[out + in++], &vbr->status, sizeof(vbr->status));
+
+   virtblk_add_req(vbr, out, in);
+
+   return 0;
+}
+
+static int virtblk_bio_send_data(struct virtblk_req *vbr)
 {
+   struct virtio_blk *vblk = vbr->vblk;
+   unsigned int num, out = 0, in = 0;
+   struct bio *bio = vbr->bio;
+
+   vbr->flags &= ~VBLK_IS_FLUSH;
+   vbr->out_hdr.type = 0;
+   vbr->out_hdr.sector = bio->bi_sector;

[PATCH V7 1/2] virtio-blk: Add bio-based IO path for virtio-blk

2012-08-08 Thread Asias He
This patch introduces a bio-based IO path for virtio-blk.

Compared to the request-based IO path, the bio-based IO path uses the
driver-provided ->make_request_fn() method to bypass the IO scheduler. It
hands bios to the device directly, without allocating a request in the
block layer. This shortens the IO path in the guest kernel, achieving
higher IOPS and lower latency. The downside is that the guest cannot use
the IO scheduler to merge and sort requests. However, this is not a big
problem if the backing disk on the host side is a fast device.

When the bio-based IO path is not enabled, virtio-blk still uses the
original request-based IO path, and no performance difference is observed.

On a slow device, e.g. a normal SATA disk, bio-based sequential reads and
writes are slower than the request-based IO path due to the lack of merging
in the guest kernel. So we make the bio-based path optional.
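
For context, here is a hedged sketch of what bio-based vs. request-based
means at probe time with the 3.x block API (virtblk_make_request and
virtblk_request are assumed names for the handlers; error handling omitted):

    struct request_queue *q;

    if (use_bio) {
        /* bio-based: every bio is handed straight to our handler;
         * there is no elevator, so no merging/sorting in the guest. */
        q = blk_alloc_queue(GFP_KERNEL);
        blk_queue_make_request(q, virtblk_make_request);
    } else {
        /* request-based: bios are queued, merged and sorted by the
         * I/O scheduler before our request_fn dequeues them. */
        q = blk_init_queue(virtblk_request, NULL);
    }
    vblk->disk->queue = q;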

Performance evaluation:
-
1) Fio test is performed in an 8 vcpu guest backed by a ramdisk, using
kvm tool.

Short version:
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost : 28%, 24%, 21%, 16%
 Latency improvement: 32%, 17%, 21%, 16%

Long version:
 With bio-based IO path:
  seq-read  : io=2048.0MB, bw=116996KB/s, iops=233991 , runt= 17925msec
  seq-write : io=2048.0MB, bw=100829KB/s, iops=201658 , runt= 20799msec
  rand-read : io=3095.7MB, bw=112134KB/s, iops=224268 , runt= 28269msec
  rand-write: io=3095.7MB, bw=96198KB/s,  iops=192396 , runt= 32952msec
clat (usec): min=0 , max=2631.6K, avg=58716.99, stdev=191377.30
clat (usec): min=0 , max=1753.2K, avg=66423.25, stdev=81774.35
clat (usec): min=0 , max=2915.5K, avg=61685.70, stdev=120598.39
clat (usec): min=0 , max=1933.4K, avg=76935.12, stdev=96603.45
  cpu : usr=74.08%, sys=703.84%, ctx=29661403, majf=21354, minf=22460954
  cpu : usr=70.92%, sys=702.81%, ctx=77219828, majf=13980, minf=27713137
  cpu : usr=72.23%, sys=695.37%, ctx=88081059, majf=18475, minf=28177648
  cpu : usr=69.69%, sys=654.13%, ctx=145476035, majf=15867, minf=26176375
 With request-based IO path:
  seq-read  : io=2048.0MB, bw=91074KB/s, iops=182147 , runt= 23027msec
  seq-write : io=2048.0MB, bw=80725KB/s, iops=161449 , runt= 25979msec
  rand-read : io=3095.7MB, bw=92106KB/s, iops=184211 , runt= 34416msec
  rand-write: io=3095.7MB, bw=82815KB/s, iops=165630 , runt= 38277msec
clat (usec): min=0 , max=1932.4K, avg=77824.17, stdev=170339.49
clat (usec): min=0 , max=2510.2K, avg=78023.96, stdev=146949.15
clat (usec): min=0 , max=3037.2K, avg=74746.53, stdev=128498.27
clat (usec): min=0 , max=1363.4K, avg=89830.75, stdev=114279.68
  cpu : usr=53.28%, sys=724.19%, ctx=37988895, majf=17531, minf=23577622
  cpu : usr=49.03%, sys=633.20%, ctx=205935380, majf=18197, minf=27288959
  cpu : usr=55.78%, sys=722.40%, ctx=101525058, majf=19273, minf=28067082
  cpu : usr=56.55%, sys=690.83%, ctx=228205022, majf=18039, minf=26551985

2) Fio test is performed in an 8 vcpu guest backed by a Fusion-IO device,
using kvm tool.

Short version:
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost : 11%, 11%, 13%, 10%
 Latency improvement: 10%, 10%, 12%, 10%
Long Version:
 With bio-based IO path:
  read : io=2048.0MB, bw=58920KB/s, iops=117840 , runt= 35593msec
  write: io=2048.0MB, bw=64308KB/s, iops=128616 , runt= 32611msec
  read : io=3095.7MB, bw=59633KB/s, iops=119266 , runt= 53157msec
  write: io=3095.7MB, bw=62993KB/s, iops=125985 , runt= 50322msec
clat (usec): min=0 , max=1284.3K, avg=128109.01, stdev=71513.29
clat (usec): min=94 , max=962339 , avg=116832.95, stdev=65836.80
clat (usec): min=0 , max=1846.6K, avg=128509.99, stdev=89575.07
clat (usec): min=0 , max=2256.4K, avg=121361.84, stdev=82747.25
  cpu : usr=56.79%, sys=421.70%, ctx=147335118, majf=21080, minf=19852517
  cpu : usr=61.81%, sys=455.53%, ctx=143269950, majf=16027, minf=24800604
  cpu : usr=63.10%, sys=455.38%, ctx=178373538, majf=16958, minf=24822612
  cpu : usr=62.04%, sys=453.58%, ctx=226902362, majf=16089, minf=23278105
 With request-based IO path:
  read : io=2048.0MB, bw=52896KB/s, iops=105791 , runt= 39647msec
  write: io=2048.0MB, bw=57856KB/s, iops=115711 , runt= 36248msec
  read : io=3095.7MB, bw=52387KB/s, iops=104773 , runt= 60510msec
  write: io=3095.7MB, bw=57310KB/s, iops=114619 , runt= 55312msec
clat (usec): min=0 , max=1532.6K, avg=142085.62, stdev=109196.84
clat (usec): min=0 , max=1487.4K, avg=129110.71, stdev=114973.64
clat (usec): min=0 , max=1388.6K, avg=145049.22, stdev=107232.55
clat (usec): min=0 , max=1465.9K, avg=133585.67, stdev=110322.95
  cpu : usr=44.08%, sys=590.71%, ctx=451812322, majf=14841, minf=17648641
  cpu : usr=48.73%, sys=610.78%, ctx=418953997, majf=22164, minf=26850689
  cpu : usr=45.58%, sys=581.16%, ctx=714079216, majf=21497, minf=22558223
  cpu : usr=48.40%, sys=599.65%, ctx=656089423, majf=16393, minf=23824409

3) Fio test is performed in a 8 vcpu guest with normal SAT

[PATCH V7 0/2] Improve virtio-blk performance

2012-08-08 Thread Asias He
Hi, all

Changes in v7:
- Using vbr->flags to trace request type
- Dropped unnecessary struct virtio_blk *vblk parameter
- Reuse struct virtblk_req in bio done function
- Added performance data on a normal SATA device and the reason why we make it optional

Fio tests show the bio-based IO path gives the following performance improvements:

1) Ramdisk device
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost : 28%, 24%, 21%, 16%
 Latency improvement: 32%, 17%, 21%, 16%
2) Fusion IO device
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost : 11%, 11%, 13%, 10%
 Latency improvement: 10%, 10%, 12%, 10%
3) Normal SATA device
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost : -10%, -10%, 4.4%, 0.5%
 Latency improvement: -12%, -15%, 2.5%, 0.8%

Asias He (2):
  virtio-blk: Add bio-based IO path for virtio-blk
  virtio-blk: Add REQ_FLUSH and REQ_FUA support to bio path

 drivers/block/virtio_blk.c | 301 +++--
 1 file changed, 264 insertions(+), 37 deletions(-)

-- 
1.7.11.2
