Re: [PATCH v6 1/3] mm: introduce compaction and migration for virtio ballooned pages
Hi Rafael,

On Wed, Aug 08, 2012 at 07:53:19PM -0300, Rafael Aquini wrote:
> Memory fragmentation introduced by ballooning might reduce significantly
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
>
> This patch introduces the helper functions as well as the necessary changes
> to teach compaction and migration bits how to cope with pages which are
> part of a guest memory balloon, in order to make them movable by memory
> compaction procedures.
>
> Signed-off-by: Rafael Aquini
> ---
>  include/linux/mm.h |  17 +++
>  mm/compaction.c    | 131 +
>  mm/migrate.c       |  30 +++-
>  3 files changed, 158 insertions(+), 20 deletions(-)
Re: [PATCH v6 3/3] mm: add vm event counters for balloon pages compaction
On 08/08/2012 06:53 PM, Rafael Aquini wrote:
> This patch is only for testing report purposes and shall be dropped in case
> of the rest of this patchset getting accepted for merging.

I wonder if it would make sense to just keep these statistics, so if a
change breaks balloon migration in the future, we'll be able to see it...

> Signed-off-by: Rafael Aquini

Acked-by: Rik van Riel

-- 
All rights reversed
Re: [PATCH v6 2/3] virtio_balloon: introduce migration primitives to balloon pages
On 08/08/2012 06:53 PM, Rafael Aquini wrote:
> Memory fragmentation introduced by ballooning might reduce significantly
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
>
> Besides making balloon pages movable at allocation time and introducing
> the necessary primitives to perform balloon page migration/compaction,
> this patch also introduces the following locking scheme to provide the
> proper synchronization and protection for struct virtio_balloon elements
> against concurrent accesses due to parallel operations introduced by
> memory compaction / page migration.
>
>  - balloon_lock (mutex) : synchronizes the access demand to elements of
>                           struct virtio_balloon and its queue operations;
>  - pages_lock (spinlock): special protection to balloon pages list against
>                           concurrent list handling operations;
>
> Signed-off-by: Rafael Aquini

Reviewed-by: Rik van Riel

-- 
All rights reversed
Re: [PATCH v6 1/3] mm: introduce compaction and migration for virtio ballooned pages
On 08/08/2012 06:53 PM, Rafael Aquini wrote:
> Memory fragmentation introduced by ballooning might reduce significantly
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
>
> This patch introduces the helper functions as well as the necessary changes
> to teach compaction and migration bits how to cope with pages which are
> part of a guest memory balloon, in order to make them movable by memory
> compaction procedures.
>
> Signed-off-by: Rafael Aquini

Reviewed-by: Rik van Riel

-- 
All rights reversed
[PATCH v6 2/3] virtio_balloon: introduce migration primitives to balloon pages
Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

Besides making balloon pages movable at allocation time and introducing
the necessary primitives to perform balloon page migration/compaction,
this patch also introduces the following locking scheme to provide the
proper synchronization and protection for struct virtio_balloon elements
against concurrent accesses due to parallel operations introduced by
memory compaction / page migration.

 - balloon_lock (mutex) : synchronizes the access demand to elements of
                          struct virtio_balloon and its queue operations;
 - pages_lock (spinlock): special protection to balloon pages list against
                          concurrent list handling operations;

Signed-off-by: Rafael Aquini
---
 drivers/virtio/virtio_balloon.c | 138 +---
 include/linux/virtio_balloon.h  |   4 ++
 2 files changed, 134 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0908e60..7c937a0 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include

 /*
  * Balloon device works in 4K page units. So each page is pointed to by
@@ -35,6 +36,12 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)

+/* Synchronizes accesses/updates to the struct virtio_balloon elements */
+DEFINE_MUTEX(balloon_lock);
+
+/* Protects 'virtio_balloon->pages' list against concurrent handling */
+DEFINE_SPINLOCK(pages_lock);
+
 struct virtio_balloon
 {
	struct virtio_device *vdev;
@@ -51,6 +58,7 @@ struct virtio_balloon
	/* Number of balloon pages we've told the Host we're not using. */
	unsigned int num_pages;
+
	/*
	 * The pages we've told the Host we're not using.
	 * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
@@ -125,10 +133,12 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
	/* We can only do one array worth at a time. */
	num = min(num, ARRAY_SIZE(vb->pfns));

+	mutex_lock(&balloon_lock);
	for (vb->num_pfns = 0; vb->num_pfns < num;
	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-		struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
-					__GFP_NOMEMALLOC | __GFP_NOWARN);
+		struct page *page = alloc_page(GFP_HIGHUSER_MOVABLE |
+					__GFP_NORETRY | __GFP_NOWARN |
+					__GFP_NOMEMALLOC);
		if (!page) {
			if (printk_ratelimit())
				dev_printk(KERN_INFO, &vb->vdev->dev,
@@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
		set_page_pfns(vb->pfns + vb->num_pfns, page);
		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
		totalram_pages--;
+		spin_lock(&pages_lock);
		list_add(&page->lru, &vb->pages);
+		page->mapping = balloon_mapping;
+		spin_unlock(&pages_lock);
	}

	/* Didn't get any? Oh well. */
@@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
		return;

	tell_host(vb, vb->inflate_vq);
+	mutex_unlock(&balloon_lock);
 }

 static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
	/* We can only do one array worth at a time. */
	num = min(num, ARRAY_SIZE(vb->pfns));

+	mutex_lock(&balloon_lock);
	for (vb->num_pfns = 0; vb->num_pfns < num;
	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
+		/*
+		 * We can race against virtballoon_isolatepage() and end up
+		 * stumbling across a _temporarily_ empty 'pages' list.
+		 */
+		spin_lock(&pages_lock);
+		if (unlikely(list_empty(&vb->pages))) {
+			spin_unlock(&pages_lock);
+			break;
+		}
		page = list_first_entry(&vb->pages, struct page, lru);
+		page->mapping = NULL;
		list_del(&page->lru);
+		spin_unlock(&pages_lock);
		set_page_pfns(vb->pfns + vb->num_pfns, page);
		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
	}
@@ -182,8 +208,11 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
	 * is true, we *have* to
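[The archived message above is cut off before the isolation/putback hunks.
The sketch below only illustrates how callbacks like the
virtballoon_isolatepage() referenced in the leak_balloon() comment could use
pages_lock, consistent with the locking scheme described in the changelog.
It is not the literal patch content: virtballoon_putbackpage() and the
file-scope 'the_vb' pointer are hypothetical stand-ins for however the real
driver reaches its struct virtio_balloon instance from the callbacks.]

/* Hypothetical handle on the driver's single balloon instance */
static struct virtio_balloon *the_vb;

/* a_ops->invalidatepage: detach one inflated page so compaction can move it */
static void virtballoon_isolatepage(struct page *page, unsigned long offset)
{
	/* plain list manipulation only needs pages_lock */
	spin_lock(&pages_lock);
	list_del(&page->lru);	/* leak_balloon() may now see an empty list */
	spin_unlock(&pages_lock);
}

/* a_ops->freepage: reinsert an isolated page into the balloon's page list */
static void virtballoon_putbackpage(struct page *page)
{
	spin_lock(&pages_lock);
	list_add(&page->lru, &the_vb->pages);
	spin_unlock(&pages_lock);
}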
[PATCH v6 0/3] make balloon pages movable by compaction
Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch-set follows the main idea discussed at 2012 LSFMMS session:
"Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/
to introduce the required changes to the virtio_balloon driver, as well as
the changes to the core compaction & migration bits, in order to make those
subsystems aware of ballooned pages and allow memory balloon pages to become
movable within a guest, thus avoiding the aforementioned fragmentation issue.

Rafael Aquini (3):
  mm: introduce compaction and migration for virtio ballooned pages
  virtio_balloon: introduce migration primitives to balloon pages
  mm: add vm event counters for balloon pages compaction

 drivers/virtio/virtio_balloon.c | 139 +---
 include/linux/mm.h              |  17 +
 include/linux/virtio_balloon.h  |   4 ++
 include/linux/vm_event_item.h   |   2 +
 mm/compaction.c                 | 132 --
 mm/migrate.c                    |  32 -
 mm/vmstat.c                     |   4 ++
 7 files changed, 302 insertions(+), 28 deletions(-)

Change log:

v6:
 * rename 'is_balloon_page()' to 'movable_balloon_page()' (Rik);
v5:
 * address Andrew Morton's review comments on the patch series;
 * address a couple extra nitpick suggestions on PATCH 01 (Minchan);
v4:
 * address Rusty Russell's review comments on PATCH 02;
 * re-base virtio_balloon patch on 9c378abc5c0c6fc8e3acf5968924d274503819b3;
v3:
 * address reviewers nitpick suggestions on PATCH 01 (Mel, Minchan);
v2:
 * address Mel Gorman's review comments on PATCH 01;

Preliminary test results:
(2 VCPU 1024mB RAM KVM guest running 3.6.0_rc1+ -- after a reboot)

* 64mB balloon:

[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 0
compact_pages_moved 0
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 0
compact_balloon_failed 0
compact_balloon_isolated 0
compact_balloon_freed 0
[root@localhost ~]#
[root@localhost ~]# for i in $(seq 1 6); do echo 1 > /proc/sys/vm/compact_memory & done &>/dev/null
[1]   Done          echo 1 > /proc/sys/vm/compact_memory
[2]   Done          echo 1 > /proc/sys/vm/compact_memory
[3]   Done          echo 1 > /proc/sys/vm/compact_memory
[4]   Done          echo 1 > /proc/sys/vm/compact_memory
[5]-  Done          echo 1 > /proc/sys/vm/compact_memory
[6]+  Done          echo 1 > /proc/sys/vm/compact_memory
[root@localhost ~]#
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 3520
compact_pages_moved 47548
compact_pagemigrate_failed 120
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 16378
compact_balloon_failed 0
compact_balloon_isolated 16378
compact_balloon_freed 16378

* 128mB balloon:

[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 0
compact_pages_moved 0
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 0
compact_balloon_failed 0
compact_balloon_isolated 0
compact_balloon_freed 0
[root@localhost ~]#
[root@localhost ~]# for i in $(seq 1 6); do echo 1 > /proc/sys/vm/compact_memory & done &>/dev/null
[1]   Done          echo 1 > /proc/sys/vm/compact_memory
[2]   Done          echo 1 > /proc/sys/vm/compact_memory
[3]   Done          echo 1 > /proc/sys/vm/compact_memory
[4]   Done          echo 1 > /proc/sys/vm/compact_memory
[5]-  Done          echo 1 > /proc/sys/vm/compact_memory
[6]+  Done          echo 1 > /proc/sys/vm/compact_memory
[root@localhost ~]#
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 3356
compact_pages_moved 47099
compact_pagemigrate_failed 158
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 26275
compact_balloon_failed 42
compact_balloon_isolated 26317
compact_balloon_freed 26275

-- 
1.7.11.2
[PATCH v6 3/3] mm: add vm event counters for balloon pages compaction
This patch is only for testing report purposes and shall be dropped in case
of the rest of this patchset getting accepted for merging.

Signed-off-by: Rafael Aquini
---
 drivers/virtio/virtio_balloon.c | 1 +
 include/linux/vm_event_item.h   | 2 ++
 mm/compaction.c                 | 1 +
 mm/migrate.c                    | 6 --
 mm/vmstat.c                     | 4
 5 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 7c937a0..b8f7ea5 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -414,6 +414,7 @@ int virtballoon_migratepage(struct address_space *mapping,

	mutex_unlock(&balloon_lock);

+	count_vm_event(COMPACTBALLOONMIGRATED);
	return 0;
 }

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 57f7b10..a632a5d 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -41,6 +41,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_COMPACTION
		COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
		COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
+		COMPACTBALLOONMIGRATED, COMPACTBALLOONFAILED,
+		COMPACTBALLOONISOLATED, COMPACTBALLOONFREED,
 #endif
 #ifdef CONFIG_HUGETLB_PAGE
		HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
diff --git a/mm/compaction.c b/mm/compaction.c
index 7372592..5d6a344 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -77,6 +77,7 @@ bool isolate_balloon_page(struct page *page)
			    (page_count(page) == 2)) {
				__isolate_balloon_page(page);
				unlock_page(page);
+				count_vm_event(COMPACTBALLOONISOLATED);
				return true;
			}
			unlock_page(page);
diff --git a/mm/migrate.c b/mm/migrate.c
index 871a304..4115875 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -79,9 +79,10 @@ void putback_lru_pages(struct list_head *l)
		list_del(&page->lru);
		dec_zone_page_state(page, NR_ISOLATED_ANON +
				page_is_file_cache(page));
-		if (unlikely(movable_balloon_page(page)))
+		if (unlikely(movable_balloon_page(page))) {
+			count_vm_event(COMPACTBALLOONFAILED);
			WARN_ON(!putback_balloon_page(page));
-		else
+		} else
			putback_lru_page(page);
	}
 }
@@ -872,6 +873,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
					page_is_file_cache(page));
		put_page(page);
		__free_page(page);
+		count_vm_event(COMPACTBALLOONFREED);
		return rc;
	}
 out:
diff --git a/mm/vmstat.c b/mm/vmstat.c
index df7a674..8d80f60 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -768,6 +768,10 @@ const char * const vmstat_text[] = {
	"compact_stall",
	"compact_fail",
	"compact_success",
+	"compact_balloon_migrated",
+	"compact_balloon_failed",
+	"compact_balloon_isolated",
+	"compact_balloon_freed",
 #endif

 #ifdef CONFIG_HUGETLB_PAGE
-- 
1.7.11.2
[PATCH v6 1/3] mm: introduce compaction and migration for virtio ballooned pages
Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch introduces the helper functions as well as the necessary changes
to teach compaction and migration bits how to cope with pages which are
part of a guest memory balloon, in order to make them movable by memory
compaction procedures.

Signed-off-by: Rafael Aquini
---
 include/linux/mm.h |  17 +++
 mm/compaction.c    | 131 +
 mm/migrate.c       |  30 +++-
 3 files changed, 158 insertions(+), 20 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 311be90..18f978b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1662,5 +1662,22 @@ static inline unsigned int debug_guardpage_minorder(void) { return 0; }
 static inline bool page_is_guard(struct page *page) { return false; }
 #endif /* CONFIG_DEBUG_PAGEALLOC */

+#if (defined(CONFIG_VIRTIO_BALLOON) || \
+	defined(CONFIG_VIRTIO_BALLOON_MODULE)) && defined(CONFIG_COMPACTION)
+extern bool isolate_balloon_page(struct page *);
+extern bool putback_balloon_page(struct page *);
+extern struct address_space *balloon_mapping;
+
+static inline bool movable_balloon_page(struct page *page)
+{
+	return (page->mapping && page->mapping == balloon_mapping);
+}
+
+#else
+static inline bool isolate_balloon_page(struct page *page) { return false; }
+static inline bool putback_balloon_page(struct page *page) { return false; }
+static inline bool movable_balloon_page(struct page *page) { return false; }
+#endif /* (VIRTIO_BALLOON || VIRTIO_BALLOON_MODULE) && CONFIG_COMPACTION */
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/compaction.c b/mm/compaction.c
index e78cb96..7372592 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"

 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
@@ -21,6 +22,90 @@
 #define CREATE_TRACE_POINTS
 #include

+#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
+/*
+ * Balloon pages special page->mapping.
+ * Users must properly allocate and initialize an instance of balloon_mapping,
+ * and set it as the page->mapping for balloon enlisted page instances.
+ * There is no need on utilizing struct address_space locking schemes for
+ * balloon_mapping as, once it gets initialized at balloon driver, it will
+ * remain just like a static reference that helps us on identifying a guest
+ * ballooned page by its mapping, as well as it will keep the 'a_ops' callback
+ * pointers to the functions that will execute the balloon page mobility tasks.
+ *
+ * address_space_operations necessary methods for ballooned pages:
+ *   .migratepage    - used to perform balloon's page migration (as is)
+ *   .invalidatepage - used to isolate a page from balloon's page list
+ *   .freepage       - used to reinsert an isolated page to balloon's page list
+ */
+struct address_space *balloon_mapping;
+EXPORT_SYMBOL_GPL(balloon_mapping);
+
+static inline void __isolate_balloon_page(struct page *page)
+{
+	page->mapping->a_ops->invalidatepage(page, 0);
+}
+
+static inline void __putback_balloon_page(struct page *page)
+{
+	page->mapping->a_ops->freepage(page);
+}
+
+/* __isolate_lru_page() counterpart for a ballooned page */
+bool isolate_balloon_page(struct page *page)
+{
+	if (WARN_ON(!movable_balloon_page(page)))
+		return false;
+
+	if (likely(get_page_unless_zero(page))) {
+		/*
+		 * As balloon pages are not isolated from LRU lists, concurrent
+		 * compaction threads can race against page migration functions
+		 * move_to_new_page() & __unmap_and_move().
+		 * In order to avoid having an already isolated balloon page
+		 * being (wrongly) re-isolated while it is under migration,
+		 * lets be sure we have the page lock before proceeding with
+		 * the balloon page isolation steps.
+		 */
+		if (likely(trylock_page(page))) {
+			/*
+			 * A ballooned page, by default, has just one refcount.
+			 * Prevent concurrent compaction threads from isolating
+			 * an already isolated balloon page.
+			 */
+			if (movable_balloon_page(page) &&
+			    (page_count(page) == 2)) {
+				__isolate_balloon_page(page);
+				unlock_page(page);
+				return true;
+			}
+			unlock_page(page);
+		}
+		/* Drop refcount t
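[The comment block above says the balloon driver must allocate a
balloon_mapping and wire the three a_ops callbacks. The relevant hunk of
patch 2/3 is truncated in this archive, so the snippet below is only a
hedged sketch of one way that wiring could look. virtballoon_migratepage()
and virtballoon_isolatepage() are names that appear elsewhere in this series;
virtballoon_putbackpage() and virtballoon_setup_mapping() are hypothetical.]

static const struct address_space_operations virtio_balloon_aops = {
	.migratepage	= virtballoon_migratepage,	/* move a ballooned page */
	.invalidatepage	= virtballoon_isolatepage,	/* take it off vb->pages */
	.freepage	= virtballoon_putbackpage,	/* reinsert after migration */
};

/* hypothetical probe-time helper; a real driver would fully set this up */
static int virtballoon_setup_mapping(struct virtio_balloon *vb)
{
	struct address_space *mapping;

	mapping = kzalloc(sizeof(*mapping), GFP_KERNEL);
	if (!mapping)
		return -ENOMEM;

	mapping->a_ops = &virtio_balloon_aops;
	/* from here on, compaction recognizes vb's pages by this mapping */
	balloon_mapping = mapping;
	return 0;
}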
[PATCH V7 2/2] virtio-blk: Add REQ_FLUSH and REQ_FUA support to bio path
We need to support both REQ_FLUSH and REQ_FUA for bio based path since
it does not get the sequencing of REQ_FUA into REQ_FLUSH that request
based drivers can request.

REQ_FLUSH is emulated by:

A) If the bio has no data to write:
   1. Send VIRTIO_BLK_T_FLUSH to device,
   2. In the flush I/O completion handler, finish the bio

B) If the bio has data to write:
   1. Send VIRTIO_BLK_T_FLUSH to device
   2. In the flush I/O completion handler, send the actual write data to device
   3. In the write I/O completion handler, finish the bio

REQ_FUA is emulated by:
   1. Send the actual write data to device
   2. In the write I/O completion handler, send VIRTIO_BLK_T_FLUSH to device
   3. In the flush I/O completion handler, finish the bio

Changes in v7:
- Using vbr->flags to trace request type
- Dropped unnecessary struct virtio_blk *vblk parameter
- Reuse struct virtblk_req in bio done function

Changes in v6:
- Reworked REQ_FLUSH and REQ_FUA emulation order

Cc: Rusty Russell
Cc: Jens Axboe
Cc: Christoph Hellwig
Cc: Tejun Heo
Cc: Shaohua Li
Cc: "Michael S. Tsirkin"
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Asias He
---
 drivers/block/virtio_blk.c | 272 +++--
 1 file changed, 188 insertions(+), 84 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 95cfeed..2edfb5c 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -58,10 +58,20 @@ struct virtblk_req
	struct bio *bio;
	struct virtio_blk_outhdr out_hdr;
	struct virtio_scsi_inhdr in_hdr;
+	struct work_struct work;
+	struct virtio_blk *vblk;
+	int flags;
	u8 status;
	struct scatterlist sg[];
 };

+enum {
+	VBLK_IS_FLUSH	= 1,
+	VBLK_REQ_FLUSH	= 2,
+	VBLK_REQ_DATA	= 4,
+	VBLK_REQ_FUA	= 8,
+};
+
 static inline int virtblk_result(struct virtblk_req *vbr)
 {
	switch (vbr->status) {
@@ -74,9 +84,133 @@ static inline int virtblk_result(struct virtblk_req *vbr)
	}
 }

-static inline void virtblk_request_done(struct virtio_blk *vblk,
-					struct virtblk_req *vbr)
+static inline struct virtblk_req *virtblk_alloc_req(struct virtio_blk *vblk,
+						    gfp_t gfp_mask)
+{
+	struct virtblk_req *vbr;
+
+	vbr = mempool_alloc(vblk->pool, gfp_mask);
+	if (vbr && use_bio)
+		sg_init_table(vbr->sg, vblk->sg_elems);
+
+	vbr->vblk = vblk;
+
+	return vbr;
+}
+
+static void virtblk_add_buf_wait(struct virtio_blk *vblk,
+				 struct virtblk_req *vbr,
+				 unsigned long out,
+				 unsigned long in)
+{
+	DEFINE_WAIT(wait);
+
+	for (;;) {
+		prepare_to_wait_exclusive(&vblk->queue_wait, &wait,
+					  TASK_UNINTERRUPTIBLE);
+
+		spin_lock_irq(vblk->disk->queue->queue_lock);
+		if (virtqueue_add_buf(vblk->vq, vbr->sg, out, in, vbr,
+				      GFP_ATOMIC) < 0) {
+			spin_unlock_irq(vblk->disk->queue->queue_lock);
+			io_schedule();
+		} else {
+			virtqueue_kick(vblk->vq);
+			spin_unlock_irq(vblk->disk->queue->queue_lock);
+			break;
+		}
+
+	}
+
+	finish_wait(&vblk->queue_wait, &wait);
+}
+
+static inline void virtblk_add_req(struct virtblk_req *vbr,
+				   unsigned int out, unsigned int in)
+{
+	struct virtio_blk *vblk = vbr->vblk;
+
+	spin_lock_irq(vblk->disk->queue->queue_lock);
+	if (unlikely(virtqueue_add_buf(vblk->vq, vbr->sg, out, in, vbr,
+				       GFP_ATOMIC) < 0)) {
+		spin_unlock_irq(vblk->disk->queue->queue_lock);
+		virtblk_add_buf_wait(vblk, vbr, out, in);
+		return;
+	}
+	virtqueue_kick(vblk->vq);
+	spin_unlock_irq(vblk->disk->queue->queue_lock);
+}
+
+static int virtblk_bio_send_flush(struct virtblk_req *vbr)
+{
+	unsigned int out = 0, in = 0;
+
+	vbr->flags |= VBLK_IS_FLUSH;
+	vbr->out_hdr.type = VIRTIO_BLK_T_FLUSH;
+	vbr->out_hdr.sector = 0;
+	vbr->out_hdr.ioprio = 0;
+	sg_set_buf(&vbr->sg[out++], &vbr->out_hdr, sizeof(vbr->out_hdr));
+	sg_set_buf(&vbr->sg[out + in++], &vbr->status, sizeof(vbr->status));
+
+	virtblk_add_req(vbr, out, in);
+
+	return 0;
+}
+
+static int virtblk_bio_send_data(struct virtblk_req *vbr)
+{
+	struct virtio_blk *vblk = vbr->vblk;
+	unsigned int num, out = 0, in = 0;
+	struct bio *bio = vbr->bio;
+
+	vbr->flags &= ~VBLK_IS_FLUSH;
+	vbr->out_hdr.type = 0;
+	vbr->out_hdr.sector = bio->bi_sector;
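[The archived diff above is cut off before the completion-handler hunks, so
the sketch below only illustrates the flush/write chaining described in the
changelog. virtblk_bio_send_flush(), virtblk_bio_send_data(), the VBLK_*
flags, virtblk_result() and vblk->pool come from the diff above; the function
name virtblk_bio_done() and its exact structure are assumptions (the real
driver may also defer the re-submission to a workqueue rather than calling
the send helpers directly from completion context).]

static void virtblk_bio_done(struct virtblk_req *vbr)
{
	if (vbr->flags & VBLK_IS_FLUSH) {
		/* a flush just completed */
		if (vbr->flags & VBLK_REQ_DATA) {
			/* REQ_FLUSH with data: pre-flush done, now send the write */
			virtblk_bio_send_data(vbr);
		} else {
			/* flush-only bio, or the trailing FUA flush: all done */
			bio_endio(vbr->bio, virtblk_result(vbr));
			mempool_free(vbr, vbr->vblk->pool);
		}
	} else {
		/* a write just completed */
		if (vbr->flags & VBLK_REQ_FUA) {
			/* REQ_FUA: write done, chase it with a flush */
			vbr->flags &= ~VBLK_REQ_DATA;
			virtblk_bio_send_flush(vbr);
		} else {
			bio_endio(vbr->bio, virtblk_result(vbr));
			mempool_free(vbr, vbr->vblk->pool);
		}
	}
}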
[PATCH V7 1/2] virtio-blk: Add bio-based IO path for virtio-blk
This patch introduces a bio-based IO path for virtio-blk. Compared to the
request-based IO path, the bio-based IO path uses the driver provided
->make_request_fn() method to bypass the IO scheduler. It hands the bio to
the device directly, without allocating a request in the block layer. This
shortens the IO path in the guest kernel to achieve high IOPS and lower
latency. The downside is that the guest can not use the IO scheduler to
merge and sort requests. However, this is not a big problem if the backend
disk in the host side is a faster disk device.

When the bio-based IO path is not enabled, virtio-blk still uses the
original request-based IO path and no performance difference is observed.
Using a slow device, e.g. a normal SATA disk, the bio-based IO path is
slower than the request-based IO path for sequential read and write due to
the lack of merging in the guest kernel. So we make the bio-based path
optional.

Performance evaluation:
---
1) Fio test is performed in an 8 vcpu guest with a ramdisk-based backend,
using kvm tool.

Short version:
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost         : 28%, 24%, 21%, 16%
 Latency improvement: 32%, 17%, 21%, 16%

Long version:
 With bio-based IO path:
  seq-read  : io=2048.0MB, bw=116996KB/s, iops=233991 , runt= 17925msec
  seq-write : io=2048.0MB, bw=100829KB/s, iops=201658 , runt= 20799msec
  rand-read : io=3095.7MB, bw=112134KB/s, iops=224268 , runt= 28269msec
  rand-write: io=3095.7MB, bw=96198KB/s,  iops=192396 , runt= 32952msec
    clat (usec): min=0 , max=2631.6K, avg=58716.99, stdev=191377.30
    clat (usec): min=0 , max=1753.2K, avg=66423.25, stdev=81774.35
    clat (usec): min=0 , max=2915.5K, avg=61685.70, stdev=120598.39
    clat (usec): min=0 , max=1933.4K, avg=76935.12, stdev=96603.45
    cpu : usr=74.08%, sys=703.84%, ctx=29661403,  majf=21354, minf=22460954
    cpu : usr=70.92%, sys=702.81%, ctx=77219828,  majf=13980, minf=27713137
    cpu : usr=72.23%, sys=695.37%, ctx=88081059,  majf=18475, minf=28177648
    cpu : usr=69.69%, sys=654.13%, ctx=145476035, majf=15867, minf=26176375
 With request-based IO path:
  seq-read  : io=2048.0MB, bw=91074KB/s, iops=182147 , runt= 23027msec
  seq-write : io=2048.0MB, bw=80725KB/s, iops=161449 , runt= 25979msec
  rand-read : io=3095.7MB, bw=92106KB/s, iops=184211 , runt= 34416msec
  rand-write: io=3095.7MB, bw=82815KB/s, iops=165630 , runt= 38277msec
    clat (usec): min=0 , max=1932.4K, avg=77824.17, stdev=170339.49
    clat (usec): min=0 , max=2510.2K, avg=78023.96, stdev=146949.15
    clat (usec): min=0 , max=3037.2K, avg=74746.53, stdev=128498.27
    clat (usec): min=0 , max=1363.4K, avg=89830.75, stdev=114279.68
    cpu : usr=53.28%, sys=724.19%, ctx=37988895,  majf=17531, minf=23577622
    cpu : usr=49.03%, sys=633.20%, ctx=205935380, majf=18197, minf=27288959
    cpu : usr=55.78%, sys=722.40%, ctx=101525058, majf=19273, minf=28067082
    cpu : usr=56.55%, sys=690.83%, ctx=228205022, majf=18039, minf=26551985

2) Fio test is performed in an 8 vcpu guest with a Fusion-IO based backend,
using kvm tool.

Short version:
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost         : 11%, 11%, 13%, 10%
 Latency improvement: 10%, 10%, 12%, 10%

Long version:
 With bio-based IO path:
  read : io=2048.0MB, bw=58920KB/s, iops=117840 , runt= 35593msec
  write: io=2048.0MB, bw=64308KB/s, iops=128616 , runt= 32611msec
  read : io=3095.7MB, bw=59633KB/s, iops=119266 , runt= 53157msec
  write: io=3095.7MB, bw=62993KB/s, iops=125985 , runt= 50322msec
    clat (usec): min=0 ,  max=1284.3K, avg=128109.01, stdev=71513.29
    clat (usec): min=94 , max=962339 , avg=116832.95, stdev=65836.80
    clat (usec): min=0 ,  max=1846.6K, avg=128509.99, stdev=89575.07
    clat (usec): min=0 ,  max=2256.4K, avg=121361.84, stdev=82747.25
    cpu : usr=56.79%, sys=421.70%, ctx=147335118, majf=21080, minf=19852517
    cpu : usr=61.81%, sys=455.53%, ctx=143269950, majf=16027, minf=24800604
    cpu : usr=63.10%, sys=455.38%, ctx=178373538, majf=16958, minf=24822612
    cpu : usr=62.04%, sys=453.58%, ctx=226902362, majf=16089, minf=23278105
 With request-based IO path:
  read : io=2048.0MB, bw=52896KB/s, iops=105791 , runt= 39647msec
  write: io=2048.0MB, bw=57856KB/s, iops=115711 , runt= 36248msec
  read : io=3095.7MB, bw=52387KB/s, iops=104773 , runt= 60510msec
  write: io=3095.7MB, bw=57310KB/s, iops=114619 , runt= 55312msec
    clat (usec): min=0 , max=1532.6K, avg=142085.62, stdev=109196.84
    clat (usec): min=0 , max=1487.4K, avg=129110.71, stdev=114973.64
    clat (usec): min=0 , max=1388.6K, avg=145049.22, stdev=107232.55
    clat (usec): min=0 , max=1465.9K, avg=133585.67, stdev=110322.95
    cpu : usr=44.08%, sys=590.71%, ctx=451812322, majf=14841, minf=17648641
    cpu : usr=48.73%, sys=610.78%, ctx=418953997, majf=22164, minf=26850689
    cpu : usr=45.58%, sys=581.16%, ctx=714079216, majf=21497, minf=22558223
    cpu : usr=48.40%, sys=599.65%, ctx=656089423, majf=16393, minf=23824409

3) Fio test is performed in an 8 vcpu guest with a normal SATA
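[The message above explains that the bio path installs a ->make_request_fn()
to bypass the IO scheduler and that the path is optional; the corresponding
hunk is not part of this archived excerpt. The snippet below is a hedged
sketch of how the queue setup could branch on that choice: 'use_bio' is the
module parameter referenced by patch 2/2, virtblk_request() is the driver's
existing request_fn, and virtblk_make_request() / virtblk_alloc_queue() are
hypothetical names used only for illustration.]

static bool use_bio;
module_param(use_bio, bool, S_IRUGO);

static struct request_queue *virtblk_alloc_queue(struct virtio_blk *vblk)
{
	struct request_queue *q;

	if (use_bio) {
		/* bio path: ->make_request_fn feeds bios straight to the vring */
		q = blk_alloc_queue(GFP_KERNEL);
		if (q)
			blk_queue_make_request(q, virtblk_make_request);
	} else {
		/* request path: keep the elevator so requests can be merged/sorted */
		q = blk_init_queue(virtblk_request, NULL);
	}
	if (q)
		q->queuedata = vblk;
	return q;
}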
[PATCH V7 0/2] Improve virtio-blk performance
Hi, all

Changes in v7:
- Using vbr->flags to trace request type
- Dropped unnecessary struct virtio_blk *vblk parameter
- Reuse struct virtblk_req in bio done function
- Added performance data on normal SATA device and the reason why we make
  the bio path optional

Fio test shows the bio-based IO path gives the following performance
improvement:

1) Ramdisk device
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost         : 28%, 24%, 21%, 16%
 Latency improvement: 32%, 17%, 21%, 16%

2) Fusion IO device
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost         : 11%, 11%, 13%, 10%
 Latency improvement: 10%, 10%, 12%, 10%

3) Normal SATA device
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost         : -10%, -10%, 4.4%, 0.5%
 Latency improvement: -12%, -15%, 2.5%, 0.8%

Asias He (2):
  virtio-blk: Add bio-based IO path for virtio-blk
  virtio-blk: Add REQ_FLUSH and REQ_FUA support to bio path

 drivers/block/virtio_blk.c | 301 +++--
 1 file changed, 264 insertions(+), 37 deletions(-)

-- 
1.7.11.2