Re: [PATCH 0/6] VSOCK for Linux upstreaming
On Mon, 05 Nov 2012 10:00:38 -0800 George Zhang georgezh...@vmware.com wrote:

* * * This series of VSOCK Linux upstreaming patches includes the latest update from VMware. Summary of changes: - Add include/linux/socket.h for AF_VSOCK. - Clean up some comments. - Clean up makefiles. * * *

In an effort to improve the out-of-the-box experience with Linux kernels for VMware users, VMware is working on readying the Virtual Machine Communication Interface (vmw_vmci) and VMCI Sockets (VSOCK, vmw_vsock) kernel modules for inclusion in the Linux kernel. The purpose of this post is to acquire feedback on the vmw_vsock kernel module. The vmw_vmci kernel module was presented in an earlier post.

* * * VMCI Sockets allow virtual machines to communicate with host kernel modules and the VMware hypervisor. The VMCI Sockets kernel module depends on the VMCI kernel module. User-level applications, both in a virtual machine and on the host, can use vmw_vmci through the VMCI Sockets API, which facilitates fast and efficient communication between guest virtual machines and their host. It is a socket address family designed to be compatible with UDP and TCP at the interface level. Today, VMCI and VMCI Sockets are used by the VMware shared folders (HGFS) and various VMware Tools components inside the guest for zero-config, network-less access to VMware host services. In addition, VMware's users use VMCI Sockets for various applications where network access of the virtual machine is restricted or non-existent. Examples of this are VMs communicating with device proxies for proprietary hardware running as host applications, and automated testing of applications running within virtual machines. VMCI Sockets are similar to other socket types, such as the Berkeley UNIX socket interface. The VMCI Sockets module supports both connection-oriented stream sockets, like TCP, and connectionless datagram sockets, like UDP.
The VSOCK protocol family is defined as AF_VSOCK, and the socket operations are split between SOCK_DGRAM and SOCK_STREAM. For additional information about the use of VMCI and, in particular, VMCI Sockets, please refer to the VMCI Socket Programming Guide available at https://www.vmware.com/support/developer/vmci-sdk/.

This should go to netdev as well, since it is a new address family.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/6] VSOCK for Linux upstreaming
Never mind, mail server seemed to be overloaded today.
Re: [PATCH 1/6] VSOCK: vsock protocol implementation.
On Mon, 05 Nov 2012 10:00:51 -0800 George Zhang georgezh...@vmware.com wrote:

> +	/* Added in 2.6.10. */
> +	.owner = THIS_MODULE,

Thanks for submitting this; it will make life easier for distros that now have to go through extra effort to include out-of-mainline support for VMware. You did some scrubbing of the macros to support multiple kernel versions, but there are still some leftovers. This code seems to have a lot of these "added in version x.x.x"-type comments. They are probably not a good idea to include in the mainline kernel code.
Re: [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings
Hi Rusty,

> So, this adds another host-side virtqueue implementation. Can we combine them together conveniently? You pulled out more stuff into vring.h which is a start, but it's a bit overloaded. Perhaps we should separate the common fields into struct vring, and use it to build:
>
> struct vring_guest {
> 	struct vring vr;
> 	u16 last_used_idx;
> };
>
> struct vring_host {
> 	struct vring vr;
> 	u16 last_avail_idx;
> };
>
> I haven't looked closely at vhost to see what it wants, but I would think we could share more code.

I have played around with the code in vhost.c to explore your idea. The main issue I run into is that vhost.c is accessing user data while my new code does not. So I end up with some quirky code testing if the ring lives in user memory or not. Another issue is sparse warnings when accessing user memory. With your suggested changes I end up sharing about 100 lines of code. So in sum, I feel this adds more complexity than what we gain by sharing.

Below is an initial draft of the re-usable code. I added is_uaccess to struct virtio_ring in order to know if the ring lives in user memory. Let me know what you think.

[snip]

int virtqueue_add_used(struct vring_host *vr, unsigned int head, int len,
		       struct vring_used_elem **used)
{
	/* The virtqueue contains a ring of used buffers.  Get a pointer to the
	 * next entry in that used ring. */
	*used = &vr->vring.used->ring[vr->last_used_idx % vr->vring.num];
	if (vr->is_uaccess) {
		if (unlikely(__put_user(head, &(*used)->id))) {
			pr_debug("Failed to write used id");
			return -EFAULT;
		}
		if (unlikely(__put_user(len, &(*used)->len))) {
			pr_debug("Failed to write used len");
			return -EFAULT;
		}
		smp_wmb();
		if (__put_user(vr->last_used_idx + 1, &vr->vring.used->idx)) {
			pr_debug("Failed to increment used idx");
			return -EFAULT;
		}
	} else {
		(*used)->id = head;
		(*used)->len = len;
		smp_wmb();
		vr->vring.used->idx = vr->last_used_idx + 1;
	}
	vr->last_used_idx++;
	return 0;
}

/* Each buffer in the virtqueues is actually a chain of descriptors.  This
 * function returns the next descriptor in the chain,
 * or -1U if we're at the end. */
unsigned virtqueue_next_desc(struct vring_desc *desc)
{
	unsigned int next;

	/* If this descriptor says it doesn't chain, we're done. */
	if (!(desc->flags & VRING_DESC_F_NEXT))
		return -1U;

	/* Check they're not leading us off end of descriptors. */
	next = desc->next;
	/* Make sure compiler knows to grab that: we don't want it changing! */
	/* We will use the result as an index in an array, so most
	 * architectures only need a compiler barrier here. */
	read_barrier_depends();

	return next;
}

static int virtqueue_next_avail_desc(struct vring_host *vr)
{
	int head;
	u16 last_avail_idx;

	/* Check it isn't doing very strange things with descriptor numbers. */
	last_avail_idx = vr->last_avail_idx;
	if (vr->is_uaccess) {
		if (__get_user(vr->avail_idx, &vr->vring.avail->idx)) {
			pr_debug("Failed to access avail idx at %p\n",
				 &vr->vring.avail->idx);
			return -EFAULT;
		}
	} else
		vr->avail_idx = vr->vring.avail->idx;

	if (unlikely((u16)(vr->avail_idx - last_avail_idx) > vr->vring.num)) {
		pr_debug("Guest moved used index from %u to %u",
			 last_avail_idx, vr->avail_idx);
		return -EFAULT;
	}

	/* If there's nothing new since last we looked, return invalid. */
	if (vr->avail_idx == last_avail_idx)
		return vr->vring.num;

	/* Only get avail ring entries after they have been exposed by guest. */
	smp_rmb();

	/* Grab the next descriptor number they're advertising, and increment
	 * the index we've seen. */
	if (vr->is_uaccess) {
		if (unlikely(__get_user(head,
				&vr->vring.avail->ring[last_avail_idx % vr->vring.num]))) {
			pr_debug("Failed to read head: idx %d address %p\n",
				 last_avail_idx,
				 &vr->vring.avail->ring[last_avail_idx % vr->vring.num]);
			return -EFAULT;
		}
	} else
		head = vr->vring.avail->ring[last_avail_idx %
[PATCH] xen/events: xen/events: fix RCU warning
exit_idle() should be called after irq_enter(), otherwise it throws:

[2.513020] [ INFO: suspicious RCU usage. ]
[2.513076] 3.6.5 #1 Not tainted
[2.513128] -------------------------------
[2.513183] include/linux/rcupdate.h:725 rcu_read_lock() used illegally while idle!
[2.513271]
[2.513271] other info that might help us debug this:
[2.513271]
[2.513388]
[2.513388] RCU used illegally from idle CPU!
[2.513388] rcu_scheduler_active = 1, debug_locks = 1
[2.513511] RCU used illegally from extended quiescent state!
[2.513572] 1 lock held by swapper/0/0:
[2.513626]  #0:  (rcu_read_lock){..}, at: [810e9fe0] __atomic_notifier_call_chain+0x0/0x140
[2.513815]
[2.513815] stack backtrace:
[2.513897] Pid: 0, comm: swapper/0 Not tainted 3.6.5 #1
[2.513954] Call Trace:
[2.514005] <IRQ>  [811259a2] lockdep_rcu_suspicious+0xe2/0x130
[2.514107] [810ea10c] __atomic_notifier_call_chain+0x12c/0x140
[2.514169] [810e9fe0] ? atomic_notifier_chain_unregister+0x90/0x90
[2.514258] [811216cd] ? trace_hardirqs_off+0xd/0x10
[2.514318] [810ea136] atomic_notifier_call_chain+0x16/0x20
[2.514381] [810777c3] exit_idle+0x43/0x50
[2.514441] [81568865] xen_evtchn_do_upcall+0x25/0x50
[2.514503] [81aa690e] xen_do_hypervisor_callback+0x1e/0x30
[2.514562] <EOI>  [810013aa] ? hypercall_page+0x3aa/0x1000
[2.514662] [810013aa] ? hypercall_page+0x3aa/0x1000
[2.514722] [81061540] ? xen_safe_halt+0x10/0x20
[2.514782] [81075cfa] ? default_idle+0xba/0x570
[2.514841] [810778af] ? cpu_idle+0xdf/0x140
[2.514900] [81a4d881] ? rest_init+0x135/0x144
[2.514960] [81a4d74c] ? csum_partial_copy_generic+0x16c/0x16c
[2.515022] [82520c45] ? start_kernel+0x3db/0x3e8
[2.515081] [8252066a] ? repair_env_string+0x5a/0x5a
[2.515141] [82520356] ? x86_64_start_reservations+0x131/0x135
[2.515202] [82524aca] ? xen_start_kernel+0x465/0x46

Signed-off-by: Mojiong Qiu mj...@tencent.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Jeremy Fitzhardinge jer...@goop.org
Cc: xen-de...@lists.xensource.com
Cc: virtualization@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
Cc: sta...@kernel.org (at least to 3.0.y)
---
 drivers/xen/events.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 912ac81..0be4df3 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -1395,10 +1395,10 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
 {
 	struct pt_regs *old_regs = set_irq_regs(regs);
 
+	irq_enter();
 #ifdef CONFIG_X86
 	exit_idle();
 #endif
-	irq_enter();
 
 	__xen_evtchn_do_upcall();
-- 
1.6.3.2
Re: [Pv-drivers] [PATCH 1/6] VSOCK: vsock protocol implementation.
Hi Stephen,

> You did some scrubbing of the macros to support multiple kernel versions, but there are still some leftovers. This code seems to have a lot of these "added in version x.x.x"-type comments. These are probably not a good idea to include in the mainline kernel code.

Thanks so much for taking a look. Sorry about that; we'll remove all such occurrences and send out a new series of patches in a bit. Thanks!

- Andy
Re: [PATCH] xen/events: xen/events: fix RCU warning
On Tue, Nov 06, 2012 at 04:08:15PM +0800, Mojiong Qiu wrote:
> exit_idle() should be called after irq_enter(), otherwise it throws:

That seems odd - wouldn't smp_x86_platform_ipi also need the same treatment [edit: I was looking at 3.0 kernel code]? Ah, this is caused by commit 98ad1cc14a5c4fd658f9d72c6ba5c86dfd3ce0d5:

    Author: Frederic Weisbecker fweis...@gmail.com
    Date:   Fri Oct 7 18:22:09 2011 +0200

        x86: Call idle notifier after irq_enter()

and it missed the Xen case, which means that any kernel v3.3 and newer needs this, but earlier ones do not. Thx. Will put it in the 3.7 tree.

> [RCU splat snipped]
>
> Signed-off-by: Mojiong Qiu mj...@tencent.com
> Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
> Cc: Jeremy Fitzhardinge jer...@goop.org
> Cc: xen-de...@lists.xensource.com
> Cc: virtualization@lists.linux-foundation.org
> Cc: linux-ker...@vger.kernel.org
> Cc: sta...@kernel.org (at least to 3.0.y)
           ^^^- vger.kernel.org

You mean 3.3.x

> ---
>  drivers/xen/events.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/xen/events.c b/drivers/xen/events.c
> index 912ac81..0be4df3 100644
> --- a/drivers/xen/events.c
> +++ b/drivers/xen/events.c
> @@ -1395,10 +1395,10 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
>  {
>  	struct pt_regs *old_regs = set_irq_regs(regs);
>
> +	irq_enter();
>  #ifdef CONFIG_X86
>  	exit_idle();
>  #endif
> -	irq_enter();
>
>  	__xen_evtchn_do_upcall();
> -- 
> 1.6.3.2
[PATCH v11 1/7] mm: adjust address_space_operations.migratepage() return code
This patch introduces MIGRATEPAGE_SUCCESS as the default return code for the address_space_operations.migratepage() method, and documents the expected return codes for that method in failure cases.

Signed-off-by: Rafael Aquini aqu...@redhat.com
---
 fs/hugetlbfs/inode.c    |  4 ++--
 include/linux/migrate.h |  7 +++
 mm/migrate.c            | 22 +++---
 3 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index c5bc355..bdeda2c 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -608,11 +608,11 @@ static int hugetlbfs_migrate_page(struct address_space *mapping,
 	int rc;
 
 	rc = migrate_huge_page_move_mapping(mapping, newpage, page);
-	if (rc)
+	if (rc != MIGRATEPAGE_SUCCESS)
 		return rc;
 	migrate_page_copy(newpage, page);
 
-	return 0;
+	return MIGRATEPAGE_SUCCESS;
 }
 
 static int hugetlbfs_statfs(struct dentry *dentry, struct kstatfs *buf)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index ce7e667..a4e886d 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -7,6 +7,13 @@
 
 typedef struct page *new_page_t(struct page *, unsigned long private, int **);
 
+/*
+ * Return values from address_space_operations.migratepage():
+ * - negative errno on page migration failure;
+ * - zero on page migration success;
+ */
+#define MIGRATEPAGE_SUCCESS	0
+
 #ifdef CONFIG_MIGRATION
 
 extern void putback_lru_pages(struct list_head *l);
diff --git a/mm/migrate.c b/mm/migrate.c
index 77ed2d7..98c7a89 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -286,7 +286,7 @@ static int migrate_page_move_mapping(struct address_space *mapping,
 		/* Anonymous page without mapping */
 		if (page_count(page) != 1)
 			return -EAGAIN;
-		return 0;
+		return MIGRATEPAGE_SUCCESS;
 	}
 
 	spin_lock_irq(&mapping->tree_lock);
@@ -356,7 +356,7 @@ static int migrate_page_move_mapping(struct address_space *mapping,
 	}
 	spin_unlock_irq(&mapping->tree_lock);
 
-	return 0;
+	return MIGRATEPAGE_SUCCESS;
 }
 
@@ -372,7 +372,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
 	if (!mapping) {
 		if (page_count(page) != 1)
 			return -EAGAIN;
-		return 0;
+		return MIGRATEPAGE_SUCCESS;
 	}
 
 	spin_lock_irq(&mapping->tree_lock);
@@ -399,7 +399,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
 	page_unfreeze_refs(page, expected_count - 1);
 
 	spin_unlock_irq(&mapping->tree_lock);
-	return 0;
+	return MIGRATEPAGE_SUCCESS;
 }
 
@@ -486,11 +486,11 @@ int migrate_page(struct address_space *mapping,
 	rc = migrate_page_move_mapping(mapping, newpage, page, NULL, mode);
 
-	if (rc)
+	if (rc != MIGRATEPAGE_SUCCESS)
 		return rc;
 
 	migrate_page_copy(newpage, page);
-	return 0;
+	return MIGRATEPAGE_SUCCESS;
 }
 EXPORT_SYMBOL(migrate_page);
 
@@ -513,7 +513,7 @@ int buffer_migrate_page(struct address_space *mapping,
 	rc = migrate_page_move_mapping(mapping, newpage, page, head, mode);
 
-	if (rc)
+	if (rc != MIGRATEPAGE_SUCCESS)
 		return rc;
 
 	/*
@@ -549,7 +549,7 @@ int buffer_migrate_page(struct address_space *mapping,
 	} while (bh != head);
 
-	return 0;
+	return MIGRATEPAGE_SUCCESS;
 }
 EXPORT_SYMBOL(buffer_migrate_page);
 #endif
@@ -814,7 +814,7 @@ skip_unmap:
 		put_anon_vma(anon_vma);
 
 uncharge:
-	mem_cgroup_end_migration(mem, page, newpage, rc == 0);
+	mem_cgroup_end_migration(mem, page, newpage, rc == MIGRATEPAGE_SUCCESS);
 unlock:
 	unlock_page(page);
 out:
@@ -987,7 +987,7 @@ int migrate_pages(struct list_head *from,
 		case -EAGAIN:
 			retry++;
 			break;
-		case 0:
+		case MIGRATEPAGE_SUCCESS:
 			break;
 		default:
 			/* Permanent failure */
@@ -1024,7 +1024,7 @@ int migrate_huge_page(struct page *hpage, new_page_t get_new_page,
 			/* try again */
 			cond_resched();
 			break;
-		case 0:
+		case MIGRATEPAGE_SUCCESS:
 			goto out;
 		default:
 			rc = -EIO;
-- 
1.7.11.7
[PATCH v11 0/7] make balloon pages movable by compaction
Memory fragmentation introduced by ballooning might significantly reduce the number of 2MB contiguous memory blocks that can be used within a guest, thus imposing performance penalties associated with the reduced number of transparent huge pages that could be used by the guest workload.

This patch set follows the main idea discussed at the 2012 LSFMMS session "Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/ -- to introduce the required changes to the virtio_balloon driver, as well as the changes to the core compaction and migration bits, in order to make those subsystems aware of ballooned pages and allow memory balloon pages to become movable within a guest, thus avoiding the aforementioned fragmentation issue.

The following numbers demonstrate the benefit of this patch set in allowing compaction to be more effective in memory-ballooned guests. Results for the STRESS-HIGHALLOC benchmark, from Mel Gorman's mmtests suite, running on a 4GB RAM KVM guest which was ballooning 512MB RAM in 64MB chunks, at every minute (inflating/deflating), while the test was running:

===BEGIN stress-highalloc

STRESS-HIGHALLOC
               highalloc-3.7     highalloc-3.7
                   rc4-clean         rc4-patch
Pass 1        55.00 ( 0.00%)    62.00 ( 7.00%)
Pass 2        54.00 ( 0.00%)    62.00 ( 8.00%)
while Rested  75.00 ( 0.00%)    80.00 ( 5.00%)

MMTests Statistics: duration
               3.7         3.7
         rc4-clean   rc4-patch
User       1207.59     1207.46
System     1300.55     1299.61
Elapsed    2273.72     2157.06

MMTests Statistics: vmstat
                                 3.7         3.7
                           rc4-clean   rc4-patch
Page Ins                     3581516     2374368
Page Outs                   11148692    10410332
Swap Ins                          80          47
Swap Outs                       3641         476
Direct pages scanned           37978       33826
Kswapd pages scanned         1828245     1342869
Kswapd pages reclaimed       1710236     1304099
Direct pages reclaimed         32207       31005
Kswapd efficiency                93%         97%
Kswapd velocity              804.077     622.546
Direct efficiency                84%         91%
Direct velocity               16.703      15.682
Percentage direct scans           2%          2%
Page writes by reclaim         79252        9704
Page writes file               75611        9228
Page writes anon                3641         476
Page reclaim immediate         16764       11014
Page rescued immediate             0           0
Slabs scanned                2171904     2152448
Direct inode steals             3852         261
Kswapd inode steals           659137      609670
Kswapd skipped wait                1          69
THP fault alloc                  546         631
THP collapse alloc               361         339
THP splits                       259         263
THP fault fallback                98          50
THP collapse fail                 20          17
Compaction stalls                747         499
Compaction success               244         145
Compaction failures              503         354
Compaction pages moved        370888      474837
Compaction move failure        77378       65259

===END stress-highalloc

Rafael Aquini (7):
  mm: adjust address_space_operations.migratepage() return code
  mm: redefine address_space.assoc_mapping
  mm: introduce a common interface for balloon pages mobility
  mm: introduce compaction and migration for ballooned pages
  virtio_balloon: introduce migration primitives to balloon pages
  mm: introduce putback_movable_pages()
  mm: add vm event counters for balloon pages compaction

 drivers/virtio/virtio_balloon.c    | 136 +--
 fs/buffer.c                        |  12 +-
 fs/gfs2/glock.c                    |   2 +-
 fs/hugetlbfs/inode.c               |   4 +-
 fs/inode.c                         |   2 +-
 fs/nilfs2/page.c                   |   2 +-
 include/linux/balloon_compaction.h | 220 ++
 include/linux/fs.h                 |   2 +-
 include/linux/migrate.h            |  19 +++
 include/linux/pagemap.h            |  16 +++
 include/linux/vm_event_item.h      |   8 +-
 mm/Kconfig                         |  15 ++
 mm/Makefile                        |   1 +
 mm/balloon_compaction.c            | 271 +
 mm/compaction.c                    |  27 +++-
 mm/migrate.c                       |  77 +--
 mm/page_alloc.c                    |   2 +-
 mm/vmstat.c                        |  10 +-
 18 files changed, 782 insertions(+), 44 deletions(-)
 create mode 100644 include/linux/balloon_compaction.h
 create mode 100644 mm/balloon_compaction.c

Change log:
v11:
 * Address AKPM's last review suggestions;
 * Extend the balloon compaction common API and simplify its usage at driver;
 * Minor nitpicks on code commentary;
v10:
 * Adjust leak_balloon() wait_event logic to make a clear locking scheme (MST);
 * Drop the RCU
[PATCH v11 5/7] virtio_balloon: introduce migration primitives to balloon pages
Memory fragmentation introduced by ballooning might significantly reduce the number of 2MB contiguous memory blocks that can be used within a guest, thus imposing performance penalties associated with the reduced number of transparent huge pages that could be used by the guest workload.

Besides making balloon pages movable at allocation time and introducing the necessary primitives to perform balloon page migration/compaction, this patch also introduces the following locking scheme, in order to enhance the synchronization methods for accessing elements of struct virtio_balloon, thus providing protection against concurrent access introduced by parallel memory migration threads.

 - balloon_lock (mutex): synchronizes the access demand to elements of struct virtio_balloon and its queue operations;

Signed-off-by: Rafael Aquini aqu...@redhat.com
---
 drivers/virtio/virtio_balloon.c | 135 
 1 file changed, 123 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0908e60..69eede7 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,6 +27,7 @@
 #include <linux/delay.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#include <linux/balloon_compaction.h>
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -34,6 +35,7 @@
  * page units.
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
+#define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
 
 struct virtio_balloon
 {
@@ -52,15 +54,19 @@ struct virtio_balloon
 	/* Number of balloon pages we've told the Host we're not using. */
 	unsigned int num_pages;
 	/*
-	 * The pages we've told the Host we're not using.
+	 * The pages we've told the Host we're not using are enqueued
+	 * at vb_dev_info->pages list.
 	 * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
 	 * to num_pages above.
 	 */
-	struct list_head pages;
+	struct balloon_dev_info *vb_dev_info;
+
+	/* Synchronize access/update to this struct virtio_balloon elements */
+	struct mutex balloon_lock;
 
 	/* The array of pfns we tell the Host about. */
 	unsigned int num_pfns;
-	u32 pfns[256];
+	u32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
 
 	/* Memory statistics */
 	int need_stats_update;
@@ -122,18 +128,25 @@ static void set_page_pfns(u32 pfns[], struct page *page)
 
 static void fill_balloon(struct virtio_balloon *vb, size_t num)
 {
+	struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
+
+	static DEFINE_RATELIMIT_STATE(fill_balloon_rs,
+				      DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
 
+	mutex_lock(&vb->balloon_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
 	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-		struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
-					__GFP_NOMEMALLOC | __GFP_NOWARN);
+		struct page *page = balloon_page_enqueue(vb_dev_info);
+
 		if (!page) {
-			if (printk_ratelimit())
+			if (__ratelimit(&fill_balloon_rs))
 				dev_printk(KERN_INFO, &vb->vdev->dev,
 					   "Out of puff! Can't get %zu pages\n",
-					   num);
+					   VIRTIO_BALLOON_PAGES_PER_PAGE);
 			/* Sleep for at least 1/5 of a second before retry. */
 			msleep(200);
 			break;
@@ -141,7 +154,6 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
 		set_page_pfns(vb->pfns + vb->num_pfns, page);
 		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
 		totalram_pages--;
-		list_add(&page->lru, &vb->pages);
 	}
 
 	/* Didn't get any?  Oh well. */
@@ -149,6 +161,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
 		return;
 
 	tell_host(vb, vb->inflate_vq);
+	mutex_unlock(&vb->balloon_lock);
 }
 
 static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -165,14 +178,17 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
 static void leak_balloon(struct virtio_balloon *vb, size_t num)
 {
 	struct page *page;
+	struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
 
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
 
+	mutex_lock(&vb->balloon_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
 	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-
[PATCH v11 6/7] mm: introduce putback_movable_pages()
The patch "mm: introduce compaction and migration for virtio ballooned pages" hacks around putback_lru_pages() in order to allow ballooned pages to be re-inserted on the balloon page list as if a ballooned page were an LRU page. As ballooned pages are not legitimate LRU pages, this patch introduces putback_movable_pages() to properly cope with cases where the isolated pageset contains ballooned pages and LRU pages, thus fixing the mentioned inelegant hack around putback_lru_pages().

Signed-off-by: Rafael Aquini aqu...@redhat.com
---
 include/linux/migrate.h |  2 ++
 mm/compaction.c         |  6 +++---
 mm/migrate.c            | 20 
 mm/page_alloc.c         |  2 +-
 4 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index e570c3c..ff074a4 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -27,6 +27,7 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
 #ifdef CONFIG_MIGRATION
 
 extern void putback_lru_pages(struct list_head *l);
+extern void putback_movable_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
 			struct page *, struct page *, enum migrate_mode);
 extern int migrate_pages(struct list_head *l, new_page_t x,
@@ -50,6 +51,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 #else
 
 static inline void putback_lru_pages(struct list_head *l) {}
+static inline void putback_movable_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t x,
 		unsigned long private, bool offlining,
 		enum migrate_mode mode) { return -ENOSYS; }
diff --git a/mm/compaction.c b/mm/compaction.c
index 76abd84..f268bd8 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -995,7 +995,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		switch (isolate_migratepages(zone, cc)) {
 		case ISOLATE_ABORT:
 			ret = COMPACT_PARTIAL;
-			putback_lru_pages(&cc->migratepages);
+			putback_movable_pages(&cc->migratepages);
 			cc->nr_migratepages = 0;
 			goto out;
 		case ISOLATE_NONE:
@@ -1018,9 +1018,9 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		trace_mm_compaction_migratepages(nr_migrate - nr_remaining,
 						nr_remaining);
 
-		/* Release LRU pages not migrated */
+		/* Release isolated pages not migrated */
 		if (err) {
-			putback_lru_pages(&cc->migratepages);
+			putback_movable_pages(&cc->migratepages);
 			cc->nr_migratepages = 0;
 			if (err == -ENOMEM) {
 				ret = COMPACT_PARTIAL;
diff --git a/mm/migrate.c b/mm/migrate.c
index 87ffe54..adb3d44 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -80,6 +80,26 @@ void putback_lru_pages(struct list_head *l)
 		list_del(&page->lru);
 		dec_zone_page_state(page, NR_ISOLATED_ANON +
 				page_is_file_cache(page));
+		putback_lru_page(page);
+	}
+}
+
+/*
+ * Put previously isolated pages back onto the appropriate lists
+ * from where they were once taken off for compaction/migration.
+ *
+ * This function shall be used instead of putback_lru_pages(),
+ * whenever the isolated pageset has been built by isolate_migratepages_range()
+ */
+void putback_movable_pages(struct list_head *l)
+{
+	struct page *page;
+	struct page *page2;
+
+	list_for_each_entry_safe(page, page2, l, lru) {
+		list_del(&page->lru);
+		dec_zone_page_state(page, NR_ISOLATED_ANON +
+				page_is_file_cache(page));
 		if (unlikely(balloon_page_movable(page)))
 			balloon_page_putback(page);
 		else
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5b74de6..1cb0f93 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5710,7 +5710,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 				    0, false, MIGRATE_SYNC);
 	}
 
-	putback_lru_pages(&cc->migratepages);
+	putback_movable_pages(&cc->migratepages);
 
 	return ret > 0 ? 0 : ret;
 }
-- 
1.7.11.7
[PATCH v11 7/7] mm: add vm event counters for balloon pages compaction
This patch introduces a new set of vm event counters to keep track of ballooned pages compaction activity.

Signed-off-by: Rafael Aquini aqu...@redhat.com
---
 drivers/virtio/virtio_balloon.c |  1 +
 include/linux/vm_event_item.h   |  8 +++-
 mm/balloon_compaction.c         |  2 ++
 mm/migrate.c                    |  1 +
 mm/vmstat.c                     | 10 +-
 5 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 69eede7..3756fc1 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -411,6 +411,7 @@ int virtballoon_migratepage(struct address_space *mapping,
 	tell_host(vb, vb->deflate_vq);
 
 	mutex_unlock(&vb->balloon_lock);
+	balloon_event_count(COMPACTBALLOONMIGRATED);
 
 	return MIGRATEPAGE_BALLOON_SUCCESS;
 }
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 3d31145..cbd72fc 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -41,7 +41,13 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_COMPACTION
 		COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
 		COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
-#endif
+#ifdef CONFIG_BALLOON_COMPACTION
+		COMPACTBALLOONISOLATED, /* isolated from balloon pagelist */
+		COMPACTBALLOONMIGRATED, /* balloon page successfully migrated */
+		COMPACTBALLOONRELEASED, /* old-page released after migration */
+		COMPACTBALLOONRETURNED, /* putback to pagelist, not-migrated */
+#endif /* CONFIG_BALLOON_COMPACTION */
+#endif /* CONFIG_COMPACTION */
 #ifdef CONFIG_HUGETLB_PAGE
 		HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
 #endif
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index 90935aa..32927eb 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -215,6 +215,7 @@ bool balloon_page_isolate(struct page *page)
 		if (__is_movable_balloon_page(page) &&
 		    page_count(page) == 2) {
 			__isolate_balloon_page(page);
+			balloon_event_count(COMPACTBALLOONISOLATED);
 			unlock_page(page);
 			return true;
 		}
@@ -237,6 +238,7 @@ void balloon_page_putback(struct page *page)
 	if (__is_movable_balloon_page(page)) {
 		__putback_balloon_page(page);
 		put_page(page);
+		balloon_event_count(COMPACTBALLOONRETURNED);
 	} else {
 		__WARN();
 		dump_page(page);
diff --git a/mm/migrate.c b/mm/migrate.c
index adb3d44..ee3037d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -896,6 +896,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
 				page_is_file_cache(page));
 		put_page(page);
 		__free_page(page);
+		balloon_event_count(COMPACTBALLOONRELEASED);
 		return 0;
 	}
 out:
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c737057..1363edc 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -781,7 +781,15 @@ const char * const vmstat_text[] = {
 	"compact_stall",
 	"compact_fail",
 	"compact_success",
-#endif
+
+#ifdef CONFIG_BALLOON_COMPACTION
+	"compact_balloon_isolated",
+	"compact_balloon_migrated",
+	"compact_balloon_released",
+	"compact_balloon_returned",
+#endif /* CONFIG_BALLOON_COMPACTION */
+
+#endif /* CONFIG_COMPACTION */
 
 #ifdef CONFIG_HUGETLB_PAGE
 	"htlb_buddy_alloc_success",
-- 
1.7.11.7
Re: [Pv-drivers] [PATCH 0/6] VSOCK for Linux upstreaming
On 11/05/12 19:19, Andy King wrote:

>> The big and only question is whether anyone can actually use any of this stuff without your proprietary bits?
>
> Do you mean the VMCI calls? The VMCI driver is in the process of being upstreamed into the drivers/misc tree. Greg (cc'd on these patches) is actively reviewing that code and we are addressing feedback. Also, there was some interest from Red Hat in using vSockets as a unified interface, routed over a hypervisor-specific transport (virtio or otherwise, although for now VMCI is the only one implemented).

Can you outline how this can be done? From a quick look over the code it seems like vsock has a hard dependency on vmci, is that correct? When making vsock a generic, reusable kernel service it should be the other way around: vsock should provide the core implementation and an interface where hypervisor-specific transports (vmci, virtio, xenbus, ...) can register themselves.

cheers,
  Gerd