Re: [PATCH 0/6] VSOCK for Linux upstreaming

2012-11-06 Thread Stephen Hemminger
On Mon, 05 Nov 2012 10:00:38 -0800
George Zhang georgezh...@vmware.com wrote:

 * * *
 This series of VSOCK Linux upstreaming patches includes the latest update
 from VMware.
 
 Summary of changes:
 - Add include/linux/socket.h for AF_VSOCK.
 - Clean up some comments.
 - Clean up makefiles.
 
 
 
 * * *
 
 In an effort to improve the out-of-the-box experience with Linux
 kernels for VMware users, VMware is working on readying the Virtual
 Machine Communication Interface (vmw_vmci) and VMCI Sockets (VSOCK)
 (vmw_vsock) kernel modules for inclusion in the Linux kernel. The
 purpose of this post is to acquire feedback on the vmw_vsock kernel
 module. The vmw_vmci kernel module was presented in an earlier post.
 
 
 * * *
 
 VMCI Sockets allow virtual machines to communicate with host kernel
 modules and the VMware hypervisors. The VMCI Sockets kernel module
 depends on the VMCI kernel module. User-level applications, both in
 a virtual machine and on the host, can use vmw_vmci through the VMCI
 Sockets API, which facilitates fast and efficient communication
 between guest virtual machines and their host. The socket address
 family is designed to be compatible with UDP and TCP at the
 interface level. Today, VMCI and VMCI Sockets are used by the VMware
 shared folders (HGFS) and various VMware Tools components inside the
 guest for zero-config, network-less access to VMware host services. In
 addition, VMware users rely on VMCI Sockets for various applications
 where network access of the virtual machine is restricted or
 non-existent. Examples are VMs communicating with device proxies for
 proprietary hardware running as host applications, and automated
 testing of applications running within virtual machines.
 
 The VMware VMCI Sockets are similar to other socket types, such as the
 Berkeley UNIX socket interface. The VMCI Sockets module supports
 both connection-oriented stream sockets, like TCP, and connectionless
 datagram sockets, like UDP. The VSOCK protocol family is defined as
 AF_VSOCK, and the socket operations are split between SOCK_DGRAM and
 SOCK_STREAM.
 
 For additional information about the use of VMCI and in particular
 VMCI Sockets, please refer to the VMCI Socket Programming Guide
 available at https://www.vmware.com/support/developer/vmci-sdk/.
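
To make the interface-level compatibility with TCP concrete, below is a rough
userspace sketch of a guest application connecting to a host service over a
stream socket in this family. The header name (linux/vm_sockets.h), the
sockaddr_vm layout, the VMADDR_CID_HOST constant and the port number are
illustrative assumptions drawn from the VMCI Sockets documentation, not
something taken from this patch series.

/* Illustrative only: connect to a service on the host over an AF_VSOCK
 * stream socket.  Header and constant names are assumptions and may not
 * match this exact series. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>	/* struct sockaddr_vm, VMADDR_CID_HOST */

int main(void)
{
	struct sockaddr_vm addr;
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0) {
		perror("socket");
		return 1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.svm_family = AF_VSOCK;
	addr.svm_cid = VMADDR_CID_HOST;	/* address the host */
	addr.svm_port = 2000;		/* arbitrary example port */

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("connect");
		close(fd);
		return 1;
	}

	write(fd, "hello", 5);
	close(fd);
	return 0;
}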
 

This should go to netdev as well since it is a new address family.


Re: [PATCH 0/6] VSOCK for Linux upstreaming

2012-11-06 Thread Stephen Hemminger
Never mind, mail server seemed to be overloaded today.


Re: [PATCH 1/6] VSOCK: vsock protocol implementation.

2012-11-06 Thread Stephen Hemminger
On Mon, 05 Nov 2012 10:00:51 -0800
George Zhang georgezh...@vmware.com wrote:

 + /* Added in 2.6.10. */
 + .owner = THIS_MODULE,

Thanks for submitting this, it will make life easier for distros
that now have to go through extra effort to include out-of-mainline
support for VMware.

You did some scrubbing of the macros to support multiple kernel
versions, but there are still some leftovers.
This code seems to have a lot of these "added in version xxx"
type comments. These are probably not a good idea to include
in the mainline kernel code.


Re: [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings

2012-11-06 Thread Sjur Brændeland
Hi Rusty,

 So, this adds another host-side virtqueue implementation.

 Can we combine them together conveniently?  You pulled out more stuff
 into vring.h which is a start, but it's a bit overloaded.
 Perhaps we should separate the common fields into struct vring, and use
 it to build:

 struct vring_guest {
 struct vring vr;
 u16 last_used_idx;
 };

 struct vring_host {
 struct vring vr;
 u16 last_avail_idx;
 };
 I haven't looked closely at vhost to see what it wants, but I would
 think we could share more code.
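
For context, the common struct vring that both wrappers above would embed is
already defined in include/linux/virtio_ring.h, roughly as follows (shown
here only to make the proposed split easier to follow):

struct vring {
	unsigned int num;

	struct vring_desc *desc;

	struct vring_avail *avail;

	struct vring_used *used;
};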

I have played around with the code in vhost.c to explore your idea.
The main issue I ran into is that vhost.c accesses user data while my new
code does not, so I end up with some quirky code testing whether the ring
lives in user memory or not.  Another issue is sparse warnings when
accessing user memory.

With your suggested changes I end up sharing about 100 lines of code.
So in sum, I feel this adds more complexity than what we gain by sharing.

Below is an initial draft of the re-usable code. I added is_uaccess to struct
virtio_ring in order to know if the ring lives in user memory.

Let me know what you think.

[snip]
int virtqueue_add_used(struct vring_host *vr, unsigned int head, int len,
		       struct vring_used_elem **used)
{
	/* The virtqueue contains a ring of used buffers.  Get a pointer to the
	 * next entry in that used ring. */
	*used = &vr->vring.used->ring[vr->last_used_idx % vr->vring.num];
	if (vr->is_uaccess) {
		if (unlikely(__put_user(head, &(*used)->id))) {
			pr_debug("Failed to write used id");
			return -EFAULT;
		}
		if (unlikely(__put_user(len, &(*used)->len))) {
			pr_debug("Failed to write used len");
			return -EFAULT;
		}
		smp_wmb();
		if (__put_user(vr->last_used_idx + 1,
			       &vr->vring.used->idx)) {
			pr_debug("Failed to increment used idx");
			return -EFAULT;
		}
	} else {
		(*used)->id = head;
		(*used)->len = len;
		smp_wmb();
		vr->vring.used->idx = vr->last_used_idx + 1;
	}
	vr->last_used_idx++;
	return 0;
}

/* Each buffer in the virtqueues is actually a chain of descriptors.  This
 * function returns the next descriptor in the chain,
 * or -1U if we're at the end. */
unsigned virtqueue_next_desc(struct vring_desc *desc)
{
	unsigned int next;

	/* If this descriptor says it doesn't chain, we're done. */
	if (!(desc->flags & VRING_DESC_F_NEXT))
		return -1U;

	/* Check they're not leading us off end of descriptors. */
	next = desc->next;
	/* Make sure compiler knows to grab that: we don't want it changing! */
	/* We will use the result as an index in an array, so most
	 * architectures only need a compiler barrier here. */
	read_barrier_depends();

	return next;
}

static int virtqueue_next_avail_desc(struct vring_host *vr)
{
	int head;
	u16 last_avail_idx;

	/* Check it isn't doing very strange things with descriptor numbers. */
	last_avail_idx = vr->last_avail_idx;
	if (vr->is_uaccess) {
		if (__get_user(vr->avail_idx, &vr->vring.avail->idx)) {
			pr_debug("Failed to access avail idx at %p\n",
				 &vr->vring.avail->idx);
			return -EFAULT;
		}
	} else
		vr->avail_idx = vr->vring.avail->idx;

	if (unlikely((u16)(vr->avail_idx - last_avail_idx) > vr->vring.num)) {
		pr_debug("Guest moved used index from %u to %u",
			 last_avail_idx, vr->avail_idx);
		return -EFAULT;
	}

	/* If there's nothing new since last we looked, return invalid. */
	if (vr->avail_idx == last_avail_idx)
		return vr->vring.num;

	/* Only get avail ring entries after they have been exposed by guest. */
	smp_rmb();

	/* Grab the next descriptor number they're advertising, and increment
	 * the index we've seen. */
	if (vr->is_uaccess) {
		if (unlikely(__get_user(head,
					&vr->vring.avail->ring[last_avail_idx
							       % vr->vring.num]))) {
			pr_debug("Failed to read head: idx %d address %p\n",
				 last_avail_idx,
				 &vr->vring.avail->ring[last_avail_idx %
							vr->vring.num]);
			return -EFAULT;
		}
	} else
		head = vr->vring.avail->ring[last_avail_idx % 

[PATCH] xen/events: fix RCU warning

2012-11-06 Thread Mojiong Qiu
exit_idle() should be called after irq_enter(), otherwise it throws:

[2.513020] [ INFO: suspicious RCU usage. ]
[2.513076] 3.6.5 #1 Not tainted
[2.513128] ---
[2.513183] include/linux/rcupdate.h:725 rcu_read_lock() used illegally 
while idle!
[2.513271]
[2.513271] other info that might help us debug this:
[2.513271]
[2.513388]
[2.513388] RCU used illegally from idle CPU!
[2.513388] rcu_scheduler_active = 1, debug_locks = 1
[2.513511] RCU used illegally from extended quiescent state!
[2.513572] 1 lock held by swapper/0/0:
[2.513626]  #0:  (rcu_read_lock){..}, at: [810e9fe0] 
__atomic_notifier_call_chain+0x0/0x140
[2.513815]
[2.513815] stack backtrace:
[2.513897] Pid: 0, comm: swapper/0 Not tainted 3.6.5 #1
[2.513954] Call Trace:
[2.514005]  IRQ  [811259a2] lockdep_rcu_suspicious+0xe2/0x130
[2.514107]  [810ea10c] __atomic_notifier_call_chain+0x12c/0x140
[2.514169]  [810e9fe0] ? 
atomic_notifier_chain_unregister+0x90/0x90
[2.514258]  [811216cd] ? trace_hardirqs_off+0xd/0x10
[2.514318]  [810ea136] atomic_notifier_call_chain+0x16/0x20
[2.514381]  [810777c3] exit_idle+0x43/0x50
[2.514441]  [81568865] xen_evtchn_do_upcall+0x25/0x50
[2.514503]  [81aa690e] xen_do_hypervisor_callback+0x1e/0x30
[2.514562]  EOI  [810013aa] ? hypercall_page+0x3aa/0x1000
[2.514662]  [810013aa] ? hypercall_page+0x3aa/0x1000
[2.514722]  [81061540] ? xen_safe_halt+0x10/0x20
[2.514782]  [81075cfa] ? default_idle+0xba/0x570
[2.514841]  [810778af] ? cpu_idle+0xdf/0x140
[2.514900]  [81a4d881] ? rest_init+0x135/0x144
[2.514960]  [81a4d74c] ? csum_partial_copy_generic+0x16c/0x16c
[2.515022]  [82520c45] ? start_kernel+0x3db/0x3e8
[2.515081]  [8252066a] ? repair_env_string+0x5a/0x5a
[2.515141]  [82520356] ? x86_64_start_reservations+0x131/0x135
[2.515202]  [82524aca] ? xen_start_kernel+0x465/0x46

Signed-off-by: Mojiong Qiu mj...@tencent.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Jeremy Fitzhardinge jer...@goop.org
Cc: xen-de...@lists.xensource.com
Cc: virtualization@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
Cc: sta...@kernel.org (at least to 3.0.y)
---
 drivers/xen/events.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 912ac81..0be4df3 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -1395,10 +1395,10 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
 {
struct pt_regs *old_regs = set_irq_regs(regs);
 
+   irq_enter();
 #ifdef CONFIG_X86
exit_idle();
 #endif
-   irq_enter();
 
__xen_evtchn_do_upcall();
 
-- 
1.6.3.2



Re: [Pv-drivers] [PATCH 1/6] VSOCK: vsock protocol implementation.

2012-11-06 Thread Andy King
Hi Stephen,

 You did some scrubbing of the macros to support multiple kernel
 versions, but there are still some leftovers.
 This code seems to have a lot of these "added in version xxx"
 type comments. These are probably not a good idea to include
 in the mainline kernel code.

Thanks so much for taking a look.  Sorry about that, we'll remove
all such occurrences and send out a new series of patches in a bit.

Thanks!
- Andy


Re: [PATCH] xen/events: fix RCU warning

2012-11-06 Thread Konrad Rzeszutek Wilk
On Tue, Nov 06, 2012 at 04:08:15PM +0800, Mojiong Qiu wrote:
 exit_idle() should be called after irq_enter(), otherwise it throws:

That seems odd - wouldn't smp_x86_platform_ipi also need the same
treatment [edit: I was looking at 3.0 kernel code]?

Ah, this is caused by

commit 98ad1cc14a5c4fd658f9d72c6ba5c86dfd3ce0d5
Author: Frederic Weisbecker fweis...@gmail.com
Date:   Fri Oct 7 18:22:09 2011 +0200

x86: Call idle notifier after irq_enter()


and it missed the Xen case, which means that any kernel v3.3 and newer
needs this, but earlier ones do not.

Thx. Will put in 3.7 tree.

 
 [2.513020] [ INFO: suspicious RCU usage. ]
 [2.513076] 3.6.5 #1 Not tainted
 [2.513128] ---
 [2.513183] include/linux/rcupdate.h:725 rcu_read_lock() used illegally 
 while idle!
 [2.513271]
 [2.513271] other info that might help us debug this:
 [2.513271]
 [2.513388]
 [2.513388] RCU used illegally from idle CPU!
 [2.513388] rcu_scheduler_active = 1, debug_locks = 1
 [2.513511] RCU used illegally from extended quiescent state!
 [2.513572] 1 lock held by swapper/0/0:
 [2.513626]  #0:  (rcu_read_lock){..}, at: [810e9fe0] 
 __atomic_notifier_call_chain+0x0/0x140
 [2.513815]
 [2.513815] stack backtrace:
 [2.513897] Pid: 0, comm: swapper/0 Not tainted 3.6.5 #1
 [2.513954] Call Trace:
 [2.514005]  IRQ  [811259a2] lockdep_rcu_suspicious+0xe2/0x130
 [2.514107]  [810ea10c] __atomic_notifier_call_chain+0x12c/0x140
 [2.514169]  [810e9fe0] ? 
 atomic_notifier_chain_unregister+0x90/0x90
 [2.514258]  [811216cd] ? trace_hardirqs_off+0xd/0x10
 [2.514318]  [810ea136] atomic_notifier_call_chain+0x16/0x20
 [2.514381]  [810777c3] exit_idle+0x43/0x50
 [2.514441]  [81568865] xen_evtchn_do_upcall+0x25/0x50
 [2.514503]  [81aa690e] xen_do_hypervisor_callback+0x1e/0x30
 [2.514562]  EOI  [810013aa] ? hypercall_page+0x3aa/0x1000
 [2.514662]  [810013aa] ? hypercall_page+0x3aa/0x1000
 [2.514722]  [81061540] ? xen_safe_halt+0x10/0x20
 [2.514782]  [81075cfa] ? default_idle+0xba/0x570
 [2.514841]  [810778af] ? cpu_idle+0xdf/0x140
 [2.514900]  [81a4d881] ? rest_init+0x135/0x144
 [2.514960]  [81a4d74c] ? csum_partial_copy_generic+0x16c/0x16c
 [2.515022]  [82520c45] ? start_kernel+0x3db/0x3e8
 [2.515081]  [8252066a] ? repair_env_string+0x5a/0x5a
 [2.515141]  [82520356] ? x86_64_start_reservations+0x131/0x135
 [2.515202]  [82524aca] ? xen_start_kernel+0x465/0x46
 
 Signed-off-by: Mojiong Qiu mj...@tencent.com
 Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 Cc: Jeremy Fitzhardinge jer...@goop.org
 Cc: xen-de...@lists.xensource.com
 Cc: virtualization@lists.linux-foundation.org
 Cc: linux-ker...@vger.kernel.org
 Cc: sta...@kernel.org (at least to 3.0.y)
 ^^^- vger.kernel.org

You mean 3.3.x

 ---
  drivers/xen/events.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/xen/events.c b/drivers/xen/events.c
 index 912ac81..0be4df3 100644
 --- a/drivers/xen/events.c
 +++ b/drivers/xen/events.c
 @@ -1395,10 +1395,10 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
  {
   struct pt_regs *old_regs = set_irq_regs(regs);
  
 + irq_enter();
  #ifdef CONFIG_X86
   exit_idle();
  #endif
 - irq_enter();
  
   __xen_evtchn_do_upcall();
  
 -- 
 1.6.3.2


[PATCH v11 1/7] mm: adjust address_space_operations.migratepage() return code

2012-11-06 Thread Rafael Aquini
This patch introduces MIGRATEPAGE_SUCCESS as the default return code for
the address_space_operations.migratepage() method and documents the
expected return codes for that method in failure cases.

Signed-off-by: Rafael Aquini aqu...@redhat.com
---
 fs/hugetlbfs/inode.c|  4 ++--
 include/linux/migrate.h |  7 +++
 mm/migrate.c| 22 +++---
 3 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index c5bc355..bdeda2c 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -608,11 +608,11 @@ static int hugetlbfs_migrate_page(struct address_space 
*mapping,
int rc;
 
rc = migrate_huge_page_move_mapping(mapping, newpage, page);
-   if (rc)
+   if (rc != MIGRATEPAGE_SUCCESS)
return rc;
migrate_page_copy(newpage, page);
 
-   return 0;
+   return MIGRATEPAGE_SUCCESS;
 }
 
 static int hugetlbfs_statfs(struct dentry *dentry, struct kstatfs *buf)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index ce7e667..a4e886d 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -7,6 +7,13 @@
 
 typedef struct page *new_page_t(struct page *, unsigned long private, int **);
 
+/*
+ * Return values from addresss_space_operations.migratepage():
+ * - negative errno on page migration failure;
+ * - zero on page migration success;
+ */
+#define MIGRATEPAGE_SUCCESS		0
+
 #ifdef CONFIG_MIGRATION
 
 extern void putback_lru_pages(struct list_head *l);
diff --git a/mm/migrate.c b/mm/migrate.c
index 77ed2d7..98c7a89 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -286,7 +286,7 @@ static int migrate_page_move_mapping(struct address_space 
*mapping,
/* Anonymous page without mapping */
if (page_count(page) != 1)
return -EAGAIN;
-   return 0;
+   return MIGRATEPAGE_SUCCESS;
}
 
spin_lock_irq(&mapping->tree_lock);
@@ -356,7 +356,7 @@ static int migrate_page_move_mapping(struct address_space 
*mapping,
}
spin_unlock_irq(&mapping->tree_lock);
 
-   return 0;
+   return MIGRATEPAGE_SUCCESS;
 }
 
 /*
@@ -372,7 +372,7 @@ int migrate_huge_page_move_mapping(struct address_space 
*mapping,
if (!mapping) {
if (page_count(page) != 1)
return -EAGAIN;
-   return 0;
+   return MIGRATEPAGE_SUCCESS;
}
 
spin_lock_irq(&mapping->tree_lock);
@@ -399,7 +399,7 @@ int migrate_huge_page_move_mapping(struct address_space 
*mapping,
page_unfreeze_refs(page, expected_count - 1);
 
spin_unlock_irq(&mapping->tree_lock);
-   return 0;
+   return MIGRATEPAGE_SUCCESS;
 }
 
 /*
@@ -486,11 +486,11 @@ int migrate_page(struct address_space *mapping,
 
rc = migrate_page_move_mapping(mapping, newpage, page, NULL, mode);
 
-   if (rc)
+   if (rc != MIGRATEPAGE_SUCCESS)
return rc;
 
migrate_page_copy(newpage, page);
-   return 0;
+   return MIGRATEPAGE_SUCCESS;
 }
 EXPORT_SYMBOL(migrate_page);
 
@@ -513,7 +513,7 @@ int buffer_migrate_page(struct address_space *mapping,
 
rc = migrate_page_move_mapping(mapping, newpage, page, head, mode);
 
-   if (rc)
+   if (rc != MIGRATEPAGE_SUCCESS)
return rc;
 
/*
@@ -549,7 +549,7 @@ int buffer_migrate_page(struct address_space *mapping,
 
} while (bh != head);
 
-   return 0;
+   return MIGRATEPAGE_SUCCESS;
 }
 EXPORT_SYMBOL(buffer_migrate_page);
 #endif
@@ -814,7 +814,7 @@ skip_unmap:
put_anon_vma(anon_vma);
 
 uncharge:
-   mem_cgroup_end_migration(mem, page, newpage, rc == 0);
+   mem_cgroup_end_migration(mem, page, newpage, rc == MIGRATEPAGE_SUCCESS);
 unlock:
unlock_page(page);
 out:
@@ -987,7 +987,7 @@ int migrate_pages(struct list_head *from,
case -EAGAIN:
retry++;
break;
-   case 0:
+   case MIGRATEPAGE_SUCCESS:
break;
default:
/* Permanent failure */
@@ -1024,7 +1024,7 @@ int migrate_huge_page(struct page *hpage, new_page_t 
get_new_page,
/* try again */
cond_resched();
break;
-   case 0:
+   case MIGRATEPAGE_SUCCESS:
goto out;
default:
rc = -EIO;
-- 
1.7.11.7



[PATCH v11 0/7] make balloon pages movable by compaction

2012-11-06 Thread Rafael Aquini
Memory fragmentation introduced by ballooning can significantly reduce the
number of 2MB contiguous memory blocks available within a guest, thus
imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch-set follows the main idea discussed at the 2012 LSFMMS session
"Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/ --
to introduce the required changes to the virtio_balloon driver, as well as
the changes to the core compaction & migration bits, in order to make those
subsystems aware of ballooned pages and allow memory balloon pages to become
movable within a guest, thus avoiding the aforementioned fragmentation issue.
Following are numbers that demonstrate how this patch set benefits compaction,
allowing it to be more effective on memory-ballooned guests.

Results for the STRESS-HIGHALLOC benchmark, from Mel Gorman's mmtests suite,
running on a 4 GB RAM KVM guest which was ballooning 512 MB of RAM in 64 MB
chunks every minute (inflating/deflating) while the test was running:

===BEGIN stress-highalloc

STRESS-HIGHALLOC
 highalloc-3.7 highalloc-3.7
 rc4-clean rc4-patch
Pass 1          55.00 ( 0.00%)    62.00 ( 7.00%)
Pass 2          54.00 ( 0.00%)    62.00 ( 8.00%)
while Rested    75.00 ( 0.00%)    80.00 ( 5.00%)

MMTests Statistics: duration
 3.7 3.7
   rc4-clean   rc4-patch
User 1207.59 1207.46
System   1300.55 1299.61
Elapsed  2273.72 2157.06

MMTests Statistics: vmstat
3.7 3.7
  rc4-clean   rc4-patch
Page Ins    3581516  2374368
Page Outs  11148692 10410332
Swap Ins 80  47
Swap Outs  3641 476
Direct pages scanned  37978   33826
Kswapd pages scanned    1828245 1342869
Kswapd pages reclaimed  1710236 1304099
Direct pages reclaimed    32207   31005
Kswapd efficiency   93% 97%
Kswapd velocity 804.077 622.546
Direct efficiency   84% 91%
Direct velocity  16.703  15.682
Percentage direct scans  2%  2%
Page writes by reclaim    79252    9704
Page writes file          75611    9228
Page writes anon           3641     476
Page reclaim immediate    16764   11014
Page rescued immediate        0       0
Slabs scanned   2171904 2152448
Direct inode steals 3852261
Kswapd inode steals  659137  609670
Kswapd skipped wait   1  69
THP fault alloc 546 631
THP collapse alloc  361 339
THP splits  259 263
THP fault fallback   98  50
THP collapse fail20  17
Compaction stalls   747 499
Compaction success  244 145
Compaction failures 503 354
Compaction pages moved   370888  474837
Compaction move failure   77378   65259

===END stress-highalloc

Rafael Aquini (7):
  mm: adjust address_space_operations.migratepage() return code
  mm: redefine address_space.assoc_mapping
  mm: introduce a common interface for balloon pages mobility
  mm: introduce compaction and migration for ballooned pages
  virtio_balloon: introduce migration primitives to balloon pages
  mm: introduce putback_movable_pages()
  mm: add vm event counters for balloon pages compaction

 drivers/virtio/virtio_balloon.c| 136 +--
 fs/buffer.c|  12 +-
 fs/gfs2/glock.c|   2 +-
 fs/hugetlbfs/inode.c   |   4 +-
 fs/inode.c |   2 +-
 fs/nilfs2/page.c   |   2 +-
 include/linux/balloon_compaction.h | 220 ++
 include/linux/fs.h |   2 +-
 include/linux/migrate.h|  19 +++
 include/linux/pagemap.h|  16 +++
 include/linux/vm_event_item.h  |   8 +-
 mm/Kconfig |  15 ++
 mm/Makefile|   1 +
 mm/balloon_compaction.c| 271 +
 mm/compaction.c|  27 +++-
 mm/migrate.c   |  77 +--
 mm/page_alloc.c|   2 +-
 mm/vmstat.c|  10 +-
 18 files changed, 782 insertions(+), 44 deletions(-)
 create mode 100644 include/linux/balloon_compaction.h
 create mode 100644 mm/balloon_compaction.c

Change log:
v11:
 * Address AKPM's last review suggestions;
 * Extend the balloon compaction common API and simplify its usage at driver;
 * Minor nitpicks on code commentary;
v10:
 * Adjust leak_balloon() wait_event logic to make a clear locking scheme (MST);
 * Drop the RCU 

[PATCH v11 5/7] virtio_balloon: introduce migration primitives to balloon pages

2012-11-06 Thread Rafael Aquini
Memory fragmentation introduced by ballooning can significantly reduce the
number of 2MB contiguous memory blocks available within a guest, thus
imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

Besides making balloon pages movable at allocation time and introducing
the necessary primitives to perform balloon page migration/compaction,
this patch also introduces the following locking scheme, in order to
enhance the synchronization methods for accessing elements of struct
virtio_balloon, thus providing protection against concurrent access
introduced by parallel memory migration threads.

 - balloon_lock (mutex): synchronizes access to the elements of
   struct virtio_balloon and its queue operations;

Signed-off-by: Rafael Aquini aqu...@redhat.com
---
 drivers/virtio/virtio_balloon.c | 135 
 1 file changed, 123 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0908e60..69eede7 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,6 +27,7 @@
 #include <linux/delay.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#include <linux/balloon_compaction.h>
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -34,6 +35,7 @@
  * page units.
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
+#define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
 
 struct virtio_balloon
 {
@@ -52,15 +54,19 @@ struct virtio_balloon
/* Number of balloon pages we've told the Host we're not using. */
unsigned int num_pages;
/*
-* The pages we've told the Host we're not using.
+* The pages we've told the Host we're not using are enqueued
+* at vb_dev_info->pages list.
 * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
 * to num_pages above.
 */
-   struct list_head pages;
+   struct balloon_dev_info *vb_dev_info;
+
+   /* Synchronize access/update to this struct virtio_balloon elements */
+   struct mutex balloon_lock;
 
/* The array of pfns we tell the Host about. */
unsigned int num_pfns;
-   u32 pfns[256];
+   u32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
 
/* Memory statistics */
int need_stats_update;
@@ -122,18 +128,25 @@ static void set_page_pfns(u32 pfns[], struct page *page)
 
 static void fill_balloon(struct virtio_balloon *vb, size_t num)
 {
+   struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
+
+   static DEFINE_RATELIMIT_STATE(fill_balloon_rs,
+ DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));
 
+   mutex_lock(&vb->balloon_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
 vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-   struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
-   __GFP_NOMEMALLOC | __GFP_NOWARN);
+   struct page *page = balloon_page_enqueue(vb_dev_info);
+
if (!page) {
-   if (printk_ratelimit())
+   if (__ratelimit(fill_balloon_rs))
dev_printk(KERN_INFO, &vb->vdev->dev,
   "Out of puff! Can't get %zu pages\n",
-  num);
+  VIRTIO_BALLOON_PAGES_PER_PAGE);
/* Sleep for at least 1/5 of a second before retry. */
msleep(200);
break;
@@ -141,7 +154,6 @@ static void fill_balloon(struct virtio_balloon *vb, size_t 
num)
set_page_pfns(vb->pfns + vb->num_pfns, page);
vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
totalram_pages--;
-   list_add(&page->lru, &vb->pages);
}
 
/* Didn't get any?  Oh well. */
@@ -149,6 +161,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t 
num)
return;
 
tell_host(vb, vb->inflate_vq);
+   mutex_unlock(&vb->balloon_lock);
 }
 
 static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -165,14 +178,17 @@ static void release_pages_by_pfn(const u32 pfns[], 
unsigned int num)
 static void leak_balloon(struct virtio_balloon *vb, size_t num)
 {
struct page *page;
+   struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
 
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));
 
+   mutex_lock(&vb->balloon_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
 vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-

[PATCH v11 6/7] mm: introduce putback_movable_pages()

2012-11-06 Thread Rafael Aquini
The patch "mm: introduce compaction and migration for virtio ballooned pages"
hacks around putback_lru_pages() in order to allow ballooned pages to be
re-inserted on the balloon page list as if a ballooned page was like an LRU
page.

As ballooned pages are not legitimate LRU pages, this patch introduces
putback_movable_pages() to properly cope with cases where the isolated
pageset contains ballooned pages and LRU pages, thus fixing the mentioned
inelegant hack around putback_lru_pages().

Signed-off-by: Rafael Aquini aqu...@redhat.com
---
 include/linux/migrate.h |  2 ++
 mm/compaction.c |  6 +++---
 mm/migrate.c| 20 
 mm/page_alloc.c |  2 +-
 4 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index e570c3c..ff074a4 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -27,6 +27,7 @@ typedef struct page *new_page_t(struct page *, unsigned long 
private, int **);
 #ifdef CONFIG_MIGRATION
 
 extern void putback_lru_pages(struct list_head *l);
+extern void putback_movable_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
struct page *, struct page *, enum migrate_mode);
 extern int migrate_pages(struct list_head *l, new_page_t x,
@@ -50,6 +51,7 @@ extern int migrate_huge_page_move_mapping(struct 
address_space *mapping,
 #else
 
 static inline void putback_lru_pages(struct list_head *l) {}
+static inline void putback_movable_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
enum migrate_mode mode) { return -ENOSYS; }
diff --git a/mm/compaction.c b/mm/compaction.c
index 76abd84..f268bd8 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -995,7 +995,7 @@ static int compact_zone(struct zone *zone, struct 
compact_control *cc)
switch (isolate_migratepages(zone, cc)) {
case ISOLATE_ABORT:
ret = COMPACT_PARTIAL;
-   putback_lru_pages(&cc->migratepages);
+   putback_movable_pages(&cc->migratepages);
cc->nr_migratepages = 0;
goto out;
case ISOLATE_NONE:
@@ -1018,9 +1018,9 @@ static int compact_zone(struct zone *zone, struct 
compact_control *cc)
trace_mm_compaction_migratepages(nr_migrate - nr_remaining,
nr_remaining);
 
-   /* Release LRU pages not migrated */
+   /* Release isolated pages not migrated */
if (err) {
-   putback_lru_pages(&cc->migratepages);
+   putback_movable_pages(&cc->migratepages);
cc->nr_migratepages = 0;
if (err == -ENOMEM) {
ret = COMPACT_PARTIAL;
diff --git a/mm/migrate.c b/mm/migrate.c
index 87ffe54..adb3d44 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -80,6 +80,26 @@ void putback_lru_pages(struct list_head *l)
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
+   putback_lru_page(page);
+   }
+}
+
+/*
+ * Put previously isolated pages back onto the appropriate lists
+ * from where they were once taken off for compaction/migration.
+ *
+ * This function shall be used instead of putback_lru_pages(),
+ * whenever the isolated pageset has been built by isolate_migratepages_range()
+ */
+void putback_movable_pages(struct list_head *l)
+{
+   struct page *page;
+   struct page *page2;
+
+   list_for_each_entry_safe(page, page2, l, lru) {
+   list_del(&page->lru);
+   dec_zone_page_state(page, NR_ISOLATED_ANON +
+   page_is_file_cache(page));
if (unlikely(balloon_page_movable(page)))
balloon_page_putback(page);
else
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5b74de6..1cb0f93 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5710,7 +5710,7 @@ static int __alloc_contig_migrate_range(struct 
compact_control *cc,
0, false, MIGRATE_SYNC);
}
 
-   putback_lru_pages(&cc->migratepages);
+   putback_movable_pages(&cc->migratepages);
return ret > 0 ? 0 : ret;
 }
 
-- 
1.7.11.7



[PATCH v11 7/7] mm: add vm event counters for balloon pages compaction

2012-11-06 Thread Rafael Aquini
This patch introduces a new set of vm event counters to keep track of
ballooned pages compaction activity.

Signed-off-by: Rafael Aquini aqu...@redhat.com
---
 drivers/virtio/virtio_balloon.c |  1 +
 include/linux/vm_event_item.h   |  8 +++-
 mm/balloon_compaction.c |  2 ++
 mm/migrate.c|  1 +
 mm/vmstat.c | 10 +-
 5 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 69eede7..3756fc1 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -411,6 +411,7 @@ int virtballoon_migratepage(struct address_space *mapping,
tell_host(vb, vb->deflate_vq);
 
mutex_unlock(&vb->balloon_lock);
+   balloon_event_count(COMPACTBALLOONMIGRATED);
 
return MIGRATEPAGE_BALLOON_SUCCESS;
 }
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 3d31145..cbd72fc 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -41,7 +41,13 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_COMPACTION
COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
-#endif
+#ifdef CONFIG_BALLOON_COMPACTION
+   COMPACTBALLOONISOLATED, /* isolated from balloon pagelist */
+   COMPACTBALLOONMIGRATED, /* balloon page sucessfully migrated */
+   COMPACTBALLOONRELEASED, /* old-page released after migration */
+   COMPACTBALLOONRETURNED, /* putback to pagelist, not-migrated */
+#endif /* CONFIG_BALLOON_COMPACTION */
+#endif /* CONFIG_COMPACTION */
 #ifdef CONFIG_HUGETLB_PAGE
HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
 #endif
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index 90935aa..32927eb 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -215,6 +215,7 @@ bool balloon_page_isolate(struct page *page)
if (__is_movable_balloon_page(page) &&
    page_count(page) == 2) {
__isolate_balloon_page(page);
+   balloon_event_count(COMPACTBALLOONISOLATED);
unlock_page(page);
return true;
}
@@ -237,6 +238,7 @@ void balloon_page_putback(struct page *page)
if (__is_movable_balloon_page(page)) {
__putback_balloon_page(page);
put_page(page);
+   balloon_event_count(COMPACTBALLOONRETURNED);
} else {
__WARN();
dump_page(page);
diff --git a/mm/migrate.c b/mm/migrate.c
index adb3d44..ee3037d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -896,6 +896,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned 
long private,
page_is_file_cache(page));
put_page(page);
__free_page(page);
+   balloon_event_count(COMPACTBALLOONRELEASED);
return 0;
}
 out:
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c737057..1363edc 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -781,7 +781,15 @@ const char * const vmstat_text[] = {
"compact_stall",
"compact_fail",
"compact_success",
-#endif
+
+#ifdef CONFIG_BALLOON_COMPACTION
+   "compact_balloon_isolated",
+   "compact_balloon_migrated",
+   "compact_balloon_released",
+   "compact_balloon_returned",
+#endif /* CONFIG_BALLOON_COMPACTION */
+
+#endif /* CONFIG_COMPACTION */
 
 #ifdef CONFIG_HUGETLB_PAGE
"htlb_buddy_alloc_success",
-- 
1.7.11.7



Re: [Pv-drivers] [PATCH 0/6] VSOCK for Linux upstreaming

2012-11-06 Thread Gerd Hoffmann
On 11/05/12 19:19, Andy King wrote:
 Hi David,
 
 The big and only question is whether anyone can actually use any of
 this stuff without your proprietary bits?
 
 Do you mean the VMCI calls?  The VMCI driver is in the process of being
 upstreamed into the drivers/misc tree.  Greg (cc'd on these patches) is
 actively reviewing that code and we are addressing feedback.
 
 Also, there was some interest from Red Hat in using vSockets as a unified
 interface, routed over a hypervisor-specific transport (virtio or
 otherwise, although for now VMCI is the only one implemented).

Can you outline how this can be done?  From a quick look over the code
it seems like vsock has a hard dependency on vmci, is that correct?

When making vsock a generic, reusable kernel service it should be the
other way around: vsock should provide the core implementation and an
interface where hypervisor-specific transports (vmci, virtio, xenbus,
...) can register themselves.
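
A minimal sketch of that shape, purely illustrative (none of these names
exist in the posted patches): a generic core that a single hypervisor
transport registers with at module load time.

/*
 * Illustrative sketch only: a generic AF_VSOCK core that hypervisor-specific
 * transports (vmci, virtio, xenbus, ...) plug into.  All names here
 * (vsock_transport, vsock_core_register, ...) are assumptions, not the
 * interface implemented by this patch series.
 */
#include <linux/errno.h>
#include <linux/module.h>

struct vsock_transport {
	const char *name;
	/* transport-specific backends for the socket operations */
	int (*stream_connect)(void *vsock_sk);
	int (*dgram_send)(void *vsock_sk, const void *buf, size_t len);
};

static const struct vsock_transport *active_transport;

/* The core accepts exactly one transport at a time; real code would also
 * need locking and module reference counting. */
int vsock_core_register(const struct vsock_transport *t)
{
	if (active_transport)
		return -EBUSY;
	active_transport = t;
	pr_info("vsock: using %s transport\n", t->name);
	return 0;
}
EXPORT_SYMBOL_GPL(vsock_core_register);

void vsock_core_unregister(const struct vsock_transport *t)
{
	if (active_transport == t)
		active_transport = NULL;
}
EXPORT_SYMBOL_GPL(vsock_core_unregister);

A vmci_transport or virtio_transport module would then call
vsock_core_register() from its init function, leaving the address family and
socket-layer code shared.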

cheers,
  Gerd