On Tue, Mar 19, 2013 at 01:49:34PM -0400, Michael R. Hines wrote:
> I also did a test using RDMA + cgroup, and the kernel killed my QEMU :)
>
> So, infiniband is not smart enough to know how to avoid pinning a
> zero page, I guess.
>
> - Michael
>
> On 03/19/2013 01:14 PM, Paolo Bonzini wrote:
> >On 19/03/2013 18:09, Michael R. Hines wrote:
> >>Allowing QEMU to swap due to a cgroup limit during migration is a viable
> >>overcommit option?
> >>
> >>I'm trying to keep an open mind, but that would kill the migration
> >>time.....
> >Would it swap?  Doesn't the kernel back all zero pages with a single
> >copy-on-write page?  If that still accounts towards cgroup limits, it
> >would be a bug.
> >
> >Old kernels do not have a shared zero hugepage, and that includes some
> >distro kernels.  Perhaps that's the problem.
> >
> >Paolo
> >

It really shouldn't break COW if you don't request LOCAL_WRITE.
I think it's a kernel bug, and apparently it has been there in the code
since the first version: the get_user_pages parameters are swapped.

I'll send a patch. If it's applied, you should also
change your code from

+                                IBV_ACCESS_LOCAL_WRITE |
+                                IBV_ACCESS_REMOTE_WRITE |
+                                IBV_ACCESS_REMOTE_READ);

to

+                                IBV_ACCESS_REMOTE_READ);

on the send side.
Then, each time we detect that a page has changed, we must make sure to
unregister and re-register it. Or, if you want to be very smart,
check whether the PFN changed and re-register if it did.

This will make overcommit work.

-- 
MST
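
A rough sketch of the "re-register only if the PFN changed" bookkeeping
described above, to make the idea concrete. Everything here is
hypothetical: struct page_reg, stub_register(), stub_unregister() and
refresh_registration() are illustrative names, and the stubs only count
calls. Real code would presumably call ibv_reg_mr() with
IBV_ACCESS_REMOTE_READ and ibv_dereg_mr() in their place. How the caller
obtains the current PFN is left open and is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page bookkeeping on the send side. */
struct page_reg {
    bool     registered;   /* is the page currently registered? */
    uint64_t pfn;          /* PFN observed at registration time */
    unsigned reg_calls;    /* number of stub registrations */
    unsigned dereg_calls;  /* number of stub deregistrations */
};

/* Stub: real code would call
 * ibv_reg_mr(pd, addr, length, IBV_ACCESS_REMOTE_READ) here. */
void stub_register(struct page_reg *p, uint64_t pfn)
{
    p->registered = true;
    p->pfn = pfn;
    p->reg_calls++;
}

/* Stub: real code would call ibv_dereg_mr(mr) here. */
void stub_unregister(struct page_reg *p)
{
    p->registered = false;
    p->dereg_calls++;
}

/*
 * Called when a page is detected as changed. current_pfn is the PFN the
 * caller observes now (assumed to be available somehow).
 * Returns true if a registration or re-registration happened.
 */
bool refresh_registration(struct page_reg *p, uint64_t current_pfn)
{
    if (p->registered && p->pfn == current_pfn)
        return false;        /* same physical page: keep the registration */
    if (p->registered)
        stub_unregister(p);  /* COW moved the page: drop the stale pin */
    stub_register(p, current_pfn);
    return true;
}
```

The simpler variant from the mail (always unregister and re-register a
changed page) would just skip the PFN comparison.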