Re: [patch 0/6] MMU Notifiers V6

2008-02-13 Thread Jack Steiner
> GRU
> - Simple additional hardware TLB (possibly covering multiple instances of
>   Linux)
> - Needs TLB shootdown when the VM unmaps pages.
> - Determines page address via follow_page (from interrupt context) but can
>   fall back to get_user_pages().
> - No page reference possible since no page status is kept.

I applied the latest mmuops patch to a 2.6.24 kernel & updated the
GRU driver to use it. As far as I can tell, everything works ok.
Although more testing is needed, all current tests of driver functionality
are working on both a system simulator and a hardware simulator.

The driver itself is still a few weeks from being ready to post but I can
send code fragments of the portions related to mmuops or external TLB
management if anyone is interested.
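To illustrate the GRU requirement quoted above ("Needs TLB shootdown when the VM unmaps pages"), here is a minimal userspace model in C. It assumes a tiny, fully associative external TLB. The names `ext_tlb`, `ext_tlb_insert` and `ext_tlb_invalidate_range` are hypothetical; they are not the GRU driver's or the mmuops patch's API. The sketch only shows what a shootdown must do: drop every cached translation in the unmapped range, so the device has to look the address up again.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a small external TLB (not the real GRU layout). */
#define EXT_TLB_ENTRIES 8

struct ext_tlb_entry {
	bool valid;
	unsigned long vaddr;	/* page-aligned user virtual address */
	unsigned long pfn;	/* physical frame the device will access */
};

struct ext_tlb {
	struct ext_tlb_entry e[EXT_TLB_ENTRIES];
};

/* Fill a slot, e.g. after a follow_page()-style lookup resolved vaddr. */
static bool ext_tlb_insert(struct ext_tlb *t, unsigned long vaddr,
			   unsigned long pfn)
{
	for (size_t i = 0; i < EXT_TLB_ENTRIES; i++) {
		if (!t->e[i].valid) {
			t->e[i] = (struct ext_tlb_entry){ true, vaddr, pfn };
			return true;
		}
	}
	return false;	/* full: a real device would evict an entry */
}

/*
 * Shootdown: called when the VM unmaps [start, end). Every cached
 * translation in the range is dropped so the device faults and looks
 * the address up again instead of touching a page that may be freed.
 * Returns the number of entries dropped.
 */
static size_t ext_tlb_invalidate_range(struct ext_tlb *t,
				       unsigned long start, unsigned long end)
{
	size_t dropped = 0;

	for (size_t i = 0; i < EXT_TLB_ENTRIES; i++) {
		if (t->e[i].valid && t->e[i].vaddr >= start &&
		    t->e[i].vaddr < end) {
			t->e[i].valid = false;
			dropped++;
		}
	}
	return dropped;
}
```

Because no page reference is held, correctness depends entirely on this invalidation running before the page is reused.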


--- jack
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-09 Thread Christoph Lameter
On Sat, 9 Feb 2008, Rik van Riel wrote:

> PG_mlock is on the way and can easily be reused for this, too.

Note that a pinned page is different from an mlocked page. An mlocked page
can be moved through page migration and/or memory hotplug. A pinned page
must make both fail.
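The distinction can be made concrete with a toy model. This is a hypothetical sketch, not kernel code. It assumes that migration first replaces every pte and then succeeds only if no references beyond a single base reference remain. That is a simplified stand-in for how migration gives up when it finds references it cannot account for. The mlocked flag plays no part in the migration check; it only affects reclaim.

```c
#include <stdbool.h>

/* Hypothetical, simplified page state; not struct page. */
struct toy_page {
	int refcount;	/* base ref + one per mapping + any extra pins */
	int mapcount;	/* number of ptes mapping the page */
	bool mlocked;	/* set by mlock(); only affects reclaim here */
};

/* Pin by elevating the refcount, as get_user_pages() users do. */
static void toy_pin(struct toy_page *p)   { p->refcount++; }
static void toy_unpin(struct toy_page *p) { p->refcount--; }

/*
 * Migration model: mlock is ignored; after all ptes are replaced, only
 * the base reference may be left. Any extra reference means someone
 * may still access the old page, so migration must fail.
 */
static bool toy_can_migrate(const struct toy_page *p)
{
	return p->refcount - p->mapcount == 1;
}

/* Reclaim model: mlocked pages are skipped, pinned ones cannot be freed. */
static bool toy_reclaimable(const struct toy_page *p)
{
	return !p->mlocked && p->refcount - p->mapcount == 1;
}
```

In this model an mlocked page is movable but not reclaimable, while a pinned page is neither.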



Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-09 Thread Rik van Riel
On Fri, 8 Feb 2008 18:16:16 -0800 (PST)
Christoph Lameter <[EMAIL PROTECTED]> wrote:
> On Sat, 9 Feb 2008, Andrea Arcangeli wrote:
> 
> > The VM shouldn't break if try_to_unmap doesn't actually make the page
> > freeable for whatever reason. Permanent pins shouldn't happen anyway,
> 
> The VM livelocks if too many pages are pinned that way right now.

> Rik has a patchset under development that addresses issues like this

PG_mlock is on the way and can easily be reused for this, too.

-- 
All rights reversed.


Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Sat, 9 Feb 2008, Andrea Arcangeli wrote:

> The VM shouldn't break if try_to_unmap doesn't actually make the page
> freeable for whatever reason. Permanent pins shouldn't happen anyway,

The VM livelocks if too many pages are pinned that way right now. The
more processors per node, the higher the risk of livelock, because more
processors are busy cycling through pages that have an elevated refcount.

> so defining an ad-hoc API for that doesn't sound too appealing. Not
> sure if old hardware deserves those special lru-size-reduction
> optimizations but it's not my call (certainly swapoff/mlock would get
> higher priority in that lru-size-reduction area).

Rik has a patchset under development that addresses issues like this. The 
elevated refcount pin problem is not really relevant to the patchset we 
are discussing here.
 


Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Andrea Arcangeli
On Fri, Feb 08, 2008 at 05:27:03PM -0800, Christoph Lameter wrote:
> Pages will still be on the LRU and cycle through rmap again and again. 
> If page migration is used on those pages then the code may make repeated 
> attempts to migrate the page, thinking that the page count must at some
> point drop.
>
> I do not think that the page count was intended to be used to pin pages 
> permanently. If we had a marker on such pages then we could take them off 
> the LRU and not try to migrate them.

The VM shouldn't break if try_to_unmap doesn't actually make the page
freeable for whatever reason. Permanent pins shouldn't happen anyway,
so defining an ad-hoc API for that doesn't sound too appealing. Not
sure if old hardware deserves those special lru-size-reduction
optimizations but it's not my call (certainly swapoff/mlock would get
higher priority in that lru-size-reduction area).


Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Sat, 9 Feb 2008, Andrea Arcangeli wrote:

> > Hmm... that means we need something that actually pins pages for good so
> > that the VM can avoid reclaiming them and so that page migration can avoid
> > trying to migrate them. Something like yet another page flag.
> 
> What's wrong with pinning with the page count like now? Dumb adapters
> would simply not register themselves in the mmu notifier list, no?

Pages will still be on the LRU and cycle through rmap again and again. 
If page migration is used on those pages then the code may make repeated 
attempts to migrate the page, thinking that the page count must at some
point drop.

I do not think that the page count was intended to be used to pin pages 
permanently. If we had a marker on such pages then we could take them off 
the LRU and not try to migrate them.
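The suggestion above can be sketched as follows. The "pinned for good" marker, the array-based LRU and the retry limit are all hypothetical illustrations, not an existing kernel flag or API. Without a marker, the scanner revisits a device-held page on every pass, and migration retries until its limit. With the marker, the page is taken off the LRU once, and migration fails immediately.

```c
#include <stdbool.h>
#include <stddef.h>

#define MIGRATE_PASSES 10	/* hypothetical retry limit */

struct lru_page {
	bool on_lru;
	bool pinned_marker;	/* hypothetical "pinned for good" flag */
	int extra_refs;		/* references held by a device */
};

/* One reclaim pass; returns how many pages were examined. */
static size_t scan_lru(struct lru_page *pages, size_t n)
{
	size_t scanned = 0;

	for (size_t i = 0; i < n; i++) {
		if (!pages[i].on_lru)
			continue;
		scanned++;
		if (pages[i].extra_refs == 0 || pages[i].pinned_marker)
			pages[i].on_lru = false; /* freed, or parked off the LRU */
		/* otherwise: refcount still elevated, page stays on the LRU */
	}
	return scanned;
}

/* Returns the number of attempts; *ok tells whether migration succeeded. */
static int migrate_with_retry(const struct lru_page *p, bool *ok)
{
	int attempts = 0;

	*ok = false;
	if (p->pinned_marker)
		return attempts;	/* known pinned: fail at once */
	while (attempts < MIGRATE_PASSES) {
		attempts++;
		if (p->extra_refs == 0) {
			*ok = true;
			break;
		}
		/* retry, assuming the page count must drop at some point */
	}
	return attempts;
}
```

With many such pages and many CPUs scanning, the unmarked case multiplies the wasted work, which is the livelock risk described in this thread.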


Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Andrea Arcangeli
On Fri, Feb 08, 2008 at 04:36:16PM -0800, Christoph Lameter wrote:
> On Fri, 8 Feb 2008, Roland Dreier wrote:
> 
> > That would of course work -- dumb adapters would just always fail,
> > which might be inefficient.
> 
> Hmm... that means we need something that actually pins pages for good so
> that the VM can avoid reclaiming them and so that page migration can avoid
> trying to migrate them. Something like yet another page flag.

What's wrong with pinning with the page count like now? Dumb adapters
would simply not register themselves in the mmu notifier list, no?

> 
> Ccing Rik.


Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Fri, 8 Feb 2008, Roland Dreier wrote:

> That would of course work -- dumb adapters would just always fail,
> which might be inefficient.

Hmm... that means we need something that actually pins pages for good so
that the VM can avoid reclaiming them and so that page migration can avoid
trying to migrate them. Something like yet another page flag.

Ccing Rik.



Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Roland Dreier
 > I thought the adaptor can always remove the mapping by renegotiating 
 > with the remote side? Even if it's dumb, a callback could notify the
 > driver that it may be required to tear down the mapping. We then hold the
 > pages until we get an okay from the driver that the mapping has been removed.

Of course we can always destroy the memory region but that would break
the semantics that applications expect.  Basically an application can
register some chunk of its memory and get a key that it can pass to a
remote peer to let the remote peer operate on its memory via RDMA.
And that memory region/key is expected to stay valid until there is an
application-level operation to destroy it (or until the app crashes or
gets killed, etc).

 > We could also let the unmapping fail if the driver indicates that the 
 > mapping must stay.

That would of course work -- dumb adapters would just always fail,
which might be inefficient.
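The "let the unmapping fail" option can be sketched as a veto. The hook signature here is an assumption made for illustration; it is not the interface in this patchset. `struct notifier_ops`, `smart_invalidate`, `dumb_invalidate` and `try_unmap_range` are hypothetical names. The sketch shows why a dumb adapter would refuse every time.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-device hook: return 0 to allow the unmap, or a
 * negative errno to insist that the mapping must stay. */
struct notifier_ops {
	int (*invalidate)(void *dev, unsigned long start, unsigned long end);
	void *dev;
};

/* Smart adapter: updates its own page table and always agrees. */
static int smart_invalidate(void *dev, unsigned long start, unsigned long end)
{
	(void)dev; (void)start; (void)end;
	return 0;
}

/* Dumb adapter: cannot remap on the fly, so it always refuses. */
static int dumb_invalidate(void *dev, unsigned long start, unsigned long end)
{
	(void)dev; (void)start; (void)end;
	return -EBUSY;
}

/* Ask every registered device; the first veto aborts the unmap. */
static int try_unmap_range(const struct notifier_ops *ops, size_t n,
			   unsigned long start, unsigned long end)
{
	for (size_t i = 0; i < n; i++) {
		int ret = ops[i].invalidate(ops[i].dev, start, end);
		if (ret)
			return ret;	/* VM must keep the pages mapped */
	}
	return 0;
}
```

One design caveat: a device called before the vetoing one may already have torn down its mapping. A real design along these lines would probably need a separate "may I?" step before any device commits.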


Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Fri, 8 Feb 2008, Andrew Morton wrote:

> Quite possibly none of the infiniband developers even know about it..

Well Andrea's initial approach was even featured on LWN a couple of 
weeks back.



Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Fri, 8 Feb 2008, Roland Dreier wrote:

> In general, this MMU notifier stuff will only be useful to a subset of
> InfiniBand/RDMA hardware.  Some adapters are smart enough to handle
> changing the IO virtual -> bus/physical mapping on the fly, but some
> aren't.  For the dumb adapters, I think the current ib_umem_get() is
> pretty close to as good as we can get: we have to keep the physical
> pages pinned for as long as the adapter is allowed to DMA into the
> memory region.

I thought the adaptor can always remove the mapping by renegotiating 
with the remote side? Even if it's dumb, a callback could notify the
driver that it may be required to tear down the mapping. We then hold the
pages until we get an okay from the driver that the mapping has been removed.

We could also let the unmapping fail if the driver indicates that the 
mapping must stay.
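The first alternative above, where the VM holds the pages until the driver confirms the device mapping is gone, can be modeled as a small state machine. Everything here (`struct held_range`, `request_teardown`, `driver_ack`) is hypothetical. It only illustrates the required ordering: the page references are released strictly after the acknowledgement, never before.

```c
#include <stdbool.h>

enum teardown_state { MAPPED, TEARDOWN_REQUESTED, RELEASED };

struct held_range {
	enum teardown_state state;
	int page_refs;	/* references the VM keeps while the device may DMA */
};

/* VM side: ask the driver to drop its mapping; pages stay held. */
static void request_teardown(struct held_range *r)
{
	if (r->state == MAPPED)
		r->state = TEARDOWN_REQUESTED;
}

/* Driver side: the adapter (and remote peer) no longer use the mapping.
 * Only now may the VM drop its references and free or move the pages. */
static bool driver_ack(struct held_range *r)
{
	if (r->state != TEARDOWN_REQUESTED)
		return false;	/* spurious ack: nothing was requested */
	r->page_refs = 0;
	r->state = RELEASED;
	return true;
}
```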


Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Roland Dreier
 > We have done several rounds of discussion on linux-kernel about this so 
 > far and the IB folks have not shown up to join in. I have tried to make 
 > this as general as possible.

Sorry, this has been on my "things to look at" list for a while, but I
haven't gotten a chance to really understand where things are yet.

In general, this MMU notifier stuff will only be useful to a subset of
InfiniBand/RDMA hardware.  Some adapters are smart enough to handle
changing the IO virtual -> bus/physical mapping on the fly, but some
aren't.  For the dumb adapters, I think the current ib_umem_get() is
pretty close to as good as we can get: we have to keep the physical
pages pinned for as long as the adapter is allowed to DMA into the
memory region.

For the smart adapters, we just need a chance to change the adapter's
page table when the kernel/CPU's mapping changes, and naively, this
stuff looks like it would work.
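For the smart adapters, the notifier only has to keep a device-side translation table in step with the CPU page tables. Below is a minimal sketch, assuming a hypothetical flat table indexed by virtual page number. `dev_pt`, `dev_pt_invalidate` and `dev_pt_fault` are illustrative names, not an InfiniBand or kernel API. Invalidation clears entries, and the next device access re-resolves the address through a lookup callback that stands in for the CPU page tables.

```c
#include <stddef.h>

#define DEV_PT_PAGES 16
#define NO_PFN 0UL	/* hypothetical "not present" marker */

struct dev_pt {
	unsigned long pfn[DEV_PT_PAGES];	/* indexed by virtual page number */
};

/* Stand-in for resolving a CPU mapping (e.g. via get_user_pages()). */
typedef unsigned long (*cpu_lookup_fn)(unsigned long vpn);

/* Notifier side: the CPU mapping for [start_vpn, end_vpn) changed. */
static void dev_pt_invalidate(struct dev_pt *pt, size_t start_vpn,
			      size_t end_vpn)
{
	for (size_t v = start_vpn; v < end_vpn && v < DEV_PT_PAGES; v++)
		pt->pfn[v] = NO_PFN;
}

/* Device side: translate, refilling from the CPU tables on a miss. */
static unsigned long dev_pt_fault(struct dev_pt *pt, size_t vpn,
				  cpu_lookup_fn lookup)
{
	if (vpn >= DEV_PT_PAGES)
		return NO_PFN;
	if (pt->pfn[vpn] == NO_PFN)
		pt->pfn[vpn] = lookup(vpn);
	return pt->pfn[vpn];
}
```

If the CPU side moves a page and nobody calls `dev_pt_invalidate()`, the device keeps using the stale frame. That is exactly the gap the notifier callbacks close.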

Andrew, does that help?

- R.


Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Andrew Morton
On Fri, 8 Feb 2008 16:05:00 -0800 (PST) Christoph Lameter <[EMAIL PROTECTED]> 
wrote:

> On Fri, 8 Feb 2008, Andrew Morton wrote:
> 
> > You took it correctly, and I didn't understand the answer ;)
> 
> We have done several rounds of discussion on linux-kernel about this so 
> far and the IB folks have not shown up to join in. I have tried to make 
> this as general as possible.

infiniband would appear to be the major present in-kernel client of this new
interface.  So as a part of proving its usefulness, correctness, etc., we
should surely work on converting infiniband to use it, and prove its
goodness.

Quite possibly none of the infiniband developers even know about it..


Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Fri, 8 Feb 2008, Andrew Morton wrote:

> You took it correctly, and I didn't understand the answer ;)

We have done several rounds of discussion on linux-kernel about this so 
far and the IB folks have not shown up to join in. I have tried to make 
this as general as possible.


Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Andrew Morton
On Fri, 8 Feb 2008 17:43:02 -0600 Robin Holt <[EMAIL PROTECTED]> wrote:

> On Fri, Feb 08, 2008 at 03:41:24PM -0800, Christoph Lameter wrote:
> > On Fri, 8 Feb 2008, Robin Holt wrote:
> > 
> > > > > What about ib_umem_get()?
> > 
> > Correct.
> > 
> > You missed the turn of the conversation to how ib_umem_get() works. 
> > Currently it seems to pin the same way that the SLES10 XPmem works.
> 
> Ah.  I took Andrew's question as more of a probe about whether we had
> worked with the IB folks to ensure this fits the ib_umem_get needs
> as well.
> 

You took it correctly, and I didn't understand the answer ;)


Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Robin Holt
On Fri, Feb 08, 2008 at 03:41:24PM -0800, Christoph Lameter wrote:
> On Fri, 8 Feb 2008, Robin Holt wrote:
> 
> > > > What about ib_umem_get()?
> 
> Correct.
> 
> You missed the turn of the conversation to how ib_umem_get() works. 
> Currently it seems to pin the same way that the SLES10 XPmem works.

Ah.  I took Andrew's question as more of a probe about whether we had
worked with the IB folks to ensure this fits the ib_umem_get needs
as well.

Thanks,
Robin


Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Fri, 8 Feb 2008, Robin Holt wrote:

> > > What about ib_umem_get()?
> > 
> > Ok. It pins using an elevated refcount. Same as XPmem right now. With that 
> > we effectively pin a page (page migration will fail) but we will 
> > continually be reclaiming the page and may repeatedly try to move it. We 
> > have issues with XPmem causing too many pages to be pinned and thus the 
> > VM getting into weird behavior modes (OOM, or LRU scanning stopping
> > because all_reclaimable is set).
> > 
> > An elevated refcount will also not be noticed by any of the schemes under 
> > consideration to improve LRU scanning performance.
> 
> Christoph, I am not sure what you are saying here.  With v4 and later,
> I thought we were able to use the rmap invalidation to remove the ref
> count that XPMEM was holding and therefore be able to swapout.  Did I miss
> something?  I agree the existing XPMEM does pin.  I hope we are not saying
> the XPMEM based upon these patches will not be able to swap/migrate.

Correct.

You missed the turn of the conversation to how ib_umem_get() works. 
Currently it seems to pin the same way that the SLES10 XPmem works.





Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Robin Holt
On Fri, Feb 08, 2008 at 03:32:19PM -0800, Christoph Lameter wrote:
> On Fri, 8 Feb 2008, Andrew Morton wrote:
> 
> > What about ib_umem_get()?
> 
> Ok. It pins using an elevated refcount. Same as XPmem right now. With that 
> we effectively pin a page (page migration will fail) but we will 
> continually be reclaiming the page and may repeatedly try to move it. We 
> have issues with XPmem causing too many pages to be pinned and thus the 
> OOM getting into weird behavior modes (OOM or stop lru scanning due to 
> all_reclaimable set).
> 
> An elevated refcount will also not be noticed by any of the schemes under 
> consideration to improve LRU scanning performance.

Christoph, I am not sure what you are saying here.  With v4 and later,
I thought we were able to use the rmap invalidation to remove the ref
count that XPMEM was holding and therefore be able to swapout.  Did I miss
something?  I agree the existing XPMEM does pin.  I hope we are not saying
the XPMEM based upon these patches will not be able to swap/migrate.
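The point above, that an rmap-driven invalidate lets XPMEM drop the reference it holds so the page can be swapped again, can be modeled as below. The structure and function names are hypothetical. The sketch only shows the reference accounting. It assumes the exporter takes one extra reference per remote mapping and releases those references from the invalidate callback, once the remote ptes have been cleared and acknowledged.

```c
#include <stdbool.h>

struct xp_page {
	int refcount;		/* 1 base reference + remote references */
	int remote_maps;	/* remote ptes currently pointing at the page */
};

/* Export to a remote instance: the remote pte holds a reference. */
static void xp_export(struct xp_page *p)
{
	p->remote_maps++;
	p->refcount++;
}

/* Invalidate callback: clear remote ptes (after remote acks) and
 * drop the references they held. */
static void xp_invalidate(struct xp_page *p)
{
	p->refcount -= p->remote_maps;
	p->remote_maps = 0;
}

/* Only the base reference left: swap-out or migration may proceed. */
static bool xp_swappable(const struct xp_page *p)
{
	return p->refcount == 1;
}
```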

Thanks,
Robin


Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Fri, 8 Feb 2008, Andrew Morton wrote:

> What about ib_umem_get()?

Ok. It pins using an elevated refcount. Same as XPmem right now. With that 
we effectively pin a page (page migration will fail) but we will 
continually be reclaiming the page and may repeatedly try to move it. We 
have issues with XPmem causing too many pages to be pinned and thus the 
VM getting into weird behavior modes (OOM, or LRU scanning stopping
because all_reclaimable is set).

An elevated refcount will also not be noticed by any of the schemes under 
consideration to improve LRU scanning performance.

 


Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Andrew Morton
On Fri, 08 Feb 2008 14:06:16 -0800
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> This is a patchset implementing MMU notifier callbacks based on Andrea's
> earlier work. These are needed if Linux pages are referenced by something
> other than what the kernel's rmaps track (an external MMU). MMU
> notifiers allow us to get rid of the page pinning for RDMA and various
> other purposes. It gets rid of the broken use of mlock for page pinning.
> (mlock really does *not* pin pages)
> 
> More information on the rationale and the technical details can be found in
> the first patch and the README provided by that patch in
> Documentation/mmu_notifiers.
> 
> The known immediate users are
> 
> KVM
> - Establishes a refcount to the page via get_user_pages().
> - External references are called spte.
> - Has page tables to track pages whose refcount was elevated but
>   no reverse maps.
> 
> GRU
> - Simple additional hardware TLB (possibly covering multiple instances of
>   Linux)
> - Needs TLB shootdown when the VM unmaps pages.
> - Determines page address via follow_page (from interrupt context) but can
>   fall back to get_user_pages().
> - No page reference possible since no page status is kept.
> 
> XPmem
> - Allows use of a process's memory by remote instances of Linux.
> - Provides its own reverse mappings to track remote ptes.
> - Establishes refcounts on the exported pages.
> - Must sleep in order to wait for remote acks of ptes that are being
>   cleared.
> 

What about ib_umem_get()?
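The core idea of the patchset, as described in the quoted announcement, is a list of external-MMU users that the VM calls back when it changes a mapping. It can be sketched in plain C. This is not the mmu_notifier API from the patches: `struct ext_mmu`, `ext_mmu_register` and `ext_mmu_invalidate_range` are hypothetical. Locking, sleeping callbacks (which XPmem needs) and unregistration are omitted. A device that pins pages with an elevated refcount instead would simply never register.

```c
#include <stddef.h>

struct ext_mmu;

struct ext_mmu_ops {
	/* Drop any external translation for [start, end). */
	void (*invalidate_range)(struct ext_mmu *m, unsigned long start,
				 unsigned long end);
};

struct ext_mmu {
	const struct ext_mmu_ops *ops;
	struct ext_mmu *next;		/* singly linked registration list */
	unsigned long invalidations;	/* for illustration only */
};

struct toy_mm {
	struct ext_mmu *notifiers;
};

static void ext_mmu_register(struct toy_mm *mm, struct ext_mmu *m)
{
	m->next = mm->notifiers;
	mm->notifiers = m;
}

/* Called by the (toy) VM before it unmaps or changes a range. */
static void ext_mmu_invalidate_range(struct toy_mm *mm, unsigned long start,
				     unsigned long end)
{
	for (struct ext_mmu *m = mm->notifiers; m; m = m->next)
		m->ops->invalidate_range(m, start, end);
}

/* Example user: just counts callbacks, like a TLB that flushes everything. */
static void count_invalidate(struct ext_mmu *m, unsigned long start,
			     unsigned long end)
{
	(void)start; (void)end;
	m->invalidations++;
}

static const struct ext_mmu_ops count_ops = { count_invalidate };
```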


Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Andrew Morton
On Fri, 8 Feb 2008 17:43:02 -0600 Robin Holt [EMAIL PROTECTED] wrote:

 On Fri, Feb 08, 2008 at 03:41:24PM -0800, Christoph Lameter wrote:
  On Fri, 8 Feb 2008, Robin Holt wrote:
  
 What about ib_umem_get()?
  
  Correct.
  
  You missed the turn of the conversation to how ib_umem_get() works. 
  Currently it seems to pin the same way that the SLES10 XPmem works.
 
 Ah.  I took Andrew's question as more of a probe about whether we had
 worked with the IB folks to ensure this fits the ib_umem_get needs
 as well.
 

You took it correctly, and I didn't understand the answer ;)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Fri, 8 Feb 2008, Andrew Morton wrote:

 What about ib_umem_get()?

Ok. It pins using an elevated refcount. Same as XPmem right now. With that 
we effectively pin a page (page migration will fail) but we will 
continually be reclaiming the page and may repeatedly try to move it. We 
have issues with XPmem causing too many pages to be pinned and thus the 
OOM getting into weird behavior modes (OOM or stop lru scanning due to 
all_reclaimable set).

An elevated refcount will also not be noticed by any of the schemes under 
consideration to improve LRU scanning performance.

 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Fri, 8 Feb 2008, Andrew Morton wrote:

 You took it correctly, and I didn't understand the answer ;)

We have done several rounds of discussion on linux-kernel about this so 
far and the IB folks have not shown up to join in. I have tried to make 
this as general as possible.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Andrew Morton
On Fri, 8 Feb 2008 16:05:00 -0800 (PST) Christoph Lameter [EMAIL PROTECTED] 
wrote:

 On Fri, 8 Feb 2008, Andrew Morton wrote:
 
  You took it correctly, and I didn't understand the answer ;)
 
 We have done several rounds of discussion on linux-kernel about this so 
 far and the IB folks have not shown up to join in. I have tried to make 
 this as general as possible.

infiniband would appear to be the major present in-kernel client of this new
interface.  So as a part of proving its usefulness, correctness, etc we
should surely work on converting infiniband to use it, and prove its
goodness.

Quite possibly none of the infiniband developers even know about it..
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Fri, 8 Feb 2008, Roland Dreier wrote:

 In general, this MMU notifier stuff will only be useful to a subset of
 InfiniBand/RDMA hardware.  Some adapters are smart enough to handle
 changing the IO virtual - bus/physical mapping on the fly, but some
 aren't.  For the dumb adapters, I think the current ib_umem_get() is
 pretty close to as good as we can get: we have to keep the physical
 pages pinned for as long as the adapter is allowed to DMA into the
 memory region.

I thought the adaptor can always remove the mapping by renegotiating 
with the remote side? Even if its dumb then a callback could notify the 
driver that it may be required to tear down the mapping. We then hold the 
pages until we get okay by the driver that the mapping has been removed.

We could also let the unmapping fail if the driver indicates that the 
mapping must stay.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Fri, 8 Feb 2008, Andrew Morton wrote:

 Quite possibly none of the infiniband developers even know about it..

Well Andrea's initial approach was even featured on LWN a couple of 
weeks back.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Roland Dreier
  I thought the adaptor can always remove the mapping by renegotiating 
  with the remote side? Even if its dumb then a callback could notify the 
  driver that it may be required to tear down the mapping. We then hold the 
  pages until we get okay by the driver that the mapping has been removed.

Of course we can always destroy the memory region but that would break
the semantics that applications expect.  Basically an application can
register some chunk of its memory and get a key that it can pass to a
remote peer to let the remote peer operate on its memory via RDMA.
And that memory region/key is expected to stay valid until there is an
application-level operation to destroy it (or until the app crashes or
gets killed, etc).

  We could also let the unmapping fail if the driver indicates that the 
  mapping must stay.

That would of course work -- dumb adapters would just always fail,
which might be inefficient.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Fri, 8 Feb 2008, Roland Dreier wrote:

 That would of course work -- dumb adapters would just always fail,
 which might be inefficient.

H.. that means we need something that actually pins pages for good so 
that the VM can avoid reclaiming it and so that page migration can avoid 
trying to migrate them. Something like yet another page flag.

Ccing Rik.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Andrea Arcangeli
On Fri, Feb 08, 2008 at 04:36:16PM -0800, Christoph Lameter wrote:
 On Fri, 8 Feb 2008, Roland Dreier wrote:
 
  That would of course work -- dumb adapters would just always fail,
  which might be inefficient.
 
 H.. that means we need something that actually pins pages for good so 
 that the VM can avoid reclaiming it and so that page migration can avoid 
 trying to migrate them. Something like yet another page flag.

What's wrong with pinning with the page count like now? Dumb adapters
would simply not register themself in the mmu notifier list no?

 
 Ccing Rik.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Sat, 9 Feb 2008, Andrea Arcangeli wrote:

  H.. that means we need something that actually pins pages for good so 
  that the VM can avoid reclaiming it and so that page migration can avoid 
  trying to migrate them. Something like yet another page flag.
 
 What's wrong with pinning with the page count like now? Dumb adapters
 would simply not register themself in the mmu notifier list no?

Pages will still be on the LRU and cycle through rmap again and again. 
If page migration is used on those pages then the code may make repeated 
attempt to migrate the page thinking that the page count must at some 
point drop.

I do not think that the page count was intended to be used to pin pages 
permanently. If we had a marker on such pages then we could take them off 
the LRU and not try to migrate them.


Re: [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Sat, 9 Feb 2008, Andrea Arcangeli wrote:

> The VM shouldn't break if try_to_unmap doesn't actually make the page
> freeable for whatever reason. Permanent pins shouldn't happen anyway,

With the current code, the VM livelocks if too many pages are pinned that 
way. The more processors per node, the higher the risk of livelock, 
because more processors are cycling through pages that have an 
elevated refcount.

> so defining an ad-hoc API for that doesn't sound too appealing. Not
> sure if old hardware deserves those special lru-size-reduction
> optimizations but it's not my call (certainly swapoff/mlock would get
> higher priority in that lru-size-reduction area).

Rik has a patchset under development that addresses issues like this. The 
elevated refcount pin problem is not really relevant to the patchset we 
are discussing here.
 


Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Andrew Morton
On Fri, 08 Feb 2008 14:06:16 -0800
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> This is a patchset implementing MMU notifier callbacks based on Andrea's
> earlier work. These are needed if Linux pages are referenced by something
> that is not tracked by the kernel's rmaps (an external MMU). MMU
> notifiers allow us to get rid of the page pinning for RDMA and various
> other purposes. It gets rid of the broken use of mlock for page pinning.
> (mlock really does *not* pin pages)
>
> More information on the rationale and the technical details can be found in
> the first patch and the README provided by that patch in
> Documentation/mmu_notifiers.
>
> The known immediate users are:
>
> KVM
> - Establishes a refcount on the page via get_user_pages().
> - External references are called spte.
> - Has page tables to track pages whose refcount was elevated but
>   no reverse maps.
>
> GRU
> - Simple additional hardware TLB (possibly covering multiple instances of
>   Linux)
> - Needs TLB shootdown when the VM unmaps pages.
> - Determines page address via follow_page (from interrupt context) but can
>   fall back to get_user_pages().
> - No page reference possible since no page status is kept.
>
> XPmem
> - Allows use of a process's memory by remote instances of Linux.
> - Provides its own reverse mappings to track remote ptes.
> - Establishes refcounts on the exported pages.
> - Must sleep in order to wait for remote acks of ptes that are being
>   cleared.
>

What about ib_umem_get()?


Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Robin Holt
On Fri, Feb 08, 2008 at 03:32:19PM -0800, Christoph Lameter wrote:
> On Fri, 8 Feb 2008, Andrew Morton wrote:
>
>> What about ib_umem_get()?
>
> Ok. It pins using an elevated refcount. Same as XPmem right now. With that 
> we effectively pin a page (page migration will fail) but we will 
> continually be reclaiming the page and may repeatedly try to move it. We 
> have issues with XPmem causing too many pages to be pinned and thus the 
> OOM getting into weird behavior modes (OOM or stop lru scanning due to 
> all_reclaimable set).
>
> An elevated refcount will also not be noticed by any of the schemes under 
> consideration to improve LRU scanning performance.

Christoph, I am not sure what you are saying here.  With v4 and later,
I thought we were able to use the rmap invalidation to remove the ref
count that XPMEM was holding and therefore be able to swapout.  Did I miss
something?  I agree the existing XPMEM does pin.  I hope we are not saying
the XPMEM based upon these patches will not be able to swap/migrate.

Thanks,
Robin


Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Fri, 8 Feb 2008, Robin Holt wrote:

>>> What about ib_umem_get()?
>>
>> Ok. It pins using an elevated refcount. Same as XPmem right now. With that 
>> we effectively pin a page (page migration will fail) but we will 
>> continually be reclaiming the page and may repeatedly try to move it. We 
>> have issues with XPmem causing too many pages to be pinned and thus the 
>> OOM getting into weird behavior modes (OOM or stop lru scanning due to 
>> all_reclaimable set).
>>
>> An elevated refcount will also not be noticed by any of the schemes under 
>> consideration to improve LRU scanning performance.
>
> Christoph, I am not sure what you are saying here.  With v4 and later,
> I thought we were able to use the rmap invalidation to remove the ref
> count that XPMEM was holding and therefore be able to swapout.  Did I miss
> something?  I agree the existing XPMEM does pin.  I hope we are not saying
> the XPMEM based upon these patches will not be able to swap/migrate.

Correct.

You missed the turn of the conversation to how ib_umem_get() works. 
Currently it seems to pin the same way that the SLES10 XPmem works.





Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Robin Holt
On Fri, Feb 08, 2008 at 03:41:24PM -0800, Christoph Lameter wrote:
> On Fri, 8 Feb 2008, Robin Holt wrote:
>
>>>> What about ib_umem_get()?
>
> Correct.
>
> You missed the turn of the conversation to how ib_umem_get() works. 
> Currently it seems to pin the same way that the SLES10 XPmem works.

Ah.  I took Andrew's question as more of a probe about whether we had
worked with the IB folks to ensure this fits the ib_umem_get needs
as well.

Thanks,
Robin