Re: [patch 3/6] mmu_notifier: invalidate_page callbacks

2008-02-19 Thread Andrea Arcangeli
On Tue, Feb 19, 2008 at 07:46:10PM +1100, Nick Piggin wrote:
> On Sunday 17 February 2008 06:22, Christoph Lameter wrote:
> > On Fri, 15 Feb 2008, Andrew Morton wrote:
> 
> > > > flush_cache_page(vma, address, pte_pfn(*pte));
> > > > entry = ptep_clear_flush(vma, address, pte);
> > > > +   mmu_notifier(invalidate_page, mm, address);
> > >
> > > I just don't see how this can be done if the callee has another thread in
> > > the middle of establishing IO against this region of memory.
> > > ->invalidate_page() _has_ to be able to block.  Confused.
> >
> > The page lock is held and that holds off I/O?
> 
> I think the actual answer is that "it doesn't matter".

Agreed. The PG_lock taken when invalidate_page is called is used to
serialize the VM against the VM, not the VM against I/O.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 3/6] mmu_notifier: invalidate_page callbacks

2008-02-19 Thread Nick Piggin
On Sunday 17 February 2008 06:22, Christoph Lameter wrote:
> On Fri, 15 Feb 2008, Andrew Morton wrote:

> > >   flush_cache_page(vma, address, pte_pfn(*pte));
> > >   entry = ptep_clear_flush(vma, address, pte);
> > > + mmu_notifier(invalidate_page, mm, address);
> >
> > I just don't see how this can be done if the callee has another thread in
> > the middle of establishing IO against this region of memory.
> > ->invalidate_page() _has_ to be able to block.  Confused.
>
> The page lock is held and that holds off I/O?

I think the actual answer is that "it doesn't matter".

ptes are not exactly the entity via which IO gets established, so
all we really care about here is that after the callback finishes,
we will not get any more reads or writes to the page via the
external mapping.

As far as holding off local IO goes, that is the job of the core
VM. (And no, page lock does not necessarily hold it off FYI -- it
can be writeback IO or even IO directly via buffers).

Holding off IO via the external references I guess is a job for
the notifier driver.



Re: [patch 3/6] mmu_notifier: invalidate_page callbacks

2008-02-17 Thread Nick Piggin
On Saturday 16 February 2008 14:37, Andrew Morton wrote:
> On Thu, 14 Feb 2008 22:49:02 -0800 Christoph Lameter <[EMAIL PROTECTED]> wrote:
> > Two callbacks to remove individual pages as done in rmap code
> >
> > invalidate_page()
> >
> > Called from the inner loop of rmap walks to invalidate pages.
> >
> > age_page()
> >
> > Called for the determination of the page referenced status.
> >
> > If we do not care about page referenced status then an age_page callback
> > may be omitted. PageLock and pte lock are held when either of the
> > functions is called.
>
> The age_page mystery shallows.

BTW, can this callback be called mmu_notifier_clear_flush_young, to
match the core VM?



Re: [patch 3/6] mmu_notifier: invalidate_page callbacks

2008-02-16 Thread Christoph Lameter
On Fri, 15 Feb 2008, Andrew Morton wrote:

> > @@ -287,7 +288,8 @@ static int page_referenced_one(struct pa
> > if (vma->vm_flags & VM_LOCKED) {
> > referenced++;
> > *mapcount = 1;  /* break early from loop */
> > -   } else if (ptep_clear_flush_young(vma, address, pte))
> > +   } else if (ptep_clear_flush_young(vma, address, pte) |
> > +  mmu_notifier_age_page(mm, address))
> > referenced++;
> 
> The "|" is obviously deliberate.  But no explanation is provided telling us
> why we still call the callback if ptep_clear_flush_young() said the page
> was recently referenced.  People who read your code will want to understand
> this.

Andrea?

> > flush_cache_page(vma, address, pte_pfn(*pte));
> > entry = ptep_clear_flush(vma, address, pte);
> > +   mmu_notifier(invalidate_page, mm, address);
> 
> I just don't see how this can be done if the callee has another thread in
> the middle of establishing IO against this region of memory. 
> ->invalidate_page() _has_ to be able to block.  Confused.

The page lock is held and that holds off I/O?



Re: [patch 3/6] mmu_notifier: invalidate_page callbacks

2008-02-16 Thread Andrea Arcangeli
On Fri, Feb 15, 2008 at 07:37:36PM -0800, Andrew Morton wrote:
> The "|" is obviously deliberate.  But no explanation is provided telling us
> why we still call the callback if ptep_clear_flush_young() said the page
> was recently referenced.  People who read your code will want to understand
> this.

This is to clear the young bit in every pte and spte pointing to the
physical page before backing off because some young bit was set. That
way, if any young bit is set in the next scan, we're guaranteed the
page has been touched recently and not ages before (otherwise it would
take a worst case of N rounds of the LRU before the page can be freed,
where N is the number of ptes or sptes pointing to the page).

> I just don't see how this can be done if the callee has another thread in
> the middle of establishing IO against this region of memory. 
> ->invalidate_page() _has_ to be able to block.  Confused.

invalidate_page marking the spte invalid and flushing the asid/tlb
doesn't need to block, the same way ptep_clear_flush doesn't need to
block for the main linux pte. In fact, before invalidate_page and
ptep_clear_flush can touch anything at all, they have to take their
own spinlocks (mmu_lock for the former, and the PT lock for the latter).

The only sleeping trouble is for network-driven message passing, where
the driver wants to schedule while it waits for the message to arrive,
as spinning for that long would hog the whole cpu.

sptes are cpu-clocked entities like ptes, so scheduling there is not
necessary at all because there's zero delay in invalidating them and
flushing their tlbs. GRU is similar. Because we boost the reference
count of the pages for every spte mapping, implementing only
invalidate_range_end is enough, but I still need to figure out the
get_user_pages->rmap_add window: get_user_pages can schedule, and if I
want to add a critical section around it to avoid calling
get_user_pages twice during the kvm page fault, a mutex would be the
only way (it sure can't be a spinlock). But a mutex can't be taken by
invalidate_page to stop it. So that leaves me with the idea of adding
a get_user_pages variant that returns the page locked. Then, instead
of calling get_user_pages a second time after rmap_add returns, I will
only need to call unlock_page, which should be faster than a
follow_page. And setting PG_lock before dropping the PT lock in
follow_page should be fast enough too.


Re: [patch 3/6] mmu_notifier: invalidate_page callbacks

2008-02-15 Thread Andrew Morton
On Thu, 14 Feb 2008 22:49:02 -0800 Christoph Lameter <[EMAIL PROTECTED]> wrote:

> Two callbacks to remove individual pages as done in rmap code
> 
>   invalidate_page()
> 
> Called from the inner loop of rmap walks to invalidate pages.
> 
>   age_page()
> 
> Called for the determination of the page referenced status.
> 
> If we do not care about page referenced status then an age_page callback
> may be omitted. PageLock and pte lock are held when either of the
> functions is called.

The age_page mystery shallows.

It would be useful to have some rationale somewhere in the patchset for the
existence of this callback.

>  #include <asm/tlbflush.h>
>  
> @@ -287,7 +288,8 @@ static int page_referenced_one(struct pa
>   if (vma->vm_flags & VM_LOCKED) {
>   referenced++;
>   *mapcount = 1;  /* break early from loop */
> - } else if (ptep_clear_flush_young(vma, address, pte))
> + } else if (ptep_clear_flush_young(vma, address, pte) |
> +mmu_notifier_age_page(mm, address))
>   referenced++;

The "|" is obviously deliberate.  But no explanation is provided telling us
why we still call the callback if ptep_clear_flush_young() said the page
was recently referenced.  People who read your code will want to understand
this.

>   /* Pretend the page is referenced if the task has the
> @@ -455,6 +457,7 @@ static int page_mkclean_one(struct page 
>  
>   flush_cache_page(vma, address, pte_pfn(*pte));
>   entry = ptep_clear_flush(vma, address, pte);
> + mmu_notifier(invalidate_page, mm, address);

I just don't see how this can be done if the callee has another thread in
the middle of establishing IO against this region of memory. 
->invalidate_page() _has_ to be able to block.  Confused.




[patch 3/6] mmu_notifier: invalidate_page callbacks

2008-02-14 Thread Christoph Lameter
Two callbacks to remove individual pages as done in rmap code

invalidate_page()

Called from the inner loop of rmap walks to invalidate pages.

age_page()

Called for the determination of the page referenced status.

If we do not care about page referenced status then an age_page callback
may be omitted. PageLock and pte lock are held when either of the
functions is called.

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>
Signed-off-by: Robin Holt <[EMAIL PROTECTED]>
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/rmap.c |   13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/rmap.c
===
--- linux-2.6.orig/mm/rmap.c2008-02-07 16:49:32.0 -0800
+++ linux-2.6/mm/rmap.c 2008-02-07 17:25:25.0 -0800
@@ -49,6 +49,7 @@
 #include <linux/module.h>
 #include <linux/kallsyms.h>
 #include <linux/memcontrol.h>
+#include <linux/mmu_notifier.h>
 
 #include <asm/tlbflush.h>
 
@@ -287,7 +288,8 @@ static int page_referenced_one(struct pa
if (vma->vm_flags & VM_LOCKED) {
referenced++;
*mapcount = 1;  /* break early from loop */
-   } else if (ptep_clear_flush_young(vma, address, pte))
+   } else if (ptep_clear_flush_young(vma, address, pte) |
+  mmu_notifier_age_page(mm, address))
referenced++;
 
/* Pretend the page is referenced if the task has the
@@ -455,6 +457,7 @@ static int page_mkclean_one(struct page 
 
flush_cache_page(vma, address, pte_pfn(*pte));
entry = ptep_clear_flush(vma, address, pte);
+   mmu_notifier(invalidate_page, mm, address);
entry = pte_wrprotect(entry);
entry = pte_mkclean(entry);
set_pte_at(mm, address, pte, entry);
@@ -712,7 +715,8 @@ static int try_to_unmap_one(struct page 
 * skipped over this mm) then we should reactivate it.
 */
if (!migration && ((vma->vm_flags & VM_LOCKED) ||
-   (ptep_clear_flush_young(vma, address, pte)))) {
+   (ptep_clear_flush_young(vma, address, pte) |
+   mmu_notifier_age_page(mm, address)))) {
ret = SWAP_FAIL;
goto out_unmap;
}
@@ -720,6 +724,7 @@ static int try_to_unmap_one(struct page 
/* Nuke the page table entry. */
flush_cache_page(vma, address, page_to_pfn(page));
pteval = ptep_clear_flush(vma, address, pte);
+   mmu_notifier(invalidate_page, mm, address);
 
/* Move the dirty bit to the physical page now the pte is gone. */
if (pte_dirty(pteval))
@@ -844,12 +849,14 @@ static void try_to_unmap_cluster(unsigne
page = vm_normal_page(vma, address, *pte);
BUG_ON(!page || PageAnon(page));
 
-   if (ptep_clear_flush_young(vma, address, pte))
+   if (ptep_clear_flush_young(vma, address, pte) |
+   mmu_notifier_age_page(mm, address))
continue;
 
/* Nuke the page table entry. */
flush_cache_page(vma, address, pte_pfn(*pte));
pteval = ptep_clear_flush(vma, address, pte);
+   mmu_notifier(invalidate_page, mm, address);
 
/* If nonlinear, store the file page offset in the pte. */
if (page->index != linear_page_index(vma, address))



[patch 3/6] mmu_notifier: invalidate_page callbacks

2008-02-08 Thread Christoph Lameter
Two callbacks to remove individual pages as done in rmap code

invalidate_page()

Called from the inner loop of rmap walks to invalidate pages.

age_page()

Called for the determination of the page referenced status.

If we do not care about page referenced status then an age_page callback
may be omitted. PageLock and pte lock are held when either of the
functions is called.

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>
Signed-off-by: Robin Holt <[EMAIL PROTECTED]>
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/rmap.c |   13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/rmap.c
===
--- linux-2.6.orig/mm/rmap.c2008-02-07 16:49:32.0 -0800
+++ linux-2.6/mm/rmap.c 2008-02-07 17:25:25.0 -0800
@@ -49,6 +49,7 @@
 #include <linux/module.h>
 #include <linux/kallsyms.h>
 #include <linux/memcontrol.h>
+#include <linux/mmu_notifier.h>
 
 #include <asm/tlbflush.h>
 
@@ -287,7 +288,8 @@ static int page_referenced_one(struct pa
if (vma->vm_flags & VM_LOCKED) {
referenced++;
*mapcount = 1;  /* break early from loop */
-   } else if (ptep_clear_flush_young(vma, address, pte))
+   } else if (ptep_clear_flush_young(vma, address, pte) |
+  mmu_notifier_age_page(mm, address))
referenced++;
 
/* Pretend the page is referenced if the task has the
@@ -455,6 +457,7 @@ static int page_mkclean_one(struct page 
 
flush_cache_page(vma, address, pte_pfn(*pte));
entry = ptep_clear_flush(vma, address, pte);
+   mmu_notifier(invalidate_page, mm, address);
entry = pte_wrprotect(entry);
entry = pte_mkclean(entry);
set_pte_at(mm, address, pte, entry);
@@ -712,7 +715,8 @@ static int try_to_unmap_one(struct page 
 * skipped over this mm) then we should reactivate it.
 */
if (!migration && ((vma->vm_flags & VM_LOCKED) ||
-   (ptep_clear_flush_young(vma, address, pte)))) {
+   (ptep_clear_flush_young(vma, address, pte) |
+   mmu_notifier_age_page(mm, address)))) {
ret = SWAP_FAIL;
goto out_unmap;
}
@@ -720,6 +724,7 @@ static int try_to_unmap_one(struct page 
/* Nuke the page table entry. */
flush_cache_page(vma, address, page_to_pfn(page));
pteval = ptep_clear_flush(vma, address, pte);
+   mmu_notifier(invalidate_page, mm, address);
 
/* Move the dirty bit to the physical page now the pte is gone. */
if (pte_dirty(pteval))
@@ -844,12 +849,14 @@ static void try_to_unmap_cluster(unsigne
page = vm_normal_page(vma, address, *pte);
BUG_ON(!page || PageAnon(page));
 
-   if (ptep_clear_flush_young(vma, address, pte))
+   if (ptep_clear_flush_young(vma, address, pte) |
+   mmu_notifier_age_page(mm, address))
continue;
 
/* Nuke the page table entry. */
flush_cache_page(vma, address, pte_pfn(*pte));
pteval = ptep_clear_flush(vma, address, pte);
+   mmu_notifier(invalidate_page, mm, address);
 
/* If nonlinear, store the file page offset in the pte. */
if (page->index != linear_page_index(vma, address))



Re: [patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap

2008-01-30 Thread Robin Holt
This is the second part of a patch, posted in reply to patch 1/6.


Index: git-linus/mm/rmap.c
===
--- git-linus.orig/mm/rmap.c2008-01-30 11:55:56.0 -0600
+++ git-linus/mm/rmap.c 2008-01-30 12:01:28.0 -0600
@@ -476,8 +476,10 @@ int page_mkclean(struct page *page)
struct address_space *mapping = page_mapping(page);
if (mapping) {
ret = page_mkclean_file(mapping, page);
-   if (unlikely(PageExternalRmap(page)))
+   if (unlikely(PageExternalRmap(page))) {
mmu_rmap_notifier(invalidate_page, page);
+   ClearPageExported(page);
+   }
if (page_test_dirty(page)) {
page_clear_dirty(page);
ret = 1;
@@ -980,8 +982,10 @@ int try_to_unmap(struct page *page, int 
else
ret = try_to_unmap_file(page, migration);
 
-   if (unlikely(PageExternalRmap(page)))
+   if (unlikely(PageExternalRmap(page))) {
mmu_rmap_notifier(invalidate_page, page);
+   ClearPageExported(page);
+   }
 
if (!page_mapped(page))
ret = SWAP_SUCCESS;


[patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap

2008-01-29 Thread Christoph Lameter
Callbacks to remove individual pages if the subsystem has an
rmap capability. The pagelock is held but no spinlocks are held.
The refcount of the page is elevated so that dropping the refcount
in the subsystem will not directly free the page.

The callbacks occur after the Linux rmaps have been walked.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/rmap.c |    6 ++++++
 1 file changed, 6 insertions(+)

Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c2008-01-25 14:24:19.0 -0800
+++ linux-2.6/mm/rmap.c 2008-01-25 14:24:38.0 -0800
@@ -49,6 +49,7 @@
 #include <linux/rcupdate.h>
 #include <linux/module.h>
 #include <linux/kallsyms.h>
+#include <linux/mmu_notifier.h>
 
 #include <asm/tlbflush.h>
 
@@ -473,6 +474,8 @@ int page_mkclean(struct page *page)
struct address_space *mapping = page_mapping(page);
if (mapping) {
ret = page_mkclean_file(mapping, page);
+   if (unlikely(PageExternalRmap(page)))
+   mmu_rmap_notifier(invalidate_page, page);
if (page_test_dirty(page)) {
page_clear_dirty(page);
ret = 1;
@@ -971,6 +974,9 @@ int try_to_unmap(struct page *page, int 
else
ret = try_to_unmap_file(page, migration);
 
+   if (unlikely(PageExternalRmap(page)))
+   mmu_rmap_notifier(invalidate_page, page);
+
if (!page_mapped(page))
ret = SWAP_SUCCESS;
return ret;

-- 


Re: [patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap

2008-01-29 Thread Robin Holt
I don't understand how this is intended to work.  I think the page flag
needs to be maintained by the mmu_notifier subsystem.

Let's assume we have a mapping that has a grant from xpmem and an
additional grant from kvm.  The exporters are not important, the fact
that there may be two is.

Assume that the user revokes the grant from xpmem (we call that
xpmem_remove).  As far as xpmem is concerned, there are no longer any
exports of that page so the page should no longer have its exported
flag set.  Note: This is not a process exit, but a function of xpmem.

In that case, at the remove time, we have no idea whether the flag should
be cleared.

For the invalidate_page side, I think we should have:
> @@ -473,6 +474,10 @@ int page_mkclean(struct page *page)
>   struct address_space *mapping = page_mapping(page);
>   if (mapping) {
>   ret = page_mkclean_file(mapping, page);
> + if (unlikely(PageExternalRmap(page))) {
> + mmu_rmap_notifier(invalidate_page, page);
> + ClearPageExternalRmap(page);
> + }
>   if (page_test_dirty(page)) {
>   page_clear_dirty(page);
>   ret = 1;

I would assume we would then want a function which sets the page flag.

Additionally, I would think we would want some intervention in the
freeing of the page side to ensure the page flag is cleared as well.

Thanks,
Robin




[patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap

2008-01-28 Thread Christoph Lameter
Callbacks to remove individual pages if the subsystem has an
rmap capability. The pagelock is held but no spinlocks are held.
The refcount of the page is elevated so that dropping the refcount
in the subsystem will not directly free the page.

The callbacks occur after the Linux rmaps have been walked.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/rmap.c |    6 ++++++
 1 file changed, 6 insertions(+)

Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c2008-01-25 14:24:19.0 -0800
+++ linux-2.6/mm/rmap.c 2008-01-25 14:24:38.0 -0800
@@ -49,6 +49,7 @@
 #include <linux/rcupdate.h>
 #include <linux/module.h>
 #include <linux/kallsyms.h>
+#include <linux/mmu_notifier.h>
 
 #include <asm/tlbflush.h>
 
@@ -473,6 +474,8 @@ int page_mkclean(struct page *page)
struct address_space *mapping = page_mapping(page);
if (mapping) {
ret = page_mkclean_file(mapping, page);
+   if (unlikely(PageExternalRmap(page)))
+   mmu_rmap_notifier(invalidate_page, page);
if (page_test_dirty(page)) {
page_clear_dirty(page);
ret = 1;
@@ -971,6 +974,9 @@ int try_to_unmap(struct page *page, int 
else
ret = try_to_unmap_file(page, migration);
 
+   if (unlikely(PageExternalRmap(page)))
+   mmu_rmap_notifier(invalidate_page, page);
+
if (!page_mapped(page))
ret = SWAP_SUCCESS;
return ret;

-- 

