Re: [patch 2/3] mmu_notifier: Callbacks to invalidate address ranges

2008-02-01 Thread Christoph Lameter
On Fri, 1 Feb 2008, Robin Holt wrote:

> Maybe I haven't looked closely enough, but let's start with some common
> assumptions.  Looking at do_wp_page in 2.6.24 (which I believe is what
> my work area is based upon), the function's declaration begins on line
> 1559.

Aah I looked at the wrong file.

> On lines 1614 and 1630, we do "goto unlock", and the _end callout is
> made soon after.  The _begin callout does not come until after those
> branches have been taken (it occurs on line 1648).

There are actually two cases...

---
 mm/memory.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c  2008-02-01 11:04:21.0 -0800
+++ linux-2.6/mm/memory.c   2008-02-01 11:12:12.0 -0800
@@ -1611,8 +1611,10 @@ static int do_wp_page(struct mm_struct *
page_table = pte_offset_map_lock(mm, pmd, address,
 &ptl);
page_cache_release(old_page);
-   if (!pte_same(*page_table, orig_pte))
-   goto unlock;
+   if (!pte_same(*page_table, orig_pte)) {
+   pte_unmap_unlock(page_table, ptl);
+   goto check_dirty;
+   }
 
page_mkwrite = 1;
}
@@ -1628,7 +1630,8 @@ static int do_wp_page(struct mm_struct *
if (ptep_set_access_flags(vma, address, page_table, entry,1))
update_mmu_cache(vma, address, entry);
ret |= VM_FAULT_WRITE;
-   goto unlock;
+   pte_unmap_unlock(page_table, ptl);
+   goto check_dirty;
}
 
/*
@@ -1684,10 +1687,10 @@ gotten:
page_cache_release(new_page);
if (old_page)
page_cache_release(old_page);
-unlock:
pte_unmap_unlock(page_table, ptl);
mmu_notifier(invalidate_range_end, mm,
address, address + PAGE_SIZE, 0);
+check_dirty:
if (dirty_page) {
if (vma->vm_file)
file_update_time(vma->vm_file);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/3] mmu_notifier: Callbacks to invalidate address ranges

2008-02-01 Thread Robin Holt
On Fri, Feb 01, 2008 at 04:32:21AM -0600, Robin Holt wrote:
> On Thu, Jan 31, 2008 at 08:43:58PM -0800, Christoph Lameter wrote:
> > On Thu, 31 Jan 2008, Robin Holt wrote:
> > 
> > > > Index: linux-2.6/mm/memory.c
> > > ...
> > > > @@ -1668,6 +1678,7 @@ gotten:
> > > > page_cache_release(old_page);
> > > >  unlock:
> > > > pte_unmap_unlock(page_table, ptl);
> > > > +   mmu_notifier(invalidate_range_end, mm, 0);
> > > 
> > > I think we can get an _end call without the _begin call before it.
> > 
> > If that were true, then the pte would also have been left locked.
> > 
> > We always hit unlock. Maybe I just do not see it?
> 
> Maybe I haven't looked closely enough, but let's start with some common
> assumptions.  Looking at do_wp_page in 2.6.24 (which I believe is what
> my work area is based upon), the function's declaration begins on line
> 1559.
> 
> On lines 1614 and 1630, we do "goto unlock", and the _end callout is
> made soon after.  The _begin callout does not come until after those
> branches have been taken (it occurs on line 1648).
> 
> Thanks,
> Robin

Ignore this thread, I am going to throw a patch against the new version.

Thanks,
Robin


Re: [patch 2/3] mmu_notifier: Callbacks to invalidate address ranges

2008-02-01 Thread Robin Holt
On Thu, Jan 31, 2008 at 08:43:58PM -0800, Christoph Lameter wrote:
> On Thu, 31 Jan 2008, Robin Holt wrote:
> 
> > > Index: linux-2.6/mm/memory.c
> > ...
> > > @@ -1668,6 +1678,7 @@ gotten:
> > >   page_cache_release(old_page);
> > >  unlock:
> > >   pte_unmap_unlock(page_table, ptl);
> > > + mmu_notifier(invalidate_range_end, mm, 0);
> > 
> > I think we can get an _end call without the _begin call before it.
> 
> If that were true, then the pte would also have been left locked.
> 
> We always hit unlock. Maybe I just do not see it?

Maybe I haven't looked closely enough, but let's start with some common
assumptions.  Looking at do_wp_page in 2.6.24 (which I believe is what
my work area is based upon), the function's declaration begins on line
1559.

On lines 1614 and 1630, we do "goto unlock", and the _end callout is
made soon after.  The _begin callout does not come until after those
branches have been taken (it occurs on line 1648).

Thanks,
Robin



Re: [patch 2/3] mmu_notifier: Callbacks to invalidate address ranges

2008-01-31 Thread Christoph Lameter
On Thu, 31 Jan 2008, Robin Holt wrote:

> > Index: linux-2.6/mm/memory.c
> ...
> > @@ -1668,6 +1678,7 @@ gotten:
> > page_cache_release(old_page);
> >  unlock:
> > pte_unmap_unlock(page_table, ptl);
> > +   mmu_notifier(invalidate_range_end, mm, 0);
> 
> I think we can get an _end call without the _begin call before it.

If that were true, then the pte would also have been left locked.

We always hit unlock. Maybe I just do not see it?



Re: [patch 2/3] mmu_notifier: Callbacks to invalidate address ranges

2008-01-31 Thread Robin Holt
> Index: linux-2.6/mm/memory.c
...
> @@ -1668,6 +1678,7 @@ gotten:
>   page_cache_release(old_page);
>  unlock:
>   pte_unmap_unlock(page_table, ptl);
> + mmu_notifier(invalidate_range_end, mm, 0);

I think we can get an _end call without the _begin call before it.

Thanks,
Robin


Re: [patch 2/3] mmu_notifier: Callbacks to invalidate address ranges

2008-01-31 Thread Christoph Lameter
On Thu, 31 Jan 2008, Andrea Arcangeli wrote:

> On Wed, Jan 30, 2008 at 08:57:52PM -0800, Christoph Lameter wrote:
> > @@ -211,7 +212,9 @@ asmlinkage long sys_remap_file_pages(uns
> > spin_unlock(&mapping->i_mmap_lock);
> > }
> >  
> > +   mmu_notifier(invalidate_range_begin, mm, start, start + size, 0);
> > err = populate_range(mm, vma, start, size, pgoff);
> > +   mmu_notifier(invalidate_range_end, mm, 0);
> > if (!err && !(flags & MAP_NONBLOCK)) {
> > if (unlikely(has_write_lock)) {
> > downgrade_write(&mm->mmap_sem);
> 
> This can't be enough for the GRU; in fact it can't work for KVM either.
> You need 1) some invalidate_page for the GRU before freeing the page,
> and 2) to pass start and end to range_end (if you want KVM to use it
> instead of invalidate_page).

The external references are dropped when invalidate_range_begin is called.
This would work for both KVM and the GRU. Why would KVM not be able to
invalidate the range beforehand? The locking convention is that no
additional external reference can be added between invalidate_range_begin
and invalidate_range_end, so KVM is fine too.

> mremap still missing as a whole.

mremap uses do_munmap, which calls into unmap_region(), which already has
the callbacks. So what is wrong there?


Re: [patch 2/3] mmu_notifier: Callbacks to invalidate address ranges

2008-01-31 Thread Andrea Arcangeli
On Wed, Jan 30, 2008 at 08:57:52PM -0800, Christoph Lameter wrote:
> @@ -211,7 +212,9 @@ asmlinkage long sys_remap_file_pages(uns
>   spin_unlock(&mapping->i_mmap_lock);
>   }
>  
> + mmu_notifier(invalidate_range_begin, mm, start, start + size, 0);
>   err = populate_range(mm, vma, start, size, pgoff);
> + mmu_notifier(invalidate_range_end, mm, 0);
>   if (!err && !(flags & MAP_NONBLOCK)) {
>   if (unlikely(has_write_lock)) {
>   downgrade_write(&mm->mmap_sem);

This can't be enough for the GRU; in fact it can't work for KVM either.
You need 1) some invalidate_page for the GRU before freeing the page,
and 2) to pass start and end to range_end (if you want KVM to use it
instead of invalidate_page).

mremap still missing as a whole.



[patch 2/3] mmu_notifier: Callbacks to invalidate address ranges

2008-01-30 Thread Christoph Lameter
The invalidation of address ranges in an mm_struct needs to be
performed when pages are removed or permissions etc. change.

invalidate_range_begin/end() is frequently called with only mmap_sem
held. If invalidate_range_begin() is called with locks held, then we
pass a flag into the invalidate_range callbacks to indicate that no
sleeping is possible.

In two cases we use invalidate_range_begin/end to invalidate
single pages because the pair allows holding off new references
(idea by Robin Holt).

do_wp_page(): We hold off new references while we update the pte.

xip_unmap: We are not taking the PageLock, so we cannot
use the invalidate_page mmu_rmap_notifier; invalidate_range_begin/end
stands in instead.

Comments state that mmap_sem must be held for
remap_pfn_range() but various drivers do not seem to do this.

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>
Signed-off-by: Robin Holt <[EMAIL PROTECTED]>
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/filemap_xip.c |    4 ++++
 mm/fremap.c      |    3 +++
 mm/hugetlb.c     |    3 +++
 mm/memory.c      |   15 +++++++++++++--
 mm/mmap.c        |    2 ++
 5 files changed, 25 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/fremap.c
===================================================================
--- linux-2.6.orig/mm/fremap.c  2008-01-30 20:03:05.0 -0800
+++ linux-2.6/mm/fremap.c   2008-01-30 20:05:39.0 -0800
@@ -15,6 +15,7 @@
 #include <linux/rmap.h>
 #include <linux/module.h>
 #include <linux/syscalls.h>
+#include <linux/mmu_notifier.h>
 
 #include <asm/mmu_context.h>
 #include <asm/cacheflush.h>
@@ -211,7 +212,9 @@ asmlinkage long sys_remap_file_pages(uns
spin_unlock(&mapping->i_mmap_lock);
}
 
+   mmu_notifier(invalidate_range_begin, mm, start, start + size, 0);
err = populate_range(mm, vma, start, size, pgoff);
+   mmu_notifier(invalidate_range_end, mm, 0);
if (!err && !(flags & MAP_NONBLOCK)) {
if (unlikely(has_write_lock)) {
downgrade_write(&mm->mmap_sem);
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c  2008-01-30 20:03:05.0 -0800
+++ linux-2.6/mm/memory.c   2008-01-30 20:07:27.0 -0800
@@ -50,6 +50,7 @@
 #include <linux/delayacct.h>
 #include <linux/init.h>
 #include <linux/writeback.h>
+#include <linux/mmu_notifier.h>
 
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
@@ -883,13 +884,16 @@ unsigned long zap_page_range(struct vm_a
struct mmu_gather *tlb;
unsigned long end = address + size;
unsigned long nr_accounted = 0;
+   int atomic = details ? (details->i_mmap_lock != 0) : 0;
 
lru_add_drain();
tlb = tlb_gather_mmu(mm, 0);
update_hiwater_rss(mm);
+   mmu_notifier(invalidate_range_begin, mm, address, end, atomic);
end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details);
if (tlb)
tlb_finish_mmu(tlb, address, end);
+   mmu_notifier(invalidate_range_end, mm, atomic);
return end;
 }
 
@@ -1318,7 +1322,7 @@ int remap_pfn_range(struct vm_area_struc
 {
pgd_t *pgd;
unsigned long next;
-   unsigned long end = addr + PAGE_ALIGN(size);
+   unsigned long start = addr, end = addr + PAGE_ALIGN(size);
struct mm_struct *mm = vma->vm_mm;
int err;
 
@@ -1352,6 +1356,7 @@ int remap_pfn_range(struct vm_area_struc
pfn -= addr >> PAGE_SHIFT;
pgd = pgd_offset(mm, addr);
flush_cache_range(vma, addr, end);
+   mmu_notifier(invalidate_range_begin, mm, start, end, 0);
do {
next = pgd_addr_end(addr, end);
err = remap_pud_range(mm, pgd, addr, next,
@@ -1359,6 +1364,7 @@ int remap_pfn_range(struct vm_area_struc
if (err)
break;
} while (pgd++, addr = next, addr != end);
+   mmu_notifier(invalidate_range_end, mm, 0);
return err;
 }
 EXPORT_SYMBOL(remap_pfn_range);
@@ -1442,10 +1448,11 @@ int apply_to_page_range(struct mm_struct
 {
pgd_t *pgd;
unsigned long next;
-   unsigned long end = addr + size;
+   unsigned long start = addr, end = addr + size;
int err;
 
BUG_ON(addr >= end);
+   mmu_notifier(invalidate_range_begin, mm, start, end, 0);
pgd = pgd_offset(mm, addr);
do {
next = pgd_addr_end(addr, end);
@@ -1453,6 +1460,7 @@ int apply_to_page_range(struct mm_struct
if (err)
break;
} while (pgd++, addr = next, addr != end);
+   mmu_notifier(invalidate_range_end, mm, 0);
return err;
 }
 EXPORT_SYMBOL_GPL(apply_to_page_range);
@@ -1630,6 +1638,8 @@ gotten:
goto oom;
cow_user_page(new_page, old_page, address, vma);
 
+   mmu_notifier(invalidate_range_begin, mm, address,
+   address + PAGE_SIZE - 1, 0);
/*
 * Re-check the pte - we dropped the lock
 */
@@ -1668,6 +1678,7 @@ gotten:
page_cache_release(old_page);
 unlock:
pte_unmap_unlock(page_table, ptl);
+   mmu_notifier(invalidate_range_end, mm, 0);
