Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-22 Thread Miklos Szeredi
> > >
> > >  /*
> > > + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> > > + * It will force a pagefault on the next write access.
> > > + */
> > > +static void vma_wrprotect(struct vm_area_struct *vma)
> > > +{
> > > + unsigned long addr;
> > > +
> > > + for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> > > + spinlock_t *ptl;
> > > + pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> > > + pud_t *pud = pud_offset(pgd, addr);
> > > + pmd_t *pmd = pmd_offset(pud, addr);
> > > + pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> >
> > This is extremely expensive over bigger areas, especially sparsely mapped
> > ones (it does all the lookups for all four levels over and over and over
> > again for each page).
> >
> > I think Peter Zijlstra posted a version that uses the regular kind of
> > nested loop (with inline functions to keep the thing nice and clean),
> > which gets rid of that.
> 
> Thanks for your feedback, Linus!
> 
> I will use Peter Zijlstra's version of such an operation in my next
> patch series.

But note that those functions iterate over all the VMAs for the given
page range, not just the one msync() was performed on.  This might get
even more expensive if the file is mapped many times.

The old version that Linus was referring to needs some modification
as well, because it doesn't write-protect the PTEs, it just marks them
clean.

Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Andi Kleen
Anton Salikhmetov <[EMAIL PROTECTED]> writes:

You should probably put your design document somewhere in Documentation
with a patch.

> + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> + * It will force a pagefault on the next write access.
> + */
> +static void vma_wrprotect(struct vm_area_struct *vma)
> +{
> + unsigned long addr;
> +
> + for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> + spinlock_t *ptl;
> + pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> + pud_t *pud = pud_offset(pgd, addr);
> + pmd_t *pmd = pmd_offset(pud, addr);
> + pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);

This means on i386 with highmem ptes you will map/flush tlb/unmap each
PTE individually. You will do 512 times as much work as really needed
per PTE leaf page.

The performance critical address space walkers use a different design
pattern that avoids this.

> + if (pte_dirty(*pte) && pte_write(*pte)) {
> + pte_t entry = ptep_clear_flush(vma, addr, pte);

Flushing TLBs unbatched can also be very expensive, because if the MM is
shared by several CPUs you'll have an inter-processor interrupt for
each iteration. They are quite costly even on smaller systems.

It would be better if you did a single flush_tlb_range() at the end.
This means on x86 this will currently always do a full flush, but that's
still better than really slowing down in the heavily multithreaded case.
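Putting Andi's two suggestions together, the loop might be restructured along these lines. This is kernel-style pseudocode, an untested sketch: ptep_set_wrprotect() and flush_tlb_range() are existing kernel helpers, and the page-table walk itself is elided:

```
/* Sketch only: write-protect dirty PTEs without per-PTE flushes,
 * then issue one ranged flush for the whole VMA at the end. */
static void vma_wrprotect(struct vm_area_struct *vma)
{
	unsigned long addr;
	int flushed = 0;

	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
		/* ... walk pgd/pud/pmd and pte_offset_map_lock() as before ... */
		if (pte_dirty(*pte) && pte_write(*pte)) {
			ptep_set_wrprotect(vma->vm_mm, addr, pte);
			flushed = 1;
		}
		pte_unmap_unlock(pte, ptl);
	}
	if (flushed)
		flush_tlb_range(vma, vma->vm_start, vma->vm_end);
}
```

The single flush_tlb_range() trades precision for batching: one flush (a full flush on x86 of that era) instead of one IPI-backed flush per dirty page.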

-Andi


Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Anton Salikhmetov
2008/1/22, Linus Torvalds <[EMAIL PROTECTED]>:
>
>
> On Tue, 22 Jan 2008, Anton Salikhmetov wrote:
> >
> >  /*
> > + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> > + * It will force a pagefault on the next write access.
> > + */
> > +static void vma_wrprotect(struct vm_area_struct *vma)
> > +{
> > + unsigned long addr;
> > +
> > + for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> > + spinlock_t *ptl;
> > + pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> > + pud_t *pud = pud_offset(pgd, addr);
> > + pmd_t *pmd = pmd_offset(pud, addr);
> > + pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>
> This is extremely expensive over bigger areas, especially sparsely mapped
> ones (it does all the lookups for all four levels over and over and over
> again for each page).
>
> I think Peter Zijlstra posted a version that uses the regular kind of
> nested loop (with inline functions to keep the thing nice and clean),
> which gets rid of that.

Thanks for your feedback, Linus!

I will use Peter Zijlstra's version of such an operation in my next
patch series.

>
> [ The sad/funny part is that this is all how we *used* to do msync(), back
>   in the days: we're literally going back to the "pre-cleanup" logic. See
>   commit 204ec841fbea3e5138168edbc3a76d46747cc987: "mm: msync() cleanup"
>   for details ]
>
> Quite frankly, I really think you might be better off just doing a
>
> git revert 204ec841fbea3e5138168edbc3a76d46747cc987
>
> and working from there! I just checked, and it still reverts cleanly, and
> you'd end up with a nice code-base that (a) has gotten years of testing
> and (b) already has the looping-over-the-pagetables code.
>
> Linus
>


Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Jesper Juhl
On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> > > 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > > > Some very pedantic nitpicking below;
> > > >
> > > > On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> > ...
> > > > > +   if (file && (vma->vm_flags & VM_SHARED)) {
> > > > > +   if (flags & MS_ASYNC)
> > > > > +   vma_wrprotect(vma);
> > > > > +   if (flags & MS_SYNC) {
> > > >
> > > > "else if" ??
> > >
> > > The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
> > > did not use the "else-if" here. Moreover, this function itself checks
> > > that they never come together.
> > >
> >
> > I would say that them being mutually exclusive would be a reason *for*
> > using "else-if" here.
>
> This check is performed by the sys_msync() function itself in its very
> beginning.
>
> We don't need to check it later.
>

Sure, it's just that, to me, using 'else-if' makes it explicit that
the two are mutually exclusive. Using "if (...), if (...)" doesn't.
Maybe it's just me, but I feel that 'else-if' here better shows the
intention...  No big deal.

-- 
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Linus Torvalds


On Tue, 22 Jan 2008, Anton Salikhmetov wrote:
>  
>  /*
> + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> + * It will force a pagefault on the next write access.
> + */
> +static void vma_wrprotect(struct vm_area_struct *vma)
> +{
> + unsigned long addr;
> +
> + for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> + spinlock_t *ptl;
> + pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> + pud_t *pud = pud_offset(pgd, addr);
> + pmd_t *pmd = pmd_offset(pud, addr);
> + pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);

This is extremely expensive over bigger areas, especially sparsely mapped 
ones (it does all the lookups for all four levels over and over and over 
again for each page).

I think Peter Zijlstra posted a version that uses the regular kind of 
nested loop (with inline functions to keep the thing nice and clean), 
which gets rid of that.

[ The sad/funny part is that this is all how we *used* to do msync(), back 
  in the days: we're literally going back to the "pre-cleanup" logic. See 
  commit 204ec841fbea3e5138168edbc3a76d46747cc987: "mm: msync() cleanup" 
  for details ]

Quite frankly, I really think you might be better off just doing a

git revert 204ec841fbea3e5138168edbc3a76d46747cc987

and working from there! I just checked, and it still reverts cleanly, and 
you'd end up with a nice code-base that (a) has gotten years of testing 
and (b) already has the looping-over-the-pagetables code.

Linus


Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Jesper Juhl
On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> 2008/1/22, Anton Salikhmetov <[EMAIL PROTECTED]>:
> > 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > > Some very pedantic nitpicking below;
> > >
...
>
> By the way, if we're talking "pedantic", then:
>
> >>>
>
> debian:/tmp$ cat c.c
> void f()
> {
>for (unsigned long i = 0; i < 10; i++)
>continue;
> }
> debian:/tmp$ gcc -c -pedantic c.c
> c.c: In function 'f':
> c.c:3: error: 'for' loop initial declaration used outside C99 mode
> debian:/tmp$
>

Well, I just wrote the loop the way I'd have written it; I know it's not
the common kernel style.  Just offering my feedback/input :)


-- 
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Anton Salikhmetov
2008/1/22, Anton Salikhmetov <[EMAIL PROTECTED]>:
> 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > Some very pedantic nitpicking below;
> >
> > On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> > > http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40
> > >
> > > Update file times at write references to memory-mapped files.
> > > Force file times update at the next write reference after
> > > calling the msync() system call with the MS_ASYNC flag.
> > >
> > > Signed-off-by: Anton Salikhmetov <[EMAIL PROTECTED]>
> > > ---
> > >  mm/memory.c |6 ++
> > >  mm/msync.c  |   57 
> > > -
> > >  2 files changed, 50 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 6dd1cd8..4b0144b 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -1670,6 +1670,9 @@ gotten:
> > >  unlock:
> > > pte_unmap_unlock(page_table, ptl);
> > > if (dirty_page) {
> > > +   if (vma->vm_file)
> > > +   file_update_time(vma->vm_file);
> > > +
> > > /*
> > >  * Yes, Virginia, this is actually required to prevent a 
> > > race
> > >  * with clear_page_dirty_for_io() from clearing the page 
> > > dirty
> > > @@ -2343,6 +2346,9 @@ out_unlocked:
> > > if (anon)
> > > page_cache_release(vmf.page);
> > > else if (dirty_page) {
> > > +   if (vma->vm_file)
> > > +   file_update_time(vma->vm_file);
> > > +
> > > set_page_dirty_balance(dirty_page, page_mkwrite);
> > > put_page(dirty_page);
> > > }
> > > diff --git a/mm/msync.c b/mm/msync.c
> > > index a4de868..394130d 100644
> > > --- a/mm/msync.c
> > > +++ b/mm/msync.c
> > > @@ -5,6 +5,7 @@
> > >   * Copyright (C) 2008 Anton Salikhmetov <[EMAIL PROTECTED]>
> > >   */
> > >
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > @@ -13,11 +14,37 @@
> > >  #include 
> > >
> > >  /*
> > > + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> > > + * It will force a pagefault on the next write access.
> > > + */
> > > +static void vma_wrprotect(struct vm_area_struct *vma)
> > > +{
> > > +   unsigned long addr;
> > > +
> > > +   for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) 
> > > {
> >
> > I know it's not the common "Linux Kernel way", but 'addr' could be
> > made to have just 'for' scope here according to C99;
>
> I believe that the C89 style is more common for the Linux kernel, so
> I've used the out-of-scope variable declaration.
>
> >
> >for (unsigned long addr = vma->vm_start; addr < vma->vm_end;
> > addr += PAGE_SIZE) {

By the way, if we're talking "pedantic", then:

>>>

debian:/tmp$ cat c.c
void f()
{
   for (unsigned long i = 0; i < 10; i++)
   continue;
}
debian:/tmp$ gcc -c -pedantic c.c
c.c: In function 'f':
c.c:3: error: 'for' loop initial declaration used outside C99 mode
debian:/tmp$

<<<

No pun intended :)

> >
> >
> > > +   spinlock_t *ptl;
> > > +   pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> > > +   pud_t *pud = pud_offset(pgd, addr);
> > > +   pmd_t *pmd = pmd_offset(pud, addr);
> > > > +   pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> > > +
> > > +   if (pte_dirty(*pte) && pte_write(*pte)) {
> > > +   pte_t entry = ptep_clear_flush(vma, addr, pte);
> > > +
> > > +   entry = pte_wrprotect(entry);
> > > +   set_pte_at(vma->vm_mm, addr, pte, entry);
> > > +   }
> > > +   pte_unmap_unlock(pte, ptl);
> > > +   }
> > > +}
> > > +
> > > +/*
> > >   * MS_SYNC syncs the entire file - including mappings.
> > >   *
> > > - * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
> > > - * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
> > > - * Now it doesn't do anything, since dirty pages are properly tracked.
> >
> > I think keeping some version of the "up to ..." comments makes sense.
> > It documents that we previously had different behaviour.
>
> Earlier I had a request to remove any "changelog-style" comments from the 
> code.
>
> >
> > > + * MS_ASYNC does not start I/O. Instead, it marks the relevant pages
> > > + * read-only by calling vma_wrprotect(). This is needed to catch the next
> > > + * write reference to the mapped region and update the file times
> > > + * accordingly.
> > >   *
> > >   * The application may now run fsync() to write out the dirty pages and
> > >   * wait on the writeout and check the result. Or the application may run
> > > @@ -77,16 +104,20 @@ asmlinkage long sys_msync(unsigned long start, 
> > > size_t len, int flags)
> > > error = 0;
> > > start = vma->vm_end;
> > > file = vma->vm_file;
> > > -   

Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Anton Salikhmetov
2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> > 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > > Some very pedantic nitpicking below;
> > >
> > > On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> ...
> > > > +   if (file && (vma->vm_flags & VM_SHARED)) {
> > > > +   if (flags & MS_ASYNC)
> > > > +   vma_wrprotect(vma);
> > > > +   if (flags & MS_SYNC) {
> > >
> > > "else if" ??
> >
> > The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
> > did not use the "else-if" here. Moreover, this function itself checks
> > that they never come together.
> >
>
> I would say that them being mutually exclusive would be a reason *for*
> using "else-if" here.

This check is performed by the sys_msync() function itself in its very
beginning.

We don't need to check it later.

>
> --
> Jesper Juhl <[EMAIL PROTECTED]>
> Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
> Plain text mails only, please  http://www.expita.com/nomime.html
>


Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Jesper Juhl
On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > Some very pedantic nitpicking below;
> >
> > On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
...
> > > +   if (file && (vma->vm_flags & VM_SHARED)) {
> > > +   if (flags & MS_ASYNC)
> > > +   vma_wrprotect(vma);
> > > +   if (flags & MS_SYNC) {
> >
> > "else if" ??
>
> The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
> did not use the "else-if" here. Moreover, this function itself checks
> that they never come together.
>

I would say that them being mutually exclusive would be a reason *for*
using "else-if" here.

-- 
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Anton Salikhmetov
2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> Some very pedantic nitpicking below;
>
> On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40
> >
> > Update file times at write references to memory-mapped files.
> > Force file times update at the next write reference after
> > calling the msync() system call with the MS_ASYNC flag.
> >
> > Signed-off-by: Anton Salikhmetov <[EMAIL PROTECTED]>
> > ---
> >  mm/memory.c |6 ++
> >  mm/msync.c  |   57 
> > -
> >  2 files changed, 50 insertions(+), 13 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 6dd1cd8..4b0144b 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1670,6 +1670,9 @@ gotten:
> >  unlock:
> > pte_unmap_unlock(page_table, ptl);
> > if (dirty_page) {
> > +   if (vma->vm_file)
> > +   file_update_time(vma->vm_file);
> > +
> > /*
> >  * Yes, Virginia, this is actually required to prevent a 
> > race
> >  * with clear_page_dirty_for_io() from clearing the page 
> > dirty
> > @@ -2343,6 +2346,9 @@ out_unlocked:
> > if (anon)
> > page_cache_release(vmf.page);
> > else if (dirty_page) {
> > +   if (vma->vm_file)
> > +   file_update_time(vma->vm_file);
> > +
> > set_page_dirty_balance(dirty_page, page_mkwrite);
> > put_page(dirty_page);
> > }
> > diff --git a/mm/msync.c b/mm/msync.c
> > index a4de868..394130d 100644
> > --- a/mm/msync.c
> > +++ b/mm/msync.c
> > @@ -5,6 +5,7 @@
> >   * Copyright (C) 2008 Anton Salikhmetov <[EMAIL PROTECTED]>
> >   */
> >
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -13,11 +14,37 @@
> >  #include 
> >
> >  /*
> > + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> > + * It will force a pagefault on the next write access.
> > + */
> > +static void vma_wrprotect(struct vm_area_struct *vma)
> > +{
> > +   unsigned long addr;
> > +
> > +   for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
>
> I know it's not the common "Linux Kernel way", but 'addr' could be
> made to have just 'for' scope here according to C99;

I believe that the C89 style is more common for the Linux kernel, so
I've used the out-of-scope variable declaration.

>
>for (unsigned long addr = vma->vm_start; addr < vma->vm_end;
> addr += PAGE_SIZE) {
>
>
> > +   spinlock_t *ptl;
> > +   pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> > +   pud_t *pud = pud_offset(pgd, addr);
> > +   pmd_t *pmd = pmd_offset(pud, addr);
> > > +   pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> > +
> > +   if (pte_dirty(*pte) && pte_write(*pte)) {
> > +   pte_t entry = ptep_clear_flush(vma, addr, pte);
> > +
> > +   entry = pte_wrprotect(entry);
> > +   set_pte_at(vma->vm_mm, addr, pte, entry);
> > +   }
> > +   pte_unmap_unlock(pte, ptl);
> > +   }
> > +}
> > +
> > +/*
> >   * MS_SYNC syncs the entire file - including mappings.
> >   *
> > - * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
> > - * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
> > - * Now it doesn't do anything, since dirty pages are properly tracked.
>
> I think keeping some version of the "up to ..." comments makes sense.
> It documents that we previously had different behaviour.

Earlier I had a request to remove any "changelog-style" comments from the code.

>
> > + * MS_ASYNC does not start I/O. Instead, it marks the relevant pages
> > + * read-only by calling vma_wrprotect(). This is needed to catch the next
> > + * write reference to the mapped region and update the file times
> > + * accordingly.
> >   *
> >   * The application may now run fsync() to write out the dirty pages and
> >   * wait on the writeout and check the result. Or the application may run
> > @@ -77,16 +104,20 @@ asmlinkage long sys_msync(unsigned long start, size_t 
> > len, int flags)
> > error = 0;
> > start = vma->vm_end;
> > file = vma->vm_file;
> > -   if (file && (vma->vm_flags & VM_SHARED) && (flags & 
> > MS_SYNC)) {
> > -   get_file(file);
> > > -   up_read(&mm->mmap_sem);
> > -   error = do_fsync(file, 0);
> > -   fput(file);
> > -   if (error || start >= end)
> > -   goto out;
> > > -   down_read(&mm->mmap_sem);
> > -   vma = find_vma(mm, start);
> > -   continue;
> > +   if (file && (vma->vm_flags & VM_SHARED)) {
> > +   if (flags & 

Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Jesper Juhl
Some very pedantic nitpicking below;

On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40
>
> Update file times at write references to memory-mapped files.
> Force file times update at the next write reference after
> calling the msync() system call with the MS_ASYNC flag.
>
> Signed-off-by: Anton Salikhmetov <[EMAIL PROTECTED]>
> ---
>  mm/memory.c |6 ++
>  mm/msync.c  |   57 -
>  2 files changed, 50 insertions(+), 13 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 6dd1cd8..4b0144b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1670,6 +1670,9 @@ gotten:
>  unlock:
> pte_unmap_unlock(page_table, ptl);
> if (dirty_page) {
> +   if (vma->vm_file)
> +   file_update_time(vma->vm_file);
> +
> /*
>  * Yes, Virginia, this is actually required to prevent a race
>  * with clear_page_dirty_for_io() from clearing the page dirty
> @@ -2343,6 +2346,9 @@ out_unlocked:
> if (anon)
> page_cache_release(vmf.page);
> else if (dirty_page) {
> +   if (vma->vm_file)
> +   file_update_time(vma->vm_file);
> +
> set_page_dirty_balance(dirty_page, page_mkwrite);
> put_page(dirty_page);
> }
> diff --git a/mm/msync.c b/mm/msync.c
> index a4de868..394130d 100644
> --- a/mm/msync.c
> +++ b/mm/msync.c
> @@ -5,6 +5,7 @@
>   * Copyright (C) 2008 Anton Salikhmetov <[EMAIL PROTECTED]>
>   */
>
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -13,11 +14,37 @@
>  #include 
>
>  /*
> + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> + * It will force a pagefault on the next write access.
> + */
> +static void vma_wrprotect(struct vm_area_struct *vma)
> +{
> +   unsigned long addr;
> +
> +   for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {

I know it's not the common "Linux Kernel way", but 'addr' could be
made to have just 'for' scope here according to C99;

   for (unsigned long addr = vma->vm_start; addr < vma->vm_end;
addr += PAGE_SIZE) {


> +   spinlock_t *ptl;
> +   pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> +   pud_t *pud = pud_offset(pgd, addr);
> +   pmd_t *pmd = pmd_offset(pud, addr);
> +   pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> +
> +   if (pte_dirty(*pte) && pte_write(*pte)) {
> +   pte_t entry = ptep_clear_flush(vma, addr, pte);
> +
> +   entry = pte_wrprotect(entry);
> +   set_pte_at(vma->vm_mm, addr, pte, entry);
> +   }
> +   pte_unmap_unlock(pte, ptl);
> +   }
> +}
> +
> +/*
>   * MS_SYNC syncs the entire file - including mappings.
>   *
> - * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
> - * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
> - * Now it doesn't do anything, since dirty pages are properly tracked.

I think keeping some version of the "up to ..." comments makes sense.
It documents that we previously had different behaviour.

> + * MS_ASYNC does not start I/O. Instead, it marks the relevant pages
> + * read-only by calling vma_wrprotect(). This is needed to catch the next
> + * write reference to the mapped region and update the file times
> + * accordingly.
>   *
>   * The application may now run fsync() to write out the dirty pages and
>   * wait on the writeout and check the result. Or the application may run
> @@ -77,16 +104,20 @@ asmlinkage long sys_msync(unsigned long start, size_t 
> len, int flags)
> error = 0;
> start = vma->vm_end;
> file = vma->vm_file;
> -   if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) 
> {
> -   get_file(file);
> -   up_read(&mm->mmap_sem);
> -   error = do_fsync(file, 0);
> -   fput(file);
> -   if (error || start >= end)
> -   goto out;
> -   down_read(&mm->mmap_sem);
> -   vma = find_vma(mm, start);
> -   continue;
> +   if (file && (vma->vm_flags & VM_SHARED)) {
> +   if (flags & MS_ASYNC)
> +   vma_wrprotect(vma);
> +   if (flags & MS_SYNC) {

"else if" ??

> +   get_file(file);
> +   up_read(&mm->mmap_sem);
> +   error = do_fsync(file, 0);
> +   fput(file);
> +   if (error || start >= end)
> +   goto out;
> +   down_read(&mm->mmap_sem);

[PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Anton Salikhmetov
http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40

Update file times at write references to memory-mapped files.
Force file times update at the next write reference after
calling the msync() system call with the MS_ASYNC flag.

Signed-off-by: Anton Salikhmetov <[EMAIL PROTECTED]>
---
 mm/memory.c |6 ++
 mm/msync.c  |   57 -
 2 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 6dd1cd8..4b0144b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1670,6 +1670,9 @@ gotten:
 unlock:
pte_unmap_unlock(page_table, ptl);
if (dirty_page) {
+   if (vma->vm_file)
+   file_update_time(vma->vm_file);
+
/*
 * Yes, Virginia, this is actually required to prevent a race
 * with clear_page_dirty_for_io() from clearing the page dirty
@@ -2343,6 +2346,9 @@ out_unlocked:
if (anon)
page_cache_release(vmf.page);
else if (dirty_page) {
+   if (vma->vm_file)
+   file_update_time(vma->vm_file);
+
set_page_dirty_balance(dirty_page, page_mkwrite);
put_page(dirty_page);
}
diff --git a/mm/msync.c b/mm/msync.c
index a4de868..394130d 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2008 Anton Salikhmetov <[EMAIL PROTECTED]>
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -13,11 +14,37 @@
 #include 
 
 /*
+ * Scan the PTEs for pages belonging to the VMA and mark them read-only.
+ * It will force a pagefault on the next write access.
+ */
+static void vma_wrprotect(struct vm_area_struct *vma)
+{
+   unsigned long addr;
+
+   for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
+   spinlock_t *ptl;
+   pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
+   pud_t *pud = pud_offset(pgd, addr);
+   pmd_t *pmd = pmd_offset(pud, addr);
+   pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+
+   if (pte_dirty(*pte) && pte_write(*pte)) {
+   pte_t entry = ptep_clear_flush(vma, addr, pte);
+
+   entry = pte_wrprotect(entry);
+   set_pte_at(vma->vm_mm, addr, pte, entry);
+   }
+   pte_unmap_unlock(pte, ptl);
+   }
+}
+
+/*
  * MS_SYNC syncs the entire file - including mappings.
  *
- * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
- * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
- * Now it doesn't do anything, since dirty pages are properly tracked.
+ * MS_ASYNC does not start I/O. Instead, it marks the relevant pages
+ * read-only by calling vma_wrprotect(). This is needed to catch the next
+ * write reference to the mapped region and update the file times
+ * accordingly.
  *
  * The application may now run fsync() to write out the dirty pages and
  * wait on the writeout and check the result. Or the application may run
@@ -77,16 +104,20 @@ asmlinkage long sys_msync(unsigned long start, size_t len, 
int flags)
error = 0;
start = vma->vm_end;
file = vma->vm_file;
-   if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) {
-   get_file(file);
-   up_read(&mm->mmap_sem);
-   error = do_fsync(file, 0);
-   fput(file);
-   if (error || start >= end)
-   goto out;
-   down_read(&mm->mmap_sem);
-   vma = find_vma(mm, start);
-   continue;
+   if (file && (vma->vm_flags & VM_SHARED)) {
+   if (flags & MS_ASYNC)
+   vma_wrprotect(vma);
+   if (flags & MS_SYNC) {
+   get_file(file);
+   up_read(&mm->mmap_sem);
+   error = do_fsync(file, 0);
+   fput(file);
+   if (error || start >= end)
+   goto out;
+   down_read(&mm->mmap_sem);
+   vma = find_vma(mm, start);
+   continue;
+   }
}
 
vma = vma->vm_next;
-- 
1.4.4.4



[PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Anton Salikhmetov
http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40

Update file times at write references to memory-mapped files.
Force file times update at the next write reference after
calling the msync() system call with the MS_ASYNC flag.

Signed-off-by: Anton Salikhmetov [EMAIL PROTECTED]
---
 mm/memory.c |6 ++
 mm/msync.c  |   57 -
 2 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 6dd1cd8..4b0144b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1670,6 +1670,9 @@ gotten:
 unlock:
pte_unmap_unlock(page_table, ptl);
if (dirty_page) {
+   if (vma->vm_file)
+   file_update_time(vma->vm_file);
+
/*
 * Yes, Virginia, this is actually required to prevent a race
 * with clear_page_dirty_for_io() from clearing the page dirty
@@ -2343,6 +2346,9 @@ out_unlocked:
if (anon)
page_cache_release(vmf.page);
else if (dirty_page) {
+   if (vma->vm_file)
+   file_update_time(vma->vm_file);
+
set_page_dirty_balance(dirty_page, page_mkwrite);
put_page(dirty_page);
}
diff --git a/mm/msync.c b/mm/msync.c
index a4de868..394130d 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2008 Anton Salikhmetov [EMAIL PROTECTED]
  */
 
+#include <asm/tlbflush.h>
 #include <linux/file.h>
 #include <linux/fs.h>
 #include <linux/mm.h>
@@ -13,11 +14,37 @@
 #include <linux/syscalls.h>
 
 /*
+ * Scan the PTEs for pages belonging to the VMA and mark them read-only.
+ * It will force a pagefault on the next write access.
+ */
+static void vma_wrprotect(struct vm_area_struct *vma)
+{
+   unsigned long addr;
+
+   for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
+   spinlock_t *ptl;
+   pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
+   pud_t *pud = pud_offset(pgd, addr);
+   pmd_t *pmd = pmd_offset(pud, addr);
+   pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+
+   if (pte_dirty(*pte) && pte_write(*pte)) {
+   pte_t entry = ptep_clear_flush(vma, addr, pte);
+
+   entry = pte_wrprotect(entry);
+   set_pte_at(vma->vm_mm, addr, pte, entry);
+   }
+   pte_unmap_unlock(pte, ptl);
+   }
+}
+
+/*
  * MS_SYNC syncs the entire file - including mappings.
  *
- * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
- * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
- * Now it doesn't do anything, since dirty pages are properly tracked.
+ * MS_ASYNC does not start I/O. Instead, it marks the relevant pages
+ * read-only by calling vma_wrprotect(). This is needed to catch the next
+ * write reference to the mapped region and update the file times
+ * accordingly.
  *
  * The application may now run fsync() to write out the dirty pages and
  * wait on the writeout and check the result. Or the application may run
@@ -77,16 +104,20 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags)
error = 0;
start = vma->vm_end;
file = vma->vm_file;
-   if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) {
-   get_file(file);
-   up_read(&mm->mmap_sem);
-   error = do_fsync(file, 0);
-   fput(file);
-   if (error || start >= end)
-   goto out;
-   down_read(&mm->mmap_sem);
-   vma = find_vma(mm, start);
-   continue;
+   if (file && (vma->vm_flags & VM_SHARED)) {
+   if (flags & MS_ASYNC)
+   vma_wrprotect(vma);
+   if (flags & MS_SYNC) {
+   get_file(file);
+   up_read(&mm->mmap_sem);
+   error = do_fsync(file, 0);
+   fput(file);
+   if (error || start >= end)
+   goto out;
+   down_read(&mm->mmap_sem);
+   vma = find_vma(mm, start);
+   continue;
+   }
}
 
vma = vma->vm_next;
-- 
1.4.4.4



Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Jesper Juhl
Some very pedantic nitpicking below;

On 22/01/2008, Anton Salikhmetov [EMAIL PROTECTED] wrote:
 http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40

 Update file times at write references to memory-mapped files.
 Force file times update at the next write reference after
 calling the msync() system call with the MS_ASYNC flag.

 Signed-off-by: Anton Salikhmetov [EMAIL PROTECTED]
 ---
  mm/memory.c |6 ++
  mm/msync.c  |   57 -
  2 files changed, 50 insertions(+), 13 deletions(-)

 diff --git a/mm/memory.c b/mm/memory.c
 index 6dd1cd8..4b0144b 100644
 --- a/mm/memory.c
 +++ b/mm/memory.c
 @@ -1670,6 +1670,9 @@ gotten:
  unlock:
 pte_unmap_unlock(page_table, ptl);
 if (dirty_page) {
 +   if (vma->vm_file)
 +   file_update_time(vma->vm_file);
 +
 /*
  * Yes, Virginia, this is actually required to prevent a race
  * with clear_page_dirty_for_io() from clearing the page dirty
 @@ -2343,6 +2346,9 @@ out_unlocked:
 if (anon)
 page_cache_release(vmf.page);
 else if (dirty_page) {
 +   if (vma->vm_file)
 +   file_update_time(vma->vm_file);
 +
 set_page_dirty_balance(dirty_page, page_mkwrite);
 put_page(dirty_page);
 }
 diff --git a/mm/msync.c b/mm/msync.c
 index a4de868..394130d 100644
 --- a/mm/msync.c
 +++ b/mm/msync.c
 @@ -5,6 +5,7 @@
   * Copyright (C) 2008 Anton Salikhmetov [EMAIL PROTECTED]
   */

 +#include <asm/tlbflush.h>
  #include <linux/file.h>
  #include <linux/fs.h>
  #include <linux/mm.h>
 @@ -13,11 +14,37 @@
  #include <linux/syscalls.h>

  /*
 + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
 + * It will force a pagefault on the next write access.
 + */
 +static void vma_wrprotect(struct vm_area_struct *vma)
 +{
 +   unsigned long addr;
 +
 +   for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {

I know it's not the common Linux Kernel way, but 'addr' could be
made to have just 'for' scope here according to C99;

   for (unsigned long addr = vma->vm_start; addr < vma->vm_end;
addr += PAGE_SIZE) {


 +   spinlock_t *ptl;
 +   pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
 +   pud_t *pud = pud_offset(pgd, addr);
 +   pmd_t *pmd = pmd_offset(pud, addr);
 +   pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 +
 +   if (pte_dirty(*pte) && pte_write(*pte)) {
 +   pte_t entry = ptep_clear_flush(vma, addr, pte);
 +
 +   entry = pte_wrprotect(entry);
 +   set_pte_at(vma->vm_mm, addr, pte, entry);
 +   }
 +   pte_unmap_unlock(pte, ptl);
 +   }
 +}
 +
 +/*
   * MS_SYNC syncs the entire file - including mappings.
   *
 - * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
 - * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
 - * Now it doesn't do anything, since dirty pages are properly tracked.

I think keeping some version of the "up to ..." comments makes sense.
It documents that we previously had different behaviour.

 + * MS_ASYNC does not start I/O. Instead, it marks the relevant pages
 + * read-only by calling vma_wrprotect(). This is needed to catch the next
 + * write reference to the mapped region and update the file times
 + * accordingly.
   *
   * The application may now run fsync() to write out the dirty pages and
   * wait on the writeout and check the result. Or the application may run
 @@ -77,16 +104,20 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags)
 error = 0;
 start = vma->vm_end;
 file = vma->vm_file;
 -   if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) {
 -   get_file(file);
 -   up_read(&mm->mmap_sem);
 -   error = do_fsync(file, 0);
 -   fput(file);
 -   if (error || start >= end)
 -   goto out;
 -   down_read(&mm->mmap_sem);
 -   vma = find_vma(mm, start);
 -   continue;
 +   if (file && (vma->vm_flags & VM_SHARED)) {
 +   if (flags & MS_ASYNC)
 +   vma_wrprotect(vma);
 +   if (flags & MS_SYNC) {

else if ??

 +   get_file(file);
 +   up_read(&mm->mmap_sem);
 +   error = do_fsync(file, 0);
 +   fput(file);
 +   if (error || start >= end)
 +   goto out;
 +   down_read(&mm->mmap_sem);
 +   vma = find_vma(mm, start);
 +  

Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Anton Salikhmetov
2008/1/22, Jesper Juhl [EMAIL PROTECTED]:
 Some very pedantic nitpicking below;

 On 22/01/2008, Anton Salikhmetov [EMAIL PROTECTED] wrote:
  http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40
 
  Update file times at write references to memory-mapped files.
  Force file times update at the next write reference after
  calling the msync() system call with the MS_ASYNC flag.
 
  Signed-off-by: Anton Salikhmetov [EMAIL PROTECTED]
  ---
   mm/memory.c |6 ++
  mm/msync.c  |   57 -
   2 files changed, 50 insertions(+), 13 deletions(-)
 
  diff --git a/mm/memory.c b/mm/memory.c
  index 6dd1cd8..4b0144b 100644
  --- a/mm/memory.c
  +++ b/mm/memory.c
  @@ -1670,6 +1670,9 @@ gotten:
   unlock:
  pte_unmap_unlock(page_table, ptl);
  if (dirty_page) {
  +   if (vma->vm_file)
  +   file_update_time(vma->vm_file);
  +
  /*
   * Yes, Virginia, this is actually required to prevent a race
   * with clear_page_dirty_for_io() from clearing the page dirty
  @@ -2343,6 +2346,9 @@ out_unlocked:
  if (anon)
  page_cache_release(vmf.page);
  else if (dirty_page) {
  +   if (vma->vm_file)
  +   file_update_time(vma->vm_file);
  +
  set_page_dirty_balance(dirty_page, page_mkwrite);
  put_page(dirty_page);
  }
  diff --git a/mm/msync.c b/mm/msync.c
  index a4de868..394130d 100644
  --- a/mm/msync.c
  +++ b/mm/msync.c
  @@ -5,6 +5,7 @@
* Copyright (C) 2008 Anton Salikhmetov [EMAIL PROTECTED]
*/
 
  +#include <asm/tlbflush.h>
   #include <linux/file.h>
   #include <linux/fs.h>
   #include <linux/mm.h>
  @@ -13,11 +14,37 @@
   #include <linux/syscalls.h>
 
   /*
  + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
  + * It will force a pagefault on the next write access.
  + */
  +static void vma_wrprotect(struct vm_area_struct *vma)
  +{
  +   unsigned long addr;
  +
  +   for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {

 I know it's not the common Linux Kernel way, but 'addr' could be
 made to have just 'for' scope here according to C99;

I believe that the C89 style is more common for the Linux kernel, so
I've used the out-of-scope variable declaration.


for (unsigned long addr = vma->vm_start; addr < vma->vm_end;
 addr += PAGE_SIZE) {


  +   spinlock_t *ptl;
  +   pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
  +   pud_t *pud = pud_offset(pgd, addr);
  +   pmd_t *pmd = pmd_offset(pud, addr);
  +   pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
  +
  +   if (pte_dirty(*pte) && pte_write(*pte)) {
  +   pte_t entry = ptep_clear_flush(vma, addr, pte);
  +
  +   entry = pte_wrprotect(entry);
  +   set_pte_at(vma->vm_mm, addr, pte, entry);
  +   }
  +   pte_unmap_unlock(pte, ptl);
  +   }
  +}
  +
  +/*
* MS_SYNC syncs the entire file - including mappings.
*
  - * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
  - * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
  - * Now it doesn't do anything, since dirty pages are properly tracked.

 I think keeping some version of the "up to ..." comments makes sense.
 It documents that we previously had different behaviour.

Earlier I had a request to remove any changelog-style comments from the code.


  + * MS_ASYNC does not start I/O. Instead, it marks the relevant pages
  + * read-only by calling vma_wrprotect(). This is needed to catch the next
  + * write reference to the mapped region and update the file times
  + * accordingly.
*
* The application may now run fsync() to write out the dirty pages and
* wait on the writeout and check the result. Or the application may run
  @@ -77,16 +104,20 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags)
  error = 0;
   start = vma->vm_end;
   file = vma->vm_file;
   -   if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) {
  -   get_file(file);
   -   up_read(&mm->mmap_sem);
  -   error = do_fsync(file, 0);
  -   fput(file);
   -   if (error || start >= end)
  -   goto out;
   -   down_read(&mm->mmap_sem);
  -   vma = find_vma(mm, start);
  -   continue;
   +   if (file && (vma->vm_flags & VM_SHARED)) {
   +   if (flags & MS_ASYNC)
   +   vma_wrprotect(vma);
   +   if (flags & MS_SYNC) {

 else if ??

The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
did not use the else-if here. Moreover, this function itself checks
that they never come together.

Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Jesper Juhl
On 22/01/2008, Anton Salikhmetov [EMAIL PROTECTED] wrote:
 2008/1/22, Jesper Juhl [EMAIL PROTECTED]:
  Some very pedantic nitpicking below;
 
  On 22/01/2008, Anton Salikhmetov [EMAIL PROTECTED] wrote:
...
    +   if (file && (vma->vm_flags & VM_SHARED)) {
    +   if (flags & MS_ASYNC)
    +   vma_wrprotect(vma);
    +   if (flags & MS_SYNC) {
 
  else if ??

 The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
 did not use the else-if here. Moreover, this function itself checks
 that they never come together.


I would say that them being mutually exclusive would be a reason *for*
using else-if here.

-- 
Jesper Juhl [EMAIL PROTECTED]
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Anton Salikhmetov
2008/1/22, Jesper Juhl [EMAIL PROTECTED]:
 On 22/01/2008, Anton Salikhmetov [EMAIL PROTECTED] wrote:
  2008/1/22, Jesper Juhl [EMAIL PROTECTED]:
   Some very pedantic nitpicking below;
  
   On 22/01/2008, Anton Salikhmetov [EMAIL PROTECTED] wrote:
 ...
+   if (file && (vma->vm_flags & VM_SHARED)) {
+   if (flags & MS_ASYNC)
+   vma_wrprotect(vma);
+   if (flags & MS_SYNC) {
  
   else if ??
 
  The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
  did not use the else-if here. Moreover, this function itself checks
  that they never come together.
 

 I would say that them being mutually exclusive would be a reason *for*
 using else-if here.

This check is performed by the sys_msync() function itself at its very
beginning.

We don't need to check it later.


 --
 Jesper Juhl [EMAIL PROTECTED]
 Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
 Plain text mails only, please  http://www.expita.com/nomime.html



Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Anton Salikhmetov
2008/1/22, Anton Salikhmetov [EMAIL PROTECTED]:
 2008/1/22, Jesper Juhl [EMAIL PROTECTED]:
  Some very pedantic nitpicking below;
 
  On 22/01/2008, Anton Salikhmetov [EMAIL PROTECTED] wrote:
   http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40
  
   Update file times at write references to memory-mapped files.
   Force file times update at the next write reference after
   calling the msync() system call with the MS_ASYNC flag.
  
   Signed-off-by: Anton Salikhmetov [EMAIL PROTECTED]
   ---
mm/memory.c |6 ++
   mm/msync.c  |   57 -
2 files changed, 50 insertions(+), 13 deletions(-)
  
   diff --git a/mm/memory.c b/mm/memory.c
   index 6dd1cd8..4b0144b 100644
   --- a/mm/memory.c
   +++ b/mm/memory.c
   @@ -1670,6 +1670,9 @@ gotten:
unlock:
   pte_unmap_unlock(page_table, ptl);
   if (dirty_page) {
    +   if (vma->vm_file)
    +   file_update_time(vma->vm_file);
   +
   /*
     * Yes, Virginia, this is actually required to prevent a race
     * with clear_page_dirty_for_io() from clearing the page dirty
   @@ -2343,6 +2346,9 @@ out_unlocked:
   if (anon)
   page_cache_release(vmf.page);
   else if (dirty_page) {
    +   if (vma->vm_file)
    +   file_update_time(vma->vm_file);
   +
   set_page_dirty_balance(dirty_page, page_mkwrite);
   put_page(dirty_page);
   }
   diff --git a/mm/msync.c b/mm/msync.c
   index a4de868..394130d 100644
   --- a/mm/msync.c
   +++ b/mm/msync.c
   @@ -5,6 +5,7 @@
 * Copyright (C) 2008 Anton Salikhmetov [EMAIL PROTECTED]
 */
  
    +#include <asm/tlbflush.h>
     #include <linux/file.h>
     #include <linux/fs.h>
     #include <linux/mm.h>
    @@ -13,11 +14,37 @@
     #include <linux/syscalls.h>
  
/*
   + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
   + * It will force a pagefault on the next write access.
   + */
   +static void vma_wrprotect(struct vm_area_struct *vma)
   +{
   +   unsigned long addr;
   +
    +   for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
 
  I know it's not the common Linux Kernel way, but 'addr' could be
  made to have just 'for' scope here according to C99;

 I believe that the C89 style is more common for the Linux kernel, so
 I've used the out-of-scope variable declaration.

 
 for (unsigned long addr = vma->vm_start; addr < vma->vm_end;
  addr += PAGE_SIZE) {

By the way, if we're talking pedantic, then:



debian:/tmp$ cat c.c
void f()
{
   for (unsigned long i = 0; i < 10; i++)
   continue;
}
debian:/tmp$ gcc -c -pedantic c.c
c.c: In function 'f':
c.c:3: error: 'for' loop initial declaration used outside C99 mode
debian:/tmp$



No pun intended :)

 
 
    +   spinlock_t *ptl;
    +   pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
    +   pud_t *pud = pud_offset(pgd, addr);
    +   pmd_t *pmd = pmd_offset(pud, addr);
    +   pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
    +
    +   if (pte_dirty(*pte) && pte_write(*pte)) {
    +   pte_t entry = ptep_clear_flush(vma, addr, pte);
    +
    +   entry = pte_wrprotect(entry);
    +   set_pte_at(vma->vm_mm, addr, pte, entry);
    +   }
    +   pte_unmap_unlock(pte, ptl);
    +   }
   +}
   +
   +/*
 * MS_SYNC syncs the entire file - including mappings.
 *
   - * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
   - * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
   - * Now it doesn't do anything, since dirty pages are properly tracked.
 
  I think keeping some version of the "up to ..." comments makes sense.
  It documents that we previously had different behaviour.

 Earlier I had a request to remove any changelog-style comments from the code.

 
   + * MS_ASYNC does not start I/O. Instead, it marks the relevant pages
   + * read-only by calling vma_wrprotect(). This is needed to catch the next
   + * write reference to the mapped region and update the file times
   + * accordingly.
 *
 * The application may now run fsync() to write out the dirty pages and
 * wait on the writeout and check the result. Or the application may run
    @@ -77,16 +104,20 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags)
   error = 0;
    start = vma->vm_end;
    file = vma->vm_file;
    -   if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) {
    -   get_file(file);
    -   up_read(&mm->mmap_sem);
    -   error = do_fsync(file, 0);
    -   fput(file);
    -   if (error || start >= end)
   - 

Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Jesper Juhl
On 22/01/2008, Anton Salikhmetov [EMAIL PROTECTED] wrote:
 2008/1/22, Anton Salikhmetov [EMAIL PROTECTED]:
  2008/1/22, Jesper Juhl [EMAIL PROTECTED]:
   Some very pedantic nitpicking below;
  
...

 By the way, if we're talking pedantic, then:

 

 debian:/tmp$ cat c.c
 void f()
 {
for (unsigned long i = 0; i < 10; i++)
continue;
 }
 debian:/tmp$ gcc -c -pedantic c.c
 c.c: In function 'f':
 c.c:3: error: 'for' loop initial declaration used outside C99 mode
 debian:/tmp$


Well, I just wrote it the way I'd have written the loop, I know it's not
the common kernel style.  Just offering my feedback/input :)


-- 
Jesper Juhl [EMAIL PROTECTED]
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Linus Torvalds


On Tue, 22 Jan 2008, Anton Salikhmetov wrote:
  
  /*
 + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
 + * It will force a pagefault on the next write access.
 + */
 +static void vma_wrprotect(struct vm_area_struct *vma)
 +{
 + unsigned long addr;
 +
 + for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
 + spinlock_t *ptl;
 + pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
 + pud_t *pud = pud_offset(pgd, addr);
 + pmd_t *pmd = pmd_offset(pud, addr);
 + pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);

This is extremely expensive over bigger areas, especially sparsely mapped
ones (it does all the lookups for all four levels over and over and over
again for each page).

I think Peter Zijlstra posted a version that uses the regular kind of 
nested loop (with inline functions to keep the thing nice and clean), 
which gets rid of that.

[ The sad/funny part is that this is all how we *used* to do msync(), back 
  in the days: we're literally going back to the pre-cleanup logic. See 
  commit 204ec841fbea3e5138168edbc3a76d46747cc987 ("mm: msync() cleanup")
  for details ]

Quite frankly, I really think you might be better off just doing a

git revert 204ec841fbea3e5138168edbc3a76d46747cc987

and working from there! I just checked, and it still reverts cleanly, and 
you'd end up with a nice code-base that (a) has gotten years of testing 
and (b) already has the looping-over-the-pagetables code.

Linus


Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Jesper Juhl
On 22/01/2008, Anton Salikhmetov [EMAIL PROTECTED] wrote:
 2008/1/22, Jesper Juhl [EMAIL PROTECTED]:
  On 22/01/2008, Anton Salikhmetov [EMAIL PROTECTED] wrote:
   2008/1/22, Jesper Juhl [EMAIL PROTECTED]:
Some very pedantic nitpicking below;
   
On 22/01/2008, Anton Salikhmetov [EMAIL PROTECTED] wrote:
  ...
  +   if (file && (vma->vm_flags & VM_SHARED)) {
  +   if (flags & MS_ASYNC)
  +   vma_wrprotect(vma);
  +   if (flags & MS_SYNC) {
   
else if ??
  
   The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
   did not use the else-if here. Moreover, this function itself checks
   that they never come together.
  
 
  I would say that them being mutually exclusive would be a reason *for*
  using else-if here.

 This check is performed by the sys_msync() function itself in its very
 beginning.

 We don't need to check it later.


Sure, it's just that, to me, using 'else-if' makes it explicit that
the two are mutually exclusive. Using if (...), if (...) doesn't.
Maybe it's just me, but I feel that 'else-if' here better shows the
intention...  No big deal.

-- 
Jesper Juhl [EMAIL PROTECTED]
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Anton Salikhmetov
2008/1/22, Linus Torvalds [EMAIL PROTECTED]:


 On Tue, 22 Jan 2008, Anton Salikhmetov wrote:
 
   /*
  + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
  + * It will force a pagefault on the next write access.
  + */
  +static void vma_wrprotect(struct vm_area_struct *vma)
  +{
  + unsigned long addr;
  +
  + for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
  + spinlock_t *ptl;
  + pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
  + pud_t *pud = pud_offset(pgd, addr);
  + pmd_t *pmd = pmd_offset(pud, addr);
  + pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);

 This is extremely expensive over bigger areas, especially sparsely mapped
 ones (it does all the lookups for all four levels over and over and over
 again for each page).

 I think Peter Zijlstra posted a version that uses the regular kind of
 nested loop (with inline functions to keep the thing nice and clean),
 which gets rid of that.

Thanks for your feedback, Linus!

I will use Peter Zijlstra's version of such an operation in my next
patch series.


 [ The sad/funny part is that this is all how we *used* to do msync(), back
   in the days: we're literally going back to the pre-cleanup logic. See
   commit 204ec841fbea3e5138168edbc3a76d46747cc987 ("mm: msync() cleanup")
   for details ]

 Quite frankly, I really think you might be better off just doing a

 git revert 204ec841fbea3e5138168edbc3a76d46747cc987

 and working from there! I just checked, and it still reverts cleanly, and
 you'd end up with a nice code-base that (a) has gotten years of testing
 and (b) already has the looping-over-the-pagetables code.

 Linus
