Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
> > > > /*
> > > + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> > > + * It will force a pagefault on the next write access.
> > > + */
> > > +static void vma_wrprotect(struct vm_area_struct *vma)
> > > +{
> > > +	unsigned long addr;
> > > +
> > > +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> > > +		spinlock_t *ptl;
> > > +		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> > > +		pud_t *pud = pud_offset(pgd, addr);
> > > +		pmd_t *pmd = pmd_offset(pud, addr);
> > > +		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> >
> > This is extremely expensive over bigger areas, especially sparsely mapped
> > ones (it does all the lookups for all four levels over and over and over
> > again for each page).
> >
> > I think Peter Zijlstra posted a version that uses the regular kind of
> > nested loop (with inline functions to keep the thing nice and clean),
> > which gets rid of that.
>
> Thanks for your feedback, Linus!
>
> I will use Peter Zijlstra's version of such an operation in my next
> patch series.

But note, that those functions iterate over all the vmas for the given
page range, not just the one msync was performed on.  This might get
even more expensive, if the file is mapped lots of times.

The old version, that Linus was referring to, needs some modification
as well, because it doesn't write protect the ptes, just marks them
clean.

Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
Anton Salikhmetov <[EMAIL PROTECTED]> writes:

You should probably put your design document somewhere in Documentation
with a patch.

> + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> + * It will force a pagefault on the next write access.
> + */
> +static void vma_wrprotect(struct vm_area_struct *vma)
> +{
> +	unsigned long addr;
> +
> +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> +		spinlock_t *ptl;
> +		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> +		pud_t *pud = pud_offset(pgd, addr);
> +		pmd_t *pmd = pmd_offset(pud, addr);
> +		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);

This means on i386 with highmem ptes you will map/flush tlb/unmap each
PTE individually. You will do 512 times as much work as really needed
per PTE leaf page.

The performance critical address space walkers use a different design
pattern that avoids this.

> +		if (pte_dirty(*pte) && pte_write(*pte)) {
> +			pte_t entry = ptep_clear_flush(vma, addr, pte);

Flushing TLBs unbatched can also be very expensive because if the MM is
shared by several CPUs you'll have an inter-processor interrupt for each
iteration. They are quite costly even on smaller systems.

It would be better if you did a single flush_tlb_range() at the end.
This means on x86 this will currently always do a full flush, but
that's still better than really slowing down in the heavily
multithreaded case.

-Andi
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
2008/1/22, Linus Torvalds <[EMAIL PROTECTED]>:
>
>
> On Tue, 22 Jan 2008, Anton Salikhmetov wrote:
> >
> > /*
> > + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> > + * It will force a pagefault on the next write access.
> > + */
> > +static void vma_wrprotect(struct vm_area_struct *vma)
> > +{
> > +	unsigned long addr;
> > +
> > +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> > +		spinlock_t *ptl;
> > +		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> > +		pud_t *pud = pud_offset(pgd, addr);
> > +		pmd_t *pmd = pmd_offset(pud, addr);
> > +		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>
> This is extremely expensive over bigger areas, especially sparsely mapped
> ones (it does all the lookups for all four levels over and over and over
> again for each page).
>
> I think Peter Zijlstra posted a version that uses the regular kind of
> nested loop (with inline functions to keep the thing nice and clean),
> which gets rid of that.

Thanks for your feedback, Linus!

I will use Peter Zijlstra's version of such an operation in my next
patch series.

> [ The sad/funny part is that this is all how we *used* to do msync(), back
>   in the days: we're literally going back to the "pre-cleanup" logic. See
>   commit 204ec841fbea3e5138168edbc3a76d46747cc987: "mm: msync() cleanup"
>   for details ]
>
> Quite frankly, I really think you might be better off just doing a
>
>	git revert 204ec841fbea3e5138168edbc3a76d46747cc987
>
> and working from there! I just checked, and it still reverts cleanly, and
> you'd end up with a nice code-base that (a) has gotten years of testing
> and (b) already has the looping-over-the-pagetables code.
>
>		Linus
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> > > 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > > > Some very pedantic nitpicking below;
> > > >
> > > > On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> > ...
> > > > > +		if (file && (vma->vm_flags & VM_SHARED)) {
> > > > > +			if (flags & MS_ASYNC)
> > > > > +				vma_wrprotect(vma);
> > > > > +			if (flags & MS_SYNC) {
> > > >
> > > > "else if" ??
> > >
> > > The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
> > > did not use the "else-if" here. Moreover, this function itself checks
> > > that they never come together.
> > >
> > I would say that them being mutually exclusive would be a reason *for*
> > using "else-if" here.
>
> This check is performed by the sys_msync() function itself in its very
> beginning.
>
> We don't need to check it later.
>
Sure, it's just that, to me, using 'else-if' makes it explicit that the
two are mutually exclusive. Using "if (...), if (...)" doesn't.

Maybe it's just me, but I feel that 'else-if' here better shows the
intention... No big deal.

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
On Tue, 22 Jan 2008, Anton Salikhmetov wrote:
>
> /*
> + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> + * It will force a pagefault on the next write access.
> + */
> +static void vma_wrprotect(struct vm_area_struct *vma)
> +{
> +	unsigned long addr;
> +
> +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> +		spinlock_t *ptl;
> +		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> +		pud_t *pud = pud_offset(pgd, addr);
> +		pmd_t *pmd = pmd_offset(pud, addr);
> +		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);

This is extremely expensive over bigger areas, especially sparsely mapped
ones (it does all the lookups for all four levels over and over and over
again for each page).

I think Peter Zijlstra posted a version that uses the regular kind of
nested loop (with inline functions to keep the thing nice and clean),
which gets rid of that.

[ The sad/funny part is that this is all how we *used* to do msync(), back
  in the days: we're literally going back to the "pre-cleanup" logic. See
  commit 204ec841fbea3e5138168edbc3a76d46747cc987: "mm: msync() cleanup"
  for details ]

Quite frankly, I really think you might be better off just doing a

	git revert 204ec841fbea3e5138168edbc3a76d46747cc987

and working from there! I just checked, and it still reverts cleanly, and
you'd end up with a nice code-base that (a) has gotten years of testing
and (b) already has the looping-over-the-pagetables code.

		Linus
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> 2008/1/22, Anton Salikhmetov <[EMAIL PROTECTED]>:
> > 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > > Some very pedantic nitpicking below;
> > >
...
> By the way, if we're talking "pedantic", then:
>
> >>>
> debian:/tmp$ cat c.c
> void f()
> {
>	for (unsigned long i = 0; i < 10; i++)
>		continue;
> }
> debian:/tmp$ gcc -c -pedantic c.c
> c.c: In function 'f':
> c.c:3: error: 'for' loop initial declaration used outside C99 mode
> debian:/tmp$
>
Well, I just wrote it the way I'd have written the loop; I know it's not
the common kernel style. Just offering my feedback/input :)

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
2008/1/22, Anton Salikhmetov <[EMAIL PROTECTED]>:
> 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > Some very pedantic nitpicking below;
> >
> > On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> > > http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40
> > >
> > > Update file times at write references to memory-mapped files.
> > > Force file times update at the next write reference after
> > > calling the msync() system call with the MS_ASYNC flag.
> > >
> > > Signed-off-by: Anton Salikhmetov <[EMAIL PROTECTED]>
> > > ---
> > >  mm/memory.c |    6 ++
> > >  mm/msync.c  |   57 -
> > >  2 files changed, 50 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 6dd1cd8..4b0144b 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -1670,6 +1670,9 @@ gotten:
> > >  unlock:
> > >  	pte_unmap_unlock(page_table, ptl);
> > >  	if (dirty_page) {
> > > +		if (vma->vm_file)
> > > +			file_update_time(vma->vm_file);
> > > +
> > >  		/*
> > >  		 * Yes, Virginia, this is actually required to prevent a race
> > >  		 * with clear_page_dirty_for_io() from clearing the page dirty
> > > @@ -2343,6 +2346,9 @@ out_unlocked:
> > >  	if (anon)
> > >  		page_cache_release(vmf.page);
> > >  	else if (dirty_page) {
> > > +		if (vma->vm_file)
> > > +			file_update_time(vma->vm_file);
> > > +
> > >  		set_page_dirty_balance(dirty_page, page_mkwrite);
> > >  		put_page(dirty_page);
> > >  	}
> > > diff --git a/mm/msync.c b/mm/msync.c
> > > index a4de868..394130d 100644
> > > --- a/mm/msync.c
> > > +++ b/mm/msync.c
> > > @@ -5,6 +5,7 @@
> > >   * Copyright (C) 2008 Anton Salikhmetov <[EMAIL PROTECTED]>
> > >   */
> > >
> > > +#include <asm/tlbflush.h>
> > >  #include <linux/file.h>
> > >  #include <linux/fs.h>
> > >  #include <linux/mm.h>
> > > @@ -13,11 +14,37 @@
> > >  #include <linux/syscalls.h>
> > >
> > >  /*
> > > + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> > > + * It will force a pagefault on the next write access.
> > > + */
> > > +static void vma_wrprotect(struct vm_area_struct *vma)
> > > +{
> > > +	unsigned long addr;
> > > +
> > > +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> >
> > I know it's not the common "Linux Kernel way", but 'addr' could be
> > made to have just 'for' scope here according to C99;
>
> I believe that the C89 style is more common for the Linux kernel, so
> I've used the out-of-scope variable declaration.
>
> >
> >    for (unsigned long addr = vma->vm_start; addr < vma->vm_end;
> > addr += PAGE_SIZE) {

By the way, if we're talking "pedantic", then:

>>>
debian:/tmp$ cat c.c
void f()
{
	for (unsigned long i = 0; i < 10; i++)
		continue;
}
debian:/tmp$ gcc -c -pedantic c.c
c.c: In function 'f':
c.c:3: error: 'for' loop initial declaration used outside C99 mode
debian:/tmp$
<<<

No pun intended :)

> >
> > > +		spinlock_t *ptl;
> > > +		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> > > +		pud_t *pud = pud_offset(pgd, addr);
> > > +		pmd_t *pmd = pmd_offset(pud, addr);
> > > +		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> > > +
> > > +		if (pte_dirty(*pte) && pte_write(*pte)) {
> > > +			pte_t entry = ptep_clear_flush(vma, addr, pte);
> > > +
> > > +			entry = pte_wrprotect(entry);
> > > +			set_pte_at(vma->vm_mm, addr, pte, entry);
> > > +		}
> > > +		pte_unmap_unlock(pte, ptl);
> > > +	}
> > > +}
> > > +
> > > +/*
> > >  * MS_SYNC syncs the entire file - including mappings.
> > >  *
> > > - * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
> > > - * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
> > > - * Now it doesn't do anything, since dirty pages are properly tracked.
> >
> > I think keeping some version of the "up to ..." comments makes sense.
> > It documents that we previously had different behaviour.
>
> Earlier I had a request to remove any "changelog-style" comments from the
> code.
>
> > > + * MS_ASYNC does not start I/O. Instead, it marks the relevant pages
> > > + * read-only by calling vma_wrprotect(). This is needed to catch the next
> > > + * write reference to the mapped region and update the file times
> > > + * accordingly.
> > >  *
> > >  * The application may now run fsync() to write out the dirty pages and
> > >  * wait on the writeout and check the result. Or the application may run
> > > @@ -77,16 +104,20 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags)
> > >  		error = 0;
> > >  		start = vma->vm_end;
> > >  		file = vma->vm_file;
> > > -
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> > 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > > Some very pedantic nitpicking below;
> > >
> > > On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> ...
> > > > +		if (file && (vma->vm_flags & VM_SHARED)) {
> > > > +			if (flags & MS_ASYNC)
> > > > +				vma_wrprotect(vma);
> > > > +			if (flags & MS_SYNC) {
> > >
> > > "else if" ??
> >
> > The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
> > did not use the "else-if" here. Moreover, this function itself checks
> > that they never come together.
> >
> I would say that them being mutually exclusive would be a reason *for*
> using "else-if" here.

This check is performed by the sys_msync() function itself in its very
beginning.

We don't need to check it later.

> --
> Jesper Juhl <[EMAIL PROTECTED]>
> Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
> Plain text mails only, please      http://www.expita.com/nomime.html
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > Some very pedantic nitpicking below;
> >
> > On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
...
> > > +		if (file && (vma->vm_flags & VM_SHARED)) {
> > > +			if (flags & MS_ASYNC)
> > > +				vma_wrprotect(vma);
> > > +			if (flags & MS_SYNC) {
> >
> > "else if" ??
>
> The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
> did not use the "else-if" here. Moreover, this function itself checks
> that they never come together.
>
I would say that them being mutually exclusive would be a reason *for*
using "else-if" here.

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> Some very pedantic nitpicking below;
>
> On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40
> >
> > Update file times at write references to memory-mapped files.
> > Force file times update at the next write reference after
> > calling the msync() system call with the MS_ASYNC flag.
> >
> > Signed-off-by: Anton Salikhmetov <[EMAIL PROTECTED]>
> > ---
> >  mm/memory.c |    6 ++
> >  mm/msync.c  |   57 -
> >  2 files changed, 50 insertions(+), 13 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 6dd1cd8..4b0144b 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1670,6 +1670,9 @@ gotten:
> >  unlock:
> >  	pte_unmap_unlock(page_table, ptl);
> >  	if (dirty_page) {
> > +		if (vma->vm_file)
> > +			file_update_time(vma->vm_file);
> > +
> >  		/*
> >  		 * Yes, Virginia, this is actually required to prevent a race
> >  		 * with clear_page_dirty_for_io() from clearing the page dirty
> > @@ -2343,6 +2346,9 @@ out_unlocked:
> >  	if (anon)
> >  		page_cache_release(vmf.page);
> >  	else if (dirty_page) {
> > +		if (vma->vm_file)
> > +			file_update_time(vma->vm_file);
> > +
> >  		set_page_dirty_balance(dirty_page, page_mkwrite);
> >  		put_page(dirty_page);
> >  	}
> > diff --git a/mm/msync.c b/mm/msync.c
> > index a4de868..394130d 100644
> > --- a/mm/msync.c
> > +++ b/mm/msync.c
> > @@ -5,6 +5,7 @@
> >   * Copyright (C) 2008 Anton Salikhmetov <[EMAIL PROTECTED]>
> >   */
> >
> > +#include <asm/tlbflush.h>
> >  #include <linux/file.h>
> >  #include <linux/fs.h>
> >  #include <linux/mm.h>
> > @@ -13,11 +14,37 @@
> >  #include <linux/syscalls.h>
> >
> >  /*
> > + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> > + * It will force a pagefault on the next write access.
> > + */
> > +static void vma_wrprotect(struct vm_area_struct *vma)
> > +{
> > +	unsigned long addr;
> > +
> > +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
>
> I know it's not the common "Linux Kernel way", but 'addr' could be
> made to have just 'for' scope here according to C99;

I believe that the C89 style is more common for the Linux kernel, so
I've used the out-of-scope variable declaration.

>
>    for (unsigned long addr = vma->vm_start; addr < vma->vm_end;
> addr += PAGE_SIZE) {
>
> > +		spinlock_t *ptl;
> > +		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> > +		pud_t *pud = pud_offset(pgd, addr);
> > +		pmd_t *pmd = pmd_offset(pud, addr);
> > +		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> > +
> > +		if (pte_dirty(*pte) && pte_write(*pte)) {
> > +			pte_t entry = ptep_clear_flush(vma, addr, pte);
> > +
> > +			entry = pte_wrprotect(entry);
> > +			set_pte_at(vma->vm_mm, addr, pte, entry);
> > +		}
> > +		pte_unmap_unlock(pte, ptl);
> > +	}
> > +}
> > +
> > +/*
> >  * MS_SYNC syncs the entire file - including mappings.
> >  *
> > - * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
> > - * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
> > - * Now it doesn't do anything, since dirty pages are properly tracked.
>
> I think keeping some version of the "up to ..." comments makes sense.
> It documents that we previously had different behaviour.

Earlier I had a request to remove any "changelog-style" comments from the
code.

> > + * MS_ASYNC does not start I/O. Instead, it marks the relevant pages
> > + * read-only by calling vma_wrprotect(). This is needed to catch the next
> > + * write reference to the mapped region and update the file times
> > + * accordingly.
> >  *
> >  * The application may now run fsync() to write out the dirty pages and
> >  * wait on the writeout and check the result. Or the application may run
> > @@ -77,16 +104,20 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags)
> >  		error = 0;
> >  		start = vma->vm_end;
> >  		file = vma->vm_file;
> > -		if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) {
> > -			get_file(file);
> > -			up_read(&mm->mmap_sem);
> > -			error = do_fsync(file, 0);
> > -			fput(file);
> > -			if (error || start >= end)
> > -				goto out;
> > -			down_read(&mm->mmap_sem);
> > -			vma = find_vma(mm, start);
> > -			continue;
> > +		if (file && (vma->vm_flags & VM_SHARED)) {
> > +			if (flags &
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
Some very pedantic nitpicking below;

On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40
>
> Update file times at write references to memory-mapped files.
> Force file times update at the next write reference after
> calling the msync() system call with the MS_ASYNC flag.
>
> Signed-off-by: Anton Salikhmetov <[EMAIL PROTECTED]>
> ---
>  mm/memory.c |    6 ++
>  mm/msync.c  |   57 -
>  2 files changed, 50 insertions(+), 13 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 6dd1cd8..4b0144b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1670,6 +1670,9 @@ gotten:
>  unlock:
>  	pte_unmap_unlock(page_table, ptl);
>  	if (dirty_page) {
> +		if (vma->vm_file)
> +			file_update_time(vma->vm_file);
> +
>  		/*
>  		 * Yes, Virginia, this is actually required to prevent a race
>  		 * with clear_page_dirty_for_io() from clearing the page dirty
> @@ -2343,6 +2346,9 @@ out_unlocked:
>  	if (anon)
>  		page_cache_release(vmf.page);
>  	else if (dirty_page) {
> +		if (vma->vm_file)
> +			file_update_time(vma->vm_file);
> +
>  		set_page_dirty_balance(dirty_page, page_mkwrite);
>  		put_page(dirty_page);
>  	}
> diff --git a/mm/msync.c b/mm/msync.c
> index a4de868..394130d 100644
> --- a/mm/msync.c
> +++ b/mm/msync.c
> @@ -5,6 +5,7 @@
>   * Copyright (C) 2008 Anton Salikhmetov <[EMAIL PROTECTED]>
>   */
>
> +#include <asm/tlbflush.h>
>  #include <linux/file.h>
>  #include <linux/fs.h>
>  #include <linux/mm.h>
> @@ -13,11 +14,37 @@
>  #include <linux/syscalls.h>
>
>  /*
> + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> + * It will force a pagefault on the next write access.
> + */
> +static void vma_wrprotect(struct vm_area_struct *vma)
> +{
> +	unsigned long addr;
> +
> +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {

I know it's not the common "Linux Kernel way", but 'addr' could be
made to have just 'for' scope here according to C99;

   for (unsigned long addr = vma->vm_start; addr < vma->vm_end;
addr += PAGE_SIZE) {

> +		spinlock_t *ptl;
> +		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> +		pud_t *pud = pud_offset(pgd, addr);
> +		pmd_t *pmd = pmd_offset(pud, addr);
> +		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> +
> +		if (pte_dirty(*pte) && pte_write(*pte)) {
> +			pte_t entry = ptep_clear_flush(vma, addr, pte);
> +
> +			entry = pte_wrprotect(entry);
> +			set_pte_at(vma->vm_mm, addr, pte, entry);
> +		}
> +		pte_unmap_unlock(pte, ptl);
> +	}
> +}
> +
> +/*
>  * MS_SYNC syncs the entire file - including mappings.
>  *
> - * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
> - * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
> - * Now it doesn't do anything, since dirty pages are properly tracked.

I think keeping some version of the "up to ..." comments makes sense.
It documents that we previously had different behaviour.

> + * MS_ASYNC does not start I/O. Instead, it marks the relevant pages
> + * read-only by calling vma_wrprotect(). This is needed to catch the next
> + * write reference to the mapped region and update the file times
> + * accordingly.
>  *
>  * The application may now run fsync() to write out the dirty pages and
>  * wait on the writeout and check the result. Or the application may run
> @@ -77,16 +104,20 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags)
>  		error = 0;
>  		start = vma->vm_end;
>  		file = vma->vm_file;
> -		if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) {
> -			get_file(file);
> -			up_read(&mm->mmap_sem);
> -			error = do_fsync(file, 0);
> -			fput(file);
> -			if (error || start >= end)
> -				goto out;
> -			down_read(&mm->mmap_sem);
> -			vma = find_vma(mm, start);
> -			continue;
> +		if (file && (vma->vm_flags & VM_SHARED)) {
> +			if (flags & MS_ASYNC)
> +				vma_wrprotect(vma);
> +			if (flags & MS_SYNC) {

"else if" ??

> +				get_file(file);
> +				up_read(&mm->mmap_sem);
> +				error = do_fsync(file, 0);
> +				fput(file);
> +				if (error || start >= end)
> +					goto out;
> +				down_read(&mm->mmap_sem);
[PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40

Update file times at write references to memory-mapped files.
Force file times update at the next write reference after
calling the msync() system call with the MS_ASYNC flag.

Signed-off-by: Anton Salikhmetov <[EMAIL PROTECTED]>
---
 mm/memory.c |    6 ++
 mm/msync.c  |   57 -
 2 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 6dd1cd8..4b0144b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1670,6 +1670,9 @@ gotten:
 unlock:
 	pte_unmap_unlock(page_table, ptl);
 	if (dirty_page) {
+		if (vma->vm_file)
+			file_update_time(vma->vm_file);
+
 		/*
 		 * Yes, Virginia, this is actually required to prevent a race
 		 * with clear_page_dirty_for_io() from clearing the page dirty
@@ -2343,6 +2346,9 @@ out_unlocked:
 	if (anon)
 		page_cache_release(vmf.page);
 	else if (dirty_page) {
+		if (vma->vm_file)
+			file_update_time(vma->vm_file);
+
 		set_page_dirty_balance(dirty_page, page_mkwrite);
 		put_page(dirty_page);
 	}
diff --git a/mm/msync.c b/mm/msync.c
index a4de868..394130d 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2008 Anton Salikhmetov <[EMAIL PROTECTED]>
  */

+#include <asm/tlbflush.h>
 #include <linux/file.h>
 #include <linux/fs.h>
 #include <linux/mm.h>
@@ -13,11 +14,37 @@
 #include <linux/syscalls.h>

 /*
+ * Scan the PTEs for pages belonging to the VMA and mark them read-only.
+ * It will force a pagefault on the next write access.
+ */
+static void vma_wrprotect(struct vm_area_struct *vma)
+{
+	unsigned long addr;
+
+	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
+		spinlock_t *ptl;
+		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
+		pud_t *pud = pud_offset(pgd, addr);
+		pmd_t *pmd = pmd_offset(pud, addr);
+		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+
+		if (pte_dirty(*pte) && pte_write(*pte)) {
+			pte_t entry = ptep_clear_flush(vma, addr, pte);
+
+			entry = pte_wrprotect(entry);
+			set_pte_at(vma->vm_mm, addr, pte, entry);
+		}
+		pte_unmap_unlock(pte, ptl);
+	}
+}
+
+/*
  * MS_SYNC syncs the entire file - including mappings.
  *
- * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
- * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
- * Now it doesn't do anything, since dirty pages are properly tracked.
+ * MS_ASYNC does not start I/O. Instead, it marks the relevant pages
+ * read-only by calling vma_wrprotect(). This is needed to catch the next
+ * write reference to the mapped region and update the file times
+ * accordingly.
  *
  * The application may now run fsync() to write out the dirty pages and
  * wait on the writeout and check the result. Or the application may run
@@ -77,16 +104,20 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags)
 		error = 0;
 		start = vma->vm_end;
 		file = vma->vm_file;
-		if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) {
-			get_file(file);
-			up_read(&mm->mmap_sem);
-			error = do_fsync(file, 0);
-			fput(file);
-			if (error || start >= end)
-				goto out;
-			down_read(&mm->mmap_sem);
-			vma = find_vma(mm, start);
-			continue;
+		if (file && (vma->vm_flags & VM_SHARED)) {
+			if (flags & MS_ASYNC)
+				vma_wrprotect(vma);
+			if (flags & MS_SYNC) {
+				get_file(file);
+				up_read(&mm->mmap_sem);
+				error = do_fsync(file, 0);
+				fput(file);
+				if (error || start >= end)
+					goto out;
+				down_read(&mm->mmap_sem);
+				vma = find_vma(mm, start);
+				continue;
+			}
 		}
 		vma = vma->vm_next;
--
1.4.4.4
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
Some very pedantic nitpicking below;

On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40
>
> Update file times at write references to memory-mapped files.
> Force file times update at the next write reference after
> calling the msync() system call with the MS_ASYNC flag.
>
> Signed-off-by: Anton Salikhmetov <[EMAIL PROTECTED]>

...

> +static void vma_wrprotect(struct vm_area_struct *vma)
> +{
> +	unsigned long addr;
> +
> +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {

I know it's not the common Linux Kernel way, but 'addr' could be made
to have just 'for' scope here according to C99;

	for (unsigned long addr = vma->vm_start; addr < vma->vm_end;
	     addr += PAGE_SIZE) {

...

> - * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
> - * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
> - * Now it doesn't do anything, since dirty pages are properly tracked.

I think keeping some version of the "up to ..." comments makes sense.
It documents that we previously had different behaviour.

...

> +		if (file && (vma->vm_flags & VM_SHARED)) {
> +			if (flags & MS_ASYNC)
> +				vma_wrprotect(vma);
> +			if (flags & MS_SYNC) {

else if ??

...
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> Some very pedantic nitpicking below;
>
> On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
>
> ...
>
> > +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
>
> I know it's not the common Linux Kernel way, but 'addr' could be made
> to have just 'for' scope here according to C99;

I believe that the C89 style is more common for the Linux kernel, so
I've used the out-of-scope variable declaration.

> ...
>
> I think keeping some version of the "up to ..." comments makes sense.
> It documents that we previously had different behaviour.

Earlier I had a request to remove any changelog-style comments from
the code.

> ...
>
> > +		if (file && (vma->vm_flags & VM_SHARED)) {
> > +			if (flags & MS_ASYNC)
> > +				vma_wrprotect(vma);
> > +			if (flags & MS_SYNC) {
>
> else if ??

The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
did not use the else-if here. Moreover, this function itself checks
that they never come together.
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > Some very pedantic nitpicking below;
> >
> > On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
...
> > > +		if (file && (vma->vm_flags & VM_SHARED)) {
> > > +			if (flags & MS_ASYNC)
> > > +				vma_wrprotect(vma);
> > > +			if (flags & MS_SYNC) {
> >
> > else if ??
>
> The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
> did not use the else-if here. Moreover, this function itself checks
> that they never come together.

I would say that them being mutually exclusive would be a reason *for*
using else-if here.

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> > 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > > Some very pedantic nitpicking below;
...
> > The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
> > did not use the else-if here. Moreover, this function itself checks
> > that they never come together.
>
> I would say that them being mutually exclusive would be a reason *for*
> using else-if here.

This check is performed by the sys_msync() function itself in its very
beginning. We don't need to check it later.
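For reference, the early sanity check Anton is pointing at can be sketched in a userspace-testable form. The `msync_flags_valid()` name below is mine, invented for illustration; the point is only that the MS_ASYNC/MS_SYNC combination is rejected before the vma loop runs, so the loop body may test the two flags independently.

```c
/* Sketch (hypothetical helper name) of the flag sanity check done at
 * the top of sys_msync(): an unknown bit, or MS_ASYNC together with
 * MS_SYNC, is rejected before any vma is ever examined. */
#include <sys/mman.h>	/* MS_ASYNC, MS_SYNC, MS_INVALIDATE */

static int msync_flags_valid(int flags)
{
	if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC))
		return 0;	/* unknown flag bit set */
	if ((flags & MS_ASYNC) && (flags & MS_SYNC))
		return 0;	/* mutually exclusive flags */
	return 1;
}
```

Given that guarantee, `if (flags & MS_ASYNC)` followed by `if (flags & MS_SYNC)` can never both fire — which is the whole if-vs-else-if debate in a nutshell.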
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
2008/1/22, Anton Salikhmetov <[EMAIL PROTECTED]>:
> 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > Some very pedantic nitpicking below;
> >
> > On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
...
> > I know it's not the common Linux Kernel way, but 'addr' could be made
> > to have just 'for' scope here according to C99;
>
> I believe that the C89 style is more common for the Linux kernel, so
> I've used the out-of-scope variable declaration.
>
> >	for (unsigned long addr = vma->vm_start; addr < vma->vm_end;
> >	     addr += PAGE_SIZE) {

By the way, if we're talking pedantic, then:

debian:/tmp$ cat c.c
void f()
{
	for (unsigned long i = 0; i < 10; i++)
		continue;
}
debian:/tmp$ gcc -c -pedantic c.c
c.c: In function 'f':
c.c:3: error: 'for' loop initial declaration used outside C99 mode
debian:/tmp$

No pun intended :)

...
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> 2008/1/22, Anton Salikhmetov <[EMAIL PROTECTED]>:
> > 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > > Some very pedantic nitpicking below;
...
> By the way, if we're talking pedantic, then:
>
> debian:/tmp$ cat c.c
> void f()
> {
>	for (unsigned long i = 0; i < 10; i++)
>		continue;
> }
> debian:/tmp$ gcc -c -pedantic c.c
> c.c: In function 'f':
> c.c:3: error: 'for' loop initial declaration used outside C99 mode
> debian:/tmp$

Well, I just wrote the way I'd have written the loop, I know it's not
the common kernel style. Just offering my feedback/input :)
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
On Tue, 22 Jan 2008, Anton Salikhmetov wrote:
>
>  /*
> + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> + * It will force a pagefault on the next write access.
> + */
> +static void vma_wrprotect(struct vm_area_struct *vma)
> +{
> +	unsigned long addr;
> +
> +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> +		spinlock_t *ptl;
> +		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> +		pud_t *pud = pud_offset(pgd, addr);
> +		pmd_t *pmd = pmd_offset(pud, addr);
> +		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);

This is extremely expensive over bigger areas, especially sparsely mapped
ones (it does all the lookups for all four levels over and over and over
again for each page).

I think Peter Zijlstra posted a version that uses the regular kind of
nested loop (with inline functions to keep the thing nice and clean),
which gets rid of that.

[ The sad/funny part is that this is all how we *used* to do msync(),
  back in the days: we're literally going back to the pre-cleanup logic.

  See commit 204ec841fbea3e5138168edbc3a76d46747cc987: "mm: msync()
  cleanup" for details ]

Quite frankly, I really think you might be better off just doing a

	git revert 204ec841fbea3e5138168edbc3a76d46747cc987

and working from there! I just checked, and it still reverts cleanly, and
you'd end up with a nice code-base that (a) has gotten years of testing
and (b) already has the looping-over-the-pagetables code.

		Linus
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
...
> > I would say that them being mutually exclusive would be a reason *for*
> > using else-if here.
>
> This check is performed by the sys_msync() function itself in its very
> beginning. We don't need to check it later.

Sure, it's just that, to me, using 'else-if' makes it explicit that
the two are mutually exclusive. Using if (...), if (...) doesn't.
Maybe it's just me, but I feel that 'else-if' here better shows the
intention... No big deal.
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
2008/1/22, Linus Torvalds <[EMAIL PROTECTED]>:
> On Tue, 22 Jan 2008, Anton Salikhmetov wrote:
> >
> > +static void vma_wrprotect(struct vm_area_struct *vma)
...
> This is extremely expensive over bigger areas, especially sparsely mapped
> ones (it does all the lookups for all four levels over and over and over
> again for each page).
>
> I think Peter Zijlstra posted a version that uses the regular kind of
> nested loop (with inline functions to keep the thing nice and clean),
> which gets rid of that.

Thanks for your feedback, Linus!

I will use Peter Zijlstra's version of such an operation in my next
patch series.

> [ The sad/funny part is that this is all how we *used* to do msync(),
>   back in the days: we're literally going back to the pre-cleanup logic.
>
>   See commit 204ec841fbea3e5138168edbc3a76d46747cc987: "mm: msync()
>   cleanup" for details ]
>
> Quite frankly, I really think you might be better off just doing a
>
>	git revert 204ec841fbea3e5138168edbc3a76d46747cc987
>
> and working from there! I just checked, and it still reverts cleanly, and
> you'd end up with a nice code-base that (a) has gotten years of testing
> and (b) already has the looping-over-the-pagetables code.
>
>		Linus