Re: [patch 3/6] mmu_notifier: invalidate_page callbacks
On Tue, Feb 19, 2008 at 07:46:10PM +1100, Nick Piggin wrote:
> On Sunday 17 February 2008 06:22, Christoph Lameter wrote:
> > On Fri, 15 Feb 2008, Andrew Morton wrote:
> > > > 	flush_cache_page(vma, address, pte_pfn(*pte));
> > > > 	entry = ptep_clear_flush(vma, address, pte);
> > > > +	mmu_notifier(invalidate_page, mm, address);
> > >
> > > I just don't see how this can be done if the callee has another thread
> > > in the middle of establishing IO against this region of memory.
> > > ->invalidate_page() _has_ to be able to block. Confused.
> >
> > The page lock is held and that holds off I/O?
>
> I think the actual answer is that "it doesn't matter".

Agreed. The PG_lock taken when invalidate_page is called is there to
serialize the VM against the VM, not the VM against I/O.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 3/6] mmu_notifier: invalidate_page callbacks
On Sunday 17 February 2008 06:22, Christoph Lameter wrote:
> On Fri, 15 Feb 2008, Andrew Morton wrote:
> > > 	flush_cache_page(vma, address, pte_pfn(*pte));
> > > 	entry = ptep_clear_flush(vma, address, pte);
> > > +	mmu_notifier(invalidate_page, mm, address);
> >
> > I just don't see how this can be done if the callee has another thread
> > in the middle of establishing IO against this region of memory.
> > ->invalidate_page() _has_ to be able to block. Confused.
>
> The page lock is held and that holds off I/O?

I think the actual answer is that "it doesn't matter". ptes are not
exactly the entity via which IO gets established, so all we really care
about here is that after the callback finishes, we will not get any more
reads or writes to the page via the external mapping.

As far as holding off local IO goes, that is the job of the core VM.
(And no, the page lock does not necessarily hold it off, FYI -- it can
be writeback IO or even IO directly via buffers.) Holding off IO via
the external references is, I guess, a job for the notifier driver.
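The contract Nick describes -- once the callback returns, the external
mapping can no longer reach the page -- can be modelled in a few lines
of userspace C. Everything here (the `spte` pointer, `invalidate_page`,
`external_read`) is an illustrative stand-in, not the kernel's actual
data structures or API:

```c
#include <stddef.h>

/* A page of data and one "external" mapping into it (think: a
 * secondary MMU's spte). */
static char page_data = 'A';
static char *spte = &page_data;

/* The notifier contract: when this returns, the external mapping can
 * no longer reach the page. Tearing down the mapping is all it does;
 * holding off in-flight local IO is the core VM's job, not the
 * callback's. */
static void invalidate_page(void)
{
	spte = NULL;	/* real code would also flush the remote TLB */
}

/* An external access must go through the mapping; once invalidated
 * it "faults" (returns 0) instead of touching the page. */
static int external_read(char *out)
{
	if (spte == NULL)
		return 0;
	*out = *spte;
	return 1;
}
```

Whether a driver blocks while doing this is its own affair; the only
guarantee the VM needs is the before/after visibility above.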
Re: [patch 3/6] mmu_notifier: invalidate_page callbacks
On Saturday 16 February 2008 14:37, Andrew Morton wrote:
> On Thu, 14 Feb 2008 22:49:02 -0800 Christoph Lameter <[EMAIL PROTECTED]> wrote:
> > Two callbacks to remove individual pages as done in rmap code
> >
> > 	invalidate_page()
> >
> > 	Called from the inner loop of rmap walks to invalidate pages.
> >
> > 	age_page()
> >
> > 	Called for the determination of the page referenced status.
> >
> > 	If we do not care about page referenced status then an age_page
> > 	callback may be omitted. PageLock and pte lock are held when
> > 	either of the functions is called.
>
> The age_page mystery shallows.

BTW, can this callback be called mmu_notifier_clear_flush_young, to
match the core VM?
Re: [patch 3/6] mmu_notifier: invalidate_page callbacks
On Fri, 15 Feb 2008, Andrew Morton wrote:

> > @@ -287,7 +288,8 @@ static int page_referenced_one(struct pa
> > 	if (vma->vm_flags & VM_LOCKED) {
> > 		referenced++;
> > 		*mapcount = 1;	/* break early from loop */
> > -	} else if (ptep_clear_flush_young(vma, address, pte))
> > +	} else if (ptep_clear_flush_young(vma, address, pte) |
> > +		   mmu_notifier_age_page(mm, address))
> > 		referenced++;
>
> The "|" is obviously deliberate. But no explanation is provided telling
> us why we still call the callback if ptep_clear_flush_young() said the
> page was recently referenced. People who read your code will want to
> understand this.

Andrea?

> > 	flush_cache_page(vma, address, pte_pfn(*pte));
> > 	entry = ptep_clear_flush(vma, address, pte);
> > +	mmu_notifier(invalidate_page, mm, address);
>
> I just don't see how this can be done if the callee has another thread
> in the middle of establishing IO against this region of memory.
> ->invalidate_page() _has_ to be able to block. Confused.

The page lock is held and that holds off I/O?
Re: [patch 3/6] mmu_notifier: invalidate_page callbacks
On Fri, Feb 15, 2008 at 07:37:36PM -0800, Andrew Morton wrote:
> The "|" is obviously deliberate. But no explanation is provided telling
> us why we still call the callback if ptep_clear_flush_young() said the
> page was recently referenced. People who read your code will want to
> understand this.

This is to clear the young bit in every pte and spte pointing to such a
physical page before backing off because any young bit was on. So if any
young bit is on in the next scan, we're guaranteed the page has been
touched recently and not ages before (otherwise it would take a worst
case of N rounds of the lru before the page could be freed, where N is
the number of ptes or sptes pointing to the page).

> I just don't see how this can be done if the callee has another thread
> in the middle of establishing IO against this region of memory.
> ->invalidate_page() _has_ to be able to block. Confused.

invalidate_page, marking the spte invalid and flushing the asid/tlb,
doesn't need to block, the same way ptep_clear_flush doesn't need to
block for the main linux pte. In fact, before invalidate_page and
ptep_clear_flush can touch anything at all, they have to take their own
spinlocks (mmu_lock for the former, and the PT lock for the latter).

The only sleeping trouble is for network-driven message passing, where
they want to schedule while waiting for the message to arrive, or it
would hang the whole cpu to spin for so long. sptes are cpu-clocked
entities like ptes, so scheduling there is by far not necessary because
there's zero delay in invalidating them and flushing their tlbs. GRU is
similar.

Because we boost the reference count of the pages for every spte
mapping, only implementing invalidate_range_end is enough. But I need to
figure out the get_user_pages->rmap_add window too, and because
get_user_pages can schedule, if I want to add a critical section around
it to avoid calling get_user_pages twice during the kvm page fault, a
mutex would be the only way (it sure can't be a spinlock). But a mutex
can't be taken by invalidate_page to stop it.

So that leaves me with the idea of adding a get_user_pages variant that
returns the page locked. Then instead of calling get_user_pages a second
time after rmap_add returns, I will only need to call unlock_page, which
should be faster than a follow_page. And setting the PG_lock before
dropping the PT lock in follow_page should be fast enough too.
Re: [patch 3/6] mmu_notifier: invalidate_page callbacks
On Thu, 14 Feb 2008 22:49:02 -0800 Christoph Lameter <[EMAIL PROTECTED]> wrote:

> Two callbacks to remove individual pages as done in rmap code
>
> 	invalidate_page()
>
> 	Called from the inner loop of rmap walks to invalidate pages.
>
> 	age_page()
>
> 	Called for the determination of the page referenced status.
>
> 	If we do not care about page referenced status then an age_page
> 	callback may be omitted. PageLock and pte lock are held when
> 	either of the functions is called.

The age_page mystery shallows.

It would be useful to have some rationale somewhere in the patchset for
the existence of this callback.

> #include <asm/tlbflush.h>
>
> @@ -287,7 +288,8 @@ static int page_referenced_one(struct pa
> 	if (vma->vm_flags & VM_LOCKED) {
> 		referenced++;
> 		*mapcount = 1;	/* break early from loop */
> -	} else if (ptep_clear_flush_young(vma, address, pte))
> +	} else if (ptep_clear_flush_young(vma, address, pte) |
> +		   mmu_notifier_age_page(mm, address))
> 		referenced++;

The "|" is obviously deliberate. But no explanation is provided telling
us why we still call the callback if ptep_clear_flush_young() said the
page was recently referenced. People who read your code will want to
understand this.

> 	/* Pretend the page is referenced if the task has the
> @@ -455,6 +457,7 @@ static int page_mkclean_one(struct page
>
> 	flush_cache_page(vma, address, pte_pfn(*pte));
> 	entry = ptep_clear_flush(vma, address, pte);
> +	mmu_notifier(invalidate_page, mm, address);

I just don't see how this can be done if the callee has another thread
in the middle of establishing IO against this region of memory.
->invalidate_page() _has_ to be able to block. Confused.
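The semantic difference Andrew is asking about can be shown in plain
userspace C: with "||" the right-hand operand is short-circuited away
once the left is true, while "|" always evaluates both sides, so both
"young" bits get cleared in a single pass. The helper below is an
invented stand-in for ptep_clear_flush_young()/mmu_notifier_age_page(),
not kernel code:

```c
/* Illustrative stand-in: clears the caller's "young" bit and returns
 * whether it was set. */
static int clear_flush_young(int *young)
{
	int was_young = *young;

	*young = 0;
	return was_young;
}

int referenced_shortcircuit(int *pte_young, int *spte_young)
{
	/* "||" skips the spte once the pte reports young: the spte's
	 * bit survives and keeps the page looking hot next round. */
	return clear_flush_young(pte_young) || clear_flush_young(spte_young);
}

int referenced_bitwise(int *pte_young, int *spte_young)
{
	/* "|" evaluates both sides: every young bit is cleared in this
	 * pass while the page is still reported as referenced. */
	return clear_flush_young(pte_young) | clear_flush_young(spte_young);
}
```

Both variants report the same referenced result; they differ only in
whether the second young bit gets cleared as a side effect.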
[patch 3/6] mmu_notifier: invalidate_page callbacks
Two callbacks to remove individual pages as done in rmap code

	invalidate_page()

	Called from the inner loop of rmap walks to invalidate pages.

	age_page()

	Called for the determination of the page referenced status.

	If we do not care about page referenced status then an age_page
	callback may be omitted. PageLock and pte lock are held when either
	of the functions is called.

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>
Signed-off-by: Robin Holt <[EMAIL PROTECTED]>
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/rmap.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c	2008-02-07 16:49:32.000000000 -0800
+++ linux-2.6/mm/rmap.c	2008-02-07 17:25:25.000000000 -0800
@@ -49,6 +49,7 @@
 #include <linux/module.h>
 #include <linux/kallsyms.h>
 #include <linux/memcontrol.h>
+#include <linux/mmu_notifier.h>
 
 #include <asm/tlbflush.h>
 
@@ -287,7 +288,8 @@ static int page_referenced_one(struct pa
 	if (vma->vm_flags & VM_LOCKED) {
 		referenced++;
 		*mapcount = 1;	/* break early from loop */
-	} else if (ptep_clear_flush_young(vma, address, pte))
+	} else if (ptep_clear_flush_young(vma, address, pte) |
+		   mmu_notifier_age_page(mm, address))
 		referenced++;
 
 	/* Pretend the page is referenced if the task has the
@@ -455,6 +457,7 @@ static int page_mkclean_one(struct page
 
 		flush_cache_page(vma, address, pte_pfn(*pte));
 		entry = ptep_clear_flush(vma, address, pte);
+		mmu_notifier(invalidate_page, mm, address);
 		entry = pte_wrprotect(entry);
 		entry = pte_mkclean(entry);
 		set_pte_at(mm, address, pte, entry);
@@ -712,7 +715,8 @@ static int try_to_unmap_one(struct page
 	 * skipped over this mm) then we should reactivate it.
 	 */
 	if (!migration && ((vma->vm_flags & VM_LOCKED) ||
-			(ptep_clear_flush_young(vma, address, pte)))) {
+			(ptep_clear_flush_young(vma, address, pte) |
+			 mmu_notifier_age_page(mm, address)))) {
 		ret = SWAP_FAIL;
 		goto out_unmap;
 	}
 
@@ -720,6 +724,7 @@ static int try_to_unmap_one(struct page
 	/* Nuke the page table entry. */
 	flush_cache_page(vma, address, page_to_pfn(page));
 	pteval = ptep_clear_flush(vma, address, pte);
+	mmu_notifier(invalidate_page, mm, address);
 
 	/* Move the dirty bit to the physical page now the pte is gone. */
 	if (pte_dirty(pteval))
@@ -844,12 +849,14 @@ static void try_to_unmap_cluster(unsigne
 		page = vm_normal_page(vma, address, *pte);
 		BUG_ON(!page || PageAnon(page));
 
-		if (ptep_clear_flush_young(vma, address, pte))
+		if (ptep_clear_flush_young(vma, address, pte) |
+		    mmu_notifier_age_page(mm, address))
			continue;
 
 		/* Nuke the page table entry. */
 		flush_cache_page(vma, address, pte_pfn(*pte));
 		pteval = ptep_clear_flush(vma, address, pte);
+		mmu_notifier(invalidate_page, mm, address);
 
 		/* If nonlinear, store the file page offset in the pte. */
 		if (page->index != linear_page_index(vma, address))
Re: [patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap
This is the second part of a patch posted to patch 1/6.

Index: git-linus/mm/rmap.c
===================================================================
--- git-linus.orig/mm/rmap.c	2008-01-30 11:55:56.000000000 -0600
+++ git-linus/mm/rmap.c	2008-01-30 12:01:28.000000000 -0600
@@ -476,8 +476,10 @@ int page_mkclean(struct page *page)
 	struct address_space *mapping = page_mapping(page);
 
 	if (mapping) {
 		ret = page_mkclean_file(mapping, page);
-		if (unlikely(PageExternalRmap(page)))
+		if (unlikely(PageExternalRmap(page))) {
 			mmu_rmap_notifier(invalidate_page, page);
+			ClearPageExported(page);
+		}
 		if (page_test_dirty(page)) {
 			page_clear_dirty(page);
 			ret = 1;
@@ -980,8 +982,10 @@ int try_to_unmap(struct page *page, int
 	else
 		ret = try_to_unmap_file(page, migration);
 
-	if (unlikely(PageExternalRmap(page)))
+	if (unlikely(PageExternalRmap(page))) {
 		mmu_rmap_notifier(invalidate_page, page);
+		ClearPageExported(page);
+	}
 
 	if (!page_mapped(page))
 		ret = SWAP_SUCCESS;
[patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap
Callbacks to remove individual pages if the subsystem has an rmap capability.
The pagelock is held but no spinlocks are held. The refcount of the page is
elevated so that dropping the refcount in the subsystem will not directly
free the page. The callbacks occur after the Linux rmaps have been walked.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/rmap.c |    6 ++++++
 1 file changed, 6 insertions(+)

Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c	2008-01-25 14:24:19.0 -0800
+++ linux-2.6/mm/rmap.c	2008-01-25 14:24:38.0 -0800
@@ -49,6 +49,7 @@
 #include <linux/rcupdate.h>
 #include <linux/module.h>
 #include <linux/kallsyms.h>
+#include <linux/mmu_notifier.h>
 
 #include <asm/tlbflush.h>
 
@@ -473,6 +474,8 @@ int page_mkclean(struct page *page)
 		struct address_space *mapping = page_mapping(page);
 		if (mapping) {
 			ret = page_mkclean_file(mapping, page);
+			if (unlikely(PageExternalRmap(page)))
+				mmu_rmap_notifier(invalidate_page, page);
 		if (page_test_dirty(page)) {
 			page_clear_dirty(page);
 			ret = 1;
@@ -971,6 +974,9 @@ int try_to_unmap(struct page *page, int
 	else
 		ret = try_to_unmap_file(page, migration);
 
+	if (unlikely(PageExternalRmap(page)))
+		mmu_rmap_notifier(invalidate_page, page);
+
 	if (!page_mapped(page))
 		ret = SWAP_SUCCESS;
 	return ret;
Re: [patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap
I don't understand how this is intended to work. I think the page flag
needs to be maintained by the mmu_notifier subsystem.

Let's assume we have a mapping that has a grant from xpmem and an
additional grant from kvm. The exporters are not important, the fact that
there may be two is.

Assume that the user revokes the grant from xpmem (we call that
xpmem_remove). As far as xpmem is concerned, there are no longer any
exports of that page so the page should no longer have its exported flag
set. Note: this is not a process exit, but a function of xpmem. In that
case, at remove time, we have no idea whether the flag should be cleared.

For the invalidate_page side, I think we should have:

> @@ -473,6 +474,10 @@ int page_mkclean(struct page *page)
>  		struct address_space *mapping = page_mapping(page);
>  		if (mapping) {
>  			ret = page_mkclean_file(mapping, page);
> +			if (unlikely(PageExternalRmap(page))) {
> +				mmu_rmap_notifier(invalidate_page, page);
> +				ClearPageExternalRmap(page);
> +			}
>  		if (page_test_dirty(page)) {
>  			page_clear_dirty(page);
>  			ret = 1;

I would assume we would then want a function which sets the page flag.
Additionally, I would think we would want some intervention in the freeing
of the page side to ensure the page flag is cleared as well.

Thanks,
Robin