Re: [PATCH 2/2, v2] mm/migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable
On Sun, Dec 2, 2012 at 7:12 AM, Ingo Molnar wrote:
> Subject: [PATCH] mm/rmap, migration: Make rmap_walk_anon() and
>  try_to_unmap_anon() more scalable
>
> rmap_walk_anon() and try_to_unmap_anon() appears to be too
> careful about locking the anon vma: while it needs protection
> against anon vma list modifications, it does not need exclusive
> access to the list itself.
>
> Transforming this exclusive lock to a read-locked rwsem removes
> a global lock from the hot path of page-migration intense
> threaded workloads which can cause pathological performance like
> this:
>
>     96.43%  process 0  [kernel.kallsyms]  [k] perf_trace_sched_switch
>             |
>             --- perf_trace_sched_switch
>                 __schedule
>                 schedule
>                 schedule_preempt_disabled
>                 __mutex_lock_common.isra.6
>                 __mutex_lock_slowpath
>                 mutex_lock
>                 |
>                 |--50.61%-- rmap_walk
>                 |           move_to_new_page
>                 |           migrate_pages
>                 |           migrate_misplaced_page
>                 |           __do_numa_page.isra.69
>                 |           handle_pte_fault
>                 |           handle_mm_fault
>                 |           __do_page_fault
>                 |           do_page_fault
>                 |           page_fault
>                 |           __memset_sse2
>                 |           |
>                 |           --100.00%-- worker_thread
>                 |                       |
>                 |                       --100.00%-- start_thread
>                 |
>                 --49.39%-- page_lock_anon_vma
>                            try_to_unmap_anon
>                            try_to_unmap
>                            migrate_pages
>                            migrate_misplaced_page
>                            __do_numa_page.isra.69
>                            handle_pte_fault
>                            handle_mm_fault
>                            __do_page_fault
>                            do_page_fault
>                            page_fault
>                            __memset_sse2
>                            |
>                            --100.00%-- worker_thread
>                                        start_thread
>
> With this change applied the profile is now nicely flat
> and there's no anon-vma related scheduling/blocking.

Wouldn't the same reasoning apply to i_mmap_mutex ? Should we make
that a rwsem as well ?

I take it that Ingo's test case does not show this, but i_mmap_mutex's
role with file pages is actually quite similar to the anon_vma lock
with anon pages...

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2, v2] mm/migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable
On Sun, Dec 2, 2012 at 7:12 AM, Ingo Molnar wrote:
> * Rik van Riel wrote:
>
>> >+static inline void anon_vma_lock_read(struct anon_vma *anon_vma)
>> >+{
>> >+	down_read(&anon_vma->root->rwsem);
>> >+}
>>
>> I see you did not rename anon_vma_lock and anon_vma_unlock to
>> anon_vma_lock_write and anon_vma_unlock_write.
>>
>> That could get confusing to people touching that code in the
>> future.
>
> Agreed, doing that rename makes perfect sense - I've done that
> in the v2 version attached below.
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index f3f41d2..c20635c 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -118,7 +118,7 @@ static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
>  	up_write(&anon_vma->root->rwsem);
>  }

Note that you haven't changed the names for vma_lock_anon_vma() and
vma_unlock_anon_vma(). I don't have any real good names to suggest
though.

> -static inline void anon_vma_lock(struct anon_vma *anon_vma)
> +static inline void anon_vma_lock_write(struct anon_vma *anon_vma)
>  {
>  	down_write(&anon_vma->root->rwsem);
>  }
>
> @@ -128,6 +128,17 @@ static inline void anon_vma_unlock(struct anon_vma *anon_vma)
>  {
>  	up_write(&anon_vma->root->rwsem);
>  }

And as Rik noticed, you forgot to rename anon_vma_unlock() too.

But really, this is nitpicking. I like the idea behind the patch, and
after giving it a close look, I couldn't find anything wrong with it.

Reviewed-by: Michel Lespinasse

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
Re: [PATCH 2/2, v2] mm/migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable
On 12/02/2012 10:12 AM, Ingo Molnar wrote:
> Rename anon_vma_[un]lock() => anon_vma_[un]lock_write(), to make
> it clearer that it's an exclusive write-lock in that case -
> suggested by Rik van Riel.

... close, but you forgot to actually rename the unlock function :)

> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 7f5a552..81a9dee 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -101,7 +101,7 @@ extern void __split_huge_page_pmd(struct mm_struct *mm, pmd_t *pmd);
>  #define wait_split_huge_page(__anon_vma, __pmd)			\
>  	do {								\
>  		pmd_t *pmd = (__pmd);					\
> -		anon_vma_lock(__anon_vma);				\
> +		anon_vma_lock_write(__anon_vma);			\
>  		anon_vma_unlock(__anon_vma);				\
>  		BUG_ON(pmd_trans_splitting(*pmd) ||			\

-- 
All rights reversed
[PATCH 2/2, v2] mm/migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable
* Rik van Riel wrote:

> >+static inline void anon_vma_lock_read(struct anon_vma *anon_vma)
> >+{
> >+	down_read(&anon_vma->root->rwsem);
> >+}
>
> I see you did not rename anon_vma_lock and anon_vma_unlock to
> anon_vma_lock_write and anon_vma_unlock_write.
>
> That could get confusing to people touching that code in the
> future.

Agreed, doing that rename makes perfect sense - I've done that
in the v2 version attached below.

Thanks,

	Ingo

--->
From 21469dcb225b9cf3160f839b7a823448f5ce5afa Mon Sep 17 00:00:00 2001
From: Ingo Molnar
Date: Sat, 1 Dec 2012 21:15:38 +0100
Subject: [PATCH] mm/rmap, migration: Make rmap_walk_anon() and
 try_to_unmap_anon() more scalable

rmap_walk_anon() and try_to_unmap_anon() appears to be too
careful about locking the anon vma: while it needs protection
against anon vma list modifications, it does not need exclusive
access to the list itself.

Transforming this exclusive lock to a read-locked rwsem removes
a global lock from the hot path of page-migration intense
threaded workloads which can cause pathological performance like
this:

    96.43%  process 0  [kernel.kallsyms]  [k] perf_trace_sched_switch
            |
            --- perf_trace_sched_switch
                __schedule
                schedule
                schedule_preempt_disabled
                __mutex_lock_common.isra.6
                __mutex_lock_slowpath
                mutex_lock
                |
                |--50.61%-- rmap_walk
                |           move_to_new_page
                |           migrate_pages
                |           migrate_misplaced_page
                |           __do_numa_page.isra.69
                |           handle_pte_fault
                |           handle_mm_fault
                |           __do_page_fault
                |           do_page_fault
                |           page_fault
                |           __memset_sse2
                |           |
                |           --100.00%-- worker_thread
                |                       |
                |                       --100.00%-- start_thread
                |
                --49.39%-- page_lock_anon_vma
                           try_to_unmap_anon
                           try_to_unmap
                           migrate_pages
                           migrate_misplaced_page
                           __do_numa_page.isra.69
                           handle_pte_fault
                           handle_mm_fault
                           __do_page_fault
                           do_page_fault
                           page_fault
                           __memset_sse2
                           |
                           --100.00%-- worker_thread
                                       start_thread

With this change applied the profile is now nicely flat
and there's no anon-vma related scheduling/blocking.

Rename anon_vma_[un]lock() => anon_vma_[un]lock_write(), to make
it clearer that it's an exclusive write-lock in that case -
suggested by Rik van Riel.

Suggested-by: Linus Torvalds
Cc: Peter Zijlstra
Cc: Paul Turner
Cc: Lee Schermerhorn
Cc: Christoph Lameter
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Andrea Arcangeli
Cc: Johannes Weiner
Cc: Hugh Dickins
Signed-off-by: Ingo Molnar
---
 include/linux/huge_mm.h |  2 +-
 include/linux/rmap.h    | 17 ++++++++++++---
 mm/huge_memory.c        |  6 +++---
 mm/ksm.c                |  6 +++---
 mm/memory-failure.c     |  4 ++--
 mm/migrate.c            |  2 +-
 mm/mmap.c               |  2 +-
 mm/mremap.c             |  2 +-
 mm/rmap.c               | 48 ++++++++++++++++++++++-------------------
 9 files changed, 50 insertions(+), 39 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7f5a552..81a9dee 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -101,7 +101,7 @@ extern void __split_huge_page_pmd(struct mm_struct *mm, pmd_t *pmd);
 #define wait_split_huge_page(__anon_vma, __pmd)			\
 	do {								\
 		pmd_t *pmd = (__pmd);					\
-		anon_vma_lock(__anon_vma);				\
+		anon_vma_lock_write(__anon_vma);			\
 		anon_vma_unlock(__anon_vma);				\
 		BUG_ON(pmd_trans_splitting(*pmd) ||			\
 		       pmd_trans_huge(*pmd));				\
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index f3f41d2..c20635c 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -118,7 +118,7 @@ static inline void vma_unlock_anon_vma(struct