Re: [PATCH 2/2, v2] mm/migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable

2012-12-04 Thread Michel Lespinasse
On Sun, Dec 2, 2012 at 7:12 AM, Ingo Molnar  wrote:
> Subject: [PATCH] mm/rmap, migration: Make rmap_walk_anon() and
>  try_to_unmap_anon() more scalable
>
> rmap_walk_anon() and try_to_unmap_anon() appear to be too
> careful about locking the anon vma: while it needs protection
> against anon vma list modifications, it does not need exclusive
> access to the list itself.
>
> Transforming this exclusive lock to a read-locked rwsem removes
> a global lock from the hot path of page-migration intense
> threaded workloads which can cause pathological performance like
> this:
>
> 96.43%  process 0  [kernel.kallsyms]  [k] perf_trace_sched_switch
>   |
>   --- perf_trace_sched_switch
>   __schedule
>   schedule
>   schedule_preempt_disabled
>   __mutex_lock_common.isra.6
>   __mutex_lock_slowpath
>   mutex_lock
>  |
>  |--50.61%-- rmap_walk
>  |  move_to_new_page
>  |  migrate_pages
>  |  migrate_misplaced_page
>  |  __do_numa_page.isra.69
>  |  handle_pte_fault
>  |  handle_mm_fault
>  |  __do_page_fault
>  |  do_page_fault
>  |  page_fault
>  |  __memset_sse2
>  |  |
>  |   --100.00%-- worker_thread
>  | |
>  |  --100.00%-- start_thread
>  |
>   --49.39%-- page_lock_anon_vma
> try_to_unmap_anon
> try_to_unmap
> migrate_pages
> migrate_misplaced_page
> __do_numa_page.isra.69
> handle_pte_fault
> handle_mm_fault
> __do_page_fault
> do_page_fault
> page_fault
> __memset_sse2
> |
>  --100.00%-- worker_thread
>start_thread
>
> With this change applied the profile is now nicely flat
> and there's no anon-vma related scheduling/blocking.

Wouldn't the same reasoning apply to i_mmap_mutex? Should we make
that an rwsem as well? I take it that Ingo's test case does not show
this, but i_mmap_mutex's role for file pages is actually quite
similar to that of the anon_vma lock for anon pages...
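
Just to make the comparison concrete, the file-side counterpart would look
roughly like this -- purely a sketch, nothing below exists in the tree, and
i_mmap_rwsem is just a name made up here for a hypothetical replacement of
i_mmap_mutex:

/*
 * Hypothetical sketch only: address_space would carry an rw_semaphore
 * instead of i_mmap_mutex, and read-only walkers such as rmap_walk_file()
 * would take it shared.
 */
struct address_space_sketch {
	struct rw_semaphore	i_mmap_rwsem;	/* stand-in for today's i_mmap_mutex */
	/* ... i_mmap interval tree etc ... */
};

static void file_rmap_walk_sketch(struct address_space_sketch *mapping)
{
	down_read(&mapping->i_mmap_rwsem);	/* was: mutex_lock(&mapping->i_mmap_mutex) */
	/* ... walk the mapping->i_mmap interval tree ... */
	up_read(&mapping->i_mmap_rwsem);
}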

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


Re: [PATCH 2/2, v2] mm/migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable

2012-12-04 Thread Michel Lespinasse
On Sun, Dec 2, 2012 at 7:12 AM, Ingo Molnar  wrote:
> * Rik van Riel  wrote:
>
>> >+static inline void anon_vma_lock_read(struct anon_vma *anon_vma)
>> >+{
>> >+	down_read(&anon_vma->root->rwsem);
>> >+}
>>
>> I see you did not rename anon_vma_lock and anon_vma_unlock to
>> anon_vma_lock_write and anon_vma_unlock_write.
>>
>> That could get confusing to people touching that code in the
>> future.
>
> Agreed, doing that rename makes perfect sense - I've done that
> in the v2 version attached below.
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index f3f41d2..c20635c 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -118,7 +118,7 @@ static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
>  		up_write(&anon_vma->root->rwsem);
>  }

Note that you haven't changed the names for vma_lock_anon_vma() and
vma_unlock_anon_vma().
I don't have any real good names to suggest though.

> -static inline void anon_vma_lock(struct anon_vma *anon_vma)
> +static inline void anon_vma_lock_write(struct anon_vma *anon_vma)
>  {
>  	down_write(&anon_vma->root->rwsem);
>  }
> @@ -128,6 +128,17 @@ static inline void anon_vma_unlock(struct anon_vma *anon_vma)
>  	up_write(&anon_vma->root->rwsem);
>  }

And as Rik noticed, you forgot to rename anon_vma_unlock() too.
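
With both renames in place, the helper set would read roughly like this
(sketch only -- anon_vma_unlock_write() is the name being suggested, and a
matching pair on the read side is assumed):

/* Sketch of the complete lock/unlock helper set after the suggested renames: */
static inline void anon_vma_lock_write(struct anon_vma *anon_vma)
{
	down_write(&anon_vma->root->rwsem);
}

static inline void anon_vma_unlock_write(struct anon_vma *anon_vma)
{
	up_write(&anon_vma->root->rwsem);
}

static inline void anon_vma_lock_read(struct anon_vma *anon_vma)
{
	down_read(&anon_vma->root->rwsem);
}

static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
{
	up_read(&anon_vma->root->rwsem);
}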

But really, this is nitpicking. I like the idea behind the patch, and
after giving it a close look, I couldn't find anything wrong with it.

Reviewed-by: Michel Lespinasse 

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


Re: [PATCH 2/2, v2] mm/migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable

2012-12-02 Thread Rik van Riel

On 12/02/2012 10:12 AM, Ingo Molnar wrote:


> Rename anon_vma_[un]lock() => anon_vma_[un]lock_write(),
> to make it clearer that it's an exclusive write-lock in
> that case - suggested by Rik van Riel.


... close, but you forgot to actually rename the unlock function :)


> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 7f5a552..81a9dee 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -101,7 +101,7 @@ extern void __split_huge_page_pmd(struct mm_struct *mm, pmd_t *pmd);
>  #define wait_split_huge_page(__anon_vma, __pmd)			\
>  	do {								\
>  		pmd_t *pmd = (__pmd);					\
> -		anon_vma_lock(__anon_vma);				\
> +		anon_vma_lock_write(__anon_vma);			\
>  		anon_vma_unlock(__anon_vma);				\
>  		BUG_ON(pmd_trans_splitting(*pmd) ||			\
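
For reference, with the unlock side renamed as well the macro would end up
reading something like this (just a sketch -- anon_vma_unlock_write() is the
name being suggested, not something this version of the patch defines):

/* wait_split_huge_page() with both helpers renamed (sketch only): */
#define wait_split_huge_page(__anon_vma, __pmd)				\
	do {								\
		pmd_t *pmd = (__pmd);					\
		anon_vma_lock_write(__anon_vma);			\
		anon_vma_unlock_write(__anon_vma);			\
		BUG_ON(pmd_trans_splitting(*pmd) ||			\
		       pmd_trans_huge(*pmd));				\
	} while (0)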


--
All rights reversed


[PATCH 2/2, v2] mm/migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable

2012-12-02 Thread Ingo Molnar

* Rik van Riel  wrote:

> >+static inline void anon_vma_lock_read(struct anon_vma *anon_vma)
> >+{
> >+	down_read(&anon_vma->root->rwsem);
> >+}
> 
> I see you did not rename anon_vma_lock and anon_vma_unlock to 
> anon_vma_lock_write and anon_vma_unlock_write.
> 
> That could get confusing to people touching that code in the 
> future.

Agreed, doing that rename makes perfect sense - I've done that 
in the v2 version attached below.

Thanks,

Ingo

--->
From 21469dcb225b9cf3160f839b7a823448f5ce5afa Mon Sep 17 00:00:00 2001
From: Ingo Molnar 
Date: Sat, 1 Dec 2012 21:15:38 +0100
Subject: [PATCH] mm/rmap, migration: Make rmap_walk_anon() and
 try_to_unmap_anon() more scalable

rmap_walk_anon() and try_to_unmap_anon() appear to be too
careful about locking the anon vma: while it needs protection
against anon vma list modifications, it does not need exclusive
access to the list itself.

Transforming this exclusive lock to a read-locked rwsem removes
a global lock from the hot path of page-migration intense
threaded workloads which can cause pathological performance like
this:

96.43%  process 0  [kernel.kallsyms]  [k] perf_trace_sched_switch
  |
  --- perf_trace_sched_switch
  __schedule
  schedule
  schedule_preempt_disabled
  __mutex_lock_common.isra.6
  __mutex_lock_slowpath
  mutex_lock
 |
 |--50.61%-- rmap_walk
 |  move_to_new_page
 |  migrate_pages
 |  migrate_misplaced_page
 |  __do_numa_page.isra.69
 |  handle_pte_fault
 |  handle_mm_fault
 |  __do_page_fault
 |  do_page_fault
 |  page_fault
 |  __memset_sse2
 |  |
 |   --100.00%-- worker_thread
 | |
 |  --100.00%-- start_thread
 |
  --49.39%-- page_lock_anon_vma
try_to_unmap_anon
try_to_unmap
migrate_pages
migrate_misplaced_page
__do_numa_page.isra.69
handle_pte_fault
handle_mm_fault
__do_page_fault
do_page_fault
page_fault
__memset_sse2
|
 --100.00%-- worker_thread
   start_thread

With this change applied the profile is now nicely flat
and there's no anon-vma related scheduling/blocking.

Rename anon_vma_[un]lock() => anon_vma_[un]lock_write(),
to make it clearer that it's an exclusive write-lock in
that case - suggested by Rik van Riel.

Suggested-by: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Paul Turner 
Cc: Lee Schermerhorn 
Cc: Christoph Lameter 
Cc: Rik van Riel 
Cc: Mel Gorman 
Cc: Andrea Arcangeli 
Cc: Johannes Weiner 
Cc: Hugh Dickins 
Signed-off-by: Ingo Molnar 
---
 include/linux/huge_mm.h |  2 +-
 include/linux/rmap.h    | 17 ++---
 mm/huge_memory.c        |  6 +++---
 mm/ksm.c                |  6 +++---
 mm/memory-failure.c     |  4 ++--
 mm/migrate.c            |  2 +-
 mm/mmap.c               |  2 +-
 mm/mremap.c             |  2 +-
 mm/rmap.c               | 48
 9 files changed, 50 insertions(+), 39 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7f5a552..81a9dee 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -101,7 +101,7 @@ extern void __split_huge_page_pmd(struct mm_struct *mm, pmd_t *pmd);
 #define wait_split_huge_page(__anon_vma, __pmd)			\
 	do {								\
 		pmd_t *pmd = (__pmd);					\
-		anon_vma_lock(__anon_vma);				\
+		anon_vma_lock_write(__anon_vma);			\
 		anon_vma_unlock(__anon_vma);				\
 		BUG_ON(pmd_trans_splitting(*pmd) ||			\
 		       pmd_trans_huge(*pmd));				\
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index f3f41d2..c20635c 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -118,7 +118,7 @@ static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
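
The rest of the diff is cut off above. As a rough sketch (not the verbatim
include/linux/rmap.h and mm/rmap.c hunks), the change the walkers see boils
down to taking the root anon_vma rwsem shared instead of exclusively:

/*
 * Sketch of the locking pattern the walkers move to -- reconstructed from
 * the changelog, not copied from the missing hunks.  anon_vma_lock_read()
 * is quoted near the top of this mail; anon_vma_unlock_read() is assumed
 * as its read-side counterpart.
 */
static void rmap_walk_anon_locking_sketch(struct anon_vma *anon_vma)
{
	anon_vma_lock_read(anon_vma);	/* was: exclusive anon_vma_lock() */
	/* ... walk the anon_vma interval tree, calling rmap_one() on each vma ... */
	anon_vma_unlock_read(anon_vma);
}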
