Re: [PATCH 4/4] mm: make every pte dirty on do_swap_page

2015-04-15 Thread Minchan Kim
On Sun, Apr 12, 2015 at 11:48:23PM +0900, Minchan Kim wrote:
> Hello Hugh,
> 
> On Sat, Apr 11, 2015 at 02:40:46PM -0700, Hugh Dickins wrote:
> > On Wed, 11 Mar 2015, Minchan Kim wrote:
> > 
> > > Basically, MADV_FREE relies on the pte dirty bit to decide whether
> > > it allows the VM to discard the page. However, after a swap-in, the
> > > pte pointing to the page is not pte_dirty. So, MADV_FREE checks
> > > PageDirty and PageSwapCache for those pages so as not to discard
> > > them, because a swapped-in page could live in the swap cache, or be
> > > PageDirty once it is removed from the swap cache.
> > > 
> > > The problem here is that anonymous pages can have PageDirty once
> > > they are removed from the swap cache, so the VM cannot treat those
> > > pages as freeable even after madvise_free. Look at the example below.
> > > 
> > > ptr = malloc();
> > > memset(ptr);
> > > ..
> > > heavy memory pressure -> swap-out all of pages
> > > ..
> > > out of memory pressure so there are lots of free pages
> > > ..
> > > var = *ptr; -> swap-in page/remove the page from swapcache. so pte_clean
> > >but SetPageDirty
> > > 
> > > madvise_free(ptr);
> > > ..
> > > ..
> > > heavy memory pressure -> VM cannot discard the page by PageDirty.
> > > 
> > > PageDirty for an anonymous page aims to avoid a duplicate
> > > swap-out. In other words, if a page has been swapped in but still
> > > lives in the swap cache (i.e., !PageDirty), we can save a swap-out
> > > if the page is later selected as a victim by the VM, because the
> > > swap device has kept the page's previously swapped-out contents.
> > > 
> > > So, rather than relying on PG_dirty to make madvise_free work,
> > > pte_dirty is more straightforward. Inherently, a swapped-out page
> > > was pte_dirty, so this patch restores the dirtiness when the swap-in
> > > fault happens, and madvise_free no longer relies on PageDirty.
> > > 
> > > Cc: Hugh Dickins 
> > > Cc: Cyrill Gorcunov 
> > > Cc: Pavel Emelyanov 
> > > Reported-by: Yalin Wang 
> > > Signed-off-by: Minchan Kim 
> > 
> > Sorry, but NAK to this patch,
> > mm-make-every-pte-dirty-on-do_swap_page.patch in akpm's mm tree
> > (I hope it hasn't reached linux-next yet).
> > 
> > You may well be right that pte_dirty<->PageDirty can be handled
> > differently, in a way more favourable to MADV_FREE.  And this patch
> > may be a step in the right direction, but I've barely given it thought.
> > 
> > As it stands, it segfaults more than any patch I've seen in years:
> > I just tried applying it to 4.0-rc7-mm1, and running kernel builds
> > in low memory with swap.  Even if I leave KSM out, and memcg out, and
> > swapoff out, and THP out, and tmpfs out, it still SIGSEGVs very soon.
> > 
> > I have a choice: spend a few hours tracking down the errors, and
> > post a fix patch on top of yours?  But even then I'd want to spend
> > a lot longer thinking through every dirty/Dirty in the source before
> > I'd feel comfortable to give an ack.
> > 
> > This is users' data, and we need to be very careful with it: errors
> > in MADV_FREE are one thing, for now that's easy to avoid; but in this
> > patch you're changing the rules for Anon PageDirty for everyone.
> > 
> > I think for now I'll have to leave it to you to do much more source
> > diligence and testing, before coming back with a corrected patch for
> > us then to review, slowly and carefully.
> 
> Sorry for my mistake. I will keep your advice in mind.
> I will investigate the problem as soon as I get back to work
> after vacation.
> 
> Thanks for the review.

When I look at the code, migration doesn't restore the pte's dirty
bit in remove_migration_pte and relies on PG_dirty, which was set by
try_to_unmap_one. I think that was one reason you saw the segfaults.
I will spend more time investigating other code paths that might
skip restoring the dirty bit.
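The direction described above can be sketched as follows. This is a hypothetical fragment modeled on the 4.0-era remove_migration_pte(), not the actual fix; the PageDirty check is an assumption about where the dirtiness ends up after try_to_unmap_one():

```
/* Hypothetical sketch inside remove_migration_pte(): rebuild the pte
 * and restore the dirtiness that try_to_unmap_one() transferred to
 * PG_dirty, so a pte_dirty-based MADV_FREE test keeps working. */
pte = pte_mkold(mk_pte(new, vma->vm_page_prot));
if (PageDirty(new))            /* assumption: dirtiness now lives on the page */
        pte = pte_mkdirty(pte);
if (is_write_migration_entry(entry))
        pte = maybe_mkwrite(pte, vma);
```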

Thanks.



> 
> -- 
> Kind regards,
> Minchan Kim

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] mm: make every pte dirty on do_swap_page

2015-04-12 Thread Minchan Kim
Hello Hugh,

On Sat, Apr 11, 2015 at 02:40:46PM -0700, Hugh Dickins wrote:
> On Wed, 11 Mar 2015, Minchan Kim wrote:
> 
> > Basically, MADV_FREE relies on the pte dirty bit to decide whether
> > it allows the VM to discard the page. However, after a swap-in, the
> > pte pointing to the page is not pte_dirty. So, MADV_FREE checks
> > PageDirty and PageSwapCache for those pages so as not to discard
> > them, because a swapped-in page could live in the swap cache, or be
> > PageDirty once it is removed from the swap cache.
> > 
> > The problem here is that anonymous pages can have PageDirty once
> > they are removed from the swap cache, so the VM cannot treat those
> > pages as freeable even after madvise_free. Look at the example below.
> > 
> > ptr = malloc();
> > memset(ptr);
> > ..
> > heavy memory pressure -> swap-out all of pages
> > ..
> > out of memory pressure so there are lots of free pages
> > ..
> > var = *ptr; -> swap-in page/remove the page from swapcache. so pte_clean
> >but SetPageDirty
> > 
> > madvise_free(ptr);
> > ..
> > ..
> > heavy memory pressure -> VM cannot discard the page by PageDirty.
> > 
> > PageDirty for an anonymous page aims to avoid a duplicate
> > swap-out. In other words, if a page has been swapped in but still
> > lives in the swap cache (i.e., !PageDirty), we can save a swap-out
> > if the page is later selected as a victim by the VM, because the
> > swap device has kept the page's previously swapped-out contents.
> > 
> > So, rather than relying on PG_dirty to make madvise_free work,
> > pte_dirty is more straightforward. Inherently, a swapped-out page
> > was pte_dirty, so this patch restores the dirtiness when the swap-in
> > fault happens, and madvise_free no longer relies on PageDirty.
> > 
> > Cc: Hugh Dickins 
> > Cc: Cyrill Gorcunov 
> > Cc: Pavel Emelyanov 
> > Reported-by: Yalin Wang 
> > Signed-off-by: Minchan Kim 
> 
> Sorry, but NAK to this patch,
> mm-make-every-pte-dirty-on-do_swap_page.patch in akpm's mm tree
> (I hope it hasn't reached linux-next yet).
> 
> You may well be right that pte_dirty<->PageDirty can be handled
> differently, in a way more favourable to MADV_FREE.  And this patch
> may be a step in the right direction, but I've barely given it thought.
> 
> As it stands, it segfaults more than any patch I've seen in years:
> I just tried applying it to 4.0-rc7-mm1, and running kernel builds
> in low memory with swap.  Even if I leave KSM out, and memcg out, and
> swapoff out, and THP out, and tmpfs out, it still SIGSEGVs very soon.
> 
> I have a choice: spend a few hours tracking down the errors, and
> post a fix patch on top of yours?  But even then I'd want to spend
> a lot longer thinking through every dirty/Dirty in the source before
> I'd feel comfortable to give an ack.
> 
> This is users' data, and we need to be very careful with it: errors
> in MADV_FREE are one thing, for now that's easy to avoid; but in this
> patch you're changing the rules for Anon PageDirty for everyone.
> 
> I think for now I'll have to leave it to you to do much more source
> diligence and testing, before coming back with a corrected patch for
> us then to review, slowly and carefully.

Sorry for my mistake. I will keep your advice in mind.
I will investigate the problem as soon as I get back to work
after vacation.

Thanks for the review.

-- 
Kind regards,
Minchan Kim


Re: [PATCH 4/4] mm: make every pte dirty on do_swap_page

2015-04-11 Thread Hugh Dickins
On Wed, 11 Mar 2015, Minchan Kim wrote:

> Basically, MADV_FREE relies on the pte dirty bit to decide whether
> it allows the VM to discard the page. However, after a swap-in, the
> pte pointing to the page is not pte_dirty. So, MADV_FREE checks
> PageDirty and PageSwapCache for those pages so as not to discard
> them, because a swapped-in page could live in the swap cache, or be
> PageDirty once it is removed from the swap cache.
> 
> The problem here is that anonymous pages can have PageDirty once
> they are removed from the swap cache, so the VM cannot treat those
> pages as freeable even after madvise_free. Look at the example below.
> 
> ptr = malloc();
> memset(ptr);
> ..
> heavy memory pressure -> swap-out all of pages
> ..
> out of memory pressure so there are lots of free pages
> ..
> var = *ptr; -> swap-in page/remove the page from swapcache. so pte_clean
>but SetPageDirty
> 
> madvise_free(ptr);
> ..
> ..
> heavy memory pressure -> VM cannot discard the page by PageDirty.
> 
> PageDirty for an anonymous page aims to avoid a duplicate
> swap-out. In other words, if a page has been swapped in but still
> lives in the swap cache (i.e., !PageDirty), we can save a swap-out
> if the page is later selected as a victim by the VM, because the
> swap device has kept the page's previously swapped-out contents.
> 
> So, rather than relying on PG_dirty to make madvise_free work,
> pte_dirty is more straightforward. Inherently, a swapped-out page
> was pte_dirty, so this patch restores the dirtiness when the swap-in
> fault happens, and madvise_free no longer relies on PageDirty.
> 
> Cc: Hugh Dickins 
> Cc: Cyrill Gorcunov 
> Cc: Pavel Emelyanov 
> Reported-by: Yalin Wang 
> Signed-off-by: Minchan Kim 

Sorry, but NAK to this patch,
mm-make-every-pte-dirty-on-do_swap_page.patch in akpm's mm tree
(I hope it hasn't reached linux-next yet).

You may well be right that pte_dirty<->PageDirty can be handled
differently, in a way more favourable to MADV_FREE.  And this patch
may be a step in the right direction, but I've barely given it thought.

As it stands, it segfaults more than any patch I've seen in years:
I just tried applying it to 4.0-rc7-mm1, and running kernel builds
in low memory with swap.  Even if I leave KSM out, and memcg out, and
swapoff out, and THP out, and tmpfs out, it still SIGSEGVs very soon.

I have a choice: spend a few hours tracking down the errors, and
post a fix patch on top of yours?  But even then I'd want to spend
a lot longer thinking through every dirty/Dirty in the source before
I'd feel comfortable to give an ack.

This is users' data, and we need to be very careful with it: errors
in MADV_FREE are one thing, for now that's easy to avoid; but in this
patch you're changing the rules for Anon PageDirty for everyone.

I think for now I'll have to leave it to you to do much more source
diligence and testing, before coming back with a corrected patch for
us then to review, slowly and carefully.

Hugh

> ---
>  mm/madvise.c | 1 -
>  mm/memory.c  | 9 +++--
>  mm/rmap.c| 2 +-
>  mm/vmscan.c  | 3 +--
>  4 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 22e8f0c..a045798 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -325,7 +325,6 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned 
> long addr,
>   continue;
>   }
>  
> - ClearPageDirty(page);
>   unlock_page(page);
>   }
>  
> diff --git a/mm/memory.c b/mm/memory.c
> index 0f96a4a..40428a5 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2521,9 +2521,14 @@ static int do_swap_page(struct mm_struct *mm, struct 
> vm_area_struct *vma,
>  
>   inc_mm_counter_fast(mm, MM_ANONPAGES);
>   dec_mm_counter_fast(mm, MM_SWAPENTS);
> - pte = mk_pte(page, vma->vm_page_prot);
> +
> + /*
> +  * Every page swapped-out was pte_dirty so we make pte dirty again.
> +  * MADV_FREE relies on it.
> +  */
> + pte = pte_mkdirty(mk_pte(page, vma->vm_page_prot));
>   if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
> - pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> + pte = maybe_mkwrite(pte, vma);
>   flags &= ~FAULT_FLAG_WRITE;
>   ret |= VM_FAULT_WRITE;
>   exclusive = 1;
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 47b3ba8..34c1d66 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1268,7 +1268,7 @@ static int try_to_unmap_one(struct page *page, struct 
> vm_area_struct *vma,
>  
>   if (flags & TTU_FREE) {
>   VM_BUG_ON_PAGE(PageSwapCache(page), page);
> - if (!dirty && !PageDirty(page)) {
> + if (!dirty) {
>   /* It's a freeable page by MADV_FREE */
>   dec_mm_counter(mm, MM_ANONPAGES);
>   goto discard;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 260c413..3357ffa 100644

Re: [PATCH 4/4] mm: make every pte dirty on do_swap_page

2015-04-09 Thread Andrew Morton
On Thu, 9 Apr 2015 08:50:25 +0900 Minchan Kim  wrote:

> Bump.

I'm getting the feeling that MADV_FREE is out of control.

Below is the overall rollup of

mm-support-madvisemadv_free.patch
mm-support-madvisemadv_free-fix.patch
mm-support-madvisemadv_free-fix-2.patch
mm-dont-split-thp-page-when-syscall-is-called.patch
mm-dont-split-thp-page-when-syscall-is-called-fix.patch
mm-dont-split-thp-page-when-syscall-is-called-fix-2.patch
mm-free-swp_entry-in-madvise_free.patch
mm-move-lazy-free-pages-to-inactive-list.patch
mm-move-lazy-free-pages-to-inactive-list-fix.patch
mm-move-lazy-free-pages-to-inactive-list-fix-fix.patch
mm-move-lazy-free-pages-to-inactive-list-fix-fix-fix.patch
mm-make-every-pte-dirty-on-do_swap_page.patch


It's pretty large and has its sticky little paws in all sorts of places.


The feature would need to be pretty darn useful to justify a mainline
merge.  Has any such usefulness been demonstrated?




 arch/alpha/include/uapi/asm/mman.h |1 
 arch/mips/include/uapi/asm/mman.h  |1 
 arch/parisc/include/uapi/asm/mman.h|1 
 arch/xtensa/include/uapi/asm/mman.h|1 
 include/linux/huge_mm.h|4 
 include/linux/rmap.h   |9 -
 include/linux/swap.h   |1 
 include/linux/vm_event_item.h  |1 
 include/uapi/asm-generic/mman-common.h |1 
 mm/huge_memory.c   |   35 
 mm/madvise.c   |  175 +++
 mm/memory.c|   10 +
 mm/rmap.c  |   46 +-
 mm/swap.c  |   44 +
 mm/vmscan.c|   63 ++--
 mm/vmstat.c|1 
 16 files changed, 372 insertions(+), 22 deletions(-)

diff -puN include/linux/rmap.h~mm-support-madvisemadv_free include/linux/rmap.h
--- a/include/linux/rmap.h~mm-support-madvisemadv_free
+++ a/include/linux/rmap.h
@@ -85,6 +85,7 @@ enum ttu_flags {
TTU_UNMAP = 1,  /* unmap mode */
TTU_MIGRATION = 2,  /* migration mode */
TTU_MUNLOCK = 4,/* munlock mode */
+   TTU_FREE = 8,   /* free mode */
 
TTU_IGNORE_MLOCK = (1 << 8),/* ignore mlock */
TTU_IGNORE_ACCESS = (1 << 9),   /* don't age */
@@ -183,7 +184,8 @@ static inline void page_dup_rmap(struct
  * Called from mm/vmscan.c to handle paging out
  */
 int page_referenced(struct page *, int is_locked,
-   struct mem_cgroup *memcg, unsigned long *vm_flags);
+   struct mem_cgroup *memcg, unsigned long *vm_flags,
+   int *is_pte_dirty);
 
 #define TTU_ACTION(x) ((x) & TTU_ACTION_MASK)
 
@@ -260,9 +262,12 @@ int rmap_walk(struct page *page, struct
 
 static inline int page_referenced(struct page *page, int is_locked,
  struct mem_cgroup *memcg,
- unsigned long *vm_flags)
+ unsigned long *vm_flags,
+ int *is_pte_dirty)
 {
*vm_flags = 0;
+   if (is_pte_dirty)
+   *is_pte_dirty = 0;
return 0;
 }
 
diff -puN include/linux/vm_event_item.h~mm-support-madvisemadv_free 
include/linux/vm_event_item.h
--- a/include/linux/vm_event_item.h~mm-support-madvisemadv_free
+++ a/include/linux/vm_event_item.h
@@ -25,6 +25,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
FOR_ALL_ZONES(PGALLOC),
PGFREE, PGACTIVATE, PGDEACTIVATE,
PGFAULT, PGMAJFAULT,
+   PGLAZYFREED,
FOR_ALL_ZONES(PGREFILL),
FOR_ALL_ZONES(PGSTEAL_KSWAPD),
FOR_ALL_ZONES(PGSTEAL_DIRECT),
diff -puN include/uapi/asm-generic/mman-common.h~mm-support-madvisemadv_free 
include/uapi/asm-generic/mman-common.h
--- a/include/uapi/asm-generic/mman-common.h~mm-support-madvisemadv_free
+++ a/include/uapi/asm-generic/mman-common.h
@@ -34,6 +34,7 @@
 #define MADV_SEQUENTIAL2   /* expect sequential page 
references */
 #define MADV_WILLNEED  3   /* will need these pages */
 #define MADV_DONTNEED  4   /* don't need these pages */
+#define MADV_FREE  5   /* free pages only if memory pressure */
 
 /* common parameters: try to keep these consistent across architectures */
 #define MADV_REMOVE9   /* remove these pages & resources */
diff -puN mm/madvise.c~mm-support-madvisemadv_free mm/madvise.c
--- a/mm/madvise.c~mm-support-madvisemadv_free
+++ a/mm/madvise.c
@@ -19,6 +19,14 @@
 #include 
 #include 
 #include 
+#include 
+
+#include 
+
+struct madvise_free_private {
+   struct vm_area_struct *vma;
+   struct mmu_gather *tlb;
+};
 
 /*
  * Any behaviour which results in changes to the vma->vm_flags needs to
@@ -31,6 +39,7 @@ static int madvise_need_mmap_write(int b
case MADV_REMOVE:
case 

Re: [PATCH 4/4] mm: make every pte dirty on do_swap_page

2015-04-09 Thread Andrew Morton
On Thu, 9 Apr 2015 08:50:25 +0900 Minchan Kim minc...@kernel.org wrote:

 Bump.

I'm getting the feeling that MADV_FREE is out of control.

Below is the overall rollup of

mm-support-madvisemadv_free.patch
mm-support-madvisemadv_free-fix.patch
mm-support-madvisemadv_free-fix-2.patch
mm-dont-split-thp-page-when-syscall-is-called.patch
mm-dont-split-thp-page-when-syscall-is-called-fix.patch
mm-dont-split-thp-page-when-syscall-is-called-fix-2.patch
mm-free-swp_entry-in-madvise_free.patch
mm-move-lazy-free-pages-to-inactive-list.patch
mm-move-lazy-free-pages-to-inactive-list-fix.patch
mm-move-lazy-free-pages-to-inactive-list-fix-fix.patch
mm-move-lazy-free-pages-to-inactive-list-fix-fix-fix.patch
mm-make-every-pte-dirty-on-do_swap_page.patch


It's pretty large and has its sticky little paws in all sorts of places.


The feature would need to be pretty darn useful to justify a mainline
merge.  Has any such usefulness been demonstrated?




 arch/alpha/include/uapi/asm/mman.h |1 
 arch/mips/include/uapi/asm/mman.h  |1 
 arch/parisc/include/uapi/asm/mman.h|1 
 arch/xtensa/include/uapi/asm/mman.h|1 
 include/linux/huge_mm.h|4 
 include/linux/rmap.h   |9 -
 include/linux/swap.h   |1 
 include/linux/vm_event_item.h  |1 
 include/uapi/asm-generic/mman-common.h |1 
 mm/huge_memory.c   |   35 
 mm/madvise.c   |  175 +++
 mm/memory.c|   10 +
 mm/rmap.c  |   46 +-
 mm/swap.c  |   44 +
 mm/vmscan.c|   63 ++--
 mm/vmstat.c|1 
 16 files changed, 372 insertions(+), 22 deletions(-)

diff -puN include/linux/rmap.h~mm-support-madvisemadv_free include/linux/rmap.h
--- a/include/linux/rmap.h~mm-support-madvisemadv_free
+++ a/include/linux/rmap.h
@@ -85,6 +85,7 @@ enum ttu_flags {
TTU_UNMAP = 1,  /* unmap mode */
TTU_MIGRATION = 2,  /* migration mode */
TTU_MUNLOCK = 4,/* munlock mode */
+   TTU_FREE = 8,   /* free mode */
 
TTU_IGNORE_MLOCK = (1  8),/* ignore mlock */
TTU_IGNORE_ACCESS = (1  9),   /* don't age */
@@ -183,7 +184,8 @@ static inline void page_dup_rmap(struct
  * Called from mm/vmscan.c to handle paging out
  */
 int page_referenced(struct page *, int is_locked,
-   struct mem_cgroup *memcg, unsigned long *vm_flags);
+   struct mem_cgroup *memcg, unsigned long *vm_flags,
+   int *is_pte_dirty);
 
 #define TTU_ACTION(x) ((x)  TTU_ACTION_MASK)
 
@@ -260,9 +262,12 @@ int rmap_walk(struct page *page, struct
 
 static inline int page_referenced(struct page *page, int is_locked,
  struct mem_cgroup *memcg,
- unsigned long *vm_flags)
+ unsigned long *vm_flags,
+ int *is_pte_dirty)
 {
*vm_flags = 0;
+   if (is_pte_dirty)
+   *is_pte_dirty = 0;
return 0;
 }
 
diff -puN include/linux/vm_event_item.h~mm-support-madvisemadv_free 
include/linux/vm_event_item.h
--- a/include/linux/vm_event_item.h~mm-support-madvisemadv_free
+++ a/include/linux/vm_event_item.h
@@ -25,6 +25,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
FOR_ALL_ZONES(PGALLOC),
PGFREE, PGACTIVATE, PGDEACTIVATE,
PGFAULT, PGMAJFAULT,
+   PGLAZYFREED,
FOR_ALL_ZONES(PGREFILL),
FOR_ALL_ZONES(PGSTEAL_KSWAPD),
FOR_ALL_ZONES(PGSTEAL_DIRECT),
diff -puN include/uapi/asm-generic/mman-common.h~mm-support-madvisemadv_free include/uapi/asm-generic/mman-common.h
--- a/include/uapi/asm-generic/mman-common.h~mm-support-madvisemadv_free
+++ a/include/uapi/asm-generic/mman-common.h
@@ -34,6 +34,7 @@
 #define MADV_SEQUENTIAL2   /* expect sequential page references */
 #define MADV_WILLNEED  3   /* will need these pages */
 #define MADV_DONTNEED  4   /* don't need these pages */
+#define MADV_FREE  5   /* free pages only if memory pressure */
 
 /* common parameters: try to keep these consistent across architectures */
 #define MADV_REMOVE9   /* remove these pages & resources */
diff -puN mm/madvise.c~mm-support-madvisemadv_free mm/madvise.c
--- a/mm/madvise.c~mm-support-madvisemadv_free
+++ a/mm/madvise.c
@@ -19,6 +19,14 @@
 #include <linux/blkdev.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
+#include <linux/mmu_notifier.h>
+
+#include <asm/tlb.h>
+
+struct madvise_free_private {
+   struct vm_area_struct *vma;
+   struct mmu_gather *tlb;
+};
 
 /*
  * Any behaviour which results in changes to the vma->vm_flags needs to
@@ -31,6 +39,7 @@ static int 

Re: [PATCH 4/4] mm: make every pte dirty on do_swap_page

2015-04-08 Thread Minchan Kim
Bump.

On Wed, Mar 11, 2015 at 10:20:38AM +0900, Minchan Kim wrote:
> Basically, MADV_FREE relies on the pte dirty bit to decide whether
> it allows the VM to discard the page. However, after a swap-in, the
> pte pointing to the page is not pte_dirty. So MADV_FREE also checks
> PageDirty and PageSwapCache for those pages before discarding them,
> because a swapped-in page may live in the swap cache, or be PageDirty
> once it is removed from the swap cache.
> 
> The problem here is that anonymous pages can be PageDirty once they
> are removed from the swap cache, so the VM cannot treat those pages
> as freeable even after madvise_free. Look at the example below.
> 
> ptr = malloc();
> memset(ptr);
> ..
> heavy memory pressure -> swap-out all of pages
> ..
> out of memory pressure so there are lots of free pages
> ..
> var = *ptr; -> swap-in page/remove the page from swapcache. so pte_clean
>but SetPageDirty
> 
> madvise_free(ptr);
> ..
> ..
> heavy memory pressure -> VM cannot discard the page by PageDirty.
> 
> PageDirty for an anonymous page aims to avoid duplicated swap-out.
> In other words, if a page was swapped in but still lives in the swap
> cache (i.e., !PageDirty), we can skip the swap-out when the page is
> later selected as a victim by the VM, because the swap device still
> holds the previously swapped-out contents of the page.
> 
> So, rather than relying on PG_dirty to make madvise_free work,
> pte_dirty is more straightforward. Inherently, a swapped-out page
> was pte_dirty, so this patch restores the dirtiness when a swap-in
> fault happens; madvise_free then no longer relies on PageDirty.
> 
> Cc: Hugh Dickins 
> Cc: Cyrill Gorcunov 
> Cc: Pavel Emelyanov 
> Reported-by: Yalin Wang 
> Signed-off-by: Minchan Kim 
> ---
>  mm/madvise.c | 1 -
>  mm/memory.c  | 9 +++--
>  mm/rmap.c| 2 +-
>  mm/vmscan.c  | 3 +--
>  4 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 22e8f0c..a045798 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -325,7 +325,6 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>   continue;
>   }
>  
> - ClearPageDirty(page);
>   unlock_page(page);
>   }
>  
> diff --git a/mm/memory.c b/mm/memory.c
> index 0f96a4a..40428a5 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2521,9 +2521,14 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  
>   inc_mm_counter_fast(mm, MM_ANONPAGES);
>   dec_mm_counter_fast(mm, MM_SWAPENTS);
> - pte = mk_pte(page, vma->vm_page_prot);
> +
> + /*
> +  * Every page swapped-out was pte_dirty so we make pte dirty again.
> +  * MADV_FREE relies on it.
> +  */
> + pte = pte_mkdirty(mk_pte(page, vma->vm_page_prot));
>   if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
> - pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> + pte = maybe_mkwrite(pte, vma);
>   flags &= ~FAULT_FLAG_WRITE;
>   ret |= VM_FAULT_WRITE;
>   exclusive = 1;
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 47b3ba8..34c1d66 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1268,7 +1268,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  
>   if (flags & TTU_FREE) {
>   VM_BUG_ON_PAGE(PageSwapCache(page), page);
> - if (!dirty && !PageDirty(page)) {
> + if (!dirty) {
>   /* It's a freeable page by MADV_FREE */
>   dec_mm_counter(mm, MM_ANONPAGES);
>   goto discard;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 260c413..3357ffa 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -805,8 +805,7 @@ static enum page_references page_check_references(struct page *page,
>   return PAGEREF_KEEP;
>   }
>  
> - if (PageAnon(page) && !pte_dirty && !PageSwapCache(page) &&
> - !PageDirty(page))
> + if (PageAnon(page) && !pte_dirty && !PageSwapCache(page))
>   *freeable = true;
>  
>   /* Reclaim if clean, defer dirty pages to writeback */
> -- 
> 1.9.3
> 

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] mm: make every pte dirty on do_swap_page

2015-03-30 Thread Minchan Kim
On Tue, Mar 31, 2015 at 12:14:46AM +0300, Cyrill Gorcunov wrote:
> On Mon, Mar 30, 2015 at 05:59:15PM +0900, Minchan Kim wrote:
> > Hi Cyrill,
> > 
> > On Mon, Mar 30, 2015 at 11:51:12AM +0300, Cyrill Gorcunov wrote:
> > > On Mon, Mar 30, 2015 at 02:22:50PM +0900, Minchan Kim wrote:
> > > > 2nd description trial.
> > > ...
> > > Hi Minchan, could you please point for which repo this patch,
> > > linux-next?
> > 
> > It was based on v4.0-rc5-mmotm-2015-03-24-17-02.
> > As well, I confirmed it was applied on local-next-20150327.
> > 
> > Thanks.
> 
> Hi Minchan! I managed to fetch mmotm and the change looks
> reasonable to me. Still better to wait for review from Mel
> or Hugh, maybe I miss something obvious.

Thanks for the review!

-- 
Kind regards,
Minchan Kim


Re: [PATCH 4/4] mm: make every pte dirty on do_swap_page

2015-03-30 Thread Cyrill Gorcunov
On Mon, Mar 30, 2015 at 05:59:15PM +0900, Minchan Kim wrote:
> Hi Cyrill,
> 
> On Mon, Mar 30, 2015 at 11:51:12AM +0300, Cyrill Gorcunov wrote:
> > On Mon, Mar 30, 2015 at 02:22:50PM +0900, Minchan Kim wrote:
> > > 2nd description trial.
> > ...
> > Hi Minchan, could you please point for which repo this patch,
> > linux-next?
> 
> It was based on v4.0-rc5-mmotm-2015-03-24-17-02.
> As well, I confirmed it was applied on local-next-20150327.
> 
> Thanks.

Hi Minchan! I managed to fetch mmotm and the change looks
reasonable to me. Still better to wait for review from Mel
or Hugh, maybe I miss something obvious.


Re: [PATCH 4/4] mm: make every pte dirty on do_swap_page

2015-03-30 Thread Minchan Kim
Hi Cyrill,

On Mon, Mar 30, 2015 at 11:51:12AM +0300, Cyrill Gorcunov wrote:
> On Mon, Mar 30, 2015 at 02:22:50PM +0900, Minchan Kim wrote:
> > 2nd description trial.
> ...
> Hi Minchan, could you please point for which repo this patch,
> linux-next?

It was based on v4.0-rc5-mmotm-2015-03-24-17-02.
As well, I confirmed it was applied on local-next-20150327.

Thanks.

-- 
Kind regards,
Minchan Kim


Re: [PATCH 4/4] mm: make every pte dirty on do_swap_page

2015-03-30 Thread Cyrill Gorcunov
On Mon, Mar 30, 2015 at 02:22:50PM +0900, Minchan Kim wrote:
> 2nd description trial.
...
Hi Minchan, could you please point for which repo this patch,
linux-next?


Re: [PATCH 4/4] mm: make every pte dirty on do_swap_page

2015-03-29 Thread Minchan Kim
2nd description trial.

From ccfc6c79634f6cec69d8fb23b0e863ebfa5b893c Mon Sep 17 00:00:00 2001
From: Minchan Kim 
Date: Mon, 30 Mar 2015 13:43:08 +0900
Subject: [PATCH v2] mm: make every pte dirty on do_swap_page

Basically, MADV_FREE relies on the dirty bit in the page table entry
to decide whether the VM is allowed to discard the page.
IOW, if the page table entry has the dirty bit set, the VM
must not discard the page.

However, when a swap-in happens via a read fault, the page table
entry pointing to the page does not have the dirty bit set, so
MADV_FREE might wrongly discard the page. To avoid that problem,
MADV_FREE added extra checks on PageDirty and PageSwapCache.
That worked because a swapped-in page lives in the swap cache,
and once it is evicted from the swap cache the page carries the
PG_dirty flag. Together, both page-flag checks effectively prevent
wrong discarding by MADV_FREE.

The problem with the above logic is that a swapped-in page keeps
PG_dirty once it is removed from the swap cache, so the VM can no
longer consider those pages freeable even if madvise_free is called
later. Look at the example below for detail.

ptr = malloc();
memset(ptr);
..
..
.. heavy memory pressure so all of pages are swapped out
..
..
var = *ptr; -> a page swapped-in and removed from swapcache.
   page table doesn't mark dirty bit and page
   descriptor includes PG_dirty
..
..
madvise_free(ptr);
..
..
..
.. heavy memory pressure again.
.. In this time, VM cannot discard the page because the page
.. has *PG_dirty*

Rather than relying on PG_dirty in the page descriptor to prevent
discarding a page, the dirty bit in the page table is more
straightforward and simple. So this patch marks the page table
dirty bit whenever a swap-in happens. Inherently, the page table
entry pointing to the swapped-out page had the dirty bit set, so
I think this is no problem.

With this, the complicated logic goes away and the freeable-page
check in madvise_free becomes simple. Of course, it also solves
the example mentioned above.

Cc: Hugh Dickins 
Cc: Cyrill Gorcunov 
Cc: Pavel Emelyanov 
Reported-by: Yalin Wang 
Signed-off-by: Minchan Kim 
---

* From v1:
  * Rewrite description - Andrew

 mm/madvise.c |  1 -
 mm/memory.c  | 10 --
 mm/rmap.c|  2 +-
 mm/vmscan.c  |  3 +--
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 22e8f0c..a045798 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -325,7 +325,6 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
continue;
}
 
-   ClearPageDirty(page);
unlock_page(page);
}
 
diff --git a/mm/memory.c b/mm/memory.c
index 6743966..48ff537 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2521,9 +2521,15 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
 
inc_mm_counter_fast(mm, MM_ANONPAGES);
dec_mm_counter_fast(mm, MM_SWAPENTS);
-   pte = mk_pte(page, vma->vm_page_prot);
+
+   /*
+* The page now being swapped in was dirty before it was swapped out,
+* so restore that state (ie, pte_mkdirty) because MADV_FREE relies
+* on the dirty bit in the page table.
+*/
+   pte = pte_mkdirty(mk_pte(page, vma->vm_page_prot));
if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
-   pte = maybe_mkwrite(pte_mkdirty(pte), vma);
+   pte = maybe_mkwrite(pte, vma);
flags &= ~FAULT_FLAG_WRITE;
ret |= VM_FAULT_WRITE;
exclusive = 1;
diff --git a/mm/rmap.c b/mm/rmap.c
index dad23a4..281e806 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1275,7 +1275,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 
if (flags & TTU_FREE) {
VM_BUG_ON_PAGE(PageSwapCache(page), page);
-   if (!dirty && !PageDirty(page)) {
+   if (!dirty) {
/* It's a freeable page by MADV_FREE */
dec_mm_counter(mm, MM_ANONPAGES);
goto discard;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index dc6cd51..fffebf0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -805,8 +805,7 @@ static enum page_references page_check_references(struct page *page,
return PAGEREF_KEEP;
}
 
-   if (PageAnon(page) && !pte_dirty && !PageSwapCache(page) &&
-   !PageDirty(page))
+   if (PageAnon(page) && !pte_dirty && !PageSwapCache(page))
*freeable = true;
 
/* Reclaim if clean, defer dirty pages to writeback */
-- 
1.9.3

-- 
Kind regards,
Minchan Kim


[PATCH 4/4] mm: make every pte dirty on do_swap_page

2015-03-10 Thread Minchan Kim
Basically, MADV_FREE relies on the pte dirty bit to decide whether
it allows the VM to discard the page. However, after a swap-in, the
pte pointing to the page is not pte_dirty. So MADV_FREE also checks
PageDirty and PageSwapCache for those pages before discarding them,
because a swapped-in page may live in the swap cache, or be PageDirty
once it is removed from the swap cache.

The problem here is that anonymous pages can be PageDirty once they
are removed from the swap cache, so the VM cannot treat those pages
as freeable even after madvise_free. Look at the example below.

ptr = malloc();
memset(ptr);
..
heavy memory pressure -> swap-out all of pages
..
out of memory pressure so there are lots of free pages
..
var = *ptr; -> swap-in page/remove the page from swapcache. so pte_clean
   but SetPageDirty

madvise_free(ptr);
..
..
heavy memory pressure -> VM cannot discard the page by PageDirty.

PageDirty for an anonymous page aims to avoid duplicated swap-out.
In other words, if a page was swapped in but still lives in the swap
cache (i.e., !PageDirty), we can skip the swap-out when the page is
later selected as a victim by the VM, because the swap device still
holds the previously swapped-out contents of the page.

So, rather than relying on PG_dirty to make madvise_free work,
pte_dirty is more straightforward. Inherently, a swapped-out page
was pte_dirty, so this patch restores the dirtiness when a swap-in
fault happens; madvise_free then no longer relies on PageDirty.

Cc: Hugh Dickins 
Cc: Cyrill Gorcunov 
Cc: Pavel Emelyanov 
Reported-by: Yalin Wang 
Signed-off-by: Minchan Kim 
---
 mm/madvise.c | 1 -
 mm/memory.c  | 9 +++--
 mm/rmap.c| 2 +-
 mm/vmscan.c  | 3 +--
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 22e8f0c..a045798 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -325,7 +325,6 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
continue;
}
 
-   ClearPageDirty(page);
unlock_page(page);
}
 
diff --git a/mm/memory.c b/mm/memory.c
index 0f96a4a..40428a5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2521,9 +2521,14 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
 
inc_mm_counter_fast(mm, MM_ANONPAGES);
dec_mm_counter_fast(mm, MM_SWAPENTS);
-   pte = mk_pte(page, vma->vm_page_prot);
+
+   /*
+* Every page swapped-out was pte_dirty so we make pte dirty again.
+* MADV_FREE relies on it.
+*/
+   pte = pte_mkdirty(mk_pte(page, vma->vm_page_prot));
if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
-   pte = maybe_mkwrite(pte_mkdirty(pte), vma);
+   pte = maybe_mkwrite(pte, vma);
flags &= ~FAULT_FLAG_WRITE;
ret |= VM_FAULT_WRITE;
exclusive = 1;
diff --git a/mm/rmap.c b/mm/rmap.c
index 47b3ba8..34c1d66 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1268,7 +1268,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 
if (flags & TTU_FREE) {
VM_BUG_ON_PAGE(PageSwapCache(page), page);
-   if (!dirty && !PageDirty(page)) {
+   if (!dirty) {
/* It's a freeable page by MADV_FREE */
dec_mm_counter(mm, MM_ANONPAGES);
goto discard;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 260c413..3357ffa 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -805,8 +805,7 @@ static enum page_references page_check_references(struct page *page,
return PAGEREF_KEEP;
}
 
-   if (PageAnon(page) && !pte_dirty && !PageSwapCache(page) &&
-   !PageDirty(page))
+   if (PageAnon(page) && !pte_dirty && !PageSwapCache(page))
*freeable = true;
 
/* Reclaim if clean, defer dirty pages to writeback */
-- 
1.9.3


