On Mon, 20 Aug 2012, Mel Gorman wrote:
> On Sun, Aug 19, 2012 at 11:49:31AM -0700, Sage Weil wrote:
> > I've bisected and identified this commit:
> > 
> >     netvm: propagate page->pfmemalloc to skb
> >     
> >     The skb->pfmemalloc flag gets set to true iff the PFMEMALLOC reserves
> >     were used during the slab allocation of data in __alloc_skb.  If the
> >     packet is fragmented, it is possible that pages will be allocated from
> >     the PFMEMALLOC reserve without propagating this information to the skb.
> >     This patch propagates page->pfmemalloc from pages allocated for
> >     fragments to the skb.
> >     
> >     Signed-off-by: Mel Gorman <mgor...@suse.de>
> >     Acked-by: David S. Miller <da...@davemloft.net>
> >     Cc: Neil Brown <ne...@suse.de>
> >     Cc: Peter Zijlstra <a.p.zijls...@chello.nl>
> >     Cc: Mike Christie <micha...@cs.wisc.edu>
> >     Cc: Eric B Munson <emun...@mgebm.net>
> >     Cc: Eric Dumazet <eric.duma...@gmail.com>
> >     Cc: Sebastian Andrzej Siewior <sebast...@breakpoint.cc>
> >     Cc: Mel Gorman <mgor...@suse.de>
> >     Cc: Christoph Lameter <c...@linux.com>
> >     Signed-off-by: Andrew Morton <a...@linux-foundation.org>
> >     Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>
> > 
> 
> Ok, thanks.
> 
> > I've retested several times and confirmed that this change leads to the 
> > breakage, and that reverting it on top of -rc1 fixes the problem.
> > 
> > I've also added some additional instrumentation to my code and confirmed 
> > that the process is blocking on poll(2) while netstat is reporting 
> > data available on the socket.
> > 
> > What can I do to help track this down?
> > 
> 
> Can the following patch be tested please? It is reported to fix an fio
> regression that may be similar to what you are experiencing but has not
> been picked up yet.

This patch appears to resolve things for me as well, at least after a 
couple of passes.  I'll let you know if I see any further problems come up 
with more testing.
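
For reference, the symptom described above (poll(2) not reporting a socket
that has data queued) can be checked from userspace with something like the
sketch below.  This is only an illustration, not the instrumentation that
was actually used: probe_socket() and looks_stuck() are made-up names, and
FIONREAD is assumed to be a reasonable stand-in for the queued-byte count
that netstat reports.

```c
#include <assert.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Result of one non-blocking readiness probe (hypothetical helper type). */
struct probe_result {
    int queued_bytes;   /* bytes in the receive queue per FIONREAD, -1 on error */
    int poll_readable;  /* 1 if poll() reported POLLIN, else 0 */
};

/*
 * Read the queue length first, then poll with a zero timeout.  With this
 * order, data that arrives between the two calls can only make poll()
 * look better, so it cannot produce a false "stuck" report.  Another
 * thread draining the socket in between still can.
 */
static struct probe_result probe_socket(int fd)
{
    struct probe_result r = { -1, 0 };
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = 0;

    assert(fd >= 0);

    if (ioctl(fd, FIONREAD, &n) == 0)
        r.queued_bytes = n;
    if (poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLIN))
        r.poll_readable = 1;
    return r;
}

/* Returns 1 when data is queued but poll() does not report it. */
static int looks_stuck(struct probe_result r)
{
    return r.queued_bytes > 0 && !r.poll_readable;
}
```

Calling looks_stuck(probe_socket(fd)) periodically from a watchdog thread
should return 0 on a healthy kernel; a persistent 1 would match the
behaviour reported here.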

Thanks!
sage


> 
> ---8<---
> From: Alex Shi <alex....@intel.com>
> Subject: [PATCH] mm: correct page->pfmemalloc to fix deactivate_slab regression
> 
> commit cfd19c5a9ec (mm: only set page->pfmemalloc when
> ALLOC_NO_WATERMARKS was used) tried to narrow down where
> page->pfmemalloc is set, but it missed some places where pfmemalloc
> should be set.
> 
> As a result, in __slab_alloc, the mismatch between pfmemalloc and
> ALLOC_NO_WATERMARKS causes incorrect deactivate_slab() calls on our
> core2 server:
> 
>     64.73%           fio  [kernel.kallsyms]     [k] _raw_spin_lock
>                      |
>                      --- _raw_spin_lock
>                         |
>                         |---0.34%-- deactivate_slab
>                         |          __slab_alloc
>                         |          kmem_cache_alloc
>                         |          |
> 
> That causes a 40% regression in our fio sync write performance.
> 
> This patch moves the check into get_page_from_freelist, which resolves
> this issue.
> 
> Signed-off-by: Alex Shi <alex....@intel.com>
> ---
>  mm/page_alloc.c |   21 +++++++++++----------
>  1 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 009ac28..07f1924 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1928,6 +1928,17 @@ this_zone_full:
>               zlc_active = 0;
>               goto zonelist_scan;
>       }
> +
> +     if (page)
> +             /*
> +              * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was
> +              * necessary to allocate the page. The expectation is
> +              * that the caller is taking steps that will free more
> +              * memory. The caller should avoid the page being used
> +              * for !PFMEMALLOC purposes.
> +              */
> +             page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS);
> +
>       return page;
>  }
>  
> @@ -2389,14 +2400,6 @@ rebalance:
>                               zonelist, high_zoneidx, nodemask,
>                               preferred_zone, migratetype);
>               if (page) {
> -                     /*
> -                      * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was
> -                      * necessary to allocate the page. The expectation is
> -                      * that the caller is taking steps that will free more
> -                      * memory. The caller should avoid the page being used
> -                      * for !PFMEMALLOC purposes.
> -                      */
> -                     page->pfmemalloc = true;
>                       goto got_pg;
>               }
>       }
> @@ -2569,8 +2572,6 @@ retry_cpuset:
>               page = __alloc_pages_slowpath(gfp_mask, order,
>                               zonelist, high_zoneidx, nodemask,
>                               preferred_zone, migratetype);
> -     else
> -             page->pfmemalloc = false;
>  
>       trace_mm_page_alloc(page, order, gfp_mask, migratetype);
>  
> -- 
> 1.7.5.4
> 
> 
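
For readers skimming the diff, the effect of the change can be modelled in
plain userspace C.  The sketch below is not kernel code: struct page_model,
struct skb_model, the MODEL_* flag values and both helper functions are
invented for illustration.  It only mirrors the two ideas in this thread,
namely that the page flag is derived from alloc_flags at the single point
where a page is returned, and that an skb inherits the flag from any
fragment page that has it set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values; only the bit test matters for the model. */
#define MODEL_ALLOC_WMARK_LOW      0x01u
#define MODEL_ALLOC_NO_WATERMARKS  0x04u

struct page_model { bool pfmemalloc; };
struct skb_model  { bool pfmemalloc; };

/*
 * Model of the end of get_page_from_freelist() after the patch: the flag
 * is set on every successful return, purely from alloc_flags, so whatever
 * value the field held before is overwritten.
 */
static struct page_model *model_get_page(struct page_model *candidate,
                                         unsigned int alloc_flags)
{
    struct page_model *page = candidate;

    if (page)
        page->pfmemalloc = !!(alloc_flags & MODEL_ALLOC_NO_WATERMARKS);
    return page;
}

/*
 * Model of propagating the flag from a fragment page to the skb: once any
 * fragment came from the reserves, the skb stays marked.
 */
static void model_propagate_pfmemalloc(const struct page_model *page,
                                       struct skb_model *skb)
{
    if (page && page->pfmemalloc)
        skb->pfmemalloc = true;
}
```

In this model a page that was previously marked and is handed out again
with ordinary watermark flags comes back with pfmemalloc cleared, and an
skb built only from such pages is never marked.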
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
