On 03/01/2018 12:42 PM, David Rientjes wrote:
> It's possible for buddy pages to become stranded on pcps that, if drained,
> could be merged with other buddy pages on the zone's free area to form
> large order pages, including up to MAX_ORDER.
> 
> Consider a verbose example using the tools/vm/page-types tool at the
> beginning of a ZONE_NORMAL, where 'B' indicates a buddy page and 'S'
> indicates a slab page, which the migration scanner is attempting to
> defragment (and doing it well, absent coalescing up to cc.order):

How can the migration scanner defragment a slab page?

> 109954  1       
> _______S________________________________________________________
> 109955  2       
> __________B_____________________________________________________
> 109957  1       
> ________________________________________________________________
> 109958  1       
> __________B_____________________________________________________
> 109959  7       
> ________________________________________________________________
> 109960  1       
> __________B_____________________________________________________
> 109961  9       
> ________________________________________________________________
> 10996a  1       
> __________B_____________________________________________________
> 10996b  3       
> ________________________________________________________________
> 10996e  1       
> __________B_____________________________________________________
> 10996f  1       
> ________________________________________________________________
> 109970  1       
> __________B_____________________________________________________
> 109971  f       
> ________________________________________________________________
> ...
> 109f88  1       
> __________B_____________________________________________________
> 109f89  3       
> ________________________________________________________________
> 109f8c  1       
> __________B_____________________________________________________
> 109f8d  2       
> ________________________________________________________________
> 109f8f  2       
> __________B_____________________________________________________
> 109f91  f       
> ________________________________________________________________
> 109fa0  1       
> __________B_____________________________________________________
> 109fa1  7       
> ________________________________________________________________
> 109fa8  1       
> __________B_____________________________________________________
> 109fa9  1       
> ________________________________________________________________
> 109faa  1       
> __________B_____________________________________________________
> 109fab  1       
> _______S________________________________________________________
> 
> These buddy pages, spanning 1,621 pages, could be coalesced and allow for
> three transparent hugepages to be dynamically allocated.  Totaling all
> hugepage length spans that could be coalesced, this could yield over 400
> hugepages on the zone's free area when at the time this /proc/kpageflags

I don't understand the numbers here. With order-9 hugepages it's 512
pages per hugepage. If the buddy pages span 1621 pages, how can they
yield 400 hugepages?

> was collected, there was _no_ order-9 or order-10 pages available for
> allocation even after triggering compaction through procfs.
> 
> When kcompactd fails to defragment memory such that a cc.order page can
> be allocated, drain all pcps for the zone back to the buddy allocator so
> this stranding cannot occur.  Compaction for that order will subsequently
> be deferred, which acts as a ratelimit on this drain.

I don't mind the change given the ratelimit, but what difference was
observed in practice?

BTW I wonder if we could be smarter and quicker about the drains. Let a
pcp struct page be easily recognized as such, and store the cpu number
in there. Migration scanner could then maintain a cpumask, and recognize
if the only missing pages for coalescing a cc->order block are on the
pcplists, and then do a targeted drain.
But that only makes sense to implement if it can make a noticeable
difference to offset the additional overhead, of course.

> Signed-off-by: David Rientjes <[email protected]>
> ---
>  mm/compaction.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1987,6 +1987,14 @@ static void kcompactd_do_work(pg_data_t *pgdat)
>               if (status == COMPACT_SUCCESS) {
>                       compaction_defer_reset(zone, cc.order, false);
>               } else if (status == COMPACT_PARTIAL_SKIPPED || status == 
> COMPACT_COMPLETE) {
> +                     /*
> +                      * Buddy pages may become stranded on pcps that could
> +                      * otherwise coalesce on the zone's free area for
> +                      * order >= cc.order.  This is ratelimited by the
> +                      * upcoming deferral.
> +                      */
> +                     drain_all_pages(zone);
> +
>                       /*
>                        * We use sync migration mode here, so we defer like
>                        * sync direct compaction does.
> 

Reply via email to