On Mon, Feb 08, 2016 at 02:38:08PM +0100, Vlastimil Babka wrote:
> Memory compaction can be currently performed in several contexts:
>
> - kswapd balancing a zone after a high-order allocation failure
> - direct compaction to satisfy a high-order allocation, including THP page
> fault attemps
> - khugepaged trying to collapse a hugepage
> - manually from /proc
>
> The purpose of compaction is two-fold. The obvious purpose is to satisfy a
> (pending or future) high-order allocation, and is easy to evaluate. The other
> purpose is to keep overal memory fragmentation low and help the
> anti-fragmentation mechanism. The success wrt the latter purpose is more
> difficult to evaluate though.
>
> The current situation wrt the purposes has a few drawbacks:
>
> - compaction is invoked only when a high-order page or hugepage is not
> available (or manually). This might be too late for the purposes of keeping
> memory fragmentation low.
> - direct compaction increases latency of allocations. Again, it would be
> better if compaction was performed asynchronously to keep fragmentation low,
> before the allocation itself comes.
> - (a special case of the previous) the cost of compaction during THP page
> faults can easily offset the benefits of THP.
> - kswapd compaction appears to be complex, fragile and not working in some
> scenarios. It could also end up compacting for a high-order allocation
> request when it should be reclaiming memory for a later order-0 request.
>
> To improve the situation, we should be able to benefit from an equivalent of
> kswapd, but for compaction - i.e. a background thread which responds to
> fragmentation and the need for high-order allocations (including hugepages)
> somewhat proactively.
>
> One possibility is to extend the responsibilities of kswapd, which could
> however complicate its design too much. It should be better to let kswapd
> handle reclaim, as order-0 allocations are often more critical than high-order
> ones.
>
> Another possibility is to extend khugepaged, but this kthread is a single
> instance and tied to THP configs.
>
> This patch goes with the option of a new set of per-node kthreads called
> kcompactd, and lays the foundations, without introducing any new tunables.
> The lifecycle mimics kswapd kthreads, including the memory hotplug hooks.
>
> For compaction, kcompactd uses the standard compaction_suitable() and
> ompact_finished() criteria and the deferred compaction functionality. Unlike
> direct compaction, it uses only sync compaction, as there's no allocation
> latency to minimize.
>
> This patch doesn't yet add a call to wakeup_kcompactd. The kswapd
> compact/reclaim loop for high-order pages will be replaced by waking up
> kcompactd in the next patch with the description of what's wrong with the old
> approach.
>
> Waking up of the kcompactd threads is also tied to kswapd activity and follows
> these rules:
> - we don't want to affect any fastpaths, so wake up kcompactd only from the
> slowpath, as it's done for kswapd
> - if kswapd is doing reclaim, it's more important than compaction, so don't
> invoke kcompactd until kswapd goes to sleep
> - the target order used for kswapd is passed to kcompactd
>
> Future possible future uses for kcompactd include the ability to wake up
> kcompactd on demand in special situations, such as when hugepages are not
> available (currently not done due to __GFP_NO_KSWAPD) or when a fragmentation
> event (i.e. __rmqueue_fallback()) occurs. It's also possible to perform
> periodic compaction with kcompactd.
>
> Signed-off-by: Vlastimil Babka
> ---
> include/linux/compaction.h| 16 +++
> include/linux/mmzone.h| 6 ++
> include/linux/vm_event_item.h | 1 +
> include/trace/events/compaction.h | 55 ++
> mm/compaction.c | 220
> ++
> mm/memory_hotplug.c | 9 +-
> mm/page_alloc.c | 3 +
> mm/vmstat.c | 1 +
> 8 files changed, 309 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index 4cd4ddf64cc7..1367c0564d42 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -52,6 +52,10 @@ extern void compaction_defer_reset(struct zone *zone, int
> order,
> bool alloc_success);
> extern bool compaction_restarting(struct zone *zone, int order);
>
> +extern int kcompactd_run(int nid);
> +extern void kcompactd_stop(int nid);
> +extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx);
> +
> #else
> static inline unsigned long try_to_compact_pages(gfp_t gfp_mask,
> unsigned int order, int alloc_flags,
> @@ -84,6 +88,18 @@ static inline bool compaction_deferred(struct zone *zone,
> int order)
> return true;
> }
>
> +static int kcompactd_run(int nid)
> +{
> + return 0;
> +}
> +static void