On Thu, 6 Dec 2018, Michal Hocko wrote:

> MADV_HUGEPAGE changes the picture because the caller expressed a need
> for THP and is willing to go extra mile to get it. That involves
> allocation latency and as of now also a potential remote access. We do
> not have complete agreement on the later but the prevailing argument is
> that any strong NUMA locality is just reinventing node-reclaim story
> again or makes THP success rate down the toilet (to quote Mel). I agree
> that we do not want to fallback to a remote node overeagerly. I believe
> that something like the below would be sensible
> 1) THP on a local node with compaction not giving up too early
> 2) THP on a remote node in NOWAIT mode - so no direct
>    compaction/reclaim (trigger kswapd/kcompactd only for
>    defrag=defer+madvise)
> 3) fallback to the base page allocation
>

I disagree that MADV_HUGEPAGE should take on any new semantic that
overrides the preference for node-local memory when allocating a
hugepage, which has been the behavior for nearly four years. The order
of MADV_HUGEPAGE preferences listed above would regress current users
who rely on falling back to local small pages rather than remote
hugepages, because the access latency is much better. I think a
preference for remote hugepages over local small pages needs to be
expressed differently to prevent that regression.
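
To make the difference concrete, here is a toy user-space model of the
two fallback orders. It is only a sketch and not kernel code: the enum,
the struct and both function names are hypothetical, and the "current"
order reflects my description above (local hugepage, then local small
page) rather than any actual allocator path.

```c
#include <stdbool.h>

/* Hypothetical outcome of one fault in a MADV_HUGEPAGE region. */
enum fault_result {
	LOCAL_THP,
	REMOTE_THP,
	LOCAL_BASE_PAGE,
	REMOTE_BASE_PAGE,
};

/* Hypothetical snapshot of what each allocation attempt would find. */
struct node_state {
	bool local_thp_after_compaction; /* local hugepage, with compaction */
	bool remote_thp_nowait;          /* remote hugepage, no reclaim */
	bool local_base_page;            /* local small page available */
};

/* Order from the quoted mail: local THP, remote THP (NOWAIT), base page. */
static enum fault_result proposed_order(const struct node_state *s)
{
	if (s->local_thp_after_compaction)
		return LOCAL_THP;
	if (s->remote_thp_nowait)
		return REMOTE_THP;
	return s->local_base_page ? LOCAL_BASE_PAGE : REMOTE_BASE_PAGE;
}

/* Assumed existing order: local THP, then straight to base pages. */
static enum fault_result current_order(const struct node_state *s)
{
	if (s->local_thp_after_compaction)
		return LOCAL_THP;
	return s->local_base_page ? LOCAL_BASE_PAGE : REMOTE_BASE_PAGE;
}
```

With the local node fragmented but not out of memory (no local
hugepage, a free remote hugepage, local small pages available),
proposed_order() yields REMOTE_THP while current_order() yields
LOCAL_BASE_PAGE. That is the case where users relying on local small
page fallback would see the change.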