Hi,

I have received a bug report for memory hotplug triggered hugetlb migration on a distribution kernel, but the very same issue is still present in the current upstream code. The bug is described in patch 2, but in short: new_node_page only tries to consume preallocated hugetlb pages from the pool on the next node, never on any other node, which is really suboptimal. As a result, memory hotremove is very likely to fail even though there are many hugetlb pages available in the pool. I think it is fair to call this a bug.
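For reference, the problematic pattern looks roughly like the following (a condensed sketch of the hugetlb path in new_node_page; error handling and the regular page path are elided):

	/*
	 * Condensed sketch of the current new_node_page behavior for
	 * hugetlb pages.
	 */
	static struct page *new_node_page(struct page *page, unsigned long private,
			int **result)
	{
		int nid = page_to_nid(page);
		nodemask_t nmask = node_states[N_MEMORY];

		/*
		 * Only the next node in the mask is consulted. If its
		 * hugetlb pool happens to be empty, the allocation can
		 * fail even though other nodes still have plenty of
		 * preallocated pages.
		 */
		if (PageHuge(page))
			return alloc_huge_page_node(page_hstate(compound_head(page)),
					next_node_in(nid, nmask));
		...
	}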
Patches 1 and 3 are cleanups. The last patch is still an RFC because I am not sure we really need/want to go that way. The thing is that the page allocator relies on zonelists to do the proper allocation fallback with respect to NUMA distances. We do not have anything like that for hugetlb allocations because they are not zone aware in general. Making them fully zonelist (or alternatively nodelist) aware is quite a large project, I guess. So I admittedly took the path of least resistance and provided a much simpler approach; more on that in patch 4 and in the rough sketch at the end of this mail. If this doesn't seem good enough I will drop it from the series, but to me it looks like a reasonable compromise code-wise.

Thoughts, ideas, objections?

Diffstat:
 include/linux/hugetlb.h  |  3 +++
 include/linux/migrate.h  | 17 +++++++++++++++++
 include/linux/nodemask.h | 20 ++++++++++++++++++++
 mm/hugetlb.c             | 30 ++++++++++++++++++++++++++++++
 mm/memory_hotplug.c      | 25 ++++++-------------------
 mm/page_isolation.c      | 18 ++----------------
 6 files changed, 78 insertions(+), 35 deletions(-)

Shortlog:
Michal Hocko (4):
      mm, memory_hotplug: simplify empty node mask handling in new_node_page
      hugetlb, memory_hotplug: prefer to use reserved pages for migration
      mm: unify new_node_page and alloc_migrate_target
      hugetlb: add support for preferred node to alloc_huge_page_nodemask
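To make the intent of patch 4 more concrete, here is a rough sketch of the kind of nodemask based fallback I have in mind. The helper name and the exact plumbing are illustrative only, not what the patch literally does:

	/*
	 * Illustrative sketch only: try the preferred node first, then
	 * walk the remaining allowed nodes in nodemask order. Locking
	 * (hugetlb_lock) is elided for brevity.
	 */
	static struct page *dequeue_huge_page_preferred(struct hstate *h,
			int preferred_nid, nodemask_t *nmask)
	{
		struct page *page;
		int nid;

		/* Try to dequeue from the preferred node's pool first... */
		if (node_isset(preferred_nid, *nmask)) {
			page = dequeue_huge_page_node(h, preferred_nid);
			if (page)
				return page;
		}

		/* ...then fall back to the other allowed nodes in turn. */
		for_each_node_mask(nid, *nmask) {
			if (nid == preferred_nid)
				continue;
			page = dequeue_huge_page_node(h, nid);
			if (page)
				return page;
		}

		return NULL;
	}

The obvious downside is that the fallback order follows the node numbering rather than the NUMA distance, which is exactly why the last patch is an RFC rather than a full zonelist aware solution.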