Hi,
I have received a bug report for memory hotplug triggered hugetlb
migration on a distribution kernel, but the very same issue is still
present in the current upstream code. The bug is described in patch
2, but in short: new_node_page doesn't really try to consume
preallocated hugetlb pages from the pool on any node other than the
next one, which is really suboptimal. As a result memory hotremove is
very likely to fail even though there are many hugetlb pages in the
pool. I think it is fair to call this a bug.
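To illustrate the idea of the fix (a simplified sketch rather than the
actual diff): instead of falling back to next_node() only, walk the
whole allowed nodemask so that a preallocated hugetlb page sitting in
any other node's pool can be used. dequeue_hugetlb_page_node() below
is a hypothetical stand-in for the real per-node pool dequeue path:

	/*
	 * Sketch only: try every node in the allowed mask rather
	 * than just the next one, so preallocated pages anywhere
	 * in the pool can satisfy the migration target.
	 * dequeue_hugetlb_page_node() is a hypothetical helper.
	 */
	static struct page *alloc_from_nodemask(struct hstate *h,
			nodemask_t *nmask)
	{
		struct page *page;
		int nid;

		for_each_node_mask(nid, *nmask) {
			page = dequeue_hugetlb_page_node(h, nid);
			if (page)
				return page;
		}
		return NULL;
	}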

Patches 1 and 3 are cleanups and the last patch is still an RFC because
I am not sure we really need/want to go that way. The thing is that the
page allocator relies on zonelists to do the proper allocation fallback
wrt. numa distances. We do not have anything like that for hugetlb
allocations because they are not zone aware in general. Making them
fully zonelist (or alternatively nodelist) aware is quite a large
project I guess, so I admittedly went the path of least resistance and
provided a much simpler approach. More on that in patch 4. If this
doesn't seem good enough I will drop it from the series, but to me it
looks like a reasonable compromise code wise.
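For the curious, the compromise in patch 4 boils down to something
like the following (again a sketch and not the actual diff, reusing
the hypothetical dequeue_hugetlb_page_node() helper from above): try
the caller's preferred node first and only then walk the rest of the
mask, which approximates the distance-ordered zonelist fallback for
the common case without making hugetlb zone aware:

	/*
	 * Sketch only: dequeue from the preferred node if possible,
	 * then fall back to the remaining nodes in the mask.
	 */
	static struct page *alloc_huge_page_preferred(struct hstate *h,
			int preferred_nid, nodemask_t *nmask)
	{
		struct page *page;
		int nid;

		if (node_isset(preferred_nid, *nmask)) {
			page = dequeue_hugetlb_page_node(h, preferred_nid);
			if (page)
				return page;
		}

		for_each_node_mask(nid, *nmask) {
			if (nid == preferred_nid)
				continue;
			page = dequeue_hugetlb_page_node(h, nid);
			if (page)
				return page;
		}
		return NULL;
	}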

Thoughts, ideas, objections?

Diffstat
 include/linux/hugetlb.h  |  3 +++
 include/linux/migrate.h  | 17 +++++++++++++++++
 include/linux/nodemask.h | 20 ++++++++++++++++++++
 mm/hugetlb.c             | 30 ++++++++++++++++++++++++++++++
 mm/memory_hotplug.c      | 25 ++++++-------------------
 mm/page_isolation.c      | 18 ++----------------
 6 files changed, 78 insertions(+), 35 deletions(-)

Shortlog
Michal Hocko (4):
      mm, memory_hotplug: simplify empty node mask handling in new_node_page
      hugetlb, memory_hotplug: prefer to use reserved pages for migration
      mm: unify new_node_page and alloc_migrate_target
      hugetlb: add support for preferred node to alloc_huge_page_nodemask
