On 2026-04-18 06:55, Mike Rapoport wrote:
From: "Mike Rapoport (Microsoft)" <[email protected]>

migration skips HugeTLB tests if there are no free huge pages
prepared by a wrapper script.

Add setup of HugeTLB pages to the test and make sure that the original
settings are restored on the test exit.

Since kselftest_harness runs fixture setup and the tests in child
processes, use HUGETLB_SETUP_DEFAULT_PAGES() that defines a constructor
that runs in the main process and add verification that there are enough
free huge pages to the tests that use them.

Signed-off-by: Mike Rapoport (Microsoft) <[email protected]>
---
  tools/testing/selftests/mm/migration.c | 8 ++++++++
  1 file changed, 8 insertions(+)

diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c
index ccf42002ce86..61fb00953f83 100644
--- a/tools/testing/selftests/mm/migration.c
+++ b/tools/testing/selftests/mm/migration.c
@@ -23,6 +23,8 @@
  #define MAX_RETRIES   100
  #define ALIGN(x, a)   (((x) + (a - 1)) & (~((a) - 1)))
+HUGETLB_SETUP_DEFAULT_PAGES(1)

Hey Mike,

I've been reviewing and testing this series and hit a reproducible issue
with this test when running it on an x86 KVM guest with 88 vCPUs.

When executing the full MM suite with
sudo ./run_vmtests.sh -d -a, all 6 migration tests pass but the test
doesn't exit. Instead, it gets stuck after this output:

"""
# # PASSED: 6 / 6 tests passed.
# # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0
"""

Getting a backtrace from gdb I see:

"""
#0  0x00007efd2f2c247b in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007efd2f26fa88 in __run_exit_handlers () from /lib64/libc.so.6
#2  0x00007efd2f26fabe in exit () from /lib64/libc.so.6
#3  0x0000000000404f2e in hugepage_restore_settings_sighandler ()
#4  <signal handler called>
#5  0x00007efd2f32f416 in __unregister_atfork () from /lib64/libc.so.6
#6  0x00007efd2f26f338 in __cxa_finalize () from /lib64/libc.so.6
#7  0x00007efd2f4548c7 in __do_global_dtors_aux () from /lib64/libm.so.6
#8  0x00007ffd66ae0320 in ?? ()
#9  0x00007efd2f55b2d2 in _dl_call_fini (closure_map=0x7efd2f5500c0) at dl-call_fini.c:43
"""

Could we be messing with libc internal state somehow? I also see systemd
services hang when I try to reboot.

Some of the migration tests fork() and then kill() their child
processes. Won't those all restore the hugetlb state concurrently
from hugepage_restore_settings_atexit()?
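
If that is what happens, one option would be to let only the process
that saved the settings restore them. A rough sketch of such a guard,
assuming the owner PID can be recorded when the settings are saved;
remember_owner(), should_restore() and restore_atexit_guarded() are
hypothetical names, not code from this series:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical: PID of the process that saved the HugeTLB settings. */
static pid_t settings_owner;

/* Call once in the main process, e.g. from the constructor. */
static void remember_owner(void)
{
	settings_owner = getpid();
}

/* Forked children inherit settings_owner but have a different PID. */
static int should_restore(void)
{
	return getpid() == settings_owner;
}

static void restore_atexit_guarded(void)
{
	if (!should_restore())
		return;	/* forked child: leave the settings alone */
	/* ... restore the saved HugeTLB settings here ... */
}

/* Demo helper: fork a child and report whether it would restore. */
static int child_would_restore(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(should_restore());
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

The same check would presumably be needed in the signal handler path,
since the children are terminated with kill().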

Also, for shared_anon_htlb, don't we need to reserve a HugeTLB page per
child?

And there's another issue: when running the migration test individually,
private_anon_htlb gets skipped. I guess it's because the previous test
is restoring the HugeTLB state:

"""
TAP version 13
# -------------------
# running ./migration
# -------------------
# # [INFO] detected hugetlb page size: 2048 KiB
# # [INFO] detected hugetlb page size: 1048576 KiB
# TAP version 13
# 1..6
# # Starting 6 tests from 1 test cases.
# #  RUN           migration.shared_anon_htlb ...
# #            OK  migration.shared_anon_htlb
# ok 1 migration.shared_anon_htlb
# #  RUN           migration.private_anon_htlb ...
# #      SKIP      Not enough huge pages
#
# #            OK  migration.private_anon_htlb
# ok 2 migration.private_anon_htlb # SKIP Not enough huge pages
#
# #  RUN           migration.shared_anon_thp ...
# #            OK  migration.shared_anon_thp
# ok 3 migration.shared_anon_thp
# #  RUN           migration.private_anon_thp ...
# #            OK  migration.private_anon_thp
# ok 4 migration.private_anon_thp
# #  RUN           migration.shared_anon ...
# #            OK  migration.shared_anon
# ok 5 migration.shared_anon
# #  RUN           migration.private_anon ...
# #            OK  migration.private_anon
# ok 6 migration.private_anon
# # PASSED: 6 / 6 tests passed.
# # 1 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# # Totals: pass:5 fail:0 xfail:0 xpass:0 skip:1 error:0
"""

(I have minor comments about earlier patches, but I decided to send this
first since it's the most important).

+
  FIXTURE(migration)
  {
        pthread_t *threads;
@@ -277,6 +279,9 @@ TEST_F_TIMEOUT(migration, private_anon_htlb, 2*RUNTIME)
        if (!hugepage_size)
                SKIP(return, "Reading HugeTLB pagesize failed\n");
+       if (hugetlb_free_default_pages() < 1)
+               SKIP(return, "Not enough huge pages\n");
+
        ptr = mmap(NULL, hugepage_size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        ASSERT_NE(ptr, MAP_FAILED);
@@ -308,6 +313,9 @@ TEST_F_TIMEOUT(migration, shared_anon_htlb, 2*RUNTIME)
        if (!hugepage_size)
                SKIP(return, "Reading HugeTLB pagesize failed\n");
+       if (hugetlb_free_default_pages() < 1)
+               SKIP(return, "Not enough huge pages\n");
+
        ptr = mmap(NULL, hugepage_size, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        ASSERT_NE(ptr, MAP_FAILED);

