On 2025/6/12 18:08, David Hildenbrand wrote:
On 12.06.25 05:54, Baolin Wang wrote:
When running the khugepaged selftest for shmem (./khugepaged all:shmem),

Hmm, this combination is not run automatically through run_tests.sh, right? IIUC, it only runs "./khugepaged" which tests anon only ...

Should we add it there? Then I would probably have noticed that myself earlier :)

Yes, see patch 2.

I encountered the following test failures:
"
Run test: collapse_full (khugepaged:shmem)
Collapse multiple fully populated PTE table.... Fail
...
Run test: collapse_single_pte_entry (khugepaged:shmem)
Collapse PTE table with single PTE entry present.... Fail
...
Run test: collapse_full_of_compound (khugepaged:shmem)
Allocate huge page... OK
Split huge page leaving single PTE page table full of compound pages... OK
Collapse PTE table full of compound pages.... Fail
"

The reason for the failure is that, it will set MADV_NOHUGEPAGE to prevent
khugepaged from continuing to scan shmem VMA after khugepaged finishes
scanning in the wait_for_scan() function. Moreover, shmem requires a refault
to establish PMD mappings.

However, after commit 2b0f922323cc, PMD mappings are prevented if the VMA is set with MADV_NOHUGEPAGE flag, so shmem cannot establish PMD mappings during
refault.

Right. It's always problematic when we have some contradicting information in the VMA vs. pagecache.


To fix this issue, we can set the MADV_NOHUGEPAGE flag after the shmem refault.
With this fix, the shmem test case passes.

Fixes: 2b0f922323cc ("mm: don't install PMD mappings when THPs are disabled by the hw/process/vma")
Signed-off-by: Baolin Wang <baolin.w...@linux.alibaba.com>
---
  tools/testing/selftests/mm/khugepaged.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
index 8a4d34cce36b..d462f62d8116 100644
--- a/tools/testing/selftests/mm/khugepaged.c
+++ b/tools/testing/selftests/mm/khugepaged.c
@@ -561,8 +561,6 @@ static bool wait_for_scan(const char *msg, char *p, int nr_hpages,
          usleep(TICK);
      }
-    madvise(p, nr_hpages * hpage_pmd_size, MADV_NOHUGEPAGE);
-
      return timeout == -1;
  }
@@ -585,6 +583,7 @@ static void khugepaged_collapse(const char *msg, char *p, int nr_hpages,
      if (ops != &__anon_ops)
          ops->fault(p, 0, nr_hpages * hpage_pmd_size);
+    madvise(p, nr_hpages * hpage_pmd_size, MADV_NOHUGEPAGE);
      if (ops->check_huge(p, expect ? nr_hpages : 0))
          success("OK");
      else

It's a shame we have this weird interface: there is no way we can clear VM_HUGEPAGE without setting VM_NOHUGEPAGE :(

Right.

But, do we even care about setting MADV_NOHUGEPAGE at all? IIUC, we'll almost immediately later call cleanup_area() where we munmap(), right?

I tested removing the MADV_NOHUGEPAGE setting, and the khugepaged test cases all passed.

However, a potential impact of removing MADV_NOHUGEPAGE is that, khugepaged might report 'timeout', but check_huge() would still report 'success' (assuming khugepaged tries to scan the VMA and successfully collapses it after the timeout). Such test result could be confusing.

I know this kind of case is very rare and may not even be detectable in tests. At least, I couldn't reproduce it. But I prefer to keep the original test logic and fix the issue without making further changes.

Reply via email to