khugepaged: fix dirty page handling for MADV_COLLAPSE

Andrew Morton Sun, 18 Jan 2026 12:32:05 -0800

On Sun, 18 Jan 2026 19:09:38 +0000 Shivank Garg <[email protected]> wrote:


> MADV_COLLAPSE on file-backed mappings fails with -EINVAL when TEXT pages
> are dirty. This affects scenarios like package/container updates or
> executing binaries immediately after writing them, etc.
> 
> The issue is that collapse_file() triggers async writeback and returns
> SCAN_FAIL (maps to -EINVAL), expecting khugepaged to revisit later. But
> MADV_COLLAPSE is synchronous and userspace expects immediate success or
> a clear retry signal.
> 
> Reproduction:
>  - Compile or copy 2MB-aligned executable to XFS/ext4 FS
>  - Call MADV_COLLAPSE on .text section
>  - First call fails with -EINVAL (text pages dirty from copy)
>  - Second call succeeds (async writeback completed)
> 
> Issue Report:
> https://lore.kernel.org/all/[email protected]

Updated, thanks.

Please tolerate a little whining about the timeliess here.  We're at
-rc6, v4 was added to mm.git over a month ago, had quite a lot of
review, this is very close to being moved into the mm-stable branch and now
we get v5.  Argh.

> V5:
> - In patch 2/2, Simplify dirty writeback retry logic (David)

Are you sure this is the only change?  It looks like a lot for a
simplification and I'm wondering if we should retain the v4 series and
defer a simplification for separate consideration during the next
cycle.

Below is how this updated altered mm.git.  Could reviewers please check
this fairly soon?


--- a/mm/khugepaged.c~b
+++ a/mm/khugepaged.c
@@ -2788,11 +2788,11 @@ int madvise_collapse(struct vm_area_stru
        hend = end & HPAGE_PMD_MASK;
 
        for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
-               bool retried = false;
                int result = SCAN_FAIL;
+               bool triggered_wb = false;
 
-               if (!mmap_locked) {
 retry:
+               if (!mmap_locked) {
                        cond_resched();
                        mmap_read_lock(mm);
                        mmap_locked = true;
@@ -2812,52 +2812,27 @@ retry:
 
                        mmap_read_unlock(mm);
                        mmap_locked = false;
+                       *lock_dropped = true;
                        result = hpage_collapse_scan_file(mm, addr, file, pgoff,
                                                          cc);
-                       fput(file);
-               } else {
-                       result = hpage_collapse_scan_pmd(mm, vma, addr,
-                                                        &mmap_locked, cc);
-               }
-               if (!mmap_locked)
-                       *lock_dropped = true;
-
-               /*
-                * If the file-backed VMA has dirty pages, the scan triggers
-                * async writeback and returns SCAN_PAGE_DIRTY_OR_WRITEBACK.
-                * Since MADV_COLLAPSE is sync, we force sync writeback and
-                * retry once.
-                */
-               if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !retried) {
-                       /*
-                        * File scan drops the lock. We must re-acquire it to
-                        * safely inspect the VMA and hold the file reference.
-                        */
-                       if (!mmap_locked) {
-                               cond_resched();
-                               mmap_read_lock(mm);
-                               mmap_locked = true;
-                               result = hugepage_vma_revalidate(mm, addr, 
false, &vma, cc);
-                               if (result != SCAN_SUCCEED)
-                                       goto handle_result;
-                       }
 
-                       if (!vma_is_anonymous(vma) && vma->vm_file &&
-                           mapping_can_writeback(vma->vm_file->f_mapping)) {
-                               struct file *file = get_file(vma->vm_file);
-                               pgoff_t pgoff = linear_page_index(vma, addr);
+                       if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && 
!triggered_wb &&
+                           mapping_can_writeback(file->f_mapping)) {
                                loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
                                loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
 
-                               mmap_read_unlock(mm);
-                               mmap_locked = false;
-                               *lock_dropped = true;
                                filemap_write_and_wait_range(file->f_mapping, 
lstart, lend);
+                               triggered_wb = true;
                                fput(file);
-                               retried = true;
                                goto retry;
                        }
+                       fput(file);
+               } else {
+                       result = hpage_collapse_scan_pmd(mm, vma, addr,
+                                                        &mmap_locked, cc);
                }
+               if (!mmap_locked)
+                       *lock_dropped = true;
 
 handle_result:
                switch (result) {
_

Re: [PATCH V5 0/2] mm/khugepaged: fix dirty page handling for MADV_COLLAPSE

Reply via email to