On Sun, 18 Jan 2026 19:09:38 +0000 Shivank Garg <[email protected]> wrote:
> MADV_COLLAPSE on file-backed mappings fails with -EINVAL when TEXT pages > are dirty. This affects scenarios like package/container updates or > executing binaries immediately after writing them, etc. > > The issue is that collapse_file() triggers async writeback and returns > SCAN_FAIL (maps to -EINVAL), expecting khugepaged to revisit later. But > MADV_COLLAPSE is synchronous and userspace expects immediate success or > a clear retry signal. > > Reproduction: > - Compile or copy 2MB-aligned executable to XFS/ext4 FS > - Call MADV_COLLAPSE on .text section > - First call fails with -EINVAL (text pages dirty from copy) > - Second call succeeds (async writeback completed) > > Issue Report: > https://lore.kernel.org/all/[email protected] Updated, thanks. Please tolerate a little whining about the timeliess here. We're at -rc6, v4 was added to mm.git over a month ago, had quite a lot of review, this is very close to being moved into the mm-stable branch and now we get v5. Argh. > V5: > - In patch 2/2, Simplify dirty writeback retry logic (David) Are you sure this is the only change? It looks like a lot for a simplification and I'm wondering if we should retain the v4 series and defer a simplification for separate consideration during the next cycle. Below is how this updated altered mm.git. Could reviewers please check this fairly soon? --- a/mm/khugepaged.c~b +++ a/mm/khugepaged.c @@ -2788,11 +2788,11 @@ int madvise_collapse(struct vm_area_stru hend = end & HPAGE_PMD_MASK; for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) { - bool retried = false; int result = SCAN_FAIL; + bool triggered_wb = false; - if (!mmap_locked) { retry: + if (!mmap_locked) { cond_resched(); mmap_read_lock(mm); mmap_locked = true; @@ -2812,52 +2812,27 @@ retry: mmap_read_unlock(mm); mmap_locked = false; + *lock_dropped = true; result = hpage_collapse_scan_file(mm, addr, file, pgoff, cc); - fput(file); - } else { - result = hpage_collapse_scan_pmd(mm, vma, addr, - &mmap_locked, cc); - } - if (!mmap_locked) - *lock_dropped = true; - - /* - * If the file-backed VMA has dirty pages, the scan triggers - * async writeback and returns SCAN_PAGE_DIRTY_OR_WRITEBACK. - * Since MADV_COLLAPSE is sync, we force sync writeback and - * retry once. - */ - if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !retried) { - /* - * File scan drops the lock. We must re-acquire it to - * safely inspect the VMA and hold the file reference. - */ - if (!mmap_locked) { - cond_resched(); - mmap_read_lock(mm); - mmap_locked = true; - result = hugepage_vma_revalidate(mm, addr, false, &vma, cc); - if (result != SCAN_SUCCEED) - goto handle_result; - } - if (!vma_is_anonymous(vma) && vma->vm_file && - mapping_can_writeback(vma->vm_file->f_mapping)) { - struct file *file = get_file(vma->vm_file); - pgoff_t pgoff = linear_page_index(vma, addr); + if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb && + mapping_can_writeback(file->f_mapping)) { loff_t lstart = (loff_t)pgoff << PAGE_SHIFT; loff_t lend = lstart + HPAGE_PMD_SIZE - 1; - mmap_read_unlock(mm); - mmap_locked = false; - *lock_dropped = true; filemap_write_and_wait_range(file->f_mapping, lstart, lend); + triggered_wb = true; fput(file); - retried = true; goto retry; } + fput(file); + } else { + result = hpage_collapse_scan_pmd(mm, vma, addr, + &mmap_locked, cc); } + if (!mmap_locked) + *lock_dropped = true; handle_result: switch (result) { _
