On 4/14/26 4:34 AM, Zi Yan wrote:
On 13 Apr 2026, at 16:20, Matthew Wilcox wrote:

On Mon, Apr 13, 2026 at 03:20:19PM -0400, Zi Yan wrote:
collapse_file() requires filesystems that support large folios of at
least PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that.
MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem.

While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE.

Why?  These are bugs.  I don't think we gain anything from continuing.

The goal is to catch these issues during development. VM_BUG_ON crashes
the system and that is too much for such issues in collapse_file().


+       /*
+        * skip files without PMD-order folio support
+        * do not check shmem, since MADV_COLLAPSE ignores shmem huge config
+        */
+       if (!shmem_file(file) && mapping_max_folio_order(mapping) < PMD_ORDER)
+               return SCAN_FAIL;

I wonder if it should.  If the commit message to 5a90c155defa is
to be believed,

     Since 'deny' is for emergencies and 'force' is for testing, performance
     issues should not be a problem in real production environments, so don't
     call mapping_set_large_folios() in __shmem_get_inode() when large folio is
     disabled with mount huge=never option (default policy).

so maybe MADV_COLLAPSE should honour huge=never?
Documentation/filesystems/tmpfs.rst implies that we do!

huge=never       Do not allocate huge pages.  This is the default.
huge=always      Attempt to allocate huge page every time a new page is needed.
huge=within_size Only allocate huge page if it will be fully within i_size.
                  Also respect madvise(2) hints.
huge=advise      Only allocate huge page if requested with madvise(2).

so what's the difference between huge=never and huge=advise?

I think madvise means MADV_HUGEPAGE for the region, not MADV_COLLAPSE.

Right.

In v1, I did the check for shmem, but that regressed MADV_COLLAPSE, which
can always collapse THPs on shmem. I know it sounds unreasonable, but
that ship has sailed.
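
For readers following the thread, the condition in the quoted hunk can be modelled in user space as below. This is only a sketch of the decision, not kernel code: struct model_mapping, model_collapse_allowed() and MODEL_PMD_ORDER are hypothetical names, and the value 9 assumes x86-64 with 4 KiB base pages and 2 MiB PMD-sized folios.

```c
#include <stdbool.h>

/* Assumption: PMD order on x86-64 with 4 KiB pages (2 MiB / 4 KiB = 2^9). */
#define MODEL_PMD_ORDER 9

/* Hypothetical stand-in for the two facts the quoted check looks at. */
struct model_mapping {
	bool is_shmem;                /* models shmem_file(file) */
	unsigned int max_folio_order; /* models mapping_max_folio_order(mapping) */
};

/*
 * Mirrors the quoted condition: a non-shmem mapping must support folios
 * of at least PMD order; shmem is exempt because MADV_COLLAPSE ignores
 * the shmem huge configuration.
 */
static bool model_collapse_allowed(const struct model_mapping *m)
{
	if (!m->is_shmem && m->max_folio_order < MODEL_PMD_ORDER)
		return false; /* corresponds to returning SCAN_FAIL */
	return true;
}
```

In this model a tmpfs mapping with max_folio_order 0 (as after mounting with huge=never) is still allowed, while a regular file mapping limited to order 0 is rejected.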

Previously, I tried to make MADV_COLLAPSE also honour the THP configuration of shmem/tmpfs[1], but Hugh strongly objected and explained the original intent of MADV_COLLAPSE[2]. I’ll quote Hugh’s comments:

"
Seldom has a feature been so thoroughly documented as MADV_COLLAPSE,
in its 6.1 commits and in the "man 2 madvise" page: which are
explicit about MADV_COLLAPSE providing a way to get THPs where the
sysfs setting governing automatic behaviour does not insert them.

We would all prefer a less messy world of THP tunables.  I certainly
find plenty to dislike there too; and wish that a less assertive name
than "never" had been chosen originally for the default off position.

But please don't break the accepted and documented behaviour of
MADV_COLLAPSE now.

If you want to exclude all possibility of THPs, then please use the
prctl(PR_SET_THP_DISABLE); or shmem_enabled=deny (I think it was me
who insisted that be respected by MADV_COLLAPSE back then).
"
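
As an illustration of the first opt-out Hugh mentions, a process can set the per-process THP disable flag with prctl(2). The sketch below is a minimal example and not part of the patch; the helper names thp_disable_self() and thp_disabled_state() are hypothetical, and it assumes a Linux system whose headers define PR_SET_THP_DISABLE and PR_GET_THP_DISABLE.

```c
#include <errno.h>
#include <sys/prctl.h>

/*
 * Ask the kernel to disable THP for the calling process.
 * Returns 0 on success or a negative errno value on failure
 * (for example when the running kernel or sandbox does not support it).
 */
static int thp_disable_self(void)
{
	if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0) != 0)
		return -errno;
	return 0;
}

/*
 * Query the current flag: nonzero if THP is disabled for this process,
 * 0 if not, or a negative errno value on failure.
 */
static int thp_disabled_state(void)
{
	int rc = prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0);

	return rc >= 0 ? rc : -errno;
}
```

The flag is inherited across fork(2) and preserved across execve(2), so a small wrapper program can set it before starting a workload that should never get THPs.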

Afterwards, we reached an agreement to keep the current logic, and Lorenzo helped update the docs, see commit a27848a03504 (“docs: update THP documentation to clarify sysfs ‘never’ setting”).

[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
