On Mon, Sep 15, 2025 at 12:22:07PM +0200, David Hildenbrand wrote:
> On 15.09.25 11:22, Kiryl Shutsemau wrote:
> > On Fri, Sep 12, 2025 at 05:31:51PM -0600, Nico Pache wrote:
> > > On Fri, Sep 12, 2025 at 6:25 AM David Hildenbrand <[email protected]>
> > > wrote:
> > > >
> > > > On 12.09.25 14:19, Kiryl Shutsemau wrote:
> > > > > On Thu, Sep 11, 2025 at 09:27:55PM -0600, Nico Pache wrote:
> > > > > > The following series provides khugepaged with the capability to
> > > > > > collapse anonymous memory regions to mTHPs.
> > > > > >
> > > > > > To achieve this we generalize the khugepaged functions to no
> > > > > > longer depend on PMD_ORDER. Then during the PMD scan, we use a
> > > > > > bitmap to track individual pages that are occupied (!none/zero).
> > > > > > After the PMD scan is done, we do binary recursion on the bitmap
> > > > > > to find the optimal mTHP sizes for the PMD range. The restriction
> > > > > > on max_ptes_none is removed during the scan, to make sure we
> > > > > > account for the whole PMD range. When no mTHP size is enabled,
> > > > > > the legacy behavior of khugepaged is maintained. max_ptes_none
> > > > > > will be scaled by the attempted collapse order to determine how
> > > > > > full a mTHP must be to be eligible for the collapse to occur. If
> > > > > > a mTHP collapse is attempted, but contains swapped out, or shared
> > > > > > pages, we don't perform the collapse. It is now also possible to
> > > > > > collapse to mTHPs without requiring the PMD THP size to be
> > > > > > enabled.
> > > > > >
> > > > > > When enabling (m)THP sizes, if max_ptes_none >= HPAGE_PMD_NR/2
> > > > > > (255 on 4K page size), it will be automatically capped to
> > > > > > HPAGE_PMD_NR/2 - 1 for mTHP collapses to prevent collapse "creep"
> > > > > > behavior. This prevents constantly promoting mTHPs to the next
> > > > > > available size, which would occur because a collapse introduces
> > > > > > more non-zero pages that would satisfy the promotion condition on
> > > > > > subsequent scans.
> > > > >
> > > > > Hm. Maybe instead of capping at HPAGE_PMD_NR/2 - 1 we can count
> > > > > all-zeros 4k as none_or_zero? It mirrors the logic of shrinker.
> > > > >
> > > >
> > > > I am all for not adding any more ugliness on top of all the ugliness we
> > > > added in the past.
> > > >
> > > > I will soon propose deprecating that parameter in favor of something
> > > > that makes a bit more sense.
> > > >
> > > > In essence, we'll likely have an "eagerness" parameter that ranges from
> > > > 0 to 10. 10 is essentially "always collapse" and 0 "never collapse if
> > > > not all is populated".
> > > Hi David,
> > >
> > > Do you have any reason for 0-10, I'm guessing these will map to
> > > different max_ptes_none values.
> > > I suggest 0-5, mapping to 0,32,64,128,255,511
> >
> > That's too x86-64 specific.
> >
> > And the whole idea is not to map to directly, but give kernel wiggle
> > room to play.
>
> Initially we will start out simple and map it directly. But yeah, the idea
> is to give us some more room later.
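The scan-then-recurse scheme and the creep problem from the quoted cover letter can be illustrated with a small Python model. This is a sketch under stated assumptions, not the series' code. It assumes 4K base pages (512 PTEs per PMD, so HPAGE_PMD_NR/2 is 256), models the occupancy bitmap as a list of bools, reads "scaled by the attempted collapse order" as a right shift by (HPAGE_PMD_ORDER - order), treats a collapse as making every PTE in the range non-none, and ignores the swap and shared-page checks. The helpers scaled_max_ptes_none, plan_collapses and creep_history are hypothetical names.

```python
# Simplified model of the collapse decision described in the cover letter;
# NOT the series' code. Assumptions: 4K base pages (512 PTEs per PMD), the
# occupancy bitmap is a list of bools, "scaled by order" is a right shift,
# and a collapse makes every PTE in the collapsed range non-none.
HPAGE_PMD_ORDER = 9
HPAGE_PMD_NR = 1 << HPAGE_PMD_ORDER


def scaled_max_ptes_none(max_ptes_none, order, cap=True):
    """None/zero PTEs tolerated when collapsing to `order`."""
    if cap and order < HPAGE_PMD_ORDER and max_ptes_none >= HPAGE_PMD_NR // 2:
        max_ptes_none = HPAGE_PMD_NR // 2 - 1  # cap applies to mTHP orders only
    return max_ptes_none >> (HPAGE_PMD_ORDER - order)


def plan_collapses(bitmap, enabled_orders, max_ptes_none, cap=True):
    """Binary recursion over the bitmap: take a range at an enabled order if it
    is full enough, otherwise split it in half. Returns (start, order) pairs."""
    plan = []
    lowest = min(enabled_orders)

    def visit(start, order):
        nr = 1 << order
        occupied = sum(bitmap[start:start + nr])
        if occupied == 0:
            return  # nothing populated in this range
        limit = scaled_max_ptes_none(max_ptes_none, order, cap)
        if order in enabled_orders and nr - occupied <= limit:
            plan.append((start, order))
        elif order > lowest:
            visit(start, order - 1)
            visit(start + nr // 2, order - 1)

    visit(0, HPAGE_PMD_ORDER)
    return plan


def creep_history(bitmap, enabled_orders, max_ptes_none, cap=True, max_scans=10):
    """Largest order chosen on each repeated scan, stopping once it stops growing."""
    bitmap = list(bitmap)
    history = []
    for _ in range(max_scans):
        plan = plan_collapses(bitmap, enabled_orders, max_ptes_none, cap)
        if not plan:
            break
        for start, order in plan:
            # Model the collapse: every PTE in the range is now populated.
            bitmap[start:start + (1 << order)] = [True] * (1 << order)
        history.append(max(order for _, order in plan))
        if len(history) > 1 and history[-1] == history[-2]:
            break
    return history


# Usage: 16 populated PTEs at the start of the PMD, mTHP orders 2-8 enabled,
# PMD order disabled, max_ptes_none = 256.
populated = [True] * 16 + [False] * (HPAGE_PMD_NR - 16)
mthp_orders = set(range(2, HPAGE_PMD_ORDER))
print(creep_history(populated, mthp_orders, 256, cap=False))  # [5, 6, 7, 8, 8]
print(creep_history(populated, mthp_orders, 256))             # [4, 4]
```

Without the cap, each collapse leaves exactly half of the next-larger range populated, which meets the uncapped threshold at that order, so the range climbs one order per scan. With the cap, the tolerated count at each mTHP order becomes half the range minus one, and the same 16 PTEs settle at order 4.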
I think it's less 'wiggle room' and more us being able to _abstract_ what
this measurement means while reserving the right to adjust this.

But maybe we are saying the same thing in different ways.

>
> I had something logarithmic in mind which would roughly be (ignoring the
> weird -1 for simplicity and expressing it as "used" instead of none-or-zero)
>
> 0 -> ~100% used (~0% none)

So equivalent to 0 today?

> 1 -> ~50% used (~50% none)
> 2 -> ~25% used (~75% none)
> 3 -> ~12.5% used (~87.5% none)
> 4 -> ~6.25% used (~93.75% none)
> ...
> 10 -> ~0% used (~100% none)

So equivalent to 511 today?

And with a logarithmic weighting towards values closer to "0% used"? This
seems sensible given the only reports we've had of non-0/511 uses here are
in that range...

But of course this interpretation should be something we determine and
treat as an implementation detail that we can modify later.

>
> Mapping that to actual THP sizes (#pages in a thp) on an arch will be easy.

And at different mTHP levels too, right?

>
> --
> Cheers
>
> David / dhildenb
>

Cheers, Lorenzo
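David's logarithmic scale, together with the question of applying it at different mTHP sizes, can be sketched as a per-order mapping. Assumptions: each level halves the required fraction of used PTEs, as levels 0-3 in the quoted list do; the "-1" is ignored, as David suggests; at least one PTE must be populated for any collapse; and eagerness_to_max_ptes_none is a hypothetical helper, not an existing kernel interface (the eagerness knob is only a proposal in this thread).

```python
# Hypothetical mapping from the proposed 0-10 "eagerness" level to a
# max_ptes_none value for a collapse of a given order; not kernel code.
# Assumes each level halves the required fraction of used PTEs, ignores the
# "-1" adjustment, and requires at least one used PTE for any collapse.
MIN_EAGERNESS, MAX_EAGERNESS = 0, 10


def eagerness_to_max_ptes_none(eagerness: int, order: int) -> int:
    if not MIN_EAGERNESS <= eagerness <= MAX_EAGERNESS:
        raise ValueError(f"eagerness must be in [{MIN_EAGERNESS}, {MAX_EAGERNESS}]")
    nr = 1 << order  # PTEs covered by a folio of this order
    # Required used PTEs: ceil(nr / 2**eagerness), but never fewer than one.
    used = max(1, (nr + (1 << eagerness) - 1) >> eagerness)
    return nr - used


# Order 9 is a PMD THP and order 4 a 64K mTHP, assuming 4K base pages.
for order in (9, 4):
    print(order, [eagerness_to_max_ptes_none(e, order) for e in range(11)])
# 9 [0, 256, 384, 448, 480, 496, 504, 508, 510, 511, 511]
# 4 [0, 8, 12, 14, 15, 15, 15, 15, 15, 15, 15]
```

At PMD order, levels 0 and 10 land on 0 and 511, the two ends of today's max_ptes_none range. Because the value is derived from the folio size, a single level also yields a proportional threshold for each mTHP order. The upper levels coincide for small orders, since one populated PTE is the floor.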
