No creep, because you'll always collapse.

OK so in the 511 scenario, do we simply immediately collapse to the largest
possible _mTHP_ page size based on adjacent none/zero page entries in the
PTE table, and _never_ collapse to PMD on this basis even if we do have
sufficient none/zero PTE entries to do so?

Right. And if we fail to allocate a PMD, we would collapse to smaller sizes, and later, once a PMD is possible, collapse to a PMD.

But there is no creep, as we would have collapsed a PMD right from the start either way.


And only collapse to PMD size if we have sufficient adjacent PTE entries that
are populated?

Let's really nail this down actually so we can be super clear what the issue is
here.


I hope what I wrote above made sense.



Creep only happens if you wouldn't collapse a PMD without prior mTHP
collapse, but suddenly would in the same scenario simply because you had
prior mTHP collapse.

At least that's my understanding.

OK, that makes sense, is the logic (this may be part of the bit I haven't
reviewed yet tbh) then that for khugepaged mTHP we have the system where we
always require prior mTHP collapse _first_?

So I would describe creep as

"we would not collapse a PMD THP because max_ptes_none is violated, but because we collapsed smaller mTHP THPs before, we essentially suddenly have more PTEs that are not none-or-zero, making us suddenly collapse a PMD THP at the same place".

Assume the following: max_ptes_none = 256

This means we would only collapse if at most half (256/512) of the PTEs are none-or-zero.

But imagine the following simplified PTE layout, with PMD = 8 entries:

[ P Z P Z P Z Z Z ]

3 Present vs. 5 Zero -> do not collapse a PMD (8)

But assume we collapse smaller mTHP (2 entries) first

[ P P P P P P Z Z ]

We collapsed 3x "P Z" into "P P" because the ratio allowed for it.

Suddenly we have

6 Present vs 2 Zero and we collapse a PMD (8)

[ P P P P P P P P ]

That's the "creep" problem.
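The 8-entry example above can be replayed with a small toy model. This is only a sketch, not khugepaged code: should_collapse() and collapse_pass() are hypothetical names, and scaling the none/zero limit proportionally to the window size is an assumption made so that the 2-entry step matches the "ratio" wording above. max_ptes_none = 4 stands in for 256/512 in the 8-entry model.

```python
def should_collapse(ptes, max_ptes_none):
    """Collapse only if the count of none/zero entries is within the limit."""
    return ptes.count("Z") <= max_ptes_none


def collapse_pass(ptes, size, max_ptes_none_pmd):
    """Scan aligned windows of `size` entries and collapse those that pass.

    Assumption: the PMD-level limit is scaled proportionally to the
    window size (e.g. 4 of 8 at PMD level becomes 1 of 2).
    """
    pmd_size = len(ptes)
    limit = max_ptes_none_pmd * size // pmd_size
    out = list(ptes)
    for start in range(0, pmd_size, size):
        window = out[start:start + size]
        if should_collapse(window, limit):
            # A collapsed window is fully populated afterwards.
            out[start:start + size] = ["P"] * size
    return out


layout = list("PZPZPZZZ")   # 3 present, 5 zero
max_ptes_none = 4           # "at most half", scaled down from 256/512

print(should_collapse(layout, max_ptes_none))      # PMD check before mTHP collapse
after_mthp = collapse_pass(layout, 2, max_ptes_none)
print("".join(after_mthp))                         # layout after the 2-entry pass
print(should_collapse(after_mthp, max_ptes_none))  # PMD check after mTHP collapse
```

In this model the first PMD check fails, while the same check succeeds after the 2-entry pass, even though nothing touched the memory in between. That flip is the creep.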




max_ptes_none == 0 -> collapse mTHP only if all PTEs are non-none/zero

And for the intermediate values

(1) pr_warn() when mTHPs are enabled, stating that mTHP collapse is not
supported yet with other values

It feels a bit much to issue a kernel warning every time somebody twiddles that
value, and it's kind of against user expectation a bit.

pr_warn_once() is what I meant.

Right, but even then it feels a bit extreme, warnings are pretty serious
things. Then again there's precedent for this, and it may be the least bad
solution.

I just picture a cloud provider turning this on with mTHP then getting their
monitoring team reporting some urgent communication about warnings in dmesg :)

I mean, one could make the states mutually exclusive, maybe?

Disallow enabling mTHP with max_ptes_none set to unsupported values and the other way around.

That would probably be cleanest, although the implementation might get a bit more involved (but it's solvable).

But the concern could be that there are configs that could suddenly break: someone that set max_ptes_none and enabled mTHP.
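To make the mutual-exclusion idea concrete, here is a toy model of the two checks. Everything in it is hypothetical: KhugepagedConfig, set_max_ptes_none() and enable_mthp() are illustrative names, not kernel interfaces, and treating 0 and 511 as the only values supported together with mTHP is an assumption taken from this discussion.

```python
HPAGE_PMD_NR = 512  # PTEs per PMD, matching the 256/512 example above

# Assumption: only the two extremes are supported together with mTHP.
SUPPORTED_WITH_MTHP = {0, HPAGE_PMD_NR - 1}


class KhugepagedConfig:
    """Hypothetical model of the two tunables and their mutual exclusion."""

    def __init__(self):
        self.max_ptes_none = HPAGE_PMD_NR - 1  # default (511)
        self.mthp_enabled = False

    def set_max_ptes_none(self, value):
        if not 0 <= value <= HPAGE_PMD_NR - 1:
            raise ValueError("max_ptes_none out of range")
        # Reject intermediate values while mTHP collapse is enabled.
        if self.mthp_enabled and value not in SUPPORTED_WITH_MTHP:
            raise ValueError("value not supported with mTHP enabled")
        self.max_ptes_none = value

    def enable_mthp(self):
        # Reject enabling mTHP while an intermediate value is configured.
        if self.max_ptes_none not in SUPPORTED_WITH_MTHP:
            raise ValueError("mTHP not supported with this max_ptes_none")
        self.mthp_enabled = True
```

In this model, whichever setting is written second is the one that fails, so the order in which a setup script writes the two knobs decides which write gets rejected.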


I'll note that we could also consider only supporting "max_ptes_none = 511" (default) to start with.

The nice thing about that value is that it is fully supported with the underused shrinker, because max_ptes_none=511 -> never shrink.

--
Cheers

David / dhildenb

