On 4/27/26 1:52 PM, David Hildenbrand (Arm) wrote:
> On 4/19/26 20:57, Nico Pache wrote:
>> The following cleanup reworks all the max_ptes_* handling into helper
>> functions. This increases the code readability and will later be used to
>> implement the mTHP handling of these variables.
>>
>> With these changes we abstract all the madvise_collapse() special casing
>> (dont respect the sysctls) away from the functions that utilize them. And
>> will later in this series to cleanly restrict mTHP collapses behaviors.
>>
>> Suggested-by: David Hildenbrand <[email protected]>
>> Signed-off-by: Nico Pache <[email protected]>
>> ---
>>   mm/khugepaged.c | 114 +++++++++++++++++++++++++++++++++---------------
>>   1 file changed, 78 insertions(+), 36 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index afac6bc4e76d..f42b55421191 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -348,6 +348,58 @@ static bool pte_none_or_zero(pte_t pte)
>>      return pte_present(pte) && is_zero_pfn(pte_pfn(pte));
>>   }
>>
>> +/**
>> + * collapse_max_ptes_none - Calculate maximum allowed empty PTEs for 
>> collapse
>
> empty PTE or PTE mapping the shared zeropage ? That should be clarified also 
> below.

Ah fair point, "empty" isn't the best representation of a "none"/zeropage.

>
>> + * @cc: The collapse control struct
>> + * @vma: The vma to check for userfaultfd
>> + *
>> + * If we are not in khugepaged mode use HPAGE_PMD_NR to allow any
>> + * empty page.
>
> Not completely accurate due to uffd. And it's not really "empty page".

Sorry I forgot to update this comment. I originally planned on skipping
the VMA passing, but then figured later that it would make the code even
more uniform (as you suggested)

>
> Is that information really necessary for the caller? I'd suggest you drop this
> here and instead add a comment inline above the "return HPAGE_PMD_NR;".

Yeah, I'm not really sure; I can shorten them. I was heeding to lorenzos
request to add these with docstring headers

>
>> + *
>> + * Return: Maximum number of empty PTEs allowed for the collapse operation
>> + */
>> +static unsigned int collapse_max_ptes_none(struct collapse_control *cc,
>> +            struct vm_area_struct *vma)
>> +{
>> +    if (vma && userfaultfd_armed(vma))
>> +            return 0;
>> +    if (!cc->is_khugepaged)
>> +            return HPAGE_PMD_NR;
>> +    return khugepaged_max_ptes_none;
>> +}
>> +
>> +/**
>> + * collapse_max_ptes_shared - Calculate maximum allowed shared PTEs for 
>> collapse
>
> "shared PTE" is not quite clear.
>
> "PTEs that map shared anonymous pages" ?

That works for me, thank you

>
>> + * @cc: The collapse control struct
>> + *
>> + * If we are not in khugepaged mode use HPAGE_PMD_NR to allow any
>> + * shared page.
>
> Same comment as above.

ack

>
>> + *
>> + * Return: Maximum number of shared PTEs allowed for the collapse operation
>> + */
>> +static unsigned int collapse_max_ptes_shared(struct collapse_control *cc)
>> +{
>> +    if (!cc->is_khugepaged)
>> +            return HPAGE_PMD_NR;
>> +    return khugepaged_max_ptes_shared;
>> +}
>> +
>> +/**
>> + * collapse_max_ptes_swap - Calculate maximum allowed swap PTEs for collapse
>
> We're actually checking non-present page table entries (anonymous THP 
> collapse)
> or non-present pagecache entries (file THP collapse).
>
> I wonder if there is an easy way to clarify that here, at least in the
> description (confusing name can stay unless we find something better).

I'll update the comment to include some form of this. In my mind the
name should probably stay relatively consistent to the sysctl value.

>
>> + * @cc: The collapse control struct
>> + *
>> + * If we are not in khugepaged mode use HPAGE_PMD_NR to allow any
>> + * swap page.
>
> Dito.

ack!

>
>> + *
>> + * Return: Maximum number of swap PTEs allowed for the collapse operation
>> + */
>> +static unsigned int collapse_max_ptes_swap(struct collapse_control *cc)
>> +{
>> +    if (!cc->is_khugepaged)
>> +            return HPAGE_PMD_NR;
>> +    return khugepaged_max_ptes_swap;
>> +}
>> +
>>   int hugepage_madvise(struct vm_area_struct *vma,
>>                   vm_flags_t *vm_flags, int advice)
>>   {
>> @@ -546,21 +598,19 @@ static enum scan_result 
>> __collapse_huge_page_isolate(struct vm_area_struct *vma,
>>      pte_t *_pte;
>>      int none_or_zero = 0, shared = 0, referenced = 0;
>>      enum scan_result result = SCAN_FAIL;
>> +    unsigned int max_ptes_none = collapse_max_ptes_none(cc, vma);
>> +    unsigned int max_ptes_shared = collapse_max_ptes_shared(cc);
>
> These could be const, right? Or will that change in future patches?

Yes I believe these can be const now! Thank you

>
>>
>>      for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
>>           _pte++, addr += PAGE_SIZE) {
>>              pte_t pteval = ptep_get(_pte);
>>              if (pte_none_or_zero(pteval)) {
>> -                    ++none_or_zero;
>> -                    if (!userfaultfd_armed(vma) &&
>> -                        (!cc->is_khugepaged ||
>> -                         none_or_zero <= khugepaged_max_ptes_none)) {
>> -                            continue;
>> -                    } else {
>> +                    if (++none_or_zero > max_ptes_none) {
>>                              result = SCAN_EXCEED_NONE_PTE;
>>                              count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>>                              goto out;
>>                      }
>> +                    continue;
>>              }
>>              if (!pte_present(pteval)) {
>>                      result = SCAN_PTE_NON_PRESENT;
>> @@ -591,9 +641,7 @@ static enum scan_result 
>> __collapse_huge_page_isolate(struct vm_area_struct *vma,
>>
>>              /* See collapse_scan_pmd(). */
>>              if (folio_maybe_mapped_shared(folio)) {
>> -                    ++shared;
>> -                    if (cc->is_khugepaged &&
>> -                        shared > khugepaged_max_ptes_shared) {
>> +                    if (++shared > max_ptes_shared) {
>>                              result = SCAN_EXCEED_SHARED_PTE;
>>                              count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
>>                              goto out;
>> @@ -1270,6 +1318,9 @@ static enum scan_result collapse_scan_pmd(struct 
>> mm_struct *mm,
>>      unsigned long addr;
>>      spinlock_t *ptl;
>>      int node = NUMA_NO_NODE, unmapped = 0;
>> +    unsigned int max_ptes_none = collapse_max_ptes_none(cc, vma);
>> +    unsigned int max_ptes_shared = collapse_max_ptes_shared(cc);
>> +    unsigned int max_ptes_swap = collapse_max_ptes_swap(cc);
>
> Same question here.

ack! will adjust.

>
>>
>>      VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
>>
>
>
> In general, LGTM. With the doc fixed up
>
> Acked-by: David Hildenbrand (Arm) <[email protected]>

Thank you Ill get those updated.
>


Reply via email to