On 4/27/26 2:13 PM, David Hildenbrand (Arm) wrote:
> On 4/19/26 20:57, Nico Pache wrote:
>> Pass an order and offset to collapse_huge_page() to support collapsing anon
>> memory to arbitrary orders within a PMD. order indicates which mTHP size we
>> are attempting to collapse to, and offset indicates where in the PMD to
>> start the collapse attempt.
>>
>> For non-PMD collapse we must leave the anon VMA write locked until after
>> we collapse the mTHP: in the PMD case all the pages are isolated, but in
>> the mTHP case they are not, and we must keep the lock to prevent
>> access/changes to the page tables. Without it, rmap walkers could hit a
>> pmd_none() while the PMD entry is temporarily cleared during the
>> collapse phase.
>>
>> Signed-off-by: Nico Pache <[email protected]>
>> ---
>> mm/khugepaged.c | 103 +++++++++++++++++++++++++++---------------------
>> 1 file changed, 57 insertions(+), 46 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 283bb63854a5..ff6f9f1883ed 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1198,42 +1198,36 @@ static enum scan_result alloc_charge_folio(struct folio **foliop, struct mm_stru
>> return SCAN_SUCCEED;
>> }
>>
>> -static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long address,
>> - int referenced, int unmapped, struct collapse_control *cc)
>> +static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long start_addr,
>> + int referenced, int unmapped, struct collapse_control *cc,
>> + unsigned int order)
>> {
>> LIST_HEAD(compound_pagelist);
>> pmd_t *pmd, _pmd;
>> - pte_t *pte;
>> + pte_t *pte = NULL;
>> pgtable_t pgtable;
>> struct folio *folio;
>> spinlock_t *pmd_ptl, *pte_ptl;
>> enum scan_result result = SCAN_FAIL;
>> struct vm_area_struct *vma;
>> struct mmu_notifier_range range;
>> + bool anon_vma_locked = false;
>> + const unsigned long pmd_addr = start_addr & HPAGE_PMD_MASK;
>> + const unsigned long end_addr = start_addr + (PAGE_SIZE << order);
>
> In general, const declarations read better when they are at the very top of
> this list.
Ack! Still getting into the habit of this, sorry.
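For clarity, the declaration block with the consts hoisted would look roughly
like this (same declarations as in the hunk above, just reordered; sketch
only, elided lines marked):

```c
	const unsigned long pmd_addr = start_addr & HPAGE_PMD_MASK;
	const unsigned long end_addr = start_addr + (PAGE_SIZE << order);
	LIST_HEAD(compound_pagelist);
	pmd_t *pmd, _pmd;
	pte_t *pte = NULL;
	pgtable_t pgtable;
	/* ... remaining declarations unchanged ... */
	bool anon_vma_locked = false;
```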
>
>>
>> - VM_BUG_ON(address & ~HPAGE_PMD_MASK);
>> -
>> - /*
>> - * Before allocating the hugepage, release the mmap_lock read lock.
>> - * The allocation can take potentially a long time if it involves
>> - * sync compaction, and we do not need to hold the mmap_lock during
>> - * that. We will recheck the vma after taking it again in write mode.
>> - */
>> - mmap_read_unlock(mm);
>> -
>
> You should spell out that locking change (moving it to the caller), and why it
> is required, in the patch description.
>
> I'd even have put this into a separate patch, as it's independent of the
> order-passing changes.
Yeah, good point. I'll separate this out into its own patch. I also
realized I never updated the commit message for this patch to properly
describe this change. Whoops.
> [...]
>
>> */
>> __folio_mark_uptodate(folio);
>> - pgtable = pmd_pgtable(_pmd);
>> -
>> spin_lock(pmd_ptl);
>> - BUG_ON(!pmd_none(*pmd));
>> - pgtable_trans_huge_deposit(mm, pmd, pgtable);
>> - map_anon_folio_pmd_nopf(folio, pmd, vma, address);
>> + WARN_ON_ONCE(!pmd_none(*pmd));
>> + if (is_pmd_order(order)) { /* PMD collapse */
>> + pgtable = pmd_pgtable(_pmd);
>> + pgtable_trans_huge_deposit(mm, pmd, pgtable);
>> + map_anon_folio_pmd_nopf(folio, pmd, vma, pmd_addr);
>> + } else { /* mTHP collapse */
>
> Do both these comments (PMD collapse ...) really add any value? I'd say it's
> pretty self-documenting code already.
Yeah, with the new is_pmd_order() helper it's definitely a little overkill.
I'll remove them. Thanks!
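So the branch would read (same code as the hunk above, with the two comments
dropped):

```c
	WARN_ON_ONCE(!pmd_none(*pmd));
	if (is_pmd_order(order)) {
		pgtable = pmd_pgtable(_pmd);
		pgtable_trans_huge_deposit(mm, pmd, pgtable);
		map_anon_folio_pmd_nopf(folio, pmd, vma, pmd_addr);
	} else {
		map_anon_folio_pte_nopf(folio, pte, vma, start_addr,
					/*uffd_wp=*/ false);
		smp_wmb(); /* make PTEs visible before PMD. See pmd_install() */
		pmd_populate(mm, pmd, pmd_pgtable(_pmd));
	}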
>
>> + map_anon_folio_pte_nopf(folio, pte, vma, start_addr, /*uffd_wp=*/ false);
>> + smp_wmb(); /* make PTEs visible before PMD. See pmd_install() */
>> + pmd_populate(mm, pmd, pmd_pgtable(_pmd));
>> + }
>> spin_unlock(pmd_ptl);
>>
>> folio = NULL;
>>
>> result = SCAN_SUCCEED;
>> out_up_write:
>> + if (anon_vma_locked)
>> + anon_vma_unlock_write(vma->anon_vma);
>> + if (pte)
>> + pte_unmap(pte);
>