On Tue, Jun 03, 2025 at 11:16:33PM +0200, David Hildenbrand wrote:
> Marking PMDs that map "normal" refcounted folios as special is
> against the rules documented for vm_normal_page().
> 
> Fortunately, there are not that many pmd_special() checks that can be
> misled, and most vm_normal_page_pmd()/vm_normal_folio_pmd() users that
> would currently get this wrong are rather harmless: e.g., none of them
> so far bases the decision whether to grab a folio reference on that
> check.
> 
> Well, and GUP-fast will fall back to GUP-slow. All in all, no big
> implications so far, it seems.
> 
> Getting this right will become more important as we use
> vm_normal_folio_pmd() in more places.
> 
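(For context: vm_normal_page_pmd() treats a special entry as "there is
no struct page to look at", so marking a refcounted folio's PMD special
hides the folio from it. Roughly -- paraphrased, not the exact upstream
code:

	struct page *vm_normal_page_pmd(struct vm_area_struct *vma,
					unsigned long addr, pmd_t pmd)
	{
		/* Special entries, e.g. huge pfnmaps, carry no struct page. */
		if (unlikely(pmd_special(pmd)))
			return NULL;

		/* (pfnmap/mixedmap corner cases elided) */
		return pfn_to_page(pmd_pfn(pmd));
	}

hence the documented rule that normal, refcounted folios must never be
mapped with the special bit set.)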
> Fix it by just inlining the relevant code, making the whole
> pmd_none() handling cleaner. We can now use folio_mk_pmd().
> 
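(folio_mk_pmd() is the recent helper that builds the huge entry
directly from a folio; if I remember its definition correctly, it is
essentially:

	static inline pmd_t folio_mk_pmd(struct folio *folio, pgprot_t pgprot)
	{
		return pmd_mkhuge(pfn_pmd(folio_pfn(folio), pgprot));
	}

so the open-coded insertion path below stays quite readable.)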
> While at it, make sure that a pmd that is not-none is actually present
> before comparing PFNs.
> 
> Fixes: 6c88f72691f8 ("mm/huge_memory: add vmf_insert_folio_pmd()")
> Signed-off-by: David Hildenbrand <[email protected]>

Hi David,

> ---
>  mm/huge_memory.c | 39 ++++++++++++++++++++++++++++++++-------
>  1 file changed, 32 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d3e66136e41a3..f9e23dfea76f8 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1474,9 +1474,10 @@ vm_fault_t vmf_insert_folio_pmd(struct vm_fault *vmf, struct folio *folio,
>       struct vm_area_struct *vma = vmf->vma;
>       unsigned long addr = vmf->address & PMD_MASK;
>       struct mm_struct *mm = vma->vm_mm;
> +     pmd_t *pmd = vmf->pmd;
>       spinlock_t *ptl;
>       pgtable_t pgtable = NULL;
> -     int error;
> +     pmd_t entry;
>  
>       if (addr < vma->vm_start || addr >= vma->vm_end)
>               return VM_FAULT_SIGBUS;
> @@ -1490,17 +1491,41 @@ vm_fault_t vmf_insert_folio_pmd(struct vm_fault *vmf, struct folio *folio,
>                       return VM_FAULT_OOM;
>       }
>  
> -     ptl = pmd_lock(mm, vmf->pmd);
> -     if (pmd_none(*vmf->pmd)) {
> +     ptl = pmd_lock(mm, pmd);
> +     if (pmd_none(*pmd)) {
>               folio_get(folio);
>               folio_add_file_rmap_pmd(folio, &folio->page, vma);
>               add_mm_counter(mm, mm_counter_file(folio), HPAGE_PMD_NR);
> +
> +             entry = folio_mk_pmd(folio, vma->vm_page_prot);
> +             if (write) {
> +                     entry = pmd_mkyoung(pmd_mkdirty(entry));
> +                     entry = maybe_pmd_mkwrite(entry, vma);
> +             }
> +             set_pmd_at(mm, addr, pmd, entry);
> +             update_mmu_cache_pmd(vma, addr, pmd);
> +
> +             if (pgtable) {
> +                     pgtable_trans_huge_deposit(mm, pmd, pgtable);
> +                     mm_inc_nr_ptes(mm);
> +                     pgtable = NULL;
> +             }
> +     } else if (pmd_present(*pmd) && write) {
> +             /*
> +              * We only allow for upgrading write permissions if the
> +              * same folio is already mapped.
> +              */
> +             if (pmd_pfn(*pmd) == folio_pfn(folio)) {
> +                     entry = pmd_mkyoung(*pmd);
> +                     entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
> +                     if (pmdp_set_access_flags(vma, addr, pmd, entry, 1))
> +                             update_mmu_cache_pmd(vma, addr, pmd);
> +             } else {
> +                     WARN_ON_ONCE(!is_huge_zero_pmd(*pmd));
> +             }

So, this is pretty much insert_pfn_pmd() without the
pmd_mkdevmap()/pmd_mkspecial() handling. I guess vmf_insert_folio_pmd()
doesn't have to be concerned with devmaps either, right?
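
For reference, the part being dropped -- quoting insert_pfn_pmd() from
memory, so modulo details:

	entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
	if (pfn_t_devmap(pfn))
		entry = pmd_mkdevmap(entry);
	else
		entry = pmd_mkspecial(entry);

i.e., exactly the special/devmap marking that must not be applied to a
refcounted folio.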

Looks good to me, just a nit: would it not be better to pass a boolean
to insert_pfn_pmd() that lets it know whether it "can" create
devmap/special entries? Something like the sketch below.
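
Untested and simplified, just to illustrate the idea (the "special"
parameter name is made up):

	static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
			pmd_t *pmd, pfn_t pfn, pgprot_t prot, bool write,
			pgtable_t pgtable, bool special)
	{
		pmd_t entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));

		/* Only pfn-based callers without a struct page set this. */
		if (special) {
			if (pfn_t_devmap(pfn))
				entry = pmd_mkdevmap(entry);
			else
				entry = pmd_mkspecial(entry);
		}
		if (write) {
			entry = pmd_mkyoung(pmd_mkdirty(entry));
			entry = maybe_pmd_mkwrite(entry, vma);
		}
		/* (pmd_none() check, pgtable deposit etc. as today) */
		set_pmd_at(vma->vm_mm, addr, pmd, entry);
		update_mmu_cache_pmd(vma, addr, pmd);
	}

vmf_insert_folio_pmd() would then pass special=false and we would keep
a single place that builds the entry.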


-- 
Oscar Salvador
SUSE Labs
