On 13.06.25 16:00, Lorenzo Stoakes wrote:
On Fri, Jun 13, 2025 at 03:53:58PM +0200, David Hildenbrand wrote:
On 13.06.25 15:49, Oscar Salvador wrote:
On Fri, Jun 13, 2025 at 11:27:01AM +0200, David Hildenbrand wrote:
Marking PMDs that map "normal" refcounted folios as special is
against our rules documented for vm_normal_page(): normal (refcounted)
folios shall never have the page table mapping marked as special.

Fortunately, there are not that many pmd_special() checks that can be
misled, and most vm_normal_page_pmd()/vm_normal_folio_pmd() users that
would get this wrong right now are rather harmless: e.g., none so far
bases its decision whether to grab a folio reference on the result.

Well, and GUP-fast will fall back to GUP-slow. All in all, there seem
to be no big implications so far.

Getting this right will become more important as we use
folio_normal_page_pmd() in more places.

Fix it by teaching insert_pfn_pmd() to properly handle folios and
pfns -- moving refcount/mapcount/etc handling in there, renaming it to
insert_pmd(), and distinguishing between both cases using a new simple
"struct folio_or_pfn" structure.

Use folio_mk_pmd() to create a pmd for a folio cleanly.

Fixes: 6c88f72691f8 ("mm/huge_memory: add vmf_insert_folio_pmd()")
Reviewed-by: Jason Gunthorpe <j...@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoa...@oracle.com>
Reviewed-by: Dan Williams <dan.j.willi...@intel.com>
Tested-by: Dan Williams <dan.j.willi...@intel.com>
Signed-off-by: David Hildenbrand <da...@redhat.com>

Although we have it quite well explained here in the changelog, maybe
having a little comment in insert_pmd() noting why PMDs mapping normal
folios cannot be marked special would be nice.

Well, I don't think we should be replicating that all over the place. The
big comment above vm_normal_page() is currently our source of truth (which I
will tweak further soon).

Suggestion:

"Kinda self-explanatory (special means don't touch) unless you use museum piece
hardware OR IF YOU ARE XEN!"

;)

I looked into the XEN stuff and it is *extremely* nasty.

No, it doesn't do a pte_mkspecial(). It updates the PTE using ...

        !!! A HYPERCALL !!!

WTF, why did we ever allow that.

It's documented to require GUP to work because ... QEMU AIO. Otherwise we could easily convert it to a proper PFNMAP.

--
Cheers,

David / dhildenb
