On 12/9/20 6:14 PM, Matthew Wilcox wrote:
> On Wed, Dec 09, 2020 at 12:24:38PM -0400, Jason Gunthorpe wrote:
>> On Wed, Dec 09, 2020 at 04:02:05PM +0000, Joao Martins wrote:
>>
>>> Today (without the series) struct pages are not represented the way they
>>> are expressed in the page tables, which is what I am hoping to fix in this
>>> series thus initializing these as compound pages of a given order. But me
>>> introducing PGMAP_COMPOUND was to conservatively keep both old
>>> (non-compound) and new (compound pages) co-exist.
>>
>> Oooh, that I didn't know.. That is kind of horrible to have a PMD
>> pointing at an order 0 page only in this one special case.
> 
> Uh, yes.  I'm surprised it hasn't caused more problems.
> 
There were one or two problems in the KVM MMU related to zone device pages.

See commit e851265a816f ("KVM: x86/mmu: Use huge pages for DAX-backed files"),
which eventually led to commit db5432165e9b5 ("KVM: x86/mmu: Walk host page
tables to find THP mappings") so that KVM would be less sensitive to metadata
changes.

>> Still, I think it would be easier to teach record_subpages() that a
>> PMD doesn't necessarily point to a high order page, eg do something
>> like I suggested for the SGL where it extracts the page order and
>> iterates over the contiguous range of pfns.
> 
> But we also see good performance improvements from doing all reference
> counts on the head page instead of spread throughout the pages, so we
> really want compound pages.

Going further than just refcounts and borrowing your (or someone else's?)
idea, perhaps we could also add a FOLL_HEAD gup flag that would let us work
only with head pages (or folios). That would consequently let us pin/grab
bigger swathes of memory, e.g. 1G (in 2M head pages) or 512G (in 1G head
pages), with just 1 page for storing the struct page pointers[*]. Albeit I
suspect the numbers would have to justify it.

        Joao

[*] One page happens to be what's used for RDMA/umem and vdpa as callers
of pin_user_pages*()