On Fri, Feb 08, 2019 at 07:43:57AM +0530, Anshuman Khandual wrote:
> Hello,
> 
> THP is currently supported for
> 
> - PMD level pages (anon and file)
> - PUD level pages (file - DAX file system)
> 
> THP is a single entry mapping at standard page table levels (either PMD
> or PUD)
> 
> But architectures like ARM64 support non-standard page table level huge
> pages with contiguous bits.
> 
> - These are created as multiple entries at either PTE or PMD level
> - These multiple entries carry pages which are physically contiguous
> - A special PTE bit (PTE_CONT) is set in each entry to mark it as contiguous
> 
> These multiple contiguous entries create a huge page size which is different
> from the standard PMD/PUD level sizes, but they provide the benefits of huge
> memory, such as fewer faults, bigger TLB coverage, fewer TLB misses etc.
> 
> Currently they are used as HugeTLB pages because
> 
>       - The HugeTLB page size is carried in the VMA
>       - The page table walker can operate on multiple PTE or PMD entries
>         given the size in the VMA
>       - Irrespective of the HugeTLB page size, it is operated on with
>         set_huge_pte_at() at any level
>       - set_huge_pte_at() is arch specific and knows how to encode multiple
>         consecutive entries
>       
> But not as THP huge pages because
> 
>       - The THP size is not encoded anywhere, such as in the VMA
>       - The page table walker expects it to be either at PUD
>         (HPAGE_PUD_SIZE) or at PMD (HPAGE_PMD_SIZE)
>       - Page table updates go directly through set_pmd_at() or set_pud_at()
>       - Directly faulted or promoted huge pages are verified with
>         [pmd|pud]_trans_huge()
> 
> How non-standard huge pages could be supported for THP
> 
>       - THP starts recognizing non-standard huge page sizes (exported by
>         the arch) like HPAGE_CONT_(PMD|PTE)_SIZE
>       - THP starts operating on either HPAGE_PMD_SIZE, HPAGE_CONT_PMD_SIZE
>         or HPAGE_CONT_PTE_SIZE
>       - set_pmd_at() only recognizes HPAGE_PMD_SIZE, hence replace
>         set_pmd_at() with set_huge_pmd_at()
>       - set_huge_pmd_at() could differentiate between HPAGE_PMD_SIZE and
>         HPAGE_CONT_PMD_SIZE
>       - In the case of HPAGE_CONT_PTE_SIZE, extend the page table walker
>         down to the PTE level
>       - Use set_huge_pte_at(), which can operate on multiple contiguous PTE
>         entries
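
For reference, the sizes under discussion can be sketched in plain
userspace C. This is only an illustration: the constants assume the arm64
4K-granule geometry (16 contiguous entries at either level), and the
enum thp_level, thp_nr_entries() and thp_size() names are hypothetical
and do not exist in the kernel.

```c
#include <assert.h>

/* Assumed arm64 4K-granule geometry; other granules use other values. */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21
#define CONT_PTES  16	/* entries in one contiguous PTE range */
#define CONT_PMDS  16	/* entries in one contiguous PMD range */

/* Hypothetical tag for the three sizes named in the proposal. */
enum thp_level { THP_CONT_PTE, THP_PMD, THP_CONT_PMD };

/* Number of page table entries that must be written for one huge page. */
static unsigned int thp_nr_entries(enum thp_level level)
{
	switch (level) {
	case THP_CONT_PTE:
		return CONT_PTES;
	case THP_PMD:
		return 1;
	case THP_CONT_PMD:
		return CONT_PMDS;
	}
	assert(!"unknown level");
	return 0;
}

/* Bytes mapped by one huge page of the given kind. */
static unsigned long thp_size(enum thp_level level)
{
	unsigned int shift = (level == THP_CONT_PTE) ? PAGE_SHIFT : PMD_SHIFT;

	return (unsigned long)thp_nr_entries(level) << shift;
}
```

Under these assumptions the three sizes are 64 KiB, 2 MiB and 32 MiB, and
only the PMD case is installed with a single entry.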

You have only listed the trivial things. All the tricky stuff is in what
makes THP transparent.

To consider it seriously, we need to understand what it means for
split_huge_p?d()/split_huge_page(). How will khugepaged deal with this?

In particular, I'm worried about exposing (to the user or the CPU) page
table state in the middle of a conversion (huge->small or small->huge).
Handling this at the page table level provides a level of atomicity that
you will not have.
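
A toy userspace model of that concern, not kernel code: install_range()
and range_is_mixed() are made-up helpers, the table is a plain array, and
the only real-world detail borrowed is that the arm64 contiguous hint sits
in bit 52. It counts how many intermediate states an observer could see
while a range is being written entry by entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CONT    16
#define ENTRY_CONT (1ULL << 52)	/* contiguous hint bit, as on arm64 */

/* True if some, but not all, entries in the range carry the hint. */
static bool range_is_mixed(const uint64_t *tbl, size_t n)
{
	size_t set = 0;

	for (size_t i = 0; i < n; i++)
		if (tbl[i] & ENTRY_CONT)
			set++;
	return set != 0 && set != n;
}

/*
 * Write n entries one store at a time and return how many of the states
 * visible after a store were mixed (partly converted).
 */
static size_t install_range(uint64_t *tbl, size_t n, uint64_t base_pfn)
{
	size_t mixed = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		tbl[i] = ((base_pfn + i) << 12) | ENTRY_CONT;
		if (range_is_mixed(tbl, n))
			mixed++;
	}
	return mixed;
}
```

In this model a single-entry range (the PMD case) never passes through a
mixed state, while a 16-entry range, starting from an empty table, passes
through 15 of them.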

Honestly, I'm very skeptical about the idea. It took a lot of time to
stabilize THP for a single page size, equal to the PMD level, but this
looks like a new can of worms. :P

It *might* be possible to support it for DAX, but beyond that...

-- 
 Kirill A. Shutemov
