On Fri, Feb 08, 2019 at 07:43:57AM +0530, Anshuman Khandual wrote:
> Hello,
>
> THP is currently supported for
>
> - PMD level pages (anon and file)
> - PUD level pages (file - DAX file system)
>
> THP is a single entry mapping at standard page table levels (either PMD or
> PUD)
>
> But architectures like ARM64 supports non-standard page table level huge pages
> with contiguous bits.
>
> - These are created as multiple entries at either PTE or PMD level
> - These multiple entries carry pages which are physically contiguous
> - A special PTE bit (PTE_CONT) is set indicating single entry to be contiguous
>
> These multiple contiguous entries create a huge page size which is different
> than standard PMD/PUD level but they provide benefits of huge memory like
> less number of faults, bigger TLB coverage, less TLB miss etc.
>
> Currently they are used as HugeTLB pages because
>
> - HugeTLB page sizes is carried in the VMA
> - Page table walker can operate on multiple PTE or PMD entries given
>   its size in VMA
> - Irrespective of HugeTLB page size its operated with set_huge_pte_at()
>   at any level
> - set_huge_pte_at() is arch specific which knows how to encode multiple
>   consecutive entries
>
> But not as THP huge pages because
>
> - THP size is not encoded any where like VMA
> - Page table walker expects it to be either at PUD (HPAGE_PUD_SIZE) or
>   at PMD (HPAGE_PMD_SIZE)
> - Page table operates directly with set_pmd_at() or set_pud_at()
> - Direct faulted or promoted huge pages is verified with
>   [pmd|pud]_trans_huge()
>
> How non-standard huge pages can be supported for THP
>
> - THP starts recognizing non standard huge page (exported by arch) like
>   HPAGE_CONT_(PMD|PTE)_SIZE
> - THP starts operating for either on HPAGE_PMD_SIZE or
>   HPAGE_CONT_PMD_SIZE or HPAGE_CONT_PTE_SIZE
> - set_pmd_at() only recognizes HPAGE_PMD_SIZE hence replace
>   set_pmd_at() with set_huge_pmd_at()
> - set_huge_pmd_at() could differentiate between HPAGE_PMD_SIZE or
>   HPAGE_CONT_PMD_SIZE
> - In case for HPAGE_CONT_PTE_SIZE extend page table walker till PTE
>   level
> - Use set_huge_pte_at() which can operate on multiple contiguous PTE
>   bits

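For concreteness, the sizes under discussion can be sketched as below. This is only an illustration, assuming arm64 with a 4K granule, where 16 entries share one contiguous hint at both PTE and PMD level. The macro names mirror the ones in the proposal, but the definitions and the helper cont_huge_size() are made up for this sketch and are not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed arm64 4K-granule geometry; illustrative values, not kernel headers. */
#define PAGE_SHIFT  12u  /* 4K base page */
#define PMD_SHIFT   21u  /* 2M PMD-level block */
#define CONT_PTES   16u  /* PTEs covered by one contiguous hint */
#define CONT_PMDS   16u  /* PMDs covered by one contiguous hint */

/* Size mapped by nr_entries physically contiguous entries of one level. */
static uint64_t cont_huge_size(unsigned int entry_shift, unsigned int nr_entries)
{
    /* The contiguous hint only makes sense for a power-of-two group. */
    assert(nr_entries != 0 && (nr_entries & (nr_entries - 1)) == 0);
    return (uint64_t)nr_entries << entry_shift;
}

#define HPAGE_PMD_SIZE       ((uint64_t)1 << PMD_SHIFT)
#define HPAGE_CONT_PTE_SIZE  cont_huge_size(PAGE_SHIFT, CONT_PTES)
#define HPAGE_CONT_PMD_SIZE  cont_huge_size(PMD_SHIFT, CONT_PMDS)
```

Under these assumptions the contiguous-PTE size is 64K and the contiguous-PMD size is 32M, sitting on either side of the standard 2M PMD size. Other granules give different numbers.
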
You only listed the trivial things. All the tricky stuff is what makes THP
transparent.

To consider this seriously, we need to understand what it means for
split_huge_p?d()/split_huge_page(). How will khugepaged deal with this?

In particular, I'm worried about exposing (to the user or the CPU) page table
state in the middle of a conversion (huge->small or small->huge). Handling
this on the page table level provides a level of atomicity that you will not
have.

Honestly, I'm very skeptical about the idea. It took a lot of time to
stabilize THP for a single page size, equal to the PMD page table, but this
looks like a new can of worms. :P

It *might* be possible to support it for DAX, but beyond that...

--
 Kirill A. Shutemov
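
To illustrate the atomicity concern, here is a rough userspace model (not kernel code). A PMD-level huge page changes state with a single entry update, while a contiguous range needs one store per entry, so an observer can see the range half converted. NR_CONT, ENT_CONT and both helpers are hypothetical names invented for this sketch, and the entry layout is a placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CONT   16u                  /* assumed entries per contiguous range */
#define ENT_CONT  ((uint64_t)1 << 52)  /* placeholder "contiguous" flag bit */

/*
 * Write the first 'done' entries of a contiguous mapping, one store each.
 * Passing done < NR_CONT models an update observed part-way through.
 */
static void set_cont_range(uint64_t *tbl, uint64_t base_pfn, size_t done)
{
    assert(done <= NR_CONT);
    for (size_t i = 0; i < done; i++)
        tbl[i] = ((base_pfn + i) << 12) | ENT_CONT;
}

/* True if the range is uniformly huge or uniformly small, i.e. not torn. */
static bool range_is_consistent(const uint64_t *tbl)
{
    size_t nr_cont = 0;

    for (size_t i = 0; i < NR_CONT; i++)
        if (tbl[i] & ENT_CONT)
            nr_cont++;
    return nr_cont == 0 || nr_cont == NR_CONT;
}
```

Calling set_cont_range() with done = 5 on a zeroed table leaves range_is_consistent() false until the remaining entries are written. That window is what split and collapse would have to hide from the user and the CPU.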