On Wed, Feb 13, 2019 at 06:20:03PM +0530, Anshuman Khandual wrote:
>
>
> On 02/12/2019 02:03 PM, Kirill A. Shutemov wrote:
> > On Fri, Feb 08, 2019 at 07:43:57AM +0530, Anshuman Khandual wrote:
> >> Hello,
> >>
> >> THP is currently supported for
> >>
> >> - PMD level pages (anon and file)
> >> - PUD level pages (file - DAX file system)
> >>
> >> THP is a single entry mapping at standard page table levels (either
> >> PMD or PUD)
> >>
> >> But architectures like ARM64 supports non-standard page table level
> >> huge pages with contiguous bits.
> >>
> >> - These are created as multiple entries at either PTE or PMD level
> >> - These multiple entries carry pages which are physically contiguous
> >> - A special PTE bit (PTE_CONT) is set indicating single entry to be
> >>   contiguous
> >>
> >> These multiple contiguous entries create a huge page size which is
> >> different than standard PMD/PUD level but they provide benefits of
> >> huge memory like less number of faults, bigger TLB coverage, less TLB
> >> miss etc.
> >>
> >> Currently they are used as HugeTLB pages because
> >>
> >> - HugeTLB page sizes is carried in the VMA
> >> - Page table walker can operate on multiple PTE or PMD entries given
> >>   its size in VMA
> >> - Irrespective of HugeTLB page size its operated with set_huge_pte_at()
> >>   at any level
> >> - set_huge_pte_at() is arch specific which knows how to encode multiple
> >>   consecutive entries
> >>
> >> But not as THP huge pages because
> >>
> >> - THP size is not encoded any where like VMA
> >> - Page table walker expects it to be either at PUD (HPAGE_PUD_SIZE) or
> >>   at PMD (HPAGE_PMD_SIZE)
> >> - Page table operates directly with set_pmd_at() or set_pud_at()
> >> - Direct faulted or promoted huge pages is verified with
> >>   [pmd|pud]_trans_huge()
> >>
> >> How non-standard huge pages can be supported for THP
> >>
> >> - THP starts recognizing non standard huge page (exported by arch) like
> >>   HPAGE_CONT_(PMD|PTE)_SIZE
> >> - THP starts operating for either on HPAGE_PMD_SIZE or
> >>   HPAGE_CONT_PMD_SIZE or HPAGE_CONT_PTE_SIZE
> >> - set_pmd_at() only recognizes HPAGE_PMD_SIZE hence replace
> >>   set_pmd_at() with set_huge_pmd_at()
> >> - set_huge_pmd_at() could differentiate between HPAGE_PMD_SIZE or
> >>   HPAGE_CONT_PMD_SIZE
> >> - In case for HPAGE_CONT_PTE_SIZE extend page table walker till PTE
> >>   level
> >> - Use set_huge_pte_at() which can operate on multiple contiguous PTE
> >>   bits
> >
> > You only listed trivial things. All tricky stuff is what make THP
> > transparent.
>
> Agreed. I was trying to draw an analogy from HugeTLB with respect to page
> table creation and it's walking. Huge page collapse and split on such non
> standard huge pages will involve taking care of much details.
>
> >
> > To consider it seriously we need to understand what it means for
> > split_huge_p?d()/split_huge_page()? How khugepaged will deal with this?
>
> Absolutely. Can these operate on non standard probably multi entry based
> huge pages ? How to handle atomicity etc.

We need to handle split for them to provide transparency.

> > In particular, I'm worry to expose (to user or CPU) page table state in
> > the middle of conversion (huge->small or small->huge). Handling this on
> > page table level provides a level atomicity that you will not have.
>
> I understand it might require a software based lock instead of standard HW
> atomicity constructs which will make it slow but is that even possible ?

I'm not yet sure if it is possible. I haven't wrapped my head around the
idea yet.

> > Honestly, I'm very skeptical about the idea. It took a lot of time to
> > stabilize THP for singe page size, equal to PMD page table, but this looks
> > like a new can of worms. :P
>
> I understand your concern here but HW providing some more TLB sizes beyond
> standard page table level (PMD/PUD/PGD) based huge pages can help achieve
> performance improvement when the buddy is already fragmented enough not to
> provide higher order pages. PUD THP file mapping is already supported for
> DAX and PUD THP anon mapping might be supported in near future (it is not
> much challenging other than allocating HPAGE_PUD_SIZE huge page at runtime
> will be much difficult).

That's a bold claim. I would like to look at the code. :)

Supporting more than one THP page size at the same time brings a lot more
questions, besides the allocation path (although I'm sure compaction will be
happy about this).

For instance, what page size will you allocate for a given fault address?

How do you deal with pre-allocated page tables? Depositing 513 page tables
for a given PUD THP page might be fun. :P

> Around PMD sizes like HPAGE_CONT_PMD_SIZE or
> HPAGE_CONT_PTE_SIZE really have better chances as future non-PMD level anon
> mapping than a PUD size anon mapping support in THP.
>
> >
> > It *might* be possible to support it for DAX, but beyond that...
> >
>
> Did not get that. Why would you think that this is possible or appropriate
> only for DAX file mapping but not for anon mapping ?

DAX THP is inherently simpler: no struct pages -- less state to track and
no need for split_huge_page(); split_huge_p?d() can be handled by dropping
the entities in question and re-faulting them as smaller entries. No problem
with compaction...

--
 Kirill A. Shutemov
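
For reference, the arithmetic behind the "513 page tables" remark and the
contiguous-entry sizes discussed in the thread can be sketched in C. This is
only an illustration under stated assumptions, not kernel code: it assumes
4 KiB base pages, 512 entries per page table, and a contiguous hint spanning
16 entries (the arm64 4K-granule case). All macro and function names below
are hypothetical and do not correspond to kernel symbols.

```c
#include <assert.h>

/* Assumed geometry: 4 KiB base pages, 512 entries per page table. */
#define PAGE_SHIFT      12
#define PTRS_PER_TABLE  512ULL
#define BASE_PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define PMD_LEVEL_SIZE  (BASE_PAGE_SIZE * PTRS_PER_TABLE)  /* one PMD entry */
#define PUD_LEVEL_SIZE  (PMD_LEVEL_SIZE * PTRS_PER_TABLE)  /* one PUD entry */

/* Assumed width of the contiguous hint: 16 adjacent entries. */
#define CONT_ENTRIES    16ULL

/* Sanity-check the assumed geometry at compile time. */
static_assert(PMD_LEVEL_SIZE == (2ULL << 20), "PMD entry should map 2 MiB");
static_assert(PUD_LEVEL_SIZE == (1ULL << 30), "PUD entry should map 1 GiB");

/* Splitting a PMD-sized huge page down to PTEs needs one PTE table. */
unsigned long long tables_to_split_pmd(void)
{
    return 1;
}

/*
 * Splitting a PUD-sized huge page down to PTEs needs one PMD table plus
 * one PTE table for each of the PMD entries in that table.
 */
unsigned long long tables_to_split_pud(void)
{
    return 1 + PTRS_PER_TABLE;
}

/* Size covered by a run of contiguous PTE entries. */
unsigned long long cont_pte_size(void)
{
    return CONT_ENTRIES * BASE_PAGE_SIZE;
}

/* Size covered by a run of contiguous PMD entries. */
unsigned long long cont_pmd_size(void)
{
    return CONT_ENTRIES * PMD_LEVEL_SIZE;
}
```

Under these assumptions tables_to_split_pud() returns 513, while the
contiguous sizes come out at 64 KiB (PTE level) and 32 MiB (PMD level). Other
base page sizes or contiguous-hint widths would change all of these numbers.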

