On Wed, Nov 11, 2020 at 06:26:20PM +0000, Matthew Wilcox wrote:
> On Wed, Nov 11, 2020 at 06:22:53PM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 11, 2020 at 04:38:48PM +0000, Matthew Wilcox wrote:
> > >   if (pud_leaf(pud))
> > >           return PUD_SIZE;
> > 
> > But that doesn't handle non-pagetable aligned hugetlb sizes. Granted,
> > that's unlikely at the PUD level, but why be inconsistent..
> > 
> > So we really want:
> > 
> >     if (p*d_leaf(p*d)) {
> >             if (!'special') {
> >                     page = p*d_page(p*d);
> >                     if (PageHuge(page))
> >                             return page_size(compound_head(page));
> >             }
> >             return P*D_SIZE;
> >     }
> 
> Still doesn't work because pages can be mapped at funny offsets.

Wait, what?! Is there hardware that has unaligned TLB page-sizes?

Can you start a 64K page at an 8k offset? I don't think I've ever seen
that. Still, even with that, how would the above go wrong there? It would
find the compound page covering @addr, PageHuge() (and possibly some
additional arch-specific condition) returns true, and we get the compound
size to find the hardware page size used.
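
For illustration only, here is a minimal userspace model of the leaf check
quoted above. It is a sketch, not kernel code: struct model_page, struct
model_entry and model_leaf_size() are hypothetical names, and the struct
fields merely stand in for p*d_leaf(), the 'special' test, PageHuge() and
page_size(compound_head(page)).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these types are the kernel's. */
struct model_page {
	bool huge;                   /* stands in for PageHuge() */
	unsigned long compound_size; /* stands in for page_size(compound_head()) */
};

struct model_entry {
	bool leaf;                     /* stands in for p*d_leaf() */
	bool special;                  /* mapping has no usable struct page */
	unsigned long level_size;      /* stands in for P*D_SIZE */
	const struct model_page *page; /* stands in for p*d_page() */
};

/* Returns 0 when the entry is not a leaf, i.e. the walk has to descend. */
static unsigned long model_leaf_size(const struct model_entry *e)
{
	assert(e != NULL);

	if (!e->leaf)
		return 0;

	/* hugetlb pages report their own size, which may differ from P*D_SIZE */
	if (!e->special && e->page && e->page->huge)
		return e->page->compound_size;

	return e->level_size;
}
```

With a leaf entry whose level size is 2M and which points at a 32M hugetlb
page, the model returns 32M; marking the same entry 'special' makes it fall
back to the 2M level size.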

> What we really want is for a weak definition of
> 
> unsigned long tlb_size(struct mm_struct *mm, unsigned long addr)
> {
>       if (p*d_leaf(p*d))
>               return p*d_size(p*d);
> }
> 
> then ARM can look at its special bit in the page table to determine
> whether this is a singleton or part of a brace of pages.

That's basically what we provide, but really the only thing that's
missing from this generic page walker is the ability to detect if a
!PageHuge compound page is actually still a hardware page.
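
A rough userspace sketch of that missing piece, under stated assumptions:
model_arch_is_hw_page() is a hypothetical arch hook (no such kernel
interface is implied), and the hw_contig flag is an assumption standing in
for whatever the architecture would read from its page-table entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these names exist in the kernel. */
struct model_page {
	bool huge;                   /* stands in for PageHuge() */
	bool hw_contig;              /* assumption: arch maps the compound page as one TLB entry */
	unsigned long compound_size; /* stands in for page_size(compound_head()) */
};

/* Hypothetical arch hook; a real one would inspect the page-table entry. */
static bool model_arch_is_hw_page(const struct model_page *page)
{
	return page->hw_contig;
}

/*
 * Report the compound size only when the hardware really uses it;
 * a software-only compound page falls back to the page-table level size.
 */
static unsigned long model_hw_page_size(const struct model_page *page,
					unsigned long level_size)
{
	assert(page != NULL);

	if (page->huge || model_arch_is_hw_page(page))
		return page->compound_size;

	return level_size;
}
```

In this model a 256K compound page that the hardware maps as individual 4K
pages reports 4K, while a 64K compound page flagged hw_contig reports 64K.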

> > Now, when you add !PMD THP sizes (presumably for architectures that have
> > 'funny' sizes, otherwise what's the point), then you get to add '||
> 
> This is the problem with all the huge page support in Linux today.
> It's written by people who work for hardware companies who think only
> about exploiting the hardware features they sell.  You all ignore the
> very real software overheads of trying to manage millions of pages.
> I see a 6% reduction in kernel overhead when running kernbench using
> THPs that may go as large as 256kB.  On x86.  Intel x86, at that.

That's a really nice improvement. However, this code doesn't care
about it. Please make it possible to distinguish between THP on hardware
pages vs software pages.
