On Thu, Apr 07, 2022 at 07:28:54AM -0700, Richard Henderson wrote:
> On 4/7/22 06:18, Kirill A. Shutemov wrote:
> > > The new hook is incorrect, in that it doesn't apply to addresses along
> > > the tlb fast path.
> > 
> > I'm not sure what you mean by that. The tlb_hit() mechanics work. We
> > strip the tag bits before the tlb lookup.
> > 
> > Could you elaborate?
> 
> The fast path does not clear the bits, so you enter the slow path before you
> get to clearing the bits.  You've lost most of the advantage of the tlb
> already.

Sorry for my ignorance, but what do you mean by fast path here?

My understanding is that it is the case when tlb_hit() is true and you
don't need to get into tlb_fill(). Are we talking about the same scheme?

For store_helper() I clear the bits before doing the TLB lookup and fill, so
the TLB will always deal with clean addresses.

Hm?
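
Just to make sure we are looking at the same picture, here is a stand-alone
toy sketch (plain C, not the actual cputlb.c code; clean_addr_lam48() and
the LAM48 layout are only for illustration) of the ordering I have in mind:
the metadata bits are stripped from the incoming address before the
tlb_hit()-style page comparison, so the entry that tlb_fill() installed with
the clean address still matches a tagged pointer.

#include <stdint.h>
#include <stdio.h>

#define PAGE_MASK (~(uint64_t)0xfff)

/* LAM48-style cleaning: sign-extend from bit 47 (sextract64(addr, 0, 48)
 * in QEMU terms).  Helper name and layout are made up for this example. */
static uint64_t clean_addr_lam48(uint64_t addr)
{
    return (uint64_t)((int64_t)(addr << 16) >> 16);
}

/* The shape of QEMU's tlb_hit(): page-aligned comparison of the incoming
 * address against the address cached in the TLB entry. */
static int tlb_hit(uint64_t tlb_addr, uint64_t addr)
{
    return (addr & PAGE_MASK) == (tlb_addr & PAGE_MASK);
}

int main(void)
{
    uint64_t cached = 0x00007f1234567000ull;  /* installed clean          */
    uint64_t tagged = 0x3a007f1234567abcull;  /* same page, tag in 62:48  */

    printf("raw:     %s\n", tlb_hit(cached, tagged) ? "hit" : "miss");
    printf("cleaned: %s\n",
           tlb_hit(cached, clean_addr_lam48(tagged)) ? "hit" : "miss");
    return 0;
}

With the cleaning done first, tlb_fill() also only ever sees the clean
address, which is what I meant by the TLB always dealing with clean
addresses.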

> > To be honest I don't fully understand how TBI emulation works.
> 
> In get_phys_addr_lpae:
> 
>         addrsize = 64 - 8 * param.tbi;
> ...
>         target_ulong top_bits = sextract64(address, inputsize,
>                                            addrsize - inputsize);
>         if (-top_bits != param.select) {
>             /* The gap between the two regions is a Translation fault */
>             fault_type = ARMFault_Translation;
>             goto do_fault;
>         }
> 
> which does not include TBI bits in the validation of the sign-extended 
> address.
> 
> > Consider store_helper(). I failed to find where tag bits get stripped
> > before getting there for !CONFIG_USER_ONLY. clean_data_tbi() only covers
> > user-only case.
> > 
> > And if we get there with tags, I don't see how we will ever get to the
> > fast path: tlb_hit() should never return true there if any bit in the
> > top byte is set, as the cached tlb_addr has them stripped.
> > 
> > tlb_fill() will get it handled correctly, but it is wasteful to go
> > through a pagewalk on every tagged pointer dereference.
> 
> We won't do a pagewalk for every tagged pointer dereference.  It'll be
> pointer dereferences with differing tags past the limit of the victim cache
> (CPU_VTLB_SIZE).  And one tag will get to use the fast path, e.g. on the
> store following a load.
> 
> I've just now had a browse through the Intel docs, and I see that you're not
> performing the required modified canonicality check.

The modified check is effectively done by clearing (sign-extending) the
metadata bits of the address before the ordinary canonicality check.
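
To spell out what I mean with toy code (LAM48 user case only; the exact bit
ranges are my reading of the spec, so treat them as an assumption): the
masking copies bit 47 into bits 62:48 and leaves bit 63 alone, so running
the ordinary canonicality check on the masked address is exactly the
modified check, i.e. bit 63 must match bit 47 and bits 62:48 are ignored.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* LAM48 masking as I understand it: replace the metadata bits 62:48 with
 * copies of bit 47; bit 63 is not metadata and is kept as-is. */
static uint64_t lam48_mask(uint64_t addr)
{
    uint64_t keep = addr & ((1ull << 63) | ((1ull << 48) - 1));
    uint64_t fill = (addr & (1ull << 47)) ? (0x7fffull << 48) : 0;
    return keep | fill;
}

/* Ordinary 48-bit canonicality check: bits 63:47 are all equal, i.e. the
 * address sign-extends from bit 47. */
static bool canonical48(uint64_t addr)
{
    return (uint64_t)((int64_t)(addr << 16) >> 16) == addr;
}

int main(void)
{
    uint64_t ok  = 0x3a007f1234567abcull;  /* bit 63 == bit 47 == 0: fine */
    uint64_t bad = 0xba007f1234567abcull;  /* bit 63 == 1, bit 47 == 0    */

    printf("ok : %s\n", canonical48(lam48_mask(ok))  ? "canonical" : "#GP");
    printf("bad: %s\n", canonical48(lam48_mask(bad)) ? "canonical" : "#GP");
    return 0;
}

After the masking, the only way the check can still fail is bit 63
disagreeing with bit 47, which is the modified check, so no separate code
path is needed for it.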

> While a proper tagged address will have the tag removed in CR2 during a
> page fault, an improper tagged address (with bit 63 != {47,56}) should
> have the original address reported to CR2.

Hm. I don't see that in the spec. It rather points in the other direction:

        Page faults report the faulting linear address in CR2. Because LAM
        masking (by sign-extension) applies before paging, the faulting
        linear address recorded in CR2 does not contain the masked
        metadata.

Yes, it talks about CR2 in the case of a page fault, not a #GP due to the
canonicality check, but still.

> I could imagine a hook that could aid the victim cache in ignoring the tag,
> so that we need go through tlb_fill fewer times.  But I wouldn't want to
> include that in the base version of this feature, and I'd want to take more
> than a moment in the design so that it could be used by ARM and RISC-V as
> well.

But what other options do you see? Clearing the bits before the TLB lookup
matches the architectural spec and makes INVLPG match the described
behaviour without special handling.
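
For reference, "without special handling" means roughly the following sketch
(not the exact target/i386 code; SVM intercept handling is omitted and
clean_addr() stands in for whatever the masking hook ends up being called):
INVLPG runs the incoming address through the same masking as the data path
before flushing, so it hits the entry that was installed with the clean
address.

/* Sketch based on the shape of target/i386 helper_invlpg(); simplified. */
void helper_invlpg(CPUX86State *env, target_ulong addr)
{
    addr = clean_addr(env, addr);        /* hypothetical LAM masking hook */
    tlb_flush_page(env_cpu(env), addr);  /* matches the clean entry       */
}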

-- 
 Kirill A. Shutemov
