On Thu, Apr 07, 2022 at 07:28:54AM -0700, Richard Henderson wrote:
> On 4/7/22 06:18, Kirill A. Shutemov wrote:
> > > The new hook is incorrect, in that it doesn't apply to addresses along
> > > the tlb fast path.
> >
> > I'm not sure what you mean by that. The tlb_hit() mechanics work. We
> > strip the tag bits before the tlb lookup.
> >
> > Could you elaborate?
>
> The fast path does not clear the bits, so you enter the slow path before
> you get to clearing the bits. You've lost most of the advantage of the
> tlb already.

Sorry for my ignorance, but what do you mean by fast path here?

My understanding is that it is the case when tlb_hit() is true and you
don't need to get into tlb_fill(). Are we talking about the same scheme?

For store_helper() I clear the bits before doing the TLB lookup and
fill, so the TLB will always deal with clean addresses. Hm?
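To make it concrete, this is roughly the shape of what I mean (a
simplified sketch rather than the actual patch; the helper name and the
lam57_enabled flag are made up for illustration):

	#include <stdint.h>

	/*
	 * Strip the LAM metadata bits by sign-extending the address from
	 * the top bit of the translated range: LAM48 keeps bits 47:0,
	 * LAM57 keeps bits 56:0.  In QEMU terms this is effectively
	 * sextract64(addr, 0, va_bits).  The result is what paging (and
	 * hence the TLB) actually sees.
	 */
	static inline uint64_t lam_clean_addr(uint64_t addr, int va_bits)
	{
	    int shift = 64 - va_bits;   /* 16 for LAM48, 7 for LAM57 */

	    return (uint64_t)((int64_t)(addr << shift) >> shift);
	}

	/*
	 * The store helper would then clean the address up front, before
	 * it is compared against the cached TLB entry, e.g.:
	 *
	 *     addr = lam_clean_addr(addr, lam57_enabled ? 57 : 48);
	 *     if (!tlb_hit(tlb_addr_write(entry), addr)) {
	 *         ... victim cache lookup / tlb_fill() ...
	 *     }
	 *
	 * so both the fast path and the fill only ever see clean
	 * addresses.
	 */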
> > To be honest I don't fully understand how TBI emulation works.
>
> In get_phys_addr_lpae:
>
>     addrsize = 64 - 8 * param.tbi;
>     ...
>     target_ulong top_bits = sextract64(address, inputsize,
>                                        addrsize - inputsize);
>     if (-top_bits != param.select) {
>         /* The gap between the two regions is a Translation fault */
>         fault_type = ARMFault_Translation;
>         goto do_fault;
>     }
>
> which does not include TBI bits in the validation of the sign-extended
> address.
>
> > Consider store_helper(). I failed to find where tag bits get stripped
> > before getting there for !CONFIG_USER_ONLY. clean_data_tbi() only
> > covers the user-only case.
> >
> > And if we get there with tags, I don't see how we will ever get to
> > the fast path: tlb_hit() should never return true there if any bit in
> > the top byte is set, as the cached tlb_addr has them stripped.
> >
> > tlb_fill() will get it handled correctly, but it is wasteful to go
> > through a pagewalk on every tagged pointer dereference.
>
> We won't do a pagewalk for every tagged pointer dereference. It'll be
> pointer dereferences with differing tags past the limit of the victim
> cache (CPU_VTLB_SIZE). And one tag will get to use the fast path, e.g.
> on the store following a load.
>
> I've just now had a browse through the Intel docs, and I see that
> you're not performing the required modified canonicality check.

The modified check is effectively done by clearing (and sign-extending)
the address before the ordinary canonicality check.

> While a proper tagged address will have the tag removed in CR2 during a
> page fault, an improper tagged address (with bit 63 != {47,56}) should
> have the original address reported to CR2.

Hm. I don't see that in the spec. It rather points in the other
direction:

	Page faults report the faulting linear address in CR2. Because
	LAM masking (by sign-extension) applies before paging, the
	faulting linear address recorded in CR2 does not contain the
	masked metadata.

Yes, it talks about CR2 in the case of a page fault, not a #GP due to
the canonicality check, but still.

> I could imagine a hook that could aid the victim cache in ignoring the
> tag, so that we need to go through tlb_fill fewer times. But I wouldn't
> want to include that in the base version of this feature, and I'd want
> to take more than a moment on the design so that it could be used by
> ARM and RISC-V as well.

But what other options do you see? Clearing the bits before the TLB
lookup matches the architectural spec and makes INVLPG match the
described behaviour without special handling.

--
 Kirill A. Shutemov