On 8 April 2016 at 18:20, Tom Hanson <thomas.han...@linaro.org> wrote:
> On Mon, 2016-04-04 at 10:56 -0700, Richard Henderson wrote:
>> On 04/04/2016 09:31 AM, Peter Maydell wrote:
>> > On 4 April 2016 at 17:28, Richard Henderson <r...@twiddle.net> wrote:
>> >> On 04/04/2016 08:51 AM, Peter Maydell wrote:
>> >>> In particular I think if you just do the relevant handling of the tag
>> >>> bits in target-arm's get_phys_addr() and its subroutines then this
>> >>> should work ok, with the exceptions that:
>> >>>  * the QEMU TLB code will think that [tag A + address X] and
>> >>>    [tag B + address X] are different virtual addresses and they will
>> >>>    miss each other in the TLB
>> >>
>> >> Yep. Not only miss, but actively contend with each other.
>> >
>> > Yes. Can we avoid that, or do we just have to live with it? I guess
>> > if the TCG fast path is doing a compare on full insn+tag then we
>> > pretty much have to live with it.
>>
>> We have to live with it. Implementing a more complex hashing algorithm
>> in the fast path is probably a non-starter.
>>
>> Hopefully if one is using multiple tags, they'll still be in the victim
>> cache and so you won't have to fall back to the full tlb lookup.
> It seems like the "best" solution would be to mask the tag in the TLB
> and it feels like it should be possible. BUT I need to dig into the
> code more.
>
> Is it an option to mask off the tag bits in all cases? Is there any case
> in which those bits are valid address bits?

The problem, as Richard says, is that our fast path for guest loads/stores
is a bit of inline assembly that basically fishes the right entry out of
the TLB and compares it against the input address (ie whatever the guest
address for the load is, including the tag). A comparison match means we
take the fast path and do an inline access to the backing guest RAM. A
mismatch means we take the slow path (for TLB misses, IO devices, and
various other cases).

Since the guest address that the fast path sees includes the tag bits, if
the TLB entry doesn't include the tag bits then we'd need to do an extra
mask operation in the fast path, which is (a) not good for performance and
(b) would require modifying nine different TCG backends. For a rarely used
feature this is much too much effort (and it slows down all the code that
doesn't use tags, for an uncertain benefit to the code that does use them).

(If you're curious about the inline assembly, it's generated by functions
like tcg_out_tlb_load() in tcg/i386/tcg-target.inc.c for the x86 backend;
similarly for the various other backends.)

thanks
-- PMM