On 11.07.2013, at 02:15, Scott Wood wrote:

> On 07/10/2013 05:50:01 PM, Alexander Graf wrote:
>> On 10.07.2013, at 20:42, Scott Wood wrote:
>> > On 07/10/2013 05:15:09 AM, Alexander Graf wrote:
>> >> On 10.07.2013, at 02:06, Scott Wood wrote:
>> >> > On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
>> >> >> On 09.07.2013, at 20:46, Scott Wood wrote:
>> >> >> > I suspect that tlbsx is faster, or at worst similar.  And unlike the 
>> >> >> > tlbsx-versus-lwepx comparison (not counting a fix for the threading 
>> >> >> > problem), we don't already have code to search the guest TLB, so 
>> >> >> > testing that approach would be more work.
>> >> >> We have code to walk the guest TLB for TLB misses. This really is just 
>> >> >> the TLB miss search without host TLB injection.
>> >> >> 
>> >> >> So let's say we're using the shadow TLB. The guest always has its, say, 
>> >> >> 64 TLB entries that it can count on - we never evict anything by 
>> >> >> accident, because we store all 64 entries in our guest TLB cache. When 
>> >> >> the guest faults at an address, the first thing we do is check that 
>> >> >> cache to see whether we already have the page mapped.
>> >> >> 
>> >> >> However, with this method we now have two lookup paths for guest TLB 
>> >> >> searches: the tlbsx one, which searches the host TLB, and our guest TLB 
>> >> >> cache. The guest TLB cache might still contain an entry for an address 
>> >> >> that we have already invalidated on the host. Would that pose a problem?
>> >> >> 
>> >> >> I guess not, because we swizzle the exit code around to be an 
>> >> >> instruction miss instead, which means we restore the TLB entry into the 
>> >> >> host's TLB so that when we resume, we land here and the tlbsx hits. But 
>> >> >> it feels backwards.
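For illustration, the "swizzle the exit code" idea above amounts to roughly the
following on the emulation exit path. This is only a sketch, not the actual
patch; kvmppc_fetch_inst_host_tlb() is a made-up helper name, while
EMULATE_DONE and BOOKE_INTERRUPT_ITLB_MISS are the existing constants:

    /*
     * If the instruction fetch through the host TLB fails because the
     * mapping was evicted, rewrite the exit as an ITLB miss.  The normal
     * miss path repopulates the host TLB, and on re-entry the tlbsx-based
     * fetch hits.  kvmppc_fetch_inst_host_tlb() is hypothetical.
     */
    if (kvmppc_fetch_inst_host_tlb(vcpu, &last_inst) != EMULATE_DONE)
            exit_nr = BOOKE_INTERRUPT_ITLB_MISS;    /* handle as an ITLB miss */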
>> >> >
>> >> > Any better way?  Searching the guest TLB won't work for the LRAT case, 
>> >> > so we'd need to have this logic around anyway.  We shouldn't add a 
>> >> > second codepath unless it's a clear performance gain -- and again, I 
>> >> > suspect it would be the opposite, especially if the entry is not in 
>> >> > TLB0 or in one of the first few entries searched in TLB1.  The tlbsx 
>> >> > miss case is not what we should optimize for.
>> >> Hrm.
>> >> So let's redesign this thing theoretically. We would have an exit that 
>> >> requires an instruction fetch. We would override kvmppc_get_last_inst() 
>> >> to always do kvmppc_ld_inst(). That call can fail when it can't find the 
>> >> TLB entry in the host TLB. When it fails, we have to abort the emulation 
>> >> and resume the guest at the same IP.
>> >> 
>> >> Now the guest takes the TLB miss, we populate the host TLB, and we go 
>> >> back into the guest. The guest traps on the same instruction again, we go 
>> >> back to kvmppc_ld_inst(), which succeeds this time, and we can emulate 
>> >> the instruction.
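A minimal sketch of that retry flow, assuming kvmppc_get_last_inst() is made
failable and can return EMULATE_AGAIN when the fetch misses - illustrative
only, not an existing interface:

    if (kvmppc_get_last_inst(vcpu, &last_inst) == EMULATE_AGAIN) {
            /*
             * Couldn't fetch the instruction: leave the guest PC untouched
             * and re-enter the guest.  It takes an ITLB miss that we serve
             * as usual, traps on the same instruction again, and this time
             * the fetch succeeds and we emulate as normal.
             */
            return RESUME_GUEST;
    }
    /* ... otherwise emulate last_inst as before ... */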
>> >
>> > That's pretty much what this patch does, except that it goes immediately 
>> > to the TLB miss code rather than having the extra round-trip back to the 
>> > guest.  Is there any benefit from adding that extra round-trip?  Rewriting 
>> > the exit type instead doesn't seem that bad...
>> It's pretty bad. I want code that is easy to follow - and I don't care if 
>> the very rare case of a TLB entry getting evicted by some other thread 
>> right as we execute the exit path becomes a few percent slower, as long as 
>> we get cleaner code for it.
> 
> I guess I just don't see how this is so much harder to follow than returning 
> to the guest.  I find the flow harder to follow when there are more round 
> trips to the guest involved.  "Treat this as an ITLB miss" is simpler than 
> "Let this fail, and make sure we retry the trapping instruction on failure.  
> Then an ITLB miss will happen."
> 
> Also note that making kvmppc_get_last_inst() able to fail means updating 
> several existing callsites, both for the change in function signature and to 
> actually handle failures.
> 
> I don't care that deeply either way; it just doesn't seem obviously better.
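One possible shape for such a signature change, just to picture the callsite
churn being referred to - not necessarily the form it would actually take:

    /*
     * Return an emulation_result instead of the raw instruction and hand
     * the instruction back through a pointer, so every caller has to deal
     * with a failed fetch.
     */
    int kvmppc_get_last_inst(struct kvm_vcpu *vcpu, u32 *inst);

    /* Callers change roughly from
     *         u32 inst = kvmppc_get_last_inst(vcpu);
     * to: */
    u32 inst;
    if (kvmppc_get_last_inst(vcpu, &inst) != EMULATE_DONE)
            return RESUME_GUEST;    /* bounce back and retry the fetch */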
> 
>> >> I think this works. Just make sure that the gateway to the instruction 
>> >> fetch is kvmppc_get_last_inst() and make that failable. Then the 
>> >> difference between looking for the TLB entry in the host's TLB or in the 
>> >> guest's TLB cache is hopefully negligible.
>> >
>> > I don't follow here.  What does this have to do with looking in the guest 
>> > TLB?
>> I want to hide, as much as possible, the fact that we're cheating - that's it.
> 
> How are we cheating, and what specifically are you proposing to do to hide 
> that?  How is the guest TLB involved at all in the change you're asking for?

It's not involved, but it's basically what we do on Book3S PR KVM: there, 
kvmppc_ld reads the guest htab, not the host htab. I think it's fine to 
expose both cases as the same thing to the rest of KVM.


Alex
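
A rough picture of "expose both cases as the same thing": one instruction-fetch
entry point with per-subarch backends. Every name below, including the config
symbol, is made up purely for illustration:

    int kvmppc_fetch_guest_inst(struct kvm_vcpu *vcpu, u32 *inst)
    {
    #ifdef CONFIG_KVM_BOOK3S_PR
            /* Book3S PR: translate and read through the guest htab,
             * i.e. the kvmppc_ld() path mentioned above. */
            return kvmppc_ld_inst_guest_htab(vcpu, inst);
    #else
            /* Book3E: fetch through the host TLB (tlbsx/lwepx) and report
             * failure if the mapping has been evicted, so the caller can
             * go back to the guest and retry. */
            return kvmppc_ld_inst_host_tlb(vcpu, inst);
    #endif
    }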
