Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-10 Thread Alexander Graf

On 10.07.2013, at 02:06, Scott Wood wrote:

 On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
 On 09.07.2013, at 20:46, Scott Wood wrote:
  I suspect that tlbsx is faster, or at worst similar.  And unlike comparing 
  tlbsx to lwepx (not counting a fix for the threading problem), we don't 
  already have code to search the guest TLB, so testing would be more work.
 We have code to walk the guest TLB for TLB misses. This really is just the 
 TLB miss search without host TLB injection.
 So let's say we're using the shadow TLB. The guest always has its, say, 64 TLB 
 entries that it can count on - we never evict anything by accident, because 
 we store all of the 64 entries in our guest TLB cache. When the guest faults 
 at an address, the first thing we do is we check the cache whether we have 
 that page already mapped.
 However, with this method we now have 2 enumeration methods for guest TLB 
 searches. We have the tlbsx one which searches the host TLB and we have our 
 guest TLB cache. The guest TLB cache might still contain an entry for an 
 address that we already invalidated on the host. Would that impose a problem?
 I guess not because we're swizzling the exit code around to instead be an 
 instruction miss which means we restore the TLB entry into our host's TLB so 
 that when we resume, we land here and the tlbsx hits. But it feels backwards.
 
 Any better way?  Searching the guest TLB won't work for the LRAT case, so 
 we'd need to have this logic around anyway.  We shouldn't add a second 
 codepath unless it's a clear performance gain -- and again, I suspect it 
 would be the opposite, especially if the entry is not in TLB0 or in one of 
 the first few entries searched in TLB1.  The tlbsx miss case is not what we 
 should optimize for.

Hrm.

So let's redesign this thing theoretically. We would have an exit that requires 
an instruction fetch. We would override kvmppc_get_last_inst() to always do 
kvmppc_ld_inst(). That one can fail because it can't find the TLB entry in the 
host TLB. When it fails, we have to abort the emulation and resume the guest at 
the same IP.

Now the guest gets the TLB miss, we populate, go back into the guest. The guest 
hits the emulation failure again. We go back to kvmppc_ld_inst() which succeeds 
this time and we can emulate the instruction.

I think this works. Just make sure that the gateway to the instruction fetch is 
kvmppc_get_last_inst() and make that failable. Then the difference between 
looking for the TLB entry in the host's TLB or in the guest's TLB cache is 
hopefully negligible.
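The retry flow described above can be modeled in a few lines. Everything here other than the name kvmppc_get_last_inst() is a hypothetical sketch of the proposal, not actual KVM code:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the proposed flow: kvmppc_get_last_inst() may fail when the
 * host TLB has no entry for the guest IP; the exit handler then resumes the
 * guest at the same IP, the guest takes an ITLB miss (which repopulates the
 * TLB), and the emulation exit replays and succeeds.  All names other than
 * kvmppc_get_last_inst() are invented for this sketch. */

enum { EMULATE_DONE, EMULATE_AGAIN };

static bool tlb_present;   /* does the host TLB map the guest IP? */
static int  guest_entries; /* how many times we re-entered the guest */

static int kvmppc_get_last_inst(unsigned int *inst)
{
	if (!tlb_present)
		return EMULATE_AGAIN;  /* instruction fetch failed */
	*inst = 0x7c0004ac;            /* some instruction we emulate */
	return EMULATE_DONE;
}

static void resume_guest(void)
{
	guest_entries++;
	tlb_present = true;  /* guest ITLB miss repopulates the host TLB */
}

static int handle_emulation_exit(unsigned int *inst)
{
	while (kvmppc_get_last_inst(inst) == EMULATE_AGAIN)
		resume_guest();  /* guest faults again at the same IP */
	return 0;                /* now emulate *inst */
}
```

The point of the design is that the fetch failure stays invisible to emulation callers beyond the EMULATE_AGAIN return code.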

 
 At least this code has to become something more generic, such as 
 kvmppc_read_guest(vcpu, addr, TYPE_INSN) and move into the host mmu 
 implementation, as it's 100% host mmu specific.
 
 I agree that e500_mmu_host.c is a better place for it (with an ifdef for 
 BOOKEHV), but supporting anything other than instruction fetches could wait 
 until we have a user for it (it means extra code to figure out if permissions 
 are correct).

Works for me, as long as it's either documented through BUG_ON/WARN_ON's or an 
explicit naming convention.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-10 Thread Alexander Graf

On 10.07.2013, at 02:12, Scott Wood wrote:

 On 07/09/2013 04:45:10 PM, Alexander Graf wrote:
 On 28.06.2013, at 11:20, Mihai Caraman wrote:
  +  /* Get page size */
  +  if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
  +  psize_shift = PAGE_SHIFT;
  +  else
  +  psize_shift = MAS1_GET_TSIZE(mas1) + 10;
  +
  +  mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
  +  mfspr(SPRN_MAS3);
  +  addr = (mas7_mas3 & (~0ULL << psize_shift)) |
  + (geaddr & ((1ULL << psize_shift) - 1ULL));
  +
  +  /* Map a page and get guest's instruction */
  +  page = pfn_to_page(addr >> PAGE_SHIFT);
 While looking at this I just realized that you're missing a check here. What 
 if our IP is in some PCI BAR? Or can't we execute from those?
 
 We at least need to check pfn_valid() first.  That'll just keep us from 
 accessing a bad pointer in the host kernel, though -- it won't make the 
 emulation actually work.  If we need that, we'll probably need to create a 
 temporary TLB entry manually.

ioremap()?

However, if we were walking the guest TLB cache instead we would get a guest 
physical address which we can always resolve to a host virtual address.

I'm not sure how important that whole use case is though. Maybe we should just 
error out to the guest for now.
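A minimal model of the pfn_valid() guard under discussion. The RAM bounds and helper names below are invented for illustration; only the shape of the check matters:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Before pfn_to_page(), verify the pfn is backed by host RAM; a guest IP
 * sitting in a PCI BAR would otherwise yield a bogus struct page pointer.
 * The bounds and names here are illustrative, not the kernel's. */

#define TOY_PAGE_SHIFT  12
#define TOY_RAM_END_PFN 0x80000UL  /* pretend 2 GiB of RAM at address 0 */

static bool toy_pfn_valid(unsigned long pfn)
{
	return pfn < TOY_RAM_END_PFN;
}

/* Returns false when addr is not RAM (e.g. MMIO), so the caller can bail
 * out of the instruction fetch instead of touching an invalid page. */
static bool can_map_guest_inst(uint64_t addr)
{
	return toy_pfn_valid(addr >> TOY_PAGE_SHIFT);
}
```

As Scott notes, this only prevents a bad host dereference; making execution from MMIO actually work would need a mapping, not just a check.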


Alex



Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-10 Thread Scott Wood

On 07/10/2013 05:18:10 AM, Alexander Graf wrote:


On 10.07.2013, at 02:12, Scott Wood wrote:

 On 07/09/2013 04:45:10 PM, Alexander Graf wrote:
 On 28.06.2013, at 11:20, Mihai Caraman wrote:
  + /* Get page size */
  + if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
  + psize_shift = PAGE_SHIFT;
  + else
  + psize_shift = MAS1_GET_TSIZE(mas1) + 10;
  +
  + mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
  + mfspr(SPRN_MAS3);
  + addr = (mas7_mas3 & (~0ULL << psize_shift)) |
  +(geaddr & ((1ULL << psize_shift) - 1ULL));
  +
  + /* Map a page and get guest's instruction */
  + page = pfn_to_page(addr >> PAGE_SHIFT);
 While looking at this I just realized that you're missing a check  
here. What if our IP is in some PCI BAR? Or can't we execute from  
those?


 We at least need to check pfn_valid() first.  That'll just keep us  
from accessing a bad pointer in the host kernel, though -- it won't  
make the emulation actually work.  If we need that, we'll probably  
need to create a temporary TLB entry manually.


ioremap()?


That's a bit heavy... also we'd need to deal with cacheability.  This  
code is already engaged in directly creating TLB entries, so it doesn't  
seem like much of a stretch to create one for this.  It should be  
faster than ioremap() or kmap_atomic().


The one complication is allocating the virtual address space, but maybe  
we could just use the page that kmap_atomic would have used?  Of  
course, if we want to handle execution from other than normal kernel  
memory, we'll need to make sure that the virtual address space is  
allocated even when highmem is not present (e.g. 64-bit).


However, if we were walking the guest TLB cache instead we would get  
a guest physical address which we can always resolve to a host  
virtual address.


I'm not sure how important that whole use case is though. Maybe we  
should just error out to the guest for now.


It's not that important, now that we are using hugetlb rather than  
directly mapping a large hunk of reserved memory.  It would be nice to  
handle it though, if we can do without too much hassle.  And I think  
manually creating a TLB entry could be faster than kmap_atomic(), or  
searching the guest TLB and then doing a reverse memslot lookup.


-Scott


Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-10 Thread Scott Wood

On 07/10/2013 05:15:09 AM, Alexander Graf wrote:


On 10.07.2013, at 02:06, Scott Wood wrote:

 On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
 On 09.07.2013, at 20:46, Scott Wood wrote:
  I suspect that tlbsx is faster, or at worst similar.  And unlike  
comparing tlbsx to lwepx (not counting a fix for the threading  
problem), we don't already have code to search the guest TLB, so  
testing would be more work.
 We have code to walk the guest TLB for TLB misses. This really is  
just the TLB miss search without host TLB injection.
 So let's say we're using the shadow TLB. The guest always has its  
say 64 TLB entries that it can count on - we never evict anything by  
accident, because we store all of the 64 entries in our guest TLB  
cache. When the guest faults at an address, the first thing we do is  
we check the cache whether we have that page already mapped.
 However, with this method we now have 2 enumeration methods for  
guest TLB searches. We have the tlbsx one which searches the host TLB  
and we have our guest TLB cache. The guest TLB cache might still  
contain an entry for an address that we already invalidated on the  
host. Would that impose a problem?
 I guess not because we're swizzling the exit code around to  
instead be an instruction miss which means we restore the TLB entry  
into our host's TLB so that when we resume, we land here and the  
tlbsx hits. But it feels backwards.


 Any better way?  Searching the guest TLB won't work for the LRAT  
case, so we'd need to have this logic around anyway.  We shouldn't  
add a second codepath unless it's a clear performance gain -- and  
again, I suspect it would be the opposite, especially if the entry is  
not in TLB0 or in one of the first few entries searched in TLB1.  The  
tlbsx miss case is not what we should optimize for.


Hrm.

So let's redesign this thing theoretically. We would have an exit  
that requires an instruction fetch. We would override  
kvmppc_get_last_inst() to always do kvmppc_ld_inst(). That one can  
fail because it can't find the TLB entry in the host TLB. When it  
fails, we have to abort the emulation and resume the guest at the  
same IP.


Now the guest gets the TLB miss, we populate, go back into the guest.  
The guest hits the emulation failure again. We go back to  
kvmppc_ld_inst() which succeeds this time and we can emulate the  
instruction.


That's pretty much what this patch does, except that it goes  
immediately to the TLB miss code rather than having the extra  
round-trip back to the guest.  Is there any benefit from adding that  
extra round-trip?  Rewriting the exit type instead doesn't seem that  
bad...


I think this works. Just make sure that the gateway to the  
instruction fetch is kvmppc_get_last_inst() and make that failable.  
Then the difference between looking for the TLB entry in the host's  
TLB or in the guest's TLB cache is hopefully negligible.


I don't follow here.  What does this have to do with looking in the  
guest TLB?


-Scott


Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-10 Thread Alexander Graf

On 10.07.2013, at 20:37, Scott Wood wrote:

 On 07/10/2013 05:18:10 AM, Alexander Graf wrote:
 On 10.07.2013, at 02:12, Scott Wood wrote:
  On 07/09/2013 04:45:10 PM, Alexander Graf wrote:
  On 28.06.2013, at 11:20, Mihai Caraman wrote:
   +   /* Get page size */
   +   if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
   +   psize_shift = PAGE_SHIFT;
   +   else
   +   psize_shift = MAS1_GET_TSIZE(mas1) + 10;
   +
   +   mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
   +   mfspr(SPRN_MAS3);
   +   addr = (mas7_mas3 & (~0ULL << psize_shift)) |
   +  (geaddr & ((1ULL << psize_shift) - 1ULL));
   +
   +   /* Map a page and get guest's instruction */
   +   page = pfn_to_page(addr >> PAGE_SHIFT);
  While looking at this I just realized that you're missing a check here. 
  What if our IP is in some PCI BAR? Or can't we execute from those?
 
  We at least need to check pfn_valid() first.  That'll just keep us from 
  accessing a bad pointer in the host kernel, though -- it won't make the 
  emulation actually work.  If we need that, we'll probably need to create a 
  temporary TLB entry manually.
 ioremap()?
 
 That's a bit heavy... also we'd need to deal with cacheability.  This code is 
 already engaged in directly creating TLB entries, so it doesn't seem like 
 much of a stretch to create one for this.  It should be faster than ioremap() 
 or kmap_atomic().

Do we have any guarantees that the TLB entry we create doesn't get evicted by 
another thread by the time we want to use it?


Alex

 The one complication is allocating the virtual address space, but maybe we 
 could just use the page that kmap_atomic would have used?  Of course, if we 
 want to handle execution from other than normal kernel memory, we'll need to 
 make sure that the virtual address space is allocated even when highmem is 
 not present (e.g. 64-bit).
 
 However, if we were walking the guest TLB cache instead we would get a guest 
 physical address which we can always resolve to a host virtual address.
 I'm not sure how important that whole use case is though. Maybe we should 
 just error out to the guest for now.
 
 It's not that important, now that we are using hugetlb rather than directly 
 mapping a large hunk of reserved memory.  It would be nice to handle it 
 though, if we can do without too much hassle.  And I think manually creating 
 a TLB entry could be faster than kmap_atomic(), or searching the guest TLB 
 and then doing a reverse memslot lookup.
 
 -Scott


Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-10 Thread Alexander Graf

On 10.07.2013, at 20:42, Scott Wood wrote:

 On 07/10/2013 05:15:09 AM, Alexander Graf wrote:
 On 10.07.2013, at 02:06, Scott Wood wrote:
  On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
  On 09.07.2013, at 20:46, Scott Wood wrote:
   I suspect that tlbsx is faster, or at worst similar.  And unlike 
   comparing tlbsx to lwepx (not counting a fix for the threading 
   problem), we don't already have code to search the guest TLB, so 
   testing would be more work.
  We have code to walk the guest TLB for TLB misses. This really is just 
  the TLB miss search without host TLB injection.
  So let's say we're using the shadow TLB. The guest always has its say 64 
  TLB entries that it can count on - we never evict anything by accident, 
  because we store all of the 64 entries in our guest TLB cache. When the 
  guest faults at an address, the first thing we do is we check the cache 
  whether we have that page already mapped.
  However, with this method we now have 2 enumeration methods for guest TLB 
  searches. We have the tlbsx one which searches the host TLB and we have 
  our guest TLB cache. The guest TLB cache might still contain an entry for 
  an address that we already invalidated on the host. Would that impose a 
  problem?
  I guess not because we're swizzling the exit code around to instead be an 
  instruction miss which means we restore the TLB entry into our host's TLB 
  so that when we resume, we land here and the tlbsx hits. But it feels 
  backwards.
 
  Any better way?  Searching the guest TLB won't work for the LRAT case, so 
  we'd need to have this logic around anyway.  We shouldn't add a second 
  codepath unless it's a clear performance gain -- and again, I suspect it 
  would be the opposite, especially if the entry is not in TLB0 or in one of 
  the first few entries searched in TLB1.  The tlbsx miss case is not what 
  we should optimize for.
 Hrm.
 So let's redesign this thing theoretically. We would have an exit that 
 requires an instruction fetch. We would override kvmppc_get_last_inst() to 
 always do kvmppc_ld_inst(). That one can fail because it can't find the TLB 
 entry in the host TLB. When it fails, we have to abort the emulation and 
 resume the guest at the same IP.
 Now the guest gets the TLB miss, we populate, go back into the guest. The 
 guest hits the emulation failure again. We go back to kvmppc_ld_inst() which 
 succeeds this time and we can emulate the instruction.
 
 That's pretty much what this patch does, except that it goes immediately to 
 the TLB miss code rather than having the extra round-trip back to the guest.  
 Is there any benefit from adding that extra round-trip?  Rewriting the exit 
 type instead doesn't seem that bad...

It's pretty bad. I want to have code that is easy to follow - and I don't care 
whether the very rare case of a TLB entry getting evicted by a random other 
thread right when we execute the exit path is slower by a few percent if we get 
cleaner code for that.

 
 I think this works. Just make sure that the gateway to the instruction fetch 
 is kvmppc_get_last_inst() and make that failable. Then the difference 
 between looking for the TLB entry in the host's TLB or in the guest's TLB 
 cache is hopefully negligible.
 
 I don't follow here.  What does this have to do with looking in the guest TLB?

I want to hide the fact that we're cheating as much as possible, that's it.


Alex



Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-10 Thread Scott Wood

On 07/10/2013 05:50:01 PM, Alexander Graf wrote:


On 10.07.2013, at 20:42, Scott Wood wrote:

 On 07/10/2013 05:15:09 AM, Alexander Graf wrote:
 On 10.07.2013, at 02:06, Scott Wood wrote:
  On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
  On 09.07.2013, at 20:46, Scott Wood wrote:
   I suspect that tlbsx is faster, or at worst similar.  And  
unlike comparing tlbsx to lwepx (not counting a fix for the threading  
problem), we don't already have code to search the guest TLB, so  
testing would be more work.
  We have code to walk the guest TLB for TLB misses. This really  
is just the TLB miss search without host TLB injection.
  So let's say we're using the shadow TLB. The guest always has  
its say 64 TLB entries that it can count on - we never evict anything  
by accident, because we store all of the 64 entries in our guest TLB  
cache. When the guest faults at an address, the first thing we do is  
we check the cache whether we have that page already mapped.
  However, with this method we now have 2 enumeration methods for  
guest TLB searches. We have the tlbsx one which searches the host TLB  
and we have our guest TLB cache. The guest TLB cache might still  
contain an entry for an address that we already invalidated on the  
host. Would that impose a problem?
  I guess not because we're swizzling the exit code around to  
instead be an instruction miss which means we restore the TLB entry  
into our host's TLB so that when we resume, we land here and the  
tlbsx hits. But it feels backwards.

 
  Any better way?  Searching the guest TLB won't work for the LRAT  
case, so we'd need to have this logic around anyway.  We shouldn't  
add a second codepath unless it's a clear performance gain -- and  
again, I suspect it would be the opposite, especially if the entry is  
not in TLB0 or in one of the first few entries searched in TLB1.  The  
tlbsx miss case is not what we should optimize for.

 Hrm.
 So let's redesign this thing theoretically. We would have an exit  
that requires an instruction fetch. We would override  
kvmppc_get_last_inst() to always do kvmppc_ld_inst(). That one can  
fail because it can't find the TLB entry in the host TLB. When it  
fails, we have to abort the emulation and resume the guest at the  
same IP.
 Now the guest gets the TLB miss, we populate, go back into the  
guest. The guest hits the emulation failure again. We go back to  
kvmppc_ld_inst() which succeeds this time and we can emulate the  
instruction.


 That's pretty much what this patch does, except that it goes  
immediately to the TLB miss code rather than having the extra  
round-trip back to the guest.  Is there any benefit from adding that  
extra round-trip?  Rewriting the exit type instead doesn't seem that  
bad...


It's pretty bad. I want to have code that is easy to follow - and I  
don't care whether the very rare case of a TLB entry getting evicted  
by a random other thread right when we execute the exit path is  
slower by a few percent if we get cleaner code for that.


I guess I just don't see how this is so much harder to follow than  
returning to guest.  I find it harder to follow the flow when there are  
more round trips to the guest involved.  "Treat this as an ITLB miss"  
is simpler than, "Let this fail, and make sure we retry the trapping  
instruction on failure.  Then, an ITLB miss will happen."


Also note that making kvmppc_get_last_inst() able to fail means  
updating several existing callsites, both for the change in function  
signature and to actually handle failures.


I don't care that deeply either way, it just doesn't seem obviously  
better.


 I think this works. Just make sure that the gateway to the  
instruction fetch is kvmppc_get_last_inst() and make that failable.  
Then the difference between looking for the TLB entry in the host's  
TLB or in the guest's TLB cache is hopefully negligible.


 I don't follow here.  What does this have to do with looking in the  
guest TLB?


I want to hide the fact that we're cheating as much as possible,  
that's it.


How are we cheating, and what specifically are you proposing to do to  
hide that?  How is the guest TLB involved at all in the change you're  
asking for?


-Scott


Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-10 Thread Alexander Graf

On 11.07.2013, at 02:15, Scott Wood wrote:

 On 07/10/2013 05:50:01 PM, Alexander Graf wrote:
 On 10.07.2013, at 20:42, Scott Wood wrote:
  On 07/10/2013 05:15:09 AM, Alexander Graf wrote:
  On 10.07.2013, at 02:06, Scott Wood wrote:
   On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
   On 09.07.2013, at 20:46, Scott Wood wrote:
I suspect that tlbsx is faster, or at worst similar.  And unlike 
comparing tlbsx to lwepx (not counting a fix for the threading 
problem), we don't already have code to search the guest TLB, so 
testing would be more work.
   We have code to walk the guest TLB for TLB misses. This really is just 
   the TLB miss search without host TLB injection.
   So let's say we're using the shadow TLB. The guest always has its say 
   64 TLB entries that it can count on - we never evict anything by 
   accident, because we store all of the 64 entries in our guest TLB 
   cache. When the guest faults at an address, the first thing we do is 
   we check the cache whether we have that page already mapped.
   However, with this method we now have 2 enumeration methods for guest 
   TLB searches. We have the tlbsx one which searches the host TLB and we 
   have our guest TLB cache. The guest TLB cache might still contain an 
   entry for an address that we already invalidated on the host. Would 
   that impose a problem?
   I guess not because we're swizzling the exit code around to instead be 
   an instruction miss which means we restore the TLB entry into our 
   host's TLB so that when we resume, we land here and the tlbsx hits. 
   But it feels backwards.
  
   Any better way?  Searching the guest TLB won't work for the LRAT case, 
   so we'd need to have this logic around anyway.  We shouldn't add a 
   second codepath unless it's a clear performance gain -- and again, I 
   suspect it would be the opposite, especially if the entry is not in 
   TLB0 or in one of the first few entries searched in TLB1.  The tlbsx 
   miss case is not what we should optimize for.
  Hrm.
  So let's redesign this thing theoretically. We would have an exit that 
  requires an instruction fetch. We would override kvmppc_get_last_inst() 
  to always do kvmppc_ld_inst(). That one can fail because it can't find 
  the TLB entry in the host TLB. When it fails, we have to abort the 
  emulation and resume the guest at the same IP.
  Now the guest gets the TLB miss, we populate, go back into the guest. The 
  guest hits the emulation failure again. We go back to kvmppc_ld_inst() 
  which succeeds this time and we can emulate the instruction.
 
  That's pretty much what this patch does, except that it goes immediately 
  to the TLB miss code rather than having the extra round-trip back to the 
  guest.  Is there any benefit from adding that extra round-trip?  Rewriting 
  the exit type instead doesn't seem that bad...
 It's pretty bad. I want to have code that is easy to follow - and I don't 
 care whether the very rare case of a TLB entry getting evicted by a random 
 other thread right when we execute the exit path is slower by a few percent 
 if we get cleaner code for that.
 
 I guess I just don't see how this is so much harder to follow than returning 
 to guest.  I find it harder to follow the flow when there are more round 
 trips to the guest involved.  "Treat this as an ITLB miss" is simpler than, 
 "Let this fail, and make sure we retry the trapping instruction on failure.  
 Then, an ITLB miss will happen."
 
 Also note that making kvmppc_get_last_inst() able to fail means updating 
 several existing callsites, both for the change in function signature and to 
 actually handle failures.
 
 I don't care that deeply either way, it just doesn't seem obviously better.
 
  I think this works. Just make sure that the gateway to the instruction 
  fetch is kvmppc_get_last_inst() and make that failable. Then the 
  difference between looking for the TLB entry in the host's TLB or in the 
  guest's TLB cache is hopefully negligible.
 
  I don't follow here.  What does this have to do with looking in the guest 
  TLB?
 I want to hide the fact that we're cheating as much as possible, that's it.
 
 How are we cheating, and what specifically are you proposing to do to hide 
 that?  How is the guest TLB involved at all in the change you're asking for?

It's not involved, but it's basically what we do on book3s pr kvm. There 
kvmppc_ld reads the guest htab, not the host htab. I think it's fine to expose 
both cases as the same thing to the rest of kvm.


Alex



Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-09 Thread Scott Wood

On 07/08/2013 08:39:05 AM, Alexander Graf wrote:


On 28.06.2013, at 11:20, Mihai Caraman wrote:

 lwepx faults need to be handled by KVM and this implies additional  
code
 in DO_KVM macro to identify the source of the exception originated  
from

 host context. This requires to check the Exception Syndrome Register
 (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for  
DTB_MISS,

 DSI and LRAT exceptions which is too intrusive for the host.

 Get rid of lwepx and acquire the last instruction in  
kvmppc_handle_exit() by
 searching for the physical address and kmap it. This fixes an  
infinite loop


What's the difference in speed for this?

Also, could we call lwepx later in host code, when  
kvmppc_get_last_inst() gets invoked?


Any use of lwepx is problematic unless we want to add overhead to the  
main Linux TLB miss handler.



 +  return;
 +  }
 +
 +  mas3 = mfspr(SPRN_MAS3);
 +  pr = vcpu->arch.shared->msr & MSR_PR;
 +	if ((pr && (!(mas3 & MAS3_UX))) ||
 +	    ((!pr) && (!(mas3 & MAS3_SX)))) {

 +  /*
 +   * Another thread may rewrite the TLB entry in parallel, don't
 +   * execute from the address if the execute permission is not set


Isn't this racy?


Yes, that's the point.  We want to access permissions atomically with  
the address.  If the guest races here, the unpredictable behavior is  
its own fault, but we don't want to make it worse by assuming that the  
new TLB entry is executable just because the old TLB entry was.


There's still a potential problem if the instruction at the new TLB  
entry is valid but not something that KVM emulates (because it wouldn't  
have trapped).  Given that the guest is already engaging in  
unpredictable behavior, though, and that it's no longer a security  
issue (it'll just cause the guest to exit), I don't think we need to  
worry too much about it.



 +   */
 +  vcpu->arch.fault_esr = 0;
 +  *exit_nr = BOOKE_INTERRUPT_INST_STORAGE;
 +  return;
 +  }
 +
 +  /* Get page size */
 +  if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
 +  psize_shift = PAGE_SHIFT;
 +  else
 +  psize_shift = MAS1_GET_TSIZE(mas1) + 10;
 +
 +  mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
 +  mfspr(SPRN_MAS3);

You're non-atomically reading MAS3/MAS7 after you've checked for  
permissions on MAS3. I'm surprised there's no handler that allows  
MAS3/7 access through the new, combined SPR for 64bit systems.


There is, but then we'd need to special-case 64-bit systems.  Why does  
atomicity matter here?  The MAS registers were filled in when we did  
the tlbsx.  They are thread-local.  They don't magically change just  
because the other thread rewrites the TLB entry that was used to fill  
them.



 +  addr = (mas7_mas3 & (~0ULL << psize_shift)) |
 + (geaddr & ((1ULL << psize_shift) - 1ULL));
 +
 +  /* Map a page and get guest's instruction */
 +  page = pfn_to_page(addr >> PAGE_SHIFT);

So it seems to me like you're jumping through a lot of hoops to make  
sure this works for LRAT and non-LRAT at the same time. Can't we just  
treat them as the different things they are?


What if we have different MMU backends for LRAT and non-LRAT? The  
non-LRAT case could then try lwepx, if that fails, fall back to read  
the shadow TLB. For the LRAT case, we'd do lwepx, if that fails fall  
back to this logic.


This isn't about LRAT; it's about hardware threads.  It also fixes the  
handling of execute-only pages on current chips.


-Scott
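The real-address assembly in the hunk quoted above is plain bit-masking: page-number bits come from MAS7||MAS3 and the in-page offset from the guest effective address. A standalone illustration, with invented input values:

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as the patch hunk: combine the real page number from
 * MAS7||MAS3 with the offset bits of the guest effective address, where
 * psize_shift encodes the page size (from MAS1[TSIZE] for TLB1 entries).
 * All values fed to it here are made up for demonstration. */
static uint64_t make_addr(uint64_t mas7_mas3, uint64_t geaddr,
			  unsigned int psize_shift)
{
	return (mas7_mas3 & (~0ULL << psize_shift)) |
	       (geaddr & ((1ULL << psize_shift) - 1ULL));
}
```

For a 4K page (psize_shift = 12) the low 12 bits of the effective address survive; for a 1M page (psize_shift = 20) the low 20 bits do.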


Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-09 Thread Alexander Graf

On 07/09/2013 07:13 PM, Scott Wood wrote:

On 07/08/2013 08:39:05 AM, Alexander Graf wrote:


On 28.06.2013, at 11:20, Mihai Caraman wrote:

 lwepx faults need to be handled by KVM and this implies additional 
code
 in DO_KVM macro to identify the source of the exception originated 
from

 host context. This requires to check the Exception Syndrome Register
 (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for 
DTB_MISS,

 DSI and LRAT exceptions which is too intrusive for the host.

 Get rid of lwepx and acquire the last instruction in 
kvmppc_handle_exit() by
 searching for the physical address and kmap it. This fixes an 
infinite loop


What's the difference in speed for this?

Also, could we call lwepx later in host code, when 
kvmppc_get_last_inst() gets invoked?


Any use of lwepx is problematic unless we want to add overhead to the 
main Linux TLB miss handler.


What exactly would be missing?

I'd also still like to see some performance benchmarks on this to make 
sure we're not walking into a bad direction.





 +return;
 +}
 +
 +mas3 = mfspr(SPRN_MAS3);
 +pr = vcpu->arch.shared->msr & MSR_PR;
 +if ((pr && (!(mas3 & MAS3_UX))) ||
 +    ((!pr) && (!(mas3 & MAS3_SX)))) {

 + /*
 + * Another thread may rewrite the TLB entry in parallel, don't
 + * execute from the address if the execute permission is not set


Isn't this racy?


Yes, that's the point.  We want to access permissions atomically with 
the address.  If the guest races here, the unpredictable behavior is 
its own fault, but we don't want to make it worse by assuming that the 
new TLB entry is executable just because the old TLB entry was.


I see.



There's still a potential problem if the instruction at the new TLB 
entry is valid but not something that KVM emulates (because it 
wouldn't have trapped).  Given that the guest is already engaging in 
unpredictable behavior, though, and that it's no longer a security 
issue (it'll just cause the guest to exit), I don't think we need to 
worry too much about it.


No, that case is fine. It's the same as book3s pr.




 + */
 +vcpu->arch.fault_esr = 0;
 +*exit_nr = BOOKE_INTERRUPT_INST_STORAGE;
 +return;
 +}
 +
 +/* Get page size */
 +if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
 +psize_shift = PAGE_SHIFT;
 +else
 +psize_shift = MAS1_GET_TSIZE(mas1) + 10;
 +
 +mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
 +mfspr(SPRN_MAS3);

You're non-atomically reading MAS3/MAS7 after you've checked for 
permissions on MAS3. I'm surprised there's no handler that allows 
MAS3/7 access through the new, combined SPR for 64bit systems.


There is, but then we'd need to special-case 64-bit systems.


Oh, what I was trying to say is that I'm surprised there's nothing in 
Linux already like


static inline u64 get_mas73(void) {
#ifdef CONFIG_PPC64
	return mfspr(SPRN_MAS73);
#else
	return ((u64)mfspr(SPRN_MAS7) << 32) | mfspr(SPRN_MAS3);
#endif
}

  Why does atomicity matter here?  The MAS registers were filled in 
when we did the tlbsx.  They are thread-local.  They don't magically 
change just because the other thread rewrites the TLB entry that was 
used to fill them.


Yeah, it doesn't matter.




 +addr = (mas7_mas3 & (~0ULL << psize_shift)) |
 +   (geaddr & ((1ULL << psize_shift) - 1ULL));
 +
 +/* Map a page and get guest's instruction */
 +page = pfn_to_page(addr >> PAGE_SHIFT);

So it seems to me like you're jumping through a lot of hoops to make 
sure this works for LRAT and non-LRAT at the same time. Can't we just 
treat them as the different things they are?


What if we have different MMU backends for LRAT and non-LRAT? The 
non-LRAT case could then try lwepx, if that fails, fall back to read 
the shadow TLB. For the LRAT case, we'd do lwepx, if that fails fall 
back to this logic.


This isn't about LRAT; it's about hardware threads.  It also fixes the 
handling of execute-only pages on current chips.


On non-LRAT systems we could always check our shadow copy of the guest's 
TLB, no? I'd really like to know what the performance difference would 
be for the 2 approaches.



Alex



Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-09 Thread Scott Wood

On 07/09/2013 12:44:32 PM, Alexander Graf wrote:

On 07/09/2013 07:13 PM, Scott Wood wrote:

On 07/08/2013 08:39:05 AM, Alexander Graf wrote:


On 28.06.2013, at 11:20, Mihai Caraman wrote:

 lwepx faults need to be handled by KVM and this implies additional code
 in the DO_KVM macro to identify the source of the exception originating from
 host context. This requires checking the Exception Syndrome Register
 (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS,
 DSI and LRAT exceptions, which is too intrusive for the host.

 Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by
 searching for the physical address and kmapping it. This fixes an infinite loop


What's the difference in speed for this?

Also, could we call lwepx later in host code, when  
kvmppc_get_last_inst() gets invoked?


Any use of lwepx is problematic unless we want to add overhead to  
the main Linux TLB miss handler.


What exactly would be missing?


If lwepx faults, it goes to the normal host TLB miss handler.  Without  
adding code to it to recognize that it's an external-PID fault, it will  
try to search the normal Linux page tables and insert a normal host  
entry.  If it thinks it has succeeded, it will retry the instruction  
rather than search for an exception handler.  The instruction will  
fault again, and you get a hang.


I'd also still like to see some performance benchmarks on this to  
make sure we're not walking into a bad direction.


I doubt it'll be significantly different.  There's overhead involved in  
setting up for lwepx as well.  It doesn't hurt to test, though this is  
a functional correctness issue, so I'm not sure what better  
alternatives we have.  I don't want to slow down non-KVM TLB misses for  
this.



 +addr = (mas7_mas3 & (~0ULL << psize_shift)) |
 +   (geaddr & ((1ULL << psize_shift) - 1ULL));
 +
 +/* Map a page and get guest's instruction */
 +page = pfn_to_page(addr >> PAGE_SHIFT);

So it seems to me like you're jumping through a lot of hoops to  
make sure this works for LRAT and non-LRAT at the same time. Can't  
we just treat them as the different things they are?


What if we have different MMU backends for LRAT and non-LRAT? The  
non-LRAT case could then try lwepx, if that fails, fall back to  
read the shadow TLB. For the LRAT case, we'd do lwepx, if that  
fails fall back to this logic.


This isn't about LRAT; it's about hardware threads.  It also fixes  
the handling of execute-only pages on current chips.


On non-LRAT systems we could always check our shadow copy of the  
guest's TLB, no? I'd really like to know what the performance  
difference would be for the 2 approaches.


I suspect that tlbsx is faster, or at worst similar.  And unlike  
comparing tlbsx to lwepx (not counting a fix for the threading  
problem), we don't already have code to search the guest TLB, so  
testing would be more work.


-Scott


Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-09 Thread Alexander Graf

On 09.07.2013, at 20:46, Scott Wood wrote:

 On 07/09/2013 12:44:32 PM, Alexander Graf wrote:
 On 07/09/2013 07:13 PM, Scott Wood wrote:
 On 07/08/2013 08:39:05 AM, Alexander Graf wrote:
 On 28.06.2013, at 11:20, Mihai Caraman wrote:
  lwepx faults need to be handled by KVM and this implies additional code
  in the DO_KVM macro to identify the source of the exception originating from
  host context. This requires checking the Exception Syndrome Register
  (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS,
  DSI and LRAT exceptions, which is too intrusive for the host.
 
  Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by
  searching for the physical address and kmapping it. This fixes an infinite loop
 What's the difference in speed for this?
 Also, could we call lwepx later in host code, when kvmppc_get_last_inst() 
 gets invoked?
 Any use of lwepx is problematic unless we want to add overhead to the main 
 Linux TLB miss handler.
 What exactly would be missing?
 
 If lwepx faults, it goes to the normal host TLB miss handler.  Without adding 
 code to it to recognize that it's an external-PID fault, it will try to 
 search the normal Linux page tables and insert a normal host entry.  If it 
 thinks it has succeeded, it will retry the instruction rather than search for 
 an exception handler.  The instruction will fault again, and you get a hang.

:(

So we either have to rewrite IVOR / IVPR or add a branch in the hot TLB miss 
interrupt handler. Both alternatives suck.

 
 I'd also still like to see some performance benchmarks on this to make sure 
 we're not walking into a bad direction.
 
 I doubt it'll be significantly different.  There's overhead involved in 
 setting up for lwepx as well.  It doesn't hurt to test, though this is a 
 functional correctness issue, so I'm not sure what better alternatives we 
 have.  I don't want to slow down non-KVM TLB misses for this.

Yeah, I concur on that part. It probably won't get better. Sigh.

 
  +addr = (mas7_mas3 & (~0ULL << psize_shift)) |
  +   (geaddr & ((1ULL << psize_shift) - 1ULL));
  +
  +/* Map a page and get guest's instruction */
  +page = pfn_to_page(addr >> PAGE_SHIFT);
 So it seems to me like you're jumping through a lot of hoops to make sure 
 this works for LRAT and non-LRAT at the same time. Can't we just treat 
 them as the different things they are?
 What if we have different MMU backends for LRAT and non-LRAT? The non-LRAT 
 case could then try lwepx, if that fails, fall back to read the shadow 
 TLB. For the LRAT case, we'd do lwepx, if that fails fall back to this 
 logic.
 This isn't about LRAT; it's about hardware threads.  It also fixes the 
 handling of execute-only pages on current chips.
 On non-LRAT systems we could always check our shadow copy of the guest's 
 TLB, no? I'd really like to know what the performance difference would be 
 for the 2 approaches.
 
 I suspect that tlbsx is faster, or at worst similar.  And unlike comparing 
 tlbsx to lwepx (not counting a fix for the threading problem), we don't 
 already have code to search the guest TLB, so testing would be more work.

We have code to walk the guest TLB for TLB misses. This really is just the TLB 
miss search without host TLB injection.

So let's say we're using the shadow TLB. The guest always has its, say, 64 TLB 
entries that it can count on - we never evict anything by accident, because we 
store all of the 64 entries in our guest TLB cache. When the guest faults at an 
address, the first thing we do is we check the cache whether we have that page 
already mapped.
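Concretely, that cache walk is little more than the following (illustrative C; the structure and field names here are made up for the sketch, while the real shadow copy lives in the e500 MMU code):

```c
#include <stdint.h>

/* Hypothetical shadow copy of a guest TLB entry -- illustrative fields,
 * not the actual KVM e500 structures. */
struct shadow_tlbe {
	uint32_t mas1;		/* valid bit + TSIZE */
	uint64_t epn;		/* effective page number (page-aligned EA) */
	uint64_t rpn;		/* real page number (page-aligned guest PA) */
};

#define MAS1_VALID		0x80000000u
#define MAS1_TSIZE_SHIFT	7
#define MAS1_TSIZE_MASK		0x0f80u

/* Walk the guest TLB cache for an entry covering geaddr; return the
 * translated guest physical address on a hit, or -1 on a miss.  This is
 * the TLB-miss search minus the host TLB injection step. */
static int64_t shadow_tlb_xlate(const struct shadow_tlbe *tlb, int n,
				uint64_t geaddr)
{
	for (int i = 0; i < n; i++) {
		unsigned int tsize, psize_shift;
		uint64_t mask;

		if (!(tlb[i].mas1 & MAS1_VALID))
			continue;
		tsize = (tlb[i].mas1 & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT;
		psize_shift = tsize + 10;	/* MAV 2.0: 2^(tsize+10) bytes */
		mask = ~0ULL << psize_shift;
		if ((geaddr & mask) == (tlb[i].epn & mask))
			return (tlb[i].rpn & mask) | (geaddr & ~mask);
	}
	return -1;
}
```

The same search would have to be repeated per TLB array (TLB0 set, TLB1 linear scan) in the real code, which is part of the cost comparison against tlbsx.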

However, with this method we now have 2 enumeration methods for guest TLB 
searches. We have the tlbsx one which searches the host TLB and we have our 
guest TLB cache. The guest TLB cache might still contain an entry for an 
address that we already invalidated on the host. Would that impose a problem?

I guess not because we're swizzling the exit code around to instead be an 
instruction miss which means we restore the TLB entry into our host's TLB so 
that when we resume, we land here and the tlbsx hits. But it feels backwards.

At least this code has to become something more generic, such as 
kvmppc_read_guest(vcpu, addr, TYPE_INSN) and move into the host mmu 
implementation, as it's 100% host mmu specific.
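A rough shape for such an accessor might be the following; every name in it (kvmppc_read_guest, host_mmu_ops, TYPE_INSN) is hypothetical, and the point is only the split between a generic caller in booke.c and a per-backend translation hook:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical generic guest-memory accessor; none of these names exist
 * in the tree.  booke.c would call one entry point, and each host MMU
 * backend (e500, or a future LRAT-aware one) supplies the translation. */
enum xlate_type { TYPE_INSN, TYPE_DATA };

struct kvm_vcpu;			/* opaque for this sketch */

struct host_mmu_ops {
	/* translate a guest effective address to a host pointer, or NULL
	 * if no mapping is found (the caller then swizzles the exit into
	 * the appropriate storage interrupt) */
	void *(*xlate)(struct kvm_vcpu *vcpu, uint64_t eaddr,
		       enum xlate_type type);
};

static int kvmppc_read_guest(struct kvm_vcpu *vcpu,
			     const struct host_mmu_ops *ops,
			     uint64_t eaddr, void *dst, size_t len,
			     enum xlate_type type)
{
	void *src = ops->xlate(vcpu, eaddr, type);

	if (!src)
		return -1;
	memcpy(dst, src, len);
	return 0;
}

/* Toy backend for demonstration only: guest "memory" is a flat buffer
 * and effective addresses below its size map 1:1. */
static uint8_t demo_ram[64] = { 0x7c, 0x00, 0x04, 0xac };	/* e.g. an insn word */

static void *demo_xlate(struct kvm_vcpu *vcpu, uint64_t eaddr,
			enum xlate_type type)
{
	(void)vcpu; (void)type;
	return eaddr < sizeof(demo_ram) ? &demo_ram[eaddr] : NULL;
}

static const struct host_mmu_ops demo_ops = { .xlate = demo_xlate };
```

The NULL-return path is where the "turn this exit into an instruction storage/TLB miss interrupt" logic from the patch would hang off.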


Alex



Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-09 Thread Alexander Graf

On 28.06.2013, at 11:20, Mihai Caraman wrote:

 lwepx faults need to be handled by KVM and this implies additional code
 in the DO_KVM macro to identify the source of the exception originating from
 host context. This requires checking the Exception Syndrome Register
 (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS,
 DSI and LRAT exceptions, which is too intrusive for the host.
 
 Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by
 searching for the physical address and kmapping it. This fixes an infinite loop
 caused by lwepx's data TLB miss being handled in the host, and the TODO for TLB
 eviction and execute-but-not-read entries.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 ---
 Resend this patch for Alex G.; he was unsubscribed from the kvm-ppc mailing list
 for a while.
 
 arch/powerpc/include/asm/mmu-book3e.h |6 ++-
 arch/powerpc/kvm/booke.c  |6 +++
 arch/powerpc/kvm/booke.h  |2 +
 arch/powerpc/kvm/bookehv_interrupts.S |   32 ++-
 arch/powerpc/kvm/e500.c   |4 ++
 arch/powerpc/kvm/e500mc.c |   69 +
 6 files changed, 91 insertions(+), 28 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/mmu-book3e.h 
 b/arch/powerpc/include/asm/mmu-book3e.h
 index 99d43e0..32e470e 100644
 --- a/arch/powerpc/include/asm/mmu-book3e.h
 +++ b/arch/powerpc/include/asm/mmu-book3e.h
 @@ -40,7 +40,10 @@
 
 /* MAS registers bit definitions */
 
  -#define MAS0_TLBSEL(x)   (((x) << 28) & 0x30000000)
  +#define MAS0_TLBSEL_MASK 0x30000000
  +#define MAS0_TLBSEL_SHIFT    28
  +#define MAS0_TLBSEL(x)   (((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
  +#define MAS0_GET_TLBSEL(mas0)    (((mas0) & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT)
  #define MAS0_ESEL_MASK    0x0FFF0000
  #define MAS0_ESEL_SHIFT   16
  #define MAS0_ESEL(x)  (((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
  @@ -58,6 +61,7 @@
  #define MAS1_TSIZE_MASK   0x0f80
  #define MAS1_TSIZE_SHIFT  7
  #define MAS1_TSIZE(x) (((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK)
  +#define MAS1_GET_TSIZE(mas1) (((mas1) & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT)
 
 #define MAS2_EPN  (~0xFFFUL)
 #define MAS2_X0   0x0040
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index 1020119..6764a8e 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -836,6 +836,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
 kvm_vcpu *vcpu,
   /* update before a new last_exit_type is rewritten */
   kvmppc_update_timing_stats(vcpu);
 
 + /*
 +  * The exception type can change at this point, such as if the TLB entry
 +  * for the emulated instruction has been evicted.
 +  */
 + kvmppc_prepare_for_emulation(vcpu, exit_nr);
 +
   /* restart interrupts if they were meant for the host */
   kvmppc_restart_interrupt(vcpu, exit_nr);
 
 diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
 index 5fd1ba6..a0d0fea 100644
 --- a/arch/powerpc/kvm/booke.h
 +++ b/arch/powerpc/kvm/booke.h
 @@ -90,6 +90,8 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
 void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);
 
 +void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int 
 *exit_nr);
 +
 enum int_class {
   INT_CLASS_NONCRIT,
   INT_CLASS_CRIT,
 diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
 b/arch/powerpc/kvm/bookehv_interrupts.S
 index 20c7a54..0538ab9 100644
 --- a/arch/powerpc/kvm/bookehv_interrupts.S
 +++ b/arch/powerpc/kvm/bookehv_interrupts.S
 @@ -120,37 +120,20 @@
 
  .if \flags & NEED_EMU
   /*
 -  * This assumes you have external PID support.
 -  * To support a bookehv CPU without external PID, you'll
 -  * need to look up the TLB entry and create a temporary mapping.
 -  *
 -  * FIXME: we don't currently handle if the lwepx faults.  PR-mode
 -  * booke doesn't handle it either.  Since Linux doesn't use
 -  * broadcast tlbivax anymore, the only way this should happen is
 -  * if the guest maps its memory execute-but-not-read, or if we
 -  * somehow take a TLB miss in the middle of this entry code and
 -  * evict the relevant entry.  On e500mc, all kernel lowmem is
 -  * bolted into TLB1 large page mappings, and we don't use
 -  * broadcast invalidates, so we should not take a TLB miss here.
 -  *
 -  * Later we'll need to deal with faults here.  Disallowing guest
 -  * mappings that are execute-but-not-read could be an option on
 -  * e500mc, but not on chips with an LRAT if it is used.
 +  * We don't use external PID support. lwepx faults would need to be
 +  * handled by KVM and this implies additional code in DO_KVM (for
 +  * DTB_MISS, DSI and LRAT) to check ESR[EPID] and EPLC[EGS], which
 +  * is 

Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-09 Thread Scott Wood

On 07/09/2013 04:44:24 PM, Alexander Graf wrote:


On 09.07.2013, at 20:46, Scott Wood wrote:
 I suspect that tlbsx is faster, or at worst similar.  And unlike  
comparing tlbsx to lwepx (not counting a fix for the threading  
problem), we don't already have code to search the guest TLB, so  
testing would be more work.


We have code to walk the guest TLB for TLB misses. This really is  
just the TLB miss search without host TLB injection.


So let's say we're using the shadow TLB. The guest always has its, say,  
64 TLB entries that it can count on - we never evict anything by  
accident, because we store all of the 64 entries in our guest TLB  
cache. When the guest faults at an address, the first thing we do is  
we check the cache whether we have that page already mapped.


However, with this method we now have 2 enumeration methods for guest  
TLB searches. We have the tlbsx one which searches the host TLB and  
we have our guest TLB cache. The guest TLB cache might still contain  
an entry for an address that we already invalidated on the host.  
Would that impose a problem?


I guess not because we're swizzling the exit code around to instead  
be an instruction miss which means we restore the TLB entry into our  
host's TLB so that when we resume, we land here and the tlbsx hits.  
But it feels backwards.


Any better way?  Searching the guest TLB won't work for the LRAT case,  
so we'd need to have this logic around anyway.  We shouldn't add a  
second codepath unless it's a clear performance gain -- and again, I  
suspect it would be the opposite, especially if the entry is not in  
TLB0 or in one of the first few entries searched in TLB1.  The tlbsx  
miss case is not what we should optimize for.


At least this code has to become something more generic, such as  
kvmppc_read_guest(vcpu, addr, TYPE_INSN) and move into the host mmu  
implementation, as it's 100% host mmu specific.


I agree that e500_mmu_host.c is a better place for it (with an ifdef  
for BOOKEHV), but supporting anything other than instruction fetches  
could wait until we have a user for it (it means extra code to figure  
out if permissions are correct).


-Scott


Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-09 Thread Scott Wood

On 07/09/2013 04:45:10 PM, Alexander Graf wrote:


On 28.06.2013, at 11:20, Mihai Caraman wrote:

 +  /* Get page size */
 +  if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
 +  psize_shift = PAGE_SHIFT;
 +  else
 +  psize_shift = MAS1_GET_TSIZE(mas1) + 10;
 +
 +  mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
 +  mfspr(SPRN_MAS3);
 +  addr = (mas7_mas3 & (~0ULL << psize_shift)) |
 + (geaddr & ((1ULL << psize_shift) - 1ULL));
 +
 +  /* Map a page and get guest's instruction */
 +  page = pfn_to_page(addr >> PAGE_SHIFT);

While looking at this I just realized that you're missing a check  
here. What if our IP is in some PCI BAR? Or can't we execute from  
those?


We at least need to check pfn_valid() first.  That'll just keep us from  
accessing a bad pointer in the host kernel, though -- it won't make the  
emulation actually work.  If we need that, we'll probably need to  
create a temporary TLB entry manually.
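A minimal sketch of that guard, with pfn_valid() stubbed out so it can be exercised in isolation (the RAM window below is an arbitrary assumption; the real kernel helper consults the memory map):

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12

/* Stand-in for the kernel's pfn_valid(): here we assume a single valid
 * RAM window at 256MB..512MB purely so the guard is testable. */
static bool pfn_valid(uint64_t pfn)
{
	return pfn >= 0x10000 && pfn < 0x20000;
}

/* The guard discussed above: refuse to pfn_to_page()/kmap a translated
 * address that is not backed by RAM (e.g. a PCI BAR).  A false return
 * would tell the caller to fall back to something else, such as a
 * temporary TLB entry. */
static bool insn_addr_is_mappable(uint64_t addr)
{
	return pfn_valid(addr >> PAGE_SHIFT);
}
```

As noted, this only prevents a bad host-pointer dereference; fetching the instruction from such a region would still need a different mechanism.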


-Scott


Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-08 Thread Alexander Graf

On 28.06.2013, at 11:20, Mihai Caraman wrote:

 lwepx faults need to be handled by KVM and this implies additional code
 in the DO_KVM macro to identify the source of the exception originating from
 host context. This requires checking the Exception Syndrome Register
 (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS,
 DSI and LRAT exceptions, which is too intrusive for the host.
 
 Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by
 searching for the physical address and kmapping it. This fixes an infinite loop

What's the difference in speed for this?

Also, could we call lwepx later in host code, when kvmppc_get_last_inst() gets 
invoked?

 caused by lwepx's data TLB miss handled in the host and the TODO for TLB
 eviction and execute-but-not-read entries.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 ---
 Resend this patch for Alex G.; he was unsubscribed from the kvm-ppc mailing list
 for a while.
 
 arch/powerpc/include/asm/mmu-book3e.h |6 ++-
 arch/powerpc/kvm/booke.c  |6 +++
 arch/powerpc/kvm/booke.h  |2 +
 arch/powerpc/kvm/bookehv_interrupts.S |   32 ++-
 arch/powerpc/kvm/e500.c   |4 ++
 arch/powerpc/kvm/e500mc.c |   69 +
 6 files changed, 91 insertions(+), 28 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/mmu-book3e.h 
 b/arch/powerpc/include/asm/mmu-book3e.h
 index 99d43e0..32e470e 100644
 --- a/arch/powerpc/include/asm/mmu-book3e.h
 +++ b/arch/powerpc/include/asm/mmu-book3e.h
 @@ -40,7 +40,10 @@
 
 /* MAS registers bit definitions */
 
 -#define MAS0_TLBSEL(x)   (((x) << 28) & 0x30000000)
 +#define MAS0_TLBSEL_MASK 0x30000000
 +#define MAS0_TLBSEL_SHIFT    28
 +#define MAS0_TLBSEL(x)   (((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
 +#define MAS0_GET_TLBSEL(mas0)    (((mas0) & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT)
 #define MAS0_ESEL_MASK    0x0FFF0000
 #define MAS0_ESEL_SHIFT   16
 #define MAS0_ESEL(x)  (((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
 @@ -58,6 +61,7 @@
 #define MAS1_TSIZE_MASK   0x0f80
 #define MAS1_TSIZE_SHIFT  7
 #define MAS1_TSIZE(x) (((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK)
 +#define MAS1_GET_TSIZE(mas1) (((mas1) & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT)
 
 #define MAS2_EPN  (~0xFFFUL)
 #define MAS2_X0   0x0040
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index 1020119..6764a8e 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -836,6 +836,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
 kvm_vcpu *vcpu,
   /* update before a new last_exit_type is rewritten */
   kvmppc_update_timing_stats(vcpu);
 
 + /*
 +  * The exception type can change at this point, such as if the TLB entry
 +  * for the emulated instruction has been evicted.
 +  */
 + kvmppc_prepare_for_emulation(vcpu, exit_nr);

Please model this the same way as book3s. Check out kvmppc_get_last_inst() as a 
starting point.

 +
   /* restart interrupts if they were meant for the host */
   kvmppc_restart_interrupt(vcpu, exit_nr);
 
 diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
 index 5fd1ba6..a0d0fea 100644
 --- a/arch/powerpc/kvm/booke.h
 +++ b/arch/powerpc/kvm/booke.h
 @@ -90,6 +90,8 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
 void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);
 
 +void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int 
 *exit_nr);
 +
 enum int_class {
   INT_CLASS_NONCRIT,
   INT_CLASS_CRIT,
 diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
 b/arch/powerpc/kvm/bookehv_interrupts.S
 index 20c7a54..0538ab9 100644
 --- a/arch/powerpc/kvm/bookehv_interrupts.S
 +++ b/arch/powerpc/kvm/bookehv_interrupts.S
 @@ -120,37 +120,20 @@
 
 	.if \flags & NEED_EMU
   /*
 -  * This assumes you have external PID support.
 -  * To support a bookehv CPU without external PID, you'll
 -  * need to look up the TLB entry and create a temporary mapping.
 -  *
 -  * FIXME: we don't currently handle if the lwepx faults.  PR-mode
 -  * booke doesn't handle it either.  Since Linux doesn't use
 -  * broadcast tlbivax anymore, the only way this should happen is
 -  * if the guest maps its memory execute-but-not-read, or if we
 -  * somehow take a TLB miss in the middle of this entry code and
 -  * evict the relevant entry.  On e500mc, all kernel lowmem is
 -  * bolted into TLB1 large page mappings, and we don't use
 -  * broadcast invalidates, so we should not take a TLB miss here.
 -  *
 -  * Later we'll need to deal with faults here.  Disallowing guest
 -  * mappings that are execute-but-not-read could be an option on
 -  * e500mc, but not on chips with an LRAT if it is used.
 

[PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-06-28 Thread Mihai Caraman
lwepx faults need to be handled by KVM and this implies additional code
in the DO_KVM macro to identify the source of the exception originating from
host context. This requires checking the Exception Syndrome Register
(ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS,
DSI and LRAT exceptions, which is too intrusive for the host.

Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by
searching for the physical address and kmapping it. This fixes an infinite loop
caused by lwepx's data TLB miss being handled in the host, and the TODO for TLB
eviction and execute-but-not-read entries.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
Resend this patch for Alex G.; he was unsubscribed from the kvm-ppc mailing list
for a while.

 arch/powerpc/include/asm/mmu-book3e.h |6 ++-
 arch/powerpc/kvm/booke.c  |6 +++
 arch/powerpc/kvm/booke.h  |2 +
 arch/powerpc/kvm/bookehv_interrupts.S |   32 ++-
 arch/powerpc/kvm/e500.c   |4 ++
 arch/powerpc/kvm/e500mc.c |   69 +
 6 files changed, 91 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h 
b/arch/powerpc/include/asm/mmu-book3e.h
index 99d43e0..32e470e 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -40,7 +40,10 @@
 
 /* MAS registers bit definitions */
 
-#define MAS0_TLBSEL(x) (((x) << 28) & 0x30000000)
+#define MAS0_TLBSEL_MASK   0x30000000
+#define MAS0_TLBSEL_SHIFT  28
+#define MAS0_TLBSEL(x) (((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
+#define MAS0_GET_TLBSEL(mas0)  (((mas0) & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT)
 #define MAS0_ESEL_MASK 0x0FFF0000
 #define MAS0_ESEL_SHIFT    16
 #define MAS0_ESEL(x)   (((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
@@ -58,6 +61,7 @@
 #define MAS1_TSIZE_MASK    0x0f80
 #define MAS1_TSIZE_SHIFT   7
 #define MAS1_TSIZE(x)  (((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK)
+#define MAS1_GET_TSIZE(mas1)   (((mas1) & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT)
 
 #define MAS2_EPN   (~0xFFFUL)
 #define MAS2_X00x0040
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 1020119..6764a8e 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -836,6 +836,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
/* update before a new last_exit_type is rewritten */
kvmppc_update_timing_stats(vcpu);
 
+   /*
+* The exception type can change at this point, such as if the TLB entry
+* for the emulated instruction has been evicted.
+*/
+   kvmppc_prepare_for_emulation(vcpu, exit_nr);
+
/* restart interrupts if they were meant for the host */
kvmppc_restart_interrupt(vcpu, exit_nr);
 
diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index 5fd1ba6..a0d0fea 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -90,6 +90,8 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
 void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);
 
+void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int 
*exit_nr);
+
 enum int_class {
INT_CLASS_NONCRIT,
INT_CLASS_CRIT,
diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
b/arch/powerpc/kvm/bookehv_interrupts.S
index 20c7a54..0538ab9 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -120,37 +120,20 @@
 
	.if \flags & NEED_EMU
/*
-* This assumes you have external PID support.
-* To support a bookehv CPU without external PID, you'll
-* need to look up the TLB entry and create a temporary mapping.
-*
-* FIXME: we don't currently handle if the lwepx faults.  PR-mode
-* booke doesn't handle it either.  Since Linux doesn't use
-* broadcast tlbivax anymore, the only way this should happen is
-* if the guest maps its memory execute-but-not-read, or if we
-* somehow take a TLB miss in the middle of this entry code and
-* evict the relevant entry.  On e500mc, all kernel lowmem is
-* bolted into TLB1 large page mappings, and we don't use
-* broadcast invalidates, so we should not take a TLB miss here.
-*
-* Later we'll need to deal with faults here.  Disallowing guest
-* mappings that are execute-but-not-read could be an option on
-* e500mc, but not on chips with an LRAT if it is used.
+* We don't use external PID support. lwepx faults would need to be
+* handled by KVM and this implies additional code in DO_KVM (for
+* DTB_MISS, DSI and LRAT) to check ESR[EPID] and EPLC[EGS], which
+* is too intrusive for the host. Get the last instruction in
+* 

[PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-06-06 Thread Mihai Caraman
lwepx faults need to be handled by KVM and this implies additional code
in the DO_KVM macro to identify the source of the exception originating in
host context. This requires checking the Exception Syndrome Register
(ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS,
DSI and LRAT exceptions, which is too intrusive for the host.

Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by
searching for the physical address and kmapping it. This fixes an infinite loop
caused by lwepx's data TLB miss being handled in the host, and the TODO for TLB
eviction and execute-but-not-read entries.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
 arch/powerpc/include/asm/mmu-book3e.h |6 ++-
 arch/powerpc/kvm/booke.c  |6 +++
 arch/powerpc/kvm/booke.h  |2 +
 arch/powerpc/kvm/bookehv_interrupts.S |   32 ++-
 arch/powerpc/kvm/e500.c   |4 ++
 arch/powerpc/kvm/e500mc.c |   69 +
 6 files changed, 91 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h 
b/arch/powerpc/include/asm/mmu-book3e.h
index 99d43e0..32e470e 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -40,7 +40,10 @@
 
 /* MAS registers bit definitions */
 
-#define MAS0_TLBSEL(x) (((x) << 28) & 0x30000000)
+#define MAS0_TLBSEL_MASK   0x30000000
+#define MAS0_TLBSEL_SHIFT  28
+#define MAS0_TLBSEL(x) (((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
+#define MAS0_GET_TLBSEL(mas0)  (((mas0) & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT)
 #define MAS0_ESEL_MASK 0x0FFF0000
 #define MAS0_ESEL_SHIFT    16
 #define MAS0_ESEL(x)   (((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
@@ -58,6 +61,7 @@
 #define MAS1_TSIZE_MASK    0x0f80
 #define MAS1_TSIZE_SHIFT   7
 #define MAS1_TSIZE(x)  (((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK)
+#define MAS1_GET_TSIZE(mas1)   (((mas1) & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT)
 
 #define MAS2_EPN   (~0xFFFUL)
 #define MAS2_X00x0040
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 1020119..6764a8e 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -836,6 +836,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
/* update before a new last_exit_type is rewritten */
kvmppc_update_timing_stats(vcpu);
 
+   /*
+* The exception type can change at this point, such as if the TLB entry
+* for the emulated instruction has been evicted.
+*/
+   kvmppc_prepare_for_emulation(vcpu, exit_nr);
+
/* restart interrupts if they were meant for the host */
kvmppc_restart_interrupt(vcpu, exit_nr);
 
diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index 5fd1ba6..a0d0fea 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -90,6 +90,8 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
 void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);
 
+void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int 
*exit_nr);
+
 enum int_class {
INT_CLASS_NONCRIT,
INT_CLASS_CRIT,
diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
b/arch/powerpc/kvm/bookehv_interrupts.S
index 20c7a54..0538ab9 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -120,37 +120,20 @@
 
	.if \flags & NEED_EMU
/*
-* This assumes you have external PID support.
-* To support a bookehv CPU without external PID, you'll
-* need to look up the TLB entry and create a temporary mapping.
-*
-* FIXME: we don't currently handle if the lwepx faults.  PR-mode
-* booke doesn't handle it either.  Since Linux doesn't use
-* broadcast tlbivax anymore, the only way this should happen is
-* if the guest maps its memory execute-but-not-read, or if we
-* somehow take a TLB miss in the middle of this entry code and
-* evict the relevant entry.  On e500mc, all kernel lowmem is
-* bolted into TLB1 large page mappings, and we don't use
-* broadcast invalidates, so we should not take a TLB miss here.
-*
-* Later we'll need to deal with faults here.  Disallowing guest
-* mappings that are execute-but-not-read could be an option on
-* e500mc, but not on chips with an LRAT if it is used.
+* We don't use external PID support. lwepx faults would need to be
+* handled by KVM and this implies additional code in DO_KVM (for
+* DTB_MISS, DSI and LRAT) to check ESR[EPID] and EPLC[EGS], which
+* is too intrusive for the host. Get the last instruction in
+* kvmppc_handle_exit().
 */
-
-   mfspr   r3, SPRN_EPLC   /* will already have correct ELPID