Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 10.07.2013, at 02:06, Scott Wood wrote: On 07/09/2013 04:44:24 PM, Alexander Graf wrote: On 09.07.2013, at 20:46, Scott Wood wrote: I suspect that tlbsx is faster, or at worst similar. And unlike comparing tlbsx to lwepx (not counting a fix for the threading problem), we don't already have code to search the guest TLB, so testing would be more work. We have code to walk the guest TLB for TLB misses. This really is just the TLB miss search without host TLB injection. So let's say we're using the shadow TLB. The guest always has its, say, 64 TLB entries that it can count on - we never evict anything by accident, because we store all of the 64 entries in our guest TLB cache. When the guest faults at an address, the first thing we do is check the cache to see whether we already have that page mapped. However, with this method we now have 2 enumeration methods for guest TLB searches. We have the tlbsx one which searches the host TLB and we have our guest TLB cache. The guest TLB cache might still contain an entry for an address that we already invalidated on the host. Would that pose a problem? I guess not, because we're swizzling the exit code around to instead be an instruction miss, which means we restore the TLB entry into our host's TLB so that when we resume, we land here and the tlbsx hits. But it feels backwards. Any better way? Searching the guest TLB won't work for the LRAT case, so we'd need to have this logic around anyway. We shouldn't add a second codepath unless it's a clear performance gain -- and again, I suspect it would be the opposite, especially if the entry is not in TLB0 or in one of the first few entries searched in TLB1. The tlbsx miss case is not what we should optimize for. Hrm. So let's redesign this thing theoretically. We would have an exit that requires an instruction fetch. We would override kvmppc_get_last_inst() to always do kvmppc_ld_inst(). That one can fail because it can't find the TLB entry in the host TLB.
When it fails, we have to abort the emulation and resume the guest at the same IP. Now the guest gets the TLB miss, we populate, go back into the guest. The guest hits the emulation failure again. We go back to kvmppc_ld_inst() which succeeds this time and we can emulate the instruction. I think this works. Just make sure that the gateway to the instruction fetch is kvmppc_get_last_inst() and make that failable. Then the difference between looking for the TLB entry in the host's TLB or in the guest's TLB cache is hopefully negligible. At least this code has to become something more generic, such as kvmppc_read_guest(vcpu, addr, TYPE_INSN) and move into the host mmu implementation, as it's 100% host mmu specific. I agree that e500_mmu_host.c is a better place for it (with an ifdef for BOOKEHV), but supporting anything other than instruction fetches could wait until we have a user for it (it means extra code to figure out if permissions are correct). Works for me, as long as it's either documented through BUG_ON/WARN_ON's or an explicit naming convention. Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
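The abort-and-retry flow Alex sketches above can be modeled in a few lines of plain C (a toy user-space model, not kernel code; the function names echo the KVM ones, but the mocked TLB state and return conventions are hypothetical stand-ins):

```c
#include <stdbool.h>

static bool host_tlb_has_entry;   /* stands in for the host TLB state */
static int guest_itlb_misses;     /* counts round-trips back to the guest */

/* Mock of a failable kvmppc_get_last_inst(): the fetch only works when
 * the guest mapping is currently present in the host TLB. */
static int mock_get_last_inst(unsigned int *inst)
{
    if (!host_tlb_has_entry)
        return -1;              /* caller must abort emulation */
    *inst = 0x7c0004ac;         /* the trapping instruction */
    return 0;
}

/* Resuming the guest at the same IP triggers an ITLB miss, which
 * repopulates the host TLB before the instruction traps again. */
static void resume_guest_same_ip(void)
{
    guest_itlb_misses++;
    host_tlb_has_entry = true;
}

/* Emulation loop: abort on fetch failure, re-enter the guest, retry. */
static int emulate_with_retry(unsigned int *inst, int *attempts)
{
    for (*attempts = 0; *attempts < 2; (*attempts)++) {
        if (mock_get_last_inst(inst) == 0)
            return 0;           /* fetch succeeded; emulate */
        resume_guest_same_ip();
    }
    return -1;
}
```

In this model the second pass always succeeds because the round-trip repopulates the TLB; in the real flow the entry could in principle be evicted again before the retry, in which case the guest would simply take another ITLB miss.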
Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 10.07.2013, at 02:12, Scott Wood wrote: On 07/09/2013 04:45:10 PM, Alexander Graf wrote: On 28.06.2013, at 11:20, Mihai Caraman wrote:

+	/* Get page size */
+	if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
+		psize_shift = PAGE_SHIFT;
+	else
+		psize_shift = MAS1_GET_TSIZE(mas1) + 10;
+
+	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
+		mfspr(SPRN_MAS3);
+	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
+		(geaddr & ((1ULL << psize_shift) - 1ULL));
+
+	/* Map a page and get guest's instruction */
+	page = pfn_to_page(addr >> PAGE_SHIFT);

While looking at this I just realized that you're missing a check here. What if our IP is in some PCI BAR? Or can't we execute from those? We at least need to check pfn_valid() first. That'll just keep us from accessing a bad pointer in the host kernel, though -- it won't make the emulation actually work. If we need that, we'll probably need to create a temporary TLB entry manually. ioremap()? However, if we were walking the guest TLB cache instead, we would get a guest physical address which we can always resolve to a host virtual address. I'm not sure how important that whole use case is though. Maybe we should just error out to the guest for now. Alex
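For what it's worth, the address computation in the hunk quoted above can be checked in isolation; this standalone sketch just restates the masking arithmetic (the sample values used to exercise it are illustrative, not taken from the patch):

```c
#include <stdint.h>

/* Rebuild the guest physical address of the trapping instruction:
 * take the real-page bits from MAS7||MAS3 and the in-page offset from
 * the faulting effective address. psize_shift is PAGE_SHIFT for a TLB0
 * entry, or MAS1[TSIZE] + 10 for a TLB1 entry. */
static uint64_t guest_phys_addr(uint64_t mas7_mas3, uint64_t geaddr,
                                unsigned int psize_shift)
{
    /* High bits: real page number; low bits: offset within the page. */
    return (mas7_mas3 & (~0ULL << psize_shift)) |
           (geaddr & ((1ULL << psize_shift) - 1ULL));
}
```

With a 4 KiB page (psize_shift = 12), the low MAS3 attribute bits are masked off the page number and only the offset below the page size comes from the effective address.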
Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 07/10/2013 05:18:10 AM, Alexander Graf wrote: On 10.07.2013, at 02:12, Scott Wood wrote: On 07/09/2013 04:45:10 PM, Alexander Graf wrote: On 28.06.2013, at 11:20, Mihai Caraman wrote:

+	/* Get page size */
+	if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
+		psize_shift = PAGE_SHIFT;
+	else
+		psize_shift = MAS1_GET_TSIZE(mas1) + 10;
+
+	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
+		mfspr(SPRN_MAS3);
+	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
+		(geaddr & ((1ULL << psize_shift) - 1ULL));
+
+	/* Map a page and get guest's instruction */
+	page = pfn_to_page(addr >> PAGE_SHIFT);

While looking at this I just realized that you're missing a check here. What if our IP is in some PCI BAR? Or can't we execute from those? We at least need to check pfn_valid() first. That'll just keep us from accessing a bad pointer in the host kernel, though -- it won't make the emulation actually work. If we need that, we'll probably need to create a temporary TLB entry manually. ioremap()? That's a bit heavy... also we'd need to deal with cacheability. This code is already engaged in directly creating TLB entries, so it doesn't seem like much of a stretch to create one for this. It should be faster than ioremap() or kmap_atomic(). The one complication is allocating the virtual address space, but maybe we could just use the page that kmap_atomic would have used? Of course, if we want to handle execution from other than normal kernel memory, we'll need to make sure that the virtual address space is allocated even when highmem is not present (e.g. 64-bit). However, if we were walking the guest TLB cache instead we would get a guest physical address which we can always resolve to a host virtual address. I'm not sure how important that whole use case is though. Maybe we should just error out to the guest for now. It's not that important, now that we are using hugetlb rather than directly mapping a large hunk of reserved memory. It would be nice to handle it though, if we can do without too much hassle.
And I think manually creating a TLB entry could be faster than kmap_atomic(), or searching the guest TLB and then doing a reverse memslot lookup. -Scott
Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 07/10/2013 05:15:09 AM, Alexander Graf wrote: On 10.07.2013, at 02:06, Scott Wood wrote: On 07/09/2013 04:44:24 PM, Alexander Graf wrote: On 09.07.2013, at 20:46, Scott Wood wrote: I suspect that tlbsx is faster, or at worst similar. And unlike comparing tlbsx to lwepx (not counting a fix for the threading problem), we don't already have code to search the guest TLB, so testing would be more work. We have code to walk the guest TLB for TLB misses. This really is just the TLB miss search without host TLB injection. So let's say we're using the shadow TLB. The guest always has its say 64 TLB entries that it can count on - we never evict anything by accident, because we store all of the 64 entries in our guest TLB cache. When the guest faults at an address, the first thing we do is we check the cache whether we have that page already mapped. However, with this method we now have 2 enumeration methods for guest TLB searches. We have the tlbsx one which searches the host TLB and we have our guest TLB cache. The guest TLB cache might still contain an entry for an address that we already invalidated on the host. Would that impose a problem? I guess not because we're swizzling the exit code around to instead be an instruction miss which means we restore the TLB entry into our host's TLB so that when we resume, we land here and the tlbsx hits. But it feels backwards. Any better way? Searching the guest TLB won't work for the LRAT case, so we'd need to have this logic around anyway. We shouldn't add a second codepath unless it's a clear performance gain -- and again, I suspect it would be the opposite, especially if the entry is not in TLB0 or in one of the first few entries searched in TLB1. The tlbsx miss case is not what we should optimize for. Hrm. So let's redesign this thing theoretically. We would have an exit that requires an instruction fetch. We would override kvmppc_get_last_inst() to always do kvmppc_ld_inst(). 
That one can fail because it can't find the TLB entry in the host TLB. When it fails, we have to abort the emulation and resume the guest at the same IP. Now the guest gets the TLB miss, we populate, go back into the guest. The guest hits the emulation failure again. We go back to kvmppc_ld_inst() which succeeds this time and we can emulate the instruction. That's pretty much what this patch does, except that it goes immediately to the TLB miss code rather than having the extra round-trip back to the guest. Is there any benefit from adding that extra round-trip? Rewriting the exit type instead doesn't seem that bad... I think this works. Just make sure that the gateway to the instruction fetch is kvmppc_get_last_inst() and make that failable. Then the difference between looking for the TLB entry in the host's TLB or in the guest's TLB cache is hopefully negligible. I don't follow here. What does this have to do with looking in the guest TLB? -Scott
Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 10.07.2013, at 20:37, Scott Wood wrote: On 07/10/2013 05:18:10 AM, Alexander Graf wrote: On 10.07.2013, at 02:12, Scott Wood wrote: On 07/09/2013 04:45:10 PM, Alexander Graf wrote: On 28.06.2013, at 11:20, Mihai Caraman wrote:

+	/* Get page size */
+	if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
+		psize_shift = PAGE_SHIFT;
+	else
+		psize_shift = MAS1_GET_TSIZE(mas1) + 10;
+
+	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
+		mfspr(SPRN_MAS3);
+	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
+		(geaddr & ((1ULL << psize_shift) - 1ULL));
+
+	/* Map a page and get guest's instruction */
+	page = pfn_to_page(addr >> PAGE_SHIFT);

While looking at this I just realized that you're missing a check here. What if our IP is in some PCI BAR? Or can't we execute from those? We at least need to check pfn_valid() first. That'll just keep us from accessing a bad pointer in the host kernel, though -- it won't make the emulation actually work. If we need that, we'll probably need to create a temporary TLB entry manually. ioremap()? That's a bit heavy... also we'd need to deal with cacheability. This code is already engaged in directly creating TLB entries, so it doesn't seem like much of a stretch to create one for this. It should be faster than ioremap() or kmap_atomic(). Do we have any guarantees that the TLB entry we create doesn't get evicted by another thread by the time we want to use it? Alex The one complication is allocating the virtual address space, but maybe we could just use the page that kmap_atomic would have used? Of course, if we want to handle execution from other than normal kernel memory, we'll need to make sure that the virtual address space is allocated even when highmem is not present (e.g. 64-bit). However, if we were walking the guest TLB cache instead we would get a guest physical address which we can always resolve to a host virtual address. I'm not sure how important that whole use case is though. Maybe we should just error out to the guest for now.
It's not that important, now that we are using hugetlb rather than directly mapping a large hunk of reserved memory. It would be nice to handle it though, if we can do without too much hassle. And I think manually creating a TLB entry could be faster than kmap_atomic(), or searching the guest TLB and then doing a reverse memslot lookup. -Scott
Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 10.07.2013, at 20:42, Scott Wood wrote: On 07/10/2013 05:15:09 AM, Alexander Graf wrote: On 10.07.2013, at 02:06, Scott Wood wrote: On 07/09/2013 04:44:24 PM, Alexander Graf wrote: On 09.07.2013, at 20:46, Scott Wood wrote: I suspect that tlbsx is faster, or at worst similar. And unlike comparing tlbsx to lwepx (not counting a fix for the threading problem), we don't already have code to search the guest TLB, so testing would be more work. We have code to walk the guest TLB for TLB misses. This really is just the TLB miss search without host TLB injection. So let's say we're using the shadow TLB. The guest always has its say 64 TLB entries that it can count on - we never evict anything by accident, because we store all of the 64 entries in our guest TLB cache. When the guest faults at an address, the first thing we do is we check the cache whether we have that page already mapped. However, with this method we now have 2 enumeration methods for guest TLB searches. We have the tlbsx one which searches the host TLB and we have our guest TLB cache. The guest TLB cache might still contain an entry for an address that we already invalidated on the host. Would that impose a problem? I guess not because we're swizzling the exit code around to instead be an instruction miss which means we restore the TLB entry into our host's TLB so that when we resume, we land here and the tlbsx hits. But it feels backwards. Any better way? Searching the guest TLB won't work for the LRAT case, so we'd need to have this logic around anyway. We shouldn't add a second codepath unless it's a clear performance gain -- and again, I suspect it would be the opposite, especially if the entry is not in TLB0 or in one of the first few entries searched in TLB1. The tlbsx miss case is not what we should optimize for. Hrm. So let's redesign this thing theoretically. We would have an exit that requires an instruction fetch. We would override kvmppc_get_last_inst() to always do kvmppc_ld_inst(). 
That one can fail because it can't find the TLB entry in the host TLB. When it fails, we have to abort the emulation and resume the guest at the same IP. Now the guest gets the TLB miss, we populate, go back into the guest. The guest hits the emulation failure again. We go back to kvmppc_ld_inst() which succeeds this time and we can emulate the instruction. That's pretty much what this patch does, except that it goes immediately to the TLB miss code rather than having the extra round-trip back to the guest. Is there any benefit from adding that extra round-trip? Rewriting the exit type instead doesn't seem that bad... It's pretty bad. I want to have code that is easy to follow - and I don't care whether the very rare case of a TLB entry getting evicted by a random other thread right when we execute the exit path is slower by a few percent if we get cleaner code for that. I think this works. Just make sure that the gateway to the instruction fetch is kvmppc_get_last_inst() and make that failable. Then the difference between looking for the TLB entry in the host's TLB or in the guest's TLB cache is hopefully negligible. I don't follow here. What does this have to do with looking in the guest TLB? I want to hide the fact that we're cheating as much as possible, that's it. Alex
Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 07/10/2013 05:50:01 PM, Alexander Graf wrote: On 10.07.2013, at 20:42, Scott Wood wrote: On 07/10/2013 05:15:09 AM, Alexander Graf wrote: On 10.07.2013, at 02:06, Scott Wood wrote: On 07/09/2013 04:44:24 PM, Alexander Graf wrote: On 09.07.2013, at 20:46, Scott Wood wrote: I suspect that tlbsx is faster, or at worst similar. And unlike comparing tlbsx to lwepx (not counting a fix for the threading problem), we don't already have code to search the guest TLB, so testing would be more work. We have code to walk the guest TLB for TLB misses. This really is just the TLB miss search without host TLB injection. So let's say we're using the shadow TLB. The guest always has its say 64 TLB entries that it can count on - we never evict anything by accident, because we store all of the 64 entries in our guest TLB cache. When the guest faults at an address, the first thing we do is we check the cache whether we have that page already mapped. However, with this method we now have 2 enumeration methods for guest TLB searches. We have the tlbsx one which searches the host TLB and we have our guest TLB cache. The guest TLB cache might still contain an entry for an address that we already invalidated on the host. Would that impose a problem? I guess not because we're swizzling the exit code around to instead be an instruction miss which means we restore the TLB entry into our host's TLB so that when we resume, we land here and the tlbsx hits. But it feels backwards. Any better way? Searching the guest TLB won't work for the LRAT case, so we'd need to have this logic around anyway. We shouldn't add a second codepath unless it's a clear performance gain -- and again, I suspect it would be the opposite, especially if the entry is not in TLB0 or in one of the first few entries searched in TLB1. The tlbsx miss case is not what we should optimize for. Hrm. So let's redesign this thing theoretically. We would have an exit that requires an instruction fetch. 
We would override kvmppc_get_last_inst() to always do kvmppc_ld_inst(). That one can fail because it can't find the TLB entry in the host TLB. When it fails, we have to abort the emulation and resume the guest at the same IP. Now the guest gets the TLB miss, we populate, go back into the guest. The guest hits the emulation failure again. We go back to kvmppc_ld_inst() which succeeds this time and we can emulate the instruction. That's pretty much what this patch does, except that it goes immediately to the TLB miss code rather than having the extra round-trip back to the guest. Is there any benefit from adding that extra round-trip? Rewriting the exit type instead doesn't seem that bad... It's pretty bad. I want to have code that is easy to follow - and I don't care whether the very rare case of a TLB entry getting evicted by a random other thread right when we execute the exit path is slower by a few percent if we get cleaner code for that. I guess I just don't see how this is so much harder to follow than returning to the guest. I find it harder to follow the flow when there are more round trips to the guest involved. "Treat this as an ITLB miss" is simpler than "Let this fail, and make sure we retry the trapping instruction on failure. Then, an ITLB miss will happen." Also note that making kvmppc_get_last_inst() able to fail means updating several existing callsites, both for the change in function signature and to actually handle failures. I don't care that deeply either way, it just doesn't seem obviously better. I think this works. Just make sure that the gateway to the instruction fetch is kvmppc_get_last_inst() and make that failable. Then the difference between looking for the TLB entry in the host's TLB or in the guest's TLB cache is hopefully negligible. I don't follow here. What does this have to do with looking in the guest TLB?
How are we cheating, and what specifically are you proposing to do to hide that? How is the guest TLB involved at all in the change you're asking for? -Scott
Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 11.07.2013, at 02:15, Scott Wood wrote: On 07/10/2013 05:50:01 PM, Alexander Graf wrote: On 10.07.2013, at 20:42, Scott Wood wrote: On 07/10/2013 05:15:09 AM, Alexander Graf wrote: On 10.07.2013, at 02:06, Scott Wood wrote: On 07/09/2013 04:44:24 PM, Alexander Graf wrote: On 09.07.2013, at 20:46, Scott Wood wrote: I suspect that tlbsx is faster, or at worst similar. And unlike comparing tlbsx to lwepx (not counting a fix for the threading problem), we don't already have code to search the guest TLB, so testing would be more work. We have code to walk the guest TLB for TLB misses. This really is just the TLB miss search without host TLB injection. So let's say we're using the shadow TLB. The guest always has its say 64 TLB entries that it can count on - we never evict anything by accident, because we store all of the 64 entries in our guest TLB cache. When the guest faults at an address, the first thing we do is we check the cache whether we have that page already mapped. However, with this method we now have 2 enumeration methods for guest TLB searches. We have the tlbsx one which searches the host TLB and we have our guest TLB cache. The guest TLB cache might still contain an entry for an address that we already invalidated on the host. Would that impose a problem? I guess not because we're swizzling the exit code around to instead be an instruction miss which means we restore the TLB entry into our host's TLB so that when we resume, we land here and the tlbsx hits. But it feels backwards. Any better way? Searching the guest TLB won't work for the LRAT case, so we'd need to have this logic around anyway. We shouldn't add a second codepath unless it's a clear performance gain -- and again, I suspect it would be the opposite, especially if the entry is not in TLB0 or in one of the first few entries searched in TLB1. The tlbsx miss case is not what we should optimize for. Hrm. So let's redesign this thing theoretically. 
We would have an exit that requires an instruction fetch. We would override kvmppc_get_last_inst() to always do kvmppc_ld_inst(). That one can fail because it can't find the TLB entry in the host TLB. When it fails, we have to abort the emulation and resume the guest at the same IP. Now the guest gets the TLB miss, we populate, go back into the guest. The guest hits the emulation failure again. We go back to kvmppc_ld_inst() which succeeds this time and we can emulate the instruction. That's pretty much what this patch does, except that it goes immediately to the TLB miss code rather than having the extra round-trip back to the guest. Is there any benefit from adding that extra round-trip? Rewriting the exit type instead doesn't seem that bad... It's pretty bad. I want to have code that is easy to follow - and I don't care whether the very rare case of a TLB entry getting evicted by a random other thread right when we execute the exit path is slower by a few percent if we get cleaner code for that. I guess I just don't see how this is so much harder to follow than returning to guest. I find it harder to follow the flow when there are more round trips to the guest involved. Treat this as an ITLB miss is simpler than, Let this fail, and make sure we retry the trapping instruction on failure. Then, an ITLB miss will happen. Also note that making kvmppc_get_last_inst() able to fail means updating several existing callsites, both for the change in function signature and to actually handle failures. I don't care that deeply either way, it just doesn't seem obviously better. I think this works. Just make sure that the gateway to the instruction fetch is kvmppc_get_last_inst() and make that failable. Then the difference between looking for the TLB entry in the host's TLB or in the guest's TLB cache is hopefully negligible. I don't follow here. What does this have to do with looking in the guest TLB? 
I want to hide the fact that we're cheating as much as possible, that's it. How are we cheating, and what specifically are you proposing to do to hide that? How is the guest TLB involved at all in the change you're asking for? It's not involved, but it's basically what we do on book3s pr kvm. There kvmppc_ld reads the guest htab, not the host htab. I think it's fine to expose both cases as the same thing to the rest of kvm. Alex
Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 07/08/2013 08:39:05 AM, Alexander Graf wrote: On 28.06.2013, at 11:20, Mihai Caraman wrote: lwepx faults need to be handled by KVM and this implies additional code in the DO_KVM macro to identify the source of the exception originated from host context. This requires checking the Exception Syndrome Register (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS, DSI and LRAT exceptions, which is too intrusive for the host. Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by searching for the physical address and kmapping it. This fixes an infinite loop What's the difference in speed for this? Also, could we call lwepx later in host code, when kvmppc_get_last_inst() gets invoked? Any use of lwepx is problematic unless we want to add overhead to the main Linux TLB miss handler.

+		return;
+	}
+
+	mas3 = mfspr(SPRN_MAS3);
+	pr = vcpu->arch.shared->msr & MSR_PR;
+	if ((pr && (!(mas3 & MAS3_UX))) || ((!pr) && (!(mas3 & MAS3_SX)))) {
+		/*
+		 * Another thread may rewrite the TLB entry in parallel, don't
+		 * execute from the address if the execute permission is not set

Isn't this racy? Yes, that's the point. We want to access permissions atomically with the address. If the guest races here, the unpredictable behavior is its own fault, but we don't want to make it worse by assuming that the new TLB entry is executable just because the old TLB entry was. There's still a potential problem if the instruction at the new TLB entry is valid but not something that KVM emulates (because it wouldn't have trapped). Given that the guest is already engaging in unpredictable behavior, though, and that it's no longer a security issue (it'll just cause the guest to exit), I don't think we need to worry too much about it.
+	 */
+		vcpu->arch.fault_esr = 0;
+		*exit_nr = BOOKE_INTERRUPT_INST_STORAGE;
+		return;
+	}
+
+	/* Get page size */
+	if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
+		psize_shift = PAGE_SHIFT;
+	else
+		psize_shift = MAS1_GET_TSIZE(mas1) + 10;
+
+	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
+		mfspr(SPRN_MAS3);

You're non-atomically reading MAS3/MAS7 after you've checked for permissions on MAS3. I'm surprised there's no handler that allows MAS3/7 access through the new, combined SPR for 64bit systems. There is, but then we'd need to special-case 64-bit systems. Why does atomicity matter here? The MAS registers were filled in when we did the tlbsx. They are thread-local. They don't magically change just because the other thread rewrites the TLB entry that was used to fill them.

+	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
+		(geaddr & ((1ULL << psize_shift) - 1ULL));
+
+	/* Map a page and get guest's instruction */
+	page = pfn_to_page(addr >> PAGE_SHIFT);

So it seems to me like you're jumping through a lot of hoops to make sure this works for LRAT and non-LRAT at the same time. Can't we just treat them as the different things they are? What if we have different MMU backends for LRAT and non-LRAT? The non-LRAT case could then try lwepx, if that fails, fall back to read the shadow TLB. For the LRAT case, we'd do lwepx, if that fails fall back to this logic. This isn't about LRAT; it's about hardware threads. It also fixes the handling of execute-only pages on current chips. -Scott
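The permission check being debated boils down to a small predicate: user mode needs UX, supervisor mode needs SX. A standalone sketch (the bit values below follow the usual Book E layout, but they are hard-coded here purely for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_PR  0x4000  /* problem (user) state - illustrative value */
#define MAS3_UX 0x20    /* user execute permission - illustrative value */
#define MAS3_SX 0x10    /* supervisor execute permission - illustrative value */

/* True when the privilege mode implied by msr may execute from a page
 * whose permissions are described by mas3. */
static bool guest_can_execute(uint64_t msr, uint32_t mas3)
{
    if (msr & MSR_PR)
        return (mas3 & MAS3_UX) != 0;
    return (mas3 & MAS3_SX) != 0;
}
```

Reading mas3 once and deciding from that single snapshot is what keeps the check consistent with the address even if another thread rewrites the TLB entry in parallel.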
Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 07/09/2013 07:13 PM, Scott Wood wrote: On 07/08/2013 08:39:05 AM, Alexander Graf wrote: On 28.06.2013, at 11:20, Mihai Caraman wrote: lwepx faults need to be handled by KVM and this implies additional code in the DO_KVM macro to identify the source of the exception originated from host context. This requires checking the Exception Syndrome Register (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS, DSI and LRAT exceptions, which is too intrusive for the host. Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by searching for the physical address and kmapping it. This fixes an infinite loop What's the difference in speed for this? Also, could we call lwepx later in host code, when kvmppc_get_last_inst() gets invoked? Any use of lwepx is problematic unless we want to add overhead to the main Linux TLB miss handler. What exactly would be missing? I'd also still like to see some performance benchmarks on this to make sure we're not walking into a bad direction.

+		return;
+	}
+
+	mas3 = mfspr(SPRN_MAS3);
+	pr = vcpu->arch.shared->msr & MSR_PR;
+	if ((pr && (!(mas3 & MAS3_UX))) || ((!pr) && (!(mas3 & MAS3_SX)))) {
+		/*
+		 * Another thread may rewrite the TLB entry in parallel, don't
+		 * execute from the address if the execute permission is not set

Isn't this racy? Yes, that's the point. We want to access permissions atomically with the address. If the guest races here, the unpredictable behavior is its own fault, but we don't want to make it worse by assuming that the new TLB entry is executable just because the old TLB entry was. I see. There's still a potential problem if the instruction at the new TLB entry is valid but not something that KVM emulates (because it wouldn't have trapped). Given that the guest is already engaging in unpredictable behavior, though, and that it's no longer a security issue (it'll just cause the guest to exit), I don't think we need to worry too much about it. No, that case is fine. It's the same as book3s pr.
+	 */
+		vcpu->arch.fault_esr = 0;
+		*exit_nr = BOOKE_INTERRUPT_INST_STORAGE;
+		return;
+	}
+
+	/* Get page size */
+	if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
+		psize_shift = PAGE_SHIFT;
+	else
+		psize_shift = MAS1_GET_TSIZE(mas1) + 10;
+
+	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
+		mfspr(SPRN_MAS3);

You're non-atomically reading MAS3/MAS7 after you've checked for permissions on MAS3. I'm surprised there's no handler that allows MAS3/7 access through the new, combined SPR for 64bit systems. There is, but then we'd need to special-case 64-bit systems. Oh, what I was trying to say is that I'm surprised there's nothing in Linux already like

static inline u64 get_mas73(void)
{
#ifdef CONFIG_PPC64
	return mfspr(SPRN_MAS73);
#else
	return ((u64)mfspr(SPRN_MAS7) << 32) | mfspr(SPRN_MAS3);
#endif
}

Why does atomicity matter here? The MAS registers were filled in when we did the tlbsx. They are thread-local. They don't magically change just because the other thread rewrites the TLB entry that was used to fill them. Yeah, it doesn't matter.

+	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
+		(geaddr & ((1ULL << psize_shift) - 1ULL));
+
+	/* Map a page and get guest's instruction */
+	page = pfn_to_page(addr >> PAGE_SHIFT);

So it seems to me like you're jumping through a lot of hoops to make sure this works for LRAT and non-LRAT at the same time. Can't we just treat them as the different things they are? What if we have different MMU backends for LRAT and non-LRAT? The non-LRAT case could then try lwepx, if that fails, fall back to read the shadow TLB. For the LRAT case, we'd do lwepx, if that fails fall back to this logic. This isn't about LRAT; it's about hardware threads. It also fixes the handling of execute-only pages on current chips. On non-LRAT systems we could always check our shadow copy of the guest's TLB, no? I'd really like to know what the performance difference would be for the 2 approaches.
Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 07/09/2013 12:44:32 PM, Alexander Graf wrote: On 07/09/2013 07:13 PM, Scott Wood wrote: On 07/08/2013 08:39:05 AM, Alexander Graf wrote: On 28.06.2013, at 11:20, Mihai Caraman wrote:

lwepx faults need to be handled by KVM, and this implies additional code in the DO_KVM macro to identify exceptions originating from host context. This requires checking the Exception Syndrome Register (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS, DSI and LRAT exceptions, which is too intrusive for the host. Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by searching for the physical address and kmapping it. This fixes an infinite loop

What's the difference in speed for this? Also, could we call lwepx later in host code, when kvmppc_get_last_inst() gets invoked?

Any use of lwepx is problematic unless we want to add overhead to the main Linux TLB miss handler.

What exactly would be missing?

If lwepx faults, it goes to the normal host TLB miss handler. Without adding code to it to recognize that it's an external-PID fault, it will try to search the normal Linux page tables and insert a normal host entry. If it thinks it has succeeded, it will retry the instruction rather than search for an exception handler. The instruction will fault again, and you get a hang.

I'd also still like to see some performance benchmarks on this to make sure we're not walking into a bad direction.

I doubt it'll be significantly different. There's overhead involved in setting up for lwepx as well. It doesn't hurt to test, though this is a functional correctness issue, so I'm not sure what better alternatives we have. I don't want to slow down non-KVM TLB misses for this.
+	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
+	       (geaddr & ((1ULL << psize_shift) - 1ULL));
+
+	/* Map a page and get guest's instruction */
+	page = pfn_to_page(addr >> PAGE_SHIFT);

So it seems to me like you're jumping through a lot of hoops to make sure this works for LRAT and non-LRAT at the same time. Can't we just treat them as the different things they are? What if we have different MMU backends for LRAT and non-LRAT? The non-LRAT case could then try lwepx; if that fails, fall back to reading the shadow TLB. For the LRAT case, we'd do lwepx; if that fails, fall back to this logic.

This isn't about LRAT; it's about hardware threads. It also fixes the handling of execute-only pages on current chips.

On non-LRAT systems we could always check our shadow copy of the guest's TLB, no? I'd really like to know what the performance difference would be for the 2 approaches.

I suspect that tlbsx is faster, or at worst similar. And unlike comparing tlbsx to lwepx (not counting a fix for the threading problem), we don't already have code to search the guest TLB, so testing would be more work.

-Scott
Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 09.07.2013, at 20:46, Scott Wood wrote: On 07/09/2013 12:44:32 PM, Alexander Graf wrote: On 07/09/2013 07:13 PM, Scott Wood wrote: On 07/08/2013 08:39:05 AM, Alexander Graf wrote: On 28.06.2013, at 11:20, Mihai Caraman wrote:

lwepx faults need to be handled by KVM, and this implies additional code in the DO_KVM macro to identify exceptions originating from host context. This requires checking the Exception Syndrome Register (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS, DSI and LRAT exceptions, which is too intrusive for the host. Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by searching for the physical address and kmapping it. This fixes an infinite loop

What's the difference in speed for this? Also, could we call lwepx later in host code, when kvmppc_get_last_inst() gets invoked?

Any use of lwepx is problematic unless we want to add overhead to the main Linux TLB miss handler.

What exactly would be missing?

If lwepx faults, it goes to the normal host TLB miss handler. Without adding code to it to recognize that it's an external-PID fault, it will try to search the normal Linux page tables and insert a normal host entry. If it thinks it has succeeded, it will retry the instruction rather than search for an exception handler. The instruction will fault again, and you get a hang.

:( So we either have to rewrite IVOR / IVPR or add a branch in the hot TLB miss interrupt handler. Both alternatives suck.

I'd also still like to see some performance benchmarks on this to make sure we're not walking into a bad direction.

I doubt it'll be significantly different. There's overhead involved in setting up for lwepx as well. It doesn't hurt to test, though this is a functional correctness issue, so I'm not sure what better alternatives we have. I don't want to slow down non-KVM TLB misses for this.

Yeah, I concur on that part. It probably won't get better. Sigh.
+	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
+	       (geaddr & ((1ULL << psize_shift) - 1ULL));
+
+	/* Map a page and get guest's instruction */
+	page = pfn_to_page(addr >> PAGE_SHIFT);

So it seems to me like you're jumping through a lot of hoops to make sure this works for LRAT and non-LRAT at the same time. Can't we just treat them as the different things they are? What if we have different MMU backends for LRAT and non-LRAT? The non-LRAT case could then try lwepx; if that fails, fall back to reading the shadow TLB. For the LRAT case, we'd do lwepx; if that fails, fall back to this logic.

This isn't about LRAT; it's about hardware threads. It also fixes the handling of execute-only pages on current chips.

On non-LRAT systems we could always check our shadow copy of the guest's TLB, no? I'd really like to know what the performance difference would be for the 2 approaches.

I suspect that tlbsx is faster, or at worst similar. And unlike comparing tlbsx to lwepx (not counting a fix for the threading problem), we don't already have code to search the guest TLB, so testing would be more work.

We have code to walk the guest TLB for TLB misses. This really is just the TLB miss search without host TLB injection.

So let's say we're using the shadow TLB. The guest always has its, say, 64 TLB entries that it can count on - we never evict anything by accident, because we store all 64 entries in our guest TLB cache. When the guest faults at an address, the first thing we do is check the cache for whether we already have that page mapped.

However, with this method we now have 2 enumeration methods for guest TLB searches. We have the tlbsx one, which searches the host TLB, and we have our guest TLB cache. The guest TLB cache might still contain an entry for an address that we already invalidated on the host. Would that impose a problem?
I guess not, because we're swizzling the exit code around to instead be an instruction miss, which means we restore the TLB entry into our host's TLB so that when we resume, we land here and the tlbsx hits. But it feels backwards.

At least this code has to become something more generic, such as kvmppc_read_guest(vcpu, addr, TYPE_INSN), and move into the host mmu implementation, as it's 100% host mmu specific.

Alex
Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 28.06.2013, at 11:20, Mihai Caraman wrote:

lwepx faults need to be handled by KVM, and this implies additional code in the DO_KVM macro to identify exceptions originating from host context. This requires checking the Exception Syndrome Register (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS, DSI and LRAT exceptions, which is too intrusive for the host. Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by searching for the physical address and kmapping it. This fixes an infinite loop caused by lwepx's data TLB miss handled in the host, and the TODO for TLB eviction and execute-but-not-read entries.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
Resend this patch for Alex G.; he was unsubscribed from the kvm-ppc mailing list for a while.

 arch/powerpc/include/asm/mmu-book3e.h |    6 ++-
 arch/powerpc/kvm/booke.c              |    6 +++
 arch/powerpc/kvm/booke.h              |    2 +
 arch/powerpc/kvm/bookehv_interrupts.S |   32 ++-
 arch/powerpc/kvm/e500.c               |    4 ++
 arch/powerpc/kvm/e500mc.c             |   69 +
 6 files changed, 91 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
index 99d43e0..32e470e 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -40,7 +40,10 @@
 /* MAS registers bit definitions */
 
-#define MAS0_TLBSEL(x)		(((x) << 28) & 0x30000000)
+#define MAS0_TLBSEL_MASK	0x30000000
+#define MAS0_TLBSEL_SHIFT	28
+#define MAS0_TLBSEL(x)		(((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
+#define MAS0_GET_TLBSEL(mas0)	(((mas0) & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT)
 #define MAS0_ESEL_MASK		0x0FFF0000
 #define MAS0_ESEL_SHIFT		16
 #define MAS0_ESEL(x)		(((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
@@ -58,6 +61,7 @@
 #define MAS1_TSIZE_MASK		0x00000f80
 #define MAS1_TSIZE_SHIFT	7
 #define MAS1_TSIZE(x)		(((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK)
+#define MAS1_GET_TSIZE(mas1)	(((mas1) & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT)
 #define MAS2_EPN		(~0xFFFUL)
 #define MAS2_X0			0x00000040

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 1020119..6764a8e 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -836,6 +836,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	/* update before a new last_exit_type is rewritten */
 	kvmppc_update_timing_stats(vcpu);
 
+	/*
+	 * The exception type can change at this point, such as if the TLB entry
+	 * for the emulated instruction has been evicted.
+	 */
+	kvmppc_prepare_for_emulation(vcpu, &exit_nr);
+
 	/* restart interrupts if they were meant for the host */
 	kvmppc_restart_interrupt(vcpu, exit_nr);

diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index 5fd1ba6..a0d0fea 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -90,6 +90,8 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
 void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);
 
+void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int *exit_nr);
+
 enum int_class {
 	INT_CLASS_NONCRIT,
 	INT_CLASS_CRIT,

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 20c7a54..0538ab9 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -120,37 +120,20 @@
 	.if	\flags & NEED_EMU
 	/*
-	 * This assumes you have external PID support.
-	 * To support a bookehv CPU without external PID, you'll
-	 * need to look up the TLB entry and create a temporary mapping.
-	 *
-	 * FIXME: we don't currently handle if the lwepx faults. PR-mode
-	 * booke doesn't handle it either. Since Linux doesn't use
-	 * broadcast tlbivax anymore, the only way this should happen is
-	 * if the guest maps its memory execute-but-not-read, or if we
-	 * somehow take a TLB miss in the middle of this entry code and
-	 * evict the relevant entry. On e500mc, all kernel lowmem is
-	 * bolted into TLB1 large page mappings, and we don't use
-	 * broadcast invalidates, so we should not take a TLB miss here.
-	 *
-	 * Later we'll need to deal with faults here. Disallowing guest
-	 * mappings that are execute-but-not-read could be an option on
-	 * e500mc, but not on chips with an LRAT if it is used.
+	 * We don't use external PID support. lwepx faults would need to be
+	 * handled by KVM and this implies additional code in DO_KVM (for
+	 * DTB_MISS, DSI and LRAT) to check ESR[EPID] and EPLC[EGS] which
+	 * is
Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 07/09/2013 04:44:24 PM, Alexander Graf wrote: On 09.07.2013, at 20:46, Scott Wood wrote:

I suspect that tlbsx is faster, or at worst similar. And unlike comparing tlbsx to lwepx (not counting a fix for the threading problem), we don't already have code to search the guest TLB, so testing would be more work.

We have code to walk the guest TLB for TLB misses. This really is just the TLB miss search without host TLB injection.

So let's say we're using the shadow TLB. The guest always has its, say, 64 TLB entries that it can count on - we never evict anything by accident, because we store all 64 entries in our guest TLB cache. When the guest faults at an address, the first thing we do is check the cache for whether we already have that page mapped.

However, with this method we now have 2 enumeration methods for guest TLB searches. We have the tlbsx one, which searches the host TLB, and we have our guest TLB cache. The guest TLB cache might still contain an entry for an address that we already invalidated on the host. Would that impose a problem? I guess not, because we're swizzling the exit code around to instead be an instruction miss, which means we restore the TLB entry into our host's TLB so that when we resume, we land here and the tlbsx hits. But it feels backwards. Any better way?

Searching the guest TLB won't work for the LRAT case, so we'd need to have this logic around anyway. We shouldn't add a second codepath unless it's a clear performance gain -- and again, I suspect it would be the opposite, especially if the entry is not in TLB0 or in one of the first few entries searched in TLB1. The tlbsx miss case is not what we should optimize for.

At least this code has to become something more generic, such as kvmppc_read_guest(vcpu, addr, TYPE_INSN), and move into the host mmu implementation, as it's 100% host mmu specific.
I agree that e500_mmu_host.c is a better place for it (with an ifdef for BOOKEHV), but supporting anything other than instruction fetches could wait until we have a user for it (it means extra code to figure out if permissions are correct).

-Scott
Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 07/09/2013 04:45:10 PM, Alexander Graf wrote: On 28.06.2013, at 11:20, Mihai Caraman wrote:

+	/* Get page size */
+	if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
+		psize_shift = PAGE_SHIFT;
+	else
+		psize_shift = MAS1_GET_TSIZE(mas1) + 10;
+
+	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
+		    mfspr(SPRN_MAS3);
+	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
+	       (geaddr & ((1ULL << psize_shift) - 1ULL));
+
+	/* Map a page and get guest's instruction */
+	page = pfn_to_page(addr >> PAGE_SHIFT);

While looking at this I just realized that you're missing a check here. What if our IP is in some PCI BAR? Or can't we execute from those?

We at least need to check pfn_valid() first. That'll just keep us from accessing a bad pointer in the host kernel, though -- it won't make the emulation actually work. If we need that, we'll probably need to create a temporary TLB entry manually.

-Scott
Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 28.06.2013, at 11:20, Mihai Caraman wrote:

lwepx faults need to be handled by KVM, and this implies additional code in the DO_KVM macro to identify exceptions originating from host context. This requires checking the Exception Syndrome Register (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS, DSI and LRAT exceptions, which is too intrusive for the host. Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by searching for the physical address and kmapping it. This fixes an infinite loop

What's the difference in speed for this? Also, could we call lwepx later in host code, when kvmppc_get_last_inst() gets invoked?

caused by lwepx's data TLB miss handled in the host, and the TODO for TLB eviction and execute-but-not-read entries.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
Resend this patch for Alex G.; he was unsubscribed from the kvm-ppc mailing list for a while.

 arch/powerpc/include/asm/mmu-book3e.h |    6 ++-
 arch/powerpc/kvm/booke.c              |    6 +++
 arch/powerpc/kvm/booke.h              |    2 +
 arch/powerpc/kvm/bookehv_interrupts.S |   32 ++-
 arch/powerpc/kvm/e500.c               |    4 ++
 arch/powerpc/kvm/e500mc.c             |   69 +
 6 files changed, 91 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
index 99d43e0..32e470e 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -40,7 +40,10 @@
 /* MAS registers bit definitions */
 
-#define MAS0_TLBSEL(x)		(((x) << 28) & 0x30000000)
+#define MAS0_TLBSEL_MASK	0x30000000
+#define MAS0_TLBSEL_SHIFT	28
+#define MAS0_TLBSEL(x)		(((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
+#define MAS0_GET_TLBSEL(mas0)	(((mas0) & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT)
 #define MAS0_ESEL_MASK		0x0FFF0000
 #define MAS0_ESEL_SHIFT		16
 #define MAS0_ESEL(x)		(((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
@@ -58,6 +61,7 @@
 #define MAS1_TSIZE_MASK		0x00000f80
 #define MAS1_TSIZE_SHIFT	7
 #define MAS1_TSIZE(x)		(((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK)
+#define MAS1_GET_TSIZE(mas1)	(((mas1) & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT)
 #define MAS2_EPN		(~0xFFFUL)
 #define MAS2_X0			0x00000040

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 1020119..6764a8e 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -836,6 +836,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	/* update before a new last_exit_type is rewritten */
 	kvmppc_update_timing_stats(vcpu);
 
+	/*
+	 * The exception type can change at this point, such as if the TLB entry
+	 * for the emulated instruction has been evicted.
+	 */
+	kvmppc_prepare_for_emulation(vcpu, &exit_nr);

Please model this the same way as book3s. Check out kvmppc_get_last_inst() as a starting point.

+	/* restart interrupts if they were meant for the host */
 	kvmppc_restart_interrupt(vcpu, exit_nr);

diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index 5fd1ba6..a0d0fea 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -90,6 +90,8 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
 void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);
 
+void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int *exit_nr);
+
 enum int_class {
 	INT_CLASS_NONCRIT,
 	INT_CLASS_CRIT,

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 20c7a54..0538ab9 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -120,37 +120,20 @@
 	.if	\flags & NEED_EMU
 	/*
-	 * This assumes you have external PID support.
-	 * To support a bookehv CPU without external PID, you'll
-	 * need to look up the TLB entry and create a temporary mapping.
-	 *
-	 * FIXME: we don't currently handle if the lwepx faults. PR-mode
-	 * booke doesn't handle it either. Since Linux doesn't use
-	 * broadcast tlbivax anymore, the only way this should happen is
-	 * if the guest maps its memory execute-but-not-read, or if we
-	 * somehow take a TLB miss in the middle of this entry code and
-	 * evict the relevant entry. On e500mc, all kernel lowmem is
-	 * bolted into TLB1 large page mappings, and we don't use
-	 * broadcast invalidates, so we should not take a TLB miss here.
-	 *
-	 * Later we'll need to deal with faults here. Disallowing guest
-	 * mappings that are execute-but-not-read could be an option on
-	 * e500mc, but not on chips with an LRAT if it is used.
[PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
lwepx faults need to be handled by KVM, and this implies additional code in the DO_KVM macro to identify exceptions originating from host context. This requires checking the Exception Syndrome Register (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS, DSI and LRAT exceptions, which is too intrusive for the host. Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by searching for the physical address and kmapping it. This fixes an infinite loop caused by lwepx's data TLB miss handled in the host, and the TODO for TLB eviction and execute-but-not-read entries.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
Resend this patch for Alex G.; he was unsubscribed from the kvm-ppc mailing list for a while.

 arch/powerpc/include/asm/mmu-book3e.h |    6 ++-
 arch/powerpc/kvm/booke.c              |    6 +++
 arch/powerpc/kvm/booke.h              |    2 +
 arch/powerpc/kvm/bookehv_interrupts.S |   32 ++-
 arch/powerpc/kvm/e500.c               |    4 ++
 arch/powerpc/kvm/e500mc.c             |   69 +
 6 files changed, 91 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
index 99d43e0..32e470e 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -40,7 +40,10 @@
 /* MAS registers bit definitions */
 
-#define MAS0_TLBSEL(x)		(((x) << 28) & 0x30000000)
+#define MAS0_TLBSEL_MASK	0x30000000
+#define MAS0_TLBSEL_SHIFT	28
+#define MAS0_TLBSEL(x)		(((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
+#define MAS0_GET_TLBSEL(mas0)	(((mas0) & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT)
 #define MAS0_ESEL_MASK		0x0FFF0000
 #define MAS0_ESEL_SHIFT		16
 #define MAS0_ESEL(x)		(((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
@@ -58,6 +61,7 @@
 #define MAS1_TSIZE_MASK		0x00000f80
 #define MAS1_TSIZE_SHIFT	7
 #define MAS1_TSIZE(x)		(((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK)
+#define MAS1_GET_TSIZE(mas1)	(((mas1) & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT)
 #define MAS2_EPN		(~0xFFFUL)
 #define MAS2_X0			0x00000040

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 1020119..6764a8e 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -836,6 +836,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	/* update before a new last_exit_type is rewritten */
 	kvmppc_update_timing_stats(vcpu);
 
+	/*
+	 * The exception type can change at this point, such as if the TLB entry
+	 * for the emulated instruction has been evicted.
+	 */
+	kvmppc_prepare_for_emulation(vcpu, &exit_nr);
+
 	/* restart interrupts if they were meant for the host */
 	kvmppc_restart_interrupt(vcpu, exit_nr);

diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index 5fd1ba6..a0d0fea 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -90,6 +90,8 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
 void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);
 
+void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int *exit_nr);
+
 enum int_class {
 	INT_CLASS_NONCRIT,
 	INT_CLASS_CRIT,

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 20c7a54..0538ab9 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -120,37 +120,20 @@
 	.if	\flags & NEED_EMU
 	/*
-	 * This assumes you have external PID support.
-	 * To support a bookehv CPU without external PID, you'll
-	 * need to look up the TLB entry and create a temporary mapping.
-	 *
-	 * FIXME: we don't currently handle if the lwepx faults. PR-mode
-	 * booke doesn't handle it either. Since Linux doesn't use
-	 * broadcast tlbivax anymore, the only way this should happen is
-	 * if the guest maps its memory execute-but-not-read, or if we
-	 * somehow take a TLB miss in the middle of this entry code and
-	 * evict the relevant entry. On e500mc, all kernel lowmem is
-	 * bolted into TLB1 large page mappings, and we don't use
-	 * broadcast invalidates, so we should not take a TLB miss here.
-	 *
-	 * Later we'll need to deal with faults here. Disallowing guest
-	 * mappings that are execute-but-not-read could be an option on
-	 * e500mc, but not on chips with an LRAT if it is used.
+	 * We don't use external PID support. lwepx faults would need to be
+	 * handled by KVM and this implies additional code in DO_KVM (for
+	 * DTB_MISS, DSI and LRAT) to check ESR[EPID] and EPLC[EGS] which
+	 * is too intrusive for the host. Get last instruction in
+	 *
[PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
lwepx faults need to be handled by KVM, and this implies additional code in the DO_KVM macro to identify exceptions originating in host context. This requires checking the Exception Syndrome Register (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS, DSI and LRAT exceptions, which is too intrusive for the host. Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by searching for the physical address and kmapping it. This fixes an infinite loop caused by lwepx's data TLB miss handled in the host, and the TODO for TLB eviction and execute-but-not-read entries.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
 arch/powerpc/include/asm/mmu-book3e.h |    6 ++-
 arch/powerpc/kvm/booke.c              |    6 +++
 arch/powerpc/kvm/booke.h              |    2 +
 arch/powerpc/kvm/bookehv_interrupts.S |   32 ++-
 arch/powerpc/kvm/e500.c               |    4 ++
 arch/powerpc/kvm/e500mc.c             |   69 +
 6 files changed, 91 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
index 99d43e0..32e470e 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -40,7 +40,10 @@
 /* MAS registers bit definitions */
 
-#define MAS0_TLBSEL(x)		(((x) << 28) & 0x30000000)
+#define MAS0_TLBSEL_MASK	0x30000000
+#define MAS0_TLBSEL_SHIFT	28
+#define MAS0_TLBSEL(x)		(((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
+#define MAS0_GET_TLBSEL(mas0)	(((mas0) & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT)
 #define MAS0_ESEL_MASK		0x0FFF0000
 #define MAS0_ESEL_SHIFT		16
 #define MAS0_ESEL(x)		(((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
@@ -58,6 +61,7 @@
 #define MAS1_TSIZE_MASK		0x00000f80
 #define MAS1_TSIZE_SHIFT	7
 #define MAS1_TSIZE(x)		(((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK)
+#define MAS1_GET_TSIZE(mas1)	(((mas1) & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT)
 #define MAS2_EPN		(~0xFFFUL)
 #define MAS2_X0			0x00000040

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 1020119..6764a8e 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -836,6 +836,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	/* update before a new last_exit_type is rewritten */
 	kvmppc_update_timing_stats(vcpu);
 
+	/*
+	 * The exception type can change at this point, such as if the TLB entry
+	 * for the emulated instruction has been evicted.
+	 */
+	kvmppc_prepare_for_emulation(vcpu, &exit_nr);
+
 	/* restart interrupts if they were meant for the host */
 	kvmppc_restart_interrupt(vcpu, exit_nr);

diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index 5fd1ba6..a0d0fea 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -90,6 +90,8 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
 void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);
 
+void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int *exit_nr);
+
 enum int_class {
 	INT_CLASS_NONCRIT,
 	INT_CLASS_CRIT,

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 20c7a54..0538ab9 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -120,37 +120,20 @@
 	.if	\flags & NEED_EMU
 	/*
-	 * This assumes you have external PID support.
-	 * To support a bookehv CPU without external PID, you'll
-	 * need to look up the TLB entry and create a temporary mapping.
-	 *
-	 * FIXME: we don't currently handle if the lwepx faults. PR-mode
-	 * booke doesn't handle it either. Since Linux doesn't use
-	 * broadcast tlbivax anymore, the only way this should happen is
-	 * if the guest maps its memory execute-but-not-read, or if we
-	 * somehow take a TLB miss in the middle of this entry code and
-	 * evict the relevant entry. On e500mc, all kernel lowmem is
-	 * bolted into TLB1 large page mappings, and we don't use
-	 * broadcast invalidates, so we should not take a TLB miss here.
-	 *
-	 * Later we'll need to deal with faults here. Disallowing guest
-	 * mappings that are execute-but-not-read could be an option on
-	 * e500mc, but not on chips with an LRAT if it is used.
+	 * We don't use external PID support. lwepx faults would need to be
+	 * handled by KVM and this implies additional code in DO_KVM (for
+	 * DTB_MISS, DSI and LRAT) to check ESR[EPID] and EPLC[EGS] which
+	 * is too intrusive for the host. Get last instruction in
+	 * kvmppc_handle_exit().
 	 */
-
-	mfspr	r3, SPRN_EPLC	/* will already have correct ELPID