Re: MMU tricks for NetBSD guests
On Fri, 2009-04-03 at 00:52 +0200, Alexander Graf wrote:
> That sounds a lot like what I implemented for real mode on 970. I assume the PID is similar to a full SLB context and AS=1/AS=0 is just another bit that could as well be in the PID?

Mostly... however, when an interrupt occurs, AS is set to 0 and PID remains unchanged. Also, AS can have different settings for instruction and data fetches. (I've been abbreviating as MSR[AS], but technically I should be writing MSR[IS] for instructions or MSR[DS] for data.)

> So what we do on 970[1] is we treat real mode as yet another vsid. 970 translates EA -> VA -> RA. It looks like booke does the same, with the VSID coming from the PID.

Exactly -- Book E uses AS | PID to provide the VSID, while Book S uses the SLB. The Book E way is much simpler, and also avoids the effective address collision problem we ran into on 970, because AS/PID don't depend on the EA.

--
Hollis Blanchard
IBM Linux Technology Center

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
MMU tricks for NetBSD guests
On Thu, 2009-04-02 at 11:56 -0700, Rahul Kulkarni wrote:
> > Rahul, one major quirk we exploit is that Linux does not use the MSR[AS] bit at all. One way that bit could be used is to give 32-bit userspace a separate 4GB address space from the kernel. Instead, Linux puts both kernel and userspace into the same 4GB address space (with Linux mappings above 0xc000 and user mappings below). If NetBSD uses MSR[AS]=1 for userspace (which I think is what the hardware architects envisioned), you're going to have a lot of MMU fun.
>
> Rahul: The NetBSD port for e500/85xx which we have uses MSR[AS] (IS/DS) for user/kernel address space separation, which keeps the address spaces split. So that's a major problem to start with. How we get creative with this to provide guest mappings is something that has to be explored. Let me know if you have any thoughts.

OK, so this is going to be a fun one if you like this sort of thing. (I like this sort of thing, but unfortunately don't have any time I can commit to it.) I haven't thought through the details all the way, but at a high level here are my thoughts:

First, to understand the architecture and the shortcut we're using today, read http://www.linux-kvm.org/page/PowerPC_Book_E_MMU .

Now if you don't have the AS shortcut (which you don't), the key observation is that the guest really is a collection of 4GB address spaces, and those are identified by the 9-bit AS|PID. (By the way, does NetBSD use PID1 and PID2? I sure hope not... :)

You can treat the 2^9 guest spaces as separate host spaces. When the guest uses a space, reserve a host space for it, and then map guest AS|PID to the host spaces. So for example:

* Guest creates a new process and gives it PID 7.
* KVM reserves a new host PID. Let's say host PID 23 is available.
* Guest creates a mapping (tlbwe) for PID 7, EA 0xc, RA 0x0.
* Host intercepts this (it's a privilege violation because guest is running with MSR[PR]=1).
* Host already translates real address from guest physical to host physical. Let's say guest physical 0 corresponds to host physical 128M.
* Your new code: host *also* translates guest PID (7) to host PID (23).
* Resulting shadow mapping: PID 23, EA 0xc000, RA 0x0200.

You'll probably want all shadow mappings to have AS=1. In that case, you would treat guest AS=0 PID=7 as a separate host address space from guest AS=1 PID=7. gAS|gPID 0|7 would be hAS|hPID 1|23, and gAS|gPID 1|7 would be hAS|hPID 1|24. In other words, each guest task (PID) will consume two host address spaces (two different host PIDs, one for each guest AS value).

Alexander Graf has already done something like this for his 970 work, so he might be able to provide more details or issues to be aware of in a scheme like this.

It would be easier to whiteboard, but obviously that's not really an option...

--
Hollis Blanchard
IBM Linux Technology Center
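The example above (guest PID 7, host PIDs 23 and 24, guest physical 0 at host physical 128M) can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not KVM's actual code: `struct tlb_entry`, `lookup_host_pid()`, `make_shadow_entry()` and `GUEST_RAM_HOST_BASE` are hypothetical names, guest RAM is assumed to be one contiguous block in host physical memory, and the PID lookup is hard-wired to the two pairs mentioned in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical TLB entry layout for illustration; not KVM's real structure. */
struct tlb_entry {
    uint8_t  as;   /* address space bit, 0 or 1 */
    uint8_t  pid;  /* 8-bit process ID */
    uint32_t ea;   /* effective address */
    uint32_t ra;   /* real (physical) address */
};

/* Assumption: guest physical 0 sits at host physical 128M, contiguously. */
#define GUEST_RAM_HOST_BASE 0x08000000u

/*
 * Hard-wired stand-in for the host PID reservation described in the text:
 * gAS|gPID 0|7 -> hPID 23, gAS|gPID 1|7 -> hPID 24.
 * Returns -1 if no host PID has been reserved for this guest space.
 */
static int lookup_host_pid(uint8_t gas, uint8_t gpid)
{
    if (gpid == 7)
        return gas ? 24 : 23;
    return -1;
}

/*
 * Build the shadow entry for an intercepted guest tlbwe:
 *  - guest physical -> host physical (existing translation)
 *  - guest AS|PID   -> host PID      (the new translation)
 *  - every shadow mapping is placed in host AS=1
 * Returns 0 on success, -1 if the guest space has no host PID yet.
 */
static int make_shadow_entry(const struct tlb_entry *guest,
                             struct tlb_entry *shadow)
{
    int hpid;

    assert(guest->as <= 1);

    hpid = lookup_host_pid(guest->as, guest->pid);
    if (hpid < 0)
        return -1;

    shadow->as  = 1;
    shadow->pid = (uint8_t)hpid;
    shadow->ea  = guest->ea;                       /* EA is unchanged */
    shadow->ra  = guest->ra + GUEST_RAM_HOST_BASE; /* gPA -> hPA */
    return 0;
}
```

Calling `make_shadow_entry()` on a guest entry with AS=0, PID=7 and RA 0 yields a shadow entry with AS=1, PID=23 and an RA at the 128M offset; the same guest PID with AS=1 lands in host PID 24 instead.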
Re: MMU tricks for NetBSD guests
On 02.04.2009, at 22:08, Hollis Blanchard wrote:
> Alexander Graf has already done something like this for his 970 work, so he might be able to provide more details or issues to be aware of in a scheme like this.

That sounds a lot like what I implemented for real mode on 970. I assume the PID is similar to a full SLB context and AS=1/AS=0 is just another bit that could as well be in the PID?

So what we do on 970[1] is we treat real mode as yet another vsid. 970 translates EA -> VA -> RA. It looks like booke does the same, with the VSID coming from the PID.

This basically means that if we're getting into real mode in the guest, we just switch to guest VSID 0x (which doesn't exist in guests) and map that as one of our host VSIDs available in the pool.

You could do the same. Just OR the AS bit into the guest PID you use to map things and allocate whatever PID you need on the host dynamically :-).

Alex

[1] Sources at http://www.powerkvm.org
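A minimal sketch of the "OR the AS bit into the guest PID and allocate host PIDs dynamically" idea, in C. Everything here is hypothetical (`struct pid_map`, `guest_key()`, `pid_map_get()`, the pool bounds), and keeping host PID 0 out of the pool is an assumption of this sketch, not something stated in the thread. Note that 2^9 guest spaces do not fit into an 8-bit host PID, so a real implementation would need some eviction policy; this sketch just reports exhaustion.

```c
#include <assert.h>
#include <stdint.h>

#define GUEST_SPACES 512  /* 2^9 possible guest AS|PID values */
#define HOST_PID_MIN 1    /* assumption: host PID 0 is kept out of the pool */
#define HOST_PID_MAX 255  /* host PID is 8 bits wide */

/* Hypothetical per-guest table: guest AS|PID key -> host PID, -1 if unmapped. */
struct pid_map {
    int16_t host_pid[GUEST_SPACES];
    int     next_free; /* next unallocated host PID in the pool */
};

static void pid_map_init(struct pid_map *m)
{
    int i;

    for (i = 0; i < GUEST_SPACES; i++)
        m->host_pid[i] = -1;
    m->next_free = HOST_PID_MIN;
}

/* OR the AS bit in above the 8-bit PID to form a 9-bit key. */
static unsigned guest_key(unsigned as, unsigned pid)
{
    return ((as & 1u) << 8) | (pid & 0xffu);
}

/*
 * Return the host PID backing guest (as, pid), allocating one on first use.
 * Returns -1 when the pool is exhausted; eviction is left out of this sketch.
 */
static int pid_map_get(struct pid_map *m, unsigned as, unsigned pid)
{
    unsigned key = guest_key(as, pid);

    assert(key < GUEST_SPACES);

    if (m->host_pid[key] < 0) {
        if (m->next_free > HOST_PID_MAX)
            return -1;
        m->host_pid[key] = (int16_t)m->next_free++;
    }
    return m->host_pid[key];
}
```

With a freshly initialized map, guest AS=0 PID=7 and guest AS=1 PID=7 receive two different host PIDs, and asking again for either returns the same host PID as before.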