Re: [PATCH 0/9] KVM: Make the instruction emulator aware of Nested Virtualization
On Mon, Nov 29, 2010 at 03:01:10PM -0500, valdis.kletni...@vt.edu wrote: > On Mon, 29 Nov 2010 19:32:12 +0100, Joerg Roedel said: > > On Mon, Nov 29, 2010 at 12:23:38PM -0500, valdis.kletni...@vt.edu wrote: > > > (Sorry for late reply...) > > > > > > On Thu, 25 Nov 2010 17:23:13 +0100, "Roedel, Joerg" said: > > > > On Thu, Nov 25, 2010 at 10:17:53AM -0500, Avi Kivity wrote: > > > > > On 11/25/2010 03:13 PM, Roedel, Joerg wrote: > > > > > What about things like adding instructions and forgetting to add the > > > > > corresponding svm.c code? > > > > Cannot happen. Every instruction that can be intercepted with SVM is > > > > already handled in this patch-set. > > > > > > Call us back when Intel releases the i9 and i11 with new instructions > > > that need intercept handling. ;) > > > > How does that affect SVM? > > It will quite possibly include instructions that can be intercepted with SVM > that are not in this patch set. At which point Joerg's comment can apply - it > will be possible to add it in one place and forget to add it in the svm.c > code. SVM is AMD-only. So if an instruction does not exist on AMD there will also be no specific intercept for it. For newly added instructions to the AMD ISA which can then be intercepted I have to do bringup work anyway. This will include adding these intercepts to the code in this patch-set. Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/9] KVM: Make the instruction emulator aware of Nested Virtualization
On Mon, 29 Nov 2010 19:32:12 +0100, Joerg Roedel said: > On Mon, Nov 29, 2010 at 12:23:38PM -0500, valdis.kletni...@vt.edu wrote: > > (Sorry for late reply...) > > > > On Thu, 25 Nov 2010 17:23:13 +0100, "Roedel, Joerg" said: > > > On Thu, Nov 25, 2010 at 10:17:53AM -0500, Avi Kivity wrote: > > > > On 11/25/2010 03:13 PM, Roedel, Joerg wrote: > > > > What about things like adding instructions and forgetting to add the > > > > corresponding svm.c code? > > > Cannot happen. Every instruction that can be intercepted with SVM is > > > already handled in this patch-set. > > > > Call us back when Intel releases the i9 and i11 with new instructions > > that need intercept handling. ;) > > How does that affect SVM? It will quite possibly include instructions that can be intercepted with SVM that are not in this patch set. At which point Joerg's comment can apply - it will be possible to add it in one place and forget to add it in the svm.c code. pgpxviFJOpuUT.pgp Description: PGP signature
Re: [PATCH 0/9] KVM: Make the instruction emulator aware of Nested Virtualization
On Mon, Nov 29, 2010 at 12:23:38PM -0500, valdis.kletni...@vt.edu wrote: > (Sorry for late reply...) > > On Thu, 25 Nov 2010 17:23:13 +0100, "Roedel, Joerg" said: > > On Thu, Nov 25, 2010 at 10:17:53AM -0500, Avi Kivity wrote: > > > On 11/25/2010 03:13 PM, Roedel, Joerg wrote: > > > What about things like adding instructions and forgetting to add the > > > corresponding svm.c code? > > Cannot happen. Every instruction that can be intercepted with SVM is > > already handled in this patch-set. > > Call us back when Intel releases the i9 and i11 with new instructions > that need intercept handling. ;) How does that affect SVM? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/9] KVM: Make the instruction emulator aware of Nested Virtualization
(Sorry for late reply...) On Thu, 25 Nov 2010 17:23:13 +0100, "Roedel, Joerg" said: > On Thu, Nov 25, 2010 at 10:17:53AM -0500, Avi Kivity wrote: > > On 11/25/2010 03:13 PM, Roedel, Joerg wrote: > > What about things like adding instructions and forgetting to add the > > corresponding svm.c code? > Cannot happen. Every instruction that can be intercepted with SVM is > already handled in this patch-set. Call us back when Intel releases the i9 and i11 with new instructions that need intercept handling. ;) pgp2d3vqVooZD.pgp Description: PGP signature
Re: [PATCH 0/9] KVM: Make the instruction emulator aware of Nested Virtualization
On 11/25/2010 08:21 PM, Roedel, Joerg wrote: On Thu, Nov 25, 2010 at 10:15:43AM -0500, Avi Kivity wrote: > On 11/25/2010 01:46 PM, Roedel, Joerg wrote: > Eventually the emulator will be used outside kvm. We don't want to tie > the two together. Does any user outside of KVM care about nested virtualization? No. > All that's needed is to read the svm chapter in the AMD manual; you > don't need to understand kvm or out nested svm implementation. On the > other hand, some information needs to be encoded in the emulator (the > order of the intercept check vs exception check) or we need to duplicate > checks. We also do a split decode. Is that person also required to read through the 500 pages of VMX documentation when nested VMX gets merged? Yes. > So they get special treatment. Decode bits are for the general case. > > Let's see: > > CRx/DRx checks - need group mechanism extension, can use decode bits The CRx writes are mostly special because exceptions for validity of the values written take precedence over the intercept. We can have three checks, controlled by the decode bits: // decode instruction if ((c->d & SvmMask) == SvmInterceptBefore) ... do intercept check // do privivilge level checks if ((c->d & SvmMask) == SvmInterceptAfterPriv) ... do intercept check // fetch operands if ((c->d & SvmMask) == SvmInterceptAfterMemory) ... do intercept check Implementing these checks also requires to put the intercept check into the kvm_set_crX functions, which, by themselves, needs to be reworked in an SVM specific way for this. Add a kvm_x86_ops callback for this (vmx as usual is pretty complicated here) > Selective CR0 - special Needs to be handled in the write-cr0 path In the appropriate callback > LIDT/SIDT/LGDT/SGDT/LLDT/SLDT/LTR/STR - decode bits Check for a valid address before the intercept check. Thus special too. See above - we can regularize it by encoding where the check takes place. > RDTSC/RDPMC/CPUID - decode bits RDTSC and RDPMC check all exceptions before the intercept too. > PUSHF/POPF/RSM/IRET/INTn - decode bits, + flag to check before exceptions Should work with decode-bits. > INVD /HLT/INVLPG/INVLPGA - decode bits Exceptions are only caused on cpl> 0 and take precedence over the intercept. Should work with decode bits. > VMRUN/VMLOAD/VMSAVE/VMMCALL/STGI/CLGI/SKINIT - decode bits (VMMCALL > preempts exceptions) VMRUN/VMLOAD/VMSAVE need to check rax for a valid physical address before the intercept is taken. Add an SrcPhys/DstPhys decode, it becomes regular. All SVM instructions are not allowed in real-mode which needs to be checkd too. The realmode-check may be generic but with the address check this is harder. So at least VMRUN/VMLOAD/VMSAVE are special too. Further the SVM instructions are not implemented in the emulator at all (like some other instructions which can be intercepted). Proper emulation of these instructions would require new callbacks. Sure. > RDTSCP/ICEBP/WBINVD/MONITOR/MWAIT - decode bits RDTSCP needs special handling like RDTSC. Why? MONITOR is special too because it checks all exceptions before the intercept. All this can be done, but I doubt the result will look better or is better maintainable than the current the solution in this patch-set. With proper infrastructure I think all the modifications needed will be the three checks above and the decode bits (assuming the current crx/drx/pio callbacks are in the right place). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/9] KVM: Make the instruction emulator aware of Nested Virtualization
On Thu, Nov 25, 2010 at 10:15:43AM -0500, Avi Kivity wrote: > On 11/25/2010 01:46 PM, Roedel, Joerg wrote: > Eventually the emulator will be used outside kvm. We don't want to tie > the two together. Does any user outside of KVM care about nested virtualization? > All that's needed is to read the svm chapter in the AMD manual; you > don't need to understand kvm or out nested svm implementation. On the > other hand, some information needs to be encoded in the emulator (the > order of the intercept check vs exception check) or we need to duplicate > checks. We also do a split decode. Is that person also required to read through the 500 pages of VMX documentation when nested VMX gets merged? > So they get special treatment. Decode bits are for the general case. > > Let's see: > >CRx/DRx checks - need group mechanism extension, can use decode bits The CRx writes are mostly special because exceptions for validity of the values written take precedence over the intercept. Implementing these checks also requires to put the intercept check into the kvm_set_crX functions, which, by themselves, needs to be reworked in an SVM specific way for this. >Selective CR0 - special Needs to be handled in the write-cr0 path >LIDT/SIDT/LGDT/SGDT/LLDT/SLDT/LTR/STR - decode bits Check for a valid address before the intercept check. Thus special too. >RDTSC/RDPMC/CPUID - decode bits RDTSC and RDPMC check all exceptions before the intercept too. >PUSHF/POPF/RSM/IRET/INTn - decode bits, + flag to check before exceptions Should work with decode-bits. >INVD /HLT/INVLPG/INVLPGA - decode bits Exceptions are only caused on cpl > 0 and take precedence over the intercept. Should work with decode bits. >VMRUN/VMLOAD/VMSAVE/VMMCALL/STGI/CLGI/SKINIT - decode bits (VMMCALL > preempts exceptions) VMRUN/VMLOAD/VMSAVE need to check rax for a valid physical address before the intercept is taken. All SVM instructions are not allowed in real-mode which needs to be checkd too. The realmode-check may be generic but with the address check this is harder. So at least VMRUN/VMLOAD/VMSAVE are special too. Further the SVM instructions are not implemented in the emulator at all (like some other instructions which can be intercepted). Proper emulation of these instructions would require new callbacks. >RDTSCP/ICEBP/WBINVD/MONITOR/MWAIT - decode bits RDTSCP needs special handling like RDTSC. MONITOR is special too because it checks all exceptions before the intercept. All this can be done, but I doubt the result will look better or is better maintainable than the current the solution in this patch-set. Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/9] KVM: Make the instruction emulator aware of Nested Virtualization
On Thu, Nov 25, 2010 at 10:17:53AM -0500, Avi Kivity wrote: > On 11/25/2010 03:13 PM, Roedel, Joerg wrote: > > On Thu, Nov 25, 2010 at 12:46:40PM +0100, Roedel, Joerg wrote: > > > We basically have two choices here: > > > > > > a) We expose svm internals into the emulator > > > b) We expose emulator internals into svm > > > > > > Both choices are not really good from a software-design point-of-view. > > > But I think option b) is the better one because it is easier to cope with > > > and thus less likely to break when changing the emulator code. > > > > What we could do probably is to define the interface between the > > emulator and the architecture code in a better way. This would take the > > burden of going into architecture code for emulator changes away. > > What about things like adding instructions and forgetting to add the > corresponding svm.c code? Cannot happen. Every instruction that can be intercepted with SVM is already handled in this patch-set. > Good idea. Needed for the decode bits thing as well. Especially needed to not kill the L1 guest when an L2 instruction emulation fails :) Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/9] KVM: Make the instruction emulator aware of Nested Virtualization
On 11/25/2010 03:13 PM, Roedel, Joerg wrote: On Thu, Nov 25, 2010 at 12:46:40PM +0100, Roedel, Joerg wrote: > We basically have two choices here: > >a) We expose svm internals into the emulator >b) We expose emulator internals into svm > > Both choices are not really good from a software-design point-of-view. > But I think option b) is the better one because it is easier to cope with > and thus less likely to break when changing the emulator code. What we could do probably is to define the interface between the emulator and the architecture code in a better way. This would take the burden of going into architecture code for emulator changes away. What about things like adding instructions and forgetting to add the corresponding svm.c code? Of course it can happen with everything in the emulator, but at least there's a chance you will see the decode bits in nearby instructions. The current patch-set only needs a subset of the decode-cache (in the future probably also a subset of the fetch-cache). We could put this information into a seperate struct and give it to the architecture code. I planned to make the guest_mode flag a generic x86 vcpu property anyway, so building this structure could be limited to instructions emulated while the vcpu is in guest mode thus avoiding the overhead for the default case. Good idea. Needed for the decode bits thing as well. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/9] KVM: Make the instruction emulator aware of Nested Virtualization
On 11/25/2010 01:46 PM, Roedel, Joerg wrote: On Wed, Nov 24, 2010 at 02:13:32PM -0500, Avi Kivity wrote: > On 11/24/2010 08:18 PM, Joerg Roedel wrote: > > Hi Avi, Hi Marcelo, > > > > here is a patch-set to make the instruction emulator aware of nested > > virtualization. It basically works by introducing a new callback into > > the x86_ops to check if a decoded instruction must be intercepted. If it > > is intercepted the instruction emulator returns straight into the guest. > > > > I am not entirely happy with this solution because it partially > > duplicates the code in the x86_emulate_insn function. > > My big worry is that it makes svm.c aware of internal emulator variable, > so it makes it harder to hack on the emulator. I don't think so, the structure of the code in svm.c follows the same structures (even in a simpler way) as in the x86_emulate_insn() function. Someone who changes the internal data structures of the emulator can easily change svm.c too. This person will even recognize the need for this automatically because svm.c will not compile anymore when the data structure is changed. Eventually the emulator will be used outside kvm. We don't want to tie the two together. On the other side, implementing this in the emulator itself would require a person to learn about very low-level svm internals to get everything right (or the changes easily break the code which is more likely). All that's needed is to read the svm chapter in the AMD manual; you don't need to understand kvm or out nested svm implementation. On the other hand, some information needs to be encoded in the emulator (the order of the intercept check vs exception check) or we need to duplicate checks. We also do a split decode. > So I don't think there's a problem with coding the svm intercepts in > emulate.c. This is no different than emulating any AMD-specific > instruction in the emulator - we're emulating an instruction in exactly > the way it is specified in the manual. That would make sense if the Nested-SVM code is implemented in the generic code so that it is usable from VMX too. But that is not the case and also not really doable. Nested VMX could do the same thing. Sometimes the checks would be shared and sometimes not. > Something you could do is allocate bits for the intercept bit number and > exit code in opcode->flags. This way most unconditional intercepts > happen outside the instruction switch: generic code reads the intercept > bit, the intercept word (via a callback), if the bit is set, returns the > exit code. That should completely kill the diffstat. We only need to > be careful wrt the order of the intercept check and the other permission > checks. We have a lot of intercepts where this does not work. There is no 1-1 mapping between an opcode and an intercept. Some opcodes can result in multiple different intercepts (mov cr, mov dr), We can extend the group mechanism to make these separate opcodes. sometimes multiple intructions result in one intercept (rdmsr/wrmsr, in/out). The later ones even need special handling because the differences between the different instructions are encoded in the exit_info fields. So they get special treatment. Decode bits are for the general case. Let's see: CRx/DRx checks - need group mechanism extension, can use decode bits Selective CR0 - special LIDT/SIDT/LGDT/SGDT/LLDT/SLDT/LTR/STR - decode bits RDTSC/RDPMC/CPUID - decode bits PUSHF/POPF/RSM/IRET/INTn - decode bits, + flag to check before exceptions INVD /HLT/INVLPG/INVLPGA - decode bits PAUSE - special VMRUN/VMLOAD/VMSAVE/VMMCALL/STGI/CLGI/SKINIT - decode bits (VMMCALL preempts exceptions) RDTSCP/ICEBP/WBINVD/MONITOR/MWAIT - decode bits IOIO/MSR - very special Exception intercepts - outside emulator So the majority (by far) can be handled by decode bits. Selective CR0, IOIO, MSR, and PAUSE need special handling, can be done via callbacks into kvm (and into vendor specific code). These will be useful for nested vmx as well. Come to think of it, CR0, IOIO, and MSR already have callbacks into kvm. So all we need to do is add X86EMUL_INTERCEPTED to the callback (provided it's at the right place in terms of intercept/exception priority - haven't checked). All this would expose svm-internals like the vmcb structure into the generic code. I think hacking all this in the emulator itself also makes it more complex than it is today and the changes will likely break at some point when somone hacks on the emulator. And the situation will not get better when Nested-VMX gets merged and needs to do the same. We basically have two choices here: a) We expose svm internals into the emulator b) We expose emulator internals into svm Both choices are not really good from a software-design point-of-view. But I think option b) is the better one because it is easier to cope with and thus less likely to break when cha
Re: [PATCH 0/9] KVM: Make the instruction emulator aware of Nested Virtualization
On Thu, Nov 25, 2010 at 12:46:40PM +0100, Roedel, Joerg wrote: > We basically have two choices here: > > a) We expose svm internals into the emulator > b) We expose emulator internals into svm > > Both choices are not really good from a software-design point-of-view. > But I think option b) is the better one because it is easier to cope with > and thus less likely to break when changing the emulator code. What we could do probably is to define the interface between the emulator and the architecture code in a better way. This would take the burden of going into architecture code for emulator changes away. The current patch-set only needs a subset of the decode-cache (in the future probably also a subset of the fetch-cache). We could put this information into a seperate struct and give it to the architecture code. I planned to make the guest_mode flag a generic x86 vcpu property anyway, so building this structure could be limited to instructions emulated while the vcpu is in guest mode thus avoiding the overhead for the default case. Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/9] KVM: Make the instruction emulator aware of Nested Virtualization
On Wed, Nov 24, 2010 at 02:13:32PM -0500, Avi Kivity wrote: > On 11/24/2010 08:18 PM, Joerg Roedel wrote: > > Hi Avi, Hi Marcelo, > > > > here is a patch-set to make the instruction emulator aware of nested > > virtualization. It basically works by introducing a new callback into > > the x86_ops to check if a decoded instruction must be intercepted. If it > > is intercepted the instruction emulator returns straight into the guest. > > > > I am not entirely happy with this solution because it partially > > duplicates the code in the x86_emulate_insn function. > > My big worry is that it makes svm.c aware of internal emulator variable, > so it makes it harder to hack on the emulator. I don't think so, the structure of the code in svm.c follows the same structures (even in a simpler way) as in the x86_emulate_insn() function. Someone who changes the internal data structures of the emulator can easily change svm.c too. This person will even recognize the need for this automatically because svm.c will not compile anymore when the data structure is changed. On the other side, implementing this in the emulator itself would require a person to learn about very low-level svm internals to get everything right (or the changes easily break the code which is more likely). > So I don't think there's a problem with coding the svm intercepts in > emulate.c. This is no different than emulating any AMD-specific > instruction in the emulator - we're emulating an instruction in exactly > the way it is specified in the manual. That would make sense if the Nested-SVM code is implemented in the generic code so that it is usable from VMX too. But that is not the case and also not really doable. > Something you could do is allocate bits for the intercept bit number and > exit code in opcode->flags. This way most unconditional intercepts > happen outside the instruction switch: generic code reads the intercept > bit, the intercept word (via a callback), if the bit is set, returns the > exit code. That should completely kill the diffstat. We only need to > be careful wrt the order of the intercept check and the other permission > checks. We have a lot of intercepts where this does not work. There is no 1-1 mapping between an opcode and an intercept. Some opcodes can result in multiple different intercepts (mov cr, mov dr), sometimes multiple intructions result in one intercept (rdmsr/wrmsr, in/out). The later ones even need special handling because the differences between the different instructions are encoded in the exit_info fields. All this would expose svm-internals like the vmcb structure into the generic code. I think hacking all this in the emulator itself also makes it more complex than it is today and the changes will likely break at some point when somone hacks on the emulator. And the situation will not get better when Nested-VMX gets merged and needs to do the same. We basically have two choices here: a) We expose svm internals into the emulator b) We expose emulator internals into svm Both choices are not really good from a software-design point-of-view. But I think option b) is the better one because it is easier to cope with and thus less likely to break when changing the emulator code. Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/9] KVM: Make the instruction emulator aware of Nested Virtualization
On 11/24/2010 08:18 PM, Joerg Roedel wrote: Hi Avi, Hi Marcelo, here is a patch-set to make the instruction emulator aware of nested virtualization. It basically works by introducing a new callback into the x86_ops to check if a decoded instruction must be intercepted. If it is intercepted the instruction emulator returns straight into the guest. I am not entirely happy with this solution because it partially duplicates the code in the x86_emulate_insn function. My big worry is that it makes svm.c aware of internal emulator variable, so it makes it harder to hack on the emulator. But there are so many SVM specific cases that need to be taken care of that I consider this solution the better one (even when looking at the diff-stat). Keeping this (SVM-specific) complexity in the SVM specific code is better than extending the generic instruction emulator code path. I don't think there's a problem with svm specific code in the emulator for this. My reasoning is that there are two classes of svm code: the common one is using svm to implement kvm, and the other one is emulating the svm instruction set. Most of the current svm code belongs to the first class, even the nested svm code. For example the code that emulates VMRUN is kvm-specific, while the code that decides whether to #GP on VMRUN or not is generic. So I don't think there's a problem with coding the svm intercepts in emulate.c. This is no different than emulating any AMD-specific instruction in the emulator - we're emulating an instruction in exactly the way it is specified in the manual. Something you could do is allocate bits for the intercept bit number and exit code in opcode->flags. This way most unconditional intercepts happen outside the instruction switch: generic code reads the intercept bit, the intercept word (via a callback), if the bit is set, returns the exit code. That should completely kill the diffstat. We only need to be careful wrt the order of the intercept check and the other permission checks. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html