Re: [PATCH 3/4] KVM: Add SMAP support when setting CR4
Il 28/03/2014 06:47, Zhang, Yang Z ha scritto:

>>> +		smap = smap && u && !uf &&
>>> +			!((kvm_x86_ops->get_cpl(vcpu) < 3) &&
>>> +			((kvm_x86_ops->get_rflags(vcpu) &
>>> +			X86_EFLAGS_AC) == 1));
>>
>> Unfortunately this doesn't work. The reason is that changing
>> X86_EFLAGS_AC doesn't trigger update_permission_bitmask. So the value
>> of "CPL < 3 && AC = 1" must not be checked in update_permission_bitmask;
>> instead, it must be included in the index into the permissions array.
>> You can reuse the PFERR_RSVD_MASK bit, like
>>
>> 	smapf = pfec & PFERR_RSVD_MASK;
>> 	...
>> 	smap = smap && smapf && u && !uf;
>>
>> The VCPU can then be passed to permission_fault in order to get the
>> value of the CPL and the AC bit.
>
> Is CPL check needed? Shouldn't it already have been covered by U bit?

It is not needed, but actually it is covered by !uf, I think.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[Bug 71521] Host call trace when create guest.
https://bugzilla.kernel.org/show_bug.cgi?id=71521

Paolo Bonzini <bonz...@gnu.org> changed:

           What        |Removed    |Added
   ---------------------------------------
   Status              |NEW        |RESOLVED
   Resolution          |---        |CODE_FIX

--
You are receiving this mail because:
You are watching the assignee of the bug.
Re: [Qemu-devel] Massive read only kvm guests when backing file was missing
27.03.2014 20:14, Alejandro Comisario wrote:
> Seems like virtio (kvm 1.0) doesn't expose a timeout on the guest side
> (ubuntu 12.04 on host and guest). So, how can I adjust the timeout on
> the guest?

After a bit more talk on IRC yesterday, it turned out that the situation is _much_ more interesting than originally described. The OP claims to have 10500 guests running off an NFS server, and that after NFS server downtime, the backing files were "disappeared" (whatever that means), so they had to restore those files. Moreover, the OP didn't even bother to look at the guests' dmesg, being busy rebooting all 10500 guests.

> This solution is the most logical one, but i cannot apply it! thanks
> for all the responses!

I suggested the OP actually describe the _real_ situation, instead of giving random half-pictures, and actually take a look at the actual problem as reported in various places (most importantly the guest kernel log), and report _those_ hints to the list.

I also mentioned that, at least for some NFS servers, if a client has a file open on the server and this file is deleted, the server will report an error to the client when the client tries to access that file, and this has nothing at all to do with timeouts of any kind.

Thanks,

/mjt
RE: [PATCH 3/4] KVM: Add SMAP support when setting CR4
-----Original Message-----
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Paolo Bonzini
Sent: Friday, March 28, 2014 2:23 PM
To: Zhang, Yang Z; Wu, Feng; g...@redhat.com; h...@zytor.com; kvm@vger.kernel.org
Subject: Re: [PATCH 3/4] KVM: Add SMAP support when setting CR4

Il 28/03/2014 06:47, Zhang, Yang Z ha scritto:

>>>> +		smap = smap && u && !uf &&
>>>> +			!((kvm_x86_ops->get_cpl(vcpu) < 3) &&
>>>> +			((kvm_x86_ops->get_rflags(vcpu) &
>>>> +			X86_EFLAGS_AC) == 1));
>>>
>>> Unfortunately this doesn't work. The reason is that changing
>>> X86_EFLAGS_AC doesn't trigger update_permission_bitmask. So the value
>>> of "CPL < 3 && AC = 1" must not be checked in update_permission_bitmask;
>>> instead, it must be included in the index into the permissions array.
>>> You can reuse the PFERR_RSVD_MASK bit, like
>>>
>>> 	smapf = pfec & PFERR_RSVD_MASK;
>>> 	...
>>> 	smap = smap && smapf && u && !uf;
>>>
>>> The VCPU can then be passed to permission_fault in order to get the
>>> value of the CPL and the AC bit.
>>
>> Is CPL check needed? Shouldn't it already have been covered by U bit?
>
> It is not needed, but actually it is covered by !uf, I think.

In my understanding it is needed. From the Intel SDM:

"Every access to a linear address is either a supervisor-mode access or a user-mode access. All accesses performed while the current privilege level (CPL) is less than 3 are supervisor-mode accesses. If CPL = 3, accesses are generally user-mode accesses. However, some operations implicitly access system data structures, and the resulting accesses to those data structures are supervisor-mode accesses regardless of CPL. Examples of such implicit supervisor accesses include the following: accesses to the global descriptor table (GDT) or local descriptor table (LDT) to load a segment descriptor; accesses to the interrupt descriptor table (IDT) when delivering an interrupt or exception; and accesses to the task-state segment (TSS) as part of a task switch or change of CPL."

From the above SDM text, we can see that supervisor-mode accesses can also happen when CPL equals 3.

"If CPL < 3, SMAP protections are disabled if EFLAGS.AC = 1. If CPL = 3, SMAP applies to all supervisor-mode data accesses (these are implicit supervisor accesses) regardless of the value of EFLAGS.AC."

So when we check the value of EFLAGS.AC, we also need to check CPL, since the AC bit only takes effect when CPL < 3. U == 1 means user-mode accesses are allowed, while !uf means it is a fault from a supervisor-mode access; I think neither *u* nor *uf* can reflect the value of CPL. Correct me if I am wrong. Thanks a lot!

Thanks,
Feng
Re: [Qemu-devel] Massive read only kvm guests when backing file was missing
On Fri, Mar 28, 2014 at 11:01:00AM +0400, Michael Tokarev wrote:
> 27.03.2014 20:14, Alejandro Comisario wrote:
>> Seems like virtio (kvm 1.0) doesn't expose a timeout on the guest side
>> (ubuntu 12.04 on host and guest). So, how can I adjust the timeout on
>> the guest?
>
> After a bit more talk on IRC yesterday, it turned out that the
> situation is _much_ more interesting than originally described. The OP
> claims to have 10500 guests running off an NFS server, and that after
> NFS server downtime, the backing files were "disappeared" (whatever
> that means), so they had to restore those files. Moreover, the OP
> didn't even bother to look at the guests' dmesg, being busy rebooting
> all 10500 guests.
>
>> This solution is the most logical one, but i cannot apply it! thanks
>> for all the responses!
>
> I suggested the OP actually describe the _real_ situation, instead of
> giving random half-pictures, and actually take a look at the actual
> problem as reported in various places (most importantly the guest
> kernel log), and report _those_ hints to the list.
>
> I also mentioned that, at least for some NFS servers, if a client has a
> file open on the server and this file is deleted, the server will
> report an error to the client when the client tries to access that
> file, and this has nothing at all to do with timeouts of any kind.

Thanks for the update and for taking the time to help on IRC.

I feel you're being harsh on Alejandro, though. Improving the quality of bug reports is important, but it shouldn't come at the expense of the quality of communication. We can't assume that everyone is an expert in troubleshooting KVM or Linux, and blaming them will only drive people away and detract from the community.

TL;DR: post logs and error messages +1, berate him -1

Stefan
RE: [PATCH 3/4] KVM: Add SMAP support when setting CR4
-----Original Message-----
From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo Bonzini
Sent: Thursday, March 27, 2014 7:47 PM
To: Wu, Feng; g...@redhat.com; h...@zytor.com; kvm@vger.kernel.org
Subject: Re: [PATCH 3/4] KVM: Add SMAP support when setting CR4

Il 27/03/2014 13:25, Feng Wu ha scritto:

> +void update_permission_bitmask(struct kvm_vcpu *vcpu,
>  		struct kvm_mmu *mmu, bool ept)
>  {
>  	unsigned bit, byte, pfec;
>  	u8 map;
> -	bool fault, x, w, u, wf, uf, ff, smep;
> +	bool fault, x, w, u, wf, uf, ff, smep, smap;
>
>  	smep = kvm_read_cr4_bits(vcpu, X86_CR4_SMEP);
> +	smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
>  	for (byte = 0; byte < ARRAY_SIZE(mmu->permissions); ++byte) {
>  		pfec = byte << 1;
>  		map = 0;
> @@ -3617,11 +3618,26 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
>  			w |= !is_write_protection(vcpu) && !uf;
>  			/* Disallow supervisor fetches of user code if cr4.smep */
>  			x &= !(smep && u && !uf);
> +
> +			/*
> +			 * SMAP: kernel-mode data accesses from user-mode
> +			 * mappings should fault. A fault is considered
> +			 * as a SMAP violation if all of the following
> +			 * conditions are true:
> +			 *   - X86_CR4_SMAP is set in CR4
> +			 *   - A user page is accessed
> +			 *   - !(CPL < 3 && X86_EFLAGS_AC is set)
> +			 *   - Page fault in kernel mode
> +			 */
> +			smap = smap && u && !uf &&
> +				!((kvm_x86_ops->get_cpl(vcpu) < 3) &&
> +				((kvm_x86_ops->get_rflags(vcpu) &
> +				X86_EFLAGS_AC) == 1));

Unfortunately this doesn't work. The reason is that changing X86_EFLAGS_AC doesn't trigger update_permission_bitmask. So the value of "CPL < 3 && AC = 1" must not be checked in update_permission_bitmask; instead, it must be included in the index into the permissions array. You can reuse the PFERR_RSVD_MASK bit, like

	smapf = pfec & PFERR_RSVD_MASK;
	...
	smap = smap && smapf && u && !uf;

The VCPU can then be passed to permission_fault in order to get the value of the CPL and the AC bit.

Please test nested virtualization too. I think PFERR_RSVD_MASK should be removed in translate_nested_gpa.
Thanks very much for your comments; I changed the code accordingly, and patch v2 will be sent out for review. Since setting up the environment for nested virtualization is a little time-consuming and is in progress now, could you please have a look at the patch first to see if it is what you expected? Any comments are appreciated!
[PATCH 3/4] KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
SMAP is disabled if the CPU is in non-paging mode in hardware. However, KVM always uses paging mode to emulate guest non-paging mode with TDP. To emulate this behavior, SMAP needs to be manually disabled when the guest switches to non-paging mode.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/kvm/vmx.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3927528..e58cb5f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3452,13 +3452,14 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 			hw_cr4 &= ~X86_CR4_PAE;
 			hw_cr4 |= X86_CR4_PSE;
 			/*
-			 * SMEP is disabled if CPU is in non-paging mode in
-			 * hardware. However KVM always uses paging mode to
+			 * SMEP/SMAP is disabled if CPU is in non-paging mode
+			 * in hardware. However KVM always uses paging mode to
 			 * emulate guest non-paging mode with TDP.
-			 * To emulate this behavior, SMEP needs to be manually
-			 * disabled when guest switches to non-paging mode.
+			 * To emulate this behavior, SMEP/SMAP needs to be
+			 * manually disabled when guest switches to non-paging
+			 * mode.
 			 */
-			hw_cr4 &= ~X86_CR4_SMEP;
+			hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
 		} else if (!(cr4 & X86_CR4_PAE)) {
 			hw_cr4 &= ~X86_CR4_PAE;
 		}
--
1.8.3.1
[PATCH 1/4] KVM: Remove SMAP bit from CR4_RESERVED_BITS.
This patch removes the SMAP bit from CR4_RESERVED_BITS.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fdf83af..4eeb049 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -60,7 +60,7 @@
 			  | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE     \
 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
-			  | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE))
+			  | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE | X86_CR4_SMAP))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
--
1.8.3.1
[PATCH 0/4] KVM: enable Intel SMAP for KVM
Supervisor Mode Access Prevention (SMAP) is a new security feature disclosed by Intel; please refer to the following document:

http://software.intel.com/sites/default/files/319433-014.pdf

Every access to a linear address is either a supervisor-mode access or a user-mode access. All accesses performed while the current privilege level (CPL) is less than 3 are supervisor-mode accesses. If CPL = 3, accesses are generally user-mode accesses. However, some operations implicitly access system data structures, and the resulting accesses to those data structures are supervisor-mode accesses regardless of CPL. Examples of such implicit supervisor accesses include the following: accesses to the global descriptor table (GDT) or local descriptor table (LDT) to load a segment descriptor; accesses to the interrupt descriptor table (IDT) when delivering an interrupt or exception; and accesses to the task-state segment (TSS) as part of a task switch or change of CPL.

If CR4.SMAP = 1, supervisor-mode data accesses are not allowed to linear addresses that are accessible in user mode. If CPL < 3, SMAP protections are disabled if EFLAGS.AC = 1. If CPL = 3, SMAP applies to all supervisor-mode data accesses (these are implicit supervisor accesses) regardless of the value of EFLAGS.AC.

This patchset passes the SMAP feature through to guests and lets guests benefit from it.

Version 1:
  * Remove SMAP bit from CR4_RESERVED_BITS.
  * Add SMAP support when setting CR4
  * Disable SMAP for guests in EPT realmode and EPT unpaging mode
  * Expose SMAP feature to guest

Version 2:
  * Change the logic of updating the mmu permission bitmap for SMAP violations
  * Expose SMAP feature to guest in the last patch of this series.

Feng Wu (4):
  KVM: Remove SMAP bit from CR4_RESERVED_BITS.
  KVM: Add SMAP support when setting CR4
  KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
  KVM: expose SMAP feature to guest

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/cpuid.c            |  2 +-
 arch/x86/kvm/cpuid.h            |  8 ++++++++
 arch/x86/kvm/mmu.c              | 24 +++++++++++++++++++++---
 arch/x86/kvm/mmu.h              | 26 +++++++++++++++++++++++---
 arch/x86/kvm/paging_tmpl.h      |  2 +-
 arch/x86/kvm/vmx.c              | 11 ++++++++---
 arch/x86/kvm/x86.c              |  9 ++++++++-
 8 files changed, 69 insertions(+), 15 deletions(-)

--
1.8.3.1
[PATCH 4/4] KVM: expose SMAP feature to guest
This patch exposes the SMAP feature to the guest.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/kvm/cpuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c697625..deb5f9b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -303,7 +303,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 	/* cpuid 7.0.ebx */
 	const u32 kvm_supported_word9_x86_features =
 		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
-		F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
+		F(BMI2) | F(ERMS) | f_invpcid | F(RTM) | F(SMAP);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
--
1.8.3.1
[PATCH 2/4] KVM: Add SMAP support when setting CR4
This patch adds SMAP handling logic when setting CR4 for guests.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/kvm/cpuid.h       |  8 ++++++++
 arch/x86/kvm/mmu.c         | 24 +++++++++++++++++++++---
 arch/x86/kvm/mmu.h         | 26 +++++++++++++++++++++++---
 arch/x86/kvm/paging_tmpl.h |  2 +-
 arch/x86/kvm/x86.c         |  9 ++++++++-
 5 files changed, 61 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index a2a1bb7..eeecbed 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -48,6 +48,14 @@ static inline bool guest_cpuid_has_smep(struct kvm_vcpu *vcpu)
 	return best && (best->ebx & bit(X86_FEATURE_SMEP));
 }
 
+static inline bool guest_cpuid_has_smap(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry(vcpu, 7, 0);
+	return best && (best->ebx & bit(X86_FEATURE_SMAP));
+}
+
 static inline bool guest_cpuid_has_fsgsbase(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9b53135..83b7f8d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3601,20 +3601,22 @@ static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
 	}
 }
 
-static void update_permission_bitmask(struct kvm_vcpu *vcpu,
+void update_permission_bitmask(struct kvm_vcpu *vcpu,
 		struct kvm_mmu *mmu, bool ept)
 {
 	unsigned bit, byte, pfec;
 	u8 map;
-	bool fault, x, w, u, wf, uf, ff, smep;
+	bool fault, x, w, u, wf, uf, ff, smapf, smep, smap;
 
 	smep = kvm_read_cr4_bits(vcpu, X86_CR4_SMEP);
+	smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
 	for (byte = 0; byte < ARRAY_SIZE(mmu->permissions); ++byte) {
 		pfec = byte << 1;
 		map = 0;
 		wf = pfec & PFERR_WRITE_MASK;
 		uf = pfec & PFERR_USER_MASK;
 		ff = pfec & PFERR_FETCH_MASK;
+		smapf = pfec & PFERR_RSVD_MASK;
 		for (bit = 0; bit < 8; ++bit) {
 			x = bit & ACC_EXEC_MASK;
 			w = bit & ACC_WRITE_MASK;
@@ -3627,11 +3629,27 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
 				w |= !is_write_protection(vcpu) && !uf;
 				/* Disallow supervisor fetches of user code if cr4.smep */
 				x &= !(smep && u && !uf);
+
+				/*
+				 * SMAP: kernel-mode data accesses from user-mode
+				 * mappings should fault. A fault is considered
+				 * as a SMAP violation if all of the following
+				 * conditions are true:
+				 *   - X86_CR4_SMAP is set in CR4
+				 *   - A user page is accessed
+				 *   - Page fault in kernel mode
+				 *   - !(CPL < 3 && X86_EFLAGS_AC is set)
+				 *
+				 * Here, we cover the first three conditions,
+				 * we need to check CPL and X86_EFLAGS_AC in
+				 * permission_fault() dynamiccally
+				 */
+				smap = smap && smapf && u && !uf;
 			} else
 				/* Not really needed: no U/S accesses on ept  */
 				u = 1;
 
-			fault = (ff && !x) || (uf && !u) || (wf && !w);
+			fault = (ff && !x) || (uf && !u) || (wf && !w) || smap;
 			map |= fault << bit;
 		}
 		mmu->permissions[byte] = map;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 2926152..9d7a0b3 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -73,6 +73,8 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct);
 void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
 void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
 		bool execonly);
+void update_permission_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+		bool ept);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
@@ -110,10 +112,28 @@ static inline bool is_write_protection(struct kvm_vcpu *vcpu)
  * Will a fault with a given page-fault error code (pfec) cause a permission
  * fault with the given access (in ACC_* format)?
  */
-static inline bool permission_fault(struct kvm_mmu *mmu, unsigned pte_access,
-				    unsigned pfec)
+static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+				    unsigned pte_access, unsigned pfec)
 {
-	return (mmu->permissions[pfec >> 1] >> pte_access) & 1;
+	bool smapf;
+	int cpl =
Re: [PATCH 1/2] KVM: PPC: Book3S HV: Make TM avoid program check
On Fri, Mar 28, 2014 at 04:40:36PM +1100, Michael Neuling wrote:
> Currently, using kvmppc_set_one_reg() a transaction could be set up
> with the TEXASR Failure Summary (FS) not set. When this is switched
> back in by the host, this will result in a TM Bad Thing (ie. a 0x700
> program check) when the trechkpt is run.
>
> This avoids this by always setting the TEXASR FS when there is an
> active transaction being started.
>
> This patch is on top of Paulus' recent KVM TM patch set.

Thanks, Mikey. Do you mind if I fold these two patches into patch 2/8 of the set I posted?

Paul.
Re: [PATCH 2/4] KVM: Add SMAP support when setting CR4
Il 28/03/2014 18:36, Feng Wu ha scritto:

> +	smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);

You are overwriting this variable below, but that is not okay because the value of CR4 must be considered separately in each iteration. This also hides an uninitialized-variable bug for smap in the EPT case. To avoid that, rename this variable to cr4_smap; it's probably better to rename smep to cr4_smep too.

>  	for (byte = 0; byte < ARRAY_SIZE(mmu->permissions); ++byte) {
>  		pfec = byte << 1;
>  		map = 0;
>  		wf = pfec & PFERR_WRITE_MASK;
>  		uf = pfec & PFERR_USER_MASK;
>  		ff = pfec & PFERR_FETCH_MASK;
> +		smapf = pfec & PFERR_RSVD_MASK;

The reader will expect PFERR_RSVD_MASK to be zero here. So please add a comment: /* PFERR_RSVD_MASK is set in pfec if ... */

>  		for (bit = 0; bit < 8; ++bit) {
>  			x = bit & ACC_EXEC_MASK;
>  			w = bit & ACC_WRITE_MASK;
> @@ -3627,11 +3629,27 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
>  			w |= !is_write_protection(vcpu) && !uf;
>  			/* Disallow supervisor fetches of user code if cr4.smep */
>  			x &= !(smep && u && !uf);
> +
> +			/*
> +			 * SMAP: kernel-mode data accesses from user-mode
> +			 * mappings should fault. A fault is considered
> +			 * as a SMAP violation if all of the following
> +			 * conditions are true:
> +			 *   - X86_CR4_SMAP is set in CR4
> +			 *   - A user page is accessed
> +			 *   - Page fault in kernel mode
> +			 *   - !(CPL < 3 && X86_EFLAGS_AC is set)
> +			 *
> +			 * Here, we cover the first three conditions,
> +			 * we need to check CPL and X86_EFLAGS_AC in
> +			 * permission_fault() dynamiccally

dynamically.

These three lines however are not entirely correct. We do cover the last condition here, it is in smapf. So perhaps something like

	 * Here, we cover the first three conditions.
	 * The CPL and X86_EFLAGS_AC is in smapf, which
	 * permission_fault() computes dynamically.

> +			 */
> +			smap = smap && smapf && u && !uf;

SMAP does not affect instruction fetches. Do you need !ff here? Perhaps it's clearer to add it even if it is not strictly necessary.

Please write just "smap = cr4_smap && u && !uf && !ff" here, and add back smapf below in the assignment to fault. This makes the code more homogeneous.

>  		} else
>  			/* Not really needed: no U/S accesses on ept  */
>  			u = 1;
>
> -		fault = (ff && !x) || (uf && !u) || (wf && !w);
> +		fault = (ff && !x) || (uf && !u) || (wf && !w) || smap;

...

> +	/*
> +	 * If CPL < 3, SMAP protections are disabled if EFLAGS.AC = 1.
> +	 *
> +	 * If CPL = 3, SMAP applies to all supervisor-mode data accesses
> +	 * (these are implicit supervisor accesses) regardless of the value
> +	 * of EFLAGS.AC.
> +	 *
> +	 * So we need to check CPL and EFLAGS.AC to detect whether there is
> +	 * a SMAP violation.
> +	 */
> +
> +	smapf = ((mmu->permissions[(pfec|PFERR_RSVD_MASK) >> 1] >> pte_access)
> +		& 1) && !((cpl < 3) && ((rflags & X86_EFLAGS_AC) == 1));
> +
> +	return ((mmu->permissions[pfec >> 1] >> pte_access) & 1) || smapf;

You do not need two separate accesses. Just add PFERR_RSVD_MASK to pfec if the conditions for SMAP are satisfied. There are two possibilities:

1) setting PFERR_RSVD_MASK if SMAP is being enforced, that is if CPL = 3 || AC = 0. This is what you are doing now.

2) setting PFERR_RSVD_MASK if SMAP is being overridden, that is if CPL < 3 && AC = 1. You then have to invert the bit in update_permission_bitmask.

Please consider both choices, and pick the one that gives better code.

Also, this must be written in a branchless way. Branchless tricks are common throughout the MMU code because the hit rate of most branches is pretty much 50%-50%. This is also true in this case, at least if SMAP is in use (if it is not in use, we'll have AC=0 most of the time).

I don't want to spoil the fun, but I don't want to waste your time either, so I rot13'ed my solution and placed it after the signature. ;)

As to nested virtualization, I reread the code and it should already work, because it sets PFERR_USER_MASK. This means uf=1, and a SMAP fault will never trigger with uf=1.

Thanks for following this! Please include "v3" in the patch subject on your next post!

Paolo

-------------- 8< --------------

Nqq qrsvavgvbaf sbe CSREE_*_OVG (0 sbe cerfrag, 1
Re: [PATCH 1/2] KVM: PPC: Book3S HV: Make TM avoid program check
Il 28/03/2014 12:08, Paul Mackerras ha scritto:

>> Currently, using kvmppc_set_one_reg() a transaction could be set up
>> with the TEXASR Failure Summary (FS) not set. When this is switched
>> back in by the host, this will result in a TM Bad Thing (ie. a 0x700
>> program check) when the trechkpt is run.
>>
>> This avoids this by always setting the TEXASR FS when there is an
>> active transaction being started.
>>
>> This patch is on top of Paulus' recent KVM TM patch set.
>
> Thanks, Mikey. Do you mind if I fold these two patches into patch 2/8
> of the set I posted?

In either case, am I right that this doesn't include the patches in kvm-ppc-queue? I'm waiting for the pull request. :)

Paolo
Re: [RFC PATCH 2/5] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
Il 27/03/2014 12:30, Paolo Bonzini ha scritto:
> Despite the provisions to emulate up to 130 consecutive instructions, in
> practice KVM will emulate just one before exiting
> handle_invalid_guest_state, because x86_emulate_instruction always sets
> KVM_REQ_EVENT.
>
> However, we only need to do this if an interrupt could be injected,
> which happens a) if an interrupt shadow bit (STI or MOV SS) has gone
> away; b) if the interrupt flag has just been set (because other
> instructions than STI can set it without enabling an interrupt shadow).
>
> This cuts another 250-300 clock cycles from the cost of emulating an
> instruction (530-870 cycles before the patch on kvm-unit-tests,
> 290-600 afterwards).
>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 28 ++++++++++++++++++----------
>  1 file changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fd31aada351b..ce9523345f2e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -87,6 +87,7 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
>
>  static void update_cr8_intercept(struct kvm_vcpu *vcpu);
>  static void process_nmi(struct kvm_vcpu *vcpu);
> +static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
>
>  struct kvm_x86_ops *kvm_x86_ops;
>  EXPORT_SYMBOL_GPL(kvm_x86_ops);
> @@ -4856,8 +4857,10 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
>  	 * means that the last instruction is an sti. We should not
>  	 * leave the flag on in this case. The same goes for mov ss
>  	 */
> -	if (!(int_shadow & mask))
> +	if (unlikely(int_shadow) && !(int_shadow & mask)) {
>  		kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +	}

Better:

 	 * means that the last instruction is an sti. We should not
 	 * leave the flag on in this case. The same goes for mov ss
 	 */
-	if (!(int_shadow & mask))
+	mask &= ~int_shadow;
+	if (unlikely(mask != int_shadow))
 		kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
+
+	/*
+	 * The interrupt window might have opened if a bit has been cleared.
+	 */
+	if (unlikely(int_shadow & ~mask))
+		kvm_make_request(KVM_REQ_EVENT, vcpu);

Paolo

>  }
>
>  static void inject_emulated_exception(struct kvm_vcpu *vcpu)
> @@ -5083,20 +5086,18 @@ static int kvm_vcpu_check_hw_bp(unsigned long addr, u32 type, u32 dr7,
>  	return dr6;
>  }
>
> -static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, int *r)
> +static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, unsigned long rflags, int *r)
>  {
>  	struct kvm_run *kvm_run = vcpu->run;
>
>  	/*
> -	 * Use the raw value to see if TF was passed to the processor.
> -	 * Note that the new value of the flags has not been saved yet.
> +	 * rflags is the old, raw value of the flags. The new value has
> +	 * not been saved yet.
>  	 *
>  	 * This is correct even for TF set by the guest, because the
>  	 * processor will not generate this exception after the instruction
>  	 * that sets the TF flag.
>  	 */
> -	unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
> -
>  	if (unlikely(rflags & X86_EFLAGS_TF)) {
>  		if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
>  			kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1;
> @@ -5263,13 +5264,15 @@ restart:
>  		r = EMULATE_DONE;
>
>  	if (writeback) {
> +		unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
>  		toggle_interruptibility(vcpu, ctxt->interruptibility);
> -		kvm_make_request(KVM_REQ_EVENT, vcpu);
>  		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
>  		kvm_rip_write(vcpu, ctxt->eip);
>  		if (r == EMULATE_DONE)
> -			kvm_vcpu_check_singlestep(vcpu, r);
> -		kvm_set_rflags(vcpu, ctxt->eflags);
> +			kvm_vcpu_check_singlestep(vcpu, rflags, r);
> +		__kvm_set_rflags(vcpu, ctxt->eflags);
> +		if (unlikely((ctxt->eflags & ~rflags) & X86_EFLAGS_IF))
> +			kvm_make_request(KVM_REQ_EVENT, vcpu);
>  	} else
>  		vcpu->arch.emulate_regs_need_sync_to_vcpu = true;
>
> @@ -7385,12 +7388,17 @@ unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_get_rflags);
>
> -void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
> +static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
>  {
>  	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP &&
>  	    kvm_is_linear_rip(vcpu, vcpu->arch.singlestep_rip))
>  		rflags |= X86_EFLAGS_TF;
>  	kvm_x86_ops->set_rflags(vcpu, rflags);
> +}
> +
> +void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
> +{
> +	__kvm_set_rflags(vcpu, rflags);
>  	kvm_make_request(KVM_REQ_EVENT, vcpu);
>  }
>  EXPORT_SYMBOL_GPL(kvm_set_rflags);
[PATCH] arm: KVM: fix possible misalignment of PGDs and bounce page
The kvm/mmu code shared by arm and arm64 uses kmalloc() to allocate a bounce page (if the hypervisor init code crosses a page boundary) and hypervisor PGDs. The problem is that kmalloc() does not guarantee the proper alignment. In the case of the bounce page, the page-sized buffer allocated may also cross a page boundary, negating the purpose and leading to a hang during kvm initialization. Likewise, the PGDs allocated may not meet the minimum alignment requirements of the underlying MMU.

This patch uses __get_free_page() to guarantee the worst-case alignment needs of the bounce page and PGDs on both arm and arm64.

Signed-off-by: Mark Salter <msal...@redhat.com>
---
 arch/arm/kvm/mmu.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 7789857..575d790 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -42,6 +42,8 @@ static unsigned long hyp_idmap_start;
 static unsigned long hyp_idmap_end;
 static phys_addr_t hyp_idmap_vector;
 
+#define pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
+
 #define kvm_pmd_huge(_x)	(pmd_huge(_x) || pmd_trans_huge(_x))
 
 static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
@@ -199,14 +201,14 @@ void free_boot_hyp_pgd(void)
 	if (boot_hyp_pgd) {
 		unmap_range(NULL, boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE);
 		unmap_range(NULL, boot_hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE);
-		kfree(boot_hyp_pgd);
+		free_pages((unsigned long)boot_hyp_pgd, pgd_order);
 		boot_hyp_pgd = NULL;
 	}
 
 	if (hyp_pgd)
 		unmap_range(NULL, hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE);
 
-	kfree(init_bounce_page);
+	free_page((unsigned long)init_bounce_page);
 	init_bounce_page = NULL;
 
 	mutex_unlock(&kvm_hyp_pgd_mutex);
@@ -236,7 +238,7 @@ void free_hyp_pgds(void)
 		for (addr = VMALLOC_START; is_vmalloc_addr((void*)addr); addr += PGDIR_SIZE)
 			unmap_range(NULL, hyp_pgd, KERN_TO_HYP(addr), PGDIR_SIZE);
 
-		kfree(hyp_pgd);
+		free_pages((unsigned long)hyp_pgd, pgd_order);
 		hyp_pgd = NULL;
 	}
 
@@ -930,7 +932,7 @@ int kvm_mmu_init(void)
 	size_t len = __hyp_idmap_text_end - __hyp_idmap_text_start;
 	phys_addr_t phys_base;
 
-	init_bounce_page = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	init_bounce_page = (void *)__get_free_page(GFP_KERNEL);
 	if (!init_bounce_page) {
 		kvm_err("Couldn't allocate HYP init bounce page\n");
 		err = -ENOMEM;
@@ -956,8 +958,9 @@ int kvm_mmu_init(void)
 			 (unsigned long)phys_base);
 	}
 
-	hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
-	boot_hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
+	hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, pgd_order);
+	boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, pgd_order);
+
 	if (!hyp_pgd || !boot_hyp_pgd) {
 		kvm_err("Hyp mode PGD not allocated\n");
 		err = -ENOMEM;
--
1.8.5.3
huge 2nd stage pages and live migration
Hello, I've been working on live migration for ARM-KVM, and noticed a problem
completing migration with huge 2nd stage tables. After write-protecting the
VM, on a write fault 512 page bits are set in dirty_bitmap[] to take into
account future writes to the huge page. The pmd is write-protected again when
QEMU reads the dirty log, and the cycle repeats. With this, not even an idle
32MB VM completes live migration. If QEMU uses THPs and the 2nd stage tables
use ptes, then there is no problem; live migration is quick. I'm assuming
QEMU and guest huge pages with 2nd stage page table ptes should work fine
too. I'm wondering how this has been solved (for any architecture)?

- Mario
Re: [PATCH 3/4] KVM: Add SMAP support when setting CR4
Il 28/03/2014 08:33, Wu, Feng ha scritto:

In my understanding it is needed. From the Intel SDM:

"Every access to a linear address is either a supervisor-mode access or a
user-mode access. All accesses performed while the current privilege level
(CPL) is less than 3 are supervisor-mode accesses. If CPL = 3, accesses are
generally user-mode accesses. However, some operations implicitly access
system data structures, and the resulting accesses to those data structures
are supervisor-mode accesses regardless of CPL. Examples of such implicit
supervisor accesses include the following: accesses to the global descriptor
table (GDT) or local descriptor table (LDT) to load a segment descriptor;
accesses to the interrupt descriptor table (IDT) when delivering an interrupt
or exception; and accesses to the task-state segment (TSS) as part of a task
switch or change of CPL."

From the above SDM, we can see supervisor-mode accesses can also happen when
CPL equals 3.

"If CPL < 3, SMAP protections are disabled if EFLAGS.AC = 1. If CPL = 3, SMAP
applies to all supervisor-mode data accesses (these are implicit supervisor
accesses) regardless of the value of EFLAGS.AC."

So when we check the value of EFLAGS.AC, we also need to check CPL, since the
AC bit only takes effect when CPL < 3. u == 1 means user-mode accesses are
allowed, while !uf means it is a fault from a supervisor-mode access; I think
neither *u* nor *uf* can reflect the value of CPL. Correct me if I am wrong.
Thanks a lot!

You're right!

Paolo
RE: [PATCH 2/4] KVM: Add SMAP support when setting CR4
-----Original Message-----
From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo Bonzini
Sent: Friday, March 28, 2014 8:03 PM
To: Wu, Feng; g...@redhat.com; h...@zytor.com; kvm@vger.kernel.org
Subject: Re: [PATCH 2/4] KVM: Add SMAP support when setting CR4

Il 28/03/2014 18:36, Feng Wu ha scritto:

	+	smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);

You are overwriting this variable below, but that is not okay because the
value of CR4 must be considered separately in each iteration. This also hides
an uninitialized-variable bug for smap in the EPT case. To avoid that, rename
this variable to cr4_smap; it's probably better to rename smep to cr4_smep
too.

	for (byte = 0; byte < ARRAY_SIZE(mmu->permissions); ++byte) {
		pfec = byte << 1;
		map = 0;
		wf = pfec & PFERR_WRITE_MASK;
		uf = pfec & PFERR_USER_MASK;
		ff = pfec & PFERR_FETCH_MASK;
	+	smapf = pfec & PFERR_RSVD_MASK;

The reader will expect PFERR_RSVD_MASK to be zero here. So please add a
comment: /* PFERR_RSVD_MASK is set in pfec if ... */.

		for (bit = 0; bit < 8; ++bit) {
			x = bit & ACC_EXEC_MASK;
			w = bit & ACC_WRITE_MASK;
	@@ -3627,11 +3629,27 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
			w |= !is_write_protection(vcpu) && !uf;

			/* Disallow supervisor fetches of user code if cr4.smep */
			x &= !(smep && u && !uf);
	+
	+		/*
	+		 * SMAP: kernel-mode data accesses from user-mode
	+		 * mappings should fault. A fault is considered
	+		 * as a SMAP violation if all of the following
	+		 * conditions are true:
	+		 *   - X86_CR4_SMAP is set in CR4
	+		 *   - A user page is accessed
	+		 *   - Page fault in kernel mode
	+		 *   - !(CPL < 3 && X86_EFLAGS_AC is set)
	+		 *
	+		 * Here, we cover the first three conditions,
	+		 * we need to check CPL and X86_EFLAGS_AC in
	+		 * permission_fault() dynamically.

These three lines however are not entirely correct. We do cover the last
condition here, it is in smapf. So perhaps something like

	 * Here, we cover the first three conditions.
	 * The CPL and X86_EFLAGS_AC is in smapf, which
	 * permission_fault() computes dynamically.

	+		 */
	+		smap = smap && smapf && u && !uf;

SMAP does not affect instruction fetches. Do you need !ff here? Perhaps it's
clearer to add it even if it is not strictly necessary. Please write just
smap = cr4_smap && u && !uf && !ff here, and add back smapf below in the
assignment to fault. This makes the code more homogeneous.

		} else
			/* Not really needed: no U/S accesses on ept */
			u = 1;

	-	fault = (ff && !x) || (uf && !u) || (wf && !w);
	+	fault = (ff && !x) || (uf && !u) || (wf && !w) || smap;

...

	+
	+	/*
	+	 * If CPL < 3, SMAP protections are disabled if EFLAGS.AC = 1.
	+	 *
	+	 * If CPL = 3, SMAP applies to all supervisor-mode data accesses
	+	 * (these are implicit supervisor accesses) regardless of the value
	+	 * of EFLAGS.AC.
	+	 *
	+	 * So we need to check CPL and EFLAGS.AC to detect whether there is
	+	 * a SMAP violation.
	+	 */
	+
	+	smapf = ((mmu->permissions[(pfec | PFERR_RSVD_MASK) >> 1] >> pte_access) & 1)
	+		&& !((cpl < 3) && ((rflags & X86_EFLAGS_AC) == 1));
	+
	+	return ((mmu->permissions[pfec >> 1] >> pte_access) & 1) || smapf;

You do not need two separate accesses. Just add PFERR_RSVD_MASK to pfec if
the conditions for SMAP are satisfied. There are two possibilities:

1) setting PFERR_RSVD_MASK if SMAP is being enforced, that is if
CPL == 3 || AC == 0. This is what you are doing now.

2) setting PFERR_RSVD_MASK if SMAP is being overridden, that is if
CPL < 3 && AC == 1. You then have to invert the bit in
update_permission_bitmask.

Please consider both choices, and pick the one that gives better code. Also,
this must be written in a branchless way. Branchless tricks are common
throughout the MMU code because the hit rate of most branches is pretty much
50%-50%. This is also true in this case, at least if SMAP is in use (if it
is not in use, we'll have AC=0 most of the time).

I don't want to spoil the fun, but I don't want to waste your time either, so
I rot13'ed my solution and placed it after the signature. ;)

As to nested virtualization, I reread the code and it should already work,
because it sets
[RFC v2] ARM VM System Specification
ARM VM System Specification
===========================

Goal
----
The goal of this spec is to allow suitably-built OS images to run on all ARM
virtualization solutions, such as KVM or Xen. Recommendations in this spec
are valid for aarch32 and aarch64 alike, and they aim to be hypervisor
agnostic. Note that simply adhering to the SBSA [2] is not a valid approach,
for example because the SBSA mandates EL2, which will not be available for
VMs. Further, this spec also covers the aarch32 execution mode, not covered
in the SBSA.

Image format
------------
The image format, as presented to the VM, needs to be well-defined in order
for prepared disk images to be bootable across various virtualization
implementations.

The raw disk format as presented to the VM must be partitioned with a GUID
Partition Table (GPT). The bootable software must be placed in the EFI System
Partition (ESP), using the UEFI removable media path, and must be an EFI
application complying with the UEFI Specification 2.4 Revision A [6].

The ESP partition's GPT entry's partition type GUID must be
C12A7328-F81F-11D2-BA4B-00A0C93EC93B and the file system must be formatted as
FAT32/vfat as per Section 12.3.1.1 in [6].

The removable media path is \EFI\BOOT\BOOTARM.EFI for the aarch32 execution
state and \EFI\BOOT\BOOTAA64.EFI for the aarch64 execution state, as
specified in Sections 3.3 (Boot Option Variables Default Boot Behavior) and
3.4.1.1 (Removable Media Boot Behavior) in [6].

This ensures that tools for both Xen and KVM can load a binary UEFI firmware
which can read and boot the EFI application in the disk image. A typical
scenario will be GRUB2 packaged as an EFI application, which mounts the
system boot partition and boots Linux.

Virtual Firmware
----------------
The VM system must be UEFI compliant in order to be able to boot the EFI
application in the ESP. It is recommended that this is achieved by loading a
UEFI binary as the first software executed by the VM, which then executes the
EFI application.

The UEFI implementation should be compliant with UEFI Specification 2.4
Revision A [6] or later.

This document strongly recommends that the VM implementation supports
persistent environment storage for virtual firmware implementation in order
to support probable use cases such as adding additional disk images to a VM
or running installers to perform upgrades.

This document strongly recommends that VM implementations implement
persistent variable storage for their UEFI implementation. Persistent
variable storage shall be a property of a VM instance, but shall not be
stored as part of a portable disk image. Portable disk images shall conform
to the UEFI removable disk requirements from the UEFI spec and cannot rely
on a pre-configured UEFI environment.

The binary UEFI firmware implementation should not be distributed as part of
the VM image, but is specific to the VM implementation.

Hardware Description
--------------------
The VM system must be UEFI compliant and therefore the UEFI system table
will provide a means to access hardware description data.

The VM implementation must provide through its UEFI implementation:

  a complete FDT which describes the entire VM system and will boot
  mainline kernels driven by device tree alone

For more information about the arm and arm64 boot conventions, see
Documentation/arm/Booting and Documentation/arm64/booting.txt in the Linux
kernel source tree. For more information about UEFI booting, see [4] and [5].

VM Platform
-----------
The specification does not mandate any specific memory map. The guest OS
must be able to enumerate all processing elements, devices, and memory
through HW description data (FDT) or a bus-specific mechanism such as PCI.

If aarch64 physical CPUs implement support for the aarch32 execution state
in EL1 and EL0 execution, it is recommended that the VM implementation
supports booting the VM at EL1 in both aarch32 and aarch64 execution states.

The virtual hardware platform must provide a number of mandatory peripherals:

  Serial console: The platform should provide a console, based on an
  emulated pl011, a virtio-console, or a Xen PV console.

  An ARM Generic Interrupt Controller v2 (GICv2) [3] or newer. GICv2 limits
  the number of virtual CPUs to 8 cores; newer GIC versions remove this
  limitation.

  The ARM virtual timer and counter should be available to the VM as per the
  ARM Generic Timers specification in the ARM ARM [1].

It is strongly recommended that the VM implementation provides a
hotpluggable bus to support hotplug of at least block and network devices.
Suitable buses include a virtual PCIe bus and the Xen PV bus.

For the VM image to be compliant with this spec, the following applies for
the guest OS in the VM image:

  The guest OS must include support for pl011 UART, virtio-console, and the
  Xen PV console.

  The guest OS must include support for GICv2 and any available newer
  version of the GIC architecture
Re: [RFC PATCH 1/4] ARM: KVM: on unhandled IO mem abort, route the call to the KVM MMIO bus
On Thu, Mar 13, 2014 at 04:57:26PM +0100, Antonios Motakis wrote:

On an unhandled IO memory abort, use the kvm_io_bus_* API in order to handle
the MMIO access through any registered read/write callbacks. This is a
dependency for eventfd support (ioeventfd and irqfd). However, accesses to
the VGIC are still left implemented independently, since the kvm_io_bus_*
API doesn't pass the VCPU pointer doing the access.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
---
 arch/arm/kvm/mmio.c | 32 ++++++++++++++++++++++++++++++++
 virt/kvm/arm/vgic.c |  5 ++++-
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
index 4cb5a93..1d17831 100644
--- a/arch/arm/kvm/mmio.c
+++ b/arch/arm/kvm/mmio.c
@@ -162,6 +162,35 @@ static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	return 0;
 }
 
+/**
+ * kvm_handle_mmio - handle an in-kernel MMIO access
+ * @vcpu:	pointer to the vcpu performing the access
+ * @run:	pointer to the kvm_run structure
+ * @mmio:	pointer to the data describing the access
+ *
+ * returns true if the MMIO access has been performed in kernel space,
+ * and false if it needs to be emulated in user space.
+ */
+static bool handle_kernel_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
+		struct kvm_exit_mmio *mmio)
+{
+	int ret;
+
+	if (mmio->is_write) {
+		ret = kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, mmio->phys_addr,
+				mmio->len, mmio->data);
+
+	} else {
+		ret = kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, mmio->phys_addr,
+				mmio->len, mmio->data);
+	}
+	if (!ret) {
+		kvm_prepare_mmio(run, mmio);
+		kvm_handle_mmio_return(vcpu, run);
+	}
+
+	return !ret;
+}
+
 int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		phys_addr_t fault_ipa)
 {
@@ -200,6 +229,9 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	if (vgic_handle_mmio(vcpu, run, mmio))
 		return 1;
 
+	if (handle_kernel_mmio(vcpu, run, mmio))
+		return 1;
+

this special-casing of the vgic is now really terrible. Is there anything
holding you back from doing the necessary restructure of the kvm_bus_io_*()
API instead? That would allow us to get rid of the ugly "Fix it!" in the
vgic driver as well.

-Christoffer

 	kvm_prepare_mmio(run, mmio);
 	return 0;
 }
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 8ca405c..afdecc3 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -849,13 +849,16 @@ struct mmio_range *find_matching_range(const struct mmio_range *ranges,
 }
 
 /**
- * vgic_handle_mmio - handle an in-kernel MMIO access
+ * vgic_handle_mmio - handle an in-kernel vgic MMIO access
  * @vcpu:	pointer to the vcpu performing the access
  * @run:	pointer to the kvm_run structure
  * @mmio:	pointer to the data describing the access
  *
  * returns true if the MMIO access has been performed in kernel space,
  * and false if it needs to be emulated in user space.
+ *
+ * This is handled outside of kvm_handle_mmio because the kvm_io_bus only
+ * passes the VM pointer, while we need the VCPU performing the access.
  */
 bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		      struct kvm_exit_mmio *mmio)
-- 
1.8.3.2
Re: huge 2nd stage pages and live migration
On Fri, Mar 28, 2014 at 10:39:25AM -0700, Mario Smarduch wrote:

Hello, I've been working on live migration for ARM-KVM, and noticed a
problem completing migration with huge 2nd stage tables. After
write-protecting the VM, on a write fault 512 page bits are set in
dirty_bitmap[] to take into account future writes to the huge page. The pmd
is write-protected again when QEMU reads the dirty log, and the cycle
repeats. With this, not even an idle 32MB VM completes live migration. If
QEMU uses THPs and the 2nd stage tables use ptes, then there is no problem;
live migration is quick. I'm assuming QEMU and guest huge pages with 2nd
stage page table ptes should work fine too. I'm wondering how this has been
solved (for any architecture)?

I don't know if there's a generic solution (have you looked at x86/PPC what
they do?), but I don't see any conceptual problem with putting the VM into
use-pte-for-2nd-stage-mappings-I'm-migrating mode.

-Christoffer
Re: [PATCH] arm: KVM: fix possible misalignment of PGDs and bounce page
On Fri, Mar 28, 2014 at 10:25:19AM -0400, Mark Salter wrote:

The kvm/mmu code shared by arm and arm64 uses kmalloc() to allocate a bounce
page (if hypervisor init code crosses a page boundary) and hypervisor PGDs.
The problem is that kmalloc() does not guarantee the proper alignment. In the
case of the bounce page, the page-sized buffer allocated may also cross a
page boundary, negating the purpose and leading to a hang during kvm
initialization. Likewise, the PGDs allocated may not meet the minimum
alignment requirements of the underlying MMU. This patch uses
__get_free_page() to guarantee the worst-case alignment needs of the bounce
page and PGDs on both arm and arm64.

Signed-off-by: Mark Salter msal...@redhat.com
---
 arch/arm/kvm/mmu.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 7789857..575d790 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -42,6 +42,8 @@ static unsigned long hyp_idmap_start;
 static unsigned long hyp_idmap_end;
 static phys_addr_t hyp_idmap_vector;
 
+#define pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
+
 #define kvm_pmd_huge(_x)	(pmd_huge(_x) || pmd_trans_huge(_x))
 
 static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
@@ -199,14 +201,14 @@ void free_boot_hyp_pgd(void)
 	if (boot_hyp_pgd) {
 		unmap_range(NULL, boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE);
 		unmap_range(NULL, boot_hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE);
-		kfree(boot_hyp_pgd);
+		free_pages((unsigned long)boot_hyp_pgd, pgd_order);
 		boot_hyp_pgd = NULL;
 	}
 
 	if (hyp_pgd)
 		unmap_range(NULL, hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE);
 
-	kfree(init_bounce_page);
+	free_page((unsigned long)init_bounce_page);
 	init_bounce_page = NULL;
 
 	mutex_unlock(&kvm_hyp_pgd_mutex);
@@ -236,7 +238,7 @@ void free_hyp_pgds(void)
 		for (addr = VMALLOC_START; is_vmalloc_addr((void*)addr); addr += PGDIR_SIZE)
 			unmap_range(NULL, hyp_pgd, KERN_TO_HYP(addr), PGDIR_SIZE);
 
-		kfree(hyp_pgd);
+		free_pages((unsigned long)hyp_pgd, pgd_order);
 		hyp_pgd = NULL;
 	}
 
@@ -930,7 +932,7 @@ int kvm_mmu_init(void)
 	size_t len = __hyp_idmap_text_end - __hyp_idmap_text_start;
 	phys_addr_t phys_base;
 
-	init_bounce_page = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	init_bounce_page = (void *)__get_free_page(GFP_KERNEL);
 	if (!init_bounce_page) {
 		kvm_err("Couldn't allocate HYP init bounce page\n");
 		err = -ENOMEM;
@@ -956,8 +958,9 @@ int kvm_mmu_init(void)
 			 (unsigned long)phys_base);
 	}
 
-	hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
-	boot_hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
+	hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, pgd_order);
+	boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, pgd_order);
+
 	if (!hyp_pgd || !boot_hyp_pgd) {
 		kvm_err("Hyp mode PGD not allocated\n");
 		err = -ENOMEM;
-- 
1.8.5.3

This looks right to me. Funnily enough, I seem to remember a discussion from
when we originally merged this code where someone (maybe me) argued that
kmalloc() would align to the size of the allocation, but I don't see
anything backing this up at this point. So:

Acked-by: Christoffer Dall christoffer.d...@linaro.org

If Marc agrees I can queue this for -rc1.
Re: [PATCHv2 net] vhost: fix total length when packets are too short
From: Michael S. Tsirkin m...@redhat.com
Date: Thu, 27 Mar 2014 12:00:26 +0200

When mergeable buffers are disabled, and the incoming packet is too large
for the rx buffer, get_rx_bufs returns success. This was intentional in
order to make recvmsg truncate the packet; handle_rx would then detect
err != sock_len and drop it. Unfortunately we pass the original sock_len to
recvmsg - which means we use parts of iov not fully validated. Fix this up
by detecting this overrun and doing packet drop immediately.

CVE-2014-0077

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
Changes from v1: Fix CVE# in the commit log. Patch is unchanged.
Note: this is needed for -stable.

Applied and queued up for -stable. I wonder if this can still make the
release. I will try but no promises.
Re: [PATCH net] vhost: validate vhost_get_vq_desc return value
From: Michael S. Tsirkin m...@redhat.com
Date: Thu, 27 Mar 2014 12:53:37 +0200

vhost fails to validate negative error code from vhost_get_vq_desc causing
a crash: we are using -EFAULT which is 0xfff2 as vector size, which exceeds
the allocated size.

The code in question was introduced in commit
8dd014adfea6f173c1ef6378f7e5e7924866c923
    vhost-net: mergeable buffers support

CVE-2014-0055

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
This is needed in -stable.

Applied and queued up for -stable, thanks Michael.
Qemu v1.7.1 CentOS 6.4
Hi Everyone, I am running into performance issues with Windows guest VMs in
conjunction with the rather old version of Qemu-KVM that is currently being
shipped with RHEL 6.4, and as such I am looking to upgrade to the latest
stable release of qemu (v1.7.1 if I am not mistaken). As it stands now I
have been unsuccessful in locating a good guide/tutorial on how to correctly
update (or remove and install) v1.7.1 - does anyone have a good tutorial on
how to properly get it installed under CentOS 6 (or RHEL 6 in general)?

The reason for the upgrade is a high context switch rate when the Windows
VMs are idle, causing the hypervisor machine to utilize a lot more CPU than
it should. Based on my Google skills I have located several threads noting
an upgrade to at least 0.12.4 should help resolve the issue, however as it
stands right now I really do not have any confirmation of this.

As to CentOS, unfortunately the control panel system we are using currently
limits us to RHEL 6 based distros, and as such swapping to a Debian-based
distro or otherwise is not currently an option.

Any advice/feedback would be very much appreciated. Thanks!

-Lane
Re: Qemu v1.7.1 CentOS 6.4
On 03/28/2014 03:55 PM, Lane Eckley wrote:

Hi Everyone, I am running into performance issues with Windows guest VMs in
conjunction with the rather old version of Qemu-KVM that is currently being
shipped with RHEL 6.4, and as such I am looking to upgrade to the latest
stable release of qemu (v1.7.1 if I am not mistaken). As it stands now I
have been unsuccessful in locating a good guide/tutorial on how to correctly
update (or remove and install) v1.7.1 - does anyone have a good tutorial on
how to properly get it installed under CentOS 6 (or RHEL 6 in general)?

The reason for the upgrade is a high context switch rate when the Windows
VMs are idle, causing the hypervisor machine to utilize a lot more CPU than
it should. Based on my Google skills I have located several threads noting
an upgrade to at least 0.12.4 should help resolve the issue, however as it
stands right now I really do not have any confirmation of this.

RedHat very likely has backported any fixes and probably most performance
improvements from 0.12-1.7. I'd suggest upgrading to the latest RHEL release
and trying to enable the hv-* options (hv-spinlocks, hv-relaxed, hv-vapic,
hv-time or whatever is supported by RHEL's kvm). IIRC, there is also a
document/page in the RHEL docs that talks specifically about Windows guest
performance.

As to CentOS, unfortunately the control panel system we are using currently
limits us to RHEL 6 based distros, and as such swapping to a Debian-based
distro or otherwise is not currently an option.

Any advice/feedback would be very much appreciated. Thanks!

-Lane
Re: [PATCH 1/2] KVM: PPC: Book3S HV: Make TM avoid program check
On Fri, Mar 28, 2014 at 04:40:36PM +1100, Michael Neuling wrote:

Currently, using kvmppc_set_one_reg() a transaction could be set up with the
TEXASR Failure Summary (FS) bit not set. When this is switched back in by
the host, this will result in a TM Bad Thing (i.e. a 0x700 program check)
when the trechkpt is run. This patch avoids that by always setting the
TEXASR FS when there is an active transaction being started.

This patch is on top of Paulus' recent KVM TM patch set.

Thanks, Mikey. Do you mind if I fold these two patches into patch 2/8 of the
set I posted?

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the
body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: PPC: Book3S HV: Make TM avoid program check
Il 28/03/2014 12:08, Paul Mackerras ha scritto:

Currently, using kvmppc_set_one_reg() a transaction could be set up with the
TEXASR Failure Summary (FS) bit not set. When this is switched back in by
the host, this will result in a TM Bad Thing (i.e. a 0x700 program check)
when the trechkpt is run. This patch avoids that by always setting the
TEXASR FS when there is an active transaction being started.

This patch is on top of Paulus' recent KVM TM patch set.

Thanks, Mikey. Do you mind if I fold these two patches into patch 2/8 of the
set I posted?

In either case, am I right that this doesn't include the patches in
kvm-ppc-queue? I'm waiting for the pull request. :)

Paolo