[no subject]
subscribe kvm -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[no subject]
subscribe kvm-commits
[no subject]
subscribe kvm
[no subject]
unsubscribe kvm
[no subject]
unsubscribe kvm udesh...@binghamton.edu
[no subject]
Hello,

I had some basic questions regarding KVM, and would appreciate any help :)

I have been reading about the KVM architecture, and as I understand it, the guest shows up as a regular process in the host itself. I had some questions around that.

1. Are the guest processes implemented as a control group within the overall VM process itself? Is the VM a kernel process or a user process?

2. Is there a way for me to dedicate some specific CPUs to a guest, so that those CPUs are not used for any work on the host itself? Pinning just makes sure the vCPU always runs on the same physical CPU; I am looking for something more than that.

3. If the host is compiled as a non-preemptible kernel, kernel processes run to completion until they give up the CPU themselves. I am trying to understand what that would mean in the context of KVM and guest VMs. If the VM is a user process, it means nothing; I wasn't sure, as per (1).

Cheers!
M
[no subject]
unsubscribe kvm
[no subject]
Hello,

The following two patches address an integration issue between KVM and KGDB. The issue described in the patches can be triggered with vanilla kernels that enable KGDB and KVM together on x86 (more specifically, we bumped into this with Fedora's 3.11 kernel from FC19). On a kernel with KGDB enabled, running kvm-unit-tests should reproduce the issue. On VM host servers where an admin accidentally left KGDB active, an unprivileged guest might be able to bring the host down.

Patches apply to linux-next and earlier kernels.
[PATCH 0/4] *** SUBJECT HERE ***
From: Bharat Bhushan bharat.bhus...@freescale.com

v1->v2
 - Removed _PAGE_BUSY loop as suggested by PaulS.
 - Added check for PAGE_SPLITTING

kvm: powerpc: use cache attributes from linux pte
 - 1st patch fixes a bug in booke (details in the patch).
 - 2nd patch renames the linux_pte_lookup_function() just for clarity. There is no functional change.
 - 3rd patch adds a Linux pte lookup function.
 - 4th patch uses the above-defined function and sets up TLB.wimg accordingly.

Bharat Bhushan (4):
  kvm: booke: clear host tlb reference flag on guest tlb invalidation
  kvm: book3s: rename lookup_linux_pte() to lookup_linux_pte_and_update()
  kvm: powerpc: define a linux pte lookup function
  kvm: powerpc: use caching attributes as per linux pte

 arch/powerpc/include/asm/kvm_host.h |  2 +-
 arch/powerpc/include/asm/pgtable.h  | 27 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |  8 +++--
 arch/powerpc/kvm/booke.c            |  1 +
 arch/powerpc/kvm/e500.h             |  8 +++--
 arch/powerpc/kvm/e500_mmu_host.c    | 55 +++---
 6 files changed, 70 insertions(+), 31 deletions(-)
[no subject]
Can anyone please let me know links providing info about ongoing/future kvm feature development? Is the todo list on the main page up to date?
[no subject]
subscribe kvm
[no subject]
From 1273f8b2e5464ec987facf9942fd3ccc0b69087e Mon Sep 17 00:00:00 2001
From: Liu Jinsong jinsong@intel.com
Date: Mon, 19 Aug 2013 09:33:30 +0800
Subject: [PATCH] qemu-kvm bugfix for IA32_FEATURE_CONTROL

This patch fixes the bug https://bugs.launchpad.net/qemu-kvm/+bug/1207623

IA32_FEATURE_CONTROL is pointless unless the VMX or SMX bits are exposed in cpuid.1.ecx of the vcpu. Current qemu-kvm returns an error from kvm_put_msrs or kvm_get_msrs.

Signed-off-by: Liu Jinsong jinsong@intel.com
---
 target-i386/kvm.c | 16 ++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 84ac00a..7facbfe 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -65,6 +65,7 @@ static bool has_msr_star;
 static bool has_msr_hsave_pa;
 static bool has_msr_tsc_adjust;
 static bool has_msr_tsc_deadline;
+static bool has_msr_feature_control;
 static bool has_msr_async_pf_en;
 static bool has_msr_pv_eoi_en;
 static bool has_msr_misc_enable;
@@ -644,6 +645,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
     qemu_add_vm_change_state_handler(cpu_update_state, env);
 
+    c = cpuid_find_entry(&cpuid_data.cpuid, 1, 0);
+    if (c)
+        has_msr_feature_control = !!(c->ecx & CPUID_EXT_VMX) |
+                                  !!(c->ecx & CPUID_EXT_SMX);
+
     cpuid_data.cpuid.padding = 0;
     r = kvm_vcpu_ioctl(cs, KVM_SET_CPUID2, &cpuid_data);
     if (r) {
@@ -1121,7 +1127,10 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
         if (hyperv_vapic_recommended()) {
             kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_APIC_ASSIST_PAGE, 0);
         }
-        kvm_msr_entry_set(&msrs[n++], MSR_IA32_FEATURE_CONTROL, env->msr_ia32_feature_control);
+        if (has_msr_feature_control) {
+            kvm_msr_entry_set(&msrs[n++], MSR_IA32_FEATURE_CONTROL,
+                              env->msr_ia32_feature_control);
+        }
     }
     if (env->mcg_cap) {
         int i;
@@ -1346,7 +1355,9 @@ static int kvm_get_msrs(X86CPU *cpu)
     if (has_msr_misc_enable) {
         msrs[n++].index = MSR_IA32_MISC_ENABLE;
     }
-    msrs[n++].index = MSR_IA32_FEATURE_CONTROL;
+    if (has_msr_feature_control) {
+        msrs[n++].index = MSR_IA32_FEATURE_CONTROL;
+    }
 
     if (!env->tsc_valid) {
         msrs[n++].index = MSR_IA32_TSC;
@@ -1447,6 +1458,7 @@ static int kvm_get_msrs(X86CPU *cpu)
             break;
         case MSR_IA32_FEATURE_CONTROL:
             env->msr_ia32_feature_control = msrs[i].data;
+            break;
         default:
             if (msrs[i].index >= MSR_MC0_CTL &&
                 msrs[i].index < MSR_MC0_CTL + (env->mcg_cap & 0xff) * 4) {
-- 
1.7.1
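The gating idea in the patch above can be tried outside QEMU. The following is only a minimal sketch: guest_has_feature_control() and build_msr_list() are hypothetical helpers invented for illustration, not QEMU functions. The bit positions (VMX is bit 5 and SMX is bit 6 of CPUID.1:ECX) and the MSR indices are the architectural values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CPUID.1:ECX feature bits (architectural bit positions). */
#define CPUID_EXT_VMX (1u << 5)
#define CPUID_EXT_SMX (1u << 6)

/* Architectural MSR indices. */
#define MSR_IA32_FEATURE_CONTROL 0x3a
#define MSR_IA32_MISC_ENABLE     0x1a0 /* stands in for an always-synced MSR */

/* Hypothetical helper: true when the guest CPUID advertises VMX or SMX. */
static bool guest_has_feature_control(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & (CPUID_EXT_VMX | CPUID_EXT_SMX)) != 0;
}

/*
 * Hypothetical helper: write the MSR indices to synchronize into 'out'
 * (at most 'cap' entries) and return how many were written.
 * IA32_FEATURE_CONTROL is listed only when the guest can see VMX or SMX.
 */
static size_t build_msr_list(uint32_t cpuid1_ecx, uint32_t *out, size_t cap)
{
    size_t n = 0;

    if (n < cap)
        out[n++] = MSR_IA32_MISC_ENABLE;
    if (guest_has_feature_control(cpuid1_ecx) && n < cap)
        out[n++] = MSR_IA32_FEATURE_CONTROL;
    return n;
}
```

With an ECX value that has neither bit set, the list holds one entry; setting either bit adds IA32_FEATURE_CONTROL as the second entry.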
[PATCH 01/12] Subject: [PATCH 01/10] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
Recent KVM, since http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577 switches the EFER MSR when EPT is used and the host and guest have different NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2) and want to be able to run recent KVM as L1, we need to allow L1 to use this EFER switching feature.

To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available, and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds support for the former (the latter is still unsupported).

Nested entry and exit emulation (prepare_vmcs_02 and load_vmcs12_host_state, respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all that's left to do in this patch is to properly advertise this feature to L1.

Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using vmx_set_efer (which itself sets one of several vmcs02 fields), so we always support this feature, regardless of whether the host supports it.

Signed-off-by: Nadav Har'El n...@il.ibm.com
Signed-off-by: Jun Nakajima jun.nakaj...@intel.com

	modified: arch/x86/kvm/vmx.c
---
 arch/x86/kvm/vmx.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6667042..9e0ec9d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2057,6 +2057,7 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #else
 	nested_vmx_exit_ctls_high = 0;
 #endif
+	nested_vmx_exit_ctls_high |= VM_EXIT_LOAD_IA32_EFER;
 
 	/* entry controls */
 	rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
@@ -2064,6 +2065,7 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 	nested_vmx_entry_ctls_low = 0;
 	nested_vmx_entry_ctls_high &=
 		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_IA32E_MODE;
+	nested_vmx_entry_ctls_high |= VM_ENTRY_LOAD_IA32_EFER;
 
 	/* cpu-based controls */
 	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
@@ -7050,10 +7052,18 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	vcpu->arch.cr0_guest_owned_bits &= ~vmcs12->cr0_guest_host_mask;
 	vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits);
 
-	/* Note: IA32_MODE, LOAD_IA32_EFER are modified by vmx_set_efer below */
-	vmcs_write32(VM_EXIT_CONTROLS,
-		vmcs12->vm_exit_controls | vmcs_config.vmexit_ctrl);
-	vmcs_write32(VM_ENTRY_CONTROLS, vmcs12->vm_entry_controls |
+	/* L2->L1 exit controls are emulated - the hardware exit is to L0 so
+	 * we should use its exit controls. Note that IA32_MODE, LOAD_IA32_EFER
+	 * bits are further modified by vmx_set_efer() below.
+	 */
+	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
+
+	/* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
+	 * emulated by vmx_set_efer(), below.
+	 */
+	vmcs_write32(VM_ENTRY_CONTROLS,
+		(vmcs12->vm_entry_controls & ~VM_ENTRY_LOAD_IA32_EFER &
+			~VM_ENTRY_IA32E_MODE) |
 		(vmcs_config.vmentry_ctrl & ~VM_ENTRY_IA32E_MODE));
 
 	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT)
-- 
1.8.2.1.610.g562af5b
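The entry-control masking in the last hunk above can be checked in isolation. This is a minimal user-space sketch, not kernel code: merge_entry_controls() is a hypothetical name, and the three bit values are assumed to match the kernel's VM-entry control definitions.

```c
#include <stdint.h>

/* VM-entry control bits (values assumed to match the kernel's vmx.h). */
#define VM_ENTRY_IA32E_MODE     0x00000200u
#define VM_ENTRY_LOAD_IA32_PAT  0x00004000u
#define VM_ENTRY_LOAD_IA32_EFER 0x00008000u

/*
 * Hypothetical helper mirroring the expression written to VM_ENTRY_CONTROLS:
 * the two bits that L0 emulates on behalf of L1 are stripped from what L1
 * asked for in vmcs12, and IA32E_MODE is stripped from L0's own defaults,
 * leaving both bits to be decided later (by vmx_set_efer() in the patch).
 */
static uint32_t merge_entry_controls(uint32_t vmcs12_ctl, uint32_t l0_ctl)
{
    const uint32_t emulated = VM_ENTRY_LOAD_IA32_EFER | VM_ENTRY_IA32E_MODE;

    return (vmcs12_ctl & ~emulated) | (l0_ctl & ~VM_ENTRY_IA32E_MODE);
}
```

For example, if L1 requests LOAD_IA32_EFER, LOAD_IA32_PAT and IA32E_MODE while L0 contributes nothing, only LOAD_IA32_PAT survives the merge.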
[PATCH 02/12] Subject: [PATCH 02/10] nEPT: Add EPT tables support to paging_tmpl.h
This is the first patch in a series which adds nested EPT support to KVM's nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can use EPT when running a nested guest L2. When L1 uses EPT, it allows the L2 guest to set its own cr3 and take its own page faults without either of L0 or L1 getting involved. This often significantly improves L2's performance over the previous two alternatives (shadow page tables over EPT, and shadow page tables over shadow page tables).

This patch adds EPT support to paging_tmpl.h.

paging_tmpl.h contains the code for reading and writing page tables. The code for 32-bit and 64-bit tables is very similar, but not identical, so paging_tmpl.h is #include'd twice in mmu.c, once with PTTYPE=32 and once with PTTYPE=64, and this generates the two sets of similar functions.

There are subtle but important differences between the format of EPT tables and that of ordinary x86 64-bit page tables, so for nested EPT we need a third set of functions to read the guest EPT table and to write the shadow EPT table.

So this patch adds a third PTTYPE, PTTYPE_EPT, which creates functions (prefixed with EPT) which correctly read and write EPT tables.

Signed-off-by: Nadav Har'El n...@il.ibm.com
Signed-off-by: Jun Nakajima jun.nakaj...@intel.com

	modified: arch/x86/kvm/mmu.c
	modified: arch/x86/kvm/paging_tmpl.h
---
 arch/x86/kvm/mmu.c         |   5 ++
 arch/x86/kvm/paging_tmpl.h | 135 ++---
 2 files changed, 131 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 956ca35..91cac19 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3418,6 +3418,11 @@ static inline bool is_last_gpte(struct kvm_mmu *mmu, unsigned level, unsigned gp
 	return mmu->last_pte_bitmap & (1 << index);
 }
 
+#define PTTYPE_EPT 18 /* arbitrary */
+#define PTTYPE PTTYPE_EPT
+#include "paging_tmpl.h"
+#undef PTTYPE
+
 #define PTTYPE 64
 #include "paging_tmpl.h"
 #undef PTTYPE
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 105dd5b..6226b51 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -50,6 +50,22 @@
 	#define PT_LEVEL_BITS PT32_LEVEL_BITS
 	#define PT_MAX_FULL_LEVELS 2
 	#define CMPXCHG cmpxchg
+#elif PTTYPE == PTTYPE_EPT
+	#define pt_element_t u64
+	#define guest_walker guest_walkerEPT
+	#define FNAME(name) EPT_##name
+	#define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
+	#define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
+	#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
+	#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
+	#define PT_LEVEL_BITS PT64_LEVEL_BITS
+	#ifdef CONFIG_X86_64
+	#define PT_MAX_FULL_LEVELS 4
+	#define CMPXCHG cmpxchg
+	#else
+	#define CMPXCHG cmpxchg64
+	#define PT_MAX_FULL_LEVELS 2
+	#endif
 #else
 	#error Invalid PTTYPE value
 #endif
@@ -80,6 +96,7 @@ static gfn_t gpte_to_gfn_lvl(pt_element_t gpte, int lvl)
 	return (gpte & PT_LVL_ADDR_MASK(lvl)) >> PAGE_SHIFT;
 }
 
+#if PTTYPE != PTTYPE_EPT
 static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			       pt_element_t __user *ptep_user, unsigned index,
 			       pt_element_t orig_pte, pt_element_t new_pte)
@@ -102,7 +119,52 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 
 	return (ret != orig_pte);
 }
+#endif
+
+static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, u64 gpte)
+{
+	unsigned access;
+
+#if PTTYPE == PTTYPE_EPT
+	/* We rely here that ACC_WRITE_MASK==VMX_EPT_WRITABLE_MASK */
+	access = (gpte & VMX_EPT_WRITABLE_MASK) | ACC_USER_MASK |
+		((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0);
+#else
+	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
+	access &= ~(gpte >> PT64_NX_SHIFT);
+#endif
+
+	return access;
+}
+
+static inline int FNAME(is_present_gpte)(unsigned long pte)
+{
+#if PTTYPE == PTTYPE_EPT
+	return pte & (VMX_EPT_READABLE_MASK | VMX_EPT_WRITABLE_MASK |
+			VMX_EPT_EXECUTABLE_MASK);
+#else
+	return is_present_gpte(pte);
+#endif
+}
+
+static inline int FNAME(check_write_user_access)(struct kvm_vcpu *vcpu,
+					   bool write_fault, bool user_fault,
+					   unsigned long pte)
+{
+#if PTTYPE == PTTYPE_EPT
+	if (unlikely(write_fault && !(pte & VMX_EPT_WRITABLE_MASK) &&
+		     (user_fault || is_write_protection(vcpu))))
+		return false;
+	return true;
+#else
+	u32 access = ((kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0)
+		| (write_fault ? PFERR_WRITE_MASK : 0);
+
+	return !permission_fault(vcpu->arch.walk_mmu, vcpu->arch.access, access);
+#endif
+}
 
+#if PTTYPE != PTTYPE_EPT
 static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
 					     struct kvm_mmu *mmu,
 					     struct guest_walker *walker,
@@ -139,6 +201,7 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
 	}
 	return 0;
 }
+#endif
 
 /*
  * Fetch a guest pte for a guest virtual address
@@ -147,7 +210,6 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 				    struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
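To make the format difference concrete, here is a small user-space sketch of the two gpte_access variants and the EPT presence check from the patch above. It writes the two variants out as separate functions instead of generating them from a twice-included template header, so it illustrates the bit handling only, not the PTTYPE mechanism. The function names are hypothetical, and the mask values are assumed to match the kernel's definitions.

```c
#include <stdint.h>

/* KVM access bits (values assumed to match the kernel's ACC_* masks). */
#define ACC_EXEC_MASK  1u
#define ACC_WRITE_MASK 2u
#define ACC_USER_MASK  4u

/* Ordinary x86 64-bit PTE bits. */
#define PT_WRITABLE_MASK 2ull
#define PT_USER_MASK     4ull
#define PT64_NX_SHIFT    63

/* EPT entry permission bits: read, write, execute. */
#define VMX_EPT_READABLE_MASK   1ull
#define VMX_EPT_WRITABLE_MASK   2ull
#define VMX_EPT_EXECUTABLE_MASK 4ull

/* Hypothetical name: access bits derived from an ordinary 64-bit PTE. */
static unsigned paging64_access(uint64_t gpte)
{
    unsigned access = (unsigned)(gpte & (PT_WRITABLE_MASK | PT_USER_MASK))
                      | ACC_EXEC_MASK;

    /* The NX bit (bit 63) lands on bit 0 and clears ACC_EXEC_MASK. */
    access &= ~(unsigned)(gpte >> PT64_NX_SHIFT);
    return access;
}

/* Hypothetical name: access bits derived from an EPT entry. */
static unsigned ept_access(uint64_t gpte)
{
    /* Relies on ACC_WRITE_MASK == VMX_EPT_WRITABLE_MASK, as the patch notes.
     * EPT has no user/supervisor bit, so user access is always granted. */
    return (unsigned)(gpte & VMX_EPT_WRITABLE_MASK) | ACC_USER_MASK |
           ((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0);
}

/* Hypothetical name: an EPT entry is present if any of R/W/X is set. */
static int ept_is_present(uint64_t pte)
{
    return (pte & (VMX_EPT_READABLE_MASK | VMX_EPT_WRITABLE_MASK |
                   VMX_EPT_EXECUTABLE_MASK)) != 0;
}
```

Note how the same bit position means different things in the two formats: bit 2 is "user" in an ordinary PTE but "executable" in an EPT entry, which is why a shared reader would misinterpret one of them.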
[PATCH 03/12] Subject: [PATCH 03/10] nEPT: MMU context for nested EPT
KVM's existing shadow MMU code already supports nested TDP. To use it, we need to set up a new MMU context for nested EPT, and create a few callbacks for it (nested_ept_*()). This context should also use the EPT versions of the page table access functions (defined in the previous patch). Then, we need to switch back and forth between this nested context and the regular MMU context when switching between L1 and L2 (when L1 runs this L2 with EPT).

Signed-off-by: Nadav Har'El n...@il.ibm.com
Signed-off-by: Jun Nakajima jun.nakaj...@intel.com

	modified: arch/x86/kvm/mmu.c
	modified: arch/x86/kvm/mmu.h
	modified: arch/x86/kvm/vmx.c
---
 arch/x86/kvm/mmu.c | 38
 arch/x86/kvm/mmu.h |  1 +
 arch/x86/kvm/vmx.c | 56 +++---
 3 files changed, 92 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 91cac19..34e406e2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3674,6 +3674,44 @@ int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
 
+int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
+{
+	ASSERT(vcpu);
+	ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
+
+	context->nx = is_nx(vcpu); /* TODO: ? */
+	context->new_cr3 = paging_new_cr3;
+	context->page_fault = EPT_page_fault;
+	context->gva_to_gpa = EPT_gva_to_gpa;
+	context->sync_page = EPT_sync_page;
+	context->invlpg = EPT_invlpg;
+	context->update_pte = EPT_update_pte;
+	context->free = paging_free;
+	context->root_level = context->shadow_root_level;
+	context->root_hpa = INVALID_PAGE;
+	context->direct_map = false;
+
+	/* TODO: reset_rsvds_bits_mask() is not built for EPT, we need
+	   something different.
+	 */
+	reset_rsvds_bits_mask(vcpu, context);
+
+
+	/* TODO: I copied these from kvm_init_shadow_mmu, I don't know why
+	   they are done, or why they write to vcpu->arch.mmu and not context
+	 */
+	vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
+	vcpu->arch.mmu.base_role.cr0_wp = is_write_protection(vcpu);
+	vcpu->arch.mmu.base_role.smep_andnot_wp =
+		kvm_read_cr4_bits(vcpu, X86_CR4_SMEP) &&
+		!is_write_protection(vcpu);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_init_shadow_EPT_mmu);
+
 static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
 	int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 6987108..19dd5ab 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -54,6 +54,7 @@ int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
 int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct);
 int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
+int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9e0ec9d..f2fd79d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -912,12 +912,16 @@ static inline bool nested_cpu_has2(struct vmcs12 *vmcs12, u32 bit)
 		(vmcs12->secondary_vm_exec_control & bit);
 }
 
-static inline bool nested_cpu_has_virtual_nmis(struct vmcs12 *vmcs12,
-	struct kvm_vcpu *vcpu)
+static inline bool nested_cpu_has_virtual_nmis(struct vmcs12 *vmcs12)
 {
 	return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12)
+{
+	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT);
+}
+
 static inline bool is_exception(u32 intr_info)
 {
 	return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
@@ -6873,6 +6877,46 @@ static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
 		entry->ecx |= bit(X86_FEATURE_VMX);
 }
 
+/* Callbacks for nested_ept_init_mmu_context: */
+
+static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
+{
+	/* return the page table to be shadowed - in our case, EPT12 */
+	return get_vmcs12(vcpu)->ept_pointer;
+}
+
+static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
+	struct x86_exception *fault)
+{
+	struct vmcs12 *vmcs12;
+	nested_vmx_vmexit(vcpu);
+	vmcs12 = get_vmcs12(vcpu);
+	/*
+	 * Note no need to set vmcs12->vm_exit_reason as it is already copied
+	 * from vmcs02 in nested_vmx_vmexit() above, i.e., EPT_VIOLATION.
+	 */
+	vmcs12->exit_qualification = fault->error_code;
+	vmcs12->guest_physical_address = fault->address;
+}
+
+static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
+{
+	int r = kvm_init_shadow_EPT_mmu(vcpu, &vcpu->arch.mmu);
+
+	vcpu->arch.mmu.set_cr3 = vmx_set_cr3;
+	vcpu->arch.mmu.get_cr3 = nested_ept_get_cr3;
+	vcpu->arch.mmu.inject_page_fault = nested_ept_inject_page_fault;
+
+	vcpu->arch.walk_mmu = &vcpu->arch.nested_mmu;
+
+	return r;
+}
+
+static void
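The context switching described in this patch boils down to swapping a callback. Below is a toy model under stated assumptions: every type and function whose name starts with toy_ is invented for illustration and does not exist in KVM, and only the get_cr3 callback is modelled. It shows the root pointer seen by the shadow MMU changing from L1's cr3 to the EPT pointer stored in vmcs12 when L1 runs L2 with EPT.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_vcpu;

/* Toy MMU context: just the one callback that the sketch needs. */
struct toy_mmu {
    uint64_t (*get_cr3)(const struct toy_vcpu *);
};

/* Toy vcpu state (hypothetical fields, loosely named after the patch). */
struct toy_vcpu {
    uint64_t l1_cr3;             /* page-table root while L1 runs      */
    uint64_t vmcs12_ept_pointer; /* EPT12: the table L1 built for L2   */
    bool l1_uses_ept;            /* did L1 enable EPT for this L2?     */
    struct toy_mmu mmu;
};

static uint64_t toy_regular_get_cr3(const struct toy_vcpu *v)
{
    return v->l1_cr3;
}

/* Counterpart of nested_ept_get_cr3(): the table to shadow is EPT12. */
static uint64_t toy_nested_ept_get_cr3(const struct toy_vcpu *v)
{
    return v->vmcs12_ept_pointer;
}

/* On L1 -> L2 entry, install the nested context only if L1 uses EPT. */
static void toy_enter_l2(struct toy_vcpu *v)
{
    if (v->l1_uses_ept)
        v->mmu.get_cr3 = toy_nested_ept_get_cr3;
}

/* On L2 -> L1 exit, switch back to the regular context. */
static void toy_exit_to_l1(struct toy_vcpu *v)
{
    v->mmu.get_cr3 = toy_regular_get_cr3;
}
```

The real patch installs several more callbacks (set_cr3, inject_page_fault, and the EPT_* page-table accessors); the sketch leaves them out to keep the switch itself visible.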
[PATCH 04/12] Subject: [PATCH 04/10] nEPT: Fix cr3 handling in nested exit and entry
The existing code for handling cr3 and related VMCS fields during nested exit and entry wasn't correct in all cases:

If L2 is allowed to control cr3 (and this is indeed the case in nested EPT), during nested exit we must copy the modified cr3 from vmcs02 to vmcs12, and we forgot to do so. This patch adds this copy.

If L0 isn't controlling cr3 when running L2 (i.e., L0 is using EPT), and whoever does control cr3 (L1 or L2) is using PAE, the processor might have saved PDPTEs and we should also save them in vmcs12 (and restore later).

Signed-off-by: Nadav Har'El n...@il.ibm.com
Signed-off-by: Jun Nakajima jun.nakaj...@intel.com

	modified: arch/x86/kvm/vmx.c
---
 arch/x86/kvm/vmx.c | 37 -
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f2fd79d..d4bfd32 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7162,10 +7162,26 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	vmx_set_cr4(vcpu, vmcs12->guest_cr4);
 	vmcs_writel(CR4_READ_SHADOW, nested_read_cr4(vmcs12));
 
-	/* shadow page tables on either EPT or shadow page tables */
+	/*
+	 * Note that kvm_set_cr3() and kvm_mmu_reset_context() will do the
+	 * right thing, and set GUEST_CR3 and/or EPT_POINTER in all supported
+	 * settings: 1. shadow page tables on shadow page tables, 2. shadow
+	 * page tables on EPT, 3. EPT on EPT.
+	 */
 	kvm_set_cr3(vcpu, vmcs12->guest_cr3);
 	kvm_mmu_reset_context(vcpu);
 
+	/*
+	 * Additionally, except when L0 is using shadow page tables, L1 or
+	 * L2 control guest_cr3 for L2, so they may also have saved PDPTEs
+	 */
+	if (enable_ept) {
+		vmcs_write64(GUEST_PDPTR0, vmcs12->guest_pdptr0);
+		vmcs_write64(GUEST_PDPTR1, vmcs12->guest_pdptr1);
+		vmcs_write64(GUEST_PDPTR2, vmcs12->guest_pdptr2);
+		vmcs_write64(GUEST_PDPTR3, vmcs12->guest_pdptr3);
+	}
+
 	kvm_register_write(vcpu, VCPU_REGS_RSP, vmcs12->guest_rsp);
 	kvm_register_write(vcpu, VCPU_REGS_RIP, vmcs12->guest_rip);
 }
@@ -7397,6 +7413,25 @@ void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	vmcs12->guest_pending_dbg_exceptions =
 		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
 
+	/*
+	 * In some cases (usually, nested EPT), L2 is allowed to change its
+	 * own CR3 without exiting. If it has changed it, we must keep it.
+	 * Of course, if L0 is using shadow page tables, GUEST_CR3 was defined
+	 * by L0, not L1 or L2, so we mustn't unconditionally copy it to vmcs12.
+	 */
+	if (enable_ept)
+		vmcs12->guest_cr3 = vmcs_read64(GUEST_CR3);
+	/*
+	 * Additionally, except when L0 is using shadow page tables, L1 or
+	 * L2 control guest_cr3 for L2, so save their PDPTEs
+	 */
+	if (enable_ept) {
+		vmcs12->guest_pdptr0 = vmcs_read64(GUEST_PDPTR0);
+		vmcs12->guest_pdptr1 = vmcs_read64(GUEST_PDPTR1);
+		vmcs12->guest_pdptr2 = vmcs_read64(GUEST_PDPTR2);
+		vmcs12->guest_pdptr3 = vmcs_read64(GUEST_PDPTR3);
+	}
+
 	/* TODO: These cannot have changed unless we have MSR bitmaps and
 	 * the relevant bit asks not to trap the change */
 	vmcs12->guest_ia32_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
-- 
1.8.2.1.610.g562af5b
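The save-on-exit rule from the second hunk above can be modelled in a few lines. This is a sketch with invented names (struct toy_vmcs and save_l2_paging_state() are hypothetical, and a plain struct stands in for the hardware VMCS that the kernel reads with vmcs_read64()): the guest's cr3 and PDPTEs are copied back to vmcs12 only when L0 uses EPT, because otherwise the hardware values belong to L0's shadow page tables.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for the paging-related guest fields of a VMCS. */
struct toy_vmcs {
    uint64_t guest_cr3;
    uint64_t pdptr[4];
};

/*
 * Hypothetical helper: on nested exit, copy L2's paging state from the
 * hardware-facing vmcs02 into vmcs12, but only when L0 runs L2 on EPT.
 * With shadow paging in L0, GUEST_CR3 was chosen by L0 and must not leak
 * into vmcs12.
 */
static void save_l2_paging_state(struct toy_vmcs *vmcs12,
                                 const struct toy_vmcs *vmcs02,
                                 bool l0_uses_ept)
{
    if (!l0_uses_ept)
        return;

    vmcs12->guest_cr3 = vmcs02->guest_cr3;
    memcpy(vmcs12->pdptr, vmcs02->pdptr, sizeof(vmcs12->pdptr));
}
```

Calling the helper with l0_uses_ept set to false leaves vmcs12 untouched, which corresponds to the "mustn't unconditionally copy" remark in the patch comment.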
[PATCH 05/12] Subject: [PATCH 05/10] nEPT: Fix wrong test in kvm_set_cr3
kvm_set_cr3() attempts to check if the new cr3 is a valid guest physical address. The problem is that with nested EPT, cr3 is an *L2* physical address, not an L1 physical address as this test expects.

As the comment above this test explains, it isn't necessary, and doesn't correspond to anything a real processor would do. So this patch removes it.

Note that this wrong test could have also theoretically caused problems in nested NPT, not just in nested EPT. However, in practice, the problem was avoided: nested_svm_vmexit()/vmrun() do not call kvm_set_cr3 in the nested NPT case, and instead set the vmcb (and arch.cr3) directly, thus circumventing the problem. Additional potential calls to the buggy function are avoided in that we don't trap cr3 modifications when nested NPT is enabled. However, because in nested VMX we did want to use kvm_set_cr3() (as requested in Avi Kivity's review of the original nested VMX patches), we can't avoid this problem and need to fix it.

Signed-off-by: Nadav Har'El n...@il.ibm.com
Signed-off-by: Jun Nakajima jun.nakaj...@intel.com

	modified: arch/x86/kvm/x86.c
---
 arch/x86/kvm/x86.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e172132..c34590d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -659,17 +659,6 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 		 */
 	}
 
-	/*
-	 * Does the new cr3 value map to physical memory? (Note, we
-	 * catch an invalid cr3 even in real-mode, because it would
-	 * cause trouble later on when we turn on paging anyway.)
-	 *
-	 * A real CPU would silently accept an invalid cr3 and would
-	 * attempt to use it - with largely undefined (and often hard
-	 * to debug) behavior on the guest side.
-	 */
-	if (unlikely(!gfn_to_memslot(vcpu->kvm, cr3 >> PAGE_SHIFT)))
-		return 1;
 	vcpu->arch.cr3 = cr3;
 	__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
 	vcpu->arch.mmu.new_cr3(vcpu);
-- 
1.8.2.1.610.g562af5b
[PATCH 06/12] Subject: [PATCH 06/10] nEPT: Some additional comments
Some additional comments to preexisting code:
Explain who (L0 or L1) handles EPT violation and misconfiguration exits.
Don't mention "shadow on either EPT or shadow" as the only two options.

Signed-off-by: Nadav Har'El <n...@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakaj...@intel.com>

modified: arch/x86/kvm/vmx.c
---
 arch/x86/kvm/vmx.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d4bfd32..0e99b15 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6126,7 +6126,20 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 		return nested_cpu_has2(vmcs12,
 			SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
 	case EXIT_REASON_EPT_VIOLATION:
+		/*
+		 * L0 always deals with the EPT violation. If nested EPT is
+		 * used, and the nested mmu code discovers that the address is
+		 * missing in the guest EPT table (EPT12), the EPT violation
+		 * will be injected with nested_ept_inject_page_fault()
+		 */
+		return 0;
 	case EXIT_REASON_EPT_MISCONFIG:
+		/*
+		 * L2 never uses directly L1's EPT, but rather L0's own EPT
+		 * table (shadow on EPT) or a merged EPT table that L0 built
+		 * (EPT on EPT). So any problems with the structure of the
+		 * table is L0's fault.
+		 */
 		return 0;
 	case EXIT_REASON_WBINVD:
 		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_WBINVD_EXITING);
-- 
1.8.2.1.610.g562af5b
[PATCH 07/12] Subject: [PATCH 07/10] nEPT: Advertise EPT to L1
Advertise the support of EPT to the L1 guest, through the appropriate MSR.

This is the last patch of the basic Nested EPT feature, so as to allow
bisection through this patch series: The guest will not see EPT support until
this last patch, and will not attempt to use the half-applied feature.

Signed-off-by: Nadav Har'El <n...@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakaj...@intel.com>

modified: arch/x86/kvm/vmx.c
---
 arch/x86/kvm/vmx.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0e99b15..a5e14d1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2026,6 +2026,7 @@ static u32 nested_vmx_secondary_ctls_low, nested_vmx_secondary_ctls_high;
 static u32 nested_vmx_pinbased_ctls_low, nested_vmx_pinbased_ctls_high;
 static u32 nested_vmx_exit_ctls_low, nested_vmx_exit_ctls_high;
 static u32 nested_vmx_entry_ctls_low, nested_vmx_entry_ctls_high;
+static u32 nested_vmx_ept_caps;
 static __init void nested_vmx_setup_ctls_msrs(void)
 {
 	/*
@@ -2101,6 +2102,18 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 	nested_vmx_secondary_ctls_low = 0;
 	nested_vmx_secondary_ctls_high &=
 		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+	if (enable_ept) {
+		/* nested EPT: emulate EPT also to L1 */
+		nested_vmx_secondary_ctls_high |= SECONDARY_EXEC_ENABLE_EPT;
+		nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT;
+		nested_vmx_ept_caps |=
+			VMX_EPT_INVEPT_BIT | VMX_EPT_EXTENT_GLOBAL_BIT |
+			VMX_EPT_EXTENT_CONTEXT_BIT |
+			VMX_EPT_EXTENT_INDIVIDUAL_BIT;
+		nested_vmx_ept_caps &= vmx_capability.ept;
+	} else
+		nested_vmx_ept_caps = 0;
+
 }
 
 static inline bool vmx_control_verify(u32 control, u32 low, u32 high)
@@ -2200,8 +2213,8 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 					nested_vmx_secondary_ctls_high);
 		break;
 	case MSR_IA32_VMX_EPT_VPID_CAP:
-		/* Currently, no nested ept or nested vpid */
-		*pdata = 0;
+		/* Currently, no nested vpid support */
+		*pdata = nested_vmx_ept_caps;
 		break;
 	default:
 		return 0;
-- 
1.8.2.1.610.g562af5b
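The capability masking done in nested_vmx_setup_ctls_msrs() can be looked at in isolation. What follows is only a user-space sketch under stated assumptions, not the kernel code: advertised_ept_caps() is a hypothetical helper, the EPT_* names are local stand-ins for the VMX_EPT_* macros, and the value of the page-walk-4 bit (bit 6) is assumed from the kernel headers rather than taken from this patch.

```c
#include <stdint.h>

/* Local stand-ins for the VMX_EPT_* bits used by this series. */
#define EPT_PAGE_WALK_4_BIT        (1ull << 6)   /* assumed value */
#define EPT_INVEPT_BIT             (1ull << 20)
#define EPT_EXTENT_INDIVIDUAL_BIT  (1ull << 24)
#define EPT_EXTENT_CONTEXT_BIT     (1ull << 25)
#define EPT_EXTENT_GLOBAL_BIT      (1ull << 26)

/*
 * Hypothetical helper: compute the EPT capabilities L0 would report to
 * L1, given whether L0 itself runs with EPT and what the host reports.
 */
static uint64_t advertised_ept_caps(int enable_ept, uint64_t host_caps)
{
	uint64_t wanted;

	/* Without EPT in L0 there is nothing to emulate for L1. */
	if (!enable_ept)
		return 0;

	wanted = EPT_PAGE_WALK_4_BIT | EPT_INVEPT_BIT |
		 EPT_EXTENT_GLOBAL_BIT | EPT_EXTENT_CONTEXT_BIT |
		 EPT_EXTENT_INDIVIDUAL_BIT;

	/* Never advertise a capability the host does not have. */
	return wanted & host_caps;
}
```

The point of the final AND is that the set of bits offered to L1 is the intersection of what nested EPT implements and what the hardware reports, so a bit such as A/D support (bit 21) is never passed through just because the host has it.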
[PATCH 08/12] Subject: [PATCH 08/10] nEPT: Nested INVEPT
If we let L1 use EPT, we should probably also support the INVEPT instruction.

In our current nested EPT implementation, when L1 changes its EPT table for
L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in the course
of this modification already calls INVEPT. Therefore, when L1 calls INVEPT,
we don't really need to do anything. In particular we *don't* need to call
the real INVEPT again. All we do in our INVEPT is verify the validity of the
call, and its parameters, and then do nothing.

In KVM Forum 2010, Dong et al. presented "Nested Virtualization Friendly KVM"
and classified our current nested EPT implementation as "shadow-like virtual
EPT". He recommended instead a different approach, which he called "VTLB-like
virtual EPT". If we had taken that alternative approach, INVEPT would have had
a bigger role: L0 would only rebuild the shadow EPT table when L1 calls INVEPT.

Signed-off-by: Nadav Har'El <n...@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakaj...@intel.com>

modified: arch/x86/include/asm/vmx.h
modified: arch/x86/kvm/vmx.c
---
 arch/x86/include/asm/vmx.h | 4 ++-
 arch/x86/kvm/vmx.c | 83 ++
 2 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index b6fbf86..0ce54f3 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -376,7 +376,9 @@ enum vmcs_field {
 #define VMX_EPTP_WB_BIT				(1ull << 14)
 #define VMX_EPT_2MB_PAGE_BIT			(1ull << 16)
 #define VMX_EPT_1GB_PAGE_BIT			(1ull << 17)
-#define VMX_EPT_AD_BIT				(1ull << 21)
+#define VMX_EPT_INVEPT_BIT			(1ull << 20)
+#define VMX_EPT_AD_BIT				(1ull << 21)
+#define VMX_EPT_EXTENT_INDIVIDUAL_BIT		(1ull << 24)
 #define VMX_EPT_EXTENT_CONTEXT_BIT		(1ull << 25)
 #define VMX_EPT_EXTENT_GLOBAL_BIT		(1ull << 26)
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a5e14d1..10f2a69 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5878,6 +5878,87 @@ static int handle_vmptrst(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+/* Emulate the INVEPT instruction */
+static int handle_invept(struct kvm_vcpu *vcpu)
+{
+	u32 vmx_instruction_info;
+	unsigned long type;
+	gva_t gva;
+	struct x86_exception e;
+	struct {
+		u64 eptp, gpa;
+	} operand;
+
+	if (!(nested_vmx_secondary_ctls_high & SECONDARY_EXEC_ENABLE_EPT) ||
+	    !(nested_vmx_ept_caps & VMX_EPT_INVEPT_BIT)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	if (!nested_vmx_check_permission(vcpu))
+		return 1;
+
+	if (!kvm_read_cr0_bits(vcpu, X86_CR0_PE)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	/* According to the Intel VMX instruction reference, the memory
+	 * operand is read even if it isn't needed (e.g., for type==global)
+	 */
+	vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+	if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION),
+			vmx_instruction_info, &gva))
+		return 1;
+	if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &operand,
+				sizeof(operand), &e)) {
+		kvm_inject_page_fault(vcpu, &e);
+		return 1;
+	}
+
+	type = kvm_register_read(vcpu, (vmx_instruction_info >> 28) & 0xf);
+
+	switch (type) {
+	case VMX_EPT_EXTENT_GLOBAL:
+		if (!(nested_vmx_ept_caps & VMX_EPT_EXTENT_GLOBAL_BIT))
+			nested_vmx_failValid(vcpu,
+				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+		else {
+			/*
+			 * Do nothing: when L1 changes EPT12, we already
+			 * update EPT02 (the shadow EPT table) and call INVEPT.
+			 * So when L1 calls INVEPT, there's nothing left to do.
+			 */
+			nested_vmx_succeed(vcpu);
+		}
+		break;
+	case VMX_EPT_EXTENT_CONTEXT:
+		if (!(nested_vmx_ept_caps & VMX_EPT_EXTENT_CONTEXT_BIT))
+			nested_vmx_failValid(vcpu,
+				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+		else {
+			/* Do nothing */
+			nested_vmx_succeed(vcpu);
+		}
+		break;
+	case VMX_EPT_EXTENT_INDIVIDUAL_ADDR:
+		if (!(nested_vmx_ept_caps & VMX_EPT_EXTENT_INDIVIDUAL_BIT))
+			nested_vmx_failValid(vcpu,
+				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+		else {
+			/* Do nothing */
+			nested_vmx_succeed(vcpu);
+		}
+		break;
+	default:
+		nested_vmx_failValid(vcpu,
+			VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+	}
+
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume. Otherwise they set the kvm_run parameter to indicate what needs
@@ -5922,6 +6003,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause,
 	[EXIT_REASON_MWAIT_INSTRUCTION] = handle_invalid_op,
 	[EXIT_REASON_MONITOR_INSTRUCTION] = handle_invalid_op,
+	[EXIT_REASON_INVEPT] = handle_invept,
 };
 
 static const int kvm_vmx_max_exit_handlers =
@@ -6106,6 +6188,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 	case EXIT_REASON_VMPTRST: case EXIT_REASON_VMREAD:
 	case EXIT_REASON_VMRESUME: case EXIT_REASON_VMWRITE:
 	case EXIT_REASON_VMOFF: case EXIT_REASON_VMON:
+	case EXIT_REASON_INVEPT:
 		/*
 		 * VMX instructions trap unconditionally. This allows
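Since the emulated INVEPT only validates its arguments and then does nothing, the decision logic of handle_invept() can be modelled as a pure function. The sketch below is a user-space illustration, not the kernel handler: emulate_invept() and the result enum are hypothetical names, the extent-type encodings 0/1/2 are assumed to match the kernel's VMX_EPT_EXTENT_* constants, and the VMX-permission, CR0.PE and operand-read steps of the real handler are left out.

```c
#include <stdint.h>

/* Capability bits, as defined in asm/vmx.h by this patch. */
#define EPT_INVEPT_BIT             (1ull << 20)
#define EPT_EXTENT_INDIVIDUAL_BIT  (1ull << 24)
#define EPT_EXTENT_CONTEXT_BIT     (1ull << 25)
#define EPT_EXTENT_GLOBAL_BIT      (1ull << 26)

/* INVEPT type encodings (assumed values). */
enum invept_type {
	INVEPT_INDIVIDUAL_ADDR = 0,
	INVEPT_CONTEXT = 1,
	INVEPT_GLOBAL = 2,
};

/* Possible outcomes of the emulated instruction (hypothetical enum). */
enum invept_result {
	INVEPT_UD,          /* instruction not offered to L1: inject #UD */
	INVEPT_FAIL_VALID,  /* VMfailValid, invalid operand */
	INVEPT_SUCCEED,     /* VMsucceed; nothing is actually flushed */
};

static enum invept_result emulate_invept(uint64_t caps, unsigned long type)
{
	uint64_t needed;

	if (!(caps & EPT_INVEPT_BIT))
		return INVEPT_UD;

	/* Map the requested extent to the capability bit that permits it. */
	switch (type) {
	case INVEPT_GLOBAL:
		needed = EPT_EXTENT_GLOBAL_BIT;
		break;
	case INVEPT_CONTEXT:
		needed = EPT_EXTENT_CONTEXT_BIT;
		break;
	case INVEPT_INDIVIDUAL_ADDR:
		needed = EPT_EXTENT_INDIVIDUAL_BIT;
		break;
	default:
		return INVEPT_FAIL_VALID;
	}

	/* A supported type succeeds without touching any EPT state. */
	return (caps & needed) ? INVEPT_SUCCEED : INVEPT_FAIL_VALID;
}
```

Collapsing the three near-identical switch arms into one table-like mapping is a presentation choice for the sketch; the patch itself spells each case out.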
[PATCH 09/12] Subject: [PATCH 09/10] nEPT: Documentation
Update the documentation to no longer say that nested EPT is not supported.

Signed-off-by: Nadav Har'El <n...@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakaj...@intel.com>

modified: Documentation/virtual/kvm/nested-vmx.txt
---
 Documentation/virtual/kvm/nested-vmx.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/nested-vmx.txt b/Documentation/virtual/kvm/nested-vmx.txt
index 8ed937d..cdf7839 100644
--- a/Documentation/virtual/kvm/nested-vmx.txt
+++ b/Documentation/virtual/kvm/nested-vmx.txt
@@ -38,8 +38,8 @@ The current code supports running Linux guests under KVM guests.
 Only 64-bit guest hypervisors are supported.
 
 Additional patches for running Windows under guest KVM, and Linux under
-guest VMware server, and support for nested EPT, are currently running in
-the lab, and will be sent as follow-on patchsets.
+guest VMware server, are currently running in the lab, and will be sent as
+follow-on patchsets.
 
 
 Running nested VMX
-- 
1.8.2.1.610.g562af5b
[PATCH 10/12] Subject: [PATCH 10/10] nEPT: Miscellaneous cleanups
Some trivial code cleanups not really related to nested EPT. Signed-off-by: Nadav Har'El n...@il.ibm.com Signed-off-by: Jun Nakajima jun.nakaj...@intel.com modified: arch/x86/include/asm/vmx.h modified: arch/x86/kvm/vmx.c --- arch/x86/include/asm/vmx.h | 44 arch/x86/kvm/vmx.c | 3 +-- 2 files changed, 45 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 0ce54f3..5838be1 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -254,6 +254,50 @@ enum vmcs_field { HOST_RIP= 0x6c16, }; +#define VMX_EXIT_REASONS_FAILED_VMENTRY 0x8000 + +#define EXIT_REASON_EXCEPTION_NMI 0 +#define EXIT_REASON_EXTERNAL_INTERRUPT 1 +#define EXIT_REASON_TRIPLE_FAULT2 + +#define EXIT_REASON_PENDING_INTERRUPT 7 +#define EXIT_REASON_NMI_WINDOW 8 +#define EXIT_REASON_TASK_SWITCH 9 +#define EXIT_REASON_CPUID 10 +#define EXIT_REASON_HLT 12 +#define EXIT_REASON_INVD13 +#define EXIT_REASON_INVLPG 14 +#define EXIT_REASON_RDPMC 15 +#define EXIT_REASON_RDTSC 16 +#define EXIT_REASON_VMCALL 18 +#define EXIT_REASON_VMCLEAR 19 +#define EXIT_REASON_VMLAUNCH20 +#define EXIT_REASON_VMPTRLD 21 +#define EXIT_REASON_VMPTRST 22 +#define EXIT_REASON_VMREAD 23 +#define EXIT_REASON_VMRESUME24 +#define EXIT_REASON_VMWRITE 25 +#define EXIT_REASON_VMOFF 26 +#define EXIT_REASON_VMON27 +#define EXIT_REASON_CR_ACCESS 28 +#define EXIT_REASON_DR_ACCESS 29 +#define EXIT_REASON_IO_INSTRUCTION 30 +#define EXIT_REASON_MSR_READ31 +#define EXIT_REASON_MSR_WRITE 32 +#define EXIT_REASON_INVALID_STATE 33 +#define EXIT_REASON_MWAIT_INSTRUCTION 36 +#define EXIT_REASON_MONITOR_INSTRUCTION 39 +#define EXIT_REASON_PAUSE_INSTRUCTION 40 +#define EXIT_REASON_MCE_DURING_VMENTRY 41 +#define EXIT_REASON_TPR_BELOW_THRESHOLD 43 +#define EXIT_REASON_APIC_ACCESS 44 +#define EXIT_REASON_EPT_VIOLATION 48 +#define EXIT_REASON_EPT_MISCONFIG 49 +#define EXIT_REASON_INVEPT 50 +#define EXIT_REASON_WBINVD 54 +#define EXIT_REASON_XSETBV 55 +#define EXIT_REASON_INVPCID 58 + /* * 
Interruption-information format */ diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 10f2a69..95304cc 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -616,7 +616,6 @@ static void nested_release_page_clean(struct page *page) static u64 construct_eptp(unsigned long root_hpa); static void kvm_cpu_vmxon(u64 addr); static void kvm_cpu_vmxoff(void); -static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3); static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr); static void vmx_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); @@ -6320,7 +6319,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) if (unlikely(!cpu_has_virtual_nmis() vmx-soft_vnmi_blocked !(is_guest_mode(vcpu) nested_cpu_has_virtual_nmis( -get_vmcs12(vcpu), vcpu { + get_vmcs12(vcpu) { if (vmx_interrupt_allowed(vcpu)) { vmx-soft_vnmi_blocked = 0; } else if (vmx-vnmi_blocked_time 10LL -- 1.8.2.1.610.g562af5b -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/12] Subject: [PATCH 01/10] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
All the patches are mangled by your email client. Please use git
send-email --thread to send them.

On Thu, Apr 25, 2013 at 12:50:19AM -0700, Nakajima, Jun wrote:
> Recent KVM, since
> http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577
> switch the EFER MSR when EPT is used and the host and guest have different
> NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2)
> and want to be able to run recent KVM as L1, we need to allow L1 to use this
> EFER switching feature.
>
> To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if
> available, and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This
> patch adds support for the former (the latter is still unsupported).
>
> Nested entry and exit emulation (prepare_vmcs02 and load_vmcs12_host_state,
> respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all
> that's left to do in this patch is to properly advertise this feature to L1.
>
> Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using
> vmx_set_efer (which itself sets one of several vmcs02 fields), so we always
> support this feature, regardless of whether the host supports it.
>
> Signed-off-by: Nadav Har'El <n...@il.ibm.com>
> Signed-off-by: Jun Nakajima <jun.nakaj...@intel.com>
>
> modified: arch/x86/kvm/vmx.c
> ---
>  arch/x86/kvm/vmx.c | 18 ++
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 6667042..9e0ec9d 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2057,6 +2057,7 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>  #else
>  	nested_vmx_exit_ctls_high = 0;
>  #endif
> +	nested_vmx_exit_ctls_high |= VM_EXIT_LOAD_IA32_EFER;
>
>  	/* entry controls */
>  	rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
> @@ -2064,6 +2065,7 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>  	nested_vmx_entry_ctls_low = 0;
>  	nested_vmx_entry_ctls_high &=
>  		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_IA32E_MODE;
> +	nested_vmx_entry_ctls_high |= VM_ENTRY_LOAD_IA32_EFER;
>
>  	/* cpu-based controls */
>  	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
> @@ -7050,10 +7052,18 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  	vcpu->arch.cr0_guest_owned_bits &= ~vmcs12->cr0_guest_host_mask;
>  	vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits);
>
> -	/* Note: IA32_MODE, LOAD_IA32_EFER are modified by vmx_set_efer below */
> -	vmcs_write32(VM_EXIT_CONTROLS,
> -		vmcs12->vm_exit_controls | vmcs_config.vmexit_ctrl);
> -	vmcs_write32(VM_ENTRY_CONTROLS, vmcs12->vm_entry_controls |
> +	/* L2->L1 exit controls are emulated - the hardware exit is to L0 so
> +	 * we should use its exit controls. Note that IA32_MODE, LOAD_IA32_EFER
> +	 * bits are further modified by vmx_set_efer() below.
> +	 */
> +	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
> +
> +	/* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
> +	 * emulated by vmx_set_efer(), below.
> +	 */
> +	vmcs_write32(VM_ENTRY_CONTROLS,
> +		(vmcs12->vm_entry_controls & ~VM_ENTRY_LOAD_IA32_EFER &
> +			~VM_ENTRY_IA32E_MODE) |
>  		(vmcs_config.vmentry_ctrl & ~VM_ENTRY_IA32E_MODE));
>
>  	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT)
> --
> 1.8.2.1.610.g562af5b

--
			Gleb.
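The way the quoted patch combines L1's requested entry controls with L0's own can be checked with plain bit arithmetic. This is a stand-alone sketch rather than kernel code: vmcs02_entry_controls() is a hypothetical helper name, and the numeric values of the three control bits are assumed from the architectural bit positions (9, 14 and 15).

```c
#include <stdint.h>

/* VM-entry control bits (assumed values, bit 9 / 14 / 15). */
#define VM_ENTRY_IA32E_MODE      0x00000200u
#define VM_ENTRY_LOAD_IA32_PAT   0x00004000u
#define VM_ENTRY_LOAD_IA32_EFER  0x00008000u

/*
 * Hypothetical helper mirroring the expression written to
 * VM_ENTRY_CONTROLS in prepare_vmcs02(): L1's EFER-load and IA-32e
 * mode requests are dropped here because they are emulated separately,
 * and L0's IA-32e mode bit is dropped as well, to be set later.
 */
static uint32_t vmcs02_entry_controls(uint32_t vmcs12_ctl, uint32_t l0_ctl)
{
	return (vmcs12_ctl & ~VM_ENTRY_LOAD_IA32_EFER & ~VM_ENTRY_IA32E_MODE) |
	       (l0_ctl & ~VM_ENTRY_IA32E_MODE);
}
```

For example, if L1 asks for all three bits and L0 asks for none, only VM_ENTRY_LOAD_IA32_PAT survives the merge; the EFER and mode bits are then expected to be decided by the separate EFER emulation step that the patch comment refers to.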
[no subject]
I want to know how to use the qemu-system-x86_64 command to create a KVM virtual machine with a SCSI disk as its first boot disk.
[no subject]
On Sun, Jan 06, 2013 at 02:36:13PM +0800, Asias He wrote: This drops the cmd completion list spin lock and makes the cmd completion queue lock-less. Signed-off-by: Asias He as...@redhat.com Nicholas, any feedback? --- drivers/vhost/tcm_vhost.c | 46 +- drivers/vhost/tcm_vhost.h | 2 +- 2 files changed, 14 insertions(+), 34 deletions(-) diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c index b20df5c..3720604 100644 --- a/drivers/vhost/tcm_vhost.c +++ b/drivers/vhost/tcm_vhost.c @@ -47,6 +47,7 @@ #include linux/vhost.h #include linux/virtio_net.h /* TODO vhost.h currently depends on this */ #include linux/virtio_scsi.h +#include linux/llist.h #include vhost.c #include vhost.h @@ -64,8 +65,7 @@ struct vhost_scsi { struct vhost_virtqueue vqs[3]; struct vhost_work vs_completion_work; /* cmd completion work item */ - struct list_head vs_completion_list; /* cmd completion queue */ - spinlock_t vs_completion_lock;/* protects s_completion_list */ + struct llist_head vs_completion_list; /* cmd completion queue */ }; /* Local pointer to allocated TCM configfs fabric module */ @@ -301,9 +301,7 @@ static void vhost_scsi_complete_cmd(struct tcm_vhost_cmd *tv_cmd) { struct vhost_scsi *vs = tv_cmd-tvc_vhost; - spin_lock_bh(vs-vs_completion_lock); - list_add_tail(tv_cmd-tvc_completion_list, vs-vs_completion_list); - spin_unlock_bh(vs-vs_completion_lock); + llist_add(tv_cmd-tvc_completion_list, vs-vs_completion_list); vhost_work_queue(vs-dev, vs-vs_completion_work); } @@ -347,27 +345,6 @@ static void vhost_scsi_free_cmd(struct tcm_vhost_cmd *tv_cmd) kfree(tv_cmd); } -/* Dequeue a command from the completion list */ -static struct tcm_vhost_cmd *vhost_scsi_get_cmd_from_completion( - struct vhost_scsi *vs) -{ - struct tcm_vhost_cmd *tv_cmd = NULL; - - spin_lock_bh(vs-vs_completion_lock); - if (list_empty(vs-vs_completion_list)) { - spin_unlock_bh(vs-vs_completion_lock); - return NULL; - } - - list_for_each_entry(tv_cmd, vs-vs_completion_list, - tvc_completion_list) { - 
list_del(tv_cmd-tvc_completion_list); - break; - } - spin_unlock_bh(vs-vs_completion_lock); - return tv_cmd; -} - /* Fill in status and signal that we are done processing this command * * This is scheduled in the vhost work queue so we are called with the owner @@ -377,12 +354,18 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work) { struct vhost_scsi *vs = container_of(work, struct vhost_scsi, vs_completion_work); + struct virtio_scsi_cmd_resp v_rsp; struct tcm_vhost_cmd *tv_cmd; + struct llist_node *llnode; + struct se_cmd *se_cmd; + int ret; - while ((tv_cmd = vhost_scsi_get_cmd_from_completion(vs))) { - struct virtio_scsi_cmd_resp v_rsp; - struct se_cmd *se_cmd = tv_cmd-tvc_se_cmd; - int ret; + llnode = llist_del_all(vs-vs_completion_list); + while (llnode) { + tv_cmd = llist_entry(llnode, struct tcm_vhost_cmd, + tvc_completion_list); + llnode = llist_next(llnode); + se_cmd = tv_cmd-tvc_se_cmd; pr_debug(%s tv_cmd %p resid %u status %#02x\n, __func__, tv_cmd, se_cmd-residual_count, se_cmd-scsi_status); @@ -426,7 +409,6 @@ static struct tcm_vhost_cmd *vhost_scsi_allocate_cmd( pr_err(Unable to allocate struct tcm_vhost_cmd\n); return ERR_PTR(-ENOMEM); } - INIT_LIST_HEAD(tv_cmd-tvc_completion_list); tv_cmd-tvc_tag = v_req-tag; tv_cmd-tvc_task_attr = v_req-task_attr; tv_cmd-tvc_exp_data_len = exp_data_len; @@ -859,8 +841,6 @@ static int vhost_scsi_open(struct inode *inode, struct file *f) return -ENOMEM; vhost_work_init(s-vs_completion_work, vhost_scsi_complete_cmd_work); - INIT_LIST_HEAD(s-vs_completion_list); - spin_lock_init(s-vs_completion_lock); s-vqs[VHOST_SCSI_VQ_CTL].handle_kick = vhost_scsi_ctl_handle_kick; s-vqs[VHOST_SCSI_VQ_EVT].handle_kick = vhost_scsi_evt_handle_kick; diff --git a/drivers/vhost/tcm_vhost.h b/drivers/vhost/tcm_vhost.h index 7e87c63..47ee80b 100644 --- a/drivers/vhost/tcm_vhost.h +++ b/drivers/vhost/tcm_vhost.h @@ -34,7 +34,7 @@ struct tcm_vhost_cmd { /* Sense buffer that will be mapped into outgoing status */ unsigned 
char tvc_sense_buf[TRANSPORT_SENSE_BUFFER]; /* Completed commands list, serviced from vhost worker thread */ - struct list_head tvc_completion_list; + struct llist_node tvc_completion_list; }; struct tcm_vhost_nexus { -- 1.7.11.7 -- To unsubscribe from this
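The pattern the patch relies on, many producers pushing onto a singly linked list and one consumer detaching the whole list at once, can be sketched outside the kernel with C11 atomics. This is an illustration of the idea only, not the kernel's llist implementation: struct node, struct lockless_list, list_push() and list_take_all() are made-up names that loosely correspond to llist_node, llist_head, llist_add() and llist_del_all().

```c
#include <stdatomic.h>
#include <stddef.h>

/* One queued item; tag stands in for the payload. */
struct node {
	struct node *next;
	int tag;
};

/* List head: a single atomic pointer to the newest node. */
struct lockless_list {
	_Atomic(struct node *) first;
};

/* Push one node at the head; retries if another pusher got in first. */
static void list_push(struct lockless_list *l, struct node *n)
{
	struct node *old = atomic_load(&l->first);

	do {
		/* On CAS failure, old is reloaded with the current head. */
		n->next = old;
	} while (!atomic_compare_exchange_weak(&l->first, &old, n));
}

/* Detach the whole chain in one exchange; the caller then owns it. */
static struct node *list_take_all(struct lockless_list *l)
{
	return atomic_exchange(&l->first, (struct node *)NULL);
}
```

Because the consumer takes every queued node in a single exchange, it can walk the detached chain without any lock, which is what lets the patch drop vs_completion_lock. One consequence worth noting is that the chain comes back newest-first, so a consumer that cares about submission order would have to reverse it.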
[PATCH 1/1] Subject: [PATCH] s390/kvm: Fix address space mixup
I was chasing down a bug of random validity intercepts on s390 (guest prefix page not mapped in the host virtual address space). It turns out that the problem was a wrong address space control element. The cause was quite complex:

During paging activity, a DAT protection fault during SIE caused a program interrupt. Normally, the SIE retry loop tries to catch all interrupts during and shortly before SIE to rerun the setup. The problem is that protection causes a suppressing program interrupt, so the PSW points to the instruction AFTER SIE in case of DAT protection. Because of that, the retry loop logic did not trigger; instead we jumped directly back to SIE after return from the program interrupt (the protection fault handler itself did a rewind of the PSW). This usually works quite well, but if the protection fault handler has to wait, another program might be scheduled in. Later on the SIE process will be scheduled in again. In that case the content of CR1 (primary address space) will be wrong, because switch_to will put the user space ASCE into CR1 and not the guest ASCE.

In addition, the program parameter is also wrong for every protection fault of a guest, since we don't issue the SPP instruction.

So let's also check for PSW == instruction after SIE in the program check handler. Instead of expensively checking all program interruption codes that might be suppressing, we assume that a program interrupt pointing after SIE was always a program interrupt in SIE (otherwise we have a kernel bug anyway).

We also have to compensate for the rewinding, since the C-level handlers will do that. Therefore we need to add a nop with the same length as SIE before the sie_loop.
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com CC: sta...@vger.kernel.org CC: Martin Schwidefsky schwidef...@de.ibm.com CC: Heiko Carstens heiko.carst...@de.ibm.com --- arch/s390/kernel/entry64.S | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/arch/s390/kernel/entry64.S b/arch/s390/kernel/entry64.S index 07d8de3..19b6080 100644 --- a/arch/s390/kernel/entry64.S +++ b/arch/s390/kernel/entry64.S @@ -80,14 +80,21 @@ _TIF_EXIT_SIE = (_TIF_SIGPENDING | _TIF_NEED_RESCHED | _TIF_MCCK_PENDING) #endif .endm - .macro HANDLE_SIE_INTERCEPT scratch + .macro HANDLE_SIE_INTERCEPT scratch,pgmcheck #if defined(CONFIG_KVM) || defined(CONFIG_KVM_MODULE) tmhh%r8,0x0001 # interrupting from user ? jnz .+42 lgr \scratch,%r9 slg \scratch,BASED(.Lsie_loop) clg \scratch,BASED(.Lsie_length) + .if \pgmcheck + # Some program interrupts are suppressing (e.g. protection). + # We must also check the instruction after SIE in that case. + # do_protection_exception will rewind to rewind_pad + jh .+22 + .else jhe .+22 + .endif lg %r9,BASED(.Lsie_loop) SPP BASED(.Lhost_id)# set host id #endif @@ -391,7 +398,7 @@ ENTRY(pgm_check_handler) lg %r12,__LC_THREAD_INFO larl%r13,system_call lmg %r8,%r9,__LC_PGM_OLD_PSW - HANDLE_SIE_INTERCEPT %r14 + HANDLE_SIE_INTERCEPT %r14,1 tmhh%r8,0x0001 # test problem state bit jnz 1f # - fault in user space tmhh%r8,0x4000 # PER bit set in old PSW ? @@ -467,7 +474,7 @@ ENTRY(io_int_handler) lg %r12,__LC_THREAD_INFO larl%r13,system_call lmg %r8,%r9,__LC_IO_OLD_PSW - HANDLE_SIE_INTERCEPT %r14 + HANDLE_SIE_INTERCEPT %r14,0 SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_STACK,STACK_SHIFT tmhh%r8,0x0001 # interrupting from user? jz io_skip @@ -613,7 +620,7 @@ ENTRY(ext_int_handler) lg %r12,__LC_THREAD_INFO larl%r13,system_call lmg %r8,%r9,__LC_EXT_OLD_PSW - HANDLE_SIE_INTERCEPT %r14 + HANDLE_SIE_INTERCEPT %r14,0 SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_STACK,STACK_SHIFT tmhh%r8,0x0001 # interrupting from user ? 
jz ext_skip @@ -661,7 +668,7 @@ ENTRY(mcck_int_handler) lg %r12,__LC_THREAD_INFO larl%r13,system_call lmg %r8,%r9,__LC_MCK_OLD_PSW - HANDLE_SIE_INTERCEPT %r14 + HANDLE_SIE_INTERCEPT %r14,0 tm __LC_MCCK_CODE,0x80 # system damage? jo mcck_panic # yes - rest of mcck code invalid lghi%r14,__LC_CPU_TIMER_SAVE_AREA @@ -960,6 +967,13 @@ ENTRY(sie64a) stg %r3,__SF_EMPTY+8(%r15) # save guest register save area xc __SF_EMPTY+16(8,%r15),__SF_EMPTY+16(%r15) # host id == 0 lmg %r0,%r13,0(%r3) # load guest gprs 0-13 +# some program checks are suppressing. C code (e.g. do_protection_exception) +# will rewind the PSW by the ILC,
[PATCH v3 00/17] *** SUBJECT HERE ***
*** BLURB HERE ***

Don Slutz (17):
  target-i386: Allow tsc-frequency to be larger then 2.147G
  target-i386: Add missing kvm bits.
  target-i386: Add Hypervisor level.
  target-i386: Add cpu object access routines for Hypervisor level.
  target-i386: Add x86_set_hyperv.
  target-i386: Use Hypervisor level in -machine pc,accel=kvm.
  target-i386: Use Hypervisor level in -machine pc,accel=tcg.
  target-i386: Add Hypervisor vendor.
  target-i386: Add cpu object access routines for Hypervisor vendor.
  target-i386: Use Hypervisor vendor in -machine pc,accel=kvm.
  target-i386: Use Hypervisor vendor in -machine pc,accel=tcg.
  target-i386: Add some known names to Hypervisor vendor.
  target-i386: Add optional Hypervisor leaf extra.
  target-i386: Add cpu object access routines for Hypervisor leaf extra.
  target-i386: Add setting of Hypervisor leaf extra for known vmare4.
  target-i386: Use Hypervisor leaf extra in -machine pc,accel=kvm.
  target-i386: Use Hypervisor leaf extra in -machine pc,accel=tcg.

 target-i386/cpu.c | 261 -
 target-i386/cpu.h | 21 +
 target-i386/kvm.c | 33 ++--
 3 files changed, 304 insertions(+), 11 deletions(-)
Re: [PATCH v3 00/17] *** SUBJECT HERE ***
forgot to delete the backup versions. :(
   -Don

On 09/17/12 09:39, Don Slutz wrote:
> *** BLURB HERE ***
>
> Don Slutz (17):
>   target-i386: Allow tsc-frequency to be larger then 2.147G
>   target-i386: Add missing kvm bits.
>   target-i386: Add Hypervisor level.
>   target-i386: Add cpu object access routines for Hypervisor level.
>   target-i386: Add x86_set_hyperv.
>   target-i386: Use Hypervisor level in -machine pc,accel=kvm.
>   target-i386: Use Hypervisor level in -machine pc,accel=tcg.
>   target-i386: Add Hypervisor vendor.
>   target-i386: Add cpu object access routines for Hypervisor vendor.
>   target-i386: Use Hypervisor vendor in -machine pc,accel=kvm.
>   target-i386: Use Hypervisor vendor in -machine pc,accel=tcg.
>   target-i386: Add some known names to Hypervisor vendor.
>   target-i386: Add optional Hypervisor leaf extra.
>   target-i386: Add cpu object access routines for Hypervisor leaf extra.
>   target-i386: Add setting of Hypervisor leaf extra for known vmare4.
>   target-i386: Use Hypervisor leaf extra in -machine pc,accel=kvm.
>   target-i386: Use Hypervisor leaf extra in -machine pc,accel=tcg.
>
>  target-i386/cpu.c | 261 -
>  target-i386/cpu.h | 21 +
>  target-i386/kvm.c | 33 ++--
>  3 files changed, 304 insertions(+), 11 deletions(-)
Re: [PATCH v3 00/17] *** SUBJECT HERE ***
On 09/17/12 09:49, Don Slutz wrote:
> forgot to delete the backup versions. :(
>    -Don
>
> On 09/17/12 09:39, Don Slutz wrote:

Here is the planned cover letter:

From 7c0a80d8e870da981786b7235d3a968024c89abb Mon Sep 17 00:00:00 2001
In-Reply-To: 1346354435-21685-1-git-send-email-...@cloudswitch.com
References: 1346354435-21685-1-git-send-email-...@cloudswitch.com
From: Don Slutz d...@cloudswitch.com
Date: Mon, 17 Sep 2012 09:23:29 -0400
Subject: [PATCH v3 00/17] Allow changing of Hypervisor CPUIDs.

Also known as Paravirtualization CPUIDs.

This is primarily done so that the guest will think it is running
under vmware when hypervisor-vendor=vmware is specified as a
property of a cpu.

This depends on:

http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg01400.html

As far as I know it is #4. It depends on (1) and (2) and (3).

This change is based on:

Microsoft Hypervisor CPUID Leaves:
http://msdn.microsoft.com/en-us/library/windows/hardware/ff542428%28v=vs.85%29.aspx

Linux kernel change starts with:
http://fixunix.com/kernel/538707-use-cpuid-communicate-hypervisor.html
Also:
http://lkml.indiana.edu/hypermail/linux/kernel/1205.0/00100.html

VMware documentation on CPUIDs (Mechanisms to determine if software is
running in a VMware virtual machine):
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458

Changes from v1 to v2:

1) Added 1/4 from
   http://lists.gnu.org/archive/html/qemu-devel/2012-08/msg05153.html
   because Fred is changing jobs and so will not be pushing to get
   this in. It needed to be rebased, and I needed it to complete the
   testing of this change.

2) Added 2/4 because of the re-work: I needed a way to clear all KVM
   bits.

3) The rework of v1. Make it fit into the object model re-work of
   cpu.c for x86.

4) Added 3/4 -- The split out of the code that is not needed for
   accel=kvm.

Changes from v2 to v3:

Marcelo Tosatti:
  Its one big patch, better split in logically correlated patches
  (with better changelog). This would help reviewers.

So split 3 and 4 into 3 to 17. More info in change log.
No code change.

Don Slutz (17):
  target-i386: Allow tsc-frequency to be larger then 2.147G
  target-i386: Add missing kvm bits.
  target-i386: Add Hypervisor level.
  target-i386: Add cpu object access routines for Hypervisor level.
  target-i386: Add x86_set_hyperv.
  target-i386: Use Hypervisor level in -machine pc,accel=kvm.
  target-i386: Use Hypervisor level in -machine pc,accel=tcg.
  target-i386: Add Hypervisor vendor.
  target-i386: Add cpu object access routines for Hypervisor vendor.
  target-i386: Use Hypervisor vendor in -machine pc,accel=kvm.
  target-i386: Use Hypervisor vendor in -machine pc,accel=tcg.
  target-i386: Add some known names to Hypervisor vendor.
  target-i386: Add optional Hypervisor leaf extra.
  target-i386: Add cpu object access routines for Hypervisor leaf extra.
  target-i386: Add setting of Hypervisor leaf extra for known vmare4.
  target-i386: Use Hypervisor leaf extra in -machine pc,accel=kvm.
  target-i386: Use Hypervisor leaf extra in -machine pc,accel=tcg.

 target-i386/cpu.c | 261 -
 target-i386/cpu.h | 21 +
 target-i386/kvm.c | 33 ++--
 3 files changed, 304 insertions(+), 11 deletions(-)
[no subject]
Subject: Re: Nested kvm_intel broken on pre 3.3 hosts

No, you're backporting the entire feature. All we need is to expose RDPMC intercept to the guest.

Oh well, I thought that was the thing you asked for...

It should be sufficient to backport the bits in nested_vmx_setup_ctls_msrs() and nested_vmx_exit_handled().

Ok, how about that? It is probably wrong again, but at least it allows loading the kvm-intel module from within a nested guest, and without the feature, pretending to fail seems the closest thing to do...

---

From 0aeb99348363b7aeb2b0bd92428cb212159fa468 Mon Sep 17 00:00:00 2001
From: Stefan Bader stefan.ba...@canonical.com
Date: Thu, 10 Nov 2011 14:57:25 +0200
Subject: [PATCH] KVM: VMX: Fake intercept RDPMC

Based on commit fee84b079d5ddee2247b5c1f53162c330c622902 upstream.

    Intercept RDPMC and forward it to the PMU emulation code.

But drop the requirement for the feature being present and instead of forwarding, cause a GP as if the call had failed.

BugLink: http://bugs.launchpad.net/bugs/1031090
Signed-off-by: Stefan Bader stefan.ba...@canonical.com
---
 arch/x86/kvm/vmx.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7315488..fc937f2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1956,6 +1956,7 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #endif
         CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING |
         CPU_BASED_USE_IO_BITMAPS | CPU_BASED_MONITOR_EXITING |
+        CPU_BASED_RDPMC_EXITING |
         CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
     /*
      * We can allow some features even when not supported by the
@@ -4613,6 +4614,14 @@ static int handle_invlpg(struct kvm_vcpu *vcpu)
     return 1;
 }
 
+static int handle_rdpmc(struct kvm_vcpu *vcpu)
+{
+    /* Instead of implementing the feature, cause a GP */
+    kvm_complete_insn_gp(vcpu, 1);
+
+    return 1;
+}
+
 static int handle_wbinvd(struct kvm_vcpu *vcpu)
 {
     skip_emulated_instruction(vcpu);
@@ -5563,6 +5572,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
     [EXIT_REASON_HLT]     = handle_halt,
     [EXIT_REASON_INVD]    = handle_invd,
     [EXIT_REASON_INVLPG]  = handle_invlpg,
+    [EXIT_REASON_RDPMC]   = handle_rdpmc,
     [EXIT_REASON_VMCALL]  = handle_vmcall,
     [EXIT_REASON_VMCLEAR] = handle_vmclear,
     [EXIT_REASON_VMLAUNCH]= handle_vmlaunch,
--
1.7.9.5
[no subject]
unsubscribe kvm
[no subject]
subscribe kvm
[no subject]
Hi,

I think I've tracked down the bug that causes "KVM_GET_SUPPORTED_CPUID failed: Argument list too long" errors when using the kvm tool. Basically, this (possibly squished) code seems to be to blame:

    case 0xd: {
        int i;

        entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
        for (i = 1; *nent < maxnent && i < 64; ++i) {
            if (entry[i].eax == 0)
                continue;
            do_cpuid_1_ent(&entry[i], function, i);
            entry[i].flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
            ++*nent;
        }
        break;
    }

You can see there's a check whether entry[i].eax is 0, but it isn't until the next line that entry[i] is actually filled in. That means that whether or not an entry is filled in for the 0xd function is essentially random, and that can lead to the loss of valid entries. It also means that nent may be incremented too often, and since all 64 entries are iterated over, that can fill up the available storage and cause that error.

I tested my theory by commenting out the if (100% failure rate) and moving it after do_cpuid_1_ent (100% success rate). Since this is a non-deterministic failure, that isn't really conclusive, but I'm fairly confident my fix is correct. I don't know exactly what your procedure is for submitting patches, but one is attached.

Gabe

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 77c9d86..35d7ae0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2414,9 +2414,9 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
         entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
         for (i = 1; *nent < maxnent && i < 64; ++i) {
+            do_cpuid_1_ent(&entry[i], function, i);
             if (entry[i].eax == 0)
                 continue;
-            do_cpuid_1_ent(&entry[i], function, i);
             entry[i].flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
             ++*nent;
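The ordering problem Gabe describes can be reproduced outside the kernel with a small model. This is only a sketch under stated assumptions: struct entry, fill_entry and the fake hw_eax table are stand-ins invented here for kvm_cpuid_entry2, do_cpuid_1_ent and the real CPUID results, and the output buffer is deliberately pre-filled by the caller to play the role of stale memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SUBLEAF 8   /* the kernel loop goes up to 64; 8 keeps the model small */

/* Hypothetical stand-in for struct kvm_cpuid_entry2. */
struct entry {
    uint32_t index;
    uint32_t eax;
    int      valid;
};

/* Stand-in for do_cpuid_1_ent(): look up sub-leaf `idx` in a fake table. */
static void fill_entry(struct entry *e, const uint32_t *hw_eax, uint32_t idx)
{
    e->index = idx;
    e->eax = hw_eax[idx];
}

/* Buggy order: tests eax before it is written, so the result depends on
 * whatever the caller's buffer happened to contain. */
size_t enumerate_check_first(struct entry *out, const uint32_t *hw_eax)
{
    size_t n = 0;

    assert(out != NULL && hw_eax != NULL);
    for (uint32_t i = 1; i < MAX_SUBLEAF; i++) {
        if (out[i].eax == 0)        /* reads stale contents */
            continue;
        fill_entry(&out[i], hw_eax, i);
        out[i].valid = 1;
        n++;
    }
    return n;
}

/* Order from the proposed patch: fill first, then test the fresh value. */
size_t enumerate_fill_first(struct entry *out, const uint32_t *hw_eax)
{
    size_t n = 0;

    assert(out != NULL && hw_eax != NULL);
    for (uint32_t i = 1; i < MAX_SUBLEAF; i++) {
        fill_entry(&out[i], hw_eax, i);
        if (out[i].eax == 0)
            continue;
        out[i].valid = 1;
        n++;
    }
    return n;
}
```

With a zeroed buffer the check-first variant reports no sub-leaves at all, and with a buffer full of nonzero junk it reports every one of them, which matches the two failure modes in the message (lost entries and an overfull list). The fill-first variant gives the same count regardless of the buffer's prior contents.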
Re: [libvirt] (no subject)
On Fri, 2011-12-09 at 20:30 +0800, Osier Yang wrote:

By the way, is nobody interested in kvmtool providing a way for external apps to get the capabilities? Do we still want to suffer from parsing the capabilities ourselves in the future, just like we do for qemu? :-)

I think the feature makes sense, especially if it simplifies libvirt.

Pekka
Re: [libvirt] (no subject)
On 2011-12-06 22:38, Daniel P. Berrange wrote:

On Fri, Nov 11, 2011 at 07:56:58PM +0800, Osier Yang wrote:

Hi, all

This is a basic implementation of a libvirt Native Linux KVM Tool driver. Note that this was done out of my own interest and in my spare time; it's not an endorsement/effort by Red Hat, and it isn't officially supported by Red Hat.

Basically, the driver is designed as *stateful*, as KVM tool doesn't maintain any info about the guest except a socket, which is for its own IPC. It's implemented using the KVM tool binary, which is currently named kvm, along with the cgroup controllers cpuacct and memory. And as one of KVM tool's principles is to allow both non-root and root users to play with it, the driver is designed to support root and non-root too, just like QEMU does. Example of the connection URI:

    virsh -c kvmtool:///system
    virsh -c kvmtool:///session
    virsh -c kvmtool+unix:///system
    virsh -c kvmtool+unix:///session

The implementation currently supports roughly 15 virsh commands, including basic domain cycle operations (define/undefine, start/destroy, suspend/resume, console, setmem, schedinfo, dumpxml, autostart, dominfo, etc.)

About the domain configuration:

* kernel: must be specified, as KVM tool currently only supports booting from the kernel (no integration with BIOS app yet).
* disk: only virtio bus is supported, and device type must be 'disk'.
* serial/console: only one console is supported, of type serial or virtio (can be extended to support multiple consoles as long as kvm tool supports it; libvirt already supports multiple consoles, see upstream commit 0873b688c).
* p9fs: only supports specifying the source dir and mount tag; only type 'mount' is supported.
* memballoon: only virtio is supported, and there is no way to configure the addr.
* Multiple disks and p9fs are supported.
* Graphics and network are not supported, will explain below.

Please see [PATCH 7/8] for an example of the domain config (which contains all the XML supported by the current implementation).

The problems of Native Linux KVM Tool from libvirt p.o.v:

* Some distros package qemu-kvm as kvm; also, kvm is a long-established name for KVM itself, so naming the project kvm might not be a good idea. I assume it will be named kvmtool in this implementation; never mind this if you don't like it, it can be updated easily. :-)

Yeah, naming the binary 'kvm' is just madness. I'd strongly recommend using 'kvmtool' as the binary name to avoid confusion with existing 'kvm' binaries based on QEMU.

* It still doesn't have an official package yet, not even a make install. This means we have no way to check the dependency and do the checking at 'configure' time. I assume it will be installed as /usr/bin/kvmtool in this implementation. This is the main reason which could prevent upstream libvirt from accepting the patches, I guess.

Ok, not really a problem - we do similar for the regular QEMU driver.

* Lack of options for user configuration: for -vnc, for example, there is no option for the user to configure the vnc properties, such as the port. It hides things and doesn't provide ways to query the properties either; this causes problems for libvirt in adding vnc support, as vnc clients such as virt-manager and virt-viewer have no way to connect to the guest. Even vncviewer can't.

Being able to specify a VNC port of libvirt's choosing is pretty much mandatory to be able to support that. In addition, being able to specify the bind address is important to be able to control security, e.g. to only bind to 127.0.0.1, or only to certain NICs in a multi-NIC host.

* KVM tool manages the network completely by itself (with DHCP support?), with no way to configure it except specifying the mode (user|tap|none). I have not tested it yet, but it should need an explicit script to set up the network rules (e.g. NAT) for the guest to access the outside world. Anyway, there is no way for libvirt to control the guest network.

If KVM tool supports TAP devices, can't we do whatever we like with that just by passing in a configured TAP device from libvirt?

* There is a gap in the domain status between KVM tool and libvirt. It's caused by KVM tool unlink()ing the guest socket when the user exits from the console (both text and graphic), while libvirt still thinks the guest is running.

Being able to reliably detect shutdown/exit of the KVM tool is a very important task, and we can't rely on waitpid/SIGCHLD because we want to daemonize all instances wrt libvirtd.

In the QEMU driver we keep open a socket to the monitor, and when we see an I/O error / POLLHUP on the socket we know that QEMU has quit. What is this guest socket used for? Could libvirt keep open a connection to it?

One other option would be to use inotify to watch for deletion of the guest socket in the filesystem. This is sort of
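The monitor-socket technique Daniel describes in this message (hold a connection open and treat POLLHUP or an I/O error on it as "the process has quit") might look roughly like the sketch below. It is not libvirt's code; peer_has_exited is a hypothetical helper name, and a real daemon would register the descriptor with its event loop instead of polling with a zero timeout.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if the peer of the stream socket `fd`
 * has gone away, 0 if it still looks connected, -1 if poll() failed.
 * Never blocks.
 */
int peer_has_exited(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, 0);

    assert(fd >= 0);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;                   /* nothing pending, peer still there */
    if (pfd.revents & (POLLHUP | POLLERR))
        return 1;                   /* hang-up or socket error */
    if (pfd.revents & POLLIN) {
        char byte;
        /* A zero-length read on a stream socket means orderly shutdown. */
        ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
        return n == 0;
    }
    return 0;
}
```

The extra MSG_PEEK check covers platforms that report a closed peer as readable with end-of-file instead of setting POLLHUP. Because the kernel closes the socket when the process dies for any reason, this works without being the parent of the process, which is the constraint raised about waitpid.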
Re: [libvirt] (no subject)
On 2011-12-07 14:21, Sasha Levin wrote:

On Tue, 2011-12-06 at 14:38 +0000, Daniel P. Berrange wrote:

On Fri, Nov 11, 2011 at 07:56:58PM +0800, Osier Yang wrote:

* Lack of options for user configuration: for -vnc, for example, there is no option for the user to configure the vnc properties, such as the port. It hides things and doesn't provide ways to query the properties either; this causes problems for libvirt in adding vnc support, as vnc clients such as virt-manager and virt-viewer have no way to connect to the guest. Even vncviewer can't.

Being able to specify a VNC port of libvirt's choosing is pretty much mandatory to be able to support that. In addition, being able to specify the bind address is important to be able to control security, e.g. to only bind to 127.0.0.1, or only to certain NICs in a multi-NIC host.

I'll add that feature in the next couple of days.

* KVM tool manages the network completely by itself (with DHCP support?), with no way to configure it except specifying the mode (user|tap|none). I have not tested it yet, but it should need an explicit script to set up the network rules (e.g. NAT) for the guest to access the outside world. Anyway, there is no way for libvirt to control the guest network.

If KVM tool supports TAP devices, can't we do whatever we like with that just by passing in a configured TAP device from libvirt?

KVM tool currently creates and configures the TAP devices it uses; it shouldn't be an issue to have it use a TAP fd passed to it either. How does libvirt do it? Create a /dev/tapX on its own and pass the fd to the hypervisor?

* There is a gap in the domain status between KVM tool and libvirt. It's caused by KVM tool unlink()ing the guest socket when the user exits from the console (both text and graphic), while libvirt still thinks the guest is running.

Being able to reliably detect shutdown/exit of the KVM tool is a very important task, and we can't rely on waitpid/SIGCHLD because we want to daemonize all instances wrt libvirtd.

In the QEMU driver we keep open a socket to the monitor, and when we see an I/O error / POLLHUP on the socket we know that QEMU has quit. What is this guest socket used for? Could libvirt keep open a connection to it?

It's being used for communication with the IPC sub-commands (like 'kvm list', 'kvm debug', etc). It's basically the server in a server-client setup used to signal the hypervisor to do things. There's also no problem with keeping an open connection to it.

I'll update the code to use it.

One other option would be to use inotify to watch for deletion of the guest socket in the filesystem. This is sort of what we do with the UML driver.

* KVM tool uses $HOME/.kvm_tool as the state dir, with no way to configure it. I made a small patch to allow KVM tool to accept an ENV variable, KVM_STATE_DIR, which is used across the driver. I made a simple patch against kvm tool to let the whole series work; see [PATCH] kvm tools.. Generally we want the state dir of a driver to be /var/run/libvirt/kvmtool/... for the root user or $HOME/.libvirt/kvmtool/run for a non-root user.

What does it do with the state dir? Is that just for storing the guest socket?

afaik that patch should be already in as well. It does two things in the state dir:

- Store sockets.
- KVM tools has a feature which lets a user boot a guest based on virtio-9p, which lets him see a system which is an exact copy of the host. This makes testing of programs and sandboxing very easy. The state files required for that are stored in that dir as well.

With QEMU we chose $HOME/.libvirt/qemu or /var/run/libvirt because there was no policy set by QEMU itself. If KVM tool has a policy for where it stores its state, we should just use that, and not try to force it into a libvirt-specific location. In a privileged libvirtd instance, we should aim to still have kvmtool itself run as an unprivileged user/group, e.g. 'kvmtool:kvmtool'. And we could set the home dir of that user to /var/lib/kvmtool.

* kvmtoolGetVersion is just broken now, as what ./kvm version returns is something like 3.0.rc5.873.gb73216, whereas libvirt wants things like 2.6.40.6-0. This might not be a problem once KVM tool has an official package.

The version numbers libvirt reports for hypervisors are pretty meaningless really. In that example you give I'd just report '3.0' as the version from libvirt. Anything that relies on these versions from libvirt is doomed to be broken anyway.

The version is just a 'git describe' of the kernel tree in which it was built, so if you build KVM tools from an official kernel tree* you'll also get pretty versions :)

* After KVM tools is merged...

* The console connection is implemented by setting up ptys in libvirt: stdout/stderr of the kvm tool process is redirected to the master pty, and libvirt connects to the slave pty. This works fine now, but
Re: [libvirt] (no subject)
On 2011-12-07 17:16, Daniel P. Berrange wrote:

On Wed, Dec 07, 2011 at 08:21:06AM +0200, Sasha Levin wrote:

On Tue, 2011-12-06 at 14:38 +0000, Daniel P. Berrange wrote:

On Fri, Nov 11, 2011 at 07:56:58PM +0800, Osier Yang wrote:

* KVM tool manages the network completely by itself (with DHCP support?), with no way to configure it except specifying the mode (user|tap|none). I have not tested it yet, but it should need an explicit script to set up the network rules (e.g. NAT) for the guest to access the outside world. Anyway, there is no way for libvirt to control the guest network.

If KVM tool supports TAP devices, can't we do whatever we like with that just by passing in a configured TAP device from libvirt?

KVM tool currently creates and configures the TAP devices it uses; it shouldn't be an issue to have it use a TAP fd passed to it either. How does libvirt do it? Create a /dev/tapX on its own and pass the fd to the hypervisor?

Yes, libvirt opens a /dev/tap device (or a macvtap device for VEPA mode), adds it to the necessary bridge, and/or configures VEPA, etc., and then passes the FD to the hypervisor, with an ARGV parameter to tell the HV which FD is being passed.

* The console connection is implemented by setting up ptys in libvirt: stdout/stderr of the kvm tool process is redirected to the master pty, and libvirt connects to the slave pty. This works fine now, but it might be better if kvm tool could provide more advanced console mechanisms. Just like QEMU does?

This sounds good enough for now. KVM tools does a redirection to a PTY, which at that point could be redirected to anywhere the user wants. What features might be interesting to do on top of that?

I presume that Osier is just comparing with the features QEMU has available for chardevs config, which include

- PTYs
- UNIX sockets
- TCP sockets
- UDP sockets
- FIFO pipe
- Plain file (output only obviously, but useful for logging)

Yes, that's what I meant. :-)

libvirt doesn't specifically need any of them, but it can support those options if they exist.

Yes, these won't block us; I just meant it would be great if they were supported.

* Not many ways exist yet for external apps or users to query guest information. But this might change soon, as KVM tool is growing up quickly.

What sort of guest info are you thinking about? The most immediate pieces of info I can imagine we need are

- Mapping between PIDs and vCPU threads
- Current balloon driver value

Those are pretty easily added using the IPC interface I've mentioned above. For example, 'kvm balloon' and 'kvm stat' will return a lot of info out of the balloon driver (including the memory stats VQ, which afaik we're probably the only ones who actually do, but I might be wrong) :)

Ok, that sounds sufficient for the balloon info.

Regards,
Daniel
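The hand-off Daniel describes in this thread (open and configure the device in the management process, then pass the descriptor to the hypervisor and name it on the command line) relies on file descriptors surviving exec(). A minimal sketch of the parent-side preparation follows; the option name --tapfd and the helper prepare_inherited_fd are hypothetical, chosen only for illustration, and are not options of kvm tool or QEMU.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: make `fd` survive exec() and format the argv
 * entry that tells the child which descriptor it inherited.
 * Returns 0 on success, -1 on error.
 */
int prepare_inherited_fd(int fd, char *arg, size_t arg_len)
{
    int flags, n;

    assert(arg != NULL && arg_len > 0);

    flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;                              /* not an open descriptor */
    if (fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0)
        return -1;                              /* could not clear close-on-exec */

    /* "--tapfd" is an invented option name for this sketch. */
    n = snprintf(arg, arg_len, "--tapfd=%d", fd);
    return (n < 0 || (size_t)n >= arg_len) ? -1 : 0;
}
```

The parent would then fork, place the formatted string in the child's argv and exec the hypervisor binary; the child parses the number and uses that descriptor directly, so it never needs the privileges required to create or bridge the TAP device itself.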
Re: [libvirt] (no subject)
On Wed, Dec 07, 2011 at 08:21:06AM +0200, Sasha Levin wrote:

On Tue, 2011-12-06 at 14:38 +0000, Daniel P. Berrange wrote:

On Fri, Nov 11, 2011 at 07:56:58PM +0800, Osier Yang wrote:

* KVM tool manages the network completely by itself (with DHCP support?), with no way to configure it except specifying the mode (user|tap|none). I have not tested it yet, but it should need an explicit script to set up the network rules (e.g. NAT) for the guest to access the outside world. Anyway, there is no way for libvirt to control the guest network.

If KVM tool supports TAP devices, can't we do whatever we like with that just by passing in a configured TAP device from libvirt?

KVM tool currently creates and configures the TAP devices it uses; it shouldn't be an issue to have it use a TAP fd passed to it either. How does libvirt do it? Create a /dev/tapX on its own and pass the fd to the hypervisor?

Yes, libvirt opens a /dev/tap device (or a macvtap device for VEPA mode), adds it to the necessary bridge, and/or configures VEPA, etc., and then passes the FD to the hypervisor, with an ARGV parameter to tell the HV which FD is being passed.

* The console connection is implemented by setting up ptys in libvirt: stdout/stderr of the kvm tool process is redirected to the master pty, and libvirt connects to the slave pty. This works fine now, but it might be better if kvm tool could provide more advanced console mechanisms. Just like QEMU does?

This sounds good enough for now. KVM tools does a redirection to a PTY, which at that point could be redirected to anywhere the user wants. What features might be interesting to do on top of that?

I presume that Osier is just comparing with the features QEMU has available for chardevs config, which include

- PTYs
- UNIX sockets
- TCP sockets
- UDP sockets
- FIFO pipe
- Plain file (output only obviously, but useful for logging)

libvirt doesn't specifically need any of them, but it can support those options if they exist.

* Not many ways exist yet for external apps or users to query guest information. But this might change soon, as KVM tool is growing up quickly.

What sort of guest info are you thinking about? The most immediate pieces of info I can imagine we need are

- Mapping between PIDs and vCPU threads
- Current balloon driver value

Those are pretty easily added using the IPC interface I've mentioned above. For example, 'kvm balloon' and 'kvm stat' will return a lot of info out of the balloon driver (including the memory stats VQ, which afaik we're probably the only ones who actually do, but I might be wrong) :)

Ok, that sounds sufficient for the balloon info.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [libvirt] (no subject)
On Fri, Nov 11, 2011 at 07:56:58PM +0800, Osier Yang wrote:

Hi, all

This is a basic implementation of a libvirt Native Linux KVM Tool driver. Note that this was done out of my own interest and in my spare time; it's not an endorsement/effort by Red Hat, and it isn't officially supported by Red Hat.

Basically, the driver is designed as *stateful*, as KVM tool doesn't maintain any info about the guest except a socket, which is for its own IPC. It's implemented using the KVM tool binary, which is currently named kvm, along with the cgroup controllers cpuacct and memory. And as one of KVM tool's principles is to allow both non-root and root users to play with it, the driver is designed to support root and non-root too, just like QEMU does. Example of the connection URI:

    virsh -c kvmtool:///system
    virsh -c kvmtool:///session
    virsh -c kvmtool+unix:///system
    virsh -c kvmtool+unix:///session

The implementation currently supports roughly 15 virsh commands, including basic domain cycle operations (define/undefine, start/destroy, suspend/resume, console, setmem, schedinfo, dumpxml, autostart, dominfo, etc.)

About the domain configuration:

* kernel: must be specified, as KVM tool currently only supports booting from the kernel (no integration with BIOS app yet).
* disk: only virtio bus is supported, and device type must be 'disk'.
* serial/console: only one console is supported, of type serial or virtio (can be extended to support multiple consoles as long as kvm tool supports it; libvirt already supports multiple consoles, see upstream commit 0873b688c).
* p9fs: only supports specifying the source dir and mount tag; only type 'mount' is supported.
* memballoon: only virtio is supported, and there is no way to configure the addr.
* Multiple disks and p9fs are supported.
* Graphics and network are not supported, will explain below.

Please see [PATCH 7/8] for an example of the domain config (which contains all the XML supported by the current implementation).

The problems of Native Linux KVM Tool from libvirt p.o.v:

* Some distros package qemu-kvm as kvm; also, kvm is a long-established name for KVM itself, so naming the project kvm might not be a good idea. I assume it will be named kvmtool in this implementation; never mind this if you don't like it, it can be updated easily. :-)

Yeah, naming the binary 'kvm' is just madness. I'd strongly recommend using 'kvmtool' as the binary name to avoid confusion with existing 'kvm' binaries based on QEMU.

* It still doesn't have an official package yet, not even a make install. This means we have no way to check the dependency and do the checking at 'configure' time. I assume it will be installed as /usr/bin/kvmtool in this implementation. This is the main reason which could prevent upstream libvirt from accepting the patches, I guess.

Ok, not really a problem - we do similar for the regular QEMU driver.

* Lack of options for user configuration: for -vnc, for example, there is no option for the user to configure the vnc properties, such as the port. It hides things and doesn't provide ways to query the properties either; this causes problems for libvirt in adding vnc support, as vnc clients such as virt-manager and virt-viewer have no way to connect to the guest. Even vncviewer can't.

Being able to specify a VNC port of libvirt's choosing is pretty much mandatory to be able to support that. In addition, being able to specify the bind address is important to be able to control security, e.g. to only bind to 127.0.0.1, or only to certain NICs in a multi-NIC host.

* KVM tool manages the network completely by itself (with DHCP support?), with no way to configure it except specifying the mode (user|tap|none). I have not tested it yet, but it should need an explicit script to set up the network rules (e.g. NAT) for the guest to access the outside world. Anyway, there is no way for libvirt to control the guest network.

If KVM tool supports TAP devices, can't we do whatever we like with that just by passing in a configured TAP device from libvirt?

* There is a gap in the domain status between KVM tool and libvirt. It's caused by KVM tool unlink()ing the guest socket when the user exits from the console (both text and graphic), while libvirt still thinks the guest is running.

Being able to reliably detect shutdown/exit of the KVM tool is a very important task, and we can't rely on waitpid/SIGCHLD because we want to daemonize all instances wrt libvirtd.

In the QEMU driver we keep open a socket to the monitor, and when we see an I/O error / POLLHUP on the socket we know that QEMU has quit. What is this guest socket used for? Could libvirt keep open a connection to it?

One other option would be to use inotify to watch for deletion of the guest socket in the filesystem. This is sort of what we do with the UML
Re: [libvirt] (no subject)
On Tue, 2011-12-06 at 14:38 +0000, Daniel P. Berrange wrote:

On Fri, Nov 11, 2011 at 07:56:58PM +0800, Osier Yang wrote:

* Lack of options for user configuration: for -vnc, for example, there is no option for the user to configure the vnc properties, such as the port. It hides things and doesn't provide ways to query the properties either; this causes problems for libvirt in adding vnc support, as vnc clients such as virt-manager and virt-viewer have no way to connect to the guest. Even vncviewer can't.

Being able to specify a VNC port of libvirt's choosing is pretty much mandatory to be able to support that. In addition, being able to specify the bind address is important to be able to control security, e.g. to only bind to 127.0.0.1, or only to certain NICs in a multi-NIC host.

I'll add that feature in the next couple of days.

* KVM tool manages the network completely by itself (with DHCP support?), with no way to configure it except specifying the mode (user|tap|none). I have not tested it yet, but it should need an explicit script to set up the network rules (e.g. NAT) for the guest to access the outside world. Anyway, there is no way for libvirt to control the guest network.

If KVM tool supports TAP devices, can't we do whatever we like with that just by passing in a configured TAP device from libvirt?

KVM tool currently creates and configures the TAP devices it uses; it shouldn't be an issue to have it use a TAP fd passed to it either. How does libvirt do it? Create a /dev/tapX on its own and pass the fd to the hypervisor?

* There is a gap in the domain status between KVM tool and libvirt. It's caused by KVM tool unlink()ing the guest socket when the user exits from the console (both text and graphic), while libvirt still thinks the guest is running.

Being able to reliably detect shutdown/exit of the KVM tool is a very important task, and we can't rely on waitpid/SIGCHLD because we want to daemonize all instances wrt libvirtd.

In the QEMU driver we keep open a socket to the monitor, and when we see an I/O error / POLLHUP on the socket we know that QEMU has quit. What is this guest socket used for? Could libvirt keep open a connection to it?

It's being used for communication with the IPC sub-commands (like 'kvm list', 'kvm debug', etc). It's basically the server in a server-client setup used to signal the hypervisor to do things. There's also no problem with keeping an open connection to it.

One other option would be to use inotify to watch for deletion of the guest socket in the filesystem. This is sort of what we do with the UML driver.

* KVM tool uses $HOME/.kvm_tool as the state dir, with no way to configure it. I made a small patch to allow KVM tool to accept an ENV variable, KVM_STATE_DIR, which is used across the driver. I made a simple patch against kvm tool to let the whole series work; see [PATCH] kvm tools.. Generally we want the state dir of a driver to be /var/run/libvirt/kvmtool/... for the root user or $HOME/.libvirt/kvmtool/run for a non-root user.

What does it do with the state dir? Is that just for storing the guest socket?

afaik that patch should be already in as well. It does two things in the state dir:

- Store sockets.
- KVM tools has a feature which lets a user boot a guest based on virtio-9p, which lets him see a system which is an exact copy of the host. This makes testing of programs and sandboxing very easy. The state files required for that are stored in that dir as well.

With QEMU we chose $HOME/.libvirt/qemu or /var/run/libvirt because there was no policy set by QEMU itself. If KVM tool has a policy for where it stores its state, we should just use that, and not try to force it into a libvirt-specific location. In a privileged libvirtd instance, we should aim to still have kvmtool itself run as an unprivileged user/group, e.g. 'kvmtool:kvmtool'. And we could set the home dir of that user to /var/lib/kvmtool.

* kvmtoolGetVersion is just broken now, as what ./kvm version returns is something like 3.0.rc5.873.gb73216, whereas libvirt wants things like 2.6.40.6-0. This might not be a problem once KVM tool has an official package.

The version numbers libvirt reports for hypervisors are pretty meaningless really. In that example you give I'd just report '3.0' as the version from libvirt. Anything that relies on these versions from libvirt is doomed to be broken anyway.

The version is just a 'git describe' of the kernel tree in which it was built, so if you build KVM tools from an official kernel tree* you'll also get pretty versions :)

* After KVM tools is merged...

* The console connection is implemented by setting up ptys in libvirt: stdout/stderr of the kvm tool process is redirected to the master pty, and libvirt connects to the slave pty. This works fine now, but it might be better if kvm
[no subject]
subscribe kvm
[no subject]
subscribe kvm
[no subject]
subscribe
[no subject]
subscribe kvm
No subject
Hi all, I'm a computer science student from Germany interested in virtualization technologies, and given that I have some free time I'd like to dig into kvm modding, maybe even actual development ;-) I've been trying to understand the kvm mmu, with special regard to the paging structures. I'm having a hard time finding the code that is responsible for creating the tdp structures (i.e. epml4, epdpt, epd and ept on Intel). Any hints? Using some debug output I'm getting tdp_page_fault(s) where the paging level is 1. How can this be? Does this mean all pages are 2 MB pages and there are no page tables with 4 KB pages at all? Thanks a lot in advance and kind regards, David -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC -v3 PATCH 3/3] Subject: kvm: use yield_to instead of sleep in kvm_vcpu_on_spin
Instead of sleeping in kvm_vcpu_on_spin, which can cause gigantic slowdowns of certain workloads, we instead use yield_to to hand the rest of our timeslice to another vcpu in the same KVM guest.

Signed-off-by: Rik van Riel <r...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c011ba3..ad3cb4a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -185,6 +185,7 @@ struct kvm {
 #endif
 	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
 	atomic_t online_vcpus;
+	int last_boosted_vcpu;
 	struct list_head vm_list;
 	struct mutex lock;
 	struct kvm_io_bus *buses[KVM_NR_BUSES];
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 21f816c..5822246 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1878,18 +1878,44 @@ void kvm_resched(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_resched);
 
-void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu)
+void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 {
-	ktime_t expires;
-	DEFINE_WAIT(wait);
-
-	prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
-
-	/* Sleep for 100 us, and hope lock-holder got scheduled */
-	expires = ktime_add_ns(ktime_get(), 100000UL);
-	schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
+	struct kvm *kvm = me->kvm;
+	struct kvm_vcpu *vcpu;
+	int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
+	int yielded = 0;
+	int pass;
+	int i;
 
-	finish_wait(&vcpu->wq, &wait);
+	/*
+	 * We boost the priority of a VCPU that is runnable but not
+	 * currently running, because it got preempted by something
+	 * else and called schedule in __vcpu_run.  Hopefully that
+	 * VCPU is holding the lock that we need and will release it.
+	 * We approximate round-robin by starting at the last boosted VCPU.
+	 */
+	for (pass = 0; pass < 2 && !yielded; pass++) {
+		kvm_for_each_vcpu(i, vcpu, kvm) {
+			struct task_struct *task = vcpu->task;
+			if (!pass && i < last_boosted_vcpu) {
+				i = last_boosted_vcpu;
+				continue;
+			} else if (pass && i > last_boosted_vcpu)
+				break;
+			if (vcpu == me)
+				continue;
+			if (!task)
+				continue;
+			if (waitqueue_active(&vcpu->wq))
+				continue;
+			if (task->flags & PF_VCPU)
+				continue;
+			kvm->last_boosted_vcpu = i;
+			yielded = 1;
+			yield_to(task, 1);
+			break;
+		}
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
 
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
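[Editor's sketch, not part of the posted patch] The "approximate round-robin" scan described in the patch comment can be modelled in plain userspace C. This is a simplified illustration under stated assumptions: pick_boost_target() and the eligible[] array are hypothetical names, and eligible[i] stands in for the patch's three checks (the vcpu has a task, is not sleeping on its waitqueue, and does not have PF_VCPU set). The model starts strictly after the last boosted vcpu and wraps around once, which is roughly the order the two-pass loop above produces; it does not reproduce every corner case of the kernel loop.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified userspace model of the candidate scan (not kernel code).
 * eligible[i] is an assumption standing in for the patch's checks.
 * Returns the index of the vcpu to boost, or -1 if nobody qualifies.
 */
static int pick_boost_target(const bool *eligible, int nr_vcpus,
                             int last_boosted, int me)
{
    assert(nr_vcpus > 0);
    assert(last_boosted >= 0 && last_boosted < nr_vcpus);

    /* Visit last_boosted+1, ..., nr_vcpus-1, 0, ..., last_boosted. */
    for (int step = 1; step <= nr_vcpus; step++) {
        int i = (last_boosted + step) % nr_vcpus;

        if (i == me)
            continue;   /* never yield to ourselves */
        if (!eligible[i])
            continue;   /* sleeping, or already running in the guest */
        return i;
    }
    return -1;
}
```

A caller would store the returned index as the new "last boosted" value before yielding, so that the next spinning vcpu starts its scan one slot further on and the boosts are spread across the guest's vcpus.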
[no subject]
subscribe -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] *** SUBJECT HERE ***
On 08/31/10 18:13, Anthony Liguori wrote: Just as an aside, does anyone have a good way to maintain the 00 patches in series with repeated submissions? /me uses cut+paste from email folder or list archive. I suspect you are looking for something better though ... cheers, Gerd -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/6] *** SUBJECT HERE ***
*** BLURB HERE *** Avi Kivity (6): kvm_stat: refactor to separate stats provider from difference engine kvm_stat: implement tracepoint stats provider kvm_stat: provide detailed kvm_exit:exit_reason display kvm_stat: sort tui output according to highest occurence kvm_stat: increase label width kvm_stat: be slower kvm/kvm_stat | 297 +++--- 1 files changed, 285 insertions(+), 12 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] *** SUBJECT HERE ***
On 08/31/2010 04:25 PM, Avi Kivity wrote: *** BLURB HERE *** That was supposed to be: [PATCH 0/6] kvm_stat tracepoint support This patchset allows kvm_stat to display the information exposed by kvm tracepoints. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] *** SUBJECT HERE ***
On 08/31/10 15:30, Avi Kivity wrote:
> On 08/31/2010 04:25 PM, Avi Kivity wrote:
>> *** BLURB HERE ***
>
> That was supposed to be:
>
> [PATCH 0/6] kvm_stat tracepoint support
>
> This patchset allows kvm_stat to display the information exposed by kvm tracepoints.

That was there too. You better should pass '00*.patch' instead of '00*' to git send-email so it doesn't mail out the editor backup file ;)

cheers, Gerd
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] *** SUBJECT HERE ***
On 08/31/2010 05:43 PM, Gerd Hoffmann wrote: You better should pass '00*.patch' instead of '00*' to git send-email so it doesn't mail out the editor backup file ;) That's what I usually do - guess I slipped this time. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] *** SUBJECT HERE ***
On 08/31/2010 11:10 AM, Avi Kivity wrote:
> On 08/31/2010 05:43 PM, Gerd Hoffmann wrote:
>> You better should pass '00*.patch' instead of '00*' to git send-email so it doesn't mail out the editor backup file ;)
>
> That's what I usually do - guess I slipped this time.

Just as an aside, does anyone have a good way to maintain the 00 patches in series with repeated submissions? I tried to store it in git as an empty commit but most of the git tooling doesn't work well with that.

Regards, Anthony Liguori
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] *** SUBJECT HERE ***
On 08/31/2010 07:13 PM, Anthony Liguori wrote: Just as an aside, does anyone have a good way to maintain the 00 patches in series with repeated submissions? I tried to store it in git as an empty commit but most of the git tooling doesn't work well with that. I keep each posting in a -v1/ etc directory and copy-paste from that. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[no subject]
subscribe kvm -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[no subject]
subscribe kvm -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[no subject]
subscribe kvm -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/5] *** SUBJECT HERE ***
From: Nicholas Bellinger <n...@linux-iscsi.org>

Greetings Gerd, Hannes and co,

This series adds initial support for a hw/scsi-bsg.c backstore for scsi-bus compatible HBA emulation in QEMU-KVM on Linux hosts supporting the BSG driver. This code is available from the scsi-bsg branch in the megasas/scsi friendly QEMU-KVM tree at:

http://git.kernel.org/?p=virt/kvm/nab/qemu-kvm.git;a=shortlog;h=refs/heads/scsi-bsg

Note that this initial code is being posted for review and to see how useful a BSG backstore would be for QEMU-KVM and Linux hosts. Note also that, in order for BSG I/O to function using vectored AIO, a kernel patch to linux/block/bsg.c:bsg_map_hdr() is currently required when running on a bit paired user/kernel environment. The kernel patch in question is here:

http://marc.info/?l=linux-scsi&m=127649585524598&w=2

The first three patches update the block code to support the BSG backstore for scsi-bsg. The fourth patch adds the minor changes to hw/scsi-bus.c and hw/scsi-disk.c needed to function with scsi-bsg. The fifth patch adds the main hw/scsi-bsg.c logic necessary to run the new struct SCSIDeviceInfo, and for BSG AIO using struct iovec and paio_submit_len() to function.

So far this patch series has been tested with a Linux based x86_64 KVM host and guest using the hw/megasas.c 8708EM2 HBA Emulation with TCM_Loop virtual SAS Port LUNs.

Comments are welcome,

Signed-off-by: Nicholas A. Bellinger <n...@linux-iscsi.org>

Nicholas Bellinger (5):
  [block]: Add top level BSG support
  [block]: Add BSG qemu_open() in block/raw.c:raw_open()
  [block]: Add paio_submit_len() non sector sized AIO
  [scsi]: Add BSG support for scsi-bus and scsi-disk
  [scsi-bsg]: Add initial support for BSG based SCSIDeviceInfo

 Makefile.objs         |    2 +-
 block.c               |   23 ++-
 block.h               |    1 +
 block/raw-posix-aio.h |    3 +
 block/raw-posix.c     |   17 ++-
 block/raw.c           |   20 ++
 block_int.h           |    5 +
 hw/scsi-bsg.c         |  588 +
 hw/scsi-bus.c         |    3 +-
 hw/scsi-disk.c        |    4 +
 posix-aio-compat.c    |   28 +++
 11 files changed, 687 insertions(+), 7 deletions(-)
 create mode 100644 hw/scsi-bsg.c
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[no subject]
unsubscribe -- Wolfgang Lendl IT Systems Communications Medizinische Universität Wien Spitalgasse 23 / BT 88 /Ebene 00 A-1090 Wien Tel: +43 1 40160-21231 Fax: +43 1 40160-921200 http://www.meduniwien.ac.at/itsc -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[no subject]
unsubscribe kvm -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[no subject]
subscribe kvm -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[no subject]
subscribe kvm -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH] Subject: virtio: Add unused buffers detach from vring
I submit one split patch for review to make sure that's the right format. I copied Rusty's comment for the commit message, and change destroy to detach since we destroy the buffers in caller. This patch is built against Dave's net-next tree.

There's currently no way for a virtio driver to ask for unused buffers, so it has to keep a list itself to reclaim them at shutdown. This is redundant, since virtio_ring stores that information. So add a new hook to do this: virtio_net will be the first user.

Signed-off-by: Shirley Ma <x...@us.ibm.com>
---
 drivers/virtio/virtio_ring.c |   24 ++++++++++++++++++++++++
 include/linux/virtio.h       |    1 +
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index fbd2ecd..f847bc3 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -334,6 +334,29 @@ static bool vring_enable_cb(struct virtqueue *_vq)
 	return true;
 }
 
+/* This function is used to return vring unused buffers to caller for free */
+static void *vring_detach_bufs(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	unsigned int i;
+
+	START_USE(vq);
+
+	for (i = 0; i < vq->vring.num; ++i) {
+		if (vq->data[i]) {
+			/* detach_buf clears data, so grab it now. */
+			detach_buf(vq, i);
+			END_USE(vq);
+			return vq->data[i];
+		}
+	}
+	/* That should have freed everything. */
+	BUG_ON(vq->num_free != vq->vring.num);
+
+	END_USE(vq);
+	return NULL;
+}
+
 irqreturn_t vring_interrupt(int irq, void *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
@@ -360,6 +383,7 @@ static struct virtqueue_ops vring_vq_ops = {
 	.kick = vring_kick,
 	.disable_cb = vring_disable_cb,
 	.enable_cb = vring_enable_cb,
+	.detach_bufs = vring_detach_bufs,
 };
 
 struct virtqueue *vring_new_virtqueue(unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 057a2e0..d7da456 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -71,6 +71,7 @@ struct virtqueue_ops {
 
 	void (*disable_cb)(struct virtqueue *vq);
 	bool (*enable_cb)(struct virtqueue *vq);
+	void *(*detach_bufs)(struct virtqueue *vq);
 };
 
 /**

Thanks
Shirley
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Subject: virtio: Add unused buffers detach from vring
On Tue, Dec 15, 2009 at 10:42:53AM -0800, Shirley Ma wrote:
> I submit one split patch for review to make sure that's the right format. I copied Rusty's comment for the commit message, and change destroy to detach since we destroy the buffers in caller. This patch is built against Dave's net-next tree.

Almost :) text not intended for git commit logs like the above should go after ---, this way git am knows to skip it.

> There's currently no way for a virtio driver to ask for unused buffers, so it has to keep a list itself to reclaim them at shutdown. This is redundant, since virtio_ring stores that information. So add a new hook to do this: virtio_net will be the first user.
>
> Signed-off-by: Shirley Ma <x...@us.ibm.com>
> ---
>  drivers/virtio/virtio_ring.c |   24 ++++++++++++++++++++++++
>  include/linux/virtio.h       |    1 +
>  2 files changed, 25 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index fbd2ecd..f847bc3 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -334,6 +334,29 @@ static bool vring_enable_cb(struct virtqueue *_vq)
>  	return true;
>  }
>
> +/* This function is used to return vring unused buffers to caller for free */
> +static void *vring_detach_bufs(struct virtqueue *_vq)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	unsigned int i;
> +
> +	START_USE(vq);
> +
> +	for (i = 0; i < vq->vring.num; ++i) {

This is a single statement loop, so you do not need {} around it. Or even better:

	for (i = 0; i < vq->vring.num; ++i) {
		if (!vq->data[i])
			continue;
		...
	}

which has less nesting.

> +		if (vq->data[i]) {
> +			/* detach_buf clears data, so grab it now. */

You wrote that comment, but did you read it :)

> +			detach_buf(vq, i);
> +			END_USE(vq);
> +			return vq->data[i];

In fact, this will return NULL always, won't it?

> +		}
> +	}
> +	/* That should have freed everything. */
> +	BUG_ON(vq->num_free != vq->vring.num);
> +
> +	END_USE(vq);
> +	return NULL;
> +}
> +
>  irqreturn_t vring_interrupt(int irq, void *_vq)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
> @@ -360,6 +383,7 @@ static struct virtqueue_ops vring_vq_ops = {
>  	.kick = vring_kick,
>  	.disable_cb = vring_disable_cb,
>  	.enable_cb = vring_enable_cb,
> +	.detach_bufs = vring_detach_bufs,
>  };
>
>  struct virtqueue *vring_new_virtqueue(unsigned int num,
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 057a2e0..d7da456 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -71,6 +71,7 @@ struct virtqueue_ops {
>
>  	void (*disable_cb)(struct virtqueue *vq);
>  	bool (*enable_cb)(struct virtqueue *vq);
> +	void *(*detach_bufs)(struct virtqueue *vq);
>  };

Please add documentation in virtio.h

Thanks!

>  /**
>
> Thanks
> Shirley
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Subject: virtio: Add unused buffers detach from vring
Thanks Michael, will fix them all. Shirley -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Subject: virtio: Add unused buffers detach from vring
Hello Michael,

On Tue, 2009-12-15 at 20:47 +0200, Michael S. Tsirkin wrote:
> > +			detach_buf(vq, i);
> > +			END_USE(vq);
> > +			return vq->data[i];
>
> In fact, this will return NULL always, won't it?

Nope, I changed the destroy to detach and return the buffers without destroying them within the call. I thought it might be useful in some other case. Maybe I should put destroy call back?

Thanks
Shirley
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Subject: virtio: Add unused buffers detach from vring
On Tue, Dec 15, 2009 at 11:14:07AM -0800, Shirley Ma wrote:
> Hello Michael,
>
> On Tue, 2009-12-15 at 20:47 +0200, Michael S. Tsirkin wrote:
> > > +			detach_buf(vq, i);
> > > +			END_USE(vq);
> > > +			return vq->data[i];
> >
> > In fact, this will return NULL always, won't it?
>
> Nope, I changed the destroy to detach and return the buffers without destroying them within the call. I thought it might be useful in some other case. Maybe I should put destroy call back?
>
> Thanks
> Shirley

No, I think it's good as is, we do not need a callback. I was simply saying that detach_buf sets data to NULL, so return vq->data[i] after detach does not make sense. You need to save data, as the comment says.

-- MST
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
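[Editor's sketch, not part of the thread] The point of the review, that the buffer pointer has to be saved before the detach clears the slot, can be shown with a small userspace model. Everything here is hypothetical and only for illustration: struct model_vq, model_detach_buf() and model_detach_unused_buf() are stand-ins, not the virtio_ring.c code, and the locking/debug macros (START_USE/END_USE) are left out. It also uses the flatter loop shape suggested in the review.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 4  /* illustrative; the real ring size is vq->vring.num */

/* Hypothetical stand-in for struct vring_virtqueue: only the token array. */
struct model_vq {
    void *data[RING_SIZE];
    unsigned int num_free;
};

/* Mimics the side effect under discussion: detaching clears the slot. */
static void model_detach_buf(struct model_vq *vq, unsigned int i)
{
    assert(i < RING_SIZE);
    vq->data[i] = NULL;
    vq->num_free++;
}

/* Return one unused buffer per call, or NULL once the ring is empty. */
static void *model_detach_unused_buf(struct model_vq *vq)
{
    for (unsigned int i = 0; i < RING_SIZE; ++i) {
        void *buf;

        if (!vq->data[i])
            continue;
        buf = vq->data[i];        /* grab it before it is cleared */
        model_detach_buf(vq, i);
        return buf;
    }
    return NULL;
}
```

A driver-side caller would loop until the function returns NULL and free each returned buffer itself; returning vq->data[i] after the detach, as in the posted patch, would hand back NULL on the first call and end that loop immediately.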
[no subject]
subscribe kvm -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
(no subject)
subscribe kvm -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[no subject]
subscribe kvm -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[no subject]
Hi all, I have taken kvm-22 with the linux-2.6.24 kernel, but whenever I install a guest using the qemu binaries, the system hangs. dmesg reports that the kernel is unable to handle a NULL pointer dereference. Please suggest why it is behaving like this. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[no subject]
Hi I find that the HOW-TO for VT-d pass through on linux-kvm.org asks for Interrupt Remapping to be enabled in the kernel. However Interrupt Remapping is dependent on x86_64 (DMAR was also dependent but I guess the 2.6.30 kernel allows it for 32 bit systems) Does that mean I will not be able to try device pass through on 32 bit systems? Thanks in advance Subash -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
VTd pass through and 32 bit (sorry previous mail had no subject)
-Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Subash Kalbarga Sent: Thursday, July 09, 2009 12:51 PM To: kvm@vger.kernel.org Subject: Hi I find that the HOW-TO for VT-d pass through on linux-kvm.org asks for Interrupt Remapping to be enabled in the kernel. However Interrupt Remapping is dependent on x86_64 (DMAR was also dependent but I guess the 2.6.30 kernel allows it for 32 bit systems) Does that mean I will not be able to try device pass through on 32 bit systems? Thanks in advance Subash -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: VTd pass through and 32 bit (sorry previous mail had no subject)
Subash Kalbarga wrote:
> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Subash Kalbarga
> Sent: Thursday, July 09, 2009 12:51 PM
> To: kvm@vger.kernel.org
> Subject:
>
> Hi I find that the HOW-TO for VT-d pass through on linux-kvm.org asks for Interrupt Remapping to be enabled in the kernel. However Interrupt Remapping is dependent on x86_64 (DMAR was also dependent but I guess the 2.6.30 kernel allows it for 32 bit systems)

Device pass through doesn't depend on interrupt remapping. Interrupt remapping is optional. I will change the VT-d HOW-TO to clarify it.

Regards, Weidong

> Does that mean I will not be able to try device pass through on 32 bit systems?
>
> Thanks in advance
> Subash
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2] *** SUBJECT HERE ***
RFC: move the dirty page tracking to use the dirty bit

Well, I was bored this morning and have had this idea for a while. I didn't test it too much; first I want to hear what people think.

Thanks.

Izik Eidus (2):
  kvm: fix dirty bit tracking for slots with large pages
  kvm: change the dirty page tracking to work with dirty bit instead of page fault

 arch/ia64/kvm/kvm-ia64.c        |    4
 arch/powerpc/kvm/powerpc.c      |    4
 arch/s390/kvm/kvm-s390.c        |    4
 arch/x86/include/asm/kvm_host.h |    3 +++
 arch/x86/kvm/mmu.c              |   32 +---
 arch/x86/kvm/svm.c              |    7 +++
 arch/x86/kvm/vmx.c              |    7 +++
 arch/x86/kvm/x86.c              |   21 ++---
 include/linux/kvm_host.h        |    1 +
 virt/kvm/kvm_main.c             |   17 -
 10 files changed, 89 insertions(+), 11 deletions(-)
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Subject:[PATCH 1/2] Clean up MADT Table Creation
This patch is also based on the patch by Vincent Minet. It corrects the size calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES, assuming that the external table entry count is contained within MAX_RSDT_ENTRIES.

Signed-off-by: Beth Kon <e...@us.ibm.com>

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 7f62e4f..ac8f9c5 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
     addr = base_addr = ram_size - ACPI_DATA_SIZE;
     rsdt_addr = addr;
     rsdt = (void *)(addr);
-    rsdt_size = sizeof(*rsdt) + external_tables * 4;
+    rsdt_size = sizeof(*rsdt);
     addr += rsdt_size;
 
     fadt_addr = addr;
@@ -1873,16 +1873,6 @@ void acpi_bios_init(void)
                       "HPET", sizeof(*hpet), 1);
 #endif
 
-    acpi_additional_tables(); /* resets cfg to required entry */
-    for(i = 0; i < external_tables; i++) {
-        uint16_t len;
-        if(acpi_load_table(i, addr, &len) < 0)
-            BX_PANIC("Failed to load ACPI table from QEMU\n");
-        rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
-        addr += len;
-        if(addr >= ram_size)
-            BX_PANIC("ACPI table overflow\n");
-    }
 #endif
 
     /* RSDT */
@@ -1895,6 +1885,19 @@ void acpi_bios_init(void)
 //  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
     if (nb_numa_nodes > 0)
         rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
+    acpi_additional_tables(); /* resets cfg to required entry */
+    /* external_tables load must occur last to
+     * properly check for MAX_RSDT_ENTRIES overflow.
+     */
+    for(i = 0; i < external_tables; i++) {
+        uint16_t len;
+        if(acpi_load_table(i, addr, &len) < 0)
+            BX_PANIC("Failed to load ACPI table from QEMU\n");
+        rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
+        addr += len;
+        if((addr >= ram_size) || (nb_rsdt_entries > MAX_RSDT_ENTRIES))
+            BX_PANIC("ACPI table overflow\n");
+    }
 #endif
     rsdt_size -= MAX_RSDT_ENTRIES * 4;
     rsdt_size += nb_rsdt_entries * 4;
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
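[Editor's sketch, not part of the posted patch] The size bookkeeping the patch relies on (reserve room for MAX_RSDT_ENTRIES pointers up front, then trim the table length to the entries actually used) can be illustrated with a standalone model. The names struct model_rsdt, rsdt_add_entry() and rsdt_final_size() are hypothetical, the MAX_RSDT_ENTRIES value is made up for the example, and the byte-order conversion (cpu_to_le32) is omitted. Note one deliberate difference: this sketch checks the bound before storing an entry, whereas the patch checks nb_rsdt_entries after the store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RSDT_ENTRIES 4   /* illustrative value, not the rombios32.c one */
#define HEADER_SIZE 36       /* size of the standard ACPI table header */

/* Hypothetical stand-in for the BIOS's RSDT: header plus 32-bit pointers. */
struct model_rsdt {
    uint8_t  header[HEADER_SIZE];
    uint32_t table_offset_entry[MAX_RSDT_ENTRIES];
};

/* Append one table address; refuse instead of writing past the array. */
static bool rsdt_add_entry(struct model_rsdt *rsdt, int *nb_entries,
                           uint32_t addr)
{
    if (*nb_entries >= MAX_RSDT_ENTRIES)
        return false;
    rsdt->table_offset_entry[(*nb_entries)++] = addr;
    return true;
}

/* Final table length: the full struct minus the unused entry slots. */
static size_t rsdt_final_size(int nb_entries)
{
    size_t size = sizeof(struct model_rsdt);

    assert(nb_entries >= 0 && nb_entries <= MAX_RSDT_ENTRIES);
    size -= MAX_RSDT_ENTRIES * 4;
    size += (size_t)nb_entries * 4;
    return size;
}
```

With this layout the memory reserved for the table no longer depends on external_tables, which mirrors the first hunk of the patch: the entry array is already sized for the maximum, so adding external_tables * 4 on top would count those slots twice.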
Re: Subject:[PATCH 1/2] Clean up MADT Table Creation
Beth Kon wrote: This patch is also based on the patch by Vincent Minet. It corrects the size calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES, assuming that the external table entry count is contained within MAX_RSDT_ENTRIES. Signed-off-by: Beth Kon e...@us.ibm.com This should have been patch 2/2. I think git-send-email didn't like that I didn't have a space after Subject: . Let me try to resend with the space added. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html