[COMMIT master] KVM: SVM: Do not report xsave in supported cpuid
From: Joerg Roedel To support xsave properly for the guest the SVM module need software support for it. As long as this is not present do not report the xsave as supported feature in cpuid. As a side-effect this patch moves the bit() helper function into the x86.h file so that it can be used in svm.c too. KVM-Stable-Tag. Signed-off-by: Joerg Roedel Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 740884b..9b3d166 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3622,6 +3622,10 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu) static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry) { switch (func) { + case 0x0001: + /* Mask out xsave bit as long as it is not supported by SVM */ + entry->ecx &= ~(bit(X86_FEATURE_XSAVE)); + break; case 0x8001: if (nested) entry->ecx |= (1 << 2); /* Set SVM bit */ diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 5c62ef2..c195260 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4268,11 +4268,6 @@ static int vmx_get_lpage_level(void) return PT_PDPE_LEVEL; } -static inline u32 bit(int bitno) -{ - return 1 << (bitno & 31); -} - static void vmx_cpuid_update(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bb04957..8d76150 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -163,11 +163,6 @@ static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu) vcpu->arch.apf.gfns[i] = ~0; } -static inline u32 bit(int bitno) -{ - return 1 << (bitno & 31); -} - static void kvm_on_user_return(struct user_return_notifier *urn) { unsigned slot; diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 2cea414..c600da8 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -70,6 +70,11 @@ static inline int is_paging(struct kvm_vcpu *vcpu) return kvm_read_cr0_bits(vcpu, X86_CR0_PG); } +static inline u32 bit(int bitno) +{ + return 1 << (bitno & 31); +} + void kvm_before_handle_nmi(struct kvm_vcpu *vcpu); void kvm_after_handle_nmi(struct kvm_vcpu *vcpu); int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq); -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: MMU: Fix incorrect direct page write protection due to ro host page
From: Avi Kivity If KVM sees a read-only host page, it will map it as read-only to prevent breaking a COW. However, if the page was part of a large guest page, KVM incorrectly extends the write protection to the entire large page frame instead of limiting it to the normal host page. This results in the instantiation of a new shadow page with read-only access. If this happens for a MOVS instruction that moves memory between two normal pages, within a single large page frame, and mapped within the guest as a large page, and if, in addition, the source operand is not writeable in the host (perhaps due to KSM), then KVM will instantiate a read-only direct shadow page, instantiate an spte for the source operand, then instantiate a new read/write direct shadow page and instantiate an spte for the destination operand. Since these two sptes are in different shadow pages, MOVS will never see them at the same time and the guest will not make progress. Fix by mapping the direct shadow page read/write, and only marking the host page read-only. Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 146b681..5ca9426 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -511,6 +511,9 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, link_shadow_page(it.sptep, sp); } + if (!map_writable) + access &= ~ACC_WRITE_MASK; + mmu_set_spte(vcpu, it.sptep, access, gw->pte_access & access, user_fault, write_fault, dirty, ptwrite, it.level, gw->gfn, pfn, prefault, map_writable); @@ -593,9 +596,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code, if (is_error_pfn(pfn)) return kvm_handle_bad_page(vcpu->kvm, walker.gfn, pfn); - if (!map_writable) - walker.pte_access &= ~ACC_WRITE_MASK; - spin_lock(&vcpu->kvm->mmu_lock); if (mmu_notifier_retry(vcpu, mmu_seq)) goto out_unlock; -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: Fix build error on s390 due to missing tlbs_dirty
From: Avi Kivity Make it available for all archs. Signed-off-by: Avi Kivity diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index bd0da8f..b5021db 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -256,8 +256,8 @@ struct kvm { struct mmu_notifier mmu_notifier; unsigned long mmu_notifier_seq; long mmu_notifier_count; - long tlbs_dirty; #endif + long tlbs_dirty; }; /* The guest did something we don't support. */ -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: enlarge number of possible CPUID leaves
From: Andre Przywara Currently the number of CPUID leaves KVM handles is limited to 40. My desktop machine (AthlonII) already has 35 and future CPUs will expand this well beyond the limit. Extend the limit to 80 to make room for future processors. KVM-Stable-Tag. Signed-off-by: Andre Przywara Signed-off-by: Avi Kivity diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index b55d789..4461429 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -79,7 +79,7 @@ #define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT) #define KVM_MIN_FREE_MMU_PAGES 5 #define KVM_REFILL_PAGES 25 -#define KVM_MAX_CPUID_ENTRIES 40 +#define KVM_MAX_CPUID_ENTRIES 80 #define KVM_NR_FIXED_MTRR_REGION 88 #define KVM_NR_VAR_MTRR 8 -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into next
From: Avi Kivity * 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6: (94 commits) drm/i915: i915 cannot provide switcher services. Input: wacom - add new Bamboo PT (0xdb) ARM: S3C24XX: Fix mess with gpio {set,get}_pull callbacks drm/radeon/kms: fix vram base calculation on rs780/rs880 drm/radeon/kms: fix formatting of vram and gtt info drm/radeon/kms: forbid big bo allocation (fdo 31708) v3 drm: Don't try and disable an encoder that was never enabled drm: Add missing drm_vblank_put() along queue vblank error path drm/i915/dp: Only apply the workaround if the select is still active MN10300: Fix interrupt mask alteration function call name in gdbstub autofs4 - remove ioctl mutex (bz23142) drm/i915: Emit a request to clear a flushed and idle ring for unbusy bo Linux 2.6.37-rc5 ARM: tegra: fix regression from addruart rewrite Input: add input driver for polled GPIO buttons PM / Hibernate: Fix memory corruption related to swap PM / Hibernate: Use async I/O when reading compressed hibernation image wmi: use memcmp instead of strncmp to compare GUIDs perf record: Fix eternal wait for stillborn child ARM: 6524/1: GIC irq desciptor bug fix ... Signed-off-by: Avi Kivity -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: SVM: Add xsetbv intercept
From: Joerg Roedel This patch implements the xsetbv intercept to the AMD part of KVM. This makes AVX usable in a save way for the guest on AVX capable AMD hardware. The patch is tested by using AVX in the guest and host in parallel and checking for data corruption. I also used the KVM xsave unit-tests and they all pass. Signed-off-by: Joerg Roedel Signed-off-by: Avi Kivity diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h index 82ecaa3..f7087bf 100644 --- a/arch/x86/include/asm/svm.h +++ b/arch/x86/include/asm/svm.h @@ -47,6 +47,7 @@ enum { INTERCEPT_MONITOR, INTERCEPT_MWAIT, INTERCEPT_MWAIT_COND, + INTERCEPT_XSETBV, }; @@ -329,6 +330,7 @@ struct __attribute__ ((__packed__)) vmcb { #define SVM_EXIT_MONITOR 0x08a #define SVM_EXIT_MWAIT 0x08b #define SVM_EXIT_MWAIT_COND0x08c +#define SVM_EXIT_XSETBV0x08d #define SVM_EXIT_NPF 0x400 #define SVM_EXIT_ERR -1 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 9b3d166..24b4373 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -935,6 +935,7 @@ static void init_vmcb(struct vcpu_svm *svm) set_intercept(svm, INTERCEPT_WBINVD); set_intercept(svm, INTERCEPT_MONITOR); set_intercept(svm, INTERCEPT_MWAIT); + set_intercept(svm, INTERCEPT_XSETBV); control->iopm_base_pa = iopm_base; control->msrpm_base_pa = __pa(svm->msrpm); @@ -2546,6 +2547,19 @@ static int skinit_interception(struct vcpu_svm *svm) return 1; } +static int xsetbv_interception(struct vcpu_svm *svm) +{ + u64 new_bv = kvm_read_edx_eax(&svm->vcpu); + u32 index = kvm_register_read(&svm->vcpu, VCPU_REGS_RCX); + + if (kvm_set_xcr(&svm->vcpu, index, new_bv) == 0) { + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3; + skip_emulated_instruction(&svm->vcpu); + } + + return 1; +} + static int invalid_op_interception(struct vcpu_svm *svm) { kvm_queue_exception(&svm->vcpu, UD_VECTOR); @@ -2971,6 +2985,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm) = { [SVM_EXIT_WBINVD] = emulate_on_interception, [SVM_EXIT_MONITOR] = invalid_op_interception, [SVM_EXIT_MWAIT]= invalid_op_interception, + [SVM_EXIT_XSETBV] = xsetbv_interception, [SVM_EXIT_NPF] = pf_interception, }; @@ -3622,10 +3637,6 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu) static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry) { switch (func) { - case 0x0001: - /* Mask out xsave bit as long as it is not supported by SVM */ - entry->ecx &= ~(bit(X86_FEATURE_XSAVE)); - break; case 0x8001: if (nested) entry->ecx |= (1 << 2); /* Set SVM bit */ @@ -3699,6 +3710,7 @@ static const struct trace_print_flags svm_exit_reasons_str[] = { { SVM_EXIT_WBINVD, "wbinvd" }, { SVM_EXIT_MONITOR, "monitor" }, { SVM_EXIT_MWAIT, "mwait" }, + { SVM_EXIT_XSETBV, "xsetbv" }, { SVM_EXIT_NPF, "npf" }, { -1, NULL } }; -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: MMU: Make the way of accessing lpage_info more generic
From: Takuya Yoshikawa Large page information has two elements but one of them, write_count, alone is accessed by a helper function. This patch replaces this helper function with more generic one which returns newly named kvm_lpage_info structure and use it to access the other element rmap_pde. Signed-off-by: Takuya Yoshikawa Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 1a953ac..5bc820c 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -477,46 +477,46 @@ static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn) } /* - * Return the pointer to the largepage write count for a given - * gfn, handling slots that are not large page aligned. + * Return the pointer to the large page information for a given gfn, + * handling slots that are not large page aligned. */ -static int *slot_largepage_idx(gfn_t gfn, - struct kvm_memory_slot *slot, - int level) +static struct kvm_lpage_info *lpage_info_slot(gfn_t gfn, + struct kvm_memory_slot *slot, + int level) { unsigned long idx; idx = (gfn >> KVM_HPAGE_GFN_SHIFT(level)) - (slot->base_gfn >> KVM_HPAGE_GFN_SHIFT(level)); - return &slot->lpage_info[level - 2][idx].write_count; + return &slot->lpage_info[level - 2][idx]; } static void account_shadowed(struct kvm *kvm, gfn_t gfn) { struct kvm_memory_slot *slot; - int *write_count; + struct kvm_lpage_info *linfo; int i; slot = gfn_to_memslot(kvm, gfn); for (i = PT_DIRECTORY_LEVEL; i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) { - write_count = slot_largepage_idx(gfn, slot, i); - *write_count += 1; + linfo = lpage_info_slot(gfn, slot, i); + linfo->write_count += 1; } } static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn) { struct kvm_memory_slot *slot; - int *write_count; + struct kvm_lpage_info *linfo; int i; slot = gfn_to_memslot(kvm, gfn); for (i = PT_DIRECTORY_LEVEL; i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) { - write_count = slot_largepage_idx(gfn, slot, i); - *write_count -= 1; - WARN_ON(*write_count < 0); + linfo = lpage_info_slot(gfn, slot, i); + linfo->write_count -= 1; + WARN_ON(linfo->write_count < 0); } } @@ -525,12 +525,12 @@ static int has_wrprotected_page(struct kvm *kvm, int level) { struct kvm_memory_slot *slot; - int *largepage_idx; + struct kvm_lpage_info *linfo; slot = gfn_to_memslot(kvm, gfn); if (slot) { - largepage_idx = slot_largepage_idx(gfn, slot, level); - return *largepage_idx; + linfo = lpage_info_slot(gfn, slot, level); + return linfo->write_count; } return 1; @@ -585,16 +585,15 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn) static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level) { struct kvm_memory_slot *slot; - unsigned long idx; + struct kvm_lpage_info *linfo; slot = gfn_to_memslot(kvm, gfn); if (likely(level == PT_PAGE_TABLE_LEVEL)) return &slot->rmap[gfn - slot->base_gfn]; - idx = (gfn >> KVM_HPAGE_GFN_SHIFT(level)) - - (slot->base_gfn >> KVM_HPAGE_GFN_SHIFT(level)); + linfo = lpage_info_slot(gfn, slot, level); - return &slot->lpage_info[level - 2][idx].rmap_pde; + return &linfo->rmap_pde; } /* @@ -882,19 +881,16 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva, end = start + (memslot->npages << PAGE_SHIFT); if (hva >= start && hva < end) { gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT; + gfn_t gfn = memslot->base_gfn + gfn_offset; ret = handler(kvm, &memslot->rmap[gfn_offset], data); for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) { - unsigned long idx; - int sh; - - sh = KVM_HPAGE_GFN_SHIFT(PT_DIRECTORY_LEVEL+j); - idx = ((memslot->base_gfn+gfn_offset) >> sh) - - (memslot->base_gfn >> sh); - ret |= handler(kvm, - &memslot->lpage_info[j][idx].rmap_pde, - data); + struct kvm_lpage_info *linfo; + + linfo = lpage_info_slot(gfn, memslot, +
[COMMIT master] KVM: VMX: add module parameter to avoid trapping HLT instructions (v5)
From: Anthony Liguori In certain use-cases, we want to allocate guests fixed time slices where idle guest cycles leave the machine idling. There are many approaches to achieve this but the most direct is to simply avoid trapping the HLT instruction which lets the guest directly execute the instruction putting the processor to sleep. Introduce this as a module-level option for kvm-vmx.ko since if you do this for one guest, you probably want to do it for all. Signed-off-by: Anthony Liguori Signed-off-by: Avi Kivity diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 42d9590..9642c22 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -297,6 +297,12 @@ enum vmcs_field { #define GUEST_INTR_STATE_SMI 0x0004 #define GUEST_INTR_STATE_NMI 0x0008 +/* GUEST_ACTIVITY_STATE flags */ +#define GUEST_ACTIVITY_ACTIVE 0 +#define GUEST_ACTIVITY_HLT 1 +#define GUEST_ACTIVITY_SHUTDOWN2 +#define GUEST_ACTIVITY_WAIT_SIPI 3 + /* * Exit Qualifications for MOV for Control Register Access */ diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 72cfdb7..5c62ef2 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -69,6 +69,9 @@ module_param(emulate_invalid_guest_state, bool, S_IRUGO); static int __read_mostly vmm_exclusive = 1; module_param(vmm_exclusive, bool, S_IRUGO); +static int __read_mostly yield_on_hlt = 1; +module_param(yield_on_hlt, bool, S_IRUGO); + #define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST \ (X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD) #define KVM_GUEST_CR0_MASK \ @@ -1009,6 +1012,17 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu) vmx_set_interrupt_shadow(vcpu, 0); } +static void vmx_clear_hlt(struct kvm_vcpu *vcpu) +{ + /* Ensure that we clear the HLT state in the VMCS. We don't need to +* explicitly skip the instruction because if the HLT state is set, then +* the instruction is already executing and RIP has already been +* advanced. */ + if (!yield_on_hlt && + vmcs_read32(GUEST_ACTIVITY_STATE) == GUEST_ACTIVITY_HLT) + vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE); +} + static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr, bool has_error_code, u32 error_code, bool reinject) @@ -1035,6 +1049,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr, intr_info |= INTR_TYPE_HARD_EXCEPTION; vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info); + vmx_clear_hlt(vcpu); } static bool vmx_rdtscp_supported(void) @@ -1419,7 +1434,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) &_pin_based_exec_control) < 0) return -EIO; - min = CPU_BASED_HLT_EXITING | + min = #ifdef CONFIG_X86_64 CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING | @@ -1432,6 +1447,10 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) CPU_BASED_MWAIT_EXITING | CPU_BASED_MONITOR_EXITING | CPU_BASED_INVLPG_EXITING; + + if (yield_on_hlt) + min |= CPU_BASED_HLT_EXITING; + opt = CPU_BASED_TPR_SHADOW | CPU_BASED_USE_MSR_BITMAPS | CPU_BASED_ACTIVATE_SECONDARY_CONTROLS; @@ -2728,7 +2747,7 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu) vmcs_writel(GUEST_IDTR_BASE, 0); vmcs_write32(GUEST_IDTR_LIMIT, 0x); - vmcs_write32(GUEST_ACTIVITY_STATE, 0); + vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE); vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0); vmcs_write32(GUEST_PENDING_DBG_EXCEPTIONS, 0); @@ -2821,6 +2840,7 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu) } else intr |= INTR_TYPE_EXT_INTR; vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr); + vmx_clear_hlt(vcpu); } static void vmx_inject_nmi(struct kvm_vcpu *vcpu) @@ -2848,6 +2868,7 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu) } vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR); + vmx_clear_hlt(vcpu); } static int vmx_nmi_allowed(struct kvm_vcpu *vcpu) -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: Fix OSXSAVE after migration
From: Sheng Yang CPUID's OSXSAVE is a mirror of CR4.OSXSAVE bit. We need to update the CPUID after migration. KVM-Stable-Tag. Signed-off-by: Sheng Yang Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 018bb70..bb04957 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5585,6 +5585,8 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, mmu_reset_needed |= kvm_read_cr4(vcpu) != sregs->cr4; kvm_x86_ops->set_cr4(vcpu, sregs->cr4); + if (sregs->cr4 & X86_CR4_OSXSAVE) + update_cpuid(vcpu); if (!is_long_mode(vcpu) && is_pae(vcpu)) { load_pdptrs(vcpu, vcpu->arch.walk_mmu, vcpu->arch.cr3); mmu_reset_needed = 1; -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: MMU: retry #PF for softmmu
From: Xiao Guangrong Retry #PF for softmmu only when the current vcpu has the same cr3 as the time when #PF occurs Signed-off-by: Xiao Guangrong Signed-off-by: Avi Kivity diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index f7e5066..b55d789 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -593,6 +593,7 @@ struct kvm_x86_ops { struct kvm_arch_async_pf { u32 token; gfn_t gfn; + unsigned long cr3; bool direct_map; }; diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 04f9033..1a953ac 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2607,9 +2607,11 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva, static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn) { struct kvm_arch_async_pf arch; + arch.token = (vcpu->arch.apf.id++ << 12) | vcpu->vcpu_id; arch.gfn = gfn; arch.direct_map = vcpu->arch.mmu.direct_map; + arch.cr3 = vcpu->arch.mmu.get_cr3(vcpu); return kvm_setup_async_pf(vcpu, gva, gfn, &arch); } diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 52b3e91..146b681 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -438,7 +438,8 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw, static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, struct guest_walker *gw, int user_fault, int write_fault, int hlevel, -int *ptwrite, pfn_t pfn, bool map_writable) +int *ptwrite, pfn_t pfn, bool map_writable, +bool prefault) { unsigned access = gw->pt_access; struct kvm_mmu_page *sp = NULL; @@ -512,7 +513,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, mmu_set_spte(vcpu, it.sptep, access, gw->pte_access & access, user_fault, write_fault, dirty, ptwrite, it.level, -gw->gfn, pfn, false, map_writable); +gw->gfn, pfn, prefault, map_writable); FNAME(pte_prefetch)(vcpu, gw, it.sptep); return it.sptep; @@ -568,8 +569,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code, */ if (!r) { pgprintk("%s: guest page fault\n", __func__); - inject_page_fault(vcpu, &walker.fault); - vcpu->arch.last_pt_write_count = 0; /* reset fork detector */ + if (!prefault) { + inject_page_fault(vcpu, &walker.fault); + /* reset fork detector */ + vcpu->arch.last_pt_write_count = 0; + } return 0; } @@ -599,7 +603,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code, trace_kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT); kvm_mmu_free_some_pages(vcpu); sptep = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault, -level, &write_pt, pfn, map_writable); +level, &write_pt, pfn, map_writable, prefault); (void)sptep; pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __func__, sptep, *sptep, write_pt); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ed373ba..018bb70 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6183,7 +6183,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) { int r; - if (!vcpu->arch.mmu.direct_map || !work->arch.direct_map || + if ((vcpu->arch.mmu.direct_map != work->arch.direct_map) || is_error_page(work->page)) return; @@ -6191,6 +6191,10 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) if (unlikely(r)) return; + if (!vcpu->arch.mmu.direct_map && + work->arch.cr3 != vcpu->arch.mmu.get_cr3(vcpu)) + return; + vcpu->arch.mmu.page_fault(vcpu, work->gva, 0, true); } -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: SVM: Use svm_flush_tlb instead of force_new_asid
From: Joerg Roedel This patch replaces all calls to force_new_asid which are intended to flush the guest-tlb by the more appropriate function svm_flush_tlb. As a side-effect the force_new_asid function is removed. Signed-off-by: Joerg Roedel Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 16334bb..b4aad21 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -421,11 +421,6 @@ static inline void invlpga(unsigned long addr, u32 asid) asm volatile (__ex(SVM_INVLPGA) : : "a"(addr), "c"(asid)); } -static inline void force_new_asid(struct kvm_vcpu *vcpu) -{ - to_svm(vcpu)->asid_generation--; -} - static int get_npt_level(void) { #ifdef CONFIG_X86_64 @@ -999,7 +994,7 @@ static void init_vmcb(struct vcpu_svm *svm) save->cr3 = 0; save->cr4 = 0; } - force_new_asid(&svm->vcpu); + svm->asid_generation = 0; svm->nested.vmcb = 0; svm->vcpu.arch.hflags = 0; @@ -1419,7 +1414,7 @@ static void svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) unsigned long old_cr4 = to_svm(vcpu)->vmcb->save.cr4; if (npt_enabled && ((old_cr4 ^ cr4) & X86_CR4_PGE)) - force_new_asid(vcpu); + svm_flush_tlb(vcpu); vcpu->arch.cr4 = cr4; if (!npt_enabled) @@ -1762,7 +1757,7 @@ static void nested_svm_set_tdp_cr3(struct kvm_vcpu *vcpu, svm->vmcb->control.nested_cr3 = root; mark_dirty(svm->vmcb, VMCB_NPT); - force_new_asid(vcpu); + svm_flush_tlb(vcpu); } static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu, @@ -2366,7 +2361,7 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm) svm->nested.intercept_exceptions = nested_vmcb->control.intercept_exceptions; svm->nested.intercept= nested_vmcb->control.intercept; - force_new_asid(&svm->vcpu); + svm_flush_tlb(&svm->vcpu); svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK; if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK) svm->vcpu.arch.hflags |= HF_VINTR_MASK; @@ -3308,7 +3303,7 @@ static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr) static void svm_flush_tlb(struct kvm_vcpu *vcpu) { - force_new_asid(vcpu); + to_svm(vcpu)->asid_generation--; } static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu) @@ -3560,7 +3555,7 @@ static void svm_set_cr3(struct kvm_vcpu *vcpu, unsigned long root) svm->vmcb->save.cr3 = root; mark_dirty(svm->vmcb, VMCB_CR); - force_new_asid(vcpu); + svm_flush_tlb(vcpu); } static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root) @@ -3574,7 +3569,7 @@ static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root) svm->vmcb->save.cr3 = vcpu->arch.cr3; mark_dirty(svm->vmcb, VMCB_CR); - force_new_asid(vcpu); + svm_flush_tlb(vcpu); } static int is_disabled(void) -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: SVM: Implement Flush-By-Asid feature
From: Joerg Roedel This patch adds the new flush-by-asid of upcoming AMD processors to the KVM-AMD module. Signed-off-by: Joerg Roedel Signed-off-by: Avi Kivity diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h index 235dd73..82ecaa3 100644 --- a/arch/x86/include/asm/svm.h +++ b/arch/x86/include/asm/svm.h @@ -88,6 +88,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area { #define TLB_CONTROL_DO_NOTHING 0 #define TLB_CONTROL_FLUSH_ALL_ASID 1 +#define TLB_CONTROL_FLUSH_ASID 3 +#define TLB_CONTROL_FLUSH_ASID_LOCAL 7 #define V_TPR_MASK 0x0f diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index b4aad21..740884b 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3158,7 +3158,6 @@ static void pre_svm_run(struct vcpu_svm *svm) struct svm_cpu_data *sd = per_cpu(svm_data, cpu); - svm->vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING; /* FIXME: handle wraparound of asid_generation */ if (svm->asid_generation != sd->asid_generation) new_asid(svm, sd); @@ -3303,7 +3302,12 @@ static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr) static void svm_flush_tlb(struct kvm_vcpu *vcpu) { - to_svm(vcpu)->asid_generation--; + struct vcpu_svm *svm = to_svm(vcpu); + + if (static_cpu_has(X86_FEATURE_FLUSHBYASID)) + svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID; + else + svm->asid_generation--; } static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu) @@ -3527,6 +3531,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) svm->next_rip = 0; + svm->vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING; + /* if exit due to PF check for async PF */ if (svm->vmcb->control.exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR) svm->apf_reason = kvm_read_and_reset_pf_reason(); -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: SVM: Add clean-bit for GDT and IDT
From: Joerg Roedel This patch implements the clean-bit for the base and limit of the gdt and idt in the vmcb. Signed-off-by: Joerg Roedel Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 89be0d6..bb640ae 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -194,6 +194,7 @@ enum { VMCB_NPT,/* npt_en, nCR3, gPAT */ VMCB_CR, /* CR0, CR3, CR4, EFER */ VMCB_DR, /* DR6, DR7 */ + VMCB_DT, /* GDT, IDT */ VMCB_DIRTY_MAX, }; @@ -1304,6 +1305,7 @@ static void svm_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) svm->vmcb->save.idtr.limit = dt->size; svm->vmcb->save.idtr.base = dt->address ; + mark_dirty(svm->vmcb, VMCB_DT); } static void svm_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) @@ -1320,6 +1322,7 @@ static void svm_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) svm->vmcb->save.gdtr.limit = dt->size; svm->vmcb->save.gdtr.base = dt->address ; + mark_dirty(svm->vmcb, VMCB_DT); } static void svm_decache_cr0_guest_bits(struct kvm_vcpu *vcpu) -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: SVM: Remove flush_guest_tlb function
From: Joerg Roedel This function is unused and there is svm_flush_tlb which does the same. So this function can be removed. Signed-off-by: Joerg Roedel Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 05ae90a..16334bb 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -426,11 +426,6 @@ static inline void force_new_asid(struct kvm_vcpu *vcpu) to_svm(vcpu)->asid_generation--; } -static inline void flush_guest_tlb(struct kvm_vcpu *vcpu) -{ - force_new_asid(vcpu); -} - static int get_npt_level(void) { #ifdef CONFIG_X86_64 -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: MMU: fix accessed bit set on prefault path
From: Xiao Guangrong Retry #PF is the speculative path, so don't set the accessed bit Signed-off-by: Xiao Guangrong Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 4954de9..04f9033 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2214,7 +2214,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) } static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, - int map_writable, int level, gfn_t gfn, pfn_t pfn) + int map_writable, int level, gfn_t gfn, pfn_t pfn, + bool prefault) { struct kvm_shadow_walk_iterator iterator; struct kvm_mmu_page *sp; @@ -2229,7 +2230,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, pte_access &= ~ACC_WRITE_MASK; mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, pte_access, 0, write, 1, &pt_write, -level, gfn, pfn, false, map_writable); +level, gfn, pfn, prefault, map_writable); direct_pte_prefetch(vcpu, iterator.sptep); ++vcpu->stat.pf_fixed; break; @@ -2321,7 +2322,8 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn, if (mmu_notifier_retry(vcpu, mmu_seq)) goto out_unlock; kvm_mmu_free_some_pages(vcpu); - r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn); + r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn, +prefault); spin_unlock(&vcpu->kvm->mmu_lock); @@ -2684,7 +2686,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, goto out_unlock; kvm_mmu_free_some_pages(vcpu); r = __direct_map(vcpu, gpa, write, map_writable, -level, gfn, pfn); +level, gfn, pfn, prefault); spin_unlock(&vcpu->kvm->mmu_lock); return r; -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: MMU: rename 'no_apf' to 'prefault'
From: Xiao Guangrong It's the speculative path if 'no_apf = 1' and we will specially handle this speculative path in the later patch, so 'prefault' is better to fit the sense. Signed-off-by: Xiao Guangrong Signed-off-by: Avi Kivity diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index cfbcbfa..f7e5066 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -241,7 +241,8 @@ struct kvm_mmu { void (*new_cr3)(struct kvm_vcpu *vcpu); void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root); unsigned long (*get_cr3)(struct kvm_vcpu *vcpu); - int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err, bool no_apf); + int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err, + bool prefault); void (*inject_page_fault)(struct kvm_vcpu *vcpu, struct x86_exception *fault); void (*free)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index d75ba1e..4954de9 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2284,11 +2284,11 @@ static int kvm_handle_bad_page(struct kvm *kvm, gfn_t gfn, pfn_t pfn) return 1; } -static bool try_async_pf(struct kvm_vcpu *vcpu, bool no_apf, gfn_t gfn, +static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, gva_t gva, pfn_t *pfn, bool write, bool *writable); static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn, -bool no_apf) +bool prefault) { int r; int level; @@ -2310,7 +2310,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn, mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); - if (try_async_pf(vcpu, no_apf, gfn, v, &pfn, write, &map_writable)) + if (try_async_pf(vcpu, prefault, gfn, v, &pfn, write, &map_writable)) return 0; /* mmio */ @@ -2583,7 +2583,7 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr, } static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva, - u32 error_code, bool no_apf) + u32 error_code, bool prefault) { gfn_t gfn; int r; @@ -2599,7 +2599,7 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva, gfn = gva >> PAGE_SHIFT; return nonpaging_map(vcpu, gva & PAGE_MASK, -error_code & PFERR_WRITE_MASK, gfn, no_apf); +error_code & PFERR_WRITE_MASK, gfn, prefault); } static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn) @@ -2621,7 +2621,7 @@ static bool can_do_async_pf(struct kvm_vcpu *vcpu) return kvm_x86_ops->interrupt_allowed(vcpu); } -static bool try_async_pf(struct kvm_vcpu *vcpu, bool no_apf, gfn_t gfn, +static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, gva_t gva, pfn_t *pfn, bool write, bool *writable) { bool async; @@ -2633,7 +2633,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool no_apf, gfn_t gfn, put_page(pfn_to_page(*pfn)); - if (!no_apf && can_do_async_pf(vcpu)) { + if (!prefault && can_do_async_pf(vcpu)) { trace_kvm_try_async_get_page(gva, gfn); if (kvm_find_async_pf_gfn(vcpu, gfn)) { trace_kvm_async_pf_doublefault(gva, gfn); @@ -2649,7 +2649,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool no_apf, gfn_t gfn, } static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, - bool no_apf) + bool prefault) { pfn_t pfn; int r; @@ -2673,7 +2673,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); - if (try_async_pf(vcpu, no_apf, gfn, gpa, &pfn, write, &map_writable)) + if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable)) return 0; /* mmio */ diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index d5a0a11..52b3e91 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -539,7 +539,7 @@ out_gpte_changed: * a negative value on error. */ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code, -bool no_apf) +bool prefault) { int write_fault = error_code & PFERR_WRITE_MASK; int user_fault = error_code & PFERR_USER_MASK; @@ -581,7 +581,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code, mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); - if (try_async_pf(vcpu, no_apf, walker.gfn, addr, &pf
[COMMIT master] KVM: SVM: Add clean-bit for LBR state
From: Joerg Roedel This patch implements the clean-bit for all LBR related state. This includes the debugctl, br_from, br_to, last_excp_from, and last_excp_to msrs. Signed-off-by: Joerg Roedel Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index e5db339..05ae90a 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -197,6 +197,7 @@ enum { VMCB_DT, /* GDT, IDT */ VMCB_SEG,/* CS, DS, SS, ES, CPL */ VMCB_CR2,/* CR2 only */ + VMCB_LBR,/* DBGCTL, BR_FROM, BR_TO, LAST_EX_FROM, LAST_EX_TO */ VMCB_DIRTY_MAX, }; @@ -2847,6 +2848,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data) return 1; svm->vmcb->save.dbgctl = data; + mark_dirty(svm->vmcb, VMCB_LBR); if (data & (1ULL<<0)) svm_enable_lbrv(svm); else -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: SVM: Add clean-bit for CR2 register
From: Joerg Roedel This patch implements the clean-bit for the cr2 register in the vmcb. Signed-off-by: Joerg Roedel Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 85d3350..e5db339 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -196,11 +196,12 @@ enum { VMCB_DR, /* DR6, DR7 */ VMCB_DT, /* GDT, IDT */ VMCB_SEG,/* CS, DS, SS, ES, CPL */ + VMCB_CR2,/* CR2 only */ VMCB_DIRTY_MAX, }; -/* TPR is always written before VMRUN */ -#define VMCB_ALWAYS_DIRTY_MASK (1U << VMCB_INTR) +/* TPR and CR2 are always written before VMRUN */ +#define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2)) static inline void mark_all_dirty(struct vmcb *vmcb) { -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: SVM: Add clean-bit for DR6 and DR7
From: Joerg Roedel This patch implements the clean-bit for the dr6 and dr7 debug registers in the vmcb. Signed-off-by: Joerg Roedel Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 135727c..89be0d6 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -193,6 +193,7 @@ enum { VMCB_INTR, /* int_ctl, int_vector */ VMCB_NPT,/* npt_en, nCR3, gPAT */ VMCB_CR, /* CR0, CR3, CR4, EFER */ + VMCB_DR, /* DR6, DR7 */ VMCB_DIRTY_MAX, }; @@ -1484,6 +1485,8 @@ static void svm_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg) else svm->vmcb->save.dr7 = vcpu->arch.dr7; + mark_dirty(svm->vmcb, VMCB_DR); + update_db_intercept(vcpu); } @@ -1506,6 +1509,7 @@ static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value) struct vcpu_svm *svm = to_svm(vcpu); svm->vmcb->save.dr7 = value; + mark_dirty(svm->vmcb, VMCB_DR); } static int pf_interception(struct vcpu_svm *svm) -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: SVM: Add clean-bit for Segements and CPL
From: Joerg Roedel This patch implements the clean-bit defined for the cs, ds, ss, an es segemnts and the current cpl saved in the vmcb. Signed-off-by: Joerg Roedel Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index bb640ae..85d3350 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -195,6 +195,7 @@ enum { VMCB_CR, /* CR0, CR3, CR4, EFER */ VMCB_DR, /* DR6, DR7 */ VMCB_DT, /* GDT, IDT */ + VMCB_SEG,/* CS, DS, SS, ES, CPL */ VMCB_DIRTY_MAX, }; @@ -1457,6 +1458,7 @@ static void svm_set_segment(struct kvm_vcpu *vcpu, = (svm->vmcb->save.cs.attrib >> SVM_SELECTOR_DPL_SHIFT) & 3; + mark_dirty(svm->vmcb, VMCB_SEG); } static void update_db_intercept(struct kvm_vcpu *vcpu) -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: SVM: Add clean-bit for control registers
From: Joerg Roedel This patch implements the CRx clean-bit for the vmcb. This bit covers cr0, cr3, cr4, and efer. Signed-off-by: Joerg Roedel Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 2a63dfa..135727c 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -192,6 +192,7 @@ enum { VMCB_ASID, /* ASID */ VMCB_INTR, /* int_ctl, int_vector */ VMCB_NPT,/* npt_en, nCR3, gPAT */ + VMCB_CR, /* CR0, CR3, CR4, EFER */ VMCB_DIRTY_MAX, }; @@ -441,6 +442,7 @@ static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer) efer &= ~EFER_LME; to_svm(vcpu)->vmcb->save.efer = efer | EFER_SVME; + mark_dirty(to_svm(vcpu)->vmcb, VMCB_CR); } static int is_external_interrupt(u32 info) @@ -1338,6 +1340,7 @@ static void update_cr0_intercept(struct vcpu_svm *svm) *hcr0 = (*hcr0 & ~SVM_CR0_SELECTIVE_MASK) | (gcr0 & SVM_CR0_SELECTIVE_MASK); + mark_dirty(svm->vmcb, VMCB_CR); if (gcr0 == *hcr0 && svm->vcpu.fpu_active) { clr_cr_intercept(svm, INTERCEPT_CR0_READ); @@ -1404,6 +1407,7 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) */ cr0 &= ~(X86_CR0_CD | X86_CR0_NW); svm->vmcb->save.cr0 = cr0; + mark_dirty(svm->vmcb, VMCB_CR); update_cr0_intercept(svm); } @@ -1420,6 +1424,7 @@ static void svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) cr4 |= X86_CR4_PAE; cr4 |= host_cr4_mce; to_svm(vcpu)->vmcb->save.cr4 = cr4; + mark_dirty(to_svm(vcpu)->vmcb, VMCB_CR); } static void svm_set_segment(struct kvm_vcpu *vcpu, @@ -3547,6 +3552,7 @@ static void svm_set_cr3(struct kvm_vcpu *vcpu, unsigned long root) struct vcpu_svm *svm = to_svm(vcpu); svm->vmcb->save.cr3 = root; + mark_dirty(svm->vmcb, VMCB_CR); force_new_asid(vcpu); } @@ -3559,6 +3565,7 @@ static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root) /* Also sync guest cr3 here in case we live migrate */ svm->vmcb->save.cr3 = vcpu->arch.cr3; + mark_dirty(svm->vmcb, VMCB_CR); force_new_asid(vcpu); } -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: SVM: Add clean-bit for NPT state
From: Joerg Roedel This patch implements the clean-bit for all nested paging related state in the vmcb. Signed-off-by: Joerg Roedel Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index b98092d..2a63dfa 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -191,6 +191,7 @@ enum { VMCB_PERM_MAP, /* IOPM Base and MSRPM Base */ VMCB_ASID, /* ASID */ VMCB_INTR, /* int_ctl, int_vector */ + VMCB_NPT,/* npt_en, nCR3, gPAT */ VMCB_DIRTY_MAX, }; @@ -1749,6 +1750,7 @@ static void nested_svm_set_tdp_cr3(struct kvm_vcpu *vcpu, struct vcpu_svm *svm = to_svm(vcpu); svm->vmcb->control.nested_cr3 = root; + mark_dirty(svm->vmcb, VMCB_NPT); force_new_asid(vcpu); } @@ -3553,6 +3555,7 @@ static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root) struct vcpu_svm *svm = to_svm(vcpu); svm->vmcb->control.nested_cr3 = root; + mark_dirty(svm->vmcb, VMCB_NPT); /* Also sync guest cr3 here in case we live migrate */ svm->vmcb->save.cr3 = vcpu->arch.cr3; -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: SVM: Add clean-bit for the ASID
From: Joerg Roedel This patch implements the clean-bit for the asid in the vmcb. Signed-off-by: Joerg Roedel Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 1802f7c..a3fd9ba 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -189,6 +189,7 @@ enum { VMCB_INTERCEPTS, /* Intercept vectors, TSC offset, pause filter count */ VMCB_PERM_MAP, /* IOPM Base and MSRPM Base */ + VMCB_ASID, /* ASID */ VMCB_DIRTY_MAX, }; @@ -1488,6 +1489,8 @@ static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd) svm->asid_generation = sd->asid_generation; svm->vmcb->control.asid = sd->next_asid++; + + mark_dirty(svm->vmcb, VMCB_ASID); } static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value) -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: SVM: Add clean-bit for interrupt state
From: Joerg Roedel This patch implements the clean-bit for all interrupt related state in the vmcb. This corresponds to vmcb offset 0x60-0x67. Signed-off-by: Joerg Roedel Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index a3fd9ba..b98092d 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -190,10 +190,12 @@ enum { pause filter count */ VMCB_PERM_MAP, /* IOPM Base and MSRPM Base */ VMCB_ASID, /* ASID */ + VMCB_INTR, /* int_ctl, int_vector */ VMCB_DIRTY_MAX, }; -#define VMCB_ALWAYS_DIRTY_MASK 0U +/* TPR is always written before VMRUN */ +#define VMCB_ALWAYS_DIRTY_MASK (1U << VMCB_INTR) static inline void mark_all_dirty(struct vmcb *vmcb) { @@ -2508,6 +2510,8 @@ static int clgi_interception(struct vcpu_svm *svm) svm_clear_vintr(svm); svm->vmcb->control.int_ctl &= ~V_IRQ_MASK; + mark_dirty(svm->vmcb, VMCB_INTR); + return 1; } @@ -2878,6 +2882,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm) kvm_make_request(KVM_REQ_EVENT, &svm->vcpu); svm_clear_vintr(svm); svm->vmcb->control.int_ctl &= ~V_IRQ_MASK; + mark_dirty(svm->vmcb, VMCB_INTR); /* * If the user space waits to inject interrupts, exit as soon as * possible @@ -3169,6 +3174,7 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq) control->int_ctl &= ~V_INTR_PRIO_MASK; control->int_ctl |= V_IRQ_MASK | ((/*control->int_vector >> 4*/ 0xf) << V_INTR_PRIO_SHIFT); + mark_dirty(svm->vmcb, VMCB_INTR); } static void svm_set_irq(struct kvm_vcpu *vcpu) -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: SVM: Add clean-bit for intercetps, tsc-offset and pause filter count
From: Joerg Roedel This patch adds the clean-bit for intercepts-vectors, the TSC offset and the pause-filter count to the appropriate places. The IO and MSR permission bitmaps are not subject to this bit. Signed-off-by: Joerg Roedel Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 0904c11..609f661 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -186,6 +186,8 @@ static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr, bool has_error_code, u32 error_code); enum { + VMCB_INTERCEPTS, /* Intercept vectors, TSC offset, + pause filter count */ VMCB_DIRTY_MAX, }; @@ -217,6 +219,8 @@ static void recalc_intercepts(struct vcpu_svm *svm) struct vmcb_control_area *c, *h; struct nested_state *g; + mark_dirty(svm->vmcb, VMCB_INTERCEPTS); + if (!is_guest_mode(&svm->vcpu)) return; @@ -854,6 +858,8 @@ static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset) } svm->vmcb->control.tsc_offset = offset + g_tsc_offset; + + mark_dirty(svm->vmcb, VMCB_INTERCEPTS); } static void svm_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment) @@ -863,6 +869,7 @@ static void svm_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment) svm->vmcb->control.tsc_offset += adjustment; if (is_guest_mode(vcpu)) svm->nested.hsave->control.tsc_offset += adjustment; + mark_dirty(svm->vmcb, VMCB_INTERCEPTS); } static void init_vmcb(struct vcpu_svm *svm) -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: SVM: Add clean-bit for IOPM_BASE and MSRPM_BASE
From: Joerg Roedel This patch adds the clean bit for the physical addresses of the MSRPM and the IOPM. It does not need to be set in the code because the only place where these values are changed is the nested-svm vmrun and vmexit path. These functions already mark the complete VMCB as dirty. Signed-off-by: Joerg Roedel Signed-off-by: Avi Kivity diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 609f661..1802f7c 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -188,6 +188,7 @@ static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr, enum { VMCB_INTERCEPTS, /* Intercept vectors, TSC offset, pause filter count */ + VMCB_PERM_MAP, /* IOPM Base and MSRPM Base */ VMCB_DIRTY_MAX, }; -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: SVM: Add clean-bits infrastructure code
From: Roedel, Joerg This patch adds the infrastructure for the implementation of the individual clean-bits. Signed-off-by: Joerg Roedel Signed-off-by: Avi Kivity diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h index 11dbca7..235dd73 100644 --- a/arch/x86/include/asm/svm.h +++ b/arch/x86/include/asm/svm.h @@ -79,7 +79,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area { u32 event_inj_err; u64 nested_cr3; u64 lbr_ctl; - u64 reserved_5; + u32 clean; + u32 reserved_5; u64 next_rip; u8 reserved_6[816]; }; diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index ae943bb..0904c11 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -185,6 +185,28 @@ static int nested_svm_vmexit(struct vcpu_svm *svm); static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr, bool has_error_code, u32 error_code); +enum { + VMCB_DIRTY_MAX, +}; + +#define VMCB_ALWAYS_DIRTY_MASK 0U + +static inline void mark_all_dirty(struct vmcb *vmcb) +{ + vmcb->control.clean = 0; +} + +static inline void mark_all_clean(struct vmcb *vmcb) +{ + vmcb->control.clean = ((1 << VMCB_DIRTY_MAX) - 1) + & ~VMCB_ALWAYS_DIRTY_MASK; +} + +static inline void mark_dirty(struct vmcb *vmcb, int bit) +{ + vmcb->control.clean &= ~(1 << bit); +} + static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu) { return container_of(vcpu, struct vcpu_svm, vcpu); @@ -973,6 +995,8 @@ static void init_vmcb(struct vcpu_svm *svm) set_intercept(svm, INTERCEPT_PAUSE); } + mark_all_dirty(svm->vmcb); + enable_gif(svm); } @@ -1089,6 +1113,7 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) if (unlikely(cpu != vcpu->cpu)) { svm->asid_generation = 0; + mark_all_dirty(svm->vmcb); } #ifdef CONFIG_X86_64 @@ -2140,6 +2165,8 @@ static int nested_svm_vmexit(struct vcpu_svm *svm) svm->vmcb->save.cpl = 0; svm->vmcb->control.exit_int_info = 0; + mark_all_dirty(svm->vmcb); + nested_svm_unmap(page); nested_svm_uninit_mmu_context(&svm->vcpu); @@ -2351,6 +2378,8 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm) enable_gif(svm); + mark_all_dirty(svm->vmcb); + return true; } @@ -3488,6 +3517,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_EXCP_BASE + MC_VECTOR)) svm_handle_mce(svm); + + mark_all_clean(svm->vmcb); } #undef R -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] pci: Error on PCI capability collisions
From: Alex Williamson Nothing good can happen when we overlap capabilities Signed-off-by: Alex Williamson Signed-off-by: Avi Kivity diff --git a/hw/pci.c b/hw/pci.c index b08113d..288d6fd 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1845,6 +1845,20 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, if (!offset) { return -ENOSPC; } +} else { +int i; + +for (i = offset; i < offset + size; i++) { +if (pdev->config_map[i]) { +fprintf(stderr, "ERROR: %04x:%02x:%02x.%x " +"Attempt to add PCI capability %x at offset " +"%x overlaps existing capability %x at offset %x\n", +pci_find_domain(pdev->bus), pci_bus_num(pdev->bus), +PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn), +cap_id, offset, pdev->config_map[i], i); +return -EFAULT; +} +} } config = pdev->config + offset; -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] device-assignment: Error checking when adding capabilities
From: Alex Williamson Signed-off-by: Alex Williamson Signed-off-by: Avi Kivity diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 1a90a89..0ae04de 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -1288,7 +1288,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev) { AssignedDevice *dev = container_of(pci_dev, AssignedDevice, dev); PCIRegion *pci_region = dev->real_device.regions; -int pos; +int ret, pos; /* Clear initial capabilities pointer and status copied from hw */ pci_set_byte(pci_dev->config + PCI_CAPABILITY_LIST, 0); @@ -1303,7 +1303,9 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev) if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI))) { dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI; /* Only 32-bit/no-mask currently supported */ -pci_add_capability(pci_dev, PCI_CAP_ID_MSI, pos, 10); +if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_MSI, pos, 10)) < 0) { +return ret; +} pci_set_word(pci_dev->config + pos + PCI_MSI_FLAGS, pci_get_word(pci_dev->config + pos + PCI_MSI_FLAGS) & @@ -1325,7 +1327,9 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev) uint32_t msix_table_entry; dev->cap.available |= ASSIGNED_DEVICE_CAP_MSIX; -pci_add_capability(pci_dev, PCI_CAP_ID_MSIX, pos, 12); +if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_MSIX, pos, 12)) < 0) { +return ret; +} pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS, pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) & -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] device-assignment: pass through and stub more PCI caps
From: Alex Williamson Some drivers depend on finding capabilities like power management, PCI express/X, vital product data, or vendor specific fields. Now that we have better capability support, we can pass more of these tables through to the guest. Note that VPD and VNDR are direct pass through capabilies, the rest are mostly empty shells with a few writable bits where necessary. It may be possible to consolidate dummy capabilities into common files for other drivers to use, but I prefer to leave them here for now as we figure out what bits to handle directly with hardware and what bits are purely emulated. Signed-off-by: Alex Williamson Signed-off-by: Avi Kivity diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 0ae04de..50c6408 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -67,6 +67,9 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t address, uint32_t val, int len); +static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev, +uint32_t address, int len); + static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region, uint32_t addr, int len, uint32_t *val) { @@ -370,11 +373,32 @@ static uint8_t assigned_dev_pci_read_byte(PCIDevice *d, int pos) return (uint8_t)assigned_dev_pci_read(d, pos, 1); } -static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap) +static void assigned_dev_pci_write(PCIDevice *d, int pos, uint32_t val, int len) +{ +AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev); +ssize_t ret; +int fd = pci_dev->real_device.config_fd; + +again: +ret = pwrite(fd, &val, len, pos); +if (ret != len) { + if ((ret < 0) && (errno == EINTR || errno == EAGAIN)) + goto again; + + fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n", + __func__, ret, errno); + + exit(1); +} + +return; +} + +static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap, uint8_t start) { int id; int max_cap = 48; -int pos = PCI_CAPABILITY_LIST; +int pos = start ? start : PCI_CAPABILITY_LIST; int status; status = assigned_dev_pci_read_byte(d, PCI_STATUS); @@ -453,10 +477,16 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address, ssize_t ret; AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev); +if (address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address]) { +val = assigned_device_pci_cap_read_config(d, address, len); +DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n", + (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len); +return val; +} + if (address < 0x4 || (pci_dev->need_emulate_cmd && address == 0x4) || (address >= 0x10 && address <= 0x24) || address == 0x30 || -address == 0x34 || address == 0x3c || address == 0x3d || -(address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address])) { +address == 0x34 || address == 0x3c || address == 0x3d) { val = pci_default_read_config(d, address, len); DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n", (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len); @@ -1251,7 +1281,70 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos) #endif #endif -static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t address, +/* There can be multiple VNDR capabilities per device, we need to find the + * one that starts closet to the given address without going over. */ +static uint8_t find_vndr_start(PCIDevice *pci_dev, uint32_t address) +{ +uint8_t cap, pos; + +for (cap = pos = 0; + (pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VNDR, pos)); + pos += PCI_CAP_LIST_NEXT) { +if (pos <= address) { +cap = MAX(pos, cap); +} +} +return cap; +} + +/* Merge the bits set in mask from mval into val. Both val and mval are + * at the same addr offset, pos is the starting offset of the mask. */ +static uint32_t merge_bits(uint32_t val, uint32_t mval, uint8_t addr, + int len, uint8_t pos, uint32_t mask) +{ +if (!ranges_overlap(addr, len, pos, 4)) { +return val; +} + +if (addr >= pos) { +mask >>= (addr - pos) * 8; +} else { +mask <<= (pos - addr) * 8; +} +mask &= 0xU >> (4 - len) * 8; + +val &= ~mask; +val |= (mval & mask); + +return val; +} + +static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev, +uint32_t address, int len) +{ +uint8_t cap, cap_id = pci_dev->config_map[address]; +uint32_t val; + +switch (cap_id) { + +case PCI_CA
[COMMIT master] pci: Remove PCI_CAPABILITY_CONFIG_*
From: Alex Williamson Half of these aren't used anywhere, the other half are wrong. Now that device assignment is trying to match physical hardware offsets for PCI capabilities, we can't round up the MSI and MSI-X length. MSI-X is always 12 bytes. MSI is variable length depending on features, but for the current device assignment implementation, it's always the minimum length of 10 bytes. Signed-off-by: Alex Williamson Signed-off-by: Avi Kivity diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 6d6e657..1a90a89 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -1302,10 +1302,9 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev) * MSI capability is the 1st capability in capability config */ if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI))) { dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI; -pci_add_capability(pci_dev, PCI_CAP_ID_MSI, pos, - PCI_CAPABILITY_CONFIG_MSI_LENGTH); - /* Only 32-bit/no-mask currently supported */ +pci_add_capability(pci_dev, PCI_CAP_ID_MSI, pos, 10); + pci_set_word(pci_dev->config + pos + PCI_MSI_FLAGS, pci_get_word(pci_dev->config + pos + PCI_MSI_FLAGS) & PCI_MSI_FLAGS_QMASK); @@ -1326,8 +1325,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev) uint32_t msix_table_entry; dev->cap.available |= ASSIGNED_DEVICE_CAP_MSIX; -pci_add_capability(pci_dev, PCI_CAP_ID_MSIX, pos, - PCI_CAPABILITY_CONFIG_MSIX_LENGTH); +pci_add_capability(pci_dev, PCI_CAP_ID_MSIX, pos, 12); pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS, pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) & diff --git a/hw/pci.h b/hw/pci.h index 34955d8..d579738 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -122,11 +122,6 @@ enum { QEMU_PCI_CAP_MULTIFUNCTION = (1 << QEMU_PCI_CAP_MULTIFUNCTION_BITNR), }; -#define PCI_CAPABILITY_CONFIG_MAX_LENGTH 0x60 -#define PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR 0x40 -#define PCI_CAPABILITY_CONFIG_MSI_LENGTH 0x10 -#define PCI_CAPABILITY_CONFIG_MSIX_LENGTH 0x10 - typedef int (*msix_mask_notifier_func)(PCIDevice *, unsigned vector, int masked); -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] device-assignment: Fix off-by-one in header check
From: Alex Williamson Include the first byte at 40h or else access might go to the hardware instead of the emulated config space, resulting in capability loops, since the ordering is different. Signed-off-by: Alex Williamson Signed-off-by: Avi Kivity diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 832c236..6d6e657 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -410,7 +410,7 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address, ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7), (uint16_t) address, val, len); -if (address > PCI_CONFIG_HEADER_SIZE && d->config_map[address]) { +if (address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address]) { return assigned_device_pci_cap_write_config(d, address, val, len); } @@ -456,7 +456,7 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address, if (address < 0x4 || (pci_dev->need_emulate_cmd && address == 0x4) || (address >= 0x10 && address <= 0x24) || address == 0x30 || address == 0x34 || address == 0x3c || address == 0x3d || -(address > PCI_CONFIG_HEADER_SIZE && d->config_map[address])) { +(address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address])) { val = pci_default_read_config(d, address, len); DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n", (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len); -- To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html