Re: [PULL 0/5] ppc patch queue 2013-04-17
On Wed, Apr 17, 2013 at 09:04:04PM +0200, Alexander Graf wrote:
> Hi Marcelo / Gleb,
>
> This is my current patch queue for ppc. Please pull for 3.10.
>
Pulled. Thanks.

> Changes include:
>
>   - KVM: PPC: Fix in-kernel MMIO loads
>   - KVM: PPC: BookE: Fix 64-bit guest kernels with SMP
>
> Alex
>
> The following changes since commit 79558f112fc0352e057f7b5e158e3d88b8b62c60:
>
>   Alexander Graf (1):
>         KVM: ARM: Fix kvm_vm_ioctl_irq_line
>
> are available in the git repository at:
>
>   git://github.com/agraf/linux-2.6.git kvm-ppc-next
>
> Alexander Graf (2):
>       Merge commit 'origin/next' into kvm-ppc-next
>       Merge commit 'origin/next' into kvm-ppc-next
>
> Bharat Bhushan (1):
>       Added ONE_REG interface for debug instruction
>
> Scott Wood (1):
>       kvm/ppc: don't call complete_mmio_load when it's a store
>
> Stuart Yoder (1):
>       KVM: PPC: emulate dcbst
>
>  arch/powerpc/include/asm/kvm_book3s.h |    2 ++
>  arch/powerpc/include/asm/kvm_booke.h  |    2 ++
>  arch/powerpc/include/uapi/asm/kvm.h   |    4 ++++
>  arch/powerpc/kvm/book3s.c             |    6 ++++++
>  arch/powerpc/kvm/booke.c              |    6 ++++++
>  arch/powerpc/kvm/emulate.c            |    2 ++
>  arch/powerpc/kvm/powerpc.c            |    1 -
>  7 files changed, 22 insertions(+), 1 deletions(-)

--
	Gleb.
Re: [PATCH 0/11] KVM: nVMX: shadow VMCS support, v5
On Thu, Apr 18, 2013 at 02:34:25PM +0300, Abel Gordon wrote:
> This series of patches implements shadow-vmcs capability for nested VMX.
>
Applied, thanks.

> Shadow-vmcs - background and overview:
>
> In Intel VMX, vmread and vmwrite privileged instructions are used by the
> hypervisor to read and modify the guest and host specifications (VMCS). In a
> nested virtualization environment, L1 executes multiple vmread and vmwrite
> instructions to handle a single L2 exit. Each vmread and vmwrite executed by
> L1 traps (causes an exit) to the L0 hypervisor (KVM). L0 emulates the
> instruction behaviour and resumes L1 execution.
>
> Removing the need to trap and emulate these special instructions reduces the
> number of exits and improves nested virtualization performance. As it was
> first evaluated in [1], exit-less vmread and vmwrite can reduce nested
> virtualization overhead up to 40%.
>
> Intel introduced a new feature to their processors called shadow-vmcs.
> Using shadow-vmcs, L0 can configure the processor to let L1, running in
> guest-mode, access VMCS12 fields using vmread and vmwrite instructions
> without causing an exit to L0. The VMCS12 fields' data is stored in a
> shadow-vmcs controlled by L0.
>
> Shadow-vmcs - design considerations:
>
> A shadow-vmcs is processor-dependent and must be accessed by L0 or L1 using
> vmread and vmwrite instructions. With nested virtualization we aim to
> abstract the hardware from the L1 hypervisor. Thus, to avoid hardware
> dependencies we preferred to keep the software-defined VMCS12 format as part
> of L1 address space and hold the processor-specific shadow-vmcs format only
> in L0 address space. In other words, the shadow-vmcs is used by L0 as an
> accelerator but the format and content is never exposed to L1 directly.
> L0 syncs the content of the processor-specific shadow vmcs with the content
> of the software-controlled VMCS12 format.
>
> We could have kept the processor-specific shadow-vmcs format in L1 address
> space to avoid using the software-defined VMCS12 format; however, this type
> of design/implementation would have created hardware dependencies and
> complicated other capabilities (e.g. live migration of L1).
>
> Changes since v1:
> 1) Added sync_shadow_vmcs flag used to indicate when the content of VMCS12
>    must be copied to the shadow vmcs. The flag value is checked during
>    vmx_vcpu_run.
> 2) Code quality improvements
>
> Changes since v2:
> 1) Allocate shadow vmcs only once per VCPU on handle_vmxon and re-use the
>    same instance for multiple VMCS12s
> 2) More code quality improvements
>
> Changes since v3:
> 1) Fixed VMXON emulation (new patch). Previous nVMX code didn't verify if
>    L1 is already in root mode (VMXON was previously called). Now we call
>    nested_vmx_failValid if VMX is already ON. This is required to avoid
>    host leaks (due to shadow vmcs allocation) if L1 repeatedly executes
>    VMXON.
> 2) Improved comment: clarified we do not shadow fields that are modified
>    when L1 executes vmx instructions like the VM_INSTRUCTION_ERROR field.
>
> Changes since v4:
> 1) Fixed free_nested: we now free the shadow vmcs also when there is no
>    current vmcs.
>
> Acknowledgments:
> Many thanks to
> Natapov, Gleb g...@redhat.com
> Xu, Dongxiao dongxiao...@intel.com
> Nakajima, Jun jun.nakaj...@intel.com
> Har'El, Nadav na...@harel.org.il
> for the insightful discussions, comments and reviews.
> These patches were easily created and maintained using
>
> Patchouli -- patch creator
> http://patchouli.sourceforge.net/
>
> [1] The Turtles Project: Design and Implementation of Nested Virtualization,
> http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf

--
	Gleb.
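To make the sync direction described above concrete, here is a minimal sketch of the copy-back step, L0 pulling the shadowed fields back into the software VMCS12 -- an illustration only, not the applied series; the field table and helper names (shadow_fields[], num_shadow_fields, shadow_vmcs) are assumptions, with vmcs_load()/vmcs_readl()/vmcs12_write_any() standing in for the usual VMX accessors:

/*
 * Sketch only -- not the applied patches; names are assumptions.
 * After the CPU has let L1 vmwrite into the hardware shadow VMCS
 * without exiting, L0 copies the shadowed fields back into the
 * software VMCS12 before it needs a consistent view (e.g. before
 * emulating a nested VM-exit).
 */
static void sync_shadow_to_vmcs12(struct vcpu_vmx *vmx)
{
	int i;

	/* make the shadow VMCS the current VMCS so vmread hits it */
	vmcs_load(vmx->nested.shadow_vmcs);

	for (i = 0; i < num_shadow_fields; i++) {
		unsigned long field = shadow_fields[i];

		/* vmread from the hardware shadow, store into VMCS12 */
		vmcs12_write_any(&vmx->vcpu, field, vmcs_readl(field));
	}

	/* switch back to the VMCS used to actually run L2 */
	vmcs_load(vmx->loaded_vmcs->vmcs);
}

The reverse direction (VMCS12 to shadow) runs before resuming L1, which is what the sync_shadow_vmcs flag in the v1 changelog gates.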
Re: [PATCH] KVM: x86: fix error return code in kvm_arch_vcpu_init()
On Thu, Apr 18, 2013 at 07:41:00AM +0800, Wei Yongjun wrote:
> From: Wei Yongjun yongjun_...@trendmicro.com.cn
>
> Fix to return a negative error code from the error handling case instead
> of 0, as returned elsewhere in this function.
>
> Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn
>
Applied, thanks.

> ---
>  arch/x86/kvm/x86.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e172132..c45d656 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6706,8 +6706,10 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>  	}
>  	vcpu->arch.mcg_cap = KVM_MAX_MCE_BANKS;
>
> -	if (!zalloc_cpumask_var(&vcpu->arch.wbinvd_dirty_mask, GFP_KERNEL))
> +	if (!zalloc_cpumask_var(&vcpu->arch.wbinvd_dirty_mask, GFP_KERNEL)) {
> +		r = -ENOMEM;
>  		goto fail_free_mce_banks;
> +	}
>
>  	r = fx_init(vcpu);
>  	if (r)

--
	Gleb.
Re: [PATCHv3] KVM: x86: Fix memory leak in vmx.c
On Thu, Apr 18, 2013 at 09:38:14AM -0700, Andrew Honig wrote:
> If userspace creates and destroys multiple VMs within the same process
> we leak 20k of memory in the userspace process context per VM. This
> patch frees the memory in kvm_arch_destroy_vm. If the process exits
> without closing the VM file descriptor or the file descriptor has been
> shared with another process then we don't free the memory.
>
> It's still possible for a user space process to leak memory if the last
> process to close the fd for the VM is not the process that created it.
> However, this is an unexpected case that's only caused by a user space
> process that's misbehaving.
>
> Signed-off-by: Andrew Honig aho...@google.com
>
Applied, thanks.

> ---
>  arch/x86/kvm/x86.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8ffac42..3b389bf 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6856,6 +6856,23 @@ void kvm_arch_sync_events(struct kvm *kvm)
>
>  void kvm_arch_destroy_vm(struct kvm *kvm)
>  {
> +	if (current->mm == kvm->mm) {
> +		/*
> +		 * Free memory regions allocated on behalf of userspace,
> +		 * unless the memory map has changed due to process exit
> +		 * or fd copying.
> +		 */
> +		struct kvm_userspace_memory_region mem;
> +		memset(&mem, 0, sizeof(mem));
> +		mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
> +		kvm_set_memory_region(kvm, &mem);
> +
> +		mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
> +		kvm_set_memory_region(kvm, &mem);
> +
> +		mem.slot = TSS_PRIVATE_MEMSLOT;
> +		kvm_set_memory_region(kvm, &mem);
> +	}
>  	kvm_iommu_unmap_guest(kvm);
>  	kfree(kvm->arch.vpic);
>  	kfree(kvm->arch.vioapic);
> --
> 1.7.10.4

--
	Gleb.
Re: [PATCH v5 5/6] KVM: nVMX: Fix conditions for NMI injection
On Sun, Apr 14, 2013 at 09:04:26PM +0200, Jan Kiszka wrote:
> From: Jan Kiszka jan.kis...@siemens.com
>
> The logic for checking if interrupts can be injected has to be applied
> also on NMIs. The difference is that if NMI interception is on, these
> events are consumed and blocked by the VM exit.
>
> Signed-off-by: Jan Kiszka jan.kis...@siemens.com
>
Applied, thanks.

> ---
>  arch/x86/kvm/vmx.c | 26 ++++++++++++++++++++++++++
>  1 files changed, 26 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 74dea94..9ad30d8 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4190,6 +4190,12 @@ static bool nested_exit_on_intr(struct kvm_vcpu *vcpu)
>  		PIN_BASED_EXT_INTR_MASK;
>  }
>
> +static bool nested_exit_on_nmi(struct kvm_vcpu *vcpu)
> +{
> +	return get_vmcs12(vcpu)->pin_based_vm_exec_control &
> +		PIN_BASED_NMI_EXITING;
> +}
> +
>  static void enable_irq_window(struct kvm_vcpu *vcpu)
>  {
>  	u32 cpu_based_vm_exec_control;
> @@ -4315,6 +4321,26 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
>
>  static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
>  {
> +	if (is_guest_mode(vcpu)) {
> +		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> +
> +		if (to_vmx(vcpu)->nested.nested_run_pending)
> +			return 0;
> +		if (nested_exit_on_nmi(vcpu)) {
> +			nested_vmx_vmexit(vcpu);
> +			vmcs12->vm_exit_reason = EXIT_REASON_EXCEPTION_NMI;
> +			vmcs12->vm_exit_intr_info = NMI_VECTOR |
> +				INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK;
> +			/*
> +			 * The NMI-triggered VM exit counts as injection:
> +			 * clear this one and block further NMIs.
> +			 */
> +			vcpu->arch.nmi_pending = 0;
> +			vmx_set_nmi_mask(vcpu, true);
> +			return 0;
> +		}
> +	}
> +
>  	if (!cpu_has_virtual_nmis() && to_vmx(vcpu)->soft_vnmi_blocked)
>  		return 0;
> --
> 1.7.3.4

--
	Gleb.
Re: [PATCH 2/2] kvm/svm: emulate SVM_KEY MSR
On Mon, Apr 22, 2013 at 09:24:12AM +0530, Prasad Joshi wrote:
> On Sun, Apr 21, 2013 at 2:49 PM, Gleb Natapov g...@redhat.com wrote:
>> On Sun, Apr 21, 2013 at 02:12:21PM +0530, prasadjoshi.li...@gmail.com wrote:
>>> From: Prasad Joshi prasadjoshi.li...@gmail.com
>>>
>>> Write-only SVM_KEY can be used to create a password-protected mechanism
>>> to clear VM_CR.LOCK. This key can only be set when VM_CR.LOCK is unset.
>>> Once this key is set, a write to it compares the written value with the
>>> already-set key. The VM_CR.LOCK is unset only if the key matches. All
>>> reads from this MSR must return 0.
>>>
>>> Signed-off-by: Prasad Joshi prasadjoshi.li...@gmail.com
>>> ---
>>>  arch/x86/include/uapi/asm/msr-index.h |    1 +
>>>  arch/x86/kvm/svm.c                    |   31 +++++++++++++++++++++++++++
>>>  2 files changed, 32 insertions(+)
>>>
>>> diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
>>> index 892ce40..eda8346 100644
>>> --- a/arch/x86/include/uapi/asm/msr-index.h
>>> +++ b/arch/x86/include/uapi/asm/msr-index.h
>>> @@ -170,6 +170,7 @@
>>>  #define MSR_AMD64_PATCH_LEVEL	0x0000008b
>>>  #define MSR_AMD64_TSC_RATIO	0xc0000104
>>> +#define MSR_AMD64_SVM_KEY	0xc0010118
>>>  #define MSR_AMD64_NB_CFG	0xc001001f
>>>  #define MSR_AMD64_PATCH_LOADER	0xc0010020
>>>  #define MSR_AMD64_OSVW_ID_LENGTH	0xc0010140
>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>> index fcdfdea..73b2dc2 100644
>>> --- a/arch/x86/kvm/svm.c
>>> +++ b/arch/x86/kvm/svm.c
>>> @@ -150,6 +150,8 @@ struct vcpu_svm {
>>>
>>>  	struct nested_state nested;
>>>
>>> +	u64 svm_key; /* Write only SVM Key */
>>> +
>>>  	bool nmi_singlestep;
>>>
>>>  	unsigned int3_injected;
>>> @@ -3079,6 +3081,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
>>>  	case MSR_VM_CR:
>>>  		*data = svm->nested.vm_cr_msr;
>>>  		break;
>>> +	case MSR_AMD64_SVM_KEY:
>>> +		/*
>>> +		 * SVM Key is a password-protected mechanism to manage
>>> +		 * VM_CR.LOCK and therefore reading it must always return 0.
>>> +		 */
>>> +		*data = 0;
>>> +		break;
>>>  	case MSR_IA32_UCODE_REV:
>>>  		*data = 0x01000065;
>>>  		break;
>>> @@ -3132,6 +3141,26 @@ static int svm_set_vm_cr(struct kvm_vcpu *vcpu, u64 data)
>>>  	return 0;
>>>  }
>>>
>>> +static int svm_set_svm_key(struct kvm_vcpu *vcpu, u64 data)
>>> +{
>>> +	struct vcpu_svm *svm = to_svm(vcpu);
>>> +
>>> +	if (!boot_cpu_has(X86_FEATURE_SVML))
>>> +		return 1;
>>> +
>> Since the feature is emulated why do you care if host cpu supports it?
>
> Thanks Gleb. I changed this to checking the SVML feature on the guest
> CPU. Something like this should work:
>
> 	if (!(cpuid_edx(SVM_CPUID_FUNC) & SVM_FEATURE_SVML))
> 		return 1;
>
This still checks the host cpu. You need to use kvm_find_cpuid_entry().

>>> +	if (!(svm->nested.vm_cr_msr & SVM_VM_CR_SVM_LOCK_MASK) && data) {
>>> +		/* vm_cr.lock not set: set the password key */
>>> +		svm->svm_key = data;
>>> +	} else {
>>> +		/* vm_cr.lock is set */
>>> +		if (data && svm->svm_key == data) {
>>> +			/* password key matched: reset the lock */
>>> +			svm->nested.vm_cr_msr &= ~SVM_VM_CR_SVM_LOCK_MASK;
>>> +		}
>>> +	}
>>>
>> That's not enough. First of all KVM does not currently check SVMDIS
>> before allowing setting of EFER_SVME. Second, svm_set_vm_cr()
>> erroneously ignores writes into VM_CR SVMDIS/LOCK if SVMDIS is set,
>> while it should check LOCK.
>
> I was aware of these changes (BUGs), however I thought they must go in
> another patch.
>
It should go to another patch, but send them together with the series
that implements the LOCK feature. Without the fixes the LOCK feature
does not work.

> Any ways, I will include them in the next patch series. Thanks.
>
>> Third, when the write is done by userspace during reset the key check
>> should be ignored.
>
> I missed this part. I think CPU reset must set svm_key to 0 and reset
> the VM_CR.LOCK. Both of these can be part of init_vmcb() which is
> called during CPU reset. It should ensure that no key is registered
> during CPU reset.
>
init_vmcb() is called as part of kvm_vcpu_reset(). kvm_vcpu_reset() is
called during vcpu creation and when a vcpu receives an INIT signal (if
the irqchip is in kernel). An INIT signal does not cause LOCK to be
unset, otherwise it would have been pointless. Power-up reset is
emulated by userspace configuring the vcpu with reset state, and for
that userspace should be able to set any value there.

--
	Gleb.
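For reference, one way the guest-side check Gleb asks for could look -- a sketch, not Prasad's eventual patch. It assumes the SVM feature bits live in guest CPUID leaf 0x8000000a with SVML reported in EDX (per the AMD APM and svm.c's SVM_FEATURE_SVML definition); kvm_find_cpuid_entry() queries the CPUID table userspace configured for the guest, so it reflects the guest CPU rather than the host:

/*
 * Sketch, names/leaf are assumptions: does the *guest* CPU advertise
 * SVM-Lock? cpuid_edx() would execute CPUID on the host instead.
 */
static bool guest_has_svml(struct kvm_vcpu *vcpu)
{
	struct kvm_cpuid_entry2 *entry;

	entry = kvm_find_cpuid_entry(vcpu, 0x8000000a, 0);
	return entry && (entry->edx & SVM_FEATURE_SVML);
}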
Re: [PATCH v2 1/2] KVM: nVMX: Validate EFER values for VM_ENTRY/EXIT_LOAD_IA32_EFER
Il 20/04/2013 10:52, Jan Kiszka ha scritto:
> As we may emulate the loading of EFER on VM-entry and VM-exit, implement
> the checks that VMX performs on the guest and host values on vmlaunch/
> vmresume. Factor out kvm_valid_efer for this purpose which checks for
> set reserved bits.
>
> Signed-off-by: Jan Kiszka jan.kis...@siemens.com
> ---
>
> Changes in v2:
>  - refactored if clauses as requested by Paolo
>  - fixed typo in comment found by Marcelo
>
>  arch/x86/include/asm/kvm_host.h |    1 +
>  arch/x86/kvm/vmx.c              |   40 ++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c              |   29 ++++++++++++-------
>  3 files changed, 60 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 599f98b..18635ae 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -809,6 +809,7 @@ static inline int emulate_instruction(struct kvm_vcpu *vcpu,
>  }
>
>  void kvm_enable_efer_bits(u64);
> +bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
>  int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
>  int kvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 19aebc7..e3b951f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7327,6 +7327,7 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	int cpu;
>  	struct loaded_vmcs *vmcs02;
> +	bool ia32e;
>
>  	if (!nested_vmx_check_permission(vcpu) ||
>  	    !nested_vmx_check_vmcs12(vcpu))
> @@ -7415,6 +7416,45 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>  	}
>
>  	/*
> +	 * If the “load IA32_EFER” VM-entry control is 1, the following checks
> +	 * are performed on the field for the IA32_EFER MSR:
> +	 * - Bits reserved in the IA32_EFER MSR must be 0.
> +	 * - Bit 10 (corresponding to IA32_EFER.LMA) must equal the value of
> +	 *   the IA-32e mode guest VM-exit control. It must also be identical
> +	 *   to bit 8 (LME) if bit 31 in the CR0 field (corresponding to
> +	 *   CR0.PG) is 1.
> +	 */
> +	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER) {
> +		ia32e = (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE) != 0;
> +		if (!kvm_valid_efer(vcpu, vmcs12->guest_ia32_efer) ||
> +		    ia32e != !!(vmcs12->guest_ia32_efer & EFER_LMA) ||
> +		    ((vmcs12->guest_cr0 & X86_CR0_PG) &&
> +		     ia32e != !!(vmcs12->guest_ia32_efer & EFER_LME))) {
> +			nested_vmx_entry_failure(vcpu, vmcs12,
> +				EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);
> +			return 1;
> +		}
> +	}
> +
> +	/*
> +	 * If the load IA32_EFER VM-exit control is 1, bits reserved in the
> +	 * IA32_EFER MSR must be 0 in the field for that register. In addition,
> +	 * the values of the LMA and LME bits in the field must each be that of
> +	 * the host address-space size VM-exit control.
> +	 */
> +	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER) {
> +		ia32e = (vmcs12->vm_exit_controls &
> +			 VM_EXIT_HOST_ADDR_SPACE_SIZE) != 0;
> +		if (!kvm_valid_efer(vcpu, vmcs12->host_ia32_efer) ||
> +		    ia32e != !!(vmcs12->host_ia32_efer & EFER_LMA) ||
> +		    ia32e != !!(vmcs12->host_ia32_efer & EFER_LME)) {
> +			nested_vmx_entry_failure(vcpu, vmcs12,
> +				EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);
> +			return 1;
> +		}
> +	}

Looks good, difficult to do better with C's operator precedence rules.

Paolo

> +
> +	/*
>  	 * We're finally done with prerequisite checking, and can start with
>  	 * the nested entry.
>  	 */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 50e2e10..482784d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -845,23 +845,17 @@ static const u32 emulated_msrs[] = {
>  	MSR_IA32_MCG_CTL,
>  };
>
> -static int set_efer(struct kvm_vcpu *vcpu, u64 efer)
> +bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
>  {
> -	u64 old_efer = vcpu->arch.efer;
> -
>  	if (efer & efer_reserved_bits)
> -		return 1;
> -
> -	if (is_paging(vcpu)
> -	    && (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME))
> -		return 1;
> +		return false;
>
>  	if (efer & EFER_FFXSR) {
>  		struct kvm_cpuid_entry2 *feat;
>
>  		feat = kvm_find_cpuid_entry(vcpu, 0x80000001, 0);
>  		if (!feat || !(feat->edx & bit(X86_FEATURE_FXSR_OPT)))
> -			return 1;
> +			return false;
>  	}
>
>  	if (efer & EFER_SVME) {
> @@ -869,9 +863,24 @@ static int set_efer(struct kvm_vcpu *vcpu, u64 efer)
>
>  		feat = kvm_find_cpuid_entry(vcpu, 0x80000001, 0);
>  		if (!feat || !(feat->ecx
Re: [PATCH v7 2/3] tcm_vhost: Add helper to check if endpoint is setup
On Thu, Apr 18, 2013 at 10:38:23AM +0300, Michael S. Tsirkin wrote:
> On Thu, Apr 18, 2013 at 04:32:30PM +0800, Asias He wrote:
>> On Thu, Apr 18, 2013 at 10:09:53AM +0300, Michael S. Tsirkin wrote:
>>> On Thu, Apr 18, 2013 at 09:05:53AM +0800, Asias He wrote:
>>>> Signed-off-by: Asias He as...@redhat.com
>>>> ---
>>>>  drivers/vhost/tcm_vhost.c | 18 ++++++++++++++++++
>>>>  1 file changed, 18 insertions(+)
>>>>
>>>> diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
>>>> index 88ebb79..8f05528 100644
>>>> --- a/drivers/vhost/tcm_vhost.c
>>>> +++ b/drivers/vhost/tcm_vhost.c
>>>> @@ -111,6 +111,24 @@ static bool tcm_vhost_check_feature(struct vhost_scsi *vs, int feature)
>>>>  	return ret;
>>>>  }
>>>>
>>>> +static bool tcm_vhost_check_endpoint(struct vhost_virtqueue *vq)
>>>> +{
>>>> +	bool ret = false;
>>>> +
>>>> +	/*
>>>> +	 * We can handle the vq only after the endpoint is setup by calling the
>>>> +	 * VHOST_SCSI_SET_ENDPOINT ioctl.
>>>> +	 *
>>>> +	 * TODO: Check that we are running from vhost_worker which acts
>>>> +	 * as read-side critical section for vhost kind of RCU.
>>>> +	 * See the comments in struct vhost_virtqueue in drivers/vhost/vhost.h
>>>> +	 */
>>>> +	if (rcu_dereference_check(vq->private_data, 1))
>>>> +		ret = true;
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>> You can't use RCU in this way. You need to actually dereference.
>>> Instead, just move this within vq mutex and do
>>> rcu_dereference_protected. Document that this function requires vq
>>> mutex.
>>
>> Any difference with:
>>
>>    vs_tpg = rcu_dereference_check(vq->private_data, 1);
>>
> Yes, it's not the same. rcu_dereference_protected says this is
> generally RCU but here I use it under lock. rcu_dereference_check can
> only be used if you really dereference later, and you don't.

So you want the helper to be this?

static bool tcm_vhost_check_endpoint()
{
	struct tcm_vhost_tpg **vs_tpg;

	vs_tpg = rcu_dereference_check(vq->private_data, 1);
	if (vs_tpg)
		return true;
	else
		return false;
}

And what different code will the compiler generate with this:

static bool tcm_vhost_check_endpoint()
{
	if (rcu_dereference_check(vq->private_data, 1))
		return true;
	else
		return false;
}

Note, tcm_vhost_check_endpoint() is called in multiple places. Having a
helper is more convenient.

> I'm fine with the helper, but fix the implementation and fix all
> callers to have vq mutex when calling.

All the callers are in the vhost thread. Why does the vq mutex need to
be taken?

>>>>  static int tcm_vhost_check_true(struct se_portal_group *se_tpg)
>>>>  {
>>>>  	return 1;
>>>> --
>>>> 1.8.1.4
>>
>> --
>> Asias

--
Asias
Re: [PATCH -v2] kvm: Emulate MOVBE
Il 21/04/2013 14:23, Borislav Petkov ha scritto:
> On Sun, Apr 21, 2013 at 01:46:50PM +0200, Borislav Petkov wrote:
>> We probably need something with copying values to a temp variable or
>> so.
>
> Basically something like that:
>
> 	case 2:
> 		/*
> 		 * From MOVBE definition: "...When the operand size is 16 bits,
> 		 * the upper word of the destination register remains unchanged
> 		 * ..."
> 		 *
> 		 * Both casting ->valptr and ->val to u16 breaks strict aliasing
> 		 * rules so we have to do the operation almost per hand.
> 		 */
> 		tmp = (u16)ctxt->src.val;
> 		ctxt->dst.val &= ~0xffffUL;
> 		ctxt->dst.val |= (unsigned long)swab16(tmp);
> 		break;
>
> This passes all gcc checks, even the stricter ones when building with
> W=3.

I thought the valptr one was ok. I find this one more readable, too.
What does the generated code look like?

Paolo
Re: [PATCH v7 3/3] tcm_vhost: Add hotplug/hotunplug support
On Sat, Apr 20, 2013 at 10:01:31PM +0300, Michael S. Tsirkin wrote:
> On Fri, Apr 19, 2013 at 10:34:10AM +0800, Asias He wrote:
>> On Thu, Apr 18, 2013 at 11:21:54AM +0300, Michael S. Tsirkin wrote:
>>> On Thu, Apr 18, 2013 at 04:59:08PM +0800, Asias He wrote:
>>>> On Thu, Apr 18, 2013 at 10:34:29AM +0300, Michael S. Tsirkin wrote:
>>>>> On Thu, Apr 18, 2013 at 09:05:54AM +0800, Asias He wrote:
>>>>>> In commit 365a7150094 ([SCSI] virtio-scsi: hotplug support for
>>>>>> virtio-scsi), hotplug support is added to virtio-scsi. This patch
>>>>>> adds hotplug and hotunplug support to tcm_vhost.
>>>>>>
>>>>>> You can create or delete a LUN in targetcli to hotplug or hotunplug
>>>>>> a LUN in the guest.
>>>>>>
>>>>>> Changes in v7:
>>>>>> - Add vhost_work_flush for vs->vs_event_work to this series
>>>>>>
>>>>>> Changes in v6:
>>>>>> - Pass tcm_vhost_evt to tcm_vhost_do_evt_work
>>>>>>
>>>>>> Changes in v5:
>>>>>> - Switch to int from u64 for vs_events_nr
>>>>>> - Set vs->vs_events_dropped flag in tcm_vhost_allocate_evt
>>>>>> - Do not nest dev mutex within vq mutex
>>>>>> - Use vs_events_lock to protect vs_events_dropped and vs_events_nr
>>>>>> - Rebase to target/master
>>>>>>
>>>>>> Changes in v4:
>>>>>> - Drop tcm_vhost_check_endpoint in tcm_vhost_send_evt
>>>>>> - Add tcm_vhost_check_endpoint in vhost_scsi_evt_handle_kick
>>>>>>
>>>>>> Changes in v3:
>>>>>> - Separate the bug fix to another thread
>>>>>>
>>>>>> Changes in v2:
>>>>>> - Remove code duplication in tcm_vhost_{hotplug,hotunplug}
>>>>>> - Fix racing of vs_events_nr
>>>>>> - Add flush fix patch to this series
>>>>>>
>>>>>> Signed-off-by: Asias He as...@redhat.com
>>>>>> Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
>>>>>> ---
>>>>>>  drivers/vhost/tcm_vhost.c | 208 +++++++++++++++++++++++++++++++++-
>>>>>>  drivers/vhost/tcm_vhost.h |  10 +++
>>>>>>  2 files changed, 214 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
>>>>>> index 8f05528..da2021b 100644
>>>>>> --- a/drivers/vhost/tcm_vhost.c
>>>>>> +++ b/drivers/vhost/tcm_vhost.c
>>>>>> @@ -66,11 +66,13 @@ enum {
>>>>>>   * TODO: debug and remove the workaround.
>>>>>>   */
>>>>>>  enum {
>>>>>> -	VHOST_SCSI_FEATURES = VHOST_FEATURES & (~VIRTIO_RING_F_EVENT_IDX)
>>>>>> +	VHOST_SCSI_FEATURES = (VHOST_FEATURES & (~VIRTIO_RING_F_EVENT_IDX)) |
>>>>>> +			      (1ULL << VIRTIO_SCSI_F_HOTPLUG)
>>>>>>  };
>>>>>>
>>>>>>  #define VHOST_SCSI_MAX_TARGET	256
>>>>>>  #define VHOST_SCSI_MAX_VQ	128
>>>>>> +#define VHOST_SCSI_MAX_EVENT	128
>>>>>>
>>>>>>  struct vhost_scsi {
>>>>>>  	/* Protected by vhost_scsi->dev.mutex */
>>>>>> @@ -82,6 +84,13 @@ struct vhost_scsi {
>>>>>>
>>>>>>  	struct vhost_work vs_completion_work; /* cmd completion work item */
>>>>>>  	struct llist_head vs_completion_list; /* cmd completion queue */
>>>>>> +
>>>>>> +	struct vhost_work vs_event_work; /* evt injection work item */
>>>>>> +	struct llist_head vs_event_list; /* evt injection queue */
>>>>>> +
>>>>>> +	struct mutex vs_events_lock; /* protect vs_events_dropped,events_nr */
>>>>> Looking at this code, there are just so many locks now. This does not
>>>>> make me feel safe :) At least, please document lock nesting.
>>>> Do you see a real problem?
>>> Complexity is a real problem. My head already spins. No I don't see a
>>> bug, but we need to simplify locking. And I think I see a nice way to
>>> do this:
>>>
>>> 1. move away from a global work to per-event work - so no list
>>> 2. remove dynamic allocation of events - so no events_nr
>>> 3. set/test overrun flag under the appropriate vq mutex
>>>
>>> I think that's ideal. But we can move there in small steps. As a first
>>> step - why can't we always take the vq mutex lock and drop
>>> vs_events_lock?
>> There are really different ways to solve the same problem. You are
>> welcome to implement your ideas.
> I have a problem with this patch. It just seems too tricky, while it is
> really trying to do a very simple thing.

How tricky is it?

> There's new lock nesting.

I used dev mutex in the past, but it does not work because it introduces
a deadlock in vhost_scsi_set_endpoint.

vs_events is per-device state, so I think it deserves a per-device lock
instead of a per-queue lock.

> There are new places which do lock, read data, unlock, use data.

With vs_events_lock?

> There's new RCU.

It is not a *new* RCU. It uses the existing one: vq->private_data.

> All this feels fragile, and will be hard to maintain even though I
> don't see bugs. So I would prefer it if you tried to simplify this code
> making it easier to maintain. I sent some ideas on how to do this, but
> if you like, do it some other way.

I think it is pretty simple already.

> I also made some comments about the coding style but these are less
> important (except goto again I guess). If these will be the only
> issues, I'd say we can apply.

>>>>>> +	bool vs_events_dropped; /* any missed events */
>>>>>> +
Re: [PATCH v3 00/15] KVM: MMU: fast zap all shadow pages
On Sun, Apr 21, 2013 at 10:09:29PM +0800, Xiao Guangrong wrote:
> On 04/21/2013 09:03 PM, Gleb Natapov wrote:
>> On Tue, Apr 16, 2013 at 02:32:38PM +0800, Xiao Guangrong wrote:
>>> This patchset is based on my previous two patchsets:
>>> [PATCH 0/2] KVM: x86: avoid potential soft lockup and unneeded mmu reload
>>> (https://lkml.org/lkml/2013/4/1/2)
>>>
>>> [PATCH v2 0/6] KVM: MMU: fast invalid all mmio sptes
>>> (https://lkml.org/lkml/2013/4/1/134)
>>>
>>> Changelog:
>>> V3:
>>>   completely redesign the algorithm, please see below.
>>>
>> This looks pretty complicated. Is it still needed in order to avoid
>> soft lockups after the "avoid potential soft lockup and unneeded mmu
>> reload" patch?
>
> Yes.
>
> I discussed this point with Marcelo:
>
> ==
> BTW, to be honest, i do not think spin_needbreak is a good way - it does
> not fix the hot-lock contention and it just occupies more cpu time to
> avoid possible soft lock-ups.
>
> Especially, zap-all-shadow-pages can let other vcpus fault and vcpus
> contest mmu-lock, then zap-all-shadow-pages releases mmu-lock and waits,
> and other vcpus create page tables again. zap-all-shadow-pages needs a
> long time to finish; the worst case is, it can not be completed at all
> on intensive vcpu and memory usage.
>
> So what about a mixed approach: use generation numbers and reload roots
> to quickly invalidate all shadow pages and then do
> kvm_mmu_zap_all_invalid(). kvm_mmu_zap_all_invalid() is a new function
> that invalidates only shadow pages with a stale generation number (and
> uses the lock break technique). It may traverse active_mmu_pages from
> tail to head since new shadow pages will be added to the head of the
> list, or it may use the invalid slot rmap to find exactly what should
> be invalidated.
>
> I still think the right way to fix this kind of thing is optimization
> for mmu-lock.
> ==
>
> Which parts scare you? Let's find a way to optimize for it. ;). For
> example, if you do not like unmap_memslot_rmap_nolock(), we can
> simplify it - we can use walk_shadow_page_lockless_begin() and
> walk_shadow_page_lockless_end() to protect the spte instead of
> kvm->being_unmaped_rmap.

kvm->being_unmaped_rmap is particularly tricky, although it looks
correct. Additional indirection with rmap ops also does not help
following the code. I'd rather have "if (slot is invalid)" in a couple
of places where things should be done differently. In most places it
will be WARN_ON(slot is invalid).

--
	Gleb.
Re: [PATCH -v2] kvm: Emulate MOVBE
On Mon, Apr 22, 2013 at 10:53:42AM +0200, Paolo Bonzini wrote:
> Il 21/04/2013 14:23, Borislav Petkov ha scritto:
>> On Sun, Apr 21, 2013 at 01:46:50PM +0200, Borislav Petkov wrote:
>>> We probably need something with copying values to a temp variable or
>>> so.
>>
>> Basically something like that:
>>
>> 	case 2:
>> 		/*
>> 		 * From MOVBE definition: "...When the operand size is 16 bits,
>> 		 * the upper word of the destination register remains unchanged
>> 		 * ..."
>> 		 *
>> 		 * Both casting ->valptr and ->val to u16 breaks strict aliasing
>> 		 * rules so we have to do the operation almost per hand.
>> 		 */
>> 		tmp = (u16)ctxt->src.val;
>> 		ctxt->dst.val &= ~0xffffUL;
>> 		ctxt->dst.val |= (unsigned long)swab16(tmp);
>> 		break;
>>
>> This passes all gcc checks, even the stricter ones when building with
>> W=3.
>
> I thought the valptr one was ok.

Yep, it looked like that too. And, it could actually really be ok and
gcc's warning here is bogus. I'll try to talk to gcc people about it.

> I find this one more readable, too. What does the generated code look
> like?

Well, so so:

	movzwl	112(%rdi), %eax		# ctxt_5(D)->src.D.27823.val, tmp87
	movq	240(%rdi), %rdx		# ctxt_5(D)->dst.D.27823.val, tmp89
	xorw	%dx, %dx		# tmp89
	rolw	$8, %ax			#, tmp87
	movzwl	%ax, %eax		# tmp87, tmp91

I have a hard time understanding why it is adding this insn here - it
can simply drop it and continue with the 64-bit OR. It's not like it
changes anything...

	orq	%rdx, %rax		# tmp89, tmp91
	movq	%rax, 240(%rdi)		# tmp91, ctxt_5(D)->dst.D.27823.val

Btw, I wanted to ask: when kvm commits the results, does it look at
ctxt->op_bytes to know exactly how many bytes to write to the guest?
Because if it does, we can save ourselves the trouble here. Or does it
simply write both the full sizeof(unsigned long) bytes of ->src.val and
->dst.val to the guest?

Thanks.

--
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
Re: [PATCH -v2] kvm: Emulate MOVBE
On Mon, Apr 22, 2013 at 11:38:10AM +0200, Borislav Petkov wrote:
> On Mon, Apr 22, 2013 at 10:53:42AM +0200, Paolo Bonzini wrote:
>> Il 21/04/2013 14:23, Borislav Petkov ha scritto:
>>> On Sun, Apr 21, 2013 at 01:46:50PM +0200, Borislav Petkov wrote:
>>>> We probably need something with copying values to a temp variable
>>>> or so.
>>>
>>> Basically something like that:
>>>
>>> 	case 2:
>>> 		/*
>>> 		 * From MOVBE definition: "...When the operand size is 16 bits,
>>> 		 * the upper word of the destination register remains unchanged
>>> 		 * ..."
>>> 		 *
>>> 		 * Both casting ->valptr and ->val to u16 breaks strict aliasing
>>> 		 * rules so we have to do the operation almost per hand.
>>> 		 */
>>> 		tmp = (u16)ctxt->src.val;
>>> 		ctxt->dst.val &= ~0xffffUL;
>>> 		ctxt->dst.val |= (unsigned long)swab16(tmp);
>>> 		break;
>>>
>>> This passes all gcc checks, even the stricter ones when building with
>>> W=3.
>>
>> I thought the valptr one was ok.
>
> Yep, it looked like that too. And, it could actually really be ok and
> gcc's warning here is bogus. I'll try to talk to gcc people about it.
>
>> I find this one more readable, too. What does the generated code look
>> like?
>
> Well, so so:
>
> 	movzwl	112(%rdi), %eax		# ctxt_5(D)->src.D.27823.val, tmp87
> 	movq	240(%rdi), %rdx		# ctxt_5(D)->dst.D.27823.val, tmp89
> 	xorw	%dx, %dx		# tmp89
> 	rolw	$8, %ax			#, tmp87
> 	movzwl	%ax, %eax		# tmp87, tmp91
>
> I have a hard time understanding why it is adding this insn here - it
> can simply drop it and continue with the 64-bit OR. It's not like it
> changes anything...
>
> 	orq	%rdx, %rax		# tmp89, tmp91
> 	movq	%rax, 240(%rdi)		# tmp91, ctxt_5(D)->dst.D.27823.val
>
> Btw, I wanted to ask: when kvm commits the results, does it look at
> ctxt->op_bytes to know exactly how many bytes to write to the guest?
> Because if it does, we can save ourselves the trouble here. Or does it
> simply write both the full sizeof(unsigned long) bytes of ->src.val
> and ->dst.val to the guest?
>
No, it does this in case of a register operand:

static void write_register_operand(struct operand *op)
{
	/* The 4-byte case *is* correct: in 64-bit mode we zero-extend. */
	switch (op->bytes) {
	case 1:
		*(u8 *)op->addr.reg = (u8)op->val;
		break;
	case 2:
		*(u16 *)op->addr.reg = (u16)op->val;
		break;
	case 4:
		*op->addr.reg = (u32)op->val;	/* 64b: zero-extend */
		break;
	case 8:
		*op->addr.reg = op->val;
		break;
	}
}

--
	Gleb.
Re: [PATCH -v2] kvm: Emulate MOVBE
On Mon, Apr 22, 2013 at 12:42:46PM +0300, Gleb Natapov wrote:
>> Btw, I wanted to ask: when kvm commits the results, does it look at
>> ctxt->op_bytes to know exactly how many bytes to write to the guest?
>> Because if it does, we can save ourselves the trouble here. Or does it
>> simply write both the full sizeof(unsigned long) bytes of ->src.val
>> and ->dst.val to the guest?
>
> No, it does this in case of a register operand:
>
> static void write_register_operand(struct operand *op)
> {
> 	/* The 4-byte case *is* correct: in 64-bit mode we zero-extend. */
> 	switch (op->bytes) {
> 	case 1:
> 		*(u8 *)op->addr.reg = (u8)op->val;
> 		break;
> 	case 2:
> 		*(u16 *)op->addr.reg = (u16)op->val;
> 		break;
> 	case 4:
> 		*op->addr.reg = (u32)op->val;	/* 64b: zero-extend */
> 		break;
> 	case 8:
> 		*op->addr.reg = op->val;
> 		break;
> 	}
> }

Ok, and for OP_MEM it does look at ctxt->dst.bytes in writeback(),
AFAICT. And I see other emulated instructions like POPF, for example,
do this:

	ctxt->dst.bytes = ctxt->op_bytes;

Which means, we can drop all the bullshit in em_movbe and even destroy
some of the bytes in dst.val but only write out the correct ones. Which
means, simpler code and a lot less jumping through hoops.

Would that be the more accepted practice?

Thanks.

--
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
Re: [PATCH 1/2] KVM: nVMX: Validate EFER values for VM_ENTRY/EXIT_LOAD_IA32_EFER
On Sun, Apr 14, 2013 at 12:44:09PM +0200, Jan Kiszka wrote:
> From: Jan Kiszka jan.kis...@siemens.com
>
> As we may emulate the loading of EFER on VM-entry and VM-exit, implement
> the checks that VMX performs on the guest and host values on vmlaunch/
> vmresume. Factor out kvm_valid_efer for this purpose which checks for
> set reserved bits.
>
> Signed-off-by: Jan Kiszka jan.kis...@siemens.com
>
Applied v2 of 1/2 and 2/2.

> ---
>  arch/x86/include/asm/kvm_host.h |    1 +
>  arch/x86/kvm/vmx.c              |   38 ++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c              |   29 +++++++++++++-------
>  3 files changed, 58 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b2c7263..28a458f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -805,6 +805,7 @@ static inline int emulate_instruction(struct kvm_vcpu *vcpu,
>  }
>
>  void kvm_enable_efer_bits(u64);
> +bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
>  int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
>  int kvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index cc2ba55..0d13b29 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7257,6 +7257,44 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>  	}
>
>  	/*
> +	 * If the “load IA32_EFER” VM-entry control is 1, the following checks
> +	 * are performed on the field for the IA32_EFER MSR:
> +	 * - Bits reserved in the IA32_EFER MSR must be 0.
> +	 * - Bit 10 (corresponding to IA32_EFER.LMA) must equal the value of
> +	 *   the IA-32e mode guest VM-exit control. It must also be identical
> +	 *   to bit 8 (LME) if bit 31 in the CR0 field (corresponding to
> +	 *   CR0.PG) is 1.1
> +	 */
> +	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER &&
> +	    (!kvm_valid_efer(vcpu, vmcs12->guest_ia32_efer) ||
> +	     !!(vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE) !=
> +	     !!(vmcs12->guest_ia32_efer & EFER_LMA) ||
> +	     (vmcs12->guest_cr0 & X86_CR0_PG &&
> +	      !!(vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE) !=
> +	      !!(vmcs12->guest_ia32_efer & EFER_LME)))) {
> +		nested_vmx_entry_failure(vcpu, vmcs12,
> +			EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);
> +		return 1;
> +	}
> +
> +	/*
> +	 * If the load IA32_EFER VM-exit control is 1, bits reserved in the
> +	 * IA32_EFER MSR must be 0 in the field for that register. In addition,
> +	 * the values of the LMA and LME bits in the field must each be that of
> +	 * the host address-space size VM-exit control.
> +	 */
> +	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER &&
> +	    (!kvm_valid_efer(vcpu, vmcs12->host_ia32_efer) ||
> +	     !!(vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE) !=
> +	     !!(vmcs12->host_ia32_efer & EFER_LMA) ||
> +	     !!(vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE) !=
> +	     !!(vmcs12->host_ia32_efer & EFER_LME))) {
> +		nested_vmx_entry_failure(vcpu, vmcs12,
> +			EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);
> +		return 1;
> +	}
> +
> +	/*
>  	 * We're finally done with prerequisite checking, and can start with
>  	 * the nested entry.
>  	 */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index eb9927e..f248a3a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -845,23 +845,17 @@ static const u32 emulated_msrs[] = {
>  	MSR_IA32_MCG_CTL,
>  };
>
> -static int set_efer(struct kvm_vcpu *vcpu, u64 efer)
> +bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
>  {
> -	u64 old_efer = vcpu->arch.efer;
> -
>  	if (efer & efer_reserved_bits)
> -		return 1;
> -
> -	if (is_paging(vcpu)
> -	    && (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME))
> -		return 1;
> +		return false;
>
>  	if (efer & EFER_FFXSR) {
>  		struct kvm_cpuid_entry2 *feat;
>
>  		feat = kvm_find_cpuid_entry(vcpu, 0x80000001, 0);
>  		if (!feat || !(feat->edx & bit(X86_FEATURE_FXSR_OPT)))
> -			return 1;
> +			return false;
>  	}
>
>  	if (efer & EFER_SVME) {
> @@ -869,9 +863,24 @@ static int set_efer(struct kvm_vcpu *vcpu, u64 efer)
>
>  		feat = kvm_find_cpuid_entry(vcpu, 0x80000001, 0);
>  		if (!feat || !(feat->ecx & bit(X86_FEATURE_SVM)))
> -			return 1;
> +			return false;
>  	}
>
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(kvm_valid_efer);
> +
> +static int set_efer(struct kvm_vcpu *vcpu, u64 efer)
> +{
> +	u64 old_efer = vcpu->arch.efer;
> +
> +	if (!kvm_valid_efer(vcpu, efer))
> +		return 1;
> +
> +	if (is_paging(vcpu)
> +	    && (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME))
> +		return 1;
> +
> +	efer &= ~EFER_LMA;
Re: [PATCH -v2] kvm: Emulate MOVBE
On Mon, Apr 22, 2013 at 11:52:03AM +0200, Borislav Petkov wrote:
> On Mon, Apr 22, 2013 at 12:42:46PM +0300, Gleb Natapov wrote:
>>> Btw, I wanted to ask: when kvm commits the results, does it look at
>>> ctxt->op_bytes to know exactly how many bytes to write to the guest?
>>> Because if it does, we can save ourselves the trouble here. Or does
>>> it simply write both the full sizeof(unsigned long) bytes of
>>> ->src.val and ->dst.val to the guest?
>>
>> No, it does this in case of a register operand:
>>
>> static void write_register_operand(struct operand *op)
>> {
>> 	/* The 4-byte case *is* correct: in 64-bit mode we zero-extend. */
>> 	switch (op->bytes) {
>> 	case 1:
>> 		*(u8 *)op->addr.reg = (u8)op->val;
>> 		break;
>> 	case 2:
>> 		*(u16 *)op->addr.reg = (u16)op->val;
>> 		break;
>> 	case 4:
>> 		*op->addr.reg = (u32)op->val;	/* 64b: zero-extend */
>> 		break;
>> 	case 8:
>> 		*op->addr.reg = op->val;
>> 		break;
>> 	}
>> }
>
> Ok, and for OP_MEM it does look at ctxt->dst.bytes in writeback(),
> AFAICT. And I see other emulated instructions like POPF, for example,
> do this:
>
> 	ctxt->dst.bytes = ctxt->op_bytes;
>
> Which means, we can drop all the bullshit in em_movbe and even destroy
> some of the bytes in dst.val but only write out the correct ones.
> Which means, simpler code and a lot less jumping through hoops.
>
> Would that be the more accepted practice?
>
For most instructions the decoder already sets op->bytes to the correct
value, given that all flags are correctly specified in the opcode table.
Explicit op->bytes setting should be done only if it cannot be expressed
by opcode flags.

--
	Gleb.
Fwd: kvm
Hi

I was wondering if anyone could help me with an issue with KVM and
iSCSI. If we restart a controller on our EqualLogic SAN, or there are
any network interruptions on the storage network, KVM guests throw a
wobbler and their file systems go read-only (CentOS 5.9 guest with the
virtio driver).

I have read a few forums that indicate you can set disk timeout values
on the guests themselves, but this is not possible using the virtio
driver, which is what we are currently using.

Is there any way we can instruct KVM to pause the VMs if there is a
storage failure and resume them when the storage comes back online? We
are currently running CentOS 6.4. There seem to be werror='stop' and
rerror='stop' options to achieve this, but if I try to put these
options in the libvirt xml file for a VM, libvirt appears to be
removing them.

Thanks
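For anyone hitting the same problem: libvirt strips raw qemu options from the domain XML, but it exposes the same knobs as attributes on the disk <driver> element -- error_policy (maps to qemu's werror=) and rerror_policy (maps to rerror=). This is a sketch, with device paths as placeholders; verify the attributes against the libvirt version shipped with your distribution, since rerror_policy is newer than error_policy:

<disk type='block' device='disk'>
  <!-- error_policy ~ qemu werror=, rerror_policy ~ qemu rerror= -->
  <!-- 'stop' pauses the guest on I/O error; resume it with
       'virsh resume' (or via management tooling) once storage is back -->
  <driver name='qemu' type='raw' error_policy='stop' rerror_policy='stop'/>
  <source dev='/dev/disk/by-path/...'/>
  <target dev='vda' bus='virtio'/>
</disk>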
Re: Preemptable Ticket Spinlock
On Sun, 2013-04-21 at 17:12 -0400, Rik van Riel wrote:
> If we always incremented the ticket number by 2 (instead of 1), then
> we could use the lower bit of the ticket number as the spinlock.

ISTR that paravirt ticket locks already do that and use the lsb to
indicate the unlock needs to perform wakeups.

Also, since all of this is virt nonsense, shouldn't it live in the
paravirt ticket lock code and leave the native code as is?
Re: [PATCH v3 00/15] KVM: MMU: fast zap all shadow pages
On Sun, Apr 21, 2013 at 12:35:08PM -0300, Marcelo Tosatti wrote:
> On Sun, Apr 21, 2013 at 12:27:51PM -0300, Marcelo Tosatti wrote:
>> On Sun, Apr 21, 2013 at 04:03:46PM +0300, Gleb Natapov wrote:
>>> On Tue, Apr 16, 2013 at 02:32:38PM +0800, Xiao Guangrong wrote:
>>>> This patchset is based on my previous two patchsets:
>>>> [PATCH 0/2] KVM: x86: avoid potential soft lockup and unneeded mmu reload
>>>> (https://lkml.org/lkml/2013/4/1/2)
>>>>
>>>> [PATCH v2 0/6] KVM: MMU: fast invalid all mmio sptes
>>>> (https://lkml.org/lkml/2013/4/1/134)
>>>>
>>>> Changelog:
>>>> V3:
>>>>   completely redesign the algorithm, please see below.
>>>>
>>> This looks pretty complicated. Is it still needed in order to avoid
>>> soft lockups after the "avoid potential soft lockup and unneeded mmu
>>> reload" patch?
>>
>> Do not want kvm_set_memory (cases: DELETE/MOVE/CREATE) to be
>> susceptible to:
>>
>> vcpu 1                 |  kvm_set_memory
>> create shadow page
>>                           nuke shadow page
>> create shadow page
>>                           nuke shadow page
>>
>> Which is guest-triggerable behavior with the spinlock preemption
>> algorithm.
>
> Not only guest triggerable as in the sense of a malicious guest, but
> the condition above can be induced by host workload with a
> non-malicious guest system.

Is the problem that newly created shadow pages are immediately zapped?
Shouldn't the generation number / kvm_mmu_zap_all_invalid() idea
described here https://lkml.org/lkml/2013/4/22/111 solve this?

> Also kvm_set_memory being relatively fast with huge memory guests is
> nice (which is what Xiao's idea allows).

--
	Gleb.
Re: Preemptable Ticket Spinlock
On 04/22/2013 07:51 AM, Peter Zijlstra wrote:
> On Sun, 2013-04-21 at 17:12 -0400, Rik van Riel wrote:
>> If we always incremented the ticket number by 2 (instead of 1), then
>> we could use the lower bit of the ticket number as the spinlock.
>
> ISTR that paravirt ticket locks already do that and use the lsb to
> indicate the unlock needs to perform wakeups.
>
> Also, since all of this is virt nonsense, shouldn't it live in the
> paravirt ticket lock code and leave the native code as is?

Sure, but that is still no reason not to have the virt implementation
be as fast as possible, and share the same data type as the non-virt
implementation.

Also, is it guaranteed that the native spin_lock code has not been
called yet before we switch over to the paravirt functions?

If the native spin_lock code has been called already at that time, the
native code would still need to be modified to increment the ticket
number by 2, so we end up with a compatible value in each spin lock's
.tickets field, and prevent a deadlock after we switch over to the
paravirt variant.

--
All rights reversed
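To illustrate the increment-by-2 trick being discussed: stepping tickets by two means bit 0 never participates in the head/tail comparison, so it is free to carry a flag (in the paravirt case, the "unlock must do wakeups" bit Peter mentions). A toy userspace sketch with made-up names -- not the kernel's arch_spinlock_t:

#include <stdatomic.h>

#define TICKET_FLAG	1u	/* low bit reserved for a flag */
#define TICKET_INC	2u	/* tickets advance in steps of 2 */

struct ticketlock {
	_Atomic unsigned int head;	/* ticket currently being served */
	_Atomic unsigned int tail;	/* next ticket to hand out */
};

static void ticket_lock(struct ticketlock *lk)
{
	/* grab a ticket; mask out the flag bit a flag-setter may have set */
	unsigned int me = atomic_fetch_add(&lk->tail, TICKET_INC) & ~TICKET_FLAG;

	/* head only ever moves by 2, so its low bit stays clear */
	while (atomic_load(&lk->head) != me)
		;	/* cpu_relax() / pause in real code */
}

static void ticket_unlock(struct ticketlock *lk)
{
	atomic_fetch_add(&lk->head, TICKET_INC);
}

int main(void)
{
	struct ticketlock lk = { 0, 0 };

	ticket_lock(&lk);
	/* critical section */
	ticket_unlock(&lk);
	return 0;
}

This is also why mixed native/paravirt use needs a compatible layout, as Rik notes: if native code ever stepped tickets by 1, the two schemes would disagree about what a "served" ticket looks like.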
Re: [PATCH v7 3/3] tcm_vhost: Add hotplug/hotunplug support
On Mon, Apr 22, 2013 at 05:20:24PM +0800, Asias He wrote:
> On Sat, Apr 20, 2013 at 10:01:31PM +0300, Michael S. Tsirkin wrote:
>> On Fri, Apr 19, 2013 at 10:34:10AM +0800, Asias He wrote:
>>> On Thu, Apr 18, 2013 at 11:21:54AM +0300, Michael S. Tsirkin wrote:
>>>> On Thu, Apr 18, 2013 at 04:59:08PM +0800, Asias He wrote:
>>>>> On Thu, Apr 18, 2013 at 10:34:29AM +0300, Michael S. Tsirkin wrote:
>>>>>> On Thu, Apr 18, 2013 at 09:05:54AM +0800, Asias He wrote:
>>>>>>> In commit 365a7150094 ([SCSI] virtio-scsi: hotplug support for
>>>>>>> virtio-scsi), hotplug support is added to virtio-scsi. This patch
>>>>>>> adds hotplug and hotunplug support to tcm_vhost.
>>>>>>>
>>>>>>> You can create or delete a LUN in targetcli to hotplug or
>>>>>>> hotunplug a LUN in the guest.
>>>>>>>
>>>>>>> Changes in v7:
>>>>>>> - Add vhost_work_flush for vs->vs_event_work to this series
>>>>>>>
>>>>>>> Changes in v6:
>>>>>>> - Pass tcm_vhost_evt to tcm_vhost_do_evt_work
>>>>>>>
>>>>>>> Changes in v5:
>>>>>>> - Switch to int from u64 for vs_events_nr
>>>>>>> - Set vs->vs_events_dropped flag in tcm_vhost_allocate_evt
>>>>>>> - Do not nest dev mutex within vq mutex
>>>>>>> - Use vs_events_lock to protect vs_events_dropped and vs_events_nr
>>>>>>> - Rebase to target/master
>>>>>>>
>>>>>>> Changes in v4:
>>>>>>> - Drop tcm_vhost_check_endpoint in tcm_vhost_send_evt
>>>>>>> - Add tcm_vhost_check_endpoint in vhost_scsi_evt_handle_kick
>>>>>>>
>>>>>>> Changes in v3:
>>>>>>> - Separate the bug fix to another thread
>>>>>>>
>>>>>>> Changes in v2:
>>>>>>> - Remove code duplication in tcm_vhost_{hotplug,hotunplug}
>>>>>>> - Fix racing of vs_events_nr
>>>>>>> - Add flush fix patch to this series
>>>>>>>
>>>>>>> Signed-off-by: Asias He as...@redhat.com
>>>>>>> Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
>>>>>>> ---
>>>>>>>  drivers/vhost/tcm_vhost.c | 208 +++++++++++++++++++++++++++++++++-
>>>>>>>  drivers/vhost/tcm_vhost.h |  10 +++
>>>>>>>  2 files changed, 214 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
>>>>>>> index 8f05528..da2021b 100644
>>>>>>> --- a/drivers/vhost/tcm_vhost.c
>>>>>>> +++ b/drivers/vhost/tcm_vhost.c
>>>>>>> @@ -66,11 +66,13 @@ enum {
>>>>>>>   * TODO: debug and remove the workaround.
>>>>>>>   */
>>>>>>>  enum {
>>>>>>> -	VHOST_SCSI_FEATURES = VHOST_FEATURES & (~VIRTIO_RING_F_EVENT_IDX)
>>>>>>> +	VHOST_SCSI_FEATURES = (VHOST_FEATURES & (~VIRTIO_RING_F_EVENT_IDX)) |
>>>>>>> +			      (1ULL << VIRTIO_SCSI_F_HOTPLUG)
>>>>>>>  };
>>>>>>>
>>>>>>>  #define VHOST_SCSI_MAX_TARGET	256
>>>>>>>  #define VHOST_SCSI_MAX_VQ	128
>>>>>>> +#define VHOST_SCSI_MAX_EVENT	128
>>>>>>>
>>>>>>>  struct vhost_scsi {
>>>>>>>  	/* Protected by vhost_scsi->dev.mutex */
>>>>>>> @@ -82,6 +84,13 @@ struct vhost_scsi {
>>>>>>>
>>>>>>>  	struct vhost_work vs_completion_work; /* cmd completion work item */
>>>>>>>  	struct llist_head vs_completion_list; /* cmd completion queue */
>>>>>>> +
>>>>>>> +	struct vhost_work vs_event_work; /* evt injection work item */
>>>>>>> +	struct llist_head vs_event_list; /* evt injection queue */
>>>>>>> +
>>>>>>> +	struct mutex vs_events_lock; /* protect vs_events_dropped,events_nr */
>>>>>> Looking at this code, there are just so many locks now. This does
>>>>>> not make me feel safe :) At least, please document lock nesting.
>>>>> Do you see a real problem?
>>>> Complexity is a real problem. My head already spins. No I don't see
>>>> a bug, but we need to simplify locking. And I think I see a nice way
>>>> to do this:
>>>>
>>>> 1. move away from a global work to per-event work - so no list
>>>> 2. remove dynamic allocation of events - so no events_nr
>>>> 3. set/test overrun flag under the appropriate vq mutex
>>>>
>>>> I think that's ideal. But we can move there in small steps. As a
>>>> first step - why can't we always take the vq mutex lock and drop
>>>> vs_events_lock?
>>> There are really different ways to solve the same problem. You are
>>> welcome to implement your ideas.
>> I have a problem with this patch. It just seems too tricky, while it
>> is really trying to do a very simple thing.
> How tricky is it?

It has its own locking scheme which seems unwarranted. Find a way to use
the existing locking please.

>> There's new lock nesting.
> I used dev mutex in the past, but it does not work because it
> introduces a deadlock in vhost_scsi_set_endpoint.
>
> vs_events is per-device state, so I think it deserves a per-device
> lock instead of a per-queue lock.

We don't add a lock per field, it just does not make sense.

>> There are new places which do lock, read data, unlock, use data.
> With vs_events_lock?

With this patch.

>> There's new RCU.
> It is not a *new* RCU. It uses the existing one: vq->private_data.

Incorrectly. It should be used on the workqueue where it's dereferenced.

>> All this feels fragile, and will be hard to maintain even though I
>> don't see bugs. So I would
Re: [PATCH v7 2/3] tcm_vhost: Add helper to check if endpoint is setup
On Mon, Apr 22, 2013 at 04:53:27PM +0800, Asias He wrote:
> On Thu, Apr 18, 2013 at 10:38:23AM +0300, Michael S. Tsirkin wrote:
>> On Thu, Apr 18, 2013 at 04:32:30PM +0800, Asias He wrote:
>>> On Thu, Apr 18, 2013 at 10:09:53AM +0300, Michael S. Tsirkin wrote:
>>>> On Thu, Apr 18, 2013 at 09:05:53AM +0800, Asias He wrote:
>>>>> Signed-off-by: Asias He as...@redhat.com
>>>>> ---
>>>>>  drivers/vhost/tcm_vhost.c | 18 ++++++++++++++++++
>>>>>  1 file changed, 18 insertions(+)
>>>>>
>>>>> diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
>>>>> index 88ebb79..8f05528 100644
>>>>> --- a/drivers/vhost/tcm_vhost.c
>>>>> +++ b/drivers/vhost/tcm_vhost.c
>>>>> @@ -111,6 +111,24 @@ static bool tcm_vhost_check_feature(struct vhost_scsi *vs, int feature)
>>>>>  	return ret;
>>>>>  }
>>>>>
>>>>> +static bool tcm_vhost_check_endpoint(struct vhost_virtqueue *vq)
>>>>> +{
>>>>> +	bool ret = false;
>>>>> +
>>>>> +	/*
>>>>> +	 * We can handle the vq only after the endpoint is setup by calling the
>>>>> +	 * VHOST_SCSI_SET_ENDPOINT ioctl.
>>>>> +	 *
>>>>> +	 * TODO: Check that we are running from vhost_worker which acts
>>>>> +	 * as read-side critical section for vhost kind of RCU.
>>>>> +	 * See the comments in struct vhost_virtqueue in drivers/vhost/vhost.h
>>>>> +	 */
>>>>> +	if (rcu_dereference_check(vq->private_data, 1))
>>>>> +		ret = true;
>>>>> +
>>>>> +	return ret;
>>>>> +}
>>>>> +
>>>> You can't use RCU in this way. You need to actually dereference.
>>>> Instead, just move this within vq mutex and do
>>>> rcu_dereference_protected. Document that this function requires vq
>>>> mutex.
>>>
>>> Any difference with:
>>>
>>>    vs_tpg = rcu_dereference_check(vq->private_data, 1);
>>>
>> Yes, it's not the same. rcu_dereference_protected says this is
>> generally RCU but here I use it under lock. rcu_dereference_check can
>> only be used if you really dereference later, and you don't.
>
> So you want the helper to be this?
>
> static bool tcm_vhost_check_endpoint()
> {
> 	struct tcm_vhost_tpg **vs_tpg;
>
> 	vs_tpg = rcu_dereference_check(vq->private_data, 1);
> 	if (vs_tpg)
> 		return true;
> 	else
> 		return false;
> }
>
> And what different code will the compiler generate with this:
>
> static bool tcm_vhost_check_endpoint()
> {
> 	if (rcu_dereference_check(vq->private_data, 1))
> 		return true;
> 	else
> 		return false;
> }
>
No, what I want is that private data is either dereferenced (not tested
against NULL) or used under vq mutex. In this case the second option is
the easiest and the cleanest. So move it under mutex and then test it
safely without rcu.

> Note, tcm_vhost_check_endpoint() is called in multiple places. Having a
> helper is more convenient.
>
>> I'm fine with the helper, but fix the implementation and fix all
>> callers to have vq mutex when calling.
>
> All the callers are in the vhost thread. Why does the vq mutex need to
> be taken?
>
vhost thread is an implementation detail, people are experimenting with
various threading strategies. In particular, code must keep working
when we move it to use a workqueue.

>>>>>  static int tcm_vhost_check_true(struct se_portal_group *se_tpg)
>>>>>  {
>>>>>  	return 1;
>>>>> --
>>>>> 1.8.1.4
>>>
>>> --
>>> Asias
>
> --
> Asias
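To make the suggested shape concrete, here is a sketch of the helper done the way mst describes -- an assumption about the eventual fix, not a posted patch: take the vq mutex (or document that the caller must hold it), and annotate the access with rcu_dereference_protected(), which is exactly the "RCU-managed pointer, currently protected by this lock" primitive:

/*
 * Sketch only. Caller could equally be required to hold vq->mutex,
 * in which case the lock/unlock pair moves to the callers and the
 * lockdep expression documents the requirement.
 */
static bool tcm_vhost_check_endpoint(struct vhost_virtqueue *vq)
{
	bool ret;

	mutex_lock(&vq->mutex);
	/* no RCU read side needed: the pointer cannot change under the mutex */
	ret = rcu_dereference_protected(vq->private_data,
					lockdep_is_held(&vq->mutex)) != NULL;
	mutex_unlock(&vq->mutex);

	return ret;
}

Unlike rcu_dereference_check(..., 1), this both works outside the vhost thread and lets lockdep verify the locking claim instead of unconditionally silencing it.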
Re: [PATCH v3 00/15] KVM: MMU: fast zap all shadow pages
On Mon, 22 Apr 2013 15:39:38 +0300
Gleb Natapov g...@redhat.com wrote:

>>> Do not want kvm_set_memory (cases: DELETE/MOVE/CREATE) to be
>>> susceptible to:
>>>
>>> vcpu 1                 |  kvm_set_memory
>>> create shadow page
>>>                           nuke shadow page
>>> create shadow page
>>>                           nuke shadow page
>>>
>>> Which is guest-triggerable behavior with the spinlock preemption
>>> algorithm.
>>
>> Not only guest triggerable as in the sense of a malicious guest, but
>> the condition above can be induced by host workload with a
>> non-malicious guest system.
>
> Is the problem that newly created shadow pages are immediately zapped?
> Shouldn't the generation number / kvm_mmu_zap_all_invalid() idea
> described here https://lkml.org/lkml/2013/4/22/111 solve this?

I guess so. That's what Avi described when he tried to achieve lockless
TLB flushes. Mixing that idea with Xiao's approach will achieve
reasonably nice performance, I think. Various improvements should be
added later on top of that if needed.

>> Also kvm_set_memory being relatively fast with huge memory guests is
>> nice (which is what Xiao's idea allows).

I agree with this point. But if so, it should be actually measured on
such guests, even if the algorithm looks promising.

	Takuya
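A sketch of the generation-number scheme the thread is converging on, with invented names (mmu_valid_gen on both kvm and the shadow page) rather than Xiao's actual series: invalidation itself becomes O(1) plus a root reload, and the real zapping can then take its time, dropping mmu_lock periodically, because no vcpu can use a stale-generation page any more:

/* Sketch only -- names and placement are assumptions. */
static void kvm_mmu_invalidate_all_pages(struct kvm *kvm)
{
	spin_lock(&kvm->mmu_lock);
	kvm->arch.mmu_valid_gen++;	/* every existing page goes stale */
	kvm_reload_remote_mmus(kvm);	/* force vcpus to reload their roots */
	spin_unlock(&kvm->mmu_lock);
}

static void kvm_mmu_zap_invalid_pages(struct kvm *kvm)
{
	struct kvm_mmu_page *sp, *tmp;
	LIST_HEAD(invalid_list);

	spin_lock(&kvm->mmu_lock);
	/* new pages are added at the head, so walk from the tail */
	list_for_each_entry_safe_reverse(sp, tmp,
			&kvm->arch.active_mmu_pages, link) {
		if (sp->mmu_valid_gen == kvm->arch.mmu_valid_gen)
			break;	/* reached pages of the current generation */

		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);

		if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
			/* commit what we have before dropping the lock */
			kvm_mmu_commit_zap_page(kvm, &invalid_list);
			cond_resched_lock(&kvm->mmu_lock);
		}
	}
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
	spin_unlock(&kvm->mmu_lock);
}

This is what removes the create/nuke/create ping-pong Marcelo describes: kvm_set_memory only pays for the cheap invalidation, and pages created after the generation bump are never touched by the zap walk.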
Re: [PATCH -v2] kvm: Emulate MOVBE
On Mon, Apr 22, 2013 at 12:58:12PM +0300, Gleb Natapov wrote:
> For most instructions the decoder already sets op->bytes to the correct
> value, given that all flags are correctly specified in the opcode
> table. Explicit op->bytes setting should be done only if it cannot be
> expressed by opcode flags.

MOVBE encodes operands in ModRM and operand size is determined by the
effective operand size. By looking at that switch(mode) thing near the
beginning of x86_decode_insn, we make sure ctxt->op_bytes is set
accordingly.

Then, we have the following definitions for MOVBE:

+	[0xf0] = I(DstReg | SrcMem | ModRM | Mov | ThreeByte | EmulateOnUD, em_movbe),
+	[0xf1] = I(DstMem | SrcReg | ModRM | Mov | ThreeByte | EmulateOnUD, em_movbe),

and from looking at decode_operand(), it makes sure that op->bytes gets
the correct value since we have the proper {Src,Dst}{Reg,Mem} flags in
the insn definition.

So everything is fine. I'll make sure it works that way too, though,
when testing.

Thanks.

--
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
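Putting the thread's conclusion together, em_movbe can then be as simple as the following sketch -- a guess at the shape, not the merged patch: byte-swap into dst.val and rely on the decoder having set op_bytes from the DstReg/DstMem/SrcReg/SrcMem flags, with writeback()/write_register_operand() committing only op->bytes bytes:

/* Sketch of the simplified emulation discussed above. */
static int em_movbe(struct x86_emulate_ctxt *ctxt)
{
	switch (ctxt->op_bytes) {
	case 2:
		/*
		 * 16-bit: the commit path writes only the low word
		 * (write_register_operand case 2 / writeback for OP_MEM),
		 * so clobbering the upper bytes of dst.val is harmless.
		 */
		ctxt->dst.val = swab16((u16)ctxt->src.val);
		break;
	case 4:
		ctxt->dst.val = swab32((u32)ctxt->src.val);
		break;
	case 8:
		ctxt->dst.val = swab64(ctxt->src.val);
		break;
	default:
		BUG();
	}

	return X86EMUL_CONTINUE;
}

No masking, no strict-aliasing gymnastics: the "how many bytes get written back" question is answered once, by op->bytes, for every emulated instruction.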
[Bug 55201] host panic when creating guest, doing scp and killing QEMU process continuously
https://bugzilla.kernel.org/show_bug.cgi?id=55201

Jay Ren yongjie@intel.com changed:

           What    |Removed     |Added
--------------------------------------------------------
         Status    |VERIFIED    |CLOSED
[Bug 55421] igb VF can't work in KVM guest
https://bugzilla.kernel.org/show_bug.cgi?id=55421

Jay Ren yongjie@intel.com changed:

           What    |Removed     |Added
--------------------------------------------------------
         Status    |NEW         |RESOLVED
         Resolution|            |INVALID

--- Comment #5 from Jay Ren yongjie@intel.com 2013-04-22 14:14:26 ---
As this is the latest igb driver change, I'll close this bug.
There are two methods to work around or fix this issue.
1. update the igbvf driver or kernel of the guest
2. use the ip command to set the VF's MAC before assigning it to a guest
   (e.g. ip link set eth0 vf 0 mac 00:1E:67:65:93:01)
[Bug 55421] igb VF can't work in KVM guest
https://bugzilla.kernel.org/show_bug.cgi?id=55421

Jay Ren yongjie@intel.com changed:

           What      |Removed    |Added
           Status    |RESOLVED   |CLOSED
[Bug 56971] New: [nested virt] L1 CPU Stuck when booting a L2 guest
https://bugzilla.kernel.org/show_bug.cgi?id=56971 Summary: [nested virt] L1 CPU Stuck when booting a L2 guest Product: Virtualization Version: unspecified Kernel Version: 3.9.0-rc3 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: kvm AssignedTo: virtualization_...@kernel-bugs.osdl.org ReportedBy: yongjie@intel.com Regression: No Created an attachment (id=99651) -- (https://bugzilla.kernel.org/attachment.cgi?id=99651) call trace in L1 guest Environment: Host OS (ia32/ia32e/IA64):ia32e Guest OS (ia32/ia32e/IA64):ia32e Guest OS Type (Linux/Windows):Linux kvm.git next Commit:c0d1c770c05ac7051df86914f9627b68f29c1d67 qemu-kvm.git Commit:8912bdea01e8671e59fe0287314379be9c1f40ec Host Kernel Version:3.9.0-rc3 Hardware: Sandy Bridge - EP Bug detailed description: -- create L1 guest with -cpu host, then create a L2 guest. We found CPU Stuck in L1 guest when booting L2 guest. This should be a regression. kvm(next)+ qemu-kvm = result 188424ba + 8912bdea = good c0d1c770 + 8912bdea = bad Reproduce steps: 1. create L1 guests: qemu-system-x86_64 --enable-kvm -m 10240 -smp 8 -net nic,macaddr=00:12:45:67:2B:1C -net tap,script=/etc/kvm/qemu-ifup nested-kvm-rhel6u4.qcow -cpu host 2. create L2 guest qemu-system-x86_64 --enable-kvm -m 1024 -smp 2 -net none rhel6u4.img Current result: L1 call trace (CPU Stuck) Expected result: L1 and L2 work fine Basic root-causing log: -- [ 312.158002] BUG: soft lockup - CPU#7 stuck for 22s! [qemu-system-x86:2353] [ 312.158002] Modules linked in: bridge nfsv3 nfs_acl nfsv4 auth_rpcgss nfs fuse fscache dns_resolver lockd sunrpc 8021q garp stp llc binfmt_misc uinput ppdev parport_pc parport kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr e1000 cirrus ttm drm_kms_helper drm i2c_piix4 i2c_core floppy(F) [ 312.158002] CPU 7 [ 312.158002] Pid: 2353, comm: qemu-system-x86 Tainted: GF3.8.5 #1 Bochs Bochs [ 312.158002] RIP: 0010:[810c0982] [810c0982] smp_call_function_many+0x202/0x270 [ 312.158002] RSP: 0018:88027a21dce8 EFLAGS: 0202 [ 312.158002] RAX: 0008 RBX: 00088ede8700 RCX: 0004 [ 312.158002] RDX: 0004 RSI: 0080 RDI: 0292 [ 312.158002] RBP: 88027a21dd38 R08: 88029fdd4890 R09: 0080 [ 312.158002] R10: 0002 R11: R12: 0292 [ 312.158002] R13: 0001 R14: ea0009e34640 R15: 00480008 [ 312.158002] FS: 7ff26ef41700() GS:88029fdc() knlGS: [ 312.158002] CS: 0010 DS: ES: CR0: 80050033 [ 312.158002] CR2: 7ff271248000 CR3: 0002902a7000 CR4: 000427e0 [ 312.158002] DR0: DR1: DR2: [ 312.158002] DR3: DR6: 0ff0 DR7: 0400 [ 312.158002] Process qemu-system-x86 (pid: 2353, threadinfo 88027a21c000, task 88027d368000) [ 312.158002] Stack: [ 312.158002] 88027d368000 01ff88027d368000 88027a21dd48 8104f110 [ 312.158002] 7ff23fff 88028ede89c0 88028ede8700 7ff267dff000 [ 312.158002] 88027a21de18 7ff26800 88027a21dd68 8104ed5e [ 312.158002] Call Trace: [ 312.158002] [8104f110] ? flush_tlb_mm_range+0x250/0x250 [ 312.158002] [8104ed5e] native_flush_tlb_others+0x2e/0x30 [ 312.158002] [8104ef30] flush_tlb_mm_range+0x70/0x250 [ 312.158002] [8115d822] tlb_flush_mmu+0xa2/0xb0 [ 312.158002] [8115e15c] tlb_finish_mmu+0x1c/0x50 [ 312.158002] [8116552a] unmap_region+0xea/0x110 [ 312.158002] [811672c2] ? 
__split_vma+0x1e2/0x230 [ 312.158002] [81167c14] do_munmap+0x274/0x3a0 [ 312.158002] [81167d91] vm_munmap+0x51/0x80 [ 312.158002] [81167dec] sys_munmap+0x2c/0x40 [ 312.158002] [81653999] system_call_fastpath+0x16/0x1b [ 312.158002] Code: a6 58 00 0f ae f0 4c 89 e7 ff 15 e2 08 b6 00 80 7d bf 00 0f 84 89 fe ff ff f6 43 20 01 0f 84 7f fe ff ff 66 0f 1f 44 00 00 f3 90 f6 43 20 01 75 f8 e9 6c fe ff ff 0f 1f 00 4c 89 ea 4c 89 f6 44
[Bug 56981] New: [SR-IOV] Intel I350 NIC VF cannot work in a Windows 2008 guest.
https://bugzilla.kernel.org/show_bug.cgi?id=56981

           Summary: [SR-IOV] Intel I350 NIC VF cannot work in a Windows 2008 guest.
           Product: Virtualization
           Version: unspecified
    Kernel Version: 3.9.0-rc3
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
        AssignedTo: virtualization_...@kernel-bugs.osdl.org
        ReportedBy: yongjie@intel.com
        Regression: No

Environment:
Host OS (ia32/ia32e/IA64): ia32e
Guest OS (ia32/ia32e/IA64): ia32e
Guest OS Type (Linux/Windows): Windows
kvm.git Commit: 79558f112fc0352e057f7b5e158e3d88b8b62c60
qemu-kvm Commit: 8912bdea01e8671e59fe0287314379be9c1f40ec
Host Kernel Version: 3.9.0-rc3
Hardware: Sandy Bridge - EP

Bug detailed description:
--------------------------
The assigned Intel I350 NIC VF (an igbvf) cannot work in a Windows 2008 guest.
1. If the guest is Windows 7/8/2012 or RHEL6.x, the Intel I350 NIC VF can work fine in the guest.
2. An Intel 82576 NIC VF (also igbvf) can work in a Windows 2008 guest.

Reproduce steps:
1. ./pcistub.sh -h 0b:10.0 (to hide a device)
2. qemu-system-x86_64 -enable-kvm -m 4096 -smp 4 -device pci-assign,host=0b:10.0 -net none -hda win2k8.img

Current result: Intel I350 NIC VF cannot work in win2k8 guest
Expected result: Intel I350 NIC VF can work in win2k8 guest
[Bug 56971] [nested virt] L1 CPU Stuck when booting a L2 guest
https://bugzilla.kernel.org/show_bug.cgi?id=56971

Jay Ren yongjie@intel.com changed:

           What         |Removed |Added
           Regression   |No      |Yes
[Bug 56981] [SR-IOV] Intel I350 NIC VF cannot work in a Windows 2008 guest.
https://bugzilla.kernel.org/show_bug.cgi?id=56981

--- Comment #1 from Jay Ren yongjie@intel.com 2013-04-22 14:43:00 ---
Created an attachment (id=99661) -- (https://bugzilla.kernel.org/attachment.cgi?id=99661) lspci info of the igbvf in kvm host
[Bug 56981] [SR-IOV] Intel I350 NIC VF cannot work in a Windows 2008 guest.
https://bugzilla.kernel.org/show_bug.cgi?id=56981

--- Comment #2 from Jay Ren yongjie@intel.com 2013-04-22 14:43:25 ---
Created an attachment (id=99671) -- (https://bugzilla.kernel.org/attachment.cgi?id=99671) host dmesg
Re: [PATCH v7 2/3] tcm_vhost: Add helper to check if endpoint is setup
On Mon, Apr 22, 2013 at 04:28:07PM +0300, Michael S. Tsirkin wrote: On Mon, Apr 22, 2013 at 04:53:27PM +0800, Asias He wrote: On Thu, Apr 18, 2013 at 10:38:23AM +0300, Michael S. Tsirkin wrote: On Thu, Apr 18, 2013 at 04:32:30PM +0800, Asias He wrote: On Thu, Apr 18, 2013 at 10:09:53AM +0300, Michael S. Tsirkin wrote: On Thu, Apr 18, 2013 at 09:05:53AM +0800, Asias He wrote: Signed-off-by: Asias He as...@redhat.com --- drivers/vhost/tcm_vhost.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)

diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
index 88ebb79..8f05528 100644
--- a/drivers/vhost/tcm_vhost.c
+++ b/drivers/vhost/tcm_vhost.c
@@ -111,6 +111,24 @@ static bool tcm_vhost_check_feature(struct vhost_scsi *vs, int feature)
 	return ret;
 }
 
+static bool tcm_vhost_check_endpoint(struct vhost_virtqueue *vq)
+{
+	bool ret = false;
+
+	/*
+	 * We can handle the vq only after the endpoint is setup by calling the
+	 * VHOST_SCSI_SET_ENDPOINT ioctl.
+	 *
+	 * TODO: Check that we are running from vhost_worker which acts
+	 * as read-side critical section for vhost kind of RCU.
+	 * See the comments in struct vhost_virtqueue in drivers/vhost/vhost.h
+	 */
+	if (rcu_dereference_check(vq->private_data, 1))
+		ret = true;
+
+	return ret;
+}
+

You can't use RCU in this way. You need to actually dereference. Instead, just move this within the vq mutex and do rcu_dereference_protected; document that this function requires the vq mutex. Any difference with: vs_tpg = rcu_dereference_check(vq->private_data, 1); Yes, it's not the same. rcu_dereference_protected says this is generally RCU but here I use it under lock. rcu_dereference_check can only be used if you really dereference later, and you don't. So you want the helper to be this?

static bool tcm_vhost_check_endpoint()
{
	struct tcm_vhost_tpg **vs_tpg;

	vs_tpg = rcu_dereference_check(vq->private_data, 1);
	if (vs_tpg)
		return true;
	else
		return false;
}

And what different code will the compiler generate with this:

static bool tcm_vhost_check_endpoint()
{
	if (rcu_dereference_check(vq->private_data, 1))
		return true;
	else
		return false;
}

No, what I want is that private data is either dereferenced (not tested against NULL) or used under the vq mutex. In this case the second option is the easiest and the cleanest. So move it under the mutex and then test it safely without RCU. Why do you use it in -net? The test against NULL is in the RCU read critical section. What is wrong with this, really?

if (vs_tpg)
	return true;
else
	return false;

Note, tcm_vhost_check_endpoint() is called in multiple places. Having a helper is more convenient. I'm fine with the helper, but fix the implementation and fix all callers to hold the vq mutex when calling. All the callers are in the vhost thread. Why does the vq mutex need to be taken? The vhost thread is an implementation detail; people are experimenting with various threading strategies. In particular, the code must keep working when we move it to use a workqueue. Well, you should remove your vhost RCU thing first and use real RCU explicitly. Does -net keep working after vhost changes to a workqueue? Since when did this become a requirement?

static int tcm_vhost_check_true(struct se_portal_group *se_tpg)
{
	return 1;

-- 1.8.1.4 -- Asias
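In other words, the shape Michael is asking for is roughly this. A sketch, assuming the callers are changed to hold the mutex; lockdep_is_held supplies the rcu_dereference_protected condition:

/* Caller must hold vq->mutex. */
static bool tcm_vhost_check_endpoint(struct vhost_virtqueue *vq)
{
	/*
	 * Not an RCU read: private_data updates are serialized by
	 * vq->mutex, so a plain protected dereference is enough.
	 */
	return rcu_dereference_protected(vq->private_data,
					 lockdep_is_held(&vq->mutex)) != NULL;
}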
Re: [PATCH v7 3/3] tcm_vhost: Add hotplug/hotunplug support
On Mon, Apr 22, 2013 at 04:17:04PM +0300, Michael S. Tsirkin wrote: On Mon, Apr 22, 2013 at 05:20:24PM +0800, Asias He wrote: On Sat, Apr 20, 2013 at 10:01:31PM +0300, Michael S. Tsirkin wrote: On Fri, Apr 19, 2013 at 10:34:10AM +0800, Asias He wrote: On Thu, Apr 18, 2013 at 11:21:54AM +0300, Michael S. Tsirkin wrote: On Thu, Apr 18, 2013 at 04:59:08PM +0800, Asias He wrote: On Thu, Apr 18, 2013 at 10:34:29AM +0300, Michael S. Tsirkin wrote: On Thu, Apr 18, 2013 at 09:05:54AM +0800, Asias He wrote: In commit 365a7150094 ([SCSI] virtio-scsi: hotplug support for virtio-scsi), hotplug support was added to virtio-scsi. This patch adds hotplug and hotunplug support to tcm_vhost. You can create or delete a LUN in targetcli to hotplug or hotunplug a LUN in the guest.

Changes in v7: - Add vhost_work_flush for vs->vs_event_work to this series
Changes in v6: - Pass tcm_vhost_evt to tcm_vhost_do_evt_work
Changes in v5: - Switch from u64 to int for vs_events_nr - Set the vs->vs_events_dropped flag in tcm_vhost_allocate_evt - Do not nest dev mutex within vq mutex - Use vs_events_lock to protect vs_events_dropped and vs_events_nr - Rebase to target/master
Changes in v4: - Drop tcm_vhost_check_endpoint in tcm_vhost_send_evt - Add tcm_vhost_check_endpoint in vhost_scsi_evt_handle_kick
Changes in v3: - Separate the bug fix to another thread
Changes in v2: - Remove code duplication in tcm_vhost_{hotplug,hotunplug} - Fix racing of vs_events_nr - Add flush fix patch to this series

Signed-off-by: Asias He as...@redhat.com Reviewed-by: Stefan Hajnoczi stefa...@redhat.com --- drivers/vhost/tcm_vhost.c | 208 +- drivers/vhost/tcm_vhost.h | 10 +++ 2 files changed, 214 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
index 8f05528..da2021b 100644
--- a/drivers/vhost/tcm_vhost.c
+++ b/drivers/vhost/tcm_vhost.c
@@ -66,11 +66,13 @@ enum {
  * TODO: debug and remove the workaround.
  */
 enum {
-	VHOST_SCSI_FEATURES = VHOST_FEATURES & (~VIRTIO_RING_F_EVENT_IDX)
+	VHOST_SCSI_FEATURES = (VHOST_FEATURES & (~VIRTIO_RING_F_EVENT_IDX)) |
+			      (1ULL << VIRTIO_SCSI_F_HOTPLUG)
 };
 
 #define VHOST_SCSI_MAX_TARGET	256
 #define VHOST_SCSI_MAX_VQ	128
+#define VHOST_SCSI_MAX_EVENT	128
 
 struct vhost_scsi {
 	/* Protected by vhost_scsi->dev.mutex */
@@ -82,6 +84,13 @@ struct vhost_scsi {
 	struct vhost_work vs_completion_work; /* cmd completion work item */
 	struct llist_head vs_completion_list; /* cmd completion queue */
+
+	struct vhost_work vs_event_work; /* evt injection work item */
+	struct llist_head vs_event_list; /* evt injection queue */
+
+	struct mutex vs_events_lock; /* protect vs_events_dropped,events_nr */

Looking at this code, there are just so many locks now. This does not make me feel safe :) At least, please document the lock nesting. Do you see a real problem? Complexity is a real problem. My head already spins. No, I don't see a bug, but we need to simplify the locking. And I think I see a nice way to do this: 1. move away from a global work to per-event work - so no list 2. remove dynamic allocation of events - so no events_nr 3. set/test the overrun flag under the appropriate vq mutex I think that's ideal. But we can move there in small steps. As a first step - why can't we always take the vq mutex lock and drop vs_events_lock? There are really different ways to solve the same problem. You are welcome to implement your ideas. I have a problem with this patch. It just seems too tricky, while it is really trying to do a very simple thing. How tricky is it? It has its own locking scheme, which seems unwarranted.
Find a way to use the existing locking, please. Why is this tricky? There's new lock nesting. I used the dev mutex in the past, but it does not work because it introduces a deadlock in vhost_scsi_set_endpoint. vs_events is per-device state, so I think it deserves a per-device lock instead of a per-queue lock. We don't add a lock per field, it just does not make sense. There are new places which do lock, read data, unlock, use data.
KVM call agenda for 2013-04-23
Hi Please send in any agenda topics you are interested in. Later, Juan.
Re: KVM call agenda for 2013-04-23
On 22/04/2013 18:11, Juan Quintela wrote: Hi Please send in any agenda topics you are interested in. * 1.5 pending patches Paolo Later, Juan.
Re: [PATCH v7 2/3] tcm_vhost: Add helper to check if endpoint is setup
On Mon, Apr 22, 2013 at 11:00:19PM +0800, Asias He wrote: On Mon, Apr 22, 2013 at 04:28:07PM +0300, Michael S. Tsirkin wrote: [...] Why do you use it in -net? The test against NULL is in the RCU read critical section. What is wrong with this, really? Look at the code and see that the barriers in rcu_dereference assume that the pointer really is dereferenced. You don't dereference it, so rcu_dereference_XXX is wrong.

if (vs_tpg)
	return true;
else
	return false;

Note, tcm_vhost_check_endpoint() is called in multiple places. Having a helper is more convenient. I'm fine with the helper, but fix the implementation and fix all callers to hold the vq mutex when calling. All the callers are in the vhost thread. Why does the vq mutex need to be taken? The vhost thread is an implementation detail; people are experimenting with various threading strategies. In particular, the code must keep working when we move it to use a workqueue. Well, you should remove your vhost RCU thing first and use real RCU explicitly. I don't think it's useful or possible. Does -net keep working after vhost changes to a workqueue? I think so, yes. Since when did this become a requirement? It's the requirement to use the abstractions that are in place. Devices should use the interfaces and not assume a particular vhost implementation; that happens to be the case now, but it's not guaranteed. -- Asias
Re: [Qemu-devel] KVM call agenda for 2013-04-23
On 04/22/2013 10:11 AM, Juan Quintela wrote: Hi Please send in any agenda topics you are interested in.

* What can libvirt expect in 1.5 for introspection of command-line support?
* What are the rules for adding optional parameters to existing QMP commands? Would it help if we had introspection of QMP commands?

-- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Re: Preemptable Ticket Spinlock
On Mon, Apr 22, 2013 at 1:58 AM, Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote: The intuition is to downgrade a fair lock to an unfair lock automatically upon preemption, and preserve the fairness otherwise. I also hope that being a little unfair does not affect the original intention of introducing ticket spinlocks. Some discussions were here long back in this thread: https://lkml.org/lkml/2010/6/3/331 Good point. I also have the question of why not use an unfair lock in a virtual environment, and whether fairness is really a big issue. However, given that the current kernel is using ticket locks, I assume fairness is a necessary spinlock feature. Regarding the fairness of preemptable-lock, I did a user-space experiment using 8 pCPUs competing on one spinlock, counting the lock acquisitions. Results show that lock acquisition counts are *almost* evenly distributed between threads in preemptable-lock. AFAIU, the experiments are on non-PLE machines and it would be worth experimenting on PLE machines too, and also bigger machines (we may get some surprises there otherwise). I'll wait for your next iteration of the patches using the lower-bit changes. Yes, they are on PLE machines. The current implementation and evaluation are still at the proof-of-concept stage. More experiments (with PLE, bigger machines, etc.) are needed to better understand the lock behavior.

+#define TIMEOUT_UNIT (1<<14)

This value seems to be at the higher end. But I hope you have experimented enough to come up with it. Better again to test all these tunables on PLE machines too. I actually didn't tune this parameter at all... But yes, finding a better value is necessary if this is to become production code.

 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
 	register struct __raw_tickets inc = { .tail = 1 };
+	unsigned int timeout = 0;
+	__ticket_t current_head;
 
 	inc = xadd(&lock->tickets, inc);
-
+	if (likely(inc.head == inc.tail))
+		goto spin;
+
+	timeout = TIMEOUT_UNIT * (inc.tail - inc.head);
+	do {
+		current_head = ACCESS_ONCE(lock->tickets.head);
+		if (inc.tail <= current_head) {
+			goto spin;
+		} else if (inc.head != current_head) {
+			inc.head = current_head;
+			timeout = TIMEOUT_UNIT * (inc.tail - inc.head);

Good idea indeed to base the loop on the head and tail difference.. But for virtualization I believe this directly proportional notion is a little tricky too. Could you explain your concern a little bit more? Thanks --Jiannan
suggesting wording fixes for virtio-spec 0.9.5
Hi, (I'm not subscribed to either list.) Using the word "descriptor" is misleading in the following sections:

2.4.1.2 Updating The Available Ring [...] However, in general we can add many descriptors before we update the idx field (at which point they become visible to the device), so we keep a counter of how many we've added: [...]

and

2.4.1.3 Updating The Index Field Once the idx field of the virtqueue is updated, the device will be able to access the descriptor entries we've created and the memory they refer to. [...]

(The word "descriptor" in the above language is the reason I mis-implemented the virtio-blk guest driver in OVMF.) In fact the available ring tracks *head* descriptors only. I suggest

s/many descriptors/many separate descriptor chains/
s/descriptor entries/separate descriptor chains/

for the above. Similarly, 2.3.4 Available Ring should start with something like: "The available ring describes what descriptor chains we are offering the device: each entry of the available ring refers to the head descriptor of a separate descriptor chain." Thanks Laszlo
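To make the suggested wording concrete, here is a sketch of what a driver does for a two-descriptor chain under the 0.9.5 layout (buffer addresses and lengths omitted; wmb() stands in for the required write barrier):

void vring_post_chain(struct vring *vr, u16 head, u16 next)
{
	/* Two descriptor-table entries linked into one chain... */
	vr->desc[head].flags = VRING_DESC_F_NEXT;
	vr->desc[head].next  = next;
	vr->desc[next].flags = VRING_DESC_F_WRITE;	/* device-writable */

	/* ...but the available ring gets exactly one entry: the head. */
	vr->avail->ring[vr->avail->idx % vr->num] = head;
	wmb();			/* chain must be visible before idx moves */
	vr->avail->idx++;	/* device may now consume the whole chain */
}

One avail-ring slot per chain, however long the chain is, which is exactly the distinction the suggested wording makes explicit.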
Re: virtio-net mq vq initialization
On 04/15/2013 01:58 AM, Jason Wang wrote: Initializing them only when they're actually needed will do the trick here. I don't see so much memory allocation with qemu; the main problem here, I guess, is that kvmtool does not support mergeable rx buffers (does it?). So the guest must allocate 64K per packet, which would be even worse in the multiqueue case. If mergeable rx buffers were used, the receive buffers would only occupy about 256 * 4K (1M), which seems pretty acceptable here. Yup, kvmtool doesn't do mergeable RX at the moment. I'll add support for that, thanks for pointing it out! Thanks, Sasha
Re: Preemptable Ticket Spinlock
On Mon, 2013-04-22 at 08:52 -0400, Rik van Riel wrote: On 04/22/2013 07:51 AM, Peter Zijlstra wrote: On Sun, 2013-04-21 at 17:12 -0400, Rik van Riel wrote: If we always incremented the ticket number by 2 (instead of 1), then we could use the lower bit of the ticket number as the spinlock. ISTR that paravirt ticket locks already do that and use the lsb to indicate the unlock needs to perform wakeups. Also, since all of this is virt nonsense, shouldn't it live in the paravirt ticket lock code and leave the native code as is? Sure, but that is still no reason not to have the virt implementation be as fast as possible, and share the same data type as the non-virt implementation. It has to share the same data-type.. Also, is it guaranteed that the native spin_lock code has not been called yet before we switch over to the paravirt functions? If the native spin_lock code has been called already at that time, the native code would still need to be modified to increment the ticket number by 2, so we end up with a compatible value in each spin lock's .tickets field, and prevent a deadlock after we switch over to the paravirt variant. I thought the stuff already made it upstream, but apparently not; the latest posting I'm aware of is here: https://lkml.org/lkml/2012/5/2/105 That stuff changes the normal ticket increment as well..
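For readers who don't want to dig through that series, the layout it uses looks roughly like this (an illustrative sketch; see the link above for the real patches):

#define TICKET_SLOWPATH_FLAG	((__ticket_t)1)	/* spare low bit */
#define TICKET_LOCK_INC		((__ticket_t)2)	/* tickets step by 2 */

static inline void ticket_lock(arch_spinlock_t *lock)
{
	register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };

	inc = xadd(&lock->tickets, inc);
	inc.tail &= ~TICKET_SLOWPATH_FLAG;	/* flag is not part of the number */

	while (ACCESS_ONCE(lock->tickets.head) != inc.tail)
		cpu_relax();	/* paravirt slowpath would hook in here */
}

The unlock side steps head by TICKET_LOCK_INC as well, so the flag bit never perturbs the ticket arithmetic; this is the compatibility property Rik is worried about when switching implementations after boot.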
Re: Preemptable Ticket Spinlock
On 04/22/2013 03:49 PM, Peter Zijlstra wrote: On Mon, 2013-04-22 at 08:52 -0400, Rik van Riel wrote: If the native spin_lock code has been called already at that time, the native code would still need to be modified to increment the ticket number by 2, so we end up with a compatible value in each spin lock's .tickets field, and prevent a deadlock after we switch over to the paravirt variant. I thought the stuff already made it upstream, but apparently not; the latest posting I'm aware of is here: https://lkml.org/lkml/2012/5/2/105 That stuff changes the normal ticket increment as well.. Jiannan, It looks like the patch above could make a good patch 1 (or 2) in your patch series :) -- All rights reversed
Re: Preemptable Ticket Spinlock
On Mon, Apr 22, 2013 at 3:56 PM, Rik van Riel r...@redhat.com wrote: On 04/22/2013 03:49 PM, Peter Zijlstra wrote: On Mon, 2013-04-22 at 08:52 -0400, Rik van Riel wrote: If the native spin_lock code has been called already at that time, the native code would still need to be modified to increment the ticket number by 2, so we end up with a compatible value in each spin lock's .tickets field, and prevent a deadlock after we switch over to the paravirt variant. I thought the stuff already made it upstream, but apparently not; the latest posting I'm aware of is here: https://lkml.org/lkml/2012/5/2/105 That stuff changes the normal ticket increment as well.. Jiannan, It looks like the patch above could make a good patch 1 (or 2) in your patch series :) -- All rights reversed Yes. I'm going to move my code, updated with Rik's suggestions, to paravirt ops based on Jeremy's patch. I'll post a new patch series soon. Thanks to everyone for the great feedback! --Jiannan
Re: Preemptable Ticket Spinlock
On Mon, 2013-04-22 at 15:56 -0400, Rik van Riel wrote: On 04/22/2013 03:49 PM, Peter Zijlstra wrote: On Mon, 2013-04-22 at 08:52 -0400, Rik van Riel wrote: If the native spin_lock code has been called already at that time, the native code would still need to be modified to increment the ticket number by 2, so we end up with a compatible value in each spin lock's .tickets field, and prevent a deadlock after we switch over to the paravirt variant. I thought the stuff already made it upstream, but apparently not; the latest posting I'm aware of is here: https://lkml.org/lkml/2012/5/2/105 That stuff changes the normal ticket increment as well.. Jiannan, It looks like the patch above could make a good patch 1 (or 2) in your patch series :) I much prefer the entire series from Jeremy since it maintains the ticket semantics and doesn't degrade the lock to unfair under contention. Now I suppose there's a reason it's not been merged yet and I suspect it's the !paravirt hotpath impact which wasn't rightly justified or somesuch, so maybe someone can work on that or so.. dunno.
Re: Preemptable Ticket Spinlock
On 04/22/2013 04:08 PM, Peter Zijlstra wrote: On Mon, 2013-04-22 at 15:56 -0400, Rik van Riel wrote: On 04/22/2013 03:49 PM, Peter Zijlstra wrote: On Mon, 2013-04-22 at 08:52 -0400, Rik van Riel wrote: If the native spin_lock code has been called already at that time, the native code would still need to be modified to increment the ticket number by 2, so we end up with a compatible value in each spin lock's .tickets field, and prevent a deadlock after we switch over to the paravirt variant. I thought the stuff already made it upstream, but apparently not; the latest posting I'm aware of is here: https://lkml.org/lkml/2012/5/2/105 That stuff changes the normal ticket increment as well.. Jiannan, It looks like the patch above could make a good patch 1 (or 2) in your patch series :) I much prefer the entire series from Jeremy since it maintains the ticket semantics and doesn't degrade the lock to unfair under contention. Now I suppose there's a reason it's not been merged yet and I suspect it's the !paravirt hotpath impact which wasn't rightly justified or somesuch, so maybe someone can work on that or so.. dunno. IIRC one of the reasons was that the performance improvement wasn't as obvious. Rescheduling VCPUs takes a fair amount of time, quite probably more than the typical hold time of a spinlock. -- All rights reversed
Re: Preemptable Ticket Spinlock
On Mon, 2013-04-22 at 16:32 -0400, Rik van Riel wrote: IIRC one of the reasons was that the performance improvement wasn't as obvious. Rescheduling VCPUs takes a fair amount of time, quite probably more than the typical hold time of a spinlock. IIRC it would spin for a while before blocking.. /me goes re-read some of that thread... Ah, it's because PLE is curing most of it.. !PLE it had huge gains but apparently nobody cares about !PLE hardware anymore :-)
Re: Preemptable Ticket Spinlock
On Mon, Apr 22, 2013 at 4:08 PM, Peter Zijlstra pet...@infradead.org wrote: I much prefer the entire series from Jeremy since it maintains the ticket semantics and doesn't degrade the lock to unfair under contention. Now I suppose there's a reason it's not been merged yet and I suspect it's the !paravirt hotpath impact which wasn't rightly justified or somesuch, so maybe someone can work on that or so.. dunno. In my paper, I compared preemptable-lock and the pv_lock on KVM from Raghu and Jeremy. Results show that:
- preemptable-lock improves performance significantly without paravirt support
- preemptable-lock can also be paravirtualized, and then outperforms pv_lock, especially when overcommitted by 3 or more
- pv-preemptable-lock has much less performance variance compared to pv_lock, because it adapts to preemption within the VM rather than using rescheduling, which increases VM interference
It would still be very interesting to conduct more experiments to compare these two, to see if the fairness enforced by pv_lock is mandatory, if preemptable-lock outperforms pv_lock in most cases, and how they work with PLE. --Jiannan
Re: Preemptable Ticket Spinlock
On Mon, 2013-04-22 at 22:44 +0200, Peter Zijlstra wrote: On Mon, 2013-04-22 at 16:32 -0400, Rik van Riel wrote: IIRC one of the reasons was that the performance improvement wasn't as obvious. Rescheduling VCPUs takes a fair amount of time, quite probably more than the typical hold time of a spinlock. IIRC it would spin for a while before blocking.. /me goes re-read some of that thread... Ah, it's because PLE is curing most of it.. !PLE it had huge gains but apparently nobody cares about !PLE hardware anymore :-) Hmm.. it looked like under light overcommit the paravirt ticket lock still had some gain (~10%) and of course it brings the fairness thing, which is always good. I can only imagine the mess unfair + vcpu preemption can bring to guest tasks.
Re: Preemptable Ticket Spinlock
On 04/22/2013 04:46 PM, Jiannan Ouyang wrote: It would still be very interesting to conduct more experiments to compare these two, to see if the fairness enforced by pv_lock is mandatory, if preemptable-lock outperforms pv_lock in most cases, and how they work with PLE. Given the fairly high cost of rescheduling a VCPU (which is likely to include an IPI), versus the short hold time of most spinlocks, I have the strong suspicion that your approach would win. The fairness is only compromised in a limited way and in certain circumstances, so I am not too worried about that. -- All rights reversed
Re: Preemptable Ticket Spinlock
On Mon, Apr 22, 2013 at 4:44 PM, Peter Zijlstra pet...@infradead.org wrote: On Mon, 2013-04-22 at 16:32 -0400, Rik van Riel wrote: IIRC one of the reasons was that the performance improvement wasn't as obvious. Rescheduling VCPUs takes a fair amount of time, quite probably more than the typical hold time of a spinlock. IIRC it would spin for a while before blocking.. /me goes re-read some of that thread... Ah, it's because PLE is curing most of it.. !PLE it had huge gains but apparently nobody cares about !PLE hardware anymore :-) For now, I don't know how well it can work with PLE. But I think it should save the cost of the VMEXIT on PLE machines.
Re: Preemptable Ticket Spinlock
On 04/22/2013 04:48 PM, Peter Zijlstra wrote: Hmm.. it looked like under light overcommit the paravirt ticket lock still had some gain (~10%) and of course it brings the fairness thing, which is always good. I can only imagine the mess unfair + vcpu preemption can bring to guest tasks. If you think unfairness + vcpu preemption is bad, you haven't tried full fairness + vcpu preemption :) -- All rights reversed
Re: Preemptable Ticket Spinlock
On 4/22/2013 1:50 PM, Jiannan Ouyang wrote: On Mon, Apr 22, 2013 at 4:44 PM, Peter Zijlstra pet...@infradead.org wrote: On Mon, 2013-04-22 at 16:32 -0400, Rik van Riel wrote: IIRC one of the reasons was that the performance improvement wasn't as obvious. Rescheduling VCPUs takes a fair amount of time, quite probably more than the typical hold time of a spinlock. IIRC it would spin for a while before blocking.. /me goes re-read some of that thread... Ah, it's because PLE is curing most of it.. !PLE it had huge gains but apparently nobody cares about !PLE hardware anymore :-) For now, I don't know how well it can work with PLE. But I think it should save the cost of the VMEXIT on PLE machines. Thanks for sharing your patch. I'm waiting for your v2 patch(es) and will then let you know any review feedback. Hoping to verify your changes on a large box (PLE enabled) and get back to you with some data... Thanks Vinod
Re: Preemptable Ticket Spinlock
On Mon, 2013-04-22 at 16:46 -0400, Jiannan Ouyang wrote: On Mon, Apr 22, 2013 at 4:08 PM, Peter Zijlstra pet...@infradead.org wrote: I much prefer the entire series from Jeremy since it maintains the ticket semantics and doesn't degrade the lock to unfair under contention. Now I suppose there's a reason it's not been merged yet and I suspect it's the !paravirt hotpath impact which wasn't rightly justified or somesuch, so maybe someone can work on that or so.. dunno. In my paper, I compared preemptable-lock and pv_lock on KVM from Raghu and Jeremy. Which pv_lock? The current pv spinlock mess is basically the old unfair thing. The later patch series I referred to earlier implemented a paravirt ticket lock; that should perform much better under overcommit. Results show that: - preemptable-lock improves performance significantly without paravirt support But completely wrecks our native spinlock implementation, so that's not going to happen of course ;-) - preemptable-lock can also be paravirtualized, and then outperforms pv_lock, especially when overcommitted by 3 or more See above.. - pv-preemptable-lock has much less performance variance compared to pv_lock, because it adapts to preemption within the VM rather than using rescheduling, which increases VM interference I would say it has a _much_ worse worst case (and thus worse variance) than the paravirt ticket implementation from Jeremy. While the full paravirt ticket lock results in vcpu scheduling, it does maintain fairness. If you drop strict fairness you can end up in unbounded starvation cases, and those are very ugly indeed. It would still be very interesting to conduct more experiments to compare these two, to see if the fairness enforced by pv_lock is mandatory, if preemptable-lock outperforms pv_lock in most cases, and how they work with PLE. Be more specific: pv_lock as currently upstream is a trainwreck, mostly done because pure ticket spinners and vcpu preemption are even worse.
Re: Preemptable Ticket Spinlock
On Mon, 2013-04-22 at 16:49 -0400, Rik van Riel wrote: Given the fairly high cost of rescheduling a VCPU (which is likely to include an IPI), versus the short hold time of most spinlocks, I have the strong suspicion that your approach would win. https://lkml.org/lkml/2012/5/2/101 If you schedule too often, your SPIN_THRESHOLD is far too low. Anyway.. performance can't be that bad, otherwise Jeremy wouldn't have spent as much time on it as he did.
Re: Preemptable Ticket Spinlock
On Mon, Apr 22, 2013 at 4:55 PM, Peter Zijlstra pet...@infradead.org wrote: Which pv_lock? The current pv spinlock mess is basically the old unfair thing. The later patch series I referred to earlier implemented a paravirt ticket lock; that should perform much better under overcommit. Yes, it is a paravirt *ticket* spinlock. I got the patch from Raghavendra K T through email: http://lwn.net/Articles/495597/
Re: Preemptable Ticket Spinlock
Rik van Riel r...@redhat.com writes: If we always incremented the ticket number by 2 (instead of 1), then we could use the lower bit of the ticket number as the spinlock. Spinning on a single bit is very inefficient, as you need to do a try-lock in a loop, which is very unfriendly to the MESI state protocol. It's much better to have at least three states and allow spinning-while-reading-only. This is typically very visible on 2S (two-socket) systems. -Andi -- a...@linux.intel.com -- Speaking for myself only
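Concretely, the difference Andi is pointing at looks like this (a generic sketch, not proposed kernel code):

static inline void tas_lock_unfriendly(atomic_t *l)
{
	while (atomic_xchg(l, 1))	/* every iteration requests the line exclusive */
		cpu_relax();
}

static inline void tas_lock_friendly(atomic_t *l)
{
	for (;;) {
		while (atomic_read(l))	/* read-only spin: line stays Shared */
			cpu_relax();
		if (!atomic_xchg(l, 1))	/* go exclusive only when it looks free */
			break;
	}
}

The first variant bounces the cache line between waiters on every iteration; the second keeps it in the Shared state until the lock is actually released, which is what "spinning-while-reading-only" buys on multi-socket machines.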
Re: Preemptable Ticket Spinlock
On 04/22/2013 04:55 PM, Peter Zijlstra wrote: On Mon, 2013-04-22 at 16:46 -0400, Jiannan Ouyang wrote: - pv-preemptable-lock has much less performance variance compared to pv_lock, because it adapts to preemption within the VM rather than using rescheduling, which increases VM interference I would say it has a _much_ worse worst case (and thus worse variance) than the paravirt ticket implementation from Jeremy. While the full paravirt ticket lock results in vcpu scheduling, it does maintain fairness. If you drop strict fairness you can end up in unbounded starvation cases, and those are very ugly indeed. If needed, Jiannan's scheme could easily be bounded to prevent infinite starvation. For example, we could allow only the first 8 CPUs in line to jump the queue. However, given the way that virtual CPUs get scheduled in and out all the time, I suspect starvation is not a worry, and we will not need the additional complexity to deal with it. You may want to play around with virtualization a bit, to get a feel for how things work in virt land. -- All rights reversed
Re: Preemptable Ticket Spinlock
On 04/22/2013 05:56 PM, Andi Kleen wrote: Rik van Riel r...@redhat.com writes: If we always incremented the ticket number by 2 (instead of 1), then we could use the lower bit of the ticket number as the spinlock. Spinning on a single bit is very inefficient, as you need to do a try-lock in a loop, which is very unfriendly to the MESI state protocol. It's much better to have at least three states and allow spinning-while-reading-only. This is typically very visible on 2S (two-socket) systems. Absolutely, the spinning should be read-only, until the CPU sees that the desired bit is clear. MESI-friendly spinning is essential. -- All rights reversed
Re: [PATCH 15/17] KVM: PPC: Support irq routing and irqfd for in-kernel MPIC
On 04/18/2013 07:15:46 PM, Alexander Graf wrote: On 18.04.2013, at 23:39, Scott Wood wrote: Do we really want any default routes? There's no platform notion of GSI here, so how is userspace to know how the kernel set it up (or what GSIs are free to be used for new routes) -- other than the "read the code" answer I got when I asked about x86? :-P The default routes really just expose all pins 1:1 as GSIs. I think it makes sense to have a simple default for user space that doesn't want to mess with irq routing. What GSIs are free for new routes doesn't matter. Routes are always completely rewritten as a whole from user space. So when user space goes in and wants to change only a single line, it has to lay out the full map itself anyway. It looks like you already write the routes in your QEMU patches, so I'd like to avoid adding MPIC default routes in KVM to keep things simple. It's legacy baggage from day one. With default routes, what happens if we later support instantiating multiple interrupt controllers? -Scott
Re: [PATCH v3 00/15] KVM: MMU: fast zap all shadow pages
On 04/22/2013 05:21 PM, Gleb Natapov wrote: On Sun, Apr 21, 2013 at 10:09:29PM +0800, Xiao Guangrong wrote: On 04/21/2013 09:03 PM, Gleb Natapov wrote: On Tue, Apr 16, 2013 at 02:32:38PM +0800, Xiao Guangrong wrote: This patchset is based on my previous two patchsets: [PATCH 0/2] KVM: x86: avoid potential soft lockup and unneeded mmu reload (https://lkml.org/lkml/2013/4/1/2) [PATCH v2 0/6] KVM: MMU: fast invalid all mmio sptes (https://lkml.org/lkml/2013/4/1/134) Changelog: V3: completely redesigned the algorithm, please see below. This looks pretty complicated. Is it still needed in order to avoid soft lockups after the "avoid potential soft lockup and unneeded mmu reload" patch? Yes. I discussed this point with Marcelo: == BTW, to be honest, I do not think spin_needbreak is a good way - it does not fix the hot-lock contention and it just burns more cpu time to avoid possible soft lockups. Especially, zap-all-shadow-pages can let other vcpus fault and contend for mmu-lock; then zap-all-shadow-pages releases mmu-lock and waits, and the other vcpus create page tables again. zap-all-shadow-pages needs a long time to finish; in the worst case, it can never complete under intensive vcpu and memory usage. So what about a mixed approach: use generation numbers and reload roots to quickly invalidate all shadow pages, and then do kvm_mmu_zap_all_invalid(). kvm_mmu_zap_all_invalid() is a new function that invalidates only shadow pages with a stale generation number (and uses the lock-break technique). It may traverse active_mmu_pages from tail to head, since new shadow pages will be added to the head of the list, or it may use the invalid slot's rmap to find exactly what should be invalidated. I prefer unmapping the invalid rmap instead of zapping stale shadow pages in kvm_mmu_zap_all_invalid(); the former is faster. This way may help but is not good: after reloading the mmu with the new generation number, all of the vcpus will fault for a long time, and contending for mmu-lock is not good even with the lock-break technique. I think we can do this step first, then unmap the invalid rmap out of mmu-lock later. I still think the right way to fix this kind of thing is optimization of mmu-lock. == Which parts scare you? Let's find a way to optimize it. ;). For example, if you do not like unmap_memslot_rmap_nolock(), we can simplify it - we can use walk_shadow_page_lockless_begin() and walk_shadow_page_lockless_end() to protect the spte instead of kvm->being_unmaped_rmap. kvm->being_unmaped_rmap is particularly tricky, although it looks correct. Okay, will use walk_shadow_page_lockless_begin() and walk_shadow_page_lockless_end() instead. The additional indirection with rmap ops also does not help following the code. I'd rather have if (slot is invalid) in a couple of places where things should be done differently. In most places it will be WARN_ON(slot is invalid). Less change, good to me, will do. ;) Thanks!
[PATCH] kvm tools: virtio-net mergable rx buffers
Support mergable rx buffers for virtio-net. This helps reduce the amount of memory the guest kernel has to allocate per rx vq.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/include/kvm/uip.h  |  4 ++--
 tools/kvm/include/kvm/util.h |  3 +++
 tools/kvm/net/uip/core.c     | 54 +---
 tools/kvm/net/uip/tcp.c      |  2 +-
 tools/kvm/net/uip/udp.c      |  2 +-
 tools/kvm/util/util.c        | 15
 tools/kvm/virtio/net.c       | 37 ++
 7 files changed, 55 insertions(+), 62 deletions(-)

diff --git a/tools/kvm/include/kvm/uip.h b/tools/kvm/include/kvm/uip.h
index ac248d2..fa82f10 100644
--- a/tools/kvm/include/kvm/uip.h
+++ b/tools/kvm/include/kvm/uip.h
@@ -252,7 +252,7 @@ struct uip_tcp_socket {
 };
 
 struct uip_tx_arg {
-	struct virtio_net_hdr *vnet;
+	struct virtio_net_hdr_mrg_rxbuf *vnet;
 	struct uip_info *info;
 	struct uip_eth *eth;
 	int vnet_len;
@@ -332,7 +332,7 @@ static inline u16 uip_eth_hdrlen(struct uip_eth *eth)
 }
 
 int uip_tx(struct iovec *iov, u16 out, struct uip_info *info);
-int uip_rx(struct iovec *iov, u16 in, struct uip_info *info);
+int uip_rx(unsigned char *buffer, u32 length, struct uip_info *info);
 int uip_init(struct uip_info *info);
 
 int uip_tx_do_ipv4_udp_dhcp(struct uip_tx_arg *arg);
diff --git a/tools/kvm/include/kvm/util.h b/tools/kvm/include/kvm/util.h
index 0df9f0d..6f8ac83 100644
--- a/tools/kvm/include/kvm/util.h
+++ b/tools/kvm/include/kvm/util.h
@@ -22,6 +22,7 @@
 #include <sys/param.h>
 #include <sys/types.h>
 #include <linux/types.h>
+#include <sys/uio.h>
 
 #ifdef __GNUC__
 #define NORETURN __attribute__((__noreturn__))
@@ -94,4 +95,6 @@ struct kvm;
 void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
 void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *hugetlbfs_path, u64 size);
 
+int memcpy_toiovecend(const struct iovec *iov, int iovlen, unsigned char *kdata, size_t len);
+
 #endif /* KVM__UTIL_H */
diff --git a/tools/kvm/net/uip/core.c b/tools/kvm/net/uip/core.c
index 4e5bb82..d9e9993 100644
--- a/tools/kvm/net/uip/core.c
+++ b/tools/kvm/net/uip/core.c
@@ -7,7 +7,7 @@
 
 int uip_tx(struct iovec *iov, u16 out, struct uip_info *info)
 {
-	struct virtio_net_hdr *vnet;
+	struct virtio_net_hdr_mrg_rxbuf *vnet;
 	struct uip_tx_arg arg;
 	int eth_len, vnet_len;
 	struct uip_eth *eth;
@@ -74,63 +74,21 @@ int uip_tx(struct iovec *iov, u16 out, struct uip_info *info)
 	return vnet_len + eth_len;
 }
 
-int uip_rx(struct iovec *iov, u16 in, struct uip_info *info)
+int uip_rx(unsigned char *buffer, u32 length, struct uip_info *info)
 {
-	struct virtio_net_hdr *vnet;
-	struct uip_eth *eth;
 	struct uip_buf *buf;
-	int vnet_len;
-	int eth_len;
-	char *p;
 	int len;
-	int cnt;
-	int i;
 
 	/*
 	 * Sleep until there is a buffer for guest
 	 */
 	buf = uip_buf_get_used(info);
 
-	/*
-	 * Fill device to guest buffer, vnet hdr fisrt
-	 */
-	vnet_len = iov[0].iov_len;
-	vnet = iov[0].iov_base;
-	if (buf->vnet_len > vnet_len) {
-		len = -1;
-		goto out;
-	}
-	memcpy(vnet, buf->vnet, buf->vnet_len);
-
-	/*
-	 * Then, the real eth data
-	 * Note: Be sure buf->eth_len is not bigger than the buffer len that guest provides
-	 */
-	cnt = buf->eth_len;
-	p = buf->eth;
-	for (i = 1; i < in; i++) {
-		eth_len = iov[i].iov_len;
-		eth = iov[i].iov_base;
-		if (cnt > eth_len) {
-			memcpy(eth, p, eth_len);
-			cnt -= eth_len;
-			p += eth_len;
-		} else {
-			memcpy(eth, p, cnt);
-			cnt -= cnt;
-			break;
-		}
-	}
-
-	if (cnt) {
-		pr_warning("uip_rx error");
-		len = -1;
-		goto out;
-	}
+	memcpy(buffer, buf->vnet, buf->vnet_len);
+	memcpy(buffer + buf->vnet_len, buf->eth, buf->eth_len);
 
 	len = buf->vnet_len + buf->eth_len;
 
-out:
 	uip_buf_set_free(info, buf);
 	return len;
 }
@@ -172,8 +130,8 @@ int uip_init(struct uip_info *info)
 	}
 
 	list_for_each_entry(buf, buf_head, list) {
-		buf->vnet	= malloc(sizeof(struct virtio_net_hdr));
-		buf->vnet_len	= sizeof(struct virtio_net_hdr);
+		buf->vnet	= malloc(sizeof(struct virtio_net_hdr_mrg_rxbuf));
+		buf->vnet_len	= sizeof(struct virtio_net_hdr_mrg_rxbuf);
 		buf->eth	= malloc(1024*64 + sizeof(struct uip_pseudo_hdr));
 		buf->eth_len	= 1024*64 + sizeof(struct uip_pseudo_hdr);
 
diff --git a/tools/kvm/net/uip/tcp.c b/tools/kvm/net/uip/tcp.c
index 9044f40..a99b95e 100644
--- a/tools/kvm/net/uip/tcp.c
+++
[PATCH] virtio-net: fill only rx queues which are being used
Due to MQ support we may allocate a whole bunch of rx queues but never use them. With this patch we'll save the space used by the receive buffers until they are actually in use:

sh-4.2# free -h
             total       used       free     shared    buffers     cached
Mem:          490M        35M       455M         0B         0B       4.1M
-/+ buffers/cache:         31M       459M
Swap:           0B         0B         0B
sh-4.2# ethtool -L eth0 combined 8
sh-4.2# free -h
             total       used       free     shared    buffers     cached
Mem:          490M       162M       327M         0B         0B       4.1M
-/+ buffers/cache:        158M       331M
Swap:           0B         0B         0B

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 drivers/net/virtio_net.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6bfc511..4d82d17 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -581,7 +581,7 @@ static void refill_work(struct work_struct *work)
 	bool still_empty;
 	int i;
 
-	for (i = 0; i < vi->max_queue_pairs; i++) {
+	for (i = 0; i < vi->curr_queue_pairs; i++) {
 		struct receive_queue *rq = &vi->rq[i];
 
 		napi_disable(&rq->napi);
@@ -636,7 +636,7 @@ static int virtnet_open(struct net_device *dev)
 	struct virtnet_info *vi = netdev_priv(dev);
 	int i;
 
-	for (i = 0; i < vi->max_queue_pairs; i++) {
+	for (i = 0; i < vi->curr_queue_pairs; i++) {
 		/* Make sure we have some buffers: if oom use wq. */
 		if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 			schedule_delayed_work(&vi->refill, 0);
@@ -900,6 +900,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 	struct scatterlist sg;
 	struct virtio_net_ctrl_mq s;
 	struct net_device *dev = vi->dev;
+	int i;
 
 	if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ))
 		return 0;
@@ -912,8 +913,13 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 		dev_warn(&dev->dev,
 			 "Fail to set num of queue pairs to %d\n", queue_pairs);
 		return -EINVAL;
-	} else
+	} else {
+		if (queue_pairs > vi->curr_queue_pairs)
+			for (i = 0; i < queue_pairs; i++)
+				if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
+					schedule_delayed_work(&vi->refill, 0);
 		vi->curr_queue_pairs = queue_pairs;
+	}
 
 	return 0;
 }
@@ -1568,7 +1574,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 	}
 
 	/* Last of all, set up some receive buffers. */
-	for (i = 0; i < vi->max_queue_pairs; i++) {
+	for (i = 0; i < vi->curr_queue_pairs; i++) {
 		try_fill_recv(&vi->rq[i], GFP_KERNEL);
 
 		/* If we didn't even get one input buffer, we're useless. */
@@ -1692,7 +1698,7 @@ static int virtnet_restore(struct virtio_device *vdev)
 
 	netif_device_attach(vi->dev);
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->curr_queue_pairs; i++)
 		if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 			schedule_delayed_work(&vi->refill, 0);

-- 1.8.2.1
RE: [Bug 56981] New: [SR-IOV] Intel I350 NIC VF cannot work in a Windows 2008 guest.
Added the e1000-devel list to see whether this is a known issue. Best Regards, Yongjie (Jay)

-----Original Message----- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of bugzilla-dae...@bugzilla.kernel.org Sent: Monday, April 22, 2013 10:41 PM To: kvm@vger.kernel.org Subject: [Bug 56981] New: [SR-IOV] Intel I350 NIC VF cannot work in a Windows 2008 guest.

https://bugzilla.kernel.org/show_bug.cgi?id=56981 [...]
Re: Preemptable Ticket Spinlock
On 04/23/2013 01:19 AM, Peter Zijlstra wrote:

On Mon, 2013-04-22 at 08:52 -0400, Rik van Riel wrote:

On 04/22/2013 07:51 AM, Peter Zijlstra wrote:

On Sun, 2013-04-21 at 17:12 -0400, Rik van Riel wrote:

If we always incremented the ticket number by 2 (instead of 1), then
we could use the lower bit of the ticket number as the spinlock.

ISTR that paravirt ticket locks already do that and use the lsb to
indicate the unlock needs to perform wakeups. Also, since all of this
is virt nonsense, shouldn't it live in the paravirt ticket lock code
and leave the native code as is?

Sure, but that is still no reason not to have the virt implementation
be as fast as possible, and share the same data type as the non-virt
implementation.

It has to share the same data type.

Also, is it guaranteed that the native spin_lock code has not been
called yet before we switch over to the paravirt functions? If the
native spin_lock code has been called already at that time, the native
code would still need to be modified to increment the ticket number by
2, so we end up with a compatible value in each spin lock's .tickets
field, and prevent a deadlock after we switch over to the paravirt
variant.

I thought the stuff already made it upstream, but apparently not; the
latest posting I'm aware of is here:
https://lkml.org/lkml/2012/5/2/105
That stuff changes the normal ticket increment as well.

The pv-ticket spinlock work went on hold after Avi acked it, because:
though on non-PLE we get a huge advantage, on PLE machines the benefit
was not as impressive (~10%, as stated in the email chain) compared to
the complexity of the patches. So Avi suggested trying PLE improvements
first, and those are going upstream:

https://lkml.org/lkml/2012/7/18/247
https://lkml.org/lkml/2013/1/22/104
https://lkml.org/lkml/2013/2/6/345 (on the way in the kvm tree)

Current status of PV spinlock: I have the rebased pv spinlock patches
and am experimenting with the latest kernel. I have Gleb's irq delivery
incorporated into the patch series, but I am thinking about whether I
can improve some guest-side logic in unlock. I will probably set up a
github repo and post the link soon.
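A minimal sketch of the "increment tickets by 2, use the lsb as a flag"
idea being discussed, assuming the usual x86 ticket-lock types; this is
an illustration, not Jeremy's actual pv-ticketlock series:

#define TICKET_LOCK_INC       2
#define TICKET_SLOWPATH_FLAG  1

static __always_inline void sketch_ticket_lock(arch_spinlock_t *lock)
{
	register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };

	inc = xadd(&lock->tickets, inc);	/* grab our ticket (old tail) */
	for (;;) {
		/* mask off the lsb: it is a flag, not part of the count */
		if (inc.head == (inc.tail & ~TICKET_SLOWPATH_FLAG))
			break;			/* our turn */
		cpu_relax();
		inc.head = ACCESS_ONCE(lock->tickets.head);
	}
}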
Re: Preemptable Ticket Spinlock
On 04/22/2013 10:12 PM, Jiannan Ouyang wrote:

On Mon, Apr 22, 2013 at 1:58 AM, Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:

[...]

 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
 	register struct __raw_tickets inc = { .tail = 1 };
+	unsigned int timeout = 0;
+	__ticket_t current_head;
 
 	inc = xadd(&lock->tickets, inc);
-
+	if (likely(inc.head == inc.tail))
+		goto spin;
+
+	timeout = TIMEOUT_UNIT * (inc.tail - inc.head);

Forgot to mention about this: for the immediate-wait case, you can
busyloop instead of timing out (I mean
timeout = TIMEOUT_UNIT * (inc.tail - inc.head - 1);).
This idea was used by Rik in his spinlock backoff patches.

+	do {
+		current_head = ACCESS_ONCE(lock->tickets.head);
+		if (inc.tail <= current_head) {
+			goto spin;
+		} else if (inc.head != current_head) {
+			inc.head = current_head;
+			timeout = TIMEOUT_UNIT * (inc.tail - inc.head);

Good idea indeed to base the loop on the head and tail difference. But
for virtualization I believe this directly-proportional notion is a
little tricky too.

Could you explain your concern a little bit more?

Consider a big machine with 2 VMs running. If the nth vcpu of, say, VM1
is waiting in the queue, the question is: do we have to have all n
VCPUs busylooping and thus burning a total of about
(n*(n+1)) * TIMEOUT_UNIT? Or is a vcpu far back in the queue worth
giving its time back, so that some other vcpu of VM1 doing useful work,
or a vcpu of VM2, can benefit? I mean: the farther back a vcpu is in
the queue, let it yield voluntarily (an inversely proportional notion,
just because it is a vcpu). And of course, below some THRESHOLD n we
can still use the directly proportional wait idea. Does this idea
sound good?
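A rough sketch of that inversely-proportional idea: spin with a
distance-proportional timeout near the head of the queue, yield when far
back. QUEUE_THRESHOLD and sketch_yield_vcpu() are invented for
illustration (the yield could be a halt or a hypercall); this is not a
tested patch:

#define QUEUE_THRESHOLD 4	/* invented bound for illustration */

static void sketch_yield_vcpu(void);	/* hypothetical: halt until kicked */

static __always_inline void sketch_spin_lock(arch_spinlock_t *lock)
{
	register struct __raw_tickets inc = { .tail = 1 };
	unsigned int timeout;

	inc = xadd(&lock->tickets, inc);
	while (inc.head != inc.tail) {
		if (inc.tail - inc.head > QUEUE_THRESHOLD) {
			sketch_yield_vcpu();	/* far back: give time away */
		} else {
			/* near the head: spin proportional to distance */
			timeout = TIMEOUT_UNIT * (inc.tail - inc.head);
			while (timeout-- &&
			       ACCESS_ONCE(lock->tickets.head) == inc.head)
				cpu_relax();
		}
		inc.head = ACCESS_ONCE(lock->tickets.head);
	}
}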
Re: [PATCH v3 00/15] KVM: MMU: fast zap all shadow pages
On Mon, Apr 22, 2013 at 10:45:53PM +0900, Takuya Yoshikawa wrote:

On Mon, 22 Apr 2013 15:39:38 +0300 Gleb Natapov g...@redhat.com wrote:

We do not want kvm_set_memory (cases: DELETE/MOVE/CREATE) to be
susceptible to:

  vcpu 1                | kvm_set_memory
  create shadow page    |
                        | nuke shadow page
  create shadow page    |
                        | nuke shadow page

Which is guest-triggerable behavior with the spinlock preemption
algorithm. Not only guest-triggerable in the sense of a malicious
guest: the condition above can be induced by host workload with a
non-malicious guest system.

Is the problem that newly created shadow pages are immediately zapped?
Shouldn't the generation number / kvm_mmu_zap_all_invalid() idea
described here, https://lkml.org/lkml/2013/4/22/111, solve this?

I guess so. That's what Avi described when he tried to achieve lockless
TLB flushes. Mixing that idea with Xiao's approach will achieve
reasonably nice performance, I think.

Yes. Various improvements should be added later on top of that if
needed.

Also, kvm_set_memory being relatively fast with huge-memory guests is
nice (which is what Xiao's idea allows).

I agree with this point. But if so, it should actually be measured on
such guests, even if the algorithm looks promising.

Works for me.
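A minimal sketch of the generation-number idea referenced above: bump a
global generation under mmu_lock so every existing shadow page becomes
stale and can be reclaimed lazily, instead of walking and zapping them
all at once. Field and helper names here are hypothetical, not the
eventual implementation:

static void sketch_mmu_invalidate_all_pages(struct kvm *kvm)
{
	spin_lock(&kvm->mmu_lock);
	kvm->arch.mmu_valid_gen++;	/* every existing sp is now stale */
	spin_unlock(&kvm->mmu_lock);
	/* TLBs still need flushing; stale pages are zapped lazily later */
	kvm_flush_remote_tlbs(kvm);
}

static bool sketch_sp_is_obsolete(struct kvm *kvm, struct kvm_mmu_page *sp)
{
	return sp->mmu_valid_gen != kvm->arch.mmu_valid_gen;
}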
[PATCH] KVM: ARM: Fix API documentation for ONE_REG encoding
Unless I'm mistaken, the size field was encoded 4 bits off and a wrong
value was used for 64-bit FP registers.

Signed-off-by: Christoffer Dall cd...@cs.columbia.edu
---
 Documentation/virtual/kvm/api.txt | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 976eb65..7145ee9 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1797,22 +1797,22 @@ ARM registers are mapped using the lower 32 bits. The upper 16 of that
 is the register group type, or coprocessor number:
 
 ARM core registers have the following id bit patterns:
-  0x4002 0010 <index into the kvm_regs struct:16>
+  0x4020 0010 <index into the kvm_regs struct:16>
 
 ARM 32-bit CP15 registers have the following id bit patterns:
-  0x4002 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>
+  0x4020 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>
 
 ARM 64-bit CP15 registers have the following id bit patterns:
-  0x4003 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>
+  0x4030 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>
 
 ARM CCSIDR registers are demultiplexed by CSSELR value:
-  0x4002 0011 00 <csselr:8>
+  0x4020 0011 00 <csselr:8>
 
 ARM 32-bit VFP control registers have the following id bit patterns:
-  0x4002 0012 1 <regno:12>
+  0x4020 0012 1 <regno:12>
 
 ARM 64-bit FP registers have the following id bit patterns:
-  0x4002 0012 0 <regno:12>
+  0x4030 0012 0 <regno:12>
 
 4.69 KVM_GET_ONE_REG
-- 
1.7.10.4
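For reference, the corrected prefixes compose as arch | size | group,
assuming the generic ONE_REG constants (KVM_REG_ARM = 0x4000...,
KVM_REG_SIZE_U32 = 0x0020..., KVM_REG_SIZE_U64 = 0x0030..., with the
size field at bit 52), which is where the 0x4020/0x4030 halves come
from; a sketch:

#include <stdint.h>

#define EX_KVM_REG_ARM          0x4000000000000000ULL
#define EX_KVM_REG_SIZE_U32     0x0020000000000000ULL
#define EX_KVM_REG_SIZE_U64     0x0030000000000000ULL

/* id for an ARM core register: group 0x0010 in bits 31..16,
 * index into struct kvm_regs in bits 15..0 */
static uint64_t arm_core_reg_id(uint16_t index)
{
	return EX_KVM_REG_ARM | EX_KVM_REG_SIZE_U32 |
	       (0x0010ULL << 16) | index;
}

/* the upper half is then 0x4020 (arch | U32 size), or 0x4030 for a
 * 64-bit register, matching the corrected patterns above */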
RE: KVM: kvm_set_slave_cpu: Invalid argument when trying direct interrupt delivery
Hi Tomoki,

Thanks for your config file, but it is for linux-3.5-rc4, while the
patches you posted to the community were based on linux-3.6, as
described in the following link:
  http://thread.gmane.org/gmane.linux.kernel/1353803
I also tested the config file on linux-3.6; it still does not work.

Could you please provide me the details of your setup, for example:
  - kernel version and config
  - the version and url of your patch
  - qemu command line
  - other info
so that I can try and reproduce your benchmark :-)

I think that your patch would be a good workaround for realtime and
performance improvement on hardware which does not support apicv.

BTW, I noticed the benchmark result in your slides, which shows the
throughput of a 10Gbps NIC exceeding 10Gbps. How did that happen? My
benchmark also shows interesting data: the throughput of a pass-through
device is sometimes higher than bare metal. (I use NetPIPE.)

Thanks a lot.
Yang Minqiang

-----Original Message-----
From: Tomoki Sekiyama [mailto:tomoki.sekiy...@hds.com]
Sent: Thursday, April 11, 2013 4:37 AM
To: Yangminqiang
Cc: kvm@vger.kernel.org; Haofeng; Luohao (brian)
Subject: Re: KVM: kvm_set_slave_cpu: Invalid argument when trying direct interrupt delivery

Hi,

On 4/7/13 6:09, Yangminqiang yangminqi...@huawei.com wrote:

Hi Tomoki,

I offlined cpu2 and cpu3 on my machine and continued to try your patch.
I run the vm without a pass-through device, since I only want to know
the interrupt latency improvement. (Am I right?)

This should be OK.

My qemu parameters:

./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 1024 -cpu qemu64,+x2apic -no-kvm-pit -serial pty -nographic -drive file=/mnt/sdb/vms/testfc/testfc.qcow2,if=virtio,index=0,format=qcow2 -spice port=12000,addr=186.100.8.171,disable-ticketing,plaintext-channel=main,plaintext-channel=playback,plaintext-channel=record,image-compression=auto_glz -no-kvm-pit

Using video (-spice) could cause a large latency, because
non-passthrough devices cause larger latency with this patch. Please
use a serial console (but avoid serial I/O while benchmarking, because
serial I/O also goes through a non-passthrough device). You may need to
turn off CPU low-power features, or run some infinite loop (e.g.
yes > /dev/null) to keep the cpu core from going into power-saving
mode.

cyclictest:
  cyclictest -m -p 99 -n -l 10 -h 3000 -q

but I got a very bad result:
  avg latency: 2+ us
  max latency: 5+ us

and got:
  Message from syslogd@kvmsteven at Apr 7 05:43:30 ...
  kernel:[ 2201.151817] BUG: soft lockup - CPU#18 stuck for 22s! [qemu-system-x86:2365]

This patch is not yet tested with various kernel CONFIG_* settings, and
some config may cause an issue like this... I was using a Fedora-based
config (attached). Could you try with this?

My setup:
  host kernel: 3.6.0-rc4+ and your patches
  guest kernel: 3.6.11.1-rt32
  qemu: qemu-kvm-1.0 with your patch

BTW, I am sure that my rt-kernel works well, as it got 12us max latency
as a host OS. Could you please provide me more detail about your
benchmark so I could reproduce your benchmark result?

Thanks,
Steven
Re: [PATCH] virtio-net: fill only rx queues which are being used
Sasha Levin sasha.le...@oracle.com writes:

Due to MQ support we may allocate a whole bunch of rx queues but never
use them. With this patch we'll save the space used by the receive
buffers until they are actually in use:

Idea is good, implementation needs a tiny tweak:

@@ -912,8 +913,13 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 		dev_warn(&dev->dev, "Fail to set num of queue pairs to %d\n",
 			 queue_pairs);
 		return -EINVAL;
-	} else
+	} else {
+		if (queue_pairs > vi->curr_queue_pairs)
+			for (i = 0; i < queue_pairs; i++)
+				if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
+					schedule_delayed_work(&vi->refill, 0);
 		vi->curr_queue_pairs = queue_pairs;
+	}
 
 	return 0;
 }

You don't want to refill existing queues, so you don't need the if:

	for (i = vi->curr_queue_pairs; i < queue_pairs; i++) {
		if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
			schedule_delayed_work(&vi->refill, 0);
	}

We don't free up buffers when we're reducing queues, but I consider
that a corner case.

Thanks,
Rusty.
Re: [PATCH v7 3/3] tcm_vhost: Add hotplug/hotunplug support
Asias He as...@redhat.com writes:

On Mon, Apr 22, 2013 at 04:17:04PM +0300, Michael S. Tsirkin wrote:

+	evt = kzalloc(sizeof(*evt), GFP_KERNEL);

I think kzalloc is not needed here, you init all fields.

Not really! evt->event.lun[4-7] is not initialized. It needs to be 0.

So that is 4 bytes, just init them when you set the rest of the lun. It
is not in the fast path. You can do it this way but it is not a must.

I think that's a bit cleaner than relying on kzalloc to zero-initialize.

I think it is a bit shorter, 4 lines saved. OTOH you don't have to
think "what about the rest of the bytes?", then hunt through the code
and go "ah, it's kzalloc".

Looks like a style issue, but I prefer explicit initialization. It just
makes no sense to argue about this. If you really want, you can make
patches to remove all the lines where kzalloc is used, because it is
cleaner.

I prefer explicit initialization too, FWIW.

+static void tcm_vhost_do_evt_work(struct vhost_scsi *vs,
+	struct tcm_vhost_evt *evt)
+{
+	struct vhost_virtqueue *vq = &vs->vqs[VHOST_SCSI_VQ_EVT];
+	struct virtio_scsi_event *event = &evt->event;
+	struct virtio_scsi_event __user *eventp;
+	unsigned out, in;
+	int head, ret;
+
+	if (!tcm_vhost_check_endpoint(vq))
+		return;
+
+	mutex_lock(&vs->vs_events_lock);
+	mutex_lock(&vq->mutex);
+again:
+	vhost_disable_notify(&vs->dev, vq);
+	head = vhost_get_vq_desc(&vs->dev, vq, vq->iov,
+			ARRAY_SIZE(vq->iov), &out, &in,
+			NULL, NULL);
+	if (head < 0) {
+		vs->vs_events_dropped = true;
+		goto out;
+	}
+	if (head == vq->num) {
+		if (vhost_enable_notify(&vs->dev, vq))
+			goto again;

Please write loops with while() or for(), not with goto. goto is for
error handling.

This makes extra indentation, which is more ugly.

I don't care. No loops with goto, and that's a hard rule.

It is not a loop.

If the same code repeats, it's a loop.

What we really need here is to call vhost_get_vq_desc once. It's not a
loop like other places that use while() or for() with
vhost_get_vq_desc.

Exactly.

Like a mutex, for example? We only need to lock it once. See
__mutex_lock_common: it uses a for loop, doesn't it?

So what? So fix this please, use while or for or even until.

I do not think it is necessary.

I tend to use "again:" in such corner cases, where I don't expect it to
be invoked very often. A do loop looks too normal. I do exactly the
same thing in virtio-net.c, for the same case.

Would you like me to review the entire thing?

Cheers,
Rusty.
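For illustration, one way the goto-based retry could be written as an
explicit loop, per the review comments; a sketch only, not the merged
code, with the dropped-event bookkeeping left to the caller:

static int sketch_get_evt_desc(struct vhost_scsi *vs,
			       struct vhost_virtqueue *vq,
			       unsigned int *out, unsigned int *in)
{
	int head;

	for (;;) {
		vhost_disable_notify(&vs->dev, vq);
		head = vhost_get_vq_desc(&vs->dev, vq, vq->iov,
					 ARRAY_SIZE(vq->iov), out, in,
					 NULL, NULL);
		/* error, or a real descriptor: either way we are done */
		if (head != vq->num)
			return head;
		/* ring empty: re-enable notify; retry if buffers raced in */
		if (!vhost_enable_notify(&vs->dev, vq))
			return -1;	/* still empty; caller drops the event */
	}
}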
Re: suggesting wording fixes for virtio-spec 0.9.5
Laszlo Ersek ler...@redhat.com writes:

Hi,

(I'm not subscribed to either list,) using the word "descriptor" is
misleading in the following sections:

Yes, I like the use of 'descriptor chains'. This is a definite
improvement. Here's the diff I ended up with (massaged to minimize it).

Thanks!
Rusty.

--- virtio-spec.txt-old	2013-04-23 13:22:21.339158214 +0930
+++ virtio-spec.txt	2013-04-23 13:34:14.055176464 +0930
@@ -482,10 +482,10 @@
 2.3.4 Available Ring
 
-The available ring refers to what descriptors we are offering the
-device: it refers to the head of a descriptor chain. The “flags”
+The available ring refers to what descriptor chains we are offering the
+device: each entry refers to the head of a descriptor chain. The “flags”
 field is currently 0 or 1: 1 indicating that we do not need an
-interrupt when the device consumes a descriptor from the
+interrupt when the device consumes a descriptor chain from the
 available ring. Alternatively, the guest can ask the device to delay
 interrupts until an entry with an index specified by the “
 used_event” field is written in the used ring (equivalently,
@@ -671,16 +671,16 @@
 
 avail->ring[avail->idx % qsz] = head;
 
-However, in general we can add many descriptors before we update
-the “idx” field (at which point they become visible to the
-device), so we keep a counter of how many we've added:
+However, in general we can add many separate descriptor chains before we update
+the “idx” field (at which point they become visible to the device),
+so we keep a counter of how many we've added:
 
 avail->ring[(avail->idx + added++) % qsz] = head;
 
 2.4.1.3 Updating The Index Field
 
 Once the idx field of the virtqueue is updated, the device will
-be able to access the descriptor entries we've created and the
+be able to access the descriptor chains we've created and the
 memory they refer to. This is why a memory barrier is generally
 used before the idx update, to ensure it sees the most up-to-date
 copy.
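A worked example of that text in plain C, assuming the 0.9.5 ring
layout: queue several descriptor chains, then publish them all with one
barrier and one index update.

#include <stdint.h>

struct vring_avail {
	uint16_t flags;
	uint16_t idx;
	uint16_t ring[];
};

static void publish_chains(struct vring_avail *avail, unsigned int qsz,
			   const uint16_t *heads, unsigned int n)
{
	unsigned int added;

	/* place each chain head in the ring; not yet visible */
	for (added = 0; added < n; added++)
		avail->ring[(avail->idx + added) % qsz] = heads[added];

	/* the device must see the ring entries before the new idx */
	__sync_synchronize();
	avail->idx += n;	/* now the device may consume the chains */
}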
Re: [PATCH v7 3/3] tcm_vhost: Add hotplug/hotunplug support
On Tue, Apr 23, 2013 at 01:48:51PM +0930, Rusty Russell wrote:

Asias He as...@redhat.com writes:

On Mon, Apr 22, 2013 at 04:17:04PM +0300, Michael S. Tsirkin wrote:

+	evt = kzalloc(sizeof(*evt), GFP_KERNEL);

I think kzalloc is not needed here, you init all fields.

Not really! evt->event.lun[4-7] is not initialized. It needs to be 0.

So that is 4 bytes, just init them when you set the rest of the lun. It
is not in the fast path. You can do it this way but it is not a must.

I think that's a bit cleaner than relying on kzalloc to zero-initialize.

I think it is a bit shorter, 4 lines saved. OTOH you don't have to
think "what about the rest of the bytes?", then hunt through the code
and go "ah, it's kzalloc".

Looks like a style issue, but I prefer explicit initialization. It just
makes no sense to argue about this. If you really want, you can make
patches to remove all the lines where kzalloc is used, because it is
cleaner.

I prefer explicit initialization too, FWIW.

I do not really care. I am OK with either way.

+static void tcm_vhost_do_evt_work(struct vhost_scsi *vs,
+	struct tcm_vhost_evt *evt)
+{
+	struct vhost_virtqueue *vq = &vs->vqs[VHOST_SCSI_VQ_EVT];
+	struct virtio_scsi_event *event = &evt->event;
+	struct virtio_scsi_event __user *eventp;
+	unsigned out, in;
+	int head, ret;
+
+	if (!tcm_vhost_check_endpoint(vq))
+		return;
+
+	mutex_lock(&vs->vs_events_lock);
+	mutex_lock(&vq->mutex);
+again:
+	vhost_disable_notify(&vs->dev, vq);
+	head = vhost_get_vq_desc(&vs->dev, vq, vq->iov,
+			ARRAY_SIZE(vq->iov), &out, &in,
+			NULL, NULL);
+	if (head < 0) {
+		vs->vs_events_dropped = true;
+		goto out;
+	}
+	if (head == vq->num) {
+		if (vhost_enable_notify(&vs->dev, vq))
+			goto again;

Please write loops with while() or for(), not with goto. goto is for
error handling.

This makes extra indentation, which is more ugly.

I don't care. No loops with goto, and that's a hard rule.

It is not a loop.

If the same code repeats, it's a loop.

What we really need here is to call vhost_get_vq_desc once. It's not a
loop like other places that use while() or for() with
vhost_get_vq_desc.

Exactly.

Like a mutex, for example? We only need to lock it once. See
__mutex_lock_common: it uses a for loop, doesn't it?

So what? So fix this please, use while or for or even until.

I do not think it is necessary.

I tend to use "again:" in such corner cases, where I don't expect it to
be invoked very often. A do loop looks too normal. I do exactly the
same thing in virtio-net.c, for the same case.

Would you like me to review the entire thing?

Sure. Thanks Rusty!

Cheers,
Rusty.

--
Asias
Re: [PATCH] virtio-net: fill only rx queues which are being used
On 04/23/2013 12:13 AM, Rusty Russell wrote:

Sasha Levin sasha.le...@oracle.com writes:

Due to MQ support we may allocate a whole bunch of rx queues but never
use them. With this patch we'll save the space used by the receive
buffers until they are actually in use:

Idea is good, implementation needs a tiny tweak:

@@ -912,8 +913,13 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 		dev_warn(&dev->dev, "Fail to set num of queue pairs to %d\n",
 			 queue_pairs);
 		return -EINVAL;
-	} else
+	} else {
+		if (queue_pairs > vi->curr_queue_pairs)
+			for (i = 0; i < queue_pairs; i++)
+				if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
+					schedule_delayed_work(&vi->refill, 0);
 		vi->curr_queue_pairs = queue_pairs;
+	}
 
 	return 0;
 }

You don't want to refill existing queues, so you don't need the if:

	for (i = vi->curr_queue_pairs; i < queue_pairs; i++) {
		if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
			schedule_delayed_work(&vi->refill, 0);
	}

That makes more sense, I'll resend.

We don't free up buffers when we're reducing queues, but I consider
that a corner case.

It didn't bother anyone up until now, and the spec doesn't state
anything about it, so I preferred to just leave that alone. Unless the
ARM folks would find it useful?

Thanks,
Sasha
Re: Preemptable Ticket Spinlock
On 04/23/2013 02:31 AM, Peter Zijlstra wrote:

On Mon, 2013-04-22 at 16:49 -0400, Rik van Riel wrote:

Given the fairly high cost of rescheduling a VCPU (which is likely to
include an IPI), versus the short hold time of most spinlocks, I have
the strong suspicion that your approach would win.

https://lkml.org/lkml/2012/5/2/101

If you schedule too often, your SPIN_THRESHOLD is far too low. Anyway,
performance can't be that bad, otherwise Jeremy wouldn't have spent as
much time on it as he did.

When I experimented last time, the ideal SPIN_THRESHOLD for a PLE
machine was around 4k-8k. Jeremy's experiment was on a non-PLE machine
AFAIK, which complemented the PLE feature in a nice way with a 2k
threshold.
Re: [Qemu-devel] Para-Virtualized Clock Usage
On Mon, Apr 22, 2013 at 04:58:01PM +0000, Joji Mekkattuparamban (joji) wrote:

Greetings,

I have an SMP guest application, running on the 2.6.27 Linux kernel.
The application, originally written for bare metal, makes extensive use
of the TSC by directly invoking rdtsc from user space for timestamp
purposes. While running on KVM (RHEL version 6.3), we are running into
TSC issues on some hardware. As a solution, I am considering migrating
to the pvclock. I am wondering if there is an example for migrating
from TSC to the pvclock. Any pointers?

Wrong list, you should ask KVM (copied). Recent kernels have pvclock
vdso support, which means that gettimeofday() uses it without entering
the kernel. Marcelo?

--
Gleb.
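A minimal sketch of that migration: take timestamps via clock_gettime()
instead of raw rdtsc; on guest kernels with pvclock vdso support this
stays in userspace, so the cost is close to a raw TSC read:

#include <stdint.h>
#include <time.h>

/* drop-in replacement for an rdtsc-based timestamp, in nanoseconds */
static inline uint64_t timestamp_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);	/* vDSO fast path */
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}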
Re: Preemptable Ticket Spinlock
On Mon, Apr 22, 2013 at 07:08:06PM -0400, Rik van Riel wrote:

On 04/22/2013 04:55 PM, Peter Zijlstra wrote:

On Mon, 2013-04-22 at 16:46 -0400, Jiannan Ouyang wrote:

- pv-preemptable-lock has much less performance variance compared to
pv_lock, because it adapts to preemption within the VM, rather than
using rescheduling, which increases VM interference

I would say it has a _much_ worse worst case (and thus worse variance)
than the paravirt ticket implementation from Jeremy. While the full
paravirt ticket lock results in vcpu scheduling, it does maintain
fairness. If you drop strict fairness you can end up in unbounded
starvation cases, and those are very ugly indeed.

If needed, Jiannan's scheme could easily be bounded to prevent infinite
starvation. For example, we could allow only the first 8 CPUs in line
to jump the queue. However, given the way that virtual CPUs get
scheduled in and out all the time, I suspect starvation is not a worry,
and we will not need the additional complexity to deal with it.

FWIW, RHEL6 uses an unfair spinlock when it runs as a guest. We never
got reports about problems due to this at any scale.

You may want to play around with virtualization a bit, to get a feel
for how things work in virt land.

--
All rights reversed

--
Gleb.
Re: [PATCH 15/17] KVM: PPC: Support irq routing and irqfd for in-kernel MPIC
On 04/18/2013 07:15:46 PM, Alexander Graf wrote:

On 18.04.2013, at 23:39, Scott Wood wrote:

Do we really want any default routes? There's no platform notion of GSI
here, so how is userspace to know how the kernel set it up (or what
GSIs are free to be used for new routes) -- other than the "read the
code" answer I got when I asked about x86? :-P

The default routes really are just "expose all pins 1:1 as GSI". I
think it makes sense to have a simple default for user space that
doesn't want to mess with irq routing. What GSIs are free for new
routes doesn't matter: routes are always completely rewritten as a
whole from user space, so when user space goes in and wants to change
only a single line, it has to lay out the full map itself anyway.

It looks like you already write the routes in your QEMU patches, so I'd
like to avoid adding MPIC default routes in KVM to keep things simple.

It's legacy baggage from day one.

With default routes, what happens if we later support instantiating
multiple interrupt controllers?

-Scott
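For illustration, the "expose all pins 1:1 as GSI" default corresponds
to an identity routing table like the sketch below, using the standard
irq routing uapi; the chip id and pin count are placeholders, and the
caller is assumed to have allocated room for nr_pins entries:

#include <linux/kvm.h>
#include <string.h>

static void build_identity_routes(struct kvm_irq_routing *table,
				  unsigned int nr_pins)
{
	unsigned int i;

	for (i = 0; i < nr_pins; i++) {
		struct kvm_irq_routing_entry *e = &table->entries[i];

		memset(e, 0, sizeof(*e));
		e->gsi = i;				/* GSI == pin */
		e->type = KVM_IRQ_ROUTING_IRQCHIP;
		e->u.irqchip.irqchip = 0;		/* placeholder chip id */
		e->u.irqchip.pin = i;
	}
	table->nr = nr_pins;
	/* then: ioctl(vm_fd, KVM_SET_GSI_ROUTING, table) */
}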