[COMMIT master] KVM: x86: ignore access permissions for hypercall patching
From: Marcelo Tosatti <mtosa...@redhat.com>

Ignore access permissions while patching hypercall instructions. Otherwise
KVM injects a page fault when trying to patch vmcall on read-only text
regions:

Freeing initrd memory: 8843k freed
Freeing unused kernel memory: 660k freed
Write protecting the kernel text: 4780k
Write protecting the kernel read-only data: 1912k
BUG: unable to handle kernel paging request at c01292e3
IP: [<c01292e3>] kvm_leave_lazy_mmu+0x43/0x70
*pde = 00910067 *pte = 00129161
Oops: 0003 [#1] SMP

CC: sta...@kernel.org
Reported-and-Tested-by: Stefan Bader <stefan.ba...@canonical.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bcf52d1..9d02cc7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3226,12 +3226,17 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 static int emulator_write_emulated_onepage(unsigned long addr,
 					   const void *val,
 					   unsigned int bytes,
-					   struct kvm_vcpu *vcpu)
+					   struct kvm_vcpu *vcpu,
+					   bool guest_initiated)
 {
 	gpa_t                 gpa;
 	u32 error_code;
 
-	gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
+
+	if (guest_initiated)
+		gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
+	else
+		gpa = kvm_mmu_gva_to_gpa_system(vcpu, addr, &error_code);
 
 	if (gpa == UNMAPPED_GVA) {
 		kvm_inject_page_fault(vcpu, addr, error_code);
@@ -3262,24 +3267,35 @@ mmio:
 	return X86EMUL_CONTINUE;
 }
 
-int emulator_write_emulated(unsigned long addr,
+int __emulator_write_emulated(unsigned long addr,
 				   const void *val,
 				   unsigned int bytes,
-				   struct kvm_vcpu *vcpu)
+				   struct kvm_vcpu *vcpu,
+				   bool guest_initiated)
 {
 	/* Crossing a page boundary? */
 	if (((addr + bytes - 1) ^ addr) & PAGE_MASK) {
 		int rc, now;
 
 		now = -addr & ~PAGE_MASK;
-		rc = emulator_write_emulated_onepage(addr, val, now, vcpu);
+		rc = emulator_write_emulated_onepage(addr, val, now, vcpu,
+						     guest_initiated);
 		if (rc != X86EMUL_CONTINUE)
 			return rc;
 		addr += now;
 		val += now;
 		bytes -= now;
 	}
-	return emulator_write_emulated_onepage(addr, val, bytes, vcpu);
+	return emulator_write_emulated_onepage(addr, val, bytes, vcpu,
+					       guest_initiated);
+}
+
+int emulator_write_emulated(unsigned long addr,
+				   const void *val,
+				   unsigned int bytes,
+				   struct kvm_vcpu *vcpu)
+{
+	return __emulator_write_emulated(addr, val, bytes, vcpu, true);
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
@@ -3970,7 +3986,7 @@ int kvm_fix_hypercall(struct kvm_vcpu *vcpu)
 
 	kvm_x86_ops->patch_hypercall(vcpu, instruction);
 
-	return emulator_write_emulated(rip, instruction, 3, vcpu);
+	return __emulator_write_emulated(rip, instruction, 3, vcpu, false);
 }
 
 static u64 mk_cr_64(u64 curr_cr, u32 new_val)
--
To unsubscribe from this list: send the line "unsubscribe kvm-commits" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: ia64: fix the error code of ioctl KVM_IA64_VCPU_GET_STACK failure
From: Wei Yongjun <yj...@cn.fujitsu.com>

The ioctl KVM_IA64_VCPU_GET_STACK does not set the error code if
copy_to_user() fails, so 0 is returned. Return -EFAULT instead of 0 in
this case.

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 26e0e08..bc07c81 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1535,8 +1535,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 			goto out;
 
 		if (copy_to_user(user_stack, stack,
-				 sizeof(struct kvm_ia64_vcpu_stack)))
+				 sizeof(struct kvm_ia64_vcpu_stack))) {
+			r = -EFAULT;
 			goto out;
+		}
 
 		break;
 	}
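The control-flow problem this patch fixes can be illustrated outside the kernel. The following is only a user-space sketch: fake_copy_to_user(), struct fake_vcpu_stack and fake_get_stack() are hypothetical stand-ins invented for the illustration and are not kernel interfaces. It shows why the error code has to be assigned before the jump to the common exit label.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for copy_to_user(): returns the number of bytes
 * that could NOT be copied, so 0 means success.
 */
static unsigned long fake_copy_to_user(void *to, const void *from,
				       unsigned long n)
{
	if (to == NULL)
		return n;	/* simulate a faulting user pointer */
	memcpy(to, from, n);
	return 0;
}

struct fake_vcpu_stack {
	char data[64];
};

/* Mirrors the fixed flow: r is set to -EFAULT before "goto out". */
static long fake_get_stack(struct fake_vcpu_stack *user_stack,
			   const struct fake_vcpu_stack *stack)
{
	long r = 0;

	if (fake_copy_to_user(user_stack, stack, sizeof(*stack))) {
		r = -EFAULT;
		goto out;
	}
out:
	return r;
}
```

Without the assignment, the failing branch would reach the exit label with r still 0 and the caller would see success.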
[COMMIT master] KVM: x86: Use native_store_idt() instead of kvm_get_idt()
From: Wei Yongjun <yj...@cn.fujitsu.com>

Use the generic Linux function native_store_idt() instead of
kvm_get_idt(), and remove the now-unused kvm_get_idt().

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ec891a2..ea1b6c6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -716,11 +716,6 @@ static inline void kvm_load_ldt(u16 sel)
 	asm("lldt %0" : : "rm"(sel));
 }
 
-static inline void kvm_get_idt(struct desc_ptr *table)
-{
-	asm("sidt %0" : "=m"(*table));
-}
-
 #ifdef CONFIG_X86_64
 static inline unsigned long read_msr(unsigned long msr)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 06108f3..df70244 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2445,7 +2445,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 
 	vmcs_write16(HOST_TR_SELECTOR, GDT_ENTRY_TSS*8);  /* 22.2.4 */
 
-	kvm_get_idt(&dt);
+	native_store_idt(&dt);
 	vmcs_writel(HOST_IDTR_BASE, dt.address);   /* 22.2.4 */
 
 	asm("mov $.Lkvm_vmx_return, %0" : "=r"(kvm_vmx_return));
[COMMIT master] KVM: ia64: fix the error of ioctl KVM_IRQ_LINE if no irq chip
From: Wei Yongjun <yj...@cn.fujitsu.com>

If there is no in-kernel irq chip, ioctl KVM_IRQ_LINE returns -EFAULT.
Other ioctls such as KVM_[GET|SET]IRQCHIP return -ENXIO in that case, so
return -ENXIO here instead of -EFAULT.

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index bc07c81..b0ed80c 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -979,11 +979,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = -EFAULT;
 		if (copy_from_user(&irq_event, argp, sizeof irq_event))
 			goto out;
+		r = -ENXIO;
 		if (irqchip_in_kernel(kvm)) {
 			__s32 status;
 			status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
 				    irq_event.irq, irq_event.level);
 			if (ioctl == KVM_IRQ_LINE_STATUS) {
+				r = -EFAULT;
 				irq_event.status = status;
 				if (copy_to_user(argp, &irq_event,
 							sizeof irq_event))
[COMMIT master] KVM: fix assigned_device_enable_host_msix error handling
From: jing zhang <zj.ba...@gmail.com>

Free IRQs and disable MSI-X upon failure.

Cc: Avi Kivity <a...@redhat.com>
Signed-off-by: Jing Zhang <zj.ba...@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 057e2cc..47ca447 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -315,12 +315,16 @@ static int assigned_device_enable_host_msix(struct kvm *kvm,
 				kvm_assigned_dev_intr, 0,
 				"kvm_assigned_msix_device",
 				(void *)dev);
-		/* FIXME: free requested_irq's on failure */
 		if (r)
-			return r;
+			goto err;
 	}
 
 	return 0;
+err:
+	for (i -= 1; i >= 0; i--)
+		free_irq(dev->host_msix_entries[i].vector, (void *)dev);
+	pci_disable_msix(dev->dev);
+	return r;
 }
 
 #endif
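The unwind loop added by this patch follows a common pattern: when request number i fails, release requests 0 to i-1 in reverse order. Below is a minimal user-space sketch of that pattern. The names fake_request_irq(), fake_free_irq(), fake_enable_msix() and the irq_held[] table are hypothetical and exist only for this illustration; they do not model the real request_irq()/free_irq() signatures.

```c
#define NVEC 4

static int irq_held[NVEC];	/* 1 while the fake vector is requested */

/* Hypothetical stand-in for request_irq(); fails for vector fail_at. */
static int fake_request_irq(int vec, int fail_at)
{
	if (vec == fail_at)
		return -1;
	irq_held[vec] = 1;
	return 0;
}

/* Hypothetical stand-in for free_irq(). */
static void fake_free_irq(int vec)
{
	irq_held[vec] = 0;
}

static int fake_enable_msix(int nvec, int fail_at)
{
	int i, r = 0;

	for (i = 0; i < nvec; i++) {
		r = fake_request_irq(i, fail_at);
		if (r)
			goto err;
	}
	return 0;
err:
	/* i is the vector that failed; release everything before it. */
	for (i -= 1; i >= 0; i--)
		fake_free_irq(i);
	return r;
}
```

Starting the loop at i - 1 matters: the failing vector itself was never acquired, so freeing it would be a bug.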
[COMMIT master] KVM: MMU: Consolidate two guest pte reads in kvm_mmu_pte_write()
From: Avi Kivity <a...@redhat.com>

kvm_mmu_pte_write() reads guest ptes on two different occasions, both to
allow a 32-bit pae guest to update a pte with 4-byte writes. Consolidate
these into a single read, which also allows us to consolidate another read
from an invlpg speculating a gpte into the shadow page table.

Signed-off-by: Avi Kivity <a...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b137515..f63c9ad 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2556,36 +2556,11 @@ static bool last_updated_pte_accessed(struct kvm_vcpu *vcpu)
 }
 
 static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-					  const u8 *new, int bytes)
+					  u64 gpte)
 {
 	gfn_t gfn;
-	int r;
-	u64 gpte = 0;
 	pfn_t pfn;
 
-	if (bytes != 4 && bytes != 8)
-		return;
-
-	/*
-	 * Assume that the pte write on a page table of the same type
-	 * as the current vcpu paging mode.  This is nearly always true
-	 * (might be false while changing modes).  Note it is verified later
-	 * by update_pte().
-	 */
-	if (is_pae(vcpu)) {
-		/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
-		if ((bytes == 4) && (gpa % 4 == 0)) {
-			r = kvm_read_guest(vcpu->kvm, gpa & ~(u64)7, &gpte, 8);
-			if (r)
-				return;
-			memcpy((void *)&gpte + (gpa % 8), new, 4);
-		} else if ((bytes == 8) && (gpa % 8 == 0)) {
-			memcpy((void *)&gpte, new, 8);
-		}
-	} else {
-		if ((bytes == 4) && (gpa % 4 == 0))
-			memcpy((void *)&gpte, new, 4);
-	}
 	if (!is_present_gpte(gpte))
 		return;
 	gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
@@ -2636,7 +2611,34 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	int r;
 
 	pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
-	mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes);
+
+	switch (bytes) {
+	case 4:
+		gentry = *(const u32 *)new;
+		break;
+	case 8:
+		gentry = *(const u64 *)new;
+		break;
+	default:
+		gentry = 0;
+		break;
+	}
+
+	/*
+	 * Assume that the pte write on a page table of the same type
+	 * as the current vcpu paging mode.  This is nearly always true
+	 * (might be false while changing modes).  Note it is verified later
+	 * by update_pte().
+	 */
+	if (is_pae(vcpu) && bytes == 4) {
+		/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
+		gpa &= ~(gpa_t)7;
+		r = kvm_read_guest(vcpu->kvm, gpa, &gentry, 8);
+		if (r)
+			gentry = 0;
+	}
+
+	mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
 	spin_lock(&vcpu->kvm->mmu_lock);
 	kvm_mmu_access_page(vcpu, gfn);
 	kvm_mmu_free_some_pages(vcpu);
@@ -2701,20 +2703,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 			continue;
 		}
 		spte = &sp->spt[page_offset / sizeof(*spte)];
-		if ((gpa & (pte_size - 1)) || (bytes < pte_size)) {
-			gentry = 0;
-			r = kvm_read_guest_atomic(vcpu->kvm,
-						  gpa & ~(u64)(pte_size - 1),
-						  &gentry, pte_size);
-			new = (const void *)&gentry;
-			if (r < 0)
-				new = NULL;
-		}
 		while (npte--) {
 			entry = *spte;
 			mmu_pte_write_zap_pte(vcpu, sp, spte);
-			if (new)
-				mmu_pte_write_new_pte(vcpu, sp, spte, new);
+			if (gentry)
+				mmu_pte_write_new_pte(vcpu, sp, spte, &gentry);
 			mmu_pte_write_flush_tlb(vcpu, entry, *spte);
 			++spte;
 		}
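The consolidated read can be pictured with a toy model. This is only a sketch under stated assumptions: guest_mem[] is a made-up byte array standing in for guest memory, fake_gentry() is a hypothetical helper, and the model assumes the guest's write has already reached memory by the time the 8-byte entry is read back, which is what lets a single aligned read recover both halves of a PAE entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint8_t guest_mem[64];	/* toy guest page-table memory */

/*
 * Hypothetical model of the consolidated read: derive the full guest
 * entry after a write of 'bytes' bytes at 'gpa'.
 */
static uint64_t fake_gentry(uint64_t gpa, const void *new_val,
			    unsigned bytes, int pae)
{
	uint64_t gentry = 0;
	uint32_t lo;

	assert(bytes <= 8 && gpa + bytes <= sizeof(guest_mem));
	/* Assumption: the write itself has already landed in memory. */
	memcpy(&guest_mem[gpa], new_val, bytes);

	switch (bytes) {
	case 4:
		memcpy(&lo, new_val, 4);
		gentry = lo;
		break;
	case 8:
		memcpy(&gentry, new_val, 8);
		break;
	default:
		gentry = 0;
		break;
	}

	if (pae && bytes == 4) {
		/* Half of a 64-bit entry: read back the whole aligned entry. */
		gpa &= ~(uint64_t)7;
		memcpy(&gentry, &guest_mem[gpa], 8);
	}
	return gentry;
}
```

Writing the two 4-byte halves of an entry one after the other makes the second call return the complete 64-bit value, independent of host byte order.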
[COMMIT master] KVM: fix the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO failure
From: Wei Yongjun <yj...@cn.fujitsu.com>

Change the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO from -EINVAL to
-ENXIO if no coalesced mmio dev exists.

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5169736..22500d4 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -138,7 +138,7 @@ int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
 	struct kvm_coalesced_mmio_dev *dev = kvm->coalesced_mmio_dev;
 
 	if (dev == NULL)
-		return -EINVAL;
+		return -ENXIO;
 
 	mutex_lock(&kvm->slots_lock);
 	if (dev->nb_zones >= KVM_COALESCED_MMIO_ZONE_MAX) {
@@ -161,7 +161,7 @@ int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
 	struct kvm_coalesced_mmio_zone *z;
 
 	if (dev == NULL)
-		return -EINVAL;
+		return -ENXIO;
 
 	mutex_lock(&kvm->slots_lock);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bcd08b8..8c3743c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1602,7 +1602,6 @@ static long kvm_vm_ioctl(struct file *filp,
 		r = -EFAULT;
 		if (copy_from_user(&zone, argp, sizeof zone))
 			goto out;
-		r = -ENXIO;
 		r = kvm_vm_ioctl_register_coalesced_mmio(kvm, &zone);
 		if (r)
 			goto out;
@@ -1614,7 +1613,6 @@ static long kvm_vm_ioctl(struct file *filp,
 		r = -EFAULT;
 		if (copy_from_user(&zone, argp, sizeof zone))
 			goto out;
-		r = -ENXIO;
 		r = kvm_vm_ioctl_unregister_coalesced_mmio(kvm, &zone);
 		if (r)
 			goto out;
[COMMIT master] KVM: x86 emulator: fix RCX access during rep emulation
From: Gleb Natapov <g...@redhat.com>

During rep emulation, the access width of RCX depends on the current
address mode.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0b70a36..4dce805 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1852,7 +1852,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 
 	if (c->rep_prefix && (c->d & String)) {
 		/* All REP prefixes have the same first termination condition */
-		if (c->regs[VCPU_REGS_RCX] == 0) {
+		if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) {
 			kvm_rip_write(ctxt->vcpu, c->eip);
 			goto done;
 		}
@@ -1876,7 +1876,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 				goto done;
 			}
 		}
-		c->regs[VCPU_REGS_RCX]--;
+		register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
 		c->eip = kvm_rip_read(ctxt->vcpu);
 	}
 
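To see why the width matters, consider a 16-bit address size: only CX counts iterations, and the upper bits of RCX must be neither tested nor modified. The sketch below uses simplified stand-ins, fake_address_mask() and fake_register_address_increment(), which take the address size in bytes directly. That parameter is an assumption of this illustration; the emulator's own helpers take the decode cache instead.

```c
#include <stdint.h>

/*
 * Simplified stand-in: keep only the low ad_bytes bytes of reg.
 * ad_bytes is the effective address size in bytes (2, 4 or 8).
 */
static uint64_t fake_address_mask(unsigned ad_bytes, uint64_t reg)
{
	if (ad_bytes == 8)
		return reg;
	return reg & ((UINT64_C(1) << (ad_bytes * 8)) - 1);
}

/* Simplified stand-in: add inc to the low ad_bytes bytes of *reg only. */
static void fake_register_address_increment(unsigned ad_bytes,
					     uint64_t *reg, int inc)
{
	uint64_t mask;

	if (ad_bytes == 8) {
		*reg += (uint64_t)(int64_t)inc;
		return;
	}
	mask = (UINT64_C(1) << (ad_bytes * 8)) - 1;
	/* Upper bits are preserved; the low part wraps within the mask. */
	*reg = (*reg & ~mask) | ((*reg + (uint64_t)(int64_t)inc) & mask);
}
```

With a 2-byte address size, a register holding 0xffff0000 counts as zero for the termination check, and decrementing 0x12340000 wraps only the low word.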
[COMMIT master] KVM: Make locked operations truly atomic
From: Avi Kivity <a...@redhat.com>

Once upon a time, locked operations were emulated while holding the mmu
mutex. Since mmu pages were write protected, it was safe to emulate the
writes in a non-atomic manner, since there could be no other writer, either
in the guest or in the kernel.

These days emulation takes place without holding the mmu spinlock, so the
write could be preempted by an unshadowing event, which exposes the page to
writes by the guest. This may cause corruption of guest page tables.

Fix by using an atomic cmpxchg for these operations.

Signed-off-by: Avi Kivity <a...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9c81ece..1302bfb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3301,41 +3301,68 @@ int emulator_write_emulated(unsigned long addr,
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
+#define CMPXCHG_TYPE(t, ptr, old, new) \
+	(cmpxchg((t *)(ptr), *(t *)(old), *(t *)(new)) == *(t *)(old))
+
+#ifdef CONFIG_X86_64
+#  define CMPXCHG64(ptr, old, new) CMPXCHG_TYPE(u64, ptr, old, new)
+#else
+#  define CMPXCHG64(ptr, old, new) \
+	(cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u64 *)(new)) == *(u64 *)(old))
+#endif
+
 static int emulator_cmpxchg_emulated(unsigned long addr,
 				     const void *old,
 				     const void *new,
 				     unsigned int bytes,
 				     struct kvm_vcpu *vcpu)
 {
-	printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
-#ifndef CONFIG_X86_64
-	/* guests cmpxchg8b have to be emulated atomically */
-	if (bytes == 8) {
-		gpa_t gpa;
-		struct page *page;
-		char *kaddr;
-		u64 val;
+	gpa_t gpa;
+	struct page *page;
+	char *kaddr;
+	bool exchanged;
 
-		gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL);
+	/* guests cmpxchg8b have to be emulated atomically */
+	if (bytes > 8 || (bytes & (bytes - 1)))
+		goto emul_write;
 
-		if (gpa == UNMAPPED_GVA ||
-		   (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
-			goto emul_write;
+	gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL);
 
-		if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
-			goto emul_write;
+	if (gpa == UNMAPPED_GVA ||
+	    (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
+		goto emul_write;
 
-		val = *(u64 *)new;
+	if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
+		goto emul_write;
 
-		page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+	page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
 
-		kaddr = kmap_atomic(page, KM_USER0);
-		set_64bit((u64 *)(kaddr + offset_in_page(gpa)), val);
-		kunmap_atomic(kaddr, KM_USER0);
-		kvm_release_page_dirty(page);
+	kaddr = kmap_atomic(page, KM_USER0);
+	kaddr += offset_in_page(gpa);
+	switch (bytes) {
+	case 1:
+		exchanged = CMPXCHG_TYPE(u8, kaddr, old, new);
+		break;
+	case 2:
+		exchanged = CMPXCHG_TYPE(u16, kaddr, old, new);
+		break;
+	case 4:
+		exchanged = CMPXCHG_TYPE(u32, kaddr, old, new);
+		break;
+	case 8:
+		exchanged = CMPXCHG64(kaddr, old, new);
+		break;
+	default:
+		BUG();
 	}
+	kunmap_atomic(kaddr, KM_USER0);
+	kvm_release_page_dirty(page);
+
+	if (!exchanged)
+		return X86EMUL_CMPXCHG_FAILED;
+
 emul_write:
-#endif
+	printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
 
 	return emulator_write_emulated(addr, new, bytes, vcpu);
 }
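The size dispatch in this patch can be tried in user space. The sketch below is not the kernel code: it assumes a GCC or Clang toolchain and substitutes the __sync_bool_compare_and_swap builtin for the kernel's cmpxchg(). FAKE_CMPXCHG and fake_cmpxchg_emulated() are hypothetical names, the pointer is assumed to be suitably aligned, and 8-byte exchanges on 32-bit targets may need extra compiler support.

```c
#include <stdint.h>

/* Assumption: GCC/Clang __sync builtin in place of the kernel's cmpxchg(). */
#define FAKE_CMPXCHG(t, ptr, old_val, new_val) \
	__sync_bool_compare_and_swap((t *)(ptr), *(const t *)(old_val), \
				     *(const t *)(new_val))

/*
 * Returns 1 if the exchange happened, 0 if the comparison failed, and
 * -1 if the size is not a power of two up to 8, in which case a caller
 * would fall back to a plain (non-atomic) write. Unlike the patch, this
 * sketch also rejects a size of 0 explicitly.
 */
static int fake_cmpxchg_emulated(void *kaddr, const void *old_val,
				 const void *new_val, unsigned bytes)
{
	if (bytes == 0 || bytes > 8 || (bytes & (bytes - 1)))
		return -1;

	switch (bytes) {
	case 1:
		return FAKE_CMPXCHG(uint8_t, kaddr, old_val, new_val);
	case 2:
		return FAKE_CMPXCHG(uint16_t, kaddr, old_val, new_val);
	case 4:
		return FAKE_CMPXCHG(uint32_t, kaddr, old_val, new_val);
	default:	/* only 8 remains after the check above */
		return FAKE_CMPXCHG(uint64_t, kaddr, old_val, new_val);
	}
}
```

The expression bytes & (bytes - 1) is zero exactly when bytes is a power of two (or zero), which is how sizes such as 3 or 6 are routed to the fallback path.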
[COMMIT master] KVM: MMU: Do not instantiate nontrapping spte on unsync page
From: Avi Kivity <a...@redhat.com>

The update_pte() path currently uses a nontrapping spte when a nonpresent
(or nonaccessed) gpte is written. This is fine since at present it is only
used on sync pages. However, on an unsync page this will cause an endless
fault loop as the guest is under no obligation to invlpg a gpte that
transitions from nonpresent to present.

Needed for the next patch which reinstates update_pte() on invlpg.

Signed-off-by: Avi Kivity <a...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 81eab9a..4b37e1a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -258,11 +258,17 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
 	pt_element_t gpte;
 	unsigned pte_access;
 	pfn_t pfn;
+	u64 new_spte;
 
 	gpte = *(const pt_element_t *)pte;
 	if (~gpte & (PT_PRESENT_MASK | PT_ACCESSED_MASK)) {
-		if (!is_present_gpte(gpte))
-			__set_spte(spte, shadow_notrap_nonpresent_pte);
+		if (!is_present_gpte(gpte)) {
+			if (page->unsync)
+				new_spte = shadow_trap_nonpresent_pte;
+			else
+				new_spte = shadow_notrap_nonpresent_pte;
+			__set_spte(spte, new_spte);
+		}
 		return;
 	}
 	pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
[COMMIT master] KVM: x86 emulator: check return value against correct define
From: Gleb Natapov <g...@redhat.com>

Check the return value against the correct define instead of open-coding
the value.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4dce805..670ca8f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -566,7 +566,7 @@ static u32 group2_table[] = {
 #define insn_fetch(_type, _size, _eip)                                  \
 ({	unsigned long _x;						\
 	rc = do_insn_fetch(ctxt, ops, (_eip), &_x, (_size));		\
-	if (rc != 0)							\
+	if (rc != X86EMUL_CONTINUE)					\
 		goto done;						\
 	(_eip) += (_size);						\
 	(_type)_x;							\
[COMMIT master] KVM: Remove pointer to rflags from realmode_set_cr parameters.
From: Gleb Natapov <g...@redhat.com>

The mov reg, cr instruction doesn't change flags in any meaningful way, so
there is no need to update rflags after instruction execution.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 28826c8..53f5202 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -587,8 +587,7 @@ void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
 		   unsigned long *rflags);
 
 unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr);
-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value,
-		     unsigned long *rflags);
+void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value);
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 670ca8f..91450b5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2534,8 +2534,7 @@ twobyte_insn:
 	case 0x22: /* mov reg, cr */
 		if (c->modrm_mod != 3)
 			goto cannot_emulate;
-		realmode_set_cr(ctxt->vcpu,
-				c->modrm_reg, c->modrm_val, &ctxt->eflags);
+		realmode_set_cr(ctxt->vcpu, c->modrm_reg, c->modrm_val);
 		c->dst.type = OP_NONE;
 		break;
 	case 0x23: /* mov from reg to dr */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5bbf47c..77f0955 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4081,13 +4081,11 @@ unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr)
 	return value;
 }
 
-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val,
-		     unsigned long *rflags)
+void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val)
 {
 	switch (cr) {
 	case 0:
 		kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val));
-		*rflags = kvm_get_rflags(vcpu);
 		break;
 	case 2:
 		vcpu->arch.cr2 = val;
[COMMIT master] KVM: Don't follow an atomic operation by a non-atomic one
From: Avi Kivity <a...@redhat.com>

Currently emulated atomic operations are immediately followed by a
non-atomic operation, so that kvm_mmu_pte_write() can be invoked. This
updates the mmu but undoes the whole point of doing things atomically.

Fix by only performing the atomic operation and the mmu update, and
avoiding the non-atomic write.

Signed-off-by: Avi Kivity <a...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1302bfb..5bbf47c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3229,7 +3229,8 @@ static int emulator_write_emulated_onepage(unsigned long addr,
 					   const void *val,
 					   unsigned int bytes,
 					   struct kvm_vcpu *vcpu,
-					   bool guest_initiated)
+					   bool guest_initiated,
+					   bool mmu_only)
 {
 	gpa_t                 gpa;
 	u32 error_code;
@@ -3249,6 +3250,10 @@ static int emulator_write_emulated_onepage(unsigned long addr,
 	if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
 		goto mmio;
 
+	if (mmu_only) {
+		kvm_mmu_pte_write(vcpu, gpa, val, bytes, 1);
+		return X86EMUL_CONTINUE;
+	}
 	if (emulator_write_phys(vcpu, gpa, val, bytes))
 		return X86EMUL_CONTINUE;
 
@@ -3273,7 +3278,8 @@ int __emulator_write_emulated(unsigned long addr,
 				   const void *val,
 				   unsigned int bytes,
 				   struct kvm_vcpu *vcpu,
-				   bool guest_initiated)
+				   bool guest_initiated,
+				   bool mmu_only)
 {
 	/* Crossing a page boundary? */
 	if (((addr + bytes - 1) ^ addr) & PAGE_MASK) {
@@ -3281,7 +3287,7 @@ int __emulator_write_emulated(unsigned long addr,
 
 		now = -addr & ~PAGE_MASK;
 		rc = emulator_write_emulated_onepage(addr, val, now, vcpu,
-						     guest_initiated);
+						     guest_initiated, mmu_only);
 		if (rc != X86EMUL_CONTINUE)
 			return rc;
 		addr += now;
@@ -3289,7 +3295,7 @@ int __emulator_write_emulated(unsigned long addr,
 		bytes -= now;
 	}
 	return emulator_write_emulated_onepage(addr, val, bytes, vcpu,
-					       guest_initiated);
+					       guest_initiated, mmu_only);
 }
 
 int emulator_write_emulated(unsigned long addr,
@@ -3297,7 +3303,7 @@ int emulator_write_emulated(unsigned long addr,
 				   unsigned int bytes,
 				   struct kvm_vcpu *vcpu)
 {
-	return __emulator_write_emulated(addr, val, bytes, vcpu, true);
+	return __emulator_write_emulated(addr, val, bytes, vcpu, true, false);
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
@@ -3361,6 +3367,8 @@ static int emulator_cmpxchg_emulated(unsigned long addr,
 	if (!exchanged)
 		return X86EMUL_CMPXCHG_FAILED;
 
+	return __emulator_write_emulated(addr, new, bytes, vcpu, true, true);
+
 emul_write:
 	printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
 
@@ -4015,7 +4023,8 @@ int kvm_fix_hypercall(struct kvm_vcpu *vcpu)
 
 	kvm_x86_ops->patch_hypercall(vcpu, instruction);
 
-	return __emulator_write_emulated(rip, instruction, 3, vcpu, false);
+	return __emulator_write_emulated(rip, instruction, 3, vcpu,
+					 false, false);
 }
 
 static u64 mk_cr_64(u64 curr_cr, u32 new_val)
[COMMIT master] KVM: Provide current eip as part of emulator context.
From: Gleb Natapov <g...@redhat.com>

Eliminate the need to call back into KVM to get the current eip from the
emulator.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index b048fd2..0765725 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -141,7 +141,7 @@ struct decode_cache {
 	u8 seg_override;
 	unsigned int d;
 	unsigned long regs[NR_VCPU_REGS];
-	unsigned long eip, eip_orig;
+	unsigned long eip;
 	/* modrm */
 	u8 modrm;
 	u8 modrm_mod;
@@ -160,6 +160,7 @@ struct x86_emulate_ctxt {
 	struct kvm_vcpu *vcpu;
 
 	unsigned long eflags;
+	unsigned long eip; /* eip before instruction emulation */
 	/* Emulated execution mode, represented by an X86EMUL_MODE value. */
 	int mode;
 	u32 cs_base;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8bd0557..2c27aa4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -667,7 +667,7 @@ static int do_insn_fetch(struct x86_emulate_ctxt *ctxt,
 	int rc;
 
 	/* x86 instructions are limited to 15 bytes. */
-	if (eip + size - ctxt->decode.eip_orig > 15)
+	if (eip + size - ctxt->eip > 15)
 		return X86EMUL_UNHANDLEABLE;
 	eip += ctxt->cs_base;
 	while (size--) {
@@ -927,7 +927,7 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	/* Shadow copy of register state. Committed on successful emulation. */
 
 	memset(c, 0, sizeof(struct decode_cache));
-	c->eip = c->eip_orig = kvm_rip_read(ctxt->vcpu);
+	c->eip = ctxt->eip;
 	ctxt->cs_base = seg_base(ctxt, VCPU_SREG_CS);
 	memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
 
@@ -1878,7 +1878,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 			}
 		}
 		register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
-		c->eip = kvm_rip_read(ctxt->vcpu);
+		c->eip = ctxt->eip;
 	}
 
 	if (c->src.type == OP_MEM) {
@@ -2447,7 +2447,7 @@ twobyte_insn:
 				goto done;
 
 			/* Let the processor re-execute the fixed hypercall */
-			c->eip = kvm_rip_read(ctxt->vcpu);
+			c->eip = ctxt->eip;
 			/* Disable writeback. */
 			c->dst.type = OP_NONE;
 			break;
@@ -2551,7 +2551,7 @@ twobyte_insn:
 			| ((u64)c->regs[VCPU_REGS_RDX] << 32);
 		if (kvm_set_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], msr_data)) {
 			kvm_inject_gp(ctxt->vcpu, 0);
-			c->eip = kvm_rip_read(ctxt->vcpu);
+			c->eip = ctxt->eip;
 		}
 		rc = X86EMUL_CONTINUE;
 		c->dst.type = OP_NONE;
@@ -2560,7 +2560,7 @@ twobyte_insn:
 		/* rdmsr */
 		if (kvm_get_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], &msr_data)) {
 			kvm_inject_gp(ctxt->vcpu, 0);
-			c->eip = kvm_rip_read(ctxt->vcpu);
+			c->eip = ctxt->eip;
 		} else {
 			c->regs[VCPU_REGS_RAX] = (u32)msr_data;
 			c->regs[VCPU_REGS_RDX] = msr_data >> 32;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 81d417e..ca86efa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3531,6 +3531,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 
 		vcpu->arch.emulate_ctxt.vcpu = vcpu;
 		vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
+		vcpu->arch.emulate_ctxt.eip = kvm_rip_read(vcpu);
 		vcpu->arch.emulate_ctxt.mode =
 			(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
 			(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
[COMMIT master] KVM: remove realmode_lmsw function.
From: Gleb Natapov <g...@redhat.com>

Use the (get|set)_cr callbacks to emulate lmsw inside the emulator.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9d474c7..b99cec1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -583,8 +583,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 void kvm_report_emulation_failure(struct kvm_vcpu *cvpu, const char *context);
 void realmode_lgdt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
 void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
-void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
-		   unsigned long *rflags);
 
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5b060e4..5e2fa61 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2486,8 +2486,8 @@ twobyte_insn:
 			c->dst.val = ops->get_cr(0, ctxt->vcpu);
 			break;
 		case 6: /* lmsw */
-			realmode_lmsw(ctxt->vcpu, (u16)c->src.val,
-				      &ctxt->eflags);
+			ops->set_cr(0, (ops->get_cr(0, ctxt->vcpu) & ~0x0ful) |
+				    (c->src.val & 0x0f), ctxt->vcpu);
 			c->dst.type = OP_NONE;
 			break;
 		case 7: /* invlpg*/
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b9ace70..6206600 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4099,13 +4099,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base)
 	kvm_x86_ops->set_idt(vcpu, &dt);
 }
 
-void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
-		   unsigned long *rflags)
-{
-	kvm_lmsw(vcpu, msw);
-	*rflags = kvm_get_rflags(vcpu);
-}
-
 static int move_to_next_stateful_cpuid_entry(struct kvm_vcpu *vcpu, int i)
 {
 	struct kvm_cpuid_entry2 *e = &vcpu->arch.cpuid_entries[i];
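The new lmsw expression replaces only the low four bits of CR0 with the low four bits of the operand. A small stand-alone sketch of that bit manipulation follows; fake_lmsw() is a hypothetical helper written for this illustration, and it mirrors only the arithmetic from the hunk above, not any of the checks that setting CR0 involves.

```c
/*
 * Hypothetical helper: compute the CR0 value that the lmsw emulation
 * passes to set_cr(). Only the low four bits (PE, MP, EM, TS) are
 * taken from the operand; all other CR0 bits are preserved.
 */
static unsigned long fake_lmsw(unsigned long cr0, unsigned long msw)
{
	return (cr0 & ~0x0ful) | (msw & 0x0f);
}
```

For example, with cr0 = 0x80000011 and an operand of 0xfffe, the upper operand bits are ignored and the result is 0x8000001e.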
[COMMIT master] KVM: x86 emulator: cleanup grp3 return value
From: Gleb Natapov <g...@redhat.com>

When x86_emulate_insn() does not know how to emulate an instruction, it
exits via the cannot_emulate label in all cases except when emulating
grp3. Fix that.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 46a7ee3..d696cbd 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1397,7 +1397,6 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt,
 			       struct x86_emulate_ops *ops)
 {
 	struct decode_cache *c = &ctxt->decode;
-	int rc = X86EMUL_CONTINUE;
 
 	switch (c->modrm_reg) {
 	case 0 ... 1:	/* test */
@@ -1410,11 +1409,9 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt,
 		emulate_1op("neg", c->dst, ctxt->eflags);
 		break;
 	default:
-		DPRINTF("Cannot emulate %02x\n", c->b);
-		rc = X86EMUL_UNHANDLEABLE;
-		break;
+		return 0;
 	}
-	return rc;
+	return 1;
 }
 
 static inline int emulate_grp45(struct x86_emulate_ctxt *ctxt,
@@ -2374,9 +2371,8 @@ special_insn:
 		c->dst.type = OP_NONE;	/* Disable writeback. */
 		break;
 	case 0xf6 ... 0xf7:	/* Grp3 */
-		rc = emulate_grp3(ctxt, ops);
-		if (rc != X86EMUL_CONTINUE)
-			goto done;
+		if (!emulate_grp3(ctxt, ops))
+			goto cannot_emulate;
 		break;
 	case 0xf8: /* clc */
 		ctxt->eflags &= ~EFLG_CF;
[COMMIT master] KVM: Provide x86_emulate_ctxt callback to get current cpl
From: Gleb Natapov <g...@redhat.com>

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 0c5caa4..b048fd2 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -110,6 +110,7 @@ struct x86_emulate_ops {
 				struct kvm_vcpu *vcpu);
 	ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
 	void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
+	int (*cpl)(struct kvm_vcpu *vcpu);
 };
 
 /* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5e2fa61..8bd0557 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1257,7 +1257,7 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
 	int rc;
 	unsigned long val, change_mask;
 	int iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
-	int cpl = kvm_x86_ops->get_cpl(ctxt->vcpu);
+	int cpl = ops->cpl(ctxt->vcpu);
 
 	rc = emulate_pop(ctxt, ops, &val, len);
 	if (rc != X86EMUL_CONTINUE)
@@ -1758,7 +1758,8 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
 	return X86EMUL_CONTINUE;
 }
 
-static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt)
+static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt,
+			      struct x86_emulate_ops *ops)
 {
 	int iopl;
 	if (ctxt->mode == X86EMUL_MODE_REAL)
@@ -1766,7 +1767,7 @@ static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt)
 	if (ctxt->mode == X86EMUL_MODE_VM86)
 		return true;
 	iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
-	return kvm_x86_ops->get_cpl(ctxt->vcpu) > iopl;
+	return ops->cpl(ctxt->vcpu) > iopl;
 }
 
 static bool emulator_io_port_access_allowed(struct x86_emulate_ctxt *ctxt,
@@ -1803,7 +1804,7 @@ static bool emulator_io_permited(struct x86_emulate_ctxt *ctxt,
 				 struct x86_emulate_ops *ops,
 				 u16 port, u16 len)
 {
-	if (emulator_bad_iopl(ctxt))
+	if (emulator_bad_iopl(ctxt, ops))
 		if (!emulator_io_port_access_allowed(ctxt, ops, port, len))
 			return false;
 	return true;
@@ -1842,7 +1843,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	}
 
 	/* Privileged instruction can be executed only in CPL=0 */
-	if ((c->d & Priv) && kvm_x86_ops->get_cpl(ctxt->vcpu)) {
+	if ((c->d & Priv) && ops->cpl(ctxt->vcpu)) {
 		kvm_inject_gp(ctxt->vcpu, 0);
 		goto done;
 	}
@@ -2378,7 +2379,7 @@ special_insn:
 		c->dst.type = OP_NONE;	/* Disable writeback. */
 		break;
 	case 0xfa: /* cli */
-		if (emulator_bad_iopl(ctxt))
+		if (emulator_bad_iopl(ctxt, ops))
 			kvm_inject_gp(ctxt->vcpu, 0);
 		else {
 			ctxt->eflags &= ~X86_EFLAGS_IF;
@@ -2386,7 +2387,7 @@ special_insn:
 		}
 		break;
 	case 0xfb: /* sti */
-		if (emulator_bad_iopl(ctxt))
+		if (emulator_bad_iopl(ctxt, ops))
 			kvm_inject_gp(ctxt->vcpu, 0);
 		else {
 			toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_STI);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6206600..81d417e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3479,6 +3479,11 @@ static void emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu)
 	}
 }
 
+static int emulator_get_cpl(struct kvm_vcpu *vcpu)
+{
+	return kvm_x86_ops->get_cpl(vcpu);
+}
+
 static struct x86_emulate_ops emulate_ops = {
 	.read_std            = kvm_read_guest_virt_system,
 	.fetch               = kvm_fetch_guest_virt,
@@ -3487,6 +3492,7 @@ static struct x86_emulate_ops emulate_ops = {
 	.cmpxchg_emulated    = emulator_cmpxchg_emulated,
 	.get_cr              = emulator_get_cr,
 	.set_cr              = emulator_set_cr,
+	.cpl                 = emulator_get_cpl,
 };
 
 static void cache_all_regs(struct kvm_vcpu *vcpu)
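The patch above routes every CPL query through the ops table instead of calling kvm_x86_ops directly. The following is only a minimal user-space sketch of that indirection, not kernel code: struct vcpu, struct emulate_ops, backend_get_cpl() and bad_iopl() are hypothetical stand-ins, and only the protected-mode comparison from emulator_bad_iopl() is modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct kvm_vcpu; not the real KVM definition. */
struct vcpu {
	int cpl;
	unsigned long eflags;
};

/* Hypothetical stand-in for struct x86_emulate_ops with the new callback. */
struct emulate_ops {
	int (*cpl)(struct vcpu *vcpu);
};

#define EFLAGS_IOPL_MASK  0x3000UL	/* EFLAGS bits 12-13 hold IOPL */
#define EFLAGS_IOPL_SHIFT 12

/* Backend implementation; the emulator core only sees the callback. */
static int backend_get_cpl(struct vcpu *vcpu)
{
	return vcpu->cpl;
}

static const struct emulate_ops demo_ops = { .cpl = backend_get_cpl };

/* Protected-mode part of the IOPL check: CPL greater than IOPL is "bad". */
static bool bad_iopl(struct vcpu *vcpu, const struct emulate_ops *ops)
{
	int iopl = (int)((vcpu->eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT);

	return ops->cpl(vcpu) > iopl;
}
```

Because the check depends only on the callback, a test harness can swap in a different ops table without touching the emulator logic, which appears to be the point of the change.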
[COMMIT master] KVM: x86 emulator: fix mov dr to inject #UD when needed.
From: Gleb Natapov <g...@redhat.com>

If CR4.DE=1, access to registers DR4/DR5 causes #UD.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 836e97b..5afddcf 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2531,9 +2531,12 @@ twobyte_insn:
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
 	case 0x21: /* mov from dr to reg */
-		if (emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]))
-			goto cannot_emulate;
-		rc = X86EMUL_CONTINUE;
+		if ((ops->get_cr(4, ctxt->vcpu) & X86_CR4_DE) &&
+		    (c->modrm_reg == 4 || c->modrm_reg == 5)) {
+			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+			goto done;
+		}
+		emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]);
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
 	case 0x22: /* mov reg, cr */
@@ -2541,9 +2544,12 @@ twobyte_insn:
 		c->dst.type = OP_NONE;
 		break;
 	case 0x23: /* mov from reg to dr */
-		if (emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]))
-			goto cannot_emulate;
-		rc = X86EMUL_CONTINUE;
+		if ((ops->get_cr(4, ctxt->vcpu) & X86_CR4_DE) &&
+		    (c->modrm_reg == 4 || c->modrm_reg == 5)) {
+			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+			goto done;
+		}
+		emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]);
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
 	case 0x30:
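The condition added for both MOV DR directions can be isolated as a small predicate. This is a sketch outside the kernel: mov_dr_raises_ud() is a hypothetical helper name, and CR4_DE is defined locally (CR4.DE is bit 3) rather than taken from kernel headers.

```c
#include <stdbool.h>

#define CR4_DE (1UL << 3)	/* CR4.DE (Debugging Extensions) is bit 3 */

/*
 * Returns true when a MOV to or from debug register 'dr' should raise
 * #UD: with CR4.DE set, DR4 and DR5 are no longer aliases of DR6/DR7.
 */
static bool mov_dr_raises_ud(unsigned long cr4, int dr)
{
	return (cr4 & CR4_DE) && (dr == 4 || dr == 5);
}
```

With CR4.DE clear the same register numbers are accepted, so the predicate has to look at both inputs; checking the register number alone would be wrong.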
[COMMIT master] KVM: x86 emulator: fix return values of syscall/sysenter/sysexit emulations
From: Gleb Natapov <g...@redhat.com>

Return X86EMUL_PROPAGATE_FAULT if a fault was injected. Also inject #UD for those instructions when appropriate.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5afddcf..1393bf0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1600,8 +1600,11 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
 	u64 msr_data;
 
 	/* syscall is not available in real mode */
-	if (ctxt->mode == X86EMUL_MODE_REAL || ctxt->mode == X86EMUL_MODE_VM86)
-		return X86EMUL_UNHANDLEABLE;
+	if (ctxt->mode == X86EMUL_MODE_REAL ||
+	    ctxt->mode == X86EMUL_MODE_VM86) {
+		kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+		return X86EMUL_PROPAGATE_FAULT;
+	}
 
 	setup_syscalls_segments(ctxt, &cs, &ss);
 
@@ -1651,14 +1654,16 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
 	/* inject #GP if in real mode */
 	if (ctxt->mode == X86EMUL_MODE_REAL) {
 		kvm_inject_gp(ctxt->vcpu, 0);
-		return X86EMUL_UNHANDLEABLE;
+		return X86EMUL_PROPAGATE_FAULT;
 	}
 
 	/* XXX sysenter/sysexit have not been tested in 64bit mode.
 	* Therefore, we inject an #UD.
 	*/
-	if (ctxt->mode == X86EMUL_MODE_PROT64)
-		return X86EMUL_UNHANDLEABLE;
+	if (ctxt->mode == X86EMUL_MODE_PROT64) {
+		kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+		return X86EMUL_PROPAGATE_FAULT;
+	}
 
 	setup_syscalls_segments(ctxt, &cs, &ss);
 
@@ -1713,7 +1718,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
 	if (ctxt->mode == X86EMUL_MODE_REAL ||
 	    ctxt->mode == X86EMUL_MODE_VM86) {
 		kvm_inject_gp(ctxt->vcpu, 0);
-		return X86EMUL_UNHANDLEABLE;
+		return X86EMUL_PROPAGATE_FAULT;
 	}
 
 	setup_syscalls_segments(ctxt, &cs, &ss);
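The convention this patch establishes is that a helper which has queued an exception reports "fault propagated" rather than "cannot emulate". A rough stand-alone model of the sysenter mode checks follows; the enum names, struct fault_state and sysenter_mode_check() are all hypothetical, and only the two checks visible in the diff are modelled.

```c
/* Hypothetical return codes modelled on the X86EMUL_* values in the diff. */
enum emul_rc { EMUL_CONTINUE, EMUL_UNHANDLEABLE, EMUL_PROPAGATE_FAULT };

enum cpu_mode { MODE_REAL, MODE_VM86, MODE_PROT32, MODE_PROT64 };

/* x86 exception vectors: #UD is 6 and #GP is 13. */
enum { NO_VECTOR = -1, UD_VECTOR = 6, GP_VECTOR = 13 };

/* Stand-in for the vcpu's pending-exception state. */
struct fault_state {
	int pending_vector;
};

/*
 * Mode checks as in the patched emulate_sysenter(): whenever an
 * exception is queued, the caller is told the fault must be propagated.
 */
static enum emul_rc sysenter_mode_check(enum cpu_mode mode,
					struct fault_state *fs)
{
	if (mode == MODE_REAL) {
		fs->pending_vector = GP_VECTOR;
		return EMUL_PROPAGATE_FAULT;
	}
	if (mode == MODE_PROT64) {
		fs->pending_vector = UD_VECTOR;
		return EMUL_PROPAGATE_FAULT;
	}
	return EMUL_CONTINUE;
}
```

Returning EMUL_UNHANDLEABLE after queuing an exception would report an emulation failure even though the guest is about to receive a well-defined fault, which seems to be the inconsistency the patch removes.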
[COMMIT master] KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.
From: Gleb Natapov <g...@redhat.com>

If the LOCK prefix is used, the destination argument should be memory; otherwise the instruction should generate #UD.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index b89a8f2..46a7ee3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1842,7 +1842,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	}
 
 	/* LOCK prefix is allowed only with some instructions */
-	if (c->lock_prefix && !(c->d & Lock)) {
+	if (c->lock_prefix && (!(c->d & Lock) || c->dst.type != OP_MEM)) {
 		kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
 		goto done;
 	}
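The widened condition can be read as a two-part test: the opcode must be lockable at all, and its destination must be a memory operand. A small sketch, assuming a made-up flag bit FLAG_LOCK (the kernel's Lock flag has a different value) and a hypothetical operand_type enum:

```c
#include <stdbool.h>

#define FLAG_LOCK (1u << 0)	/* hypothetical bit, not the kernel's value */

enum operand_type { OP_REG, OP_MEM, OP_IMM, OP_NONE };

/*
 * Returns true when an instruction carrying a LOCK prefix must raise
 * #UD: either the opcode is not lockable, or its destination operand
 * is not memory.
 */
static bool lock_prefix_raises_ud(bool lock_prefix, unsigned int flags,
				  enum operand_type dst)
{
	return lock_prefix && (!(flags & FLAG_LOCK) || dst != OP_MEM);
}
```

For example, a lockable opcode with a register destination was accepted by the old check and is rejected by the new one.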
[COMMIT master] KVM: x86 emulator: Use load_segment_descriptor() instead of kvm_load_segment_descriptor()
From: Gleb Natapov <g...@redhat.com>

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index db4776c..702bfff 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1508,7 +1508,7 @@ static int emulate_pop_sreg(struct x86_emulate_ctxt *ctxt,
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
 
-	rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)selector, seg);
+	rc = load_segment_descriptor(ctxt, ops, (u16)selector, seg);
 	return rc;
 }
 
@@ -1683,7 +1683,7 @@ static int emulate_ret_far(struct x86_emulate_ctxt *ctxt,
 	rc = emulate_pop(ctxt, ops, &cs, c->op_bytes);
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
-	rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)cs, VCPU_SREG_CS);
+	rc = load_segment_descriptor(ctxt, ops, (u16)cs, VCPU_SREG_CS);
 	return rc;
 }
 
@@ -2717,7 +2717,7 @@ special_insn:
 		if (c->modrm_reg == VCPU_SREG_SS)
 			toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_MOV_SS);
 
-		rc = kvm_load_segment_descriptor(ctxt->vcpu, sel, c->modrm_reg);
+		rc = load_segment_descriptor(ctxt, ops, sel, c->modrm_reg);
 
 		c->dst.type = OP_NONE;  /* Disable writeback. */
 		break;
@@ -2892,8 +2892,8 @@ special_insn:
 		goto jmp;
 	case 0xea: /* jmp far */
 	jump_far:
-		if (kvm_load_segment_descriptor(ctxt->vcpu, c->src2.val,
-						VCPU_SREG_CS))
+		if (load_segment_descriptor(ctxt, ops, c->src2.val,
+					    VCPU_SREG_CS))
 			goto done;
 
 		c->eip = c->src.val;
[COMMIT master] KVM: x86 emulator: remove saved_eip
From: Gleb Natapov <g...@redhat.com>

c->eip is never written back in case of emulation failure, so there is no need to reset it to the old value.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index b3ff673..c20 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2420,7 +2420,6 @@ int
 x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
 	u64 msr_data;
-	unsigned long saved_eip = 0;
 	struct decode_cache *c = &ctxt->decode;
 	int rc = X86EMUL_CONTINUE;
 
@@ -2432,7 +2431,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	 */
 
 	memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
-	saved_eip = c->eip;
 
 	if (ctxt->mode == X86EMUL_MODE_PROT64 && (c->d & No64)) {
 		kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
@@ -2924,11 +2922,7 @@ writeback:
 	kvm_rip_write(ctxt->vcpu, c->eip);
 
 done:
-	if (rc == X86EMUL_UNHANDLEABLE) {
-		c->eip = saved_eip;
-		return -1;
-	}
-	return 0;
+	return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 
 twobyte_insn:
 	switch (c->b) {
@@ -3205,6 +3199,5 @@ twobyte_insn:
 
 cannot_emulate:
 	DPRINTF("Cannot emulate %02x\n", c->b);
-	c->eip = saved_eip;
 	return -1;
 }
[COMMIT master] KVM: Use task switch from emulator.c
From: Gleb Natapov g...@redhat.com Remove old task switch code from x86.c Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 61577ae..d6124f2 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4833,553 +4833,30 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, return 0; } -static void seg_desct_to_kvm_desct(struct desc_struct *seg_desc, u16 selector, - struct kvm_segment *kvm_desct) -{ - kvm_desct-base = get_desc_base(seg_desc); - kvm_desct-limit = get_desc_limit(seg_desc); - if (seg_desc-g) { - kvm_desct-limit = 12; - kvm_desct-limit |= 0xfff; - } - kvm_desct-selector = selector; - kvm_desct-type = seg_desc-type; - kvm_desct-present = seg_desc-p; - kvm_desct-dpl = seg_desc-dpl; - kvm_desct-db = seg_desc-d; - kvm_desct-s = seg_desc-s; - kvm_desct-l = seg_desc-l; - kvm_desct-g = seg_desc-g; - kvm_desct-avl = seg_desc-avl; - if (!selector) - kvm_desct-unusable = 1; - else - kvm_desct-unusable = 0; - kvm_desct-padding = 0; -} - -static void get_segment_descriptor_dtable(struct kvm_vcpu *vcpu, - u16 selector, - struct desc_ptr *dtable) -{ - if (selector 1 2) { - struct kvm_segment kvm_seg; - - kvm_get_segment(vcpu, kvm_seg, VCPU_SREG_LDTR); - - if (kvm_seg.unusable) - dtable-size = 0; - else - dtable-size = kvm_seg.limit; - dtable-address = kvm_seg.base; - } - else - kvm_x86_ops-get_gdt(vcpu, dtable); -} - -/* allowed just for 8 bytes segments */ -static int load_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, -struct desc_struct *seg_desc) -{ - struct desc_ptr dtable; - u16 index = selector 3; - int ret; - u32 err; - gva_t addr; - - get_segment_descriptor_dtable(vcpu, selector, dtable); - - if (dtable.size index * 8 + 7) { - kvm_queue_exception_e(vcpu, GP_VECTOR, selector 0xfffc); - return X86EMUL_PROPAGATE_FAULT; - } - addr = dtable.address + index * 8; - ret = kvm_read_guest_virt_system(addr, seg_desc, sizeof(*seg_desc), -vcpu, 
err); - if (ret == X86EMUL_PROPAGATE_FAULT) - kvm_inject_page_fault(vcpu, addr, err); - - return ret; -} - -/* allowed just for 8 bytes segments */ -static int save_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, -struct desc_struct *seg_desc) -{ - struct desc_ptr dtable; - u16 index = selector 3; - - get_segment_descriptor_dtable(vcpu, selector, dtable); - - if (dtable.size index * 8 + 7) - return 1; - return kvm_write_guest_virt(dtable.address + index*8, seg_desc, sizeof(*seg_desc), vcpu, NULL); -} - -static gpa_t get_tss_base_addr_write(struct kvm_vcpu *vcpu, - struct desc_struct *seg_desc) -{ - u32 base_addr = get_desc_base(seg_desc); - - return kvm_mmu_gva_to_gpa_write(vcpu, base_addr, NULL); -} - -static gpa_t get_tss_base_addr_read(struct kvm_vcpu *vcpu, -struct desc_struct *seg_desc) -{ - u32 base_addr = get_desc_base(seg_desc); - - return kvm_mmu_gva_to_gpa_read(vcpu, base_addr, NULL); -} - -static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg) -{ - struct kvm_segment kvm_seg; - - kvm_get_segment(vcpu, kvm_seg, seg); - return kvm_seg.selector; -} - -static int kvm_load_realmode_segment(struct kvm_vcpu *vcpu, u16 selector, int seg) -{ - struct kvm_segment segvar = { - .base = selector 4, - .limit = 0x, - .selector = selector, - .type = 3, - .present = 1, - .dpl = 3, - .db = 0, - .s = 1, - .l = 0, - .g = 0, - .avl = 0, - .unusable = 0, - }; - kvm_x86_ops-set_segment(vcpu, segvar, seg); - return X86EMUL_CONTINUE; -} - -static int is_vm86_segment(struct kvm_vcpu *vcpu, int seg) -{ - return (seg != VCPU_SREG_LDTR) - (seg != VCPU_SREG_TR) - (kvm_get_rflags(vcpu) X86_EFLAGS_VM); -} - -int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg) -{ - struct kvm_segment kvm_seg; - struct desc_struct seg_desc; - u8 dpl, rpl, cpl; - unsigned err_vec = GP_VECTOR; - u32 err_code = 0; - bool null_selector = !(selector ~0x3); /* -0003 are null
[COMMIT master] KVM: x86 emulator: add decoding of X, Y parameters from Intel SDM
From: Gleb Natapov g...@redhat.com Add decoding of X,Y parameters from Intel SDM which are used by string instruction to specify source and destination. Use this new decoding to implement movs, cmps, stos, lods in a generic way. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 55b8a8b..6ebd642 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -51,6 +51,7 @@ #define DstReg (21) /* Register operand. */ #define DstMem (31) /* Memory operand. */ #define DstAcc (41) /* Destination Accumulator */ +#define DstDI (51) /* Destination is in ES:(E)DI */ #define DstMask (71) /* Source operand type. */ #define SrcNone (04) /* No source operand. */ @@ -64,6 +65,7 @@ #define SrcOne (74) /* Implied '1' */ #define SrcImmUByte (84) /* 8-bit unsigned immediate operand. */ #define SrcImmU (94) /* Immediate operand, unsigned */ +#define SrcSI (0xa4) /* Source is in the DS:RSI */ #define SrcMask (0xf4) /* Generic ModRM decode. 
*/ #define ModRM (18) @@ -177,12 +179,12 @@ static u32 opcode_table[256] = { /* 0xA0 - 0xA7 */ ByteOp | DstReg | SrcMem | Mov | MemAbs, DstReg | SrcMem | Mov | MemAbs, ByteOp | DstMem | SrcReg | Mov | MemAbs, DstMem | SrcReg | Mov | MemAbs, - ByteOp | ImplicitOps | Mov | String, ImplicitOps | Mov | String, - ByteOp | ImplicitOps | String, ImplicitOps | String, + ByteOp | SrcSI | DstDI | Mov | String, SrcSI | DstDI | Mov | String, + ByteOp | SrcSI | DstDI | String, SrcSI | DstDI | String, /* 0xA8 - 0xAF */ - 0, 0, ByteOp | ImplicitOps | Mov | String, ImplicitOps | Mov | String, - ByteOp | ImplicitOps | Mov | String, ImplicitOps | Mov | String, - ByteOp | ImplicitOps | String, ImplicitOps | String, + 0, 0, ByteOp | DstDI | Mov | String, DstDI | Mov | String, + ByteOp | SrcSI | DstAcc | Mov | String, SrcSI | DstAcc | Mov | String, + ByteOp | DstDI | String, DstDI | String, /* 0xB0 - 0xB7 */ ByteOp | DstReg | SrcImm | Mov, ByteOp | DstReg | SrcImm | Mov, ByteOp | DstReg | SrcImm | Mov, ByteOp | DstReg | SrcImm | Mov, @@ -1145,6 +1147,14 @@ done_prefixes: c-src.bytes = 1; c-src.val = 1; break; + case SrcSI: + c-src.type = OP_MEM; + c-src.bytes = (c-d ByteOp) ? 1 : c-op_bytes; + c-src.ptr = (unsigned long *) + register_address(c, seg_override_base(ctxt, c), +c-regs[VCPU_REGS_RSI]); + c-src.val = 0; + break; } /* @@ -1230,6 +1240,14 @@ done_prefixes: } c-dst.orig_val = c-dst.val; break; + case DstDI: + c-dst.type = OP_MEM; + c-dst.bytes = (c-d ByteOp) ? 1 : c-op_bytes; + c-dst.ptr = (unsigned long *) + register_address(c, es_base(ctxt), +c-regs[VCPU_REGS_RDI]); + c-dst.val = 0; + break; } done: @@ -2388,6 +2406,16 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt, return rc; } +static void string_addr_inc(struct x86_emulate_ctxt *ctxt, unsigned long base, + int reg, unsigned long **ptr) +{ + struct decode_cache *c = ctxt-decode; + int df = (ctxt-eflags EFLG_DF) ? 
-1 : 1; + + register_address_increment(c, c-regs[reg], df * c-src.bytes); + *ptr = (unsigned long *)register_address(c, base, c-regs[reg]); +} + int x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) { @@ -2750,89 +2778,16 @@ special_insn: c-dst.val = (unsigned long)c-regs[VCPU_REGS_RAX]; break; case 0xa4 ... 0xa5: /* movs */ - c-dst.type = OP_MEM; - c-dst.bytes = (c-d ByteOp) ? 1 : c-op_bytes; - c-dst.ptr = (unsigned long *)register_address(c, - es_base(ctxt), - c-regs[VCPU_REGS_RDI]); - rc = ops-read_emulated(register_address(c, - seg_override_base(ctxt, c), - c-regs[VCPU_REGS_RSI]), - c-dst.val, - c-dst.bytes, ctxt-vcpu); - if (rc != X86EMUL_CONTINUE) - goto done; - register_address_increment(c, c-regs[VCPU_REGS_RSI], - (ctxt-eflags EFLG_DF) ? -c-dst.bytes - : c-dst.bytes); -
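The string_addr_inc() helper introduced above steps RSI or RDI by the operand size, forwards or backwards depending on EFLAGS.DF, within the current address size. The sketch below illustrates that arithmetic only; string_index_advance() and its parameters are hypothetical, and the masking rule (only the low ad_bytes bytes of the register change) is an assumption about how the address-size wrap behaves.

```c
#include <stdint.h>

#define EFLAGS_DF (1UL << 10)	/* direction flag is EFLAGS bit 10 */

/*
 * Advance a string index register by one element.
 * op_bytes: element size (1, 2, 4 or 8); ad_bytes: address size (2, 4 or 8).
 * With DF set the index moves down, otherwise up. Only the low ad_bytes
 * bytes of the register change; the upper bits are preserved.
 */
static uint64_t string_index_advance(uint64_t reg, unsigned long eflags,
				     unsigned int op_bytes,
				     unsigned int ad_bytes)
{
	int64_t step = (eflags & EFLAGS_DF) ? -(int64_t)op_bytes
					    : (int64_t)op_bytes;
	uint64_t mask;

	if (ad_bytes >= sizeof(uint64_t))
		return reg + (uint64_t)step;

	mask = (1ULL << (ad_bytes * 8)) - 1;
	return (reg & ~mask) | ((reg + (uint64_t)step) & mask);
}
```

Doing the increment once in the writeback path, keyed off the decoded SrcSI/DstDI operand kinds, is what lets movs, cmps, stos and lods share one implementation instead of each repeating the DF logic.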
[COMMIT master] Revert "KVM: x86: ignore access permissions for hypercall patching"
From: Marcelo Tosatti <mtosa...@redhat.com>

It's safer to disable the only problematic user of hypercall patching, pvmmu.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 68e8c89..bb9a24a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3243,17 +3243,12 @@ static int emulator_write_emulated_onepage(unsigned long addr,
 					   const void *val,
 					   unsigned int bytes,
 					   struct kvm_vcpu *vcpu,
-					   bool guest_initiated,
 					   bool mmu_only)
 {
 	gpa_t gpa;
 	u32 error_code;
-
-	if (guest_initiated)
-		gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
-	else
-		gpa = kvm_mmu_gva_to_gpa_system(vcpu, addr, &error_code);
 
+	gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
 
 	if (gpa == UNMAPPED_GVA) {
 		kvm_inject_page_fault(vcpu, addr, error_code);
@@ -3292,7 +3287,6 @@ int __emulator_write_emulated(unsigned long addr,
 				   const void *val,
 				   unsigned int bytes,
 				   struct kvm_vcpu *vcpu,
-				   bool guest_initiated,
 				   bool mmu_only)
 {
 	/* Crossing a page boundary? */
@@ -3301,7 +3295,7 @@ int __emulator_write_emulated(unsigned long addr,
 
 		now = -addr & ~PAGE_MASK;
 		rc = emulator_write_emulated_onepage(addr, val, now, vcpu,
-						     guest_initiated, mmu_only);
+						     mmu_only);
 		if (rc != X86EMUL_CONTINUE)
 			return rc;
 		addr += now;
@@ -3309,7 +3303,7 @@ int __emulator_write_emulated(unsigned long addr,
 		bytes -= now;
 	}
 	return emulator_write_emulated_onepage(addr, val, bytes, vcpu,
-					       guest_initiated, mmu_only);
+					       mmu_only);
 }
 
 int emulator_write_emulated(unsigned long addr,
@@ -3317,7 +3311,7 @@ int emulator_write_emulated(unsigned long addr,
 			    unsigned int bytes,
 			    struct kvm_vcpu *vcpu)
 {
-	return __emulator_write_emulated(addr, val, bytes, vcpu, true, false);
+	return __emulator_write_emulated(addr, val, bytes, vcpu, false);
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
@@ -3381,7 +3375,7 @@ static int emulator_cmpxchg_emulated(unsigned long addr,
 	if (!exchanged)
 		return X86EMUL_CMPXCHG_FAILED;
 
-	return __emulator_write_emulated(addr, new, bytes, vcpu, true, true);
+	return __emulator_write_emulated(addr, new, bytes, vcpu, true);
 
 emul_write:
 	printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
@@ -4083,8 +4077,7 @@ int kvm_fix_hypercall(struct kvm_vcpu *vcpu)
 
 	kvm_x86_ops->patch_hypercall(vcpu, instruction);
 
-	return __emulator_write_emulated(rip, instruction, 3, vcpu,
-					 false, false);
+	return __emulator_write_emulated(rip, instruction, 3, vcpu, false);
 }
 
 void realmode_lgdt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base)
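The context lines of this diff contain the page-crossing split used by __emulator_write_emulated(). As a side illustration of that arithmetic only, here is a user-space sketch; DEMO_PAGE_SIZE, DEMO_PAGE_MASK and first_chunk() are hypothetical names, a 4 KiB page is assumed, and bytes must be non-zero.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL			/* assumed page size */
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/*
 * Size of the first piece of a write of 'bytes' bytes (bytes > 0)
 * starting at 'addr'. If the range stays within one page the whole
 * write is returned; otherwise only the part up to the page end.
 */
static size_t first_chunk(unsigned long addr, size_t bytes)
{
	/* First and last byte differ in their page number: a crossing. */
	if (((addr + bytes - 1) ^ addr) & DEMO_PAGE_MASK)
		return (size_t)(-addr & ~DEMO_PAGE_MASK);
	return bytes;
}
```

The expression -addr & ~DEMO_PAGE_MASK yields the distance from addr to the next page boundary, which is why the kernel code can use it directly as the length of the first write.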
[COMMIT master] KVM: x86 emulator: restart string instruction without going back to a guest.
From: Gleb Natapov g...@redhat.com Currently when string instruction is only partially complete we go back to a guest mode, guest tries to reexecute instruction and exits again and at this point emulation continues. Avoid all of this by restarting instruction without going back to a guest mode, but return to a guest mode each 1024 iterations to allow interrupt injection. Pending exception causes immediate guest entry too. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 679245c..7fda16f 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -193,6 +193,7 @@ struct x86_emulate_ctxt { /* interruptibility state, as a result of execution of STI or MOV SS */ int interruptibility; + bool restart; /* restart string instruction after writeback */ /* decode cache */ struct decode_cache decode; }; diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index c20..0467e9f 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -927,8 +927,11 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) int mode = ctxt-mode; int def_op_bytes, def_ad_bytes, group; - /* Shadow copy of register state. Committed on successful emulation. */ + /* we cannot decode insn before we complete previous rep insn */ + WARN_ON(ctxt-restart); + + /* Shadow copy of register state. Committed on successful emulation. 
*/ memset(c, 0, sizeof(struct decode_cache)); c-eip = ctxt-eip; ctxt-cs_base = seg_base(ctxt, VCPU_SREG_CS); @@ -2422,6 +2425,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) u64 msr_data; struct decode_cache *c = ctxt-decode; int rc = X86EMUL_CONTINUE; + int saved_dst_type = c-dst.type; ctxt-interruptibility = 0; @@ -2450,8 +2454,11 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) } if (c-rep_prefix (c-d String)) { + ctxt-restart = true; /* All REP prefixes have the same first termination condition */ if (address_mask(c, c-regs[VCPU_REGS_RCX]) == 0) { + string_done: + ctxt-restart = false; kvm_rip_write(ctxt-vcpu, c-eip); goto done; } @@ -2463,17 +2470,13 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) * - if REPNE/REPNZ and ZF = 1 then done */ if ((c-b == 0xa6) || (c-b == 0xa7) || - (c-b == 0xae) || (c-b == 0xaf)) { + (c-b == 0xae) || (c-b == 0xaf)) { if ((c-rep_prefix == REPE_PREFIX) - ((ctxt-eflags EFLG_ZF) == 0)) { - kvm_rip_write(ctxt-vcpu, c-eip); - goto done; - } + ((ctxt-eflags EFLG_ZF) == 0)) + goto string_done; if ((c-rep_prefix == REPNE_PREFIX) - ((ctxt-eflags EFLG_ZF) == EFLG_ZF)) { - kvm_rip_write(ctxt-vcpu, c-eip); - goto done; - } + ((ctxt-eflags EFLG_ZF) == EFLG_ZF)) + goto string_done; } c-eip = ctxt-eip; } @@ -2907,6 +2910,12 @@ writeback: if (rc != X86EMUL_CONTINUE) goto done; + /* +* restore dst type in case the decoding will be reused +* (happens for string instruction ) +*/ + c-dst.type = saved_dst_type; + if ((c-d SrcMask) == SrcSI) string_addr_inc(ctxt, seg_override_base(ctxt, c), VCPU_REGS_RSI, c-src); @@ -2914,8 +2923,11 @@ writeback: if ((c-d DstMask) == DstDI) string_addr_inc(ctxt, es_base(ctxt), VCPU_REGS_RDI, c-dst); - if (c-rep_prefix (c-d String)) + if (c-rep_prefix (c-d String)) { register_address_increment(c, c-regs[VCPU_REGS_RCX], -1); + if (!(c-regs[VCPU_REGS_RCX] 0x3ff)) + ctxt-restart = false; + } /* Commit shadow register state. 
*/ memcpy(ctxt-vcpu-arch.regs, c-regs, sizeof c-regs); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b96d629..dede682 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3755,6 +3755,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu, return EMULATE_DONE; } +restart: r = x86_emulate_insn(vcpu-arch.emulate_ctxt,
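The writeback hunk clears ctxt->restart whenever the decremented RCX has its low ten bits clear, so the emulator returns to guest mode at most once per 1024 iterations, plus once when the count reaches zero. The sketch below counts those re-entries for a given starting count; guest_reentries() is a hypothetical helper, and it ignores pending exceptions and address-size masking of RCX.

```c
/*
 * Count how many times a REP string instruction with an initial count
 * of 'rcx' would go back to guest mode under the rule from the patch:
 * after each iteration RCX is decremented, and the restart flag is
 * cleared when (RCX & 0x3ff) == 0.
 */
static unsigned long guest_reentries(unsigned long rcx)
{
	unsigned long reentries = 0;

	while (rcx != 0) {
		rcx--;			/* one iteration emulated */
		if (!(rcx & 0x3ff))	/* remaining count is a multiple of 1024 */
			reentries++;	/* restart = false: re-enter the guest */
	}
	return reentries;
}
```

Under this model a count of 3000 needs three guest entries rather than 3000, which is the saving the commit message describes while still leaving windows for interrupt injection.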
[COMMIT master] KVM: small kvm_arch_vcpu_ioctl_run() cleanup.
From: Gleb Natapov <g...@redhat.com>

Unify all conditions that get us back into the emulator after returning from userspace.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dede682..68e8c89 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4543,33 +4543,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	if (!irqchip_in_kernel(vcpu->kvm))
 		kvm_set_cr8(vcpu, kvm_run->cr8);
 
-	if (vcpu->arch.pio.count) {
-		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-		r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE);
-		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-		if (r == EMULATE_DO_MMIO) {
-			r = 0;
-			goto out;
+	if (vcpu->arch.pio.count || vcpu->mmio_needed ||
+	    vcpu->arch.emulate_ctxt.restart) {
+		if (vcpu->mmio_needed) {
+			memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8);
+			vcpu->mmio_read_completed = 1;
+			vcpu->mmio_needed = 0;
 		}
-	}
-	if (vcpu->mmio_needed) {
-		memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8);
-		vcpu->mmio_read_completed = 1;
-		vcpu->mmio_needed = 0;
-
-		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-		r = emulate_instruction(vcpu, vcpu->arch.mmio_fault_cr2, 0,
-					EMULTYPE_NO_DECODE);
-		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-		if (r == EMULATE_DO_MMIO) {
-			/*
-			 * Read-modify-write.  Back to userspace.
-			 */
-			r = 0;
-			goto out;
-		}
-	}
-	if (vcpu->arch.emulate_ctxt.restart) {
 		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 		r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE);
 		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
[COMMIT master] KVM: x86 emulator: fix in/out emulation.
From: Gleb Natapov g...@redhat.com in/out emulation is broken now. The breakage is different depending on where IO device resides. If it is in userspace emulator reports emulation failure since it incorrectly interprets kvm_emulate_pio() return value. If IO device is in the kernel emulation of 'in' will do nothing since kvm_emulate_pio() stores result directly into vcpu registers, so emulator will overwrite result of emulation during commit of shadowed register. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index bd46929..679245c 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -119,6 +119,13 @@ struct x86_emulate_ops { const void *new, unsigned int bytes, struct kvm_vcpu *vcpu); + + int (*pio_in_emulated)(int size, unsigned short port, void *val, + unsigned int count, struct kvm_vcpu *vcpu); + + int (*pio_out_emulated)(int size, unsigned short port, const void *val, + unsigned int count, struct kvm_vcpu *vcpu); + bool (*get_cached_descriptor)(struct desc_struct *desc, int seg, struct kvm_vcpu *vcpu); void (*set_cached_descriptor)(struct desc_struct *desc, diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index b99cec1..776d3e2 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -590,8 +590,7 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); struct x86_emulate_ctxt; -int kvm_emulate_pio(struct kvm_vcpu *vcpu, int in, -int size, unsigned port); +int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port); int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, int in, int size, unsigned long count, int down, gva_t address, int rep, unsigned port); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index a166235..c506137 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -210,13 
+210,13 @@ static u32 opcode_table[256] = { 0, 0, 0, 0, 0, 0, 0, 0, /* 0xE0 - 0xE7 */ 0, 0, 0, 0, - ByteOp | SrcImmUByte, SrcImmUByte, - ByteOp | SrcImmUByte, SrcImmUByte, + ByteOp | SrcImmUByte | DstAcc, SrcImmUByte | DstAcc, + ByteOp | SrcImmUByte | DstAcc, SrcImmUByte | DstAcc, /* 0xE8 - 0xEF */ SrcImm | Stack, SrcImm | ImplicitOps, SrcImmU | Src2Imm16 | No64, SrcImmByte | ImplicitOps, - SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps, - SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps, + SrcNone | ByteOp | DstAcc, SrcNone | DstAcc, + SrcNone | ByteOp | DstAcc, SrcNone | DstAcc, /* 0xF0 - 0xF7 */ 0, 0, 0, 0, ImplicitOps | Priv, ImplicitOps, Group | Group3_Byte, Group | Group3, @@ -2422,8 +2422,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) u64 msr_data; unsigned long saved_eip = 0; struct decode_cache *c = ctxt-decode; - unsigned int port; - int io_dir_in; int rc = X86EMUL_CONTINUE; ctxt-interruptibility = 0; @@ -2819,14 +2817,10 @@ special_insn: break; case 0xe4: /* inb */ case 0xe5: /* in */ - port = c-src.val; - io_dir_in = 1; - goto do_io; + goto do_io_in; case 0xe6: /* outb */ case 0xe7: /* out */ - port = c-src.val; - io_dir_in = 0; - goto do_io; + goto do_io_out; case 0xe8: /* call (near) */ { long int rel = c-src.val; c-src.val = (unsigned long) c-eip; @@ -2851,25 +2845,29 @@ special_insn: break; case 0xec: /* in al,dx */ case 0xed: /* in (e/r)ax,dx */ - port = c-regs[VCPU_REGS_RDX]; - io_dir_in = 1; - goto do_io; + c-src.val = c-regs[VCPU_REGS_RDX]; + do_io_in: + c-dst.bytes = min(c-dst.bytes, 4u); + if (!emulator_io_permited(ctxt, ops, c-src.val, c-dst.bytes)) { + kvm_inject_gp(ctxt-vcpu, 0); + goto done; + } + if (!ops-pio_in_emulated(c-dst.bytes, c-src.val, + c-dst.val, 1, ctxt-vcpu)) + goto done; /* IO is needed */ + break; case 0xee: /* out al,dx */ case 0xef: /* out (e/r)ax,dx */ - port = c-regs[VCPU_REGS_RDX]; - io_dir_in = 0; - do_io: - if (!emulator_io_permited(ctxt, ops,
[COMMIT master] KVM: x86 emulator: Move string pio emulation into emulator.c
From: Gleb Natapov g...@redhat.com Currently emulation is done outside of emulator so things like doing ins/outs to/from mmio are broken it also makes it hard (if not impossible) to implement single stepping in the future. The implementation in this patch is not efficient since it exits to userspace for each IO while previous implementation did 'ins' in batches. Further patch that implements pio in string read ahead address this problem. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 776d3e2..26c629a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -224,14 +224,9 @@ struct kvm_pv_mmu_op_buffer { struct kvm_pio_request { unsigned long count; - int cur_count; - gva_t guest_gva; int in; int port; int size; - int string; - int down; - int rep; }; /* @@ -591,9 +586,6 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); struct x86_emulate_ctxt; int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port); -int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, int in, - int size, unsigned long count, int down, - gva_t address, int rep, unsigned port); void kvm_emulate_cpuid(struct kvm_vcpu *vcpu); int kvm_emulate_halt(struct kvm_vcpu *vcpu); int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index c506137..b3ff673 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -153,8 +153,8 @@ static u32 opcode_table[256] = { 0, 0, 0, 0, /* 0x68 - 0x6F */ SrcImm | Mov | Stack, 0, SrcImmByte | Mov | Stack, 0, - SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps, /* insb, insw/insd */ - SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps, /* outsb, outsw/outsd */ + DstDI | ByteOp | Mov | String, DstDI | Mov | String, /* insb, insw/insd */ + SrcSI | ByteOp | ImplicitOps | String, SrcSI | ImplicitOps | 
String, /* outsb, outsw/outsd */ /* 0x70 - 0x77 */ SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte, @@ -2611,47 +2611,29 @@ special_insn: break; case 0x6c: /* insb */ case 0x6d: /* insw/insd */ + c-dst.bytes = min(c-dst.bytes, 4u); if (!emulator_io_permited(ctxt, ops, c-regs[VCPU_REGS_RDX], - (c-d ByteOp) ? 1 : c-op_bytes)) { + c-dst.bytes)) { kvm_inject_gp(ctxt-vcpu, 0); goto done; } - if (kvm_emulate_pio_string(ctxt-vcpu, - 1, - (c-d ByteOp) ? 1 : c-op_bytes, - c-rep_prefix ? - address_mask(c, c-regs[VCPU_REGS_RCX]) : 1, - (ctxt-eflags EFLG_DF), - register_address(c, es_base(ctxt), -c-regs[VCPU_REGS_RDI]), - c-rep_prefix, - c-regs[VCPU_REGS_RDX]) == 0) { - c-eip = saved_eip; - return -1; - } - return 0; + if (!ops-pio_in_emulated(c-dst.bytes, c-regs[VCPU_REGS_RDX], + c-dst.val, 1, ctxt-vcpu)) + goto done; /* IO is needed, skip writeback */ + break; case 0x6e: /* outsb */ case 0x6f: /* outsw/outsd */ + c-src.bytes = min(c-src.bytes, 4u); if (!emulator_io_permited(ctxt, ops, c-regs[VCPU_REGS_RDX], - (c-d ByteOp) ? 1 : c-op_bytes)) { + c-src.bytes)) { kvm_inject_gp(ctxt-vcpu, 0); goto done; } - if (kvm_emulate_pio_string(ctxt-vcpu, - 0, - (c-d ByteOp) ? 1 : c-op_bytes, - c-rep_prefix ? - address_mask(c, c-regs[VCPU_REGS_RCX]) : 1, - (ctxt-eflags EFLG_DF), -register_address(c, - seg_override_base(ctxt, c), -c-regs[VCPU_REGS_RSI]), - c-rep_prefix, - c-regs[VCPU_REGS_RDX]) == 0) { - c-eip = saved_eip; - return
[COMMIT master] KVM: x86 emulator: introduce pio in string read ahead.
From: Gleb Natapov g...@redhat.com

To optimize the rep ins instruction, do IO in big chunks ahead of time
instead of doing it only when required during instruction emulation.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 7fda16f..b5e12c5 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -151,6 +151,12 @@ struct fetch_cache {
 	unsigned long end;
 };

+struct read_cache {
+	u8 data[1024];
+	unsigned long pos;
+	unsigned long end;
+};
+
 struct decode_cache {
 	u8 twobyte;
 	u8 b;
@@ -178,6 +184,7 @@ struct decode_cache {
 	void *modrm_ptr;
 	unsigned long modrm_val;
 	struct fetch_cache fetch;
+	struct read_cache io_read;
 };

 struct x86_emulate_ctxt {
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0467e9f..266576c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1257,6 +1257,36 @@ done:
 	return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }

+static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
+			   struct x86_emulate_ops *ops,
+			   unsigned int size, unsigned short port,
+			   void *dest)
+{
+	struct read_cache *rc = &ctxt->decode.io_read;
+
+	if (rc->pos == rc->end) { /* refill pio read ahead */
+		struct decode_cache *c = &ctxt->decode;
+		unsigned int in_page, n;
+		unsigned int count = c->rep_prefix ?
+			address_mask(c, c->regs[VCPU_REGS_RCX]) : 1;
+		in_page = (ctxt->eflags & EFLG_DF) ?
+			offset_in_page(c->regs[VCPU_REGS_RDI]) :
+			PAGE_SIZE - offset_in_page(c->regs[VCPU_REGS_RDI]);
+		n = min(min(in_page, (unsigned int)sizeof(rc->data)) / size,
+			count);
+		if (n == 0)
+			n = 1;
+		rc->pos = rc->end = 0;
+		if (!ops->pio_in_emulated(size, port, rc->data, n, ctxt->vcpu))
+			return 0;
+		rc->end = n * size;
+	}
+
+	memcpy(dest, rc->data + rc->pos, size);
+	rc->pos += size;
+	return 1;
+}
+
 static u32 desc_limit_scaled(struct desc_struct *desc)
 {
 	u32 limit = get_desc_limit(desc);
@@ -2618,8 +2648,8 @@ special_insn:
 			kvm_inject_gp(ctxt->vcpu, 0);
 			goto done;
 		}
-		if (!ops->pio_in_emulated(c->dst.bytes, c->regs[VCPU_REGS_RDX],
-					  &c->dst.val, 1, ctxt->vcpu))
+		if (!pio_in_emulated(ctxt, ops, c->dst.bytes,
+				     c->regs[VCPU_REGS_RDX], &c->dst.val))
 			goto done; /* IO is needed, skip writeback */
 		break;
 	case 0x6e:		/* outsb */
@@ -2835,8 +2865,8 @@ special_insn:
 			kvm_inject_gp(ctxt->vcpu, 0);
 			goto done;
 		}
-		if (!ops->pio_in_emulated(c->dst.bytes, c->src.val,
-					  &c->dst.val, 1, ctxt->vcpu))
+		if (!pio_in_emulated(ctxt, ops, c->dst.bytes, c->src.val,
+				     &c->dst.val))
 			goto done; /* IO is needed */
 		break;
 	case 0xee: /* out al,dx */
@@ -2924,8 +2954,14 @@ writeback:
 		string_addr_inc(ctxt, es_base(ctxt), VCPU_REGS_RDI, &c->dst);

 	if (c->rep_prefix && (c->d & String)) {
+		struct read_cache *rc = &ctxt->decode.io_read;
 		register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
-		if (!(c->regs[VCPU_REGS_RCX] & 0x3ff))
+		/*
+		 * Re-enter guest when pio read ahead buffer is empty or,
+		 * if it is not used, after each 1024 iteration.
+		 */
+		if ((rc->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) ||
+		    (rc->end != 0 && rc->end == rc->pos))
 			ctxt->restart = false;
 	}

--
To unsubscribe from this list: send the line "unsubscribe kvm-commits" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
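The read-ahead logic in the patch above can be tried outside the kernel. What follows is only a minimal user-space sketch, assuming a hypothetical backend callback (pio_in_fn) and a stand-in device (fake_port_read) that are not part of KVM; only the buffer layout and the refill arithmetic follow the patch. The caller passes the remaining repeat count and the bytes left in the page explicitly, where the patch derives them from RCX, RDI and EFLAGS.DF.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_BYTES 1024u	/* same size as data[] in the patch */

/* Read-ahead buffer, modeled on struct read_cache from the patch. */
struct read_cache {
	unsigned char data[CACHE_BYTES];
	size_t pos;	/* offset of the next unread byte */
	size_t end;	/* number of valid bytes in data[] */
};

/* Hypothetical backend: returns 0 when the IO cannot complete yet. */
typedef int (*pio_in_fn)(unsigned int size, unsigned short port,
			 void *buf, unsigned int count);

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Number of items to fetch in one refill; never less than one. */
static unsigned int readahead_count(unsigned int size, unsigned int count,
				    unsigned int in_page)
{
	unsigned int n = min_uint(min_uint(in_page, CACHE_BYTES) / size, count);

	return n ? n : 1;
}

/* Copy one item of `size` bytes to dest, refilling the buffer if drained. */
static int cached_pio_in(struct read_cache *rc, pio_in_fn backend,
			 unsigned int size, unsigned short port,
			 unsigned int count, unsigned int in_page, void *dest)
{
	if (rc->pos == rc->end) {
		unsigned int n = readahead_count(size, count, in_page);

		rc->pos = rc->end = 0;
		if (!backend(size, port, rc->data, n))
			return 0;	/* IO still pending */
		rc->end = (size_t)n * size;
	}

	memcpy(dest, rc->data + rc->pos, size);
	rc->pos += size;
	return 1;
}

/* Stand-in device for demonstration: counts calls, yields 0, 1, 2, ... */
static unsigned int fake_port_calls;

static int fake_port_read(unsigned int size, unsigned short port,
			  void *buf, unsigned int count)
{
	unsigned char *p = buf;
	unsigned int i;

	(void)port;
	fake_port_calls++;
	for (i = 0; i < size * count; i++)
		p[i] = (unsigned char)i;
	return 1;
}
```

With a repeat count of 3 and one-byte accesses, three calls to cached_pio_in() reach the backend only once, which is the batching the commit message describes.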
[COMMIT master] KVM: x86 emulator: fix 0f 01 /5 emulation
From: Gleb Natapov g...@redhat.com

It is undefined and should generate #UD.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c3b9334..7c7debb 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2490,6 +2490,9 @@ twobyte_insn:
 				    (c->src.val & 0x0f), ctxt->vcpu);
 			c->dst.type = OP_NONE;
 			break;
+		case 5: /* not defined */
+			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+			goto done;
 		case 7: /* invlpg*/
 			emulate_invlpg(ctxt->vcpu, memop);
 			/* Disable writeback. */
[COMMIT master] KVM: x86 emulator: populate OP_MEM operand during decoding.
From: Gleb Natapov g...@redhat.com All struct operand fields are initialized during decoding for all operand types except OP_MEM, but there is no reason for that. Move OP_MEM operand initialization into decoding stage for consistency. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 702bfff..55b8a8b 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1057,6 +1057,10 @@ done_prefixes: if (c-ad_bytes != 8) c-modrm_ea = (u32)c-modrm_ea; + + if (c-rip_relative) + c-modrm_ea += c-eip; + /* * Decode and fetch the source operand: register, memory * or immediate. @@ -1091,6 +1095,8 @@ done_prefixes: break; } c-src.type = OP_MEM; + c-src.ptr = (unsigned long *)c-modrm_ea; + c-src.val = 0; break; case SrcImm: case SrcImmU: @@ -1169,8 +1175,10 @@ done_prefixes: c-src2.val = 1; break; case Src2Mem16: - c-src2.bytes = 2; c-src2.type = OP_MEM; + c-src2.bytes = 2; + c-src2.ptr = (unsigned long *)(c-modrm_ea + c-src.bytes); + c-src2.val = 0; break; } @@ -1192,6 +1200,15 @@ done_prefixes: break; } c-dst.type = OP_MEM; + c-dst.ptr = (unsigned long *)c-modrm_ea; + c-dst.bytes = (c-d ByteOp) ? 1 : c-op_bytes; + c-dst.val = 0; + if (c-d BitOp) { + unsigned long mask = ~(c-dst.bytes * 8 - 1); + + c-dst.ptr = (void *)c-dst.ptr + + (c-src.val mask) / 8; + } break; case DstAcc: c-dst.type = OP_REG; @@ -1215,9 +1232,6 @@ done_prefixes: break; } - if (c-rip_relative) - c-modrm_ea += c-eip; - done: return (rc == X86EMUL_UNHANDLEABLE) ? 
-1 : 0; } @@ -1638,14 +1652,13 @@ static inline int emulate_grp45(struct x86_emulate_ctxt *ctxt, } static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt, - struct x86_emulate_ops *ops, - unsigned long memop) + struct x86_emulate_ops *ops) { struct decode_cache *c = ctxt-decode; u64 old, new; int rc; - rc = ops-read_emulated(memop, old, 8, ctxt-vcpu); + rc = ops-read_emulated(c-modrm_ea, old, 8, ctxt-vcpu); if (rc != X86EMUL_CONTINUE) return rc; @@ -1660,7 +1673,7 @@ static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt, new = ((u64)c-regs[VCPU_REGS_RCX] 32) | (u32) c-regs[VCPU_REGS_RBX]; - rc = ops-cmpxchg_emulated(memop, old, new, 8, ctxt-vcpu); + rc = ops-cmpxchg_emulated(c-modrm_ea, old, new, 8, ctxt-vcpu); if (rc != X86EMUL_CONTINUE) return rc; ctxt-eflags |= EFLG_ZF; @@ -2378,7 +2391,6 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt, int x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) { - unsigned long memop = 0; u64 msr_data; unsigned long saved_eip = 0; struct decode_cache *c = ctxt-decode; @@ -2413,9 +2425,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) goto done; } - if (((c-d ModRM) (c-modrm_mod != 3)) || (c-d MemAbs)) - memop = c-modrm_ea; - if (c-rep_prefix (c-d String)) { /* All REP prefixes have the same first termination condition */ if (address_mask(c, c-regs[VCPU_REGS_RCX]) == 0) { @@ -2447,8 +2456,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) } if (c-src.type == OP_MEM) { - c-src.ptr = (unsigned long *)memop; - c-src.val = 0; rc = ops-read_emulated((unsigned long)c-src.ptr, c-src.val, c-src.bytes, @@ -2459,8 +2466,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) } if (c-src2.type == OP_MEM) { - c-src2.ptr = (unsigned long *)(memop + c-src.bytes); - c-src2.val = 0; rc = ops-read_emulated((unsigned long)c-src2.ptr, c-src2.val, c-src2.bytes, @@ -2473,25 +2478,12 @@ x86_emulate_insn(struct 
x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) goto special_insn; -
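The BitOp adjustment added to the DstMem decode path above is easiest to check with numbers. A minimal sketch, assuming an unsigned bit index (negative bit offsets are not modeled here); the function name is made up for illustration and only the mask arithmetic follows the patch:

```c
/*
 * Byte offset added to a memory operand for a bit-test style instruction:
 * the bit index is rounded down to a multiple of the operand width in bits,
 * then converted to bytes.
 */
static unsigned long bitop_byte_offset(unsigned long operand_bytes,
				       unsigned long bit_index)
{
	unsigned long mask = ~(operand_bytes * 8 - 1);

	return (bit_index & mask) / 8;
}
```

For a 4-byte operand, bit index 35 lands in the second dword, so the operand pointer moves by 4 bytes and bit 35 % 32 = 3 is tested there.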
[COMMIT master] KVM: x86 emulator: Emulate task switch in emulator.c
From: Gleb Natapov g...@redhat.com Implement emulation of 16/32 bit task switch in emulator.c Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index f901467..bd46929 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -11,6 +11,8 @@ #ifndef _ASM_X86_KVM_X86_EMULATE_H #define _ASM_X86_KVM_X86_EMULATE_H +#include asm/desc_defs.h + struct x86_emulate_ctxt; /* @@ -210,5 +212,8 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops); int x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops); +int emulator_task_switch(struct x86_emulate_ctxt *ctxt, +struct x86_emulate_ops *ops, +u16 tss_selector, int reason); #endif /* _ASM_X86_KVM_X86_EMULATE_H */ diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index d696cbd..db4776c 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -33,6 +33,7 @@ #include asm/kvm_emulate.h #include x86.h +#include tss.h /* * Opcode effective-address decode tables. @@ -1221,6 +1222,198 @@ done: return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0; } +static u32 desc_limit_scaled(struct desc_struct *desc) +{ + u32 limit = get_desc_limit(desc); + + return desc-g ? (limit 12) | 0xfff : limit; +} + +static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt, +struct x86_emulate_ops *ops, +u16 selector, struct desc_ptr *dt) +{ + if (selector 1 2) { + struct desc_struct desc; + memset (dt, 0, sizeof *dt); + if (!ops-get_cached_descriptor(desc, VCPU_SREG_LDTR, ctxt-vcpu)) + return; + + dt-size = desc_limit_scaled(desc); /* what if limit 65535? 
*/ + dt-address = get_desc_base(desc); + } else + ops-get_gdt(dt, ctxt-vcpu); +} + +/* allowed just for 8 bytes segments */ +static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt, + struct x86_emulate_ops *ops, + u16 selector, struct desc_struct *desc) +{ + struct desc_ptr dt; + u16 index = selector 3; + int ret; + u32 err; + ulong addr; + + get_descriptor_table_ptr(ctxt, ops, selector, dt); + + if (dt.size index * 8 + 7) { + kvm_inject_gp(ctxt-vcpu, selector 0xfffc); + return X86EMUL_PROPAGATE_FAULT; + } + addr = dt.address + index * 8; + ret = ops-read_std(addr, desc, sizeof *desc, ctxt-vcpu, err); + if (ret == X86EMUL_PROPAGATE_FAULT) + kvm_inject_page_fault(ctxt-vcpu, addr, err); + + return ret; +} + +/* allowed just for 8 bytes segments */ +static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt, + struct x86_emulate_ops *ops, + u16 selector, struct desc_struct *desc) +{ + struct desc_ptr dt; + u16 index = selector 3; + u32 err; + ulong addr; + int ret; + + get_descriptor_table_ptr(ctxt, ops, selector, dt); + + if (dt.size index * 8 + 7) { + kvm_inject_gp(ctxt-vcpu, selector 0xfffc); + return X86EMUL_PROPAGATE_FAULT; + } + + addr = dt.address + index * 8; + ret = ops-write_std(addr, desc, sizeof *desc, ctxt-vcpu, err); + if (ret == X86EMUL_PROPAGATE_FAULT) + kvm_inject_page_fault(ctxt-vcpu, addr, err); + + return ret; +} + +static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt, + struct x86_emulate_ops *ops, + u16 selector, int seg) +{ + struct desc_struct seg_desc; + u8 dpl, rpl, cpl; + unsigned err_vec = GP_VECTOR; + u32 err_code = 0; + bool null_selector = !(selector ~0x3); /* -0003 are null */ + int ret; + + memset(seg_desc, 0, sizeof seg_desc); + + if ((seg = VCPU_SREG_GS ctxt-mode == X86EMUL_MODE_VM86) + || ctxt-mode == X86EMUL_MODE_REAL) { + /* set real mode segment descriptor */ + set_desc_base(seg_desc, selector 4); + set_desc_limit(seg_desc, 0x); + seg_desc.type = 3; + seg_desc.p = 1; + seg_desc.s = 1; + goto load; + } 
+ + /* NULL selector is not valid for TR, CS and SS */ + if ((seg == VCPU_SREG_CS || seg == VCPU_SREG_SS || seg == VCPU_SREG_TR) +null_selector) + goto exception; + + /* TR should be in GDT only */ + if (seg == VCPU_SREG_TR (selector (1 2))) + goto exception; + +
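The selector and limit arithmetic used by the descriptor helpers above can be sketched in isolation. This is a user-space illustration under stated assumptions: the function names are hypothetical, and the operators that the archive dropped are assumed to be the usual shifts and masks (selector >> 3 for the index, bit 2 for the table indicator, limit << 12 for page granularity).

```c
#include <stdbool.h>
#include <stdint.h>

/* Scale a 20-bit descriptor limit when the granularity bit is set. */
static uint32_t limit_scaled(uint32_t limit, bool granularity)
{
	return granularity ? (limit << 12) | 0xfff : limit;
}

/* Descriptor table index: the selector without RPL and TI bits. */
static uint16_t selector_index(uint16_t selector)
{
	return selector >> 3;
}

/* TI bit (bit 2) set means the selector refers to the LDT. */
static bool selector_in_ldt(uint16_t selector)
{
	return (selector & (1 << 2)) != 0;
}

/* An 8-byte descriptor fits if its last byte is within the table limit. */
static bool descriptor_in_bounds(uint32_t table_limit, uint16_t selector)
{
	return table_limit >= (uint32_t)selector_index(selector) * 8 + 7;
}
```

A selector of 0x10 names GDT entry 2, which needs a table limit of at least 23; anything smaller corresponds to the #GP path in read_segment_descriptor().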
[COMMIT master] KVM: x86 emulator: do not call writeback if msr access fails.
From: Gleb Natapov g...@redhat.com

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1393bf0..b89a8f2 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2563,7 +2563,7 @@ twobyte_insn:
 			| ((u64)c->regs[VCPU_REGS_RDX] << 32);
 		if (kvm_set_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], msr_data)) {
 			kvm_inject_gp(ctxt->vcpu, 0);
-			c->eip = ctxt->eip;
+			goto done;
 		}
 		rc = X86EMUL_CONTINUE;
 		c->dst.type = OP_NONE;
@@ -2572,7 +2572,7 @@ twobyte_insn:
 		/* rdmsr */
 		if (kvm_get_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], &msr_data)) {
 			kvm_inject_gp(ctxt->vcpu, 0);
-			c->eip = ctxt->eip;
+			goto done;
 		} else {
 			c->regs[VCPU_REGS_RAX] = (u32)msr_data;
 			c->regs[VCPU_REGS_RDX] = msr_data >> 32;
[COMMIT master] KVM: Provide callback to get/set control registers in emulator ops.
From: Gleb Natapov g...@redhat.com Use this callback instead of directly call kvm function. Also rename realmode_(set|get)_cr to emulator_(set|get)_cr since function has nothing to do with real mode. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 2666d7a..0c5caa4 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -108,7 +108,8 @@ struct x86_emulate_ops { const void *new, unsigned int bytes, struct kvm_vcpu *vcpu); - + ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu); + void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu); }; /* Type, address-of, and value of an instruction's operand. */ diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 53f5202..9d474c7 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -586,8 +586,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address); void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw, unsigned long *rflags); -unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr); -void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value); void kvm_enable_efer_bits(u64); int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data); int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 91450b5..5b060e4 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2483,7 +2483,7 @@ twobyte_insn: break; case 4: /* smsw */ c-dst.bytes = 2; - c-dst.val = realmode_get_cr(ctxt-vcpu, 0); + c-dst.val = ops-get_cr(0, ctxt-vcpu); break; case 6: /* lmsw */ realmode_lmsw(ctxt-vcpu, (u16)c-src.val, @@ -2519,8 +2519,7 @@ twobyte_insn: case 0x20: /* mov cr, reg */ if (c-modrm_mod != 3) goto cannot_emulate; - c-regs[c-modrm_rm] = - realmode_get_cr(ctxt-vcpu, c-modrm_reg); + 
c-regs[c-modrm_rm] = ops-get_cr(c-modrm_reg, ctxt-vcpu); c-dst.type = OP_NONE; /* no writeback */ break; case 0x21: /* mov from dr to reg */ @@ -2534,7 +2533,7 @@ twobyte_insn: case 0x22: /* mov reg, cr */ if (c-modrm_mod != 3) goto cannot_emulate; - realmode_set_cr(ctxt-vcpu, c-modrm_reg, c-modrm_val); + ops-set_cr(c-modrm_reg, c-modrm_val, ctxt-vcpu); c-dst.type = OP_NONE; break; case 0x23: /* mov from reg to dr */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 77f0955..b9ace70 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3423,12 +3423,70 @@ void kvm_report_emulation_failure(struct kvm_vcpu *vcpu, const char *context) } EXPORT_SYMBOL_GPL(kvm_report_emulation_failure); +static u64 mk_cr_64(u64 curr_cr, u32 new_val) +{ + return (curr_cr ~((1ULL 32) - 1)) | new_val; +} + +static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu) +{ + unsigned long value; + + switch (cr) { + case 0: + value = kvm_read_cr0(vcpu); + break; + case 2: + value = vcpu-arch.cr2; + break; + case 3: + value = vcpu-arch.cr3; + break; + case 4: + value = kvm_read_cr4(vcpu); + break; + case 8: + value = kvm_get_cr8(vcpu); + break; + default: + vcpu_printf(vcpu, %s: unexpected cr %u\n, __func__, cr); + return 0; + } + + return value; +} + +static void emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu) +{ + switch (cr) { + case 0: + kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val)); + break; + case 2: + vcpu-arch.cr2 = val; + break; + case 3: + kvm_set_cr3(vcpu, val); + break; + case 4: + kvm_set_cr4(vcpu, mk_cr_64(kvm_read_cr4(vcpu), val)); + break; + case 8: + kvm_set_cr8(vcpu, val 0xfUL); + break; + default: + vcpu_printf(vcpu, %s: unexpected cr %u\n, __func__, cr); + } +} + static struct x86_emulate_ops emulate_ops = { .read_std= kvm_read_guest_virt_system, .fetch = kvm_fetch_guest_virt, .read_emulated = emulator_read_emulated, .write_emulated =
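The mk_cr_64() helper used by emulator_set_cr() above keeps the upper 32 bits of the current register value and replaces the lower 32 bits. A stand-alone sketch with fixed-width types follows; the operators that the archive dropped are assumed to be & and <<, which is the only reading that makes the expression a mask.

```c
#include <stdint.h>

/* Merge a 32-bit value into the low half of a 64-bit control register. */
static uint64_t mk_cr_64(uint64_t curr_cr, uint32_t new_val)
{
	return (curr_cr & ~((1ULL << 32) - 1)) | new_val;
}
```

This matters for CR0 and CR4 writes, where the emulated value is 32 bits wide but the register may carry state in its upper half.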
[COMMIT master] KVM: coalesced_mmio: fix kvm_coalesced_mmio_init()'s error handling
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

kvm_coalesced_mmio_init() keeps holding the addresses of a coalesced
mmio ring page and dev even after it has freed them.

Also, if this function fails, though it might be rare, it suggests that
the system is in a serious state, so we had better stop the work that
follows kvm_create_vm().

This patch fixes these problems.

We move the coalesced mmio's initialization out of kvm_create_vm().
This seems natural because it includes a registration which can be
done only when the vm is successfully created.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 22500d4..66a7391 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -119,8 +119,10 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
 	return ret;

 out_free_dev:
+	kvm->coalesced_mmio_dev = NULL;
 	kfree(dev);
 out_free_page:
+	kvm->coalesced_mmio_ring = NULL;
 	__free_page(page);
 out_err:
 	return ret;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8c3743c..9379533 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -418,9 +418,6 @@ static struct kvm *kvm_create_vm(void)
 	spin_lock(&kvm_lock);
 	list_add(&kvm->vm_list, &vm_list);
 	spin_unlock(&kvm_lock);
-#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-	kvm_coalesced_mmio_init(kvm);
-#endif
 out:
 	return kvm;

@@ -1746,12 +1743,19 @@ static struct file_operations kvm_vm_fops = {

 static int kvm_dev_ioctl_create_vm(void)
 {
-	int fd;
+	int fd, r;
 	struct kvm *kvm;

 	kvm = kvm_create_vm();
 	if (IS_ERR(kvm))
 		return PTR_ERR(kvm);
+#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
+	r = kvm_coalesced_mmio_init(kvm);
+	if (r < 0) {
+		kvm_put_kvm(kvm);
+		return r;
+	}
+#endif
 	fd = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
 	if (fd < 0)
 		kvm_put_kvm(kvm);
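The error-path fix above follows a common pattern: when an init function publishes pointers into a longer-lived object and then fails, it has to clear them before freeing, or the owner is left with dangling pointers. A minimal user-space sketch, assuming a hypothetical struct vm and a fail_register flag standing in for the registration step; malloc/free replace the kernel allocators:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two pointers kept in struct kvm. */
struct vm {
	void *ring;
	void *dev;
};

/* fail_register simulates a failure of the final registration step. */
static int coalesced_init(struct vm *vm, int fail_register)
{
	void *ring, *dev;
	int ret = -ENOMEM;

	ring = malloc(4096);
	if (!ring)
		goto out_err;
	vm->ring = ring;

	dev = malloc(64);
	if (!dev)
		goto out_free_ring;
	vm->dev = dev;

	if (fail_register) {
		ret = -EBUSY;
		goto out_free_dev;
	}
	return 0;

out_free_dev:
	vm->dev = NULL;		/* do not leave a dangling pointer behind */
	free(dev);
out_free_ring:
	vm->ring = NULL;
	free(ring);
out_err:
	return ret;
}
```

The caller then checks the return value and tears the object down on failure, as kvm_dev_ioctl_create_vm() does with kvm_put_kvm() in the patch.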
[COMMIT master] KVM: x86 emulator: 0f (20|21|22|23) ignore mod bits.
From: Gleb Natapov g...@redhat.com

Recent spec says that for 0f (20|21|22|23) the 2 bits in the mod field
are ignored. Interestingly enough, an older spec says that 11 is the only
valid encoding.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7c7debb..fa4604e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2520,28 +2520,20 @@ twobyte_insn:
 		c->dst.type = OP_NONE;
 		break;
 	case 0x20: /* mov cr, reg */
-		if (c->modrm_mod != 3)
-			goto cannot_emulate;
 		c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
 	case 0x21: /* mov from dr to reg */
-		if (c->modrm_mod != 3)
-			goto cannot_emulate;
 		if (emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]))
 			goto cannot_emulate;
 		rc = X86EMUL_CONTINUE;
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
 	case 0x22: /* mov reg, cr */
-		if (c->modrm_mod != 3)
-			goto cannot_emulate;
 		ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu);
 		c->dst.type = OP_NONE;
 		break;
 	case 0x23: /* mov from reg to dr */
-		if (c->modrm_mod != 3)
-			goto cannot_emulate;
 		if (emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]))
 			goto cannot_emulate;
 		rc = X86EMUL_CONTINUE;
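To see what ignoring the mod bits means in practice, here is a small sketch that splits a ModRM byte into its three fields. The struct and function names are made up for illustration, and REX extensions of the reg and rm fields are left out.

```c
#include <stdint.h>

/* The three fields of an x86 ModRM byte. */
struct modrm_fields {
	unsigned int mod;	/* bits 7:6 */
	unsigned int reg;	/* bits 5:3, the CR/DR number for 0f 20..23 */
	unsigned int rm;	/* bits 2:0, the general-purpose register */
};

static struct modrm_fields decode_modrm(uint8_t byte)
{
	struct modrm_fields f;

	f.mod = byte >> 6;
	f.reg = (byte >> 3) & 7;
	f.rm = byte & 7;
	return f;
}
```

The bytes 0xe1 (mod = 3) and 0x21 (mod = 0) both decode to reg = 4 and rm = 1, so after this patch both are handled as an access to CR4 through register 1 instead of the second one being rejected.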
[COMMIT master] KVM: x86 emulator: inject #UD on access to non-existing CR
From: Gleb Natapov g...@redhat.com

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index fa4604e..836e97b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2520,6 +2520,13 @@ twobyte_insn:
 		c->dst.type = OP_NONE;
 		break;
 	case 0x20: /* mov cr, reg */
+		switch (c->modrm_reg) {
+		case 1:
+		case 5 ... 7:
+		case 9 ... 15:
+			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+			goto done;
+		}
 		c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
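The switch in the patch above rejects CR1, CR5 to CR7 and CR9 to CR15, which leaves CR0, CR2, CR3, CR4 and CR8 as the registers that exist. The same check can be written as a predicate without the GCC case-range extension; the function name is hypothetical:

```c
#include <stdbool.h>

/* True for control register numbers the patch does not answer with #UD. */
static bool cr_exists(unsigned int cr)
{
	switch (cr) {
	case 0:
	case 2:
	case 3:
	case 4:
	case 8:
		return true;
	default:
		return false;
	}
}
```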
[COMMIT master] KVM: MMU: Reinstate pte prefetch on invlpg
From: Avi Kivity a...@redhat.com

Commit fb341f57 removed the pte prefetch on guest invlpg, citing guest races.
However, the SDM is adamant that prefetch is allowed:

  The processor may create entries in paging-structure caches for
  translations required for prefetches and for accesses that are a
  result of speculative execution that would never actually occur
  in the executed code path.

And, in fact, there was a race in the prefetch code: we picked up the pte
without the mmu lock held, so an older invlpg could install the pte over
a newer invlpg.

Reinstate the prefetch logic, but this time note whether another invlpg has
executed using a counter. If a race occurred, do not install the pte.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ea1b6c6..28826c8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -389,6 +389,7 @@ struct kvm_arch {
 	unsigned int n_free_mmu_pages;
 	unsigned int n_requested_mmu_pages;
 	unsigned int n_alloc_mmu_pages;
+	atomic_t invlpg_counter;
 	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
 	/*
 	 * Hash table of struct kvm_mmu_page.
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f63c9ad..b3edc46 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2609,20 +2609,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	int flooded = 0;
 	int npte;
 	int r;
+	int invlpg_counter;

 	pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);

-	switch (bytes) {
-	case 4:
-		gentry = *(const u32 *)new;
-		break;
-	case 8:
-		gentry = *(const u64 *)new;
-		break;
-	default:
-		gentry = 0;
-		break;
-	}
+	invlpg_counter = atomic_read(&vcpu->kvm->arch.invlpg_counter);

 	/*
 	 * Assume that the pte write on a page table of the same type
@@ -2630,16 +2621,34 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	 * (might be false while changing modes).  Note it is verified later
 	 * by update_pte().
 	 */
-	if (is_pae(vcpu) && bytes == 4) {
+	if ((is_pae(vcpu) && bytes == 4) || !new) {
 		/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
-		gpa &= ~(gpa_t)7;
-		r = kvm_read_guest(vcpu->kvm, gpa, &gentry, 8);
+		if (is_pae(vcpu)) {
+			gpa &= ~(gpa_t)7;
+			bytes = 8;
+		}
+		r = kvm_read_guest(vcpu->kvm, gpa, &gentry, min(bytes, 8));
 		if (r)
 			gentry = 0;
+		new = (const u8 *)&gentry;
+	}
+
+	switch (bytes) {
+	case 4:
+		gentry = *(const u32 *)new;
+		break;
+	case 8:
+		gentry = *(const u64 *)new;
+		break;
+	default:
+		gentry = 0;
+		break;
 	}

 	mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
 	spin_lock(&vcpu->kvm->mmu_lock);
+	if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
+		gentry = 0;
 	kvm_mmu_access_page(vcpu, gfn);
 	kvm_mmu_free_some_pages(vcpu);
 	++vcpu->kvm->stat.mmu_pte_write;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 4b37e1a..067797a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -463,6 +463,7 @@ out_unlock:
 static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 {
 	struct kvm_shadow_walk_iterator iterator;
+	gpa_t pte_gpa = -1;
 	int level;
 	u64 *sptep;
 	int need_flush = 0;
@@ -476,6 +477,10 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 		if (level == PT_PAGE_TABLE_LEVEL ||
 		    ((level == PT_DIRECTORY_LEVEL && is_large_pte(*sptep))) ||
 		    ((level == PT_PDPE_LEVEL && is_large_pte(*sptep)))) {
+			struct kvm_mmu_page *sp = page_header(__pa(sptep));
+
+			pte_gpa = (sp->gfn << PAGE_SHIFT);
+			pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);

 			if (is_shadow_present_pte(*sptep)) {
 				rmap_remove(vcpu->kvm, sptep);
@@ -493,7 +498,17 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)

 	if (need_flush)
 		kvm_flush_remote_tlbs(vcpu->kvm);
+
+	atomic_inc(&vcpu->kvm->arch.invlpg_counter);
+
 	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	if (pte_gpa == -1)
+		return;
+
+	if (mmu_topup_memory_caches(vcpu))
+		return;
+	kvm_mmu_pte_write(vcpu, pte_gpa, NULL, sizeof(pt_element_t), 0);
 }

 static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr,
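The race check in the patch above is a snapshot-and-compare on a counter: read the counter before picking up the guest pte without the lock, and discard the pte if the counter moved by the time the lock is held. A minimal sketch with C11 atomics, assuming a hypothetical struct mmu_state in place of struct kvm_arch; locking itself is not modeled:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the counter added to struct kvm_arch. */
struct mmu_state {
	atomic_int invlpg_counter;
};

/* Taken before reading the guest pte, without the mmu lock. */
static int invlpg_snapshot(struct mmu_state *m)
{
	return atomic_load(&m->invlpg_counter);
}

/* Called by every invlpg once it has zapped the shadow pte. */
static void invlpg_done(struct mmu_state *m)
{
	atomic_fetch_add(&m->invlpg_counter, 1);
}

/*
 * Called with the lock held: if any invlpg ran since the snapshot, the
 * guest pte read earlier may be stale, so report 0 ("do not install").
 */
static uint64_t checked_gentry(struct mmu_state *m, int snapshot,
			       uint64_t gentry)
{
	if (atomic_load(&m->invlpg_counter) != snapshot)
		return 0;
	return gentry;
}
```

The check is conservative: an unrelated invlpg on another address also forces the prefetch to be dropped, which costs a later page fault but never installs a stale pte.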
Re: [RFC] Unify KVM kernel-space and user-space code into a single project
On 03/21/2010 04:54 PM, Ingo Molnar wrote:
> * Avi Kivity a...@redhat.com wrote:
>
>> On 03/21/2010 10:55 PM, Ingo Molnar wrote:
>>> Of course you could say the following:
>>>
>>>   ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not
>>>     able to add this to the v2.6.35 kernel queue anymore as the ongoing
>>>     usability work already takes up all of the project's maintainer and
>>>     testing bandwidth. If you want the feature to be merged sooner than
>>>     that then please help us cut down on the TODO and BUGS list that can
>>>     be found at XYZ. There's quite a few low hanging fruits there. '
>>
>> That would be shooting at my own foot as well as the contributor's since I
>> badly want that RCU stuff, and while a GUI would be nice, that itch isn't
>> on my back.
>
> I think this sums up the root cause of all the problems i see with KVM
> pretty well.

A good maintainer has to strike a balance between asking more of people
than what they initially volunteer and getting people to implement the
less fun things that are nonetheless required. The kernel can take this
to an extreme because at the end of the day, it's the only game in town
and there is an unending number of potential volunteers. Most other
projects are not quite as fortunate.

When someone submits a patch set to QEMU implementing a new network
backend for raw sockets, we can push back about how it fits into the
entire stack with respect to security, usability, etc. Ultimately, we can
arrive at a different, more user friendly solution (networking helpers),
and along with some time investment on my part, we can create a much
nicer, more user friendly solution.

Still command line based though.

Responding to such a patch set with "replace the SDL front end with a
GTK one that lets you graphically configure networking" is not
reasonable, and the result would be one less QEMU contributor in the long
run.

Over time, we can push people, and are pushing them, to focus more on
usability. But that doesn't get you a first class GTK GUI overnight. The
only way you're going to get that is by having a contributor be
specifically interested in building such a thing.

We simply haven't had that in the past 5 years that I've been involved
in the project. If someone stepped up to build this, I'd certainly
support it in every way possible and there are probably some steps we
could take to even further encourage this.

Regards,

Anthony Liguori

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Unify KVM kernel-space and user-space code into a single project
On 03/21/2010 05:00 PM, Ingo Molnar wrote:
> If that is the theory then it has failed to trickle through in practice. As
> you know i have reported a long list of usability problems with hardly a
> look. That list could be created by pretty much anyone spending a few
> minutes of getting a first impression with qemu-kvm.

Can you transfer your list to the following wiki page:

http://wiki.qemu.org/Features/Usability

This thread is so large that I can't find your note that contained the
initial list. I want to make sure this input doesn't die once this
thread settles down.

Regards,

Anthony Liguori
Re: [Autotest] [PATCH] KVM-Test: Add kvm userspace unit test
OK, I approve of your suggestion.

- Lucas Meneghel Rodrigues l...@redhat.com wrote:

> I have an update about this test after talking to Naphtali Sprei:
>
> This patch does the unit testing using the old way of invoking it, and
> Avi superseded it with a new -kernel option. Naphtali is working on
> making the new way of doing the test work, so I will wait until we can
> merge both ways of doing this test, OK?
>
> On Thu, Mar 18, 2010 at 12:16 AM, Lucas Meneghel Rodrigues l...@redhat.com wrote:
>> Hi Shuxi, sorry that it took so long before I could give you feedback
>> on this one. The general idea is just fine, but there is one gotcha
>> that will need more thought:
>>
>> This is dependent on having the KVM source code for testing (i.e., it
>> depends on the build test *and* the build mode has to involve source
>> code, such as git builds; things like koji install will also not
>> work).
>>
>> Since by default we do not make the tests depend directly on build, we
>> have to figure out a way to have this integrated without breaking
>> things for users who are not interested in running the build test.
>>
>> Today I was reviewing the qemu-img functional test, and it occurred to
>> me that all those tests that do not depend on guests and different
>> qemu command line options could be made dependent on the build test.
>> This way we'd have the separation that we need, still not breaking
>> anything for users that do not care about build and other types of
>> test.
>>
>> Michael, what do you think? Should we put the config of tests like
>> this one and qemu_img in build.cfg, making them depend on build?
>>
>> Oh Shuxi, on the code below I have some small comments to make:
>>
>> On Fri, Mar 5, 2010 at 3:22 AM, sshang ssh...@redhat.com wrote:
>>> The test use kvm test harness kvmctl load binary test case file to
>>> test various function of kvm kernel module.
Signed-off-by: sshang ssh...@redhat.com --- client/tests/kvm/tests/unit_test.py | 29 + client/tests/kvm/tests_base.cfg.sample | 7 +++ 2 files changed, 36 insertions(+), 0 deletions(-) create mode 100644 client/tests/kvm/tests/unit_test.py diff --git a/client/tests/kvm/tests/unit_test.py b/client/tests/kvm/tests/unit_test.py new file mode 100644 index 000..9bc7441 --- /dev/null +++ b/client/tests/kvm/tests/unit_test.py @@ -0,0 +1,29 @@ +import os +from autotest_lib.client.bin import utils +from autotest_lib.client.common_lib import error + +def run_unit_test(test, params, env): + + This is kvm userspace unit test, use kvm test harness kvmctl load binary + test case file to test various function of kvm kernel module. + The output of all unit test can be found in the test result dir. + + + case_list = params.get(case_list,access apic emulator hypercall irq\ + port80 realmode sieve smptest tsc stringio vmexit).split() + srcdir = params.get(srcdir,test.srcdir) + user_dir = os.path.join(srcdir,kvm_userspace/kvm/user) + os.chdir(user_dir) + test_fail_list = [] + + for i in case_list: + result_file = test.outputdir + / + i + testfile = i + .flat + results = utils.system(./kvmctl test/x86/bootstrap test/x86/ + \ + testfile ++ result_file,ignore_status=True) About the above statement: In general you should not use shell redirection to write the output of your program to the log files. Please take advantage of the fact utils.run allow you to connect stdout and stderr pipes to the result file. Also, utils.run return a CmdResult object, hat has a list of useful properties out of it. + if results != 0: + test_fail_list.append(i) + + if test_fail_list: + raise error.TestFail( + .join(test_fail_list) + \ + ) In the above, you could just have used raise error.TestFail(KVM module unit test failed. Test cases failed: %s % test_fail_list) IMHO it's easier to understand. 
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample index 040d0c3..0918c26 100644 --- a/client/tests/kvm/tests_base.cfg.sample +++ b/client/tests/kvm/tests_base.cfg.sample @@ -300,6 +300,13 @@ variants: shutdown_method = shell kill_vm = yes kill_vm_gracefully = no + + - unit_test: + type = unit_test + case_list = access apic emulator hypercall msr port80 realmode sieve smptest tsc stringio vmexit + #srcdir should be same as build.cfg + srcdir = + vms = '' # Do not define test variants below shutdown -- 1.5.5.6 ___ Autotest mailing list autot...@test.kernel.org http://test.kernel.org/cgi-bin/mailman/listinfo/autotest -- Lucas -- Lucas -- To unsubscribe from this list: send
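Editorial sketch of the review comment above about avoiding shell redirection: the output file is opened in Python and handed to the child process as its stdout/stderr. This is only an illustration using the standard subprocess module, not autotest's utils.run; the helper names run_case and run_cases are hypothetical.

```python
import os
import subprocess


def run_case(argv, result_path):
    """Run argv with stdout and stderr written to result_path.

    Returns the exit status; a non-zero status does not raise.
    """
    with open(result_path, "w") as result_file:
        proc = subprocess.run(argv, stdout=result_file,
                              stderr=subprocess.STDOUT)
    return proc.returncode


def run_cases(commands, output_dir):
    """Run each named command and return the names that exited non-zero."""
    failed = []
    for name, argv in commands.items():
        status = run_case(argv, os.path.join(output_dir, name))
        if status != 0:
            failed.append(name)
    return failed
```

With a list of failed names in hand, a single formatted message (as suggested in the review) is enough to report them all.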
[KVM-AUTOTEST PATCH 1/5] KVM test: kvm_preprocessing.py: minor style corrections
Also, fetch the KVM version before setting up the VMs. Signed-off-by: Michael Goldish mgold...@redhat.com --- client/tests/kvm/kvm_preprocessing.py | 58 +++- 1 files changed, 27 insertions(+), 31 deletions(-) diff --git a/client/tests/kvm/kvm_preprocessing.py b/client/tests/kvm/kvm_preprocessing.py index e91d1da..e3a5501 100644 --- a/client/tests/kvm/kvm_preprocessing.py +++ b/client/tests/kvm/kvm_preprocessing.py @@ -58,8 +58,8 @@ def preprocess_vm(test, params, env, name): for_migration = False if params.get(start_vm_for_migration) == yes: -logging.debug('start_vm_for_migration' specified; (re)starting VM with - -incoming option...) +logging.debug('start_vm_for_migration' specified; (re)starting VM + with -incoming option...) start_vm = True for_migration = True elif params.get(restart_vm) == yes: @@ -187,12 +187,12 @@ def preprocess(test, params, env): @param env: The environment (a dict-like object). # Start tcpdump if it isn't already running -if not env.has_key(address_cache): +if address_cache not in env: env[address_cache] = {} -if env.has_key(tcpdump) and not env[tcpdump].is_alive(): +if tcpdump in env and not env[tcpdump].is_alive(): env[tcpdump].close() del env[tcpdump] -if not env.has_key(tcpdump): +if tcpdump not in env: command = /usr/sbin/tcpdump -npvi any 'dst port 68' logging.debug(Starting tcpdump (%s)..., command) env[tcpdump] = kvm_subprocess.kvm_tail( @@ -208,35 +208,23 @@ def preprocess(test, params, env): # Destroy and remove VMs that are no longer needed in the environment requested_vms = kvm_utils.get_sub_dict_names(params, vms) -for key in env.keys(): +for key in env: vm = env[key] if not kvm_utils.is_vm(vm): continue if not vm.name in requested_vms: -logging.debug(VM '%s' found in environment but not required for - test; removing it... % vm.name) +logging.debug(VM '%s' found in environment but not required for + test; removing it... 
% vm.name) vm.destroy() del env[key] -# Execute any pre_commands -if params.get(pre_command): -process_command(test, params, env, params.get(pre_command), -int(params.get(pre_command_timeout, 600)), -params.get(pre_command_noncritical) == yes) - -# Preprocess all VMs and images -process(test, params, env, preprocess_image, preprocess_vm) - # Get the KVM kernel module version and write it as a keyval logging.debug(Fetching KVM module version...) if os.path.exists(/dev/kvm): -kvm_version = os.uname()[2] try: -file = open(/sys/module/kvm/version, r) -kvm_version = file.read().strip() -file.close() +kvm_version = open(/sys/module/kvm/version).read().strip() except: -pass +kvm_version = os.uname()[2] else: kvm_version = Unknown logging.debug(KVM module not loaded) @@ -248,16 +236,24 @@ def preprocess(test, params, env): qemu_path = kvm_utils.get_path(test.bindir, params.get(qemu_binary, qemu)) version_line = commands.getoutput(%s -help | head -n 1 % qemu_path) -exp = re.compile([Vv]ersion .*?,) -match = exp.search(version_line) -if match: -kvm_userspace_version = .join(match.group().split()[1:]).strip(,) +matches = re.findall([Vv]ersion .*?,, version_line) +if matches: +kvm_userspace_version = .join(matches[0].split()[1:]).strip(,) else: kvm_userspace_version = Unknown logging.debug(Could not fetch KVM userspace version) logging.debug(KVM userspace version: %s % kvm_userspace_version) test.write_test_keyval({kvm_userspace_version: kvm_userspace_version}) +# Execute any pre_commands +if params.get(pre_command): +process_command(test, params, env, params.get(pre_command), +int(params.get(pre_command_timeout, 600)), +params.get(pre_command_noncritical) == yes) + +# Preprocess all VMs and images +process(test, params, env, preprocess_image, preprocess_vm) + def postprocess(test, params, env): @@ -276,8 +272,8 @@ def postprocess(test, params, env): # Should we convert PPM files to PNG format? 
if params.get(convert_ppm_files_to_png) == yes: -logging.debug('convert_ppm_files_to_png' specified; converting PPM - files to PNG format...) +logging.debug('convert_ppm_files_to_png' specified; converting PPM + files to PNG format...) try: for f in
[KVM-AUTOTEST PATCH 2/5] KVM test: kvm.py: make sure all dump_env() calls are inside 'finally' blocks
Signed-off-by: Michael Goldish mgold...@redhat.com --- client/tests/kvm/kvm.py | 29 +++-- 1 files changed, 19 insertions(+), 10 deletions(-) diff --git a/client/tests/kvm/kvm.py b/client/tests/kvm/kvm.py index 9b8a10c..c6e146d 100644 --- a/client/tests/kvm/kvm.py +++ b/client/tests/kvm/kvm.py @@ -21,6 +21,7 @@ class kvm(test.test): (Online doc - Getting started with KVM testing) version = 1 + def run_once(self, params): # Report the parameters we've received and write them as keyvals logging.debug(Test parameters:) @@ -33,7 +34,7 @@ class kvm(test.test): # Open the environment file env_filename = os.path.join(self.bindir, params.get(env, env)) env = kvm_utils.load_env(env_filename, {}) -logging.debug(Contents of environment: %s % str(env)) +logging.debug(Contents of environment: %s, str(env)) try: try: @@ -50,22 +51,30 @@ class kvm(test.test): f.close() # Preprocess -kvm_preprocessing.preprocess(self, params, env) -kvm_utils.dump_env(env, env_filename) +try: +kvm_preprocessing.preprocess(self, params, env) +finally: +kvm_utils.dump_env(env, env_filename) # Run the test function run_func = getattr(test_module, run_%s % t_type) -run_func(self, params, env) -kvm_utils.dump_env(env, env_filename) +try: +run_func(self, params, env) +finally: +kvm_utils.dump_env(env, env_filename) except Exception, e: logging.error(Test failed: %s, e) logging.debug(Postprocessing on error...) 
-kvm_preprocessing.postprocess_on_error(self, params, env) -kvm_utils.dump_env(env, env_filename) +try: +kvm_preprocessing.postprocess_on_error(self, params, env) +finally: +kvm_utils.dump_env(env, env_filename) raise finally: # Postprocess -kvm_preprocessing.postprocess(self, params, env) -logging.debug(Contents of environment: %s, str(env)) -kvm_utils.dump_env(env, env_filename) +try: +kvm_preprocessing.postprocess(self, params, env) +finally: +kvm_utils.dump_env(env, env_filename) +logging.debug(Contents of environment: %s, str(env)) -- 1.5.4.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
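Editorial sketch of the pattern this patch applies: each stage runs inside try/finally so the environment is written out even when the stage raises. The names run_stage, stage and dump are hypothetical stand-ins for the preprocess/run/postprocess calls and kvm_utils.dump_env.

```python
def run_stage(stage, env, dump):
    """Run one stage and always persist env afterwards.

    If stage(env) raises, dump(env) still runs and the exception
    then propagates to the caller unchanged.
    """
    try:
        stage(env)
    finally:
        dump(env)


if __name__ == "__main__":
    saved = []

    def failing_stage(env):
        env["vm1"] = "started"
        raise RuntimeError("stage failed after modifying env")

    try:
        run_stage(failing_stage, {}, lambda e: saved.append(dict(e)))
    except RuntimeError:
        pass
    # The partially updated environment was still saved.
    print(saved)
```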
[KVM-AUTOTEST PATCH 4/5] KVM test: make kvm_stat usage optional
Relying on the test tag is not cool. Use a dedicated parameter instead. By default, all tests except build tests will use kvm_stat. Signed-off-by: Michael Goldish mgold...@redhat.com --- client/tests/kvm/kvm_utils.py |8 client/tests/kvm/tests_base.cfg.sample |3 +++ 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py index cc39b5d..5834539 100644 --- a/client/tests/kvm/kvm_utils.py +++ b/client/tests/kvm/kvm_utils.py @@ -845,8 +845,8 @@ def run_tests(test_list, job): @return: True, if all tests ran passed, False if any of them failed. status_dict = {} - failed = False + for dict in test_list: if dict.get(skip) == yes: continue @@ -863,12 +863,12 @@ def run_tests(test_list, job): test_tag = dict.get(shortname) # Setting up kvm_stat profiling during test execution. # We don't need kvm_stat profiling on the build tests. -if build in test_tag: +if dict.get(run_kvm_stat) == yes: +profile = True +else: # None because it's the default value on the base_test class # and the value None is specifically checked there. profile = None -else: -profile = True if profile: job.profilers.add('kvm_stat') diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample index 9963a44..b13aec4 100644 --- a/client/tests/kvm/tests_base.cfg.sample +++ b/client/tests/kvm/tests_base.cfg.sample @@ -40,6 +40,9 @@ nic_mode = user nic_script = scripts/qemu-ifup address_index = 0 +# Misc +run_kvm_stat = yes + # Tests variants: -- 1.5.4.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-AUTOTEST PATCH 3/5] KVM test: kvm_utils.load_env(): do not fail if env file is corrupted
- Include the unpickling code in the 'try' block, so that an exception raised during unpickling will not fail the test. - Change the default env (returned by load_env() when the file is missing or corrupt) to {}. Signed-off-by: Michael Goldish mgold...@redhat.com --- client/tests/kvm/kvm_utils.py | 10 ++ 1 files changed, 6 insertions(+), 4 deletions(-) diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py index d386456..cc39b5d 100644 --- a/client/tests/kvm/kvm_utils.py +++ b/client/tests/kvm/kvm_utils.py @@ -22,7 +22,7 @@ def dump_env(obj, filename): file.close() -def load_env(filename, default=None): +def load_env(filename, default={}): Load KVM test environment from an environment file. @@ -30,11 +30,13 @@ def load_env(filename, default=None): try: file = open(filename, r) +obj = cPickle.load(file) +file.close() +return obj +# Almost any exception can be raised during unpickling, so let's catch +# them all except: return default -obj = cPickle.load(file) -file.close() -return obj def get_sub_dict(dict, name): -- 1.5.4.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
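Editorial sketch of a load function that tolerates a missing or corrupt file, as the description above asks for. It is written for Python 3 with the standard pickle module rather than cPickle. One deliberate difference from the patch, labeled here as an assumption about what is wanted: a None sentinel is used instead of default={}, because a mutable default argument is shared between calls in Python.

```python
import pickle


def load_env(filename, default=None):
    """Load a pickled environment, falling back to a default.

    Any exception while opening or unpickling (missing file,
    truncated data, garbage contents) yields the default instead
    of failing the caller. A fresh dict is created per call when
    no default is given.
    """
    try:
        with open(filename, "rb") as f:
            return pickle.load(f)
    except Exception:
        return {} if default is None else default
```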
[KVM-AUTOTEST PATCH 5/5] KVM test: take frequent screendumps during all tests
Screendumps are taken regularly and converted to JPEG format. They are stored in .../debug/screendumps_VMname/. Requires python-imaging.

- Enabled by 'take_regular_screendumps = yes' (naming suggestions welcome).
- Delay between screendumps is controlled by 'screendump_delay' (default 5).
- Compression quality is controlled by 'screendump_quality' (default 30).
- It's probably a good idea to dump them to /dev/shm before converting them in order to minimize disk use. This can be enabled by 'screendump_temp_dir = /dev/shm' (commented out by default because I'm not sure /dev/shm is available on all machines.)
- Screendumps are removed unless 'keep_screendumps'['_on_error'] is 'yes'. The recommended setting when submitting jobs from autoserv is 'keep_screendumps_on_error = yes', which means screendumps are kept only if the test fails. Keeping all screendumps may use up all of the server's storage space.

This patch sets reasonable defaults in tests_base.cfg.sample.

(It also makes sure post_command is executed last in the postprocessing procedure -- otherwise post_command failure can prevent other postprocessing steps (like removing the screendump dirs) from taking place.)
Signed-off-by: Michael Goldish mgold...@redhat.com --- client/tests/kvm/kvm_preprocessing.py | 85 +-- client/tests/kvm/tests_base.cfg.sample | 13 - 2 files changed, 89 insertions(+), 9 deletions(-) diff --git a/client/tests/kvm/kvm_preprocessing.py b/client/tests/kvm/kvm_preprocessing.py index e3a5501..0e4ce87 100644 --- a/client/tests/kvm/kvm_preprocessing.py +++ b/client/tests/kvm/kvm_preprocessing.py @@ -1,4 +1,4 @@ -import sys, os, time, commands, re, logging, signal, glob +import sys, os, time, commands, re, logging, signal, glob, threading, shutil from autotest_lib.client.bin import test, utils from autotest_lib.client.common_lib import error import kvm_vm, kvm_utils, kvm_subprocess, ppm_utils @@ -11,6 +11,10 @@ except ImportError: 'distro.') +_screendump_thread = None +_screendump_thread_termination_event = None + + def preprocess_image(test, params): Preprocess a single QEMU image according to the instructions in params. @@ -254,6 +258,14 @@ def preprocess(test, params, env): # Preprocess all VMs and images process(test, params, env, preprocess_image, preprocess_vm) +# Start the screendump thread +if params.get(take_regular_screendumps) == yes: +global _screendump_thread, _screendump_thread_termination_event +_screendump_thread_termination_event = threading.Event() +_screendump_thread = threading.Thread(target=_take_screendumps, + args=(test, params, env)) +_screendump_thread.start() + def postprocess(test, params, env): @@ -263,8 +275,15 @@ def postprocess(test, params, env): @param params: Dict containing all VM and image parameters. @param env: The environment (a dict-like object). 
+# Postprocess all VMs and images process(test, params, env, postprocess_image, postprocess_vm) +# Terminate the screendump thread +global _screendump_thread, _screendump_thread_termination_event +if _screendump_thread: +_screendump_thread_termination_event.set() +_screendump_thread.join(10) + # Warn about corrupt PPM files for f in glob.glob(os.path.join(test.debugdir, *.ppm)): if not ppm_utils.image_verify_ppm_file(f): @@ -290,11 +309,13 @@ def postprocess(test, params, env): for f in glob.glob(os.path.join(test.debugdir, '*.ppm')): os.unlink(f) -# Execute any post_commands -if params.get(post_command): -process_command(test, params, env, params.get(post_command), -int(params.get(post_command_timeout, 600)), -params.get(post_command_noncritical) == yes) +# Should we keep the screendump dirs? +if params.get(keep_screendumps) != yes: +logging.debug('keep_screendumps' not specified; removing screendump + dirs...) +for d in glob.glob(os.path.join(test.debugdir, screendumps_*)): +if os.path.isdir(d) and not os.path.islink(d): +shutil.rmtree(d, ignore_errors=True) # Kill all unresponsive VMs if params.get(kill_unresponsive_vms) == yes: @@ -318,6 +339,12 @@ def postprocess(test, params, env): env[tcpdump].close() del env[tcpdump] +# Execute any post_commands +if params.get(post_command): +process_command(test, params, env, params.get(post_command), +int(params.get(post_command_timeout, 600)), +params.get(post_command_noncritical) == yes) + def postprocess_on_error(test, params, env): @@ -343,3 +370,49 @@ def _update_address_cache(address_cache, line): mac_address, address_cache.get(last_seen))
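Editorial sketch of the start/terminate pattern used for the screendump thread above: a worker loops until a threading.Event is set, and the stopping side sets the event and joins with a timeout. The names start_periodic and stop_periodic are hypothetical, and the task here is an arbitrary callable rather than the patch's _take_screendumps.

```python
import threading


def start_periodic(task, delay):
    """Call task() repeatedly, waiting `delay` seconds between calls.

    Returns (thread, stop_event). Waiting on the event instead of
    sleeping lets the loop exit promptly once the event is set.
    """
    stop_event = threading.Event()

    def loop():
        while not stop_event.is_set():
            task()
            stop_event.wait(delay)

    thread = threading.Thread(target=loop)
    thread.start()
    return thread, stop_event


def stop_periodic(thread, stop_event, timeout=10):
    """Ask the worker to stop and wait up to `timeout` seconds for it."""
    stop_event.set()
    thread.join(timeout)
```

The join timeout mirrors the join(10) in the patch: postprocessing is not blocked indefinitely if the worker is stuck inside a slow task.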
Re: Streaming Audio from Virtual Machine
On 03/21/2010 01:12 PM, Gus Zernial wrote:
> I'm using Kubuntu 9.10 32-bit on a quad-core Phenom II with Gigabit ethernet. I want to stream audio from MLB.com from a WinXP client thru a Linksys WMB54G wireless music bridge. Note that there are drivers for the WMB54G only for WinXP and Vista.
>
> If I stream the audio thru a native WinXP box thru the WMB54G, all is well and the audio sounds fine. When I try to stream thru a WinXP virtual machine on Kubuntu 9.10, the audio is poor quality and subject to gaps and dropping the stream altogether. So far I've tried KVM/QEMU and VirtualBox, same result.
>
> Regarding KVM/QEMU, I note AMD-V is activated in the BIOS, and I have a custom 2.6.32.7 kernel, and QEMU 0.11.0. The kvm and kvm_amd modules are compiled in and loaded. I've been using bridged networking. I think it's set up correctly but I confess I'm no networking expert. My start command for the WinXP virtual machine is:
>
> sudo /usr/bin/qemu -m 1024 -boot c -net nic,vlan=0,macaddr=00:d0:13:b0:2d:32,model=rtl8139 -net tap,vlan=0,ifname=tap0,script=/etc/qemu-ifup -localtime -soundhw ac97 -smp 4 -fda /dev/fd0 -vga std -usb /home/rbroman/windows.img
>
> I also tried model=virtio but that didn't help.
>
> I suspect this is a virtual machine networking problem but I'm not sure. So my questions are:
>
> -What's the best/fastest networking option and how do I set it up? Pointers to step-by-step instructions appreciated.
>
> -Is it possible I have a problem other than networking? Configuration problem with KVM/QEMU? Or could there be a problem with the WMB54G driver when used thru a virtual machine?
>
> -Is there a better virtual machine solution than KVM/QEMU for what I'm trying to do?

[dsa] I have been able to stream and video in a KVM-hosted winxp VM, and I have even watched a netflix-based movie. My laptop has a Core-2 duo cpu, T9550, with 4 GB of RAM. Networking at home is through a wireless-N router, and I use bridged networking and NAT for VMs.

Host activity definitely has an impact. When streaming I make sure I am not doing any heavy activity in the host layer, and if I notice jitter the first thing I do is up the priority of the VM threads using chrt.

David

> Recommendations appreciated - Gus
Re: [PATCH 0/2] qemu-kvm: Introduce wrapper functions to access phys_ram_dirty, and replace existing direct accesses to it.
Marcelo Tosatti wrote:
> On Wed, Mar 17, 2010 at 02:51:46PM +0900, Yoshiaki Tamura wrote:
>> Before replacing byte-based dirty bitmap with bit-based dirty bitmap, clearing direct accesses to the bitmap first seems to be good point to start with.
>>
>> This patch set is based on the following discussion.
>>
>> http://www.mail-archive.com/kvm@vger.kernel.org/msg30724.html
>>
>> Thanks,
>>
>> Yoshi
>
> Looks fine to me. This is qemu upstream material, though.

Thanks for your comment. I should have removed qemu-kvm from the title. Should I rebase the patch to qemu.git and repost?
Re: Unable to create more than 1 guest virtio-net device using vhost-net backend
On Fri, Mar 19, 2010 at 03:19:27PM -0700, Sridhar Samudrala wrote:
> When creating a guest with 2 virtio-net interfaces, I am running into an issue causing the 2nd i/f to fall back to userspace virtio even when vhost is enabled.
>
> After some debugging, it turned out that the KVM_IOEVENTFD ioctl() call in qemu is failing with ENOSPC. This is because of the NR_IOBUS_DEVS(6) limit in the kvm_io_bus_register_dev() routine in the host kernel.
>
> I think we need to increase this limit if we want to support multiple network interfaces using vhost-net. Is there an alternate solution?
>
> Thanks
> Sridhar

Nothing easy that I can see. Each device needs 2 of these. Avi, Gleb, any objections to increasing the limit to say 16? That would give us 5 more devices to the limit of 6 per guest.

--
MST
Re: [RFC] Unify KVM kernel-space and user-space code into a single project
On 03/20/2010 04:59 PM, Andrea Arcangeli wrote:
> On Fri, Mar 19, 2010 at 09:21:49AM +0200, Avi Kivity wrote:
>> On 03/19/2010 12:44 AM, Ingo Molnar wrote:
>>> Too bad - there was heavy initial opposition to the arch/x86 unification as well [and heavy opposition to tools/perf/ as well], still both worked out extremely well :-)
>>
>> Did you forget that arch/x86 was a merging of a code fork that happened several years previously? Maybe that fork shouldn't have been done to begin with.
>
> We discussed and probably timidly tried to share the sharable initially but we realized it was too time wasteful. In addition to having to adapt the code to 64bit we would also had to constantly solve another problem on top of it (see the various split on _32/_64, those takes time to achieve, maybe not huge time but still definitely some time and effort). Even in retrospect I am quite sure the way x86-64 happened was optimal and if we would go back we would do it again the exact same way even if the final object was to have a common arch/x86 (and thankfully Linus is flexible and smart enough to realize that code that isn't risking to destabilize anything shouldn't be forced out just because it's not to a totally theoretical-perfect-nitpicking-clean-state yet).
>
> It's still a lot of work do the unification later as a separate task, but it's not like if we did it immediately it would have been a lot less work. It's about the same amount of effort and we were able to defer it for later and decrease the time to market which surely has contributed to the success of x86-64.

In hindsight decisions are much easier. I agree it was less risky to fork than to share. But if another instruction set forks out a 64-bit not-exactly-compatible variant, I'm sure we'll start out shared and not fork it, especially if the platform remains the same.

> Problem of qemu is not some lack of GUI or that it's not included in the linux kernel git tree, the definitive problem is how to merge qemu-kvm/kvm and qlx into it. If you (Avi) were the qemu maintainer I am sure there wouldn't two trees so as a developer I would totally love it, and I am sure that with you as maintainer it would have a chance to move forward with qlx on desktop virtualization without proposing to extend vnc instead to achieve a similar result (imagine if btrfs is published on a website and people starts to discuss if it should ever be merged ever because reinventing some part of btrfs inside ext5 might achieve similar results).

The qemu/qemu-kvm fork is definitely hurting. Some history: when kvm started out I pulled qemu for fast hacking and, much like arch/x86_64, I couldn't destabilize qemu for something that was completely experimental (and closed source at the time). Moreover, it wasn't clear if the qemu community would be interested. The qemu-kvm fork was designed for minimal intrusion so I could merge upstream qemu regularly. This resulted in kvm integration that was fairly ugly.

Later Anthony merged a well-integrated alternative implementation (in retrospect this was a mistake IMO - we were left with a well tested high performing ugly implementation and a clean, slow, untested, and unfeatured implementation, and no one who wants to merge the two). So now it is pretty confusing to read the code which has the two alternate implementation sometimes sharing code and sometimes diverging.

> About a GUI for KVM to use on desktop distributions, that is an irrelevant concern compared to the lack of protocol more efficient than rdesktop/rdp/vnc for desktop virtualization. I've people asking me to migrate hundreds of desktops to desktop virtualization on KVM in their organizations and I tell them to use spice because I believe it's the most efficient option available (at least as far as we stick to open source open protocols), there are universities using spice on thousand of student desktops, and I think we need paravirt graphics to happen ASAP in the main qemu tree too.

That effort will have to wait for the spice project to mature.

> In short: running KVM on the desktop is irrelevant compared to running the desktop on KVM so I suggest to focus on what is more important first ;).

Anyone can focus on what interests them, if someone has an interest in a good desktop-on-desktop experience they should start hacking and sending patches.

--
error compiling committee.c: too many arguments to function
Re: Strange CPU usage pattern in SMP guest
On 03/21/2010 02:13 AM, Sebastian Hetze wrote:
> Hi *,
>
> in a 6-CPU SMP guest running on a host with 2 quad-core Intel Xeon E5520 with hyperthreading enabled we see one or more guest CPUs working in a very strange pattern. It looks like all or nothing. We can easily identify the affected CPU with xosview. Here is the mpstat output compared to one regular working CPU:
>
> mpstat -P 4 1
> Linux 2.6.31-16-generic-pae (guest)  21.03.2010  _i686_  (6 CPU)
>
> 00:45:19  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
> 00:45:20    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
> 00:45:21    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
> 00:45:22    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
> 00:45:23    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
> 00:45:24    4    0,00   66,67    0,00    0,00    0,00   33,33    0,00    0,00    0,00
> 00:45:25    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
> 00:45:26    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00

Looks like the guest is only receiving 3-4 timer interrupts per second, so time becomes quantized.

Please run the attached irqtop in the affected guest and report the results.

Is the host overly busy? What host kernel, kvm, and qemu are you running? Is the guest running an I/O workload? If so, how are the disks configured?
--
error compiling committee.c: too many arguments to function

#!/usr/bin/python

import curses
import sys, os, time, optparse

def read_interrupts():
    # Map interrupt name -> total count summed over all CPUs
    irq = {}
    proc = file('/proc/interrupts')
    nrcpu = len(proc.readline().split())
    for line in proc.readlines():
        vec, data = line.strip().split(':', 1)
        if vec in ('ERR', 'MIS'):
            continue
        counts = data.split(None, nrcpu)
        counts, rest = (counts[:-1], counts[-1])
        count = sum([int(x) for x in counts])
        try:
            v = int(vec)
            name = rest.split(None, 1)[1]
        except:
            name = rest
        irq[name] = count
    return irq

def delta_interrupts():
    # Yield the per-interrupt difference between successive samples
    old = read_interrupts()
    while True:
        irq = read_interrupts()
        delta = {}
        for key in irq.keys():
            delta[key] = irq[key] - old[key]
        yield delta
        old = irq

label_width = 30
number_width = 10

def tui(screen):
    curses.use_default_colors()
    curses.noecho()
    def getcount(x):
        return x[1]
    def refresh(irq):
        screen.erase()
        screen.addstr(0, 0, 'irqtop')
        row = 2
        for name, count in sorted(irq.items(), key = getcount, reverse = True):
            if row >= screen.getmaxyx()[0]:
                break
            col = 1
            screen.addstr(row, col, name)
            col += label_width
            screen.addstr(row, col, '%10d' % (count,))
            row += 1
        screen.refresh()
    for irqs in delta_interrupts():
        refresh(irqs)
        curses.halfdelay(10)
        try:
            c = screen.getkey()
            if c == 'q':
                break
        except KeyboardInterrupt:
            break
        except curses.error:
            continue

import curses.wrapper
curses.wrapper(tui)
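Editorial sketch for readers on Python 3, where the file() builtin used by the attached script no longer exists. It covers only the parsing and delta steps and takes the /proc/interrupts contents as a string so it can be tried on a sample; the function names parse_interrupts and delta_counts are hypothetical and the curses display is left out.

```python
def parse_interrupts(text):
    """Return {interrupt name: count summed over all CPUs}.

    `text` is the contents of /proc/interrupts. The first line
    lists the CPUs; each following line is 'vector: counts... name'.
    """
    lines = text.splitlines()
    nrcpu = len(lines[0].split())
    irq = {}
    for line in lines[1:]:
        if ':' not in line:
            continue
        vec, data = line.strip().split(':', 1)
        if vec in ('ERR', 'MIS'):
            continue
        fields = data.split(None, nrcpu)
        total = sum(int(x) for x in fields[:nrcpu])
        rest = fields[nrcpu] if len(fields) > nrcpu else vec
        name = rest.strip()
        if vec.isdigit():
            # Numbered vectors carry a chip type before the device name.
            parts = rest.split(None, 1)
            if len(parts) == 2:
                name = parts[1].strip()
        irq[name] = total
    return irq


def delta_counts(old, new):
    """Per-interrupt increase between two samples (new entries count from 0)."""
    return {name: count - old.get(name, 0) for name, count in new.items()}
```

On a live system the input would come from open('/proc/interrupts').read(), sampled once per second as irqtop does.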
Re: Unable to create more than 1 guest virtio-net device using vhost-net backend
On 03/21/2010 11:55 AM, Michael S. Tsirkin wrote:
> On Fri, Mar 19, 2010 at 03:19:27PM -0700, Sridhar Samudrala wrote:
>> When creating a guest with 2 virtio-net interfaces, I am running into an issue causing the 2nd i/f to fall back to userspace virtio even when vhost is enabled.
>>
>> After some debugging, it turned out that the KVM_IOEVENTFD ioctl() call in qemu is failing with ENOSPC. This is because of the NR_IOBUS_DEVS(6) limit in the kvm_io_bus_register_dev() routine in the host kernel.
>>
>> I think we need to increase this limit if we want to support multiple network interfaces using vhost-net. Is there an alternate solution?
>>
>> Thanks
>> Sridhar
>
> Nothing easy that I can see. Each device needs 2 of these. Avi, Gleb, any objections to increasing the limit to say 16? That would give us 5 more devices to the limit of 6 per guest.

Increase it to 200, then.

Is the limit visible to userspace? If not, we need to expose it.

--
error compiling committee.c: too many arguments to function
Re: Unable to create more than 1 guest virtio-net device using vhost-net backend
On Sun, Mar 21, 2010 at 12:11:33PM +0200, Avi Kivity wrote:
> On 03/21/2010 11:55 AM, Michael S. Tsirkin wrote:
>> On Fri, Mar 19, 2010 at 03:19:27PM -0700, Sridhar Samudrala wrote:
>>> When creating a guest with 2 virtio-net interfaces, I am running into an issue causing the 2nd i/f to fall back to userspace virtio even when vhost is enabled.
>>>
>>> After some debugging, it turned out that the KVM_IOEVENTFD ioctl() call in qemu is failing with ENOSPC. This is because of the NR_IOBUS_DEVS(6) limit in the kvm_io_bus_register_dev() routine in the host kernel.
>>>
>>> I think we need to increase this limit if we want to support multiple network interfaces using vhost-net. Is there an alternate solution?
>>>
>>> Thanks
>>> Sridhar
>>
>> Nothing easy that I can see. Each device needs 2 of these. Avi, Gleb, any objections to increasing the limit to say 16? That would give us 5 more devices to the limit of 6 per guest.
>
> Increase it to 200, then.

OK. I think we'll also need a smarter allocator than the bus->dev_count++ that we have now. Right?

> Is the limit visible to userspace? If not, we need to expose it.

I don't think it's visible: it seems to be used in a single place in kvm. Let's add an ioctl? Note that qemu doesn't need it now ...

> --
> error compiling committee.c: too many arguments to function
Re: Unable to create more than 1 guest virtio-net device using vhost-net backend
On Sun, Mar 21, 2010 at 12:11:33PM +0200, Avi Kivity wrote:
> On 03/21/2010 11:55 AM, Michael S. Tsirkin wrote:
>> On Fri, Mar 19, 2010 at 03:19:27PM -0700, Sridhar Samudrala wrote:
>>> When creating a guest with 2 virtio-net interfaces, I am running into an issue causing the 2nd i/f to fall back to userspace virtio even when vhost is enabled.
>>>
>>> After some debugging, it turned out that the KVM_IOEVENTFD ioctl() call in qemu is failing with ENOSPC. This is because of the NR_IOBUS_DEVS(6) limit in the kvm_io_bus_register_dev() routine in the host kernel.
>>>
>>> I think we need to increase this limit if we want to support multiple network interfaces using vhost-net. Is there an alternate solution?
>>>
>>> Thanks
>>> Sridhar
>>
>> Nothing easy that I can see. Each device needs 2 of these. Avi, Gleb, any objections to increasing the limit to say 16? That would give us 5 more devices to the limit of 6 per guest.
>
> Increase it to 200, then.

Currently on each device read/write we iterate over all registered devices. This is not scalable.

> Is the limit visible to userspace? If not, we need to expose it.
>
> --
> error compiling committee.c: too many arguments to function

--
Gleb.
Re: Unable to create more than 1 guest virtio-net device using vhost-net backend
On 03/21/2010 12:15 PM, Michael S. Tsirkin wrote:
>>> Nothing easy that I can see. Each device needs 2 of these. Avi, Gleb, any objections to increasing the limit to say 16? That would give us 5 more devices to the limit of 6 per guest.
>>
>> Increase it to 200, then.
>
> OK. I think we'll also need a smarter allocator than the bus->dev_count++ that we have now. Right?

No, why?

Eventually we'll want faster scanning than the linear search we employ now, though.

>> Is the limit visible to userspace? If not, we need to expose it.
>
> I don't think it's visible: it seems to be used in a single place in kvm. Let's add an ioctl? Note that qemu doesn't need it now ...

We usually expose limits via KVM_CHECK_EXTENSION(KVM_CAP_BLAH). We can expose it via KVM_CAP_IOEVENTFD (and need to reserve iodev entries for those).

--
error compiling committee.c: too many arguments to function
Re: Unable to create more than 1 guest virtio-net device using vhost-net backend
On 03/21/2010 12:21 PM, Gleb Natapov wrote:
> On Sun, Mar 21, 2010 at 12:11:33PM +0200, Avi Kivity wrote:
>> On 03/21/2010 11:55 AM, Michael S. Tsirkin wrote:
>>> On Fri, Mar 19, 2010 at 03:19:27PM -0700, Sridhar Samudrala wrote:
>>>> When creating a guest with 2 virtio-net interfaces, I am running into an issue causing the 2nd i/f to fall back to userspace virtio even when vhost is enabled.
>>>>
>>>> After some debugging, it turned out that the KVM_IOEVENTFD ioctl() call in qemu is failing with ENOSPC. This is because of the NR_IOBUS_DEVS(6) limit in the kvm_io_bus_register_dev() routine in the host kernel.
>>>>
>>>> I think we need to increase this limit if we want to support multiple network interfaces using vhost-net. Is there an alternate solution?
>>>>
>>>> Thanks
>>>> Sridhar
>>>
>>> Nothing easy that I can see. Each device needs 2 of these. Avi, Gleb, any objections to increasing the limit to say 16? That would give us 5 more devices to the limit of 6 per guest.
>>
>> Increase it to 200, then.
>
> Currently on each device read/write we iterate over all registered devices. This is not scalable.

Yeah. We need first to drop the callback based matching and replace it with explicit ranges, then to replace the search with a hash table for small ranges (keeping a linear search for large ranges, can happen for coalesced mmio).

--
error compiling committee.c: too many arguments to function
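Editorial sketch of the lookup scheme proposed in the reply above: devices register explicit address ranges; small ranges are entered address by address into a hash table, while large ranges stay on a list that is searched linearly. This is a toy model in Python, not KVM code; the class name IoBus, the SMALL_RANGE threshold of 16 and the string device handles are all assumptions made for illustration, and overlapping ranges are not handled.

```python
class IoBus:
    """Toy I/O bus: hash lookup for small ranges, linear scan for large ones."""

    SMALL_RANGE = 16  # hypothetical cutoff between "small" and "large"

    def __init__(self):
        self.by_addr = {}       # address -> device, for small ranges
        self.large_ranges = []  # (start, length, device) tuples

    def register(self, start, length, dev):
        """Register dev for addresses [start, start + length)."""
        if length <= self.SMALL_RANGE:
            for addr in range(start, start + length):
                self.by_addr[addr] = dev
        else:
            self.large_ranges.append((start, length, dev))

    def find(self, addr):
        """Return the device claiming addr, or None if nobody does."""
        dev = self.by_addr.get(addr)
        if dev is not None:
            return dev
        for start, length, large_dev in self.large_ranges:
            if start <= addr < start + length:
                return large_dev
        return None
```

In this model the cost of find() for an ioeventfd-style small range no longer grows with the number of registered devices; only accesses that miss the table fall through to the scan of the (presumably few) large ranges.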
[PATCH 1/2] KVM: x86 emulator: commit rflags as part of registers commit.
Make sure that rflags is committed only after successful instruction emulation.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |    1 +
 arch/x86/kvm/emulate.c             |    1 +
 arch/x86/kvm/x86.c                 |    8 ++++++--
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index b5e12c5..a1319c8 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -136,6 +136,7 @@ struct x86_emulate_ops {
 	ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
 	void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
 	int (*cpl)(struct kvm_vcpu *vcpu);
+	void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
 };
 
 /* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 266576c..c1aa983 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2968,6 +2968,7 @@ writeback:
 	/* Commit shadow register state. */
 	memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
 	kvm_rip_write(ctxt->vcpu, c->eip);
+	ops->set_rflags(ctxt->vcpu, ctxt->eflags);
 
 done:
 	return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bb9a24a..3fa70b3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3643,6 +3643,11 @@ static void emulator_set_segment_selector(u16 sel, int seg,
 	kvm_set_segment(vcpu, &kvm_seg, seg);
 }
 
+static void emulator_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+{
+	kvm_x86_ops->set_rflags(vcpu, rflags);
+}
+
 static struct x86_emulate_ops emulate_ops = {
 	.read_std            = kvm_read_guest_virt_system,
 	.write_std           = kvm_write_guest_virt_system,
@@ -3660,6 +3665,7 @@ static struct x86_emulate_ops emulate_ops = {
 	.get_cr              = emulator_get_cr,
 	.set_cr              = emulator_set_cr,
 	.cpl                 = emulator_get_cpl,
+	.set_rflags          = emulator_set_rflags,
 };
 
 static void cache_all_regs(struct kvm_vcpu *vcpu)
@@ -3780,8 +3786,6 @@ restart:
 		return EMULATE_DO_MMIO;
 	}
 
-	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
-
 	if (vcpu->mmio_is_write) {
 		vcpu->mmio_needed = 0;
 		return EMULATE_DO_MMIO;
-- 
1.6.5
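The idea behind this patch, committing flags together with the other shadow state and only on success, can be shown with a toy model. The sketch below is not the emulator's code: struct vcpu_state, emulate_one() and the two handlers are hypothetical names, and the whole state is copied wholesale just to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_ZF 0x40UL  /* ZF is bit 6 of EFLAGS */

/* Toy stand-in for the architectural state that the emulator shadows. */
struct vcpu_state {
	unsigned long regs[8];
	unsigned long rip;
	unsigned long rflags;
};

/* An "instruction" works only on the shadow copy; nonzero means failure. */
typedef int (*insn_fn)(struct vcpu_state *shadow);

static int emulate_one(struct vcpu_state *vcpu, insn_fn insn)
{
	struct vcpu_state shadow;
	int rc;

	assert(vcpu != NULL && insn != NULL);

	shadow = *vcpu;
	rc = insn(&shadow);
	if (rc != 0)
		return rc;  /* nothing is committed, rflags included */

	*vcpu = shadow;     /* registers, rip and rflags are committed together */
	return 0;
}

/* Hypothetical handler that succeeds: sets ZF and advances rip. */
static int insn_ok(struct vcpu_state *s)
{
	s->rflags |= FLAG_ZF;
	s->rip += 3;
	return 0;
}

/* Hypothetical handler that touches the flags and then fails. */
static int insn_fails_late(struct vcpu_state *s)
{
	s->rflags |= FLAG_ZF;
	return -1;
}
```

Running insn_fails_late through emulate_one() leaves the real rflags untouched, which is the property the patch aims for by moving the set_rflags call into the writeback path.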
[PATCH 2/2] KVM: x86 emulator: add decoding of CMPXCHG8B dst operand.
Decode CMPXCHG8B destination operand in decoding stage. Fixes regression introduced by the "If LOCK prefix is used dest arg should be memory" commit. This commit relies on the dst operand being decoded at the beginning of instruction emulation.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/emulate.c |   24 ++++++++++--------------
 1 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c1aa983..904351e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -52,6 +52,7 @@
 #define DstMem      (3<<1)	/* Memory operand. */
 #define DstAcc      (4<<1)	/* Destination Accumulator */
 #define DstDI       (5<<1)	/* Destination is in ES:(E)DI */
+#define DstMem64    (6<<1)	/* 64bit memory operand */
 #define DstMask     (7<<1)
 /* Source operand type. */
 #define SrcNone     (0<<4)	/* No source operand. */
@@ -360,7 +361,7 @@ static u32 group_table[] = {
 	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM | Lock,
 	DstMem | SrcImmByte | ModRM | Lock, DstMem | SrcImmByte | ModRM | Lock,
 	[Group9*8] =
-	0, ImplicitOps | ModRM | Lock, 0, 0, 0, 0, 0, 0,
+	0, DstMem64 | ModRM | Lock, 0, 0, 0, 0, 0, 0,
 };
 
 static u32 group2_table[] = {
@@ -1205,6 +1206,7 @@ done_prefixes:
 			   c->twobyte && (c->b == 0xb6 || c->b == 0xb7));
 		break;
 	case DstMem:
+	case DstMem64:
 		if ((c->d & ModRM) && c->modrm_mod == 3) {
 			c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
 			c->dst.type = OP_REG;
@@ -1214,7 +1216,10 @@ done_prefixes:
 		}
 		c->dst.type = OP_MEM;
 		c->dst.ptr = (unsigned long *)c->modrm_ea;
-		c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
+		if ((c->d & DstMask) == DstMem64)
+			c->dst.bytes = 8;
+		else
+			c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
 		c->dst.val = 0;
 		if (c->d & BitOp) {
 			unsigned long mask = ~(c->dst.bytes * 8 - 1);
@@ -1706,12 +1711,7 @@ static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt,
 			       struct x86_emulate_ops *ops)
 {
 	struct decode_cache *c = &ctxt->decode;
-	u64 old, new;
-	int rc;
-
-	rc = ops->read_emulated(c->modrm_ea, &old, 8, ctxt->vcpu);
-	if (rc != X86EMUL_CONTINUE)
-		return rc;
+	u64 old = c->dst.orig_val;
 
 	if (((u32) (old >> 0) != (u32) c->regs[VCPU_REGS_RAX]) ||
 	    ((u32) (old >> 32) != (u32) c->regs[VCPU_REGS_RDX])) {
@@ -1719,15 +1719,12 @@ static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt,
 		c->regs[VCPU_REGS_RAX] = (u32) (old >> 0);
 		c->regs[VCPU_REGS_RDX] = (u32) (old >> 32);
 		ctxt->eflags &= ~EFLG_ZF;
-
 	} else {
-		new = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
+		c->dst.val = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
 		       (u32) c->regs[VCPU_REGS_RBX];
 
-		rc = ops->cmpxchg_emulated(c->modrm_ea, &old, &new, 8, ctxt->vcpu);
-		if (rc != X86EMUL_CONTINUE)
-			return rc;
 		ctxt->eflags |= EFLG_ZF;
+		c->lock_prefix = 1;
 	}
 	return X86EMUL_CONTINUE;
 }
@@ -3241,7 +3238,6 @@ twobyte_insn:
 		rc = emulate_grp9(ctxt, ops);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
-		c->dst.type = OP_NONE;
 		break;
 	}
 	goto writeback;
-- 
1.6.5
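As background for the emulate_grp9() changes, the architectural behaviour of CMPXCHG8B can be written as a small standalone model. This is a sketch of the instruction semantics only, with made-up type and function names; it is deliberately non-atomic, so it corresponds to the form without a LOCK prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file holding just what CMPXCHG8B touches. */
struct cx8_regs {
	uint32_t eax, ebx, ecx, edx;
	bool zf;
};

/*
 * Compare EDX:EAX with the 64-bit memory operand.
 * Equal:     store ECX:EBX to memory and set ZF.
 * Not equal: load the memory value into EDX:EAX and clear ZF.
 */
static void cmpxchg8b_model(uint64_t *mem, struct cx8_regs *r)
{
	uint64_t expected;

	assert(mem != NULL && r != NULL);

	expected = ((uint64_t)r->edx << 32) | r->eax;
	if (*mem == expected) {
		*mem = ((uint64_t)r->ecx << 32) | r->ebx;
		r->zf = true;
	} else {
		r->eax = (uint32_t)(*mem >> 0);
		r->edx = (uint32_t)(*mem >> 32);
		r->zf = false;
	}
}
```

In the patch, the "equal" branch no longer writes memory itself: it places the new value in c->dst.val and leaves the store to the common writeback stage.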
Re: [PATCH 1/2] KVM: x86 emulator: commit rflags as part of registers commit.
Wrong To: header. Ignore please.

On Sun, Mar 21, 2010 at 01:06:02PM +0200, Gleb Natapov wrote:
> Make sure that rflags is committed only after successful instruction emulation.
>
> Signed-off-by: Gleb Natapov g...@redhat.com

--
			Gleb.
Tracking KVM development
Hey all,

I've recently started testing KVM as a possible virtualization solution for a bunch of servers, and so far things are going pretty well. My OS of choice is Slackware, and I usually just go with whatever kernel Slackware comes with.

But with KVM I feel I might need to pay a bit more attention to that part of Slackware, as it appears to be a project in rapid development, so my question is how best to track KVM and keep it up to date.

Currently I upgrade to the latest stable kernel almost as soon as it's been released by Linus, and I track qemu-kvm using this Git repository: git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git

But should I perhaps also track the KVM modules, and if so, from where?

Any and all suggestions for keeping a healthy and stable KVM setup running are more than welcome.

:o)
/Thomas
Time and KVM - best practices
Hey,

What is considered best practice when running a KVM host with a mixture of Linux and Windows guests?

Currently I have ntpd running on the host, and I start my guests using -rtc base=localtime,clock=host, with an extra -tdf added for Windows guests, just to keep their clocks from drifting madly under load.

But with this setup, all my guests are constantly 1-2 seconds behind the host. I can live with that for the Windows guests, as they are not running anything that depends heavily on the time being set perfectly, but for some of the Linux guests it's an issue.

Would I be better off using ntpd and -rtc base=localtime,clock=vm for all the Linux guests, or is there some other magic way of ensuring that the clock is perfectly in sync with the host? Perhaps there is some kernel configuration I can do to optimize the host for KVM?

I'm currently using QEMU PC emulator version 0.12.50 (qemu-kvm-devel) because version 0.12.30 did not work well at all with Windows guests, and the kernel in both host and Linux guests is 2.6.33.1.

:o)
/Thomas
Re: Unable to create more than 1 guest virtio-net device using vhost-net backend
On Sun, Mar 21, 2010 at 12:29:31PM +0200, Avi Kivity wrote:
> On 03/21/2010 12:15 PM, Michael S. Tsirkin wrote:
>>>> Nothing easy that I can see. Each device needs 2 of these. Avi, Gleb, any objections to increasing the limit to say 16? That would give us 5 more devices to the limit of 6 per guest.
>>> Increase it to 200, then.
>> OK. I think we'll also need a smarter allocator than the bus->dev_count++ we have now. Right?
> No, why?

We'll run into problems if devices are created/removed in random order, won't we?

> Eventually we'll want faster scanning than the linear search we employ now, though.

Yes, I suspect with 200 entries we will :). Let's just make it 16 for now?

>>> Is the limit visible to userspace? If not, we need to expose it.
>> I don't think it's visible: it seems to be used in a single place in kvm. Let's add an ioctl? Note that qemu doesn't need it now ...
> We usually expose limits via KVM_CHECK_EXTENSION(KVM_CAP_BLAH). We can expose it via KVM_CAP_IOEVENTFD (and need to reserve iodev entries for those).
>
> -- 
> error compiling committee.c: too many arguments to function
Re: Unable to create more than 1 guest virtio-net device using vhost-net backend
On 03/21/2010 01:34 PM, Michael S. Tsirkin wrote:
> On Sun, Mar 21, 2010 at 12:29:31PM +0200, Avi Kivity wrote:
>> On 03/21/2010 12:15 PM, Michael S. Tsirkin wrote:
>>>>> Nothing easy that I can see. Each device needs 2 of these. Avi, Gleb, any objections to increasing the limit to say 16? That would give us 5 more devices to the limit of 6 per guest.
>>>> Increase it to 200, then.
>>> OK. I think we'll also need a smarter allocator than the bus->dev_count++ we have now. Right?
>> No, why?
> We'll run into problems if devices are created/removed in random order, won't we?

unregister_dev() takes care of it.

>> Eventually we'll want faster scanning than the linear search we employ now, though.
> Yes, I suspect with 200 entries we will :). Let's just make it 16 for now?

Let's make it 200 and fix the performance problems later. Making it 16 is just asking for trouble.

-- 
error compiling committee.c: too many arguments to function
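To illustrate why a plain dev_count++ allocator can cope with devices going away in any order, here is a minimal sketch in which removal moves the last entry into the freed slot, so the array never develops holes. The names and the limit are assumptions for illustration; this is not a copy of the kernel's kvm_io_bus code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_IOBUS_DEVS 200  /* the limit proposed in the thread */

/* Hypothetical, simplified bus: a dense array plus a count. */
struct toy_io_bus {
	int dev_count;
	void *devs[NR_IOBUS_DEVS];
};

/* Append at the end; fails with -ENOSPC once the array is full. */
static int toy_bus_register_dev(struct toy_io_bus *bus, void *dev)
{
	assert(bus != NULL && dev != NULL);

	if (bus->dev_count >= NR_IOBUS_DEVS)
		return -ENOSPC;
	bus->devs[bus->dev_count++] = dev;
	return 0;
}

/* Fill the hole with the last entry so entries [0, dev_count) stay dense. */
static int toy_bus_unregister_dev(struct toy_io_bus *bus, void *dev)
{
	assert(bus != NULL && dev != NULL);

	for (int i = 0; i < bus->dev_count; i++) {
		if (bus->devs[i] == dev) {
			bus->devs[i] = bus->devs[--bus->dev_count];
			return 0;
		}
	}
	return -ENOENT;
}
```

Because the array stays dense, registration after any sequence of removals is still a simple append, at the cost of not preserving registration order.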
Re: Strange CPU usage pattern in SMP guest
On Sun, Mar 21, 2010 at 12:09:00PM +0200, Avi Kivity wrote:
> On 03/21/2010 02:13 AM, Sebastian Hetze wrote:
>> Hi *,
>>
>> in a 6-CPU SMP guest running on a host with 2 quad-core Intel Xeon E5520 with hyperthreading enabled we see one or more guest CPUs working in a very strange pattern. It looks like all or nothing. We can easily identify the affected CPU with xosview. Here is the mpstat output compared to one regular working CPU:
>>
>> mpstat -P 4 1
>> Linux 2.6.31-16-generic-pae (guest) 21.03.2010 _i686_ (6 CPU)
>> 00:45:19  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
>> 00:45:20    4   0,00 100,00   0,00    0,00   0,00   0,00   0,00   0,00   0,00
>> 00:45:21    4   0,00 100,00   0,00    0,00   0,00   0,00   0,00   0,00   0,00
>> 00:45:22    4   0,00 100,00   0,00    0,00   0,00   0,00   0,00   0,00   0,00
>> 00:45:23    4   0,00 100,00   0,00    0,00   0,00   0,00   0,00   0,00   0,00
>> 00:45:24    4   0,00  66,67   0,00    0,00   0,00  33,33   0,00   0,00   0,00
>> 00:45:25    4   0,00 100,00   0,00    0,00   0,00   0,00   0,00   0,00   0,00
>> 00:45:26    4   0,00 100,00   0,00    0,00   0,00   0,00   0,00   0,00   0,00
>
> Looks like the guest is only receiving 3-4 timer interrupts per second, so time becomes quantized. Please run the attached irqtop in the affected guest and report the results.
>
> Is the host overly busy? What host kernel, kvm, and qemu are you running? Is the guest running an I/O workload? if so, how are the disks

The host is not busy at all. In fact, currently it is running only one guest. The host is running an Ubuntu 2.6.31-14-server kernel. qemu-kvm is 0.12.2-0ubuntu6. The kvm module has srcversion: 82D6B673524596F9CF3E84C as stated by modinfo.

The guest occasionally runs an I/O workload. However, the effect is visible all the time, and it affects only one of the 6 CPUs the guest is running. This is the output on the guest for all CPUs:

mpstat -P ALL 1
12:45:59  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
12:46:00  all   0,40   9,74   2,39    5,37   0,80   3,98   0,00   0,00  77,34
12:46:00    0   1,00   5,00   6,00    3,00   1,00   9,00   0,00   0,00  75,00
12:46:00    1   0,00  23,00   2,00   10,00   0,00   0,00   0,00   0,00  65,00
12:46:00    2   0,00   5,94   0,99    6,93   0,00   1,98   0,00   0,00  84,16
12:46:00    3   0,00   8,00   2,00    5,00   2,00   9,00   0,00   0,00  74,00
12:46:00    4   0,00  33,33   0,00    0,00   0,00   0,00   0,00   0,00  66,67
12:46:00    5   0,00   5,94   0,00    3,96   0,00   0,99   0,00   0,00  89,11

12:46:00  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
12:46:01  all   0,60   5,81   3,21   24,45   0,40   3,61   0,00   0,00  61,92
12:46:01    0   1,01   4,04   7,07   31,31   1,01   6,06   0,00   0,00  49,49
12:46:01    1   0,00   5,00   2,00   19,00   0,00   2,00   0,00   0,00  72,00
12:46:01    2   0,99   7,92   1,98   35,64   0,00   2,97   0,00   0,00  50,50
12:46:01    3   1,98   4,95   2,97   13,86   0,00   6,93   0,00   0,00  69,31
12:46:01    4   0,00  33,33   0,00    0,00   0,00   0,00   0,00   0,00  66,67
12:46:01    5   0,00   8,08   3,03   22,22   0,00   1,01   0,00   0,00  65,66

12:46:01  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
12:46:02  all   2,38  12,70  17,06   14,68   0,60   1,98   0,00   0,00  50,60
12:46:02    0   3,96  15,84   9,90   13,86   0,00   2,97   0,00   0,00  53,47
12:46:02    1   2,97   6,93   5,94   19,80   2,97   2,97   0,00   0,00  58,42
12:46:02    2   2,02  17,17   8,08   18,18   2,02   1,01   0,00   0,00  51,52
12:46:02    3   2,02  10,10   8,08   14,14   0,00   2,02   0,00   0,00  63,64
12:46:02    4   0,00   0,00   0,00    0,00   0,00   0,00   0,00   0,00 100,00
12:46:02    5   0,00  13,00  55,00    6,00   0,00   1,00   0,00   0,00  25,00

12:46:02  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
12:46:03  all   0,20  11,35  10,96    8,96   0,40   2,99   0,00   0,00  65,14
12:46:03    0   1,00  11,00   7,00   15,00   0,00   1,00   0,00   0,00  65,00
12:46:03    1   0,00   7,14   2,04    6,12   1,02  11,22   0,00   0,00  72,45
12:46:03    2   0,00  15,00   1,00   12,00   0,00   1,00   0,00   0,00  71,00
12:46:03    3   0,00  11,00  23,00    8,00   0,00   0,00   0,00   0,00  58,00
12:46:03    4   0,00   0,00  50,00    0,00   0,00   0,00   0,00   0,00  50,00
12:46:03    5   0,00  13,00  20,00    4,00   0,00   1,00   0,00   0,00  62,00

So it is only CPU4 that is showing this strange behaviour.
Hi, may I ask for some help with paravirtualization in KVM?
I want to set up virtio-net for a guest OS on KVM. These are my steps:

1. Compile kvm-88, then run make and make install.
2. Compile the guest OS (Red Hat) kernel, version 2.6.27.45, with virtio support. All the required options are selected:
   o CONFIG_VIRTIO_PCI=y (Virtualization - PCI driver for virtio devices)
   o CONFIG_VIRTIO_BALLOON=y (Virtualization - Virtio balloon driver)
   o CONFIG_VIRTIO_BLK=y (Device Drivers - Block - Virtio block driver)
   o CONFIG_VIRTIO_NET=y (Device Drivers - Network device support - Virtio network driver)
   o CONFIG_VIRTIO=y (automatically selected)
   o CONFIG_VIRTIO_RING=y (automatically selected)
3. Start the guest OS with this command:
   x86_64-softmmu/qemu-system-x86_64 -m 1024 /root/redhat.img -net nic,model=virtio -net tap,script=/etc/kvm/qemu-ifup
4. The result:
   * The guest OS starts up.
   * The network does not come up; no ethX device is found.
   * lsmod | grep virtio lists no virtio modules.

Why does virtio_net not show up in the guest OS? Is one of my steps wrong, or am I missing some setting? I have read http://www.linux-kvm.org/page/Virtio but found no special requirement there.

Does anyone have some tips? Thanks in advance.

-- 
Best regards,
YangLiang
Department of Computer Science, School of Electronics Engineering & Computer Science
Re: Strange CPU usage pattern in SMP guest
On 03/21/2010 02:02 PM, Sebastian Hetze wrote:
> 12:46:02  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
> 12:46:03  all   0,20  11,35  10,96    8,96   0,40   2,99   0,00   0,00  65,14
> 12:46:03    0   1,00  11,00   7,00   15,00   0,00   1,00   0,00   0,00  65,00
> 12:46:03    1   0,00   7,14   2,04    6,12   1,02  11,22   0,00   0,00  72,45
> 12:46:03    2   0,00  15,00   1,00   12,00   0,00   1,00   0,00   0,00  71,00
> 12:46:03    3   0,00  11,00  23,00    8,00   0,00   0,00   0,00   0,00  58,00
> 12:46:03    4   0,00   0,00  50,00    0,00   0,00   0,00   0,00   0,00  50,00
> 12:46:03    5   0,00  13,00  20,00    4,00   0,00   1,00   0,00   0,00  62,00
>
> So it is only CPU4 that is showing this strange behaviour.

Can you adjust irqtop to only count cpu4? Or even just post a few 'cat /proc/interrupts' from that guest.

Most likely the timer interrupt for cpu4 died.

-- 
error compiling committee.c: too many arguments to function
Re: Tracking KVM development
On 03/21/2010 01:21 PM, Thomas Løcke wrote:
> Hey all,
>
> I've recently started testing KVM as a possible virtualization solution for a bunch of servers, and so far things are going pretty well. My OS of choice is Slackware, and I usually just go with whatever kernel Slackware comes with.
>
> But with KVM I feel I might need to pay a bit more attention to that part of Slackware, as it appears to be a project in rapid development, so my question is how best to track KVM and keep it up to date.
>
> Currently I upgrade to the latest stable kernel almost as soon as it's been released by Linus, and I track qemu-kvm using this Git repository: git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
>
> But should I perhaps also track the KVM modules, and if so, from where?
>
> Any and all suggestions for keeping a healthy and stable KVM setup running are more than welcome.

Tracking git repositories and stable setups are mutually exclusive. If you are interested in something stable I recommend staying with the distribution-provided setup (and picking a distribution that has an emphasis on kvm). If you want to track upstream, use qemu-kvm-0.12.x stable releases and kernel.org 2.6.x.y stable releases. If you want to track git repositories, use qemu-kvm.git and kvm.git for the kernel and kvm.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH] KVM: Drop KVM_REQ_PENDING_TIMER
On 03/20/2010 05:20 AM, Xiao Wang wrote:
> The pending timer is not detected through KVM_REQ_PENDING_TIMER now.

It is; see the commit message of 06e056456.

Marcelo, IIRC this is the second time we get this patch... we need either a comment in the code, or better, a fix that doesn't involve an atomic in the fast path.

-- 
error compiling committee.c: too many arguments to function
Re: [RFC] Unify KVM kernel-space and user-space code into a single project
On Thu, Mar 18, 2010 at 05:13:10PM +0100, Ingo Molnar wrote:
>> Why does Linux AIO still suck? Why do we not have a proper interface in userspace for doing asynchronous file system operations?
> Good that you mention it, I think it's an excellent example.
>
> The suckage of kernel async IO is for similar reasons: there's an ugly package separation problem between the kernel and between glibc - and between the apps that would make use of it.

No, kernel async IO sucks because it still does not play well with buffered I/O. Last time I checked (about a year ago or so), AIO syscall latencies were much worse when buffered I/O was used compared to direct I/O. Unfortunately, to achieve good performance with direct I/O, you need a HW RAID card with lots of on-board cache.

Gabor
Re: [PATCH 1/2] KVM: x86 emulator: commit rflags as part of registers commit.
On 03/21/2010 01:09 PM, Gleb Natapov wrote:
> Wrong To: header. Ignore please.

See sendemail.aliasesfile in 'git help send-email'.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH] KVM: x86: Fix 32-bit build breakage due to typo
On 03/20/2010 11:14 AM, Jan Kiszka wrote:
> Obviously, the 64-bit case is considered stable now and 32 bit remained untested (not included in autotest?).

We don't autotest on 32-bit hosts these days.

> So here is the build fix:

Thanks, applied. Should have done it myself.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH] KVM: Fix a build error
On 03/20/2010 07:17 PM, Amos Kong wrote:
> arch/x86/kvm/x86.c: In function ‘emulator_cmpxchg_emulated’:
> arch/x86/kvm/x86.c:3367: error: ‘u’ undeclared (first use in this function)
> arch/x86/kvm/x86.c:3367: error: (Each undeclared identifier is reported only once
> arch/x86/kvm/x86.c:3367: error: for each function it appears in.)
> arch/x86/kvm/x86.c:3367: error: expected expression before ‘)’ token

Thanks, just applied the same patch from Jan.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 1/2] KVM: x86 emulator: commit rflags as part of registers commit.
On 03/21/2010 04:35 PM, Gleb Natapov wrote:
> On Sun, Mar 21, 2010 at 04:32:42PM +0200, Avi Kivity wrote:
>> On 03/21/2010 01:09 PM, Gleb Natapov wrote:
>>> Wrong To: header. Ignore please.
>> See sendemail.aliasesfile in 'git help send-email'.
> I use aliasesfile, but unfortunately if an alias is not found there git does not complain; it just passes it as is to sendmail, and sendmail adds the part after @ by itself.

Ah. Then don't use sendmail.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 2/2] KVM: x86 emulator: add decoding of CMPXCHG8B dst operand.
On 03/21/2010 01:08 PM, Gleb Natapov wrote:
> Decode CMPXCHG8B destination operand in decoding stage. Fixes regression introduced by the "If LOCK prefix is used dest arg should be memory" commit. This commit relies on the dst operand being decoded at the beginning of instruction emulation.
>
> @@ -1719,15 +1719,12 @@ static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt,
>  		c->regs[VCPU_REGS_RAX] = (u32) (old >> 0);
>  		c->regs[VCPU_REGS_RDX] = (u32) (old >> 32);
>  		ctxt->eflags &= ~EFLG_ZF;
> -
>  	} else {
> -		new = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
> +		c->dst.val = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
>  		       (u32) c->regs[VCPU_REGS_RBX];
>
> -		rc = ops->cmpxchg_emulated(c->modrm_ea, &old, &new, 8, ctxt->vcpu);
> -		if (rc != X86EMUL_CONTINUE)
> -			return rc;
>  		ctxt->eflags |= EFLG_ZF;
> +		c->lock_prefix = 1;

Why is this bit needed? cmpxchg8b without lock is valid and racy, but the guest may know it is safe.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 2/2] KVM: x86 emulator: add decoding of CMPXCHG8B dst operand.
On Sun, Mar 21, 2010 at 04:41:24PM +0200, Avi Kivity wrote:
> On 03/21/2010 01:08 PM, Gleb Natapov wrote:
>> Decode CMPXCHG8B destination operand in decoding stage. Fixes regression introduced by the "If LOCK prefix is used dest arg should be memory" commit. This commit relies on the dst operand being decoded at the beginning of instruction emulation.
>>
>> @@ -1719,15 +1719,12 @@ static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt,
>>  		c->regs[VCPU_REGS_RAX] = (u32) (old >> 0);
>>  		c->regs[VCPU_REGS_RDX] = (u32) (old >> 32);
>>  		ctxt->eflags &= ~EFLG_ZF;
>> -
>>  	} else {
>> -		new = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
>> +		c->dst.val = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
>>  		       (u32) c->regs[VCPU_REGS_RBX];
>>
>> -		rc = ops->cmpxchg_emulated(c->modrm_ea, &old, &new, 8, ctxt->vcpu);
>> -		if (rc != X86EMUL_CONTINUE)
>> -			return rc;
>>  		ctxt->eflags |= EFLG_ZF;
>> +		c->lock_prefix = 1;
> Why is this bit needed? cmpxchg8b without lock is valid and racy, but the guest may know it is safe.

Agree. Before this patch cmpxchg8b emulation always called cmpxchg_emulated(), so to be extra careful I wanted to preserve the old behaviour. Resend the patch without this line?

--
			Gleb.
Re: [PATCH 2/2] KVM: x86 emulator: add decoding of CMPXCHG8B dst operand.
On 03/21/2010 04:44 PM, Gleb Natapov wrote:
> On Sun, Mar 21, 2010 at 04:41:24PM +0200, Avi Kivity wrote:
> > On 03/21/2010 01:08 PM, Gleb Natapov wrote:
> > > Decode CMPXCHG8B destination operand in decoding stage. Fixes
> > > regression introduced by "If LOCK prefix is used dest arg should be
> > > memory" commit. This commit relies on dst operand be decoded at the
> > > beginning of an instruction emulation.
> > >
> > > @@ -1719,15 +1719,12 @@ static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt,
> > >                  c->regs[VCPU_REGS_RAX] = (u32) (old >> 0);
> > >                  c->regs[VCPU_REGS_RDX] = (u32) (old >> 32);
> > >                  ctxt->eflags &= ~EFLG_ZF;
> > > -
> > >          } else {
> > > -                new = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
> > > +                c->dst.val = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
> > >                         (u32) c->regs[VCPU_REGS_RBX];
> > >
> > > -                rc = ops->cmpxchg_emulated(c->modrm_ea, &old, &new, 8, ctxt->vcpu);
> > > -                if (rc != X86EMUL_CONTINUE)
> > > -                        return rc;
> > >                  ctxt->eflags |= EFLG_ZF;
> > > +                c->lock_prefix = 1;
> >
> > Why is this bit needed? cmpxchg64b without lock is valid and racy, but
> > the guest may know it is safe.
>
> Agree. Before this patch cmpxchg8b emulation always called
> cmpxchg_emulated(), so to be extra careful I wanted to preserve old
> behaviour. Resend the patch without this line?

Better a 3/2 that removes it. So we have a large patch that just
transforms code, and a small patch that corrects an earlier bug. May
help a bisector one day.

--
error compiling committee.c: too many arguments to function
Re: Strange CPU usage pattern in SMP guest
On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote:
> On 03/21/2010 02:02 PM, Sebastian Hetze wrote:
> > 12:46:02  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
> > 12:46:03  all   0,20  11,35  10,96    8,96   0,40   2,99   0,00   0,00  65,14
> > 12:46:03    0   1,00  11,00   7,00   15,00   0,00   1,00   0,00   0,00  65,00
> > 12:46:03    1   0,00   7,14   2,04    6,12   1,02  11,22   0,00   0,00  72,45
> > 12:46:03    2   0,00  15,00   1,00   12,00   0,00   1,00   0,00   0,00  71,00
> > 12:46:03    3   0,00  11,00  23,00    8,00   0,00   0,00   0,00   0,00  58,00
> > 12:46:03    4   0,00   0,00  50,00    0,00   0,00   0,00   0,00   0,00  50,00
> > 12:46:03    5   0,00  13,00  20,00    4,00   0,00   1,00   0,00   0,00  62,00
> >
> > So it is only CPU4 that is showing this strange behaviour.
>
> Can you adjust irqtop to only count cpu4? or even just post a few 'cat
> /proc/interrupts' from that guest.
>
> Most likely the timer interrupt for cpu4 died.

I've added two keys +/- to your irqtop to focus up and down in the row
of available CPUs. The irqtop for CPU4 shows a constant number of 6
local timer interrupts per update, while the other CPUs show various
higher values:

irqtop for cpu 4

 eth0                               188
 Rescheduling interrupts            162
 Local timer interrupts               6
 ata_piix                             3
 TLB shootdowns                       1
 Spurious interrupts                  0
 Machine check exceptions             0

irqtop for cpu 5

 eth0                               257
 Local timer interrupts             251
 Rescheduling interrupts            237
 Spurious interrupts                  0
 Machine check exceptions             0

So the timer interrupt for cpu4 is not completely dead but somehow
broken. What can cause this problem? Any way to speed it up again?
#!/usr/bin/python

import curses
import sys, os, time, optparse

def read_interrupts():
    global target
    irq = {}
    # open() instead of file() so the script also runs on newer Pythons
    proc = open('/proc/interrupts')
    nrcpu = len(proc.readline().split())
    # target == 0 sums over all cpus; target == N selects cpu N-1
    if target < 0:
        target = 0
    if target > nrcpu:
        target = nrcpu
    for line in proc.readlines():
        vec, data = line.strip().split(':', 1)
        if vec in ('ERR', 'MIS'):
            continue
        counts = data.split(None, nrcpu)
        counts, rest = (counts[:-1], counts[-1])
        if target == 0:
            count = sum([int(x) for x in counts])
        else:
            count = int(counts[target-1])
        try:
            v = int(vec)
            name = rest.split(None, 1)[1]
        except:
            name = rest
        irq[name] = count
    return irq

def delta_interrupts():
    old = read_interrupts()
    while True:
        irq = read_interrupts()
        delta = {}
        for key in irq.keys():
            delta[key] = irq[key] - old[key]
        yield delta
        old = irq

target = 0
label_width = 35
number_width = 10

def tui(screen):
    curses.use_default_colors()
    global target
    curses.noecho()
    def getcount(x):
        return x[1]
    def refresh(irq):
        screen.erase()
        if target > 0:
            title = "irqtop for cpu %d" % (target-1)
        else:
            title = "irqtop sum for all cpu's"
        screen.addstr(0, 0, title)
        row = 2
        for name, count in sorted(irq.items(), key = getcount, reverse = True):
            if row >= screen.getmaxyx()[0]:
                break
            col = 1
            screen.addstr(row, col, name)
            col += label_width
            screen.addstr(row, col, '%10d' % (count,))
            row += 1
        screen.refresh()

    for irqs in delta_interrupts():
        refresh(irqs)
        curses.halfdelay(10)
        try:
            c = screen.getkey()
            if c == 'q':
                break
            if c == '+':
                target = target+1
            if c == '-':
                target = target-1
        except KeyboardInterrupt:
            break
        except curses.error:
            continue

# curses.wrapper is reachable from the curses package itself, so the
# separate "import curses.wrapper" is not needed
curses.wrapper(tui)
[PATCH] KVM: x86 emulator: fix unlocked CMPXCHG8B emulation.
When CMPXCHG8B is executed without LOCK prefix it is racy. Preserve this
behaviour in emulator too.

Signed-off-by: Gleb Natapov g...@redhat.com
---
This patch goes on top of my previous "KVM: x86 emulator: add decoding
of CMPXCHG8B dst operand." patch.

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 904351e..e2bbb9c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1724,7 +1724,6 @@ static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt,
                        (u32) c->regs[VCPU_REGS_RBX];

                 ctxt->eflags |= EFLG_ZF;
-                c->lock_prefix = 1;
         }
         return X86EMUL_CONTINUE;
 }
--
        Gleb.
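For readers following the CMPXCHG8B discussion, the instruction's architectural behaviour can be modelled in a few lines. This is only an illustrative Python sketch, not the emulator code from the patch: the function name `cmpxchg8b` and the dict standing in for guest memory are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF

def cmpxchg8b(mem, addr, eax, ebx, ecx, edx):
    """Model CMPXCHG8B m64 and return (zf, eax, edx).

    `mem` is a plain dict standing in for guest memory (an assumption
    of this sketch); `addr` is the key of the 64-bit operand.
    """
    old = mem[addr]
    expected = ((edx & MASK32) << 32) | (eax & MASK32)
    if old == expected:
        # Match: store ECX:EBX to memory and set ZF.
        mem[addr] = ((ecx & MASK32) << 32) | (ebx & MASK32)
        return 1, eax, edx
    # Mismatch: load the memory value into EDX:EAX and clear ZF.
    return 0, old & MASK32, (old >> 32) & MASK32
```

The read, compare and write above are three separate steps. Without a LOCK prefix nothing prevents another CPU from changing the operand between them, which is the racy behaviour the patch leaves in place.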
Re: Strange CPU usage pattern in SMP guest
On 03/21/2010 04:55 PM, Sebastian Hetze wrote:
> On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote:
> > On 03/21/2010 02:02 PM, Sebastian Hetze wrote:
> > > 12:46:02  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
> > > 12:46:03  all   0,20  11,35  10,96    8,96   0,40   2,99   0,00   0,00  65,14
> > > 12:46:03    0   1,00  11,00   7,00   15,00   0,00   1,00   0,00   0,00  65,00
> > > 12:46:03    1   0,00   7,14   2,04    6,12   1,02  11,22   0,00   0,00  72,45
> > > 12:46:03    2   0,00  15,00   1,00   12,00   0,00   1,00   0,00   0,00  71,00
> > > 12:46:03    3   0,00  11,00  23,00    8,00   0,00   0,00   0,00   0,00  58,00
> > > 12:46:03    4   0,00   0,00  50,00    0,00   0,00   0,00   0,00   0,00  50,00
> > > 12:46:03    5   0,00  13,00  20,00    4,00   0,00   1,00   0,00   0,00  62,00
> > >
> > > So it is only CPU4 that is showing this strange behaviour.
> >
> > Can you adjust irqtop to only count cpu4? or even just post a few 'cat
> > /proc/interrupts' from that guest.
> >
> > Most likely the timer interrupt for cpu4 died.
>
> I've added two keys +/- to your irqtop to focus up and down in the row
> of available CPUs. The irqtop for CPU4 shows a constant number of 6
> local timer interrupts per update, while the other CPUs show various
> higher values:
>
> irqtop for cpu 4
>
>  eth0                               188
>  Rescheduling interrupts            162
>  Local timer interrupts               6
>  ata_piix                             3
>  TLB shootdowns                       1
>  Spurious interrupts                  0
>  Machine check exceptions             0
>
> irqtop for cpu 5
>
>  eth0                               257
>  Local timer interrupts             251
>  Rescheduling interrupts            237
>  Spurious interrupts                  0
>  Machine check exceptions             0
>
> So the timer interrupt for cpu4 is not completely dead but somehow
> broken.

That is incredibly weird.

> What can cause this problem? Any way to speed it up again?

The host has 8 cpus and is only running this 6 vcpu guest, yes?

Can you confirm the other vcpus are ticking at 250 Hz?

What does 'top' show running on cpu 4? Pressing 'f' 'j' will add a
last-used-cpu field in the display.

Marcelo, any ideas?

--
error compiling committee.c: too many arguments to function
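One way to answer the "ticking at 250 Hz" question without a TUI is to diff the `LOC` (local timer interrupts) line of two `/proc/interrupts` snapshots. The sketch below is a hypothetical helper, not part of irqtop: the names `loc_counts` and `tick_rates` are invented here, and it assumes the x86 layout in which the first line lists the CPUs and the timer line is labelled `LOC`.

```python
def loc_counts(text):
    """Return the per-CPU 'LOC' counts from the text of /proc/interrupts."""
    lines = text.splitlines()
    # The header line has one column per online CPU.
    nrcpu = len(lines[0].split())
    for line in lines[1:]:
        vec, data = line.strip().split(':', 1)
        if vec.strip() == 'LOC':
            return [int(x) for x in data.split()[:nrcpu]]
    raise ValueError('no LOC line found')

def tick_rates(before, after, seconds):
    """Local timer interrupts per second for each CPU between two snapshots."""
    old, new = loc_counts(before), loc_counts(after)
    return [(n - o) / float(seconds) for o, n in zip(old, new)]
```

Reading `/proc/interrupts` twice, one second apart, and passing both strings to `tick_rates` gives one number per CPU. A CPU whose rate stays near 6 while the others vary up to roughly 250 would match the cpu 4 symptom reported in this thread.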
Re: Strange CPU usage pattern in SMP guest
On Sun, Mar 21, 2010 at 05:17:38PM +0200, Avi Kivity wrote:
> On 03/21/2010 04:55 PM, Sebastian Hetze wrote:
> > On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote:
> > > On 03/21/2010 02:02 PM, Sebastian Hetze wrote:
> > > > 12:46:02  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
> > > > 12:46:03  all   0,20  11,35  10,96    8,96   0,40   2,99   0,00   0,00  65,14
> > > > 12:46:03    0   1,00  11,00   7,00   15,00   0,00   1,00   0,00   0,00  65,00
> > > > 12:46:03    1   0,00   7,14   2,04    6,12   1,02  11,22   0,00   0,00  72,45
> > > > 12:46:03    2   0,00  15,00   1,00   12,00   0,00   1,00   0,00   0,00  71,00
> > > > 12:46:03    3   0,00  11,00  23,00    8,00   0,00   0,00   0,00   0,00  58,00
> > > > 12:46:03    4   0,00   0,00  50,00    0,00   0,00   0,00   0,00   0,00  50,00
> > > > 12:46:03    5   0,00  13,00  20,00    4,00   0,00   1,00   0,00   0,00  62,00
> > > >
> > > > So it is only CPU4 that is showing this strange behaviour.
> > >
> > > Can you adjust irqtop to only count cpu4? or even just post a few
> > > 'cat /proc/interrupts' from that guest.
> > >
> > > Most likely the timer interrupt for cpu4 died.
> >
> > I've added two keys +/- to your irqtop to focus up and down in the row
> > of available CPUs. The irqtop for CPU4 shows a constant number of 6
> > local timer interrupts per update, while the other CPUs show various
> > higher values:
> >
> > irqtop for cpu 4
> >
> >  eth0                               188
> >  Rescheduling interrupts            162
> >  Local timer interrupts               6
> >  ata_piix                             3
> >  TLB shootdowns                       1
> >  Spurious interrupts                  0
> >  Machine check exceptions             0
> >
> > irqtop for cpu 5
> >
> >  eth0                               257
> >  Local timer interrupts             251
> >  Rescheduling interrupts            237
> >  Spurious interrupts                  0
> >  Machine check exceptions             0
> >
> > So the timer interrupt for cpu4 is not completely dead but somehow
> > broken.
>
> That is incredibly weird.
>
> > What can cause this problem? Any way to speed it up again?
>
> The host has 8 cpus and is only running this 6 vcpu guest, yes?

The host is a dual quad core E5520 with hyperthreading enabled, so we
see 2x4x2=16 CPUs on the host. The guest is started with 6 CPUs.

> Can you confirm the other vcpus are ticking at 250 Hz?

The irqtop shows different numbers for local timer interrupts on the
other CPUs. The total number (summed up over all CPUs) varies between
something like 700 and 1400. Any CPU can be down to 10 and next update
up to 260. Only CPU4 stays at the 6 local timer interrupts.

> What does 'top' show running on cpu 4? Pressing 'f' 'j' will add a
> last-used-cpu field in the display.

The processes are not bound to a particular CPU, so the picture varies.
Here are two shots:

take1:

   15 root      RT  -5     0    0    0 S    0  0.0   0:01.70 4 migration/4
   16 root      15  -5     0    0    0 S    0  0.0   0:00.08 4 ksoftirqd/4
   17 root      RT  -5     0    0    0 S    0  0.0   0:00.00 4 watchdog/4
   25 root      15  -5     0    0    0 S    0  0.0   0:00.01 4 events/4
   35 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kintegrityd/4
   41 root      15  -5     0    0    0 S    0  0.0   0:00.03 4 kblockd/4
   50 root      15  -5     0    0    0 S    0  0.0   0:00.90 4 ata/4
   55 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kseriod
   66 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 aio/4
   73 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 crypto/4
   80 root      15  -5     0    0    0 S    0  0.0   2:11.71 4 scsi_eh_1
   87 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kmpathd/4
   95 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kondemand/4
  101 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kconservative/4
  103 root      10 -10     0    0    0 S    0  0.0   0:00.00 4 krfcommd
  681 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kdmflush
  686 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kdmflush
  691 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kdmflush
  737 root      15  -5     0    0    0 S    0  0.0   0:00.71 4 kjournald
  826 root      16  -4  2100  452  312 S    0  0.0   0:00.14 4 udevd
 1350 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kpsmoused
 1444 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kgameportd
 1718 root      15  -5     0    0    0 S    0  0.0   0:14.62 4 kjournald
 2108 statd     20   0  2252 1152  760 S    0  0.0   0:02.66 4 rpc.statd
 2117 root      15  -5     0    0    0 S    0  0.0   0:00.36 4 rpciod/4
 2123 root      15  -5     0    0    0 S    0  0.0   0:06.61 4 nfsiod
 2259 root      20   0  1696  444  440 S    0  0.0   0:00.00 4 getty
 2265 root      20   0  1696  444  440 S    0  0.0   0:00.00 4
Re: Tracking KVM development
On Sun, Mar 21, 2010 at 1:23 PM, Avi Kivity a...@redhat.com wrote:
> Tracking git repositories and stable setups are mutually exclusive. If
> you are interested in something stable I recommend staying with the
> distribution provided setup (and picking a distribution that has an
> emphasis on kvm). If you want to track upstream, use qemu-kvm-0.12.x
> stable releases and kernel.org 2.6.x.y stable releases. If you want to
> track git repositories, use qemu-kvm.git and kvm.git for the kernel and
> kvm.

Thanks Avi.

I will stay with the stable qemu-kvm releases and stable kernel.org
kernel releases from now on.

I've never heard of any KVM specific distributions. Are you aware of
any? My primary reason for going with Slackware, is because I already
know it. But if there are better choices for a KVM virtualization host,
then I'm willing to switch.

:o)
/Thomas
Re: Tracking KVM development
On 03/21/2010 06:37 PM, Thomas Løcke wrote:
> On Sun, Mar 21, 2010 at 1:23 PM, Avi Kivity a...@redhat.com wrote:
> > Tracking git repositories and stable setups are mutually exclusive.
> > If you are interested in something stable I recommend staying with
> > the distribution provided setup (and picking a distribution that has
> > an emphasis on kvm). If you want to track upstream, use
> > qemu-kvm-0.12.x stable releases and kernel.org 2.6.x.y stable
> > releases. If you want to track git repositories, use qemu-kvm.git and
> > kvm.git for the kernel and kvm.
>
> Thanks Avi.
>
> I will stay with the stable qemu-kvm releases and stable kernel.org
> kernel releases from now on.
>
> I've never heard of any KVM specific distributions. Are you aware of
> any? My primary reason for going with Slackware, is because I already
> know it. But if there are better choices for a KVM virtualization host,
> then I'm willing to switch.

The only kvm-specific distribution I know of is RHEV-H, but that's
probably not what you're looking for. I'm talking about distributions
that have an active kvm package maintainer, update the packages
regularly, have bug trackers that someone looks into, etc. At least
Fedora and Ubuntu do this, perhaps openSuSE as well (though the latter
has a stronger Xen emphasis).

--
error compiling committee.c: too many arguments to function
Re: unexpected exit_ini_info when nesting svm
Hello Olivier,

On Thu, Mar 18, 2010 at 08:43:53PM +0100, Olivier Berghmans wrote:
> I tried nesting kvm in kvm on an AMD processor with support for svm and
> npt (the dmesg told me both were in use). I managed to install the
> nested kvm and when starting the L2 guest in order to install an
> operating system, I got following messages in the L1 guest:
>
> [ 2016.712047] handle_exit: unexpected exit_ini_info 0x8008 exit_code 0x60
> [ 2031.432032] handle_exit: unexpected exit_ini_info 0x8008 exit_code 0x60
> [ 2034.468058] handle_exit: unexpected exit_ini_info 0x8008 exit_code 0x60

These messages result from a difference between real hardware SVM and
the SVM emulated by kvm. Hardware SVM always injects an exception first
before it does a #vmexit(0x60), while the SVM emulation does the #vmexit
again immediately. I have a patch to fix this but it needs more testing.
The patch detects the above situation and sends a self-IPI in this case.

        Joerg
Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
* Joerg Roedel j...@8bytes.org wrote:

> On Fri, Mar 19, 2010 at 09:21:22AM +0100, Ingo Molnar wrote:
> > Unfortunately, in a previous thread the Qemu maintainer has indicated
> > that he will essentially NAK any attempt to enhance Qemu to provide an
> > easily discoverable, self-contained, transparent guest mount on the
> > host side.
> >
> > No technical justification was given for that NAK, despite my repeated
> > requests to particulate the exact security problems that such an
> > approach would cause.
> >
> > If that NAK does not stand in that form then i'd like to know about it
> > - it makes no sense for us to try to code up a solution against a
> > standing maintainer NAK ...
>
> I still think it is the best and most generic way to let the guest do
> the symbol resolution. [...]

Not really.

> [...] This has several advantages:
>
> 1. The guest knows best about its symbol space. So this would be
>    extensible to other guest operating systems. A brave developer may
>    even implement symbol passing for Windows or the BSDs ;-)

Having access to the actual executable files that include the symbols
achieves precisely that - with the additional robustness that all this
functionality is concentrated into the host, while the guest side is
kept minimal (and transparent).

> 2. The guest can decide for its own if it want to pass this information
>    to the host-perf. No security issues at all.

It can decide whether it exposes the files. Nor are there any security
issues to begin with.

> 3. The guest can also pass us the call-chain and we don't need to care
>    about complicated of fetching from the guest ourself.

You need to be aware of the fact that symbol resolution is a separate
step from call chain generation.

I.e. call-chains are a (entirely) separate issue, and could reasonably
be done in the guest or in the host.

It has no bearing on this symbol resolution question.

> 4. This way extensible to nested virtualization too.

Nested virtualization is actually already taken care of by the
filesystem solution via an existing method called 'subdirectories'. If
the guest offers sub-guests then those symbols will be exposed in a
similar way via its own 'guest files' directory hierarchy.

I.e. if we have 'Guest-2' nested inside the 'Guest-Fedora-1' instance,
we get:

    /guests/
    /guests/Guest-Fedora-1/etc/
    /guests/Guest-Fedora-1/usr/

we'd also have:

    /guests/Guest-Fedora-1/guests/Guest-2/

So this is taken care of automatically.

I.e. none of the four 'advantages' listed here are actually advantages
over my proposed solution, so your conclusion is subsequently flawed as
well.

> How we speak to the guest was already discussed in this thread. My
> personal opinion is that going through qemu is an unnecessary step and
> we can solve that more clever and transparent for perf.

Meaning exactly what?

Thanks,

        Ingo
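The nested directory layout proposed above is easy to express as a path rule. The following is a small Python sketch of that rule only; the `/guests` hierarchy is a proposal made in this thread rather than an existing interface, and the function name `guest_root` is hypothetical.

```python
import posixpath

def guest_root(chain, base='/guests'):
    """Build the host-side directory for a (possibly nested) guest.

    `chain` lists guest names from the outermost guest inwards. Each
    nesting level adds another 'guests/<name>' component, following the
    layout described in the mail above.
    """
    path = base
    for depth, name in enumerate(chain):
        if depth > 0:
            # A nested guest lives under its parent's own 'guests' directory.
            path = posixpath.join(path, 'guests')
        path = posixpath.join(path, name)
    return path + '/'
```

For example, `guest_root(['Guest-Fedora-1', 'Guest-2'])` yields the nested path quoted in the mail, and a tool that resolves symbols for one guest level could apply the same lookup one level further down.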
Re: [RFC] Unify KVM kernel-space and user-space code into a single project
* Avi Kivity a...@redhat.com wrote:

[...]

> > > Second, from my point of view all contributors are volunteers
> > > (perhaps their employer volunteered them, but there's no difference
> > > from my perspective). Asking them to repaint my apartment as a
> > > condition to get a patch applied is abuse. If a patch is good, it
> > > gets applied.
> >
> > This is one of the weirdest arguments i've seen in this thread. Almost
> > all the time do we make contributions conditional on the general shape
> > of the project. Developers dont get to do just the fun stuff.
>
> So, do you think a reply to a patch along the lines of
>
>   NAK. Improving scalability is pointless while we don't have a decent
>   GUI. I'll review you RCU patches _after_ you've contributed a usable
>   GUI.
>
> ?

What does this have to do with RCU?

I'm talking about KVM, which is a Linux kernel feature that is useless
without a proper, KVM-specific app making use of it.

RCU is a general kernel performance feature that works across the board.
It helps KVM indirectly, and it helps many other kernel subsystems as
well. It needs no user-space tool to be useful.

KVM on the other hand is useless without a user-space tool.

[ Theoretically you might have a fair point if it were a critical
  feature of RCU for it to have a GUI, and if the main tool that made
  use of it sucked. But it isnt and you should know that. ]

Had you suggested the following 'NAK', applied to a different, relevant
subsystem:

  | NAK. Improving scalability is pointless while we don't have a usable
  | tool. I'll review you perf patches _after_ you've contributed a usable
  | tool.

you would have a fair point. In fact, we are doing that we are living by
that. It makes absolutely zero sense to improve the scalability of perf
if its usability sucks.

So where you are trying to point out an inconsistency in my argument
there is none.

> > This is a basic quid pro quo: new features introduce risks and create
> > additional workload not just to the originating developer but on the
> > rest of the community as well. You should check how Linus has pulled
> > new features in the past 15 years: he very much requires the existing
> > code to first be top-notch before he accepts new features for a given
> > area of functionality.
>
> For a given area, yes. [...]

That is my precise point.

KVM is a specific subsystem or area that makes no sense without the
user-space tooling it relates to. You seem to argue that you have no
'right' to insist on good quality of that tooling - and IMO you are
fundamentally wrong with that.

Thanks,

        Ingo
Re: [RFC] Unify KVM kernel-space and user-space code into a single project
* Anthony Liguori anth...@codemonkey.ws wrote:

> On 03/19/2010 03:53 AM, Ingo Molnar wrote:
> > * Avi Kivity a...@redhat.com wrote:
> >
> > > > There were two negative reactions immediately, both showed a
> > > > fundamental server versus desktop bias:
> > > >
> > > >  - you did not accept that the most important usecase is when
> > > >    there is a single guest running.
> > >
> > > Well, it isn't.
> >
> > Erm, my usability points are _doubly_ true when there are multiple
> > guests ...
> >
> > The inconvenience of having to type:
> >
> >    perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
> >      --guestmodules=/home/ymzhang/guest/modules top
> >
> > is very obvious even with a single guest. Now multiply that by more
> > guests ...
>
> If you want to improve this, you need to do the following:
>
> 1) Add a userspace daemon that uses vmchannel that runs in the guest and
>    can fetch kallsyms and arbitrary modules. If that daemon lives in
>    tools/perf, that's fine.

Adding any new daemon to an existing guest is a deployment and usability
nightmare.

The basic rule of good instrumentation is to be transparent. The moment
we have to modify the user-space of a guest just to monitor it, the
purpose of transparent instrumentation is defeated.

That was one of the fundamental usability mistakes of Oprofile.

There is no 'perf' daemon - all the perf functionality is _built in_,
and for very good reasons. It is one of the main reasons for perf's
success as well.

Now Qemu is trying to repeat that stupid mistake ...

So please either suggest a different transparent solution that is
technically better than the one i suggested, or you should concede the
point really.

Please try think with the heads of our users and developers and dont
suggest some weird ivory-tower design that is totally impractical ...

And no, you have to code none of this, we'll do all the coding. The only
thing we are asking is for you to not stand in the way of good
usability ...

Thanks,

        Ingo
Streaming Audio from Virtual Machine
I'm using Kubuntu 9.10 32-bit on a quad-core Phenom II with Gigabit
ethernet. I want to stream audio from MLB.com from a WinXP client thru a
Linksys WMB54G wireless music bridge. Note that there are drivers for
the WMB54G only for WinXP and Vista. If I stream the audio thru a native
WinXP box thru the WMB54G, all is well and the audio sounds fine. When I
try to stream thru a WinXP virtual machine on Kubuntu 9.10, the audio is
poor quality and subject to gaps and dropping the stream altogether. So
far I've tried KVM/QEMU and VirtualBox, same result.

Regarding KVM/QEMU, I note AMD-V is activated in the BIOS, and I have a
custom 2.6.32.7 kernel, and QEMU 0.11.0. The kvm and kvm_amd modules are
compiled in and loaded. I've been using bridged networking. I think it's
set up correctly but I confess I'm no networking expert. My start
command for the WinXP virtual machine is:

sudo /usr/bin/qemu -m 1024 -boot c \
    -net nic,vlan=0,macaddr=00:d0:13:b0:2d:32,model=rtl8139 \
    -net tap,vlan=0,ifname=tap0,script=/etc/qemu-ifup \
    -localtime -soundhw ac97 -smp 4 -fda /dev/fd0 -vga std -usb \
    /home/rbroman/windows.img

I also tried model=virtio but that didn't help.

I suspect this is a virtual machine networking problem but I'm not sure.
So my questions are:

- What's the best/fastest networking option and how do I set it up?
  Pointers to step-by-step instructions appreciated.
- Is it possible I have a problem other than networking? Configuration
  problem with KVM/QEMU? Or could there be a problem with the WMB54G
  driver when used thru a virtual machine?
- Is there a better virtual machine solution than KVM/QEMU for what I'm
  trying to do?

Recommendations appreciated - Gus
Re: [RFC] Unify KVM kernel-space and user-space code into a single project
On 03/22/2010 02:17 AM, Ingo Molnar wrote: * Anthony Liguorianth...@codemonkey.ws wrote: On 03/19/2010 03:53 AM, Ingo Molnar wrote: * Avi Kivitya...@redhat.com wrote: There were two negative reactions immediately, both showed a fundamental server versus desktop bias: - you did not accept that the most important usecase is when there is a single guest running. Well, it isn't. Erm, my usability points are _doubly_ true when there are multiple guests ... The inconvenience of having to type: perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \ --guestmodules=/home/ymzhang/guest/modules top is very obvious even with a single guest. Now multiply that by more guests ... If you want to improve this, you need to do the following: 1) Add a userspace daemon that uses vmchannel that runs in the guest and can fetch kallsyms and arbitrary modules. If that daemon lives in tools/perf, that's fine. Adding any new daemon to an existing guest is a deployment and usability nightmare. Absolutely. In most cases it is not desirable, and you'll find that in a lot of cases it is not even possible - for non-technical reasons. One of the main benefits of virtualization is the ability to manage and see things from the outside. The basic rule of good instrumentation is to be transparent. The moment we have to modify the user-space of a guest just to monitor it, the purpose of transparent instrumentation is defeated. Not to mention Heisenbugs and interference. Cheers Antoine That was one of the fundamental usability mistakes of Oprofile. There is no 'perf' daemon - all the perf functionality is _built in_, and for very good reasons. It is one of the main reasons for perf's success as well. Now Qemu is trying to repeat that stupid mistake ... So please either suggest a different transparent solution that is technically better than the one i suggested, or you should concede the point really. 
Please try think with the heads of our users and developers and dont suggest some weird ivory-tower design that is totally impractical ... And no, you have to code none of this, we'll do all the coding. The only thing we are asking is for you to not stand in the way of good usability ... Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Unify KVM kernel-space and user-space code into a single project
* Antoine Martin anto...@nagafix.co.uk wrote: On 03/22/2010 02:17 AM, Ingo Molnar wrote: * Anthony Liguorianth...@codemonkey.ws wrote: On 03/19/2010 03:53 AM, Ingo Molnar wrote: * Avi Kivitya...@redhat.com wrote: There were two negative reactions immediately, both showed a fundamental server versus desktop bias: - you did not accept that the most important usecase is when there is a single guest running. Well, it isn't. Erm, my usability points are _doubly_ true when there are multiple guests ... The inconvenience of having to type: perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \ --guestmodules=/home/ymzhang/guest/modules top is very obvious even with a single guest. Now multiply that by more guests ... If you want to improve this, you need to do the following: 1) Add a userspace daemon that uses vmchannel that runs in the guest and can fetch kallsyms and arbitrary modules. If that daemon lives in tools/perf, that's fine. Adding any new daemon to an existing guest is a deployment and usability nightmare. Absolutely. In most cases it is not desirable, and you'll find that in a lot of cases it is not even possible - for non-technical reasons. One of the main benefits of virtualization is the ability to manage and see things from the outside. The basic rule of good instrumentation is to be transparent. The moment we have to modify the user-space of a guest just to monitor it, the purpose of transparent instrumentation is defeated. Not to mention Heisenbugs and interference. Correct. Frankly, i was surprised (and taken slightly off base) by both Avi and Anthony suggesting such a clearly inferior add a demon to the guest space solution. It's a usability and deployment non-starter. Furthermore, allowing a guest to integrate/mount its files into the host VFS space (which was my suggestion) has many other uses and advantages as well, beyond the instrumentation/symbol-lookup purpose. 
So can we please have some resolution here and move on: the KVM maintainers should either suggest a different transparent approach, or should retract the NAK for the solution we suggested. We very much want to make progress and want to write code, but obviously we cannot code against a maintainer NAK, nor can we code up an inferior solution either. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Unify KVM kernel-space and user-space code into a single project
On 03/21/2010 09:17 PM, Ingo Molnar wrote:
> Adding any new daemon to an existing guest is a deployment and usability
> nightmare.

The logical conclusion of that is that everything should be built into
the kernel. Where a failure brings the system down or worse. Where you
have to bear the memory footprint whether you ever use the functionality
or not. Where to update the functionality you need to deploy a new
kernel (possibly introducing unrelated bugs) and reboot.

If userspace daemons are such a deployment and usability nightmare,
maybe we should fix that instead.

> The basic rule of good instrumentation is to be transparent. The moment
> we have to modify the user-space of a guest just to monitor it, the
> purpose of transparent instrumentation is defeated.

You have to modify the guest anyway by deploying a new kernel.

> Please try think with the heads of our users and developers and dont
> suggest some weird ivory-tower design that is totally impractical ...

inetd.d style 'drop a listener config here and it will be executed on
connection' should work. The listener could come with the kernel
package, though I don't think it's a good idea. module-init-tools
doesn't and people have survived somehow.

> And no, you have to code none of this, we'll do all the coding. The only
> thing we are asking is for you to not stand in the way of good
> usability ...

Thanks.

--
Do not meddle in the internals of kernels, for they are subtle and quick
to panic.
Re: [RFC] Unify KVM kernel-space and user-space code into a single project
On Sun, Mar 21, 2010 at 10:01:51PM +0200, Avi Kivity wrote:
> On 03/21/2010 09:17 PM, Ingo Molnar wrote:
> > Adding any new daemon to an existing guest is a deployment and
> > usability nightmare.
>
> The logical conclusion of that is that everything should be built into
> the kernel. Where a failure brings the system down or worse. Where you
> have to bear the memory footprint whether you ever use the functionality
> or not. Where to update the functionality you need to deploy a new
> kernel (possibly introducing unrelated bugs) and reboot.
>
> If userspace daemons are such a deployment and usability nightmare,
> maybe we should fix that instead.

Which userspace? Deploying *anything* in the guest can be a nightmare,
including paravirt drivers if you don't have a natively supported in the
OS virtual hardware backoff. Deploying things in the host OTOH is
business as usual.

And you're smart enough to know that.

  OG.
Re: Tracking KVM development
On 21.03.2010, at 17:42, Avi Kivity wrote:
> On 03/21/2010 06:37 PM, Thomas Løcke wrote:
>> On Sun, Mar 21, 2010 at 1:23 PM, Avi Kivity <a...@redhat.com> wrote:
>>> Tracking git repositories and stable setups are mutually exclusive. If you are interested in something stable I recommend staying with the distribution provided setup (and picking a distribution that has an emphasis on kvm). If you want to track upstream, use qemu-kvm-0.12.x stable releases and kernel.org 2.6.x.y stable releases. If you want to track git repositories, use qemu-kvm.git and kvm.git for the kernel and kvm.
>>
>> Thanks Avi. I will stay with the stable qemu-kvm releases and stable kernel.org kernel releases from now on. I've never heard of any KVM specific distributions. Are you aware of any? My primary reason for going with Slackware, is because I already know it. But if there are better choices for a KVM virtualization host, then I'm willing to switch.
>
> The only kvm-specific distribution I know of is RHEV-H, but that's probably not what you're looking for. I'm talking about distributions that have an active kvm package maintainer, update the packages regularly, have bug trackers that someone looks into, etc. At least Fedora and Ubuntu do this, perhaps openSuSE as well (though the latter has a stronger Xen emphasis).

Yes, we do. Though openSUSE 11.2 isn't exactly where I want it to be. Expect 11.3 to be a lot better there.

Alex
Re: [RFC] Unify KVM kernel-space and user-space code into a single project
On 03/21/2010 09:59 PM, Ingo Molnar wrote:
> Frankly, i was surprised (and taken slightly off base) by both Avi and Anthony suggesting such a clearly inferior "add a daemon to the guest space" solution. It's a usability and deployment non-starter.

It's only clearly inferior if you ignore every consideration against it. It's definitely not a deployment non-starter, see the tons of daemons that come with any Linux system. The basic ones are installed and enabled automatically during system installation.

> Furthermore, allowing a guest to integrate/mount its files into the host VFS space (which was my suggestion) has many other uses and advantages as well, beyond the instrumentation/symbol-lookup purpose.

Yes. I'm just not sure about the auto-enabling part.

> So can we please have some resolution here and move on: the KVM maintainers should either suggest a different transparent approach, or should retract the NAK for the solution we suggested.

So long as you define 'transparent' as in 'only the guest kernel is involved' or even 'only the guest and host kernels are involved' we aren't going to make a lot of progress. I oppose shoving random bits of functionality into the kernel, especially things that are in daily use. While us developers do and will use profiling extensively, it doesn't need to sit in every guest's non-swappable .text.

> We very much want to make progress and want to write code, but obviously we cannot code against a maintainer NAK, nor can we code up an inferior solution either.

You haven't heard any NAKs, only objections. If we discuss things perhaps we can achieve something that works for everyone. If we keep turning the flames higher that's unlikely.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [RFC] Unify KVM kernel-space and user-space code into a single project
On 03/21/2010 10:08 PM, Olivier Galibert wrote:
> On Sun, Mar 21, 2010 at 10:01:51PM +0200, Avi Kivity wrote:
>> On 03/21/2010 09:17 PM, Ingo Molnar wrote:
>>> Adding any new daemon to an existing guest is a deployment and usability nightmare.
>>
>> The logical conclusion of that is that everything should be built into the kernel. Where a failure brings the system down or worse. Where you have to bear the memory footprint whether you ever use the functionality or not. Where to update the functionality you need to deploy a new kernel (possibly introducing unrelated bugs) and reboot. If userspace daemons are such a deployment and usability nightmare, maybe we should fix that instead.
>
> Which userspace? Deploying *anything* in the guest can be a nightmare, including paravirt drivers if you don't have a natively supported in the OS virtual hardware backoff.

That includes the guest kernel. If you can deploy a new kernel in the guest, presumably you can deploy a userspace package.

> Deploying things in the host OTOH is business as usual.

True.

> And you're smart enough to know that.

Thanks.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
CONFIG_HAVE_KVM=n impossible?
Hello,

does anybody know why it seems to be impossible to build a kernel with CONFIG_HAVE_KVM=n? It always switches back to y with every kernel build and I have no clue why. I'm using 2.6.33 vanilla.

regards
Roland
Re: Tracking KVM development
Avi Kivity wrote:
[]
> The only kvm-specific distribution I know of is RHEV-H, but that's probably not what you're looking for. I'm talking about distributions that have an active kvm package maintainer, update the packages regularly, have bug trackers that someone looks into, etc. At least Fedora and Ubuntu do this, perhaps openSuSE as well (though the latter has a stronger Xen emphasis).

Debian is a lot better on this front than it used to be a year ago. At least I'm trying to look for the bugreports on a regular basis ;)

/mjt
Re: CONFIG_HAVE_KVM=n impossible?
devz...@web.de wrote:
> Hello, does anybody know why it seems that it's not possible to build a kernel with CONFIG_HAVE_KVM=n ? It always switches back to y with every kernel build and i have no clue, why.

It's an internal config symbol which is not visible in the menu system and is always set unconditionally based on the platform. Just like CONFIG_HAVE_MMU. You want a different symbol, like CONFIG_KVM.

/mjt
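
For readers following along, the relationship described in this reply can be sketched as a Kconfig fragment. This is a simplified illustration, not a verbatim copy of the 2.6.33 tree: file locations, help texts and the other dependencies of each symbol are omitted, and the exact layout in any given kernel version is an assumption.

```kconfig
# Simplified sketch; surrounding options and file locations omitted.

# Internal capability symbol. It has no prompt string, so the menu
# system never offers it, and a hand-edited CONFIG_HAVE_KVM=n in
# .config is recomputed the next time the configuration is processed.
config HAVE_KVM
	bool

# The architecture selects the capability unconditionally.
config X86
	def_bool y
	select HAVE_KVM

# The user-visible option. This is the one to switch off.
config KVM
	tristate "Kernel-based Virtual Machine (KVM) support"
	depends on HAVE_KVM
```

Under this reading, CONFIG_HAVE_KVM=y only records that the platform is able to support KVM. Whether any KVM code is built is decided by CONFIG_KVM, which shows up in .config as "# CONFIG_KVM is not set" once it has been disabled in the menu.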