date:20100321

[COMMIT master] KVM: x86: ignore access permissions for hypercall patching

2010-03-21 Thread Avi Kivity

From: Marcelo Tosatti mtosa...@redhat.com

Ignore access permissions while patching hypercall instructions.
Otherwise KVM injects a page fault when trying to patch vmcall
on read-only text regions:

Freeing initrd memory: 8843k freed
Freeing unused kernel memory: 660k freed
Write protecting the kernel text: 4780k
Write protecting the kernel read-only data: 1912k
BUG: unable to handle kernel paging request at c01292e3
IP: [<c01292e3>] kvm_leave_lazy_mmu+0x43/0x70
*pde = 00910067 *pte = 00129161
Oops: 0003 [#1] SMP

CC: sta...@kernel.org
Reported-and-Tested-by: Stefan Bader stefan.ba...@canonical.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bcf52d1..9d02cc7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3226,12 +3226,17 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 static int emulator_write_emulated_onepage(unsigned long addr,
   const void *val,
   unsigned int bytes,
-  struct kvm_vcpu *vcpu)
+  struct kvm_vcpu *vcpu,
+  bool guest_initiated)
 {
gpa_t gpa;
u32 error_code;
 
-   gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
+
+   if (guest_initiated)
+   gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
+   else
+   gpa = kvm_mmu_gva_to_gpa_system(vcpu, addr, &error_code);
 
if (gpa == UNMAPPED_GVA) {
kvm_inject_page_fault(vcpu, addr, error_code);
@@ -3262,24 +3267,35 @@ mmio:
return X86EMUL_CONTINUE;
 }
 
-int emulator_write_emulated(unsigned long addr,
+int __emulator_write_emulated(unsigned long addr,
   const void *val,
   unsigned int bytes,
-  struct kvm_vcpu *vcpu)
+  struct kvm_vcpu *vcpu,
+  bool guest_initiated)
 {
/* Crossing a page boundary? */
if (((addr + bytes - 1) ^ addr) & PAGE_MASK) {
int rc, now;
 
now = -addr & ~PAGE_MASK;
-   rc = emulator_write_emulated_onepage(addr, val, now, vcpu);
+   rc = emulator_write_emulated_onepage(addr, val, now, vcpu,
+guest_initiated);
if (rc != X86EMUL_CONTINUE)
return rc;
addr += now;
val += now;
bytes -= now;
}
-   return emulator_write_emulated_onepage(addr, val, bytes, vcpu);
+   return emulator_write_emulated_onepage(addr, val, bytes, vcpu,
+  guest_initiated);
+}
+
+int emulator_write_emulated(unsigned long addr,
+  const void *val,
+  unsigned int bytes,
+  struct kvm_vcpu *vcpu)
+{
+   return __emulator_write_emulated(addr, val, bytes, vcpu, true);
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
@@ -3970,7 +3986,7 @@ int kvm_fix_hypercall(struct kvm_vcpu *vcpu)
 
kvm_x86_ops->patch_hypercall(vcpu, instruction);
 
-   return emulator_write_emulated(rip, instruction, 3, vcpu);
+   return __emulator_write_emulated(rip, instruction, 3, vcpu, false);
 }
 
 static u64 mk_cr_64(u64 curr_cr, u32 new_val)
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

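A minimal sketch of the idea behind the hypercall patching change above:
a guest-initiated write honours the page's write permission, while a
hypervisor-initiated write (such as patching vmcall into read-only text)
only needs the page to be mapped. All toy_* names are hypothetical and
are not KVM APIs; the real translation is done by
kvm_mmu_gva_to_gpa_write() and kvm_mmu_gva_to_gpa_system().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for "no translation", loosely modelled on UNMAPPED_GVA. */
#define TOY_UNMAPPED UINT64_MAX

/* Hypothetical, heavily simplified page table entry. */
struct toy_pte {
    bool present;
    bool writable;
    uint64_t gpa;
};

/*
 * Translate for a write. When guest_initiated is true the write permission
 * is enforced; when false (hypervisor patching) only presence is checked.
 */
static uint64_t toy_translate_write(const struct toy_pte *pte,
                                    bool guest_initiated)
{
    assert(pte != NULL);

    if (!pte->present)
        return TOY_UNMAPPED;
    if (guest_initiated && !pte->writable)
        return TOY_UNMAPPED;
    return pte->gpa;
}
```

With a present but read-only entry, the guest-initiated path reports a
fault while the patching path still yields the physical address.
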
[COMMIT master] KVM: ia64: fix the error code of ioctl KVM_IA64_VCPU_GET_STACK failure

2010-03-21 Thread Avi Kivity

From: Wei Yongjun yj...@cn.fujitsu.com

The ioctl KVM_IA64_VCPU_GET_STACK does not set the error code if
copy_to_user() fails, so 0 is returned. Return -EFAULT instead of 0
in this case.

Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 26e0e08..bc07c81 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1535,8 +1535,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
goto out;
 
if (copy_to_user(user_stack, stack,
-sizeof(struct kvm_ia64_vcpu_stack)))
+sizeof(struct kvm_ia64_vcpu_stack))) {
+   r = -EFAULT;
goto out;
+   }
 
break;
}

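A small sketch of the error path fixed above, assuming a user-space
stand-in for copy_to_user() that, like the kernel helper, returns the
number of bytes it could not copy. The toy_* functions are hypothetical
and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for copy_to_user(): returns bytes NOT copied. */
static size_t toy_copy_to_user(void *dst, const void *src, size_t n)
{
    if (dst == NULL)
        return n;       /* model a faulting user pointer */
    memcpy(dst, src, n);
    return 0;
}

/* Set the error code before leaving through the common exit label. */
static long toy_get_stack(void *user_buf, const void *stack, size_t n)
{
    long r = 0;

    assert(stack != NULL);

    if (toy_copy_to_user(user_buf, stack, n)) {
        r = -EFAULT;
        goto out;
    }
out:
    return r;
}
```

Without the assignment to r, the failing branch would fall through to
the exit label with r still 0 and report success to the caller.
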
[COMMIT master] KVM: x86: Use native_store_idt() instead of kvm_get_idt()

2010-03-21 Thread Avi Kivity

From: Wei Yongjun yj...@cn.fujitsu.com

Use the generic Linux function native_store_idt() instead of
kvm_get_idt(), and remove the now-unused function kvm_get_idt().

Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ec891a2..ea1b6c6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -716,11 +716,6 @@ static inline void kvm_load_ldt(u16 sel)
asm("lldt %0" : : "rm"(sel));
 }
 
-static inline void kvm_get_idt(struct desc_ptr *table)
-{
-   asm("sidt %0" : "=m"(*table));
-}
-
 #ifdef CONFIG_X86_64
 static inline unsigned long read_msr(unsigned long msr)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 06108f3..df70244 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2445,7 +2445,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 
vmcs_write16(HOST_TR_SELECTOR, GDT_ENTRY_TSS*8);  /* 22.2.4 */
 
-   kvm_get_idt(&dt);
+   native_store_idt(&dt);
vmcs_writel(HOST_IDTR_BASE, dt.address);   /* 22.2.4 */
 
asm("mov $.Lkvm_vmx_return, %0" : "=r"(kvm_vmx_return));

[COMMIT master] KVM: ia64: fix the error of ioctl KVM_IRQ_LINE if no irq chip

2010-03-21 Thread Avi Kivity

From: Wei Yongjun yj...@cn.fujitsu.com

If there is no in-kernel irq chip, the ioctl KVM_IRQ_LINE returns -EFAULT.
Other places, such as KVM_[GET|SET]IRQCHIP, return -ENXIO in this case,
so return -ENXIO here as well instead of -EFAULT.

Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index bc07c81..b0ed80c 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -979,11 +979,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&irq_event, argp, sizeof irq_event))
goto out;
+   r = -ENXIO;
if (irqchip_in_kernel(kvm)) {
__s32 status;
status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
irq_event.irq, irq_event.level);
if (ioctl == KVM_IRQ_LINE_STATUS) {
+   r = -EFAULT;
irq_event.status = status;
if (copy_to_user(argp, &irq_event,
sizeof irq_event))

[COMMIT master] KVM: fix assigned_device_enable_host_msix error handling

2010-03-21 Thread Avi Kivity

From: jing zhang zj.ba...@gmail.com

Free IRQs and disable MSI-X upon failure.

Cc: Avi Kivity a...@redhat.com
Signed-off-by: Jing Zhang zj.ba...@gmail.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 057e2cc..47ca447 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -315,12 +315,16 @@ static int assigned_device_enable_host_msix(struct kvm *kvm,
kvm_assigned_dev_intr, 0,
kvm_assigned_msix_device,
(void *)dev);
-   /* FIXME: free requested_irq's on failure */
if (r)
-   return r;
+   goto err;
}
 
return 0;
+err:
+   for (i -= 1; i >= 0; i--)
+   free_irq(dev->host_msix_entries[i].vector, (void *)dev);
+   pci_disable_msix(dev->dev);
+   return r;
 }
 
 #endif

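The unwinding added by the MSI-X patch above follows the usual kernel
"goto err" pattern: on failure at index i, release entries i-1 down to 0.
A sketch with hypothetical toy_* helpers standing in for request_irq()
and free_irq(); the pci_disable_msix() step is not modelled.

```c
#include <assert.h>
#include <errno.h>

#define TOY_NVEC 4

/* 1 while the (hypothetical) vector i is held, 0 otherwise. */
static int toy_irq_held[TOY_NVEC];

/* Hypothetical request: fails with -EBUSY at index fail_at. */
static int toy_request_irq(int i, int fail_at)
{
    if (i == fail_at)
        return -EBUSY;
    toy_irq_held[i] = 1;
    return 0;
}

static void toy_free_irq(int i)
{
    toy_irq_held[i] = 0;
}

/* Request all vectors; on failure free those already requested. */
static int toy_enable_vectors(int fail_at)
{
    int i, r = 0;

    assert(fail_at >= -1 && fail_at < TOY_NVEC);

    for (i = 0; i < TOY_NVEC; i++) {
        r = toy_request_irq(i, fail_at);
        if (r)
            goto err;
    }
    return 0;
err:
    for (i -= 1; i >= 0; i--)
        toy_free_irq(i);
    return r;
}

/* Count vectors currently held, for checking the unwind. */
static int toy_held_count(void)
{
    int i, n = 0;

    for (i = 0; i < TOY_NVEC; i++)
        n += toy_irq_held[i];
    return n;
}
```

Passing fail_at = -1 models the success case; any other index leaves no
vector held after the function returns.
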
[COMMIT master] KVM: MMU: Consolidate two guest pte reads in kvm_mmu_pte_write()

2010-03-21 Thread Avi Kivity

From: Avi Kivity a...@redhat.com

kvm_mmu_pte_write() reads guest ptes on two different occasions, both to
allow a 32-bit pae guest to update a pte with 4-byte writes.  Consolidate
these into a single read, which also allows us to consolidate another read
from an invlpg speculating a gpte into the shadow page table.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b137515..f63c9ad 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2556,36 +2556,11 @@ static bool last_updated_pte_accessed(struct kvm_vcpu *vcpu)
 }
 
 static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
- const u8 *new, int bytes)
+ u64 gpte)
 {
gfn_t gfn;
-   int r;
-   u64 gpte = 0;
pfn_t pfn;
 
-   if (bytes != 4 && bytes != 8)
-   return;
-
-   /*
-* Assume that the pte write on a page table of the same type
-* as the current vcpu paging mode.  This is nearly always true
-* (might be false while changing modes).  Note it is verified later
-* by update_pte().
-*/
-   if (is_pae(vcpu)) {
-   /* Handle a 32-bit guest writing two halves of a 64-bit gpte */
-   if ((bytes == 4) && (gpa % 4 == 0)) {
-   r = kvm_read_guest(vcpu->kvm, gpa & ~(u64)7, &gpte, 8);
-   if (r)
-   return;
-   memcpy((void *)&gpte + (gpa % 8), new, 4);
-   } else if ((bytes == 8) && (gpa % 8 == 0)) {
-   memcpy((void *)&gpte, new, 8);
-   }
-   } else {
-   if ((bytes == 4) && (gpa % 4 == 0))
-   memcpy((void *)&gpte, new, 4);
-   }
if (!is_present_gpte(gpte))
return;
gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
@@ -2636,7 +2611,34 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
int r;
 
pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
-   mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes);
+
+   switch (bytes) {
+   case 4:
+   gentry = *(const u32 *)new;
+   break;
+   case 8:
+   gentry = *(const u64 *)new;
+   break;
+   default:
+   gentry = 0;
+   break;
+   }
+
+   /*
+* Assume that the pte write on a page table of the same type
+* as the current vcpu paging mode.  This is nearly always true
+* (might be false while changing modes).  Note it is verified later
+* by update_pte().
+*/
+   if (is_pae(vcpu) && bytes == 4) {
+   /* Handle a 32-bit guest writing two halves of a 64-bit gpte */
+   gpa &= ~(gpa_t)7;
+   r = kvm_read_guest(vcpu->kvm, gpa, &gentry, 8);
+   if (r)
+   gentry = 0;
+   }
+
+   mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
spin_lock(&vcpu->kvm->mmu_lock);
kvm_mmu_access_page(vcpu, gfn);
kvm_mmu_free_some_pages(vcpu);
@@ -2701,20 +2703,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
continue;
}
spte = &sp->spt[page_offset / sizeof(*spte)];
-   if ((gpa & (pte_size - 1)) || (bytes < pte_size)) {
-   gentry = 0;
-   r = kvm_read_guest_atomic(vcpu->kvm,
- gpa & ~(u64)(pte_size - 1),
- &gentry, pte_size);
-   new = (const void *)&gentry;
-   if (r < 0)
-   new = NULL;
-   }
while (npte--) {
entry = *spte;
mmu_pte_write_zap_pte(vcpu, sp, spte);
-   if (new)
-   mmu_pte_write_new_pte(vcpu, sp, spte, new);
+   if (gentry)
+   mmu_pte_write_new_pte(vcpu, sp, spte, &gentry);
mmu_pte_write_flush_tlb(vcpu, entry, *spte);
++spte;
}

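A sketch of the consolidated read described above: when a 32-bit PAE
guest updates one 4-byte half of a 64-bit gpte, the address is rounded
down to 8 bytes and the whole entry is read back from guest memory. The
flat buffer and toy_read_gpte() are hypothetical; the patch itself uses
kvm_read_guest() on the aligned gpa.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read the full 8-byte entry that contains gpa from a flat buffer that
 * stands in for guest physical memory.
 */
static uint64_t toy_read_gpte(const uint8_t *guest_mem, size_t mem_size,
                              uint64_t gpa)
{
    uint64_t gentry = 0;
    uint64_t aligned = gpa & ~(uint64_t)7;  /* same rounding as the patch */

    assert(guest_mem != NULL);
    assert(aligned + sizeof(gentry) <= mem_size);

    memcpy(&gentry, guest_mem + aligned, sizeof(gentry));
    return gentry;
}
```

A write to either half (offset 0 or 4 within the entry) therefore yields
the same combined 64-bit value.
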
[COMMIT master] KVM: fix the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO failure

2010-03-21 Thread Avi Kivity

From: Wei Yongjun yj...@cn.fujitsu.com

Change the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO
from -EINVAL to -ENXIO if no coalesced mmio dev exists.

Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5169736..22500d4 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -138,7 +138,7 @@ int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
struct kvm_coalesced_mmio_dev *dev = kvm->coalesced_mmio_dev;
 
if (dev == NULL)
-   return -EINVAL;
+   return -ENXIO;
 
mutex_lock(&kvm->slots_lock);
if (dev->nb_zones >= KVM_COALESCED_MMIO_ZONE_MAX) {
@@ -161,7 +161,7 @@ int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
struct kvm_coalesced_mmio_zone *z;
 
if (dev == NULL)
-   return -EINVAL;
+   return -ENXIO;
 
mutex_lock(&kvm->slots_lock);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bcd08b8..8c3743c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1602,7 +1602,6 @@ static long kvm_vm_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&zone, argp, sizeof zone))
goto out;
-   r = -ENXIO;
r = kvm_vm_ioctl_register_coalesced_mmio(kvm, &zone);
if (r)
goto out;
@@ -1614,7 +1613,6 @@ static long kvm_vm_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&zone, argp, sizeof zone))
goto out;
-   r = -ENXIO;
r = kvm_vm_ioctl_unregister_coalesced_mmio(kvm, &zone);
if (r)
goto out;

[COMMIT master] KVM: x86 emulator: fix RCX access during rep emulation

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

During rep emulation, the access width of RCX depends on the current
address mode.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0b70a36..4dce805 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1852,7 +1852,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 
if (c->rep_prefix && (c->d & String)) {
/* All REP prefixes have the same first termination condition */
-   if (c->regs[VCPU_REGS_RCX] == 0) {
+   if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) {
kvm_rip_write(ctxt->vcpu, c->eip);
goto done;
}
@@ -1876,7 +1876,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
goto done;
}
}
-   c->regs[VCPU_REGS_RCX]--;
+   register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
c->eip = kvm_rip_read(ctxt->vcpu);
}
 

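To illustrate the rep fix above: with a 16-bit or 32-bit address size
only the low bits of RCX act as the counter, so both the zero test and
the decrement must be masked. The helpers below are a hypothetical
sketch in the spirit of address_mask() and register_address_increment();
they take the address size in bytes directly rather than a decode_cache.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the low ad_bytes bytes of a 64-bit register. */
static uint64_t toy_ad_mask(unsigned ad_bytes)
{
    assert(ad_bytes == 2 || ad_bytes == 4 || ad_bytes == 8);
    if (ad_bytes == 8)
        return UINT64_MAX;
    return (UINT64_C(1) << (ad_bytes * 8)) - 1;
}

/* Value of the counter as seen with the given address size. */
static uint64_t toy_address_mask(unsigned ad_bytes, uint64_t reg)
{
    return reg & toy_ad_mask(ad_bytes);
}

/* Add inc to the low ad_bytes bytes only; upper bits are preserved. */
static void toy_register_add(unsigned ad_bytes, uint64_t *reg, int64_t inc)
{
    uint64_t mask = toy_ad_mask(ad_bytes);

    *reg = (*reg & ~mask) | ((*reg + (uint64_t)inc) & mask);
}
```

With RCX = 0x10000 and a 16-bit address size, the masked counter is
already zero, which the unmasked comparison in the old code would miss.
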
[COMMIT master] KVM: Make locked operations truly atomic

2010-03-21 Thread Avi Kivity

From: Avi Kivity a...@redhat.com

Once upon a time, locked operations were emulated while holding the mmu mutex.
Since mmu pages were write protected, it was safe to emulate the writes in
a non-atomic manner, since there could be no other writer, either in the
guest or in the kernel.

These days emulation takes place without holding the mmu spinlock, so the
write could be preempted by an unshadowing event, which exposes the page
to writes by the guest.  This may cause corruption of guest page tables.

Fix by using an atomic cmpxchg for these operations.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9c81ece..1302bfb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3301,41 +3301,68 @@ int emulator_write_emulated(unsigned long addr,
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
+#define CMPXCHG_TYPE(t, ptr, old, new) \
+   (cmpxchg((t *)(ptr), *(t *)(old), *(t *)(new)) == *(t *)(old))
+
+#ifdef CONFIG_X86_64
+#  define CMPXCHG64(ptr, old, new) CMPXCHG_TYPE(u64, ptr, old, new)
+#else
+#  define CMPXCHG64(ptr, old, new) \
+   (cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u64 *)(new)) == *(u64 *)(old))
+#endif
+
 static int emulator_cmpxchg_emulated(unsigned long addr,
 const void *old,
 const void *new,
 unsigned int bytes,
 struct kvm_vcpu *vcpu)
 {
-   printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
-#ifndef CONFIG_X86_64
-   /* guests cmpxchg8b have to be emulated atomically */
-   if (bytes == 8) {
-   gpa_t gpa;
-   struct page *page;
-   char *kaddr;
-   u64 val;
+   gpa_t gpa;
+   struct page *page;
+   char *kaddr;
+   bool exchanged;
 
-   gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL);
+   /* guests cmpxchg8b have to be emulated atomically */
+   if (bytes > 8 || (bytes & (bytes - 1)))
+   goto emul_write;
 
-   if (gpa == UNMAPPED_GVA ||
-  (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
-   goto emul_write;
+   gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL);
 
-   if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
-   goto emul_write;
+   if (gpa == UNMAPPED_GVA ||
+   (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
+   goto emul_write;
 
-   val = *(u64 *)new;
+   if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
+   goto emul_write;
 
-   page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+   page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
 
-   kaddr = kmap_atomic(page, KM_USER0);
-   set_64bit((u64 *)(kaddr + offset_in_page(gpa)), val);
-   kunmap_atomic(kaddr, KM_USER0);
-   kvm_release_page_dirty(page);
+   kaddr = kmap_atomic(page, KM_USER0);
+   kaddr += offset_in_page(gpa);
+   switch (bytes) {
+   case 1:
+   exchanged = CMPXCHG_TYPE(u8, kaddr, old, new);
+   break;
+   case 2:
+   exchanged = CMPXCHG_TYPE(u16, kaddr, old, new);
+   break;
+   case 4:
+   exchanged = CMPXCHG_TYPE(u32, kaddr, old, new);
+   break;
+   case 8:
+   exchanged = CMPXCHG64(kaddr, old, new);
+   break;
+   default:
+   BUG();
}
+   kunmap_atomic(kaddr, KM_USER0);
+   kvm_release_page_dirty(page);
+
+   if (!exchanged)
+   return X86EMUL_CMPXCHG_FAILED;
+
 emul_write:
-#endif
+   printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
 
return emulator_write_emulated(addr, new, bytes, vcpu);
 }

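A user-space sketch of the approach in the patch above, using C11
<stdatomic.h> in place of the kernel's cmpxchg(): the store happens only
if memory still holds the expected old value, and the caller learns
whether it did. The toy_* names are hypothetical, and only the 32-bit
case is shown. Unlike the size check in the patch, toy_size_ok() also
rejects 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes handled atomically: non-zero powers of two up to 8 bytes. */
static bool toy_size_ok(unsigned bytes)
{
    return bytes != 0 && bytes <= 8 && (bytes & (bytes - 1)) == 0;
}

/*
 * Atomically replace *ptr with new_val if it still equals old.
 * Returns true when the exchange took place.
 */
static bool toy_cmpxchg32(_Atomic uint32_t *ptr, uint32_t old,
                          uint32_t new_val)
{
    assert(ptr != NULL);
    return atomic_compare_exchange_strong(ptr, &old, new_val);
}
```

A failed exchange leaves memory untouched, which corresponds to the
X86EMUL_CMPXCHG_FAILED return in the patch.
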
[COMMIT master] KVM: MMU: Do not instantiate nontrapping spte on unsync page

2010-03-21 Thread Avi Kivity

From: Avi Kivity a...@redhat.com

The update_pte() path currently uses a nontrapping spte when a nonpresent
(or nonaccessed) gpte is written.  This is fine since at present it is only
used on sync pages.  However, on an unsync page this will cause an endless
fault loop as the guest is under no obligation to invlpg a gpte that
transitions from nonpresent to present.

Needed for the next patch which reinstates update_pte() on invlpg.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 81eab9a..4b37e1a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -258,11 +258,17 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
pt_element_t gpte;
unsigned pte_access;
pfn_t pfn;
+   u64 new_spte;
 
gpte = *(const pt_element_t *)pte;
if (~gpte & (PT_PRESENT_MASK | PT_ACCESSED_MASK)) {
-   if (!is_present_gpte(gpte))
-   __set_spte(spte, shadow_notrap_nonpresent_pte);
+   if (!is_present_gpte(gpte)) {
+   if (page->unsync)
+   new_spte = shadow_trap_nonpresent_pte;
+   else
+   new_spte = shadow_notrap_nonpresent_pte;
+   __set_spte(spte, new_spte);
+   }
return;
}
pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);

[COMMIT master] KVM: x86 emulator: check return value against correct define

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Check the return value against the correct define instead of
open-coding the value.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4dce805..670ca8f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -566,7 +566,7 @@ static u32 group2_table[] = {
 #define insn_fetch(_type, _size, _eip)  \
 ({ unsigned long _x;   \
rc = do_insn_fetch(ctxt, ops, (_eip), &_x, (_size));\
-   if (rc != 0)\
+   if (rc != X86EMUL_CONTINUE) \
goto done;  \
(_eip) += (_size);  \
(_type)_x;  \

[COMMIT master] KVM: Remove pointer to rflags from realmode_set_cr parameters.

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

The mov reg, cr instruction doesn't change flags in any meaningful way,
so there is no need to update rflags after the instruction executes.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 28826c8..53f5202 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -587,8 +587,7 @@ void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
   unsigned long *rflags);
 
 unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr);
-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value,
-unsigned long *rflags);
+void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value);
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 670ca8f..91450b5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2534,8 +2534,7 @@ twobyte_insn:
case 0x22: /* mov reg, cr */
if (c->modrm_mod != 3)
goto cannot_emulate;
-   realmode_set_cr(ctxt->vcpu,
-   c->modrm_reg, c->modrm_val, &ctxt->eflags);
+   realmode_set_cr(ctxt->vcpu, c->modrm_reg, c->modrm_val);
c->dst.type = OP_NONE;
break;
case 0x23: /* mov from reg to dr */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5bbf47c..77f0955 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4081,13 +4081,11 @@ unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr)
return value;
 }
 
-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val,
-unsigned long *rflags)
+void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val)
 {
switch (cr) {
case 0:
kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val));
-   *rflags = kvm_get_rflags(vcpu);
break;
case 2:
vcpu->arch.cr2 = val;

[COMMIT master] KVM: Don't follow an atomic operation by a non-atomic one

2010-03-21 Thread Avi Kivity

From: Avi Kivity a...@redhat.com

Currently emulated atomic operations are immediately followed by a non-atomic
operation, so that kvm_mmu_pte_write() can be invoked.  This updates the mmu
but undoes the whole point of doing things atomically.

Fix by only performing the atomic operation and the mmu update, and avoiding
the non-atomic write.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1302bfb..5bbf47c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3229,7 +3229,8 @@ static int emulator_write_emulated_onepage(unsigned long addr,
   const void *val,
   unsigned int bytes,
   struct kvm_vcpu *vcpu,
-  bool guest_initiated)
+  bool guest_initiated,
+  bool mmu_only)
 {
gpa_t gpa;
u32 error_code;
@@ -3249,6 +3250,10 @@ static int emulator_write_emulated_onepage(unsigned long addr,
if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
goto mmio;
 
+   if (mmu_only) {
+   kvm_mmu_pte_write(vcpu, gpa, val, bytes, 1);
+   return X86EMUL_CONTINUE;
+   }
if (emulator_write_phys(vcpu, gpa, val, bytes))
return X86EMUL_CONTINUE;
 
@@ -3273,7 +3278,8 @@ int __emulator_write_emulated(unsigned long addr,
   const void *val,
   unsigned int bytes,
   struct kvm_vcpu *vcpu,
-  bool guest_initiated)
+  bool guest_initiated,
+  bool mmu_only)
 {
/* Crossing a page boundary? */
if (((addr + bytes - 1) ^ addr) & PAGE_MASK) {
@@ -3281,7 +3287,7 @@ int __emulator_write_emulated(unsigned long addr,
 
now = -addr & ~PAGE_MASK;
rc = emulator_write_emulated_onepage(addr, val, now, vcpu,
-guest_initiated);
+guest_initiated, mmu_only);
if (rc != X86EMUL_CONTINUE)
return rc;
addr += now;
@@ -3289,7 +3295,7 @@ int __emulator_write_emulated(unsigned long addr,
bytes -= now;
}
return emulator_write_emulated_onepage(addr, val, bytes, vcpu,
-  guest_initiated);
+  guest_initiated, mmu_only);
 }
 
 int emulator_write_emulated(unsigned long addr,
@@ -3297,7 +3303,7 @@ int emulator_write_emulated(unsigned long addr,
   unsigned int bytes,
   struct kvm_vcpu *vcpu)
 {
-   return __emulator_write_emulated(addr, val, bytes, vcpu, true);
+   return __emulator_write_emulated(addr, val, bytes, vcpu, true, false);
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
@@ -3361,6 +3367,8 @@ static int emulator_cmpxchg_emulated(unsigned long addr,
if (!exchanged)
return X86EMUL_CMPXCHG_FAILED;
 
+   return __emulator_write_emulated(addr, new, bytes, vcpu, true, true);
+
 emul_write:
printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
 
@@ -4015,7 +4023,8 @@ int kvm_fix_hypercall(struct kvm_vcpu *vcpu)
 
kvm_x86_ops->patch_hypercall(vcpu, instruction);
 
-   return __emulator_write_emulated(rip, instruction, 3, vcpu, false);
+   return __emulator_write_emulated(rip, instruction, 3, vcpu,
+false, false);
 }
 
 static u64 mk_cr_64(u64 curr_cr, u32 new_val)

[COMMIT master] KVM: Provide current eip as part of emulator context.

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Eliminate the need to call back into KVM to get it from the emulator.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index b048fd2..0765725 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -141,7 +141,7 @@ struct decode_cache {
u8 seg_override;
unsigned int d;
unsigned long regs[NR_VCPU_REGS];
-   unsigned long eip, eip_orig;
+   unsigned long eip;
/* modrm */
u8 modrm;
u8 modrm_mod;
@@ -160,6 +160,7 @@ struct x86_emulate_ctxt {
struct kvm_vcpu *vcpu;
 
unsigned long eflags;
+   unsigned long eip; /* eip before instruction emulation */
/* Emulated execution mode, represented by an X86EMUL_MODE value. */
int mode;
u32 cs_base;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8bd0557..2c27aa4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -667,7 +667,7 @@ static int do_insn_fetch(struct x86_emulate_ctxt *ctxt,
int rc;
 
/* x86 instructions are limited to 15 bytes. */
-   if (eip + size - ctxt->decode.eip_orig > 15)
+   if (eip + size - ctxt->eip > 15)
return X86EMUL_UNHANDLEABLE;
eip += ctxt->cs_base;
while (size--) {
@@ -927,7 +927,7 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
/* Shadow copy of register state. Committed on successful emulation. */
 
memset(c, 0, sizeof(struct decode_cache));
-   c->eip = c->eip_orig = kvm_rip_read(ctxt->vcpu);
+   c->eip = ctxt->eip;
ctxt->cs_base = seg_base(ctxt, VCPU_SREG_CS);
memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
 
@@ -1878,7 +1878,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
}
}
register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
-   c->eip = kvm_rip_read(ctxt->vcpu);
+   c->eip = ctxt->eip;
}
 
if (c->src.type == OP_MEM) {
@@ -2447,7 +2447,7 @@ twobyte_insn:
goto done;
 
/* Let the processor re-execute the fixed hypercall */
-   c->eip = kvm_rip_read(ctxt->vcpu);
+   c->eip = ctxt->eip;
/* Disable writeback. */
c->dst.type = OP_NONE;
break;
@@ -2551,7 +2551,7 @@ twobyte_insn:
| ((u64)c->regs[VCPU_REGS_RDX] << 32);
if (kvm_set_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], msr_data)) {
kvm_inject_gp(ctxt->vcpu, 0);
-   c->eip = kvm_rip_read(ctxt->vcpu);
+   c->eip = ctxt->eip;
}
rc = X86EMUL_CONTINUE;
c->dst.type = OP_NONE;
@@ -2560,7 +2560,7 @@ twobyte_insn:
/* rdmsr */
if (kvm_get_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], &msr_data)) {
kvm_inject_gp(ctxt->vcpu, 0);
-   c->eip = kvm_rip_read(ctxt->vcpu);
+   c->eip = ctxt->eip;
} else {
c->regs[VCPU_REGS_RAX] = (u32)msr_data;
c->regs[VCPU_REGS_RDX] = msr_data >> 32;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 81d417e..ca86efa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3531,6 +3531,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 
vcpu->arch.emulate_ctxt.vcpu = vcpu;
vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
+   vcpu->arch.emulate_ctxt.eip = kvm_rip_read(vcpu);
vcpu->arch.emulate_ctxt.mode =
(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)

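A sketch of the arrangement introduced above: the instruction's starting
eip is captured once in the context, and both the 15-byte length check
and the "restart the instruction" paths use that saved value instead of
asking the vcpu again. struct toy_ctxt and the toy_* functions are
hypothetical simplifications of x86_emulate_ctxt and decode_cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* x86 instructions are limited to 15 bytes. */
#define TOY_MAX_INSN_LEN 15

struct toy_ctxt {
    unsigned long eip;        /* eip before instruction emulation */
    unsigned long decode_eip; /* running fetch position */
};

/* Capture the starting eip once, when emulation begins. */
static void toy_begin(struct toy_ctxt *ctxt, unsigned long guest_rip)
{
    assert(ctxt != NULL);
    ctxt->eip = guest_rip;
    ctxt->decode_eip = guest_rip;
}

/* Fetch size more bytes unless that would exceed the length limit. */
static bool toy_fetch(struct toy_ctxt *ctxt, unsigned size)
{
    assert(ctxt->decode_eip >= ctxt->eip);
    if (ctxt->decode_eip + size - ctxt->eip > TOY_MAX_INSN_LEN)
        return false;
    ctxt->decode_eip += size;
    return true;
}

/* Rewind to the start of the instruction, e.g. after injecting a fault. */
static void toy_restart(struct toy_ctxt *ctxt)
{
    ctxt->decode_eip = ctxt->eip;
}
```
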
[COMMIT master] KVM: remove realmode_lmsw function.

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Use the (get|set)_cr callbacks to emulate lmsw inside the emulator.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9d474c7..b99cec1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -583,8 +583,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 void kvm_report_emulation_failure(struct kvm_vcpu *cvpu, const char *context);
 void realmode_lgdt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
 void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
-void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
-  unsigned long *rflags);
 
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5b060e4..5e2fa61 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2486,8 +2486,8 @@ twobyte_insn:
c->dst.val = ops->get_cr(0, ctxt->vcpu);
break;
case 6: /* lmsw */
-   realmode_lmsw(ctxt->vcpu, (u16)c->src.val,
- &ctxt->eflags);
+   ops->set_cr(0, (ops->get_cr(0, ctxt->vcpu) & ~0x0ful) |
+   (c->src.val & 0x0f), ctxt->vcpu);
c->dst.type = OP_NONE;
break;
case 7: /* invlpg*/
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b9ace70..6206600 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4099,13 +4099,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base)
kvm_x86_ops->set_idt(vcpu, &dt);
 }
 
-void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
-  unsigned long *rflags)
-{
-   kvm_lmsw(vcpu, msw);
-   *rflags = kvm_get_rflags(vcpu);
-}
-
 static int move_to_next_stateful_cpuid_entry(struct kvm_vcpu *vcpu, int i)
 {
struct kvm_cpuid_entry2 *e = &vcpu->arch.cpuid_entries[i];

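The lmsw replacement above boils down to one expression: keep every CR0
bit except the low four, and take those four from the source operand. A
sketch of just that expression, with toy_lmsw() as a hypothetical helper
name. Note that the architectural LMSW cannot clear CR0.PE; neither the
expression in the patch nor this sketch models that restriction.

```c
#include <assert.h>

/* lmsw only touches the low four bits of CR0 (PE, MP, EM, TS). */
#define TOY_LMSW_MASK 0x0ful

/* New CR0 value after lmsw with a 16-bit source operand. */
static unsigned long toy_lmsw(unsigned long cr0, unsigned long src)
{
    assert(src <= 0xfffful);   /* the operand is a 16-bit word */
    return (cr0 & ~TOY_LMSW_MASK) | (src & TOY_LMSW_MASK);
}
```

Bits of the source above bit 3 are ignored, and CR0 bits above bit 3 are
carried over unchanged.
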
[COMMIT master] KVM: x86 emulator: cleanup grp3 return value

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

When x86_emulate_insn() does not know how to emulate an instruction, it
exits via the cannot_emulate label in all cases except when emulating
grp3. Fix that.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 46a7ee3..d696cbd 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1397,7 +1397,6 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt,
   struct x86_emulate_ops *ops)
 {
struct decode_cache *c = &ctxt->decode;
-   int rc = X86EMUL_CONTINUE;
 
switch (c->modrm_reg) {
case 0 ... 1:   /* test */
@@ -1410,11 +1409,9 @@ static inline int emulate_grp3(struct x86_emulate_ctxt 
*ctxt,
emulate_1op("neg", c->dst, ctxt->eflags);
break;
default:
-   DPRINTF("Cannot emulate %02x\n", c->b);
-   rc = X86EMUL_UNHANDLEABLE;
-   break;
+   return 0;
}
-   return rc;
+   return 1;
 }
 
 static inline int emulate_grp45(struct x86_emulate_ctxt *ctxt,
@@ -2374,9 +2371,8 @@ special_insn:
c->dst.type = OP_NONE;  /* Disable writeback. */
break;
case 0xf6 ... 0xf7: /* Grp3 */
-   rc = emulate_grp3(ctxt, ops);
-   if (rc != X86EMUL_CONTINUE)
-   goto done;
+   if (!emulate_grp3(ctxt, ops))
+   goto cannot_emulate;
break;
case 0xf8: /* clc */
ctxt->eflags &= ~EFLG_CF;

[COMMIT master] KVM: Provide x86_emulate_ctxt callback to get current cpl

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 0c5caa4..b048fd2 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -110,6 +110,7 @@ struct x86_emulate_ops {
struct kvm_vcpu *vcpu);
ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
+   int (*cpl)(struct kvm_vcpu *vcpu);
 };
 
 /* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5e2fa61..8bd0557 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1257,7 +1257,7 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
int rc;
unsigned long val, change_mask;
int iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
-   int cpl = kvm_x86_ops->get_cpl(ctxt->vcpu);
+   int cpl = ops->cpl(ctxt->vcpu);
 
rc = emulate_pop(ctxt, ops, &val, len);
if (rc != X86EMUL_CONTINUE)
@@ -1758,7 +1758,8 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
return X86EMUL_CONTINUE;
 }
 
-static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt)
+static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt,
+ struct x86_emulate_ops *ops)
 {
int iopl;
if (ctxt->mode == X86EMUL_MODE_REAL)
@@ -1766,7 +1767,7 @@ static bool emulator_bad_iopl(struct x86_emulate_ctxt 
*ctxt)
if (ctxt->mode == X86EMUL_MODE_VM86)
return true;
iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
-   return kvm_x86_ops->get_cpl(ctxt->vcpu) > iopl;
+   return ops->cpl(ctxt->vcpu) > iopl;
 }
 
 static bool emulator_io_port_access_allowed(struct x86_emulate_ctxt *ctxt,
@@ -1803,7 +1804,7 @@ static bool emulator_io_permited(struct x86_emulate_ctxt 
*ctxt,
 struct x86_emulate_ops *ops,
 u16 port, u16 len)
 {
-   if (emulator_bad_iopl(ctxt))
+   if (emulator_bad_iopl(ctxt, ops))
if (!emulator_io_port_access_allowed(ctxt, ops, port, len))
return false;
return true;
@@ -1842,7 +1843,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
}
 
/* Privileged instruction can be executed only in CPL=0 */
-   if ((c->d & Priv) && kvm_x86_ops->get_cpl(ctxt->vcpu)) {
+   if ((c->d & Priv) && ops->cpl(ctxt->vcpu)) {
kvm_inject_gp(ctxt->vcpu, 0);
goto done;
}
@@ -2378,7 +2379,7 @@ special_insn:
c->dst.type = OP_NONE;  /* Disable writeback. */
break;
case 0xfa: /* cli */
-   if (emulator_bad_iopl(ctxt))
+   if (emulator_bad_iopl(ctxt, ops))
kvm_inject_gp(ctxt->vcpu, 0);
else {
ctxt->eflags &= ~X86_EFLAGS_IF;
@@ -2386,7 +2387,7 @@ special_insn:
}
break;
case 0xfb: /* sti */
-   if (emulator_bad_iopl(ctxt))
+   if (emulator_bad_iopl(ctxt, ops))
kvm_inject_gp(ctxt->vcpu, 0);
else {
toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_STI);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6206600..81d417e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3479,6 +3479,11 @@ static void emulator_set_cr(int cr, unsigned long val, 
struct kvm_vcpu *vcpu)
}
 }
 
+static int emulator_get_cpl(struct kvm_vcpu *vcpu)
+{
+   return kvm_x86_ops->get_cpl(vcpu);
+}
+
 static struct x86_emulate_ops emulate_ops = {
.read_std= kvm_read_guest_virt_system,
.fetch   = kvm_fetch_guest_virt,
@@ -3487,6 +3492,7 @@ static struct x86_emulate_ops emulate_ops = {
.cmpxchg_emulated= emulator_cmpxchg_emulated,
.get_cr  = emulator_get_cr,
.set_cr  = emulator_set_cr,
+   .cpl = emulator_get_cpl,
 };
 
 static void cache_all_regs(struct kvm_vcpu *vcpu)

[COMMIT master] KVM: x86 emulator: fix mov dr to inject #UD when needed.

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

If CR4.DE=1, access to registers DR4/DR5 causes #UD.
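
As a small standalone sketch of that rule (the helper name dr_access_is_ud() and the CR4_DE macro are illustrative, not kernel identifiers): with CR4.DE clear, DR4/DR5 are architecturally aliases of DR6/DR7, so only the DE=1 case faults.

```c
#include <stdbool.h>

#define CR4_DE (1UL << 3)	/* CR4.DE, Debugging Extensions (bit 3) */

/* True when a MOV to/from debug register `dr` must raise #UD. */
static bool dr_access_is_ud(unsigned long cr4, int dr)
{
	return (cr4 & CR4_DE) && (dr == 4 || dr == 5);
}
```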

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 836e97b..5afddcf 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2531,9 +2531,12 @@ twobyte_insn:
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x21: /* mov from dr to reg */
-   if (emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]))
-   goto cannot_emulate;
-   rc = X86EMUL_CONTINUE;
+   if ((ops->get_cr(4, ctxt->vcpu) & X86_CR4_DE) &&
+   (c->modrm_reg == 4 || c->modrm_reg == 5)) {
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   goto done;
+   }
+   emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]);
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x22: /* mov reg, cr */
@@ -2541,9 +2544,12 @@ twobyte_insn:
c->dst.type = OP_NONE;
break;
case 0x23: /* mov from reg to dr */
-   if (emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]))
-   goto cannot_emulate;
-   rc = X86EMUL_CONTINUE;
+   if ((ops->get_cr(4, ctxt->vcpu) & X86_CR4_DE) &&
+   (c->modrm_reg == 4 || c->modrm_reg == 5)) {
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   goto done;
+   }
+   emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]);
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x30:

[COMMIT master] KVM: x86 emulator: fix return values of syscall/sysenter/sysexit emulations

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Return X86EMUL_PROPAGATE_FAULT if a fault was injected. Also inject #UD
for those instructions when appropriate.
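
To see why the return code matters, here is a reduced model, assuming the top level maps only the "unhandleable" code to a failure (as the done: label of x86_emulate_insn() does). The enum, struct fake_vcpu and both functions are stand-ins invented for this sketch.

```c
#include <stdbool.h>

/* Stand-ins for X86EMUL_CONTINUE / _UNHANDLEABLE / _PROPAGATE_FAULT. */
enum emul_rc { EMUL_CONTINUE, EMUL_UNHANDLEABLE, EMUL_PROPAGATE_FAULT };

#define UD_VECTOR 6	/* #UD is exception vector 6 */

/* Hypothetical, heavily reduced vcpu state. */
struct fake_vcpu {
	bool real_mode;
	int pending_vector;	/* -1 when no exception is queued */
};

/* Model of the mode check at the top of a syscall handler. */
static enum emul_rc check_syscall(struct fake_vcpu *vcpu)
{
	if (vcpu->real_mode) {
		vcpu->pending_vector = UD_VECTOR;	/* queue #UD for the guest */
		return EMUL_PROPAGATE_FAULT;
	}
	return EMUL_CONTINUE;
}

/* Only "unhandleable" is reported to the caller as an emulation failure. */
static int emulate_result(enum emul_rc rc)
{
	return rc == EMUL_UNHANDLEABLE ? -1 : 0;
}
```

In this model, returning EMUL_UNHANDLEABLE after queueing an exception would report an emulation failure even though the guest-visible fault is already pending; EMUL_PROPAGATE_FAULT lets the fault reach the guest instead.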

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5afddcf..1393bf0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1600,8 +1600,11 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
u64 msr_data;
 
/* syscall is not available in real mode */
-   if (ctxt->mode == X86EMUL_MODE_REAL || ctxt->mode == X86EMUL_MODE_VM86)
-   return X86EMUL_UNHANDLEABLE;
+   if (ctxt->mode == X86EMUL_MODE_REAL ||
+   ctxt->mode == X86EMUL_MODE_VM86) {
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   return X86EMUL_PROPAGATE_FAULT;
+   }
 
setup_syscalls_segments(ctxt, &cs, &ss);
 
@@ -1651,14 +1654,16 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
/* inject #GP if in real mode */
if (ctxt->mode == X86EMUL_MODE_REAL) {
kvm_inject_gp(ctxt->vcpu, 0);
-   return X86EMUL_UNHANDLEABLE;
+   return X86EMUL_PROPAGATE_FAULT;
}
 
/* XXX sysenter/sysexit have not been tested in 64bit mode.
* Therefore, we inject an #UD.
*/
-   if (ctxt->mode == X86EMUL_MODE_PROT64)
-   return X86EMUL_UNHANDLEABLE;
+   if (ctxt->mode == X86EMUL_MODE_PROT64) {
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   return X86EMUL_PROPAGATE_FAULT;
+   }
 
setup_syscalls_segments(ctxt, &cs, &ss);
 
@@ -1713,7 +1718,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
if (ctxt->mode == X86EMUL_MODE_REAL ||
ctxt->mode == X86EMUL_MODE_VM86) {
kvm_inject_gp(ctxt->vcpu, 0);
-   return X86EMUL_UNHANDLEABLE;
+   return X86EMUL_PROPAGATE_FAULT;
}
 
setup_syscalls_segments(ctxt, &cs, &ss);

[COMMIT master] KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

If the LOCK prefix is used, the dest arg should be memory; otherwise the
instruction should generate #UD.
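
The resulting check can be written as a standalone predicate. This is only a sketch: FLAG_LOCK, enum operand_type and lock_prefix_is_ud() are illustrative names standing in for the emulator's decode flags and operand types.

```c
#include <stdbool.h>

/* Illustrative operand kinds; the real emulator has its own enum. */
enum operand_type { OP_REG, OP_MEM, OP_IMM, OP_NONE };

#define FLAG_LOCK (1u << 0)	/* hypothetical "instruction accepts LOCK" flag */

/*
 * True when a LOCK-prefixed instruction must raise #UD: either the
 * opcode does not accept LOCK at all, or its destination is not memory.
 */
static bool lock_prefix_is_ud(bool lock_prefix, unsigned flags,
			      enum operand_type dst)
{
	return lock_prefix && (!(flags & FLAG_LOCK) || dst != OP_MEM);
}
```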

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index b89a8f2..46a7ee3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1842,7 +1842,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
}
 
/* LOCK prefix is allowed only with some instructions */
-   if (c->lock_prefix && !(c->d & Lock)) {
+   if (c->lock_prefix && (!(c->d & Lock) || c->dst.type != OP_MEM)) {
kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
goto done;
}

[COMMIT master] KVM: x86 emulator: Use load_segment_descriptor() instead of kvm_load_segment_descriptor()

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index db4776c..702bfff 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1508,7 +1508,7 @@ static int emulate_pop_sreg(struct x86_emulate_ctxt *ctxt,
if (rc != X86EMUL_CONTINUE)
return rc;
 
-   rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)selector, seg);
+   rc = load_segment_descriptor(ctxt, ops, (u16)selector, seg);
return rc;
 }
 
@@ -1683,7 +1683,7 @@ static int emulate_ret_far(struct x86_emulate_ctxt *ctxt,
rc = emulate_pop(ctxt, ops, &cs, c->op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;
-   rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)cs, VCPU_SREG_CS);
+   rc = load_segment_descriptor(ctxt, ops, (u16)cs, VCPU_SREG_CS);
return rc;
 }
 
@@ -2717,7 +2717,7 @@ special_insn:
if (c->modrm_reg == VCPU_SREG_SS)
toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_MOV_SS);
 
-   rc = kvm_load_segment_descriptor(ctxt->vcpu, sel, c->modrm_reg);
+   rc = load_segment_descriptor(ctxt, ops, sel, c->modrm_reg);
 
c->dst.type = OP_NONE;  /* Disable writeback. */
break;
@@ -2892,8 +2892,8 @@ special_insn:
goto jmp;
case 0xea: /* jmp far */
jump_far:
-   if (kvm_load_segment_descriptor(ctxt->vcpu, c->src2.val,
-   VCPU_SREG_CS))
+   if (load_segment_descriptor(ctxt, ops, c->src2.val,
+   VCPU_SREG_CS))
goto done;
 
c->eip = c->src.val;

[COMMIT master] KVM: x86 emulator: remove saved_eip

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

c->eip is never written back in case of emulation failure, so there is
no need to set it to the old value.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index b3ff673..c20 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2420,7 +2420,6 @@ int
 x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
u64 msr_data;
-   unsigned long saved_eip = 0;
struct decode_cache *c = &ctxt->decode;
int rc = X86EMUL_CONTINUE;
 
@@ -2432,7 +2431,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
 */
 
memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
-   saved_eip = c->eip;
 
if (ctxt->mode == X86EMUL_MODE_PROT64 && (c->d & No64)) {
kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
@@ -2924,11 +2922,7 @@ writeback:
kvm_rip_write(ctxt->vcpu, c->eip);
 
 done:
-   if (rc == X86EMUL_UNHANDLEABLE) {
-   c->eip = saved_eip;
-   return -1;
-   }
-   return 0;
+   return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 
 twobyte_insn:
switch (c->b) {
@@ -3205,6 +3199,5 @@ twobyte_insn:
 
 cannot_emulate:
DPRINTF("Cannot emulate %02x\n", c->b);
-   c->eip = saved_eip;
return -1;
 }

[COMMIT master] KVM: Use task switch from emulator.c

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Remove old task switch code from x86.c

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 61577ae..d6124f2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4833,553 +4833,30 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu 
*vcpu,
return 0;
 }
 
-static void seg_desct_to_kvm_desct(struct desc_struct *seg_desc, u16 selector,
-  struct kvm_segment *kvm_desct)
-{
-   kvm_desct-base = get_desc_base(seg_desc);
-   kvm_desct-limit = get_desc_limit(seg_desc);
-   if (seg_desc-g) {
-   kvm_desct-limit = 12;
-   kvm_desct-limit |= 0xfff;
-   }
-   kvm_desct-selector = selector;
-   kvm_desct-type = seg_desc-type;
-   kvm_desct-present = seg_desc-p;
-   kvm_desct-dpl = seg_desc-dpl;
-   kvm_desct-db = seg_desc-d;
-   kvm_desct-s = seg_desc-s;
-   kvm_desct-l = seg_desc-l;
-   kvm_desct-g = seg_desc-g;
-   kvm_desct-avl = seg_desc-avl;
-   if (!selector)
-   kvm_desct-unusable = 1;
-   else
-   kvm_desct-unusable = 0;
-   kvm_desct-padding = 0;
-}
-
-static void get_segment_descriptor_dtable(struct kvm_vcpu *vcpu,
- u16 selector,
- struct desc_ptr *dtable)
-{
-   if (selector  1  2) {
-   struct kvm_segment kvm_seg;
-
-   kvm_get_segment(vcpu, kvm_seg, VCPU_SREG_LDTR);
-
-   if (kvm_seg.unusable)
-   dtable-size = 0;
-   else
-   dtable-size = kvm_seg.limit;
-   dtable-address = kvm_seg.base;
-   }
-   else
-   kvm_x86_ops-get_gdt(vcpu, dtable);
-}
-
-/* allowed just for 8 bytes segments */
-static int load_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
-struct desc_struct *seg_desc)
-{
-   struct desc_ptr dtable;
-   u16 index = selector  3;
-   int ret;
-   u32 err;
-   gva_t addr;
-
-   get_segment_descriptor_dtable(vcpu, selector, dtable);
-
-   if (dtable.size  index * 8 + 7) {
-   kvm_queue_exception_e(vcpu, GP_VECTOR, selector  0xfffc);
-   return X86EMUL_PROPAGATE_FAULT;
-   }
-   addr = dtable.address + index * 8;
-   ret = kvm_read_guest_virt_system(addr, seg_desc, sizeof(*seg_desc),
-vcpu,  err);
-   if (ret == X86EMUL_PROPAGATE_FAULT)
-   kvm_inject_page_fault(vcpu, addr, err);
-
-   return ret;
-}
-
-/* allowed just for 8 bytes segments */
-static int save_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
-struct desc_struct *seg_desc)
-{
-   struct desc_ptr dtable;
-   u16 index = selector  3;
-
-   get_segment_descriptor_dtable(vcpu, selector, dtable);
-
-   if (dtable.size  index * 8 + 7)
-   return 1;
-   return kvm_write_guest_virt(dtable.address + index*8, seg_desc, 
sizeof(*seg_desc), vcpu, NULL);
-}
-
-static gpa_t get_tss_base_addr_write(struct kvm_vcpu *vcpu,
-  struct desc_struct *seg_desc)
-{
-   u32 base_addr = get_desc_base(seg_desc);
-
-   return kvm_mmu_gva_to_gpa_write(vcpu, base_addr, NULL);
-}
-
-static gpa_t get_tss_base_addr_read(struct kvm_vcpu *vcpu,
-struct desc_struct *seg_desc)
-{
-   u32 base_addr = get_desc_base(seg_desc);
-
-   return kvm_mmu_gva_to_gpa_read(vcpu, base_addr, NULL);
-}
-
-static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg)
-{
-   struct kvm_segment kvm_seg;
-
-   kvm_get_segment(vcpu, kvm_seg, seg);
-   return kvm_seg.selector;
-}
-
-static int kvm_load_realmode_segment(struct kvm_vcpu *vcpu, u16 selector, int 
seg)
-{
-   struct kvm_segment segvar = {
-   .base = selector  4,
-   .limit = 0x,
-   .selector = selector,
-   .type = 3,
-   .present = 1,
-   .dpl = 3,
-   .db = 0,
-   .s = 1,
-   .l = 0,
-   .g = 0,
-   .avl = 0,
-   .unusable = 0,
-   };
-   kvm_x86_ops-set_segment(vcpu, segvar, seg);
-   return X86EMUL_CONTINUE;
-}
-
-static int is_vm86_segment(struct kvm_vcpu *vcpu, int seg)
-{
-   return (seg != VCPU_SREG_LDTR) 
-   (seg != VCPU_SREG_TR) 
-   (kvm_get_rflags(vcpu)  X86_EFLAGS_VM);
-}
-
-int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg)
-{
-   struct kvm_segment kvm_seg;
-   struct desc_struct seg_desc;
-   u8 dpl, rpl, cpl;
-   unsigned err_vec = GP_VECTOR;
-   u32 err_code = 0;
-   bool null_selector = !(selector  ~0x3); /* -0003 are null

[COMMIT master] KVM: x86 emulator: add decoding of X, Y parameters from Intel SDM

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Add decoding of the X, Y parameters from the Intel SDM, which are used by
string instructions to specify source and destination. Use this new
decoding to implement movs, cmps, stos, lods in a generic way.
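
The core of the generic handling is stepping (E)SI/(E)DI by the operand size, in the direction given by EFLAGS.DF and wrapping at the address size. The sketch below models only that arithmetic; string_advance() and its parameters are made-up names, and it is a simplification of what the patch's string_addr_inc() and register_address_increment() do.

```c
#include <stdint.h>

#define EFLG_DF (1u << 10)	/* EFLAGS.DF, direction flag (bit 10) */

/*
 * Advance a string index register by one element.
 *   reg      - current register value
 *   eflags   - guest EFLAGS; DF set means "count downwards"
 *   op_bytes - element size in bytes (1, 2, 4 or 8)
 *   ad_bytes - address size in bytes (2, 4 or 8)
 * Bits above the address size are preserved, the low part wraps.
 */
static uint64_t string_advance(uint64_t reg, uint64_t eflags,
			       unsigned op_bytes, unsigned ad_bytes)
{
	int64_t step = (eflags & EFLG_DF) ? -(int64_t)op_bytes
					  : (int64_t)op_bytes;
	uint64_t mask;

	if (ad_bytes == 8)
		return reg + (uint64_t)step;

	mask = (1ULL << (ad_bytes * 8)) - 1;
	return (reg & ~mask) | ((reg + (uint64_t)step) & mask);
}
```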

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 55b8a8b..6ebd642 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -51,6 +51,7 @@
 #define DstReg  (2<<1) /* Register operand. */
 #define DstMem  (3<<1) /* Memory operand. */
 #define DstAcc  (4<<1)  /* Destination Accumulator */
+#define DstDI   (5<<1) /* Destination is in ES:(E)DI */
 #define DstMask (7<<1)
 /* Source operand type. */
 #define SrcNone (0<<4) /* No source operand. */
@@ -64,6 +65,7 @@
 #define SrcOne  (7<<4) /* Implied '1' */
 #define SrcImmUByte (8<<4)  /* 8-bit unsigned immediate operand. */
 #define SrcImmU (9<<4)  /* Immediate operand, unsigned */
+#define SrcSI   (0xa<<4)   /* Source is in the DS:RSI */
 #define SrcMask (0xf<<4)
 /* Generic ModRM decode. */
 #define ModRM   (1<<8)
@@ -177,12 +179,12 @@ static u32 opcode_table[256] = {
/* 0xA0 - 0xA7 */
ByteOp | DstReg | SrcMem | Mov | MemAbs, DstReg | SrcMem | Mov | MemAbs,
ByteOp | DstMem | SrcReg | Mov | MemAbs, DstMem | SrcReg | Mov | MemAbs,
-   ByteOp | ImplicitOps | Mov | String, ImplicitOps | Mov | String,
-   ByteOp | ImplicitOps | String, ImplicitOps | String,
+   ByteOp | SrcSI | DstDI | Mov | String, SrcSI | DstDI | Mov | String,
+   ByteOp | SrcSI | DstDI | String, SrcSI | DstDI | String,
/* 0xA8 - 0xAF */
-   0, 0, ByteOp | ImplicitOps | Mov | String, ImplicitOps | Mov | String,
-   ByteOp | ImplicitOps | Mov | String, ImplicitOps | Mov | String,
-   ByteOp | ImplicitOps | String, ImplicitOps | String,
+   0, 0, ByteOp | DstDI | Mov | String, DstDI | Mov | String,
+   ByteOp | SrcSI | DstAcc | Mov | String, SrcSI | DstAcc | Mov | String,
+   ByteOp | DstDI | String, DstDI | String,
/* 0xB0 - 0xB7 */
ByteOp | DstReg | SrcImm | Mov, ByteOp | DstReg | SrcImm | Mov,
ByteOp | DstReg | SrcImm | Mov, ByteOp | DstReg | SrcImm | Mov,
@@ -1145,6 +1147,14 @@ done_prefixes:
c->src.bytes = 1;
c->src.val = 1;
break;
+   case SrcSI:
+   c->src.type = OP_MEM;
+   c->src.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
+   c->src.ptr = (unsigned long *)
+   register_address(c,  seg_override_base(ctxt, c),
+   c->regs[VCPU_REGS_RSI]);
+   c->src.val = 0;
+   break;
}
 
/*
@@ -1230,6 +1240,14 @@ done_prefixes:
}
c->dst.orig_val = c->dst.val;
break;
+   case DstDI:
+   c->dst.type = OP_MEM;
+   c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
+   c->dst.ptr = (unsigned long *)
+   register_address(c, es_base(ctxt),
+   c->regs[VCPU_REGS_RDI]);
+   c->dst.val = 0;
+   break;
}
 
 done:
@@ -2388,6 +2406,16 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
return rc;
 }
 
+static void string_addr_inc(struct x86_emulate_ctxt *ctxt, unsigned long base,
+   int reg, unsigned long **ptr)
+{
+   struct decode_cache *c = &ctxt->decode;
+   int df = (ctxt->eflags & EFLG_DF) ? -1 : 1;
+
+   register_address_increment(c, &c->regs[reg], df * c->src.bytes);
+   *ptr = (unsigned long *)register_address(c,  base, c->regs[reg]);
+}
+
 int
 x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
@@ -2750,89 +2778,16 @@ special_insn:
c->dst.val = (unsigned long)c->regs[VCPU_REGS_RAX];
break;
case 0xa4 ... 0xa5: /* movs */
-   c->dst.type = OP_MEM;
-   c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
-   c->dst.ptr = (unsigned long *)register_address(c,
-  es_base(ctxt),
-  c->regs[VCPU_REGS_RDI]);
-   rc = ops->read_emulated(register_address(c,
-   seg_override_base(ctxt, c),
-   c->regs[VCPU_REGS_RSI]),
-   &c->dst.val,
-   c->dst.bytes, ctxt->vcpu);
-   if (rc != X86EMUL_CONTINUE)
-   goto done;
-   register_address_increment(c, &c->regs[VCPU_REGS_RSI],
-  (ctxt->eflags & EFLG_DF) ? -c->dst.bytes
-  : c->dst.bytes);
-

[COMMIT master] Revert "KVM: x86: ignore access permissions for hypercall patching"

2010-03-21 Thread Avi Kivity

From: Marcelo Tosatti mtosa...@redhat.com

It's safer to disable the only problematic user of hypercall patching,
pvmmu.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 68e8c89..bb9a24a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3243,17 +3243,12 @@ static int emulator_write_emulated_onepage(unsigned 
long addr,
   const void *val,
   unsigned int bytes,
   struct kvm_vcpu *vcpu,
-  bool guest_initiated,
   bool mmu_only)
 {
gpa_t gpa;
u32 error_code;
 
-
-   if (guest_initiated)
-   gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
-   else
-   gpa = kvm_mmu_gva_to_gpa_system(vcpu, addr, &error_code);
+   gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
 
if (gpa == UNMAPPED_GVA) {
kvm_inject_page_fault(vcpu, addr, error_code);
@@ -3292,7 +3287,6 @@ int __emulator_write_emulated(unsigned long addr,
   const void *val,
   unsigned int bytes,
   struct kvm_vcpu *vcpu,
-  bool guest_initiated,
   bool mmu_only)
 {
/* Crossing a page boundary? */
@@ -3301,7 +3295,7 @@ int __emulator_write_emulated(unsigned long addr,
 
now = -addr & ~PAGE_MASK;
rc = emulator_write_emulated_onepage(addr, val, now, vcpu,
-guest_initiated, mmu_only);
+mmu_only);
if (rc != X86EMUL_CONTINUE)
return rc;
addr += now;
@@ -3309,7 +3303,7 @@ int __emulator_write_emulated(unsigned long addr,
bytes -= now;
}
return emulator_write_emulated_onepage(addr, val, bytes, vcpu,
-  guest_initiated, mmu_only);
+  mmu_only);
 }
 
 int emulator_write_emulated(unsigned long addr,
@@ -3317,7 +3311,7 @@ int emulator_write_emulated(unsigned long addr,
   unsigned int bytes,
   struct kvm_vcpu *vcpu)
 {
-   return __emulator_write_emulated(addr, val, bytes, vcpu, true, false);
+   return __emulator_write_emulated(addr, val, bytes, vcpu, false);
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
@@ -3381,7 +3375,7 @@ static int emulator_cmpxchg_emulated(unsigned long addr,
if (!exchanged)
return X86EMUL_CMPXCHG_FAILED;
 
-   return __emulator_write_emulated(addr, new, bytes, vcpu, true, true);
+   return __emulator_write_emulated(addr, new, bytes, vcpu, true);
 
 emul_write:
printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
@@ -4083,8 +4077,7 @@ int kvm_fix_hypercall(struct kvm_vcpu *vcpu)
 
kvm_x86_ops->patch_hypercall(vcpu, instruction);
 
-   return __emulator_write_emulated(rip, instruction, 3, vcpu,
-false, false);
+   return __emulator_write_emulated(rip, instruction, 3, vcpu, false);
 }
 
 void realmode_lgdt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base)

[COMMIT master] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Currently, when a string instruction is only partially complete, we go
back to guest mode, the guest tries to reexecute the instruction and
exits again, and at this point emulation continues. Avoid all of this by
restarting the instruction without going back to guest mode, but return
to guest mode every 1024 iterations to allow interrupt injection. A
pending exception causes immediate guest entry too.
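
A back-of-the-envelope model of the batching, under the assumption that the only reasons to leave the emulator are "RCX reached zero" and "RCX hit a multiple of 1024" (pending exceptions are not modelled). rep_guest_entries() is a made-up helper that counts how often control returns to the guest for a given iteration count.

```c
/*
 * Count guest entries for a REP-prefixed string instruction with
 * `count` iterations. The emulator keeps restarting the instruction
 * internally and only hands control back when the low 10 bits of the
 * remaining count are zero (which includes the final count of 0).
 */
static unsigned long rep_guest_entries(unsigned long count)
{
	unsigned long rcx = count;
	unsigned long entries = 0;

	if (rcx == 0)
		return 1;	/* nothing to do: resume the guest at once */

	while (rcx != 0) {
		/* ... one iteration of the string instruction ... */
		rcx--;
		if (!(rcx & 0x3ff))
			entries++;	/* leave the emulator, enter the guest */
	}
	return entries;
}
```

In this model a 4096-iteration instruction costs four guest entries, where the previous behaviour described above re-entered the guest after every partial completion.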

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 679245c..7fda16f 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -193,6 +193,7 @@ struct x86_emulate_ctxt {
/* interruptibility state, as a result of execution of STI or MOV SS */
int interruptibility;
 
+   bool restart; /* restart string instruction after writeback */
/* decode cache */
struct decode_cache decode;
 };
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c20..0467e9f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -927,8 +927,11 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
int mode = ctxt->mode;
int def_op_bytes, def_ad_bytes, group;
 
-   /* Shadow copy of register state. Committed on successful emulation. */
 
+   /* we cannot decode insn before we complete previous rep insn */
+   WARN_ON(ctxt->restart);
+
+   /* Shadow copy of register state. Committed on successful emulation. */
memset(c, 0, sizeof(struct decode_cache));
c->eip = ctxt->eip;
ctxt->cs_base = seg_base(ctxt, VCPU_SREG_CS);
@@ -2422,6 +2425,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
u64 msr_data;
struct decode_cache *c = &ctxt->decode;
int rc = X86EMUL_CONTINUE;
+   int saved_dst_type = c->dst.type;
 
ctxt->interruptibility = 0;
 
@@ -2450,8 +2454,11 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
}
 
if (c->rep_prefix && (c->d & String)) {
+   ctxt->restart = true;
/* All REP prefixes have the same first termination condition */
if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) {
+   string_done:
+   ctxt->restart = false;
kvm_rip_write(ctxt->vcpu, c->eip);
goto done;
}
@@ -2463,17 +2470,13 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
 *  - if REPNE/REPNZ and ZF = 1 then done
 */
if ((c->b == 0xa6) || (c->b == 0xa7) ||
-   (c->b == 0xae) || (c->b == 0xaf)) {
+   (c->b == 0xae) || (c->b == 0xaf)) {
if ((c->rep_prefix == REPE_PREFIX) &&
-   ((ctxt->eflags & EFLG_ZF) == 0)) {
-   kvm_rip_write(ctxt->vcpu, c->eip);
-   goto done;
-   }
+   ((ctxt->eflags & EFLG_ZF) == 0))
+   goto string_done;
if ((c->rep_prefix == REPNE_PREFIX) &&
-   ((ctxt->eflags & EFLG_ZF) == EFLG_ZF)) {
-   kvm_rip_write(ctxt->vcpu, c->eip);
-   goto done;
-   }
+   ((ctxt->eflags & EFLG_ZF) == EFLG_ZF))
+   goto string_done;
}
c->eip = ctxt->eip;
}
@@ -2907,6 +2910,12 @@ writeback:
if (rc != X86EMUL_CONTINUE)
goto done;
 
+   /*
+* restore dst type in case the decoding will be reused
+* (happens for string instruction )
+*/
+   c->dst.type = saved_dst_type;
+
if ((c->d & SrcMask) == SrcSI)
string_addr_inc(ctxt, seg_override_base(ctxt, c), VCPU_REGS_RSI,
&c->src);
@@ -2914,8 +2923,11 @@ writeback:
if ((c->d & DstMask) == DstDI)
string_addr_inc(ctxt, es_base(ctxt), VCPU_REGS_RDI, &c->dst);
 
-   if (c->rep_prefix && (c->d & String))
+   if (c->rep_prefix && (c->d & String)) {
register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
+   if (!(c->regs[VCPU_REGS_RCX] & 0x3ff))
+   ctxt->restart = false;
+   }
 
/* Commit shadow register state. */
memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b96d629..dede682 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3755,6 +3755,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
return EMULATE_DONE;
}
 
+restart:
r = x86_emulate_insn(&vcpu->arch.emulate_ctxt,

[COMMIT master] KVM: small kvm_arch_vcpu_ioctl_run() cleanup.

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Unify all conditions that get us back into the emulator after returning
from userspace.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dede682..68e8c89 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4543,33 +4543,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
struct kvm_run *kvm_run)
if (!irqchip_in_kernel(vcpu->kvm))
kvm_set_cr8(vcpu, kvm_run->cr8);
 
-   if (vcpu->arch.pio.count) {
-   vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-   r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE);
-   srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-   if (r == EMULATE_DO_MMIO) {
-   r = 0;
-   goto out;
+   if (vcpu->arch.pio.count || vcpu->mmio_needed ||
+   vcpu->arch.emulate_ctxt.restart) {
+   if (vcpu->mmio_needed) {
+   memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8);
+   vcpu->mmio_read_completed = 1;
+   vcpu->mmio_needed = 0;
}
-   }
-   if (vcpu->mmio_needed) {
-   memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8);
-   vcpu->mmio_read_completed = 1;
-   vcpu->mmio_needed = 0;
-
-   vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-   r = emulate_instruction(vcpu, vcpu->arch.mmio_fault_cr2, 0,
-   EMULTYPE_NO_DECODE);
-   srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-   if (r == EMULATE_DO_MMIO) {
-   /*
-* Read-modify-write.  Back to userspace.
-*/
-   r = 0;
-   goto out;
-   }
-   }
-   if (vcpu->arch.emulate_ctxt.restart) {
vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE);
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);

[COMMIT master] KVM: x86 emulator: fix in/out emulation.

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

in/out emulation is broken now. The breakage differs depending on where
the IO device resides. If it is in userspace, the emulator reports
emulation failure since it incorrectly interprets the kvm_emulate_pio()
return value. If the IO device is in the kernel, emulation of 'in' will
do nothing since kvm_emulate_pio() stores the result directly into vcpu
registers, so the emulator will overwrite the result of emulation during
commit of the shadowed registers.
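
The in-kernel case can be reproduced with a toy model of the shadow register copy. This is a sketch, not kernel code: struct vcpu here has a single register, and in_direct() / in_via_shadow() are invented names for "write the device value straight into the vcpu" versus "write it into the shadow copy that is later committed".

```c
#include <string.h>

enum { REG_RAX, NR_REGS };

/* Toy vcpu: just the architectural register file. */
struct vcpu {
	unsigned long regs[NR_REGS];
};

/* Broken flow: the result bypasses the shadow copy and is then lost. */
static unsigned long in_direct(struct vcpu *vcpu, unsigned long device_val)
{
	unsigned long shadow[NR_REGS];

	memcpy(shadow, vcpu->regs, sizeof shadow);	/* start of emulation */
	vcpu->regs[REG_RAX] = device_val;		/* written behind the emulator's back */
	memcpy(vcpu->regs, shadow, sizeof shadow);	/* commit overwrites the result */
	return vcpu->regs[REG_RAX];
}

/* Fixed flow: the result goes through the shadow copy and survives. */
static unsigned long in_via_shadow(struct vcpu *vcpu, unsigned long device_val)
{
	unsigned long shadow[NR_REGS];

	memcpy(shadow, vcpu->regs, sizeof shadow);
	shadow[REG_RAX] = device_val;
	memcpy(vcpu->regs, shadow, sizeof shadow);
	return vcpu->regs[REG_RAX];
}
```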

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index bd46929..679245c 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -119,6 +119,13 @@ struct x86_emulate_ops {
const void *new,
unsigned int bytes,
struct kvm_vcpu *vcpu);
+
+   int (*pio_in_emulated)(int size, unsigned short port, void *val,
+  unsigned int count, struct kvm_vcpu *vcpu);
+
+   int (*pio_out_emulated)(int size, unsigned short port, const void *val,
+   unsigned int count, struct kvm_vcpu *vcpu);
+
bool (*get_cached_descriptor)(struct desc_struct *desc,
  int seg, struct kvm_vcpu *vcpu);
void (*set_cached_descriptor)(struct desc_struct *desc,
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b99cec1..776d3e2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -590,8 +590,7 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 
data);
 
 struct x86_emulate_ctxt;
 
-int kvm_emulate_pio(struct kvm_vcpu *vcpu, int in,
-int size, unsigned port);
+int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port);
 int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, int in,
   int size, unsigned long count, int down,
gva_t address, int rep, unsigned port);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a166235..c506137 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -210,13 +210,13 @@ static u32 opcode_table[256] = {
0, 0, 0, 0, 0, 0, 0, 0,
/* 0xE0 - 0xE7 */
0, 0, 0, 0,
-   ByteOp | SrcImmUByte, SrcImmUByte,
-   ByteOp | SrcImmUByte, SrcImmUByte,
+   ByteOp | SrcImmUByte | DstAcc, SrcImmUByte | DstAcc,
+   ByteOp | SrcImmUByte | DstAcc, SrcImmUByte | DstAcc,
/* 0xE8 - 0xEF */
SrcImm | Stack, SrcImm | ImplicitOps,
SrcImmU | Src2Imm16 | No64, SrcImmByte | ImplicitOps,
-   SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps,
-   SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps,
+   SrcNone | ByteOp | DstAcc, SrcNone | DstAcc,
+   SrcNone | ByteOp | DstAcc, SrcNone | DstAcc,
/* 0xF0 - 0xF7 */
0, 0, 0, 0,
ImplicitOps | Priv, ImplicitOps, Group | Group3_Byte, Group | Group3,
@@ -2422,8 +2422,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
u64 msr_data;
unsigned long saved_eip = 0;
struct decode_cache *c = &ctxt->decode;
-   unsigned int port;
-   int io_dir_in;
int rc = X86EMUL_CONTINUE;
 
ctxt->interruptibility = 0;
@@ -2819,14 +2817,10 @@ special_insn:
break;
case 0xe4:  /* inb */
case 0xe5:  /* in */
-   port = c->src.val;
-   io_dir_in = 1;
-   goto do_io;
+   goto do_io_in;
case 0xe6: /* outb */
case 0xe7: /* out */
-   port = c->src.val;
-   io_dir_in = 0;
-   goto do_io;
+   goto do_io_out;
case 0xe8: /* call (near) */ {
long int rel = c->src.val;
c->src.val = (unsigned long) c->eip;
@@ -2851,25 +2845,29 @@ special_insn:
break;
case 0xec: /* in al,dx */
case 0xed: /* in (e/r)ax,dx */
-   port = c->regs[VCPU_REGS_RDX];
-   io_dir_in = 1;
-   goto do_io;
+   c->src.val = c->regs[VCPU_REGS_RDX];
+   do_io_in:
+   c->dst.bytes = min(c->dst.bytes, 4u);
+   if (!emulator_io_permited(ctxt, ops, c->src.val, c->dst.bytes)) {
+   kvm_inject_gp(ctxt->vcpu, 0);
+   goto done;
+   }
+   if (!ops->pio_in_emulated(c->dst.bytes, c->src.val,
+ &c->dst.val, 1, ctxt->vcpu))
+   goto done; /* IO is needed */
+   break;
case 0xee: /* out al,dx */
case 0xef: /* out (e/r)ax,dx */
-   port = c->regs[VCPU_REGS_RDX];
-   io_dir_in = 0;
-   do_io:
-   if (!emulator_io_permited(ctxt, ops,

[COMMIT master] KVM: x86 emulator: Move string pio emulation into emulator.c

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Currently string pio emulation is done outside of the emulator, so
things like ins/outs to/from mmio are broken. It also makes it hard (if
not impossible) to implement single stepping in the future. The
implementation in this patch is not efficient, since it exits to
userspace for each IO, while the previous implementation did 'ins' in
batches. A further patch that implements pio in string read ahead
addresses this problem.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 776d3e2..26c629a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -224,14 +224,9 @@ struct kvm_pv_mmu_op_buffer {
 
 struct kvm_pio_request {
unsigned long count;
-   int cur_count;
-   gva_t guest_gva;
int in;
int port;
int size;
-   int string;
-   int down;
-   int rep;
 };
 
 /*
@@ -591,9 +586,6 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
 struct x86_emulate_ctxt;
 
 int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port);
-int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, int in,
-  int size, unsigned long count, int down,
-   gva_t address, int rep, unsigned port);
 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
 int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c506137..b3ff673 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -153,8 +153,8 @@ static u32 opcode_table[256] = {
0, 0, 0, 0,
/* 0x68 - 0x6F */
SrcImm | Mov | Stack, 0, SrcImmByte | Mov | Stack, 0,
-   SrcNone  | ByteOp  | ImplicitOps, SrcNone  | ImplicitOps, /* insb, insw/insd */
-   SrcNone  | ByteOp  | ImplicitOps, SrcNone  | ImplicitOps, /* outsb, outsw/outsd */
+   DstDI | ByteOp | Mov | String, DstDI | Mov | String, /* insb, insw/insd */
+   SrcSI | ByteOp | ImplicitOps | String, SrcSI | ImplicitOps | String, /* outsb, outsw/outsd */
/* 0x70 - 0x77 */
SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte,
SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte,
@@ -2611,47 +2611,29 @@ special_insn:
break;
case 0x6c:  /* insb */
case 0x6d:  /* insw/insd */
+   c->dst.bytes = min(c->dst.bytes, 4u);
if (!emulator_io_permited(ctxt, ops, c->regs[VCPU_REGS_RDX],
- (c->d & ByteOp) ? 1 : c->op_bytes)) {
+ c->dst.bytes)) {
kvm_inject_gp(ctxt->vcpu, 0);
goto done;
}
-   if (kvm_emulate_pio_string(ctxt->vcpu,
-   1,
-   (c->d & ByteOp) ? 1 : c->op_bytes,
-   c->rep_prefix ?
-   address_mask(c, c->regs[VCPU_REGS_RCX]) : 1,
-   (ctxt->eflags & EFLG_DF),
-   register_address(c, es_base(ctxt),
-   c->regs[VCPU_REGS_RDI]),
-   c->rep_prefix,
-   c->regs[VCPU_REGS_RDX]) == 0) {
-   c->eip = saved_eip;
-   return -1;
-   }
-   return 0;
+   if (!ops->pio_in_emulated(c->dst.bytes, c->regs[VCPU_REGS_RDX],
+ &c->dst.val, 1, ctxt->vcpu))
+   goto done; /* IO is needed, skip writeback */
+   break;
case 0x6e:  /* outsb */
case 0x6f:  /* outsw/outsd */
+   c->src.bytes = min(c->src.bytes, 4u);
if (!emulator_io_permited(ctxt, ops, c->regs[VCPU_REGS_RDX],
- (c->d & ByteOp) ? 1 : c->op_bytes)) {
+ c->src.bytes)) {
kvm_inject_gp(ctxt->vcpu, 0);
goto done;
}
-   if (kvm_emulate_pio_string(ctxt->vcpu,
-   0,
-   (c->d & ByteOp) ? 1 : c->op_bytes,
-   c->rep_prefix ?
-   address_mask(c, c->regs[VCPU_REGS_RCX]) : 1,
-   (ctxt->eflags & EFLG_DF),
-   register_address(c,
-   seg_override_base(ctxt, c),
-   c->regs[VCPU_REGS_RSI]),
-   c->rep_prefix,
-   c->regs[VCPU_REGS_RDX]) == 0) {
-   c->eip = saved_eip;
-   return

[COMMIT master] KVM: x86 emulator: introduce pio in string read ahead.

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

To optimize the rep ins instruction, do IO in big chunks ahead of time
instead of doing it only when required during instruction emulation.
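
A minimal user-space sketch of the read-ahead idea, under stated
assumptions: ra_cache, ra_backend_fn, ra_fake_backend and ra_read are
hypothetical names, and the backend merely fills the buffer with a
constant instead of performing port IO. The sketch leaves out the
page-boundary limit that the patch applies; it is only meant to show how
one backend call can serve many single-item reads.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the read_cache added by the patch. */
struct ra_cache {
	unsigned char data[1024];
	size_t pos;	/* offset of the next unread byte */
	size_t end;	/* number of valid bytes in data */
};

/* Hypothetical backend: fills buf with n items of `size` bytes, 1 on success. */
typedef int (*ra_backend_fn)(void *buf, unsigned int size, unsigned int n);

/* Counts backend invocations so a caller can observe the batching. */
static unsigned int ra_backend_calls;

static int ra_fake_backend(void *buf, unsigned int size, unsigned int n)
{
	ra_backend_calls++;
	memset(buf, 0xab, (size_t)size * n);
	return 1;
}

/*
 * Copy one item of `size` bytes to dest. When the cache is drained,
 * refill it with up to `remaining` items in a single backend call.
 */
static int ra_read(struct ra_cache *rc, ra_backend_fn backend,
		   unsigned int size, unsigned int remaining, void *dest)
{
	assert(size > 0 && size <= sizeof(rc->data));

	if (rc->pos == rc->end) {	/* refill the read-ahead buffer */
		unsigned int n = sizeof(rc->data) / size;

		if (n > remaining)
			n = remaining;
		if (n == 0)
			n = 1;
		rc->pos = rc->end = 0;
		if (!backend(rc->data, size, n))
			return 0;
		rc->end = (size_t)n * size;
	}

	memcpy(dest, rc->data + rc->pos, size);
	rc->pos += size;
	return 1;
}
```

With 4-byte items and 300 iterations remaining, the first refill fetches
256 items (the whole 1024-byte buffer) and the second fetches the last
44, so the backend runs twice rather than 300 times.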

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 7fda16f..b5e12c5 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -151,6 +151,12 @@ struct fetch_cache {
unsigned long end;
 };
 
+struct read_cache {
+   u8 data[1024];
+   unsigned long pos;
+   unsigned long end;
+};
+
 struct decode_cache {
u8 twobyte;
u8 b;
@@ -178,6 +184,7 @@ struct decode_cache {
void *modrm_ptr;
unsigned long modrm_val;
struct fetch_cache fetch;
+   struct read_cache io_read;
 };
 
 struct x86_emulate_ctxt {
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0467e9f..266576c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1257,6 +1257,36 @@ done:
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }
 
+static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
+  struct x86_emulate_ops *ops,
+  unsigned int size, unsigned short port,
+  void *dest)
+{
+   struct read_cache *rc = &ctxt->decode.io_read;
+
+   if (rc->pos == rc->end) { /* refill pio read ahead */
+   struct decode_cache *c = &ctxt->decode;
+   unsigned int in_page, n;
+   unsigned int count = c->rep_prefix ?
+   address_mask(c, c->regs[VCPU_REGS_RCX]) : 1;
+   in_page = (ctxt->eflags & EFLG_DF) ?
+   offset_in_page(c->regs[VCPU_REGS_RDI]) :
+   PAGE_SIZE - offset_in_page(c->regs[VCPU_REGS_RDI]);
+   n = min(min(in_page, (unsigned int)sizeof(rc->data)) / size,
+   count);
+   if (n == 0)
+   n = 1;
+   rc->pos = rc->end = 0;
+   if (!ops->pio_in_emulated(size, port, rc->data, n, ctxt->vcpu))
+   return 0;
+   rc->end = n * size;
+   }
+
+   memcpy(dest, rc->data + rc->pos, size);
+   rc->pos += size;
+   return 1;
+}
+
 static u32 desc_limit_scaled(struct desc_struct *desc)
 {
u32 limit = get_desc_limit(desc);
@@ -2618,8 +2648,8 @@ special_insn:
kvm_inject_gp(ctxt->vcpu, 0);
goto done;
}
-   if (!ops->pio_in_emulated(c->dst.bytes, c->regs[VCPU_REGS_RDX],
- &c->dst.val, 1, ctxt->vcpu))
+   if (!pio_in_emulated(ctxt, ops, c->dst.bytes,
+   c->regs[VCPU_REGS_RDX], &c->dst.val))
goto done; /* IO is needed, skip writeback */
break;
case 0x6e:  /* outsb */
@@ -2835,8 +2865,8 @@ special_insn:
kvm_inject_gp(ctxt->vcpu, 0);
goto done;
}
-   if (!ops->pio_in_emulated(c->dst.bytes, c->src.val,
- &c->dst.val, 1, ctxt->vcpu))
+   if (!pio_in_emulated(ctxt, ops, c->dst.bytes, c->src.val,
+   &c->dst.val))
goto done; /* IO is needed */
break;
case 0xee: /* out al,dx */
@@ -2924,8 +2954,14 @@ writeback:
string_addr_inc(ctxt, es_base(ctxt), VCPU_REGS_RDI, &c->dst);
 
if (c->rep_prefix && (c->d & String)) {
+   struct read_cache *rc = &ctxt->decode.io_read;
register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
-   if (!(c->regs[VCPU_REGS_RCX] & 0x3ff))
+   /*
+    * Re-enter guest when pio read ahead buffer is empty or,
+    * if it is not used, after each 1024 iteration.
+    */
+   if ((rc->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) ||
+   (rc->end != 0 && rc->end == rc->pos))
ctxt->restart = false;
}
 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: x86 emulator: fix 0f 01 /5 emulation

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

It is undefined and should generate #UD.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c3b9334..7c7debb 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2490,6 +2490,9 @@ twobyte_insn:
(c->src.val & 0x0f), ctxt->vcpu);
c->dst.type = OP_NONE;
break;
+   case 5: /* not defined */
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   goto done;
case 7: /* invlpg*/
emulate_invlpg(ctxt->vcpu, memop);
/* Disable writeback. */

[COMMIT master] KVM: x86 emulator: populate OP_MEM operand during decoding.

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

All struct operand fields are initialized during decoding for all
operand types except OP_MEM, but there is no reason for that. Move
OP_MEM operand initialization into decoding stage for consistency.
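
The BitOp adjustment that this patch moves into the DstMem decode path
can be illustrated in isolation. The helper below is a hypothetical
user-space sketch, not the emulator code: it treats the bit offset as
unsigned, as the patch does with the source operand value, and returns
the byte displacement that would be added to the destination pointer.

```c
#include <assert.h>
#include <stdint.h>

/*
 * For bit-test style instructions with a memory destination, the bit
 * offset may point outside the addressed word. Round the offset down
 * to a multiple of the operand width in bits, then convert it to a
 * byte displacement.
 */
static uint64_t bitop_byte_displacement(uint64_t bit_offset,
					unsigned int op_bytes)
{
	uint64_t mask;

	assert(op_bytes == 2 || op_bytes == 4 || op_bytes == 8);
	mask = ~((uint64_t)op_bytes * 8 - 1);
	return (bit_offset & mask) / 8;
}
```

For a 4-byte operand, bit offsets 0 to 31 stay in the addressed word
(displacement 0), while bit offset 100 lands in the word at displacement
12, since 100 rounded down to a multiple of 32 is 96 bits.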

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 702bfff..55b8a8b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1057,6 +1057,10 @@ done_prefixes:
 
if (c->ad_bytes != 8)
c->modrm_ea = (u32)c->modrm_ea;
+
+   if (c->rip_relative)
+   c->modrm_ea += c->eip;
+
/*
 * Decode and fetch the source operand: register, memory
 * or immediate.
@@ -1091,6 +1095,8 @@ done_prefixes:
break;
}
c->src.type = OP_MEM;
+   c->src.ptr = (unsigned long *)c->modrm_ea;
+   c->src.val = 0;
break;
case SrcImm:
case SrcImmU:
@@ -1169,8 +1175,10 @@ done_prefixes:
c->src2.val = 1;
break;
case Src2Mem16:
-   c->src2.bytes = 2;
c->src2.type = OP_MEM;
+   c->src2.bytes = 2;
+   c->src2.ptr = (unsigned long *)(c->modrm_ea + c->src.bytes);
+   c->src2.val = 0;
break;
}
 
@@ -1192,6 +1200,15 @@ done_prefixes:
break;
}
c->dst.type = OP_MEM;
+   c->dst.ptr = (unsigned long *)c->modrm_ea;
+   c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
+   c->dst.val = 0;
+   if (c->d & BitOp) {
+   unsigned long mask = ~(c->dst.bytes * 8 - 1);
+
+   c->dst.ptr = (void *)c->dst.ptr +
+  (c->src.val & mask) / 8;
+   }
break;
case DstAcc:
c->dst.type = OP_REG;
@@ -1215,9 +1232,6 @@ done_prefixes:
break;
}
 
-   if (c->rip_relative)
-   c->modrm_ea += c->eip;
-
 done:
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }
@@ -1638,14 +1652,13 @@ static inline int emulate_grp45(struct x86_emulate_ctxt *ctxt,
 }
 
 static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt,
-  struct x86_emulate_ops *ops,
-  unsigned long memop)
+  struct x86_emulate_ops *ops)
 {
struct decode_cache *c = &ctxt->decode;
u64 old, new;
int rc;
 
-   rc = ops->read_emulated(memop, &old, 8, ctxt->vcpu);
+   rc = ops->read_emulated(c->modrm_ea, &old, 8, ctxt->vcpu);
if (rc != X86EMUL_CONTINUE)
return rc;
 
@@ -1660,7 +1673,7 @@ static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt,
new = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
   (u32) c->regs[VCPU_REGS_RBX];
 
-   rc = ops->cmpxchg_emulated(memop, &old, &new, 8, ctxt->vcpu);
+   rc = ops->cmpxchg_emulated(c->modrm_ea, &old, &new, 8, ctxt->vcpu);
if (rc != X86EMUL_CONTINUE)
return rc;
ctxt->eflags |= EFLG_ZF;
@@ -2378,7 +2391,6 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
 int
 x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
-   unsigned long memop = 0;
u64 msr_data;
unsigned long saved_eip = 0;
struct decode_cache *c = &ctxt->decode;
@@ -2413,9 +2425,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
goto done;
}
 
-   if (((c->d & ModRM) && (c->modrm_mod != 3)) || (c->d & MemAbs))
-   memop = c->modrm_ea;
-
if (c->rep_prefix && (c->d & String)) {
/* All REP prefixes have the same first termination condition */
if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) {
@@ -2447,8 +2456,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
}
 
if (c->src.type == OP_MEM) {
-   c->src.ptr = (unsigned long *)memop;
-   c->src.val = 0;
rc = ops->read_emulated((unsigned long)c->src.ptr,
&c->src.val,
c->src.bytes,
@@ -2459,8 +2466,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
}
 
if (c->src2.type == OP_MEM) {
-   c->src2.ptr = (unsigned long *)(memop + c->src.bytes);
-   c->src2.val = 0;
rc = ops->read_emulated((unsigned long)c->src2.ptr,
&c->src2.val,
c->src2.bytes,
@@ -2473,25 +2478,12 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
goto special_insn;
 
 
-

[COMMIT master] KVM: x86 emulator: Emulate task switch in emulator.c

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Implement emulation of 16/32 bit task switch in emulator.c
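
The descriptor helpers introduced by this patch rely on three small
pieces of arithmetic: splitting a selector, scaling a segment limit, and
bounds-checking a descriptor against its table. The following is a
hypothetical user-space sketch of those calculations (selector_split,
limit_scaled and descriptor_in_table are illustrative names, not the
functions in the patch).

```c
#include <assert.h>
#include <stdint.h>

struct selector_fields {
	unsigned int index;	/* descriptor index within the table */
	unsigned int ti;	/* table indicator: 0 = GDT, 1 = LDT */
	unsigned int rpl;	/* requested privilege level */
};

static struct selector_fields selector_split(uint16_t sel)
{
	struct selector_fields f = {
		.index = sel >> 3,
		.ti = (sel >> 2) & 1,
		.rpl = sel & 3,
	};
	return f;
}

/* With the granularity bit set, the 20-bit limit counts 4 KiB pages. */
static uint32_t limit_scaled(uint32_t limit20, int granularity)
{
	assert(limit20 <= 0xfffff);
	return granularity ? (limit20 << 12) | 0xfff : limit20;
}

/* 1 if the 8-byte descriptor at `index` fits within the table limit. */
static int descriptor_in_table(uint32_t table_limit, unsigned int index)
{
	return table_limit >= index * 8 + 7;
}
```

Selector 0x2b, for instance, splits into index 5, table indicator 0
(GDT) and RPL 3, and its descriptor occupies bytes 40 to 47 of the
table, so the table limit must be at least 47.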

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index f901467..bd46929 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -11,6 +11,8 @@
 #ifndef _ASM_X86_KVM_X86_EMULATE_H
 #define _ASM_X86_KVM_X86_EMULATE_H
 
+#include <asm/desc_defs.h>
+
 struct x86_emulate_ctxt;
 
 /*
@@ -210,5 +212,8 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt,
struct x86_emulate_ops *ops);
 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt,
 struct x86_emulate_ops *ops);
+int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops,
+u16 tss_selector, int reason);
 
 #endif /* _ASM_X86_KVM_X86_EMULATE_H */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index d696cbd..db4776c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -33,6 +33,7 @@
 #include <asm/kvm_emulate.h>
 
 #include "x86.h"
+#include "tss.h"
 
 /*
  * Opcode effective-address decode tables.
@@ -1221,6 +1222,198 @@ done:
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }
 
+static u32 desc_limit_scaled(struct desc_struct *desc)
+{
+   u32 limit = get_desc_limit(desc);
+
+   return desc->g ? (limit << 12) | 0xfff : limit;
+}
+
+static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops,
+u16 selector, struct desc_ptr *dt)
+{
+   if (selector & 1 << 2) {
+   struct desc_struct desc;
+   memset (dt, 0, sizeof *dt);
+   if (!ops->get_cached_descriptor(&desc, VCPU_SREG_LDTR, ctxt->vcpu))
+   return;
+
+   dt->size = desc_limit_scaled(&desc); /* what if limit > 65535? */
+   dt->address = get_desc_base(&desc);
+   } else
+   ops->get_gdt(dt, ctxt->vcpu);
+}
+
+/* allowed just for 8 bytes segments */
+static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+  struct x86_emulate_ops *ops,
+  u16 selector, struct desc_struct *desc)
+{
+   struct desc_ptr dt;
+   u16 index = selector >> 3;
+   int ret;
+   u32 err;
+   ulong addr;
+
+   get_descriptor_table_ptr(ctxt, ops, selector, &dt);
+
+   if (dt.size < index * 8 + 7) {
+   kvm_inject_gp(ctxt->vcpu, selector & 0xfffc);
+   return X86EMUL_PROPAGATE_FAULT;
+   }
+   addr = dt.address + index * 8;
+   ret = ops->read_std(addr, desc, sizeof *desc, ctxt->vcpu,  &err);
+   if (ret == X86EMUL_PROPAGATE_FAULT)
+   kvm_inject_page_fault(ctxt->vcpu, addr, err);
+
+   return ret;
+}
+
+/* allowed just for 8 bytes segments */
+static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+   struct x86_emulate_ops *ops,
+   u16 selector, struct desc_struct *desc)
+{
+   struct desc_ptr dt;
+   u16 index = selector >> 3;
+   u32 err;
+   ulong addr;
+   int ret;
+
+   get_descriptor_table_ptr(ctxt, ops, selector, &dt);
+
+   if (dt.size < index * 8 + 7) {
+   kvm_inject_gp(ctxt->vcpu, selector & 0xfffc);
+   return X86EMUL_PROPAGATE_FAULT;
+   }
+
+   addr = dt.address + index * 8;
+   ret = ops->write_std(addr, desc, sizeof *desc, ctxt->vcpu, &err);
+   if (ret == X86EMUL_PROPAGATE_FAULT)
+   kvm_inject_page_fault(ctxt->vcpu, addr, err);
+
+   return ret;
+}
+
+static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+  struct x86_emulate_ops *ops,
+  u16 selector, int seg)
+{
+   struct desc_struct seg_desc;
+   u8 dpl, rpl, cpl;
+   unsigned err_vec = GP_VECTOR;
+   u32 err_code = 0;
+   bool null_selector = !(selector & ~0x3); /* 0000-0003 are null */
+   int ret;
+
+   memset(&seg_desc, 0, sizeof seg_desc);
+
+   if ((seg <= VCPU_SREG_GS && ctxt->mode == X86EMUL_MODE_VM86)
+   || ctxt->mode == X86EMUL_MODE_REAL) {
+   /* set real mode segment descriptor */
+   set_desc_base(&seg_desc, selector << 4);
+   set_desc_limit(&seg_desc, 0xffff);
+   seg_desc.type = 3;
+   seg_desc.p = 1;
+   seg_desc.s = 1;
+   goto load;
+   }
+
+   /* NULL selector is not valid for TR, CS and SS */
+   if ((seg == VCPU_SREG_CS || seg == VCPU_SREG_SS || seg == VCPU_SREG_TR)
+   && null_selector)
+   goto exception;
+
+   /* TR should be in GDT only */
+   if (seg == VCPU_SREG_TR && (selector & (1 << 2)))
+   goto exception;
+
+

[COMMIT master] KVM: x86 emulator: do not call writeback if msr access fails.

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1393bf0..b89a8f2 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2563,7 +2563,7 @@ twobyte_insn:
| ((u64)c->regs[VCPU_REGS_RDX] << 32);
if (kvm_set_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], msr_data)) {
kvm_inject_gp(ctxt->vcpu, 0);
-   c->eip = ctxt->eip;
+   goto done;
}
rc = X86EMUL_CONTINUE;
c->dst.type = OP_NONE;
@@ -2572,7 +2572,7 @@ twobyte_insn:
/* rdmsr */
if (kvm_get_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], &msr_data)) {
kvm_inject_gp(ctxt->vcpu, 0);
-   c->eip = ctxt->eip;
+   goto done;
} else {
c->regs[VCPU_REGS_RAX] = (u32)msr_data;
c->regs[VCPU_REGS_RDX] = msr_data >> 32;

[COMMIT master] KVM: Provide callback to get/set control registers in emulator ops.

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Use these callbacks instead of calling kvm functions directly. Also rename
realmode_(set|get)_cr to emulator_(set|get)_cr, since the functions have
nothing to do with real mode.
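
As a rough illustration of the callback approach, here is a
self-contained user-space sketch. The toy_vcpu and toy_emulate_ops types
and the toy_* functions are hypothetical stand-ins; only mk_cr_64()
mirrors a helper from the patch, and the real callbacks go through
kvm_set_cr0() and friends rather than writing an array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, much reduced stand-in for struct kvm_vcpu. */
struct toy_vcpu {
	uint64_t cr[9];		/* indexed by control register number */
};

/* Callback table in the spirit of struct x86_emulate_ops. */
struct toy_emulate_ops {
	uint64_t (*get_cr)(int cr, struct toy_vcpu *vcpu);
	void (*set_cr)(int cr, uint64_t val, struct toy_vcpu *vcpu);
};

/* Keep the upper 32 bits of the current value, replace the lower 32. */
static uint64_t mk_cr_64(uint64_t curr_cr, uint32_t new_val)
{
	return (curr_cr & ~((1ULL << 32) - 1)) | new_val;
}

static uint64_t toy_get_cr(int cr, struct toy_vcpu *vcpu)
{
	assert(vcpu);
	switch (cr) {
	case 0: case 2: case 3: case 4: case 8:
		return vcpu->cr[cr];
	default:
		return 0;	/* unexpected register number */
	}
}

static void toy_set_cr(int cr, uint64_t val, struct toy_vcpu *vcpu)
{
	assert(vcpu);
	switch (cr) {
	case 0:
	case 4:
		vcpu->cr[cr] = mk_cr_64(vcpu->cr[cr], (uint32_t)val);
		break;
	case 2:
	case 3:
		vcpu->cr[cr] = val;
		break;
	case 8:
		vcpu->cr[8] = val & 0xfULL;
		break;
	default:
		break;		/* unexpected register number: ignored */
	}
}

/* The emulator core would only ever see this table. */
static const struct toy_emulate_ops toy_ops = {
	.get_cr = toy_get_cr,
	.set_cr = toy_set_cr,
};
```

The point of the indirection is that the emulator core calls
ops->get_cr() and ops->set_cr() without knowing how the host stores
control registers, which is what lets the realmode_* names go away.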

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 2666d7a..0c5caa4 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -108,7 +108,8 @@ struct x86_emulate_ops {
const void *new,
unsigned int bytes,
struct kvm_vcpu *vcpu);
-
+   ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
+   void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
 };
 
 /* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 53f5202..9d474c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -586,8 +586,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
 void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
   unsigned long *rflags);
 
-unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr);
-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value);
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 91450b5..5b060e4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2483,7 +2483,7 @@ twobyte_insn:
break;
case 4: /* smsw */
c->dst.bytes = 2;
-   c->dst.val = realmode_get_cr(ctxt->vcpu, 0);
+   c->dst.val = ops->get_cr(0, ctxt->vcpu);
break;
case 6: /* lmsw */
realmode_lmsw(ctxt->vcpu, (u16)c->src.val,
@@ -2519,8 +2519,7 @@ twobyte_insn:
case 0x20: /* mov cr, reg */
if (c->modrm_mod != 3)
goto cannot_emulate;
-   c->regs[c->modrm_rm] =
-   realmode_get_cr(ctxt->vcpu, c->modrm_reg);
+   c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x21: /* mov from dr to reg */
@@ -2534,7 +2533,7 @@ twobyte_insn:
case 0x22: /* mov reg, cr */
if (c->modrm_mod != 3)
goto cannot_emulate;
-   realmode_set_cr(ctxt->vcpu, c->modrm_reg, c->modrm_val);
+   ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu);
c->dst.type = OP_NONE;
break;
case 0x23: /* mov from reg to dr */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 77f0955..b9ace70 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3423,12 +3423,70 @@ void kvm_report_emulation_failure(struct kvm_vcpu *vcpu, const char *context)
 }
 EXPORT_SYMBOL_GPL(kvm_report_emulation_failure);
 
+static u64 mk_cr_64(u64 curr_cr, u32 new_val)
+{
+   return (curr_cr & ~((1ULL << 32) - 1)) | new_val;
+}
+
+static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu)
+{
+   unsigned long value;
+
+   switch (cr) {
+   case 0:
+   value = kvm_read_cr0(vcpu);
+   break;
+   case 2:
+   value = vcpu->arch.cr2;
+   break;
+   case 3:
+   value = vcpu->arch.cr3;
+   break;
+   case 4:
+   value = kvm_read_cr4(vcpu);
+   break;
+   case 8:
+   value = kvm_get_cr8(vcpu);
+   break;
+   default:
+   vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
+   return 0;
+   }
+
+   return value;
+}
+
+static void emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu)
+{
+   switch (cr) {
+   case 0:
+   kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val));
+   break;
+   case 2:
+   vcpu->arch.cr2 = val;
+   break;
+   case 3:
+   kvm_set_cr3(vcpu, val);
+   break;
+   case 4:
+   kvm_set_cr4(vcpu, mk_cr_64(kvm_read_cr4(vcpu), val));
+   break;
+   case 8:
+   kvm_set_cr8(vcpu, val & 0xfUL);
+   break;
+   default:
+   vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
+   }
+}
+
 static struct x86_emulate_ops emulate_ops = {
.read_std= kvm_read_guest_virt_system,
.fetch   = kvm_fetch_guest_virt,
.read_emulated   = emulator_read_emulated,
.write_emulated  =

[COMMIT master] KVM: coalesced_mmio: fix kvm_coalesced_mmio_init()'s error handling

2010-03-21 Thread Avi Kivity

From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

kvm_coalesced_mmio_init() keeps holding the addresses of the coalesced
mmio ring page and dev even after it has freed them.

Also, if this function fails, though that might be rare, it suggests
the system is in a serious state, so we had better stop the work that
follows kvm_create_vm().

This patch fixes these problems.

  We move the coalesced mmio's initialization out of kvm_create_vm().
  This seems natural because it includes a registration which
  can be done only when the vm is successfully created.
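
The two fixes described above follow a common C error-handling pattern,
sketched here in user space under stated assumptions: toy_vm,
toy_mmio_init and toy_vm_create are hypothetical names, malloc/calloc
stand in for the page and device allocations, and fail_register is an
invented knob that simulates a late registration failure.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_vm {
	void *ring;	/* stands in for kvm->coalesced_mmio_ring */
	void *dev;	/* stands in for kvm->coalesced_mmio_dev */
};

/* Returns 0 on success, a negative value on failure. */
static int toy_mmio_init(struct toy_vm *vm, int fail_register)
{
	void *ring, *dev;

	assert(vm);
	ring = malloc(4096);
	if (!ring)
		goto out_err;
	vm->ring = ring;

	dev = calloc(1, 64);
	if (!dev)
		goto out_free_ring;
	vm->dev = dev;

	if (fail_register)
		goto out_free_dev;
	return 0;

out_free_dev:
	vm->dev = NULL;		/* do not leave a dangling pointer behind */
	free(dev);
out_free_ring:
	vm->ring = NULL;
	free(ring);
out_err:
	return -1;
}

/* The caller checks the result and tears the vm down on failure. */
static struct toy_vm *toy_vm_create(int fail_register)
{
	struct toy_vm *vm = calloc(1, sizeof(*vm));

	if (!vm)
		return NULL;
	if (toy_mmio_init(vm, fail_register) < 0) {
		free(vm);
		return NULL;
	}
	return vm;
}
```

Clearing the owner's pointers before freeing means later teardown code
cannot free or dereference stale addresses, and checking the return
value in the caller stops vm creation instead of continuing with a
half-initialized object.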

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 22500d4..66a7391 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -119,8 +119,10 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
return ret;
 
 out_free_dev:
+   kvm->coalesced_mmio_dev = NULL;
kfree(dev);
 out_free_page:
+   kvm->coalesced_mmio_ring = NULL;
__free_page(page);
 out_err:
return ret;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8c3743c..9379533 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -418,9 +418,6 @@ static struct kvm *kvm_create_vm(void)
spin_lock(&kvm_lock);
list_add(&kvm->vm_list, &vm_list);
spin_unlock(&kvm_lock);
-#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-   kvm_coalesced_mmio_init(kvm);
-#endif
 out:
return kvm;
 
@@ -1746,12 +1743,19 @@ static struct file_operations kvm_vm_fops = {
 
 static int kvm_dev_ioctl_create_vm(void)
 {
-   int fd;
+   int fd, r;
struct kvm *kvm;
 
kvm = kvm_create_vm();
if (IS_ERR(kvm))
return PTR_ERR(kvm);
+#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
+   r = kvm_coalesced_mmio_init(kvm);
+   if (r < 0) {
+   kvm_put_kvm(kvm);
+   return r;
+   }
+#endif
fd = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
if (fd < 0)
kvm_put_kvm(kvm);

[COMMIT master] KVM: x86 emulator: 0f (20|21|22|23) ignore mod bits.

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Recent spec says that for 0f (20|21|22|23) the 2 bits in the mod field
are ignored. Interestingly enough, older spec says that 11 is the only
valid encoding.
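
To make the encoding concrete, here is a small user-space sketch of how
a ModRM byte splits into its three fields. modrm_fields, modrm_split and
mov_cr_operands are hypothetical names used only for illustration; the
emulator keeps these values in its decode cache instead.

```c
#include <assert.h>
#include <stdint.h>

struct modrm_fields {
	unsigned int mod;	/* bits 7:6 */
	unsigned int reg;	/* bits 5:3 */
	unsigned int rm;	/* bits 2:0 */
};

static struct modrm_fields modrm_split(uint8_t modrm)
{
	struct modrm_fields f = {
		.mod = (modrm >> 6) & 0x3,
		.reg = (modrm >> 3) & 0x7,
		.rm = modrm & 0x7,
	};
	return f;
}

/*
 * Operands of MOV to/from CR/DR: reg selects the control or debug
 * register, rm selects the general purpose register. The mod field
 * is deliberately not consulted.
 */
static void mov_cr_operands(uint8_t modrm, unsigned int *cr,
			    unsigned int *gpr)
{
	struct modrm_fields f = modrm_split(modrm);

	assert(cr && gpr);
	*cr = f.reg;
	*gpr = f.rm;
}
```

Under this reading, ModRM bytes 0xd8 (mod = 3) and 0x18 (mod = 0) name
the same operands, register number 3 and general purpose register 0,
which is why the mod != 3 checks can be dropped.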

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7c7debb..fa4604e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2520,28 +2520,20 @@ twobyte_insn:
c->dst.type = OP_NONE;
break;
case 0x20: /* mov cr, reg */
-   if (c->modrm_mod != 3)
-   goto cannot_emulate;
c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x21: /* mov from dr to reg */
-   if (c->modrm_mod != 3)
-   goto cannot_emulate;
if (emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]))
goto cannot_emulate;
rc = X86EMUL_CONTINUE;
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x22: /* mov reg, cr */
-   if (c->modrm_mod != 3)
-   goto cannot_emulate;
ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu);
c->dst.type = OP_NONE;
break;
case 0x23: /* mov from reg to dr */
-   if (c->modrm_mod != 3)
-   goto cannot_emulate;
if (emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]))
goto cannot_emulate;
rc = X86EMUL_CONTINUE;

[COMMIT master] KVM: x86 emulator: inject #UD on access to non-existing CR

2010-03-21 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index fa4604e..836e97b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2520,6 +2520,13 @@ twobyte_insn:
c->dst.type = OP_NONE;
break;
case 0x20: /* mov cr, reg */
+   switch (c->modrm_reg) {
+   case 1:
+   case 5 ... 7:
+   case 9 ... 15:
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   goto done;
+   }
c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
c->dst.type = OP_NONE;  /* no writeback */
break;

[COMMIT master] KVM: MMU: Reinstate pte prefetch on invlpg

2010-03-21 Thread Avi Kivity

From: Avi Kivity a...@redhat.com

Commit fb341f57 removed the pte prefetch on guest invlpg, citing guest races.
However, the SDM is adamant that prefetch is allowed:

  The processor may create entries in paging-structure caches for
   translations required for prefetches and for accesses that are a
   result of speculative execution that would never actually occur
   in the executed code path.

And, in fact, there was a race in the prefetch code: we picked up the pte
without the mmu lock held, so an older invlpg could install the pte over
a newer invlpg.

Reinstate the prefetch logic, but this time note whether another invlpg has
executed using a counter.  If a race occurred, do not install the pte.
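
The counter scheme can be sketched in user space as follows. This is an
illustration under assumptions, not the mmu code: toy_mmu and the toy_*
functions are hypothetical, C11 atomics stand in for the kernel's
atomic_t, the mmu lock is only indicated in comments, and a single field
plays the role of the shadow pte.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct toy_mmu {
	atomic_int invlpg_counter;
	uint64_t shadow_pte;	/* hypothetical single shadow entry */
};

/* Invalidation path: drop the entry and bump the counter. */
static void toy_invlpg(struct toy_mmu *mmu)
{
	/* the real code holds the mmu lock here */
	mmu->shadow_pte = 0;
	atomic_fetch_add(&mmu->invlpg_counter, 1);
}

/* Step 1, lock not held: sample the counter before reading the guest pte. */
static int toy_prefetch_begin(struct toy_mmu *mmu)
{
	return atomic_load(&mmu->invlpg_counter);
}

/*
 * Step 2, lock held: install the prefetched entry only if no invlpg
 * has run since the sample was taken. Returns 1 if installed.
 */
static int toy_prefetch_commit(struct toy_mmu *mmu, int sampled,
			       uint64_t gentry)
{
	assert(gentry != 0);
	if (atomic_load(&mmu->invlpg_counter) != sampled)
		return 0;	/* raced with a newer invlpg: discard */
	mmu->shadow_pte = gentry;
	return 1;
}
```

If an invlpg runs between the sample and the commit, the counters
differ and the stale guest entry is dropped, so an older prefetch can
no longer overwrite the effect of a newer invalidation.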

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ea1b6c6..28826c8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -389,6 +389,7 @@ struct kvm_arch {
unsigned int n_free_mmu_pages;
unsigned int n_requested_mmu_pages;
unsigned int n_alloc_mmu_pages;
+   atomic_t invlpg_counter;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
/*
 * Hash table of struct kvm_mmu_page.
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f63c9ad..b3edc46 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2609,20 +2609,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
int flooded = 0;
int npte;
int r;
+   int invlpg_counter;
 
pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
 
-   switch (bytes) {
-   case 4:
-   gentry = *(const u32 *)new;
-   break;
-   case 8:
-   gentry = *(const u64 *)new;
-   break;
-   default:
-   gentry = 0;
-   break;
-   }
+   invlpg_counter = atomic_read(&vcpu->kvm->arch.invlpg_counter);
 
/*
 * Assume that the pte write on a page table of the same type
@@ -2630,16 +2621,34 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 * (might be false while changing modes).  Note it is verified later
 * by update_pte().
 */
-   if (is_pae(vcpu) && bytes == 4) {
+   if ((is_pae(vcpu) && bytes == 4) || !new) {
/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
-   gpa &= ~(gpa_t)7;
-   r = kvm_read_guest(vcpu->kvm, gpa, &gentry, 8);
+   if (is_pae(vcpu)) {
+   gpa &= ~(gpa_t)7;
+   bytes = 8;
+   }
+   r = kvm_read_guest(vcpu->kvm, gpa, &gentry, min(bytes, 8));
if (r)
gentry = 0;
+   new = (const u8 *)&gentry;
+   }
+
+   switch (bytes) {
+   case 4:
+   gentry = *(const u32 *)new;
+   break;
+   case 8:
+   gentry = *(const u64 *)new;
+   break;
+   default:
+   gentry = 0;
+   break;
}
 
mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
spin_lock(&vcpu->kvm->mmu_lock);
+   if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
+   gentry = 0;
kvm_mmu_access_page(vcpu, gfn);
kvm_mmu_free_some_pages(vcpu);
++vcpu->kvm->stat.mmu_pte_write;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 4b37e1a..067797a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -463,6 +463,7 @@ out_unlock:
 static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 {
struct kvm_shadow_walk_iterator iterator;
+   gpa_t pte_gpa = -1;
int level;
u64 *sptep;
int need_flush = 0;
@@ -476,6 +477,10 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
if (level == PT_PAGE_TABLE_LEVEL  ||
((level == PT_DIRECTORY_LEVEL && is_large_pte(*sptep))) ||
((level == PT_PDPE_LEVEL && is_large_pte(*sptep)))) {
+   struct kvm_mmu_page *sp = page_header(__pa(sptep));
+
+   pte_gpa = (sp->gfn << PAGE_SHIFT);
+   pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);
 
if (is_shadow_present_pte(*sptep)) {
rmap_remove(vcpu->kvm, sptep);
@@ -493,7 +498,17 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 
if (need_flush)
kvm_flush_remote_tlbs(vcpu->kvm);
+
+   atomic_inc(&vcpu->kvm->arch.invlpg_counter);
+
spin_unlock(&vcpu->kvm->mmu_lock);
+
+   if (pte_gpa == -1)
+   return;
+
+   if (mmu_topup_memory_caches(vcpu))
+   return;
+   kvm_mmu_pte_write(vcpu, pte_gpa, NULL, sizeof(pt_element_t), 0);
 }
 
 static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr,
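
The invlpg_counter check in the mmu.c hunk above follows a general pattern: sample a counter, do some work without holding the lock, then re-check the counter under the lock and throw the result away if an invalidation happened in between. Below is a minimal Python sketch of that pattern only. It is not kernel code, and PrefetchCache, prefetch and the "between" hook are names made up for the illustration.

```python
import threading


class PrefetchCache:
    """Toy model: a lock-protected value plus an invalidation counter."""

    def __init__(self):
        self._lock = threading.Lock()
        self._invalidations = 0  # plays the role of invlpg_counter
        self.installed = None

    def invalidate(self):
        # Analogous to the invlpg path: bump the counter under the lock.
        with self._lock:
            self._invalidations += 1
            self.installed = None

    def prefetch(self, read_entry, between=None):
        # 1. Sample the counter before the unlocked read.
        seen = self._invalidations
        # 2. Read the entry without the lock (may race with invalidate()).
        entry = read_entry()
        if between is not None:
            between()  # hypothetical hook to simulate a concurrent invalidation
        # 3. Re-check under the lock; drop the entry if the counter moved.
        with self._lock:
            if self._invalidations != seen:
                entry = None
            self.installed = entry
        return entry
```

Calling prefetch() with between=cache.invalidate models the race the patch guards against: the counter moves after the sample, so the stale entry is discarded rather than installed.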

Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-21 Thread Anthony Liguori


On 03/21/2010 04:54 PM, Ingo Molnar wrote:

* Avi Kivity <a...@redhat.com> wrote:

   

On 03/21/2010 10:55 PM, Ingo Molnar wrote:
 

Of course you could say the following:

   ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not
 able to add this to the v2.6.35 kernel queue anymore as the ongoing
 usability work already takes up all of the project's maintainer and
 testing bandwidth. If you want the feature to be merged sooner than that
 then please help us cut down on the TODO and BUGS list that can be found
 at XYZ. There's quite a few low hanging fruits there. '
   

That would be shooting at my own foot as well as the contributor's since I
badly want that RCU stuff, and while a GUI would be nice, that itch isn't on
my back.
 

I think this sums up the root cause of all the problems i see with KVM pretty
well.
   


A good maintainer has to strike a balance between asking more of people 
than what they initially volunteer and getting people to implement the 
less fun things that are nonetheless required.  The kernel can take this 
to an extreme because at the end of the day, it's the only game in town 
and there is an unending number of potential volunteers.  Most other 
projects are not quite as fortunate.


When someone submits a patch set to QEMU implementing a new network 
backend for raw sockets, we can push back about how it fits into the 
entire stack wrt security, usability, etc.  Ultimately, we can arrive at 
a different, more user friendly solution (networking helpers) and along 
with some time investment on my part, we can create a much nicer, more 
user friendly solution.  Still command line based though.


Responding to such a patch set with "replace the SDL front end with a
GTK one that lets you graphically configure networking" is not
reasonable, and the result would be one less QEMU contributor in the long
run.


Over time, we can, and are, pushing people to focus more on usability.
But that doesn't get you a first class GTK GUI overnight.  The only way
you're going to get that is by having a contributor be specifically
interested in building such a thing.


We simply haven't had that in the past 5 years that I've been involved 
in the project.  If someone stepped up to build this, I'd certainly 
support it in every way possible and there are probably some steps we 
could take to even further encourage this.


Regards,

Anthony Liguori

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-21 Thread Anthony Liguori


On 03/21/2010 05:00 PM, Ingo Molnar wrote:

If that is the theory then it has failed to trickle through in practice. As
you know i have reported a long list of usability problems with hardly a look.
That list could be created by pretty much anyone spending a few minutes of
getting a first impression with qemu-kvm.
   


Can you transfer your list to the following wiki page:

http://wiki.qemu.org/Features/Usability

This thread is so large that I can't find your note that contained the 
initial list.


I want to make sure this input doesn't die once this thread settles down.

Regards,

Anthony Liguori


Re: [Autotest] [PATCH] KVM-Test: Add kvm userspace unit test

2010-03-21 Thread Shuxi Shang

OK, I approve of your suggestion.

- Lucas Meneghel Rodrigues l...@redhat.com wrote:

 I have an update about this test after talking to Naphtali Sprei:
 
 This patch does the unit testing using the old way of invoking it, and
 Avi superseded it with a new -kernel option. Naphtali is working on
 making the new way of doing the test work, so I will wait until we can
 merge both ways of doing this test, OK?
 
 On Thu, Mar 18, 2010 at 12:16 AM, Lucas Meneghel Rodrigues
 l...@redhat.com wrote:
  Hi Shuxi, sorry that it took so long before I could get back to you
  on this one.
 
  The general idea is just fine, but there is one gotcha that will need
  more thought: this depends on having the KVM source code for
  testing (i.e., it depends on the build test *and* the build mode has to
  involve source code, such as git builds; things like koji install will
  not work). Since by default the tests do not depend directly on build,
  we have to figure out a way to have this integrated without breaking
  things for users who are not interested in running the build test.
 
  Today I was reviewing the qemu-img functional test, and it occurred to
  me that we can make all those tests that do not depend on guests and
  different qemu command line options dependent on the build test. This
  way we'd have the separation that we need, still not breaking anything
  for users that do not care about build and other types of test.
 
  Michael, what do you think? Should we put the config of tests like
  this one and qemu_img on build.cfg, making them depend on build?
 
  Oh Shuxi, on the code below I have some small comments to make:
 
  On Fri, Mar 5, 2010 at 3:22 AM, sshang ssh...@redhat.com wrote:
   The test uses the kvm test harness kvmctl to load binary test case
   files in order to test various functions of the kvm kernel module.
 
  Signed-off-by: sshang ssh...@redhat.com
  ---
   client/tests/kvm/tests/unit_test.py    |   29
 +
   client/tests/kvm/tests_base.cfg.sample |    7 +++
   2 files changed, 36 insertions(+), 0 deletions(-)
   create mode 100644 client/tests/kvm/tests/unit_test.py
 
  diff --git a/client/tests/kvm/tests/unit_test.py
 b/client/tests/kvm/tests/unit_test.py
  new file mode 100644
  index 000..9bc7441
  --- /dev/null
  +++ b/client/tests/kvm/tests/unit_test.py
  @@ -0,0 +1,29 @@
  +import os
  +from autotest_lib.client.bin import utils
  +from autotest_lib.client.common_lib import error
  +
  +def run_unit_test(test, params, env):
  +    
  +    This is kvm userspace unit test, use kvm test harness kvmctl
 load binary
  +    test case file to test various function of kvm kernel module.
  +    The output of all unit test can be found in the test result
 dir.
  +    
  +
  +    case_list = params.get(case_list,access apic emulator
 hypercall irq\
  +               port80 realmode sieve smptest tsc stringio
 vmexit).split()
  +    srcdir = params.get(srcdir,test.srcdir)
  +    user_dir = os.path.join(srcdir,kvm_userspace/kvm/user)
  +    os.chdir(user_dir)
  +    test_fail_list = []
  +
  +    for i in case_list:
  +        result_file = test.outputdir + / + i
  +        testfile = i + .flat
  +        results = utils.system(./kvmctl test/x86/bootstrap
 test/x86/ + \
  +                     testfile ++
 result_file,ignore_status=True)
 
  About the above statement: In general you should not use shell
  redirection to write the output of your program to the log files.
  Please take advantage of the fact that utils.run allows you to connect
  stdout and stderr pipes to the result file. Also, utils.run returns a
  CmdResult object, which has a list of useful properties.
 
  +        if results != 0:
  +            test_fail_list.append(i)
  +
  +    if test_fail_list:
  +        raise error.TestFail(  +  .join(test_fail_list) + \
  +                                    )
 
  In the above, you could just have used
 
         raise error.TestFail("KVM module unit test failed. Test cases failed: %s" % test_fail_list)
 
  IMHO it's easier to understand.
 
  diff --git a/client/tests/kvm/tests_base.cfg.sample
 b/client/tests/kvm/tests_base.cfg.sample
  index 040d0c3..0918c26 100644
  --- a/client/tests/kvm/tests_base.cfg.sample
  +++ b/client/tests/kvm/tests_base.cfg.sample
  @@ -300,6 +300,13 @@ variants:
          shutdown_method = shell
          kill_vm = yes
          kill_vm_gracefully = no
  +
  +    - unit_test:
  +        type = unit_test
  +        case_list = access apic emulator hypercall msr port80
 realmode sieve smptest tsc stringio vmexit
  +        #srcdir should be same as build.cfg
  +        srcdir =
  +        vms = ''
      # Do not define test variants below shutdown
 
 
  --
  1.5.5.6
 
  ___
  Autotest mailing list
  autot...@test.kernel.org
  http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
 
 
 
 
  --
  Lucas
 
 
 
 
 -- 
 Lucas
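
Lucas's review comment about avoiding shell redirection can be illustrated with the standard library alone. The sketch below uses subprocess.run rather than autotest's utils.run, since the keyword arguments of utils.run are not shown in this thread. run_case and its parameters are hypothetical names.

```python
import subprocess


def run_case(argv, result_path):
    """Run argv without a shell, sending stdout and stderr to result_path.

    Returns the exit status instead of raising, which is roughly what
    utils.system(..., ignore_status=True) does in the patch under review.
    """
    with open(result_path, "w") as out:
        completed = subprocess.run(argv, stdout=out, stderr=subprocess.STDOUT)
    return completed.returncode
```

Passing the command as a list and handing the open file to the child avoids building a shell string by concatenation, so test names and paths never go through shell quoting.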

[KVM-AUTOTEST PATCH 1/5] KVM test: kvm_preprocessing.py: minor style corrections

2010-03-21 Thread Michael Goldish

Also, fetch the KVM version before setting up the VMs.

Signed-off-by: Michael Goldish mgold...@redhat.com
---
 client/tests/kvm/kvm_preprocessing.py |   58 +++-
 1 files changed, 27 insertions(+), 31 deletions(-)

diff --git a/client/tests/kvm/kvm_preprocessing.py 
b/client/tests/kvm/kvm_preprocessing.py
index e91d1da..e3a5501 100644
--- a/client/tests/kvm/kvm_preprocessing.py
+++ b/client/tests/kvm/kvm_preprocessing.py
@@ -58,8 +58,8 @@ def preprocess_vm(test, params, env, name):
 for_migration = False
 
 if params.get("start_vm_for_migration") == "yes":
-logging.debug("'start_vm_for_migration' specified; (re)starting VM with"
-   " -incoming option...")
+logging.debug("'start_vm_for_migration' specified; (re)starting VM "
+  "with -incoming option...")
 start_vm = True
 for_migration = True
 elif params.get("restart_vm") == "yes":
@@ -187,12 +187,12 @@ def preprocess(test, params, env):
 @param env: The environment (a dict-like object).
 """
 # Start tcpdump if it isn't already running
-if not env.has_key("address_cache"):
+if "address_cache" not in env:
 env["address_cache"] = {}
-if env.has_key("tcpdump") and not env["tcpdump"].is_alive():
+if "tcpdump" in env and not env["tcpdump"].is_alive():
 env["tcpdump"].close()
 del env["tcpdump"]
-if not env.has_key("tcpdump"):
+if "tcpdump" not in env:
 command = "/usr/sbin/tcpdump -npvi any 'dst port 68'"
 logging.debug("Starting tcpdump (%s)...", command)
 env["tcpdump"] = kvm_subprocess.kvm_tail(
@@ -208,35 +208,23 @@ def preprocess(test, params, env):
 
 # Destroy and remove VMs that are no longer needed in the environment
 requested_vms = kvm_utils.get_sub_dict_names(params, "vms")
-for key in env.keys():
+for key in env:
 vm = env[key]
 if not kvm_utils.is_vm(vm):
 continue
 if not vm.name in requested_vms:
-logging.debug("VM '%s' found in environment but not required for"
-   " test; removing it..." % vm.name)
+logging.debug("VM '%s' found in environment but not required for "
+  "test; removing it..." % vm.name)
 vm.destroy()
 del env[key]
 
-# Execute any pre_commands
-if params.get("pre_command"):
-process_command(test, params, env, params.get("pre_command"),
-int(params.get("pre_command_timeout", 600)),
-params.get("pre_command_noncritical") == "yes")
-
-# Preprocess all VMs and images
-process(test, params, env, preprocess_image, preprocess_vm)
-
 # Get the KVM kernel module version and write it as a keyval
 logging.debug("Fetching KVM module version...")
 if os.path.exists("/dev/kvm"):
-kvm_version = os.uname()[2]
 try:
-file = open("/sys/module/kvm/version", "r")
-kvm_version = file.read().strip()
-file.close()
+kvm_version = open("/sys/module/kvm/version").read().strip()
 except:
-pass
+kvm_version = os.uname()[2]
 else:
 kvm_version = "Unknown"
 logging.debug("KVM module not loaded")
@@ -248,16 +236,24 @@ def preprocess(test, params, env):
 qemu_path = kvm_utils.get_path(test.bindir, params.get("qemu_binary",
 "qemu"))
 version_line = commands.getoutput("%s -help | head -n 1" % qemu_path)
-exp = re.compile("[Vv]ersion .*?,")
-match = exp.search(version_line)
-if match:
-kvm_userspace_version = " ".join(match.group().split()[1:]).strip(",")
+matches = re.findall("[Vv]ersion .*?,", version_line)
+if matches:
+kvm_userspace_version = " ".join(matches[0].split()[1:]).strip(",")
 else:
 kvm_userspace_version = "Unknown"
 logging.debug("Could not fetch KVM userspace version")
 logging.debug("KVM userspace version: %s" % kvm_userspace_version)
 test.write_test_keyval({"kvm_userspace_version": kvm_userspace_version})
 
+# Execute any pre_commands
+if params.get("pre_command"):
+process_command(test, params, env, params.get("pre_command"),
+int(params.get("pre_command_timeout", 600)),
+params.get("pre_command_noncritical") == "yes")
+
+# Preprocess all VMs and images
+process(test, params, env, preprocess_image, preprocess_vm)
+
 
 def postprocess(test, params, env):
 """
@@ -276,8 +272,8 @@ def postprocess(test, params, env):
 
 # Should we convert PPM files to PNG format?
 if params.get("convert_ppm_files_to_png") == "yes":
-logging.debug("'convert_ppm_files_to_png' specified; converting PPM"
-   " files to PNG format...")
+logging.debug("'convert_ppm_files_to_png' specified; converting PPM "
+  "files to PNG format...")
 try:
 for f in

[KVM-AUTOTEST PATCH 2/5] KVM test: kvm.py: make sure all dump_env() calls are inside 'finally' blocks

2010-03-21 Thread Michael Goldish

Signed-off-by: Michael Goldish mgold...@redhat.com
---
 client/tests/kvm/kvm.py |   29 +++--
 1 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/client/tests/kvm/kvm.py b/client/tests/kvm/kvm.py
index 9b8a10c..c6e146d 100644
--- a/client/tests/kvm/kvm.py
+++ b/client/tests/kvm/kvm.py
@@ -21,6 +21,7 @@ class kvm(test.test):
 (Online doc - Getting started with KVM testing)
 """
 version = 1
+
 def run_once(self, params):
 # Report the parameters we've received and write them as keyvals
 logging.debug("Test parameters:")
@@ -33,7 +34,7 @@ class kvm(test.test):
 # Open the environment file
 env_filename = os.path.join(self.bindir, params.get("env", "env"))
 env = kvm_utils.load_env(env_filename, {})
-logging.debug("Contents of environment: %s" % str(env))
+logging.debug("Contents of environment: %s", str(env))
 
 try:
 try:
@@ -50,22 +51,30 @@ class kvm(test.test):
 f.close()
 
 # Preprocess
-kvm_preprocessing.preprocess(self, params, env)
-kvm_utils.dump_env(env, env_filename)
+try:
+kvm_preprocessing.preprocess(self, params, env)
+finally:
+kvm_utils.dump_env(env, env_filename)
 # Run the test function
 run_func = getattr(test_module, "run_%s" % t_type)
-run_func(self, params, env)
-kvm_utils.dump_env(env, env_filename)
+try:
+run_func(self, params, env)
+finally:
+kvm_utils.dump_env(env, env_filename)
 
 except Exception, e:
 logging.error("Test failed: %s", e)
 logging.debug("Postprocessing on error...")
-kvm_preprocessing.postprocess_on_error(self, params, env)
-kvm_utils.dump_env(env, env_filename)
+try:
+kvm_preprocessing.postprocess_on_error(self, params, env)
+finally:
+kvm_utils.dump_env(env, env_filename)
 raise
 
 finally:
 # Postprocess
-kvm_preprocessing.postprocess(self, params, env)
-logging.debug("Contents of environment: %s", str(env))
-kvm_utils.dump_env(env, env_filename)
+try:
+kvm_preprocessing.postprocess(self, params, env)
+finally:
+kvm_utils.dump_env(env, env_filename)
+logging.debug("Contents of environment: %s", str(env))
-- 
1.5.4.1
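
The effect of wrapping each stage in try/finally can be seen in a small standalone sketch. The stage function and the dumps list below are stand-ins invented for the illustration. Only the control flow mirrors the patch.

```python
def run_stage(stage, dump):
    """Run one stage and always record the environment afterwards."""
    try:
        stage()
    finally:
        dump()  # runs whether stage() returned normally or raised


def failing_stage():
    raise RuntimeError("preprocess failed")


dumps = []
try:
    run_stage(failing_stage, lambda: dumps.append("env saved"))
except RuntimeError:
    pass  # the exception still propagates to the caller after the dump
```

Without the finally block, a failure inside the stage would skip the dump and the environment file would keep its state from before the stage ran.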


[KVM-AUTOTEST PATCH 4/5] KVM test: make kvm_stat usage optional

2010-03-21 Thread Michael Goldish

Relying on the test tag is not cool.  Use a dedicated parameter instead.
By default, all tests except build tests will use kvm_stat.

Signed-off-by: Michael Goldish mgold...@redhat.com
---
 client/tests/kvm/kvm_utils.py  |8 
 client/tests/kvm/tests_base.cfg.sample |3 +++
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index cc39b5d..5834539 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -845,8 +845,8 @@ def run_tests(test_list, job):
 @return: True, if all tests ran passed, False if any of them failed.
 """
 status_dict = {}
-
 failed = False
+
 for dict in test_list:
 if dict.get("skip") == "yes":
 continue
@@ -863,12 +863,12 @@ def run_tests(test_list, job):
 test_tag = dict.get("shortname")
 # Setting up kvm_stat profiling during test execution.
 # We don't need kvm_stat profiling on the build tests.
-if "build" in test_tag:
+if dict.get("run_kvm_stat") == "yes":
+profile = True
+else:
 # None because it's the default value on the base_test class
 # and the value None is specifically checked there.
 profile = None
-else:
-profile = True
 
 if profile:
 job.profilers.add('kvm_stat')
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 9963a44..b13aec4 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -40,6 +40,9 @@ nic_mode = user
 nic_script = scripts/qemu-ifup
 address_index = 0
 
+# Misc
+run_kvm_stat = yes
+
 
 # Tests
 variants:
-- 
1.5.4.1


[KVM-AUTOTEST PATCH 3/5] KVM test: kvm_utils.load_env(): do not fail if env file is corrupted

2010-03-21 Thread Michael Goldish

- Include the unpickling code in the 'try' block, so that an exception raised
  during unpickling will not fail the test.
- Change the default env (returned by load_env() when the file is missing or
  corrupt) to {}.

Signed-off-by: Michael Goldish mgold...@redhat.com
---
 client/tests/kvm/kvm_utils.py |   10 ++
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index d386456..cc39b5d 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -22,7 +22,7 @@ def dump_env(obj, filename):
 file.close()
 
 
-def load_env(filename, default=None):
+def load_env(filename, default={}):
 """
 Load KVM test environment from an environment file.
 
@@ -30,11 +30,13 @@ def load_env(filename, default=None):
 """
 try:
 file = open(filename, "r")
+obj = cPickle.load(file)
+file.close()
+return obj
+# Almost any exception can be raised during unpickling, so let's catch
+# them all
 except:
 return default
-obj = cPickle.load(file)
-file.close()
-return obj
 
 
 def get_sub_dict(dict, name):
-- 
1.5.4.1
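
For readers trying the idea outside autotest, here is a minimal Python 3 sketch of a load_env that tolerates a missing or corrupt file. It is not the patch's code: it uses pickle instead of cPickle, and it builds the fallback dict inside the function, on the assumption that callers may mutate the returned environment and should not end up sharing one default object.

```python
import pickle


def load_env(filename, default=None):
    """Return the unpickled environment, or a fresh default on any failure."""
    if default is None:
        default = {}  # new dict per call, so callers cannot share state
    try:
        with open(filename, "rb") as f:
            return pickle.load(f)
    except Exception:
        # Missing file, truncated pickle, invalid opcode, and so on.
        return default
```

Catching Exception rather than using a bare except keeps KeyboardInterrupt and SystemExit out of the fallback path, which is a design choice of this sketch and not something the patch does.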


[KVM-AUTOTEST PATCH 5/5] KVM test: take frequent screendumps during all tests

2010-03-21 Thread Michael Goldish

Screendumps are taken regularly and converted to JPEG format.
They are stored in .../debug/screendumps_VMname/.
Requires python-imaging.

- Enabled by 'take_regular_screendumps = yes' (naming suggestions welcome).
- Delay between screendumps is controlled by 'screendump_delay' (default 5).
- Compression quality is controlled by 'screendump_quality' (default 30).
- It's probably a good idea to dump them to /dev/shm before converting them
  in order to minimize disk use.  This can be enabled by
  'screendump_temp_dir = /dev/shm' (commented out by default because I'm not
  sure /dev/shm is available on all machines.)
- Screendumps are removed unless 'keep_screendumps'['_on_error'] is 'yes'.
  The recommended setting when submitting jobs from autoserv is
  'keep_screendumps_on_error = yes', which means screendumps are kept only if
  the test fails.  Keeping all screendumps may use up all of the server's
  storage space.

This patch sets reasonable defaults in tests_base.cfg.sample.

(It also makes sure post_command is executed last in the postprocessing
procedure -- otherwise post_command failure can prevent other postprocessing
steps (like removing the screendump dirs) from taking place.)

Signed-off-by: Michael Goldish mgold...@redhat.com
---
 client/tests/kvm/kvm_preprocessing.py  |   85 +--
 client/tests/kvm/tests_base.cfg.sample |   13 -
 2 files changed, 89 insertions(+), 9 deletions(-)

diff --git a/client/tests/kvm/kvm_preprocessing.py 
b/client/tests/kvm/kvm_preprocessing.py
index e3a5501..0e4ce87 100644
--- a/client/tests/kvm/kvm_preprocessing.py
+++ b/client/tests/kvm/kvm_preprocessing.py
@@ -1,4 +1,4 @@
-import sys, os, time, commands, re, logging, signal, glob
+import sys, os, time, commands, re, logging, signal, glob, threading, shutil
 from autotest_lib.client.bin import test, utils
 from autotest_lib.client.common_lib import error
 import kvm_vm, kvm_utils, kvm_subprocess, ppm_utils
@@ -11,6 +11,10 @@ except ImportError:
 'distro.')
 
 
+_screendump_thread = None
+_screendump_thread_termination_event = None
+
+
 def preprocess_image(test, params):
 """
 Preprocess a single QEMU image according to the instructions in params.
@@ -254,6 +258,14 @@ def preprocess(test, params, env):
 # Preprocess all VMs and images
 process(test, params, env, preprocess_image, preprocess_vm)
 
+# Start the screendump thread
+if params.get("take_regular_screendumps") == "yes":
+global _screendump_thread, _screendump_thread_termination_event
+_screendump_thread_termination_event = threading.Event()
+_screendump_thread = threading.Thread(target=_take_screendumps,
+  args=(test, params, env))
+_screendump_thread.start()
+
 
 def postprocess(test, params, env):
 """
@@ -263,8 +275,15 @@ def postprocess(test, params, env):
 @param params: Dict containing all VM and image parameters.
 @param env: The environment (a dict-like object).
 """
+# Postprocess all VMs and images
 process(test, params, env, postprocess_image, postprocess_vm)
 
+# Terminate the screendump thread
+global _screendump_thread, _screendump_thread_termination_event
+if _screendump_thread:
+_screendump_thread_termination_event.set()
+_screendump_thread.join(10)
+
 # Warn about corrupt PPM files
 for f in glob.glob(os.path.join(test.debugdir, "*.ppm")):
 if not ppm_utils.image_verify_ppm_file(f):
@@ -290,11 +309,13 @@ def postprocess(test, params, env):
 for f in glob.glob(os.path.join(test.debugdir, '*.ppm')):
 os.unlink(f)
 
-# Execute any post_commands
-if params.get("post_command"):
-process_command(test, params, env, params.get("post_command"),
-int(params.get("post_command_timeout", 600)),
-params.get("post_command_noncritical") == "yes")
+# Should we keep the screendump dirs?
+if params.get("keep_screendumps") != "yes":
+logging.debug("'keep_screendumps' not specified; removing screendump "
+  "dirs...")
+for d in glob.glob(os.path.join(test.debugdir, "screendumps_*")):
+if os.path.isdir(d) and not os.path.islink(d):
+shutil.rmtree(d, ignore_errors=True)
 
 # Kill all unresponsive VMs
 if params.get("kill_unresponsive_vms") == "yes":
@@ -318,6 +339,12 @@ def postprocess(test, params, env):
 env["tcpdump"].close()
 del env["tcpdump"]
 
+# Execute any post_commands
+if params.get("post_command"):
+process_command(test, params, env, params.get("post_command"),
+int(params.get("post_command_timeout", 600)),
+params.get("post_command_noncritical") == "yes")
+
 
 def postprocess_on_error(test, params, env):
 """
@@ -343,3 +370,49 @@ def _update_address_cache(address_cache, line):
   mac_address, address_cache.get("last_seen"))
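
The archive cuts this diff off before the body of _take_screendumps. As a rough illustration only, and not the patch's code, a periodic worker that stops promptly when a threading.Event is set could look like the sketch below. periodic_worker, take_one and the 0.01 second delay are hypothetical.

```python
import threading


def periodic_worker(take_one, stop_event, delay):
    """Call take_one() every `delay` seconds until stop_event is set."""
    while True:
        take_one()
        # wait() returns True as soon as the event is set, so the thread
        # exits without sleeping out the rest of the delay.
        if stop_event.wait(delay):
            break


shots = []
stop = threading.Event()
worker = threading.Thread(target=periodic_worker,
                          args=(lambda: shots.append(len(shots)), stop, 0.01))
worker.start()
stop.set()
worker.join(10)  # bounded join, like the join(10) in postprocess()
```

Waiting on the event instead of calling time.sleep() is what lets the set() plus join(10) sequence in postprocess() return quickly even when the delay between screendumps is several seconds.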

Re: Streaming Audio from Virtual Machine

2010-03-21 Thread David S. Ahern



On 03/21/2010 01:12 PM, Gus Zernial wrote:
 I'm using Kubuntu 9.10 32-bit on a quad-core Phenom II with 
 Gigabit ethernet. I want to stream audio from MLB.com from a 
 WinXP client thru a Linksys WMB54G wireless music bridge. Note 
 that there are drivers for the WMB54G only for WinXP and Vista.
 
 If I stream the audio thru a native WinXP box thru the WMB54G,
 all is well and the audio sounds fine. When I try to stream thru a 
 WinXP virtual machine on Kubuntu 9.10, the audio is poor quality
 and subject to gaps and dropping the stream altogether. So far
 I've tried KVM/QEMU and VirtualBox, same result.
 
 Regarding KVM/QEMU, I note AMD-V is activated in the BIOS, and I have a 
 custom 2.6.32.7 kernel, and QEMU 0.11.0. The kvm and kvm_amd modules are compiled 
 in and loaded. I've been using bridged networking. I think it's set up 
 correctly but I confess I'm no networking expert. My start command for the 
 WinXP virtual machine is:
 
 sudo /usr/bin/qemu -m 1024 -boot c 
 -net nic,vlan=0,macaddr=00:d0:13:b0:2d:32,model=rtl8139 -net 
 tap,vlan=0,ifname=tap0,script=/etc/qemu-ifup -localtime -soundhw ac97 -smp 4 
 -fda /dev/fd0 -vga std -usb /home/rbroman/windows.img
 
 I also tried model=virtio but that didn't help. 
 
 I suspect this is a virtual machine networking problem but I'm
 not sure. So my questions are:
 
 -What's the best/fastest networking option and how do I set it up?
 Pointers to step-by-step instructions appreciated.
 
 -Is it possible I have a problem other than networking? Configuration
 problem with KVM/QEMU? Or could there be a problem with the WMB54G driver 
 when used thru a virtual machine?
 
 -Is there a better virtual machine solution than KVM/QEMU for what 
 I'm trying to do?

[dsa] I have been able to stream audio and video in a KVM-hosted winxp VM, and
I have even watched a netflix-based movie. My laptop has a Core-2 duo
cpu, T9550, with 4 GB of RAM. Networking at home is through a wireless-N
router, and I use bridged networking and NAT for VMs.

Host activity definitely has an impact. When streaming I make sure I am
not doing any heavy activity in the host layer, and if I notice jitter
the first thing I do is up the priority of the VM threads using chrt.

David

 
 Recommendations appreciated - Gus
 
 
 
 
 
   
 

Re: [PATCH 0/2] qemu-kvm: Introduce wrapper functions to access phys_ram_dirty, and replace existing direct accesses to it.

2010-03-21 Thread Yoshiaki Tamura


Marcelo Tosatti wrote:

On Wed, Mar 17, 2010 at 02:51:46PM +0900, Yoshiaki Tamura wrote:


Before replacing the byte-based dirty bitmap with a bit-based dirty bitmap,
cleaning up direct accesses to the bitmap first seems to be a good place to
start.

This patch set is based on the following discussion.

http://www.mail-archive.com/kvm@vger.kernel.org/msg30724.html

Thanks,

Yoshi


Looks fine to me.

This is qemu upstream material, though.


Thanks for your comment.
I should have removed qemu-kvm from the title.

Should I rebase the patch to qemu.git and repost?


Re: Unable to create more than 1 guest virtio-net device using vhost-net backend

2010-03-21 Thread Michael S. Tsirkin

On Fri, Mar 19, 2010 at 03:19:27PM -0700, Sridhar Samudrala wrote:
 When creating a guest with 2 virtio-net interfaces, I am running
 into an issue that causes the 2nd i/f to fall back to userspace virtio
 even when vhost is enabled.
 
 After some debugging, it turned out that KVM_IOEVENTFD ioctl() 
 call in qemu is failing with ENOSPC.
 This is because of the NR_IOBUS_DEVS(6) limit in kvm_io_bus_register_dev()
 routine in the host kernel.
 
 I think we need to increase this limit if we want to support multiple
 network interfaces using vhost-net.
 Is there an alternate solution?
 
 Thanks
 Sridhar

Nothing easy that I can see. Each device needs 2 of these.  Avi, Gleb,
any objections to increasing the limit to say 16?  That would give us
5 more devices to the limit of 6 per guest.

-- 
MST

Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-21 Thread Avi Kivity


On 03/20/2010 04:59 PM, Andrea Arcangeli wrote:

On Fri, Mar 19, 2010 at 09:21:49AM +0200, Avi Kivity wrote:
   

On 03/19/2010 12:44 AM, Ingo Molnar wrote:
 

Too bad - there was heavy initial opposition to the arch/x86 unification as
well [and heavy opposition to tools/perf/ as well], still both worked out
extremely well :-)

   

Did you forget that arch/x86 was a merging of a code fork that happened
several years previously?  Maybe that fork shouldn't have been done to
begin with.
 

We discussed and probably timidly tried to share the sharable
initially but we realized it was too time wasteful. In addition to
having to adapt the code to 64bit we would also have had to constantly
solve another problem on top of it (see the various splits on _32/_64,
those take time to achieve, maybe not huge time but still definitely
some time and effort). Even in retrospect I am quite sure the way
x86-64 happened was optimal and if we would go back we would do it
again the exact same way even if the final objective was to have a common
arch/x86 (and thankfully Linus is flexible and smart enough to realize
that code that isn't risking to destabilize anything shouldn't be
forced out just because it's not to a totally
theoretical-perfect-nitpicking-clean-state yet). It's still a lot of
work to do the unification later as a separate task, but it's not like if
we did it immediately it would have been a lot less work. It's about
the same amount of effort and we were able to defer it for later and
decrease the time to market which surely has contributed to the
success of x86-64.
   


In hindsight decisions are much easier.  I agree it was less risky to 
fork than to share.  But if another instruction set forks out a 64-bit 
not-exactly-compatible variant, I'm sure we'll start out shared and not 
fork it, especially if the platform remains the same.



The problem with qemu is not some lack of GUI or that it's not included in
the linux kernel git tree, the definitive problem is how to merge
qemu-kvm/kvm and qlx into it. If you (Avi) were the qemu maintainer I
am sure there wouldn't be two trees so as a developer I would totally
love it, and I am sure that with you as maintainer it would have a
chance to move forward with qlx on desktop virtualization without
proposing to extend vnc instead to achieve a similar result (imagine
if btrfs is published on a website and people start to discuss if it
should ever be merged because reinventing some part of btrfs
inside ext5 might achieve similar results).
   


The qemu/qemu-kvm fork is definitely hurting.  Some history: when kvm 
started out I pulled qemu for fast hacking and, much like arch/x86_64, I 
couldn't destabilize qemu for something that was completely experimental 
(and closed source at the time).  Moreover, it wasn't clear if the qemu 
community would be interested.


The qemu-kvm fork was designed for minimal intrusion so I could merge 
upstream qemu regularly.  This resulted in kvm integration that was 
fairly ugly.  Later Anthony merged a well-integrated alternative 
implementation (in retrospect this was a mistake IMO - we were left with 
a well tested high performing ugly implementation and a clean, slow, 
untested, and unfeatured implementation, and no one who wants to merge 
the two).  So now it is pretty confusing to read the code which has the 
two alternate implementations sometimes sharing code and sometimes diverging.




About a GUI for KVM to use on desktop distributions, that is an
irrelevant concern compared to the lack of a protocol more efficient
than rdesktop/rdp/vnc for desktop virtualization. I've people asking
me to migrate hundreds of desktops to desktop virtualization on KVM in
their organizations and I tell them to use spice because I believe
it's the most efficient option available (at least as far as we stick
to open source open protocols), there are universities using spice on
thousands of student desktops, and I think we need paravirt graphics to
happen ASAP in the main qemu tree too.
   


That effort will have to wait for the spice project to mature.


In short: running KVM on the desktop is irrelevant compared to running
the desktop on KVM so I suggest to focus on what is more important
first ;).
   


Anyone can focus on what interests them; if someone has an interest in a 
good desktop-on-desktop experience, they should start hacking and sending 
patches.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Strange CPU usage pattern in SMP guest

2010-03-21 Thread Avi Kivity


On 03/21/2010 02:13 AM, Sebastian Hetze wrote:

Hi *,

in a 6-CPU SMP guest running on a host with 2 quad-core
Intel Xeon E5520 with hyperthreading enabled,
we see one or more guest CPUs working in a very strange
pattern. It looks like all or nothing. We can easily identify
the affected CPU with xosview. Here is the mpstat output
compared to one regular working CPU:


mpstat -P 4 1
Linux 2.6.31-16-generic-pae (guest) 21.03.2010  _i686_  (6 CPU)
00:45:19  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
00:45:20    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
00:45:21    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
00:45:22    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
00:45:23    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
00:45:24    4    0,00   66,67    0,00    0,00    0,00   33,33    0,00    0,00    0,00
00:45:25    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
00:45:26    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
   


Looks like the guest is only receiving 3-4 timer interrupts per second, 
so time becomes quantized.
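
To illustrate the quantization, here is a minimal sketch. It assumes the
guest charges each timer tick to whatever state the CPU was in when the
tick arrived; the function name and the tick model are illustrative, not
kernel code. With only three ticks in a one-second interval every
percentage is a multiple of 33.33, which matches the 66,67/33,33 sample
in the mpstat output.

```python
from collections import Counter

def tick_percentages(states):
    """Return per-state CPU percentages from the state sampled at each tick."""
    counts = Counter(states)
    total = len(states)
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}

# Three ticks in one second: two landed in "nice", one in "soft".
print(tick_percentages(["nice", "nice", "soft"]))

# With 100 ticks per second the same interval resolves to 1% steps.
print(tick_percentages(["nice"] * 67 + ["soft"] * 33))
```

With a healthy tick rate the reported values would vary smoothly instead
of jumping between 0, 33, 66 and 100 percent.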


Please run the attached irqtop in the affected guest and report the results.

Is the host overly busy?  What host kernel, kvm, and qemu are you 
running?  Is the guest running an I/O workload?  If so, how are the disks 
configured?


--
error compiling committee.c: too many arguments to function

#!/usr/bin/python

import curses

def read_interrupts():
    # Return {interrupt name: count summed over all CPUs} from /proc/interrupts.
    irq = {}
    with open('/proc/interrupts') as proc:
        # The header row has one column per CPU.
        nrcpu = len(proc.readline().split())
        for line in proc:
            vec, data = line.strip().split(':', 1)
            if vec in ('ERR', 'MIS'):
                continue
            counts = data.split(None, nrcpu)
            counts, rest = (counts[:-1], counts[-1])
            count = sum([int(x) for x in counts])
            try:
                # Numbered vectors: drop the controller type, keep the device name.
                v = int(vec)
                name = rest.split(None, 1)[1]
            except (ValueError, IndexError):
                # Named vectors (NMI, LOC, ...): use the description as is.
                name = rest
            irq[name] = count
    return irq

def delta_interrupts():
    # Yield, on each iteration, the per-interrupt counts since the previous one.
    old = read_interrupts()
    while True:
        irq = read_interrupts()
        delta = {}
        for key in irq.keys():
            # Interrupts that appeared since the last sample start from zero.
            delta[key] = irq[key] - old.get(key, 0)
        yield delta
        old = irq

label_width = 30
number_width = 10

def tui(screen):
    curses.use_default_colors()
    curses.noecho()
    def getcount(x):
        return x[1]
    def refresh(irq):
        screen.erase()
        screen.addstr(0, 0, 'irqtop')
        row = 2
        # Busiest interrupts first, as many as fit on the screen.
        for name, count in sorted(irq.items(), key = getcount, reverse = True):
            if row >= screen.getmaxyx()[0]:
                break
            col = 1
            screen.addstr(row, col, name)
            col += label_width
            screen.addstr(row, col, '%10d' % (count,))
            row += 1
        screen.refresh()

    for irqs in delta_interrupts():
        refresh(irqs)
        # Wait up to one second (10 tenths) for a key, then sample again.
        curses.halfdelay(10)
        try:
            c = screen.getkey()
            if c == 'q':
                break
        except KeyboardInterrupt:
            break
        except curses.error:
            continue

curses.wrapper(tui)

Re: Unable to create more than 1 guest virtio-net device using vhost-net backend

2010-03-21 Thread Avi Kivity


On 03/21/2010 11:55 AM, Michael S. Tsirkin wrote:

On Fri, Mar 19, 2010 at 03:19:27PM -0700, Sridhar Samudrala wrote:
   

When creating a guest with 2 virtio-net interfaces, I am running
into an issue causing the 2nd i/f to fall back to userspace virtio
even when vhost is enabled.

After some debugging, it turned out that KVM_IOEVENTFD ioctl()
call in qemu is failing with ENOSPC.
This is because of the NR_IOBUS_DEVS(6) limit in kvm_io_bus_register_dev()
routine in the host kernel.

I think we need to increase this limit if we want to support multiple
network interfaces using vhost-net.
Is there an alternate solution?

Thanks
Sridhar
 

Nothing easy that I can see. Each device needs 2 of these.  Avi, Gleb,
any objections to increasing the limit to say 16?  That would give us
5 more devices to the limit of 6 per guest.
   


Increase it to 200, then.

Is the limit visible to userspace?  If not, we need to expose it.

--
error compiling committee.c: too many arguments to function


Re: Unable to create more than 1 guest virtio-net device using vhost-net backend

2010-03-21 Thread Michael S. Tsirkin

On Sun, Mar 21, 2010 at 12:11:33PM +0200, Avi Kivity wrote:
 On 03/21/2010 11:55 AM, Michael S. Tsirkin wrote:
 On Fri, Mar 19, 2010 at 03:19:27PM -0700, Sridhar Samudrala wrote:

 When creating a guest with 2 virtio-net interfaces, I am running
 into an issue causing the 2nd i/f to fall back to userspace virtio
 even when vhost is enabled.

 After some debugging, it turned out that KVM_IOEVENTFD ioctl()
 call in qemu is failing with ENOSPC.
 This is because of the NR_IOBUS_DEVS(6) limit in kvm_io_bus_register_dev()
 routine in the host kernel.

 I think we need to increase this limit if we want to support multiple
 network interfaces using vhost-net.
 Is there an alternate solution?

 Thanks
 Sridhar
  
 Nothing easy that I can see. Each device needs 2 of these.  Avi, Gleb,
 any objections to increasing the limit to say 16?  That would give us
 5 more devices to the limit of 6 per guest.


 Increase it to 200, then.

OK. I think we'll also need a smarter allocator
than the bus->dev_count++ we now have. Right?

 Is the limit visible to userspace?  If not, we need to expose it.

I don't think it's visible: it seems to be used in a single
place in kvm. Let's add an ioctl? Note that qemu doesn't
need it now ...

 -- 
 error compiling committee.c: too many arguments to function

Re: Unable to create more than 1 guest virtio-net device using vhost-net backend

2010-03-21 Thread Gleb Natapov

On Sun, Mar 21, 2010 at 12:11:33PM +0200, Avi Kivity wrote:
 On 03/21/2010 11:55 AM, Michael S. Tsirkin wrote:
 On Fri, Mar 19, 2010 at 03:19:27PM -0700, Sridhar Samudrala wrote:
 When creating a guest with 2 virtio-net interfaces, I am running
 into an issue causing the 2nd i/f to fall back to userspace virtio
 even when vhost is enabled.
 
 After some debugging, it turned out that KVM_IOEVENTFD ioctl()
 call in qemu is failing with ENOSPC.
 This is because of the NR_IOBUS_DEVS(6) limit in kvm_io_bus_register_dev()
 routine in the host kernel.
 
 I think we need to increase this limit if we want to support multiple
 network interfaces using vhost-net.
 Is there an alternate solution?
 
 Thanks
 Sridhar
 Nothing easy that I can see. Each device needs 2 of these.  Avi, Gleb,
 any objections to increasing the limit to say 16?  That would give us
 5 more devices to the limit of 6 per guest.
 
 Increase it to 200, then.
 
Currently on each device read/write we iterate over all registered
devices. This is not scalable.

 Is the limit visible to userspace?  If not, we need to expose it.
 
 -- 
 error compiling committee.c: too many arguments to function

--
Gleb.

Re: Unable to create more than 1 guest virtio-net device using vhost-net backend

2010-03-21 Thread Avi Kivity


On 03/21/2010 12:15 PM, Michael S. Tsirkin wrote:

Nothing easy that I can see. Each device needs 2 of these.  Avi, Gleb,
any objections to increasing the limit to say 16?  That would give us
5 more devices to the limit of 6 per guest.

   

Increase it to 200, then.
 

OK. I think we'll also need a smarter allocator
than the bus->dev_count++ we now have. Right?
   


No, why?

Eventually we'll want faster scanning than the linear search we employ 
now, though.



Is the limit visible to userspace?  If not, we need to expose it.
 

I don't think it's visible: it seems to be used in a single
place in kvm. Let's add an ioctl? Note that qemu doesn't
need it now ...
   


We usually expose limits via KVM_CHECK_EXTENSION(KVM_CAP_BLAH).  We can 
expose it via KVM_CAP_IOEVENTFD (and need to reserve iodev entries for 
those).
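
For reference, a userspace probe of a capability might look like the
sketch below. It assumes an x86 host, where _IO(KVMIO, 0x03) encodes to
0xAE03, and takes KVM_CAP_IOEVENTFD = 36 from <linux/kvm.h>; the helper
name check_extension is made up for illustration. A positive return
value means the capability is present. Having it return the number of
available entries is what is being proposed here, not something existing
kernels are known to do.

```python
import fcntl
import os

KVMIO = 0xAE
KVM_CHECK_EXTENSION = (KVMIO << 8) | 0x03   # _IO(KVMIO, 0x03) on x86
KVM_CAP_IOEVENTFD = 36                      # value from <linux/kvm.h>

def check_extension(cap, path="/dev/kvm"):
    """Return the KVM_CHECK_EXTENSION result for cap, or None if KVM cannot be opened."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        return None
    try:
        return fcntl.ioctl(fd, KVM_CHECK_EXTENSION, cap)
    finally:
        os.close(fd)

if __name__ == "__main__":
    print("KVM_CAP_IOEVENTFD ->", check_extension(KVM_CAP_IOEVENTFD))
```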


--
error compiling committee.c: too many arguments to function


Re: Unable to create more than 1 guest virtio-net device using vhost-net backend

2010-03-21 Thread Avi Kivity


On 03/21/2010 12:21 PM, Gleb Natapov wrote:

On Sun, Mar 21, 2010 at 12:11:33PM +0200, Avi Kivity wrote:
   

On 03/21/2010 11:55 AM, Michael S. Tsirkin wrote:
 

On Fri, Mar 19, 2010 at 03:19:27PM -0700, Sridhar Samudrala wrote:
   

When creating a guest with 2 virtio-net interfaces, I am running
into an issue causing the 2nd i/f to fall back to userspace virtio
even when vhost is enabled.

After some debugging, it turned out that KVM_IOEVENTFD ioctl()
call in qemu is failing with ENOSPC.
This is because of the NR_IOBUS_DEVS(6) limit in kvm_io_bus_register_dev()
routine in the host kernel.

I think we need to increase this limit if we want to support multiple
network interfaces using vhost-net.
Is there an alternate solution?

Thanks
Sridhar
 

Nothing easy that I can see. Each device needs 2 of these.  Avi, Gleb,
any objections to increasing the limit to say 16?  That would give us
5 more devices to the limit of 6 per guest.
   

Increase it to 200, then.

 

Currently on each device read/write we iterate over all registered
devices. This is not scalable.
   


Yeah.  We need first to drop the callback based matching and replace it 
with explicit ranges, then to replace the search with a hash table for 
small ranges (keeping a linear search for large ranges, can happen for 
coalesced mmio).
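
A rough model of that plan, assuming devices register explicit
(start, length) ranges instead of match callbacks. The class name, the
8-byte cutoff and the device labels are all hypothetical; this only
shows the shape of the lookup, not a kernel implementation.

```python
class RangeBus:
    """Toy I/O bus: hash lookup for small ranges, linear scan for large ones."""

    SMALL = 8  # hypothetical cutoff (bytes) below which each address is hashed

    def __init__(self):
        self.by_addr = {}   # address -> device, for small ranges
        self.large = []     # (start, length, device), scanned linearly

    def register(self, start, length, dev):
        if length <= self.SMALL:
            for addr in range(start, start + length):
                self.by_addr[addr] = dev
        else:
            self.large.append((start, length, dev))

    def find(self, addr, size=1):
        """Return the device covering [addr, addr + size), or None."""
        dev = self.by_addr.get(addr)
        if dev is not None and all(
            self.by_addr.get(a) is dev for a in range(addr, addr + size)
        ):
            return dev
        for start, length, dev in self.large:
            if start <= addr and addr + size <= start + length:
                return dev
        return None

bus = RangeBus()
bus.register(0xC050, 2, "ioeventfd-0")
bus.register(0xE0000000, 0x1000, "coalesced-mmio")
print(bus.find(0xC050, 2), bus.find(0xE0000010, 4), bus.find(0x1234))
```

Small ranges cost one dictionary probe per access regardless of how many
devices are registered; only the few large ranges are still scanned.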


--
error compiling committee.c: too many arguments to function


[PATCH 1/2] KVM: x86 emulator: commit rflags as part of registers commit.

2010-03-21 Thread Gleb Natapov

Make sure that rflags is committed only after successful instruction
emulation.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |1 +
 arch/x86/kvm/x86.c |8 ++--
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index b5e12c5..a1319c8 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -136,6 +136,7 @@ struct x86_emulate_ops {
 	ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
 	void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
 	int (*cpl)(struct kvm_vcpu *vcpu);
+	void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
 };
 
 /* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 266576c..c1aa983 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2968,6 +2968,7 @@ writeback:
 	/* Commit shadow register state. */
 	memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
 	kvm_rip_write(ctxt->vcpu, c->eip);
+	ops->set_rflags(ctxt->vcpu, ctxt->eflags);
 
 done:
 	return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bb9a24a..3fa70b3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3643,6 +3643,11 @@ static void emulator_set_segment_selector(u16 sel, int seg,
 	kvm_set_segment(vcpu, kvm_seg, seg);
 }
 
+static void emulator_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+{
+	kvm_x86_ops->set_rflags(vcpu, rflags);
+}
+
 static struct x86_emulate_ops emulate_ops = {
 	.read_std            = kvm_read_guest_virt_system,
 	.write_std           = kvm_write_guest_virt_system,
@@ -3660,6 +3665,7 @@ static struct x86_emulate_ops emulate_ops = {
 	.get_cr              = emulator_get_cr,
 	.set_cr              = emulator_set_cr,
 	.cpl                 = emulator_get_cpl,
+	.set_rflags          = emulator_set_rflags,
 };
 
 static void cache_all_regs(struct kvm_vcpu *vcpu)
@@ -3780,8 +3786,6 @@ restart:
 		return EMULATE_DO_MMIO;
 	}
 
-	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
-
 	if (vcpu->mmio_is_write) {
 		vcpu->mmio_needed = 0;
 		return EMULATE_DO_MMIO;
-- 
1.6.5
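
The idea behind this patch can be sketched outside the kernel as
follows: the emulator works on a shadow copy of the vcpu state, and
registers, rip and rflags are written back together only when the
instruction succeeds. The dictionary layout, the EmulationFailed
exception and the two sample handlers are illustrative assumptions, not
the emulator's data structures.

```python
class EmulationFailed(Exception):
    """Raised by an instruction handler that cannot complete."""

def emulate(vcpu, handler):
    """Run handler on a shadow copy of vcpu; commit everything only on success."""
    shadow = {
        "regs": dict(vcpu["regs"]),
        "rip": vcpu["rip"],
        "rflags": vcpu["rflags"],
    }
    try:
        handler(shadow)
    except EmulationFailed:
        return False          # nothing, including rflags, reaches the vcpu
    vcpu.update(shadow)       # writeback: regs, rip and rflags together
    return True

def failing_insn(state):
    state["rflags"] |= 0x40   # sets ZF in the shadow state...
    raise EmulationFailed()   # ...then fails before writeback

def ok_insn(state):
    state["rflags"] |= 0x40   # sets ZF and completes
    state["rip"] += 3

vcpu = {"regs": {"rax": 1}, "rip": 0x1000, "rflags": 0x2}
emulate(vcpu, failing_insn)
print(hex(vcpu["rflags"]))    # the failed instruction left rflags untouched
```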


[PATCH 2/2] KVM: x86 emulator: add decoding of CMPXCHG8B dst operand.

2010-03-21 Thread Gleb Natapov

Decode the CMPXCHG8B destination operand in the decoding stage. Fixes a
regression introduced by the "If LOCK prefix is used dest arg should be
memory" commit. That commit relies on the dst operand being decoded at
the beginning of instruction emulation.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/emulate.c |   24 ++--
 1 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c1aa983..904351e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -52,6 +52,7 @@
 #define DstMem      (3<<1)	/* Memory operand. */
 #define DstAcc      (4<<1)	/* Destination Accumulator */
 #define DstDI       (5<<1)	/* Destination is in ES:(E)DI */
+#define DstMem64    (6<<1)	/* 64bit memory operand */
 #define DstMask     (7<<1)
 /* Source operand type. */
 #define SrcNone     (0<<4)	/* No source operand. */
@@ -360,7 +361,7 @@ static u32 group_table[] = {
 	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM | Lock,
 	DstMem | SrcImmByte | ModRM | Lock, DstMem | SrcImmByte | ModRM | Lock,
 	[Group9*8] =
-	0, ImplicitOps | ModRM | Lock, 0, 0, 0, 0, 0, 0,
+	0, DstMem64 | ModRM | Lock, 0, 0, 0, 0, 0, 0,
 };
 
 static u32 group2_table[] = {
@@ -1205,6 +1206,7 @@ done_prefixes:
 			 c->twobyte && (c->b == 0xb6 || c->b == 0xb7));
 		break;
 	case DstMem:
+	case DstMem64:
 		if ((c->d & ModRM) && c->modrm_mod == 3) {
 			c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
 			c->dst.type = OP_REG;
@@ -1214,7 +1216,10 @@ done_prefixes:
 		}
 		c->dst.type = OP_MEM;
 		c->dst.ptr = (unsigned long *)c->modrm_ea;
-		c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
+		if ((c->d & DstMask) == DstMem64)
+			c->dst.bytes = 8;
+		else
+			c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
 		c->dst.val = 0;
 		if (c->d & BitOp) {
 			unsigned long mask = ~(c->dst.bytes * 8 - 1);
@@ -1706,12 +1711,7 @@ static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt,
 			       struct x86_emulate_ops *ops)
 {
 	struct decode_cache *c = &ctxt->decode;
-	u64 old, new;
-	int rc;
-
-	rc = ops->read_emulated(c->modrm_ea, &old, 8, ctxt->vcpu);
-	if (rc != X86EMUL_CONTINUE)
-		return rc;
+	u64 old = c->dst.orig_val;
 
 	if (((u32) (old >> 0) != (u32) c->regs[VCPU_REGS_RAX]) ||
 	    ((u32) (old >> 32) != (u32) c->regs[VCPU_REGS_RDX])) {
@@ -1719,15 +1719,12 @@ static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt,
 		c->regs[VCPU_REGS_RAX] = (u32) (old >> 0);
 		c->regs[VCPU_REGS_RDX] = (u32) (old >> 32);
 		ctxt->eflags &= ~EFLG_ZF;
-
 	} else {
-		new = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
+		c->dst.val = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
 		       (u32) c->regs[VCPU_REGS_RBX];
 
-		rc = ops->cmpxchg_emulated(c->modrm_ea, &old, &new, 8, ctxt->vcpu);
-		if (rc != X86EMUL_CONTINUE)
-			return rc;
 		ctxt->eflags |= EFLG_ZF;
+		c->lock_prefix = 1;
 	}
 	return X86EMUL_CONTINUE;
 }
@@ -3241,7 +3238,6 @@ twobyte_insn:
 		rc = emulate_grp9(ctxt, ops);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
-		c->dst.type = OP_NONE;
 		break;
 	}
 	goto writeback;
-- 
1.6.5
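
For readers less familiar with the instruction, the architectural
behaviour that emulate_grp9() implements can be modelled as below. This
is a sketch of CMPXCHG8B semantics only; the function name and the tuple
it returns are illustrative choices, and locking and fault handling are
not modelled.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def cmpxchg8b(mem, eax, ebx, ecx, edx):
    """Model CMPXCHG8B on a 64-bit memory value.

    Returns (new_mem, eax, edx, zf).
    """
    expected = ((edx & MASK32) << 32) | (eax & MASK32)
    if (mem & MASK64) == expected:
        # Match: store ECX:EBX to memory and set ZF.
        new_mem = ((ecx & MASK32) << 32) | (ebx & MASK32)
        return new_mem, eax, edx, 1
    # Mismatch: load the memory value into EDX:EAX and clear ZF.
    return mem, mem & MASK32, (mem >> 32) & MASK32, 0

print(cmpxchg8b(0x1111111122222222, 0x22222222, 0xBBBBBBBB, 0xCCCCCCCC, 0x11111111))
```

In the patch, the 8-byte old value comes from the decoded destination
operand (c->dst.orig_val) and the new value is left in c->dst.val for the
common writeback path, rather than being read and written inside
emulate_grp9() itself.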


Re: [PATCH 1/2] KVM: x86 emulator: commit rflags as part of registers commit.

2010-03-21 Thread Gleb Natapov

Wrong To: header. Ignore please.

On Sun, Mar 21, 2010 at 01:06:02PM +0200, Gleb Natapov wrote:
 Make sure that rflags is committed only after successful instruction
 emulation.
 
 Signed-off-by: Gleb Natapov g...@redhat.com
 ---
  arch/x86/include/asm/kvm_emulate.h |1 +
  arch/x86/kvm/emulate.c |1 +
  arch/x86/kvm/x86.c |8 ++--
  3 files changed, 8 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_emulate.h 
 b/arch/x86/include/asm/kvm_emulate.h
 index b5e12c5..a1319c8 100644
 --- a/arch/x86/include/asm/kvm_emulate.h
 +++ b/arch/x86/include/asm/kvm_emulate.h
 @@ -136,6 +136,7 @@ struct x86_emulate_ops {
   ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
   void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
   int (*cpl)(struct kvm_vcpu *vcpu);
 + void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
  };
  
  /* Type, address-of, and value of an instruction's operand. */
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index 266576c..c1aa983 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -2968,6 +2968,7 @@ writeback:
   /* Commit shadow register state. */
   memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
   kvm_rip_write(ctxt->vcpu, c->eip);
 + ops->set_rflags(ctxt->vcpu, ctxt->eflags);
  
  done:
   return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index bb9a24a..3fa70b3 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -3643,6 +3643,11 @@ static void emulator_set_segment_selector(u16 sel, int 
 seg,
   kvm_set_segment(vcpu, kvm_seg, seg);
  }
  
 +static void emulator_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 +{
 + kvm_x86_ops->set_rflags(vcpu, rflags);
 +}
 +
  static struct x86_emulate_ops emulate_ops = {
   .read_std= kvm_read_guest_virt_system,
   .write_std   = kvm_write_guest_virt_system,
 @@ -3660,6 +3665,7 @@ static struct x86_emulate_ops emulate_ops = {
   .get_cr  = emulator_get_cr,
   .set_cr  = emulator_set_cr,
   .cpl = emulator_get_cpl,
 + .set_rflags  = emulator_set_rflags,
  };
  
  static void cache_all_regs(struct kvm_vcpu *vcpu)
 @@ -3780,8 +3786,6 @@ restart:
   return EMULATE_DO_MMIO;
   }
  
 - kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
 -
   if (vcpu->mmio_is_write) {
   vcpu->mmio_needed = 0;
   return EMULATE_DO_MMIO;
 -- 
 1.6.5
 

--
Gleb.

Tracking KVM development

2010-03-21 Thread Thomas Løcke

Hey all,

I've recently started testing KVM as a possible virtualization
solution for a bunch of servers, and so far things are going pretty
well. My OS of choice is Slackware, and I usually just go with
whatever kernel Slackware comes with.

But with KVM I feel I might need to pay a bit more attention to that
part of Slackware, as it appears to be a project in rapid
development, so my question is: how best to track KVM and keep it
up-to-date?

Currently I upgrade to the latest stable kernel almost as soon as it's
been released by Linus, and I track qemu-kvm using this Git
repository:  git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git

But should I perhaps also track the KVM modules, and if so, from where?

Any and all suggestions for keeping a healthy and stable KVM setup
running are more than welcome.

:o)
/Thomas

Time and KVM - best practices

2010-03-21 Thread Thomas Løcke

Hey,

What is considered best practice when running a KVM host with a
mixture of Linux and Windows guests?

Currently I have ntpd running on the host, and I start my guests using
-rtc base=localhost,clock=host, with an extra -tdf added for
Windows guests, just to keep their clock from drifting madly during
load.

But with this setup, all my guests are constantly 1-2 seconds behind
the host. I can live with that for the Windows guests, as they are not
running anything that depends heavily on the time being set perfectly,
but for some of the Linux guests it's an issue.

Would I be better off using ntpd and -rtc base=localhost,clock=vm for
all the Linux guests, or is there some other magic way of ensuring
that the clock is perfectly in sync with the host? Perhaps there is
some kernel configuration I can do to optimize the host for KVM?

I'm currently using QEMU PC emulator version 0.12.50 (qemu-kvm-devel)
because version 0.12.30 did not work well at all with Windows guests,
and the kernel in both host and Linux guests is 2.6.33.1

:o)
/Thomas

Re: Unable to create more than 1 guest virtio-net device using vhost-net backend

2010-03-21 Thread Michael S. Tsirkin

On Sun, Mar 21, 2010 at 12:29:31PM +0200, Avi Kivity wrote:
 On 03/21/2010 12:15 PM, Michael S. Tsirkin wrote:
 Nothing easy that I can see. Each device needs 2 of these.  Avi, Gleb,
 any objections to increasing the limit to say 16?  That would give us
 5 more devices to the limit of 6 per guest.


 Increase it to 200, then.
  
 OK. I think we'll also need a smarter allocator
 than the bus->dev_count++ we now have. Right?


 No, why?

We'll run into problems if devices are created/removed in random order,
won't we?

 Eventually we'll want faster scanning than the linear search we employ  
 now, though.

Yes I suspect with 200 entries we will :). Let's just make it 16 for
now?

 Is the limit visible to userspace?  If not, we need to expose it.
  
 I don't think it's visible: it seems to be used in a single
 place in kvm. Let's add an ioctl? Note that qemu doesn't
 need it now ...


 We usually expose limits via KVM_CHECK_EXTENSION(KVM_CAP_BLAH).  We can  
 expose it via KVM_CAP_IOEVENTFD (and need to reserve iodev entries for  
 those).

 -- 
 error compiling committee.c: too many arguments to function

Re: Unable to create more than 1 guest virtio-net device using vhost-net backend

2010-03-21 Thread Avi Kivity


On 03/21/2010 01:34 PM, Michael S. Tsirkin wrote:

On Sun, Mar 21, 2010 at 12:29:31PM +0200, Avi Kivity wrote:
   

On 03/21/2010 12:15 PM, Michael S. Tsirkin wrote:
 

Nothing easy that I can see. Each device needs 2 of these.  Avi, Gleb,
any objections to increasing the limit to say 16?  That would give us
5 more devices to the limit of 6 per guest.


   

Increase it to 200, then.

 

OK. I think we'll also need a smarter allocator
than the bus->dev_count++ we now have. Right?

   

No, why?
 

We'll run into problems if devices are created/removed in random order,
won't we?
   


unregister_dev() takes care of it.
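
A small model of why removal in random order is harmless with a plain
counter. It assumes unregistering moves the last entry into the freed
slot, so the array stays dense; the class and method names and the
-ENOENT return for an unknown device are illustrative, and only the
limit of 6 and the ENOSPC failure come from this thread.

```python
import errno

NR_IOBUS_DEVS = 6  # the limit quoted in this thread

class FixedBus:
    """Toy model of a fixed-size device array with a simple counter."""

    def __init__(self, limit=NR_IOBUS_DEVS):
        self.limit = limit
        self.devs = []

    def register_dev(self, dev):
        if len(self.devs) >= self.limit:
            return -errno.ENOSPC
        self.devs.append(dev)            # the devs[dev_count++] = dev step
        return 0

    def unregister_dev(self, dev):
        for i, d in enumerate(self.devs):
            if d == dev:
                # Move the last entry into the hole so no gaps remain.
                self.devs[i] = self.devs[-1]
                self.devs.pop()
                return 0
        return -errno.ENOENT

bus = FixedBus()
results = [bus.register_dev("dev%d" % i) for i in range(7)]
print(results)                 # six successes, then -ENOSPC
bus.unregister_dev("dev2")     # remove from the middle
print(bus.register_dev("new"), bus.devs)
```

Because the array never has holes, appending at the end is always
enough, whatever order devices come and go in.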


Eventually we'll want faster scanning than the linear search we employ
now, though.
 

Yes I suspect with 200 entries we will :). Let's just make it 16 for
now?
   


Let's make it 200 and fix the performance problems later.  Making it 16 
is just asking for trouble.


--
error compiling committee.c: too many arguments to function


Re: Strange CPU usage pattern in SMP guest

2010-03-21 Thread Sebastian Hetze

On Sun, Mar 21, 2010 at 12:09:00PM +0200, Avi Kivity wrote:
 On 03/21/2010 02:13 AM, Sebastian Hetze wrote:
 Hi *,

 in a 6-CPU SMP guest running on a host with 2 quad-core
 Intel Xeon E5520 with hyperthreading enabled,
 we see one or more guest CPUs working in a very strange
 pattern. It looks like all or nothing. We can easily identify
 the affected CPU with xosview. Here is the mpstat output
 compared to one regular working CPU:


 mpstat -P 4 1
 Linux 2.6.31-16-generic-pae (guest)  21.03.2010  _i686_  (6 CPU)
 00:45:19  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
 00:45:20    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
 00:45:21    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
 00:45:22    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
 00:45:23    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
 00:45:24    4    0,00   66,67    0,00    0,00    0,00   33,33    0,00    0,00    0,00
 00:45:25    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
 00:45:26    4    0,00  100,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00


 Looks like the guest is only receiving 3-4 timer interrupts per second,  
 so time becomes quantized.

 Please run the attached irqtop in the affected guest and report the results.

 Is the host overly busy?  What host kernel, kvm, and qemu are you  
 running?  Is the guest running an I/O workload? if so, how are the disks  

The host is not busy at all. In fact, currently it is running only one
guest. The host is running an ubuntu 2.6.31-14-server kernel. qemu-kvm
is 0.12.2-0ubuntu6. The kvm module has srcversion: 82D6B673524596F9CF3E84C
as stated by modinfo.

The guest occasionally is running IO workload. However, the effect is
visible all the time. And it is only one out of 6 CPUs the very same guest
is running. This is the output on the guest for all CPUs:

mpstat -P ALL 1
12:45:59  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
12:46:00  all    0,40    9,74    2,39    5,37    0,80    3,98    0,00    0,00   77,34
12:46:00    0    1,00    5,00    6,00    3,00    1,00    9,00    0,00    0,00   75,00
12:46:00    1    0,00   23,00    2,00   10,00    0,00    0,00    0,00    0,00   65,00
12:46:00    2    0,00    5,94    0,99    6,93    0,00    1,98    0,00    0,00   84,16
12:46:00    3    0,00    8,00    2,00    5,00    2,00    9,00    0,00    0,00   74,00
12:46:00    4    0,00   33,33    0,00    0,00    0,00    0,00    0,00    0,00   66,67
12:46:00    5    0,00    5,94    0,00    3,96    0,00    0,99    0,00    0,00   89,11

12:46:00  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
12:46:01  all    0,60    5,81    3,21   24,45    0,40    3,61    0,00    0,00   61,92
12:46:01    0    1,01    4,04    7,07   31,31    1,01    6,06    0,00    0,00   49,49
12:46:01    1    0,00    5,00    2,00   19,00    0,00    2,00    0,00    0,00   72,00
12:46:01    2    0,99    7,92    1,98   35,64    0,00    2,97    0,00    0,00   50,50
12:46:01    3    1,98    4,95    2,97   13,86    0,00    6,93    0,00    0,00   69,31
12:46:01    4    0,00   33,33    0,00    0,00    0,00    0,00    0,00    0,00   66,67
12:46:01    5    0,00    8,08    3,03   22,22    0,00    1,01    0,00    0,00   65,66

12:46:01  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
12:46:02  all    2,38   12,70   17,06   14,68    0,60    1,98    0,00    0,00   50,60
12:46:02    0    3,96   15,84    9,90   13,86    0,00    2,97    0,00    0,00   53,47
12:46:02    1    2,97    6,93    5,94   19,80    2,97    2,97    0,00    0,00   58,42
12:46:02    2    2,02   17,17    8,08   18,18    2,02    1,01    0,00    0,00   51,52
12:46:02    3    2,02   10,10    8,08   14,14    0,00    2,02    0,00    0,00   63,64
12:46:02    4    0,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00  100,00
12:46:02    5    0,00   13,00   55,00    6,00    0,00    1,00    0,00    0,00   25,00

12:46:02  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
12:46:03  all    0,20   11,35   10,96    8,96    0,40    2,99    0,00    0,00   65,14
12:46:03    0    1,00   11,00    7,00   15,00    0,00    1,00    0,00    0,00   65,00
12:46:03    1    0,00    7,14    2,04    6,12    1,02   11,22    0,00    0,00   72,45
12:46:03    2    0,00   15,00    1,00   12,00    0,00    1,00    0,00    0,00   71,00
12:46:03    3    0,00   11,00   23,00    8,00    0,00    0,00    0,00    0,00   58,00
12:46:03    4    0,00    0,00   50,00    0,00    0,00    0,00    0,00    0,00   50,00
12:46:03    5    0,00   13,00   20,00    4,00    0,00    1,00    0,00    0,00   62,00

So it is only CPU4 that is showing this strange behaviour.

hi, may I ask some help on the paravirtualization of KVM?

2010-03-21 Thread Liang YANG

I want to set up virtio-net for a guest OS on KVM. These are my steps:

1. Build kvm-88 (make, make install).
2. Compile the guest OS (Red Hat) kernel, version 2.6.27.45 (with
virtio support). All the required options are selected:
  o CONFIG_VIRTIO_PCI=y (Virtualization - PCI driver for
virtio devices)
  o CONFIG_VIRTIO_BALLOON=y (Virtualization - Virtio balloon driver)
  o CONFIG_VIRTIO_BLK=y (Device Drivers - Block - Virtio block driver)
  o CONFIG_VIRTIO_NET=y (Device Drivers - Network device
support - Virtio network driver)
  o CONFIG_VIRTIO=y (automatically selected)
  o CONFIG_VIRTIO_RING=y (automatically selected)
3. Start the guest OS with this command:
  x86_64-softmmu/qemu-system-x86_64  -m 1024 /root/redhat.img
-net nic,model=virtio -net tap,script=/etc/kvm/qemu-ifup
4. The result is:
  * The guest OS starts up.
  * But the network does not; no ethX device is found.
  * lsmod | grep virtio lists no virtio modules.

So why does virtio_net not show up in the guest OS? Is anything
wrong in one of my steps, or is some setting missing? I have consulted the
page http://www.linux-kvm.org/page/Virtio, but did not find any special
requirements.

Does anyone have some tips? Thanks in advance.
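
One thing worth noting about step 4: drivers built with =y are part of
the kernel image and are not listed by lsmod, so an empty lsmod does not
by itself show that virtio is missing. A minimal sketch for checking the
configuration of the kernel that is actually running follows. It assumes
the config text is available as a string (for example read from a file
such as /boot/config-<version>, which is a common convention but not
guaranteed); the function name and the sample text are made up for
illustration.

```python
REQUIRED = ["CONFIG_VIRTIO", "CONFIG_VIRTIO_RING",
            "CONFIG_VIRTIO_PCI", "CONFIG_VIRTIO_NET"]

def virtio_status(config_text, options=REQUIRED):
    """Map each option to its value ('y' or 'm') or to 'missing'."""
    values = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Lines like "# CONFIG_FOO is not set" are comments: option is off.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key] = value
    return {opt: values.get(opt, "missing") for opt in options}

# Hypothetical sample, not taken from the reporter's system.
sample = """CONFIG_VIRTIO=y
CONFIG_VIRTIO_RING=y
CONFIG_VIRTIO_PCI=m
# CONFIG_VIRTIO_NET is not set
"""
for option, state in virtio_status(sample).items():
    print(option, state)
```

If any option comes back as missing, the running kernel is probably not
the one that was configured in step 2.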



-- 
BestRegards.
YangLiang
_
 Department of Computer Science .
 School of Electronics Engineering & Computer Science .
_

Re: Strange CPU usage pattern in SMP guest

2010-03-21 Thread Avi Kivity


On 03/21/2010 02:02 PM, Sebastian Hetze wrote:


12:46:02  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
12:46:03  all    0,20   11,35   10,96    8,96    0,40    2,99    0,00    0,00   65,14
12:46:03    0    1,00   11,00    7,00   15,00    0,00    1,00    0,00    0,00   65,00
12:46:03    1    0,00    7,14    2,04    6,12    1,02   11,22    0,00    0,00   72,45
12:46:03    2    0,00   15,00    1,00   12,00    0,00    1,00    0,00    0,00   71,00
12:46:03    3    0,00   11,00   23,00    8,00    0,00    0,00    0,00    0,00   58,00
12:46:03    4    0,00    0,00   50,00    0,00    0,00    0,00    0,00    0,00   50,00
12:46:03    5    0,00   13,00   20,00    4,00    0,00    1,00    0,00    0,00   62,00

So it is only CPU4 that is showing this strange behaviour.
   


Can you adjust irqtop to only count cpu4?  or even just post a few 'cat 
/proc/interrupts' from that guest.


Most likely the timer interrupt for cpu4 died.
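
For anyone who wants to compare two 'cat /proc/interrupts' snapshots by
hand, here is a rough sketch. It assumes the x86 layout where the local
timer row is labelled "LOC:"; the helper names and the sample numbers
are invented for illustration and are not output from the affected guest.

```python
def local_timer_counts(interrupts_text):
    """Return the per-CPU counts from the 'LOC:' row of /proc/interrupts."""
    lines = interrupts_text.splitlines()
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        vec, _, data = line.strip().partition(":")
        if vec == "LOC":
            return [int(x) for x in data.split()[:ncpu]]
    raise ValueError("no LOC row found")

def tick_rates(before, after, seconds):
    """Local timer interrupts per second for each CPU between snapshots."""
    return [(b - a) / seconds for a, b in zip(before, after)]

# Two made-up snapshots taken one second apart.
snap1 = """           CPU0       CPU1
  0:         50          0   IO-APIC-edge      timer
LOC:       1000       2000   Local timer interrupts
"""
snap2 = """           CPU0       CPU1
  0:         50          0   IO-APIC-edge      timer
LOC:       1006       2250   Local timer interrupts
"""
print(tick_rates(local_timer_counts(snap1), local_timer_counts(snap2), 1.0))
# [6.0, 250.0]
```

A CPU whose rate stays far below the others over many samples would
match the "timer interrupt died" suspicion.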

--
error compiling committee.c: too many arguments to function


Re: Tracking KVM development

2010-03-21 Thread Avi Kivity


On 03/21/2010 01:21 PM, Thomas Løcke wrote:

Hey all,

I've recently started testing KVM as a possible virtualization
solution for a bunch of servers, and so far things are going pretty
well. My OS of choice is Slackware, and I usually just go with
whatever kernel Slackware comes with.

But with KVM I feel I might need to pay a bit more attention to that
part of Slackware, as it appears to be a project in rapid
development, so my question is how best to track KVM and keep it
up-to-date?

Currently I upgrade to the latest stable kernel almost as soon as it's
been released by Linus, and I track qemu-kvm using this Git
repository:  git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git

But should I perhaps also track the KVM modules, and if so, from where?

Any and all suggestions for keeping a healthy and stable KVM setup
running are more than welcome.

   


Tracking git repositories and stable setups are mutually exclusive.  If 
you are interested in something stable I recommend staying with the 
distribution provided setup (and picking a distribution that has an 
emphasis on kvm).  If you want to track upstream, use qemu-kvm-0.12.x 
stable releases and kernel.org 2.6.x.y stable releases.  If you want to 
track git repositories, use qemu-kvm.git and kvm.git for the kernel and kvm.


--
error compiling committee.c: too many arguments to function


Re: [PATCH] KVM: Drop KVM_REQ_PENDING_TIMER

2010-03-21 Thread Avi Kivity


On 03/20/2010 05:20 AM, Xiao Wang wrote:

The pending timer is not detected through KVM_REQ_PENDING_TIMER now.

   


It does, see the commit message of 06e056456.

Marcelo, IIRC this is the second time we get this patch... we need 
either a comment in the code, or better, a fix that doesn't involve an 
atomic in the fast path.


--
error compiling committee.c: too many arguments to function


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-21 Thread Gabor Gombas

On Thu, Mar 18, 2010 at 05:13:10PM +0100, Ingo Molnar wrote:

  Why does Linux AIO still suck?  Why do we not have a proper interface in
  userspace for doing asynchronous file system operations?

 Good that you mention it, i think it's an excellent example.

 The suckage of kernel async IO is for similar reasons: there's an ugly
 package separation problem between the kernel and between glibc - and
 between the apps that would make use of it.

No, kernel async IO sucks because it still does not play well with
buffered I/O. Last time I checked (about a year ago or so), AIO syscall
latencies were much worse when buffered I/O was used compared to direct
I/O. Unfortunately, to achieve good performance with direct I/O, you
need a HW RAID card with lots of on-board cache.

Gabor

Re: [PATCH 1/2] KVM: x86 emulator: commit rflags as part of registers commit.

2010-03-21 Thread Avi Kivity


On 03/21/2010 01:09 PM, Gleb Natapov wrote:

Wrong To: header. Ignore please.
   


See sendemail.aliasesfile in 'git help send-email'.

--
error compiling committee.c: too many arguments to function


Re: [PATCH] KVM: x86: Fix 32-bit build breakage due to typo

2010-03-21 Thread Avi Kivity


On 03/20/2010 11:14 AM, Jan Kiszka wrote:


Obviously, the 64-bit case is considered stable now and 32 bit remained
untested (not included in autotest?).


We don't autotest on 32-bit hosts these days.


So here is the build fix:
   


Thanks, applied.  Should have done it myself.

--
error compiling committee.c: too many arguments to function


Re: [PATCH] KVM: Fix a build error

2010-03-21 Thread Avi Kivity


On 03/20/2010 07:17 PM, Amos Kong wrote:

arch/x86/kvm/x86.c: In function ‘emulator_cmpxchg_emulated’:
arch/x86/kvm/x86.c:3367: error: ‘u’ undeclared (first use in this function)
arch/x86/kvm/x86.c:3367: error: (Each undeclared identifier is reported only once
arch/x86/kvm/x86.c:3367: error: for each function it appears in.)
arch/x86/kvm/x86.c:3367: error: expected expression before ‘)’ token
   


Thanks, just applied same patch from Jan.

--
error compiling committee.c: too many arguments to function


Re: [PATCH 1/2] KVM: x86 emulator: commit rflags as part of registers commit.

2010-03-21 Thread Avi Kivity


On 03/21/2010 04:35 PM, Gleb Natapov wrote:

On Sun, Mar 21, 2010 at 04:32:42PM +0200, Avi Kivity wrote:
   

On 03/21/2010 01:09 PM, Gleb Natapov wrote:
 

Wrong To: header. Ignore please.
   

See sendemail.aliasesfile in 'git help send-email'.

 

I use aliasesfile, but unfortunately, if an alias is not found there, git
does not complain; it just passes the name as is to sendmail, and sendmail
adds the part after @ by itself.
   


Ah.  Then don't use sendmail.

--
error compiling committee.c: too many arguments to function


Re: [PATCH 2/2] KVM: x86 emulator: add decoding of CMPXCHG8B dst operand.

2010-03-21 Thread Avi Kivity


On 03/21/2010 01:08 PM, Gleb Natapov wrote:

Decode CMPXCHG8B destination operand in decoding stage. Fixes regression
introduced by the "If LOCK prefix is used dest arg should be memory" commit.
This commit relies on the dst operand being decoded at the beginning of an
instruction emulation.



@@ -1719,15 +1719,12 @@ static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt,
                c->regs[VCPU_REGS_RAX] = (u32) (old >> 0);
                c->regs[VCPU_REGS_RDX] = (u32) (old >> 32);
                ctxt->eflags &= ~EFLG_ZF;
-
        } else {
-               new = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
+               c->dst.val = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
                       (u32) c->regs[VCPU_REGS_RBX];

-               rc = ops->cmpxchg_emulated(c->modrm_ea, &old, &new, 8, ctxt->vcpu);
-               if (rc != X86EMUL_CONTINUE)
-                       return rc;
                ctxt->eflags |= EFLG_ZF;
+               c->lock_prefix = 1;



Why is this bit needed?  cmpxchg8b without lock is valid and racy, but 
the guest may know it is safe.


--
error compiling committee.c: too many arguments to function


Re: [PATCH 2/2] KVM: x86 emulator: add decoding of CMPXCHG8B dst operand.

2010-03-21 Thread Gleb Natapov

On Sun, Mar 21, 2010 at 04:41:24PM +0200, Avi Kivity wrote:
 On 03/21/2010 01:08 PM, Gleb Natapov wrote:
 Decode CMPXCHG8B destination operand in decoding stage. Fixes regression
 introduced by the "If LOCK prefix is used dest arg should be memory" commit.
 This commit relies on the dst operand being decoded at the beginning of an
 instruction emulation.
 
 @@ -1719,15 +1719,12 @@ static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt,
                 c->regs[VCPU_REGS_RAX] = (u32) (old >> 0);
                 c->regs[VCPU_REGS_RDX] = (u32) (old >> 32);
                 ctxt->eflags &= ~EFLG_ZF;
 -
         } else {
 -               new = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
 +               c->dst.val = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
                        (u32) c->regs[VCPU_REGS_RBX];
 
 -               rc = ops->cmpxchg_emulated(c->modrm_ea, &old, &new, 8, ctxt->vcpu);
 -               if (rc != X86EMUL_CONTINUE)
 -                       return rc;
                 ctxt->eflags |= EFLG_ZF;
 +               c->lock_prefix = 1;
 
 Why is this bit needed?  cmpxchg8b without lock is valid and racy,
 but the guest may know it is safe.
 
Agree. Before this patch cmpxchg8b emulation always called
cmpxchg_emulated(), so to be extra careful I wanted to preserve old
behaviour. Resend the patch without this line?

--
Gleb.

Re: [PATCH 2/2] KVM: x86 emulator: add decoding of CMPXCHG8B dst operand.

2010-03-21 Thread Avi Kivity


On 03/21/2010 04:44 PM, Gleb Natapov wrote:

On Sun, Mar 21, 2010 at 04:41:24PM +0200, Avi Kivity wrote:
   

On 03/21/2010 01:08 PM, Gleb Natapov wrote:
 

Decode CMPXCHG8B destination operand in decoding stage. Fixes regression
introduced by the "If LOCK prefix is used dest arg should be memory" commit.
This commit relies on the dst operand being decoded at the beginning of an
instruction emulation.


@@ -1719,15 +1719,12 @@ static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt,
                c->regs[VCPU_REGS_RAX] = (u32) (old >> 0);
                c->regs[VCPU_REGS_RDX] = (u32) (old >> 32);
                ctxt->eflags &= ~EFLG_ZF;
-
        } else {
-               new = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
+               c->dst.val = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
                       (u32) c->regs[VCPU_REGS_RBX];

-               rc = ops->cmpxchg_emulated(c->modrm_ea, &old, &new, 8, ctxt->vcpu);
-               if (rc != X86EMUL_CONTINUE)
-                       return rc;
                ctxt->eflags |= EFLG_ZF;
+               c->lock_prefix = 1;


Why is this bit needed?  cmpxchg8b without lock is valid and racy,
but the guest may know it is safe.

 

Agree. Before this patch cmpxchg8b emulation always called
cmpxchg_emulated(), so to be extra careful I wanted to preserve old
behaviour. Resend the patch without this line?
   


Better a 3/2 that removes it.  So we have a large patch that just 
transforms code, and a small patch that corrects an earlier bug.  May 
help a bisector one day.


--
error compiling committee.c: too many arguments to function


Re: Strange CPU usage pattern in SMP guest

2010-03-21 Thread Sebastian Hetze

On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote:
 On 03/21/2010 02:02 PM, Sebastian Hetze wrote:

 12:46:02  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
 12:46:03  all    0,20   11,35   10,96    8,96    0,40    2,99    0,00    0,00   65,14
 12:46:03    0    1,00   11,00    7,00   15,00    0,00    1,00    0,00    0,00   65,00
 12:46:03    1    0,00    7,14    2,04    6,12    1,02   11,22    0,00    0,00   72,45
 12:46:03    2    0,00   15,00    1,00   12,00    0,00    1,00    0,00    0,00   71,00
 12:46:03    3    0,00   11,00   23,00    8,00    0,00    0,00    0,00    0,00   58,00
 12:46:03    4    0,00    0,00   50,00    0,00    0,00    0,00    0,00    0,00   50,00
 12:46:03    5    0,00   13,00   20,00    4,00    0,00    1,00    0,00    0,00   62,00

 So it is only CPU4 that is showing this strange behaviour.


 Can you adjust irqtop to only count cpu4?  or even just post a few 'cat  
 /proc/interrupts' from that guest.

 Most likely the timer interrupt for cpu4 died.

I've added two keys +/- to your irqtop to focus up and down
in the row of available CPUs.
The irqtop for CPU4 shows a constant number of 6 local timer interrupts
per update, while the other CPUs show various higher values:

irqtop for cpu 4

 eth0                                      188
 Rescheduling interrupts                   162
 Local timer interrupts                      6
 ata_piix                                    3
 TLB shootdowns                              1
 Spurious interrupts                         0
 Machine check exceptions                    0


irqtop for cpu 5

 eth0                                      257
 Local timer interrupts                    251
 Rescheduling interrupts                   237
 Spurious interrupts                         0
 Machine check exceptions                    0

So the timer interrupt for cpu4 is not completely dead but somehow
broken. What can cause this problem? Any way to speed it up again?

#!/usr/bin/python

import curses
import sys, os, time, optparse

def read_interrupts():
    # target == 0 sums over all CPUs; target == n selects CPU n-1.
    global target
    irq = {}
    proc = open('/proc/interrupts')
    nrcpu = len(proc.readline().split())
    if target < 0:
        target = 0
    if target > nrcpu:
        target = nrcpu
    for line in proc.readlines():
        vec, data = line.strip().split(':', 1)
        if vec in ('ERR', 'MIS'):
            continue
        counts = data.split(None, nrcpu)
        counts, rest = (counts[:-1], counts[-1])
        if target == 0:
            count = sum([int(x) for x in counts])
        else:
            count = int(counts[target-1])
        try:
            # Numbered vectors carry a controller type before the name.
            v = int(vec)
            name = rest.split(None, 1)[1]
        except (ValueError, IndexError):
            name = rest
        irq[name] = count
    proc.close()
    return irq

def delta_interrupts():
    # Yield the per-interrupt difference between consecutive snapshots.
    old = read_interrupts()
    while True:
        irq = read_interrupts()
        delta = {}
        for key in irq.keys():
            delta[key] = irq[key] - old[key]
        yield delta
        old = irq

target = 0
label_width = 35
number_width = 10

def tui(screen):
    curses.use_default_colors()
    global target
    curses.noecho()
    def getcount(x):
        return x[1]
    def refresh(irq):
        screen.erase()
        if target > 0:
            title = "irqtop for cpu %d" % (target-1)
        else:
            title = "irqtop sum for all cpu's"
        screen.addstr(0, 0, title)
        row = 2
        for name, count in sorted(irq.items(), key = getcount, reverse = True):
            if row >= screen.getmaxyx()[0]:
                break
            col = 1
            screen.addstr(row, col, name)
            col += label_width
            screen.addstr(row, col, '%10d' % (count,))
            row += 1
        screen.refresh()

    for irqs in delta_interrupts():
        refresh(irqs)
        curses.halfdelay(10)
        try:
            c = screen.getkey()
            if c == 'q':
                break
            if c == '+':
                target = target+1
            if c == '-':
                target = target-1
        except KeyboardInterrupt:
            break
        except curses.error:
            continue

curses.wrapper(tui)

[PATCH] KVM: x86 emulator: fix unlocked CMPXCHG8B emulation.

2010-03-21 Thread Gleb Natapov


When CMPXCHG8B is executed without LOCK prefix it is racy. Preserve this
behaviour in emulator too.

Signed-off-by: Gleb Natapov g...@redhat.com
---

This patch goes on top of my previous "KVM: x86 emulator: add decoding
of CMPXCHG8B dst operand." patch.

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 904351e..e2bbb9c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1724,7 +1724,6 @@ static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt,
                       (u32) c->regs[VCPU_REGS_RBX];
 
                ctxt->eflags |= EFLG_ZF;
-               c->lock_prefix = 1;
        }
        return X86EMUL_CONTINUE;
 }
--
Gleb.
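
For readers following the hunk, the register and flag semantics that
emulate_grp9 implements can be modelled in a few lines. This is only an
illustrative, single-threaded sketch, not the emulator code: the function
name and the tuple return value are invented here, and the model cannot
show the race itself. It only shows what one CMPXCHG8B does to memory,
EDX:EAX and ZF.

```python
MASK32 = 0xFFFFFFFF

def cmpxchg8b(mem, edx, eax, ecx, ebx):
    """Model one CMPXCHG8B on a 64-bit memory value.

    Returns (new_mem, edx, eax, zf)."""
    expected = ((edx & MASK32) << 32) | (eax & MASK32)
    if mem == expected:
        # Match: store ECX:EBX to memory and set ZF.
        return ((ecx & MASK32) << 32) | (ebx & MASK32), edx, eax, 1
    # Mismatch: load the memory value into EDX:EAX and clear ZF.
    return mem, (mem >> 32) & MASK32, mem & MASK32, 0

print(cmpxchg8b(0x0000000100000002, 1, 2, 3, 4))  # match, ZF set
print(cmpxchg8b(5, 1, 2, 3, 4))                   # mismatch, ZF clear
```

Without the LOCK prefix, another CPU may modify the memory operand
between the compare and the store, which is the raciness the patch
description refers to.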

Re: Strange CPU usage pattern in SMP guest

2010-03-21 Thread Avi Kivity


On 03/21/2010 04:55 PM, Sebastian Hetze wrote:

On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote:
   

On 03/21/2010 02:02 PM, Sebastian Hetze wrote:
 

12:46:02  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
12:46:03  all    0,20   11,35   10,96    8,96    0,40    2,99    0,00    0,00   65,14
12:46:03    0    1,00   11,00    7,00   15,00    0,00    1,00    0,00    0,00   65,00
12:46:03    1    0,00    7,14    2,04    6,12    1,02   11,22    0,00    0,00   72,45
12:46:03    2    0,00   15,00    1,00   12,00    0,00    1,00    0,00    0,00   71,00
12:46:03    3    0,00   11,00   23,00    8,00    0,00    0,00    0,00    0,00   58,00
12:46:03    4    0,00    0,00   50,00    0,00    0,00    0,00    0,00    0,00   50,00
12:46:03    5    0,00   13,00   20,00    4,00    0,00    1,00    0,00    0,00   62,00

So it is only CPU4 that is showing this strange behaviour.

   

Can you adjust irqtop to only count cpu4?  or even just post a few 'cat
/proc/interrupts' from that guest.

Most likely the timer interrupt for cpu4 died.
 

I've added two keys +/- to your irqtop to focus up and down
in the row of available CPUs.
The irqtop for CPU4 shows a constant number of 6 local timer interrupts
per update, while the other CPUs show various higher values:

irqtop for cpu 4

  eth0                                      188
  Rescheduling interrupts                   162
  Local timer interrupts                      6
  ata_piix                                    3
  TLB shootdowns                              1
  Spurious interrupts                         0
  Machine check exceptions                    0


irqtop for cpu 5

  eth0                                      257
  Local timer interrupts                    251
  Rescheduling interrupts                   237
  Spurious interrupts                         0
  Machine check exceptions                    0

So the timer interrupt for cpu4 is not completely dead but somehow
broken.


That is incredibly weird.


What can cause this problem? Any way to speed it up again?
   


The host has 8 cpus and is only running this 6 vcpu guest, yes?

Can you confirm the other vcpus are ticking at 250 Hz?

What does 'top' show running on cpu 4?  Pressing 'f' 'j' will add a 
last-used-cpu field in the display.


Marcelo, any ideas?

--
error compiling committee.c: too many arguments to function


Re: Strange CPU usage pattern in SMP guest

2010-03-21 Thread Sebastian Hetze

On Sun, Mar 21, 2010 at 05:17:38PM +0200, Avi Kivity wrote:
 On 03/21/2010 04:55 PM, Sebastian Hetze wrote:
 On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote:

 On 03/21/2010 02:02 PM, Sebastian Hetze wrote:
  
 12:46:02  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
 12:46:03  all    0,20   11,35   10,96    8,96    0,40    2,99    0,00    0,00   65,14
 12:46:03    0    1,00   11,00    7,00   15,00    0,00    1,00    0,00    0,00   65,00
 12:46:03    1    0,00    7,14    2,04    6,12    1,02   11,22    0,00    0,00   72,45
 12:46:03    2    0,00   15,00    1,00   12,00    0,00    1,00    0,00    0,00   71,00
 12:46:03    3    0,00   11,00   23,00    8,00    0,00    0,00    0,00    0,00   58,00
 12:46:03    4    0,00    0,00   50,00    0,00    0,00    0,00    0,00    0,00   50,00
 12:46:03    5    0,00   13,00   20,00    4,00    0,00    1,00    0,00    0,00   62,00

 So it is only CPU4 that is showing this strange behaviour.


 Can you adjust irqtop to only count cpu4?  or even just post a few 'cat
 /proc/interrupts' from that guest.

 Most likely the timer interrupt for cpu4 died.
  
 I've added two keys +/- to your irqtop to focus up and down
 in the row of available CPUs.
 The irqtop for CPU4 shows a constant number of 6 local timer interrupts
 per update, while the other CPUs show various higher values:

 irqtop for cpu 4

   eth0                                      188
   Rescheduling interrupts                   162
   Local timer interrupts                      6
   ata_piix                                    3
   TLB shootdowns                              1
   Spurious interrupts                         0
   Machine check exceptions                    0


 irqtop for cpu 5

   eth0                                      257
   Local timer interrupts                    251
   Rescheduling interrupts                   237
   Spurious interrupts                         0
   Machine check exceptions                    0

 So the timer interrupt for cpu4 is not completely dead but somehow
 broken.

 That is incredibly weird.

 What can cause this problem? Any way to speed it up again?


 The host has 8 cpus and is only running this 6 vcpu guest, yes?

The host is a dual quad core E5520 with hyperthreading enabled, so we
see 2x4x2=16 CPUs on the host. The guest is started with 6 CPUs.

 Can you confirm the other vcpus are ticking at 250 Hz?

The irqtop shows different numbers for local timer interrupts on the
other CPUs. The total number (summed up over all CPUs) varies between
something like 700 and 1400. Any CPU can be down to 10 and next update
up to 260. Only CPU4 stays at the 6 local timer interrupts.


 What does 'top' show running on cpu 4?  Pressing 'f' 'j' will add a  
 last-used-cpu field in the display.

The processes are not bound to a particular CPU, so the picture varies.
Here are two shots:

take1:

   15 root      RT  -5     0    0    0 S    0  0.0   0:01.70 4 migration/4
   16 root      15  -5     0    0    0 S    0  0.0   0:00.08 4 ksoftirqd/4
   17 root      RT  -5     0    0    0 S    0  0.0   0:00.00 4 watchdog/4
   25 root      15  -5     0    0    0 S    0  0.0   0:00.01 4 events/4
   35 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kintegrityd/4
   41 root      15  -5     0    0    0 S    0  0.0   0:00.03 4 kblockd/4
   50 root      15  -5     0    0    0 S    0  0.0   0:00.90 4 ata/4
   55 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kseriod
   66 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 aio/4
   73 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 crypto/4
   80 root      15  -5     0    0    0 S    0  0.0   2:11.71 4 scsi_eh_1
   87 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kmpathd/4
   95 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kondemand/4
  101 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kconservative/4
  103 root      10 -10     0    0    0 S    0  0.0   0:00.00 4 krfcommd
  681 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kdmflush
  686 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kdmflush
  691 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kdmflush
  737 root      15  -5     0    0    0 S    0  0.0   0:00.71 4 kjournald
  826 root      16  -4  2100  452  312 S    0  0.0   0:00.14 4 udevd
 1350 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kpsmoused
 1444 root      15  -5     0    0    0 S    0  0.0   0:00.00 4 kgameportd
 1718 root      15  -5     0    0    0 S    0  0.0   0:14.62 4 kjournald
 2108 statd     20   0  2252 1152  760 S    0  0.0   0:02.66 4 rpc.statd
 2117 root      15  -5     0    0    0 S    0  0.0   0:00.36 4 rpciod/4
 2123 root      15  -5     0    0    0 S    0  0.0   0:06.61 4 nfsiod
 2259 root      20   0  1696  444  440 S    0  0.0   0:00.00 4 getty
 2265 root      20   0  1696  444  440 S    0  0.0   0:00.00 4

Re: Tracking KVM development

2010-03-21 Thread Thomas Løcke

On Sun, Mar 21, 2010 at 1:23 PM, Avi Kivity a...@redhat.com wrote:
 Tracking git repositories and stable setups are mutually exclusive.  If you
 are interested in something stable I recommend staying with the distribution
 provided setup (and picking a distribution that has an emphasis on kvm).  If
 you want to track upstream, use qemu-kvm-0.12.x stable releases and
 kernel.org 2.6.x.y stable releases.  If you want to track git repositories,
 use qemu-kvm.git and kvm.git for the kernel and kvm.

Thanks Avi.

I will stay with the stable qemu-kvm releases and stable kernel.org
kernel releases from now on.

I've never heard of any KVM specific distributions. Are you aware of
any? My primary reason for going with Slackware, is because I already
know it. But if there are better choices for a KVM virtualization
host, then I'm willing to switch.

:o)
/Thomas

Re: Tracking KVM development

2010-03-21 Thread Avi Kivity


On 03/21/2010 06:37 PM, Thomas Løcke wrote:

On Sun, Mar 21, 2010 at 1:23 PM, Avi Kivity a...@redhat.com wrote:
   

Tracking git repositories and stable setups are mutually exclusive.  If you
are interested in something stable I recommend staying with the distribution
provided setup (and picking a distribution that has an emphasis on kvm).  If
you want to track upstream, use qemu-kvm-0.12.x stable releases and
kernel.org 2.6.x.y stable releases.  If you want to track git repositories,
use qemu-kvm.git and kvm.git for the kernel and kvm.
 

Thanks Avi.

I will stay with the stable qemu-kvm releases and stable kernel.org
kernel releases from now on.

I've never heard of any KVM specific distributions. Are you aware of
any? My primary reason for going with Slackware, is because I already
know it. But if there are better choices for a KVM virtualization
host, then I'm willing to switch.
   


The only kvm-specific distribution I know of is RHEV-H, but that's 
probably not what you're looking for.  I'm talking about distributions 
that have an active kvm package maintainer, update the packages 
regularly, have bug trackers that someone looks into, etc.  At least 
Fedora and Ubuntu do this, perhaps openSuSE as well (though the latter 
has a stronger Xen emphasis).


--
error compiling committee.c: too many arguments to function


Re: unexpected exit_ini_info when nesting svm

2010-03-21 Thread Joerg Roedel

Hello Olivier,

On Thu, Mar 18, 2010 at 08:43:53PM +0100, Olivier Berghmans wrote:
 I tried nesting kvm in kvm on an AMD processor with support for svm
 and npt (the dmesg told me both were in use). I managed to install the
 nested kvm and when starting the L2 guest in order to install an
 operating system, I got following messages in the L1 guest:
 
 [ 2016.712047] handle_exit: unexpected exit_ini_info 0x8008 exit_code 0x60
 [ 2031.432032] handle_exit: unexpected exit_ini_info 0x8008 exit_code 0x60
 [ 2034.468058] handle_exit: unexpected exit_ini_info 0x8008 exit_code 0x60

These messages result from a difference between real hardware SVM and
the SVM emulated by kvm. Hardware SVM always injects an exception
first before it does a #vmexit(0x60), while the SVM emulation does the
#vmexit again immediately. I have a patch to fix this but it needs more
testing.

The patch implements detection of the above situation and sends a
self-IPI in this case.

Joerg


Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-21 Thread Ingo Molnar


* Joerg Roedel j...@8bytes.org wrote:

 On Fri, Mar 19, 2010 at 09:21:22AM +0100, Ingo Molnar wrote:
  Unfortunately, in a previous thread the Qemu maintainer has indicated that
  he will essentially NAK any attempt to enhance Qemu to provide an easily
  discoverable, self-contained, transparent guest mount on the host side.
  
  No technical justification was given for that NAK, despite my repeated
  requests to articulate the exact security problems that such an approach
  would cause.
  
  If that NAK does not stand in that form then i'd like to know about it - it
  makes no sense for us to try to code up a solution against a standing
  maintainer NAK ...
 
 I still think it is the best and most generic way to let the guest do the 
 symbol resolution. [...]

Not really.

 [...] This has several advantages:
 
   1. The guest knows best about its symbol space. So this would be
  extensible to other guest operating systems.  A brave
  developer may even implement symbol passing for Windows or
  the BSDs ;-)

Having access to the actual executable files that include the symbols achieves 
precisely that - with the additional robustness that all this functionality is 
concentrated into the host, while the guest side is kept minimal (and 
transparent).

   2. The guest can decide on its own if it wants to pass this
  information to the host-perf. No security issues at all.

It can decide whether it exposes the files. Nor are there any security 
issues to begin with.

   3. The guest can also pass us the call-chain and we don't need
  to care about the complications of fetching it from the guest
  ourselves.

You need to be aware of the fact that symbol resolution is a separate step 
from call chain generation.

I.e. call-chains are a (entirely) separate issue, and could reasonably be done 
in the guest or in the host.

It has no bearing on this symbol resolution question.

   4. This way is extensible to nested virtualization too.

Nested virtualization is actually already taken care of by the filesystem 
solution via an existing method called 'subdirectories'. If the guest offers 
sub-guests then those symbols will be exposed in a similar way via its own 
'guest files' directory hierarchy.

I.e. if we have 'Guest-2' nested inside the 'Guest-Fedora-1' instance, we get:

 /guests/
 /guests/Guest-Fedora-1/etc/
 /guests/Guest-Fedora-1/usr/

we'd also have:

 /guests/Guest-Fedora-1/guests/Guest-2/

So this is taken care of automatically.
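
To make the naming scheme concrete, here is a tiny sketch of how such a
path could be derived from a chain of guest names. The function is
purely hypothetical (no such helper exists in perf or Qemu); it only
reproduces the directory layout described above.

```python
import posixpath

def guest_files_root(nesting, base="/guests"):
    """Build the 'guest files' directory for a guest, given the chain of
    guest names from the outermost guest down to the guest itself."""
    path = base
    for depth, name in enumerate(nesting):
        if depth > 0:
            # Each nested level lives under its parent's own guests/ dir.
            path = posixpath.join(path, "guests")
        path = posixpath.join(path, name)
    return path + "/"

print(guest_files_root(["Guest-Fedora-1"]))
# /guests/Guest-Fedora-1/
print(guest_files_root(["Guest-Fedora-1", "Guest-2"]))
# /guests/Guest-Fedora-1/guests/Guest-2/
```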

I.e. none of the four 'advantages' listed here are actually advantages over my 
proposed solution, so your conclusion is consequently flawed as well.

 How we speak to the guest was already discussed in this thread. My personal 
 opinion is that going through qemu is an unnecessary step and we can solve 
 that in a more clever and transparent way for perf.

Meaning exactly what?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-21 Thread Ingo Molnar


* Avi Kivity a...@redhat.com wrote:

  [...] Second, from my point of view all contributors are volunteers 
  (perhaps their employer volunteered them, but there's no difference from 
  my perspective). Asking them to repaint my apartment as a condition to 
  get a patch applied is abuse.  If a patch is good, it gets applied.
 
  This is one of the weirdest arguments i've seen in this thread. Almost all 
  the time we make contributions conditional on the general shape of the 
  project. Developers don't get to do just the fun stuff.
 
 So, do you think a reply to a patch along the lines of
 
   NAK.  Improving scalability is pointless while we don't have a decent GUI.
   I'll review your RCU patches _after_ you've contributed a usable GUI.
 
 ?

What does this have to do with RCU?

I'm talking about KVM, which is a Linux kernel feature that is useless without 
a proper, KVM-specific app making use of it.

RCU is a general kernel performance feature that works across the board. It 
helps KVM indirectly, and it helps many other kernel subsystems as well. It 
needs no user-space tool to be useful.

KVM on the other hand is useless without a user-space tool.

[ Theoretically you might have a fair point if it were a critical feature of 
  RCU for it to have a GUI, and if the main tool that made use of it sucked. 
  But it isn't and you should know that. ]

Had you suggested the following 'NAK', applied to a different, relevant 
subsystem:

  |   NAK.  Improving scalability is pointless while we don't have a usable 
  | tool.  I'll review your perf patches _after_ you've contributed a usable 
  | tool.

you would have a fair point. In fact, we are doing that; we are living by that. 
It makes absolutely zero sense to improve the scalability of perf if its 
usability sucks.

So where you are trying to point out an inconsistency in my argument, there is 
none.

  This is a basic quid pro quo: new features introduce risks and create 
  additional workload not just to the originating developer but on the rest 
  of the community as well. You should check how Linus has pulled new 
  features in the past 15 years: he very much requires the existing code to 
  first be top-notch before he accepts new features for a given area of 
  functionality.
 
 For a given area, yes. [...]

That is my precise point.

KVM is a specific subsystem or area that makes no sense without the 
user-space tooling it relates to. You seem to argue that you have no 'right' 
to insist on good quality of that tooling - and IMO you are fundamentally 
wrong with that.

Thanks,

Ingo

Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-21 Thread Ingo Molnar


* Anthony Liguori anth...@codemonkey.ws wrote:

 On 03/19/2010 03:53 AM, Ingo Molnar wrote:
 * Avi Kivity a...@redhat.com wrote:
 
 There were two negative reactions immediately, both showed a fundamental
 server versus desktop bias:
 
   - you did not accept that the most important usecase is when there is a
 single guest running.
 Well, it isn't.
 Erm, my usability points are _doubly_ true when there are multiple guests ...
 
 The inconvenience of having to type:
 
perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
--guestmodules=/home/ymzhang/guest/modules top
 
 is very obvious even with a single guest. Now multiply that by more guests 
 ...
 
 If you want to improve this, you need to do the following:
 
 1) Add a userspace daemon that uses vmchannel that runs in the guest and can 
fetch kallsyms and arbitrary modules.  If that daemon lives in 
tools/perf, that's fine.

Adding any new daemon to an existing guest is a deployment and usability 
nightmare.

The basic rule of good instrumentation is to be transparent. The moment we 
have to modify the user-space of a guest just to monitor it, the purpose of 
transparent instrumentation is defeated.

That was one of the fundamental usability mistakes of Oprofile.

There is no 'perf' daemon - all the perf functionality is _built in_, and for 
very good reasons. It is one of the main reasons for perf's success as well.

Now Qemu is trying to repeat that stupid mistake ...

So please either suggest a different transparent solution that is technically 
better than the one i suggested, or you should concede the point really.

Please try to think with the heads of our users and developers and don't suggest 
some weird ivory-tower design that is totally impractical ...

And no, you have to code none of this, we'll do all the coding. The only thing 
we are asking is for you to not stand in the way of good usability ...

Thanks,

Ingo

Streaming Audio from Virtual Machine

2010-03-21 Thread Gus Zernial

I'm using Kubuntu 9.10 32-bit on a quad-core Phenom II with 
Gigabit ethernet. I want to stream audio from MLB.com from a 
WinXP client thru a Linksys WMB54G wireless music bridge. Note 
that there are drivers for the WMB54G only for WinXP and Vista.

If I stream the audio thru a native WinXP box thru the WMB54G,
all is well and the audio sounds fine. When I try to stream thru a 
WinXP virtual machine on Kubuntu 9.10, the audio is poor quality
and subject to gaps and dropping the stream altogether. So far
I've tried KVM/QEMU and VirtualBox, same result.

Regarding KVM/QEMU, I note AMD-V is activated in the BIOS, and I have a 
custom 2.6.32.7 kernel and QEMU 0.11.0. The kvm and kvm_amd modules are compiled 
in and loaded. I've been using bridged networking. I think it's set up 
correctly but I confess I'm no networking expert. My start command for the 
WinXP virtual machine is:

sudo /usr/bin/qemu -m 1024 -boot c \
  -net nic,vlan=0,macaddr=00:d0:13:b0:2d:32,model=rtl8139 \
  -net tap,vlan=0,ifname=tap0,script=/etc/qemu-ifup \
  -localtime -soundhw ac97 -smp 4 \
  -fda /dev/fd0 -vga std -usb /home/rbroman/windows.img

I also tried model=virtio but that didn't help. 

I suspect this is a virtual machine networking problem but I'm
not sure. So my questions are:

-What's the best/fastest networking option and how do I set it up?
Pointers to step-by-step instructions appreciated.

-Is it possible I have a problem other than networking? Configuration
problem with KVM/QEMU? Or could there be a problem with the WMB54G driver when 
used thru a virtual machine?

-Is there a better virtual machine solution than KVM/QEMU for what 
I'm trying to do?

Recommendations appreciated - Gus





  

Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-21 Thread Antoine Martin


On 03/22/2010 02:17 AM, Ingo Molnar wrote:

* Anthony Liguori anth...@codemonkey.ws wrote:
   

On 03/19/2010 03:53 AM, Ingo Molnar wrote:
 

* Avi Kivity a...@redhat.com wrote:
   

There were two negative reactions immediately, both showed a fundamental
server versus desktop bias:

  - you did not accept that the most important usecase is when there is a
single guest running.
   

Well, it isn't.
 

Erm, my usability points are _doubly_ true when there are multiple guests ...

The inconvenience of having to type:

   perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
   --guestmodules=/home/ymzhang/guest/modules top

is very obvious even with a single guest. Now multiply that by more guests ...
   

If you want to improve this, you need to do the following:

1) Add a userspace daemon that uses vmchannel that runs in the guest and can
fetch kallsyms and arbitrary modules.  If that daemon lives in
tools/perf, that's fine.
 

Adding any new daemon to an existing guest is a deployment and usability
nightmare.
   
Absolutely. In most cases it is not desirable, and you'll find that in a 
lot of cases it is not even possible - for non-technical reasons.
One of the main benefits of virtualization is the ability to manage and 
see things from the outside.

The basic rule of good instrumentation is to be transparent. The moment we
have to modify the user-space of a guest just to monitor it, the purpose of
transparent instrumentation is defeated.
   

Not to mention Heisenbugs and interference.

Cheers
Antoine


That was one of the fundamental usability mistakes of Oprofile.

There is no 'perf' daemon - all the perf functionality is _built in_, and for
very good reasons. It is one of the main reasons for perf's success as well.

Now Qemu is trying to repeat that stupid mistake ...

So please either suggest a different transparent solution that is technically
better than the one i suggested, or you should concede the point really.

Please try to think with the heads of our users and developers and don't suggest
some weird ivory-tower design that is totally impractical ...

And no, you have to code none of this, we'll do all the coding. The only thing
we are asking is for you to not stand in the way of good usability ...

Thanks,

Ingo
   



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-21 Thread Ingo Molnar


* Antoine Martin anto...@nagafix.co.uk wrote:

 On 03/22/2010 02:17 AM, Ingo Molnar wrote:
 * Anthony Liguori anth...@codemonkey.ws wrote:
 On 03/19/2010 03:53 AM, Ingo Molnar wrote:
 * Avi Kivity a...@redhat.com wrote:
 There were two negative reactions immediately, both showed a fundamental
 server versus desktop bias:
 
   - you did not accept that the most important usecase is when there is a
 single guest running.
 Well, it isn't.
 Erm, my usability points are _doubly_ true when there are multiple guests 
 ...
 
 The inconvenience of having to type:
 
perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
--guestmodules=/home/ymzhang/guest/modules top
 
 is very obvious even with a single guest. Now multiply that by more guests 
 ...
 If you want to improve this, you need to do the following:
 
 1) Add a userspace daemon that uses vmchannel that runs in the guest and can
 fetch kallsyms and arbitrary modules.  If that daemon lives in
 tools/perf, that's fine.
 
  Adding any new daemon to an existing guest is a deployment and usability 
  nightmare.

 Absolutely. In most cases it is not desirable, and you'll find that in a lot 
 of cases it is not even possible - for non-technical reasons.

 One of the main benefits of virtualization is the ability to manage and see 
 things from the outside.

  The basic rule of good instrumentation is to be transparent. The moment we 
  have to modify the user-space of a guest just to monitor it, the purpose 
  of transparent instrumentation is defeated.

 Not to mention Heisenbugs and interference.

Correct.

Frankly, i was surprised (and taken slightly off base) by both Avi and Anthony 
suggesting such a clearly inferior 'add a daemon to the guest space' solution. 
It's a usability and deployment non-starter.

Furthermore, allowing a guest to integrate/mount its files into the host VFS 
space (which was my suggestion) has many other uses and advantages as well, 
beyond the instrumentation/symbol-lookup purpose.

So can we please have some resolution here and move on: the KVM maintainers 
should either suggest a different transparent approach, or should retract the 
NAK for the solution we suggested.

We very much want to make progress and want to write code, but obviously we 
cannot code against a maintainer NAK, nor can we code up an inferior solution 
either.

Thanks,

Ingo

Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-21 Thread Avi Kivity


On 03/21/2010 09:17 PM, Ingo Molnar wrote:


Adding any new daemon to an existing guest is a deployment and usability
nightmare.
   


The logical conclusion of that is that everything should be built into 
the kernel.  Where a failure brings the system down or worse.  Where you 
have to bear the memory footprint whether you ever use the functionality 
or not.  Where to update the functionality you need to deploy a new 
kernel (possibly introducing unrelated bugs) and reboot.


If userspace daemons are such a deployment and usability nightmare, 
maybe we should fix that instead.



The basic rule of good instrumentation is to be transparent. The moment we
have to modify the user-space of a guest just to monitor it, the purpose of
transparent instrumentation is defeated.
   


You have to modify the guest anyway by deploying a new kernel.


Please try think with the heads of our users and developers and dont suggest
some weird ivory-tower design that is totally impractical ...
   


inetd.d style 'drop a listener config here and it will be executed on 
connection' should work.  The listener could come with the kernel 
package, though I don't think it's a good idea.  module-init-tools 
doesn't and people have survived somehow.



And no, you have to code none of this, we'll do all the coding. The only thing
we are asking is for you to not stand in the way of good usability ...
   


Thanks.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-21 Thread Olivier Galibert

On Sun, Mar 21, 2010 at 10:01:51PM +0200, Avi Kivity wrote:
 On 03/21/2010 09:17 PM, Ingo Molnar wrote:
 
 Adding any new daemon to an existing guest is a deployment and usability
 nightmare.

 
 The logical conclusion of that is that everything should be built into 
 the kernel.  Where a failure brings the system down or worse.  Where you 
 have to bear the memory footprint whether you ever use the functionality 
 or not.  Where to update the functionality you need to deploy a new 
 kernel (possibly introducing unrelated bugs) and reboot.
 
 If userspace daemons are such a deployment and usability nightmare, 
 maybe we should fix that instead.

Which userspace?  Deploying *anything* in the guest can be a
nightmare, including paravirt drivers if you don't have a virtual
hardware fallback that is natively supported in the OS.  Deploying
things in the host OTOH is business as usual.

And you're smart enough to know that.

  OG.

Re: Tracking KVM development

2010-03-21 Thread Alexander Graf


On 21.03.2010, at 17:42, Avi Kivity wrote:

 On 03/21/2010 06:37 PM, Thomas Løcke wrote:
 On Sun, Mar 21, 2010 at 1:23 PM, Avi Kivity a...@redhat.com wrote:
   
 Tracking git repositories and stable setups are mutually exclusive.  If you
 are interested in something stable I recommend staying with the distribution
 provided setup (and picking a distribution that has an emphasis on kvm).  If
 you want to track upstream, use qemu-kvm-0.12.x stable releases and
 kernel.org 2.6.x.y stable releases.  If you want to track git repositories,
 use qemu-kvm.git and kvm.git for the kernel and kvm.
 
 Thanks Avi.
 
 I will stay with the stable qemu-kvm releases and stable kernel.org
 kernel releases from now on.
 
 I've never heard of any KVM specific distributions. Are you aware of
 any? My primary reason for going with Slackware, is because I already
 know it. But if there are better choices for a KVM virtualization
 host, then I'm willing to switch.
   
 
 The only kvm-specific distribution I know of is RHEV-H, but that's probably 
 not what you're looking for.  I'm talking about distributions that have an 
 active kvm package maintainer, update the packages regularly, have bug 
 trackers that someone looks into, etc.  At least Fedora and Ubuntu do this, 
 perhaps openSuSE as well (though the latter has a stronger Xen emphasis).

Yes, we do. Though openSUSE 11.2 isn't exactly where I want it to be. Expect 
11.3 to be a lot better there.

Alex

Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-21 Thread Avi Kivity


On 03/21/2010 09:59 PM, Ingo Molnar wrote:


Frankly, i was surprised (and taken slightly off base) by both Avi and Anthony
suggesting such a clearly inferior 'add a daemon to the guest space' solution.
It's a usability and deployment non-starter.
   


It's only clearly inferior if you ignore every consideration against 
it.  It's definitely not a deployment non-starter, see the tons of 
daemons that come with any Linux system.  The basic ones are installed 
and enabled automatically during system installation.



Furthermore, allowing a guest to integrate/mount its files into the host VFS
space (which was my suggestion) has many other uses and advantages as well,
beyond the instrumentation/symbol-lookup purpose.
   


Yes.  I'm just not sure about the auto-enabling part.


So can we please have some resolution here and move on: the KVM maintainers
should either suggest a different transparent approach, or should retract the
NAK for the solution we suggested.
   


So long as you define 'transparent' as in 'only the guest kernel is 
involved' or even 'only the guest and host kernels are involved' we 
aren't going to make a lot of progress.  I oppose shoving random bits of 
functionality into the kernel, especially things that are in daily use.  
While us developers do and will use profiling extensively, it doesn't 
need to sit in every guest's non-swappable .text.



We very much want to make progress and want to write code, but obviously we
cannot code against a maintainer NAK, nor can we code up an inferior solution
either.
   


You haven't heard any NAKs, only objections.  If we discuss things 
perhaps we can achieve something that works for everyone.  If we keep 
turning the flames higher that's unlikely.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-21 Thread Avi Kivity


On 03/21/2010 10:08 PM, Olivier Galibert wrote:

On Sun, Mar 21, 2010 at 10:01:51PM +0200, Avi Kivity wrote:
   

On 03/21/2010 09:17 PM, Ingo Molnar wrote:
 

Adding any new daemon to an existing guest is a deployment and usability
nightmare.

   

The logical conclusion of that is that everything should be built into
the kernel.  Where a failure brings the system down or worse.  Where you
have to bear the memory footprint whether you ever use the functionality
or not.  Where to update the functionality you need to deploy a new
kernel (possibly introducing unrelated bugs) and reboot.

If userspace daemons are such a deployment and usability nightmare,
maybe we should fix that instead.
 

Which userspace?  Deploying *anything* in the guest can be a
nightmare, including paravirt drivers if you don't have a natively
supported in the OS virtual hardware backoff.


That includes the guest kernel.  If you can deploy a new kernel in the 
guest, presumably you can deploy a userspace package.



Deploying things in the
host OTOH is business as usual.
   


True.


And you're smart enough to know that.
   


Thanks.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


CONFIG_HAVE_KVM=n impossible?

2010-03-21 Thread devzero

Hello, 

does anybody know why it seems that it's not possible to build a kernel with 
CONFIG_HAVE_KVM=n ?

It always switches back to y with every kernel build and i have no clue why.

I'm using 2.6.33 vanilla.

regards
Roland

Re: Tracking KVM development

2010-03-21 Thread Michael Tokarev

Avi Kivity wrote:
[]
 The only kvm-specific distribution I know of is RHEV-H, but that's
 probably not what you're looking for.  I'm talking about distributions
 that have an active kvm package maintainer, update the packages
 regularly, have bug trackers that someone looks into, etc.  At least
 Fedora and Ubuntu do this, perhaps openSuSE as well (though the latter
 has a stronger Xen emphasis).

Debian is a lot better on this front than it used to be a year ago.
At least I'm trying to look for the bugreports on a regular basis ;)

/mjt

Re: CONFIG_HAVE_KVM=n impossible?

2010-03-21 Thread Michael Tokarev

devz...@web.de wrote:
 Hello, 
 
 does anybody know why it seems that it's not possible to build a kernel with 
 CONFIG_HAVE_KVM=n ?
 
 It always switches back to y with every kernel build and i have no clue, 
 why.

It's an internal config symbol which is not visible in the menu
system and is always set up unconditionally based on the platform.
Just like CONFIG_HAVE_MMU.

You want a different symbol, such as CONFIG_KVM.
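
To make the distinction concrete, here is a small sketch. It assumes a
2.6.33-era x86 tree where HAVE_KVM is selected by the architecture and
CONFIG_KVM is the user-visible option; the disable_symbol helper is
hypothetical, not a script shipped with the kernel:

```shell
# Turn off a user-visible Kconfig symbol in a .config file.
# Usage: disable_symbol KVM /path/to/.config
disable_symbol() {
    local sym config
    sym=$1
    config=$2
    # Kconfig records a disabled symbol as a comment of this exact form.
    # The pattern is anchored, so CONFIG_HAVE_KVM and CONFIG_KVM_AMD
    # are left untouched when sym is KVM.
    sed "s/^CONFIG_${sym}=.*/# CONFIG_${sym} is not set/" "$config" \
        > "$config.tmp" && mv "$config.tmp" "$config"
}
```

Running 'make oldconfig' afterwards lets Kconfig recompute the dependent
symbols. CONFIG_HAVE_KVM should stay at y, because it is selected by the
architecture rather than chosen by the user.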

/mjt
