[PATCH] VT-d: Fix iommu map page for mmio pages

2008-09-19 Thread Han, Weidong
From 9d8e927a937ff7c9fa2bcc3aa5359e73990658f0 Mon Sep 17 00:00:00 2001
From: Weidong Han <[EMAIL PROTECTED]>
Date: Fri, 19 Sep 2008 14:04:52 +0800
Subject: [PATCH] Fix iommu map page for mmio pages

There is no need to map MMIO pages through the IOMMU. When
kvm_iommu_map_pages() finds MMIO pages, it should skip them without
returning an error, because this is not an error condition. If an error
(such as -EINVAL) is returned, device assignment fails.

Signed-off-by: Weidong Han <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vtd.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
index 667bf3f..b00cdbd 100644
--- a/arch/x86/kvm/vtd.c
+++ b/arch/x86/kvm/vtd.c
@@ -36,14 +36,13 @@ int kvm_iommu_map_pages(struct kvm *kvm,
 {
gfn_t gfn = base_gfn;
pfn_t pfn;
-   int i, r;
+   int i, r = 0;
struct dmar_domain *domain = kvm->arch.intel_iommu_domain;
 
/* check if iommu exists and in use */
if (!domain)
return 0;
 
-   r = -EINVAL;
for (i = 0; i < npages; i++) {
/* check if already mapped */
pfn = (pfn_t)intel_iommu_iova_to_pfn(domain,
@@ -60,13 +59,14 @@ int kvm_iommu_map_pages(struct kvm *kvm,
 DMA_PTE_READ |
 DMA_PTE_WRITE);
if (r) {
-   printk(KERN_DEBUG "kvm_iommu_map_pages:"
+   printk(KERN_ERR "kvm_iommu_map_pages:"
   "iommu failed to map pfn=%lx\n",
pfn);
goto unmap_pages;
}
} else {
-   printk(KERN_DEBUG "kvm_iommu_map_page:"
-  "invalid pfn=%lx\n", pfn);
+   printk(KERN_DEBUG "kvm_iommu_map_pages:"
+  "invalid pfn=%lx, iommu needn't map "
+  "MMIO pages!\n", pfn);
goto unmap_pages;
}
gfn++;
-- 
1.5.1


0001-Fix-iommu-map-page-for-mmio-pages.patch
Description: 0001-Fix-iommu-map-page-for-mmio-pages.patch
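As an aside, the post-fix control flow can be sketched as a stand-alone user-space model (hypothetical stubs, not the kernel code): hitting an MMIO/invalid pfn ends the walk without turning r into an error, while a genuine mapping failure still propagates.

```c
#include <assert.h>

/* Stand-alone model of the fixed kvm_iommu_map_pages() flow.
 * pfn value 0 stands in for an invalid/MMIO page. */
#define INVALID_PFN 0UL

static int iommu_map(unsigned long pfn)
{
    (void)pfn;
    return 0;               /* stub: mapping always succeeds here */
}

static int map_pages(const unsigned long *pfns, int npages, int *mapped)
{
    int i, r = 0;           /* before the fix: r = -EINVAL */

    for (i = 0; i < npages; i++) {
        if (pfns[i] == INVALID_PFN)
            break;          /* MMIO page: stop, but not an error */
        r = iommu_map(pfns[i]);
        if (r)
            break;          /* a real mapping failure still propagates */
        (*mapped)++;
    }
    return r;               /* 0 when an MMIO page ended the walk */
}
```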


RE: [PATCH] KVM/userspace: Support for assigning PCI devices to guests

2008-09-19 Thread Han, Weidong
Amit,

There are a few formatting issues in your patch, and the patch doesn't
work.

The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is not set correctly. My comment
is inline.


Amit Shah wrote:
> +
> +static AssignedDevice *register_real_device(PCIBus *e_bus,
> + const char *e_dev_name,
> + int e_devfn, uint8_t r_bus,
> + uint8_t r_dev, uint8_t
r_func,
> + int flags)
> +{
> + int r;
> + AssignedDevice *pci_dev;
> + uint8_t e_device, e_intx;
> +
> + DEBUG("%s: Registering real physical device %s (devfn=0x%x)\n",
> +   __func__, e_dev_name, e_devfn);
> +
> + pci_dev = (AssignedDevice *)
> + pci_register_device(e_bus, e_dev_name,
sizeof(AssignedDevice),
> + e_devfn,
assigned_dev_pci_read_config,
> + assigned_dev_pci_write_config);
> + if (NULL == pci_dev) {
> + fprintf(stderr, "%s: Error: Couldn't register real
device %s\n",
> + __func__, e_dev_name);
> + return NULL;
> + }
> + if (get_real_device(pci_dev, r_bus, r_dev, r_func)) {
> + fprintf(stderr, "%s: Error: Couldn't get real device
(%s)!\n",
> + __func__, e_dev_name);
> + goto out;
> + }
> +
> + /* handle real device's MMIO/PIO BARs */
> + if (assigned_dev_register_regions(pci_dev->real_device.regions,
> +
pci_dev->real_device.region_number,
> +   pci_dev))
> + goto out;
> +
> + /* handle interrupt routing */
> + e_device = (pci_dev->dev.devfn >> 3) & 0x1f;
> + e_intx = pci_dev->dev.config[0x3d] - 1;
> + pci_dev->intpin = e_intx;
> + pci_dev->run = 0;
> + pci_dev->girq = 0;
> + pci_dev->h_busnr = r_bus;
> + pci_dev->h_devfn = PCI_DEVFN(r_dev, r_func);
> +
> +#ifdef KVM_CAP_DEVICE_ASSIGNMENT
> + if (kvm_enabled()) {
> + struct kvm_assigned_pci_dev assigned_dev_data;
> +
> + memset(&assigned_dev_data, 0,
sizeof(assigned_dev_data));
> + assigned_dev_data.assigned_dev_id  =
> + calc_assigned_dev_id(pci_dev->h_busnr,
> +
(uint32_t)pci_dev->h_devfn);
> + assigned_dev_data.busnr = pci_dev->h_busnr;
> + assigned_dev_data.devfn = pci_dev->h_devfn;
> + assigned_dev_data.flags = flags;
> +#ifdef KVM_CAP_PV_DMA
> + assigned_dev_data.guest_dev_id =
> + calc_assigned_dev_id(pci_bus_num(e_bus),
> +  PCI_DEVFN(e_device,
r_func));
> +#endif
> +
> +#ifdef KVM_CAP_IOMMU
> + /* We always enable the IOMMU if present
> +  * (or when not disabled on the command line)
> +  */
> + r = kvm_check_extension(kvm_context, KVM_CAP_IOMMU);
> + if (r && !disable_iommu)
> + assigned_devices[nr_assigned_devices].dma |=
> + KVM_DEV_ASSIGN_ENABLE_IOMMU;

You should add assigned_dev_data.flags |= KVM_DEV_ASSIGN_ENABLE_IOMMU
here; otherwise the following kvm_assign_pci_device() call won't assign
the device with the IOMMU.

> +#endif
> + r = kvm_assign_pci_device(kvm_context,
> +   &assigned_dev_data);
> + if (r < 0) {
> + fprintf(stderr, "Could not notify kernel about "
> + "assigned device \"%s\"\n", e_dev_name);
> + perror("pt-ioctl");
> + goto out;
> + }
> + }
> +#endif


In addition, I think we should add the following lines to kernel/x86/Kbuild:

ifeq ($(CONFIG_DMAR),y)
kvm-objs += vtd.o
endif

otherwise, "modprobe kvm" after making userspace.

Randy (Weidong)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM/userspace: Support for assigning PCI devices to guests

2008-09-19 Thread Amit Shah
Hello Weidong,

- "Weidong Han" <[EMAIL PROTECTED]> wrote:

> Amit,
> 
> There are a few format issues in your patch, and this patch doesn't
> work. 
> 
> Flag KVM_DEV_ASSIGN_ENABLE_IOMMU is not set correctly. My comment
> inline.

> > +#ifdef KVM_CAP_IOMMU
> > +   /* We always enable the IOMMU if present
> > +* (or when not disabled on the command line)
> > +*/
> > +   r = kvm_check_extension(kvm_context, KVM_CAP_IOMMU);
> > +   if (r && !disable_iommu)
> > +   assigned_devices[nr_assigned_devices].dma |=
> > +   KVM_DEV_ASSIGN_ENABLE_IOMMU;
> 
> should add assigned_dev_data.flags |= KVM_DEV_ASSIGN_ENABLE_IOMMU
> here,
> otherwise following kvm_assign_pci_device() won't assign device with
> iommu.

Correct. I'll update this.

BTW, I might get a VT-d machine soon so that I can start testing VT-d.

> In addtion, I think we should add following lines to
> kernel/x86/Kbuild:
> 
>   ifeq ($(CONFIG_DMAR),y)
>   kvm-objs += vtd.o
>   endif
> 
> otherwise, "modprobe kvm" after making userspace.

Yes, the userspace needs a way to compile the vtd module.


[PATCH 0/9] Enhance NMI support of KVM - v2

2008-09-19 Thread Jan Kiszka
After going through the NMI patches again and implementing a workaround
for older VMX CPUs without virtual NMIs, I came across several
inconsistencies and missing/forgotten features around NMI (and, to a
lesser degree, IRQ) handling. So here is an enhanced patch series.
Changes are:

 - VMX: workaround for lacking VNMI support on older CPUs with VMX
 - VMX: consolidate and fix NMI/IRQ window state determination
 - VMX: consolidate enabling code for NMI/IRQ window notification
 - VMX: fix NMI delivery in real-mode
 - rebased patch for in-kernel NMI watchdog support

Looking forward to feedback.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux




Re: [PATCH 0/9] Enhance NMI support of KVM - v2

2008-09-19 Thread Jan Kiszka
Jan Kiszka wrote:
> After going through the NMI patches again, implementing a workaround for
> older VMX CPUs without virtual NMIs, I came across several inconsistency
> and lacking/forgotten features around NMI (and also a bit IRQ) handling.
> So here is an enhanced patch series. Changes are:
> 
>  - VMX: workaround for lacking VNMI support on older CPUs with VMX
>  - VMX: consolidate and fix NMI/IRQ window state determination
>  - VMX: consolidate enabling code for NMI/IRQ window notification
>  - VMX: fix NMI delivery in real-mode
>  - rebased patch for in-kernel NMI watchdog support

Oh, and:
 - make 'nmi' monitor command kvm-safe

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


[PATCH 2/9] VMX: refactor/fix IRQ and NMI injectability determination

2008-09-19 Thread Jan Kiszka
There are currently two ways in VMX to check if an IRQ or NMI can be
injected:
 - vmx_{nmi|irq}_enabled and
 - vcpu.arch.{nmi|interrupt}_window_open.
Even worse, one test (at the end of vmx_vcpu_run) uses inconsistent,
likely incorrect logic.

This patch consolidates and unifies the tests, using
{nmi|interrupt}_window_open as a cache and vmx_update_window_states
for updating the cache content.
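The consolidated test can be modeled stand-alone; the bit values below follow the VMX interruptibility-state encoding (STI blocking = bit 0, MOV SS blocking = bit 1, NMI blocking = bit 3) and EFLAGS.IF = bit 9, but the struct and function names are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-alone model of vmx_update_window_states(): compute the cached
 * window-open flags from the interruptibility state and RFLAGS. */
#define GUEST_INTR_STATE_STI     0x1
#define GUEST_INTR_STATE_MOV_SS  0x2
#define GUEST_INTR_STATE_NMI     0x8
#define X86_EFLAGS_IF            0x200

struct windows { int nmi_open; int irq_open; };

static struct windows update_window_states(uint32_t guest_intr,
                                           unsigned long rflags)
{
    struct windows w;

    /* NMI window: no STI/MOV SS shadow and no NMI-blocked state */
    w.nmi_open = !(guest_intr & (GUEST_INTR_STATE_STI |
                                 GUEST_INTR_STATE_MOV_SS |
                                 GUEST_INTR_STATE_NMI));
    /* IRQ window: IF set and no STI/MOV SS shadow */
    w.irq_open = (rflags & X86_EFLAGS_IF) &&
                 !(guest_intr & (GUEST_INTR_STATE_STI |
                                 GUEST_INTR_STATE_MOV_SS));
    return w;
}
```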

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |   46 -
 include/asm-x86/kvm_host.h |1 
 2 files changed, 22 insertions(+), 25 deletions(-)

Index: b/arch/x86/kvm/vmx.c
===
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2358,6 +2358,21 @@ static void vmx_inject_nmi(struct kvm_vc
INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
 }
 
+static void vmx_update_window_states(struct kvm_vcpu *vcpu)
+{
+   u32 guest_intr = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
+
+   vcpu->arch.nmi_window_open =
+   !(guest_intr & (GUEST_INTR_STATE_STI |
+   GUEST_INTR_STATE_MOV_SS |
+   GUEST_INTR_STATE_NMI));
+
+   vcpu->arch.interrupt_window_open =
+   ((vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
+!(guest_intr & (GUEST_INTR_STATE_STI |
+GUEST_INTR_STATE_MOV_SS)));
+}
+
 static void kvm_do_inject_irq(struct kvm_vcpu *vcpu)
 {
int word_index = __ffs(vcpu->arch.irq_summary);
@@ -2370,15 +2385,12 @@ static void kvm_do_inject_irq(struct kvm
kvm_queue_interrupt(vcpu, irq);
 }
 
-
 static void do_interrupt_requests(struct kvm_vcpu *vcpu,
   struct kvm_run *kvm_run)
 {
u32 cpu_based_vm_exec_control;
 
-   vcpu->arch.interrupt_window_open =
-   ((vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
-(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & 3) == 0);
+   vmx_update_window_states(vcpu);
 
if (vcpu->arch.interrupt_window_open &&
vcpu->arch.irq_summary && !vcpu->arch.interrupt.pending)
@@ -3049,22 +3061,6 @@ static void enable_nmi_window(struct kvm
vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
 }
 
-static int vmx_nmi_enabled(struct kvm_vcpu *vcpu)
-{
-   u32 guest_intr = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
-   return !(guest_intr & (GUEST_INTR_STATE_NMI |
-  GUEST_INTR_STATE_MOV_SS |
-  GUEST_INTR_STATE_STI));
-}
-
-static int vmx_irq_enabled(struct kvm_vcpu *vcpu)
-{
-   u32 guest_intr = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
-   return (!(guest_intr & (GUEST_INTR_STATE_MOV_SS |
-  GUEST_INTR_STATE_STI)) &&
-   (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF));
-}
-
 static void enable_intr_window(struct kvm_vcpu *vcpu)
 {
if (vcpu->arch.nmi_pending)
@@ -3133,9 +3129,11 @@ static void vmx_intr_assist(struct kvm_v
 {
update_tpr_threshold(vcpu);
 
+   vmx_update_window_states(vcpu);
+
if (cpu_has_virtual_nmis()) {
if (vcpu->arch.nmi_pending && !vcpu->arch.nmi_injected) {
-   if (vmx_nmi_enabled(vcpu)) {
+   if (vcpu->arch.nmi_window_open) {
vcpu->arch.nmi_pending = false;
vcpu->arch.nmi_injected = true;
} else {
@@ -3150,7 +3148,7 @@ static void vmx_intr_assist(struct kvm_v
}
}
if (!vcpu->arch.interrupt.pending && kvm_cpu_has_interrupt(vcpu)) {
-   if (vmx_irq_enabled(vcpu))
+   if (vcpu->arch.interrupt_window_open)
kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu));
else
enable_irq_window(vcpu);
@@ -3311,9 +3309,7 @@ static void vmx_vcpu_run(struct kvm_vcpu
if (vmx->rmode.irq.pending)
fixup_rmode_irq(vmx);
 
-   vcpu->arch.interrupt_window_open =
-   (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
-(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)) == 0;
+   vmx_update_window_states(vcpu);
 
asm("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS));
vmx->launched = 1;
Index: b/include/asm-x86/kvm_host.h
===
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -321,6 +321,7 @@ struct kvm_vcpu_arch {
 
bool nmi_pending;
bool nmi_injected;
+   bool nmi_window_open;
 
u64 mtrr[0x100];
 };



[PATCH 7/9] VMX: Provide support for user space injected NMIs

2008-09-19 Thread Jan Kiszka
This patch adds the required bits to the VMX side for user space
injected NMIs. As with the preexisting in-kernel irqchip support, the
CPU must provide the "virtual NMI" feature for proper tracking of the
NMI blocking state.

Based on the original patch by Sheng Yang.

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |   29 +
 1 file changed, 29 insertions(+)

Index: b/arch/x86/kvm/vmx.c
===
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2356,6 +2356,7 @@ static void vmx_inject_nmi(struct kvm_vc
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+   ++vcpu->stat.nmi_injections;
if (vcpu->arch.rmode.active) {
vmx->rmode.irq.pending = true;
vmx->rmode.irq.vector = NMI_VECTOR;
@@ -2424,6 +2425,26 @@ static void do_interrupt_requests(struct
 {
vmx_update_window_states(vcpu);
 
+   if (cpu_has_virtual_nmis()) {
+   if (vcpu->arch.nmi_pending && !vcpu->arch.nmi_injected) {
+   if (vcpu->arch.nmi_window_open) {
+   vcpu->arch.nmi_pending = false;
+   vcpu->arch.nmi_injected = true;
+   } else {
+   enable_nmi_window(vcpu);
+   return;
+   }
+   }
+   if (vcpu->arch.nmi_injected) {
+   vmx_inject_nmi(vcpu);
+   if (vcpu->arch.nmi_pending)
+   enable_nmi_window(vcpu);
+   else if (vcpu->arch.irq_summary)
+   enable_irq_window(vcpu);
+   return;
+   }
+   }
+
if (vcpu->arch.interrupt_window_open) {
if (vcpu->arch.irq_summary && !vcpu->arch.interrupt.pending)
kvm_do_inject_irq(vcpu);
@@ -2936,6 +2957,14 @@ static int handle_nmi_window(struct kvm_
vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
++vcpu->stat.nmi_window_exits;
 
+   /*
+* If user space is waiting to inject an NMI, exit as soon as possible
+*/
+   if (kvm_run->request_nmi_window && !vcpu->arch.nmi_pending) {
+   kvm_run->exit_reason = KVM_EXIT_NMI_WINDOW_OPEN;
+   return 0;
+   }
+
return 1;
 }
 



[PATCH 5/9] kvm-x86: Enable NMI Watchdog via in-kernel PIT source

2008-09-19 Thread Jan Kiszka
LINT0 of the LAPIC can be used to route PIT events as NMI watchdog
ticks into the guest. This patch aligns the in-kernel irqchip emulation
with the user space irqchip, which already supports this feature. The
trick is to route PIT interrupts to all LAPICs' LVT0 lines.

Rebased and slightly polished patch originally posted by Sheng Yang.

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
---
 arch/x86/kvm/i8254.c |   15 +++
 arch/x86/kvm/irq.h   |1 +
 arch/x86/kvm/lapic.c |   32 
 3 files changed, 44 insertions(+), 4 deletions(-)

Index: b/arch/x86/kvm/i8254.c
===
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -594,10 +594,25 @@ void kvm_free_pit(struct kvm *kvm)
 
 static void __inject_pit_timer_intr(struct kvm *kvm)
 {
+   struct kvm_vcpu *vcpu;
+   int i;
+
mutex_lock(&kvm->lock);
kvm_set_irq(kvm, 0, 1);
kvm_set_irq(kvm, 0, 0);
mutex_unlock(&kvm->lock);
+
+   /*
+* Provides NMI watchdog support in IOAPIC mode.
+* The route is: PIT -> PIC -> LVT0 in NMI mode,
+* timer IRQs will continue to flow through the IOAPIC.
+*/
+   for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+   vcpu = kvm->vcpus[i];
+   if (!vcpu)
+   continue;
+   kvm_apic_local_deliver(vcpu, APIC_LVT0);
+   }
 }
 
 void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
Index: b/arch/x86/kvm/irq.h
===
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -93,6 +93,7 @@ void kvm_unregister_irq_ack_notifier(str
 void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec);
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
+int kvm_apic_local_deliver(struct kvm_vcpu *vcpu, int lvt_type);
 void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
 void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu);
 void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
Index: b/arch/x86/kvm/lapic.c
===
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -382,6 +382,14 @@ static int __apic_accept_irq(struct kvm_
}
break;
 
+   case APIC_DM_EXTINT:
+   /*
+* Should only be called by kvm_apic_local_deliver() with LVT0,
+* before NMI watchdog was enabled. Already handled by
+* kvm_apic_accept_pic_intr().
+*/
+   break;
+
default:
printk(KERN_ERR "TODO: unsupported delivery mode %x\n",
   delivery_mode);
@@ -749,6 +757,9 @@ static void apic_mmio_write(struct kvm_i
case APIC_LVTTHMR:
case APIC_LVTPC:
case APIC_LVT0:
+   if (val == APIC_DM_NMI)
+   apic_debug("Receive NMI setting on APIC_LVT0 "
+   "for cpu %d\n", apic->vcpu->vcpu_id);
case APIC_LVT1:
case APIC_LVTERR:
/* TODO: Check vector */
@@ -965,12 +976,25 @@ int apic_has_pending_timer(struct kvm_vc
return 0;
 }
 
-static int __inject_apic_timer_irq(struct kvm_lapic *apic)
+int kvm_apic_local_deliver(struct kvm_vcpu *vcpu, int lvt_type)
 {
-   int vector;
+   struct kvm_lapic *apic = vcpu->arch.apic;
+   int vector, mode, trig_mode;
+   u32 reg;
+
+   if (apic && apic_enabled(apic)) {
+   reg = apic_get_reg(apic, lvt_type);
+   vector = reg & APIC_VECTOR_MASK;
+   mode = reg & APIC_MODE_MASK;
+   trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
+   return __apic_accept_irq(apic, mode, vector, 1, trig_mode);
+   }
+   return 0;
+}
 
-   vector = apic_lvt_vector(apic, APIC_LVTT);
-   return __apic_accept_irq(apic, APIC_DM_FIXED, vector, 1, 0);
+static inline int __inject_apic_timer_irq(struct kvm_lapic *apic)
+{
+   return kvm_apic_local_deliver(apic->vcpu, APIC_LVTT);
 }
 
 static enum hrtimer_restart apic_timer_fn(struct hrtimer *data)

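The LVT decoding performed by kvm_apic_local_deliver() in the patch above can be modeled stand-alone; the masks follow the local-APIC LVT register layout (vector in bits 0-7, delivery mode in bits 8-10, trigger mode in bit 15), while decode_lvt itself is an illustrative name:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the LVT decoding in kvm_apic_local_deliver(): a local-APIC
 * LVT register packs vector, delivery mode, and trigger mode. */
#define APIC_VECTOR_MASK       0xFF
#define APIC_MODE_MASK         0x700
#define APIC_DM_NMI            0x400
#define APIC_LVT_LEVEL_TRIGGER (1 << 15)

struct lvt { int vector; int mode; int level_triggered; };

static struct lvt decode_lvt(uint32_t reg)
{
    struct lvt d;

    d.vector = reg & APIC_VECTOR_MASK;             /* bits 0-7  */
    d.mode = reg & APIC_MODE_MASK;                 /* bits 8-10 */
    d.level_triggered = !!(reg & APIC_LVT_LEVEL_TRIGGER);
    return d;
}
```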


[PATCH 4/9] VMX: fix real-mode NMI support

2008-09-19 Thread Jan Kiszka
Fix NMI injection in real mode using the same pattern as IRQ
injection.

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |   13 +
 1 file changed, 13 insertions(+)

Index: b/arch/x86/kvm/vmx.c
===
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2354,6 +2354,19 @@ static void vmx_inject_irq(struct kvm_vc
 
 static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
 {
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   if (vcpu->arch.rmode.active) {
+   vmx->rmode.irq.pending = true;
+   vmx->rmode.irq.vector = NMI_VECTOR;
+   vmx->rmode.irq.rip = kvm_rip_read(vcpu);
+   vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
+NMI_VECTOR | INTR_TYPE_SOFT_INTR |
+INTR_INFO_VALID_MASK);
+   vmcs_write32(VM_ENTRY_INSTRUCTION_LEN, 1);
+   kvm_rip_write(vcpu, vmx->rmode.irq.rip - 1);
+   return;
+   }
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
 }



[PATCH 3/9] VMX: refactor IRQ and NMI window enabling

2008-09-19 Thread Jan Kiszka
do_interrupt_requests and vmx_intr_assist go different ways to achieve
the same thing: enabling the NMI/IRQ window-open notification. Unify
their code over enable_{irq|nmi}_window, get rid of the redundant
enable_intr_window in favor of direct enable_nmi_window invocation, and
unroll enable_intr_window for both in-kernel and user space IRQ
injection accordingly.

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |   78 +
 1 file changed, 32 insertions(+), 46 deletions(-)

Index: b/arch/x86/kvm/vmx.c
===
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2385,30 +2385,42 @@ static void kvm_do_inject_irq(struct kvm
kvm_queue_interrupt(vcpu, irq);
 }
 
-static void do_interrupt_requests(struct kvm_vcpu *vcpu,
-  struct kvm_run *kvm_run)
+static void enable_irq_window(struct kvm_vcpu *vcpu)
 {
u32 cpu_based_vm_exec_control;
 
-   vmx_update_window_states(vcpu);
+   cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+   cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
+   vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
+}
 
-   if (vcpu->arch.interrupt_window_open &&
-   vcpu->arch.irq_summary && !vcpu->arch.interrupt.pending)
-   kvm_do_inject_irq(vcpu);
+static void enable_nmi_window(struct kvm_vcpu *vcpu)
+{
+   u32 cpu_based_vm_exec_control;
 
-   if (vcpu->arch.interrupt_window_open && vcpu->arch.interrupt.pending)
-   vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
+   if (!cpu_has_virtual_nmis())
+   return;
 
cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+   cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_NMI_PENDING;
+   vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
+}
+
+static void do_interrupt_requests(struct kvm_vcpu *vcpu,
+  struct kvm_run *kvm_run)
+{
+   vmx_update_window_states(vcpu);
+
+   if (vcpu->arch.interrupt_window_open) {
+   if (vcpu->arch.irq_summary && !vcpu->arch.interrupt.pending)
+   kvm_do_inject_irq(vcpu);
+
+   if (vcpu->arch.interrupt.pending)
+   vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
+   }
if (!vcpu->arch.interrupt_window_open &&
(vcpu->arch.irq_summary || kvm_run->request_interrupt_window))
-   /*
-* Interrupts blocked.  Wait for unblock.
-*/
-   cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
-   else
-   cpu_based_vm_exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
-   vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
+   enable_irq_window(vcpu);
 }
 
 static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
@@ -3040,35 +3052,6 @@ static void update_tpr_threshold(struct
vmcs_write32(TPR_THRESHOLD, (max_irr > tpr) ? tpr >> 4 : max_irr >> 4);
 }
 
-static void enable_irq_window(struct kvm_vcpu *vcpu)
-{
-   u32 cpu_based_vm_exec_control;
-
-   cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
-   cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
-   vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
-}
-
-static void enable_nmi_window(struct kvm_vcpu *vcpu)
-{
-   u32 cpu_based_vm_exec_control;
-
-   if (!cpu_has_virtual_nmis())
-   return;
-
-   cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
-   cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_NMI_PENDING;
-   vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
-}
-
-static void enable_intr_window(struct kvm_vcpu *vcpu)
-{
-   if (vcpu->arch.nmi_pending)
-   enable_nmi_window(vcpu);
-   else if (kvm_cpu_has_interrupt(vcpu))
-   enable_irq_window(vcpu);
-}
-
 static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 {
u32 exit_intr_info;
@@ -3137,13 +3120,16 @@ static void vmx_intr_assist(struct kvm_v
vcpu->arch.nmi_pending = false;
vcpu->arch.nmi_injected = true;
} else {
-   enable_intr_window(vcpu);
+   enable_nmi_window(vcpu);
return;
}
}
if (vcpu->arch.nmi_injected) {
vmx_inject_nmi(vcpu);
-   enable_intr_window(vcpu);
+   if (vcpu->arch.nmi_pending)
+   enable_nmi_window(vcpu);
+   else if (kvm_cpu_has_interrupt(vcpu))
+   enable_irq_window(vcpu);
 

[PATCH 8/9] VMX: work around lacking VNMI support

2008-09-19 Thread Jan Kiszka
Older VMX-supporting CPUs do not provide the "Virtual NMI" feature for
tracking the NMI-blocked state after injecting such events. So far, KVM
has been unable to inject NMIs on those CPUs.

Derived from Sheng Yang's suggestion to use the IRQ window notification
for detecting the end of NMI handlers, this patch implements virtual
NMI support without impacting the host's ability to receive real NMIs.
The downside is that the given approach requires some heuristics that
can cause NMI nesting in very rare corner cases.

The approach works as follows:
 - check if the guest will receive the next NMI via an interrupt gate
   (i.e. the handler will have interrupts disabled); reject injection if not
 - inject NMI and set a software-based NMI-blocked flag
 - arm the IRQ window start notification whenever an NMI window is
   requested
 - if the guest exits due to an opening IRQ window, clear the emulated
   NMI-blocked flag
 - if the guest net execution time with NMI-blocked but without an IRQ
   window exceeds 1 second, force NMI-blocked reset and inject anyway

This approach covers most practical scenarios:
 - succeeding NMIs are separated by at least one open IRQ window
 - the guest may spin with IRQs disabled (e.g. due to a bug), but
   leaving the NMI handler takes much less time than one second
 - the guest does not rely on strict ordering or timing of NMIs
   (would be problematic in virtualized environments anyway)
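A minimal sketch of the timeout bookkeeping behind the last two bullets (hypothetical names; the actual patch uses vmx->soft_vnmi_blocked and vmx->vnmi_blocked_time):

```c
#include <assert.h>

/* Model of the soft-VNMI heuristic: the emulated NMI-blocked state is
 * cleared when an IRQ window opens, or force-reset once the accumulated
 * guest execution time with NMIs blocked exceeds ~1 second. */
#define SOFT_VNMI_TIMEOUT_NS 1000000000LL

struct soft_vnmi { int blocked; long long blocked_time_ns; };

static void account_exit(struct soft_vnmi *s, long long delta_ns,
                         int irq_window_open)
{
    if (!s->blocked)
        return;
    if (irq_window_open) {
        /* open IRQ window => guest left the NMI handler */
        s->blocked = 0;
        s->blocked_time_ns = 0;
    } else if ((s->blocked_time_ns += delta_ns) > SOFT_VNMI_TIMEOUT_NS) {
        /* heuristic give-up: force NMI-blocked reset */
        s->blocked = 0;
        s->blocked_time_ns = 0;
    }
}
```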

Successfully tested with the 'nmi n' monitor command, the kgdbts
testsuite on smp guests (additional patches required to add debug
register support to kvm), the kernel's nmi_watchdog=1, and a Siemens-
specific board emulation (+ guest) that comes with its own NMI
watchdog mechanism.

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |  173 -
 1 file changed, 120 insertions(+), 53 deletions(-)

Index: b/arch/x86/kvm/vmx.c
===
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -90,6 +90,11 @@ struct vcpu_vmx {
} rmode;
int vpid;
bool emulation_required;
+
+   /* Support for vnmi-less CPUs */
+   int soft_vnmi_blocked;
+   ktime_t entry_time;
+   s64 vnmi_blocked_time;
 };
 
 static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
@@ -2331,6 +2336,29 @@ out:
return ret;
 }
 
+static void enable_irq_window(struct kvm_vcpu *vcpu)
+{
+   u32 cpu_based_vm_exec_control;
+
+   cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+   cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
+   vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
+}
+
+static void enable_nmi_window(struct kvm_vcpu *vcpu)
+{
+   u32 cpu_based_vm_exec_control;
+
+   if (!cpu_has_virtual_nmis()) {
+   enable_irq_window(vcpu);
+   return;
+   }
+
+   cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+   cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_NMI_PENDING;
+   vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
+}
+
 static void vmx_inject_irq(struct kvm_vcpu *vcpu, int irq)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -2356,6 +2384,29 @@ static void vmx_inject_nmi(struct kvm_vc
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+   if (!cpu_has_virtual_nmis()) {
+   int desc_size = is_long_mode(vcpu) ? 16 : 8;
+   struct descriptor_table dt;
+   gpa_t gpa;
+   u64 desc;
+
+   /*
+* Deny delivery if the NMI will not be handled by an
+* interrupt gate (workaround depends on IRQ masking).
+*/
+   vmx_get_idt(vcpu, &dt);
+   if (!vcpu->arch.rmode.active && dt.limit
+   >= desc_size * (NMI_VECTOR + 1) - 1) {
+   gpa = vcpu->arch.mmu.gva_to_gpa(vcpu,
+   dt.base + desc_size * NMI_VECTOR);
+   if (kvm_read_guest(vcpu->kvm, gpa, &desc, 8) == 0
+   && ((desc >> 40) & 0x7) != 0x6)
+   return;
+   }
+   vmx->soft_vnmi_blocked = 1;
+   vmx->vnmi_blocked_time = 0;
+   }
+
++vcpu->stat.nmi_injections;
if (vcpu->arch.rmode.active) {
vmx->rmode.irq.pending = true;
@@ -2374,6 +2425,7 @@ static void vmx_inject_nmi(struct kvm_vc
 
 static void vmx_update_window_states(struct kvm_vcpu *vcpu)
 {
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
u32 guest_intr = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
 
vcpu->arch.nmi_window_open =
@@ -2385,6 +2437,13 @@ static void vmx_update_window_states(str
((vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
 !(guest_intr & (GUEST_INTR_STATE_STI |
 GUEST_INTR_STATE_MOV_SS)));
+
+

[PATCH 9/9] kvm: Enable NMI support for userspace irqchip

2008-09-19 Thread Jan Kiszka
Make use of the new KVM_NMI IOCTL to push NMIs into the KVM guest if the
user space APIC emulation or some other source raised them.

In order to use the 'nmi' monitor command, which asynchronously injects
NMIs into the given CPU, a new service called kvm_inject_interrupt is
required. It invokes cpu_interrupt on the target VCPU, working around
the fact that the QEMU service is not thread-safe.

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
---
 libkvm/libkvm.c |   31 +++
 libkvm/libkvm.h |   23 +++
 qemu/monitor.c  |5 -
 qemu/qemu-kvm-x86.c |   26 +++---
 qemu/qemu-kvm.c |   18 +-
 qemu/qemu-kvm.h |2 ++
 6 files changed, 100 insertions(+), 5 deletions(-)

Index: b/libkvm/libkvm.c
===
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -811,6 +811,11 @@ int try_push_interrupts(kvm_context_t kv
return kvm->callbacks->try_push_interrupts(kvm->opaque);
 }
 
+int try_push_nmi(kvm_context_t kvm)
+{
+   return kvm->callbacks->try_push_nmi(kvm->opaque);
+}
+
 void post_kvm_run(kvm_context_t kvm, int vcpu)
 {
kvm->callbacks->post_kvm_run(kvm->opaque, vcpu);
@@ -835,6 +840,17 @@ int kvm_is_ready_for_interrupt_injection
return run->ready_for_interrupt_injection;
 }
 
+int kvm_is_ready_for_nmi_injection(kvm_context_t kvm, int vcpu)
+{
+#ifdef KVM_CAP_NMI
+   struct kvm_run *run = kvm->run[vcpu];
+
+   return run->ready_for_nmi_injection;
+#else
+   return 0;
+#endif
+}
+
 int kvm_run(kvm_context_t kvm, int vcpu)
 {
int r;
@@ -842,6 +858,9 @@ int kvm_run(kvm_context_t kvm, int vcpu)
struct kvm_run *run = kvm->run[vcpu];
 
 again:
+#ifdef KVM_CAP_NMI
+   run->request_nmi_window = try_push_nmi(kvm);
+#endif
 #if !defined(__s390__)
if (!kvm->irqchip_in_kernel)
run->request_interrupt_window = try_push_interrupts(kvm);
@@ -917,6 +936,9 @@ again:
r = handle_halt(kvm, vcpu);
break;
case KVM_EXIT_IRQ_WINDOW_OPEN:
+#ifdef KVM_CAP_NMI
+   case KVM_EXIT_NMI_WINDOW_OPEN:
+#endif
break;
case KVM_EXIT_SHUTDOWN:
r = handle_shutdown(kvm, vcpu);
@@ -1001,6 +1023,15 @@ int kvm_has_sync_mmu(kvm_context_t kvm)
 return r;
 }
 
+int kvm_inject_nmi(kvm_context_t kvm, int vcpu)
+{
+#ifdef KVM_CAP_NMI
+   return ioctl(kvm->vcpu_fd[vcpu], KVM_NMI);
+#else
+   return -ENOSYS;
+#endif
+}
+
 int kvm_init_coalesced_mmio(kvm_context_t kvm)
 {
int r = 0;
Index: b/libkvm/libkvm.h
===
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -66,6 +66,7 @@ struct kvm_callbacks {
 int (*shutdown)(void *opaque, int vcpu);
 int (*io_window)(void *opaque);
 int (*try_push_interrupts)(void *opaque);
+int (*try_push_nmi)(void *opaque);
 void (*post_kvm_run)(void *opaque, int vcpu);
 int (*pre_kvm_run)(void *opaque, int vcpu);
 int (*tpr_access)(void *opaque, int vcpu, uint64_t rip, int is_write);
@@ -216,6 +217,17 @@ uint64_t kvm_get_apic_base(kvm_context_t
 int kvm_is_ready_for_interrupt_injection(kvm_context_t kvm, int vcpu);
 
 /*!
+ * \brief Check if a vcpu is ready for NMI injection
+ *
+ * This checks if vcpu is not already running in NMI context.
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param vcpu Which virtual CPU to check
+ * \return boolean indicating NMI injection readiness
+ */
+int kvm_is_ready_for_nmi_injection(kvm_context_t kvm, int vcpu);
+
+/*!
  * \brief Read VCPU registers
  *
  * This gets the GP registers from the VCPU and outputs them
@@ -579,6 +591,17 @@ int kvm_set_lapic(kvm_context_t kvm, int
 
 #endif
 
+/*!
+ * \brief Simulate an NMI
+ *
+ * This allows you to simulate a non-maskable interrupt.
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param vcpu Which virtual CPU should receive the NMI
+ * \return 0 on success
+ */
+int kvm_inject_nmi(kvm_context_t kvm, int vcpu);
+
 #endif
 
 /*!
Index: b/qemu/qemu-kvm-x86.c
===
--- a/qemu/qemu-kvm-x86.c
+++ b/qemu/qemu-kvm-x86.c
@@ -598,7 +598,8 @@ int kvm_arch_halt(void *opaque, int vcpu
 CPUState *env = cpu_single_env;
 
 if (!((env->interrupt_request & CPU_INTERRUPT_HARD) &&
- (env->eflags & IF_MASK))) {
+ (env->eflags & IF_MASK)) &&
+   !(env->interrupt_request & CPU_INTERRUPT_NMI)) {
 env->halted = 1;
env->exception_index = EXCP_HLT;
 }
@@ -627,8 +628,9 @@ void kvm_arch_post_kvm_run(void *opaque,
 
 int kvm_arch_has_work(CPUState *env)
 {
-if ((env->interrupt_request & (CPU_INTERRUPT_HARD | CPU_INTERRUPT_EXIT)) &&
-   (env->eflags & IF_MASK))
+if (((env->interrupt_request & (CPU_INTERRUPT_HARD | CPU_INTERRUPT_EXIT)) &&
+  

[PATCH 6/9] kvm-x86: Support for user space injected NMIs

2008-09-19 Thread Jan Kiszka
Introduces the KVM_NMI IOCTL to the generic x86 part of KVM for
injecting NMIs from user space and also extends the statistic report
accordingly.

Based on the original patch by Sheng Yang.

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
---
 arch/x86/kvm/x86.c |   46 +++--
 include/asm-x86/kvm_host.h |2 +
 include/linux/kvm.h|   11 --
 3 files changed, 55 insertions(+), 4 deletions(-)

Index: b/arch/x86/kvm/x86.c
===================================================================
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -87,6 +87,7 @@ struct kvm_stats_debugfs_item debugfs_en
{ "halt_wakeup", VCPU_STAT(halt_wakeup) },
{ "hypercalls", VCPU_STAT(hypercalls) },
{ "request_irq", VCPU_STAT(request_irq_exits) },
+   { "request_nmi", VCPU_STAT(request_nmi_exits) },
{ "irq_exits", VCPU_STAT(irq_exits) },
{ "host_state_reload", VCPU_STAT(host_state_reload) },
{ "efer_reload", VCPU_STAT(efer_reload) },
@@ -94,6 +95,7 @@ struct kvm_stats_debugfs_item debugfs_en
{ "insn_emulation", VCPU_STAT(insn_emulation) },
{ "insn_emulation_fail", VCPU_STAT(insn_emulation_fail) },
{ "irq_injections", VCPU_STAT(irq_injections) },
+   { "nmi_injections", VCPU_STAT(nmi_injections) },
{ "mmu_shadow_zapped", VM_STAT(mmu_shadow_zapped) },
{ "mmu_pte_write", VM_STAT(mmu_pte_write) },
{ "mmu_pte_updated", VM_STAT(mmu_pte_updated) },
@@ -1549,6 +1551,15 @@ static int kvm_vcpu_ioctl_interrupt(stru
return 0;
 }
 
+static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu)
+{
+   vcpu_load(vcpu);
+   kvm_inject_nmi(vcpu);
+   vcpu_put(vcpu);
+
+   return 0;
+}
+
 static int vcpu_ioctl_tpr_access_reporting(struct kvm_vcpu *vcpu,
   struct kvm_tpr_access_ctl *tac)
 {
@@ -1608,6 +1619,13 @@ long kvm_arch_vcpu_ioctl(struct file *fi
r = 0;
break;
}
+   case KVM_NMI: {
+   r = kvm_vcpu_ioctl_nmi(vcpu);
+   if (r)
+   goto out;
+   r = 0;
+   break;
+   }
case KVM_SET_CPUID: {
struct kvm_cpuid __user *cpuid_arg = argp;
struct kvm_cpuid cpuid;
@@ -3063,18 +3081,37 @@ static int dm_request_for_irq_injection(
(kvm_x86_ops->get_rflags(vcpu) & X86_EFLAGS_IF));
 }
 
+/*
+ * Check if userspace requested an NMI window, and that the NMI window
+ * is open.
+ *
+ * No need to exit to userspace if we already have an NMI queued.
+ */
+static int dm_request_for_nmi_injection(struct kvm_vcpu *vcpu,
+   struct kvm_run *kvm_run)
+{
+   return (!vcpu->arch.nmi_pending &&
+   kvm_run->request_nmi_window &&
+   vcpu->arch.nmi_window_open);
+}
+
 static void post_kvm_run_save(struct kvm_vcpu *vcpu,
  struct kvm_run *kvm_run)
 {
kvm_run->if_flag = (kvm_x86_ops->get_rflags(vcpu) & X86_EFLAGS_IF) != 0;
kvm_run->cr8 = kvm_get_cr8(vcpu);
kvm_run->apic_base = kvm_get_apic_base(vcpu);
-   if (irqchip_in_kernel(vcpu->kvm))
+   if (irqchip_in_kernel(vcpu->kvm)) {
kvm_run->ready_for_interrupt_injection = 1;
-   else
+   kvm_run->ready_for_nmi_injection = 1;
+   } else {
kvm_run->ready_for_interrupt_injection =
(vcpu->arch.interrupt_window_open &&
 vcpu->arch.irq_summary == 0);
+   kvm_run->ready_for_nmi_injection =
+   (vcpu->arch.nmi_window_open &&
+vcpu->arch.nmi_pending == 0);
+   }
 }
 
 static void vapic_enter(struct kvm_vcpu *vcpu)
@@ -3248,6 +3285,11 @@ static int __vcpu_run(struct kvm_vcpu *v
}
 
if (r > 0) {
+   if (dm_request_for_nmi_injection(vcpu, kvm_run)) {
+   r = -EINTR;
+   kvm_run->exit_reason = KVM_EXIT_NMI;
+   ++vcpu->stat.request_nmi_exits;
+   }
if (dm_request_for_irq_injection(vcpu, kvm_run)) {
r = -EINTR;
kvm_run->exit_reason = KVM_EXIT_INTR;
Index: b/include/asm-x86/kvm_host.h
===================================================================
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -388,6 +388,7 @@ struct kvm_vcpu_stat {
u32 halt_exits;
u32 halt_wakeup;
u32 request_irq_exits;
+   u32 request_nmi_exits;
u32 irq_exits;
u32 host_state_reload;
u32 efer_reload;
@@ -396,6 +397,7 @@ struct kvm_vcpu_stat {
u32 insn_emulation_fail;
u32 hypercalls;
u32 irq_injections;
+   u32 nmi_injections;

[PATCH 1/9] VMX: include all IRQ window exits in statistics

2008-09-19 Thread Jan Kiszka
irq_window_exits only tracks IRQ window exits due to user space
requests, nmi_window_exits include all exits. The latter makes more
sense, so let's adjust irq_window_exits accounting.

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: b/arch/x86/kvm/vmx.c
===================================================================
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2763,6 +2763,7 @@ static int handle_interrupt_window(struc
vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
 
KVMTRACE_0D(PEND_INTR, vcpu, handler);
+   ++vcpu->stat.irq_window_exits;
 
/*
 * If the user space waits to inject interrupts, exit as soon as
@@ -2771,7 +2772,6 @@ static int handle_interrupt_window(struc
if (kvm_run->request_interrupt_window &&
!vcpu->arch.irq_summary) {
kvm_run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
-   ++vcpu->stat.irq_window_exits;
return 0;
}
return 1;

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


general and x86

2008-09-19 Thread Simone Berretta



RE: Problem adding new source files

2008-09-19 Thread Hacking, Stuart
 

> -----Original Message-----
> From: Uri Lublin [mailto:[EMAIL PROTECTED] 
> Sent: 18 September 2008 17:01
> To: Hacking, Stuart
> Cc: kvm@vger.kernel.org
> Subject: Re: Problem adding new source files
> 
> 
> Try just adding your new .o files (e.g: OBJS+=s1.o s2.o) to 
> /qemu/Makefile.target
> 
> 

Thanks, that seems to have done the trick.  Now all our coding errors
are being revealed! :-)

--Stuart


Re: kvmnet.sys BSOD w/ WinXP...

2008-09-19 Thread Daniel J Blueman
Hi Dor,

On Wed, Sep 17, 2008 at 5:04 PM, Dor Laor <[EMAIL PROTECTED]> wrote:
> Daniel J Blueman wrote:
>>
>> When using Windows XP 32 installed with TCP/IP and microsoft client
>> networking, I can reproduce an intermittent BSOD [1] with kvmnet.sys
>> 1.0.0 and 1.2.0, by aborting a large data transfer in an application.
>>
>> Since this reproduces with 1.0.0 kvmnet.sys, it looks unrelated to the
>> locking changes that went into 1.2.0, but something relating to when
>> sockets are closed, flushed or data discarded.
>>
>> Perhaps the offset into the driver at 0xF761A5A9 - 0xF7618000 may tell
>> us what is needed to reproduce and hint at what area the fix is needed
>> in?
>>
>> Many thanks,
>>  Daniel
>>
>> --- [1]
>>
>> DRIVER_IRQL_NOT_LESS_OR_EQUAL
>>
>> *** STOP: 0x000000D1 (0x0000001C,0x00000002,0x00000000,0xF761A5A9)
>> ***   kvmnet.sys - Address F761A5A9 base at F7618000, DateStamp 47dd531c
>>
>
> Can you try http://people.qumranet.com/dor/Drivers-0-3107.iso this?

With the updated WinXP 32bit drivers here, I'm finding that the
application experiences a socket disconnect/loss when the upload
starts.

> Also please provide the specific way of producing load.

I'm using a software package from lecroy.com
[http://www.lecroy.com/tm/Library/Software/PSG/petracersummit.asp?menuid=8],
which connects to a device over the network and receives data at
~30Mbits/s with two concurrent streams.

> Along with it, please note kernel version, kvm version, qemu cmd line.

Host is Ubuntu 8.04.1 LTS w/ 2.6.24-19-generic kernel, x86-64.

Problem confirmed with KVM 74 "1:74+dfsg-0ubuntu2~ppa3h"; qemu version
"0.9.1-1ubuntu1".

Params are:

kvm -hda winxp-next.qcow2 -m 768 -soundhw '' -parallel none -serial
none -net nic,model=virtio -net user

Let me know if you need any more information.

Thanks,
  Daniel
-- 
Daniel J Blueman


[ kvm-Bugs-2119399 ] Can not build KVM with 2.6.26 kernel

2008-09-19 Thread SourceForge.net
Bugs item #2119399, was opened at 2008-09-19 07:09
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2119399&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jiajun Xu (jiajun)
Assigned to: Nobody/Anonymous (nobody)
Summary: Can not build KVM with 2.6.26 kernel

Initial Comment:
Against the latest kvm commit, 9644a6d164e3d6d0532ddb064393293134f31ab2, KVM
fails to compile with the 2.6.26.2 kernel.

[EMAIL PROTECTED] kernel]# make
rm -f include/asm
ln -sf asm-x86 include/asm
ln -sf asm-x86 include-compat/asm
make -C /lib/modules/2.6.26.2/build M=`pwd` \
LINUXINCLUDE="-I`pwd`/include -Iinclude
-Iarch/x86/include -I`pwd`/include-compat \
-include include/linux/autoconf.h \
-include `pwd`/x86/external-module-compat.h"
make[1]: Entering directory `/usr/src/redhat/BUILD/kernel-2.6.26.2'
  CC [M]  /home/build/gitrepo/test/kvm-userspace/kernel/x86/kvm_main.o
/home/build/gitrepo/test/kvm-userspace/kernel/x86/kvm_main.c: In
function 'gfn_to_pfn':
/home/build/gitrepo/test/kvm-userspace/kernel/x86/kvm_main.c:742: error:
implicit declaration of function 'get_user_pages_fast'
make[3]: ***
[/home/build/gitrepo/test/kvm-userspace/kernel/x86/kvm_main.o] Error 1
make[2]: *** [/home/build/gitrepo/test/kvm-userspace/kernel/x86] Error 2
make[1]: *** [_module_/home/build/gitrepo/test/kvm-userspace/kernel]
Error 2
make[1]: Leaving directory `/usr/src/redhat/BUILD/kernel-2.6.26.2'

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2119399&group_id=180599
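The build error is that get_user_pages_fast() only exists from kernel 2.6.27 on, while the external module is being built against 2.6.26. The usual fix in kvm-userspace's external-module-compat headers is a fallback that takes mmap_sem and calls plain get_user_pages(). A sketch of such a shim, under the assumption that the 2.6.26-era get_user_pages() signature applies (kernel code, so this only compiles inside a module build):

```c
/* Hypothetical compat shim in the spirit of external-module-compat.h */
#include <linux/version.h>
#include <linux/sched.h>
#include <linux/mm.h>

#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 27)
/* get_user_pages_fast() appeared in 2.6.27; emulate it with the
 * slow path, which needs mmap_sem held for reading. */
static inline int get_user_pages_fast(unsigned long start, int nr_pages,
                                      int write, struct page **pages)
{
        int ret;

        down_read(&current->mm->mmap_sem);
        ret = get_user_pages(current, current->mm, start, nr_pages,
                             write, 0 /* force */, pages, NULL);
        up_read(&current->mm->mmap_sem);
        return ret;
}
#endif
```

This mirrors what gfn_to_pfn() ultimately needs: a batch of pinned pages for a userspace range, whether or not the fast path exists.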


Weekly KVM Test report, kernel 8028d1b4 ... userspace 1adc49cc ... -- One new issue

2008-09-19 Thread Xu, Jiajun

Hi All,

This is our Weekly KVM Testing Report against latest kvm.git
8028d1b4cd2b69663498bfbdaaae8a9451895e80 and kvm-userspace.git
1adc49cc28bd714b84aa0694fbf4f3a2c4104ae5.
There is one new issue found this week.

One New issue:

1. Can not build KVM with 2.6.26 kernel
https://sourceforge.net/tracker/index.php?func=detail&aid=2119399&group_id=180599&atid=893831

Four Old Issues:

1. 32bits Rhel5/FC6 guest may fail to reboot after installation
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1991647&group_id=180599 



2. failure to migrate guests with more than 4GB of RAM
https://sourceforge.net/tracker/index.php?func=detail&aid=1971512&group_id=180599&atid=893831

3. OpenSuse10.2 can not be installed
http://sourceforge.net/tracker/index.php?func=detail&aid=2088475&group_id=180599&atid=893831

4. Fail to save restore and live migration
https://sourceforge.net/tracker/index.php?func=detail&aid=2106661&group_id=180599&atid=893831

Test environment

Platform  A 
Stoakley/Clovertown

CPU 4
Memory size 8G

Report Summary on IA32-pae
   Summary Test Report of Last Session
=
 Total   PassFailNoResult   Crash
=
control_panel   8   5   3 00
Restart 2   2   0 00
gtest   15  15  0 00
=
control_panel   8   5   3 00
:KVM_256M_guest_PAE_gPAE   1   1   0 00
:KVM_linux_win_PAE_gPAE1   1   0 00
:KVM_two_winxp_PAE_gPAE1   1   0 00
:KVM_four_sguest_PAE_gPA   1   1   0 00
:KVM_1500M_guest_PAE_gPA   1   1   0 00
:KVM_LM_Continuity_PAE_g   1   0   1 00
:KVM_LM_SMP_PAE_gPAE   1   0   1 00
:KVM_SR_Continuity_PAE_g   1   0   1 00
Restart 2   2   0 00
:GuestPAE_PAE_gPAE 1   1   0 00
:BootTo32pae_PAE_gPAE  1   1   0 00
gtest   15  15  0 00
:ltp_nightly_PAE_gPAE  1   1   0 00
:boot_up_acpi_PAE_gPAE 1   1   0 00
:boot_up_acpi_xp_PAE_gPA   1   1   0 00
:boot_up_vista_PAE_gPAE1   1   0 00
:reboot_xp_PAE_gPAE1   1   0 00
:boot_base_kernel_PAE_gP   1   1   0 00
:boot_up_acpi_win2k3_PAE   1   1   0 00
:boot_smp_acpi_win2k3_PA   1   1   0 00
:boot_smp_acpi_win2k_PAE   1   1   0 00
:boot_up_acpi_win2k_PAE_   1   1   0 00
:boot_smp_acpi_xp_PAE_gP   1   1   0 00
:boot_up_noacpi_win2k_PA   1   1   0 00
:boot_smp_vista_PAE_gPAE   1   1   0 00
:bootx_PAE_gPAE1   1   0 00
:kb_nightly_PAE_gPAE   1   1   0 00
=
Total   25  22  3 00

Report Summary on IA32e
 Summary Test Report of Last Session
=
 Total   PassFailNoResult   Crash
=
control_panel   17  10  7 00
Restart 3   3   0 00
gtest   23  22  1 00
=
control_panel   17  10  7 00
:KVM_4G_guest_64_g32e  1   1   0 00
:KVM_four_sguest_64_gPAE   1   1   0 00
:KVM_LM_SMP_64_g32e1   0   1 00
:KVM_linux_win_64_gPAE 1   1   0 00
:KVM_LM_SMP_64_gPAE1   0   1 00
:KVM_SR_Continuity_64_gP   1   0   1 00
:KVM_four_sguest_64_g32e   1   1   0 00
:KVM_four_dguest_64_gPAE   1   1   0 00
:KVM_SR_SMP_64_gPAE1   0   1 00
:KVM_LM_Continuity_64_g3   1   0   1 00
:KVM_1500M_guest_64_gPAE   1   1   0   

Re: [PATCH 0/9] Add support for nested SVM (kernel) v3

2008-09-19 Thread Joerg Roedel
On Wed, Sep 17, 2008 at 03:41:17PM +0200, Alexander Graf wrote:
> To be usable, this patchset requires the two simple changes in the userspace
> part, that I sent to the list with the first version.
> 
> Thanks for reviewing!

Ok, with the patch attached applied on-top of your patches I got a
recent KVM running inside KVM. And it doesn't feel very slow :-)
I will do some benchmarks in the next days to get real numbers. The
patches look good so far.
But I think for now we should disable the feature by default and allow
enabling it from userspace until we are sure we don't introduce any
security hole and don't destroy migration with it. We can add a
-nested-virt parameter to qemu to enable it for the guest then.
Another thing missing is the SVM feature CPUID function. It is used to
find out the number of ASIDs available. But this is a minor issue as
long as we only run KVM inside KVM.

Joerg

-- 
   |   AMD Saxony Limited Liability Company & Co. KG
 Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System|  Register Court Dresden: HRA 4896
 Research  |  General Partner authorized to represent:
 Center| AMD Saxony LLC (Wilmington, Delaware, US)
   | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy



Re: [PATCH 0/9] Add support for nested SVM (kernel) v3

2008-09-19 Thread Joerg Roedel
On Fri, Sep 19, 2008 at 04:36:00PM +0200, Joerg Roedel wrote:
> On Wed, Sep 17, 2008 at 03:41:17PM +0200, Alexander Graf wrote:
> > To be usable, this patchset requires the two simple changes in the userspace
> > part, that I sent to the list with the first version.
> > 
> > Thanks for reviewing!
> 
> Ok, with the patch attached applied on-top of your patches I got a
> recent KVM running inside KVM. And it doesn't feel very slow :-)
> I will do some benchmarks in the next days to get real numbers. The
> patches look good so far.
> But I think for now we should disable the feature by default and allow
> enabling it from userspace until we are sure we don't introduce any
> security hole and don't destroy migration with it. We can add a
> -nested-virt parameter to qemu to enable it for the guest then.
> Another thing missing is the SVM feature CPUID function. It is used to
> find out the number of ASIDs available. But this is a minor issue as
> long as we only run KVM inside KVM.

Oh, forgot the patch. Here is it:


>From 15c4e38288cdaa6d142e94e77025dfd097d63a17 Mon Sep 17 00:00:00 2001
From: Joerg Roedel <[EMAIL PROTECTED]>
Date: Sat, 20 Sep 2008 00:30:25 +0200
Subject: [PATCH] KVM: nested-svm-fix: allow read access to MSR_VM_CR

KVM tries to read the VM_CR MSR to find out if SVM was disabled by
the BIOS. So implement read support for this MSR to make nested
SVM work.

Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>
---
 arch/x86/kvm/svm.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 062ded6..7b91c74 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1929,6 +1929,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
case MSR_VM_HSAVE_PA:
*data = svm->nested_hsave;
break;
+   case MSR_VM_CR:
+   *data = 0;
+   break;
default:
return kvm_get_msr_common(vcpu, ecx, data);
}
-- 
1.5.5.1


-- 
   |   AMD Saxony Limited Liability Company & Co. KG
 Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System|  Register Court Dresden: HRA 4896
 Research  |  General Partner authorized to represent:
 Center| AMD Saxony LLC (Wilmington, Delaware, US)
   | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy



Re: [PATCH 0/9] Add support for nested SVM (kernel) v3

2008-09-19 Thread Joerg Roedel
On Fri, Sep 19, 2008 at 04:36:00PM +0200, Joerg Roedel wrote:
> On Wed, Sep 17, 2008 at 03:41:17PM +0200, Alexander Graf wrote:
> > To be usable, this patchset requires the two simple changes in the userspace
> > part, that I sent to the list with the first version.
> > 
> > Thanks for reviewing!
> 
> Ok, with the patch attached applied on-top of your patches I got a
> recent KVM running inside KVM. And it doesn't feel very slow :-)
> I will do some benchmarks in the next days to get real numbers. The
> patches look good so far.
> But I think for now we should disable the feature by default and allow
> enabling it from userspace until we are sure we don't introduce any
> security hole and don't destroy migration with it. We can add a
> -nested-virt parameter to qemu to enable it for the guest then.
> Another thing missing is the SVM feature CPUID function. It is used to
> find out the number of ASIDs available. But this is a minor issue as
> long as we only run KVM inside KVM.

Ok, further testing showed two issues so far:

- guest timing seems not to work; the printk timing information in ubuntu
  stays at 0.0 all the time
- smp for the second-level guest does not work; it also crashes the
  first-level guest

Joerg

-- 
   |   AMD Saxony Limited Liability Company & Co. KG
 Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System|  Register Court Dresden: HRA 4896
 Research  |  General Partner authorized to represent:
 Center| AMD Saxony LLC (Wilmington, Delaware, US)
   | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy



Re: [PATCH 7/9] Add VMRUN handler v3

2008-09-19 Thread Joerg Roedel
On Wed, Sep 17, 2008 at 03:41:24PM +0200, Alexander Graf wrote:
> This patch implements VMRUN. VMRUN enters a virtual CPU and runs that
> in the same context as the normal guest CPU would run.
> So basically it is implemented the same way, a normal CPU would do it.
> 
> We also prepare all intercepts that get OR'ed with the original
> intercepts, as we do not allow a level 2 guest to be intercepted less
> than the first level guest.
> 
> v2 implements the following improvements:
> 
> - fixes the CPL check
> - does not allocate iopm when not used
> - remembers the host's IF in the HIF bit in the hflags
> 
> v3:
> 
> - make use of the new permission checking
> - add support for V_INTR_MASKING_MASK
> 
> Signed-off-by: Alexander Graf <[EMAIL PROTECTED]>
> ---
>  arch/x86/kvm/kvm_svm.h |9 ++
>  arch/x86/kvm/svm.c |  198 +++-
>  include/asm-x86/kvm_host.h |2 +
>  3 files changed, 207 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/kvm_svm.h b/arch/x86/kvm/kvm_svm.h
> index 76ad107..2afe0ce 100644
> --- a/arch/x86/kvm/kvm_svm.h
> +++ b/arch/x86/kvm/kvm_svm.h
> @@ -43,6 +43,15 @@ struct vcpu_svm {
>   u32 *msrpm;
>  
>   u64 nested_hsave;
> + u64 nested_vmcb;
> +
> + /* These are the merged vectors */
> + u32 *nested_msrpm;
> + u32 *nested_iopm;
> +
> + /* gpa pointers to the real vectors */
> + u64 nested_vmcb_msrpm;
> + u64 nested_vmcb_iopm;
>  };
>  
>  #endif
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 0aa22e5..3601e75 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -51,6 +51,9 @@ MODULE_LICENSE("GPL");
>  /* Turn on to get debugging output*/
>  /* #define NESTED_DEBUG */
>  
> +/* Not needed until device passthrough */
> +/* #define NESTED_KVM_MERGE_IOPM */
> +
>  #ifdef NESTED_DEBUG
>  #define nsvm_printk(fmt, args...) printk(KERN_INFO fmt, ## args)
>  #else
> @@ -76,6 +79,11 @@ static inline struct vcpu_svm *to_svm(struct kvm_vcpu 
> *vcpu)
>   return container_of(vcpu, struct vcpu_svm, vcpu);
>  }
>  
> +static inline bool is_nested(struct vcpu_svm *svm)
> +{
> + return svm->nested_vmcb;
> +}
> +
>  static unsigned long iopm_base;
>  
>  struct kvm_ldttss_desc {
> @@ -614,6 +622,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>   force_new_asid(&svm->vcpu);
>  
>   svm->nested_hsave = 0;
> + svm->nested_vmcb = 0;
>   svm->vcpu.arch.hflags = HF_GIF_MASK;
>  }
>  
> @@ -639,6 +648,10 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
> unsigned int id)
>   struct vcpu_svm *svm;
>   struct page *page;
>   struct page *msrpm_pages;
> + struct page *nested_msrpm_pages;
> +#ifdef NESTED_KVM_MERGE_IOPM
> + struct page *nested_iopm_pages;
> +#endif
>   int err;
>  
>   svm = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
> @@ -661,9 +674,25 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
> unsigned int id)
>   msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER);
>   if (!msrpm_pages)
>   goto uninit;
> +
> + nested_msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER);
> + if (!nested_msrpm_pages)
> + goto uninit;
> +
> +#ifdef NESTED_KVM_MERGE_IOPM
> + nested_iopm_pages = alloc_pages(GFP_KERNEL, IOPM_ALLOC_ORDER);
> + if (!nested_iopm_pages)
> + goto uninit;
> +#endif
> +
>   svm->msrpm = page_address(msrpm_pages);
>   svm_vcpu_init_msrpm(svm->msrpm);
>  
> + svm->nested_msrpm = page_address(nested_msrpm_pages);
> +#ifdef NESTED_KVM_MERGE_IOPM
> + svm->nested_iopm = page_address(nested_iopm_pages);
> +#endif
> +
>   svm->vmcb = page_address(page);
>   clear_page(svm->vmcb);
>   svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
> @@ -693,6 +722,10 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
>  
>   __free_page(pfn_to_page(svm->vmcb_pa >> PAGE_SHIFT));
>   __free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
> + __free_pages(virt_to_page(svm->nested_msrpm), MSRPM_ALLOC_ORDER);
> +#ifdef NESTED_KVM_MERGE_IOPM
> + __free_pages(virt_to_page(svm->nested_iopm), IOPM_ALLOC_ORDER);
> +#endif
>   kvm_vcpu_uninit(vcpu);
>   kmem_cache_free(kvm_vcpu_cache, svm);
>  }
> @@ -1230,6 +1263,138 @@ static int nested_svm_do(struct vcpu_svm *svm,
>   return retval;
>  }
>  
> +
> +static int nested_svm_vmrun_msrpm(struct vcpu_svm *svm, void *arg1,
> +   void *arg2, void *opaque)
> +{
> + int i;
> + u32 *nested_msrpm = (u32*)arg1;
> + for (i=0; i< PAGE_SIZE * (1 << MSRPM_ALLOC_ORDER) / 4; i++)
> + svm->nested_msrpm[i] = svm->msrpm[i] | nested_msrpm[i];
> + svm->vmcb->control.msrpm_base_pa = __pa(svm->nested_msrpm);
> +
> + return 0;
> +}
> +
> +#ifdef NESTED_KVM_MERGE_IOPM
> +static int nested_svm_vmrun_iopm(struct vcpu_svm *svm, void *arg1,
> +  void *arg2, void *opaque)
> +{
> + int i;
>

[PATCH 3/9] allow intersecting region to be on the boundary.

2008-09-19 Thread Glauber Costa
Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>
---
 libkvm/libkvm.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index e768e44..fa65c30 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -130,8 +130,8 @@ int get_intersecting_slot(unsigned long phys_addr)
int i;
 
for (i = 0; i < KVM_MAX_NUM_MEM_REGIONS ; ++i)
-   if (slots[i].len && slots[i].phys_addr < phys_addr &&
-   (slots[i].phys_addr + slots[i].len) > phys_addr)
+   if (slots[i].len && slots[i].phys_addr <= phys_addr &&
+   (slots[i].phys_addr + slots[i].len) >= phys_addr)
return i;
return -1;
 }
-- 
1.5.5.1



[PATCH 5/9] add debuging facilities to memory registration at libkvm

2008-09-19 Thread Glauber Costa
Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>
---
 libkvm/libkvm.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index a5e20bb..222c858 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -52,6 +52,8 @@
 #include "kvm-s390.h"
 #endif
 
+//#define DEBUG_MEMREG
+
 int kvm_abi = EXPECTED_KVM_API_VERSION;
 int kvm_page_size;
 
@@ -458,6 +460,11 @@ int kvm_register_phys_mem(kvm_context_t kvm,
int r;
 
memory.slot = get_free_slot(kvm);
+#ifdef DEBUG_MEMREG
+   fprintf(stderr, "%s, memory: gpa: %llx, size: %llx, uaddr: %llx, slot: %x, flags: %lx\n",
+   __func__, memory.guest_phys_addr, memory.memory_size,
+   memory.userspace_addr, memory.slot, memory.flags);
+#endif
r = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &memory);
if (r == -1) {
fprintf(stderr, "create_userspace_phys_mem: %s\n", strerror(errno));
@@ -996,6 +1003,9 @@ int kvm_unregister_coalesced_mmio(kvm_context_t kvm, uint64_t addr, uint32_t siz
perror("kvm_unregister_coalesced_mmio_zone");
return -errno;
}
+#ifdef DEBUG_MEMREG
+   fprintf(stderr, "Unregistered coalesced mmio region for %llx (%lx)\n", addr, size);
+#endif
return 0;
}
 #endif
-- 
1.5.5.1



[PATCH 2/9] do not use mem_hole anymore.

2008-09-19 Thread Glauber Costa
Memory holes are totally evil. Right now they work for some basic tests,
but have never been stressed enough. Using memory holes leaves open
questions like:

* what happens if an area being registered spans two slots?
* what happens if there is already data in the slots?

Also, the code behaves badly if the piece to be removed lies on the
boundaries of the current slot. Luckily, we don't really need it.
Remove it, and make sure we never hit it.

Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>
---
 libkvm/libkvm.c |   69 +-
 qemu/qemu-kvm.c |   13 +
 2 files changed, 9 insertions(+), 73 deletions(-)

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index 63fbcba..e768e44 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -436,74 +436,9 @@ int kvm_is_allocated_mem(kvm_context_t kvm, unsigned long phys_start,
return 0;
 }
 
-int kvm_create_mem_hole(kvm_context_t kvm, unsigned long phys_start,
-   unsigned long len)
-{
-   int slot;
-   int r;
-   struct kvm_userspace_memory_region rmslot;
-   struct kvm_userspace_memory_region newslot1;
-   struct kvm_userspace_memory_region newslot2;
-
-   len = (len + PAGE_SIZE - 1) & PAGE_MASK;
-
-   slot = get_intersecting_slot(phys_start);
-   /* no need to create hole, as there is already hole */
-   if (slot == -1)
-   return 0;
-
-   memset(&rmslot, 0, sizeof(struct kvm_userspace_memory_region));
-   memset(&newslot1, 0, sizeof(struct kvm_userspace_memory_region));
-   memset(&newslot2, 0, sizeof(struct kvm_userspace_memory_region));
-
-   rmslot.guest_phys_addr = slots[slot].phys_addr;
-   rmslot.slot = slot;
-
-   newslot1.guest_phys_addr = slots[slot].phys_addr;
-   newslot1.memory_size = phys_start - slots[slot].phys_addr;
-   newslot1.slot = slot;
-   newslot1.userspace_addr = slots[slot].userspace_addr;
-   newslot1.flags = slots[slot].flags;
-
-   newslot2.guest_phys_addr = newslot1.guest_phys_addr +
-  newslot1.memory_size + len;
-   newslot2.memory_size = slots[slot].phys_addr +
-  slots[slot].len - newslot2.guest_phys_addr;
-   newslot2.userspace_addr = newslot1.userspace_addr +
- newslot1.memory_size;
-   newslot2.slot = get_free_slot(kvm);
-   newslot2.flags = newslot1.flags;
-
-   r = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &rmslot);
-   if (r == -1) {
-   fprintf(stderr, "kvm_create_mem_hole: %s\n", strerror(errno));
-   return -1;
-   }
-   free_slot(slot);
-
-   r = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &newslot1);
-   if (r == -1) {
-   fprintf(stderr, "kvm_create_mem_hole: %s\n", strerror(errno));
-   return -1;
-   }
-   register_slot(newslot1.slot, newslot1.guest_phys_addr,
- newslot1.memory_size, newslot1.userspace_addr,
- newslot1.flags);
-
-   r = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &newslot2);
-   if (r == -1) {
-   fprintf(stderr, "kvm_create_mem_hole: %s\n", strerror(errno));
-   return -1;
-   }
-   register_slot(newslot2.slot, newslot2.guest_phys_addr,
- newslot2.memory_size, newslot2.userspace_addr,
- newslot2.flags);
-   return 0;
-}
-
 int kvm_register_phys_mem(kvm_context_t kvm,
-   unsigned long phys_start, void *userspace_addr,
-   unsigned long len, int log)
+ unsigned long phys_start, void *userspace_addr,
+ unsigned long len, int log)
 {
 
struct kvm_userspace_memory_region memory = {
diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index 58a6d4a..cff04c5 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -781,12 +781,13 @@ void kvm_cpu_register_physical_memory(target_phys_addr_t start_addr,
 r = kvm_is_allocated_mem(kvm_context, start_addr, size);
 if (r)
 return;
-r = kvm_is_intersecting_mem(kvm_context, start_addr);
-if (r)
-kvm_create_mem_hole(kvm_context, start_addr, size);
-r = kvm_register_phys_mem(kvm_context, start_addr,
-  phys_ram_base + phys_offset,
-  size, 0);
+r = kvm_is_intersecting_mem(kvm_context, start_addr);
+if (r) {
+printf("Ignoring intersecting memory %llx (%lx)\n", start_addr, size);
+} else
+r = kvm_register_phys_mem(kvm_context, start_addr,
+  phys_ram_base + phys_offset,
+  size, 0);
 if (r < 0) {
 printf("kvm_cpu_register_physical_memory: failed\n");
 exit(1);
-- 
1.5.5.1


[PATCH 4/9] substitute is_allocated_mem with more general is_containing_region

2008-09-19 Thread Glauber Costa
is_allocated_mem is a function that checks whether every relevant aspect
of the memory slot matches (start and size). Replace it with a more
generic function that checks whether a memory region is totally
contained in another. The former case is also covered.

Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>
---
 libkvm/libkvm.c |   34 +-
 libkvm/libkvm.h |2 +-
 qemu/qemu-kvm.c |4 ++--
 3 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index fa65c30..a5e20bb 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -136,6 +136,27 @@ int get_intersecting_slot(unsigned long phys_addr)
return -1;
 }
 
+/* Returns -1 if this slot is not totally contained on any other,
+ * and the number of the slot otherwise */
+int get_container_slot(uint64_t phys_addr, unsigned long size)
+{
+   int i;
+
+   for (i = 0; i < KVM_MAX_NUM_MEM_REGIONS ; ++i)
+   if (slots[i].len && slots[i].phys_addr <= phys_addr &&
+   (slots[i].phys_addr + slots[i].len) >= phys_addr + size)
+   return i;
+   return -1;
+}
+
+int kvm_is_containing_region(kvm_context_t kvm, unsigned long phys_addr, unsigned long size)
+{
+   int slot = get_container_slot(phys_addr, size);
+   if (slot == -1)
+   return 0;
+   return 1;
+}
+
 /* 
  * dirty pages logging control 
  */
@@ -423,19 +444,6 @@ int kvm_is_intersecting_mem(kvm_context_t kvm, unsigned long phys_start)
return get_intersecting_slot(phys_start) != -1;
 }
 
-int kvm_is_allocated_mem(kvm_context_t kvm, unsigned long phys_start,
-unsigned long len)
-{
-   int slot;
-
-   slot = get_slot(phys_start);
-   if (slot == -1)
-   return 0;
-   if (slots[slot].len == len)
-   return 1;
-   return 0;
-}
-
 int kvm_register_phys_mem(kvm_context_t kvm,
  unsigned long phys_start, void *userspace_addr,
  unsigned long len, int log)
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
index 79dd769..1e89993 100644
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
void *kvm_create_phys_mem(kvm_context_t, unsigned long phys_start,
  unsigned long len, int log, int writable);
 void kvm_destroy_phys_mem(kvm_context_t, unsigned long phys_start, 
  unsigned long len);
-int kvm_is_intersecting_mem(kvm_context_t kvm, unsigned long phys_start);
+int kvm_is_containing_region(kvm_context_t kvm, unsigned long phys_start, unsigned long size);
 int kvm_is_allocated_mem(kvm_context_t kvm, unsigned long phys_start,
 unsigned long len);
 int kvm_create_mem_hole(kvm_context_t kvm, unsigned long phys_start,
diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index cff04c5..e0b114a 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -778,10 +778,10 @@ void kvm_cpu_register_physical_memory(target_phys_addr_t start_addr,
 int r = 0;
 
 phys_offset &= ~IO_MEM_ROM;
-r = kvm_is_allocated_mem(kvm_context, start_addr, size);
+r = kvm_is_containing_region(kvm_context, start_addr, size);
 if (r)
 return;
-r = kvm_is_intersecting_mem(kvm_context, start_addr);
+r = kvm_is_intersecting_mem(kvm_context, start_addr);
 if (r) {
 printf("Ignoring intersecting memory %llx (%lx)\n", start_addr, size);
 } else
-- 
1.5.5.1
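
As a side note for reviewers: the containment predicate this patch introduces can be exercised in isolation. The sketch below mirrors get_container_slot() with a simplified slot table; the struct layout and slot values are illustrative, not libkvm's actual state.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for libkvm's slot table. A region is "contained"
 * in a slot iff it starts at or after the slot's base and ends at or
 * before the slot's end; len == 0 marks a free slot. */
struct slot {
    uint64_t phys_addr;
    unsigned long len;
};

/* Returns the index of a slot that totally contains
 * [phys_addr, phys_addr + size), or -1 if none does --
 * same logic as the patch's get_container_slot(). */
static int get_container_slot(const struct slot *slots, int nslots,
                              uint64_t phys_addr, unsigned long size)
{
    int i;

    for (i = 0; i < nslots; ++i)
        if (slots[i].len && slots[i].phys_addr <= phys_addr &&
            slots[i].phys_addr + slots[i].len >= phys_addr + size)
            return i;
    return -1;
}
```

Note that an exact match (same start, same size) is contained by definition, which is why the old is_allocated_mem() case is still covered.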



[PATCH 7/9] register mmio slots

2008-09-19 Thread Glauber Costa
By analysing phys_offset, we know whether a region is an mmio region or not.
If it is, register it as such. We don't reuse the existing slot
infrastructure, because there is a relationship between the slot number known
to the kvm kernel module and the index in libkvm's slots vector. We could do
better in the future and use a single data structure for both.

Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>
---
 libkvm/libkvm.c |   70 +++---
 qemu/qemu-kvm.c |   12 -
 2 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index 6ebdc52..dbc1b62 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -64,14 +64,22 @@ struct slot_info {
unsigned flags;
 };
 
+struct mmio_slot_info {
+uint64_t phys_addr;
+unsigned int len;
+};
+
 struct slot_info slots[KVM_MAX_NUM_MEM_REGIONS];
+struct mmio_slot_info mmio_slots[KVM_MAX_NUM_MEM_REGIONS];
 
 void init_slots(void)
 {
int i;
 
-   for (i = 0; i < KVM_MAX_NUM_MEM_REGIONS; ++i)
+   for (i = 0; i < KVM_MAX_NUM_MEM_REGIONS; ++i) {
slots[i].len = 0;
+   mmio_slots[i].len = 0;
+   }
 }
 
 int get_free_slot(kvm_context_t kvm)
@@ -101,6 +109,16 @@ int get_free_slot(kvm_context_t kvm)
return -1;
 }
 
+int get_free_mmio_slot(kvm_context_t kvm)
+{
+
+   unsigned int i;
+   for (i = 0; i < KVM_MAX_NUM_MEM_REGIONS; ++i)
+   if (!mmio_slots[i].len)
+   return i;
+   return -1;
+}
+
 void register_slot(int slot, unsigned long phys_addr, unsigned long len,
   unsigned long userspace_addr, unsigned flags)
 {
@@ -151,14 +169,47 @@ int get_container_slot(uint64_t phys_addr, unsigned long size)
return -1;
 }
 
+int get_container_mmio_slot(kvm_context_t kvm, uint64_t phys_addr, unsigned long size)
+{
+   int i;
+
+   for (i = 0; i < KVM_MAX_NUM_MEM_REGIONS ; ++i)
+   if (mmio_slots[i].len && mmio_slots[i].phys_addr <= phys_addr &&
+   (mmio_slots[i].phys_addr + mmio_slots[i].len) >= phys_addr + size)
+   return i;
+   return -1;
+}
+
+int kvm_register_mmio_slot(kvm_context_t kvm, uint64_t phys_addr, unsigned int size)
+{
+   int slot = get_free_mmio_slot(kvm);
+
+   if (slot == -1)
+   goto out;
+
+#ifdef DEBUG_MEMREG
   fprintf(stderr, "Registering mmio region %llx (%lx)\n", phys_addr, size);
+#endif
+   mmio_slots[slot].phys_addr = phys_addr;
+   mmio_slots[slot].len = size;
+out:
+   return slot;
+}
+
int kvm_is_containing_region(kvm_context_t kvm, unsigned long phys_addr, unsigned long size)
 {
int slot = get_container_slot(phys_addr, size);
-   if (slot == -1)
-   return 0;
-   return 1;
+
+   if (slot != -1)
+   return 1;
+   slot = get_container_mmio_slot(kvm, phys_addr, size);
+   if (slot != -1)
+   return 1;
+
+   return 0;
 }
 
+
 /* 
  * dirty pages logging control 
  */
@@ -528,6 +579,17 @@ void kvm_unregister_memory_area(kvm_context_t kvm, uint64_t phys_addr, unsigned
kvm_destroy_phys_mem(kvm, phys_addr, size);
return;
}
+
+   slot = get_container_mmio_slot(kvm, phys_addr, size);
+   if (slot != -1) {
+#ifdef DEBUG_MEMREG
+   fprintf(stderr, "Unregistering mmio region %llx (%lx)\n", phys_addr, size);
+#endif
+   kvm_unregister_coalesced_mmio(kvm, phys_addr, size);
+   mmio_slots[slot].len = 0;
+   }
+
+   return;
 }
 
 static int kvm_get_map(kvm_context_t kvm, int ioctl_num, int slot, void *buf)
diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index d9fb499..721a9dc 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -788,6 +788,16 @@ void kvm_cpu_register_physical_memory(target_phys_addr_t start_addr,
 r = kvm_is_containing_region(kvm_context, start_addr, size);
 if (r)
 return;
+
+if (area_flags >= TLB_MMIO) {
+r = kvm_register_mmio_slot(kvm_context, start_addr, size);
+if (r < 0) {
+printf("No free mmio slots\n");
+exit(1);
+}
+return;
+}
+
 r = kvm_is_intersecting_mem(kvm_context, start_addr);
 if (r) {
 printf("Ignoring intersecting memory %llx (%lx)\n", start_addr, size);
@@ -1032,11 +1042,9 @@ void kvm_mutex_lock(void)
 
 int qemu_kvm_register_coalesced_mmio(target_phys_addr_t addr, unsigned int size)
 {
-return kvm_register_coalesced_mmio(kvm_context, addr, size);
 }
 
 int qemu_kvm_unregister_coalesced_mmio(target_phys_addr_t addr,
   unsigned int size)
 {
-return kvm_unregister_coalesced_mmio(kvm_context, addr, size);
 }
-- 
1.5.5.1
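
The mmio slot bookkeeping this patch adds is simple enough to sketch on its own: allocation takes the first entry with len == 0, and unregistering just clears len. Names and the table size below are illustrative stand-ins, not libkvm's exact definitions.

```c
#include <assert.h>
#include <stdint.h>

#define NSLOTS 8  /* stand-in for KVM_MAX_NUM_MEM_REGIONS */

/* Separate table for mmio regions, as in the patch: the kvm kernel
 * module never sees these, so their indices need not match the
 * kernel's slot ids. */
static struct mmio_slot {
    uint64_t phys_addr;
    unsigned int len;
} mmio_slots[NSLOTS];

/* First-fit allocation; returns the slot index, or -1 if full. */
static int register_mmio(uint64_t phys_addr, unsigned int len)
{
    int i;

    for (i = 0; i < NSLOTS; ++i)
        if (!mmio_slots[i].len) {
            mmio_slots[i].phys_addr = phys_addr;
            mmio_slots[i].len = len;
            return i;
        }
    return -1;
}

/* Freeing only clears len, making the slot reusable. */
static void unregister_mmio(int slot)
{
    mmio_slots[slot].len = 0;
}
```

A freed slot is handed out again by the next registration, which is the behaviour kvm_unregister_memory_area() depends on above.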


[PATCH 6/9] unregister memory area depending on their flags

2008-09-19 Thread Glauber Costa
Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>
---
 libkvm/libkvm.c |   14 ++
 libkvm/libkvm.h |3 +++
 qemu/qemu-kvm.c |7 +++
 3 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index 222c858..6ebdc52 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -516,6 +516,20 @@ void kvm_destroy_phys_mem(kvm_context_t kvm, unsigned long phys_start,
free_slot(memory.slot);
 }
 
+void kvm_unregister_memory_area(kvm_context_t kvm, uint64_t phys_addr, unsigned long size)
+{
+
+   int slot = get_container_slot(phys_addr, size);
+
+   if (slot != -1) {
+#ifdef DEBUG_MEMREG
+   fprintf(stderr, "Unregistering memory region %llx (%lx)\n", phys_addr, size);
+#endif
+   kvm_destroy_phys_mem(kvm, phys_addr, size);
+   return;
+   }
+}
+
 static int kvm_get_map(kvm_context_t kvm, int ioctl_num, int slot, void *buf)
 {
int r;
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
index 1e89993..fae4e0b 100644
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
void *kvm_create_phys_mem(kvm_context_t, unsigned long phys_start,
  unsigned long len, int log, int writable);
 void kvm_destroy_phys_mem(kvm_context_t, unsigned long phys_start, 
  unsigned long len);
+void kvm_unregister_memory_area(kvm_context_t, uint64_t phys_start,
+unsigned long len);
+
int kvm_is_containing_region(kvm_context_t kvm, unsigned long phys_start, unsigned long size);
 int kvm_is_allocated_mem(kvm_context_t kvm, unsigned long phys_start,
 unsigned long len);
diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index e0b114a..d9fb499 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -776,8 +776,15 @@ void kvm_cpu_register_physical_memory(target_phys_addr_t start_addr,
   unsigned long phys_offset)
 {
 int r = 0;
+unsigned long area_flags = phys_offset & ~TARGET_PAGE_MASK;
 
 phys_offset &= ~IO_MEM_ROM;
+
+if (area_flags == IO_MEM_UNASSIGNED) {
+kvm_unregister_memory_area(kvm_context, start_addr, size);
+return;
+}
+
 r = kvm_is_containing_region(kvm_context, start_addr, size);
 if (r)
 return;
-- 
1.5.5.1
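
What the qemu-kvm.c hunk relies on is that the low bits of phys_offset (below TARGET_PAGE_MASK) encode the region type. A minimal sketch of that split, using made-up constant values rather than QEMU's real IO_MEM_* definitions:

```c
#include <assert.h>

#define TARGET_PAGE_BITS 12
#define TARGET_PAGE_MASK (~((1UL << TARGET_PAGE_BITS) - 1))

/* Illustrative type codes; QEMU's actual IO_MEM_* values differ. */
#define IO_MEM_RAM        0x0UL
#define IO_MEM_ROM        0x1UL
#define IO_MEM_UNASSIGNED 0x2UL

/* Mirrors the patch: strip the page-aligned offset, keep the flags. */
static unsigned long area_flags(unsigned long phys_offset)
{
    return phys_offset & ~TARGET_PAGE_MASK;
}
```

With this split the registration path can dispatch early: IO_MEM_UNASSIGNED means tear the region down, anything else proceeds to normal (or mmio) registration.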



[PATCHEY 0/9] Rrrreplace the ol' scurvy memory registration

2008-09-19 Thread Glauber Costa
Yahoy mateys!

I be now presentin'ya the last scurvy version of the ol'memory registration
patches! He pilleage the ol'infrastructure and make me ship more consistent.

All'of the ol'references to kvm_cpu_register_physical_memory() be trow to the
salty sea, to the sharks! I be putin' all those scurvy dogs in 
cpu_register_physical_memory()

Cap'n, these be not much differing from the ol'version, so me say it be 
included if
no mateys say no

Yoo ho!



[PATCH 1/9] Don't separate registrations with IO_MEM_ROM set

2008-09-19 Thread Glauber Costa
Actually, all registrations are the same. If IO_MEM_ROM is set, we only need
to take care to mask it out of phys_offset rather than pass it through.

Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>
---
 qemu/qemu-kvm.c |   31 +++
 1 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index c522a28..58a6d4a 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -776,26 +776,17 @@ void kvm_cpu_register_physical_memory(target_phys_addr_t start_addr,
   unsigned long phys_offset)
 {
 int r = 0;
-if (!(phys_offset & ~TARGET_PAGE_MASK)) {
-r = kvm_is_allocated_mem(kvm_context, start_addr, size);
-if (r)
-return;
-r = kvm_is_intersecting_mem(kvm_context, start_addr);
-if (r)
-kvm_create_mem_hole(kvm_context, start_addr, size);
-r = kvm_register_phys_mem(kvm_context, start_addr,
-phys_ram_base + phys_offset,
-size, 0);
-}
-if (phys_offset & IO_MEM_ROM) {
-phys_offset &= ~IO_MEM_ROM;
-r = kvm_is_intersecting_mem(kvm_context, start_addr);
-if (r)
-kvm_create_mem_hole(kvm_context, start_addr, size);
-r = kvm_register_phys_mem(kvm_context, start_addr,
-phys_ram_base + phys_offset,
-size, 0);
-}
+
+phys_offset &= ~IO_MEM_ROM;
+r = kvm_is_allocated_mem(kvm_context, start_addr, size);
+if (r)
+return;
+r = kvm_is_intersecting_mem(kvm_context, start_addr);
+if (r)
+kvm_create_mem_hole(kvm_context, start_addr, size);
+r = kvm_register_phys_mem(kvm_context, start_addr,
+  phys_ram_base + phys_offset,
+  size, 0);
 if (r < 0) {
 printf("kvm_cpu_register_physical_memory: failed\n");
 exit(1);
-- 
1.5.5.1



[PATCH 9/9] move kvm memory registration inside qemu's

2008-09-19 Thread Glauber Costa
Remove explicit calls to kvm_cpu_register_physical_memory,
and bundle it together with qemu's memory registration function.

Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>
---
 qemu/exec.c |5 +
 qemu/hw/ipf.c   |8 
 qemu/hw/pc.c|   23 ++-
 qemu/hw/ppc440_bamboo.c |2 --
 4 files changed, 7 insertions(+), 31 deletions(-)

diff --git a/qemu/exec.c b/qemu/exec.c
index bf037f0..b32f2ff 100644
--- a/qemu/exec.c
+++ b/qemu/exec.c
@@ -2203,6 +2203,11 @@ void cpu_register_physical_memory(target_phys_addr_t start_addr,
 kqemu_set_phys_mem(start_addr, size, phys_offset);
 }
 #endif
+#ifdef USE_KVM
+if (kvm_enabled())
+kvm_cpu_register_physical_memory(start_addr, size, phys_offset);
+#endif
+
 size = (size + TARGET_PAGE_SIZE - 1) & TARGET_PAGE_MASK;
 end_addr = start_addr + (target_phys_addr_t)size;
 for(addr = start_addr; addr != end_addr; addr += TARGET_PAGE_SIZE) {
diff --git a/qemu/hw/ipf.c b/qemu/hw/ipf.c
index d70af90..5227385 100644
--- a/qemu/hw/ipf.c
+++ b/qemu/hw/ipf.c
@@ -420,18 +420,14 @@ static void ipf_init1(ram_addr_t ram_size, int vga_ram_size,
 if (kvm_enabled()) {
ram_addr = qemu_ram_alloc(0xa0000);
cpu_register_physical_memory(0, 0xa0000, ram_addr);
-   kvm_cpu_register_physical_memory(0, 0xa0000, ram_addr);
 
ram_addr = qemu_ram_alloc(0x20000); // Workaround 0xa0000-0xc0000
 
ram_addr = qemu_ram_alloc(0x40000);
cpu_register_physical_memory(0xc0000, 0x40000, ram_addr);
-   kvm_cpu_register_physical_memory(0xc0000, 0x40000, ram_addr);
 
ram_addr = qemu_ram_alloc(ram_size - 0x100000);
cpu_register_physical_memory(0x100000, ram_size - 0x100000, ram_addr);
-   kvm_cpu_register_physical_memory(0x100000, ram_size - 0x100000,
-   ram_addr);
} else
 {
 ram_addr = qemu_ram_alloc(ram_size);
@@ -444,9 +440,6 @@ static void ipf_init1(ram_addr_t ram_size, int vga_ram_size,
if (above_4g_mem_size > 0) {
ram_addr = qemu_ram_alloc(above_4g_mem_size);
cpu_register_physical_memory(0x100000000, above_4g_mem_size, ram_addr);
-   if (kvm_enabled())
-   kvm_cpu_register_physical_memory(0x100000000, above_4g_mem_size,
-   ram_addr);
}
 
/*Load firware to its proper position.*/
@@ -468,7 +461,6 @@ static void ipf_init1(ram_addr_t ram_size, int vga_ram_size,
fw_image_start = fw_start + GFW_SIZE - image_size;
 
 cpu_register_physical_memory(GFW_START, GFW_SIZE, fw_offset);
-kvm_cpu_register_physical_memory(GFW_START,GFW_SIZE, fw_offset);
 memcpy(fw_image_start, image, image_size);
 
free(image);
diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
index 8a50096..bc4585c 100644
--- a/qemu/hw/pc.c
+++ b/qemu/hw/pc.c
@@ -769,9 +769,7 @@ static int load_option_rom(const char *filename, int offset, int type)
 cpu_register_physical_memory(0xd0000 + offset,
 size, option_rom_offset | type);
 option_rom_setup_reset(0xd0000 + offset, size);
-if (kvm_enabled())
-   kvm_cpu_register_physical_memory(0xd0000 + offset,
- size, option_rom_offset | type);
+
 return size;
 }
 
@@ -845,16 +843,13 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
 if (kvm_enabled()) {
 ram_addr = qemu_ram_alloc(0xa0000);
 cpu_register_physical_memory(0, 0xa0000, ram_addr);
-kvm_cpu_register_physical_memory(0, 0xa0000, ram_addr);
 
 ram_addr = qemu_ram_alloc(0x100000 - 0xa0000);   // hole
 ram_addr = qemu_ram_alloc(below_4g_mem_size - 0x100000);
 cpu_register_physical_memory(0x100000,
 below_4g_mem_size - 0x100000,
 ram_addr);
-kvm_cpu_register_physical_memory(0x100000,
-below_4g_mem_size - 0x100000,
- ram_addr);
+
 /* above 4giga memory allocation */
 if (above_4g_mem_size > 0) {
 ram_addr = qemu_ram_alloc(above_4g_mem_size);
@@ -870,9 +865,6 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
 cpu_register_physical_memory(0x100000000ULL,
  above_4g_mem_size,
  ram_addr);
-kvm_cpu_register_physical_memory(0x100000000ULL,
-above_4g_mem_size,
- ram_addr);
 }
 } else
 {
@@ -926,9 +918,6 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
 /* setup basic memory access */
 cpu_register_physical_memory(0xc, 0x1,
   

[PATCH 8/9] coalesce mmio regions with an explicit call

2008-09-19 Thread Glauber Costa
Remove explicit calls to mmio coalescing. Rather,
include it in the registration functions.

Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>
---
 qemu/hw/cirrus_vga.c |2 --
 qemu/hw/e1000.c  |   12 
 qemu/hw/pci.c|3 ---
 qemu/hw/vga.c|4 
 qemu/qemu-kvm.c  |   10 +-
 qemu/qemu-kvm.h  |4 
 6 files changed, 1 insertions(+), 34 deletions(-)

diff --git a/qemu/hw/cirrus_vga.c b/qemu/hw/cirrus_vga.c
index 0cf5b24..5919732 100644
--- a/qemu/hw/cirrus_vga.c
+++ b/qemu/hw/cirrus_vga.c
@@ -3291,8 +3291,6 @@ static void cirrus_init_common(CirrusVGAState * s, int device_id, int is_pci)
cirrus_vga_mem_write, s);
 cpu_register_physical_memory(isa_mem_base + 0x000a0000, 0x20000,
  vga_io_memory);
-if (kvm_enabled())
-qemu_kvm_register_coalesced_mmio(isa_mem_base + 0x000a0000, 0x20000);
 
 s->sr[0x06] = 0x0f;
 if (device_id == CIRRUS_ID_CLGD5446) {
diff --git a/qemu/hw/e1000.c b/qemu/hw/e1000.c
index 5ae3960..2d97b34 100644
--- a/qemu/hw/e1000.c
+++ b/qemu/hw/e1000.c
@@ -942,18 +942,6 @@ e1000_mmio_map(PCIDevice *pci_dev, int region_num,
 
 d->mmio_base = addr;
 cpu_register_physical_memory(addr, PNPMMIO_SIZE, d->mmio_index);
-
-if (kvm_enabled()) {
-   int i;
-uint32_t excluded_regs[] = {
-E1000_MDIC, E1000_ICR, E1000_ICS, E1000_IMS,
-E1000_IMC, E1000_TCTL, E1000_TDT, PNPMMIO_SIZE
-};
-qemu_kvm_register_coalesced_mmio(addr, excluded_regs[0]);
-for (i = 0; excluded_regs[i] != PNPMMIO_SIZE; i++)
-qemu_kvm_register_coalesced_mmio(addr + excluded_regs[i] + 4,
- excluded_regs[i + 1] - excluded_regs[i] - 4);
-}
 }
 
 static int
diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
index 07d37a8..2e4ec92 100644
--- a/qemu/hw/pci.c
+++ b/qemu/hw/pci.c
@@ -324,9 +324,6 @@ static void pci_update_mappings(PCIDevice *d)
 cpu_register_physical_memory(pci_to_cpu_addr(r->addr),
  r->size,
  IO_MEM_UNASSIGNED);
-if (kvm_enabled())
-qemu_kvm_unregister_coalesced_mmio(r->addr,
- r->size);
 }
 }
 r->addr = new_addr;
diff --git a/qemu/hw/vga.c b/qemu/hw/vga.c
index 3a5dcbc..ba0dec4 100644
--- a/qemu/hw/vga.c
+++ b/qemu/hw/vga.c
@@ -2259,8 +2259,6 @@ void vga_init(VGAState *s)
 vga_io_memory = cpu_register_io_memory(0, vga_mem_read, vga_mem_write, s);
 cpu_register_physical_memory(isa_mem_base + 0x000a0000, 0x20000,
  vga_io_memory);
-if (kvm_enabled())
-qemu_kvm_register_coalesced_mmio(isa_mem_base + 0x000a0000, 0x20000);
 }
 
 /* Memory mapped interface */
@@ -2336,8 +2334,6 @@ static void vga_mm_init(VGAState *s, target_phys_addr_t vram_base,
 cpu_register_physical_memory(ctrl_base, 0x100000, s_ioport_ctrl);
 s->bank_offset = 0;
 cpu_register_physical_memory(vram_base + 0x000a0000, 0x20000, vga_io_memory);
-if (kvm_enabled())
-qemu_kvm_register_coalesced_mmio(vram_base + 0x000a0000, 0x20000);
 }
 
 int isa_vga_init(DisplayState *ds, uint8_t *vga_ram_base,
diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index 721a9dc..660e11f 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -795,6 +795,7 @@ void kvm_cpu_register_physical_memory(target_phys_addr_t start_addr,
 printf("No free mmio slots\n");
 exit(1);
 }
+kvm_register_coalesced_mmio(kvm_context, start_addr, size);
 return;
 }
 
@@ -1039,12 +1040,3 @@ void kvm_mutex_lock(void)
 pthread_mutex_lock(&qemu_mutex);
 cpu_single_env = NULL;
 }
-
-int qemu_kvm_register_coalesced_mmio(target_phys_addr_t addr, unsigned int size)
-{
-}
-
-int qemu_kvm_unregister_coalesced_mmio(target_phys_addr_t addr,
-  unsigned int size)
-{
-}
diff --git a/qemu/qemu-kvm.h b/qemu/qemu-kvm.h
index 3e40a7d..4308e18 100644
--- a/qemu/qemu-kvm.h
+++ b/qemu/qemu-kvm.h
@@ -75,10 +75,6 @@ int handle_tpr_access(void *opaque, int vcpu,
 void kvm_tpr_vcpu_start(CPUState *env);
 
 int qemu_kvm_get_dirty_pages(unsigned long phys_addr, void *buf);
-int qemu_kvm_register_coalesced_mmio(target_phys_addr_t addr,
-unsigned int size);
-int qemu_kvm_unregister_coalesced_mmio(target_phys_addr_t addr,
-  unsigned int size);
 
 void qemu_kvm_system_reset_request(void);
 
-- 
1.5.5.1



Re: [PATCH 9/9] move kvm memory registration inside qemu's

2008-09-19 Thread Jan Kiszka
Glauber Costa wrote:
> Remove explicit calls to kvm_cpu_register_physical_memory,
> and bundle it together with qemu's memory registration function.

Fine, this also fixes -no-kvm for current userspace git - as I just
noticed. :)

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



Re: [PATCH 5/9] kvm-x86: Enable NMI Watchdog via in-kernel PIT source

2008-09-19 Thread Jan Kiszka
[ Updated version with typo-- and without some spurious apic_debug
  outputs. ]

LINT0 of the LAPIC can be used to route PIT events as NMI watchdog ticks
into the guest. This patch aligns the in-kernel irqchip emulation with
the user space irqchip, which already supports this feature. The trick is
to route PIT interrupts to every LAPIC's LVT0 line.

Rebased and slightly polished patch originally posted by Sheng Yang.

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
---
 arch/x86/kvm/i8254.c |   15 +++
 arch/x86/kvm/irq.h   |1 +
 arch/x86/kvm/lapic.c |   34 +-
 3 files changed, 45 insertions(+), 5 deletions(-)

Index: b/arch/x86/kvm/i8254.c
===
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -594,10 +594,25 @@ void kvm_free_pit(struct kvm *kvm)
 
 static void __inject_pit_timer_intr(struct kvm *kvm)
 {
+   struct kvm_vcpu *vcpu;
+   int i;
+
mutex_lock(&kvm->lock);
kvm_set_irq(kvm, 0, 1);
kvm_set_irq(kvm, 0, 0);
mutex_unlock(&kvm->lock);
+
+   /*
+* Provides NMI watchdog support in IOAPIC mode.
+* The route is: PIT -> PIC -> LVT0 in NMI mode,
+* timer IRQs will continue to flow through the IOAPIC.
+*/
+   for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+   vcpu = kvm->vcpus[i];
+   if (!vcpu)
+   continue;
+   kvm_apic_local_deliver(vcpu, APIC_LVT0);
+   }
 }
 
 void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
Index: b/arch/x86/kvm/irq.h
===
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -93,6 +93,7 @@ void kvm_unregister_irq_ack_notifier(str
 void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec);
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
+int kvm_apic_local_deliver(struct kvm_vcpu *vcpu, int lvt_type);
 void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
 void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu);
 void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
Index: b/arch/x86/kvm/lapic.c
===
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -382,6 +382,14 @@ static int __apic_accept_irq(struct kvm_
}
break;
 
+   case APIC_DM_EXTINT:
+   /*
+* Should only be called by kvm_apic_local_deliver() with LVT0,
+* before NMI watchdog was enabled. Already handled by
+* kvm_apic_accept_pic_intr().
+*/
+   break;
+
default:
printk(KERN_ERR "TODO: unsupported delivery mode %x\n",
   delivery_mode);
@@ -745,10 +753,13 @@ static void apic_mmio_write(struct kvm_i
apic_set_reg(apic, APIC_ICR2, val & 0xff00);
break;
 
+   case APIC_LVT0:
+   if (val == APIC_DM_NMI)
+   apic_debug("Receive NMI setting on APIC_LVT0 "
+   "for cpu %d\n", apic->vcpu->vcpu_id);
case APIC_LVTT:
case APIC_LVTTHMR:
case APIC_LVTPC:
-   case APIC_LVT0:
case APIC_LVT1:
case APIC_LVTERR:
/* TODO: Check vector */
@@ -965,12 +976,25 @@ int apic_has_pending_timer(struct kvm_vc
return 0;
 }
 
-static int __inject_apic_timer_irq(struct kvm_lapic *apic)
+int kvm_apic_local_deliver(struct kvm_vcpu *vcpu, int lvt_type)
 {
-   int vector;
+   struct kvm_lapic *apic = vcpu->arch.apic;
+   int vector, mode, trig_mode;
+   u32 reg;
 
-   vector = apic_lvt_vector(apic, APIC_LVTT);
-   return __apic_accept_irq(apic, APIC_DM_FIXED, vector, 1, 0);
+   if (apic && apic_enabled(apic)) {
+   reg = apic_get_reg(apic, lvt_type);
+   vector = reg & APIC_VECTOR_MASK;
+   mode = reg & APIC_MODE_MASK;
+   trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
+   return __apic_accept_irq(apic, mode, vector, 1, trig_mode);
+   }
+   return 0;
+}
+
+static inline int __inject_apic_timer_irq(struct kvm_lapic *apic)
+{
+   return kvm_apic_local_deliver(apic->vcpu, APIC_LVTT);
 }
 
 static enum hrtimer_restart apic_timer_fn(struct hrtimer *data)
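
For reference, the field extraction in kvm_apic_local_deliver() can be checked standalone. The mask values below follow the usual x86 LAPIC LVT layout (vector in bits 0-7, delivery mode in bits 8-10, level trigger in bit 15); treat them as illustrative rather than verbatim kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

#define APIC_VECTOR_MASK       0x000FFu
#define APIC_MODE_MASK         0x00700u
#define APIC_LVT_LEVEL_TRIGGER (1u << 15)
#define APIC_DM_NMI            0x00400u

struct lvt_fields {
    int vector;
    unsigned int mode;
    int level_triggered;
};

/* Split an LVT register the same way the patch does before handing
 * the pieces to __apic_accept_irq(). */
static struct lvt_fields decode_lvt(uint32_t reg)
{
    struct lvt_fields f;

    f.vector = reg & APIC_VECTOR_MASK;
    f.mode = reg & APIC_MODE_MASK;
    f.level_triggered = !!(reg & APIC_LVT_LEVEL_TRIGGER);
    return f;
}
```

When the guest programs LVT0 to NMI mode (APIC_DM_NMI), the vector field is ignored for delivery, so dispatching on the mode alone is what the NMI watchdog path above depends on.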


Event channels in KVM?

2008-09-19 Thread Matt Anger
Does KVM have any interface similar to event-channels like Xen does?
Basically a way to send notifications between the host and guest.

Thanks,
-Matt


Avoiding I/O bottlenecks between VM's

2008-09-19 Thread Alberto Treviño
I am using KVM 69 and kernel 2.6.25.17 to host several VM's in a large 
server.  So far, everything has been great.  Except I'm adding a Windows 
Server VM that will run a SQL Server database.  A few times I've noticed 
that I/O becomes a bottleneck for the VM and Windows VM freezes for a few 
seconds.  Oh well, no biggie.  Except, every so often, these I/O bottlenecks 
start to affect other VM's and they freeze as well for a few seconds.  I 
don't really care of one VM does so much I/O that it freezes itself 
temporarily.  I just don't want I/O bottlenecks on one VM to affect other 
VM's.

My questions are:

1. Is this a problem anyone else has experienced and has it been fixed in a 
later KVM release?

2. I'm using the CFQ scheduler.  Would the deadline scheduler do a better 
job?

3. Any other suggestions to improve this problem?


Re: Event channels in KVM?

2008-09-19 Thread Anthony Liguori

Matt Anger wrote:

Does KVM have any interface similar to event-channels like Xen does?
Basically a way to send notifications between the host and guest.
  


virtio is the abstraction we use.

But virtio is based on the standard hardware interfaces of the PC--PIO, 
MMIO, and interrupts.


Regards,

Anthony Liguori


Thanks,
-Matt




Re: Avoiding I/O bottlenecks between VM's

2008-09-19 Thread Marcelo Tosatti
Hi Alberto,

On Fri, Sep 19, 2008 at 11:26:09AM -0600, Alberto Treviño wrote:
> I am using KVM 69 and kernel 2.6.25.17 to host several VM's in a large 
> server.  So far, everything has been great.  Except I'm adding a Windows 
> Server VM that will run a SQL Server database.  A few times I've noticed 
> that I/O becomes a bottleneck for the VM and Windows VM freezes for a few 
> seconds.  Oh well, no biggie.  Except, every so often, these I/O bottlenecks 
> start to affect other VM's and they freeze as well for a few seconds.  I 
> don't really care of one VM does so much I/O that it freezes itself 
> temporarily.  I just don't want I/O bottlenecks on one VM to affect other 
> VM's.
> 
> My questions are:
> 
> 1. Is this a problem anyone else has experienced and has it been fixed in a 
> later KVM release?
> 
> 2. I'm using the CFQ scheduler.  Would the deadline scheduler do a better 
> job?
> 
> 3. Any other suggestions to improve this problem?

Are you using filesystem backed storage for the guest images or direct
block device storage? I assume there's heavy write activity on the
guests when these hangs happen?

ext3 with ordered data mode has latency issues on fsync.



Re: Avoiding I/O bottlenecks between VM's

2008-09-19 Thread Alberto Treviño
On Friday 19 September 2008 12:41:46 pm you wrote:
> Are you using filesystem backed storage for the guest images or direct
> block device storage? I assume there's heavy write activity on the
> guests when these hangs happen?

Yes, they happen when one VM is doing heavy writes.  I'm actually using a 
whole stack of things:

OCFS2 on DRBD (Primary-Primary) on LVM Volume (continuous) on LUKS-encrypted 
partition.  Fun debugging that, heh?

In trying to figure out the problem, I tried to reconfigure DRBD to use 
Protocol B instead of C.  However, it failed to make the switch and both 
nodes disconnected, so now I have a split-brain.  To fix the split-brain 
I'm taking down all the VM's on one node one by one, copying the VM 
drives from one node to the other, and starting them up on the other node 
(old-fashioned migration).  Yes, I'm having *lots* of fun!  Perfect way to 
end the week!

So, any ideas on how to solve the bottleneck?  Isn't the CFQ scheduler 
supposed to grant every process the same amount of I/O?  Is there a way to 
change something in /proc to avoid this situation?

-- 
Alberto Treviño
BYU Testing Center
Brigham Young University



Re: Event channels in KVM?

2008-09-19 Thread Javier Guerra
On Fri, Sep 19, 2008 at 1:31 PM, Anthony Liguori <[EMAIL PROTECTED]> wrote:
> Matt Anger wrote:
>>
>> Does KVM have any interface similar to event-channels like Xen does?
>> Basically a way to send notifications between the host and guest.
>>
>
> virtio is the abstraction we use.
>
> But virtio is based on the standard hardware interfaces of the PC--PIO,
> MMIO, and interrupts.

this is rather low-level, it would be nice to have a multiplatform
interface to this abstraction.

just for kicks, i've found and printed Rusty's paper about it. hope
it's current :-)

-- 
Javier


Re: Avoiding I/O bottlenecks between VM's

2008-09-19 Thread Javier Guerra
On Fri, Sep 19, 2008 at 1:53 PM, Alberto Treviño <[EMAIL PROTECTED]> wrote:
> On Friday 19 September 2008 12:41:46 pm you wrote:
>> Are you using filesystem backed storage for the guest images or direct
>> block device storage? I assume there's heavy write activity on the
>> guests when these hangs happen?
>
> Yes, they happen when one VM is doing heavy writes.  I'm actually using a
> whole stack of things:
>
> OCFS2 on DRBD (Primary-Primary) on LVM Volume (continuous) on LUKS-encrypted
> partition.  Fun debugging that, heh?

a not-so-wild guess might be the inter-node locking needed by any
cluster FS.  you'd do much better using just CLVM or EVMS-Ha

if it's a single box, it would be interesting to compare with ext3

> So, any ideas on how to solve the bottleneck?  Isn't the CFQ scheduler
> supposed to grant every processes the same amount of I/O?  Is there a way to
> change something in proc to avoid this situation?

i don't think CFQ can do much to alleviate the heavy lock-dependency
of a cluster FS

-- 
Javier

Re: Event channels in KVM?

2008-09-19 Thread Anthony Liguori

Javier Guerra wrote:

On Fri, Sep 19, 2008 at 1:31 PM, Anthony Liguori <[EMAIL PROTECTED]> wrote:
  

Matt Anger wrote:


Does KVM have any interface similar to event-channels like Xen does?
Basically a way to send notifications between the host and guest.

  

virtio is the abstraction we use.

But virtio is based on the standard hardware interfaces of the PC--PIO,
MMIO, and interrupts.



this is rather low-level, it would be nice to have a multiplatform
interface to this abstraction.
  


That's exactly the purpose of virtio.  virtio is a high-level, cross 
platform interface.  It's been tested on x86, PPC, s390, and I believe 
ia64.  It also works in lguest.


It happens to use PIO, MMIO, and interrupts on x86 under KVM but other 
virtio implementations exist for other platforms.



just for kicks, i've found and printed Rusty's paper about it. hope
it's current :-)
  


The other good thing to look at is the lguest documentation.  You can 
skip to just the virtio bits if you're so inclined.  It's really quite 
thoroughly documented.


Regards,

Anthony Liguori

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Avoiding I/O bottlenecks between VM's

2008-09-19 Thread Marcelo Tosatti
On Fri, Sep 19, 2008 at 02:14:32PM -0500, Javier Guerra wrote:
> On Fri, Sep 19, 2008 at 1:53 PM, Alberto Treviño <[EMAIL PROTECTED]> wrote:
> > On Friday 19 September 2008 12:41:46 pm you wrote:
> >> Are you using filesystem backed storage for the guest images or direct
> >> block device storage? I assume there's heavy write activity on the
> >> guests when these hangs happen?
> >
> > Yes, they happen when one VM is doing heavy writes.  I'm actually using a
> > whole stack of things:
> >
> > OCFS2 on DRBD (Primary-Primary) on LVM Volume (continuous) on LUKS-encrypted
> > partition.  Fun debugging that, heh?

Heh. Lots of variables there.

> a not-so-wild guess might be the inter-node locking needed by any
> cluster FS.  you'd do much better using just CLVM or EVMS-HA
> 
> if it's a single box, it would be interesting to compare with ext3
> 
> > So, any ideas on how to solve the bottleneck?  Isn't the CFQ scheduler
> > supposed to grant every processes the same amount of I/O?  

Yes, but if the filesystem on top is at fault, the IO scheduler can't
help (this is the case with ext3 ordered mode and fsync latency, which
could last for hundreds of seconds last time I checked).

> > Is there a way to
> > change something in proc to avoid this situation?
> 
> i don't think CFQ can do much to alleviate the heavy lock-dependency
> of a cluster FS

Perhaps isolate the problem by having the guest images directly on
partitions first (or ext3 with writeback data mode).
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: linux-next: Tree for September 19 (kvm + intel_iommu)

2008-09-19 Thread Randy Dunlap
On Fri, 19 Sep 2008 14:38:20 +1000 Stephen Rothwell wrote:


kvm calls intel_iommu_found(), which won't exist if CONFIG_DMAR=n:

arch/x86/kvm/built-in.o: In function `kvm_dev_ioctl_check_extension':
(.text+0x5588): undefined reference to `intel_iommu_found'
make[1]: *** [.tmp_vmlinux1] Error 1


---
~Randy
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


First performance numbers

2008-09-19 Thread Joerg Roedel
Ok, here are some performance numbers for nested svm. I ran kernbench -M
on a virtual machine with 4G RAM and 1 VCPU (since nested SMP guests
currently do not work). I measured simple virtualization with a
shadow-paging guest on bare metal and within a nested guest (same guest image)
on a nested paging enabled first level guest.

 | Shadow Guest (100%) | Nested Guest (X)  | X
-+-+---+
Elapsed Time | 553.244 (1.21208)   | 1185.95 (20.0365) | 214.363%
User Time| 407.728 (0.987279)  | 520.434 (8.55643) | 127.642% 
System Time  | 144.828 (0.480645)  | 664.528 (11.6648) | 458.839%
Percent CPU  | 99 (0)  | 99 (0)| 100.000%
Context Switches | 98265.2 (183.001)   | 220015 (3302.74)  | 223.899%
Sleeps   | 49397.8 (31.0274)   | 49460.2 (364.84)  | 100.126%

So we have an overall slowdown in the first nesting level of more than
50%. Mostly because we spend so much time in the system level. Seems
there is some work to do for performance improvements :-)

Joerg

-- 
   |   AMD Saxony Limited Liability Company & Co. KG
 Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System|  Register Court Dresden: HRA 4896
 Research  |  General Partner authorized to represent:
 Center| AMD Saxony LLC (Wilmington, Delaware, US)
   | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


windows xp not able to install without no-kvm

2008-09-19 Thread Ludmilla Becherer
Hi there, following situation:

suse linux 11.0x86-64
kernel 2.6.25.16-0.1
suse-paket:   kvm 75-6.1
kvm-kmp-default 75_2.6.25.16_0.1-6.1
AMD Phenom(TM) 9750 Quad-Core Processor

guest install:
windows xp home sp1
and
windows xp professional sp3
(32 bit)
(both same problem)

commandline (for installing into a new qcow2.img):
qemu-kvm -hda /archiv/winxphome-qcow2.img -cdrom /archiv/winxphome_sp1.iso
without special arguments

XP will format and reboot for the first time, briefly show the logo,
and then crash to a blue screen with the message:

"...there is a problem . technical information:
stop: 0x008E (0xC005, 0x80511403, 0xF7A37630, 0x)
 
Only the -no-kvm option solves the problem;
then the installation works fine.

The other options (-no-acpi, -no-kvm-irqchip, -no-kvm-pit)
do not help.

After installing with the -no-kvm option and restarting the image,
the situation is the same:
Windows XP always crashes after showing the Windows logo;
the XP image only works with the -no-kvm option (slow, but usable).

If the image was started even once without the -no-kvm option,
it will never start again, not even with the -no-kvm option.

Could anybody tell me if I am making a mistake?
I could not find any help on the internet.
Or is there a bug which shows up only on my system?
I have been trying for several days without a result.


regards

simoN

Mail: [EMAIL PROTECTED]



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: VMX: Host NMI triggering on NMI vmexit

2008-09-19 Thread Avi Kivity

Jan Kiszka wrote:

Sheng,

out of curiosity: vmx_vcpu_run invokes 'int $2' to trigger a host NMI if
the VM exited due to an external NMI event. According to Intel specs I
have, software-triggered NMIs do not block hardware NMIs. So are we
facing the risk to receive another NMI while running the first handler?
Or will the VM be left with the hardware blocking logic armed? Or does
Linux not care about NMI handler re-entrance?
  


All good questions.  Usually this doesn't happen since NMI sources are
far apart (oprofile, watchdog).

Maybe the answer is to generate the local nmi via an IPI-to-self command
to the local apic.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/9] Don't separate registrations with IO_MEM_ROM set

2008-09-19 Thread Avi Kivity

Glauber Costa wrote:


I'm currently not aware of a practical use case where this bites, but if
the guest maps some memory from A to B, it may expect to find the
content of A under B as well. That is not the case so far as B remains B
from KVM's POV. At the same time, all QEMU memory access functions see B
as A (that caused trouble for debugging and memory sniffing monitor
services).


It looks like KVM aliasing support, which (up to now) seemed completely
orthogonal. I'm looking at ways to integrate aliasing now, so if you can
provide me with some use cases of what you described above (which seem to
have come up in your debugging patches), it would surely help.

  


Aliasing/remapping can now be implemented using memory slots.  Simply 
map the same hva to different gpas.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] add compat wrapper for get_user_pages_fast

2008-09-19 Thread Avi Kivity

Jan Kiszka wrote:

Not sure if this is correct, but here is at least a compile fix.

  


I think it is.


Note that the original mmap_sem locking scope was in places far broader
on older kernels than with Marcelo's patch and this fix now. Could anyone
comment on the correctness?
  


Since get_user_pages_fast() falls back to your code sequence, this is at 
least as safe as mainline.  So I applied it, thanks.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] KVM build fix for 2.6.27-rc6

2008-09-19 Thread Avi Kivity
Linus, a recent merge has broken the build for KVM on ia64.  Please pull from

  git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git kvm-updates/2.6.27

to fix it.

Shortlog/diffstat/diff below.

Jes Sorensen (1):
  KVM: ia64: 'struct fdesc' build fix

 arch/ia64/kvm/kvm-ia64.c |7 +--
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 7a37d06..cd0d1a7 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "misc.h"
 #include "vti.h"
@@ -61,12 +62,6 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
 };
 
-
-struct fdesc{
-unsigned long ip;
-unsigned long gp;
-};
-
 static void kvm_flush_icache(unsigned long start, unsigned long len)
 {
int l;
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] move MAX_CPUS to cpu.h

2008-09-19 Thread Avi Kivity

Jes Sorensen wrote:

Hi,

I noticed that qemu-kvm.c hardcodes the array of struct vcpu_info
to 256, instead of using the MAX_CPUS #define. This patch corrects
this by moving the definition of MAX_CPUS to cpu.h from vl.c and
then fixes qemu-kvm.c


This should be sent to qemu-devel (without the kvm change), and after 
the next merge we can fix qemu-kvm.c.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 02/10] KVM: MMU: move local TLB flush to mmu_set_spte

2008-09-19 Thread Avi Kivity

Marcelo Tosatti wrote:

Since the sync page path can collapse flushes.

Also only flush if the spte was writable before.

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

@@ -1241,9 +1239,12 @@ static void mmu_set_spte(struct kvm_vcpu
}
}
if (set_spte(vcpu, shadow_pte, pte_access, user_fault, write_fault,
- dirty, largepage, gfn, pfn, speculative))
+ dirty, largepage, gfn, pfn, speculative)) {
if (write_fault)
*ptwrite = 1;
+   if (was_writeble)
+   kvm_x86_ops->tlb_flush(vcpu);
+   }
 
  


I think we had cases where the spte.pfn contents changed, for example 
when a large page was replaced by a normal page, and also:


   } else if (pfn != spte_to_pfn(*shadow_pte)) {


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 03/10] KVM: MMU: do not write-protect large mappings

2008-09-19 Thread Avi Kivity

Marcelo Tosatti wrote:

There is not much point in write protecting large mappings. This
can only happen when a page is shadowed during the window between
is_largepage_backed and mmu_lock acquisition. Zap the entry instead, so
the next pagefault will find a shadowed page via is_largepage_backed and
fall back to 4k translations.

Simplifies out of sync shadow.

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1180,11 +1180,16 @@ static int set_spte(struct kvm_vcpu *vcp
|| (write_fault && !is_write_protection(vcpu) && !user_fault)) {
struct kvm_mmu_page *shadow;
 
+		if (largepage && has_wrprotected_page(vcpu->kvm, gfn)) {

+   ret = 1;
+   spte = shadow_trap_nonpresent_pte;
+   goto set_pte;
+   }
+
spte |= PT_WRITABLE_MASK;
 
 		shadow = kvm_mmu_lookup_page(vcpu->kvm, gfn);

-   if (shadow ||
-  (largepage && has_wrprotected_page(vcpu->kvm, gfn))) {
+   if (shadow) {
pgprintk("%s: found shadow page for %lx, marking ro\n",
 __func__, gfn);
ret = 1;
@@ -1197,6 +1202,7 @@ static int set_spte(struct kvm_vcpu *vcp
if (pte_access & ACC_WRITE_MASK)
mark_page_dirty(vcpu->kvm, gfn);
 
+set_pte:

set_shadow_pte(shadow_pte, spte);
return ret;
 }
  


Don't we need to drop a reference to the page?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 04/10] KVM: MMU: mode specific sync_page

2008-09-19 Thread Avi Kivity

Marcelo Tosatti wrote:

Examine guest pagetable and bring the shadow back in sync. Caller is responsible
for local TLB flush before re-entering guest mode.

  


Neat!  We had a gpte snapshot, and I forgot all about it.


+   for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
+   if (is_shadow_present_pte(sp->spt[i])) {
  


if (!is_..())
  continue;

to reduce indentation.


+   pte_gpa += (i+offset) * sizeof(pt_element_t);
+
+   if (kvm_read_guest_atomic(vcpu->kvm, pte_gpa, &gpte,
+ sizeof(pt_element_t)))
+   return -EINVAL;
  


I guess we want a kvm_map_guest_page_atomic() to speed this up.  Can be 
done later as an optimization, of course.



+
+   if (gpte_to_gfn(gpte) != gfn || !(gpte & PT_ACCESSED_MASK)) {
+   rmap_remove(vcpu->kvm, &sp->spt[i]);
+   if (is_present_pte(gpte))
+   sp->spt[i] = shadow_trap_nonpresent_pte;
+   else
+   sp->spt[i] = shadow_notrap_nonpresent_pte;
  


set_shadow_pte()


+   continue;
+   }
+
+   if (!is_present_pte(gpte)) {
+   rmap_remove(vcpu->kvm, &sp->spt[i]);
+   sp->spt[i] = shadow_notrap_nonpresent_pte;
+   continue;
+   }
  


Merge with previous block?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 06/10] KVM: x86: trap invlpg

2008-09-19 Thread Avi Kivity

Marcelo Tosatti wrote:
 
+static int FNAME(shadow_invlpg_entry)(struct kvm_shadow_walk *_sw,

+ struct kvm_vcpu *vcpu, u64 addr,
+ u64 *sptep, int level)
+{
+
+   if (level == PT_PAGE_TABLE_LEVEL) {
+   if (is_shadow_present_pte(*sptep))
+   rmap_remove(vcpu->kvm, sptep);
+   set_shadow_pte(sptep, shadow_trap_nonpresent_pte);
  


Need to flush the real tlb as well.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 07/10] KVM: MMU: mmu_parent_walk

2008-09-19 Thread Avi Kivity

Marcelo Tosatti wrote:

Introduce a function to walk all parents of a given page, invoking a handler.

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>


Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -147,6 +147,8 @@ struct kvm_shadow_walk {
 u64 addr, u64 *spte, int level);
 };
 
+typedef int (*mmu_parent_walk_fn) (struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp);

+
 static struct kmem_cache *pte_chain_cache;
 static struct kmem_cache *rmap_desc_cache;
 static struct kmem_cache *mmu_page_header_cache;
@@ -862,6 +864,65 @@ static void mmu_page_remove_parent_pte(s
BUG();
 }
 
+struct mmu_parent_walk {

+   struct hlist_node *node;
+   int i;
+};
+
+static struct kvm_mmu_page *mmu_parent_next(struct kvm_mmu_page *sp,
+   struct mmu_parent_walk *walk)
+{
+   struct kvm_pte_chain *pte_chain;
+   struct hlist_head *h;
+
+   if (!walk->node) {
+   if (!sp || !sp->parent_pte)
+   return NULL;
+   if (!sp->multimapped)
+   return page_header(__pa(sp->parent_pte));
+   h = &sp->parent_ptes;
+   walk->node = h->first;
+   walk->i = 0;
+   }
+
+   while (walk->node) {
+   pte_chain = hlist_entry(walk->node, struct kvm_pte_chain, link);
+   while (walk->i < NR_PTE_CHAIN_ENTRIES) {
+   int i = walk->i++;
+   if (!pte_chain->parent_ptes[i])
+   break;
+   return page_header(__pa(pte_chain->parent_ptes[i]));
+   }
+   walk->node = walk->node->next;
+   walk->i = 0;
+   }
+
+   return NULL;
+}
+
+static void mmu_parent_walk(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+   mmu_parent_walk_fn fn)
+{
+   int level, start_level;
+   struct mmu_parent_walk walk[PT64_ROOT_LEVEL];
+
+   memset(&walk, 0, sizeof(walk));
+   level = start_level = sp->role.level;
+
+   do {
+   sp = mmu_parent_next(sp, &walk[level-1]);
+   if (sp) {
+   if (sp->role.level > start_level)
+   fn(vcpu, sp);
+   if (level != sp->role.level)
+   ++level;
+   WARN_ON (level > PT64_ROOT_LEVEL);
+   continue;
+   }
+   --level;
+   } while (level > start_level-1);
+}
+


Could be much simplified with recursion, no?  As the depth is limited to 
4, there's no stack overflow problem.



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 09/10] KVM: MMU: out of sync shadow core v2

2008-09-19 Thread Avi Kivity

Marcelo Tosatti wrote:

 static struct kmem_cache *rmap_desc_cache;
@@ -942,6 +943,39 @@ static void nonpaging_invlpg(struct kvm_
 {
 }
 
+static int mmu_unsync_walk(struct kvm_mmu_page *parent, mmu_unsync_fn fn,

+  void *priv)
  


Instead of private, have an object contain both callback and private 
data, and use container_of().  Reduces the chance of type errors.



+{
+   int i, ret;
+   struct kvm_mmu_page *sp = parent;
+
+   while (parent->unsync_children) {
+   for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
+   u64 ent = sp->spt[i];
+
+   if (is_shadow_present_pte(ent)) {
+   struct kvm_mmu_page *child;
+   child = page_header(ent & PT64_BASE_ADDR_MASK);
+
+   if (child->unsync_children) {
+   sp = child;
+   break;
+   }
+   if (child->unsync) {
+   ret = fn(child, priv);
+   if (ret)
+   return ret;
+   }
+   }
+   }
+   if (i == PT64_ENT_PER_PAGE) {
+   sp->unsync_children = 0;
+   sp = parent;
+   }
+   }
+   return 0;
+}
  


What does this do?


+static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+   if (sp->role.glevels != vcpu->arch.mmu.root_level) {
+   kvm_mmu_zap_page(vcpu->kvm, sp);
+   return 1;
+   }
  


Suppose we switch to real mode, touch a pte, switch back.  Is this handled?


@@ -991,8 +1066,18 @@ static struct kvm_mmu_page *kvm_mmu_get_
 gfn, role.word);
index = kvm_page_table_hashfn(gfn);
bucket = &vcpu->kvm->arch.mmu_page_hash[index];
-   hlist_for_each_entry(sp, node, bucket, hash_link)
-   if (sp->gfn == gfn && sp->role.word == role.word) {
+   hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
+   if (sp->gfn == gfn) {
+   if (sp->unsync)
+   if (kvm_sync_page(vcpu, sp))
+   continue;
+
+   if (sp->role.word != role.word)
+   continue;
+
+   if (sp->unsync_children)
+   vcpu->arch.mmu.need_root_sync = 1;
  


mmu_reload() maybe?


 static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
+   int ret;
++kvm->stat.mmu_shadow_zapped;
+   ret = mmu_zap_unsync_children(kvm, sp);
kvm_mmu_page_unlink_children(kvm, sp);
kvm_mmu_unlink_parents(kvm, sp);
kvm_flush_remote_tlbs(kvm);
if (!sp->role.invalid && !sp->role.metaphysical)
unaccount_shadowed(kvm, sp->gfn);
+   if (sp->unsync)
+   kvm_unlink_unsync_page(kvm, sp);
if (!sp->root_count) {
hlist_del(&sp->hash_link);
kvm_mmu_free_page(kvm, sp);
@@ -1129,7 +1245,7 @@ static int kvm_mmu_zap_page(struct kvm *
kvm_reload_remote_mmus(kvm);
}
kvm_mmu_reset_last_pte_updated(kvm);
-   return 0;
+   return ret;
 }
  


Why does the caller care if zap also zapped some other random pages?  To 
restart walking the list?


 
+

+static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+   unsigned index;
+   struct hlist_head *bucket;
+   struct kvm_mmu_page *s;
+   struct hlist_node *node, *n;
+
+   index = kvm_page_table_hashfn(sp->gfn);
+   bucket = &vcpu->kvm->arch.mmu_page_hash[index];
+   /* don't unsync if pagetable is shadowed with multiple roles */
+   hlist_for_each_entry_safe(s, node, n, bucket, hash_link) {
+   if (s->gfn != sp->gfn || s->role.metaphysical)
+   continue;
+   if (s->role.word != sp->role.word)
+   return 1;
+   }
  


This will happen for nonpae paging.  But why not allow it?  Zap all 
unsynced pages on mode switch.


Oh, if a page is both a page directory and page table, yes.  So to allow 
nonpae oos, check the level instead.



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 10/10] KVM: MMU: speed up mmu_unsync_walk

2008-09-19 Thread Avi Kivity

Marcelo Tosatti wrote:

Cache the unsynced children information in a per-page bitmap.

 static void nonpaging_prefetch_page(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp)
 {
@@ -946,33 +978,57 @@ static void nonpaging_invlpg(struct kvm_
 static int mmu_unsync_walk(struct kvm_mmu_page *parent, mmu_unsync_fn fn,
   void *priv)
 {
-   int i, ret;
-   struct kvm_mmu_page *sp = parent;
+   int ret, level, i;
+   u64 ent;
+   struct kvm_mmu_page *sp, *child;
+   struct walk {
+   struct kvm_mmu_page *sp;
+   int pos;
+   } walk[PT64_ROOT_LEVEL];
 
-	while (parent->unsync_children) {

-   for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
-   u64 ent = sp->spt[i];
+   WARN_ON(parent->role.level == PT_PAGE_TABLE_LEVEL);
+
+   if (!parent->unsync_children)
+   return 0;
+
+   memset(&walk, 0, sizeof(walk));
+   level = parent->role.level;
+   walk[level-1].sp = parent;
+
+   do {
+   sp = walk[level-1].sp;
+   i = find_next_bit(sp->unsync_child_bitmap, 512, walk[level-1].pos);
+   if (i < 512) {
+   walk[level-1].pos = i+1;
+   ent = sp->spt[i];
 
 			if (is_shadow_present_pte(ent)) {

-   struct kvm_mmu_page *child;
child = page_header(ent & PT64_BASE_ADDR_MASK);
 
 if (child->unsync_children) {

-   sp = child;
-   break;
+   --level;
+   walk[level-1].sp = child;
+   walk[level-1].pos = 0;
+   continue;
}
if (child->unsync) {
ret = fn(child, priv);
+   __clear_bit(i, sp->unsync_child_bitmap);
if (ret)
return ret;
}
}
+   __clear_bit(i, sp->unsync_child_bitmap);
+   } else {
+   ++level;
+   if (find_first_bit(sp->unsync_child_bitmap, 512) == 512) {
+   sp->unsync_children = 0;
+   if (level-1 < PT64_ROOT_LEVEL)
+   walk[level-1].pos = 0;
+   }
}
-   if (i == PT64_ENT_PER_PAGE) {
-   sp->unsync_children = 0;
-   sp = parent;
-   }
-   }
+   } while (level <= parent->role.level);
+
return 0;
 }
  




 
--- kvm.orig/include/asm-x86/kvm_host.h

+++ kvm/include/asm-x86/kvm_host.h
@@ -201,6 +201,7 @@ struct kvm_mmu_page {
u64 *parent_pte;   /* !multimapped */
struct hlist_head parent_ptes; /* multimapped, kvm_pte_chain */
};
+   DECLARE_BITMAP(unsync_child_bitmap, 512);
 };
 


Later, we can throw this bitmap out to a separate object.  Also, it may 
make sense to replace it with an array of u16s.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 00/10] out of sync shadow v2

2008-09-19 Thread Avi Kivity

Marcelo Tosatti wrote:

On Thu, Sep 18, 2008 at 06:27:49PM -0300, Marcelo Tosatti wrote:
  

Addressing earlier comments.



Ugh, forgot to convert shadow_notrap -> shadow_trap on unsync, 
so bypass_guest_pf=1 is still broken.



  


Also, forgot the nice benchmark results.

I really like this (at least the parts I understand, which I believe are 
most of the patchset).  This looks much closer to merging.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: First performance numbers

2008-09-19 Thread Avi Kivity

Joerg Roedel wrote:

Ok, here are some performance numbers for nested svm. I ran kernbench -M
on a virtual machine with 4G RAM and 1 VCPU (since nesting SMP guests
do currently not work). I measured simple virtualization with a shadow
paging guest on bare metal and within a nested guest (same guest image)
on a nested paging enabled first level guest.

 | Shadow Guest (100%) | Nested Guest (X)  | X
-+-+---+
Elapsed Time | 553.244 (1.21208)   | 1185.95 (20.0365) | 214.363%
User Time| 407.728 (0.987279)  | 520.434 (8.55643) | 127.642% 
System Time  | 144.828 (0.480645)  | 664.528 (11.6648) | 458.839%

Percent CPU  | 99 (0)  | 99 (0)| 100.000%
Context Switches | 98265.2 (183.001)   | 220015 (3302.74)  | 223.899%
Sleeps   | 49397.8 (31.0274)   | 49460.2 (364.84)  | 100.126%

So we have an overall slowdown in the first nesting level of more than
50%. Mostly because we spend so much time in the system level. Seems
there is some work to do for performance improvements :-)

  


Do you have kvm_stat output for the two cases?  Also interesting to run 
kvm_stat on both guest and host.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Graceful shutdown for OpenBSD

2008-09-19 Thread [EMAIL PROTECTED]

##Sent this first with wrong date (14th Sep), apologies.

Hello,

I want to be able to shut down all virtual machines gracefully by
running "virsh shutdown VM".

For Linux (tested with Debian Lenny) this works.

But OpenBSD does not work.

I read somewhere that kvm/qemu sends an "acpi shutdown signal" to the
guest OS when running the virsh shutdown command. Is this correct?

I am having problems enabling acpi on OpenBSD (its not enabled by
default) and I want to be sure that everything on the side of the guest
is working, so I need to know what exactly is this signal?

Maybe "power button pressed" or something?

Thank you.
Benjamin Reiter

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: First performance numbers

2008-09-19 Thread Joerg Roedel
On Fri, Sep 19, 2008 at 06:30:50PM -0700, Avi Kivity wrote:
> Joerg Roedel wrote:
> >Ok, here are some performance numbers for nested svm. I ran kernbench -M
> >on a virtual machine with 4G RAM and 1 VCPU (since nesting SMP guests
> >do currently not work). I measured simple virtualization with a shadow
> >paging guest on bare metal and within a nested guest (same guest image)
> >on a nested paging enabled first level guest.
> >
> > | Shadow Guest (100%) | Nested Guest (X)  | X
> >-+-+---+
> >Elapsed Time | 553.244 (1.21208)   | 1185.95 (20.0365) | 214.363%
> >User Time| 407.728 (0.987279)  | 520.434 (8.55643) | 127.642% 
> >System Time  | 144.828 (0.480645)  | 664.528 (11.6648) | 458.839%
> >Percent CPU  | 99 (0)  | 99 (0)| 100.000%
> >Context Switches | 98265.2 (183.001)   | 220015 (3302.74)  | 223.899%
> >Sleeps   | 49397.8 (31.0274)   | 49460.2 (364.84)  | 100.126%
> >
> >So we have an overall slowdown in the first nesting level of more than
> >50%. Mostly because we spend so much time in the system level. Seems
> >there is some work to do for performance improvements :-)
> >
> >  
> 
> Do you have kvm_stat output for the two cases?  Also interesting to run 
> kvm_stat on both guest and host.

Sorry, no. But I can repeat the measurements and gather these numbers.

Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: VMX: Host NMI triggering on NMI vmexit

2008-09-19 Thread Jan Kiszka
Avi Kivity wrote:
> Jan Kiszka wrote:
>> Sheng,
>>
>> out of curiosity: vmx_vcpu_run invokes 'int $2' to trigger a host NMI if
>> the VM exited due to an external NMI event. According to Intel specs I
>> have, software-triggered NMIs do not block hardware NMIs. So are we
>> facing the risk to receive another NMI while running the first handler?
>> Or will the VM be left with the hardware blocking logic armed? Or does
>> Linux not care about NMI handler re-entrance?
>>   
> 
> All good questions.  Usually this doesn't happen since NMI sources are
> far apart (oprofile, watchdog).

Only true until you have multiple unsynchronized NMI sources, e.g.
inter-CPU NMIs of kgdb + a watchdog. I just stumbled over several bugs
in kvm's and my own NMI code that were triggered by such a scenario
(sigh...).

> 
> Maybe the answer is to generate the local nmi via an IPI-to-self command
> to the local apic.
> 

That sounds like a good idea, will look into this right after fixing the
other NMI issues.

Jan


