date:20091026

From: Eduardo Habkost ehabk...@redhat.com

This should have no effect, it is just to make the code clearer.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 364263a..1773017 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2538,7 +2538,7 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 	if (vmx->vpid != 0)
 		vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
 
-	vmx->vcpu.arch.cr0 = 0x60000010;
+	vmx->vcpu.arch.cr0 = X86_CR0_NW | X86_CR0_CD | X86_CR0_ET;
 	vmx_set_cr0(&vmx->vcpu, vmx->vcpu.arch.cr0); /* enter rmode */
 	vmx_set_cr4(&vmx->vcpu, 0);
 	vmx_set_efer(&vmx->vcpu, 0);
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
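
As a quick check of the "no effect" claim in the commit above, here is a minimal sketch showing that the three named bits add up to the architectural reset value of cr0, 0x60000010. The bit positions are the architectural ones; reset_cr0() is a hypothetical helper made up for illustration and is not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* CR0 bit definitions, per the x86 architecture. */
#define X86_CR0_ET 0x00000010u /* Extension Type (bit 4) */
#define X86_CR0_NW 0x20000000u /* Not Write-through (bit 29) */
#define X86_CR0_CD 0x40000000u /* Cache Disable (bit 30) */

/* Compile-time check that the macro form matches the old literal. */
static_assert((X86_CR0_NW | X86_CR0_CD | X86_CR0_ET) == 0x60000010u,
	      "macro form must equal the architectural reset value");

/* Hypothetical helper (not a kernel function): cr0 as seen after reset. */
static uint32_t reset_cr0(void)
{
	return X86_CR0_NW | X86_CR0_CD | X86_CR0_ET;
}
```

Spelling the value with macros makes it obvious which bits are set at reset (caching disabled, ET set), which a bare hex literal does not.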

[COMMIT master] Merge commit 'tip/x86/entry'

From: Avi Kivity a...@redhat.com

Merge the user-return-notifier infrastructure.

Signed-off-by: Avi Kivity a...@redhat.com

[COMMIT master] KVM: SVM: init_vmcb(): remove redundant save->cr0 initialization

From: Eduardo Habkost ehabk...@redhat.com

The svm_set_cr0() call will initialize save->cr0 properly even when npt is
enabled, clearing the NW and CD bits as expected, so we don't need to
initialize it manually for npt_enabled anymore.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c9ef6c0..34b700f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -648,8 +648,6 @@ static void init_vmcb(struct vcpu_svm *svm)
 		control->intercept_cr_write &= ~(INTERCEPT_CR0_MASK|
 						 INTERCEPT_CR3_MASK);
 		save->g_pat = 0x0007040600070406ULL;
-		/* enable caching because the QEMU Bios doesn't enable it */
-		save->cr0 = X86_CR0_ET;
 		save->cr3 = 0;
 		save->cr4 = 0;
 	}

[COMMIT master] Fix user return notifier build

From: Avi Kivity a...@redhat.com

When CONFIG_USER_RETURN_NOTIFIER is set, we need to link
kernel/user-return-notifier.o.

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kernel/Makefile b/kernel/Makefile
index b8d4cd8..0ae57a8 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -95,6 +95,7 @@ obj-$(CONFIG_RING_BUFFER) += trace/
 obj-$(CONFIG_SMP) += sched_cpupri.o
 obj-$(CONFIG_SLOW_WORK) += slow-work.o
 obj-$(CONFIG_PERF_EVENTS) += perf_event.o
+obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o
 
 ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 # According to Alan Modra a...@linuxcare.com.au, the -fno-omit-frame-pointer is

[COMMIT master] KVM: SVM: Reset cr0 properly on vcpu reset

From: Eduardo Habkost ehabk...@redhat.com

svm_vcpu_reset() was not properly resetting the contents of the guest-visible
cr0 register, causing the following issue:
https://bugzilla.redhat.com/show_bug.cgi?id=525699

Without resetting cr0 properly, the vcpu was running the SIPI bootstrap routine
with paging enabled, making the vcpu get a pagefault exception while trying to
run it.

Instead of setting vmcb->save.cr0 directly, the new code just resets
kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on
vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0().

kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure
kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ffa6ad2..c9ef6c0 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -628,11 +628,12 @@ static void init_vmcb(struct vcpu_svm *svm)
 	save->rip = 0xfff0;
 	svm->vcpu.arch.regs[VCPU_REGS_RIP] = save->rip;
 
-	/*
-	 * cr0 val on cpu init should be 0x60000010, we enable cpu
-	 * cache by default. the orderly way is to enable cache in bios.
+	/* This is the guest-visible cr0 value.
+	 * svm_set_cr0() sets PG and WP and clears NW and CD on save->cr0.
 	 */
-	save->cr0 = 0x0010 | X86_CR0_PG | X86_CR0_WP;
+	svm->vcpu.arch.cr0 = X86_CR0_NW | X86_CR0_CD | X86_CR0_ET;
+	kvm_set_cr0(&svm->vcpu, svm->vcpu.arch.cr0);
+
 	save->cr4 = X86_CR4_PAE;
/* rdx = ?? */
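
The split between the guest-visible cr0 and the value the hardware actually runs with can be sketched as follows. This is only an illustration under the assumption of shadow paging (npt disabled); hw_cr0_from_guest() is a hypothetical helper that mirrors the bit handling described in the commit message, not the real svm_set_cr0().

```c
#include <assert.h>
#include <stdint.h>

#define X86_CR0_ET 0x00000010u /* bit 4 */
#define X86_CR0_WP 0x00010000u /* bit 16 */
#define X86_CR0_NW 0x20000000u /* bit 29 */
#define X86_CR0_CD 0x40000000u /* bit 30 */
#define X86_CR0_PG 0x80000000u /* bit 31 */

/* The forced-on and forced-off bit sets must not overlap. */
static_assert(((X86_CR0_PG | X86_CR0_WP) & (X86_CR0_CD | X86_CR0_NW)) == 0,
	      "forced-on and forced-off bits overlap");

/*
 * Hypothetical helper: derive the cr0 written to the VMCB save area
 * from the value the guest believes it has (shadow paging assumed).
 */
static uint32_t hw_cr0_from_guest(uint32_t guest_cr0)
{
	uint32_t hw = guest_cr0;

	hw |= X86_CR0_PG | X86_CR0_WP;     /* host needs paging and WP on */
	hw &= ~(X86_CR0_CD | X86_CR0_NW);  /* keep caching enabled */
	return hw;
}
```

With the reset value NW | CD | ET as input, the guest still reads back a cr0 with paging disabled, while the hardware copy has PG and WP set; that is why the reset path has to go through the common cr0 setter instead of writing the save area directly.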
 

[COMMIT master] KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx autoload msr area

From: Avi Kivity a...@redhat.com

Currently MSR_KERNEL_GS_BASE is saved and restored as part of the
guest/host msr reloading.  Since we wish to lazy-restore all the other
msrs, save and reload MSR_KERNEL_GS_BASE explicitly instead of using
the common code.

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1773017..d1f40cc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -99,7 +99,8 @@ struct vcpu_vmx {
int   save_nmsrs;
int   msr_offset_efer;
 #ifdef CONFIG_X86_64
-   int   msr_offset_kernel_gs_base;
+   u64   msr_host_kernel_gs_base;
+   u64   msr_guest_kernel_gs_base;
 #endif
struct vmcs  *vmcs;
struct {
@@ -202,7 +203,7 @@ static void ept_save_pdptrs(struct kvm_vcpu *vcpu);
  */
 static const u32 vmx_msr_index[] = {
 #ifdef CONFIG_X86_64
-   MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR, MSR_KERNEL_GS_BASE,
+   MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR,
 #endif
MSR_EFER, MSR_K6_STAR,
 };
@@ -674,10 +675,10 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)
 #endif
 
 #ifdef CONFIG_X86_64
-	if (is_long_mode(&vmx->vcpu))
-		save_msrs(vmx->host_msrs +
-			  vmx->msr_offset_kernel_gs_base, 1);
-
+	if (is_long_mode(&vmx->vcpu)) {
+		rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
+		wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
+	}
 #endif
 	load_msrs(vmx->guest_msrs, vmx->save_nmsrs);
 	load_transition_efer(vmx);
@@ -711,6 +712,12 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
 	save_msrs(vmx->guest_msrs, vmx->save_nmsrs);
 	load_msrs(vmx->host_msrs, vmx->save_nmsrs);
 	reload_host_efer(vmx);
+#ifdef CONFIG_X86_64
+	if (is_long_mode(&vmx->vcpu)) {
+		rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
+		wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
+	}
+#endif
 }
 
 static void vmx_load_host_state(struct vcpu_vmx *vmx)
@@ -940,9 +947,6 @@ static void setup_msrs(struct vcpu_vmx *vmx)
 		index = __find_msr_index(vmx, MSR_CSTAR);
 		if (index >= 0)
 			move_msr_up(vmx, index, save_nmsrs++);
-		index = __find_msr_index(vmx, MSR_KERNEL_GS_BASE);
-		if (index >= 0)
-			move_msr_up(vmx, index, save_nmsrs++);
 		/*
 		 * MSR_K6_STAR is only needed on long mode guests, and only
 		 * if efer.sce is enabled.
@@ -954,10 +958,6 @@ static void setup_msrs(struct vcpu_vmx *vmx)
 #endif
 	vmx->save_nmsrs = save_nmsrs;
 
-#ifdef CONFIG_X86_64
-	vmx->msr_offset_kernel_gs_base =
-		__find_msr_index(vmx, MSR_KERNEL_GS_BASE);
-#endif
 	vmx->msr_offset_efer = __find_msr_index(vmx, MSR_EFER);
 
if (cpu_has_vmx_msr_bitmap()) {
@@ -1015,6 +1015,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 	case MSR_GS_BASE:
 		data = vmcs_readl(GUEST_GS_BASE);
 		break;
+	case MSR_KERNEL_GS_BASE:
+		vmx_load_host_state(to_vmx(vcpu));
+		data = to_vmx(vcpu)->msr_guest_kernel_gs_base;
+		break;
 	case MSR_EFER:
 		return kvm_get_msr_common(vcpu, msr_index, pdata);
 #endif
@@ -1068,6 +1072,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
 	case MSR_GS_BASE:
 		vmcs_writel(GUEST_GS_BASE, data);
 		break;
+	case MSR_KERNEL_GS_BASE:
+		vmx_load_host_state(vmx);
+		vmx->msr_guest_kernel_gs_base = data;
+		break;
 #endif
case MSR_IA32_SYSENTER_CS:
vmcs_write32(GUEST_SYSENTER_CS, data);
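
The explicit save/reload of MSR_KERNEL_GS_BASE can be modelled in userspace as below. This is a sketch only: fake_kernel_gs_base stands in for the hardware register, struct gs_base_sketch and all function names are hypothetical, and the is_long_mode() condition from the patch is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the single hardware MSR_KERNEL_GS_BASE register. */
static uint64_t fake_kernel_gs_base;

/* Hypothetical mirror of the two new per-vcpu fields. */
struct gs_base_sketch {
	uint64_t host;
	uint64_t guest;
};

/* Guest entry: remember the host value, install the guest value. */
static void save_host_state(struct gs_base_sketch *s)
{
	s->host = fake_kernel_gs_base;   /* models rdmsrl() */
	fake_kernel_gs_base = s->guest;  /* models wrmsrl() */
}

/* Return to host: capture what the guest left behind, restore the host. */
static void load_host_state(struct gs_base_sketch *s)
{
	s->guest = fake_kernel_gs_base;  /* models rdmsrl() */
	fake_kernel_gs_base = s->host;   /* models wrmsrl() */
}

/*
 * One entry/exit cycle in which the guest rewrites the MSR.
 * Returns 1 if the host value was restored and the guest value kept.
 */
static int round_trip_ok(uint64_t host, uint64_t guest, uint64_t guest_new)
{
	struct gs_base_sketch s = { .host = 0, .guest = guest };

	fake_kernel_gs_base = host;
	save_host_state(&s);
	assert(fake_kernel_gs_base == guest);
	fake_kernel_gs_base = guest_new; /* guest writes the MSR directly */
	load_host_state(&s);
	return fake_kernel_gs_base == host && s.guest == guest_new;
}
```

Because the cached guest value is only refreshed when the host state is loaded, the get/set MSR paths in the patch load the host state first; the sketch shows why the cached copy would otherwise be stale.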

[COMMIT master] KVM: Fix Xen hvm msr ioctl by adding a flags field

From: Avi Kivity a...@redhat.com

So we can extend it later.

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index f504e0b..36594ba 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -608,8 +608,8 @@ page of a blob (32- or 64-bit, depending on the vcpu mode) to guest
 memory.
 
 struct kvm_xen_hvm_config {
+   __u32 flags;
__u32 msr;
-   __u32 pad1;
__u64 blob_addr_32;
__u64 blob_addr_64;
__u8 blob_size_32;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7203bca..93ed656 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2542,6 +2542,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		if (copy_from_user(&kvm->arch.xen_hvm_config, argp,
 				   sizeof(struct kvm_xen_hvm_config)))
 			goto out;
+		r = -EINVAL;
+		if (kvm->arch.xen_hvm_config.flags)
+			goto out;
r = 0;
break;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index cf2b011..6ed1a12 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -494,8 +494,8 @@ struct kvm_x86_mce {
 
 #ifdef KVM_CAP_XEN_HVM
 struct kvm_xen_hvm_config {
+   __u32 flags;
__u32 msr;
-   __u32 pad1;
__u64 blob_addr_32;
__u64 blob_addr_64;
__u8 blob_size_32;
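
The point of rejecting non-zero flags now is that a later kernel can assign meaning to individual bits without old kernels silently ignoring them. A minimal sketch of that check follows; the struct only carries the fields visible in the hunk above, and check_config() and check_flags() are hypothetical names, not part of the KVM API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Only the fields shown in the hunk above; the real struct continues. */
struct xen_hvm_config_sketch {
	uint32_t flags;
	uint32_t msr;
	uint64_t blob_addr_32;
	uint64_t blob_addr_64;
	uint8_t blob_size_32;
};

/* Hypothetical validation step: no flag bits are defined yet. */
static int check_config(const struct xen_hvm_config_sketch *cfg)
{
	assert(cfg != NULL);
	if (cfg->flags)
		return -EINVAL;
	return 0;
}

/* Convenience wrapper for trying out a flags value. */
static int check_flags(uint32_t flags)
{
	struct xen_hvm_config_sketch cfg = { .flags = flags };

	return check_config(&cfg);
}
```

Userspace that zeroes the structure before filling it in keeps working; anything that sets an as yet undefined bit gets -EINVAL instead of undefined behaviour later.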

[COMMIT master] KVM: powerpc: Fix BUILD_BUG_ON condition

From: Hollis Blanchard holl...@us.ibm.com

The old BUILD_BUG_ON implementation didn't work with __builtin_constant_p().
Fixing that revealed this test had been inverted for a long time without
anybody noticing...

Signed-off-by: Hollis Blanchard holl...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/powerpc/kvm/timing.h b/arch/powerpc/kvm/timing.h
index bb13b1f..a550f0f 100644
--- a/arch/powerpc/kvm/timing.h
+++ b/arch/powerpc/kvm/timing.h
@@ -48,7 +48,7 @@ static inline void kvmppc_set_exit_type(struct kvm_vcpu *vcpu, int type) {}
 static inline void kvmppc_account_exit_stat(struct kvm_vcpu *vcpu, int type)
 {
/* type has to be known at build time for optimization */
-   BUILD_BUG_ON(__builtin_constant_p(type));
+   BUILD_BUG_ON(!__builtin_constant_p(type));
switch (type) {
case EXT_INTR_EXITS:
vcpu-stat.ext_intr_exits++;
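
To see why the test has to be negated, it helps to look at what __builtin_constant_p() returns. The sketch below assumes GCC or Clang; BUILD_BUG_ON_SKETCH is a simplified stand-in using the negative-array-size trick and is not the kernel macro, and both function names are made up for illustration.

```c
#include <assert.h>

/* Simplified stand-in: a true constant condition gives a negative array size. */
#define BUILD_BUG_ON_SKETCH(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* A literal is a build-time constant, so the builtin yields 1. */
static int literal_is_constant(void)
{
	int is_const = __builtin_constant_p(42);

	/* 42 is a literal, so the condition folds to 0 and the size stays 1. */
	BUILD_BUG_ON_SKETCH(!__builtin_constant_p(42));
	assert(is_const);
	return is_const;
}

/* A volatile object can never be treated as a constant. */
static int volatile_read_is_constant(void)
{
	volatile int v = 7;

	return __builtin_constant_p(v);
}
```

The build should break when the argument is NOT a constant, so the condition handed to BUILD_BUG_ON has to be !__builtin_constant_p(type); the un-negated form would trip exactly in the case the code wants.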

[COMMIT master] KVM: x86 shared msr infrastructure

From: Avi Kivity a...@redhat.com

The various syscall-related MSRs are fairly expensive to switch.  Currently
we switch them on every vcpu preemption, which is far too often:

- if we're switching to a kernel thread (idle task, threaded interrupt,
  kernel-mode virtio server (vhost-net), for example) and back, then
  there's no need to switch those MSRs since kernel threads won't
  be exiting to userspace.

- if we're switching to another guest running an identical OS, most likely
  those MSRs will have the same value, so there's little point in reloading
  them.

- if we're running the same OS on the guest and host, the MSRs will have
  identical values and reloading is unnecessary.

This patch uses the new user return notifiers to implement last-minute
switching, and checks the msr values to avoid unnecessary reloading.

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0558ff8..26a74b7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -809,4 +809,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 
+void kvm_define_shared_msr(unsigned index, u32 msr);
+void kvm_set_shared_msr(unsigned index, u64 val);
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index b84e571..4cd4983 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -28,6 +28,7 @@ config KVM
select HAVE_KVM_IRQCHIP
select HAVE_KVM_EVENTFD
select KVM_APIC_ARCHITECTURE
+   select USER_RETURN_NOTIFIER
---help---
  Support hosting fully virtualized guest machines using hardware
  virtualization extensions.  You will need a fairly recent
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b7f9bfe..7203bca 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -37,6 +37,7 @@
 #include <linux/iommu.h>
 #include <linux/intel-iommu.h>
 #include <linux/cpufreq.h>
+#include <linux/user-return-notifier.h>
 #include <trace/events/kvm.h>
 #undef TRACE_INCLUDE_FILE
 #define CREATE_TRACE_POINTS
@@ -87,6 +88,25 @@ EXPORT_SYMBOL_GPL(kvm_x86_ops);
 int ignore_msrs = 0;
 module_param_named(ignore_msrs, ignore_msrs, bool, S_IRUGO | S_IWUSR);
 
+#define KVM_NR_SHARED_MSRS 16
+
+struct kvm_shared_msrs_global {
+   int nr;
+   struct kvm_shared_msr {
+   u32 msr;
+   u64 value;
+   } msrs[KVM_NR_SHARED_MSRS];
+};
+
+struct kvm_shared_msrs {
+   struct user_return_notifier urn;
+   bool registered;
+   u64 current_value[KVM_NR_SHARED_MSRS];
+};
+
+static struct kvm_shared_msrs_global __read_mostly shared_msrs_global;
+static DEFINE_PER_CPU(struct kvm_shared_msrs, shared_msrs);
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ "pf_fixed", VCPU_STAT(pf_fixed) },
 	{ "pf_guest", VCPU_STAT(pf_guest) },
@@ -123,6 +143,64 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
 };
 
+static void kvm_on_user_return(struct user_return_notifier *urn)
+{
+	unsigned slot;
+	struct kvm_shared_msr *global;
+	struct kvm_shared_msrs *locals
+		= container_of(urn, struct kvm_shared_msrs, urn);
+
+	for (slot = 0; slot < shared_msrs_global.nr; ++slot) {
+		global = &shared_msrs_global.msrs[slot];
+		if (global->value != locals->current_value[slot]) {
+			wrmsrl(global->msr, global->value);
+			locals->current_value[slot] = global->value;
+		}
+	}
+	locals->registered = false;
+	user_return_notifier_unregister(urn);
+}
+
+void kvm_define_shared_msr(unsigned slot, u32 msr)
+{
+	int cpu;
+	u64 value;
+
+	if (slot >= shared_msrs_global.nr)
+		shared_msrs_global.nr = slot + 1;
+	shared_msrs_global.msrs[slot].msr = msr;
+	rdmsrl_safe(msr, &value);
+	shared_msrs_global.msrs[slot].value = value;
+	for_each_online_cpu(cpu)
+		per_cpu(shared_msrs, cpu).current_value[slot] = value;
+}
+EXPORT_SYMBOL_GPL(kvm_define_shared_msr);
+
+static void kvm_shared_msr_cpu_online(void)
+{
+	unsigned i;
+	struct kvm_shared_msrs *locals = &__get_cpu_var(shared_msrs);
+
+	for (i = 0; i < shared_msrs_global.nr; ++i)
+		locals->current_value[i] = shared_msrs_global.msrs[i].value;
+}
+
+void kvm_set_shared_msr(unsigned slot, u64 value)
+{
+	struct kvm_shared_msrs *smsr = &__get_cpu_var(shared_msrs);
+
+	if (value == smsr->current_value[slot])
+		return;
+	smsr->current_value[slot] = value;
+	wrmsrl(shared_msrs_global.msrs[slot].msr, value);
+	if (!smsr->registered) {
+		smsr->urn.on_user_return = kvm_on_user_return;
+		user_return_notifier_register(&smsr->urn);
+		smsr->registered = true;
+	}
+}
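
The saving this patch is after can be illustrated with a small userspace model that counts simulated MSR writes. Everything here is an assumption made for illustration: one CPU, a plain array instead of real MSRs, and hypothetical names throughout; it only mirrors the compare-before-write and restore-on-user-return logic shown above.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SLOTS 4

static uint64_t host_value[NR_SLOTS];    /* values the host expects */
static uint64_t current_value[NR_SLOTS]; /* what the simulated CPU holds */
static unsigned long msr_writes;         /* number of simulated wrmsr calls */
static int notifier_registered;

/* Models wrmsrl(): the expensive operation we want to avoid. */
static void write_msr(unsigned slot, uint64_t value)
{
	assert(slot < NR_SLOTS);
	current_value[slot] = value;
	msr_writes++;
}

/* Install a guest value, skipping the write if it is already in place. */
static void set_shared_msr(unsigned slot, uint64_t value)
{
	if (value == current_value[slot])
		return;
	write_msr(slot, value);
	notifier_registered = 1;
}

/* Runs just before returning to userspace: restore host values lazily. */
static void on_user_return(void)
{
	for (unsigned slot = 0; slot < NR_SLOTS; ++slot)
		if (host_value[slot] != current_value[slot])
			write_msr(slot, host_value[slot]);
	notifier_registered = 0;
}

/* Two guest entries followed by one return to userspace. */
static unsigned long writes_for_scenario(uint64_t host, uint64_t guest)
{
	host_value[0] = host;
	current_value[0] = host;
	msr_writes = 0;
	notifier_registered = 0;

	set_shared_msr(0, guest); /* first guest entry */
	set_shared_msr(0, guest); /* re-entry without leaving the kernel */
	if (notifier_registered)
		on_user_return();
	return msr_writes;
}
```

When guest and host use the same value the model performs no writes at all, and when they differ it performs two regardless of how often the guest is re-entered before the next return to userspace.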

[COMMIT master] KVM: remove duplicated task_switch check

From: Gleb Natapov g...@redhat.com

Probably introduced by a bad merge.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be968f1..2ef3906 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4532,11 +4532,6 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason)
 	if (reason != TASK_SWITCH_CALL && reason != TASK_SWITCH_GATE)
 		old_tss_sel = 0xffff;
 
-	/* set back link to prev task only if NT bit is set in eflags
-	   note that old_tss_sel is not used afetr this point */
-	if (reason != TASK_SWITCH_CALL && reason != TASK_SWITCH_GATE)
-		old_tss_sel = 0xffff;
-
 	if (nseg_desc.type & 8)
 		ret = kvm_task_switch_32(vcpu, tss_selector, old_tss_sel,
 					 old_tss_base, &nseg_desc);

[COMMIT master] Merge commit 'tip/x86/entry'

From: Avi Kivity a...@redhat.com


[COMMIT master] KVM: get_tss_base_addr() should return a gpa_t

From: Gleb Natapov g...@redhat.com

If the TSS we are switching to resides in high memory, the task switch will
fail since the address will be truncated. Windows2k3 does this sometimes when
running with more than 4G.

Cc: sta...@kernel.org
Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 93ed656..be968f1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4214,7 +4214,7 @@ static int save_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
 	return kvm_write_guest_virt(dtable.base + index*8, seg_desc, sizeof(*seg_desc), vcpu);
 }
 
-static u32 get_tss_base_addr(struct kvm_vcpu *vcpu,
+static gpa_t get_tss_base_addr(struct kvm_vcpu *vcpu,
 struct desc_struct *seg_desc)
 {
u32 base_addr = get_desc_base(seg_desc);
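
The truncation is easy to reproduce outside the kernel. In this sketch gpa_sketch_t is a stand-in for gpa_t (assumed to be 64 bits wide, as in the kernel), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gpa_sketch_t; /* stand-in for the kernel's gpa_t */

static_assert(sizeof(gpa_sketch_t) == 8, "guest physical addresses are 64 bit");

/* Old return type: the upper 32 bits of the address are silently dropped. */
static uint32_t tss_base_as_u32(gpa_sketch_t gpa)
{
	return (uint32_t)gpa;
}

/* New return type: the address is passed through unchanged. */
static gpa_sketch_t tss_base_as_gpa(gpa_sketch_t gpa)
{
	return gpa;
}

/* Returns 1 if an address survives the 32-bit return type intact. */
static int survives_u32(gpa_sketch_t gpa)
{
	return tss_base_as_u32(gpa) == gpa;
}
```

A TSS placed just above 4G, for example at 0x100001000, comes back as 0x1000 through the 32-bit return type, so the task switch would read the wrong memory.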

Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM

2009-10-26 Thread MORITA Kazutaka


On 2009/10/25 17:51, Dietmar Maurer wrote:
>>> Do you support multiple guests accessing the same image?
>>
>> A VM image can be attached to any VM, but only to one VM at a time;
>> multiple running VMs cannot access the same VM image.
>
> I guess this is a problem when you want to do live migrations?

Yes, because Sheepdog locks a VM image when it is opened.
To avoid this problem, locking must be delayed until migration is done.
This is also a TODO item.

--
MORITA Kazutaka



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/4] KVM: x86: Add VCPU substate for NMI states

Avi Kivity wrote:
 On 10/15/2009 01:27 PM, Jan Kiszka wrote:
 Perhaps it makes sense to query about individual states, including
 existing ones?  That will allow us to deprecate and then phase out
 broken states.  It's probably not worth it.
  
 You may do this already with the given design: Set up a VCPU, then issue
 KVM_GET_VCPU_STATE on the substate in question. You will either get an
 error code or 0 if the substate is supported. At least no additional
 kernel code required.

 
 No, if some code requires a feature, we don't want to set up a guest and 
 a vcpu and issue dummy commands in order to find out if we can actually 
 run that code.
 
 Feature discovery needs to be a 'system ioctl' in the words of 
 Documentation/kvm/api.txt.
 

OK, added some system IOCTL 'KVM_GET_VCPU_STATE_LIST' to my to-do list.

Jan




Re: I/O performance of VirtIO

Avi Kivity wrote:
 On 10/23/2009 12:06 AM, Alexander Graf wrote:

 Am 22.10.2009 um 18:29 schrieb Avi Kivity a...@redhat.com:

 On 10/13/2009 08:35 AM, Jan Kiszka wrote:
 It can be particularly slow if you use in-kernel irqchips and the
 default NIC emulation (up to 10 times slower), some effect I always
 wanted to understand on a rainy day. So, when you actually want -net
 user, try -no-kvm-irqchip.


 This might be due to a missing SIGIO or SIGALRM; -no-kvm-irqchip
 generates a lot of extra signals and thus polling opportunities.

 Isn't that what dedicated io threads are supposed to solve?

 
 No.  Dedicated I/O threads provide parallelism.  All latency needs is to
 have SIGIO sent on all file descriptors (or rather, in qemu-kvm with
 irqchip, to have all file descriptors in the poll() call).
 
 Jan, does slirp add new connections to the select set?
 

It should do so in slirp_select_fill (it iterates over all TCP/UDP
sockets of all instances). I think without doing this, slirp wouldn't
receive a single bit at all (no activity without FD_ISSET).

Jan
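
The "no activity without FD_ISSET" point can be demonstrated with plain select(): a descriptor that is not put into the set is never reported, even if data is waiting. This is a generic POSIX sketch with hypothetical helper names; it does not use any slirp code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Returns 1 if select() reports fd readable, 0 if not, -1 on error. */
static int fd_ready(int fd, int include_in_set)
{
	fd_set rfds;
	struct timeval tv = { 0, 0 }; /* poll, do not block */

	assert(fd >= 0 && fd < FD_SETSIZE);
	FD_ZERO(&rfds);
	if (include_in_set)
		FD_SET(fd, &rfds);
	if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
		return -1;
	return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/* Put one byte into a pipe and ask whether the read end is reported. */
static int pipe_demo(int include_in_set)
{
	int fds[2];
	int ready;

	if (pipe(fds) < 0)
		return -1;
	if (write(fds[1], "x", 1) != 1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	ready = fd_ready(fds[0], include_in_set);
	close(fds[0]);
	close(fds[1]);
	return ready;
}
```

So if new connections were missing from the set, they would see no traffic at all rather than just higher latency, which matches the reasoning above.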




Re: List of unaccessible x86 states


On 10/25/2009 06:45 PM, Alexander Graf wrote:
It's not. We can't use the guest memory for hsave because then the 
guest could break the l1 state, so a malicious hypervisor could 
break us.


Guest hsave should be used for storing guest state when switching 
into the nested guest, not host state.  Host state is not part of the 
save/restore state in any case.



No it's not.

When going in an l2 guest, we need to save the l1 state in the hsave. 
Now if we'd use the l1 given hsave, the l2 guest could modify the hsave.


That means the l2 guest could rewrite the intercept bitmap to 0 and 
compromise the host.


L1 hsave stores the architected state saved by vmrun, e.g. cs.sel, 
next_rip, cr0, cr3, etc.  The host intercept bitmap is not state since 
it is calculated from the L1 intercept bitmap and host code.  Indeed it 
can be different from host to host even with the same guest state.



That's why we're storing the hsave data in a host allocated page.

Of course, we could save the whole hsave area off to the host on 
migration...


Sorry, -ENOPARSE.

--
error compiling committee.c: too many arguments to function


Re: I/O performance of VirtIO


On 10/26/2009 10:12 AM, Jan Kiszka wrote:

No.  Dedicated I/O threads provide parallelism.  All latency needs is to
have SIGIO sent on all file descriptors (or rather, in qemu-kvm with
irqchip, to have all file descriptors in the poll() call).

Jan, does slirp add new connections to the select set?

 

It should do so in slirp_select_fill (it iterates over all TCP/UDP
sockets of all instances). I think without doing this, slirp wouldn't
receive a single bit at all (no activity without FD_ISSET).
   


Yes, so it seems from the code.  But something is missing if you get 
better performance with -no-kvm-irqchip.  Perhaps timers are off.


--
error compiling committee.c: too many arguments to function


Re: [PATCH] virtio-net: fix data corruption with OOM

On Mon, Oct 26, 2009 at 12:11:51PM +1030, Rusty Russell wrote:
 On Mon, 26 Oct 2009 03:33:40 am Michael S. Tsirkin wrote:
  virtio net used to unlink skbs from send queues on error,
  but ever since 48925e372f04f5e35fec6269127c62b2c71ab794
  we do not do this. This causes guest data corruption and crashes
  with vhost since net core can requeue the skb or free it without
  it being taken off the list.
  
  This patch fixes this by queueing the skb after successful
  transmit.
 
 I originally thought that this was racy: as soon as we do add_buf, we need to
 make sure we're ready for the callback (for virtio_pci, it's ->kick, but we
 shouldn't rely on that).
 
 So a comment would be nice.  How's this?

Acked-by: Michael S. Tsirkin m...@redhat.com

 Subject: virtio-net: fix data corruption with OOM
 Date: Sun, 25 Oct 2009 19:03:40 +0200
 From: Michael S. Tsirkin m...@redhat.com
 
 virtio net used to unlink skbs from send queues on error,
 but ever since 48925e372f04f5e35fec6269127c62b2c71ab794
 we do not do this. This causes guest data corruption and crashes
 with vhost since net core can requeue the skb or free it without
 it being taken off the list.
 
 This patch fixes this by queueing the skb after successful
 transmit.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Rusty Russell ru...@rustcorp.com.au (+ comment)
 ---
 
 Rusty, here's a fix for another data corrupter I saw.
 This fixes a regression from 2.6.31, so definitely
 2.6.32 I think. Comments?
 
  drivers/net/virtio_net.c |8 +---
  1 files changed, 5 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -516,8 +516,7 @@ again:
   /* Free up any pending old buffers before queueing new ones. */
   free_old_xmit_skbs(vi);
  
 - /* Put new one in send queue and do transmit */
 - __skb_queue_head(&vi->send, skb);
 + /* Try to transmit */
   capacity = xmit_skb(vi, skb);
  
   /* This can happen with OOM and indirect buffers. */
 @@ -531,8 +530,17 @@ again:
   }
   return NETDEV_TX_BUSY;
   }
 + vi->svq->vq_ops->kick(vi->svq);
  
 - vi->svq->vq_ops->kick(vi->svq);
 + /*
 +  * Put new one in send queue.  You'd expect we'd need this before
 +  * xmit_skb calls add_buf(), since the callback can be triggered
 +  * immediately after that.  But since the callback just triggers
 +  * another call back here, normal network xmit locking prevents the
 +  * race.
 +  */
 + __skb_queue_head(&vi->send, skb);
 +
   /* Don't wait up for transmitted skbs to be freed. */
   skb_orphan(skb);
   nf_reset(skb);
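
The effect of moving the queueing step can be shown with a toy model. This is a sketch, not driver code: the queue is reduced to a counter, and the struct and function names are hypothetical.

```c
#include <assert.h>

/* Toy send queue: only counts how many buffers it believes it owns. */
struct send_queue_sketch {
	int queued;
};

/* Old order: queue first, then try to transmit. */
static int xmit_old_order(struct send_queue_sketch *q, int xmit_succeeds)
{
	q->queued++;
	if (!xmit_succeeds)
		return -1; /* buffer stays queued although the caller owns it again */
	return 0;
}

/* New order: queue only after the transmit attempt succeeded. */
static int xmit_new_order(struct send_queue_sketch *q, int xmit_succeeds)
{
	if (!xmit_succeeds)
		return -1; /* nothing was queued, nothing to unlink */
	q->queued++;
	return 0;
}

/* Number of stale queue entries left behind by one failed transmit. */
static int stale_entries_after_failure(int use_new_order)
{
	struct send_queue_sketch q = { 0 };
	int ret = use_new_order ? xmit_new_order(&q, 0)
				: xmit_old_order(&q, 0);

	assert(ret == -1);
	return q.queued;
}
```

In the old order a failed transmit leaves one entry on the list that the network core may requeue or free, which is the corruption described in the changelog; in the new order the list stays empty.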

Re: 64 bit guest much faster ?

2009-10-26 Thread Gerd Hoffmann


On 10/23/09 17:54, Stefan wrote:


Hello,

I have a simple question (sorry I'm a kvm beginner):
Is it right that a 64bit guest (8 CPUs, 16GB) is
much faster than a 32bit guest (8 CPUs, 16GB PAE).

  
Yes.  With *that* much memory the 32bit guest struggles with address 
space limitations (32bit -> 4G), whereas the 64bit guest doesn't.


With up to 1G you shouldn't see a noticeable difference.  But the more 
highmem the 32bit guest uses, the higher the penalty, especially 
without ept/npt, as every kmap() of a high page is then a roundtrip to 
the hypervisor.


cheers,
  Gerd


Jan Kiszka to maintain kvm-kmod

I am pleased to announce that Jan Kiszka has agreed to maintain 
kvm-kmod.git, the backporting kit that allows running modern kvm code on 
older kernels.  Jan will release kvm-kmod-2.6.x.y packages and 
kvm-kmod-2.6.x-rcy packages, while Marcelo and I will (with Jan's help) 
release kvm-kmod-devel-xx.  Many thanks to Jan for taking on this task.


As there are now many different sources of kvm kernel modules to choose 
from, I wrote up a page that describes the various releases and what 
they are suited for.  This can be found in 
http://www.linux-kvm.org/page/Getting_the_kvm_kernel_modules.


--
error compiling committee.c: too many arguments to function


Re: [PATCH] virtio-net: fix data corruption with OOM

On Mon, Oct 26, 2009 at 12:11:51PM +1030, Rusty Russell wrote:
 On Mon, 26 Oct 2009 03:33:40 am Michael S. Tsirkin wrote:
  virtio net used to unlink skbs from send queues on error,
  but ever since 48925e372f04f5e35fec6269127c62b2c71ab794
  we do not do this. This causes guest data corruption and crashes
  with vhost since net core can requeue the skb or free it without
  it being taken off the list.
  
  This patch fixes this by queueing the skb after successful
  transmit.
 
 I originally thought that this was racy: as soon as we do add_buf, we need to
 make sure we're ready for the callback (for virtio_pci, it's ->kick, but we
 shouldn't rely on that).

BTW, wanted to note that unlink on error would *also* be racy if we did any
processing in the callback.

 So a comment would be nice.  How's this?
 
 Subject: virtio-net: fix data corruption with OOM
 Date: Sun, 25 Oct 2009 19:03:40 +0200
 From: Michael S. Tsirkin m...@redhat.com
 
 virtio net used to unlink skbs from send queues on error,
 but ever since 48925e372f04f5e35fec6269127c62b2c71ab794
 we do not do this. This causes guest data corruption and crashes
 with vhost since net core can requeue the skb or free it without
 it being taken off the list.
 
 This patch fixes this by queueing the skb after successful
 transmit.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Rusty Russell ru...@rustcorp.com.au (+ comment)
 ---
 
 Rusty, here's a fix for another data corrupter I saw.
 This fixes a regression from 2.6.31, so definitely
 2.6.32 I think. Comments?
 
  drivers/net/virtio_net.c |8 +---
  1 files changed, 5 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -516,8 +516,7 @@ again:
   /* Free up any pending old buffers before queueing new ones. */
   free_old_xmit_skbs(vi);
  
 - /* Put new one in send queue and do transmit */
 - __skb_queue_head(&vi->send, skb);
 + /* Try to transmit */
   capacity = xmit_skb(vi, skb);
  
   /* This can happen with OOM and indirect buffers. */
 @@ -531,8 +530,17 @@ again:
   }
   return NETDEV_TX_BUSY;
   }
 + vi->svq->vq_ops->kick(vi->svq);
  
 - vi->svq->vq_ops->kick(vi->svq);
 + /*
 +  * Put new one in send queue.  You'd expect we'd need this before
 +  * xmit_skb calls add_buf(), since the callback can be triggered
 +  * immediately after that.  But since the callback just triggers
 +  * another call back here, normal network xmit locking prevents the
 +  * race.
 +  */
 + __skb_queue_head(&vi->send, skb);
 +
   /* Don't wait up for transmitted skbs to be freed. */
   skb_orphan(skb);
   nf_reset(skb);

Re: List of unaccessible x86 states

2009-10-26 Thread Alexander Graf



Am 26.10.2009 um 09:33 schrieb Avi Kivity a...@redhat.com:


On 10/25/2009 06:45 PM, Alexander Graf wrote:
It's not. We can't use the guest memory for hsave because then  
the guest could break the l1 state, so a malicious hypervisor  
could break us.


Guest hsave should be used for storing guest state when switching  
into the nested guest, not host state.  Host state is not part of  
the save/restore state in any case.



No it's not.

When going in an l2 guest, we need to save the l1 state in the  
hsave. Now if we'd use the l1 given hsave, the l2 guest could  
modify the hsave.


That means the l2 guest could rewrite the intercept bitmap to 0 and  
compromise the host.


L1 hsave stores the architected state saved by vmrun, e.g. cs.sel,  
next_rip, cr0, cr3, etc.  The host intercept bitmap is not state  
since it is calculated from the L1 intercept bitmap and host code.   
Indeed it can be different from host to host even with the same  
guest state.


Ah, so you'd only save off the cpu state parts of the vmcb.

Currently we save off control parts too, so we can easily swap them in  
on #vmexit.


So if we'd migrate off when inside the nested guest, we'd have to save  
off the resume control state, OR them again with the guest vmcb  
control states and be inside the nested guest.


Wouldn't it be much easier to not migrate / save state when inside a  
nested guest? I'm afraid the code will become overly complex if we do  
allow migration while in a nested context.


Alex




Re: List of unaccessible x86 states

On Sun, Oct 25, 2009 at 11:49:35AM +0200, Avi Kivity wrote:
 On 10/24/2009 12:35 PM, Alexander Graf wrote:
 
 Hm, thinking about this again, it might be useful to have a
 "currently in nested VM" flag here. That way userspace can decide
 if it needs to get out of the nested state (for migration) or if
 it just doesn't care.
 
 Getting out of nested state involves modifying state (both memory
 and registers).  Nor can we in the general case force it.  The guest
 can set up a situation where it is impossible to #vmexit.

There is actually more than that. If the guest runs in guest mode itself
we also need to report the host state to be able to do an #vmexit after
migration.
In nested SVM the host state is not saved in the guest memory to prevent
the guest from modifying it and breaking out of its virtualization jail.

Joerg



Re: List of unaccessible x86 states


On 10/26/2009 11:11 AM, Alexander Graf wrote:
L1 hsave stores the architected state saved by vmrun, e.g. cs.sel, 
next_rip, cr0, cr3, etc.  The host intercept bitmap is not state 
since it is calculated from the L1 intercept bitmap and host code.  
Indeed it can be different from host to host even with the same guest 
state.



Ah, so you'd only save off the cpu state parts of the vmcb.

Currently we save off control parts too, so we can easily swap them in 
on #vmexit.


These can still be saved in a host memory area as an optimization, and 
regenerated if needed.


So if we'd migrate off when inside the nested guest, we'd have to save 
off the resume control state, OR them again with the guest vmcb 
control states and be inside the nested guest.


Right, if the new state bit (guest mode) is set, we look at the control 
bits and OR them into the vmcb.  That part can be reused with the VMRUN 
code.
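
As a rough illustration of the OR step described above, here is a minimal
sketch in C. The struct is a simplified, hypothetical stand-in for the
intercept fields of the vmcb control area (the real layout has more fields),
and merge_intercepts() is an illustrative name, not an existing KVM function.

```c
#include <stdint.h>

/* Hypothetical, simplified view of the intercept fields in a vmcb
 * control area. Field names are assumptions made for this sketch. */
struct ctl_intercepts {
    uint16_t intercept_cr_read;
    uint16_t intercept_cr_write;
    uint32_t intercept_exceptions;
    uint64_t intercept;
};

/* While the nested (L2) guest runs, the active intercepts are the union
 * of what the host itself needs and what the L1 hypervisor requested. */
static void merge_intercepts(struct ctl_intercepts *active,
                             const struct ctl_intercepts *host,
                             const struct ctl_intercepts *l1)
{
    active->intercept_cr_read    = host->intercept_cr_read    | l1->intercept_cr_read;
    active->intercept_cr_write   = host->intercept_cr_write   | l1->intercept_cr_write;
    active->intercept_exceptions = host->intercept_exceptions | l1->intercept_exceptions;
    active->intercept            = host->intercept            | l1->intercept;
}
```

Because the result depends only on the host's own bits and the L1 vmcb, the
same helper could in principle serve both the VMRUN path and a restore after
migration, which is the reuse suggested above.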




Wouldn't it be much easier to not migrate / save state when inside a 
nested guest? I'm afraid the code will become overly complex if we do 
allow migration while in a nested context.


I can't really see why but then I don't know the code as well as you 
do.  The current code won't work for guests which don't intercept 
external interrupts (probably only malware).  For nested vmx it may be 
necessary since vmx has a mode where interrupts are acknowledged during 
#VMEXIT and the interrupt vector is saved into a register; you can't 
fake an interrupt #VMEXIT since you can't fake the vector.  Xen is one 
guest which uses this mode.


--
error compiling committee.c: too many arguments to function


Re: List of unaccessible x86 states


On 10/26/2009 11:17 AM, Joerg Roedel wrote:

On Sun, Oct 25, 2009 at 11:49:35AM +0200, Avi Kivity wrote:
   

On 10/24/2009 12:35 PM, Alexander Graf wrote:
 

Hm, thinking about this again, it might be useful to have a
"currently in nested VM" flag here. That way userspace can decide
if it needs to get out of the nested state (for migration) or if
it just doesn't care.
   

Getting out of nested state involves modifying state (both memory
and registers).  Nor can we in the general case force it.  The guest
can set up a situation where it is impossible to #vmexit.
 

There is actually more than that. If the guest runs in guest mode itself
we also need to report the host state to be able to do an #vmexit after
migration.
In nested SVM the host state is not saved in the guest memory to prevent
the guest from modifying it and breaking out of its virtualization jail.
   


Which host state?  As far as I can tell, it can all be regenerated.

--
error compiling committee.c: too many arguments to function


Re: vhost-net patches

On Fri, Oct 23, 2009 at 09:23:40AM -0700, Shirley Ma wrote:
 I also hit guest skb_xmit panic.

If these are the same panics I have seen myself,
they are probably fixed with recent virtio patches
I sent to Rusty. I put them on my vhost.git tree to make
it easier for you to test.
If you see any more crashes, please holler, preferably
with a backtrace.

-- 
MST

Re: List of unaccessible x86 states

On Mon, Oct 26, 2009 at 11:21:12AM +0200, Avi Kivity wrote:
 On 10/26/2009 11:17 AM, Joerg Roedel wrote:
 On Sun, Oct 25, 2009 at 11:49:35AM +0200, Avi Kivity wrote:
 On 10/24/2009 12:35 PM, Alexander Graf wrote:
 Hm, thinking about this again, it might be useful to have a
 "currently in nested VM" flag here. That way userspace can decide
 if it needs to get out of the nested state (for migration) or if
 it just doesn't care.
 Getting out of nested state involves modifying state (both memory
 and registers).  Nor can we in the general case force it.  The guest
 can set up a situation where it is impossible to #vmexit.
 There is actually more than that. If the guest runs in guest mode itself
 we also need to report the host state to be able to do an #vmexit after
 migration.
 In nested SVM the host state is not saved in the guest memory to prevent
 the guest from modifying it and breaking out of its virtualization jail.
 
 Which host state?  As far as I can tell, it can all be regenerated.

The state which is loaded into the vcpu when a #vmexit is emulated. This
includes segments, control registers and the host rip for example.

Joerg



Re: List of unaccessible x86 states


On 10/26/2009 11:30 AM, Joerg Roedel wrote:



Which host state?  As far as I can tell, it can all be regenerated.
 

The state which is loaded into the vcpu when a #vmexit is emulated. This
includes segments, control registers and the host rip for example.
   


All of this state does not change between nested guest and normal guest 
mode.


--
error compiling committee.c: too many arguments to function


Re: List of unaccessible x86 states

On Mon, Oct 26, 2009 at 11:39:46AM +0200, Avi Kivity wrote:
 On 10/26/2009 11:30 AM, Joerg Roedel wrote:
 
 Which host state?  As far as I can tell, it can all be regenerated.
 The state which is loaded into the vcpu when a #vmexit is emulated. This
 includes segments, control registers and the host rip for example.
 
 All of this state does not change between nested guest and normal
 guest mode.

I am talking about all the state that is saved in svm->nested.hsave.
When we migrate a guest vcpu while it is running in guest mode itself
(without forcing a nested #vmexit) this state is required when a #vmexit
needs to be emulated on this vcpu after migration.
Same is true for the nested intercept conditions.

Joerg



Re: List of unaccessible x86 states


On 10/26/2009 11:56 AM, Joerg Roedel wrote:

On Mon, Oct 26, 2009 at 11:39:46AM +0200, Avi Kivity wrote:
   

On 10/26/2009 11:30 AM, Joerg Roedel wrote:
 
   

Which host state?  As far as I can tell, it can all be regenerated.
 

The state which is loaded into the vcpu when a #vmexit is emulated. This
includes segments, control registers and the host rip for example.
   

All of this state does not change between nested guest and normal
guest mode.
 

I am talking about all the state that is saved in svm->nested.hsave.
When we migrate a guest vcpu while it is running in guest mode itself
(without forcing a nested #vmexit) this state is required when a #vmexit
needs to be emulated on this vcpu after migration.
Same is true for the nested intercept conditions.
   


The state that is saved by VMRUN can be saved to guest memory and 
migrated.  Extra state (like the intercepts for the previous mode) must 
be saved to host memory and not migrated; host intercepts can be 
regenerated.


Concretely:


hsave->save.es     = vmcb->save.es;
hsave->save.cs     = vmcb->save.cs;
hsave->save.ss     = vmcb->save.ss;
hsave->save.ds     = vmcb->save.ds;
hsave->save.gdtr   = vmcb->save.gdtr;
hsave->save.idtr   = vmcb->save.idtr;
hsave->save.efer   = svm->vcpu.arch.shadow_efer;
hsave->save.cr0    = svm->vcpu.arch.cr0;
hsave->save.cr4    = svm->vcpu.arch.cr4;
hsave->save.rflags = vmcb->save.rflags;
hsave->save.rip    = svm->next_rip;
hsave->save.rsp    = vmcb->save.rsp;
hsave->save.rax    = vmcb->save.rax;
if (npt_enabled)
    hsave->save.cr3 = vmcb->save.cr3;
else
    hsave->save.cr3 = svm->vcpu.arch.cr3;


Can all be saved to guest memory.

copy_vmcb_control_area(hsave, vmcb);

Must not be saved into guest memory.  On the other hand, it is not 
needed for migration.
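
To make the split concrete, here is a minimal sketch in C of keeping the two
halves in different places. All types and the stash_l1_state() helper are
hypothetical simplifications invented for this example; they are not the
structures used in svm.c.

```c
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for the two halves of a vmcb. */
struct save_area {
    uint64_t cr0, cr3, cr4, efer;
    uint64_t rip, rsp, rax, rflags;
};

struct control_area {
    uint16_t intercept_cr_read, intercept_cr_write;
    uint64_t intercept;
};

struct vmcb_sketch {
    struct control_area control;
    struct save_area save;
};

/* On an emulated VMRUN, the architected L1 state goes to the hsave page
 * in guest memory (so it migrates with the guest), while the control
 * area stays in host-private memory that the guest cannot write. */
static void stash_l1_state(const struct vmcb_sketch *vmcb,
                           struct save_area *guest_hsave,
                           struct control_area *host_private)
{
    *guest_hsave = vmcb->save;
    *host_private = vmcb->control;
}
```

The point of the separation is that nothing the guest can reach influences
the intercepts, and the host-private copy is only a cache that can be
recomputed on the destination.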



--
error compiling committee.c: too many arguments to function


Re: 64 bit guest much faster ?


On 10/26/2009 10:58 AM, Gerd Hoffmann wrote:

On 10/23/09 17:54, Stefan wrote:


Hello,

I have a simple question (sorry I'm a kvm beginner):
Is it right that a 64bit guest (8 CPUs, 16GB) is
much faster than a 32bit guest (8 CPUs, 16GB PAE).

  
Yes.  With *that* much memory the 32bit guest struggles with address 
space limitations (32bit - 4G), whereas the 64bit guest doesn't.


With up to 1G you shouldn't see a noticeable difference.  But the more 
highmem the 32bit guest uses the higher is the penalty.  Especially 
without ept/npt as every kmap() of a high page is a roundtrip to the 
hypervisor then.




Oh yes, without ept/npt the slowdown should indeed be significant with 
this much memory.
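
For a feel of the proportions, here is a small arithmetic sketch in C. It
assumes the common 32-bit 3G/1G split with roughly 896 MB of directly mapped
lowmem; that figure is an assumption for illustration, and the real boundary
depends on the guest kernel configuration.

```c
#include <stdint.h>

/* Assumption: about 896 MB of RAM is permanently mapped (lowmem);
 * everything above it is highmem and needs kmap() to be accessed. */
#define LOWMEM_MB 896u

/* Amount of guest RAM, in MB, that lives in highmem. */
static uint32_t highmem_mb(uint32_t ram_mb)
{
    return ram_mb > LOWMEM_MB ? ram_mb - LOWMEM_MB : 0;
}

/* Share of guest RAM, in percent (rounded down), that is highmem. */
static uint32_t highmem_percent(uint32_t ram_mb)
{
    return ram_mb ? (100u * highmem_mb(ram_mb)) / ram_mb : 0;
}
```

Under that assumption a 16 GB guest has 15488 MB (about 94%) of its memory
in highmem, while a 1 GB guest has only 128 MB there, which fits the
observation that small guests see little difference.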


--
error compiling committee.c: too many arguments to function


kvm problems on new hardware

2009-10-26 Thread Danny ter Haar


Hello,
I have a KVM virtualization problem.
I've put together a new hardware (Supermicro) server with 2 E5530 CPUs
and memory and disk to start experimenting with virtualization.

I intend to use the www.proxmox.com system/setup.
I installed proxmox and started stress testing the hardware:
parallel kernel compiles in a loop (concurrency_level=32),
memtest86+ during the night, etc.
The hardware/OS performs rock solid when I stress test it, but the moment
I start a virtual guest (e.g. debian netinstall) I get the first screen of the
installation procedure in a VNC screen. When I choose either normal install or
expert install, the guest screen goes blank with only a cursor, and the
kvm process prints an error on the console and starts to eat CPU cycles.
So the host OS is not barfing; only the kvm process is giving problems and the
guest is frozen.

To see if it was related to the older proxmox kernel setup, I installed
ubuntu karmic with libvirt on another hard drive for comparison:
the same error happens when I start a guest.

Oct 23 09:34:14 ubuntu kernel: [  416.226550] device vnet0 left promiscuous mode
Oct 23 09:34:14 ubuntu kernel: [  416.226554] br0: port 2(vnet0) entering disabled state
Oct 23 09:34:57 ubuntu kernel: [  459.544150] type=1505 audit(1256290497.414:17): operation=profile_load pid=1676 name=libvirt-2ae923e6-f06d-9f0d-d072-c2067b7cbee4
Oct 23 09:34:57 ubuntu kernel: [  459.550725] device vnet0 entered promiscuous mode
Oct 23 09:34:57 ubuntu kernel: [  459.551888] br0: port 2(vnet0) entering learning state
Oct 23 09:34:57 ubuntu kernel: [  459.557989] type=1503 audit(1256290497.429:18): operation=open pid=1679 parent=1 profile=libvirt-2ae923e6-f06d-9f0d-d072-c2067b7cbee4 requested_mask=rw:: denied_mask=w:: fsuid=0 ouid=0 name=/var/tmp/debian-503-amd64-netinst.iso
Oct 23 09:35:05 ubuntu kernel: [  468.066681] handle_exception: unexpected, vectoring info 0x8010 intr info 0x8b0d
Oct 23 09:35:05 ubuntu kernel: [  468.066760] handle_exception: unexpected, vectoring info 0x800d intr info 0x8b0d
Oct 23 09:35:05 ubuntu kernel: [  468.066836] handle_exception: unexpected, vectoring info 0x800d intr info 0x8b0d


In the BIOS there are settings for VT-d etc. I tried (imho) all combinations
but I am not able to start a guest. I spoke to Supermicro support and even
got a more recent (yet unpublished) BIOS. All without success.

Now I am at the point where I don't know if I have a hardware or a software
problem. I installed the intel-microcode package to see if that maybe fixed
something: it didn't.


References:
chassis: 
http://supermicro.com/products/system/2U/6026/SYS-6026TT-BIBQRF.cfm?INF=
motherboard:
http://supermicro.com/products/motherboard/QPI/5500/X8DTT-F.cfm
7 hour memtest:
http://dth.net/supermicro/memtest86_completed_7hr.jpg

More output (dmesg and other stuff) in the same dir:
http://dth.net/supermicro/

Am I trying to run this on hardware that is too recent and not yet tested?

I hope somebody has some ideas/hints about this.


Danny

Re: 64 bit guest much faster ?

2009-10-26 Thread Michael Tokarev


Avi Kivity wrote:

On 10/26/2009 10:58 AM, Gerd Hoffmann wrote:

On 10/23/09 17:54, Stefan wrote:


Hello,

I have a simple question (sorry I'm a kvm beginner):
Is it right that a 64bit guest (8 CPUs, 16GB) is
much faster than a 32bit guest (8 CPUs, 16GB PAE).

  
Yes.  With *that* much memory the 32bit guest struggles with address 
space limitations (32bit - 4G), whereas the 64bit guest doesn't.


With up to 1G you shouldn't see a noticeable difference.  But the more 
highmem the 32bit guest uses the higher is the penalty.  Especially 
without ept/npt as every kmap() of a high page is a roundtrip to the 
hypervisor then.


Oh yes, without ept/npt the slowdown should indeed be significant with 
this much memory.


How is it with 4Gb guest/mem without PAE (I mean, with CONFIG_HIGHMEM_4G=y)?
Or even 2Gb?  In case of npt or without.

Can we construct a sort of a table of expected slowdowns (not in numbers
but just in terms significant, minor etc) of running 4Gb or 4Gb
(and 1Gb and 1Gb if that makes a significant difference) 32bit guests
with and without npt and 64bit guests, please?  I guess it's quite
interesting to many users.

From the above it looks like it's better to run 64bit kernel in the 32bit
guest in these situations too.

I haven't measured it, just because it never occurred to me that there
MAY be any difference.  But I've only non-npt hardware here at the
moment.

Thanks!

/mjt

Re: 64 bit guest much faster ?


On 10/26/2009 12:42 PM, Michael Tokarev wrote:
Oh yes, without ept/npt the slowdown should indeed be significant 
with this much memory.



How is it with 4Gb guest/mem without PAE (I mean, with 
CONFIG_HIGHMEM_4G=y)?

Or even 2Gb?  In case of npt or without.



It'll be slow.  Just use x86_64 with > 1GB.


Can we construct a sort of a table of expected slowdowns (not in numbers
but just in terms significant, minor etc) of running 4Gb or 4Gb
(and 1Gb and 1Gb if that makes a significant difference) 32bit guests
with and without npt and 64bit guests, please?  I guess it's quite
interesting to many users.


These tables will be useless, it greatly depends on workload.

--
error compiling committee.c: too many arguments to function


Re: List of unaccessible x86 states

On Mon, Oct 26, 2009 at 12:09:25PM +0200, Avi Kivity wrote:
 On 10/26/2009 11:56 AM, Joerg Roedel wrote:
 On Mon, Oct 26, 2009 at 11:39:46AM +0200, Avi Kivity wrote:
 On 10/26/2009 11:30 AM, Joerg Roedel wrote:
 Which host state?  As far as I can tell, it can all be regenerated.
 The state which is loaded into the vcpu when a #vmexit is emulated. This
 includes segments, control registers and the host rip for example.
 All of this state does not change between nested guest and normal
 guest mode.
 I am talking about all the state that is saved in svm->nested.hsave.
 When we migrate a guest vcpu while it is running in guest mode itself
 (without forcing a nested #vmexit) this state is required when a #vmexit
 needs to be emulated on this vcpu after migration.
 Same is true for the nested intercept conditions.
 
 The state that is saved by VMRUN can be saved to guest memory and
 migrated.  Extra state (like the intercepts for the previous mode)
 must be saved to host memory and not migrated; host intercepts can
 be regenerated.

Ok, parts of the state can be saved in guest memory. But that's
currently not done. This will need some care to not introduce a security
hole. But it shouldn't be too difficult.
The state that's not reproducible in a sane way is the intercept bitmap
for the l2 guest.
From the nested state what needs to be exposed to userspace for
migration is:

* guest mode flag (as returned by is_nested)
* nested vmcb address
* nested hsave msr
* nested intercepts
* for nested nested paging: guest nested cr3 value

Another state which needs exposure is the last branch record related
state.
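
The list above could be pictured as one record handed to userspace. The
following is only a sketch in C: the struct name, field names and helper are
hypothetical and do not correspond to any existing KVM ioctl structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the nested SVM state listed above. */
struct nested_svm_state {
    uint8_t  guest_mode;          /* as returned by is_nested() */
    uint64_t nested_vmcb_gpa;     /* nested vmcb address */
    uint64_t hsave_msr;           /* nested hsave msr */
    uint16_t intercept_cr_read;   /* nested intercepts */
    uint16_t intercept_cr_write;
    uint32_t intercept_exceptions;
    uint64_t intercept;
    uint64_t nested_cr3;          /* only meaningful with nested nested paging */
};

/* The vmcb address, intercepts and nested cr3 only matter while the vcpu
 * is actually running a nested guest; the hsave msr always does. */
static bool nested_state_in_use(const struct nested_svm_state *s)
{
    return s->guest_mode != 0;
}
```

A migration tool could check nested_state_in_use() first and skip the
nested-only fields for vcpus that are not in guest mode.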

Off-topic question: Will the new migration protocol include some kind of
handshake to find out if migration is possible at all?

Joerg



Re: List of unaccessible x86 states


On 10/26/2009 12:45 PM, Joerg Roedel wrote:


Ok, parts of the state can be saved in guest memory. But that's
currently not done. This will need some care to not introduce a security
hole. But it shouldn't be too difficult.
The state that's not reproducible in a sane way is the intercept bitmap
for the l2 guest.
 From the nested state what needs to be exposed to userspace for
migration is:

* guest mode flag (as returned by is_nested)
* nested vmcb address
   


Yes, forgot that.  We can store it in the hsave area (note the hsave 
area format becomes an ABI).



* nested hsave msr
   


That's already saved.


* nested intercepts
   


These are part of the guest vmcb.  The host nested intercepts can be 
recalculated, no?



* for nested nested paging: guest nested cr3 value
   


Part of the guest vmcb.


Another state which needs exposure is the last branch record related
state.
   


Aren't those just more MSRs?


Off-topic question: Will the new migration protocol include some kind of
handshake to find out if migration is possible at all?

   


It's assumed that migration always works for a newer qemu version, and 
that the management tools don't attempt backward migration.


--
error compiling committee.c: too many arguments to function


Re: List of unaccessible x86 states

On Mon, Oct 26, 2009 at 12:56:31PM +0200, Avi Kivity wrote:
 On 10/26/2009 12:45 PM, Joerg Roedel wrote:


 * nested intercepts
 
 These are part of the guest vmcb.  The host nested intercepts can be
 recalculated, no?
 
 * for nested nested paging: guest nested cr3 value
 
 Part of the guest vmcb.

This will work in most cases. But it's not architecturally sane because
real hardware caches this information in the cpu. So software is free to
modify the vmcb without impacting the in-cpu state until the next
#vmexit. I don't know of any software which relies on that, so it may not
be an issue.
 
 Off-topic question: Will the new migration protocol include some kind of
 handshake to find out if migration is possible at all?
 
 
 It's assumed that migration always works for a newer qemu version,
 and that the management tools don't attempt backward migration.

I think such a handshake would make sense to just prevent that a nested
svm hypervisor is migrated to an intel machine or vice versa (just an
example, there are more like sse*, nested nested paging, ...).
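
A handshake of this kind might look roughly like the sketch below, in C.
Everything here is hypothetical: the feature bits, the struct and the
migration_possible() helper are invented for illustration and are not part
of any existing qemu or KVM migration protocol.

```c
#include <stdbool.h>
#include <stdint.h>

enum cpu_vendor { VENDOR_AMD, VENDOR_INTEL };

/* Hypothetical feature bits, for illustration only. */
#define FEAT_NESTED_VIRT   (1u << 0)
#define FEAT_NESTED_PAGING (1u << 1)
#define FEAT_SSE4          (1u << 2)

struct migration_caps {
    enum cpu_vendor vendor;
    uint32_t features;   /* source: features in use; destination: features offered */
};

/* The source sends what the guest currently relies on; the destination
 * refuses if it lacks any of it, or if nested virtualization is in use
 * and the vendor differs (a nested svm hypervisor cannot run on vmx). */
static bool migration_possible(const struct migration_caps *src_in_use,
                               const struct migration_caps *dst)
{
    if ((src_in_use->features & ~dst->features) != 0)
        return false;
    if ((src_in_use->features & FEAT_NESTED_VIRT) &&
        src_in_use->vendor != dst->vendor)
        return false;
    return true;
}
```

The check is deliberately one-directional: the destination may offer more
than the source uses, but never less.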

Joerg



[PATCH] Make vapic.S into optional rom


Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/Makefile b/Makefile
index ea568f5..acd9108 100644
--- a/Makefile
+++ b/Makefile
@@ -259,6 +259,7 @@ pxe-ne2k_pci.bin pxe-rtl8139.bin pxe-pcnet.bin 
pxe-e1000.bin \
 bamboo.dtb petalogix-s3adsp1800.dtb \
 multiboot.bin
 BLOBS += extboot.bin
+BLOBS += vapic.bin
 else
 BLOBS=
 endif
diff --git a/hw/pc.c b/hw/pc.c
index 83012a9..819b78a 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -53,6 +53,7 @@
 #define VGABIOS_FILENAME vgabios.bin
 #define VGABIOS_CIRRUS_FILENAME vgabios-cirrus.bin
 #define EXTBOOT_FILENAME extboot.bin
+#define VAPIC_FILENAME vapic.bin
 
 #define PC_MAX_BIOS_SIZE (4 * 1024 * 1024)
 
@@ -1149,6 +1150,7 @@ static void pc_init1(ram_addr_t ram_size,
 if (extboot_drive) {
 option_rom[nb_option_roms++] = qemu_strdup(EXTBOOT_FILENAME);
 }
+option_rom[nb_option_roms++] = qemu_strdup(VAPIC_FILENAME);
 
 option_rom_offset = qemu_ram_alloc(PC_ROM_SIZE);
 cpu_register_physical_memory(PC_ROM_MIN_VGA, PC_ROM_SIZE, 
option_rom_offset);
diff --git a/kvm-tpr-opt.c b/kvm-tpr-opt.c
index 932b49b..2565d79 100644
--- a/kvm-tpr-opt.c
+++ b/kvm-tpr-opt.c
@@ -114,6 +114,7 @@ static uint32_t bios_addr;
 static uint32_t vapic_phys;
 static uint32_t bios_enabled;
 static uint32_t vbios_desc_phys;
+static uint32_t vapic_bios_addr;
 
 static void update_vbios_real_tpr(void)
 {
@@ -187,16 +188,16 @@ static int bios_is_mapped(CPUState *env, uint64_t rip)
 struct kvm_sregs sregs;
 unsigned perms;
 uint32_t i;
-uint32_t offset, fixup;
+uint32_t offset, fixup, start = vapic_bios_addr ? : 0xe;
 
 if (bios_enabled)
return 1;
 
 kvm_get_sregs(env, sregs);
 
-probe = (rip  0xf000) + 0xe;
+probe = (rip  0xf000) + start;
 phys = map_addr(sregs, probe, perms);
-if (phys != 0xe)
+if (phys != start)
return 0;
 bios_addr = probe;
 for (i = 0; i  64; ++i) {
@@ -356,6 +357,17 @@ static int tpr_load(QEMUFile *f, void *s, int version_id)
 return 0;
 }
 
+static void vtpr_ioport_write16(void *opaque, uint32_t addr, uint32_t val)
+{
+struct kvm_regs regs;
+CPUState *env = cpu_single_env;
+struct kvm_sregs sregs;
+kvm_get_regs(env, regs);
+kvm_get_sregs(env, sregs);
+vapic_bios_addr = ((sregs.cs.base + regs.rip)  ~(512 - 1)) + val;
+bios_enabled = 0;
+}
+
 static void vtpr_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
 CPUState *env = cpu_single_env;
@@ -386,5 +398,6 @@ void kvm_tpr_opt_setup(void)
 {
 register_savevm(kvm-tpr-opt, 0, 1, tpr_save, tpr_load, NULL);
 register_ioport_write(0x7e, 1, 1, vtpr_ioport_write, NULL);
+register_ioport_write(0x7e, 2, 2, vtpr_ioport_write16, NULL);
 }
 
diff --git a/pc-bios/optionrom/Makefile b/pc-bios/optionrom/Makefile
index 73e74d8..67ecc63 100644
--- a/pc-bios/optionrom/Makefile
+++ b/pc-bios/optionrom/Makefile
@@ -13,7 +13,7 @@ CFLAGS += -I$(SRC_PATH)
 CFLAGS += $(call cc-option, $(CFLAGS), -fno-stack-protector)
 QEMU_CFLAGS = $(CFLAGS)
 
-build-all: multiboot.bin extboot.bin
+build-all: multiboot.bin extboot.bin vapic.bin
 
 %.img: %.o
$(call quiet-command,$(LD) -Ttext 0 -e _start -s -o $@ $,  Building 
$(TARGET_DIR)$@)
diff --git a/pc-bios/optionrom/vapic.S b/pc-bios/optionrom/vapic.S
new file mode 100644
index 000..1924eeb
--- /dev/null
+++ b/pc-bios/optionrom/vapic.S
@@ -0,0 +1,311 @@
+   .text 0
+   .code16
+.global _start
+_start:
+   .short 0xaa55
+   .byte (_end - _start) / 512
+   mov $vapic_base, %ax
+   out %ax, $0x7e
+   lret
+
+   .code32
+vapic_size = 2*4096
+
+.macro fixup delta=-4
+777:
+   .text 1
+   .long 777b + \delta  - vapic_base
+   .text 0
+.endm
+
+.macro reenable_vtpr
+   out %al, $0x7e
+.endm
+
+.text 1
+   fixup_start = .
+.text 0
+
+vapic_base:
+   .ascii kvm aPiC
+
+   /* relocation data */
+   .long vapic_base; fixup
+   .long fixup_start   ; fixup
+   .long fixup_end ; fixup
+
+   .long vapic ; fixup
+   .long vapic_size
+vcpu_shift:
+   .long 0
+real_tpr:
+   .long 0
+   .long up_set_tpr; fixup
+   .long up_set_tpr_eax; fixup
+   .long up_get_tpr_eax; fixup
+   .long up_get_tpr_ecx; fixup
+   .long up_get_tpr_edx; fixup
+   .long up_get_tpr_ebx; fixup
+   .long 0 /* esp. won't work. */
+   .long up_get_tpr_ebp; fixup
+   .long up_get_tpr_esi; fixup
+   .long up_get_tpr_edi; fixup
+   .long up_get_tpr_stack  ; fixup
+   .long mp_set_tpr; fixup
+   .long mp_set_tpr_eax; fixup
+   .long mp_get_tpr_eax; fixup
+   .long mp_get_tpr_ecx; fixup
+   .long mp_get_tpr_edx; fixup
+   .long mp_get_tpr_ebx; fixup
+   .long 0 /* esp. won't work. */
+   .long mp_get_tpr_ebp; fixup
+   .long mp_get_tpr_esi; fixup
+   .long mp_get_tpr_edi; fixup
+   .long

Re: [PATCH] Make vapic.S into optional rom


On 10/26/2009 01:42 PM, Gleb Natapov wrote:
   


Need to remove the original implementation.

What was this tested on?

--
error compiling committee.c: too many arguments to function


Re: [PATCH] Make vapic.S into optional rom

On Mon, Oct 26, 2009 at 02:31:21PM +0200, Avi Kivity wrote:
 On 10/26/2009 01:42 PM, Gleb Natapov wrote:
 
 Need to remove the original implementation.
 
That's in a submodule now. Different repository. Maybe it's worth leaving it
in for a while?

 What was this tested on?
 
WindowsXP 32 bit boot/reboot.

--
Gleb.

Re: [PATCH] Make vapic.S into optional rom


On 10/26/2009 02:33 PM, Gleb Natapov wrote:

On Mon, Oct 26, 2009 at 02:31:21PM +0200, Avi Kivity wrote:
   

On 10/26/2009 01:42 PM, Gleb Natapov wrote:

Need to remove the original implementation.

 

That's in a submodule now. Different repository. Maybe it's worth leaving it
in for a while?
   


Then we won't know which version is used.  Please send an additional patch.


What was this tested on?

 

WindowsXP 32 bit boot/reboot.
   


smp?

--
error compiling committee.c: too many arguments to function


Re: [PATCH] Make vapic.S into optional rom

On Mon, Oct 26, 2009 at 02:36:47PM +0200, Avi Kivity wrote:
 On 10/26/2009 02:33 PM, Gleb Natapov wrote:
 On Mon, Oct 26, 2009 at 02:31:21PM +0200, Avi Kivity wrote:
 On 10/26/2009 01:42 PM, Gleb Natapov wrote:
 
 Need to remove the original implementation.
 
 That's in a submodule now. Different repository. Maybe it's worth leaving it
 in for a while?
 
 Then we won't know which version is used.  Please send an additional patch.
 
 What was this tested on?
 
 WindowsXP 32 bit boot/reboot.
 
 smp?
 
Yes -smp 2

--
Gleb.

[PATCH] remove vapic.S from pcbios

Compiled as option rom now.

Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/Makefile b/Makefile
index 434d64e..bcd3ee2 100644
--- a/Makefile
+++ b/Makefile
@@ -105,8 +105,8 @@ rombios32.bin: rombios32.out rombios.h
objcopy -O binary $ $@
./biossums -pad $@
 
-rombios32.out: rombios32start.o rombios32.o vapic.o rombios32.ld
-   ld -o $@ -T rombios32.ld rombios32start.o vapic.o rombios32.o
+rombios32.out: rombios32start.o rombios32.o rombios32.ld
+   ld -o $@ -T rombios32.ld rombios32start.o rombios32.o
 
 rombios32.o: rombios32.c acpi-dsdt.hex acpi-ssdt.hex
$(GCC) -m32 -O2 -Wall -c -o $@ $
@@ -126,9 +126,6 @@ acpi-ssdt.hex: acpi-ssdt.dsl
 rombios32start.o: rombios32start.S
$(GCC) -m32 -c -o $@ $
 
-vapic.o: vapic.S
-   $(GCC) -m32 -c -o $@ $
-
 BIOS-bochs-latest: rombios16.bin rombios32.bin
cat rombios32.bin rombios16.bin  $@
 
diff --git a/rombios32.ld b/rombios32.ld
index 1fc99c3..ca31f54 100644
--- a/rombios32.ld
+++ b/rombios32.ld
@@ -6,10 +6,6 @@ SECTIONS
 . = 0x000e;
 .text : { *(.text)}
 .rodata: { *(.rodata*) }
-. = ALIGN(64);
-fixup_start = .;
-.fixup: { *(.fixup) }
-fixup_end = .;
 . = ALIGN(4096);
 _end = . ;
 .data 0x700 : AT (_end) { __data_start = .; *(.data); __data_end = .;}
diff --git a/vapic.S b/vapic.S
deleted file mode 100644
index cf2a474..000
--- a/vapic.S
+++ /dev/null
@@ -1,294 +0,0 @@
-   .text
-   .code32
-   .align 4096
-
-vapic_size = 2*4096
-
-.macro fixup delta=-4
-777:
-   .pushsection .fixup, a
-   .long 777b + \delta  - vapic_base
-   .popsection
-.endm
-
-.macro reenable_vtpr
-   out %al, $0x7e
-.endm
-
-vapic_base:
-   .ascii kvm aPiC
-
-   /* relocation data */
-   .long vapic_base; fixup
-   .long fixup_start   ; fixup
-   .long fixup_end ; fixup
-
-   .long vapic ; fixup
-   .long vapic_size
-vcpu_shift:
-   .long 0
-real_tpr:
-   .long 0
-   .long up_set_tpr; fixup
-   .long up_set_tpr_eax; fixup
-   .long up_get_tpr_eax; fixup
-   .long up_get_tpr_ecx; fixup
-   .long up_get_tpr_edx; fixup
-   .long up_get_tpr_ebx; fixup
-   .long 0 /* esp. won't work. */
-   .long up_get_tpr_ebp; fixup
-   .long up_get_tpr_esi; fixup
-   .long up_get_tpr_edi; fixup
-   .long up_get_tpr_stack  ; fixup
-   .long mp_set_tpr; fixup
-   .long mp_set_tpr_eax; fixup
-   .long mp_get_tpr_eax; fixup
-   .long mp_get_tpr_ecx; fixup
-   .long mp_get_tpr_edx; fixup
-   .long mp_get_tpr_ebx; fixup
-   .long 0 /* esp. won't work. */
-   .long mp_get_tpr_ebp; fixup
-   .long mp_get_tpr_esi; fixup
-   .long mp_get_tpr_edi; fixup
-   .long mp_get_tpr_stack  ; fixup
-
-.macro kvm_hypercall
-   .byte 0x0f, 0x01, 0xc1
-.endm
-
-kvm_hypercall_vapic_poll_irq = 1
-
-pcr_cpu = 0x51
-
-.align 64
-
-mp_get_tpr_eax:
-   pushf
-   cli
-   reenable_vtpr
-   push %ecx
-
-   fs/movzbl pcr_cpu, %eax
-
-   mov vcpu_shift, %ecx; fixup
-   shl %cl, %eax
-   testb $1, vapic+4(%eax) ; fixup delta=-5
-   jz mp_get_tpr_bad
-   movzbl vapic(%eax), %eax ; fixup
-
-mp_get_tpr_out:
-   pop %ecx
-   popf
-   ret
-
-mp_get_tpr_bad:
-   mov real_tpr, %eax  ; fixup
-   mov (%eax), %eax
-   jmp mp_get_tpr_out
-
-mp_get_tpr_ebx:
-   mov %eax, %ebx
-   call mp_get_tpr_eax
-   xchg %eax, %ebx
-   ret
-
-mp_get_tpr_ecx:
-   mov %eax, %ecx
-   call mp_get_tpr_eax
-   xchg %eax, %ecx
-   ret
-
-mp_get_tpr_edx:
-   mov %eax, %edx
-   call mp_get_tpr_eax
-   xchg %eax, %edx
-   ret
-
-mp_get_tpr_esi:
-   mov %eax, %esi
-   call mp_get_tpr_eax
-   xchg %eax, %esi
-   ret
-
-mp_get_tpr_edi:
-   mov %eax, %edi
-   call mp_get_tpr_edi
-   xchg %eax, %edi
-   ret
-
-mp_get_tpr_ebp:
-   mov %eax, %ebp
-   call mp_get_tpr_eax
-   xchg %eax, %ebp
-   ret
-
-mp_get_tpr_stack:
-   call mp_get_tpr_eax
-   xchg %eax, 4(%esp)
-   ret
-
-mp_set_tpr_eax:
-   push %eax
-   call mp_set_tpr
-   ret
-
-mp_set_tpr:
-   pushf
-   push %eax
-   push %ecx
-   push %edx
-   push %ebx
-   cli
-   reenable_vtpr
-
-mp_set_tpr_failed:
-   fs/movzbl pcr_cpu, %edx
-
-   mov vcpu_shift, %ecx; fixup
-   shl %cl, %edx
-
-   testb $1, vapic+4(%edx) ; fixup delta=-5
-   jz mp_set_tpr_bad
-
-   mov vapic(%edx), %eax   ; fixup
-
-   mov %eax, %ebx
-   mov 24(%esp), %bl
-
-   /* %ebx = new vapic (%bl = tpr, %bh = isr, %b3 = irr) */
-
-   lock cmpxchg %ebx, vapic(%edx) ; fixup
-   jnz mp_set_tpr_failed
-
-   /* compute ppr */
-   cmp %bh, %bl
-   jae mp_tpr_is_bigger

Re: [PATCH] Make vapic.S into optional rom


On 10/26/2009 01:42 PM, Gleb Natapov wrote:

Signed-off-by: Gleb Natapov g...@redhat.com
   



Applied this and the pcbios patch as well.  Thanks.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

buildbot failure in qemu-kvm on default_x86_64_out_of_tree

The Buildbot has detected a new failure of default_x86_64_out_of_tree on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/default_x86_64_out_of_tree/builds/67

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com,Gleb Natapov g...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot


buildbot failure in qemu-kvm on default_x86_64_debian_5_0

The Buildbot has detected a new failure of default_x86_64_debian_5_0 on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/default_x86_64_debian_5_0/builds/126

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com,Gleb Natapov g...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot


buildbot failure in qemu-kvm on default_i386_out_of_tree

The Buildbot has detected a new failure of default_i386_out_of_tree on qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_out_of_tree/builds/65

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com,Gleb Natapov g...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot


buildbot failure in qemu-kvm on default_i386_debian_5_0

The Buildbot has detected a new failure of default_i386_debian_5_0 on qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_debian_5_0/builds/128

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com,Gleb Natapov g...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot


Re: [Alacrityvm-devel] [KVM PATCH v2 1/2] KVM: export lockless GSI attribute

Avi Kivity wrote:
 On 10/23/2009 04:38 AM, Gregory Haskins wrote:
 Certain GSIs support lockless injection, but we have no way to detect
 which ones at the GSI level.  Knowledge of this attribute will be
 useful later in the series so that we can optimize irqfd injection
 paths for cases where we know the code will not sleep.  Therefore,
 we provide an API to query a specific GSI.


 
 Instead of a lockless attribute, how about a ->set_atomic() method.  For 
 msi this can be the same as ->set(), for non-msi it can be a function 
 that schedules the work (which will eventually call ->set()).
 
 The benefit is that we make a decision only once, when preparing the 
 routing entry, and install that decision in the routing entry instead of 
 making it again and again later.

Yeah, I like this idea.  I think we can also get rid of the custom
workqueue if we do this as well, TBD.
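
For illustration only, the ->set_atomic() idea could look roughly like the
user-space sketch below.  All names here (route_entry, route_setup,
route_defer, route_run_deferred, ROUTE_MSI) are hypothetical and are not
taken from the KVM tree.  The sketch also reduces the deferred path to a
simple counter, whereas real code would likely have to carry per-request
state across the deferral.

```c
#include <assert.h>
#include <stddef.h>

enum route_type { ROUTE_IRQCHIP, ROUTE_MSI };   /* hypothetical names */

struct route_entry {
    enum route_type type;
    int delivered;      /* number of calls that reached set() */
    int pending;        /* deferred requests not yet run */
    void (*set)(struct route_entry *e);
    void (*set_atomic)(struct route_entry *e);
};

/* The "real" injection; for non-MSI entries this may sleep in the kernel. */
void route_set(struct route_entry *e)
{
    e->delivered++;
}

/* Atomic-safe variant for entries whose set() may sleep: only record work. */
void route_defer(struct route_entry *e)
{
    e->pending++;
}

/* Stand-in for the worker that later runs the deferred requests. */
void route_run_deferred(struct route_entry *e)
{
    while (e->pending > 0) {
        e->pending--;
        e->set(e);
    }
}

/* The decision is made once, when the routing entry is prepared. */
void route_setup(struct route_entry *e, enum route_type type)
{
    assert(e != NULL);
    e->type = type;
    e->delivered = 0;
    e->pending = 0;
    e->set = route_set;
    e->set_atomic = (type == ROUTE_MSI) ? route_set : route_defer;
}
```

A caller in atomic context would then always invoke e->set_atomic(e) and
never needs to re-check the entry type on the hot path.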

 
 +int kvm_irq_check_lockless(struct kvm *kvm, u32 irq)

 
 bool kvm_irq_check_lockless(...)

We lose the ability to detect failure (such as ENOENT) if we do this,
but it's moot if we move to the ->set_atomic() model, since this
attribute is no longer necessary and this patch can be dropped.

Kind Regards,
-Greg




RE: [Qemu-devel] net packet storms with multiple NICs

2009-10-26 Thread Krumme, Chris

 -----Original Message-----
 From: qemu-devel-bounces+chris.krumme=windriver@nongnu.org
 [mailto:qemu-devel-bounces+chris.krumme=windriver@nongnu.org] On Behalf Of Avi Kivity
 Sent: Sunday, October 25, 2009 9:23 AM
 To: Mark McLoughlin
 Cc: Michael Tokarev; qemu-de...@nongnu.org; KVM list
 Subject: Re: [Qemu-devel] net packet storms with multiple NICs

 On 10/23/2009 06:43 PM, Mark McLoughlin wrote:
  On Fri, 2009-10-23 at 20:25 +0400, Michael Tokarev wrote:

  I've two questions:

  o what's the intended usage of all-vlan-equal case, when kvm (or qemu)
    reflects packets from one interface to another?  It's what bridge
    in linux is for, I think.

  I don't think it's necessarily an intended use-case for the vlan feature

 Well, it is.  vlan=x really means the ethernet segment named x.  If 
 you connect all your guest nics to one vlan, you are connecting them all 
 to one ethernet segment, so any packet transmitted on one will be 
 reflected on others.

 Whether this is a useful feature is another matter, but the code is 
 functioning as expected.
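
As a rough model of the behaviour described above (not QEMU's actual code),
a shared segment can be pictured as a hub that hands every transmitted
frame to all other attached NICs.  The names segment, segment_init and
segment_transmit are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NICS 8

struct segment {
    int nics;                   /* number of NICs attached to the segment */
    int rx_count[MAX_NICS];     /* frames received by each NIC */
};

void segment_init(struct segment *s, int nics)
{
    assert(s != NULL && nics >= 0 && nics <= MAX_NICS);
    s->nics = nics;
    for (int i = 0; i < MAX_NICS; i++)
        s->rx_count[i] = 0;
}

/* Deliver a frame sent by NIC 'sender' to every other NIC on the segment. */
int segment_transmit(struct segment *s, int sender)
{
    int delivered = 0;

    assert(sender >= 0 && sender < s->nics);
    for (int i = 0; i < s->nics; i++) {
        if (i == sender)
            continue;
        s->rx_count[i]++;
        delivered++;
    }
    return delivered;
}
```

With several guest NICs on one segment, each frame sent by one of them is
seen by all the others, which is the reflection being discussed.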

Hello,

We had one environment where the NIC understood by u-boot and the NIC
understood by the kernel were different.  We just attached both to the
same VLAN.  During u-boot one was used for downloading the kernel, then
once the kernel booted the other was used.  Not ideal, and maybe not
important enough to keep the feature around, but it does get used now
and again.

Thanks

Chris 

 -- 
 error compiling committee.c: too many arguments to function


Re: [Qemu-devel] net packet storms with multiple NICs


On 10/26/2009 03:40 PM, Krumme, Chris wrote:



Well, it is.  vlan=x really means the ethernet segment named x.  If
you connect all your guest nics to one vlan, you are connecting them all
to one ethernet segment, so any packet transmitted on one will be
reflected on others.

Whether this is a useful feature is another matter, but the code is
functioning as expected.
 

Hello,

We had one environment where the NIC understood by u-boot and the NIC
understood by the kernel were different.  We just attached both to the
same VLAN.  During u-boot one was used for downloading the kernel, then
once the kernel booted the other was used.  Not ideal, and maybe not
important enough to keep the feature around, but it does get used now
and again.
   


You could get the same behaviour by using two different vlans connected 
to the same bridge.


--
error compiling committee.c: too many arguments to function


[PATCH] KVM test: Unattended install: Mount isos as read only

2009-10-26 Thread Lucas Meneghel Rodrigues

Sometimes CD images can be located on read-only NFS shares,
so always pass the ro option to the CD mount command
in the unattended.py setup script.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/scripts/unattended.py |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/scripts/unattended.py b/client/tests/kvm/scripts/unattended.py
index febea6e..2667649 100755
--- a/client/tests/kvm/scripts/unattended.py
+++ b/client/tests/kvm/scripts/unattended.py
@@ -136,8 +136,8 @@ class UnattendedInstall(object):
 pxe_dest = os.path.join(self.tftp_root, 'pxelinux.0')
 shutil.copyfile(pxe_file, pxe_dest)
 
-m_cmd = 'mount -t iso9660 -v -o loop %s %s' % (self.cdrom_iso,
-   self.cdrom_mount)
+m_cmd = 'mount -t iso9660 -v -o loop,ro %s %s' % (self.cdrom_iso,
+  self.cdrom_mount)
 if os.system(m_cmd):
 raise SetupError('Could not mount CD image %s.' % self.cdrom_iso)
 
-- 
1.6.2.5


Re: [Alacrityvm-devel] [KVM PATCH v2 1/2] KVM: export lockless GSI attribute

Gregory Haskins wrote:
 Avi Kivity wrote:
 On 10/23/2009 04:38 AM, Gregory Haskins wrote:
 Certain GSIs support lockless injection, but we have no way to detect
 which ones at the GSI level.  Knowledge of this attribute will be
 useful later in the series so that we can optimize irqfd injection
 paths for cases where we know the code will not sleep.  Therefore,
 we provide an API to query a specific GSI.


 Instead of a lockless attribute, how about a ->set_atomic() method.  For 
 msi this can be the same as ->set(), for non-msi it can be a function 
 that schedules the work (which will eventually call ->set()).

 The benefit is that we make a decision only once, when preparing the 
 routing entry, and install that decision in the routing entry instead of 
 making it again and again later.
 
 Yeah, I like this idea.  I think we can also get rid of the custom
 workqueue if we do this as well, TBD.

So I looked into this.  It isn't straightforward because you need to
retain some kind of state across the deferment on a per-request basis
(not per-GSI).  Today, this state is neatly tracked in the irqfd
object itself (e.g. it knows to toggle the GSI).

So while generalizing this perhaps makes sense at some point, especially
if irqfd-like interfaces get added, it probably doesn't make a ton of
sense to expend energy on it ATM.  It is basically a generalization of
the irqfd deferment code.  Let's just wait until we have a user beyond
irqfd for now.  Sound acceptable?

In the meantime, I found a bug in the irq_routing code, so I will submit
a v3 with this fix, as well as a few other things I improved in the v2
series.

Kind Regards,
-Greg




[KVM PATCH v3 0/3] irqfd enhancements, and irq_routing fixes

(Applies to kvm.git/master:11b06403)

The following patches are cleanups/enhancements for IRQFD now that
we have lockless interrupt injection.  For more details, please see
the patch headers.

These patches pass checkpatch, and are fully tested.  Please consider
for merging.  Patch 1/3 is a fix for an issue that may exist upstream
and should be considered for a more timely push upstream.  Patches 2/3
- 3/3 are an enhancement only, so there is no urgency to push to
mainline until a suitable merge window presents itself.

Kind Regards,
-Greg

[ Change log:

  v3:
 *) Added patch 1/3 as a fix for a race condition
 *) Minor cleanup to 2/3 to ensure that all shared vectors conform
to a unified locking model.

  v2:
 *) dropped original cleanup which relied on the user registering
MSI based GSIs or we may crash at runtime.  Instead, we now
check at registration whether the GSI supports lockless
operation and dynamically adapt to either the original
deferred path for lock-based injections, or direct for lockless.

  v1:
 *) original release
]

---

Gregory Haskins (3):
  KVM: Directly inject interrupts if they support lockless operation
  KVM: export lockless GSI attribute
  KVM: fix race in irq_routing logic


 include/linux/kvm_host.h |8 
 virt/kvm/eventfd.c   |   31 +++--
 virt/kvm/irq_comm.c  |   85 ++
 virt/kvm/kvm_main.c  |1 +
 4 files changed, 98 insertions(+), 27 deletions(-)

-- 
Signature

[KVM PATCH v3 3/3] KVM: Directly inject interrupts if they support lockless operation

IRQFD currently uses a deferred workqueue item to execute the injection
operation.  It was originally designed this way because kvm_set_irq()
required the caller to hold the irq_lock mutex, and the eventfd callback
is invoked from within a non-preemptible critical section.

With the advent of lockless injection support for certain GSIs, the
deferment mechanism is no longer technically needed in all cases.
Since context switching to the workqueue is a source of interrupt
latency, let's switch to a direct method whenever possible.  Fortunately
for us, the most common use of irqfd (MSI-based GSIs) readily supports
lockless injection.

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 virt/kvm/eventfd.c |   31 +++
 1 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 30f70fd..e6cc958 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -51,20 +51,34 @@ struct _irqfd {
wait_queue_t  wait;
struct work_structinject;
struct work_structshutdown;
+   void (*execute)(struct _irqfd *);
 };
 
 static struct workqueue_struct *irqfd_cleanup_wq;
 
 static void
-irqfd_inject(struct work_struct *work)
+irqfd_inject(struct _irqfd *irqfd)
 {
-   struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
struct kvm *kvm = irqfd->kvm;
 
kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
 }
 
+static void
+irqfd_deferred_inject(struct work_struct *work)
+{
+   struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
+
+   irqfd_inject(irqfd);
+}
+
+static void
+irqfd_schedule(struct _irqfd *irqfd)
+{
+   schedule_work(&irqfd->inject);
+}
+
 /*
  * Race-free decouple logic (ordering is critical)
  */
@@ -126,7 +140,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
 
if (flags & POLLIN)
/* An event has been signaled, inject an interrupt */
-   schedule_work(&irqfd->inject);
+   irqfd->execute(irqfd);
 
if (flags & POLLHUP) {
/* The eventfd is closing, detach from KVM */
@@ -179,7 +193,7 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
irqfd->kvm = kvm;
irqfd->gsi = gsi;
INIT_LIST_HEAD(&irqfd->list);
-   INIT_WORK(&irqfd->inject, irqfd_inject);
+   INIT_WORK(&irqfd->inject, irqfd_deferred_inject);
INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
 
file = eventfd_fget(fd);
@@ -209,6 +223,15 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
list_add_tail(&irqfd->list, &kvm->irqfds.items);
spin_unlock_irq(&kvm->irqfds.lock);
 
+   ret = kvm_irq_check_lockless(kvm, gsi);
+   if (ret < 0)
+   goto fail;
+
+   if (ret)
+   irqfd->execute = irqfd_inject;
+   else
+   irqfd->execute = irqfd_schedule;
+
/*
 * Check if there was an event already pending on the eventfd
 * before we registered, and trigger it as if we didn't miss it.


[KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

The current code suffers from the following race condition:

thread-1                                thread-2
-----------------------------------------------------------

kvm_set_irq() {
   rcu_read_lock()
   irq_rt = rcu_dereference(table);
   rcu_read_unlock();

                                        kvm_set_irq_routing() {
                                           mutex_lock();
                                           irq_rt = table;
                                           rcu_assign_pointer();
                                           mutex_unlock();
                                           synchronize_rcu();

                                           kfree(irq_rt);

   irq_rt->entry->set(); /* bad */

-----------------------------------------------------------

The race exists because the pointer is accessed outside of the read-side
critical section.  There are two basic patterns we can use to fix this bug:

1) Switch to sleeping-rcu and encompass the ->set() access within the
   read-side critical section,

   OR

2) Add reference counting to the irq_rt structure, and simply acquire
   the reference from within the RSCS.

This patch implements solution (1).
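
For comparison, here is a minimal user-space sketch of the alternative,
solution (2), which this patch does not implement.  The names
routing_table, table_alloc, table_get and table_put are hypothetical, and
C11 atomics stand in for the kernel's reference-counting primitives.  In
the kernel, table_get() would have to run inside the RCU read-side
critical section so that the table cannot be freed between the pointer
load and the increment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct routing_table {
    atomic_int refs;    /* the publisher holds one reference */
    int nr_entries;
};

struct routing_table *table_alloc(int nr_entries)
{
    struct routing_table *t = malloc(sizeof(*t));

    assert(t != NULL);
    atomic_init(&t->refs, 1);
    t->nr_entries = nr_entries;
    return t;
}

/* Reader side: pin the table before using it outside the read section. */
struct routing_table *table_get(struct routing_table *t)
{
    atomic_fetch_add(&t->refs, 1);
    return t;
}

/* Returns 1 if this call dropped the last reference and freed the table. */
int table_put(struct routing_table *t)
{
    if (atomic_fetch_sub(&t->refs, 1) == 1) {
        free(t);
        return 1;
    }
    return 0;
}
```

With this scheme the updater would call table_put() where it currently
calls kfree(), and the table stays alive until the last reader drops its
reference.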

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 include/linux/kvm_host.h |6 +-
 virt/kvm/irq_comm.c  |   50 +++---
 virt/kvm/kvm_main.c  |1 +
 3 files changed, 35 insertions(+), 22 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bd5a616..1fe135d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -185,7 +185,10 @@ struct kvm {
 
struct mutex irq_lock;
 #ifdef CONFIG_HAVE_KVM_IRQCHIP
-   struct kvm_irq_routing_table *irq_routing;
+   struct {
+   struct srcu_structsrcu;
+   struct kvm_irq_routing_table *table;
+   } irq_routing;
struct hlist_head mask_notifier_list;
struct hlist_head irq_ack_notifier_list;
 #endif
@@ -541,6 +544,7 @@ int kvm_set_irq_routing(struct kvm *kvm,
const struct kvm_irq_routing_entry *entries,
unsigned nr,
unsigned flags);
+void kvm_init_irq_routing(struct kvm *kvm);
 void kvm_free_irq_routing(struct kvm *kvm);
 
 #else
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 00c68d2..db2553f 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -144,10 +144,11 @@ static int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
  */
 int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
 {
-   struct kvm_kernel_irq_routing_entry *e, irq_set[KVM_NR_IRQCHIPS];
-   int ret = -1, i = 0;
+   struct kvm_kernel_irq_routing_entry *e;
+   int ret = -1;
struct kvm_irq_routing_table *irq_rt;
struct hlist_node *n;
+   int idx;
 
trace_kvm_set_irq(irq, level, irq_source_id);
 
@@ -155,21 +156,19 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
 * IOAPIC.  So set the bit in both. The guest will ignore
 * writes to the unused one.
 */
-   rcu_read_lock();
-   irq_rt = rcu_dereference(kvm->irq_routing);
+   idx = srcu_read_lock(&kvm->irq_routing.srcu);
+   irq_rt = rcu_dereference(kvm->irq_routing.table);
if (irq < irq_rt->nr_rt_entries)
-   hlist_for_each_entry(e, n, &irq_rt->map[irq], link)
-   irq_set[i++] = *e;
-   rcu_read_unlock();
+   hlist_for_each_entry(e, n, &irq_rt->map[irq], link) {
+   int r;
 
-   while(i--) {
-   int r;
-   r = irq_set[i].set(&irq_set[i], kvm, irq_source_id, level);
-   if (r < 0)
-   continue;
+   r = e->set(e, kvm, irq_source_id, level);
+   if (r < 0)
+   continue;
 
-   ret = r + ((ret < 0) ? 0 : ret);
-   }
+   ret = r + ((ret < 0) ? 0 : ret);
+   }
+   srcu_read_unlock(&kvm->irq_routing.srcu, idx);
 
return ret;
 }
@@ -179,17 +178,18 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
struct kvm_irq_ack_notifier *kian;
struct hlist_node *n;
int gsi;
+   int idx;
 
trace_kvm_ack_irq(irqchip, pin);
 
-   rcu_read_lock();
-   gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
+   idx = srcu_read_lock(&kvm->irq_routing.srcu);
+   gsi = rcu_dereference(kvm->irq_routing.table)->chip[irqchip][pin];
if (gsi != -1)
hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list,
 link)
if (kian->gsi == gsi)
kian->irq_acked(kian);
-   rcu_read_unlock();
+   srcu_read_unlock(&kvm->irq_routing.srcu, idx);
 }
 
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
@@ -287,11

[KVM PATCH v3 2/3] KVM: export lockless GSI attribute

Certain GSIs support lockless injection, but we have no way to detect
which ones at the GSI level.  Knowledge of this attribute will be
useful later in the series so that we can optimize irqfd injection
paths for cases where we know the code will not sleep.  Therefore,
we provide an API to query a specific GSI.

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 include/linux/kvm_host.h |2 ++
 virt/kvm/irq_comm.c  |   35 ++-
 2 files changed, 36 insertions(+), 1 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1fe135d..01151a6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -119,6 +119,7 @@ struct kvm_memory_slot {
 struct kvm_kernel_irq_routing_entry {
u32 gsi;
u32 type;
+   bool lockless;
int (*set)(struct kvm_kernel_irq_routing_entry *e,
   struct kvm *kvm, int irq_source_id, int level);
union {
@@ -420,6 +421,7 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
   unsigned long *deliver_bitmask);
 #endif
 int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
+int kvm_irq_check_lockless(struct kvm *kvm, u32 irq);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
   struct kvm_irq_ack_notifier *kian);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index db2553f..a7fd487 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -173,6 +173,35 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
return ret;
 }
 
+int kvm_irq_check_lockless(struct kvm *kvm, u32 irq)
+{
+   struct kvm_kernel_irq_routing_entry *e;
+   struct kvm_irq_routing_table *irq_rt;
+   struct hlist_node *n;
+   int ret = -ENOENT;
+   int idx;
+
+   idx = srcu_read_lock(&kvm->irq_routing.srcu);
+   irq_rt = rcu_dereference(kvm->irq_routing.table);
+   if (irq < irq_rt->nr_rt_entries)
+   hlist_for_each_entry(e, n, &irq_rt->map[irq], link) {
+   if (!e->lockless) {
+   /*
+* all destinations need to be lockless to
+* declare that the GSI as a whole is also
+* lockless
+*/
+   ret = 0;
+   break;
+   }
+
+   ret = 1;
+   }
+   srcu_read_unlock(&kvm->irq_routing.srcu, idx);
+
+   return ret;
+}
+
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
 {
struct kvm_irq_ack_notifier *kian;
@@ -310,18 +339,22 @@ static int setup_routing_entry(struct kvm_irq_routing_table *rt,
int delta;
struct kvm_kernel_irq_routing_entry *ei;
struct hlist_node *n;
+   bool lockless = ue->type == KVM_IRQ_ROUTING_MSI;
 
/*
 * Do not allow GSI to be mapped to the same irqchip more than once.
 * Allow only one to one mapping between GSI and MSI.
+* Do not allow mixed lockless vs locked variants to coexist.
 */
hlist_for_each_entry(ei, n, &rt->map[ue->gsi], link)
if (ei->type == KVM_IRQ_ROUTING_MSI ||
-   ue->u.irqchip.irqchip == ei->irqchip.irqchip)
+   ue->u.irqchip.irqchip == ei->irqchip.irqchip ||
+   ei->lockless != lockless)
return r;
 
e->gsi = ue->gsi;
e->type = ue->type;
+   e->lockless = lockless;
switch (ue-type) {
case KVM_IRQ_ROUTING_IRQCHIP:
delta = 0;
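
The three-way return value of kvm_irq_check_lockless() can be easy to
misread in diff form, so here is a stand-alone sketch of the same
decision rule over a plain array.  The function name gsi_check_lockless
and its array-based signature are assumptions made for illustration; the
patch itself walks an hlist under SRCU.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns -ENOENT when the GSI has no destinations, 0 when at least one
 * destination needs a lock, and 1 when every destination is lockless.
 */
int gsi_check_lockless(const bool *lockless, size_t n)
{
    int ret = -ENOENT;

    assert(lockless != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!lockless[i])
            return 0;   /* one locked destination taints the whole GSI */
        ret = 1;
    }
    return ret;
}
```

A caller such as the irqfd assignment path would treat a negative value
as an error, 1 as permission to inject directly, and 0 as a request to
keep using the deferred path.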


[ANNOUNCE] kvm-kmod-2.6.31.5

This package contains the kvm external modules, using the sources from
latest stable Linux release 2.6.31.5. It can be used to update the
kernel-side support of kvm without upgrading the host kernel.

This release has been tested on x86 down to host kernel 2.6.27 and
builds down to 2.6.24. Building against older kernels is expected to be
broken, but if anyone provides patches to fix it, I'm open to merge them.

Enjoy,
Jan

Re: Jan Kiszka to maintain kvm-kmod

Avi Kivity wrote:
I am pleased to announce that Jan Kiszka has agreed to maintain
kvm-kmod.git, the backporting kit that allows running modern kvm code on
older kernels. Jan will release kvm-kmod-2.6.x.y packages and
kvm-kmod-2.6.x-rcy packages, while Marcelo and I will (with Jan's help)
release kvm-kmod-devel-xx. Many thanks to Jan for taking on this task.

Thanks for giving me the chance to screw even more things up. :)

Thanks also go to Siemens Corporate Technology and Siemens Enterprise
Communications for sponsoring my work on kvm-kmod.

As there are now many different sources of kvm kernel modules to choose
from, I wrote up a page that describes the various releases and what
they are suited for. This can be found in
http://www.linux-kvm.org/page/Getting_the_kvm_kernel_modules.

And besides those releases, I will try to keep the kvm-kmod.git in sync
with latest kvm.git so that developers can test most bleeding-edge kvm
on not that much bleeding host kernels (I'm one of those).

At this chance I would like to underline that the quality of kvm-kmod
support of course continues to depend on patch contributions. So if you
are posting a new kvm feature that may require compat wrapping or you
discover some breakage, please consider posting a corresponding update
of kvm-kmod as well. TiA!

To help detect breakages, I've set up a buildbot [1] that checks
kvm-kmod against its officially supported kvm version as well as the next
branch in kvm.git (the former on commits, the latter on a nightly
basis). That forecast already promises the next rain [2] - time to go
home...

Jan

[1]http://buildbot.kiszka.org/kvm-kmod/
[2]http://buildbot.kiszka.org/kvm-kmod/builders/latest-kvm/builds/11/steps/compile/logs/stdio

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

Re: qemu-kvm: sigsegv at exit

 On Thu, Oct 22, 2009 at 06:57:27PM -0200, Marcelo Tosatti wrote:
  On Thu, Oct 22, 2009 at 02:00:15PM +0200, Michael S. Tsirkin wrote:
   Hi!
   I'm sometimes getting segfaults when I kill qemu.
   This time I caught it when qemu was under gdb:
   
   
   Program received signal SIGSEGV, Segmentation fault.
   [Switching to Thread 0x411d0940 (LWP 14446)]
   0x0040afb4 in qemu_mod_timer (ts=0x19f0fd0, expire_time=62275467335)
       at /home/mst/scm/qemu-kvm/vl.c:1009
   1009            if ((alarm_timer->flags & ALARM_FLAG_EXPIRED) == 0) {
   (gdb) l
   1004        ts->next = *pt;
   1005        *pt = ts;
   1006
   1007        /* Rearm if necessary  */
   1008        if (pt == &active_timers[ts->clock->type]) {
   1009            if ((alarm_timer->flags & ALARM_FLAG_EXPIRED) == 0) {
   1010                qemu_rearm_alarm_timer(alarm_timer);
   1011            }
   1012            /* Interrupt execution to force deadline recalculation.  */
   1013            if (use_icount)
   (gdb) p alarm_timer
   $1 = (struct qemu_alarm_timer *) 0x0
   (gdb) where
   #0  0x0040afb4 in qemu_mod_timer (ts=0x19f0fd0, expire_time=62275467335)
       at /home/mst/scm/qemu-kvm/vl.c:1009
   #1  0x0041aadf in virtio_net_handle_tx (vdev=<value optimized out>, vq=0x19f5af0)
       at /home/mst/scm/qemu-kvm/hw/virtio-net.c:696
   #2  0x00421669 in kvm_run (vcpu=0x19d46a0, env=0x19c2250) at /home/mst/scm/qemu-kvm/qemu-kvm.c:797
   #3  0x004216d6 in kvm_cpu_exec (env=0x83d0f8) at /home/mst/scm/qemu-kvm/qemu-kvm.c:1714
   #4  0x00422981 in ap_main_loop (_env=<value optimized out>) at /home/mst/scm/qemu-kvm/qemu-kvm.c:1969
   #5  0x00377dc06367 in start_thread () from /lib64/libpthread.so.0
   #6  0x00377d0d30ad in clone () from /lib64/libc.so.6
   (gdb)
   
   So this probably means that we have already run quit_timers:
   
   static void quit_timers(void)
   {
   alarm_timer->stop(alarm_timer);
   alarm_timer = NULL;
   }
   
   but kvm vcpu thread is still running.
   
   
   Not sure what the right fix is here: should we stop
   kvm after main loop has exited?
  
  kvm_main_loop_wait(env, 0) can process the stop request (signalling
  iothread that vcpu is stopped, so its OK to exit) and continue to
  kvm_cpu_exec.
  
  Can you please try this:
 
 I applied this, and have not yet seen any segfaults at exit.
 Not sure whether this means anything, as the crash is not
 100% reproducible. Push it out to Anthony and we'll see, long term?
 Based on the knowledge of how to fix this,
 how would you go about reproducing it?

Add code to trigger the race manually, but I'm pretty sure that's it.

Thanks for testing.


Re: [PATCH] virtio-net: fix data corruption with OOM

On Mon, Oct 26, 2009 at 12:11:51PM +1030, Rusty Russell wrote:
 On Mon, 26 Oct 2009 03:33:40 am Michael S. Tsirkin wrote:
  virtio net used to unlink skbs from send queues on error,
  but ever since 48925e372f04f5e35fec6269127c62b2c71ab794
  we do not do this. This causes guest data corruption and crashes
  with vhost since net core can requeue the skb or free it without
  it being taken off the list.
  
  This patch fixes this by queueing the skb after successful
  transmit.
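
The ordering described in the patch summary above can be pictured with the
small user-space sketch below.  It is not the virtio-net code: txq,
ring_add and xmit are hypothetical stand-ins, and a tiny fixed-size ring
replaces the virtqueue.  The point is only that a buffer is linked into
the driver's send list after the ring has accepted it, so a failed add
leaves no stale link behind.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 2     /* deliberately tiny so the error path is easy to hit */
#define SEND_MAX  8

struct txq {
    int ring_used;          /* slots taken in the stand-in ring */
    int send[SEND_MAX];     /* ids of buffers the device now owns */
    int send_len;
};

/* Stand-in for handing a buffer to the device; fails when the ring is full. */
int ring_add(struct txq *q)
{
    if (q->ring_used >= RING_SIZE)
        return -1;
    q->ring_used++;
    return 0;
}

/* Track the buffer only once the ring has accepted it. */
int xmit(struct txq *q, int id)
{
    assert(q != NULL && q->send_len < SEND_MAX);
    if (ring_add(q) < 0)
        return -1;      /* caller may requeue or free; nothing to unlink */
    q->send[q->send_len++] = id;
    return 0;
}
```

On failure the caller still owns the buffer outright, which is the
property the network core relies on when it requeues or frees it.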
 
 I originally thought that this was racy: as soon as we do add_buf, we need to
 make sure we're ready for the callback (for virtio_pci, it's ->kick, but we
 shouldn't rely on that).

Modified the guest slightly, and I am getting crashes again.
I didn't have time to debug this, but based on previous experience,
I reverted 48925e372f04f5e35fec6269127c62b2c71ab794,
and the crash went away.
Rusty, what do you say we just revert 48925e372f04f5e35fec6269127c62b2c71ab794
for now?

How to reproduce: I used my vhost trees, and modified drivers/vhost/vhost.c :
-   vhost_workqueue = create_workqueue("vhost");
+   vhost_workqueue = create_singlethread_workqueue("vhost");

My guess is this modifies timing and uncovers more races,
but of course there is a possibility that the bug is in vhost.
Still, the fact that 2.6.31 and 48925e372f04f5e35fec6269127c62b2c71ab794
as a guest are both fine, this is a strong hint that
48925e372f04f5e35fec6269127c62b2c71ab794 is to blame.

[   24.555691] BUG: unable to handle kernel NULL pointer dereference at 0008
[   24.556658] IP: [a003f1b1] free_old_xmit_skbs+0x66/0xcd [virtio_net]
[   24.556658] PGD 3e9ee067 PUD 3f38d067 PMD 0
[   24.556658] Thread overran stack, or stack corrupted
[   24.556658] Oops: 0002 [#1] SMP
[   24.556658] last sysfs file: /sys/devices/virtual/input/input1/capabilities/sw
[   24.556658] CPU 0
[   24.556658] Modules linked in: virtio_net virtio_blk virtio_pci virtio_ring virtio af_packet aacraid [last unloaded: scsi_wait_scan]
[   24.556658] Pid: 0, comm: swapper Tainted: GW  2.6.32-rc4-net #6
[   24.556658] RIP: 0010:[a003f1b1]  [a003f1b1] free_old_xmit_skbs+0x66/0xcd [virtio_net]
[   24.556658] RSP: 0018:880001c03d70  EFLAGS: 00010202
[   24.556658] RAX: 88003e951418 RBX: 88003e953398 RCX:
[   24.556658] RDX:  RSI: 880001c03d84 RDI: 88003e953398
[   24.556658] RBP: 880001c03db0 R08: 88003e2c949c R09:
[   24.556658] R10: 880001c03f78 R11: fffbcc57 R12: 88003e65cdc0
[   24.556658] R13:  R14: 2000 R15: 880001c03d84
[   24.556658] FS:  () GS:880001c0() knlGS:
[   24.556658] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
[   24.556658] CR2: 0008 CR3: 3eee4000 CR4: 06b0
[   24.556658] DR0:  DR1:  DR2:
[   24.556658] DR3:  DR6: 0ff0 DR7: 0400
[   24.556658] Process swapper (pid: 0, threadinfo 8174e000, task 817c09f0)
[   24.556658] Stack:
[   24.556658]  0002   88003e953398
[   24.556658] 0 88003e953398 88003e65cdc0 88003e65c800 88003e65ce70
[   24.556658] 0 880001c03df0 a003fb35 88003e65cc28 88003e953398
[   24.556658] Call Trace:
[   24.556658]  <IRQ>
[   24.556658]  [a003fb35] start_xmit+0x38/0x15f [virtio_net]
[   24.556658]  [813ff768] dev_hard_start_xmit+0x26c/0x2d3
[   24.556658]  [81412016] sch_direct_xmit+0x5a/0x157

KVM: VMX: move CR3/PDPTR update to vmx_set_cr3


GUEST_CR3 is updated via kvm_set_cr3 whenever CR3 is modified from
outside guest context. Similarly pdptrs are updated via load_pdptrs.

Let kvm_set_cr3 perform the update, removing it from the vcpu_run
fast path.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: b/arch/x86/kvm/vmx.c
===
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1748,6 +1748,7 @@ static void vmx_set_cr3(struct kvm_vcpu 
 		vmcs_write64(EPT_POINTER, eptp);
 		guest_cr3 = is_paging(vcpu) ? vcpu->arch.cr3 :
 			vcpu->kvm->arch.ept_identity_map_addr;
+		ept_load_pdptrs(vcpu);
 	}
 
 	vmx_flush_tlb(vcpu);
@@ -3638,10 +3639,6 @@ static void vmx_vcpu_run(struct kvm_vcpu
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
-	if (enable_ept && is_paging(vcpu)) {
-		vmcs_writel(GUEST_CR3, vcpu->arch.cr3);
-		ept_load_pdptrs(vcpu);
-	}
 	/* Record the guest's net vcpu time for enforced NMI injections. */
 	if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
 		vmx->entry_time = ktime_get();
Index: b/arch/x86/kvm/x86.c
===
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4517,8 +4517,10 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct
 
 	mmu_reset_needed |= vcpu->arch.cr4 != sregs->cr4;
 	kvm_x86_ops->set_cr4(vcpu, sregs->cr4);
-	if (!is_long_mode(vcpu) && is_pae(vcpu))
+	if (!is_long_mode(vcpu) && is_pae(vcpu)) {
 		load_pdptrs(vcpu, vcpu->arch.cr3);
+		mmu_reset_needed = 1;
+	}
 
 	if (mmu_reset_needed)
 		kvm_mmu_reset_context(vcpu);
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

performance regression in virtio-net in 2.6.32-rc4

Hi!
I noticed a performance regression in virtio net: going from
2.6.31 to 2.6.32-rc4 I see this, for guest to host communication:

[...@tuck ~]$ ssh robin sh streamtest1
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.3
(11.0.0.3) port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.20    7806.48


[...@tuck ~]$ ssh robin sh streamtest1
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.3
(11.0.0.3) port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    6814.60


Note: I had to revert 48925e372f04f5e35fec6269127c62b2c71ab794,
and I applied a patch
virtio-pci: fix per-vq MSI-X request logic
which fixes a bug introduced by f68d24082e22ccee3077d11aeb6dc5354f0ca7f1.

Any tips on debugging this?

-- 
MST

Re: qemu-kvm: sigsegv at exit

On Mon, Oct 26, 2009 at 04:43:11PM -0200, Marcelo Tosatti wrote:
  On Thu, Oct 22, 2009 at 06:57:27PM -0200, Marcelo Tosatti wrote:
   On Thu, Oct 22, 2009 at 02:00:15PM +0200, Michael S. Tsirkin wrote:
Hi!
I'm sometimes getting segfaults when I kill qemu.
This time I caught it when qemu was under gdb:


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x411d0940 (LWP 14446)]
0x0040afb4 in qemu_mod_timer (ts=0x19f0fd0, expire_time=62275467335)
    at /home/mst/scm/qemu-kvm/vl.c:1009
1009            if ((alarm_timer->flags & ALARM_FLAG_EXPIRED) == 0) {
(gdb) l
1004        ts->next = *pt;
1005        *pt = ts;
1006
1007        /* Rearm if necessary  */
1008        if (pt == &active_timers[ts->clock->type]) {
1009            if ((alarm_timer->flags & ALARM_FLAG_EXPIRED) == 0) {
1010                qemu_rearm_alarm_timer(alarm_timer);
1011            }
1012            /* Interrupt execution to force deadline recalculation.  */
1013            if (use_icount)
(gdb) p alarm_timer
$1 = (struct qemu_alarm_timer *) 0x0
(gdb) where
#0  0x0040afb4 in qemu_mod_timer (ts=0x19f0fd0, expire_time=62275467335)
    at /home/mst/scm/qemu-kvm/vl.c:1009
#1  0x0041aadf in virtio_net_handle_tx (vdev=<value optimized out>, vq=0x19f5af0)
    at /home/mst/scm/qemu-kvm/hw/virtio-net.c:696
#2  0x00421669 in kvm_run (vcpu=0x19d46a0, env=0x19c2250) at /home/mst/scm/qemu-kvm/qemu-kvm.c:797
#3  0x004216d6 in kvm_cpu_exec (env=0x83d0f8) at /home/mst/scm/qemu-kvm/qemu-kvm.c:1714
#4  0x00422981 in ap_main_loop (_env=<value optimized out>) at /home/mst/scm/qemu-kvm/qemu-kvm.c:1969
#5  0x00377dc06367 in start_thread () from /lib64/libpthread.so.0
#6  0x00377d0d30ad in clone () from /lib64/libc.so.6
(gdb)

So this probably means that we have already run quit_timers:

static void quit_timers(void)
{
    alarm_timer->stop(alarm_timer);
    alarm_timer = NULL;
}

but kvm vcpu thread is still running.


Not sure what the right fix is here: should we stop
kvm after main loop has exited?
   
   kvm_main_loop_wait(env, 0) can process the stop request (signalling
   iothread that vcpu is stopped, so its OK to exit) and continue to
   kvm_cpu_exec.
   
   Can you please try this:
  
  I applied this, and have not yet seen any segfaults at exit.
  Not sure whether this means anything as the crash is not
  100% reproducible. Push it out to Anthony and we'll see, long term?
  Based on the knowledge of how to fix this,
  how would you go about reproducing it?
 
 Add code to trigger the race manually,

If you like, send a patch adding such code, I will test.

 but i'm pretty sure thats it.
 
 Thanks for testing.
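
[Editor's sketch] The race discussed in this thread can be modelled in a few lines. This is a toy model only, not QEMU code: the class and function names below are hypothetical and merely mirror the roles of kvm_main_loop_wait() and kvm_cpu_exec() as described above. It assumes that the wait step may acknowledge a stop request, after which running the vcpu again is unsafe.

```python
# Toy model of the vcpu loop race; hypothetical names, not the QEMU implementation.
class ToyVcpu:
    def __init__(self):
        self.stop_requested = False
        self.stopped = False
        self.exec_calls_after_stop = 0

    def main_loop_wait(self):
        # Plays the role of kvm_main_loop_wait(env, 0): it may acknowledge
        # a pending stop request, telling the iothread it is OK to exit.
        if self.stop_requested:
            self.stopped = True

    def cpu_exec(self):
        # Plays the role of kvm_cpu_exec(env): count every entry that
        # happens after the stop was already acknowledged.
        if self.stopped:
            self.exec_calls_after_stop += 1


def run_iteration(vcpu, check_stopped):
    vcpu.main_loop_wait()
    if check_stopped and vcpu.stopped:
        return  # the guard proposed in the patch: skip exec once stopped
    vcpu.cpu_exec()


def demo(check_stopped):
    vcpu = ToyVcpu()
    vcpu.stop_requested = True
    run_iteration(vcpu, check_stopped)
    return vcpu.exec_calls_after_stop
```

Without the guard, demo(False) records one unsafe entry into the exec step; with it, demo(True) records none.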

KVM: MMU: update invlpg handler comment


Large page translations are always synchronized (either in level 3
or level 2), so it's not necessary to properly deal with them
in the invlpg handler.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: b/arch/x86/kvm/paging_tmpl.h
===
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -467,7 +467,6 @@ static void FNAME(invlpg)(struct kvm_vcp
 		level = iterator.level;
 		sptep = iterator.sptep;
 
-		/* FIXME: properly handle invlpg on large guest pages */
 		if (level == PT_PAGE_TABLE_LEVEL  ||
 		    ((level == PT_DIRECTORY_LEVEL && is_large_pte(*sptep))) ||
 		    ((level == PT_PDPE_LEVEL && is_large_pte(*sptep)))) {

Re: fix qemu-kvm sigsegv at exit

On Mon, Oct 26, 2009 at 04:46:02PM -0200, Marcelo Tosatti wrote:
 
 Michael reported a qemu-kvm SIGSEGV at shutdown:
 
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x411d0940 (LWP 14446)]
 0x0040afb4 in qemu_mod_timer (ts=0x19f0fd0,
 expire_time=62275467335)
 at /home/mst/scm/qemu-kvm/vl.c:1009
 1009            if ((alarm_timer->flags & ALARM_FLAG_EXPIRED) == 0) {
 (gdb) l
 1004        ts->next = *pt;
 1005        *pt = ts;
 1006
 1007        /* Rearm if necessary  */
 1008        if (pt == &active_timers[ts->clock->type]) {
 1009            if ((alarm_timer->flags & ALARM_FLAG_EXPIRED) == 0) {
 1010                qemu_rearm_alarm_timer(alarm_timer);
 1011            }
 1012            /* Interrupt execution to force deadline recalculation.  */
 1013            if (use_icount)
 (gdb) p alarm_timer
 $1 = (struct qemu_alarm_timer *) 0x0
 
 Problem is kvm_main_loop_wait(env, 0) can process the stop request
 (signalling iothread that vcpu is stopped, so its OK to exit) and
 continue to kvm_cpu_exec.
 
 Make sure cpu is not stopped before proceeding to kvm_cpu_exec.
 
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 Reported-by: Michael S. Tsirkin m...@redhat.com
 
 diff --git a/qemu-kvm.c b/qemu-kvm.c
 index 4c13628..ab8f0e4 100644
 --- a/qemu-kvm.c
 +++ b/qemu-kvm.c
 @@ -1868,7 +1868,8 @@ static int kvm_main_loop_cpu(CPUState *env)
  }
  if (run_cpu) {
  kvm_main_loop_wait(env, 0);
 -kvm_cpu_exec(env);
 +if (!is_cpu_stopped(env))
 +kvm_cpu_exec(env);
I wonder if calling kvm_cpu_exec() after kvm_main_loop_wait() will fix
the problem?

  } else {
  kvm_main_loop_wait(env, 1000);
  }

--
Gleb.

Re: fix qemu-kvm sigsegv at exit

On Mon, Oct 26, 2009 at 08:58:49PM +0200, Gleb Natapov wrote:
 On Mon, Oct 26, 2009 at 04:46:02PM -0200, Marcelo Tosatti wrote:
  
  Michael reported a qemu-kvm SIGSEGV at shutdown:
  
  Program received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 0x411d0940 (LWP 14446)]
  0x0040afb4 in qemu_mod_timer (ts=0x19f0fd0,
  expire_time=62275467335)
  at /home/mst/scm/qemu-kvm/vl.c:1009
  1009            if ((alarm_timer->flags & ALARM_FLAG_EXPIRED) == 0) {
  (gdb) l
  1004        ts->next = *pt;
  1005        *pt = ts;
  1006
  1007        /* Rearm if necessary  */
  1008        if (pt == &active_timers[ts->clock->type]) {
  1009            if ((alarm_timer->flags & ALARM_FLAG_EXPIRED) == 0) {
  1010                qemu_rearm_alarm_timer(alarm_timer);
  1011            }
  1012            /* Interrupt execution to force deadline recalculation.  */
  1013            if (use_icount)
  (gdb) p alarm_timer
  $1 = (struct qemu_alarm_timer *) 0x0
  
  Problem is kvm_main_loop_wait(env, 0) can process the stop request
  (signalling iothread that vcpu is stopped, so its OK to exit) and
  continue to kvm_cpu_exec.
  
  Make sure cpu is not stopped before proceeding to kvm_cpu_exec.
  
  Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
  Reported-by: Michael S. Tsirkin m...@redhat.com
  
  diff --git a/qemu-kvm.c b/qemu-kvm.c
  index 4c13628..ab8f0e4 100644
  --- a/qemu-kvm.c
  +++ b/qemu-kvm.c
  @@ -1868,7 +1868,8 @@ static int kvm_main_loop_cpu(CPUState *env)
   }
   if (run_cpu) {
   kvm_main_loop_wait(env, 0);
  -kvm_cpu_exec(env);
  +if (!is_cpu_stopped(env))
  +kvm_cpu_exec(env);
 I wonder if calling kvm_cpu_exec() after kvm_main_loop_wait() will fix
 the problem?

Yeah, that would also do it.



Re: [PATCH] virtio-net: fix data corruption with OOM

On Mon, Oct 26, 2009 at 08:42:43PM +0200, Michael S. Tsirkin wrote:
 On Mon, Oct 26, 2009 at 12:11:51PM +1030, Rusty Russell wrote:
  On Mon, 26 Oct 2009 03:33:40 am Michael S. Tsirkin wrote:
   virtio net used to unlink skbs from send queues on error,
   but ever since 48925e372f04f5e35fec6269127c62b2c71ab794
   we do not do this. This causes guest data corruption and crashes
   with vhost since net core can requeue the skb or free it without
   it being taken off the list.
   
   This patch fixes this by queueing the skb after successful
   transmit.
  
  I originally thought that this was racy: as soon as we do add_buf, we need to
  make sure we're ready for the callback (for virtio_pci, it's ->kick, but we
  shouldn't rely on that).
 
 Modified the guest slightly, and I am getting crashes again.
 I didn't have time to debug this, but based on previous experience,
 I reverted 48925e372f04f5e35fec6269127c62b2c71ab794,
 and the crash went away.
 Rusty, what do you say we just revert 48925e372f04f5e35fec6269127c62b2c71ab794
 for now?

Hmm. Can't reproduce the crash anymore.
There is a small chance that the problem was my error,
so I guess I should try to reproduce and debug this,
after all.

-cpu host AMD Host

2009-10-26 Thread Martin Gallant

Is -cpu host supported on AMD hosts?

Whenever I try to use this option on a Windows Vista/7 client, I get a blue
screen.
Removing the option, the client works fine.

Host kernel 2.6.31.4.  Userspace is qemu-kvm-0.11.0.  (Previous versions
fail too)

/proc/cpuinfo snippet:
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 107
model name  : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+

Thanks,

--
Marty



Re: vhost-net patches

On Fri, Oct 23, 2009 at 09:23:40AM -0700, Shirley Ma wrote:
 Hello Michael,
 
 Some initial vhost netperf test results on my T61 laptop from the
 working tap device are here. Latency has decreased significantly, but
 throughput from guest to host has a huge regression. I also hit a guest
 skb_xmit panic.
 
 netperf TCP_STREAM, default setup, 60 secs run
 
 guest-host drops from 3XXXMb/s to 1XXXMb/s (regression)
 host-guest increases from 3XXXMb/s to 4Mb/s 
 
 TCP_RR, 60 secs run (very impressive)
 
 guest-host trans/s increases from 2XXX/s to 13XXX/s
 host-guest trans/s increases from 2XXX/s to 13XXX/s
 
 Thanks
 Shirley

Shirley, could you please test the following patch?
It is surprising to me that it should improve
performance, but seems to do this in my setup.
Please comment.


diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 30708c6..67bfc08 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -775,7 +775,7 @@ void vhost_no_notify(struct vhost_virtqueue *vq)
 
 int vhost_init(void)
 {
-	vhost_workqueue = create_workqueue("vhost");
+	vhost_workqueue = create_singlethread_workqueue("vhost");
 	if (!vhost_workqueue)
 		return -ENOMEM;
 	return 0;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index a140dad..49026bb 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -106,10 +106,14 @@ static void handle_tx(struct vhost_net *net)
 		.msg_flags = MSG_DONTWAIT,
 	};
 	size_t len, total_len = 0;
-	int err;
+	int err, wmem;
 	size_t hdr_size;
 	struct socket *sock = rcu_dereference(vq->private_data);
-	if (!sock || !sock_writeable(sock->sk))
+	if (!sock)
+		return;
+
+	wmem = atomic_read(&sock->sk->sk_wmem_alloc);
+	if (wmem >= sock->sk->sk_sndbuf)
 		return;
 
 	use_mm(net->dev.mm);
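
[Editor's sketch] One way to read the handle_tx() change above is as a looser send threshold. The sketch below assumes that the kernel's sock_writeable() helper reports a socket as writeable only while less than half of sk_sndbuf is queued, whereas the patched check keeps transmitting until sk_wmem_alloc reaches sk_sndbuf. The function names and sample numbers are hypothetical, chosen only for illustration.

```python
def writeable_half(wmem_alloc, sndbuf):
    # Assumed sock_writeable() rule: less than half of sndbuf is queued.
    return wmem_alloc < (sndbuf >> 1)


def writeable_full(wmem_alloc, sndbuf):
    # Rule used by the patch above: stop only once wmem reaches sndbuf.
    return wmem_alloc < sndbuf


SNDBUF = 65536  # hypothetical send buffer size in bytes
for queued in (16384, 40000, 65536):
    print(queued, writeable_half(queued, SNDBUF), writeable_full(queued, SNDBUF))
```

With 40000 bytes queued against a 65536-byte buffer, the first rule would stop transmitting while the second would continue, which may be part of why the patch changes throughput.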

Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5

On Sun, 2009-10-25 at 15:01 +0200, Avi Kivity wrote:
 On 10/23/2009 02:33 AM, Hollis Blanchard wrote:
  On Wed, 2009-10-21 at 17:03 +0200, Alexander Graf wrote:
 
  KVM for PowerPC only supports embedded cores at the moment.
 
  While it makes sense to virtualize on small machines, it's even more fun
  to do so on big boxes. So I figured we need KVM for PowerPC64 as well.
 
  This patchset implements KVM support for Book3s_64 hosts and guest support
  for Book3s_64 and G3/G4.
   
  Acked-by: Hollis Blanchardholl...@us.ibm.com
 
  Avi, please apply these patches
 
 
 I still need acks for the arch/powerpc/{kernel,mm} bits, simple as they 
 are, from the powerpc maintainers.

OK, BenH says they're on his todo list.

In the meantime, please apply patch #2, because it fixes the broken qemu
build.

-- 
Hollis Blanchard
IBM Linux Technology Center


xp guest, blue screen c0000221 on boot

2009-10-26 Thread Andrew Olney


Hangs on boot, xp guest:

STOP: c0000221 Unknown Hard Error
 \SystemRoot\System32\ntdll.dll

Will boot into safe mode, but _not_ into safe mode with networking.

Boots into non-MS VMs fine.


*  what cpu model (examples: Intel Core Duo, Intel Core 2 Duo, AMD 
Opteron 2210). See /proc/cpuinfo if you're not sure.


processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 14
model name  : Genuine Intel(R) CPU   L2400  @ 1.66GHz
stepping: 8
cpu MHz : 1000.000
cache size  : 2048 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov 
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc 
arch_perfmon bts pni monitor vmx est tm2 xtpr pdcm

bogomips: 3324.92
clflush size: 64
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 14
model name  : Genuine Intel(R) CPU   L2400  @ 1.66GHz
stepping: 8
cpu MHz : 1000.000
cache size  : 2048 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
apicid  : 1
initial apicid  : 1
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov 
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc 
arch_perfmon bts pni monitor vmx est tm2 xtpr pdcm

bogomips: 3324.97
clflush size: 64
power management:


* what kvm version you are using. If you're using git directly, 
provide the output of 'git describe'.


Same behavior with ubuntu package 0.11.0-0ubuntu6 (karmic)
and source qemu-kvm-0.11.0


* the host kernel version

Linux monkamu 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:04:26 UTC 
2009 i686 GNU/Linux



* what host kernel arch you are using (i386 or x86_64)

i386


* what guest you are using, including OS type (Linux, Windows, 
Solaris, etc.), bitness (32 or 64), kernel version


XP Pro 32 SP 3


* the qemu command line you are using to start the guest

kvm -cpu coreduo,-nx -hda /z/xp.img -boot c -usb -usbdevice tablet -m 512

* whether the problem goes away if using the -no-kvm-irqchip or 
-no-kvm-pit switch.


No

* whether the problem also appears with the -no-kvm switch.

No

Re: [ANNOUNCE] kvm-kmod-2.6.31.5

2009-10-26 Thread Alexander Graf



On 26.10.2009, at 18:26, Jan Kiszka wrote:


This package contains the kvm external modules, using the sources from
latest stable Linux release 2.6.31.5. It can be used to update the
kernel-side support of kvm without upgrading the host kernel.

This release has been tested on x86 down to host kernel 2.6.27 and
builds down to 2.6.24. Building against older kernels is expected to  
be
broken, but if anyone provides patches to fix it, I'm open to merge  
them.


Aww - I'm missing the awesome changelogs :-).


Great to see you take this up Jan!

Alex


Re: vhost-net patches

2009-10-26 Thread Shirley Ma

Hello Michael,

On Mon, 2009-10-26 at 22:05 +0200, Michael S. Tsirkin wrote:
 Shirley, could you please test the following patch?

With this patch, guest-to-host throughput improved from 1xxx to 2xxx Mb/s,
but there is still a gap compared to running without vhost, where it was
3xxx Mb/s on my setup.

It looks like the virtio_net in your git tree has also fixed the skb_xmit
panic I saw before, which is good news.

Thanks
Shirley


Re: vhost-net patches

2009-10-26 Thread Shirley Ma

Pulled your git tree, didn't see the panic.

Thanks
Shirley



Re: vhost-net patches

2009-10-26 Thread Shirley Ma

On Sun, 2009-10-25 at 11:11 +0200, Michael S. Tsirkin wrote:
 What is vnet0?

That's a tap interface. I am binding a raw socket to a tap interface and
it doesn't work. Is that supported?

Thanks
Shirley


[ kvm-Bugs-2886754 ] Extreme slow down using -cpu host

2009-10-26 Thread SourceForge.net

Bugs item #2886754, was opened at 2009-10-26 23:41
Message generated for change (Tracker Item Submitted) made by nwxi
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2886754&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: qemu
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: nwxi (nwxi)
Assigned to: Nobody/Anonymous (nobody)
Summary: Extreme slow down using -cpu host

Initial Comment:
Hi!

I've currently experienced a massive slowdown when using flag -cpu host on an 
AMD Phenom 905e with qemu-kvm 0.11.0 and kernel 2.6.31.3 x86_64. This affects 
network (tested e1000 and virtio) and i/o performance (virtio). Further there 
are dozens of log messages from kvm module complaining about:

 cpu0/1 unhandled rdmsr: 0xc0010055

Both, the slowdown and the rdmsr-message do not happen when using qemu64 as 
cpu, but also occur when using -cpu phenom. Thank you for your comments!

regards, Michael


Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5

2009-10-26 Thread Olof Johansson

Not sure which patch in the series this is needed for since I applied
them all, but I got:

  CC  arch/powerpc/kvm/timing.o
arch/powerpc/kvm/timing.c:205: error: 'THIS_MODULE' undeclared here (not in a 
function)


Signed-off-by: Olof Johansson o...@lixom.net


diff --git a/arch/powerpc/kvm/timing.c b/arch/powerpc/kvm/timing.c
index 2aa371e..7037855 100644
--- a/arch/powerpc/kvm/timing.c
+++ b/arch/powerpc/kvm/timing.c
@@ -23,6 +23,7 @@
 #include <linux/seq_file.h>
 #include <linux/debugfs.h>
 #include <linux/uaccess.h>
+#include <linux/module.h>
 
 #include <asm/time.h>
 #include <asm-generic/div64.h>

Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5

On Mon, 2009-10-26 at 18:06 -0500, Olof Johansson wrote:
 Not sure which patch in the series this is needed for since I applied
 them all, but I got:
 
   CC  arch/powerpc/kvm/timing.o
 arch/powerpc/kvm/timing.c:205: error: 'THIS_MODULE' undeclared here (not in a 
 function)
 
 
 Signed-off-by: Olof Johansson o...@lixom.net
 
 
 diff --git a/arch/powerpc/kvm/timing.c b/arch/powerpc/kvm/timing.c
 index 2aa371e..7037855 100644
 --- a/arch/powerpc/kvm/timing.c
 +++ b/arch/powerpc/kvm/timing.c
 @@ -23,6 +23,7 @@
  #include <linux/seq_file.h>
  #include <linux/debugfs.h>
  #include <linux/uaccess.h>
 +#include <linux/module.h>
 
  #include <asm/time.h>
  #include <asm-generic/div64.h>

For some reason, I'm not seeing this build break, but the patch is
obviously correct.

Acked-by: Hollis Blanchard holl...@us.ibm.com

-- 
Hollis Blanchard
IBM Linux Technology Center


[PATCH] KVM test: Add new program cd_hash.py

2009-10-26 Thread Lucas Meneghel Rodrigues

Add cd_hash, a new program that evaluates hash strings, intended
to help kvm autotest administrators.

Usage: cd_hash.py [options]

Options:
  -h, --help            show this help message and exit
  -i FILENAME, --iso=FILENAME
                        path to a ISO file whose hash string will be
                        evaluated.

This script will calculate:

 * MD5SUM for the 1st MB of the file
 * SHA1SUM for the 1st MB of the file
 * MD5SUM for the whole file
 * SHA1SUM for the whole file

The hashes for the 1st MB are calculated first in the case the
user only wants them. This program replaces calc_md5sum_1m.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/calc_md5sum_1m.py |   21 --
 client/tests/kvm/cd_hash.py|   54 
 2 files changed, 54 insertions(+), 21 deletions(-)
 delete mode 100755 client/tests/kvm/calc_md5sum_1m.py
 create mode 100755 client/tests/kvm/cd_hash.py

diff --git a/client/tests/kvm/calc_md5sum_1m.py 
b/client/tests/kvm/calc_md5sum_1m.py
deleted file mode 100755
index 153a1e0..000
--- a/client/tests/kvm/calc_md5sum_1m.py
+++ /dev/null
@@ -1,21 +0,0 @@
-#!/usr/bin/python
-"""
-Program that calculates the md5sum for the first megabyte of a file.
-It's faster than calculating the md5sum for the whole ISO image.
-
-@copyright: Red Hat 2008-2009
-@author: Uri Lublin (u...@redhat.com)
-"""
-
-import os, sys
-import kvm_utils
-
-
-if len(sys.argv) < 2:
-    print 'usage: %s iso-filename' % sys.argv[0]
-else:
-    fname = sys.argv[1]
-    if not os.access(fname, os.F_OK) or not os.access(fname, os.R_OK):
-        print 'bad file name or permissions'
-    else:
-        print kvm_utils.hash_file(fname, 1024*1024, method="md5")
diff --git a/client/tests/kvm/cd_hash.py b/client/tests/kvm/cd_hash.py
new file mode 100755
index 000..483d71c
--- /dev/null
+++ b/client/tests/kvm/cd_hash.py
@@ -0,0 +1,54 @@
+#!/usr/bin/python
+"""
+Program that calculates several hashes for a given CD image.
+
+@copyright: Red Hat 2008-2009
+"""
+
+import os, sys, optparse, logging
+import common, kvm_utils
+from autotest_lib.client.common_lib import logging_config, logging_manager
+
+
+class KvmLoggingConfig(logging_config.LoggingConfig):
+    def configure_logging(self, results_dir=None, verbose=False):
+        super(KvmLoggingConfig, self).configure_logging(use_console=True,
+                                                        verbose=verbose)
+
+if __name__ == "__main__":
+    parser = optparse.OptionParser()
+    parser.add_option('-i', '--iso', type="string", dest="filename",
+                      action='store',
+                      help='path to a ISO file whose hash string will be '
+                           'evaluated.')
+
+    options, args = parser.parse_args()
+    filename = options.filename
+
+    logging_manager.configure_logging(KvmLoggingConfig())
+
+    if not filename:
+        parser.print_help()
+        sys.exit(1)
+
+    filename = os.path.abspath(filename)
+
+    file_exists = os.path.isfile(filename)
+    can_read_file = os.access(filename, os.R_OK)
+    if not file_exists:
+        logging.critical("File %s does not exist. Aborting...", filename)
+        sys.exit(1)
+    if not can_read_file:
+        logging.critical("File %s does not have read permissions. "
+                         "Aborting...", filename)
+        sys.exit(1)
+
+    logging.info("Hash values for file %s", os.path.basename(filename))
+    logging.info("md5    (1m): %s", kvm_utils.hash_file(filename, 1024*1024,
+                                                        method="md5"))
+    logging.info("sha1   (1m): %s", kvm_utils.hash_file(filename, 1024*1024,
+                                                        method="sha1"))
+    logging.info("md5  (full): %s", kvm_utils.hash_file(filename,
+                                                        method="md5"))
+    logging.info("sha1 (full): %s", kvm_utils.hash_file(filename,
+                                                        method="sha1"))
-- 
1.6.2.5
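
[Editor's sketch] The patch above relies on kvm_utils.hash_file(), whose body is not shown in this thread. As a rough illustration of what hashing "the 1st MB" versus "the whole file" can look like, here is a minimal standalone sketch using only the standard hashlib module. The helper name hash_prefix and its parameters are hypothetical and are not claimed to match the autotest implementation.

```python
import hashlib


def hash_prefix(path, size=None, method="md5", chunk=64 * 1024):
    """Hash the first `size` bytes of `path`, or the whole file if size is None."""
    digest = hashlib.new(method)
    remaining = size
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            # Never read past the requested prefix length.
            want = chunk if remaining is None else min(chunk, remaining)
            data = f.read(want)
            if not data:
                break  # end of file reached before the limit
            digest.update(data)
            if remaining is not None:
                remaining -= len(data)
    return digest.hexdigest()
```

Calling hash_prefix(path, 1024 * 1024) hashes only the first megabyte, which is cheap for large ISO images, while hash_prefix(path, method="sha1") reads the whole file.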


Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5

2009-10-26 Thread Olof Johansson


On Oct 26, 2009, at 6:20 PM, Hollis Blanchard wrote:



For some reason, I'm not seeing this build break, but the patch is
obviously correct.

Acked-by: Hollis Blanchard holl...@us.ibm.com


I saw it when building with pasemi_defconfig + manually enabled KVM  
options (all available).



-Olof

Virtio block module slower than IDE

2009-10-26 Thread Floris Bos


Hi,

I am running Proxmox 1.4 (which uses the 2.6.30.1 kvm modules) and am
experiencing performance problems with Linux guests using the virtio_blk
module.
Especially with random IO it is a lot slower than IDE.

Ubuntu 9.10 VM on LVM storage with VirtIO:

===
bonnie++ -s 16384

Version 1.03c       --Sequential Output-- --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ubuntu910       16G 39209  96 45383   3 29984   6 33996  73 90472   8 636.5   1
                    --Sequential Create-- Random Create
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16     + +++     + +++     + +++ 23837  56     + +++     + +++
ubuntu910,16G,39209,96,45383,3,29984,6,33996,73,90472,8,636.5,1,16,+,+++,+,+++,+,+++,23837,56,+,+++,+,+++

postmark
set size 1 1000
set number 300
set transactions 2500
run

PostMark v1.51 : 8/14/01
Creating files...Done
Performing transactions..Done
Deleting files...Done
Time:
        141 seconds total
        122 seconds of transactions (20 per second)

Files:
        1540 created (10 per second)
                Creation alone: 300 files (17 per second)
                Mixed with transactions: 1240 files (10 per second)
        1242 read (10 per second)
        1258 appended (10 per second)
        1540 deleted (10 per second)
                Deletion alone: 280 files (140 per second)
                Mixed with transactions: 1260 files (10 per second)

Data:
        7653.28 megabytes read (54.28 megabytes per second)
        9534.76 megabytes written (67.62 megabytes per second)
===

Ubuntu 9.10 VM on LVM storage with IDE:

===
Version 1.03c       --Sequential Output-- --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ubuntu910       16G 38796  97 63574   5 31138   7 34604  74 92490   8  2803   7
                    --Sequential Create-- Random Create
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16     + +++     + +++     + +++ 23745  56     + +++     + +++
ubuntu910,16G,38796,97,63574,5,31138,7,34604,74,92490,8,2803.0,7,16,+,+++,+,+++,+,+++,23745,56,+,+++,+,+++

PostMark v1.51 : 8/14/01
Creating files...Done
Performing transactions..Done
Deleting files...Done
Time:
        126 seconds total
        111 seconds of transactions (22 per second)

Files:
        1540 created (12 per second)
                Creation alone: 300 files (20 per second)
                Mixed with transactions: 1240 files (11 per second)
        1242 read (11 per second)
        1258 appended (11 per second)
        1540 deleted (12 per second)
                Deletion alone: 280 files (280 per second)
                Mixed with transactions: 1260 files (11 per second)

Data:
        7653.28 megabytes read (60.74 megabytes per second)
        9534.76 megabytes written (75.67 megabytes per second)
===
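
[Editor's sketch] To put a number on the random-IO gap reported above, the two bonnie++ CSV summary lines can be compared directly. The sketch assumes the bonnie++ 1.03 CSV layout, in which the name and size are followed by value/%CPU pairs and the random seek rate is the 13th field; the function name is hypothetical and the CSV lines are truncated to the fields used.

```python
def random_seeks(csv_line):
    # Assumed bonnie++ 1.03 CSV order: name, size, then (value, %CPU) pairs for
    # per-char write, block write, rewrite, per-char read, block read, seeks.
    fields = csv_line.split(",")
    return float(fields[12])


virtio = "ubuntu910,16G,39209,96,45383,3,29984,6,33996,73,90472,8,636.5,1"
ide = "ubuntu910,16G,38796,97,63574,5,31138,7,34604,74,92490,8,2803.0,7"

ratio = random_seeks(ide) / random_seeks(virtio)
print("IDE random seeks are %.1fx the virtio rate" % ratio)
```

On these figures IDE sustains roughly 4.4 times as many random seeks per second as virtio, while the sequential numbers are much closer.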



Configuration: dual quad-core Opteron 2350, Mtron 7000 solid state drive,
8 GB RAM, 6 GB assigned to the VM, swap disabled on both host and VM.

KVM command line used by Proxmox for VirtIO:
/usr/bin/kvm -monitor unix:/var/run/qemu-server/102.mon,server,nowait
-vnc unix:/var/run/qemu-server/102.vnc,password
-pidfile /var/run/qemu-server/102.pid -daemonize -usbdevice tablet
-name ubuntu910 -smp sockets=1,cores=1 -boot cad -vga cirrus -tdf
-drive file=/dev/vmstorage/vm-102-disk-1,if=virtio,index=0,boot=on
-m 6000 -net user,vlan=1000,hostname=ubuntu910
-net nic,vlan=1000,model=rtl8139,macaddr=CE:14:D4:DC:2B:94

Also tried with Ubuntu 9.04 instead of 9.10, but the results are similar.

Any idea what might be the problem?

Yours sincerely,

Floris Bos
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] virtio-net: fix data corruption with OOM

2009-10-26 Thread David Miller

From: Michael S. Tsirkin m...@redhat.com
Date: Mon, 26 Oct 2009 11:07:13 +0200

 Another, and hopefully the last, note, is that
 git-am can only handle Subject/From lines
 at the beginning of the message.
 So git style of the mail would be
 ...
 I think it's weird. We could invent some kind of separator
 that would make git-am accept Subject/From/Date lines in
 the middle of the message, so that discussion can come before
 the description. Worth it?

There is no need for this.  patchwork handles this situation perfectly
and this is what I use to apply all networking patches.

Anything in a reply to a patch that looks like a signoff or ACK,
patchwork adds to the commit message in the mbox blob it spits out for
me.

Re: KVM: VMX: move CR3/PDPTR update to vmx_set_cr3

2009-10-26 Thread Yang, Sheng

On Tuesday 27 October 2009 02:48:33 Marcelo Tosatti wrote:
 GUEST_CR3 is updated via kvm_set_cr3 whenever CR3 is modified from
 outside guest context. Similarly pdptrs are updated via load_pdptrs.

 Let kvm_set_cr3 perform the update, removing it from the vcpu_run
 fast path.

Looks fine to me.

Acked-by: Sheng Yang sh...@linux.intel.com

-- 
regards
Yang, Sheng

 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

 Index: b/arch/x86/kvm/vmx.c
 ===
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -1748,6 +1748,7 @@ static void vmx_set_cr3(struct kvm_vcpu
   vmcs_write64(EPT_POINTER, eptp);
   guest_cr3 = is_paging(vcpu) ? vcpu->arch.cr3 :
   vcpu->kvm->arch.ept_identity_map_addr;
 + ept_load_pdptrs(vcpu);
   }

   vmx_flush_tlb(vcpu);
 @@ -3638,10 +3639,6 @@ static void vmx_vcpu_run(struct kvm_vcpu
  {
   struct vcpu_vmx *vmx = to_vmx(vcpu);

 - if (enable_ept && is_paging(vcpu)) {
 - vmcs_writel(GUEST_CR3, vcpu->arch.cr3);
 - ept_load_pdptrs(vcpu);
 - }
   /* Record the guest's net vcpu time for enforced NMI injections. */
   if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
   vmx->entry_time = ktime_get();
 Index: b/arch/x86/kvm/x86.c
 ===
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -4517,8 +4517,10 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct

   mmu_reset_needed |= vcpu->arch.cr4 != sregs->cr4;
   kvm_x86_ops->set_cr4(vcpu, sregs->cr4);
 - if (!is_long_mode(vcpu) && is_pae(vcpu))
 + if (!is_long_mode(vcpu) && is_pae(vcpu)) {
   load_pdptrs(vcpu, vcpu->arch.cr3);
 + mmu_reset_needed = 1;
 + }

   if (mmu_reset_needed)
   kvm_mmu_reset_context(vcpu);


Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

2009-10-26 Thread Paul E. McKenney

On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote:
 The current code suffers from the following race condition:
 
 thread-1                                thread-2
 -----------------------------------------------------------
 
 kvm_set_irq() {
    rcu_read_lock()
    irq_rt = rcu_dereference(table);
    rcu_read_unlock();
 
                                         kvm_set_irq_routing() {
                                            mutex_lock();
                                            irq_rt = table;
                                            rcu_assign_pointer();
                                            mutex_unlock();
                                            synchronize_rcu();
 
                                            kfree(irq_rt);
 
    irq_rt->entry->set(); /* bad */
 
 -----------------------------------------------------------
 
 Because the pointer is accessed outside of the read-side critical
 section.  There are two basic patterns we can use to fix this bug:
 
 1) Switch to sleeping-rcu and encompass the ->set() access within the
read-side critical section,
 
OR
 
 2) Add reference counting to the irq_rt structure, and simply acquire
the reference from within the RSCS.
 
 This patch implements solution (1).

Looks like a good transformation!  A few questions interspersed below.

 Signed-off-by: Gregory Haskins ghask...@novell.com
 ---
 
  include/linux/kvm_host.h |6 +-
  virt/kvm/irq_comm.c  |   50 
 +++---
  virt/kvm/kvm_main.c  |1 +
  3 files changed, 35 insertions(+), 22 deletions(-)
 
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index bd5a616..1fe135d 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -185,7 +185,10 @@ struct kvm {
 
   struct mutex irq_lock;
  #ifdef CONFIG_HAVE_KVM_IRQCHIP
 - struct kvm_irq_routing_table *irq_routing;
 + struct {
 + struct srcu_struct srcu;

Each structure has its own SRCU domain.  This is OK, but just asking
if that is the intent.  It does look like the SRCU primitives are
passed a pointer to the correct structure, and that the return value
from srcu_read_lock() gets passed into the matching srcu_read_unlock()
like it needs to be, so that is good.

 + struct kvm_irq_routing_table *table;
 + } irq_routing;
   struct hlist_head mask_notifier_list;
   struct hlist_head irq_ack_notifier_list;
  #endif

[ . . . ]

 @@ -155,21 +156,19 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 
 irq, int level)
* IOAPIC.  So set the bit in both. The guest will ignore
* writes to the unused one.
*/
 - rcu_read_lock();
 - irq_rt = rcu_dereference(kvm->irq_routing);
 + idx = srcu_read_lock(&kvm->irq_routing.srcu);
 + irq_rt = rcu_dereference(kvm->irq_routing.table);
   if (irq < irq_rt->nr_rt_entries)
 - hlist_for_each_entry(e, n, &irq_rt->map[irq], link)
 - irq_set[i++] = *e;
 - rcu_read_unlock();
 + hlist_for_each_entry(e, n, &irq_rt->map[irq], link) {

What prevents the above list from changing while we are traversing it?
(Yes, presumably whatever was preventing it from changing before this
patch, but what?)

Mostly kvm->lock is held, but not always.  And if kvm->lock were held
all the time, there would be no point in using SRCU.  ;-)

 + int r;
 
 - while(i--) {
 - int r;
 - r = irq_set[i].set(&irq_set[i], kvm, irq_source_id, level);
 - if (r < 0)
 - continue;
 + r = e->set(e, kvm, irq_source_id, level);
 + if (r < 0)
 + continue;
 
 - ret = r + ((ret < 0) ? 0 : ret);
 - }
 + ret = r + ((ret < 0) ? 0 : ret);
 + }
 + srcu_read_unlock(&kvm->irq_routing.srcu, idx);
 
   return ret;
  }
 @@ -179,17 +178,18 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned 
 irqchip, unsigned pin)
   struct kvm_irq_ack_notifier *kian;
   struct hlist_node *n;
   int gsi;
 + int idx;
 
   trace_kvm_ack_irq(irqchip, pin);
 
 - rcu_read_lock();
 - gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
 + idx = srcu_read_lock(&kvm->irq_routing.srcu);
 + gsi = rcu_dereference(kvm->irq_routing.table)->chip[irqchip][pin];
   if (gsi != -1)
   hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list,
link)

And same question here -- what keeps the above list from changing while
we are traversing it?

Thanx, Paul

[PATCH] KVM Test: Add re.IGNORECASE to re.compile to verify_ip_address_ in kvm_utils.py

2009-10-26 Thread Cao, Chen

Since the MAC address is converted to lowercase and the output of
'arping' is in uppercase, we need to pass re.IGNORECASE to re.compile().

(Passing re.IGNORECASE to the search() call has no effect on an
already-compiled regex.)

Signed-off-by: Cao, Chen k...@redhat.com
---
 client/tests/kvm/kvm_utils.py |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index f72984a..934f223 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -190,7 +190,7 @@ def verify_ip_address_ownership(ip, macs, timeout=10.0):
     # Compile a regex that matches the given IP address and any of the given
     # MAC addresses
     mac_regex = "|".join("(%s)" % mac for mac in macs)
-    regex = re.compile(r"\b%s\b.*\b(%s)\b" % (ip, mac_regex))
+    regex = re.compile(r"\b%s\b.*\b(%s)\b" % (ip, mac_regex), re.IGNORECASE)
 
     # Check the ARP cache
     o = commands.getoutput("/sbin/arp -n")
-- 
1.6.0.6
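
A short, self-contained sketch of why the flag has to go into re.compile(). The helper build_regex(), the sample address, and the arping-style line are made up for illustration; only the pattern shape is taken from the patch. On a compiled pattern, the second argument of search() is a start position, so passing re.IGNORECASE there does not change how the pattern matches.

```python
import re


def build_regex(ip, macs, flags=0):
    # Same pattern shape as in the patch: the IP, then any of the MACs.
    mac_regex = "|".join("(%s)" % mac for mac in macs)
    return re.compile(r"\b%s\b.*\b(%s)\b" % (ip, mac_regex), flags)


ip = "192.168.122.10"
macs = ["52:54:00:ab:cd:ef"]  # lowercase, as kept by the test code
line = "Unicast reply from 192.168.122.10 [52:54:00:AB:CD:EF]  0.6ms"

case_sensitive = build_regex(ip, macs)
case_insensitive = build_regex(ip, macs, re.IGNORECASE)

# Here re.IGNORECASE is taken as the start position (its integer value),
# not as a flag, so the uppercase MAC still does not match.
miss = case_sensitive.search(line, re.IGNORECASE)

# With the flag given at compile time the match succeeds.
hit = case_insensitive.search(line)

print(miss)
print(hit.group(1))
```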


[PATCH 1/2] KVM-test: Add execute permission to qemu-ifup script

2009-10-26 Thread Amos Kong


qemu-ifup is the script that sets up the network bridge.
Without execute permission, we always hit this problem:

autotest/client/tests/kvm/scripts/qemu-ifup: could not launch network script
Could not initialize device 'tap'

Signed-off-by: Amos Kong ak...@redhat.com
---
 0 files changed, 0 insertions(+), 0 deletions(-)
 mode change 100644 => 100755 client/tests/kvm/scripts/qemu-ifup

diff --git a/client/tests/kvm/scripts/qemu-ifup b/client/tests/kvm/scripts/qemu-ifup
old mode 100644
new mode 100755
-- 
1.5.5.6


[PATCH 2/2] KVM-test: Test 802.1Q vlan of nic

2009-10-26 Thread Amos Kong


Test 802.1Q vlan of nic, config it by vconfig command.
1) Create two VMs
2) Setup guests in different vlan by vconfig and test communication by ping
   using hard-coded ip address
3) Setup guests in same vlan and test communication by ping
4) Recover the vlan config

The VLAN subnet can be set in the config file.

Signed-off-by: Amos Kong ak...@redhat.com
---
 client/tests/kvm/kvm_tests.cfg.sample |   12 ++
 client/tests/kvm/tests/vlan_tag.py|   68 +
 2 files changed, 80 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/tests/vlan_tag.py

diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
index 573206c..7f9512a 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -157,6 +157,18 @@ variants:
 used_cpus = 5
 used_mem = 2560
 
+    - vlan_tag:  install setup
+        type = vlan_tag
+        # subnet2 should not be used by host
+        subnet2 = 192.168.123
+        vlans = 10 20
+        nic_mode = tap
+        vms += " vm2"
+        extra_params_vm1 += " -snapshot"
+        extra_params_vm2 += " -snapshot"
+        kill_vm_gracefully_vm2 = no
+        address_index_vm2 = 1
+
     - autoit:   install setup
         type = autoit
         autoit_binary = D:\AutoIt3.exe
diff --git a/client/tests/kvm/tests/vlan_tag.py b/client/tests/kvm/tests/vlan_tag.py
new file mode 100644
index 0000000..ada919f
--- /dev/null
+++ b/client/tests/kvm/tests/vlan_tag.py
@@ -0,0 +1,68 @@
+import logging, time
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils
+
+def run_vlan_tag(test, params, env):
+    """
+    Test 802.1Q vlan of nic, config it by vconfig command.
+
+    1) Create two VMs
+    2) Setup guests in different vlan by vconfig and test communication by ping
+       using hard-coded ip address
+    3) Setup guests in same vlan and test communication by ping
+    4) Recover the vlan config
+
+    @param test: Kvm test object
+    @param params: Dictionary with the test parameters.
+    @param env: Dictionary with test environment.
+    """
+
+    vm = []
+    session = []
+    subnet2 = params.get("subnet2")
+    vlans = params.get("vlans").split()
+
+    vm.append(kvm_test_utils.get_living_vm(env, params.get("main_vm")))
+    vm.append(kvm_test_utils.get_living_vm(env, "vm2"))
+
+    if not vm[1].create():
+        raise error.TestError("VM 1 create failed")
+
+    for i in range(2):
+        session.append(kvm_test_utils.wait_for_login(vm[i]))
+
+    try:
+        vconfig_cmd = "vconfig add eth0 %s;ifconfig eth0.%s %s.%s"
+        # Attempt to configure IPs for the VMs and record the results in
+        # boolean variables
+        # Make vm1 and vm2 in the different vlan
+
+        ip_config_vm1_ok = (session[0].get_command_status(vconfig_cmd
+                                   % (vlans[0], vlans[0], subnet2, 11)) == 0)
+        ip_config_vm2_ok = (session[1].get_command_status(vconfig_cmd
+                                   % (vlans[1], vlans[1], subnet2, 12)) == 0)
+        if not ip_config_vm1_ok or not ip_config_vm2_ok:
+            raise error.TestError, "Fail to config VMs ip address"
+        ping_diff_vlan_ok = (session[0].get_command_status(
+                      "ping -c 2 -I eth0.%s %s.12" % (vlans[0], subnet2)) == 0)
+
+        if ping_diff_vlan_ok:
+            raise error.TestFail("VM 2 is unexpectedly pingable in different "
+                                 "vlan")
+        # Make vm2 in the same vlan with vm1
+        vlan_config_vm2_ok = (session[1].get_command_status(
+                                  "vconfig rem eth0.%s;vconfig add eth0 %s;"
+                                  "ifconfig eth0.%s %s.12" %
+                                  (vlans[1], vlans[0], vlans[0], subnet2)) == 0)
+        if not vlan_config_vm2_ok:
+            raise error.TestError, "Fail to config ip address of VM 2"
+
+        ping_same_vlan_ok = (session[0].get_command_status(
+                      "ping -c 2 -I eth0.%s %s.12" % (vlans[0], subnet2)) == 0)
+        if not ping_same_vlan_ok:
+            raise error.TestFail("Fail to ping the guest in same vlan")
+    finally:
+        # Clean the vlan config
+        for i in range(2):
+            session[i].get_command_status("vconfig rem eth0.%s" % vlans[0])
+            session[i].close()
-- 
1.5.5.6
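
To make the four steps concrete, the sketch below only builds the guest command strings the test would run for the sample config values (vlans "10 20", subnet2 192.168.123). The helper vlan_commands() and its dictionary keys are hypothetical names used for illustration; the command templates are copied from the patch, and nothing is executed on any guest.

```python
def vlan_commands(vlans, subnet2):
    """Return the shell command strings used in each step of the test."""
    vconfig_cmd = "vconfig add eth0 %s;ifconfig eth0.%s %s.%s"
    return {
        # Step 2: put VM 1 and VM 2 in different VLANs (.11 and .12).
        "vm1_setup": vconfig_cmd % (vlans[0], vlans[0], subnet2, 11),
        "vm2_setup": vconfig_cmd % (vlans[1], vlans[1], subnet2, 12),
        # Ping from VM 1 through its VLAN interface; expected to fail in
        # step 2 and to succeed in step 3.
        "ping": "ping -c 2 -I eth0.%s %s.12" % (vlans[0], subnet2),
        # Step 3: move VM 2 into the VLAN of VM 1.
        "vm2_move": ("vconfig rem eth0.%s;vconfig add eth0 %s;"
                     "ifconfig eth0.%s %s.12"
                     % (vlans[1], vlans[0], vlans[0], subnet2)),
        # Step 4: remove the VLAN interface again on both guests.
        "cleanup": "vconfig rem eth0.%s" % vlans[0],
    }


cmds = vlan_commands("10 20".split(), "192.168.123")
for step in ("vm1_setup", "vm2_setup", "ping", "vm2_move", "cleanup"):
    print("%-10s %s" % (step, cmds[step]))
```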



Re: [Autotest] [PATCH] Test 802.1Q vlan of nic

2009-10-26 Thread Amos Kong

On Wed, Oct 21, 2009 at 06:37:56PM +0800, Amos Kong wrote:
 On Tue, Oct 20, 2009 at 09:19:50AM -0400, Michael Goldish wrote:
  See comments below.
 
 Hi all,
 Thanks for your reply.
  
.
 
 Agree with you.
 When I test this case, the original get_command_status() always cause special 
 read problem, so I use sendline().
 
 I'll replace sendline() with get_command_status() later.
  
  Other than these minor issues the test looks good.
 
 I'll re-send another patch later. Thanks again!


Hello all,


Execute on VM1: ping -c 2 -I eth0.10 IP_Address_eth0.10_VM2

With the -I option we can choose the interface that ping uses, so eth0.10
and eth0 no longer need to be in different subnets. However, eth0 and
eth0.10 share the same MAC address, so eth0.10 cannot get an address by
DHCP. If we hard-code the address in the test, it may collide with other
hosts. So this method is no better than setting subnet2 in the config file.

So I'll send a new version first.

Suggestions are welcome :)


Best Regards,
Amos

-- 
Amos Kong
Quality Engineer
Raycom Office(Beijing), Red Hat Inc.
Phone: +86-10-62608183

Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5

On Sun, 2009-10-25 at 15:01 +0200, Avi Kivity wrote:
 On 10/23/2009 02:33 AM, Hollis Blanchard wrote:
  On Wed, 2009-10-21 at 17:03 +0200, Alexander Graf wrote:
 
  KVM for PowerPC only supports embedded cores at the moment.
 
  While it makes sense to virtualize on small machines, it's even more fun
  to do so on big boxes. So I figured we need KVM for PowerPC64 as well.
 
  This patchset implements KVM support for Book3s_64 hosts and guest support
  for Book3s_64 and G3/G4.
   
  Acked-by: Hollis Blanchard holl...@us.ibm.com
 
  Avi, please apply these patches
 
 
 I still need acks for the arch/powerpc/{kernel,mm} bits, simple as they 
 are, from the powerpc maintainers.

OK, BenH says they're on his todo list.

In the meantime, please apply patch #2, because it fixes the broken qemu
build.

-- 
Hollis Blanchard
IBM Linux Technology Center

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5