date:20100516

From: Alex Williamson alex.william...@redhat.com

We keep a qlist of assigned devices for irq updates, but we forgot to
remove entries from it if they're hot unplugged.  This makes
assigned_dev_update_irqs() a timebomb that goes off when the guest is
rebooted.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 1f13a6d..b9cc06f 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1390,6 +1390,7 @@ static int assigned_exitfn(struct PCIDevice *pci_dev)
 {
 AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 
+QLIST_REMOVE(dev, next);
 deassign_device(dev);
 free_assigned_device(dev);
 return 0;
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] turn off kvmclock when resetting cpu

From: Glauber Costa glom...@redhat.com

Currently, in the linux kernel, we reset kvmclock if we are rebooting
into a crash kernel through kexec. The rationale, is that a new kernel
won't follow the same memory addresses, and the memory where kvmclock is
located in the first kernel, will be something else in the second one.

We don't do it in normal reboots, because the second kernel ends up
registering kvmclock again, which has the effect of turning off the
first instance.

This is, however, totally wrong. This assumes we're booting into
a kernel that also has kvmclock enabled. If by some reason we reboot
into something that doesn't do kvmclock including but not limited to:
 * rebooting into an older kernel without kvmclock support,
 * rebooting with no-kvmclock,
 * rebootint into another O.S,

we'll simply have the hypervisor writting into a random memory position
into the guest. Neat, uh?

Moreover, I believe the fix belongs in qemu, since it is the entity
more prepared to detect all kinds of reboots (by means of a cpu_reset),
not to mention the presence of misbehaving guests, that can forget
to turn kvmclock off.

It is also necessary to reset other msrs, so this patch resets
everything that kvm exports through its MSR list.

This patch fixes the issue for me.

Signed-off-by: Glauber Costa glom...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 5bd752d..73b4af7 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -1333,8 +1333,31 @@ void kvm_arch_push_nmi(void *opaque)
 }
 #endif /* KVM_CAP_USER_NMI */
 
+static int kvm_reset_msrs(CPUState *env)
+{
+struct {
+struct kvm_msrs info;
+struct kvm_msr_entry entries[100];
+} msr_data;
+int n;
+struct kvm_msr_entry *msrs = msr_data.entries;
+
+if (!kvm_msr_list)
+return -1;
+
+for (n = 0; n  kvm_msr_list-nmsrs; n++) {
+kvm_msr_entry_set(msrs[n], kvm_msr_list-indices[n], 0);
+}
+
+msr_data.info.nmsrs = n;
+
+return kvm_vcpu_ioctl(env, KVM_SET_MSRS, msr_data);
+}
+
+
 void kvm_arch_cpu_reset(CPUState *env)
 {
+kvm_reset_msrs(env);
 kvm_arch_reset_vcpu(env);
 kvm_reset_mpstate(env);
 }
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] test: emulator: lmsw may not clear cr0.pe

From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/kvm/user/test/x86/emulator.c b/kvm/user/test/x86/emulator.c
index 5406062..e677e3a 100644
--- a/kvm/user/test/x86/emulator.c
+++ b/kvm/user/test/x86/emulator.c
@@ -267,6 +267,15 @@ void test_lmsw(void)
asm(lmsw %0 : : m(*pmsw));
printf(before %lx after %lx\n, cr0, read_cr0());
report(lmsw (2), cr0 == read_cr0());
+
+   /* lmsw can't clear cr0.pe */
+   msw = (cr0  ~1ul) ^ 4;  /* change EM to force trap */
+   asm(lmsw %0 : : r(msw));
+   report(lmsw (3), (cr0 ^ read_cr0()) == 4  (cr0  1));
+
+   /* back to normal */
+   msw = cr0;
+   asm(lmsw %0 : : r(msw));
 }
 
 int main()
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] test: Add test for xor acc, imm

From: Mohammed Gamal m.gamal...@gmail.com

Adds test for xor acc, imm

Signed-off-by: Mohammed Gamal m.gamal...@gmail.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/kvm/user/test/x86/realmode.c b/kvm/user/test/x86/realmode.c
index 41e2aea..70a1e05 100644
--- a/kvm/user/test/x86/realmode.c
+++ b/kvm/user/test/x86/realmode.c
@@ -350,6 +350,50 @@ void test_sub_imm(void)
print_serial(sub test 4: PASS\n);
 }
 
+
+void test_xor_imm(void)
+{
+   struct regs inregs = { 0 }, outregs;
+   MK_INSN(xor_r32_imm_1, mov $1234567890, %eax\n\t xor $1234567890, 
%eax\n\t);
+   MK_INSN(xor_r16_imm_1, mov $1234, %ax\n\t xor $1234, %ax\n\t);
+   MK_INSN(xor_r8_imm_1, mov $0x12, %ah\n\t xor $0x12, %ah\n\t);
+   MK_INSN(xor_r8_imm_2, mov $0x34, %al\n\t xor $0x34, %al\n\t);
+
+   exec_in_big_real_mode(inregs, outregs,
+ insn_xor_r16_imm_1,
+ insn_xor_r16_imm_1_end - insn_xor_r16_imm_1);
+   if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 0)
+   print_serial(xor test 1: FAIL\n);
+   else
+   print_serial(xor test 1: PASS\n);
+
+   /* test mov $imm, %eax */
+   exec_in_big_real_mode(inregs, outregs,
+ insn_xor_r32_imm_1,
+ insn_xor_r32_imm_1_end - insn_xor_r32_imm_1);
+   if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 0)
+   print_serial(xor test 2: FAIL\n);
+   else
+   print_serial(xor test 2: PASS\n);
+
+   /* test mov $imm, %al/%ah */
+   exec_in_big_real_mode(inregs, outregs,
+ insn_xor_r8_imm_1,
+ insn_xor_r8_imm_1_end - insn_xor_r8_imm_1);
+   if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 0)
+   print_serial(xor test 3: FAIL\n);
+   else
+   print_serial(xor test 3: PASS\n);
+
+   exec_in_big_real_mode(inregs, outregs,
+ insn_xor_r8_imm_2,
+ insn_xor_r8_imm_2_end - insn_xor_r8_imm_2);
+   if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 0)
+   print_serial(xor test 4: FAIL\n);
+   else
+   print_serial(xor test 4: PASS\n);
+}
+
 void test_cmp_imm(void)
 {
struct regs inregs = { 0 }, outregs;
@@ -786,6 +830,7 @@ void realmode_start(void)
test_cmp_imm();
test_add_imm();
test_sub_imm();
+   test_xor_imm();
test_io();
test_eflags_insn();
test_jcc_short();
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] device-assignment: fix failure to exit on shared IRQ

From: Alex Williamson alex.william...@redhat.com

Since c1699988, piix config space isn't programmed until the first
system reset.  This means that when we call assign_irq() from
assigned_initfn(), we're going to get back an irq of 0x0, which
unfortunately matches our initialization value, so we don't bother
to call kvm_assign_irq().  Switch to a -1 initializer so we can
test whether kvm_assign_irq() is going to succeed and allow the
process to exit if it doesn't.  The guest irq will get reset to a
more appropriate value on system reset anyway.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Acked-by: Chris Wright chr...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index b9cc06f..eb31c78 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1346,7 +1346,7 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 e_intx = dev-dev.config[0x3d] - 1;
 dev-intpin = e_intx;
 dev-run = 0;
-dev-girq = 0;
+dev-girq = -1;
 dev-h_segnr = dev-host.seg;
 dev-h_busnr = dev-host.bus;
 dev-h_devfn = PCI_DEVFN(dev-host.dev, dev-host.func);
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] test: Add test for sub acc,imm

From: Mohammed Gamal m.gamal...@gmail.com

Adds tests fot sub acc, imm

Signed-off-by: Mohammed Gamal m.gamal...@gmail.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/kvm/user/test/x86/realmode.c b/kvm/user/test/x86/realmode.c
index bc4ed97..41e2aea 100644
--- a/kvm/user/test/x86/realmode.c
+++ b/kvm/user/test/x86/realmode.c
@@ -307,6 +307,49 @@ void test_mov_imm(void)
print_serial(mov test 5: PASS\n);
 }
 
+void test_sub_imm(void)
+{
+   struct regs inregs = { 0 }, outregs;
+   MK_INSN(sub_r32_imm_1, mov $1234567890, %eax\n\t sub $10, %eax\n\t);
+   MK_INSN(sub_r16_imm_1, mov $1234, %ax\n\t sub $10, %ax\n\t);
+   MK_INSN(sub_r8_imm_1, mov $0x12, %ah\n\t sub $0x10, %ah\n\t);
+   MK_INSN(sub_r8_imm_2, mov $0x34, %al\n\t sub $0x10, %al\n\t);
+
+   exec_in_big_real_mode(inregs, outregs,
+ insn_sub_r16_imm_1,
+ insn_sub_r16_imm_1_end - insn_sub_r16_imm_1);
+   if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 1224)
+   print_serial(sub test 1: FAIL\n);
+   else
+   print_serial(sub test 1: PASS\n);
+
+   /* test mov $imm, %eax */
+   exec_in_big_real_mode(inregs, outregs,
+ insn_sub_r32_imm_1,
+ insn_sub_r32_imm_1_end - insn_sub_r32_imm_1);
+   if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 1234567880)
+   print_serial(sub test 2: FAIL\n);
+   else
+   print_serial(sub test 2: PASS\n);
+
+   /* test mov $imm, %al/%ah */
+   exec_in_big_real_mode(inregs, outregs,
+ insn_sub_r8_imm_1,
+ insn_sub_r8_imm_1_end - insn_sub_r8_imm_1);
+   if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 0x0200)
+   print_serial(sub test 3: FAIL\n);
+   else
+   print_serial(sub test 3: PASS\n);
+
+   exec_in_big_real_mode(inregs, outregs,
+ insn_sub_r8_imm_2,
+ insn_sub_r8_imm_2_end - insn_sub_r8_imm_2);
+   if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 0x24)
+   print_serial(sub test 4: FAIL\n);
+   else
+   print_serial(sub test 4: PASS\n);
+}
+
 void test_cmp_imm(void)
 {
struct regs inregs = { 0 }, outregs;
@@ -742,6 +785,7 @@ void realmode_start(void)
test_mov_imm();
test_cmp_imm();
test_add_imm();
+   test_sub_imm();
test_io();
test_eflags_insn();
test_jcc_short();
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: update mmu documetation for role.nxe

From: Gui Jianfeng guijianf...@cn.fujitsu.com

There's no member cr4_nxe in struct kvm_mmu_page_role, it names nxe now.
Update mmu document.

Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt
index 0cc28fb..fde7989 100644
--- a/Documentation/kvm/mmu.txt
+++ b/Documentation/kvm/mmu.txt
@@ -161,7 +161,7 @@ Shadow pages contain the following information:
   role.cr4_pae:
 Contains the value of cr4.pae for which the page is valid (e.g. whether
 32-bit or 64-bit gptes are in use).
-  role.cr4_nxe:
+  role.nxe:
 Contains the value of efer.nxe for which the page is valid.
   gfn:
 Either the guest page table containing the translations shadowed by this
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: x86: Call vcpu_load and vcpu_put in cpuid_update

From: Dongxiao Xu dongxiao...@intel.com

cpuid_update may operate VMCS, so vcpu_load() and vcpu_put()
should be called to ensure correctness.

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b5eaed4..74d609e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1781,6 +1781,7 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
if (copy_from_user(cpuid_entries, entries,
   cpuid-nent * sizeof(struct kvm_cpuid_entry)))
goto out_free;
+   vcpu_load(vcpu);
for (i = 0; i  cpuid-nent; i++) {
vcpu-arch.cpuid_entries[i].function = 
cpuid_entries[i].function;
vcpu-arch.cpuid_entries[i].eax = cpuid_entries[i].eax;
@@ -1798,6 +1799,7 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
r = 0;
kvm_apic_set_version(vcpu);
kvm_x86_ops-cpuid_update(vcpu);
+   vcpu_put(vcpu);
 
 out_free:
vfree(cpuid_entries);
@@ -1818,9 +1820,11 @@ static int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu 
*vcpu,
if (copy_from_user(vcpu-arch.cpuid_entries, entries,
   cpuid-nent * sizeof(struct kvm_cpuid_entry2)))
goto out;
+   vcpu_load(vcpu);
vcpu-arch.cpuid_nent = cpuid-nent;
kvm_apic_set_version(vcpu);
kvm_x86_ops-cpuid_update(vcpu);
+   vcpu_put(vcpu);
return 0;
 
 out:
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] Merge remote branch 'tip/x86/fpu'

From: Marcelo Tosatti mtosa...@redhat.com

Conflicts:
arch/x86/kernel/process.c

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: VMX: blocked-by-sti must not defer NMI injections

From: Jan Kiszka jan.kis...@siemens.com

As the processor may not consider GUEST_INTR_STATE_STI as a reason for
blocking NMI, it could return immediately with EXIT_REASON_NMI_WINDOW
when we asked for it. But as we consider this state as NMI-blocking, we
can run into an endless loop.

Resolve this by allowing NMI injection if just GUEST_INTR_STATE_STI is
active (originally suggested by Gleb). Intel confirmed that this is
safe, the processor will never complain about NMI injection in this
state.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
KVM-Stable-Tag
Acked-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f20ff50..99ae513 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2850,8 +2850,7 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
return 0;
 
return  !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) 
-   (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS |
-   GUEST_INTR_STATE_NMI));
+   (GUEST_INTR_STATE_MOV_SS | GUEST_INTR_STATE_NMI));
 }
 
 static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu)
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: VMX: Define new functions to wrapper direct call of asm code

From: Dongxiao Xu dongxiao...@intel.com

Define vmcs_load() and kvm_cpu_vmxon() to avoid direct call of asm
code. Also move VMXE bit operation out of kvm_cpu_vmxoff().

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e35c479..1959814 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -453,6 +453,19 @@ static void vmcs_clear(struct vmcs *vmcs)
   vmcs, phys_addr);
 }
 
+static void vmcs_load(struct vmcs *vmcs)
+{
+   u64 phys_addr = __pa(vmcs);
+   u8 error;
+
+   asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) ; setna %0
+   : =g(error) : a(phys_addr), m(phys_addr)
+   : cc, memory);
+   if (error)
+   printk(KERN_ERR kvm: vmptrld %p/%llx fail\n,
+  vmcs, phys_addr);
+}
+
 static void __vcpu_clear(void *arg)
 {
struct vcpu_vmx *vmx = arg;
@@ -830,7 +843,6 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx)
 static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
-   u64 phys_addr = __pa(vmx-vmcs);
u64 tsc_this, delta, new_offset;
 
if (vcpu-cpu != cpu) {
@@ -844,15 +856,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
}
 
if (per_cpu(current_vmcs, cpu) != vmx-vmcs) {
-   u8 error;
-
per_cpu(current_vmcs, cpu) = vmx-vmcs;
-   asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) ; setna %0
- : =g(error) : a(phys_addr), m(phys_addr)
- : cc);
-   if (error)
-   printk(KERN_ERR kvm: vmptrld %p/%llx fail\n,
-  vmx-vmcs, phys_addr);
+   vmcs_load(vmx-vmcs);
}
 
if (vcpu-cpu != cpu) {
@@ -1288,6 +1293,13 @@ static __init int vmx_disabled_by_bios(void)
/* locked but not enabled */
 }
 
+static void kvm_cpu_vmxon(u64 addr)
+{
+   asm volatile (ASM_VMX_VMXON_RAX
+   : : a(addr), m(addr)
+   : memory, cc);
+}
+
 static int hardware_enable(void *garbage)
 {
int cpu = raw_smp_processor_id();
@@ -1310,9 +1322,7 @@ static int hardware_enable(void *garbage)
wrmsrl(MSR_IA32_FEATURE_CONTROL, old | test_bits);
}
write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */
-   asm volatile (ASM_VMX_VMXON_RAX
- : : a(phys_addr), m(phys_addr)
- : memory, cc);
+   kvm_cpu_vmxon(phys_addr);
 
ept_sync_global();
 
@@ -1336,13 +1346,13 @@ static void vmclear_local_vcpus(void)
 static void kvm_cpu_vmxoff(void)
 {
asm volatile (__ex(ASM_VMX_VMXOFF) : : : cc);
-   write_cr4(read_cr4()  ~X86_CR4_VMXE);
 }
 
 static void hardware_disable(void *garbage)
 {
vmclear_local_vcpus();
kvm_cpu_vmxoff();
+   write_cr4(read_cr4()  ~X86_CR4_VMXE);
 }
 
 static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: x86: add new KVMCLOCK cpuid feature

From: Glauber Costa glom...@redhat.com

This cpuid, KVM_CPUID_CLOCKSOURCE2, will indicate to the guest
that kvmclock is available through a new set of MSRs. The old ones
are deprecated.

Signed-off-by: Glauber Costa glom...@redhat.com
Acked-by: Zachary Amsden zams...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9734808..f019f8c 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,6 +16,10 @@
 #define KVM_FEATURE_CLOCKSOURCE0
 #define KVM_FEATURE_NOP_IO_DELAY   1
 #define KVM_FEATURE_MMU_OP 2
+/* This indicates that the new set of kvmclock msrs
+ * are available. The use of 0x11 and 0x12 is deprecated
+ */
+#define KVM_FEATURE_CLOCKSOURCE23
 
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: VMX: VMCLEAR/VMPTRLD usage changes

From: Dongxiao Xu dongxiao...@intel.com

Originally VMCLEAR/VMPTRLD is called on vcpu migration. To
support hosted VMM coexistance, VMCLEAR is executed on vcpu
schedule out, and VMPTRLD is executed on vcpu schedule in.
This could also eliminate the IPI when doing VMCLEAR.

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 454d46a..e262e66 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -63,6 +63,9 @@ module_param_named(unrestricted_guest,
 static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 
+static int __read_mostly vmm_exclusive = 1;
+module_param(vmm_exclusive, bool, S_IRUGO);
+
 #define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST  \
(X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD)
 #define KVM_GUEST_CR0_MASK \
@@ -845,7 +848,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
struct vcpu_vmx *vmx = to_vmx(vcpu);
u64 tsc_this, delta, new_offset;
 
-   if (vcpu-cpu != cpu)
+   if (vmm_exclusive  vcpu-cpu != cpu)
vcpu_clear(vmx);
 
if (per_cpu(current_vmcs, cpu) != vmx-vmcs) {
@@ -891,6 +894,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
__vmx_load_host_state(to_vmx(vcpu));
+   if (!vmm_exclusive)
+   __vcpu_clear(to_vmx(vcpu));
 }
 
 static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] Enable pvclock flags in vcpu_time_info structure

From: Glauber Costa glom...@redhat.com

This patch removes one padding byte and transform it into a flags
field. New versions of guests using pvclock will query these flags
upon each read.

Flags, however, will only be interpreted when the guest decides to.
It uses the pvclock_valid_flags function to signal that a specific
set of flags should be taken into consideration. Which flags are valid
are usually devised via HV negotiation.

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Jeremy Fitzhardinge jer...@goop.org
Acked-by: Zachary Amsden zams...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/pvclock-abi.h 
b/arch/x86/include/asm/pvclock-abi.h
index 6d93508..ec5c41a 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -29,7 +29,8 @@ struct pvclock_vcpu_time_info {
u64   system_time;
u32   tsc_to_system_mul;
s8tsc_shift;
-   u8pad[3];
+   u8flags;
+   u8pad[2];
 } __attribute__((__packed__)); /* 32 bytes */
 
 struct pvclock_wall_clock {
diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 53235fd..cd02f32 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -6,6 +6,7 @@
 
 /* some helper functions for xen and kvm pv clock sources */
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
+void pvclock_set_flags(u8 flags);
 unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
 void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
struct pvclock_vcpu_time_info *vcpu,
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 03801f2..f7fdd56 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -31,8 +31,16 @@ struct pvclock_shadow_time {
u32 tsc_to_nsec_mul;
int tsc_shift;
u32 version;
+   u8  flags;
 };
 
+static u8 valid_flags __read_mostly = 0;
+
+void pvclock_set_flags(u8 flags)
+{
+   valid_flags = flags;
+}
+
 /*
  * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
  * yielding a 64-bit result.
@@ -91,6 +99,7 @@ static unsigned pvclock_get_time_values(struct 
pvclock_shadow_time *dst,
dst-system_timestamp  = src-system_time;
dst-tsc_to_nsec_mul   = src-tsc_to_system_mul;
dst-tsc_shift = src-tsc_shift;
+   dst-flags = src-flags;
rmb();  /* test version after fetching data */
} while ((src-version  1) || (dst-version != src-version));
 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] don't compute pvclock adjustments if we trust the tsc

From: Glauber Costa glom...@redhat.com

If the HV told us we can fully trust the TSC, skip any
correction

Signed-off-by: Glauber Costa glom...@redhat.com
Acked-by: Zachary Amsden zams...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index f019f8c..05eba5e 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -21,6 +21,11 @@
  */
 #define KVM_FEATURE_CLOCKSOURCE23
 
+/* The last 8 bits are used to indicate how to interpret the flags field
+ * in pvclock structure. If no bits are set, all flags are ignored.
+ */
+#define KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24
+
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
 
diff --git a/arch/x86/include/asm/pvclock-abi.h 
b/arch/x86/include/asm/pvclock-abi.h
index ec5c41a..35f2d19 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -39,5 +39,6 @@ struct pvclock_wall_clock {
u32   nsec;
 } __attribute__((__packed__));
 
+#define PVCLOCK_TSC_STABLE_BIT (1  0)
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PVCLOCK_ABI_H */
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 59c740f..eb9b76c 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -215,4 +215,7 @@ void __init kvmclock_init(void)
clocksource_register(kvm_clock);
pv_info.paravirt_enabled = 1;
pv_info.name = KVM;
+
+   if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
+   pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
 }
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index f5bc40e..239427c 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -135,6 +135,10 @@ cycle_t pvclock_clocksource_read(struct 
pvclock_vcpu_time_info *src)
barrier();
} while (version != src-version);
 
+   if ((valid_flags  PVCLOCK_TSC_STABLE_BIT) 
+   (shadow.flags  PVCLOCK_TSC_STABLE_BIT))
+   return ret;
+
/*
 * Assumption here is that last_value, a global accumulator, always goes
 * forward. If we are less than that, we should not be much smaller.
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: VMX: VMXON/VMXOFF usage changes

From: Dongxiao Xu dongxiao...@intel.com

SDM suggests VMXON should be called before VMPTRLD, and VMXOFF
should be called after doing VMCLEAR.

Therefore in vmm coexistence case, we should firstly call VMXON
before any VMCS operation, and then call VMXOFF after the
operation is done.

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e262e66..f20ff50 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -176,6 +176,8 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
 
 static int init_rmode(struct kvm *kvm);
 static u64 construct_eptp(unsigned long root_hpa);
+static void kvm_cpu_vmxon(u64 addr);
+static void kvm_cpu_vmxoff(void);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -847,8 +849,11 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
u64 tsc_this, delta, new_offset;
+   u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
 
-   if (vmm_exclusive  vcpu-cpu != cpu)
+   if (!vmm_exclusive)
+   kvm_cpu_vmxon(phys_addr);
+   else if (vcpu-cpu != cpu)
vcpu_clear(vmx);
 
if (per_cpu(current_vmcs, cpu) != vmx-vmcs) {
@@ -894,8 +899,10 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
__vmx_load_host_state(to_vmx(vcpu));
-   if (!vmm_exclusive)
+   if (!vmm_exclusive) {
__vcpu_clear(to_vmx(vcpu));
+   kvm_cpu_vmxoff();
+   }
 }
 
 static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
@@ -1327,9 +1334,11 @@ static int hardware_enable(void *garbage)
wrmsrl(MSR_IA32_FEATURE_CONTROL, old | test_bits);
}
write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */
-   kvm_cpu_vmxon(phys_addr);
 
-   ept_sync_global();
+   if (vmm_exclusive) {
+   kvm_cpu_vmxon(phys_addr);
+   ept_sync_global();
+   }
 
return 0;
 }
@@ -1355,8 +1364,10 @@ static void kvm_cpu_vmxoff(void)
 
 static void hardware_disable(void *garbage)
 {
-   vmclear_local_vcpus();
-   kvm_cpu_vmxoff();
+   if (vmm_exclusive) {
+   vmclear_local_vcpus();
+   kvm_cpu_vmxoff();
+   }
write_cr4(read_cr4()  ~X86_CR4_VMXE);
 }
 
@@ -3995,6 +4006,19 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
kmem_cache_free(kvm_vcpu_cache, vmx);
 }
 
+static inline void vmcs_init(struct vmcs *vmcs)
+{
+   u64 phys_addr = __pa(per_cpu(vmxarea, raw_smp_processor_id()));
+
+   if (!vmm_exclusive)
+   kvm_cpu_vmxon(phys_addr);
+
+   vmcs_clear(vmcs);
+
+   if (!vmm_exclusive)
+   kvm_cpu_vmxoff();
+}
+
 static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 {
int err;
@@ -4020,7 +4044,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
if (!vmx-vmcs)
goto free_msrs;
 
-   vmcs_clear(vmx-vmcs);
+   vmcs_init(vmx-vmcs);
 
cpu = get_cpu();
vmx_vcpu_load(vmx-vcpu, cpu);
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] Add a global synchronization point for pvclock

From: Glauber Costa glom...@redhat.com

In recent stress tests, it was found that pvclock-based systems
could seriously warp in smp systems. Using ingo's time-warp-test.c,
I could trigger a scenario as bad as 1.5mi warps a minute in some systems.
(to be fair, it wasn't that bad in most of them). Investigating further, I
found out that such warps were caused by the very offset-based calculation
pvclock is based on.

This happens even on some machines that report constant_tsc in its tsc flags,
specially on multi-socket ones.

Two reads of the same kernel timestamp at approx the same time, will likely
have tsc timestamped in different occasions too. This means the delta we
calculate is unpredictable at best, and can probably be smaller in a cpu
that is legitimately reading clock in a forward ocasion.

Some adjustments on the host could make this window less likely to happen,
but still, it pretty much poses as an intrinsic problem of the mechanism.

A while ago, I though about using a shared variable anyway, to hold clock
last state, but gave up due to the high contention locking was likely
to introduce, possibly rendering the thing useless on big machines. I argue,
however, that locking is not necessary.

We do a read-and-return sequence in pvclock, and between read and return,
the global value can have changed. However, it can only have changed
by means of an addition of a positive value. So if we detected that our
clock timestamp is less than the current global, we know that we need to
return a higher one, even though it is not exactly the one we compared to.

OTOH, if we detect we're greater than the current time source, we atomically
replace the value with our new readings. This do causes contention on big
boxes (but big here means *BIG*), but it seems like a good trade off, since
it provide us with a time source guaranteed to be stable wrt time warps.

After this patch is applied, I don't see a single warp in time during 5 days
of execution, in any of the machines I saw them before.

Signed-off-by: Glauber Costa glom...@redhat.com
Acked-by: Zachary Amsden zams...@redhat.com
CC: Jeremy Fitzhardinge jer...@goop.org
CC: Avi Kivity a...@redhat.com
CC: Marcelo Tosatti mtosa...@redhat.com
CC: Zachary Amsden zams...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index f7fdd56..f5bc40e 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -118,11 +118,14 @@ unsigned long pvclock_tsc_khz(struct 
pvclock_vcpu_time_info *src)
return pv_tsc_khz;
 }
 
+static atomic64_t last_value = ATOMIC64_INIT(0);
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
struct pvclock_shadow_time shadow;
unsigned version;
cycle_t ret, offset;
+   u64 last;
 
do {
version = pvclock_get_time_values(shadow, src);
@@ -132,6 +135,27 @@ cycle_t pvclock_clocksource_read(struct 
pvclock_vcpu_time_info *src)
barrier();
} while (version != src-version);
 
+   /*
+* Assumption here is that last_value, a global accumulator, always goes
+* forward. If we are less than that, we should not be much smaller.
+* We assume there is an error marging we're inside, and then the 
correction
+* does not sacrifice accuracy.
+*
+* For reads: global may have changed between test and return,
+* but this means someone else updated poked the clock at a later time.
+* We just need to make sure we are not seeing a backwards event.
+*
+* For updates: last_value = ret is not enough, since two vcpus could be
+* updating at the same time, and one of them could be slightly behind,
+* making the assumption that last_value always go forward fail to hold.
+*/
+   last = atomic64_read(last_value);
+   do {
+   if (ret  last)
+   return last;
+   last = atomic64_cmpxchg(last_value, last, ret);
+   } while (unlikely(last != ret));
+
return ret;
 }
 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID

From: Glauber Costa glom...@redhat.com

Right now, we were using individual KVM_CAP entities to communicate
userspace about which cpuids we support. This is suboptimal, since it
generates a delay between the feature arriving in the host, and
being available at the guest.

A much better mechanism is to list para features in KVM_GET_SUPPORTED_CPUID.
This makes userspace automatically aware of what we provide. And if we
ever add a new cpuid bit in the future, we have to do that again,
which create some complexity and delay in feature adoption.

Signed-off-by: Glauber Costa glom...@redhat.com
Acked-by: Zachary Amsden zams...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 38363f8..9d6e7bf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1983,6 +1983,23 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, 
u32 function,
}
break;
}
+   case KVM_CPUID_SIGNATURE: {
+   char signature[12] = KVMKVMKVM\0\0;
+   u32 *sigptr = (u32 *)signature;
+   entry-eax = 0;
+   entry-ebx = sigptr[0];
+   entry-ecx = sigptr[1];
+   entry-edx = sigptr[2];
+   break;
+   }
+   case KVM_CPUID_FEATURES:
+   entry-eax = (1  KVM_FEATURE_CLOCKSOURCE) |
+(1  KVM_FEATURE_NOP_IO_DELAY) |
+(1  KVM_FEATURE_CLOCKSOURCE2);
+   entry-ebx = 0;
+   entry-ecx = 0;
+   entry-edx = 0;
+   break;
case 0x8000:
entry-eax = min(entry-eax, 0x801a);
break;
@@ -2029,6 +2046,23 @@ static int kvm_dev_ioctl_get_supported_cpuid(struct 
kvm_cpuid2 *cpuid,
for (func = 0x8001; func = limit  nent  cpuid-nent; ++func)
do_cpuid_ent(cpuid_entries[nent], func, 0,
 nent, cpuid-nent);
+
+   
+
+   r = -E2BIG;
+   if (nent = cpuid-nent)
+   goto out_free;
+
+   do_cpuid_ent(cpuid_entries[nent], KVM_CPUID_SIGNATURE, 0, nent,
+cpuid-nent);
+
+   r = -E2BIG;
+   if (nent = cpuid-nent)
+   goto out_free;
+
+   do_cpuid_ent(cpuid_entries[nent], KVM_CPUID_FEATURES, 0, nent,
+cpuid-nent);
+
r = -E2BIG;
if (nent = cpuid-nent)
goto out_free;
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: x86 emulator: Add missing decoder flags for sub instruction

From: Mohammed Gamal m.gamal...@gmail.com

This adds missing decoder flags for sub instructions (opcodes 0x2c - 0x2d)

Signed-off-by: Mohammed Gamal m.gamal...@gmail.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 35dd57c..1b974f8 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -126,7 +126,7 @@ static u32 opcode_table[256] = {
/* 0x28 - 0x2F */
ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
-   0, 0, 0, 0,
+   ByteOp | DstAcc | SrcImmByte, DstAcc | SrcImm, 0, 0,
/* 0x30 - 0x37 */
ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] Fix tip/x86/fpu merge

From: Marcelo Tosatti mtosa...@redhat.com

Code removed upstream sneaked in through the merge.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 83ccfdf..8bcc21f 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -47,7 +47,6 @@ int arch_dup_task_struct(struct task_struct *dst, struct 
task_struct *src)
 void free_thread_xstate(struct task_struct *tsk)
 {
fpu_free(tsk-thread.fpu);
-   WARN(tsk-thread.ds_ctx, leaking DS context\n);
 }
 
 void free_thread_info(struct thread_info *ti)
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: Add cpuid.txt file

From: Glauber Costa glom...@redhat.com

This file documents cpuid bits used by KVM.

Signed-off-by: Glauber Costa glom...@redhat.com
Acked-by: Zachary Amsden zams...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/Documentation/kvm/cpuid.txt b/Documentation/kvm/cpuid.txt
new file mode 100644
index 000..8167ef1
--- /dev/null
+++ b/Documentation/kvm/cpuid.txt
@@ -0,0 +1,44 @@
+KVM CPUID bits
+Glauber Costa glom...@redhat.com, Red Hat Inc, 2010
+=
+
+A guest running on a kvm host, can check some of its features using
+cpuid. This is not always guaranteed to work, since userspace can
+mask-out some, or even all KVM-related cpuid features before launching
+a guest.
+
+KVM cpuid functions are:
+
+function: KVM_CPUID_SIGNATURE (0x4000)
+returns : eax = 0,
+  ebx = 0x4b4d564b,
+  ecx = 0x564b4d56,
+  edx = 0x4d.
+Note that this value in ebx, ecx and edx corresponds to the string KVMKVMKVM.
+This function queries the presence of KVM cpuid leafs.
+
+
+function: define KVM_CPUID_FEATURES (0x4001)
+returns : ebx, ecx, edx = 0
+  eax = and OR'ed group of (1  flag), where each flags is:
+
+
+flag   || value || meaning
+=
+KVM_FEATURE_CLOCKSOURCE|| 0 || kvmclock available at msrs
+   ||   || 0x11 and 0x12. 
+--
+KVM_FEATURE_NOP_IO_DELAY   || 1 || not necessary to perform delays
+   ||   || on PIO operations.
+--
+KVM_FEATURE_MMU_OP || 2 || deprecated.
+--
+KVM_FEATURE_CLOCKSOURCE2   || 3 || kvmclock available at msrs
+   ||   || 0x4b564d00 and 0x4b564d01
+--
+KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side
+   ||   || per-cpu warps are expected in
+   ||   || kvmclock.
+--
+
+
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: VMX: Only reset MMU when necessary

From: Sheng Yang sh...@linux.intel.com

Only modifying some bits of CR0/CR4 needs paging mode switch.

Modify EFER.NXE bit would result in reserved bit updates.

Signed-off-by: Sheng Yang sh...@linux.intel.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b8e707f..48b0866 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -416,6 +416,10 @@ out:
 
 static int __kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
+   unsigned long old_cr0 = kvm_read_cr0(vcpu);
+   unsigned long update_bits = X86_CR0_PG | X86_CR0_WP |
+   X86_CR0_CD | X86_CR0_NW;
+
cr0 |= X86_CR0_ET;
 
 #ifdef CONFIG_X86_64
@@ -449,7 +453,8 @@ static int __kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned 
long cr0)
 
kvm_x86_ops-set_cr0(vcpu, cr0);
 
-   kvm_mmu_reset_context(vcpu);
+   if ((cr0 ^ old_cr0)  update_bits)
+   kvm_mmu_reset_context(vcpu);
return 0;
 }
 
@@ -487,7 +492,8 @@ int __kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 
kvm_x86_ops-set_cr4(vcpu, cr4);
 
-   kvm_mmu_reset_context(vcpu);
+   if ((cr4 ^ old_cr4)  pdptr_bits)
+   kvm_mmu_reset_context(vcpu);
 
return 0;
 }
@@ -693,6 +699,8 @@ static u32 emulated_msrs[] = {
 
 static int set_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
+   u64 old_efer = vcpu-arch.efer;
+
if (efer  efer_reserved_bits)
return 1;
 
@@ -724,6 +732,10 @@ static int set_efer(struct kvm_vcpu *vcpu, u64 efer)
vcpu-arch.mmu.base_role.nxe = (efer  EFER_NX)  !tdp_enabled;
kvm_mmu_reset_context(vcpu);
 
+   /* Update reserved bits */
+   if ((efer ^ old_efer)  EFER_NX)
+   kvm_mmu_reset_context(vcpu);
+
return 0;
 }
 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: x86 emulator: Add test acc, imm instruction (opcodes 0xA8 - 0xA9)

From: Mohammed Gamal m.gamal...@gmail.com

This adds test acc, imm instruction to the x86 emulator

Signed-off-by: Mohammed Gamal m.gamal...@gmail.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index b43ac98..35dd57c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -181,7 +181,7 @@ static u32 opcode_table[256] = {
ByteOp | SrcSI | DstDI | Mov | String, SrcSI | DstDI | Mov | String,
ByteOp | SrcSI | DstDI | String, SrcSI | DstDI | String,
/* 0xA8 - 0xAF */
-   0, 0, ByteOp | DstDI | Mov | String, DstDI | Mov | String,
+   DstAcc | SrcImmByte | ByteOp, DstAcc | SrcImm, ByteOp | DstDI | Mov | 
String, DstDI | Mov | String,
ByteOp | SrcSI | DstAcc | Mov | String, SrcSI | DstAcc | Mov | String,
ByteOp | DstDI | String, DstDI | String,
/* 0xB0 - 0xB7 */
@@ -2754,6 +2754,7 @@ special_insn:
}
break;
case 0x84 ... 0x85:
+   test:
emulate_2op_SrcV(test, c-src, c-dst, ctxt-eflags);
break;
case 0x86 ... 0x87: /* xchg */
@@ -2852,6 +2853,8 @@ special_insn:
c-dst.type = OP_NONE; /* Disable writeback. */
DPRINTF(cmps: mem1=0x%p mem2=0x%p\n, c-src.ptr, c-dst.ptr);
goto cmp;
+   case 0xa8 ... 0xa9: /* test ax, imm */
+   goto test;
case 0xaa ... 0xab: /* stos */
c-dst.val = c-regs[VCPU_REGS_RAX];
break;
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: Don't allow lmsw to clear cr0.pe

From: Avi Kivity a...@redhat.com

The current lmsw implementation allows the guest to clear cr0.pe, contrary
to the manual, which breaks EMM386.EXE.

Fix by ORing the old cr0.pe with lmsw's operand.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4b1433f..37164e7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -462,7 +462,7 @@ EXPORT_SYMBOL_GPL(kvm_set_cr0);
 
 void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
 {
-   kvm_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~0x0ful) | (msw  0x0f));
+   kvm_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~0x0eul) | (msw  0x0f));
 }
 EXPORT_SYMBOL_GPL(kvm_lmsw);
 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: MMU: use proper cache object freeing function

From: Xiao Guangrong xiaoguangr...@cn.fujitsu.com

Use kmem_cache_free to free objects allocated by kmem_cache_alloc.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9db33f1..70566d2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -304,10 +304,11 @@ static int mmu_topup_memory_cache(struct 
kvm_mmu_memory_cache *cache,
return 0;
 }
 
-static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc,
+ struct kmem_cache *cache)
 {
while (mc-nobjs)
-   kfree(mc-objects[--mc-nobjs]);
+   kmem_cache_free(cache, mc-objects[--mc-nobjs]);
 }
 
 static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
@@ -355,10 +356,11 @@ out:
 
 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 {
-   mmu_free_memory_cache(vcpu-arch.mmu_pte_chain_cache);
-   mmu_free_memory_cache(vcpu-arch.mmu_rmap_desc_cache);
+   mmu_free_memory_cache(vcpu-arch.mmu_pte_chain_cache, pte_chain_cache);
+   mmu_free_memory_cache(vcpu-arch.mmu_rmap_desc_cache, rmap_desc_cache);
mmu_free_memory_cache_page(vcpu-arch.mmu_page_cache);
-   mmu_free_memory_cache(vcpu-arch.mmu_page_header_cache);
+   mmu_free_memory_cache(vcpu-arch.mmu_page_header_cache,
+   mmu_page_header_cache);
 }
 
 static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc,
@@ -379,7 +381,7 @@ static struct kvm_pte_chain *mmu_alloc_pte_chain(struct 
kvm_vcpu *vcpu)
 
 static void mmu_free_pte_chain(struct kvm_pte_chain *pc)
 {
-   kfree(pc);
+   kmem_cache_free(pte_chain_cache, pc);
 }
 
 static struct kvm_rmap_desc *mmu_alloc_rmap_desc(struct kvm_vcpu *vcpu)
@@ -390,7 +392,7 @@ static struct kvm_rmap_desc *mmu_alloc_rmap_desc(struct 
kvm_vcpu *vcpu)
 
 static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd)
 {
-   kfree(rd);
+   kmem_cache_free(rmap_desc_cache, rd);
 }
 
 /*
@@ -897,7 +899,7 @@ static void kvm_mmu_free_page(struct kvm *kvm, struct 
kvm_mmu_page *sp)
list_del(sp-link);
__free_page(virt_to_page(sp-spt));
__free_page(virt_to_page(sp-gfns));
-   kfree(sp);
+   kmem_cache_free(mmu_page_header_cache, sp);
++kvm-arch.n_free_mmu_pages;
 }
 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: MMU: Segregate shadow pages with different cr0.wp

From: Avi Kivity a...@redhat.com

When cr0.wp=0, we may shadow a gpte having u/s=1 and r/w=0 with an spte
having u/s=0 and r/w=1.  This allows excessive access if the guest sets
cr0.wp=1 and accesses through this spte.

Fix by making cr0.wp part of the base role; we'll have different sptes for
the two cases and the problem disappears.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt
index fde7989..0e872ae 100644
--- a/Documentation/kvm/mmu.txt
+++ b/Documentation/kvm/mmu.txt
@@ -163,6 +163,8 @@ Shadow pages contain the following information:
 32-bit or 64-bit gptes are in use).
   role.nxe:
 Contains the value of efer.nxe for which the page is valid.
+  role.cr0_wp:
+Contains the value of cr0.wp for which the page is valid.
   gfn:
 Either the guest page table containing the translations shadowed by this
 page, or the base page frame for linear translations.  See role.direct.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5aa0944..0c06148 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -179,6 +179,7 @@ union kvm_mmu_page_role {
unsigned access:3;
unsigned invalid:1;
unsigned nxe:1;
+   unsigned cr0_wp:1;
};
 };
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f0ab86a..9db33f1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -217,7 +217,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
 
-static int is_write_protection(struct kvm_vcpu *vcpu)
+static bool is_write_protection(struct kvm_vcpu *vcpu)
 {
return kvm_read_cr0_bits(vcpu, X86_CR0_WP);
 }
@@ -2435,6 +2435,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
r = paging32_init_context(vcpu);
 
vcpu-arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
+   vcpu-arch.mmu.base_role.cr0_wp = is_write_protection(vcpu);
 
return r;
 }
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: MMU: unalias gfn before sp-gfns[] comparison in sync_page

From: Xiao Guangrong xiaoguangr...@cn.fujitsu.com

sp-gfns[] contain unaliased gfns, but gpte might contain pointer
to aliased region.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 11d8a16..71c73fe 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -588,7 +588,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *sp)
unsigned pte_access;
pt_element_t gpte;
gpa_t pte_gpa;
-   gfn_t gfn = sp-gfns[i];
+   gfn_t gfn;
 
if (!is_shadow_present_pte(sp-spt[i]))
continue;
@@ -599,8 +599,9 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *sp)
  sizeof(pt_element_t)))
return -EINVAL;
 
-   if (gpte_to_gfn(gpte) != gfn || !is_present_gpte(gpte) ||
-   !(gpte  PT_ACCESSED_MASK)) {
+   gfn = gpte_to_gfn(gpte);
+   if (unalias_gfn(vcpu-kvm, gfn) != sp-gfns[i] ||
+ !is_present_gpte(gpte) || !(gpte  PT_ACCESSED_MASK)) {
u64 nonpresent;
 
rmap_remove(vcpu-kvm, sp-spt[i]);
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: x86: Clean up duplicate assignment

From: Sheng Yang sh...@linux.intel.com

mmu.free() already set root_hpa to INVALID_PAGE, no need to do it again in the
destory_kvm_mmu().

kvm_x86_ops-set_cr4() and set_efer() already assign cr4/efer to
vcpu-arch.cr4/efer, no need to do it again later.

Signed-off-by: Sheng Yang sh...@linux.intel.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fd2c8f4..f0ab86a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2452,10 +2452,9 @@ static int init_kvm_mmu(struct kvm_vcpu *vcpu)
 static void destroy_kvm_mmu(struct kvm_vcpu *vcpu)
 {
ASSERT(vcpu);
-   if (VALID_PAGE(vcpu-arch.mmu.root_hpa)) {
+   if (VALID_PAGE(vcpu-arch.mmu.root_hpa))
+   /* mmu.free() should set root_hpa = INVALID_PAGE */
vcpu-arch.mmu.free(vcpu);
-   vcpu-arch.mmu.root_hpa = INVALID_PAGE;
-   }
 }
 
 int kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 95dc5f3..b8e707f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -486,7 +486,7 @@ int __kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
return 1;
 
kvm_x86_ops-set_cr4(vcpu, cr4);
-   vcpu-arch.cr4 = cr4;
+
kvm_mmu_reset_context(vcpu);
 
return 0;
@@ -721,8 +721,6 @@ static int set_efer(struct kvm_vcpu *vcpu, u64 efer)
 
kvm_x86_ops-set_efer(vcpu, efer);
 
-   vcpu-arch.efer = efer;
-
vcpu-arch.mmu.base_role.nxe = (efer  EFER_NX)  !tdp_enabled;
kvm_mmu_reset_context(vcpu);
 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls

From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 11f226f..b998abf 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1110,6 +1110,8 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
int i;
 
+   vcpu_load(vcpu);
+
sregs-pvr = vcpu-arch.pvr;
 
sregs-u.s.sdr1 = to_book3s(vcpu)-sdr1;
@@ -1128,6 +1130,9 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
sregs-u.s.ppc32.dbat[i] = vcpu3s-dbat[i].raw;
}
}
+
+   vcpu_put(vcpu);
+
return 0;
 }
 
@@ -1137,6 +1142,8 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
int i;
 
+   vcpu_load(vcpu);
+
kvmppc_set_pvr(vcpu, sregs-pvr);
 
vcpu3s-sdr1 = sregs-u.s.sdr1;
@@ -1163,6 +1170,9 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 
/* Flush the MMU after messing with the segments */
kvmppc_mmu_pte_flush(vcpu, 0, 0);
+
+   vcpu_put(vcpu);
+
return 0;
 }
 
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index c922240..a33ab8c 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -485,6 +485,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, 
struct kvm_regs *regs)
 {
int i;
 
+   vcpu_load(vcpu);
+
regs-pc = vcpu-arch.pc;
regs-cr = kvmppc_get_cr(vcpu);
regs-ctr = vcpu-arch.ctr;
@@ -505,6 +507,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, 
struct kvm_regs *regs)
for (i = 0; i  ARRAY_SIZE(regs-gpr); i++)
regs-gpr[i] = kvmppc_get_gpr(vcpu, i);
 
+   vcpu_put(vcpu);
+
return 0;
 }
 
@@ -512,6 +516,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, 
struct kvm_regs *regs)
 {
int i;
 
+   vcpu_load(vcpu);
+
vcpu-arch.pc = regs-pc;
kvmppc_set_cr(vcpu, regs-cr);
vcpu-arch.ctr = regs-ctr;
@@ -531,6 +537,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, 
struct kvm_regs *regs)
for (i = 0; i  ARRAY_SIZE(regs-gpr); i++)
kvmppc_set_gpr(vcpu, i, regs-gpr[i]);
 
+   vcpu_put(vcpu);
+
return 0;
 }
 
@@ -559,7 +567,12 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, 
struct kvm_fpu *fpu)
 int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
   struct kvm_translation *tr)
 {
-   return kvmppc_core_vcpu_translate(vcpu, tr);
+   int r;
+
+   vcpu_load(vcpu);
+   r = kvmppc_core_vcpu_translate(vcpu, tr);
+   vcpu_put(vcpu);
+   return r;
 }
 
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: s390: Centrally lock arch specific vcpu ioctls

From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index e80f55e..28cd8fd 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -363,9 +363,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 
 static int kvm_arch_vcpu_ioctl_initial_reset(struct kvm_vcpu *vcpu)
 {
-   vcpu_load(vcpu);
kvm_s390_vcpu_initial_reset(vcpu);
-   vcpu_put(vcpu);
return 0;
 }
 
@@ -415,14 +413,12 @@ static int kvm_arch_vcpu_ioctl_set_initial_psw(struct 
kvm_vcpu *vcpu, psw_t psw)
 {
int rc = 0;
 
-   vcpu_load(vcpu);
if (atomic_read(vcpu-arch.sie_block-cpuflags)  CPUSTAT_RUNNING)
rc = -EBUSY;
else {
vcpu-run-psw_mask = psw.mask;
vcpu-run-psw_addr = psw.addr;
}
-   vcpu_put(vcpu);
return rc;
 }
 
@@ -573,7 +569,7 @@ static int __guestcopy(struct kvm_vcpu *vcpu, u64 
guestdest, const void *from,
  * KVM_S390_STORE_STATUS_NOADDR: - 0x1200 on 64 bit
  * KVM_S390_STORE_STATUS_PREFIXED: - prefix
  */
-int __kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr)
+static int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long 
addr)
 {
const unsigned char archmode = 1;
int prefix;
@@ -635,45 +631,43 @@ int __kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, 
unsigned long addr)
return 0;
 }
 
-static int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long 
addr)
-{
-   int rc;
-
-   vcpu_load(vcpu);
-   rc = __kvm_s390_vcpu_store_status(vcpu, addr);
-   vcpu_put(vcpu);
-   return rc;
-}
-
 long kvm_arch_vcpu_ioctl(struct file *filp,
 unsigned int ioctl, unsigned long arg)
 {
struct kvm_vcpu *vcpu = filp-private_data;
void __user *argp = (void __user *)arg;
+   long r;
 
-   switch (ioctl) {
-   case KVM_S390_INTERRUPT: {
+   if (ioctl == KVM_S390_INTERRUPT) {
struct kvm_s390_interrupt s390int;
 
if (copy_from_user(s390int, argp, sizeof(s390int)))
return -EFAULT;
return kvm_s390_inject_vcpu(vcpu, s390int);
}
+
+   vcpu_load(vcpu);
+   switch (ioctl) {
case KVM_S390_STORE_STATUS:
-   return kvm_s390_vcpu_store_status(vcpu, arg);
+   r = kvm_s390_vcpu_store_status(vcpu, arg);
+   break;
case KVM_S390_SET_INITIAL_PSW: {
psw_t psw;
 
+   r = -EFAULT;
if (copy_from_user(psw, argp, sizeof(psw)))
-   return -EFAULT;
-   return kvm_arch_vcpu_ioctl_set_initial_psw(vcpu, psw);
+   break;
+   r = kvm_arch_vcpu_ioctl_set_initial_psw(vcpu, psw);
+   break;
}
case KVM_S390_INITIAL_RESET:
-   return kvm_arch_vcpu_ioctl_initial_reset(vcpu);
+   r = kvm_arch_vcpu_ioctl_initial_reset(vcpu);
+   break;
default:
-   ;
+   r = -EINVAL;
}
-   return -EINVAL;
+   vcpu_put(vcpu);
+   return r;
 }
 
 /* Section: memory related */
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: x86: Add missing locking to arch specific vcpu ioctls

From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bfe0730..7167109 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1854,6 +1854,7 @@ static int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu 
*vcpu,
 {
int r;
 
+   vcpu_load(vcpu);
r = -E2BIG;
if (cpuid-nent  vcpu-arch.cpuid_nent)
goto out;
@@ -1865,6 +1866,7 @@ static int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu 
*vcpu,
 
 out:
cpuid-nent = vcpu-arch.cpuid_nent;
+   vcpu_put(vcpu);
return r;
 }
 
@@ -2155,6 +2157,7 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu 
*vcpu,
int r;
unsigned bank_num = mcg_cap  0xff, bank;
 
+   vcpu_load(vcpu);
r = -EINVAL;
if (!bank_num || bank_num = KVM_MAX_MCE_BANKS)
goto out;
@@ -2169,6 +2172,7 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu 
*vcpu,
for (bank = 0; bank  bank_num; bank++)
vcpu-arch.mce_banks[bank*4] = ~(u64)0;
 out:
+   vcpu_put(vcpu);
return r;
 }
 
@@ -2477,7 +2481,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(mce, argp, sizeof mce))
goto out;
+   vcpu_load(vcpu);
r = kvm_vcpu_ioctl_x86_set_mce(vcpu, mce);
+   vcpu_put(vcpu);
break;
}
case KVM_GET_VCPU_EVENTS: {
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: pass correct parameter to kvm_mmu_free_some_pages

From: Marcelo Tosatti mtosa...@redhat.com

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 604eb3f..fd2c8f4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2067,7 +2067,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
root_gfn = 0;
}
spin_lock(vcpu-kvm-mmu_lock);
-   kvm_mmu_free_some_pages(vcpu-kvm);
+   kvm_mmu_free_some_pages(vcpu);
sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
  PT64_ROOT_LEVEL, direct,
  ACC_ALL, NULL);
@@ -2098,7 +2098,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
root_gfn = i  30;
}
spin_lock(vcpu-kvm-mmu_lock);
-   kvm_mmu_free_some_pages(vcpu-kvm);
+   kvm_mmu_free_some_pages(vcpu);
sp = kvm_mmu_get_page(vcpu, root_gfn, i  30,
  PT32_ROOT_LEVEL, direct,
  ACC_ALL, NULL);
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: x86: Check LMA bit before set_efer

From: Sheng Yang sh...@linux.intel.com

kvm_x86_ops-set_efer() would execute vcpu-arch.efer = efer, so the
checking of LMA bit didn't work.

Signed-off-by: Sheng Yang sh...@linux.intel.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 37164e7..95dc5f3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -716,11 +716,11 @@ static int set_efer(struct kvm_vcpu *vcpu, u64 efer)
return 1;
}
 
-   kvm_x86_ops-set_efer(vcpu, efer);
-
efer = ~EFER_LMA;
efer |= vcpu-arch.efer  EFER_LMA;
 
+   kvm_x86_ops-set_efer(vcpu, efer);
+
vcpu-arch.efer = efer;
 
vcpu-arch.mmu.base_role.nxe = (efer  EFER_NX)  !tdp_enabled;
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: x86: Lock arch specific vcpu ioctls centrally

From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 75a6e8a..ce4e943 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1541,16 +1541,12 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct 
kvm_msrs *msrs,
 {
int i, idx;
 
-   vcpu_load(vcpu);
-
idx = srcu_read_lock(vcpu-kvm-srcu);
for (i = 0; i  msrs-nmsrs; ++i)
if (do_msr(vcpu, entries[i].index, entries[i].data))
break;
srcu_read_unlock(vcpu-kvm-srcu, idx);
 
-   vcpu_put(vcpu);
-
return i;
 }
 
@@ -1798,7 +1794,6 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
if (copy_from_user(cpuid_entries, entries,
   cpuid-nent * sizeof(struct kvm_cpuid_entry)))
goto out_free;
-   vcpu_load(vcpu);
for (i = 0; i  cpuid-nent; i++) {
vcpu-arch.cpuid_entries[i].function = 
cpuid_entries[i].function;
vcpu-arch.cpuid_entries[i].eax = cpuid_entries[i].eax;
@@ -1816,7 +1811,6 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
r = 0;
kvm_apic_set_version(vcpu);
kvm_x86_ops-cpuid_update(vcpu);
-   vcpu_put(vcpu);
 
 out_free:
vfree(cpuid_entries);
@@ -1837,11 +1831,9 @@ static int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu 
*vcpu,
if (copy_from_user(vcpu-arch.cpuid_entries, entries,
   cpuid-nent * sizeof(struct kvm_cpuid_entry2)))
goto out;
-   vcpu_load(vcpu);
vcpu-arch.cpuid_nent = cpuid-nent;
kvm_apic_set_version(vcpu);
kvm_x86_ops-cpuid_update(vcpu);
-   vcpu_put(vcpu);
return 0;
 
 out:
@@ -1854,7 +1846,6 @@ static int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu 
*vcpu,
 {
int r;
 
-   vcpu_load(vcpu);
r = -E2BIG;
if (cpuid-nent  vcpu-arch.cpuid_nent)
goto out;
@@ -1866,7 +1857,6 @@ static int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu 
*vcpu,
 
 out:
cpuid-nent = vcpu-arch.cpuid_nent;
-   vcpu_put(vcpu);
return r;
 }
 
@@ -2098,9 +2088,7 @@ out:
 static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
struct kvm_lapic_state *s)
 {
-   vcpu_load(vcpu);
memcpy(s-regs, vcpu-arch.apic-regs, sizeof *s);
-   vcpu_put(vcpu);
 
return 0;
 }
@@ -2108,11 +2096,9 @@ static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu 
*vcpu,
 static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
struct kvm_lapic_state *s)
 {
-   vcpu_load(vcpu);
memcpy(vcpu-arch.apic-regs, s-regs, sizeof *s);
kvm_apic_post_state_restore(vcpu);
update_cr8_intercept(vcpu);
-   vcpu_put(vcpu);
 
return 0;
 }
@@ -2124,20 +2110,15 @@ static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu 
*vcpu,
return -EINVAL;
if (irqchip_in_kernel(vcpu-kvm))
return -ENXIO;
-   vcpu_load(vcpu);
 
kvm_queue_interrupt(vcpu, irq-irq, false);
 
-   vcpu_put(vcpu);
-
return 0;
 }
 
 static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu)
 {
-   vcpu_load(vcpu);
kvm_inject_nmi(vcpu);
-   vcpu_put(vcpu);
 
return 0;
 }
@@ -2157,7 +2138,6 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu 
*vcpu,
int r;
unsigned bank_num = mcg_cap  0xff, bank;
 
-   vcpu_load(vcpu);
r = -EINVAL;
if (!bank_num || bank_num = KVM_MAX_MCE_BANKS)
goto out;
@@ -2172,7 +2152,6 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu 
*vcpu,
for (bank = 0; bank  bank_num; bank++)
vcpu-arch.mce_banks[bank*4] = ~(u64)0;
 out:
-   vcpu_put(vcpu);
return r;
 }
 
@@ -2230,8 +2209,6 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu 
*vcpu,
 static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
   struct kvm_vcpu_events *events)
 {
-   vcpu_load(vcpu);
-
events-exception.injected =
vcpu-arch.exception.pending 
!kvm_exception_is_soft(vcpu-arch.exception.nr);
@@ -2256,8 +2233,6 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct 
kvm_vcpu *vcpu,
events-flags = (KVM_VCPUEVENT_VALID_NMI_PENDING
 | KVM_VCPUEVENT_VALID_SIPI_VECTOR
 | KVM_VCPUEVENT_VALID_SHADOW);
-
-   vcpu_put(vcpu);
 }
 
 static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
@@ -2268,8 +2243,6 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct 
kvm_vcpu *vcpu,
  | KVM_VCPUEVENT_VALID_SHADOW))
return -EINVAL;
 
-   vcpu_load(vcpu);
-
vcpu-arch.exception.pending = events-exception.injected;
vcpu-arch.exception.nr = events-exception.nr;

[COMMIT master] KVM: remove CAP_SYS_RAWIO requirement from kvm_vm_ioctl_assign_irq

From: Alex Williamson alex.william...@redhat.com

Remove this check in an effort to allow kvm guests to run without
root privileges.  This capability check doesn't seem to add any
security since the device needs to have already been added via the
assign device ioctl and the io actually occurs through the pci
sysfs interface.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 4d10b1e..64672e2 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -448,9 +448,6 @@ static int kvm_vm_ioctl_assign_irq(struct kvm *kvm,
struct kvm_assigned_dev_kernel *match;
unsigned long host_irq_type, guest_irq_type;
 
-   if (!capable(CAP_SYS_RAWIO))
-   return -EPERM;
-
if (!irqchip_in_kernel(kvm))
return r;
 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: x86: cleanup unused local variable

From: Xiao Guangrong xiaoguangr...@cn.fujitsu.com

fix:
 arch/x86/kvm/x86.c: In function ‘handle_emulation_failure’:
 arch/x86/kvm/x86.c:3844: warning: unused variable ‘ctxt’

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 48b0866..bfe0730 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3851,8 +3851,6 @@ static void inject_emulated_exception(struct kvm_vcpu 
*vcpu)
 
 static int handle_emulation_failure(struct kvm_vcpu *vcpu)
 {
-   struct x86_emulate_ctxt *ctxt = vcpu-arch.emulate_ctxt;
-
++vcpu-stat.insn_emulation_fail;
trace_kvm_emulate_insn_failed(vcpu);
vcpu-run-exit_reason = KVM_EXIT_INTERNAL_ERROR;
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: Consolidate arch specific vcpu ioctl locking

From: Avi Kivity a...@redhat.com

Now that all arch specific ioctls have centralized locking, it is easy to
move it to the central dispatcher.

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index caeed7b..a1d8750 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -512,17 +512,16 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
void __user *argp = (void __user *)arg;
long r;
 
-   if (ioctl == KVM_INTERRUPT) {
+   switch (ioctl) {
+   case KVM_INTERRUPT: {
struct kvm_interrupt irq;
r = -EFAULT;
if (copy_from_user(irq, argp, sizeof(irq)))
-   goto out_nolock;
+   goto out;
r = kvm_vcpu_ioctl_interrupt(vcpu, irq);
-   goto out_nolock;
+   goto out;
}
 
-   vcpu_load(vcpu);
-   switch (ioctl) {
case KVM_ENABLE_CAP:
{
struct kvm_enable_cap cap;
@@ -537,8 +536,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
}
 
 out:
-   vcpu_put(vcpu);
-out_nolock:
return r;
 }
 
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 28cd8fd..fad1024 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -638,16 +638,16 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
void __user *argp = (void __user *)arg;
long r;
 
-   if (ioctl == KVM_S390_INTERRUPT) {
+   switch (ioctl) {
+   case KVM_S390_INTERRUPT: {
struct kvm_s390_interrupt s390int;
 
+   r = -EFAULT;
if (copy_from_user(s390int, argp, sizeof(s390int)))
-   return -EFAULT;
-   return kvm_s390_inject_vcpu(vcpu, s390int);
+   break;
+   r = kvm_s390_inject_vcpu(vcpu, s390int);
+   break;
}
-
-   vcpu_load(vcpu);
-   switch (ioctl) {
case KVM_S390_STORE_STATUS:
r = kvm_s390_vcpu_store_status(vcpu, arg);
break;
@@ -666,7 +666,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
default:
r = -EINVAL;
}
-   vcpu_put(vcpu);
return r;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ce4e943..7500cba 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2298,7 +2298,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
int r;
struct kvm_lapic_state *lapic = NULL;
 
-   vcpu_load(vcpu);
switch (ioctl) {
case KVM_GET_LAPIC: {
r = -EINVAL;
@@ -2496,7 +2495,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = -EINVAL;
}
 out:
-   vcpu_put(vcpu);
kfree(lapic);
return r;
 }
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 08b2ccd..5ee558c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1564,9 +1564,7 @@ out_free2:
break;
}
default:
-   vcpu_put(vcpu);
r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
-   vcpu_load(vcpu);
}
 out:
vcpu_put(vcpu);
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/7] Consolidate vcpu ioctl locking


On 05/16/2010 04:00 AM, Alexander Graf wrote:

On 15.05.2010, at 19:30, Avi Kivity wrote:

   

On 05/15/2010 11:26 AM, Alexander Graf wrote:
 
   

That means you never inject an interrupt from the iothread (or from a different 
vcpu thread)?

If that's the case we might make it part of the API and require the ioctl to be 
issued from the vcpu thread.  We'd still be left with the s390 exception.

 

Well I'm not sure that's guaranteed for MOL or Dolphin, but I guess the user 
base is small enough to ignore them.

Either way, I'm actually rather unhappy with the way interrupts work right now. 
We're only injecting interrupts when in the main loop, which is rare if we did 
our homework right. So what I'd really like to see is that the MPIC on ppc 
directly invokes KVM_INTERRUPT to pull (and losen) the interrupt line. That way 
we can't just accidently miss interrupts.

   

on x86 we signal the vcpu thread to pull it out of the main loop and poll the 
apic.
 

Hrm, makes sense. Though it's additional overhead of a task switch. Why take 
the burden if you don't have to?
   


That's what the world looked like in 2006.

We could change it, but there's not much point, since having the local 
apic in the kernel is pretty much a requirement for reasonable performance.



That's exactly why x86 has run-request_interrupt_window, 
run-ready_for_interrupt_injection, and run-if_flag.
 

So how does x86 userspace get notified when it has an interrupt pending but 
couldn't inject it? Without a notification, we delay interrupts by quite some 
time.
   


run-ready_for_interrupt_injection and run-request_irq_window.


-   we need to exit to userspace to realize that the interrupt is still active

This is fundamentally broken. What I'd like to see is:

* device in userspace wants to trigger an interrupt
* mpic pulls external interrupt line, automatically issues ioctl(KVM_INTERRUPT)
* we enter the guest
* we inject the external interrupt
* guest gets the interrupt, sees it's lazily disabled, sets msr.ee=0
* guest moves on, sets msr.ee=1 again later
* we inject the external interrupt again, since it's still active
* guest acknowledges the interrupt with the mpic, issues ioctl(KVM_INTERRUPT, 
disable)
-   all is great

   

This is similar to KVM_IRQ_LINE.
 

Well, KVM_IRQ_LINE would be the in-kernel pic, no? That'd mean I'd have to 
reimplement the mpic in kernel space - which might eventually be a good idea 
when going for SMP, but I'd first like to see if I can keep the current 
interrupt injection path efficient.
   


Sure.


For that to work we need to enable triggering KVM_INTERRUPT from a non-vcpu 
thread.

   

KVM_IRQ_LINE is a vm ioctl instead of a vcpu ioctl.

The motivation for the strict 'issue vcpu ioctls from vcpu thread' rule is to 
prepare the way for a syscall (instead of ioctl) API.  Then we lost the fd 
argument, and choosing the vcpu is done by associating it with the current task.  
That allows us to get rid of vcpu-mutex and fget_light() (as well as the ioctl 
dispatch).
 

If we define the API to only work on the current vcpu with a few excetions 
where we're safe (KVM_INTERRUPT), that'd get rid of the mutex too, no?
   


Yes.  Need to document it though.


What does fget_light do?
   


Make sure that the vcpu fd doesn't disappear.


And does the ioctl dispatch cost that much? I like the flexibility of it to be 
honest.
   


Not much, which is why there's no movement in that direction.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [SeaBIOS] [PATCHv2] Support for booting from virtio disks

2010-05-16 Thread Gleb Natapov

On Thu, May 13, 2010 at 07:49:40PM +0300, Avi Kivity wrote:
 On 05/10/2010 06:58 PM, Anthony Liguori wrote:
 Isn't this problem unrelated to this patch?  I mean if I start qemu with
 two ide devices can I specify from qemu command line which one I want to
 boot from?
 
 That's sort of what I'm asking.  If you compare this approach to
 extboot, extboot provided a capability to select a disk.  I think
 it can be argued though that this isn't a necessary feature to
 carry over and I'm looking for additional opinions on that.
 
 I'd say it's a necessary feature, but not one to carry over from the
 extboot implementation.  We have the seabios boot menu (how to reach
 it?), we need to store the nvram persistently,  and we need to
 extend the selection menu to qemu, but that's unrelated to this
 patch.
 
To reach seabios boot menu run qemu with -boot menu=on option and press
f12 when prompted.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/7] Consolidate vcpu ioctl locking


On 16.05.2010, at 10:23, Avi Kivity wrote:

 On 05/16/2010 04:00 AM, Alexander Graf wrote:
 On 15.05.2010, at 19:30, Avi Kivity wrote:
 
   
 On 05/15/2010 11:26 AM, Alexander Graf wrote:
 
   
 That means you never inject an interrupt from the iothread (or from a 
 different vcpu thread)?
 
 If that's the case we might make it part of the API and require the ioctl 
 to be issued from the vcpu thread.  We'd still be left with the s390 
 exception.
 
 
 Well I'm not sure that's guaranteed for MOL or Dolphin, but I guess the 
 user base is small enough to ignore them.
 
 Either way, I'm actually rather unhappy with the way interrupts work right 
 now. We're only injecting interrupts when in the main loop, which is rare 
 if we did our homework right. So what I'd really like to see is that the 
 MPIC on ppc directly invokes KVM_INTERRUPT to pull (and losen) the 
 interrupt line. That way we can't just accidently miss interrupts.
 
   
 on x86 we signal the vcpu thread to pull it out of the main loop and poll 
 the apic.
 
 Hrm, makes sense. Though it's additional overhead of a task switch. Why take 
 the burden if you don't have to?
   
 
 That's what the world looked like in 2006.
 
 We could change it, but there's not much point, since having the local apic 
 in the kernel is pretty much a requirement for reasonable performance.

Well, I'm not convinced yet that's the case for PPC as well. The timer is 
in-cpu anyways and I don't see why IPIs should be slow with a userspace pic - 
if we keep the overhead low.

So let me think this through. With remote interrupt injection we have.

* thread 1 does vcpu_run
* thread 2 triggers KVM_INTERRUPT on fd
* thread 2 signals thread 1 so we're sure the interrupt gets injected
* thread 1 exits into qemu
* thread 1 goes back into the vcpu, triggering an interrupt

Without we have:

* thread 1 does vcpu_run
* thread 2 wants to trigger an an interrupt, sets the qemu internal bit
* thread 2 signals thread 1 so we're sure the interrupt gets processed
* thread 1 exits into qemu
* thread 1 triggers KVM_INTERRUPT on fd
* thread 1 goes into the vcpu

So we don't really buy anything from doing the remote injection. Hrm.

What's somewhat striking me here though is - why do we need KVM_INTERRUPT when 
there's all those kvm_run fields? Can't we just do interrupt injection by 
setting run-trigger_interrupt? There's only a single interrupt line on the 
CPU anyways. That way we'd save the ioctl and get rid of the locking problem 
altogether.

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/7] Consolidate vcpu ioctl locking

On 05/16/2010 12:01 PM, Alexander Graf wrote:

That's what the world looked like in 2006.

We could change it, but there's not much point, since having the local apic in
the kernel is pretty much a requirement for reasonable performance.

Well, I'm not convinced yet that's the case for PPC as well. The timer is
in-cpu anyways and I don't see why IPIs should be slow with a userspace pic -
if we keep the overhead low.

If it's at all possible keep the mpic out. I am _not_ advocating
pushing ppc's mpic into the kernel.

So let me think this through. With remote interrupt injection we have.

* thread 1 does vcpu_run
* thread 2 triggers KVM_INTERRUPT on fd
* thread 2 signals thread 1 so we're sure the interrupt gets injected
* thread 1 exits into qemu

This doesn't seem necessary. The kernel can own the interrupt line, so
it remembers it from the last KVM_INTERRUPT.

* thread 1 goes back into the vcpu, triggering an interrupt

Without we have:

* thread 1 does vcpu_run
* thread 2 wants to trigger an an interrupt, sets the qemu internal bit
* thread 2 signals thread 1 so we're sure the interrupt gets processed
* thread 1 exits into qemu
* thread 1 triggers KVM_INTERRUPT on fd
* thread 1 goes into the vcpu

So we don't really buy anything from doing the remote injection. Hrm.

Not if you make interrupt injection a lightweight exit.

What's somewhat striking me here though is - why do we need KVM_INTERRUPT when there's all
those kvm_run fields? Can't we just do interrupt injection by setting
run-trigger_interrupt? There's only a single interrupt line on the CPU
anyways. That way we'd save the ioctl and get rid of the locking problem altogether.

That's what x86 does. However, it's synchronous.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/7] Consolidate vcpu ioctl locking


On 16.05.2010, at 11:09, Avi Kivity wrote:

 On 05/16/2010 12:01 PM, Alexander Graf wrote:
 
 That's what the world looked like in 2006.
 
 We could change it, but there's not much point, since having the local apic 
 in the kernel is pretty much a requirement for reasonable performance.
 
 Well, I'm not convinced yet that's the case for PPC as well. The timer is 
 in-cpu anyways and I don't see why IPIs should be slow with a userspace pic 
 - if we keep the overhead low.
   
 
 If it's at all possible keep the mpic out.  I am _not_ advocating pushing 
 ppc's mpic into the kernel.
 
 So let me think this through. With remote interrupt injection we have.
 
 * thread 1 does vcpu_run
 * thread 2 triggers KVM_INTERRUPT on fd
 * thread 2 signals thread 1 so we're sure the interrupt gets injected
 * thread 1 exits into qemu
   
 
 This doesn't seem necessary.  The kernel can own the interrupt line, so it 
 remembers it from the last KVM_INTERRUPT.

It's not? On signals we always exit to userspace, no?

 
 * thread 1 goes back into the vcpu, triggering an interrupt
 
 Without we have:
 
 * thread 1 does vcpu_run
 * thread 2 wants to trigger an an interrupt, sets the qemu internal bit
 * thread 2 signals thread 1 so we're sure the interrupt gets processed
 * thread 1 exits into qemu
 * thread 1 triggers KVM_INTERRUPT on fd
 * thread 1 goes into the vcpu
 
 So we don't really buy anything from doing the remote injection. Hrm.
   
 
 Not if you make interrupt injection a lightweight exit.

Please elaborate.

 
 What's somewhat striking me here though is - why do we need KVM_INTERRUPT 
 when there's all those kvm_run fields? Can't we just do interrupt injection 
 by setting run-trigger_interrupt? There's only a single interrupt line on 
 the CPU anyways. That way we'd save the ioctl and get rid of the locking 
 problem altogether.
   
 
 That's what x86 does.  However, it's synchronous.

For everyone except for the vcpu thread executing the interrupt, it's 
asynchronous, right? The same applies to an in-kernel pic.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/7] Consolidate vcpu ioctl locking

On 05/16/2010 12:35 PM, Alexander Graf wrote:

So let me think this through. With remote interrupt injection we have.

* thread 1 does vcpu_run
* thread 2 triggers KVM_INTERRUPT on fd
* thread 2 signals thread 1 so we're sure the interrupt gets injected
* thread 1 exits into qemu

This doesn't seem necessary. The kernel can own the interrupt line, so it
remembers it from the last KVM_INTERRUPT.

It's not?

With s/signals/IPIs/.

On signals we always exit to userspace, no?

Yes (if the signal isn't blocked).

* thread 1 goes back into the vcpu, triggering an interrupt

Without we have:

So we don't really buy anything from doing the remote injection. Hrm.

Not if you make interrupt injection a lightweight exit.

Please elaborate.

1: vcpu_run
2: KVM_INTERRUPT
2k: sets flag, if msr.ee IPIs 1 or wakes up 1 if halted
1k: notices flag, if msr.ee injects interrupt
...
1g: acks
1k: forwards ack to userspace
1: completes interrupt

That's what x86 does. However, it's synchronous.

For everyone except for the vcpu thread executing the interrupt, it's
asynchronous, right?

For everyone other than the vcpu thread, it's off limits. kvm_run is
only read on KVM_RUN entries and written on KVM_RUN exits.

The same applies to an in-kernel pic.

The in-kernel pic doesn't use kvm_run.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/7] Consolidate vcpu ioctl locking


On 16.05.2010, at 11:47, Avi Kivity wrote:

 1: vcpu_run
 2: KVM_INTERRUPT
 2k: sets flag, if msr.ee IPIs 1 or wakes up 1 if halted

Doesn't that break when we have a while(1) loop in the guest with msr.ee=0 
while no timer is scheduled on the host? But then again with msr.ee=0 we don't 
get interrupts in the guest and to set msr.ee=1 we trap. Yeah, that would work.

 1k: notices flag, if msr.ee injects interrupt
 ...
 1g: acks

The ack is done in userspace by the mpic, so we can just complete the interrupt 
there.

 1k: forwards ack to userspace
 1: completes interrupt


So if I just have a field kvm_run-external_active I could set that to =1 on 
KVM_INTERRUPT including the above logic. To acknowledge it userspace would then 
do something like this in kvm_arch_pre_run:

if (kvm_run-external_active 
!((env-interrupt_request  CPU_INTERRUPT_HARD) 
  (env-irq_input_state  (1PPC_INPUT_INT
{
kvm_run-external_active = 0;
}

The big question is how to make such a change backwards compatible. But I guess 
I could just reuse the feature enabling framework. Well, sounds like we're 
getting closer.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/7] KVM: move vcpu locking to dispatcher for generic vcpu ioctls


On 05/15/2010 03:03 AM, Marcelo Tosatti wrote:

On Thu, May 13, 2010 at 02:17:35PM +0300, Avi Kivity wrote:
   

All vcpu ioctls need to be locked, so instead of locking each one specifically
we lock at the generic dispatcher.

This patch only updates generic ioctls and leaves arch specific ioctls alone.
 

Forgot mp_state get/set.

   


Right, fixed and applied.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: system_powerdown not working for qemu-kvm 0.12.4?


On 05/15/2010 04:19 AM, Teck Choon Giam wrote:

Hi,

Anyone encountered the same issue as me about system_powerdown no
longer working since upgraded to qemu-kvm 0.12.4?
   


Compared with what version?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Qemu-KVM Livate Migration 0.12.2 - 0.12.3/4 broken?


On 05/12/2010 02:11 AM, Juan Quintela wrote:

Peter Lievenp...@dlh.net  wrote:
   

Hi Qemu/KVM Devel Team,

Live Migration from a 0.12.2 qemu-kvm to a 0.12.3 (and 0.12.4)
does not work: load of migration failed

Is there any way to find out, why exactly it fails? I have
a lot of VMs running on 0.12.2 and would like to migrate
them to 0.12.4

cmdline:
-net tap,vlan=6,script=no,downscript=no,ifname=tap7 -net
nic,vlan=6,model=e1000,macaddr=52:54:00:fe:00:88   -net
tap,vlan=651,script=no,downscript=no,ifname=tap8 -net
nic,vlan=651,model=e1000,macaddr=52:54:00:ff:00:69   -drive
file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-b6eca7604-6280020bc4b4bde8-quagga,if=ide,boot=on,cache=none,aio=native
-m 256 -monitor tcp:0:4090,server,nowait -vnc :90 -name 'Quagga' -boot
order=dc,menu=on  -k de  -incoming tcp:172.21.59.132:5090 -pidfile
/var/run/qemu/vm-148.pid  -rtc base=utc,clock=vm  -usb -usbdevice
tablet  -no-kvm-irqchip  -vga cirrus

Any hints would be appreciated!
 

Can you try reverting this patch?

commit 3fa017e24b0a0f0e68619a689b9b02fe486dae9e
Author: Marcelo Tosattimtosa...@redhat.com
Date:   Thu Feb 11 18:19:44 2010 -0200

 ide save/restore pio/atapi cmd transfer fields and io buffer

and told me if it works?

   


Any idea why it fails?  And how to fix it?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: kvm-intel does not load when Intel TXT is enabled


On 05/14/2010 09:24 PM, Andrej Podzimek wrote:

Hello,

when I enable Intel TXT (Trusted Execution Technology) in the BIOS 
settings of my Lenovo W510, the kvm-intel module does not load and 
says 'kvm: disabled by bios', despite the fact that both VT-x and VT-d 
are enabled. Disabling TXT enables hardware virtualization and 
kvm-intel works as expected.


Are VT-[xd] and TXT supposed to be mutually exclusive? Or could it be 
a BIOS bug?




Running kvm under txt needs special support in the kernel; it will be 
available in 2.6.35.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] [block]: Fix scsi-generic breakage in find_image_format()

2010-05-16 Thread Christoph Hellwig

On Sat, May 15, 2010 at 06:30:52AM -0700, Nicholas A. Bellinger wrote:
 From: Nicholas Bellinger n...@linux-iscsi.org
 
 This patch adds a special BlockDriverState-sg check in 
 block.c:find_image_format()
 after bdrv_file_open() - block/raw-posix.c:hdev_open() has been called to 
 determine
 if we are dealing with a Linux host scsi-generic device or not.
 
 The patch then returns the BlockDriver * from find_protocol(), skipping the 
 subsequent
 bdrv_read() and rest of find_image_format().

That's not quite correct as we don't want to expose formats directly.
Returning

bdrv_find_format(raw);

should fix it for now, although I really hate having these special
cases in block.c.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] [block]: Skip refresh_total_sectors() for scsi-generic devices

2010-05-16 Thread Christoph Hellwig

On Sat, May 15, 2010 at 06:30:59AM -0700, Nicholas A. Bellinger wrote:
 From: Nicholas Bellinger n...@linux-iscsi.org
 
 This patch adds a BlockDriverState-sg check in block.c:bdrv_common_open()
 to skip the new refresh_total_sectors() call once we know we are working with
 a scsi-generic device.
 
 We go ahead and skip this call for scsi-generic devices because
 block/raw-posix.c:raw_getlength() - lseek() will return -ESPIPE.

How about moving that check into refresh_total_sectors?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Qemu-KVM Livate Migration 0.12.2 - 0.12.3/4 broken?

2010-05-16 Thread Juan Quintela

Avi Kivity a...@redhat.com wrote:
 On 05/12/2010 02:11 AM, Juan Quintela wrote:
 Peter Lievenp...@dlh.net  wrote:

 Hi Qemu/KVM Devel Team,

 Live Migration from a 0.12.2 qemu-kvm to a 0.12.3 (and 0.12.4)
 does not work: load of migration failed

 Is there any way to find out, why exactly it fails? I have
 a lot of VMs running on 0.12.2 and would like to migrate
 them to 0.12.4

 cmdline:
 -net tap,vlan=6,script=no,downscript=no,ifname=tap7 -net
 nic,vlan=6,model=e1000,macaddr=52:54:00:fe:00:88   -net
 tap,vlan=651,script=no,downscript=no,ifname=tap8 -net
 nic,vlan=651,model=e1000,macaddr=52:54:00:ff:00:69   -drive
 file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-b6eca7604-6280020bc4b4bde8-quagga,if=ide,boot=on,cache=none,aio=native
 -m 256 -monitor tcp:0:4090,server,nowait -vnc :90 -name 'Quagga' -boot
 order=dc,menu=on  -k de  -incoming tcp:172.21.59.132:5090 -pidfile
 /var/run/qemu/vm-148.pid  -rtc base=utc,clock=vm  -usb -usbdevice
 tablet  -no-kvm-irqchip  -vga cirrus

 Any hints would be appreciated!
  
 Can you try reverting this patch?

 commit 3fa017e24b0a0f0e68619a689b9b02fe486dae9e
 Author: Marcelo Tosattimtosa...@redhat.com
 Date:   Thu Feb 11 18:19:44 2010 -0200

  ide save/restore pio/atapi cmd transfer fields and io buffer

 and told me if it works?



 Any idea why it fails?  And how to fix it?

Lack of proper subsections.  IDE is something like:

const VMStateDescription vmstate_ide_drive = {
.version_id = 4,

}

static const VMStateDescription vmstate_bmdma = {
.name = ide bmdma,
.version_id = 4,
...
}

const VMStateDescription vmstate_ide_pci = {
.name = ide,
.version_id = 4,

VMSTATE_STRUCT_ARRAY(bmdma, PCIIDEState, 2, 0,
 vmstate_bmdma, BMDMAState),
VMSTATE_IDE_DRIVES(bus[0].ifs, PCIIDEState),
VMSTATE_IDE_DRIVES(bus[1].ifs, PCIIDEState),

}


Notice that everything is at version 4.  It used to be everything at
version 3.  Now the problem is that when migrating from v3 - v4.  We
put in one place v3, But we only have a version number at the toplevel,
rest of subsections don't sent a version number.  There is no way to
fix it in the general case.  We can hack something around for ide, but
that will just be a hack, or we can backport marcelo change and port it
as a proper subsection (that is my plan).  I expect to have time at the
end of next time to work on this.

So, to make the story short: I know what is happening, and I know how to
fix it, just that fix is not trivial.  I just need time.

Later, Juan.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Qemu-KVM Livate Migration 0.12.2 - 0.12.3/4 broken?


On 05/16/2010 05:42 PM, Juan Quintela wrote:



Any idea why it fails?  And how to fix it?
 

Lack of proper subsections.  IDE is something like:

const VMStateDescription vmstate_ide_drive = {
 .version_id = 4,

}

static const VMStateDescription vmstate_bmdma = {
 .name = ide bmdma,
 .version_id = 4,
...
}

const VMStateDescription vmstate_ide_pci = {
 .name = ide,
 .version_id = 4,

 VMSTATE_STRUCT_ARRAY(bmdma, PCIIDEState, 2, 0,
  vmstate_bmdma, BMDMAState),
 VMSTATE_IDE_DRIVES(bus[0].ifs, PCIIDEState),
 VMSTATE_IDE_DRIVES(bus[1].ifs, PCIIDEState),

}


Notice that everything is at version 4.  It used to be everything at
version 3.  Now the problem is that when migrating from v3 -  v4.  We
put in one place v3, But we only have a version number at the toplevel,
rest of subsections don't sent a version number.  There is no way to
fix it in the general case.  We can hack something around for ide, but
that will just be a hack, or we can backport marcelo change and port it
as a proper subsection (that is my plan).  I expect to have time at the
end of next time to work on this.
   


end of next week?


So, to make the story short: I know what is happening, and I know how to
fix it, just that fix is not trivial.  I just need time.
   


Meanwhile, we have a broken 0.12.4.  Is there a quick'n'dirty workaround 
that will be forward compatible with the real fix that we can push out?


We've regressed from failing some migrations to failing all migrations.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Qemu-KVM Livate Migration 0.12.2 - 0.12.3/4 broken?

2010-05-16 Thread Jan Kiszka

Juan Quintela wrote:
 Lack of proper subsections.  IDE is something like:
 
 const VMStateDescription vmstate_ide_drive = {
 .version_id = 4,
 
 }
 
 static const VMStateDescription vmstate_bmdma = {
 .name = ide bmdma,
 .version_id = 4,
 ...
 }
 
 const VMStateDescription vmstate_ide_pci = {
 .name = ide,
 .version_id = 4,
 
 VMSTATE_STRUCT_ARRAY(bmdma, PCIIDEState, 2, 0,
  vmstate_bmdma, BMDMAState),
 VMSTATE_IDE_DRIVES(bus[0].ifs, PCIIDEState),
 VMSTATE_IDE_DRIVES(bus[1].ifs, PCIIDEState),
 
 }
 
 
 Notice that everything is at version 4.  It used to be everything at
 version 3.  Now the problem is that when migrating from v3 - v4.  We
 put in one place v3, But we only have a version number at the toplevel,
 rest of subsections don't sent a version number.  There is no way to
 fix it in the general case.  We can hack something around for ide, but
 that will just be a hack, or we can backport marcelo change and port it
 as a proper subsection (that is my plan).  I expect to have time at the
 end of next time to work on this.

BTW, the IDE subsystem is yet lacking a proper vmstate section split-up
along qdev boundaries (ie. vmstate_ide_pci should not contain drive
structures). Do you plan to address this as well?

Jan



signature.asc
Description: OpenPGP digital signature

Re: Qemu-KVM Livate Migration 0.12.2 - 0.12.3/4 broken?

2010-05-16 Thread Juan Quintela

Avi Kivity a...@redhat.com wrote:
 On 05/16/2010 05:42 PM, Juan Quintela wrote:

 Any idea why it fails?  And how to fix it?
  
 Lack of proper subsections.  IDE is something like:

 const VMStateDescription vmstate_ide_drive = {
  .version_id = 4,
 
 }

 static const VMStateDescription vmstate_bmdma = {
  .name = ide bmdma,
  .version_id = 4,
 ...
 }

 const VMStateDescription vmstate_ide_pci = {
  .name = ide,
  .version_id = 4,
 
  VMSTATE_STRUCT_ARRAY(bmdma, PCIIDEState, 2, 0,
   vmstate_bmdma, BMDMAState),
  VMSTATE_IDE_DRIVES(bus[0].ifs, PCIIDEState),
  VMSTATE_IDE_DRIVES(bus[1].ifs, PCIIDEState),
 
 }


 Notice that everything is at version 4.  It used to be everything at
 version 3.  Now the problem is that when migrating from v3 -  v4.  We
 put in one place v3, But we only have a version number at the toplevel,
 rest of subsections don't sent a version number.  There is no way to
 fix it in the general case.  We can hack something around for ide, but
 that will just be a hack, or we can backport marcelo change and port it
 as a proper subsection (that is my plan).  I expect to have time at the
 end of next time to work on this.


 end of next week?

Humm, for Spaniards weeks start on Monday :)  (Monday is holiday here).

I mean here Friday 21.

 So, to make the story short: I know what is happening, and I know how to
 fix it, just that fix is not trivial.  I just need time.


 Meanwhile, we have a broken 0.12.4.  Is there a quick'n'dirty
 workaround that will be forward compatible with the real fix that we
 can push out?

revert the patch.  It almost never happen (being in the middle of one
IO) while migrating.

 We've regressed from failing some migrations to failing all migrations.

Humm, 0.12.4 - 0.12.4 should work.  My advise is just revert the patch
and live with it for another week, what do you think?

Later, Juan.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Qemu-KVM Livate Migration 0.12.2 - 0.12.3/4 broken?

2010-05-16 Thread Juan Quintela

Jan Kiszka jan.kis...@web.de wrote:
 Juan Quintela wrote:
 Lack of proper subsections.  IDE is something like:
 
 const VMStateDescription vmstate_ide_drive = {
 .version_id = 4,
 
 }
 
 static const VMStateDescription vmstate_bmdma = {
 .name = ide bmdma,
 .version_id = 4,
 ...
 }
 
 const VMStateDescription vmstate_ide_pci = {
 .name = ide,
 .version_id = 4,
 
 VMSTATE_STRUCT_ARRAY(bmdma, PCIIDEState, 2, 0,
  vmstate_bmdma, BMDMAState),
 VMSTATE_IDE_DRIVES(bus[0].ifs, PCIIDEState),
 VMSTATE_IDE_DRIVES(bus[1].ifs, PCIIDEState),
 
 }
 
 
 Notice that everything is at version 4.  It used to be everything at
 version 3.  Now the problem is that when migrating from v3 - v4.  We
 put in one place v3, But we only have a version number at the toplevel,
 rest of subsections don't sent a version number.  There is no way to
 fix it in the general case.  We can hack something around for ide, but
 that will just be a hack, or we can backport marcelo change and port it
 as a proper subsection (that is my plan).  I expect to have time at the
 end of next time to work on this.

 BTW, the IDE subsystem is yet lacking a proper vmstate section split-up
 along qdev boundaries (ie. vmstate_ide_pci should not contain drive
 structures). Do you plan to address this as well?

Not for Friday, and not for 0.12.

That is 0.13 material, and have to get one agreement on how to go.
We can go for:
- good structure
- backward compatibility

I can't see any good way to get both at this stage :(  But I am open to
sugestions.

Later, Juan.

PD. BTW, very good work with printing the vmstate, that was one of the goals
when we added it, that was the next step after porting everything to
vmstate :)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] [block]: Fix scsi-generic breakage in find_image_format()

On Sun, 2010-05-16 at 15:29 +0200, Christoph Hellwig wrote:
 On Sat, May 15, 2010 at 06:30:52AM -0700, Nicholas A. Bellinger wrote:
  From: Nicholas Bellinger n...@linux-iscsi.org
  
  This patch adds a special BlockDriverState-sg check in 
  block.c:find_image_format()
  after bdrv_file_open() - block/raw-posix.c:hdev_open() has been called to 
  determine
  if we are dealing with a Linux host scsi-generic device or not.
  
  The patch then returns the BlockDriver * from find_protocol(), skipping the 
  subsequent
  bdrv_read() and rest of find_image_format().
 
 That's not quite correct as we don't want to expose formats directly.
 Returning
 
   bdrv_find_format(raw);
 
 should fix it for now,

Ahh, that makes more sense..  Attached is updated patch #1:

[PATCH 1/2] [block]: Make find_image_format() return 'raw' BlockDriver for 
SG_IO devices

  although I really hate having these special
 cases in block.c.

Indeed having BlockDriverState-sg kinda feels kinda like a hack to
begin with, but it appears this is currently QEMU's way of telling block
code to disable internal SCSI CDB emulation for devices.

Best,

--nab




0001--block-Make-find_image_format-return-raw-Block.patch
Description: application/mbox

Re: [PATCH 2/2] [block]: Skip refresh_total_sectors() for scsi-generic devices

On Sun, 2010-05-16 at 15:30 +0200, Christoph Hellwig wrote:
 On Sat, May 15, 2010 at 06:30:59AM -0700, Nicholas A. Bellinger wrote:
  From: Nicholas Bellinger n...@linux-iscsi.org
  
  This patch adds a BlockDriverState-sg check in block.c:bdrv_common_open()
  to skip the new refresh_total_sectors() call once we know we are working 
  with
  a scsi-generic device.
  
  We go ahead and skip this call for scsi-generic devices because
  block/raw-posix.c:raw_getlength() - lseek() will return -ESPIPE.
 
 How about moving that check into refresh_total_sectors?

Sounds good, attached is updated patch #2:

[PATCH 2/2] [block]: Add SG_IO device check in refresh_total_sectors()

Also, I am using the options in .vimrc to follow QEMU's indent style:

set smartindent
set tabstop=4
set shiftwidth=4
set expandtab

I assume this is what should be used when sending QEMU patches, yes..?

Best,

--nab


0002--block-Add-SG_IO-device-check-in-refresh_total_sec.patch
Description: application/mbox

[PATCH 0/2] Fix scsi-generic breakage in upstream qemu-kvm.git

From: Nicholas Bellinger n...@linux-iscsi.org

Greetings,

Attached are the updated patches following hch's comments to fix scsi-generic
device breakage with find_image_format() and refresh_total_sectors().

These are being resent as the last attachments where in MBOX format from 
git-format-patch.

Signed-off-by: Nicholas A. Bellinger n...@linux-iscsi.org

Nicholas Bellinger (2):
  [block]: Make find_image_format() return 'raw' BlockDriver for SG_IO
devices
  [block]: Add SG_IO device check in refresh_total_sectors()

 block.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] [block]: Add SG_IO device check in refresh_total_sectors()

From: Nicholas Bellinger n...@linux-iscsi.org

This patch adds a special case check for scsi-generic devices in 
refresh_total_sectors()
to skip the subsequent BlockDriver-bdrv_getlength() that will be returning
-ESPIPE from block/raw-posic.c:raw_getlength() for BlockDriverState-sg=1 
devices.

Signed-off-by: Nicholas A. Bellinger n...@linux-iscsi.org
---
 block.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index f419ee6..7917712 100644
--- a/block.c
+++ b/block.c
@@ -364,6 +364,10 @@ static int refresh_total_sectors(BlockDriverState *bs, 
int64_t hint)
 {
 BlockDriver *drv = bs-drv;
 
+/* Do not attempt drv-bdrv_getlength() on scsi-generic devices */
+if (bs-sg)
+return 0;
+
 /* query actual device if possible, otherwise just trust the hint */
 if (drv-bdrv_getlength) {
 int64_t length = drv-bdrv_getlength(bs);
-- 
1.5.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] [block]: Make find_image_format() return 'raw' BlockDriver for SG_IO devices

From: Nicholas Bellinger n...@linux-iscsi.org

This patch adds a special BlockDriverState-sg check in 
block.c:find_image_format()
after bdrv_file_open() - block/raw-posix.c:hdev_open() has been called to 
determine
if we are dealing with a Linux host scsi-generic device.

The patch then returns the BlockDriver * from bdrv_find_format(raw), skipping 
the
subsequent bdrv_read() and rest of find_image_format().

Signed-off-by: Nicholas A. Bellinger n...@linux-iscsi.org
---
 block.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 48305b7..f419ee6 100644
--- a/block.c
+++ b/block.c
@@ -332,6 +332,11 @@ static BlockDriver *find_image_format(const char *filename)
 ret = bdrv_file_open(bs, filename, 0);
 if (ret  0)
 return NULL;
+
+/* Return the raw BlockDriver * to scsi-generic devices */
+if (bs-sg)
+return bdrv_find_format(raw);
+
 ret = bdrv_pread(bs, 0, buf, sizeof(buf));
 bdrv_delete(bs);
 if (ret  0) {
-- 
1.5.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/7] Consolidate vcpu ioctl locking


On 05/16/2010 04:00 AM, Alexander Graf wrote:

On 15.05.2010, at 19:30, Avi Kivity wrote:

   

On 05/15/2010 11:26 AM, Alexander Graf wrote:
 
   

That means you never inject an interrupt from the iothread (or from a different 
vcpu thread)?

If that's the case we might make it part of the API and require the ioctl to be 
issued from the vcpu thread.  We'd still be left with the s390 exception.

 

Well I'm not sure that's guaranteed for MOL or Dolphin, but I guess the user 
base is small enough to ignore them.

Either way, I'm actually rather unhappy with the way interrupts work right now. 
We're only injecting interrupts when in the main loop, which is rare if we did 
our homework right. So what I'd really like to see is that the MPIC on ppc 
directly invokes KVM_INTERRUPT to pull (and losen) the interrupt line. That way 
we can't just accidently miss interrupts.

   

on x86 we signal the vcpu thread to pull it out of the main loop and poll the 
apic.
 

Hrm, makes sense. Though it's additional overhead of a task switch. Why take 
the burden if you don't have to?
   


That's what the world looked like in 2006.

We could change it, but there's not much point, since having the local 
apic in the kernel is pretty much a requirement for reasonable performance.



That's exactly why x86 has run-request_interrupt_window, 
run-ready_for_interrupt_injection, and run-if_flag.
 

So how does x86 userspace get notified when it has an interrupt pending but 
couldn't inject it? Without a notification, we delay interrupts by quite some 
time.
   


run-ready_for_interrupt_injection and run-request_irq_window.


-   we need to exit to userspace to realize that the interrupt is still active

This is fundamentally broken. What I'd like to see is:

* device in userspace wants to trigger an interrupt
* mpic pulls external interrupt line, automatically issues ioctl(KVM_INTERRUPT)
* we enter the guest
* we inject the external interrupt
* guest gets the interrupt, sees it's lazily disabled, sets msr.ee=0
* guest moves on, sets msr.ee=1 again later
* we inject the external interrupt again, since it's still active
* guest acknowledges the interrupt with the mpic, issues ioctl(KVM_INTERRUPT, 
disable)
-   all is great

   

This is similar to KVM_IRQ_LINE.
 

Well, KVM_IRQ_LINE would be the in-kernel pic, no? That'd mean I'd have to 
reimplement the mpic in kernel space - which might eventually be a good idea 
when going for SMP, but I'd first like to see if I can keep the current 
interrupt injection path efficient.
   


Sure.


For that to work we need to enable triggering KVM_INTERRUPT from a non-vcpu 
thread.

   

KVM_IRQ_LINE is a vm ioctl instead of a vcpu ioctl.

The motivation for the strict 'issue vcpu ioctls from vcpu thread' rule is to 
prepare the way for a syscall (instead of ioctl) API.  Then we lost the fd 
argument, and choosing the vcpu is done by associating it with the current task.  
That allows us to get rid of vcpu-mutex and fget_light() (as well as the ioctl 
dispatch).
 

If we define the API to only work on the current vcpu with a few excetions 
where we're safe (KVM_INTERRUPT), that'd get rid of the mutex too, no?
   


Yes.  Need to document it though.


What does fget_light do?
   


Make sure that the vcpu fd doesn't disappear.


And does the ioctl dispatch cost that much? I like the flexibility of it to be 
honest.
   


Not much, which is why there's no movement in that direction.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/7] Consolidate vcpu ioctl locking


On 16.05.2010, at 10:23, Avi Kivity wrote:

 On 05/16/2010 04:00 AM, Alexander Graf wrote:
 On 15.05.2010, at 19:30, Avi Kivity wrote:
 
   
 On 05/15/2010 11:26 AM, Alexander Graf wrote:
 
   
 That means you never inject an interrupt from the iothread (or from a 
 different vcpu thread)?
 
 If that's the case we might make it part of the API and require the ioctl 
 to be issued from the vcpu thread.  We'd still be left with the s390 
 exception.
 
 
 Well I'm not sure that's guaranteed for MOL or Dolphin, but I guess the 
 user base is small enough to ignore them.
 
 Either way, I'm actually rather unhappy with the way interrupts work right 
 now. We're only injecting interrupts when in the main loop, which is rare 
 if we did our homework right. So what I'd really like to see is that the 
 MPIC on ppc directly invokes KVM_INTERRUPT to pull (and losen) the 
 interrupt line. That way we can't just accidently miss interrupts.
 
   
 on x86 we signal the vcpu thread to pull it out of the main loop and poll 
 the apic.
 
 Hrm, makes sense. Though it's additional overhead of a task switch. Why take 
 the burden if you don't have to?
   
 
 That's what the world looked like in 2006.
 
 We could change it, but there's not much point, since having the local apic 
 in the kernel is pretty much a requirement for reasonable performance.

Well, I'm not convinced yet that's the case for PPC as well. The timer is 
in-cpu anyways and I don't see why IPIs should be slow with a userspace pic - 
if we keep the overhead low.

So let me think this through. With remote interrupt injection we have.

* thread 1 does vcpu_run
* thread 2 triggers KVM_INTERRUPT on fd
* thread 2 signals thread 1 so we're sure the interrupt gets injected
* thread 1 exits into qemu
* thread 1 goes back into the vcpu, triggering an interrupt

Without we have:

* thread 1 does vcpu_run
* thread 2 wants to trigger an an interrupt, sets the qemu internal bit
* thread 2 signals thread 1 so we're sure the interrupt gets processed
* thread 1 exits into qemu
* thread 1 triggers KVM_INTERRUPT on fd
* thread 1 goes into the vcpu

So we don't really buy anything from doing the remote injection. Hrm.

What's somewhat striking me here though is - why do we need KVM_INTERRUPT when 
there's all those kvm_run fields? Can't we just do interrupt injection by 
setting run-trigger_interrupt? There's only a single interrupt line on the 
CPU anyways. That way we'd save the ioctl and get rid of the locking problem 
altogether.

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/7] Consolidate vcpu ioctl locking


On 16.05.2010, at 11:09, Avi Kivity wrote:

 On 05/16/2010 12:01 PM, Alexander Graf wrote:
 
 That's what the world looked like in 2006.
 
 We could change it, but there's not much point, since having the local apic 
 in the kernel is pretty much a requirement for reasonable performance.
 
 Well, I'm not convinced yet that's the case for PPC as well. The timer is 
 in-cpu anyways and I don't see why IPIs should be slow with a userspace pic 
 - if we keep the overhead low.
   
 
 If it's at all possible keep the mpic out.  I am _not_ advocating pushing 
 ppc's mpic into the kernel.
 
 So let me think this through. With remote interrupt injection we have.
 
 * thread 1 does vcpu_run
 * thread 2 triggers KVM_INTERRUPT on fd
 * thread 2 signals thread 1 so we're sure the interrupt gets injected
 * thread 1 exits into qemu
   
 
 This doesn't seem necessary.  The kernel can own the interrupt line, so it 
 remembers it from the last KVM_INTERRUPT.

It's not? On signals we always exit to userspace, no?

 
 * thread 1 goes back into the vcpu, triggering an interrupt
 
 Without we have:
 
 * thread 1 does vcpu_run
 * thread 2 wants to trigger an an interrupt, sets the qemu internal bit
 * thread 2 signals thread 1 so we're sure the interrupt gets processed
 * thread 1 exits into qemu
 * thread 1 triggers KVM_INTERRUPT on fd
 * thread 1 goes into the vcpu
 
 So we don't really buy anything from doing the remote injection. Hrm.
   
 
 Not if you make interrupt injection a lightweight exit.

Please elaborate.

 
 What's somewhat striking me here though is - why do we need KVM_INTERRUPT 
 when there's all those kvm_run fields? Can't we just do interrupt injection 
 by setting run-trigger_interrupt? There's only a single interrupt line on 
 the CPU anyways. That way we'd save the ioctl and get rid of the locking 
 problem altogether.
   
 
 That's what x86 does.  However, it's synchronous.

For everyone except for the vcpu thread executing the interrupt, it's 
asynchronous, right? The same applies to an in-kernel pic.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/7] Consolidate vcpu ioctl locking

On 05/16/2010 12:35 PM, Alexander Graf wrote:

So let me think this through. With remote interrupt injection we have.

* thread 1 does vcpu_run
* thread 2 triggers KVM_INTERRUPT on fd
* thread 2 signals thread 1 so we're sure the interrupt gets injected
* thread 1 exits into qemu

This doesn't seem necessary. The kernel can own the interrupt line, so it
remembers it from the last KVM_INTERRUPT.

It's not?

With s/signals/IPIs/.

On signals we always exit to userspace, no?

Yes (if the signal isn't blocked).

* thread 1 goes back into the vcpu, triggering an interrupt

Without we have:

So we don't really buy anything from doing the remote injection. Hrm.

Not if you make interrupt injection a lightweight exit.

Please elaborate.

1: vcpu_run
2: KVM_INTERRUPT
2k: sets flag, if msr.ee IPIs 1 or wakes up 1 if halted
1k: notices flag, if msr.ee injects interrupt
...
1g: acks
1k: forwards ack to userspace
1: completes interrupt

That's what x86 does. However, it's synchronous.

For everyone except for the vcpu thread executing the interrupt, it's
asynchronous, right?

For everyone other than the vcpu thread, it's off limits. kvm_run is
only read on KVM_RUN entries and written on KVM_RUN exits.

The same applies to an in-kernel pic.

The in-kernel pic doesn't use kvm_run.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/7] Consolidate vcpu ioctl locking