[COMMIT master] Fix segfault after device assignment hot remove
From: Alex Williamson alex.william...@redhat.com We keep a qlist of assigned devices for irq updates, but we forgot to remove entries from it if they're hot unplugged. This makes assigned_dev_update_irqs() a timebomb that goes off when the guest is rebooted. Signed-off-by: Alex Williamson alex.william...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 1f13a6d..b9cc06f 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -1390,6 +1390,7 @@ static int assigned_exitfn(struct PCIDevice *pci_dev) { AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev); +QLIST_REMOVE(dev, next); deassign_device(dev); free_assigned_device(dev); return 0; -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] turn off kvmclock when resetting cpu
From: Glauber Costa glom...@redhat.com Currently, in the linux kernel, we reset kvmclock if we are rebooting into a crash kernel through kexec. The rationale, is that a new kernel won't follow the same memory addresses, and the memory where kvmclock is located in the first kernel, will be something else in the second one. We don't do it in normal reboots, because the second kernel ends up registering kvmclock again, which has the effect of turning off the first instance. This is, however, totally wrong. This assumes we're booting into a kernel that also has kvmclock enabled. If by some reason we reboot into something that doesn't do kvmclock including but not limited to: * rebooting into an older kernel without kvmclock support, * rebooting with no-kvmclock, * rebootint into another O.S, we'll simply have the hypervisor writting into a random memory position into the guest. Neat, uh? Moreover, I believe the fix belongs in qemu, since it is the entity more prepared to detect all kinds of reboots (by means of a cpu_reset), not to mention the presence of misbehaving guests, that can forget to turn kvmclock off. It is also necessary to reset other msrs, so this patch resets everything that kvm exports through its MSR list. This patch fixes the issue for me. Signed-off-by: Glauber Costa glom...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c index 5bd752d..73b4af7 100644 --- a/qemu-kvm-x86.c +++ b/qemu-kvm-x86.c @@ -1333,8 +1333,31 @@ void kvm_arch_push_nmi(void *opaque) } #endif /* KVM_CAP_USER_NMI */ +static int kvm_reset_msrs(CPUState *env) +{ +struct { +struct kvm_msrs info; +struct kvm_msr_entry entries[100]; +} msr_data; +int n; +struct kvm_msr_entry *msrs = msr_data.entries; + +if (!kvm_msr_list) +return -1; + +for (n = 0; n kvm_msr_list-nmsrs; n++) { +kvm_msr_entry_set(msrs[n], kvm_msr_list-indices[n], 0); +} + +msr_data.info.nmsrs = n; + +return kvm_vcpu_ioctl(env, KVM_SET_MSRS, msr_data); +} + + void kvm_arch_cpu_reset(CPUState *env) { +kvm_reset_msrs(env); kvm_arch_reset_vcpu(env); kvm_reset_mpstate(env); } -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] test: emulator: lmsw may not clear cr0.pe
From: Avi Kivity a...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/kvm/user/test/x86/emulator.c b/kvm/user/test/x86/emulator.c index 5406062..e677e3a 100644 --- a/kvm/user/test/x86/emulator.c +++ b/kvm/user/test/x86/emulator.c @@ -267,6 +267,15 @@ void test_lmsw(void) asm(lmsw %0 : : m(*pmsw)); printf(before %lx after %lx\n, cr0, read_cr0()); report(lmsw (2), cr0 == read_cr0()); + + /* lmsw can't clear cr0.pe */ + msw = (cr0 ~1ul) ^ 4; /* change EM to force trap */ + asm(lmsw %0 : : r(msw)); + report(lmsw (3), (cr0 ^ read_cr0()) == 4 (cr0 1)); + + /* back to normal */ + msw = cr0; + asm(lmsw %0 : : r(msw)); } int main() -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] test: Add test for xor acc, imm
From: Mohammed Gamal m.gamal...@gmail.com Adds test for xor acc, imm Signed-off-by: Mohammed Gamal m.gamal...@gmail.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/kvm/user/test/x86/realmode.c b/kvm/user/test/x86/realmode.c index 41e2aea..70a1e05 100644 --- a/kvm/user/test/x86/realmode.c +++ b/kvm/user/test/x86/realmode.c @@ -350,6 +350,50 @@ void test_sub_imm(void) print_serial(sub test 4: PASS\n); } + +void test_xor_imm(void) +{ + struct regs inregs = { 0 }, outregs; + MK_INSN(xor_r32_imm_1, mov $1234567890, %eax\n\t xor $1234567890, %eax\n\t); + MK_INSN(xor_r16_imm_1, mov $1234, %ax\n\t xor $1234, %ax\n\t); + MK_INSN(xor_r8_imm_1, mov $0x12, %ah\n\t xor $0x12, %ah\n\t); + MK_INSN(xor_r8_imm_2, mov $0x34, %al\n\t xor $0x34, %al\n\t); + + exec_in_big_real_mode(inregs, outregs, + insn_xor_r16_imm_1, + insn_xor_r16_imm_1_end - insn_xor_r16_imm_1); + if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 0) + print_serial(xor test 1: FAIL\n); + else + print_serial(xor test 1: PASS\n); + + /* test mov $imm, %eax */ + exec_in_big_real_mode(inregs, outregs, + insn_xor_r32_imm_1, + insn_xor_r32_imm_1_end - insn_xor_r32_imm_1); + if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 0) + print_serial(xor test 2: FAIL\n); + else + print_serial(xor test 2: PASS\n); + + /* test mov $imm, %al/%ah */ + exec_in_big_real_mode(inregs, outregs, + insn_xor_r8_imm_1, + insn_xor_r8_imm_1_end - insn_xor_r8_imm_1); + if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 0) + print_serial(xor test 3: FAIL\n); + else + print_serial(xor test 3: PASS\n); + + exec_in_big_real_mode(inregs, outregs, + insn_xor_r8_imm_2, + insn_xor_r8_imm_2_end - insn_xor_r8_imm_2); + if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 0) + print_serial(xor test 4: FAIL\n); + else + print_serial(xor test 4: PASS\n); +} + void test_cmp_imm(void) { struct regs inregs = { 0 }, outregs; @@ -786,6 +830,7 @@ void realmode_start(void) test_cmp_imm(); test_add_imm(); test_sub_imm(); + test_xor_imm(); test_io(); test_eflags_insn(); test_jcc_short(); -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] device-assignment: fix failure to exit on shared IRQ
From: Alex Williamson alex.william...@redhat.com Since c1699988, piix config space isn't programmed until the first system reset. This means that when we call assign_irq() from assigned_initfn(), we're going to get back an irq of 0x0, which unfortunately matches our initialization value, so we don't bother to call kvm_assign_irq(). Switch to a -1 initializer so we can test whether kvm_assign_irq() is going to succeed and allow the process to exit if it doesn't. The guest irq will get reset to a more appropriate value on system reset anyway. Signed-off-by: Alex Williamson alex.william...@redhat.com Acked-by: Chris Wright chr...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/hw/device-assignment.c b/hw/device-assignment.c index b9cc06f..eb31c78 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -1346,7 +1346,7 @@ static int assigned_initfn(struct PCIDevice *pci_dev) e_intx = dev-dev.config[0x3d] - 1; dev-intpin = e_intx; dev-run = 0; -dev-girq = 0; +dev-girq = -1; dev-h_segnr = dev-host.seg; dev-h_busnr = dev-host.bus; dev-h_devfn = PCI_DEVFN(dev-host.dev, dev-host.func); -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] test: Add test for sub acc,imm
From: Mohammed Gamal m.gamal...@gmail.com Adds tests fot sub acc, imm Signed-off-by: Mohammed Gamal m.gamal...@gmail.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/kvm/user/test/x86/realmode.c b/kvm/user/test/x86/realmode.c index bc4ed97..41e2aea 100644 --- a/kvm/user/test/x86/realmode.c +++ b/kvm/user/test/x86/realmode.c @@ -307,6 +307,49 @@ void test_mov_imm(void) print_serial(mov test 5: PASS\n); } +void test_sub_imm(void) +{ + struct regs inregs = { 0 }, outregs; + MK_INSN(sub_r32_imm_1, mov $1234567890, %eax\n\t sub $10, %eax\n\t); + MK_INSN(sub_r16_imm_1, mov $1234, %ax\n\t sub $10, %ax\n\t); + MK_INSN(sub_r8_imm_1, mov $0x12, %ah\n\t sub $0x10, %ah\n\t); + MK_INSN(sub_r8_imm_2, mov $0x34, %al\n\t sub $0x10, %al\n\t); + + exec_in_big_real_mode(inregs, outregs, + insn_sub_r16_imm_1, + insn_sub_r16_imm_1_end - insn_sub_r16_imm_1); + if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 1224) + print_serial(sub test 1: FAIL\n); + else + print_serial(sub test 1: PASS\n); + + /* test mov $imm, %eax */ + exec_in_big_real_mode(inregs, outregs, + insn_sub_r32_imm_1, + insn_sub_r32_imm_1_end - insn_sub_r32_imm_1); + if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 1234567880) + print_serial(sub test 2: FAIL\n); + else + print_serial(sub test 2: PASS\n); + + /* test mov $imm, %al/%ah */ + exec_in_big_real_mode(inregs, outregs, + insn_sub_r8_imm_1, + insn_sub_r8_imm_1_end - insn_sub_r8_imm_1); + if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 0x0200) + print_serial(sub test 3: FAIL\n); + else + print_serial(sub test 3: PASS\n); + + exec_in_big_real_mode(inregs, outregs, + insn_sub_r8_imm_2, + insn_sub_r8_imm_2_end - insn_sub_r8_imm_2); + if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 0x24) + print_serial(sub test 4: FAIL\n); + else + print_serial(sub test 4: PASS\n); +} + void test_cmp_imm(void) { struct regs inregs = { 0 }, outregs; @@ -742,6 +785,7 @@ void realmode_start(void) test_mov_imm(); test_cmp_imm(); test_add_imm(); + test_sub_imm(); test_io(); test_eflags_insn(); test_jcc_short(); -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: update mmu documetation for role.nxe
From: Gui Jianfeng guijianf...@cn.fujitsu.com There's no member cr4_nxe in struct kvm_mmu_page_role, it names nxe now. Update mmu document. Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt index 0cc28fb..fde7989 100644 --- a/Documentation/kvm/mmu.txt +++ b/Documentation/kvm/mmu.txt @@ -161,7 +161,7 @@ Shadow pages contain the following information: role.cr4_pae: Contains the value of cr4.pae for which the page is valid (e.g. whether 32-bit or 64-bit gptes are in use). - role.cr4_nxe: + role.nxe: Contains the value of efer.nxe for which the page is valid. gfn: Either the guest page table containing the translations shadowed by this -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: x86: Call vcpu_load and vcpu_put in cpuid_update
From: Dongxiao Xu dongxiao...@intel.com cpuid_update may operate VMCS, so vcpu_load() and vcpu_put() should be called to ensure correctness. Signed-off-by: Dongxiao Xu dongxiao...@intel.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b5eaed4..74d609e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1781,6 +1781,7 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, if (copy_from_user(cpuid_entries, entries, cpuid-nent * sizeof(struct kvm_cpuid_entry))) goto out_free; + vcpu_load(vcpu); for (i = 0; i cpuid-nent; i++) { vcpu-arch.cpuid_entries[i].function = cpuid_entries[i].function; vcpu-arch.cpuid_entries[i].eax = cpuid_entries[i].eax; @@ -1798,6 +1799,7 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, r = 0; kvm_apic_set_version(vcpu); kvm_x86_ops-cpuid_update(vcpu); + vcpu_put(vcpu); out_free: vfree(cpuid_entries); @@ -1818,9 +1820,11 @@ static int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, if (copy_from_user(vcpu-arch.cpuid_entries, entries, cpuid-nent * sizeof(struct kvm_cpuid_entry2))) goto out; + vcpu_load(vcpu); vcpu-arch.cpuid_nent = cpuid-nent; kvm_apic_set_version(vcpu); kvm_x86_ops-cpuid_update(vcpu); + vcpu_put(vcpu); return 0; out: -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] Merge remote branch 'tip/x86/fpu'
From: Marcelo Tosatti mtosa...@redhat.com Conflicts: arch/x86/kernel/process.c Signed-off-by: Marcelo Tosatti mtosa...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: VMX: blocked-by-sti must not defer NMI injections
From: Jan Kiszka jan.kis...@siemens.com As the processor may not consider GUEST_INTR_STATE_STI as a reason for blocking NMI, it could return immediately with EXIT_REASON_NMI_WINDOW when we asked for it. But as we consider this state as NMI-blocking, we can run into an endless loop. Resolve this by allowing NMI injection if just GUEST_INTR_STATE_STI is active (originally suggested by Gleb). Intel confirmed that this is safe, the processor will never complain about NMI injection in this state. Signed-off-by: Jan Kiszka jan.kis...@siemens.com KVM-Stable-Tag Acked-by: Gleb Natapov g...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index f20ff50..99ae513 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2850,8 +2850,7 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu) return 0; return !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) - (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS | - GUEST_INTR_STATE_NMI)); + (GUEST_INTR_STATE_MOV_SS | GUEST_INTR_STATE_NMI)); } static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu) -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: VMX: Define new functions to wrapper direct call of asm code
From: Dongxiao Xu dongxiao...@intel.com Define vmcs_load() and kvm_cpu_vmxon() to avoid direct call of asm code. Also move VMXE bit operation out of kvm_cpu_vmxoff(). Signed-off-by: Dongxiao Xu dongxiao...@intel.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index e35c479..1959814 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -453,6 +453,19 @@ static void vmcs_clear(struct vmcs *vmcs) vmcs, phys_addr); } +static void vmcs_load(struct vmcs *vmcs) +{ + u64 phys_addr = __pa(vmcs); + u8 error; + + asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) ; setna %0 + : =g(error) : a(phys_addr), m(phys_addr) + : cc, memory); + if (error) + printk(KERN_ERR kvm: vmptrld %p/%llx fail\n, + vmcs, phys_addr); +} + static void __vcpu_clear(void *arg) { struct vcpu_vmx *vmx = arg; @@ -830,7 +843,6 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx) static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); - u64 phys_addr = __pa(vmx-vmcs); u64 tsc_this, delta, new_offset; if (vcpu-cpu != cpu) { @@ -844,15 +856,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) } if (per_cpu(current_vmcs, cpu) != vmx-vmcs) { - u8 error; - per_cpu(current_vmcs, cpu) = vmx-vmcs; - asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) ; setna %0 - : =g(error) : a(phys_addr), m(phys_addr) - : cc); - if (error) - printk(KERN_ERR kvm: vmptrld %p/%llx fail\n, - vmx-vmcs, phys_addr); + vmcs_load(vmx-vmcs); } if (vcpu-cpu != cpu) { @@ -1288,6 +1293,13 @@ static __init int vmx_disabled_by_bios(void) /* locked but not enabled */ } +static void kvm_cpu_vmxon(u64 addr) +{ + asm volatile (ASM_VMX_VMXON_RAX + : : a(addr), m(addr) + : memory, cc); +} + static int hardware_enable(void *garbage) { int cpu = raw_smp_processor_id(); @@ -1310,9 +1322,7 @@ static int hardware_enable(void *garbage) wrmsrl(MSR_IA32_FEATURE_CONTROL, old | test_bits); } write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */ - asm volatile (ASM_VMX_VMXON_RAX - : : a(phys_addr), m(phys_addr) - : memory, cc); + kvm_cpu_vmxon(phys_addr); ept_sync_global(); @@ -1336,13 +1346,13 @@ static void vmclear_local_vcpus(void) static void kvm_cpu_vmxoff(void) { asm volatile (__ex(ASM_VMX_VMXOFF) : : : cc); - write_cr4(read_cr4() ~X86_CR4_VMXE); } static void hardware_disable(void *garbage) { vmclear_local_vcpus(); kvm_cpu_vmxoff(); + write_cr4(read_cr4() ~X86_CR4_VMXE); } static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt, -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: x86: add new KVMCLOCK cpuid feature
From: Glauber Costa glom...@redhat.com This cpuid, KVM_CPUID_CLOCKSOURCE2, will indicate to the guest that kvmclock is available through a new set of MSRs. The old ones are deprecated. Signed-off-by: Glauber Costa glom...@redhat.com Acked-by: Zachary Amsden zams...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 9734808..f019f8c 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -16,6 +16,10 @@ #define KVM_FEATURE_CLOCKSOURCE0 #define KVM_FEATURE_NOP_IO_DELAY 1 #define KVM_FEATURE_MMU_OP 2 +/* This indicates that the new set of kvmclock msrs + * are available. The use of 0x11 and 0x12 is deprecated + */ +#define KVM_FEATURE_CLOCKSOURCE23 #define MSR_KVM_WALL_CLOCK 0x11 #define MSR_KVM_SYSTEM_TIME 0x12 -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: VMX: VMCLEAR/VMPTRLD usage changes
From: Dongxiao Xu dongxiao...@intel.com Originally VMCLEAR/VMPTRLD is called on vcpu migration. To support hosted VMM coexistance, VMCLEAR is executed on vcpu schedule out, and VMPTRLD is executed on vcpu schedule in. This could also eliminate the IPI when doing VMCLEAR. Signed-off-by: Dongxiao Xu dongxiao...@intel.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 454d46a..e262e66 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -63,6 +63,9 @@ module_param_named(unrestricted_guest, static int __read_mostly emulate_invalid_guest_state = 0; module_param(emulate_invalid_guest_state, bool, S_IRUGO); +static int __read_mostly vmm_exclusive = 1; +module_param(vmm_exclusive, bool, S_IRUGO); + #define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST \ (X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD) #define KVM_GUEST_CR0_MASK \ @@ -845,7 +848,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) struct vcpu_vmx *vmx = to_vmx(vcpu); u64 tsc_this, delta, new_offset; - if (vcpu-cpu != cpu) + if (vmm_exclusive vcpu-cpu != cpu) vcpu_clear(vmx); if (per_cpu(current_vmcs, cpu) != vmx-vmcs) { @@ -891,6 +894,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) static void vmx_vcpu_put(struct kvm_vcpu *vcpu) { __vmx_load_host_state(to_vmx(vcpu)); + if (!vmm_exclusive) + __vcpu_clear(to_vmx(vcpu)); } static void vmx_fpu_activate(struct kvm_vcpu *vcpu) -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] Enable pvclock flags in vcpu_time_info structure
From: Glauber Costa glom...@redhat.com This patch removes one padding byte and transform it into a flags field. New versions of guests using pvclock will query these flags upon each read. Flags, however, will only be interpreted when the guest decides to. It uses the pvclock_valid_flags function to signal that a specific set of flags should be taken into consideration. Which flags are valid are usually devised via HV negotiation. Signed-off-by: Glauber Costa glom...@redhat.com CC: Jeremy Fitzhardinge jer...@goop.org Acked-by: Zachary Amsden zams...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h index 6d93508..ec5c41a 100644 --- a/arch/x86/include/asm/pvclock-abi.h +++ b/arch/x86/include/asm/pvclock-abi.h @@ -29,7 +29,8 @@ struct pvclock_vcpu_time_info { u64 system_time; u32 tsc_to_system_mul; s8tsc_shift; - u8pad[3]; + u8flags; + u8pad[2]; } __attribute__((__packed__)); /* 32 bytes */ struct pvclock_wall_clock { diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h index 53235fd..cd02f32 100644 --- a/arch/x86/include/asm/pvclock.h +++ b/arch/x86/include/asm/pvclock.h @@ -6,6 +6,7 @@ /* some helper functions for xen and kvm pv clock sources */ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src); +void pvclock_set_flags(u8 flags); unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src); void pvclock_read_wallclock(struct pvclock_wall_clock *wall, struct pvclock_vcpu_time_info *vcpu, diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c index 03801f2..f7fdd56 100644 --- a/arch/x86/kernel/pvclock.c +++ b/arch/x86/kernel/pvclock.c @@ -31,8 +31,16 @@ struct pvclock_shadow_time { u32 tsc_to_nsec_mul; int tsc_shift; u32 version; + u8 flags; }; +static u8 valid_flags __read_mostly = 0; + +void pvclock_set_flags(u8 flags) +{ + valid_flags = flags; +} + /* * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction, * yielding a 64-bit result. @@ -91,6 +99,7 @@ static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst, dst-system_timestamp = src-system_time; dst-tsc_to_nsec_mul = src-tsc_to_system_mul; dst-tsc_shift = src-tsc_shift; + dst-flags = src-flags; rmb(); /* test version after fetching data */ } while ((src-version 1) || (dst-version != src-version)); -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] don't compute pvclock adjustments if we trust the tsc
From: Glauber Costa glom...@redhat.com If the HV told us we can fully trust the TSC, skip any correction Signed-off-by: Glauber Costa glom...@redhat.com Acked-by: Zachary Amsden zams...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index f019f8c..05eba5e 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -21,6 +21,11 @@ */ #define KVM_FEATURE_CLOCKSOURCE23 +/* The last 8 bits are used to indicate how to interpret the flags field + * in pvclock structure. If no bits are set, all flags are ignored. + */ +#define KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24 + #define MSR_KVM_WALL_CLOCK 0x11 #define MSR_KVM_SYSTEM_TIME 0x12 diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h index ec5c41a..35f2d19 100644 --- a/arch/x86/include/asm/pvclock-abi.h +++ b/arch/x86/include/asm/pvclock-abi.h @@ -39,5 +39,6 @@ struct pvclock_wall_clock { u32 nsec; } __attribute__((__packed__)); +#define PVCLOCK_TSC_STABLE_BIT (1 0) #endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_PVCLOCK_ABI_H */ diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index 59c740f..eb9b76c 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -215,4 +215,7 @@ void __init kvmclock_init(void) clocksource_register(kvm_clock); pv_info.paravirt_enabled = 1; pv_info.name = KVM; + + if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT)) + pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT); } diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c index f5bc40e..239427c 100644 --- a/arch/x86/kernel/pvclock.c +++ b/arch/x86/kernel/pvclock.c @@ -135,6 +135,10 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src) barrier(); } while (version != src-version); + if ((valid_flags PVCLOCK_TSC_STABLE_BIT) + (shadow.flags PVCLOCK_TSC_STABLE_BIT)) + return ret; + /* * Assumption here is that last_value, a global accumulator, always goes * forward. If we are less than that, we should not be much smaller. -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: VMX: VMXON/VMXOFF usage changes
From: Dongxiao Xu dongxiao...@intel.com SDM suggests VMXON should be called before VMPTRLD, and VMXOFF should be called after doing VMCLEAR. Therefore in vmm coexistence case, we should firstly call VMXON before any VMCS operation, and then call VMXOFF after the operation is done. Signed-off-by: Dongxiao Xu dongxiao...@intel.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index e262e66..f20ff50 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -176,6 +176,8 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu) static int init_rmode(struct kvm *kvm); static u64 construct_eptp(unsigned long root_hpa); +static void kvm_cpu_vmxon(u64 addr); +static void kvm_cpu_vmxoff(void); static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -847,8 +849,11 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); u64 tsc_this, delta, new_offset; + u64 phys_addr = __pa(per_cpu(vmxarea, cpu)); - if (vmm_exclusive vcpu-cpu != cpu) + if (!vmm_exclusive) + kvm_cpu_vmxon(phys_addr); + else if (vcpu-cpu != cpu) vcpu_clear(vmx); if (per_cpu(current_vmcs, cpu) != vmx-vmcs) { @@ -894,8 +899,10 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) static void vmx_vcpu_put(struct kvm_vcpu *vcpu) { __vmx_load_host_state(to_vmx(vcpu)); - if (!vmm_exclusive) + if (!vmm_exclusive) { __vcpu_clear(to_vmx(vcpu)); + kvm_cpu_vmxoff(); + } } static void vmx_fpu_activate(struct kvm_vcpu *vcpu) @@ -1327,9 +1334,11 @@ static int hardware_enable(void *garbage) wrmsrl(MSR_IA32_FEATURE_CONTROL, old | test_bits); } write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */ - kvm_cpu_vmxon(phys_addr); - ept_sync_global(); + if (vmm_exclusive) { + kvm_cpu_vmxon(phys_addr); + ept_sync_global(); + } return 0; } @@ -1355,8 +1364,10 @@ static void kvm_cpu_vmxoff(void) static void hardware_disable(void *garbage) { - vmclear_local_vcpus(); - kvm_cpu_vmxoff(); + if (vmm_exclusive) { + vmclear_local_vcpus(); + kvm_cpu_vmxoff(); + } write_cr4(read_cr4() ~X86_CR4_VMXE); } @@ -3995,6 +4006,19 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu) kmem_cache_free(kvm_vcpu_cache, vmx); } +static inline void vmcs_init(struct vmcs *vmcs) +{ + u64 phys_addr = __pa(per_cpu(vmxarea, raw_smp_processor_id())); + + if (!vmm_exclusive) + kvm_cpu_vmxon(phys_addr); + + vmcs_clear(vmcs); + + if (!vmm_exclusive) + kvm_cpu_vmxoff(); +} + static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) { int err; @@ -4020,7 +4044,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) if (!vmx-vmcs) goto free_msrs; - vmcs_clear(vmx-vmcs); + vmcs_init(vmx-vmcs); cpu = get_cpu(); vmx_vcpu_load(vmx-vcpu, cpu); -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] Add a global synchronization point for pvclock
From: Glauber Costa glom...@redhat.com In recent stress tests, it was found that pvclock-based systems could seriously warp in smp systems. Using ingo's time-warp-test.c, I could trigger a scenario as bad as 1.5mi warps a minute in some systems. (to be fair, it wasn't that bad in most of them). Investigating further, I found out that such warps were caused by the very offset-based calculation pvclock is based on. This happens even on some machines that report constant_tsc in its tsc flags, specially on multi-socket ones. Two reads of the same kernel timestamp at approx the same time, will likely have tsc timestamped in different occasions too. This means the delta we calculate is unpredictable at best, and can probably be smaller in a cpu that is legitimately reading clock in a forward ocasion. Some adjustments on the host could make this window less likely to happen, but still, it pretty much poses as an intrinsic problem of the mechanism. A while ago, I though about using a shared variable anyway, to hold clock last state, but gave up due to the high contention locking was likely to introduce, possibly rendering the thing useless on big machines. I argue, however, that locking is not necessary. We do a read-and-return sequence in pvclock, and between read and return, the global value can have changed. However, it can only have changed by means of an addition of a positive value. So if we detected that our clock timestamp is less than the current global, we know that we need to return a higher one, even though it is not exactly the one we compared to. OTOH, if we detect we're greater than the current time source, we atomically replace the value with our new readings. This do causes contention on big boxes (but big here means *BIG*), but it seems like a good trade off, since it provide us with a time source guaranteed to be stable wrt time warps. After this patch is applied, I don't see a single warp in time during 5 days of execution, in any of the machines I saw them before. Signed-off-by: Glauber Costa glom...@redhat.com Acked-by: Zachary Amsden zams...@redhat.com CC: Jeremy Fitzhardinge jer...@goop.org CC: Avi Kivity a...@redhat.com CC: Marcelo Tosatti mtosa...@redhat.com CC: Zachary Amsden zams...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c index f7fdd56..f5bc40e 100644 --- a/arch/x86/kernel/pvclock.c +++ b/arch/x86/kernel/pvclock.c @@ -118,11 +118,14 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src) return pv_tsc_khz; } +static atomic64_t last_value = ATOMIC64_INIT(0); + cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src) { struct pvclock_shadow_time shadow; unsigned version; cycle_t ret, offset; + u64 last; do { version = pvclock_get_time_values(shadow, src); @@ -132,6 +135,27 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src) barrier(); } while (version != src-version); + /* +* Assumption here is that last_value, a global accumulator, always goes +* forward. If we are less than that, we should not be much smaller. +* We assume there is an error marging we're inside, and then the correction +* does not sacrifice accuracy. +* +* For reads: global may have changed between test and return, +* but this means someone else updated poked the clock at a later time. +* We just need to make sure we are not seeing a backwards event. +* +* For updates: last_value = ret is not enough, since two vcpus could be +* updating at the same time, and one of them could be slightly behind, +* making the assumption that last_value always go forward fail to hold. +*/ + last = atomic64_read(last_value); + do { + if (ret last) + return last; + last = atomic64_cmpxchg(last_value, last, ret); + } while (unlikely(last != ret)); + return ret; } -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
From: Glauber Costa glom...@redhat.com Right now, we were using individual KVM_CAP entities to communicate userspace about which cpuids we support. This is suboptimal, since it generates a delay between the feature arriving in the host, and being available at the guest. A much better mechanism is to list para features in KVM_GET_SUPPORTED_CPUID. This makes userspace automatically aware of what we provide. And if we ever add a new cpuid bit in the future, we have to do that again, which create some complexity and delay in feature adoption. Signed-off-by: Glauber Costa glom...@redhat.com Acked-by: Zachary Amsden zams...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 38363f8..9d6e7bf 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1983,6 +1983,23 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, } break; } + case KVM_CPUID_SIGNATURE: { + char signature[12] = KVMKVMKVM\0\0; + u32 *sigptr = (u32 *)signature; + entry-eax = 0; + entry-ebx = sigptr[0]; + entry-ecx = sigptr[1]; + entry-edx = sigptr[2]; + break; + } + case KVM_CPUID_FEATURES: + entry-eax = (1 KVM_FEATURE_CLOCKSOURCE) | +(1 KVM_FEATURE_NOP_IO_DELAY) | +(1 KVM_FEATURE_CLOCKSOURCE2); + entry-ebx = 0; + entry-ecx = 0; + entry-edx = 0; + break; case 0x8000: entry-eax = min(entry-eax, 0x801a); break; @@ -2029,6 +2046,23 @@ static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid, for (func = 0x8001; func = limit nent cpuid-nent; ++func) do_cpuid_ent(cpuid_entries[nent], func, 0, nent, cpuid-nent); + + + + r = -E2BIG; + if (nent = cpuid-nent) + goto out_free; + + do_cpuid_ent(cpuid_entries[nent], KVM_CPUID_SIGNATURE, 0, nent, +cpuid-nent); + + r = -E2BIG; + if (nent = cpuid-nent) + goto out_free; + + do_cpuid_ent(cpuid_entries[nent], KVM_CPUID_FEATURES, 0, nent, +cpuid-nent); + r = -E2BIG; if (nent = cpuid-nent) goto out_free; -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: x86 emulator: Add missing decoder flags for sub instruction
From: Mohammed Gamal m.gamal...@gmail.com This adds missing decoder flags for sub instructions (opcodes 0x2c - 0x2d) Signed-off-by: Mohammed Gamal m.gamal...@gmail.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 35dd57c..1b974f8 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -126,7 +126,7 @@ static u32 opcode_table[256] = { /* 0x28 - 0x2F */ ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock, ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM, - 0, 0, 0, 0, + ByteOp | DstAcc | SrcImmByte, DstAcc | SrcImm, 0, 0, /* 0x30 - 0x37 */ ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock, ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM, -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] Fix tip/x86/fpu merge
From: Marcelo Tosatti mtosa...@redhat.com Code removed upstream sneaked in through the merge. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 83ccfdf..8bcc21f 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -47,7 +47,6 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src) void free_thread_xstate(struct task_struct *tsk) { fpu_free(tsk-thread.fpu); - WARN(tsk-thread.ds_ctx, leaking DS context\n); } void free_thread_info(struct thread_info *ti) -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: Add cpuid.txt file
From: Glauber Costa glom...@redhat.com This file documents cpuid bits used by KVM. Signed-off-by: Glauber Costa glom...@redhat.com Acked-by: Zachary Amsden zams...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/Documentation/kvm/cpuid.txt b/Documentation/kvm/cpuid.txt new file mode 100644 index 000..8167ef1 --- /dev/null +++ b/Documentation/kvm/cpuid.txt @@ -0,0 +1,44 @@ +KVM CPUID bits +Glauber Costa glom...@redhat.com, Red Hat Inc, 2010 += + +A guest running on a kvm host, can check some of its features using +cpuid. This is not always guaranteed to work, since userspace can +mask-out some, or even all KVM-related cpuid features before launching +a guest. + +KVM cpuid functions are: + +function: KVM_CPUID_SIGNATURE (0x4000) +returns : eax = 0, + ebx = 0x4b4d564b, + ecx = 0x564b4d56, + edx = 0x4d. +Note that this value in ebx, ecx and edx corresponds to the string KVMKVMKVM. +This function queries the presence of KVM cpuid leafs. + + +function: define KVM_CPUID_FEATURES (0x4001) +returns : ebx, ecx, edx = 0 + eax = and OR'ed group of (1 flag), where each flags is: + + +flag || value || meaning += +KVM_FEATURE_CLOCKSOURCE|| 0 || kvmclock available at msrs + || || 0x11 and 0x12. +-- +KVM_FEATURE_NOP_IO_DELAY || 1 || not necessary to perform delays + || || on PIO operations. +-- +KVM_FEATURE_MMU_OP || 2 || deprecated. +-- +KVM_FEATURE_CLOCKSOURCE2 || 3 || kvmclock available at msrs + || || 0x4b564d00 and 0x4b564d01 +-- +KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side + || || per-cpu warps are expected in + || || kvmclock. +-- + + -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: VMX: Only reset MMU when necessary
From: Sheng Yang sh...@linux.intel.com Only modifying some bits of CR0/CR4 needs paging mode switch. Modify EFER.NXE bit would result in reserved bit updates. Signed-off-by: Sheng Yang sh...@linux.intel.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b8e707f..48b0866 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -416,6 +416,10 @@ out: static int __kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) { + unsigned long old_cr0 = kvm_read_cr0(vcpu); + unsigned long update_bits = X86_CR0_PG | X86_CR0_WP | + X86_CR0_CD | X86_CR0_NW; + cr0 |= X86_CR0_ET; #ifdef CONFIG_X86_64 @@ -449,7 +453,8 @@ static int __kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) kvm_x86_ops-set_cr0(vcpu, cr0); - kvm_mmu_reset_context(vcpu); + if ((cr0 ^ old_cr0) update_bits) + kvm_mmu_reset_context(vcpu); return 0; } @@ -487,7 +492,8 @@ int __kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) kvm_x86_ops-set_cr4(vcpu, cr4); - kvm_mmu_reset_context(vcpu); + if ((cr4 ^ old_cr4) pdptr_bits) + kvm_mmu_reset_context(vcpu); return 0; } @@ -693,6 +699,8 @@ static u32 emulated_msrs[] = { static int set_efer(struct kvm_vcpu *vcpu, u64 efer) { + u64 old_efer = vcpu-arch.efer; + if (efer efer_reserved_bits) return 1; @@ -724,6 +732,10 @@ static int set_efer(struct kvm_vcpu *vcpu, u64 efer) vcpu-arch.mmu.base_role.nxe = (efer EFER_NX) !tdp_enabled; kvm_mmu_reset_context(vcpu); + /* Update reserved bits */ + if ((efer ^ old_efer) EFER_NX) + kvm_mmu_reset_context(vcpu); + return 0; } -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: x86 emulator: Add test acc, imm instruction (opcodes 0xA8 - 0xA9)
From: Mohammed Gamal m.gamal...@gmail.com This adds test acc, imm instruction to the x86 emulator Signed-off-by: Mohammed Gamal m.gamal...@gmail.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index b43ac98..35dd57c 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -181,7 +181,7 @@ static u32 opcode_table[256] = { ByteOp | SrcSI | DstDI | Mov | String, SrcSI | DstDI | Mov | String, ByteOp | SrcSI | DstDI | String, SrcSI | DstDI | String, /* 0xA8 - 0xAF */ - 0, 0, ByteOp | DstDI | Mov | String, DstDI | Mov | String, + DstAcc | SrcImmByte | ByteOp, DstAcc | SrcImm, ByteOp | DstDI | Mov | String, DstDI | Mov | String, ByteOp | SrcSI | DstAcc | Mov | String, SrcSI | DstAcc | Mov | String, ByteOp | DstDI | String, DstDI | String, /* 0xB0 - 0xB7 */ @@ -2754,6 +2754,7 @@ special_insn: } break; case 0x84 ... 0x85: + test: emulate_2op_SrcV(test, c-src, c-dst, ctxt-eflags); break; case 0x86 ... 0x87: /* xchg */ @@ -2852,6 +2853,8 @@ special_insn: c-dst.type = OP_NONE; /* Disable writeback. */ DPRINTF(cmps: mem1=0x%p mem2=0x%p\n, c-src.ptr, c-dst.ptr); goto cmp; + case 0xa8 ... 0xa9: /* test ax, imm */ + goto test; case 0xaa ... 0xab: /* stos */ c-dst.val = c-regs[VCPU_REGS_RAX]; break; -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: Don't allow lmsw to clear cr0.pe
From: Avi Kivity a...@redhat.com The current lmsw implementation allows the guest to clear cr0.pe, contrary to the manual, which breaks EMM386.EXE. Fix by ORing the old cr0.pe with lmsw's operand. Signed-off-by: Avi Kivity a...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4b1433f..37164e7 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -462,7 +462,7 @@ EXPORT_SYMBOL_GPL(kvm_set_cr0); void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw) { - kvm_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~0x0ful) | (msw 0x0f)); + kvm_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~0x0eul) | (msw 0x0f)); } EXPORT_SYMBOL_GPL(kvm_lmsw); -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: MMU: use proper cache object freeing function
From: Xiao Guangrong xiaoguangr...@cn.fujitsu.com Use kmem_cache_free to free objects allocated by kmem_cache_alloc. Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 9db33f1..70566d2 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -304,10 +304,11 @@ static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, return 0; } -static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc) +static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc, + struct kmem_cache *cache) { while (mc-nobjs) - kfree(mc-objects[--mc-nobjs]); + kmem_cache_free(cache, mc-objects[--mc-nobjs]); } static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache, @@ -355,10 +356,11 @@ out: static void mmu_free_memory_caches(struct kvm_vcpu *vcpu) { - mmu_free_memory_cache(vcpu-arch.mmu_pte_chain_cache); - mmu_free_memory_cache(vcpu-arch.mmu_rmap_desc_cache); + mmu_free_memory_cache(vcpu-arch.mmu_pte_chain_cache, pte_chain_cache); + mmu_free_memory_cache(vcpu-arch.mmu_rmap_desc_cache, rmap_desc_cache); mmu_free_memory_cache_page(vcpu-arch.mmu_page_cache); - mmu_free_memory_cache(vcpu-arch.mmu_page_header_cache); + mmu_free_memory_cache(vcpu-arch.mmu_page_header_cache, + mmu_page_header_cache); } static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc, @@ -379,7 +381,7 @@ static struct kvm_pte_chain *mmu_alloc_pte_chain(struct kvm_vcpu *vcpu) static void mmu_free_pte_chain(struct kvm_pte_chain *pc) { - kfree(pc); + kmem_cache_free(pte_chain_cache, pc); } static struct kvm_rmap_desc *mmu_alloc_rmap_desc(struct kvm_vcpu *vcpu) @@ -390,7 +392,7 @@ static struct kvm_rmap_desc *mmu_alloc_rmap_desc(struct kvm_vcpu *vcpu) static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd) { - kfree(rd); + kmem_cache_free(rmap_desc_cache, rd); } /* @@ -897,7 +899,7 @@ static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp) list_del(sp-link); __free_page(virt_to_page(sp-spt)); __free_page(virt_to_page(sp-gfns)); - kfree(sp); + kmem_cache_free(mmu_page_header_cache, sp); ++kvm-arch.n_free_mmu_pages; } -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: MMU: Segregate shadow pages with different cr0.wp
From: Avi Kivity a...@redhat.com When cr0.wp=0, we may shadow a gpte having u/s=1 and r/w=0 with an spte having u/s=0 and r/w=1. This allows excessive access if the guest sets cr0.wp=1 and accesses through this spte. Fix by making cr0.wp part of the base role; we'll have different sptes for the two cases and the problem disappears. Signed-off-by: Avi Kivity a...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt index fde7989..0e872ae 100644 --- a/Documentation/kvm/mmu.txt +++ b/Documentation/kvm/mmu.txt @@ -163,6 +163,8 @@ Shadow pages contain the following information: 32-bit or 64-bit gptes are in use). role.nxe: Contains the value of efer.nxe for which the page is valid. + role.cr0_wp: +Contains the value of cr0.wp for which the page is valid. gfn: Either the guest page table containing the translations shadowed by this page, or the base page frame for linear translations. See role.direct. diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 5aa0944..0c06148 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -179,6 +179,7 @@ union kvm_mmu_page_role { unsigned access:3; unsigned invalid:1; unsigned nxe:1; + unsigned cr0_wp:1; }; }; diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index f0ab86a..9db33f1 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -217,7 +217,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, } EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes); -static int is_write_protection(struct kvm_vcpu *vcpu) +static bool is_write_protection(struct kvm_vcpu *vcpu) { return kvm_read_cr0_bits(vcpu, X86_CR0_WP); } @@ -2435,6 +2435,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu) r = paging32_init_context(vcpu); vcpu-arch.mmu.base_role.cr4_pae = !!is_pae(vcpu); + vcpu-arch.mmu.base_role.cr0_wp = is_write_protection(vcpu); return r; } -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: MMU: unalias gfn before sp-gfns[] comparison in sync_page
From: Xiao Guangrong xiaoguangr...@cn.fujitsu.com sp-gfns[] contain unaliased gfns, but gpte might contain pointer to aliased region. Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 11d8a16..71c73fe 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -588,7 +588,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) unsigned pte_access; pt_element_t gpte; gpa_t pte_gpa; - gfn_t gfn = sp-gfns[i]; + gfn_t gfn; if (!is_shadow_present_pte(sp-spt[i])) continue; @@ -599,8 +599,9 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) sizeof(pt_element_t))) return -EINVAL; - if (gpte_to_gfn(gpte) != gfn || !is_present_gpte(gpte) || - !(gpte PT_ACCESSED_MASK)) { + gfn = gpte_to_gfn(gpte); + if (unalias_gfn(vcpu-kvm, gfn) != sp-gfns[i] || + !is_present_gpte(gpte) || !(gpte PT_ACCESSED_MASK)) { u64 nonpresent; rmap_remove(vcpu-kvm, sp-spt[i]); -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: x86: Clean up duplicate assignment
From: Sheng Yang sh...@linux.intel.com mmu.free() already set root_hpa to INVALID_PAGE, no need to do it again in the destory_kvm_mmu(). kvm_x86_ops-set_cr4() and set_efer() already assign cr4/efer to vcpu-arch.cr4/efer, no need to do it again later. Signed-off-by: Sheng Yang sh...@linux.intel.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index fd2c8f4..f0ab86a 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2452,10 +2452,9 @@ static int init_kvm_mmu(struct kvm_vcpu *vcpu) static void destroy_kvm_mmu(struct kvm_vcpu *vcpu) { ASSERT(vcpu); - if (VALID_PAGE(vcpu-arch.mmu.root_hpa)) { + if (VALID_PAGE(vcpu-arch.mmu.root_hpa)) + /* mmu.free() should set root_hpa = INVALID_PAGE */ vcpu-arch.mmu.free(vcpu); - vcpu-arch.mmu.root_hpa = INVALID_PAGE; - } } int kvm_mmu_reset_context(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 95dc5f3..b8e707f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -486,7 +486,7 @@ int __kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) return 1; kvm_x86_ops-set_cr4(vcpu, cr4); - vcpu-arch.cr4 = cr4; + kvm_mmu_reset_context(vcpu); return 0; @@ -721,8 +721,6 @@ static int set_efer(struct kvm_vcpu *vcpu, u64 efer) kvm_x86_ops-set_efer(vcpu, efer); - vcpu-arch.efer = efer; - vcpu-arch.mmu.base_role.nxe = (efer EFER_NX) !tdp_enabled; kvm_mmu_reset_context(vcpu); -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls
From: Avi Kivity a...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 11f226f..b998abf 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -1110,6 +1110,8 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu); int i; + vcpu_load(vcpu); + sregs-pvr = vcpu-arch.pvr; sregs-u.s.sdr1 = to_book3s(vcpu)-sdr1; @@ -1128,6 +1130,9 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, sregs-u.s.ppc32.dbat[i] = vcpu3s-dbat[i].raw; } } + + vcpu_put(vcpu); + return 0; } @@ -1137,6 +1142,8 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu); int i; + vcpu_load(vcpu); + kvmppc_set_pvr(vcpu, sregs-pvr); vcpu3s-sdr1 = sregs-u.s.sdr1; @@ -1163,6 +1170,9 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, /* Flush the MMU after messing with the segments */ kvmppc_mmu_pte_flush(vcpu, 0, 0); + + vcpu_put(vcpu); + return 0; } diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index c922240..a33ab8c 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -485,6 +485,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) { int i; + vcpu_load(vcpu); + regs-pc = vcpu-arch.pc; regs-cr = kvmppc_get_cr(vcpu); regs-ctr = vcpu-arch.ctr; @@ -505,6 +507,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) for (i = 0; i ARRAY_SIZE(regs-gpr); i++) regs-gpr[i] = kvmppc_get_gpr(vcpu, i); + vcpu_put(vcpu); + return 0; } @@ -512,6 +516,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) { int i; + vcpu_load(vcpu); + vcpu-arch.pc = regs-pc; kvmppc_set_cr(vcpu, regs-cr); vcpu-arch.ctr = regs-ctr; @@ -531,6 +537,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) for (i = 0; i ARRAY_SIZE(regs-gpr); i++) kvmppc_set_gpr(vcpu, i, regs-gpr[i]); + vcpu_put(vcpu); + return 0; } @@ -559,7 +567,12 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, struct kvm_translation *tr) { - return kvmppc_core_vcpu_translate(vcpu, tr); + int r; + + vcpu_load(vcpu); + r = kvmppc_core_vcpu_translate(vcpu, tr); + vcpu_put(vcpu); + return r; } int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: s390: Centrally lock arch specific vcpu ioctls
From: Avi Kivity a...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index e80f55e..28cd8fd 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -363,9 +363,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) static int kvm_arch_vcpu_ioctl_initial_reset(struct kvm_vcpu *vcpu) { - vcpu_load(vcpu); kvm_s390_vcpu_initial_reset(vcpu); - vcpu_put(vcpu); return 0; } @@ -415,14 +413,12 @@ static int kvm_arch_vcpu_ioctl_set_initial_psw(struct kvm_vcpu *vcpu, psw_t psw) { int rc = 0; - vcpu_load(vcpu); if (atomic_read(vcpu-arch.sie_block-cpuflags) CPUSTAT_RUNNING) rc = -EBUSY; else { vcpu-run-psw_mask = psw.mask; vcpu-run-psw_addr = psw.addr; } - vcpu_put(vcpu); return rc; } @@ -573,7 +569,7 @@ static int __guestcopy(struct kvm_vcpu *vcpu, u64 guestdest, const void *from, * KVM_S390_STORE_STATUS_NOADDR: - 0x1200 on 64 bit * KVM_S390_STORE_STATUS_PREFIXED: - prefix */ -int __kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr) +static int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr) { const unsigned char archmode = 1; int prefix; @@ -635,45 +631,43 @@ int __kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr) return 0; } -static int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr) -{ - int rc; - - vcpu_load(vcpu); - rc = __kvm_s390_vcpu_store_status(vcpu, addr); - vcpu_put(vcpu); - return rc; -} - long kvm_arch_vcpu_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { struct kvm_vcpu *vcpu = filp-private_data; void __user *argp = (void __user *)arg; + long r; - switch (ioctl) { - case KVM_S390_INTERRUPT: { + if (ioctl == KVM_S390_INTERRUPT) { struct kvm_s390_interrupt s390int; if (copy_from_user(s390int, argp, sizeof(s390int))) return -EFAULT; return kvm_s390_inject_vcpu(vcpu, s390int); } + + vcpu_load(vcpu); + switch (ioctl) { case KVM_S390_STORE_STATUS: - return kvm_s390_vcpu_store_status(vcpu, arg); + r = kvm_s390_vcpu_store_status(vcpu, arg); + break; case KVM_S390_SET_INITIAL_PSW: { psw_t psw; + r = -EFAULT; if (copy_from_user(psw, argp, sizeof(psw))) - return -EFAULT; - return kvm_arch_vcpu_ioctl_set_initial_psw(vcpu, psw); + break; + r = kvm_arch_vcpu_ioctl_set_initial_psw(vcpu, psw); + break; } case KVM_S390_INITIAL_RESET: - return kvm_arch_vcpu_ioctl_initial_reset(vcpu); + r = kvm_arch_vcpu_ioctl_initial_reset(vcpu); + break; default: - ; + r = -EINVAL; } - return -EINVAL; + vcpu_put(vcpu); + return r; } /* Section: memory related */ -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: x86: Add missing locking to arch specific vcpu ioctls
From: Avi Kivity a...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bfe0730..7167109 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1854,6 +1854,7 @@ static int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu, { int r; + vcpu_load(vcpu); r = -E2BIG; if (cpuid-nent vcpu-arch.cpuid_nent) goto out; @@ -1865,6 +1866,7 @@ static int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu, out: cpuid-nent = vcpu-arch.cpuid_nent; + vcpu_put(vcpu); return r; } @@ -2155,6 +2157,7 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu, int r; unsigned bank_num = mcg_cap 0xff, bank; + vcpu_load(vcpu); r = -EINVAL; if (!bank_num || bank_num = KVM_MAX_MCE_BANKS) goto out; @@ -2169,6 +2172,7 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu, for (bank = 0; bank bank_num; bank++) vcpu-arch.mce_banks[bank*4] = ~(u64)0; out: + vcpu_put(vcpu); return r; } @@ -2477,7 +2481,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp, r = -EFAULT; if (copy_from_user(mce, argp, sizeof mce)) goto out; + vcpu_load(vcpu); r = kvm_vcpu_ioctl_x86_set_mce(vcpu, mce); + vcpu_put(vcpu); break; } case KVM_GET_VCPU_EVENTS: { -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: pass correct parameter to kvm_mmu_free_some_pages
From: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 604eb3f..fd2c8f4 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2067,7 +2067,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu) root_gfn = 0; } spin_lock(vcpu-kvm-mmu_lock); - kvm_mmu_free_some_pages(vcpu-kvm); + kvm_mmu_free_some_pages(vcpu); sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL, direct, ACC_ALL, NULL); @@ -2098,7 +2098,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu) root_gfn = i 30; } spin_lock(vcpu-kvm-mmu_lock); - kvm_mmu_free_some_pages(vcpu-kvm); + kvm_mmu_free_some_pages(vcpu); sp = kvm_mmu_get_page(vcpu, root_gfn, i 30, PT32_ROOT_LEVEL, direct, ACC_ALL, NULL); -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: x86: Check LMA bit before set_efer
From: Sheng Yang sh...@linux.intel.com kvm_x86_ops-set_efer() would execute vcpu-arch.efer = efer, so the checking of LMA bit didn't work. Signed-off-by: Sheng Yang sh...@linux.intel.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 37164e7..95dc5f3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -716,11 +716,11 @@ static int set_efer(struct kvm_vcpu *vcpu, u64 efer) return 1; } - kvm_x86_ops-set_efer(vcpu, efer); - efer = ~EFER_LMA; efer |= vcpu-arch.efer EFER_LMA; + kvm_x86_ops-set_efer(vcpu, efer); + vcpu-arch.efer = efer; vcpu-arch.mmu.base_role.nxe = (efer EFER_NX) !tdp_enabled; -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: x86: Lock arch specific vcpu ioctls centrally
From: Avi Kivity a...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 75a6e8a..ce4e943 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1541,16 +1541,12 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs, { int i, idx; - vcpu_load(vcpu); - idx = srcu_read_lock(vcpu-kvm-srcu); for (i = 0; i msrs-nmsrs; ++i) if (do_msr(vcpu, entries[i].index, entries[i].data)) break; srcu_read_unlock(vcpu-kvm-srcu, idx); - vcpu_put(vcpu); - return i; } @@ -1798,7 +1794,6 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, if (copy_from_user(cpuid_entries, entries, cpuid-nent * sizeof(struct kvm_cpuid_entry))) goto out_free; - vcpu_load(vcpu); for (i = 0; i cpuid-nent; i++) { vcpu-arch.cpuid_entries[i].function = cpuid_entries[i].function; vcpu-arch.cpuid_entries[i].eax = cpuid_entries[i].eax; @@ -1816,7 +1811,6 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, r = 0; kvm_apic_set_version(vcpu); kvm_x86_ops-cpuid_update(vcpu); - vcpu_put(vcpu); out_free: vfree(cpuid_entries); @@ -1837,11 +1831,9 @@ static int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, if (copy_from_user(vcpu-arch.cpuid_entries, entries, cpuid-nent * sizeof(struct kvm_cpuid_entry2))) goto out; - vcpu_load(vcpu); vcpu-arch.cpuid_nent = cpuid-nent; kvm_apic_set_version(vcpu); kvm_x86_ops-cpuid_update(vcpu); - vcpu_put(vcpu); return 0; out: @@ -1854,7 +1846,6 @@ static int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu, { int r; - vcpu_load(vcpu); r = -E2BIG; if (cpuid-nent vcpu-arch.cpuid_nent) goto out; @@ -1866,7 +1857,6 @@ static int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu, out: cpuid-nent = vcpu-arch.cpuid_nent; - vcpu_put(vcpu); return r; } @@ -2098,9 +2088,7 @@ out: static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s) { - vcpu_load(vcpu); memcpy(s-regs, vcpu-arch.apic-regs, sizeof *s); - vcpu_put(vcpu); return 0; } @@ -2108,11 +2096,9 @@ static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu, static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s) { - vcpu_load(vcpu); memcpy(vcpu-arch.apic-regs, s-regs, sizeof *s); kvm_apic_post_state_restore(vcpu); update_cr8_intercept(vcpu); - vcpu_put(vcpu); return 0; } @@ -2124,20 +2110,15 @@ static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, return -EINVAL; if (irqchip_in_kernel(vcpu-kvm)) return -ENXIO; - vcpu_load(vcpu); kvm_queue_interrupt(vcpu, irq-irq, false); - vcpu_put(vcpu); - return 0; } static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu) { - vcpu_load(vcpu); kvm_inject_nmi(vcpu); - vcpu_put(vcpu); return 0; } @@ -2157,7 +2138,6 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu, int r; unsigned bank_num = mcg_cap 0xff, bank; - vcpu_load(vcpu); r = -EINVAL; if (!bank_num || bank_num = KVM_MAX_MCE_BANKS) goto out; @@ -2172,7 +2152,6 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu, for (bank = 0; bank bank_num; bank++) vcpu-arch.mce_banks[bank*4] = ~(u64)0; out: - vcpu_put(vcpu); return r; } @@ -2230,8 +2209,6 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu, static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu, struct kvm_vcpu_events *events) { - vcpu_load(vcpu); - events-exception.injected = vcpu-arch.exception.pending !kvm_exception_is_soft(vcpu-arch.exception.nr); @@ -2256,8 +2233,6 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu, events-flags = (KVM_VCPUEVENT_VALID_NMI_PENDING | KVM_VCPUEVENT_VALID_SIPI_VECTOR | KVM_VCPUEVENT_VALID_SHADOW); - - vcpu_put(vcpu); } static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu, @@ -2268,8 +2243,6 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu, | KVM_VCPUEVENT_VALID_SHADOW)) return -EINVAL; - vcpu_load(vcpu); - vcpu-arch.exception.pending = events-exception.injected; vcpu-arch.exception.nr = events-exception.nr;
[COMMIT master] KVM: remove CAP_SYS_RAWIO requirement from kvm_vm_ioctl_assign_irq
From: Alex Williamson alex.william...@redhat.com Remove this check in an effort to allow kvm guests to run without root privileges. This capability check doesn't seem to add any security since the device needs to have already been added via the assign device ioctl and the io actually occurs through the pci sysfs interface. Signed-off-by: Alex Williamson alex.william...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c index 4d10b1e..64672e2 100644 --- a/virt/kvm/assigned-dev.c +++ b/virt/kvm/assigned-dev.c @@ -448,9 +448,6 @@ static int kvm_vm_ioctl_assign_irq(struct kvm *kvm, struct kvm_assigned_dev_kernel *match; unsigned long host_irq_type, guest_irq_type; - if (!capable(CAP_SYS_RAWIO)) - return -EPERM; - if (!irqchip_in_kernel(kvm)) return r; -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: x86: cleanup unused local variable
From: Xiao Guangrong xiaoguangr...@cn.fujitsu.com fix: arch/x86/kvm/x86.c: In function ‘handle_emulation_failure’: arch/x86/kvm/x86.c:3844: warning: unused variable ‘ctxt’ Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 48b0866..bfe0730 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3851,8 +3851,6 @@ static void inject_emulated_exception(struct kvm_vcpu *vcpu) static int handle_emulation_failure(struct kvm_vcpu *vcpu) { - struct x86_emulate_ctxt *ctxt = vcpu-arch.emulate_ctxt; - ++vcpu-stat.insn_emulation_fail; trace_kvm_emulate_insn_failed(vcpu); vcpu-run-exit_reason = KVM_EXIT_INTERNAL_ERROR; -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: Consolidate arch specific vcpu ioctl locking
From: Avi Kivity a...@redhat.com Now that all arch specific ioctls have centralized locking, it is easy to move it to the central dispatcher. Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index caeed7b..a1d8750 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -512,17 +512,16 @@ long kvm_arch_vcpu_ioctl(struct file *filp, void __user *argp = (void __user *)arg; long r; - if (ioctl == KVM_INTERRUPT) { + switch (ioctl) { + case KVM_INTERRUPT: { struct kvm_interrupt irq; r = -EFAULT; if (copy_from_user(irq, argp, sizeof(irq))) - goto out_nolock; + goto out; r = kvm_vcpu_ioctl_interrupt(vcpu, irq); - goto out_nolock; + goto out; } - vcpu_load(vcpu); - switch (ioctl) { case KVM_ENABLE_CAP: { struct kvm_enable_cap cap; @@ -537,8 +536,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp, } out: - vcpu_put(vcpu); -out_nolock: return r; } diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 28cd8fd..fad1024 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -638,16 +638,16 @@ long kvm_arch_vcpu_ioctl(struct file *filp, void __user *argp = (void __user *)arg; long r; - if (ioctl == KVM_S390_INTERRUPT) { + switch (ioctl) { + case KVM_S390_INTERRUPT: { struct kvm_s390_interrupt s390int; + r = -EFAULT; if (copy_from_user(s390int, argp, sizeof(s390int))) - return -EFAULT; - return kvm_s390_inject_vcpu(vcpu, s390int); + break; + r = kvm_s390_inject_vcpu(vcpu, s390int); + break; } - - vcpu_load(vcpu); - switch (ioctl) { case KVM_S390_STORE_STATUS: r = kvm_s390_vcpu_store_status(vcpu, arg); break; @@ -666,7 +666,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp, default: r = -EINVAL; } - vcpu_put(vcpu); return r; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ce4e943..7500cba 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2298,7 +2298,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp, int r; struct kvm_lapic_state *lapic = NULL; - vcpu_load(vcpu); switch (ioctl) { case KVM_GET_LAPIC: { r = -EINVAL; @@ -2496,7 +2495,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp, r = -EINVAL; } out: - vcpu_put(vcpu); kfree(lapic); return r; } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 08b2ccd..5ee558c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1564,9 +1564,7 @@ out_free2: break; } default: - vcpu_put(vcpu); r = kvm_arch_vcpu_ioctl(filp, ioctl, arg); - vcpu_load(vcpu); } out: vcpu_put(vcpu); -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] Consolidate vcpu ioctl locking
On 05/16/2010 04:00 AM, Alexander Graf wrote: On 15.05.2010, at 19:30, Avi Kivity wrote: On 05/15/2010 11:26 AM, Alexander Graf wrote: That means you never inject an interrupt from the iothread (or from a different vcpu thread)? If that's the case we might make it part of the API and require the ioctl to be issued from the vcpu thread. We'd still be left with the s390 exception. Well I'm not sure that's guaranteed for MOL or Dolphin, but I guess the user base is small enough to ignore them. Either way, I'm actually rather unhappy with the way interrupts work right now. We're only injecting interrupts when in the main loop, which is rare if we did our homework right. So what I'd really like to see is that the MPIC on ppc directly invokes KVM_INTERRUPT to pull (and losen) the interrupt line. That way we can't just accidently miss interrupts. on x86 we signal the vcpu thread to pull it out of the main loop and poll the apic. Hrm, makes sense. Though it's additional overhead of a task switch. Why take the burden if you don't have to? That's what the world looked like in 2006. We could change it, but there's not much point, since having the local apic in the kernel is pretty much a requirement for reasonable performance. That's exactly why x86 has run-request_interrupt_window, run-ready_for_interrupt_injection, and run-if_flag. So how does x86 userspace get notified when it has an interrupt pending but couldn't inject it? Without a notification, we delay interrupts by quite some time. run-ready_for_interrupt_injection and run-request_irq_window. - we need to exit to userspace to realize that the interrupt is still active This is fundamentally broken. What I'd like to see is: * device in userspace wants to trigger an interrupt * mpic pulls external interrupt line, automatically issues ioctl(KVM_INTERRUPT) * we enter the guest * we inject the external interrupt * guest gets the interrupt, sees it's lazily disabled, sets msr.ee=0 * guest moves on, sets msr.ee=1 again later * we inject the external interrupt again, since it's still active * guest acknowledges the interrupt with the mpic, issues ioctl(KVM_INTERRUPT, disable) - all is great This is similar to KVM_IRQ_LINE. Well, KVM_IRQ_LINE would be the in-kernel pic, no? That'd mean I'd have to reimplement the mpic in kernel space - which might eventually be a good idea when going for SMP, but I'd first like to see if I can keep the current interrupt injection path efficient. Sure. For that to work we need to enable triggering KVM_INTERRUPT from a non-vcpu thread. KVM_IRQ_LINE is a vm ioctl instead of a vcpu ioctl. The motivation for the strict 'issue vcpu ioctls from vcpu thread' rule is to prepare the way for a syscall (instead of ioctl) API. Then we lost the fd argument, and choosing the vcpu is done by associating it with the current task. That allows us to get rid of vcpu-mutex and fget_light() (as well as the ioctl dispatch). If we define the API to only work on the current vcpu with a few excetions where we're safe (KVM_INTERRUPT), that'd get rid of the mutex too, no? Yes. Need to document it though. What does fget_light do? Make sure that the vcpu fd doesn't disappear. And does the ioctl dispatch cost that much? I like the flexibility of it to be honest. Not much, which is why there's no movement in that direction. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [SeaBIOS] [PATCHv2] Support for booting from virtio disks
On Thu, May 13, 2010 at 07:49:40PM +0300, Avi Kivity wrote: On 05/10/2010 06:58 PM, Anthony Liguori wrote: Isn't this problem unrelated to this patch? I mean if I start qemu with two ide devices can I specify from qemu command line which one I want to boot from? That's sort of what I'm asking. If you compare this approach to extboot, extboot provided a capability to select a disk. I think it can be argued though that this isn't a necessary feature to carry over and I'm looking for additional opinions on that. I'd say it's a necessary feature, but not one to carry over from the extboot implementation. We have the seabios boot menu (how to reach it?), we need to store the nvram persistently, and we need to extend the selection menu to qemu, but that's unrelated to this patch. To reach seabios boot menu run qemu with -boot menu=on option and press f12 when prompted. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] Consolidate vcpu ioctl locking
On 16.05.2010, at 10:23, Avi Kivity wrote: On 05/16/2010 04:00 AM, Alexander Graf wrote: On 15.05.2010, at 19:30, Avi Kivity wrote: On 05/15/2010 11:26 AM, Alexander Graf wrote: That means you never inject an interrupt from the iothread (or from a different vcpu thread)? If that's the case we might make it part of the API and require the ioctl to be issued from the vcpu thread. We'd still be left with the s390 exception. Well I'm not sure that's guaranteed for MOL or Dolphin, but I guess the user base is small enough to ignore them. Either way, I'm actually rather unhappy with the way interrupts work right now. We're only injecting interrupts when in the main loop, which is rare if we did our homework right. So what I'd really like to see is that the MPIC on ppc directly invokes KVM_INTERRUPT to pull (and losen) the interrupt line. That way we can't just accidently miss interrupts. on x86 we signal the vcpu thread to pull it out of the main loop and poll the apic. Hrm, makes sense. Though it's additional overhead of a task switch. Why take the burden if you don't have to? That's what the world looked like in 2006. We could change it, but there's not much point, since having the local apic in the kernel is pretty much a requirement for reasonable performance. Well, I'm not convinced yet that's the case for PPC as well. The timer is in-cpu anyways and I don't see why IPIs should be slow with a userspace pic - if we keep the overhead low. So let me think this through. With remote interrupt injection we have. * thread 1 does vcpu_run * thread 2 triggers KVM_INTERRUPT on fd * thread 2 signals thread 1 so we're sure the interrupt gets injected * thread 1 exits into qemu * thread 1 goes back into the vcpu, triggering an interrupt Without we have: * thread 1 does vcpu_run * thread 2 wants to trigger an an interrupt, sets the qemu internal bit * thread 2 signals thread 1 so we're sure the interrupt gets processed * thread 1 exits into qemu * thread 1 triggers KVM_INTERRUPT on fd * thread 1 goes into the vcpu So we don't really buy anything from doing the remote injection. Hrm. What's somewhat striking me here though is - why do we need KVM_INTERRUPT when there's all those kvm_run fields? Can't we just do interrupt injection by setting run-trigger_interrupt? There's only a single interrupt line on the CPU anyways. That way we'd save the ioctl and get rid of the locking problem altogether. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] Consolidate vcpu ioctl locking
On 05/16/2010 12:01 PM, Alexander Graf wrote: That's what the world looked like in 2006. We could change it, but there's not much point, since having the local apic in the kernel is pretty much a requirement for reasonable performance. Well, I'm not convinced yet that's the case for PPC as well. The timer is in-cpu anyways and I don't see why IPIs should be slow with a userspace pic - if we keep the overhead low. If it's at all possible keep the mpic out. I am _not_ advocating pushing ppc's mpic into the kernel. So let me think this through. With remote interrupt injection we have. * thread 1 does vcpu_run * thread 2 triggers KVM_INTERRUPT on fd * thread 2 signals thread 1 so we're sure the interrupt gets injected * thread 1 exits into qemu This doesn't seem necessary. The kernel can own the interrupt line, so it remembers it from the last KVM_INTERRUPT. * thread 1 goes back into the vcpu, triggering an interrupt Without we have: * thread 1 does vcpu_run * thread 2 wants to trigger an an interrupt, sets the qemu internal bit * thread 2 signals thread 1 so we're sure the interrupt gets processed * thread 1 exits into qemu * thread 1 triggers KVM_INTERRUPT on fd * thread 1 goes into the vcpu So we don't really buy anything from doing the remote injection. Hrm. Not if you make interrupt injection a lightweight exit. What's somewhat striking me here though is - why do we need KVM_INTERRUPT when there's all those kvm_run fields? Can't we just do interrupt injection by setting run-trigger_interrupt? There's only a single interrupt line on the CPU anyways. That way we'd save the ioctl and get rid of the locking problem altogether. That's what x86 does. However, it's synchronous. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] Consolidate vcpu ioctl locking
On 16.05.2010, at 11:09, Avi Kivity wrote: On 05/16/2010 12:01 PM, Alexander Graf wrote: That's what the world looked like in 2006. We could change it, but there's not much point, since having the local apic in the kernel is pretty much a requirement for reasonable performance. Well, I'm not convinced yet that's the case for PPC as well. The timer is in-cpu anyways and I don't see why IPIs should be slow with a userspace pic - if we keep the overhead low. If it's at all possible keep the mpic out. I am _not_ advocating pushing ppc's mpic into the kernel. So let me think this through. With remote interrupt injection we have. * thread 1 does vcpu_run * thread 2 triggers KVM_INTERRUPT on fd * thread 2 signals thread 1 so we're sure the interrupt gets injected * thread 1 exits into qemu This doesn't seem necessary. The kernel can own the interrupt line, so it remembers it from the last KVM_INTERRUPT. It's not? On signals we always exit to userspace, no? * thread 1 goes back into the vcpu, triggering an interrupt Without we have: * thread 1 does vcpu_run * thread 2 wants to trigger an an interrupt, sets the qemu internal bit * thread 2 signals thread 1 so we're sure the interrupt gets processed * thread 1 exits into qemu * thread 1 triggers KVM_INTERRUPT on fd * thread 1 goes into the vcpu So we don't really buy anything from doing the remote injection. Hrm. Not if you make interrupt injection a lightweight exit. Please elaborate. What's somewhat striking me here though is - why do we need KVM_INTERRUPT when there's all those kvm_run fields? Can't we just do interrupt injection by setting run-trigger_interrupt? There's only a single interrupt line on the CPU anyways. That way we'd save the ioctl and get rid of the locking problem altogether. That's what x86 does. However, it's synchronous. For everyone except for the vcpu thread executing the interrupt, it's asynchronous, right? The same applies to an in-kernel pic. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] Consolidate vcpu ioctl locking
On 05/16/2010 12:35 PM, Alexander Graf wrote: So let me think this through. With remote interrupt injection we have. * thread 1 does vcpu_run * thread 2 triggers KVM_INTERRUPT on fd * thread 2 signals thread 1 so we're sure the interrupt gets injected * thread 1 exits into qemu This doesn't seem necessary. The kernel can own the interrupt line, so it remembers it from the last KVM_INTERRUPT. It's not? With s/signals/IPIs/. On signals we always exit to userspace, no? Yes (if the signal isn't blocked). * thread 1 goes back into the vcpu, triggering an interrupt Without we have: * thread 1 does vcpu_run * thread 2 wants to trigger an an interrupt, sets the qemu internal bit * thread 2 signals thread 1 so we're sure the interrupt gets processed * thread 1 exits into qemu * thread 1 triggers KVM_INTERRUPT on fd * thread 1 goes into the vcpu So we don't really buy anything from doing the remote injection. Hrm. Not if you make interrupt injection a lightweight exit. Please elaborate. 1: vcpu_run 2: KVM_INTERRUPT 2k: sets flag, if msr.ee IPIs 1 or wakes up 1 if halted 1k: notices flag, if msr.ee injects interrupt ... 1g: acks 1k: forwards ack to userspace 1: completes interrupt What's somewhat striking me here though is - why do we need KVM_INTERRUPT when there's all those kvm_run fields? Can't we just do interrupt injection by setting run-trigger_interrupt? There's only a single interrupt line on the CPU anyways. That way we'd save the ioctl and get rid of the locking problem altogether. That's what x86 does. However, it's synchronous. For everyone except for the vcpu thread executing the interrupt, it's asynchronous, right? For everyone other than the vcpu thread, it's off limits. kvm_run is only read on KVM_RUN entries and written on KVM_RUN exits. The same applies to an in-kernel pic. The in-kernel pic doesn't use kvm_run. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] Consolidate vcpu ioctl locking
On 16.05.2010, at 11:47, Avi Kivity wrote: 1: vcpu_run 2: KVM_INTERRUPT 2k: sets flag, if msr.ee IPIs 1 or wakes up 1 if halted Doesn't that break when we have a while(1) loop in the guest with msr.ee=0 while no timer is scheduled on the host? But then again with msr.ee=0 we don't get interrupts in the guest and to set msr.ee=1 we trap. Yeah, that would work. 1k: notices flag, if msr.ee injects interrupt ... 1g: acks The ack is done in userspace by the mpic, so we can just complete the interrupt there. 1k: forwards ack to userspace 1: completes interrupt So if I just have a field kvm_run-external_active I could set that to =1 on KVM_INTERRUPT including the above logic. To acknowledge it userspace would then do something like this in kvm_arch_pre_run: if (kvm_run-external_active !((env-interrupt_request CPU_INTERRUPT_HARD) (env-irq_input_state (1PPC_INPUT_INT { kvm_run-external_active = 0; } The big question is how to make such a change backwards compatible. But I guess I could just reuse the feature enabling framework. Well, sounds like we're getting closer. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/7] KVM: move vcpu locking to dispatcher for generic vcpu ioctls
On 05/15/2010 03:03 AM, Marcelo Tosatti wrote: On Thu, May 13, 2010 at 02:17:35PM +0300, Avi Kivity wrote: All vcpu ioctls need to be locked, so instead of locking each one specifically we lock at the generic dispatcher. This patch only updates generic ioctls and leaves arch specific ioctls alone. Forgot mp_state get/set. Right, fixed and applied. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: system_powerdown not working for qemu-kvm 0.12.4?
On 05/15/2010 04:19 AM, Teck Choon Giam wrote: Hi, Anyone encountered the same issue as me about system_powerdown no longer working since upgraded to qemu-kvm 0.12.4? Compared with what version? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Qemu-KVM Livate Migration 0.12.2 - 0.12.3/4 broken?
On 05/12/2010 02:11 AM, Juan Quintela wrote: Peter Lievenp...@dlh.net wrote: Hi Qemu/KVM Devel Team, Live Migration from a 0.12.2 qemu-kvm to a 0.12.3 (and 0.12.4) does not work: load of migration failed Is there any way to find out, why exactly it fails? I have a lot of VMs running on 0.12.2 and would like to migrate them to 0.12.4 cmdline: -net tap,vlan=6,script=no,downscript=no,ifname=tap7 -net nic,vlan=6,model=e1000,macaddr=52:54:00:fe:00:88 -net tap,vlan=651,script=no,downscript=no,ifname=tap8 -net nic,vlan=651,model=e1000,macaddr=52:54:00:ff:00:69 -drive file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-b6eca7604-6280020bc4b4bde8-quagga,if=ide,boot=on,cache=none,aio=native -m 256 -monitor tcp:0:4090,server,nowait -vnc :90 -name 'Quagga' -boot order=dc,menu=on -k de -incoming tcp:172.21.59.132:5090 -pidfile /var/run/qemu/vm-148.pid -rtc base=utc,clock=vm -usb -usbdevice tablet -no-kvm-irqchip -vga cirrus Any hints would be appreciated! Can you try reverting this patch? commit 3fa017e24b0a0f0e68619a689b9b02fe486dae9e Author: Marcelo Tosattimtosa...@redhat.com Date: Thu Feb 11 18:19:44 2010 -0200 ide save/restore pio/atapi cmd transfer fields and io buffer and told me if it works? Any idea why it fails? And how to fix it? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm-intel does not load when Intel TXT is enabled
On 05/14/2010 09:24 PM, Andrej Podzimek wrote: Hello, when I enable Intel TXT (Trusted Execution Technology) in the BIOS settings of my Lenovo W510, the kvm-intel module does not load and says 'kvm: disabled by bios', despite the fact that both VT-x and VT-d are enabled. Disabling TXT enables hardware virtualization and kvm-intel works as expected. Are VT-[xd] and TXT supposed to be mutually exclusive? Or could it be a BIOS bug? Running kvm under txt needs special support in the kernel; it will be available in 2.6.35. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] [block]: Fix scsi-generic breakage in find_image_format()
On Sat, May 15, 2010 at 06:30:52AM -0700, Nicholas A. Bellinger wrote: From: Nicholas Bellinger n...@linux-iscsi.org This patch adds a special BlockDriverState-sg check in block.c:find_image_format() after bdrv_file_open() - block/raw-posix.c:hdev_open() has been called to determine if we are dealing with a Linux host scsi-generic device or not. The patch then returns the BlockDriver * from find_protocol(), skipping the subsequent bdrv_read() and rest of find_image_format(). That's not quite correct as we don't want to expose formats directly. Returning bdrv_find_format(raw); should fix it for now, although I really hate having these special cases in block.c. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] [block]: Skip refresh_total_sectors() for scsi-generic devices
On Sat, May 15, 2010 at 06:30:59AM -0700, Nicholas A. Bellinger wrote: From: Nicholas Bellinger n...@linux-iscsi.org This patch adds a BlockDriverState-sg check in block.c:bdrv_common_open() to skip the new refresh_total_sectors() call once we know we are working with a scsi-generic device. We go ahead and skip this call for scsi-generic devices because block/raw-posix.c:raw_getlength() - lseek() will return -ESPIPE. How about moving that check into refresh_total_sectors? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Qemu-KVM Livate Migration 0.12.2 - 0.12.3/4 broken?
Avi Kivity a...@redhat.com wrote: On 05/12/2010 02:11 AM, Juan Quintela wrote: Peter Lievenp...@dlh.net wrote: Hi Qemu/KVM Devel Team, Live Migration from a 0.12.2 qemu-kvm to a 0.12.3 (and 0.12.4) does not work: load of migration failed Is there any way to find out, why exactly it fails? I have a lot of VMs running on 0.12.2 and would like to migrate them to 0.12.4 cmdline: -net tap,vlan=6,script=no,downscript=no,ifname=tap7 -net nic,vlan=6,model=e1000,macaddr=52:54:00:fe:00:88 -net tap,vlan=651,script=no,downscript=no,ifname=tap8 -net nic,vlan=651,model=e1000,macaddr=52:54:00:ff:00:69 -drive file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-b6eca7604-6280020bc4b4bde8-quagga,if=ide,boot=on,cache=none,aio=native -m 256 -monitor tcp:0:4090,server,nowait -vnc :90 -name 'Quagga' -boot order=dc,menu=on -k de -incoming tcp:172.21.59.132:5090 -pidfile /var/run/qemu/vm-148.pid -rtc base=utc,clock=vm -usb -usbdevice tablet -no-kvm-irqchip -vga cirrus Any hints would be appreciated! Can you try reverting this patch? commit 3fa017e24b0a0f0e68619a689b9b02fe486dae9e Author: Marcelo Tosattimtosa...@redhat.com Date: Thu Feb 11 18:19:44 2010 -0200 ide save/restore pio/atapi cmd transfer fields and io buffer and told me if it works? Any idea why it fails? And how to fix it? Lack of proper subsections. IDE is something like: const VMStateDescription vmstate_ide_drive = { .version_id = 4, } static const VMStateDescription vmstate_bmdma = { .name = ide bmdma, .version_id = 4, ... } const VMStateDescription vmstate_ide_pci = { .name = ide, .version_id = 4, VMSTATE_STRUCT_ARRAY(bmdma, PCIIDEState, 2, 0, vmstate_bmdma, BMDMAState), VMSTATE_IDE_DRIVES(bus[0].ifs, PCIIDEState), VMSTATE_IDE_DRIVES(bus[1].ifs, PCIIDEState), } Notice that everything is at version 4. It used to be everything at version 3. Now the problem is that when migrating from v3 - v4. We put in one place v3, But we only have a version number at the toplevel, rest of subsections don't sent a version number. There is no way to fix it in the general case. We can hack something around for ide, but that will just be a hack, or we can backport marcelo change and port it as a proper subsection (that is my plan). I expect to have time at the end of next time to work on this. So, to make the story short: I know what is happening, and I know how to fix it, just that fix is not trivial. I just need time. Later, Juan. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Qemu-KVM Livate Migration 0.12.2 - 0.12.3/4 broken?
On 05/16/2010 05:42 PM, Juan Quintela wrote: Any idea why it fails? And how to fix it? Lack of proper subsections. IDE is something like: const VMStateDescription vmstate_ide_drive = { .version_id = 4, } static const VMStateDescription vmstate_bmdma = { .name = ide bmdma, .version_id = 4, ... } const VMStateDescription vmstate_ide_pci = { .name = ide, .version_id = 4, VMSTATE_STRUCT_ARRAY(bmdma, PCIIDEState, 2, 0, vmstate_bmdma, BMDMAState), VMSTATE_IDE_DRIVES(bus[0].ifs, PCIIDEState), VMSTATE_IDE_DRIVES(bus[1].ifs, PCIIDEState), } Notice that everything is at version 4. It used to be everything at version 3. Now the problem is that when migrating from v3 - v4. We put in one place v3, But we only have a version number at the toplevel, rest of subsections don't sent a version number. There is no way to fix it in the general case. We can hack something around for ide, but that will just be a hack, or we can backport marcelo change and port it as a proper subsection (that is my plan). I expect to have time at the end of next time to work on this. end of next week? So, to make the story short: I know what is happening, and I know how to fix it, just that fix is not trivial. I just need time. Meanwhile, we have a broken 0.12.4. Is there a quick'n'dirty workaround that will be forward compatible with the real fix that we can push out? We've regressed from failing some migrations to failing all migrations. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Qemu-KVM Livate Migration 0.12.2 - 0.12.3/4 broken?
Juan Quintela wrote: Lack of proper subsections. IDE is something like: const VMStateDescription vmstate_ide_drive = { .version_id = 4, } static const VMStateDescription vmstate_bmdma = { .name = ide bmdma, .version_id = 4, ... } const VMStateDescription vmstate_ide_pci = { .name = ide, .version_id = 4, VMSTATE_STRUCT_ARRAY(bmdma, PCIIDEState, 2, 0, vmstate_bmdma, BMDMAState), VMSTATE_IDE_DRIVES(bus[0].ifs, PCIIDEState), VMSTATE_IDE_DRIVES(bus[1].ifs, PCIIDEState), } Notice that everything is at version 4. It used to be everything at version 3. Now the problem is that when migrating from v3 - v4. We put in one place v3, But we only have a version number at the toplevel, rest of subsections don't sent a version number. There is no way to fix it in the general case. We can hack something around for ide, but that will just be a hack, or we can backport marcelo change and port it as a proper subsection (that is my plan). I expect to have time at the end of next time to work on this. BTW, the IDE subsystem is yet lacking a proper vmstate section split-up along qdev boundaries (ie. vmstate_ide_pci should not contain drive structures). Do you plan to address this as well? Jan signature.asc Description: OpenPGP digital signature
Re: Qemu-KVM Livate Migration 0.12.2 - 0.12.3/4 broken?
Avi Kivity a...@redhat.com wrote: On 05/16/2010 05:42 PM, Juan Quintela wrote: Any idea why it fails? And how to fix it? Lack of proper subsections. IDE is something like: const VMStateDescription vmstate_ide_drive = { .version_id = 4, } static const VMStateDescription vmstate_bmdma = { .name = ide bmdma, .version_id = 4, ... } const VMStateDescription vmstate_ide_pci = { .name = ide, .version_id = 4, VMSTATE_STRUCT_ARRAY(bmdma, PCIIDEState, 2, 0, vmstate_bmdma, BMDMAState), VMSTATE_IDE_DRIVES(bus[0].ifs, PCIIDEState), VMSTATE_IDE_DRIVES(bus[1].ifs, PCIIDEState), } Notice that everything is at version 4. It used to be everything at version 3. Now the problem is that when migrating from v3 - v4. We put in one place v3, But we only have a version number at the toplevel, rest of subsections don't sent a version number. There is no way to fix it in the general case. We can hack something around for ide, but that will just be a hack, or we can backport marcelo change and port it as a proper subsection (that is my plan). I expect to have time at the end of next time to work on this. end of next week? Humm, for Spaniards weeks start on Monday :) (Monday is holiday here). I mean here Friday 21. So, to make the story short: I know what is happening, and I know how to fix it, just that fix is not trivial. I just need time. Meanwhile, we have a broken 0.12.4. Is there a quick'n'dirty workaround that will be forward compatible with the real fix that we can push out? revert the patch. It almost never happen (being in the middle of one IO) while migrating. We've regressed from failing some migrations to failing all migrations. Humm, 0.12.4 - 0.12.4 should work. My advise is just revert the patch and live with it for another week, what do you think? Later, Juan. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Qemu-KVM Livate Migration 0.12.2 - 0.12.3/4 broken?
Jan Kiszka jan.kis...@web.de wrote: Juan Quintela wrote: Lack of proper subsections. IDE is something like: const VMStateDescription vmstate_ide_drive = { .version_id = 4, } static const VMStateDescription vmstate_bmdma = { .name = ide bmdma, .version_id = 4, ... } const VMStateDescription vmstate_ide_pci = { .name = ide, .version_id = 4, VMSTATE_STRUCT_ARRAY(bmdma, PCIIDEState, 2, 0, vmstate_bmdma, BMDMAState), VMSTATE_IDE_DRIVES(bus[0].ifs, PCIIDEState), VMSTATE_IDE_DRIVES(bus[1].ifs, PCIIDEState), } Notice that everything is at version 4. It used to be everything at version 3. Now the problem is that when migrating from v3 - v4. We put in one place v3, But we only have a version number at the toplevel, rest of subsections don't sent a version number. There is no way to fix it in the general case. We can hack something around for ide, but that will just be a hack, or we can backport marcelo change and port it as a proper subsection (that is my plan). I expect to have time at the end of next time to work on this. BTW, the IDE subsystem is yet lacking a proper vmstate section split-up along qdev boundaries (ie. vmstate_ide_pci should not contain drive structures). Do you plan to address this as well? Not for Friday, and not for 0.12. That is 0.13 material, and have to get one agreement on how to go. We can go for: - good structure - backward compatibility I can't see any good way to get both at this stage :( But I am open to sugestions. Later, Juan. PD. BTW, very good work with printing the vmstate, that was one of the goals when we added it, that was the next step after porting everything to vmstate :) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] [block]: Fix scsi-generic breakage in find_image_format()
On Sun, 2010-05-16 at 15:29 +0200, Christoph Hellwig wrote: On Sat, May 15, 2010 at 06:30:52AM -0700, Nicholas A. Bellinger wrote: From: Nicholas Bellinger n...@linux-iscsi.org This patch adds a special BlockDriverState-sg check in block.c:find_image_format() after bdrv_file_open() - block/raw-posix.c:hdev_open() has been called to determine if we are dealing with a Linux host scsi-generic device or not. The patch then returns the BlockDriver * from find_protocol(), skipping the subsequent bdrv_read() and rest of find_image_format(). That's not quite correct as we don't want to expose formats directly. Returning bdrv_find_format(raw); should fix it for now, Ahh, that makes more sense.. Attached is updated patch #1: [PATCH 1/2] [block]: Make find_image_format() return 'raw' BlockDriver for SG_IO devices although I really hate having these special cases in block.c. Indeed having BlockDriverState-sg kinda feels kinda like a hack to begin with, but it appears this is currently QEMU's way of telling block code to disable internal SCSI CDB emulation for devices. Best, --nab 0001--block-Make-find_image_format-return-raw-Block.patch Description: application/mbox
Re: [PATCH 2/2] [block]: Skip refresh_total_sectors() for scsi-generic devices
On Sun, 2010-05-16 at 15:30 +0200, Christoph Hellwig wrote: On Sat, May 15, 2010 at 06:30:59AM -0700, Nicholas A. Bellinger wrote: From: Nicholas Bellinger n...@linux-iscsi.org This patch adds a BlockDriverState-sg check in block.c:bdrv_common_open() to skip the new refresh_total_sectors() call once we know we are working with a scsi-generic device. We go ahead and skip this call for scsi-generic devices because block/raw-posix.c:raw_getlength() - lseek() will return -ESPIPE. How about moving that check into refresh_total_sectors? Sounds good, attached is updated patch #2: [PATCH 2/2] [block]: Add SG_IO device check in refresh_total_sectors() Also, I am using the options in .vimrc to follow QEMU's indent style: set smartindent set tabstop=4 set shiftwidth=4 set expandtab I assume this is what should be used when sending QEMU patches, yes..? Best, --nab 0002--block-Add-SG_IO-device-check-in-refresh_total_sec.patch Description: application/mbox
[PATCH 0/2] Fix scsi-generic breakage in upstream qemu-kvm.git
From: Nicholas Bellinger n...@linux-iscsi.org Greetings, Attached are the updated patches following hch's comments to fix scsi-generic device breakage with find_image_format() and refresh_total_sectors(). These are being resent as the last attachments where in MBOX format from git-format-patch. Signed-off-by: Nicholas A. Bellinger n...@linux-iscsi.org Nicholas Bellinger (2): [block]: Make find_image_format() return 'raw' BlockDriver for SG_IO devices [block]: Add SG_IO device check in refresh_total_sectors() block.c |9 + 1 files changed, 9 insertions(+), 0 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] [block]: Add SG_IO device check in refresh_total_sectors()
From: Nicholas Bellinger n...@linux-iscsi.org This patch adds a special case check for scsi-generic devices in refresh_total_sectors() to skip the subsequent BlockDriver-bdrv_getlength() that will be returning -ESPIPE from block/raw-posic.c:raw_getlength() for BlockDriverState-sg=1 devices. Signed-off-by: Nicholas A. Bellinger n...@linux-iscsi.org --- block.c |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/block.c b/block.c index f419ee6..7917712 100644 --- a/block.c +++ b/block.c @@ -364,6 +364,10 @@ static int refresh_total_sectors(BlockDriverState *bs, int64_t hint) { BlockDriver *drv = bs-drv; +/* Do not attempt drv-bdrv_getlength() on scsi-generic devices */ +if (bs-sg) +return 0; + /* query actual device if possible, otherwise just trust the hint */ if (drv-bdrv_getlength) { int64_t length = drv-bdrv_getlength(bs); -- 1.5.6.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] [block]: Make find_image_format() return 'raw' BlockDriver for SG_IO devices
From: Nicholas Bellinger n...@linux-iscsi.org This patch adds a special BlockDriverState-sg check in block.c:find_image_format() after bdrv_file_open() - block/raw-posix.c:hdev_open() has been called to determine if we are dealing with a Linux host scsi-generic device. The patch then returns the BlockDriver * from bdrv_find_format(raw), skipping the subsequent bdrv_read() and rest of find_image_format(). Signed-off-by: Nicholas A. Bellinger n...@linux-iscsi.org --- block.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/block.c b/block.c index 48305b7..f419ee6 100644 --- a/block.c +++ b/block.c @@ -332,6 +332,11 @@ static BlockDriver *find_image_format(const char *filename) ret = bdrv_file_open(bs, filename, 0); if (ret 0) return NULL; + +/* Return the raw BlockDriver * to scsi-generic devices */ +if (bs-sg) +return bdrv_find_format(raw); + ret = bdrv_pread(bs, 0, buf, sizeof(buf)); bdrv_delete(bs); if (ret 0) { -- 1.5.6.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] Consolidate vcpu ioctl locking
On 05/16/2010 04:00 AM, Alexander Graf wrote: On 15.05.2010, at 19:30, Avi Kivity wrote: On 05/15/2010 11:26 AM, Alexander Graf wrote: That means you never inject an interrupt from the iothread (or from a different vcpu thread)? If that's the case we might make it part of the API and require the ioctl to be issued from the vcpu thread. We'd still be left with the s390 exception. Well I'm not sure that's guaranteed for MOL or Dolphin, but I guess the user base is small enough to ignore them. Either way, I'm actually rather unhappy with the way interrupts work right now. We're only injecting interrupts when in the main loop, which is rare if we did our homework right. So what I'd really like to see is that the MPIC on ppc directly invokes KVM_INTERRUPT to pull (and losen) the interrupt line. That way we can't just accidently miss interrupts. on x86 we signal the vcpu thread to pull it out of the main loop and poll the apic. Hrm, makes sense. Though it's additional overhead of a task switch. Why take the burden if you don't have to? That's what the world looked like in 2006. We could change it, but there's not much point, since having the local apic in the kernel is pretty much a requirement for reasonable performance. That's exactly why x86 has run-request_interrupt_window, run-ready_for_interrupt_injection, and run-if_flag. So how does x86 userspace get notified when it has an interrupt pending but couldn't inject it? Without a notification, we delay interrupts by quite some time. run-ready_for_interrupt_injection and run-request_irq_window. - we need to exit to userspace to realize that the interrupt is still active This is fundamentally broken. What I'd like to see is: * device in userspace wants to trigger an interrupt * mpic pulls external interrupt line, automatically issues ioctl(KVM_INTERRUPT) * we enter the guest * we inject the external interrupt * guest gets the interrupt, sees it's lazily disabled, sets msr.ee=0 * guest moves on, sets msr.ee=1 again later * we inject the external interrupt again, since it's still active * guest acknowledges the interrupt with the mpic, issues ioctl(KVM_INTERRUPT, disable) - all is great This is similar to KVM_IRQ_LINE. Well, KVM_IRQ_LINE would be the in-kernel pic, no? That'd mean I'd have to reimplement the mpic in kernel space - which might eventually be a good idea when going for SMP, but I'd first like to see if I can keep the current interrupt injection path efficient. Sure. For that to work we need to enable triggering KVM_INTERRUPT from a non-vcpu thread. KVM_IRQ_LINE is a vm ioctl instead of a vcpu ioctl. The motivation for the strict 'issue vcpu ioctls from vcpu thread' rule is to prepare the way for a syscall (instead of ioctl) API. Then we lost the fd argument, and choosing the vcpu is done by associating it with the current task. That allows us to get rid of vcpu-mutex and fget_light() (as well as the ioctl dispatch). If we define the API to only work on the current vcpu with a few excetions where we're safe (KVM_INTERRUPT), that'd get rid of the mutex too, no? Yes. Need to document it though. What does fget_light do? Make sure that the vcpu fd doesn't disappear. And does the ioctl dispatch cost that much? I like the flexibility of it to be honest. Not much, which is why there's no movement in that direction. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] Consolidate vcpu ioctl locking
On 16.05.2010, at 10:23, Avi Kivity wrote: On 05/16/2010 04:00 AM, Alexander Graf wrote: On 15.05.2010, at 19:30, Avi Kivity wrote: On 05/15/2010 11:26 AM, Alexander Graf wrote: That means you never inject an interrupt from the iothread (or from a different vcpu thread)? If that's the case we might make it part of the API and require the ioctl to be issued from the vcpu thread. We'd still be left with the s390 exception. Well I'm not sure that's guaranteed for MOL or Dolphin, but I guess the user base is small enough to ignore them. Either way, I'm actually rather unhappy with the way interrupts work right now. We're only injecting interrupts when in the main loop, which is rare if we did our homework right. So what I'd really like to see is that the MPIC on ppc directly invokes KVM_INTERRUPT to pull (and losen) the interrupt line. That way we can't just accidently miss interrupts. on x86 we signal the vcpu thread to pull it out of the main loop and poll the apic. Hrm, makes sense. Though it's additional overhead of a task switch. Why take the burden if you don't have to? That's what the world looked like in 2006. We could change it, but there's not much point, since having the local apic in the kernel is pretty much a requirement for reasonable performance. Well, I'm not convinced yet that's the case for PPC as well. The timer is in-cpu anyways and I don't see why IPIs should be slow with a userspace pic - if we keep the overhead low. So let me think this through. With remote interrupt injection we have. * thread 1 does vcpu_run * thread 2 triggers KVM_INTERRUPT on fd * thread 2 signals thread 1 so we're sure the interrupt gets injected * thread 1 exits into qemu * thread 1 goes back into the vcpu, triggering an interrupt Without we have: * thread 1 does vcpu_run * thread 2 wants to trigger an an interrupt, sets the qemu internal bit * thread 2 signals thread 1 so we're sure the interrupt gets processed * thread 1 exits into qemu * thread 1 triggers KVM_INTERRUPT on fd * thread 1 goes into the vcpu So we don't really buy anything from doing the remote injection. Hrm. What's somewhat striking me here though is - why do we need KVM_INTERRUPT when there's all those kvm_run fields? Can't we just do interrupt injection by setting run-trigger_interrupt? There's only a single interrupt line on the CPU anyways. That way we'd save the ioctl and get rid of the locking problem altogether. Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] Consolidate vcpu ioctl locking
On 16.05.2010, at 11:09, Avi Kivity wrote: On 05/16/2010 12:01 PM, Alexander Graf wrote: That's what the world looked like in 2006. We could change it, but there's not much point, since having the local apic in the kernel is pretty much a requirement for reasonable performance. Well, I'm not convinced yet that's the case for PPC as well. The timer is in-cpu anyways and I don't see why IPIs should be slow with a userspace pic - if we keep the overhead low. If it's at all possible keep the mpic out. I am _not_ advocating pushing ppc's mpic into the kernel. So let me think this through. With remote interrupt injection we have. * thread 1 does vcpu_run * thread 2 triggers KVM_INTERRUPT on fd * thread 2 signals thread 1 so we're sure the interrupt gets injected * thread 1 exits into qemu This doesn't seem necessary. The kernel can own the interrupt line, so it remembers it from the last KVM_INTERRUPT. It's not? On signals we always exit to userspace, no? * thread 1 goes back into the vcpu, triggering an interrupt Without we have: * thread 1 does vcpu_run * thread 2 wants to trigger an an interrupt, sets the qemu internal bit * thread 2 signals thread 1 so we're sure the interrupt gets processed * thread 1 exits into qemu * thread 1 triggers KVM_INTERRUPT on fd * thread 1 goes into the vcpu So we don't really buy anything from doing the remote injection. Hrm. Not if you make interrupt injection a lightweight exit. Please elaborate. What's somewhat striking me here though is - why do we need KVM_INTERRUPT when there's all those kvm_run fields? Can't we just do interrupt injection by setting run-trigger_interrupt? There's only a single interrupt line on the CPU anyways. That way we'd save the ioctl and get rid of the locking problem altogether. That's what x86 does. However, it's synchronous. For everyone except for the vcpu thread executing the interrupt, it's asynchronous, right? The same applies to an in-kernel pic. Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] Consolidate vcpu ioctl locking
On 05/16/2010 12:35 PM, Alexander Graf wrote: So let me think this through. With remote interrupt injection we have. * thread 1 does vcpu_run * thread 2 triggers KVM_INTERRUPT on fd * thread 2 signals thread 1 so we're sure the interrupt gets injected * thread 1 exits into qemu This doesn't seem necessary. The kernel can own the interrupt line, so it remembers it from the last KVM_INTERRUPT. It's not? With s/signals/IPIs/. On signals we always exit to userspace, no? Yes (if the signal isn't blocked). * thread 1 goes back into the vcpu, triggering an interrupt Without we have: * thread 1 does vcpu_run * thread 2 wants to trigger an an interrupt, sets the qemu internal bit * thread 2 signals thread 1 so we're sure the interrupt gets processed * thread 1 exits into qemu * thread 1 triggers KVM_INTERRUPT on fd * thread 1 goes into the vcpu So we don't really buy anything from doing the remote injection. Hrm. Not if you make interrupt injection a lightweight exit. Please elaborate. 1: vcpu_run 2: KVM_INTERRUPT 2k: sets flag, if msr.ee IPIs 1 or wakes up 1 if halted 1k: notices flag, if msr.ee injects interrupt ... 1g: acks 1k: forwards ack to userspace 1: completes interrupt What's somewhat striking me here though is - why do we need KVM_INTERRUPT when there's all those kvm_run fields? Can't we just do interrupt injection by setting run-trigger_interrupt? There's only a single interrupt line on the CPU anyways. That way we'd save the ioctl and get rid of the locking problem altogether. That's what x86 does. However, it's synchronous. For everyone except for the vcpu thread executing the interrupt, it's asynchronous, right? For everyone other than the vcpu thread, it's off limits. kvm_run is only read on KVM_RUN entries and written on KVM_RUN exits. The same applies to an in-kernel pic. The in-kernel pic doesn't use kvm_run. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] Consolidate vcpu ioctl locking
On 16.05.2010, at 11:47, Avi Kivity wrote: 1: vcpu_run 2: KVM_INTERRUPT 2k: sets flag, if msr.ee IPIs 1 or wakes up 1 if halted Doesn't that break when we have a while(1) loop in the guest with msr.ee=0 while no timer is scheduled on the host? But then again with msr.ee=0 we don't get interrupts in the guest and to set msr.ee=1 we trap. Yeah, that would work. 1k: notices flag, if msr.ee injects interrupt ... 1g: acks The ack is done in userspace by the mpic, so we can just complete the interrupt there. 1k: forwards ack to userspace 1: completes interrupt So if I just have a field kvm_run-external_active I could set that to =1 on KVM_INTERRUPT including the above logic. To acknowledge it userspace would then do something like this in kvm_arch_pre_run: if (kvm_run-external_active !((env-interrupt_request CPU_INTERRUPT_HARD) (env-irq_input_state (1PPC_INPUT_INT { kvm_run-external_active = 0; } The big question is how to make such a change backwards compatible. But I guess I could just reuse the feature enabling framework. Well, sounds like we're getting closer. Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html