RE: [PATCH v7 3/3] x86, apicv: add virtual x2apic support
Gleb Natapov wrote on 2012-12-25: On Tue, Dec 25, 2012 at 07:46:53AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2012-12-25: On Tue, Dec 25, 2012 at 07:25:15AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2012-12-25: On Tue, Dec 25, 2012 at 06:42:59AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2012-12-25: On Mon, Dec 24, 2012 at 11:53:37PM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2012-12-24: On Mon, Dec 24, 2012 at 02:35:35AM +, Zhang, Yang Z wrote: Zhang, Yang Z wrote on 2012-12-24: Gleb Natapov wrote on 2012-12-20: On Mon, Dec 17, 2012 at 01:30:50PM +0800, Yang Zhang wrote:

Basically, to benefit from apicv we need to clear the MSR bitmap for the corresponding x2apic MSRs: 0x800 - 0x8ff: no read intercept for APIC register virtualization; TPR, EOI, SELF-IPI: no write intercept for virtual interrupt delivery.

We do not set the "virtualize x2APIC mode" bit in the secondary execution controls. If I read the spec correctly, without that those MSR reads/writes will go straight to the physical local APIC.

Right. Now it gets no benefit, but we may enable it in the future and then we can benefit from it.

Without enabling it you cannot disable MSR interception for the x2apic MSRs.

How about adding the following check: if (apicv_enabled && virtual_x2apic_enabled) clear_msr();

I do not understand what you mean here. In this patch, the MSR bitmap (0x800 - 0x8ff) is cleared when apicv is enabled. As you said, since kvm doesn't set virtualize x2apic mode, APIC register virtualization never takes effect. So we need to clear the MSR bitmap only when apicv is enabled and virtualize x2apic mode is set. But currently it is never set.

So you think the third patch is not necessary currently? Unless we enable virtualize x2apic mode.

Without the third patch vid will not work properly if a guest is in x2apic mode. Actually, the second and third patches need to be reordered so there is no window where x2apic is broken. The problem is that this patch itself is buggy, since it does not set the virtualize x2apic mode flag. It should set the flag if vid is enabled, and if the flag cannot be set, vid should be forced off.

Under what conditions can this flag not be set? I think the only case is that KVM doesn't expose the x2apic capability to the guest; if that is true, the guest will never use x2apic and we can still use vid.

We can indeed set virtualize x2apic mode unconditionally, since it does not take any effect if the x2apic MSRs are intercepted.

No. "Virtualize APIC accesses" must be cleared if virtualize x2apic mode is set, and if the guest still uses xAPIC there will be lots of EPT violations for APIC access emulation. This will hurt performance.

Stupid HW, why this pointless limitation? Can you point me to where the SDM says that?

Vol 3, 26.2.1.1.

We should only set virtualize x2apic mode when the guest really uses x2apic (guest sets bit 11 of APIC_BASE_MSR). Looks like the SDM forces us to.

-- Gleb.

Best regards,
Yang

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] trace-cmd: fix kvm_mmu_prepare_zap_page event name and kvm_mmu_get_page event output in kvm plugin
The kvm_mmu_zap_page event was renamed to kvm_mmu_prepare_zap_page. Print out the created field for the kvm_mmu_get_page event.

Signed-off-by: Gleb Natapov g...@redhat.com

diff --git a/plugin_kvm.c b/plugin_kvm.c
index 55812ef..adc5694 100644
--- a/plugin_kvm.c
+++ b/plugin_kvm.c
@@ -382,7 +382,7 @@ static int kvm_mmu_print_role(struct trace_seq *s, struct pevent_record *record,
 	} else
 		trace_seq_printf(s, "WORD: %08x", role.word);
 
-	pevent_print_num_field(s, "root %u", event,
+	pevent_print_num_field(s, "root %u ", event,
			       "root_count", record, 1);
 
 	if (pevent_get_field_val(s, event, "unsync", record, &val, 1) < 0)
@@ -397,6 +397,11 @@ static int kvm_mmu_get_page_handler(struct trace_seq *s, struct pevent_record *r
 {
 	unsigned long long val;
 
+	if (pevent_get_field_val(s, event, "created", record, &val, 1) < 0)
+		return -1;
+
+	trace_seq_printf(s, "%s ", val ? "new" : "existing");
+
 	if (pevent_get_field_val(s, event, "gfn", record, &val, 1) < 0)
 		return -1;
 
@@ -430,7 +435,7 @@ int PEVENT_PLUGIN_LOADER(struct pevent *pevent)
 	pevent_register_event_handler(pevent, -1, "kvmmmu", "kvm_mmu_unsync_page",
				      kvm_mmu_print_role, NULL);
 
-	pevent_register_event_handler(pevent, -1, "kvmmmu", "kvm_mmu_zap_page",
+	pevent_register_event_handler(pevent, -1, "kvmmmu", "kvm_mmu_prepare_zap_page",
				      kvm_mmu_print_role, NULL);
 
 	return 0;

-- Gleb.
Re: [PATCH v7 3/3] x86, apicv: add virtual x2apic support
On Tue, Dec 25, 2012 at 08:24:43AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2012-12-25: On Tue, Dec 25, 2012 at 07:46:53AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2012-12-25: On Tue, Dec 25, 2012 at 07:25:15AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2012-12-25: On Tue, Dec 25, 2012 at 06:42:59AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2012-12-25: On Mon, Dec 24, 2012 at 11:53:37PM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2012-12-24: On Mon, Dec 24, 2012 at 02:35:35AM +, Zhang, Yang Z wrote: Zhang, Yang Z wrote on 2012-12-24: Gleb Natapov wrote on 2012-12-20: On Mon, Dec 17, 2012 at 01:30:50PM +0800, Yang Zhang wrote:

Basically, to benefit from apicv we need to clear the MSR bitmap for the corresponding x2apic MSRs: 0x800 - 0x8ff: no read intercept for APIC register virtualization; TPR, EOI, SELF-IPI: no write intercept for virtual interrupt delivery.

We do not set the "virtualize x2APIC mode" bit in the secondary execution controls. If I read the spec correctly, without that those MSR reads/writes will go straight to the physical local APIC.

Right. Now it gets no benefit, but we may enable it in the future and then we can benefit from it.

Without enabling it you cannot disable MSR interception for the x2apic MSRs.

How about adding the following check: if (apicv_enabled && virtual_x2apic_enabled) clear_msr();

I do not understand what you mean here. In this patch, the MSR bitmap (0x800 - 0x8ff) is cleared when apicv is enabled. As you said, since kvm doesn't set virtualize x2apic mode, APIC register virtualization never takes effect. So we need to clear the MSR bitmap only when apicv is enabled and virtualize x2apic mode is set. But currently it is never set.

So you think the third patch is not necessary currently? Unless we enable virtualize x2apic mode.

Without the third patch vid will not work properly if a guest is in x2apic mode. Actually, the second and third patches need to be reordered so there is no window where x2apic is broken. The problem is that this patch itself is buggy, since it does not set the virtualize x2apic mode flag. It should set the flag if vid is enabled, and if the flag cannot be set, vid should be forced off.

Under what conditions can this flag not be set? I think the only case is that KVM doesn't expose the x2apic capability to the guest; if that is true, the guest will never use x2apic and we can still use vid.

We can indeed set virtualize x2apic mode unconditionally, since it does not take any effect if the x2apic MSRs are intercepted.

No. "Virtualize APIC accesses" must be cleared if virtualize x2apic mode is set, and if the guest still uses xAPIC there will be lots of EPT violations for APIC access emulation. This will hurt performance.

Stupid HW, why this pointless limitation? Can you point me to where the SDM says that?

Vol 3, 26.2.1.1.

Thanks.

We should only set virtualize x2apic mode when the guest really uses x2apic (guest sets bit 11 of APIC_BASE_MSR). Looks like the SDM forces us to. And we can disable x2apic MSR interception only after virtualize x2apic mode is set, i.e. when the guest sets bit 11 of APIC_BASE_MSR.

-- Gleb.
[PATCH] KVM: mmu: remove unused trace event
trace_kvm_mmu_delay_free_pages() is no longer used.

Signed-off-by: Gleb Natapov g...@redhat.com

diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index cd6e983..b8f6172 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -195,12 +195,6 @@ DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_prepare_zap_page,
 	TP_ARGS(sp)
 );
 
-DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_delay_free_pages,
-	TP_PROTO(struct kvm_mmu_page *sp),
-
-	TP_ARGS(sp)
-);
-
 TRACE_EVENT(
 	mark_mmio_spte,
 	TP_PROTO(u64 *sptep, gfn_t gfn, unsigned access),

-- Gleb.
Re: [PATCH v3] qemu-kvm/pci-assign: 64 bits bar emulation
On Thu, Dec 20, 2012 at 11:07:23AM +0800, Xudong Hao wrote:

Enable 64-bit BAR emulation.

v3 changes from v2:
- Leave original error string and drop the leading 016.

v2 changes from v1:
- Change 0lx% to 0x%016 when printing a 64-bit variable.

Tests pass with the current seabios, which already supports 64-bit pci bars.

Signed-off-by: Xudong Hao xudong@intel.com

Thanks, applied to uq/master.

---
 hw/kvm/pci-assign.c | 14 ++++++++++----
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c
index 7a0998c..2271a2e 100644
--- a/hw/kvm/pci-assign.c
+++ b/hw/kvm/pci-assign.c
@@ -46,6 +46,7 @@
 #define IORESOURCE_IRQ      0x0400
 #define IORESOURCE_DMA      0x0800
 #define IORESOURCE_PREFETCH 0x2000  /* No side effects */
+#define IORESOURCE_MEM_64   0x00100000
 
 //#define DEVICE_ASSIGNMENT_DEBUG
 
@@ -442,9 +443,13 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
 
         /* handle memory io regions */
         if (cur_region->type & IORESOURCE_MEM) {
-            int t = cur_region->type & IORESOURCE_PREFETCH
-                ? PCI_BASE_ADDRESS_MEM_PREFETCH
-                : PCI_BASE_ADDRESS_SPACE_MEMORY;
+            int t = PCI_BASE_ADDRESS_SPACE_MEMORY;
+            if (cur_region->type & IORESOURCE_PREFETCH) {
+                t |= PCI_BASE_ADDRESS_MEM_PREFETCH;
+            }
+            if (cur_region->type & IORESOURCE_MEM_64) {
+                t |= PCI_BASE_ADDRESS_MEM_TYPE_64;
+            }
 
             /* map physical memory */
             pci_dev->v_addrs[i].u.r_virtbase = mmap(NULL, cur_region->size,
@@ -632,7 +637,8 @@ again:
         rp->valid = 0;
         rp->resource_fd = -1;
         size = end - start + 1;
-        flags &= IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH;
+        flags &= IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH
+                 | IORESOURCE_MEM_64;
         if (size == 0 || (flags & ~IORESOURCE_PREFETCH) == 0) {
             continue;
         }

--
1.5.5

-- Gleb.
Re: [PATCH v2 5/5] virtio-scsi: introduce multiqueue support
On 12/19/2012 12:02 AM, Michael S. Tsirkin wrote: On Tue, Dec 18, 2012 at 04:51:28PM +0100, Paolo Bonzini wrote: On 18/12/2012 16:03, Michael S. Tsirkin wrote: On Tue, Dec 18, 2012 at 03:08:08PM +0100, Paolo Bonzini wrote: On 18/12/2012 14:57, Michael S. Tsirkin wrote:

-static int virtscsi_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
+static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
+				 struct virtio_scsi_target_state *tgt,
+				 struct scsi_cmnd *sc)
 {
-	struct virtio_scsi *vscsi = shost_priv(sh);
-	struct virtio_scsi_target_state *tgt = &vscsi->tgt[sc->device->id];
 	struct virtio_scsi_cmd *cmd;
+	struct virtio_scsi_vq *req_vq;
 	int ret;
 	struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
@@ -461,7 +533,8 @@ static int virtscsi_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
 	BUG_ON(sc->cmd_len > VIRTIO_SCSI_CDB_SIZE);
 	memcpy(cmd->req.cmd.cdb, sc->cmnd, sc->cmd_len);
 
-	if (virtscsi_kick_cmd(tgt, &vscsi->req_vq, cmd,
+	req_vq = ACCESS_ONCE(tgt->req_vq);

This ACCESS_ONCE without a barrier looks strange to me. Can req_vq change? Needs a comment.

Barriers are needed to order two things. Here I don't have the second thing to order against, hence no barrier. Accessing req_vq locklessly is safe, and there's a comment about it, but you still want ACCESS_ONCE to ensure the compiler doesn't play tricks.

That's just it. Why don't you want the compiler to play tricks?

Because I want the lockless access to occur exactly when I write it.

It doesn't occur when you write it. The CPU can still move accesses around. That's why you either need both ACCESS_ONCE and a barrier, or neither.

Otherwise I have one more thing to think about, i.e. what a crazy compiler writer could do with my code. And having been on the other side of the trench, compiler writers can have *really* crazy ideas. Anyhow, I'll reorganize the code to move the ACCESS_ONCE closer to the write and make it clearer.

+	if (virtscsi_kick_cmd(tgt, req_vq, cmd,
			      sizeof cmd->req.cmd, sizeof cmd->resp.cmd,
			      GFP_ATOMIC) == 0)
 		ret = 0;
@@ -472,6 +545,48 @@ out:
 	return ret;
 }
 
+static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
+					struct scsi_cmnd *sc)
+{
+	struct virtio_scsi *vscsi = shost_priv(sh);
+	struct virtio_scsi_target_state *tgt = &vscsi->tgt[sc->device->id];
+
+	atomic_inc(&tgt->reqs);

And here we don't have a barrier after the atomic? Why? Needs a comment.

Because we don't write req_vq, so there are no two writes to order. Barrier against what?

Between the atomic update and the command. Once you queue a command it can complete and decrement reqs; if this happens before the increment, reqs can even become negative.

This is not a problem. Please read Documentation/memory-barriers.txt: "The following also do _not_ imply memory barriers, and so may require explicit memory barriers under some circumstances (smp_mb__before_atomic_dec() for instance): atomic_add(); atomic_sub(); atomic_inc(); atomic_dec(); If they're used for statistics generation, then they probably don't need memory barriers, unless there's a coupling between statistical data." This is the single-queue case, so it falls under this case.

Aha, I missed that it's the single-queue case. Correct, but please add a comment.

 	/* Discover virtqueues and write information to configuration. */
-	err = vdev->config->find_vqs(vdev, 3, vqs, callbacks, names);
+	err = vdev->config->find_vqs(vdev, num_vqs, vqs, callbacks, names);
 	if (err)
 		return err;
 
-	virtscsi_init_vq(&vscsi->ctrl_vq, vqs[0]);
-	virtscsi_init_vq(&vscsi->event_vq, vqs[1]);
-	virtscsi_init_vq(&vscsi->req_vq, vqs[2]);
+	virtscsi_init_vq(&vscsi->ctrl_vq, vqs[0], false);
+	virtscsi_init_vq(&vscsi->event_vq, vqs[1], false);
+	for (i = VIRTIO_SCSI_VQ_BASE; i < num_vqs; i++)
+		virtscsi_init_vq(&vscsi->req_vqs[i - VIRTIO_SCSI_VQ_BASE],
+				 vqs[i], vscsi->num_queues > 1);

So affinity is true if > 1 vq? I am guessing this is not going to do the right thing unless you have at least as many vqs as CPUs.

Yes, and then you're not setting up the thing correctly. Why not just check instead of doing the wrong thing?

The right thing could be to set the affinity with a stride, e.g. CPUs 0-4 for virtqueue 0 and so on until CPUs 3-7 for virtqueue 3.

Paolo

I think a simple #vqs == #cpus check would be kind of OK for starters; otherwise let userspace set the affinity. Again, need to think about what happens with CPU hotplug. How about dynamically setting affinity this way?
Re: KVM: VMX: fix incorrect cached cpl value with real/v8086 modes
On Sat, Dec 22, 2012 at 02:31:10PM +0200, Avi Kivity wrote: On Wed, Dec 19, 2012 at 3:29 PM, Marcelo Tosatti mtosa...@redhat.com wrote:

CPL is always 0 when in real mode, and always 3 when in virtual 8086 mode. Using values other than those can cause failures on operations that check CPL.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a4ecf7c..3abe433 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3215,13 +3215,6 @@ static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
 
 static int __vmx_get_cpl(struct kvm_vcpu *vcpu)
 {
-	if (!is_protmode(vcpu))
-		return 0;
-
-	if (!is_long_mode(vcpu)
-	    && (kvm_get_rflags(vcpu) & X86_EFLAGS_VM)) /* if virtual 8086 */
-		return 3;
-
 	return vmx_read_guest_seg_selector(to_vmx(vcpu), VCPU_SREG_CS) & 3;
 }
 
@@ -3229,6 +3222,13 @@ static int vmx_get_cpl(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+	if (!is_protmode(vcpu))
+		return 0;
+
+	if (!is_long_mode(vcpu)
+	    && (kvm_get_rflags(vcpu) & X86_EFLAGS_VM)) /* if virtual 8086 */
+		return 3;
+
 	/*
	 * If we enter real mode with cs.sel & 3 != 0, the normal CPL calculations
	 * fail; use the cache instead.

This undoes the cache; now every vmx_get_cpl() in protected mode has to VMREAD(GUEST_RFLAGS).

True. Marcelo, what failure do you see without the patch?

-- Gleb.
Re: KVM: VMX: fix incorrect cached cpl value with real/v8086 modes
On Tue, Dec 25, 2012 at 02:48:08PM +0200, Gleb Natapov wrote: On Sat, Dec 22, 2012 at 02:31:10PM +0200, Avi Kivity wrote: On Wed, Dec 19, 2012 at 3:29 PM, Marcelo Tosatti mtosa...@redhat.com wrote:

CPL is always 0 when in real mode, and always 3 when in virtual 8086 mode. Using values other than those can cause failures on operations that check CPL.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a4ecf7c..3abe433 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3215,13 +3215,6 @@ static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
 
 static int __vmx_get_cpl(struct kvm_vcpu *vcpu)
 {
-	if (!is_protmode(vcpu))
-		return 0;
-
-	if (!is_long_mode(vcpu)
-	    && (kvm_get_rflags(vcpu) & X86_EFLAGS_VM)) /* if virtual 8086 */
-		return 3;
-
 	return vmx_read_guest_seg_selector(to_vmx(vcpu), VCPU_SREG_CS) & 3;
 }
 
@@ -3229,6 +3222,13 @@ static int vmx_get_cpl(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+	if (!is_protmode(vcpu))
+		return 0;
+
+	if (!is_long_mode(vcpu)
+	    && (kvm_get_rflags(vcpu) & X86_EFLAGS_VM)) /* if virtual 8086 */
+		return 3;
+
 	/*
	 * If we enter real mode with cs.sel & 3 != 0, the normal CPL calculations
	 * fail; use the cache instead.

This undoes the cache; now every vmx_get_cpl() in protected mode has to VMREAD(GUEST_RFLAGS).

True. Marcelo, what failure do you see without the patch? -- Gleb.

On transition _to_ real mode, linearize fails due to CPL checks (FreeBSD). I'll resend the patch with use of the cache for VMREAD(GUEST_RFLAGS), which is already implemented.
Re: [Qemu-devel] [PATCH v12 0/8] pv event to notify host when the guest is panicked
On Thu, Dec 20, 2012 at 03:53:59PM +0800, Hu Tao wrote: Hi, Any comments?

As far as I can see, items 2 and 3 of https://lkml.org/lkml/2012/11/12/588 have not been addressed. https://lkml.org/lkml/2012/11/20/653 contains discussions on those items.

2) Format of the interface for other architectures (you can choose a different KVM-supported architecture and write an example). It was your choice to choose an I/O port, which is x86 specific.

3) Clear/documented management interface for the feature. Note 3 is for management, not the guest-host interface.

On Wed, Dec 12, 2012 at 02:13:43PM +0800, Hu Tao wrote:

This series implements a new interface, kvm pv event, to notify the host when some events happen in the guest. Right now there is one supported event: guest panic.

Changes from v11:
- add a new patch 'save/load cpu runstate'
- fix a bug of null-dereference when no -machine option is supplied
- reserve RUN_STATE_GUEST_PANICKED during migration
- add doc of enable_pv_event option
- disable reboot-on-panic if pv_event is on

v11: http://lists.gnu.org/archive/html/qemu-devel/2012-10/msg04361.html

Hu Tao (7):
  save/load cpu runstate
  update kernel headers
  add a new runstate: RUN_STATE_GUEST_PANICKED
  add a new qevent: QEVENT_GUEST_PANICKED
  introduce a new qom device to deal with panicked event
  allow the user to disable pv event support
  pv event: add document to describe the usage

Wen Congyang (1):
  start vm after resetting it

 block.h                          |   2 +
 docs/pv-event.txt                |  17 ++++
 hw/kvm/Makefile.objs             |   2 +-
 hw/kvm/pv_event.c                | 197 +++++++++++++++++++++++++++
 hw/pc_piix.c                     |  11 +++
 kvm-stub.c                       |   4 +
 kvm.h                            |   2 +
 linux-headers/asm-x86/kvm_para.h |   1 +
 linux-headers/linux/kvm_para.h   |   6 ++
 migration.c                      |   7 +-
 monitor.c                        |   6 +-
 monitor.h                        |   1 +
 qapi-schema.json                 |   6 +-
 qemu-config.c                    |   4 +
 qemu-options.hx                  |   3 +-
 qmp.c                            |   5 +-
 savevm.c                         |   1 +
 sysemu.h                         |   2 +
 vl.c                             |  52 ++++++-
 19 files changed, 312 insertions(+), 17 deletions(-)
 create mode 100644 docs/pv-event.txt
 create mode 100644 hw/kvm/pv_event.c

--
1.8.0.1.240.ge8a1f5a
Re: [Qemu-devel] [PATCH v12 0/8] pv event to notify host when the guest is panicked
On Thu, Dec 20, 2012 at 03:53:59PM +0800, Hu Tao wrote: Hi, Any comments?

Did you verify the possibilities listed at https://lkml.org/lkml/2012/11/20/653 ? If so, a summary in the patchset would be helpful.

On Wed, Dec 12, 2012 at 02:13:43PM +0800, Hu Tao wrote:

This series implements a new interface, kvm pv event, to notify the host when some events happen in the guest. Right now there is one supported event: guest panic.

Changes from v11:
- add a new patch 'save/load cpu runstate'
- fix a bug of null-dereference when no -machine option is supplied
- reserve RUN_STATE_GUEST_PANICKED during migration
- add doc of enable_pv_event option
- disable reboot-on-panic if pv_event is on

v11: http://lists.gnu.org/archive/html/qemu-devel/2012-10/msg04361.html

Hu Tao (7):
  save/load cpu runstate
  update kernel headers
  add a new runstate: RUN_STATE_GUEST_PANICKED
  add a new qevent: QEVENT_GUEST_PANICKED
  introduce a new qom device to deal with panicked event
  allow the user to disable pv event support
  pv event: add document to describe the usage

Wen Congyang (1):
  start vm after resetting it

 block.h                          |   2 +
 docs/pv-event.txt                |  17 ++++
 hw/kvm/Makefile.objs             |   2 +-
 hw/kvm/pv_event.c                | 197 +++++++++++++++++++++++++++
 hw/pc_piix.c                     |  11 +++
 kvm-stub.c                       |   4 +
 kvm.h                            |   2 +
 linux-headers/asm-x86/kvm_para.h |   1 +
 linux-headers/linux/kvm_para.h   |   6 ++
 migration.c                      |   7 +-
 monitor.c                        |   6 +-
 monitor.h                        |   1 +
 qapi-schema.json                 |   6 +-
 qemu-config.c                    |   4 +
 qemu-options.hx                  |   3 +-
 qmp.c                            |   5 +-
 savevm.c                         |   1 +
 sysemu.h                         |   2 +
 vl.c                             |  52 ++++++-
 19 files changed, 312 insertions(+), 17 deletions(-)
 create mode 100644 docs/pv-event.txt
 create mode 100644 hw/kvm/pv_event.c

--
1.8.0.1.240.ge8a1f5a
kvm lockdep splat with 3.8-rc1+
Hi all, just saw this in dmesg while running -rc1 + tip/master:

[ 6983.694615] =============================================
[ 6983.694617] [ INFO: possible recursive locking detected ]
[ 6983.694620] 3.8.0-rc1+ #26 Not tainted
[ 6983.694621] ---------------------------------------------
[ 6983.694623] kvm/20461 is trying to acquire lock:
[ 6983.694625]  (&anon_vma->rwsem){..}, at: [8111d2c8] mm_take_all_locks+0x148/0x1a0
[ 6983.694636]
[ 6983.694636] but task is already holding lock:
[ 6983.694638]  (&anon_vma->rwsem){..}, at: [8111d2c8] mm_take_all_locks+0x148/0x1a0
[ 6983.694645]
[ 6983.694645] other info that might help us debug this:
[ 6983.694647]  Possible unsafe locking scenario:
[ 6983.694647]
[ 6983.694649]        CPU0
[ 6983.694650]        ----
[ 6983.694651]   lock(&anon_vma->rwsem);
[ 6983.694654]   lock(&anon_vma->rwsem);
[ 6983.694657]
[ 6983.694657]  *** DEADLOCK ***
[ 6983.694657]
[ 6983.694659]  May be due to missing lock nesting notation
[ 6983.694659]
[ 6983.694661] 4 locks held by kvm/20461:
[ 6983.694663]  #0:  (&mm->mmap_sem){++}, at: [8112afb3] do_mmu_notifier_register+0x153/0x180
[ 6983.694670]  #1:  (mm_all_locks_mutex){+.+...}, at: [8111d1bc] mm_take_all_locks+0x3c/0x1a0
[ 6983.694678]  #2:  (&mapping->i_mmap_mutex){+.+...}, at: [8111d24d] mm_take_all_locks+0xcd/0x1a0
[ 6983.694686]  #3:  (&anon_vma->rwsem){..}, at: [8111d2c8] mm_take_all_locks+0x148/0x1a0
[ 6983.694694]
[ 6983.694694] stack backtrace:
[ 6983.694696] Pid: 20461, comm: kvm Not tainted 3.8.0-rc1+ #26
[ 6983.694698] Call Trace:
[ 6983.694704]  [8109c2fa] __lock_acquire+0x89a/0x1f30
[ 6983.694708]  [810978ed] ? trace_hardirqs_off+0xd/0x10
[ 6983.694711]  [81099b8d] ? mark_held_locks+0x8d/0x110
[ 6983.694714]  [8111d24d] ? mm_take_all_locks+0xcd/0x1a0
[ 6983.694718]  [8109e05e] lock_acquire+0x9e/0x1f0
[ 6983.694720]  [8111d2c8] ? mm_take_all_locks+0x148/0x1a0
[ 6983.694724]  [81097ace] ? put_lock_stats.isra.17+0xe/0x40
[ 6983.694728]  [81519949] down_write+0x49/0x90
[ 6983.694731]  [8111d2c8] ? mm_take_all_locks+0x148/0x1a0
[ 6983.694734]  [8111d2c8] mm_take_all_locks+0x148/0x1a0
[ 6983.694737]  [8112afb3] ? do_mmu_notifier_register+0x153/0x180
[ 6983.694740]  [8112aedf] do_mmu_notifier_register+0x7f/0x180
[ 6983.694742]  [8112b013] mmu_notifier_register+0x13/0x20
[ 6983.694765]  [a00e665d] kvm_dev_ioctl+0x3cd/0x4f0 [kvm]
[ 6983.694768]  [8114bcb0] do_vfs_ioctl+0x90/0x570
[ 6983.694772]  [81157403] ? fget_light+0x323/0x4c0
[ 6983.694775]  [8114c1e0] sys_ioctl+0x50/0x90
[ 6983.694781]  [8123a25e] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 6983.694785]  [8151d4c2] system_call_fastpath+0x16/0x1b

--
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
Re: KVM: VMX: fix incorrect cached cpl value with real/v8086 modes
On Tue, Dec 25, 2012 at 07:37:10PM -0200, Marcelo Tosatti wrote: On Tue, Dec 25, 2012 at 02:48:08PM +0200, Gleb Natapov wrote: On Sat, Dec 22, 2012 at 02:31:10PM +0200, Avi Kivity wrote: On Wed, Dec 19, 2012 at 3:29 PM, Marcelo Tosatti mtosa...@redhat.com wrote:

CPL is always 0 when in real mode, and always 3 when in virtual 8086 mode. Using values other than those can cause failures on operations that check CPL.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a4ecf7c..3abe433 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3215,13 +3215,6 @@ static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
 
 static int __vmx_get_cpl(struct kvm_vcpu *vcpu)
 {
-	if (!is_protmode(vcpu))
-		return 0;
-
-	if (!is_long_mode(vcpu)
-	    && (kvm_get_rflags(vcpu) & X86_EFLAGS_VM)) /* if virtual 8086 */
-		return 3;
-
 	return vmx_read_guest_seg_selector(to_vmx(vcpu), VCPU_SREG_CS) & 3;
 }
 
@@ -3229,6 +3222,13 @@ static int vmx_get_cpl(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+	if (!is_protmode(vcpu))
+		return 0;
+
+	if (!is_long_mode(vcpu)
+	    && (kvm_get_rflags(vcpu) & X86_EFLAGS_VM)) /* if virtual 8086 */
+		return 3;
+
 	/*
	 * If we enter real mode with cs.sel & 3 != 0, the normal CPL calculations
	 * fail; use the cache instead.

This undoes the cache; now every vmx_get_cpl() in protected mode has to VMREAD(GUEST_RFLAGS).

True. Marcelo, what failure do you see without the patch? -- Gleb.

On transition _to_ real mode, linearize fails due to CPL checks (FreeBSD). I'll resend the patch with use of the cache for VMREAD(GUEST_RFLAGS), which is already implemented.

I am curious, does it still fail with all my vmx patches applied too? The question is how it happens that we enter real mode while the cache is set to 3. It should never be 3 during boot, since the boot process never enters userspace.

-- Gleb.