Re: KVM: VMX: fix incorrect cached cpl value with real/v8086 modes
On Tue, Jan 01, 2013 at 09:36:38PM -0200, Marcelo Tosatti wrote:
On Wed, Dec 26, 2012 at 03:33:16PM +0200, Gleb Natapov wrote:
On Wed, Dec 26, 2012 at 11:25:24AM -0200, Marcelo Tosatti wrote:
On Wed, Dec 26, 2012 at 07:25:49AM +0200, Gleb Natapov wrote:
On Tue, Dec 25, 2012 at 07:37:10PM -0200, Marcelo Tosatti wrote:
On Tue, Dec 25, 2012 at 02:48:08PM +0200, Gleb Natapov wrote:
On Sat, Dec 22, 2012 at 02:31:10PM +0200, Avi Kivity wrote:
On Wed, Dec 19, 2012 at 3:29 PM, Marcelo Tosatti mtosa...@redhat.com wrote:

CPL is always 0 when in real mode, and always 3 when in virtual-8086 mode. Using values other than those can cause failures on operations that check CPL.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a4ecf7c..3abe433 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3215,13 +3215,6 @@ static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
 
 static int __vmx_get_cpl(struct kvm_vcpu *vcpu)
 {
-	if (!is_protmode(vcpu))
-		return 0;
-
-	if (!is_long_mode(vcpu)
-	    && (kvm_get_rflags(vcpu) & X86_EFLAGS_VM)) /* if virtual 8086 */
-		return 3;
-
 	return vmx_read_guest_seg_selector(to_vmx(vcpu), VCPU_SREG_CS) & 3;
 }
 
@@ -3229,6 +3222,13 @@ static int vmx_get_cpl(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+	if (!is_protmode(vcpu))
+		return 0;
+
+	if (!is_long_mode(vcpu)
+	    && (kvm_get_rflags(vcpu) & X86_EFLAGS_VM)) /* if virtual 8086 */
+		return 3;
+
 	/*
 	 * If we enter real mode with cs.sel & 3 != 0, the normal CPL calculations
 	 * fail; use the cache instead.

This undoes the cache; now every vmx_get_cpl() in protected mode has to VMREAD(GUEST_RFLAGS).

True. Marcelo, what failure do you see without the patch?

-- Gleb.

On transition _to_ real mode, linearize fails due to CPL checks (FreeBSD). I'll resend the patch using the cache for VMREAD(GUEST_RFLAGS), which is already implemented.

I am curious, does it still fail with all my vmx patches applied too?

Yes.

Which FreeBSD exactly? Cannot reproduce with FreeBSD 9 64 bits here.

FreeBSD 9.1 with -smp 2.

I cannot reproduce. I do see a boot failure on the next branch with 9.[01] 64 bit -smp 2 here, but it is caused by the segment registers all being incorrect on a secondary vcpu, and this patch does not help. Applying http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102593 fixes it. The question is how it happens that we enter real mode while the cache is set to 3. It should never be 3 during boot, since the boot process never enters userspace.

It's a transition _to_ real mode (from protected).

But in protected mode CPL should be 0 during boot.

BTX (FreeBSD's bootloader) does: http://people.freebsd.org/~jhb/docs/loader.txt. Crazy. Regardless, I think your patch is correct, and the current code that uses the CPL cache during emulation is wrong, and it remains wrong even after your patch. What if the last time the CPL cache was updated the vcpu was running at CPL 3? The commit message says that it tries to fix the case where CS.selector & 3 != 0 during the transition to protected mode, but that can be fixed by setting the CPL cache to 0 in vmx_set_cr0() instead of clearing it.

Two things about the patch itself. Get rid of __vmx_get_cpl() and call vmx_read_guest_seg_selector() directly from vmx_get_cpl(), and drop __clear_bit(VCPU_EXREG_CPL) from vmx_set_rflags() and vmx_set_cr0(), since vmx->cpl no longer depends on rflags and cr0.

-- Gleb.
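[Editor's note: for readers following the thread, here is a minimal sketch of the alternative Gleb floats above — seeding the CPL cache with 0 on the CR0 write that enters protected mode, rather than merely invalidating it. The helper name is hypothetical, and the thread ultimately settles on a different v2 patch (below); this is illustration only.]

	/*
	 * Hypothetical helper for Gleb's suggestion: on the CR0 write that
	 * enables protected mode, mark the CPL cache valid and set it to 0,
	 * so a boot loader entering protected mode with CS.selector & 3 != 0
	 * cannot leak a bogus CPL into later permission checks.
	 */
	static void vmx_seed_cpl_cache(struct kvm_vcpu *vcpu)
	{
		__set_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
		to_vmx(vcpu)->cpl = 0;
	}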
Re: [PATCH v2 1/5] virtio: add functions for piecewise addition of buffers
On 01/02/2013 01:03 PM, Rusty Russell wrote:
Paolo Bonzini pbonz...@redhat.com writes:

The virtqueue_add_buf function has two limitations:

1) it requires the caller to provide all the buffers in a single call;

2) it does not support chained scatterlists: the buffers must be provided as an array of struct scatterlist;

Chained scatterlists are a horrible interface, but that doesn't mean we shouldn't support them if there's a need. I think I once even had a patch which passed two chained sgs, rather than a combo sg and two length numbers. It's very old, but I've pasted it below. Duplicating the implementation by having another interface is pretty nasty; I think I'd prefer the chained scatterlists, if that's optimal for you.

I rebased against virtio-next and used it in virtio-scsi, and tested it with 4 virtio-scsi target devices and the host cpu at idle=poll. I saw a small performance regression here.

General:
Run status group 0 (all jobs):
   READ: io=34675MB, aggrb=248257KB/s, minb=248257KB/s, maxb=248257KB/s, mint=143025msec, maxt=143025msec
  WRITE: io=34625MB, aggrb=247902KB/s, minb=247902KB/s, maxb=247902KB/s, mint=143025msec, maxt=143025msec

Chained:
Run status group 0 (all jobs):
   READ: io=34863MB, aggrb=242320KB/s, minb=242320KB/s, maxb=242320KB/s, mint=147325msec, maxt=147325msec
  WRITE: io=34437MB, aggrb=239357KB/s, minb=239357KB/s, maxb=239357KB/s, mint=147325msec, maxt=147325msec

Thanks,
Wanlong Gao

From d3181b3f9bbdebbd3f2928b64821b406774757f8 Mon Sep 17 00:00:00 2001
From: Rusty Russell ru...@rustcorp.com.au
Date: Wed, 2 Jan 2013 16:43:49 +0800
Subject: [PATCH] virtio: use chained scatterlists

Rather than handing a scatterlist[] and out and in numbers to virtqueue_add_buf(), hand two separate ones which can be chained.

I shall refrain from ranting about what a disgusting hack chained scatterlists are. I'll just note that this doesn't make things simpler (see diff).

The scatterlists we use can be too large for the stack, so we put them in our device struct and reuse them. But in many cases we don't want to pay the cost of sg_init_table(), as we don't know how many elements we'll have and we'd have to initialize the entire table.

This means we have two choices: carefully reset the end markers after we call virtqueue_add_buf(), which we do in virtio_net for the xmit path where it's easy and we want to be optimal; elsewhere we implement a helper to unset the end markers after we've filled the array.
Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Signed-off-by: Wanlong Gao gaowanl...@cn.fujitsu.com
---
 drivers/block/virtio_blk.c          | 57 ++---
 drivers/char/hw_random/virtio-rng.c |  2 +-
 drivers/char/virtio_console.c       |  6 ++--
 drivers/net/virtio_net.c            | 68 ++-
 drivers/scsi/virtio_scsi.c          | 18 ++
 drivers/virtio/virtio_balloon.c     |  6 ++--
 drivers/virtio/virtio_ring.c        | 71 +++--
 include/linux/virtio.h              | 14 ++--
 net/9p/trans_virtio.c               | 31 +---
 9 files changed, 172 insertions(+), 101 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 0bdde8f..17cf0b7 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -102,8 +102,8 @@ static inline struct virtblk_req *virtblk_alloc_req(struct virtio_blk *vblk,
 
 static void virtblk_add_buf_wait(struct virtio_blk *vblk,
 				 struct virtblk_req *vbr,
-				 unsigned long out,
-				 unsigned long in)
+				 struct scatterlist *out,
+				 struct scatterlist *in)
 {
 	DEFINE_WAIT(wait);
 
@@ -112,7 +112,7 @@ static void virtblk_add_buf_wait(struct virtio_blk *vblk,
 			  TASK_UNINTERRUPTIBLE);
 
 		spin_lock_irq(vblk->disk->queue->queue_lock);
-		if (virtqueue_add_buf(vblk->vq, vbr->sg, out, in, vbr,
+		if (virtqueue_add_buf(vblk->vq, out, in, vbr,
 				      GFP_ATOMIC) < 0) {
 			spin_unlock_irq(vblk->disk->queue->queue_lock);
 			io_schedule();
@@ -128,12 +128,13 @@ static void virtblk_add_buf_wait(struct virtio_blk *vblk,
 }
 
 static inline void virtblk_add_req(struct virtblk_req *vbr,
-				   unsigned int out, unsigned int in)
+				   struct scatterlist *out,
+				   struct scatterlist *in)
 {
 	struct virtio_blk *vblk = vbr->vblk;
 
 	spin_lock_irq(vblk->disk->queue->queue_lock);
-	if (unlikely(virtqueue_add_buf(vblk->vq, vbr->sg, out, in, vbr,
+	if (unlikely(virtqueue_add_buf(vblk->vq, out, in, vbr,
 				       GFP_ATOMIC) < 0)) {
 		spin_unlock_irq(vblk->disk->queue->queue_lock);
Re: [PATCH v2 1/5] virtio: add functions for piecewise addition of buffers
On 02/01/2013 06:03, Rusty Russell wrote:
Paolo Bonzini pbonz...@redhat.com writes:

The virtqueue_add_buf function has two limitations:

1) it requires the caller to provide all the buffers in a single call;

2) it does not support chained scatterlists: the buffers must be provided as an array of struct scatterlist;

Chained scatterlists are a horrible interface, but that doesn't mean we shouldn't support them if there's a need. I think I once even had a patch which passed two chained sgs, rather than a combo sg and two length numbers. It's very old, but I've pasted it below. Duplicating the implementation by having another interface is pretty nasty; I think I'd prefer the chained scatterlists, if that's optimal for you.

Unfortunately, that cannot work, because not all architectures support chained scatterlists. Having two different implementations on the driver side, with different locking rules depending on the support for chained scatterlists, is awful. (Also, as you mention, chained scatterlists are horrible. They'd happen to work for virtio-scsi, but not for virtio-blk, where the response status is part of the footer, not the header.)

We can move all drivers to virtqueue_add_sg; I just didn't do it in this series because I first wanted to get the API right.

Paolo

Cheers,
Rusty.

From: Rusty Russell ru...@rustcorp.com.au
Subject: virtio: use chained scatterlists.

Rather than handing a scatterlist[] and out and in numbers to virtqueue_add_buf(), hand two separate ones which can be chained.

I shall refrain from ranting about what a disgusting hack chained scatterlists are. I'll just note that this doesn't make things simpler (see diff).

The scatterlists we use can be too large for the stack, so we put them in our device struct and reuse them. But in many cases we don't want to pay the cost of sg_init_table(), as we don't know how many elements we'll have and we'd have to initialize the entire table.

This means we have two choices: carefully reset the end markers after we call virtqueue_add_buf(), which we do in virtio_net for the xmit path where it's easy and we want to be optimal; elsewhere we implement a helper to unset the end markers after we've filled the array.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
---
 drivers/block/virtio_blk.c          |   37 +-
 drivers/char/hw_random/virtio-rng.c |    2 -
 drivers/char/virtio_console.c       |    6 +--
 drivers/net/virtio_net.c            |   67 ++---
 drivers/virtio/virtio_balloon.c     |    6 +--
 drivers/virtio/virtio_ring.c        |   71 ++--
 include/linux/virtio.h              |    5 +-
 net/9p/trans_virtio.c               |   38 +--
 8 files changed, 151 insertions(+), 81 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -100,6 +100,14 @@ static void blk_done(struct virtqueue *v
 	spin_unlock_irqrestore(vblk->disk->queue->queue_lock, flags);
 }
 
+static void sg_unset_end_markers(struct scatterlist *sg, unsigned int num)
+{
+	unsigned int i;
+
+	for (i = 0; i < num; i++)
+		sg[i].page_link &= ~0x02;
+}
+
 static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
 		   struct request *req)
 {
@@ -140,6 +148,7 @@ static bool do_req(struct request_queue
 		}
 	}
 
+	/* We lay out the scatterlist in a single array, out then in. */
 	sg_set_buf(&vblk->sg[out++], &vbr->out_hdr, sizeof(vbr->out_hdr));
 
 	/*
@@ -151,17 +160,8 @@ static bool do_req(struct request_queue
 	if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC)
 		sg_set_buf(&vblk->sg[out++], vbr->req->cmd, vbr->req->cmd_len);
 
+	/* This marks the end of the sg list at vblk->sg[out]. */
 	num = blk_rq_map_sg(q, vbr->req, vblk->sg + out);
-
-	if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC) {
-		sg_set_buf(&vblk->sg[num + out + in++], vbr->req->sense, SCSI_SENSE_BUFFERSIZE);
-		sg_set_buf(&vblk->sg[num + out + in++], &vbr->in_hdr,
-			   sizeof(vbr->in_hdr));
-	}
-
-	sg_set_buf(&vblk->sg[num + out + in++], &vbr->status,
-		   sizeof(vbr->status));
-
 	if (num) {
 		if (rq_data_dir(vbr->req) == WRITE) {
 			vbr->out_hdr.type |= VIRTIO_BLK_T_OUT;
@@ -172,7 +172,22 @@ static bool do_req(struct request_queue
 		}
 	}
 
-	if (virtqueue_add_buf(vblk->vq, vblk->sg, out, in, vbr, GFP_ATOMIC) < 0) {
+	if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC) {
+		sg_set_buf(&vblk->sg[out + in++], vbr->req->sense,
+			   SCSI_SENSE_BUFFERSIZE);
+		sg_set_buf(&vblk->sg[out + in++], &vbr->in_hdr,
+			   sizeof(vbr->in_hdr));
+	}
+
+	sg_set_buf(&vblk->sg[out + in++], &vbr->status,
+		   sizeof(vbr->status));
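[Editor's note: as background for the chained-vs-flat debate above, here is a minimal sketch of the chained-scatterlist API itself, assuming an architecture that supports sg chaining — the portability problem Paolo raises. The function and buffer names are illustrative, not from the patch.]

	#include <linux/scatterlist.h>

	/*
	 * Chain a 2-entry header table to a payload table. The last entry
	 * of the first table is consumed by the chain link, so hdr[]
	 * contributes one data entry and payload[] the rest.
	 */
	static void chain_example(struct scatterlist hdr[2],
				  struct scatterlist payload[16],
				  void *out_hdr, size_t hdr_len)
	{
		/* payload[] is assumed already initialized with
		 * sg_init_table() and filled by the caller. */
		sg_init_table(hdr, 2);
		sg_set_buf(&hdr[0], out_hdr, hdr_len);

		/* hdr[1] becomes the chain link, not a data entry. */
		sg_chain(hdr, 2, payload);

		/* From here on, walk with sg_next()/for_each_sg();
		 * plain array indexing across the link is invalid. */
	}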
Re: [PATCH qom-cpu 4/4] target-ppc: Error out for -cpu host on unknown PVR
On 18.12.2012, at 08:53, Andreas Färber wrote:

Previously we silently exited; with subclasses we get an opcode warning instead. Explicitly tell the user what's wrong. An indication of this is -cpu ? showing "host" with an all-zero PVR.

Signed-off-by: Andreas Färber afaer...@suse.de
---
 target-ppc/kvm.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index f115892..8998d0f 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -1186,7 +1186,14 @@ static void alter_insns(uint64_t *word, uint64_t flags, bool on)
 
 static void kvmppc_host_cpu_initfn(Object *obj)
 {
+    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(obj);
+
     assert(kvm_enabled());
+
+    if (pcc->info->pvr != mfpvr()) {
+        fprintf(stderr, "Host PVR unsupported.\n");

This should probably rather say "Host CPU unsupported for -cpu host" or so :). Not everyone who invokes qemu-system-ppc knows what a PVR is.

Alex
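[Editor's note: a sketch of the friendlier wording Alex is asking for. The exact message text is hypothetical; mfpvr() is the helper already used in the quoted patch.]

    if (pcc->info->pvr != mfpvr()) {
        /* Name the user-visible option, not the internal register. */
        fprintf(stderr, "Host CPU unsupported for -cpu host "
                "(unknown PVR 0x%08x)\n", mfpvr());
    }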
Re: [PATCH v5 0/7] s390: Host support for channel I/O.
On 20.12.2012, at 15:32, Cornelia Huck wrote:

Hi,

here's the next iteration of the host patches to support channel I/O, against kvm/next.

Changes from v4 are on the style side, mainly using defines instead of magic numbers and using helper functions for decoding instructions. Patches 1 and 2 are new (and can be applied independently of the channel I/O patches); some things Alex pointed out in the patches apply to existing code as well.

Please consider for kvm/next.

Reviewed-by: Alexander Graf ag...@suse.de

Alex

Cornelia Huck (7):
  KVM: s390: Constify intercept handler tables.
  KVM: s390: Decoding helper functions.
  KVM: s390: Support for I/O interrupts.
  KVM: s390: Add support for machine checks.
  KVM: s390: In-kernel handling of I/O instructions.
  KVM: s390: Base infrastructure for enabling capabilities.
  KVM: s390: Add support for channel I/O instructions.

 Documentation/virtual/kvm/api.txt |  40 -
 arch/s390/include/asm/kvm_host.h  |  11 ++
 arch/s390/kvm/intercept.c         |  45 +++---
 arch/s390/kvm/interrupt.c         | 252 +-
 arch/s390/kvm/kvm-s390.c          |  38 +
 arch/s390/kvm/kvm-s390.h          |  43 ++
 arch/s390/kvm/priv.c              | 316 --
 arch/s390/kvm/sigp.c              |   6 +-
 arch/s390/kvm/trace-s390.h        |  26 +++-
 include/trace/events/kvm.h        |   2 +-
 include/uapi/linux/kvm.h          |  21 +++
 11 files changed, 721 insertions(+), 79 deletions(-)

-- 
1.7.12.4
Re: [PATCHv2 0/6] more VMX real mode emulation fixes
On Thu, Dec 20, 2012 at 04:57:41PM +0200, Gleb Natapov wrote:

This series goes on top of my previous one: "Fix emulate_invalid_guest_state=0 part 2". It not only fixes bugs, but also does a nice cleanup of VMX real mode emulation. All real mode segment register mangling is now contained in the fix_rmode_seg() function.

Changelog:
 v1 -> v2:
  - emulate_invalid_guest_state=0 broke again. Fix it.
  - additional patch to handle IO during emulation caused by #GP

Gleb Natapov (6):
  KVM: emulator: drop RPL check from linearize() function
  KVM: emulator: implement fninit, fnstsw, fnstcw
  KVM: VMX: make rmode_segment_valid() more strict.
  KVM: VMX: fix emulation of invalid guest state.
  KVM: VMX: Do not fix segment register during vcpu initialization.
  KVM: VMX: handle IO when emulation is due to #GP in real mode.

 arch/x86/kvm/emulate.c | 133 +++--
 arch/x86/kvm/vmx.c     | 219 +---
 2 files changed, 241 insertions(+), 111 deletions(-)

-- 
1.7.10.4

Applied, thanks.
KVM: VMX: fix incorrect cached cpl value with real/v8086 modes (v2)
CPL is always 0 when in real mode, and always 3 when in virtual-8086 mode. Using values other than those can cause failures on operations that check CPL.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 55dfc37..849f5f2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1696,7 +1696,6 @@ static unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
 static void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
 	__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
-	__clear_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
 	to_vmx(vcpu)->rflags = rflags;
 	if (to_vmx(vcpu)->rmode.vm86_active) {
 		to_vmx(vcpu)->rmode.save_rflags = rflags;
@@ -3220,8 +3219,10 @@ static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
 	return vmx_read_guest_seg_base(to_vmx(vcpu), seg);
 }
 
-static int __vmx_get_cpl(struct kvm_vcpu *vcpu)
+static int vmx_get_cpl(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
 	if (!is_protmode(vcpu))
 		return 0;
 
@@ -3229,13 +3230,6 @@ static int __vmx_get_cpl(struct kvm_vcpu *vcpu)
 	    (kvm_get_rflags(vcpu) & X86_EFLAGS_VM)) /* if virtual 8086 */
 		return 3;
 
-	return vmx_read_guest_seg_selector(to_vmx(vcpu), VCPU_SREG_CS) & 3;
-}
-
-static int vmx_get_cpl(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-
 	/*
 	 * If we enter real mode with cs.sel & 3 != 0, the normal CPL calculations
 	 * fail; use the cache instead.
@@ -3246,7 +3240,7 @@ static int vmx_get_cpl(struct kvm_vcpu *vcpu)
 
 	if (!test_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail)) {
 		__set_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
-		vmx->cpl = __vmx_get_cpl(vcpu);
+		vmx->cpl = vmx_read_guest_seg_selector(vmx, VCPU_SREG_CS) & 3;
 	}
 
 	return vmx->cpl;
Re: KVM: VMX: fix incorrect cached cpl value with real/v8086 modes
On Thu, Jan 03, 2013 at 10:11:53AM +0200, Gleb Natapov wrote:

FreeBSD 9.1 with -smp 2.

I cannot reproduce. I do see a boot failure on the next branch with 9.[01] 64 bit -smp 2 here, but it is caused by the segment registers all being incorrect on a secondary vcpu, and this patch does not help. Applying http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102593 fixes it. The question is how it happens that we enter real mode while the cache is set to 3. It should never be 3 during boot, since the boot process never enters userspace.

It's a transition _to_ real mode (from protected).

But in protected mode CPL should be 0 during boot.

BTX (FreeBSD's bootloader) does: http://people.freebsd.org/~jhb/docs/loader.txt. Crazy. Regardless, I think your patch is correct, and the current code that uses the CPL cache during emulation is wrong, and it remains wrong even after your patch. What if the last time the CPL cache was updated the vcpu was running at CPL 3? The commit message says that it tries to fix the case where CS.selector & 3 != 0 during the transition to protected mode, but that can be fixed by setting the CPL cache to 0 in vmx_set_cr0() instead of clearing it.

Yes, I suppose so; that can be done by a separate patch, though.

Two things about the patch itself. Get rid of __vmx_get_cpl() and call vmx_read_guest_seg_selector() directly from vmx_get_cpl(), and drop __clear_bit(VCPU_EXREG_CPL) from vmx_set_rflags() and vmx_set_cr0(), since vmx->cpl no longer depends on rflags and cr0.

You still want to invalidate the vmx->cpl cache on CR0 writes: think of a protected -> real -> protected transition. As for EFLAGS, agreed. Sending a new patch.
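[Editor's note: to make Marcelo's point concrete, a minimal sketch of the invariant being defended. The helper name is hypothetical; the real change simply keeps the existing __clear_bit() call in vmx_set_cr0().]

	/*
	 * A CR0 write can switch between real and protected mode, so the
	 * cached CPL must be dropped; otherwise a protected -> real ->
	 * protected round trip could return a stale vmx->cpl.
	 */
	static void vmx_drop_cpl_cache(struct kvm_vcpu *vcpu)
	{
		__clear_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
	}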
KVM: x86: use dynamic percpu allocations for shared msrs area
Andy, Mike, can you confirm whether this fixes the percpu allocation failures when loading kvm.ko? TIA

Use dynamic percpu allocations for the shared msrs structure, to avoid using the limited reserved percpu space.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1c9c834..5229a67 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -120,7 +120,7 @@ struct kvm_shared_msrs {
 };
 
 static struct kvm_shared_msrs_global __read_mostly shared_msrs_global;
-static DEFINE_PER_CPU(struct kvm_shared_msrs, shared_msrs);
+static struct kvm_shared_msrs __percpu *shared_msrs;
 
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ "pf_fixed", VCPU_STAT(pf_fixed) },
@@ -191,10 +191,10 @@ static void kvm_on_user_return(struct user_return_notifier *urn)
 
 static void shared_msr_update(unsigned slot, u32 msr)
 {
-	struct kvm_shared_msrs *smsr;
 	u64 value;
+	unsigned int cpu = smp_processor_id();
+	struct kvm_shared_msrs *smsr = per_cpu_ptr(shared_msrs, cpu);
 
-	smsr = &__get_cpu_var(shared_msrs);
 	/* only read, and nobody should modify it at this time,
 	 * so don't need lock */
 	if (slot >= shared_msrs_global.nr) {
@@ -226,7 +226,8 @@ static void kvm_shared_msr_cpu_online(void)
 
 void kvm_set_shared_msr(unsigned slot, u64 value, u64 mask)
 {
-	struct kvm_shared_msrs *smsr = &__get_cpu_var(shared_msrs);
+	unsigned int cpu = smp_processor_id();
+	struct kvm_shared_msrs *smsr = per_cpu_ptr(shared_msrs, cpu);
 
 	if (((value ^ smsr->values[slot].curr) & mask) == 0)
 		return;
@@ -242,7 +243,8 @@ EXPORT_SYMBOL_GPL(kvm_set_shared_msr);
 
 static void drop_user_return_notifiers(void *ignore)
 {
-	struct kvm_shared_msrs *smsr = &__get_cpu_var(shared_msrs);
+	unsigned int cpu = smp_processor_id();
+	struct kvm_shared_msrs *smsr = per_cpu_ptr(shared_msrs, cpu);
 
 	if (smsr->registered)
 		kvm_on_user_return(&smsr->urn);
@@ -5233,9 +5235,16 @@ int kvm_arch_init(void *opaque)
 		goto out;
 	}
 
+	r = -ENOMEM;
+	shared_msrs = alloc_percpu(struct kvm_shared_msrs);
+	if (!shared_msrs) {
+		printk(KERN_ERR "kvm: failed to allocate percpu kvm_shared_msrs\n");
+		goto out;
+	}
+
 	r = kvm_mmu_module_init();
 	if (r)
-		goto out;
+		goto out_free_percpu;
 
 	kvm_set_mmio_spte_mask();
 	kvm_init_msr_list();
@@ -5258,6 +5267,8 @@ int kvm_arch_init(void *opaque)
 
 	return 0;
 
+out_free_percpu:
+	free_percpu(shared_msrs);
 out:
 	return r;
 }
@@ -5275,6 +5286,7 @@ void kvm_arch_exit(void)
 #endif
 	kvm_x86_ops = NULL;
 	kvm_mmu_module_exit();
+	free_percpu(shared_msrs);
 }
 
 int kvm_emulate_halt(struct kvm_vcpu *vcpu)
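[Editor's note: for context, the before/after pattern in isolation — static DEFINE_PER_CPU storage versus a dynamically allocated percpu chunk — looks roughly like this. A standalone sketch with illustrative names, not KVM code.]

	#include <linux/percpu.h>
	#include <linux/smp.h>

	struct counters { u64 events; };

	/* Replaces: static DEFINE_PER_CPU(struct counters, counters); */
	static struct counters __percpu *counters;

	static int counters_init(void)
	{
		counters = alloc_percpu(struct counters);  /* dynamic percpu space */
		if (!counters)
			return -ENOMEM;
		return 0;
	}

	static void counters_bump(void)
	{
		/* Same access pattern as the patch; the caller must not be
		 * preemptible while using the per-cpu pointer. */
		struct counters *c = per_cpu_ptr(counters, smp_processor_id());

		c->events++;
	}

	static void counters_exit(void)
	{
		free_percpu(counters);
	}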
Re: [PATCH 0/7] Streamline arithmetic instruction emulation
On Thu, Jan 03, 2013 at 09:39:14PM +0200, Avi Kivity wrote:

Avi Kivity (7):
  KVM: x86 emulator: framework for streamlining arithmetic opcodes
  KVM: x86 emulator: Support for declaring single operand fastops
  KVM: x86 emulator: introduce NoWrite flag
  KVM: x86 emulator: mark CMP, CMPS, SCAS, TEST as NoWrite
  KVM: x86 emulator: convert NOT, NEG to fastop
  KVM: x86 emulator: add macros for defining 2-operand fastop emulation
  KVM: x86 emulator: convert basic ALU ops to fastop

 arch/x86/kvm/emulate.c | 215 +++--
 1 file changed, 120 insertions(+), 95 deletions(-)

Looks good to me.

Ping?

Please regenerate against the 'queue' branch.
Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
On 12/29/2012 01:52 AM, Blue Swirl wrote:
On Fri, Dec 28, 2012 at 10:32 AM, Jason Wang jasow...@redhat.com wrote:

This patch implements both userspace and vhost support for multiple-queue virtio-net (VIRTIO_NET_F_MQ). This is done by introducing an array of VirtIONetQueue in VirtIONet.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 hw/virtio-net.c | 318 ++-
 hw/virtio-net.h |  27 +-
 2 files changed, 271 insertions(+), 74 deletions(-)

[...]

 static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
@@ -464,6 +578,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
         status = virtio_net_handle_mac(n, ctrl.cmd, &elem);
     else if (ctrl.class == VIRTIO_NET_CTRL_VLAN)
         status = virtio_net_handle_vlan_table(n, ctrl.cmd, &elem);
+    else if (ctrl.class == VIRTIO_NET_CTRL_MQ)

Please add braces.

Sure.

+        status = virtio_net_handle_mq(n, ctrl.cmd, &elem);

         stb_p(elem.in_sg[elem.in_num - 1].iov_base, status);

@@ -477,19 +593,24 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
 static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
+    int queue_index = vq2q(virtio_get_queue_index(vq));

-    qemu_flush_queued_packets(qemu_get_queue(n->nic));
+    qemu_flush_queued_packets(qemu_get_subqueue(n->nic, queue_index));
 }

[...]

+static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl)
+{
+    VirtIODevice *vdev = &n->vdev;
+    int i;
+
+    n->multiqueue = multiqueue;
+
+    if (!multiqueue)
+        n->curr_queues = 1;

Ditto. Didn't checkpatch.pl catch these, or did you not check?

Sorry, I will add braces here. I ran checkpatch.pl, but found that much of the existing code (including this file) does not obey the rules. So I'm not sure whether I should correct only my own code, or leave it consistent with this file and correct it all in the future.

[...]

 } QEMU_PACKED;

 /* This is the first element of the scatter-gather list. If you don't
@@ -168,6 +172,26 @@ struct virtio_net_ctrl_mac {
 #define VIRTIO_NET_CTRL_VLAN_ADD 0
 #define VIRTIO_NET_CTRL_VLAN_DEL 1

+/*
+ * Control Multiqueue
+ *
+ * The command VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
+ * enables multiqueue, specifying the number of the transmit and
+ * receive queues that will be used. After the command is consumed and acked by
+ * the device, the device will not steer new packets on receive virtqueues
+ * other than specified nor read from transmit virtqueues other than specified.
+ * Accordingly, driver should not transmit new packets on virtqueues other than
+ * specified.
+ */
+struct virtio_net_ctrl_mq {

VirtIONetCtrlMQ, and please don't forget the typedef.

Sure, but the same question as above. (See the other structures in this file.)
+    uint16_t virtqueue_pairs;
+};
+
+#define VIRTIO_NET_CTRL_MQ 4
+ #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET        0
+ #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN        1
+ #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX        0x8000
+
 #define DEFINE_VIRTIO_NET_FEATURES(_state, _field) \
     DEFINE_VIRTIO_COMMON_FEATURES(_state, _field), \
     DEFINE_PROP_BIT("csum", _state, _field, VIRTIO_NET_F_CSUM, true), \
@@ -186,5 +210,6 @@ struct virtio_net_ctrl_mac {
     DEFINE_PROP_BIT("ctrl_vq", _state, _field, VIRTIO_NET_F_CTRL_VQ, true), \
     DEFINE_PROP_BIT("ctrl_rx", _state, _field, VIRTIO_NET_F_CTRL_RX, true), \
     DEFINE_PROP_BIT("ctrl_vlan", _state, _field, VIRTIO_NET_F_CTRL_VLAN, true), \
-    DEFINE_PROP_BIT("ctrl_rx_extra", _state, _field, VIRTIO_NET_F_CTRL_RX_EXTRA, true)
+    DEFINE_PROP_BIT("ctrl_rx_extra", _state, _field, VIRTIO_NET_F_CTRL_RX_EXTRA, true), \
+    DEFINE_PROP_BIT("mq", _state, _field, VIRTIO_NET_F_MQ, true)

 #endif
--
1.7.1
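[Editor's note: for reference, a simplified sketch of what a VIRTIO_NET_CTRL_MQ handler has to enforce per the comment above — reject a pair count outside [MIN, MAX]. Iovec parsing and device-limit checks are abbreviated; this is a sketch under those assumptions, not the exact code from the patch.]

    static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
                                    VirtQueueElement *elem)
    {
        struct virtio_net_ctrl_mq mq;

        if (cmd != VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET ||
            elem->out_num < 2 ||
            elem->out_sg[1].iov_len < sizeof(mq)) {
            return VIRTIO_NET_ERR;
        }

        memcpy(&mq, elem->out_sg[1].iov_base, sizeof(mq));

        /* The device must refuse counts outside the spec's [MIN, MAX];
         * the real code also bounds this by the queues the device has. */
        if (mq.virtqueue_pairs < VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN ||
            mq.virtqueue_pairs > VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX) {
            return VIRTIO_NET_ERR;
        }

        /* Steer packets only to the first virtqueue_pairs queues from now on. */
        n->curr_queues = mq.virtqueue_pairs;
        return VIRTIO_NET_OK;
    }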
Re: KVM: x86: use dynamic percpu allocations for shared msrs area
On Thu, 2013-01-03 at 11:41 -0200, Marcelo Tosatti wrote:

Andy, Mike, can you confirm whether this fixes the percpu allocation failures when loading kvm.ko? TIA

monteverdi:~/:[1]# dmesg | grep PERCPU
[0.00] PERCPU: Embedded 27 pages/cpu @88047f80 s80704 r8192 d21696 u262144
monteverdi:~/:[0]# lsmod | grep kvm
kvm_intel             132688  0
kvm                   423692  1 kvm_intel
monteverdi:~/:[0]# uname -r
3.6.0-bisect

Yup, kvm loaded itself, no gripeage.
Re: [PATCH v3 5/5] KVM: x86: improve reexecute_instruction
Hi Gleb,

Thanks for your review, and sorry for the delayed reply; I was on vacation.

On 12/23/2012 11:02 PM, Gleb Natapov wrote:
On Sat, Dec 15, 2012 at 03:01:12PM +0800, Xiao Guangrong wrote:

+	is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu, addr,
+								&walker, user_fault);
+

is_self_change_mapping() has a subtle side effect: it sets vcpu->arch.target_gfn_is_pt. From reading the page_fault() function you cannot guess why is_self_change_mapping() is not called inside "if (walker.level >= PT_DIRECTORY_LEVEL)", since this is the only place where its output is used. Maybe pass it a pointer to target_gfn_is_pt as a parameter, to make it clear that the return value is not the only output of the function.

Yes, that is clearer; I will do it in the next version.

 	if (walker.level >= PT_DIRECTORY_LEVEL)
 		force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn)
-		   || FNAME(is_self_change_mapping)(vcpu, &walker, user_fault);
+		   || is_self_change_mapping;
 	else
 		force_pt_level = 1;
 	if (!force_pt_level) {

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bf66169..fc33563 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4756,29 +4756,25 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu)
 
 static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2)
 {
 	gpa_t gpa = cr2;
+	gfn_t gfn;
 	pfn_t pfn;
-	unsigned int indirect_shadow_pages;
-
-	spin_lock(&vcpu->kvm->mmu_lock);
-	indirect_shadow_pages = vcpu->kvm->arch.indirect_shadow_pages;
-	spin_unlock(&vcpu->kvm->mmu_lock);
-
-	if (!indirect_shadow_pages)
-		return false;
 
 	if (!vcpu->arch.mmu.direct_map) {
-		gpa = kvm_mmu_gva_to_gpa_read(vcpu, cr2, NULL);
+		/*
+		 * Write permission should be allowed since only
+		 * write access need to be emulated.
+		 */
+		gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2, NULL);
+
+		/*
+		 * If the mapping is invalid in guest, let cpu retry
+		 * it to generate fault.
+		 */
 		if (gpa == UNMAPPED_GVA)
-			return true; /* let cpu generate fault */
+			return true;
 	}

Why not fold this change to "if (!vcpu->arch.mmu.direct_map)" into the previous patch, where it was introduced? This looks independent of what you are doing in this patch.

Fine by me.

-	/*
-	 * if emulation was due to access to shadowed page table
-	 * and it failed try to unshadow page and re-enter the
-	 * guest to let CPU execute the instruction.
-	 */
-	if (kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)))
-		return true;
+	gfn = gpa_to_gfn(gpa);
 
 	/*
 	 * Do not retry the unhandleable instruction if it faults on the
@@ -4786,13 +4782,33 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2)
 	 * retry instruction -> write #PF -> emulation fail -> retry
 	 * instruction -> ...
 	 */
-	pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
-	if (!is_error_noslot_pfn(pfn)) {
-		kvm_release_pfn_clean(pfn);
+	pfn = gfn_to_pfn(vcpu->kvm, gfn);
+
+	/*
+	 * If the instruction failed on the error pfn, it can not be fixed,
+	 * report the error to userspace.
+	 */
+	if (is_error_noslot_pfn(pfn))
+		return false;
+
+	kvm_release_pfn_clean(pfn);
+
+	/* The instructions are well-emulated on direct mmu. */
+	if (vcpu->arch.mmu.direct_map) {
+		unsigned int indirect_shadow_pages;
+
+		spin_lock(&vcpu->kvm->mmu_lock);
+		indirect_shadow_pages = vcpu->kvm->arch.indirect_shadow_pages;
+		spin_unlock(&vcpu->kvm->mmu_lock);
+
+		if (indirect_shadow_pages)
+			kvm_mmu_unprotect_page(vcpu->kvm, gfn);
+
 		return true;
 	}
 
-	return false;
+	kvm_mmu_unprotect_page(vcpu->kvm, gfn);
+	return !(vcpu->arch.fault_addr == cr2 && vcpu->arch.target_gfn_is_pt);

Do you store fault_addr only to avoid using a stale target_gfn_is_pt? If yes, why not reset target_gfn_is_pt to false at the beginning of a page fault and get rid of fault_addr?

Good suggestion, will do. :)