Re: [PATCH 3/3] KVM: x86: improve reexecute_instruction
Hi Marcelo,

Thanks for your patience. I have been reading your reply over and over again, and I would like to argue the point a bit more :). Please see below.

On 11/29/2012 08:21 AM, Marcelo Tosatti wrote:
> https://lkml.org/lkml/2012/11/17/75
>
> Does unshadowing work with large sptes at reexecute_instruction? That is,
> do we nuke any large read-only sptes that might be causing a certain gfn
> to be read-only? That is, following the sequence there, is the large
> read-only spte properly destroyed?

Actually, sptes cannot prevent a gfn from becoming writable; that is, the gfn can become writable *even if* there exists a spte which maps to it. What does prevent a gfn from becoming writable is shadowing: the gfn cannot become writable if there exists a sp with sp.gfn == gfn. Note that drop_spte does not remove any sp, so either destroying the spte or keeping it has no effect on gfn write-protection.

If I am lucky enough to be right, the current code can be optimized as below:

	if (level > PT_PAGE_TABLE_LEVEL &&
-	    has_wrprotected_page(vcpu->kvm, gfn, level)) {
-		ret = 1;
-		drop_spte(vcpu->kvm, sptep);
-		goto done;
-	}
+	    has_wrprotected_page(vcpu->kvm, gfn, level))
+		return 0;

1) We can return 0 instead of 1 to avoid unnecessary emulation: the vcpu will refault again, and then kvm will use a small page.
2) We need not make any change to the spte.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/2] mm: Add ability to monitor task's memory changes
On 11/30/2012 09:55 PM, Pavel Emelyanov wrote:
> Hello,
>
> This is an attempt to implement support for memory snapshot for the
> checkpoint-restore project (http://criu.org).
>
> To create a dump of an application(s) we save all the information about
> it to files. No surprise, the biggest part of such a dump is the contents
> of the tasks' memory. However, in some usage scenarios it's not required
> to get _all_ the task memory while creating a dump. For example, when
> doing periodical dumps, a full memory dump is only required at the first
> step; after that it is enough to take incremental changes of memory.
>
> Another example is live migration. In the simplest form it looks like --
> create a dump, copy it to the remote node, then restore tasks from the
> dump files. While this whole dump-copy-restore sequence goes on, the
> process must be stopped. However, if we can monitor how tasks change
> their memory, we can dump and copy it in smaller chunks, periodically
> updating it, and thus freeze tasks only at the very end, for a very short
> time, to pick up the most recent changes.
>
> That said, some help from the kernel to watch how processes modify the
> contents of their memory is required. I'd like to propose one possible
> solution to this task -- with the help of page faults and trace events.
>
> Briefly, the approach is -- remap some memory regions as read-only, get
> the #pf on the task's attempt to modify the memory, and issue a trace
> event for that. Since we're only interested in parts of the memory of
> some tasks, make it possible to mark the vmas we're interested in and
> issue events for them only. Also, to be aware of tasks unmapping the vmas
> being watched, issue an event when a marked vma is removed (and, for
> symmetry, an event when a vma is marked).
>
> What do you think about this approach? Is this way of supporting mem
> snapshot OK for you, or should we invent some better one?

The page fault mechanism is pretty obvious -- anything that deals with dirty pages ends up having to do this -- so there is nothing crazy about it.
What concerns me, however, is that should this go in, we'll have two dirty memory loggers in the kernel: one to support CRIU and one to support KVM. And the worst part: they have exactly the same purpose!

So to begin with, I think one thing to consider would be to generalize KVM's dirty memory notification so it can work on a normal process memory region. The KVM API requires a memory slot to be passed, something we are unlikely to have, but KVM can easily keep its API and use an alternate mechanism internally; that's trivial.

Generally speaking, KVM will do polling with this ioctl, so I prefer your tracing mechanism. The only difference is that KVM tends to transfer large chunks of memory under some loads -- in the high-gigabyte range -- so the proposed tracing API should optionally be able to batch requests within a time frame.

It would also be good to hear what the KVM guys think of it as well.
Re: [PATCH v3 1/4] x86: PIT connects to pin 2 of IOAPIC
On Mon, Dec 03, 2012 at 03:01:01PM +0800, Yang Zhang wrote:
> When the PIT connects to the IOAPIC, it routes to pin 2, not pin 0.

This hack didn't work for a long time and nobody complained. It should be safe to get rid of it.

> Signed-off-by: Yang Zhang yang.z.zh...@intel.com
> ---
>  virt/kvm/ioapic.c | 2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
> index cfb7e4d..166c450 100644
> --- a/virt/kvm/ioapic.c
> +++ b/virt/kvm/ioapic.c
> @@ -181,7 +181,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
>  #ifdef CONFIG_X86
>  	/* Always delivery PIT interrupt to vcpu 0 */
> -	if (irq == 0) {
> +	if (irq == 2) {
>  		irqe.dest_mode = 0;	/* Physical mode. */
>  		/* need to read apic_id from apic regiest since
>  		 * it can be rewritten */
> --
> 1.7.1

--
			Gleb.
Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
On Mon, Dec 03, 2012 at 02:05:27PM +0800, Jason Wang wrote:
> On Monday, December 03, 2012 12:34:08 PM Rusty Russell wrote:
> > Jason Wang jasow...@redhat.com writes:
> > > +static const struct ethtool_ops virtnet_ethtool_ops;
> > > +
> > > +/*
> > > + * Converting between virtqueue no. and kernel tx/rx queue no.
> > > + * 0:rx0 1:tx0 2:cvq 3:rx1 4:tx1 ... 2N+1:rxN 2N+2:txN
> > > + */
> > > +static int vq2txq(struct virtqueue *vq)
> > > +{
> > > +	int index = virtqueue_get_queue_index(vq);
> > > +	return index == 1 ? 0 : (index - 2) / 2;
> > > +}
> > > +
> > > +static int txq2vq(int txq)
> > > +{
> > > +	return txq ? 2 * txq + 2 : 1;
> > > +}
> > > +
> > > +static int vq2rxq(struct virtqueue *vq)
> > > +{
> > > +	int index = virtqueue_get_queue_index(vq);
> > > +	return index ? (index - 1) / 2 : 0;
> > > +}
> > > +
> > > +static int rxq2vq(int rxq)
> > > +{
> > > +	return rxq ? 2 * rxq + 1 : 0;
> > > +}
> > > +
> >
> > I thought MST changed the proposed spec to make the control queue always
> > the last one, so this logic becomes trivial.
>
> But it may break the support of legacy guests. If we boot a legacy
> single-queue guest on a 2-queue virtio-net device, it may think vq 2 is
> the cvq, which is in fact rx1.

Legacy guest support should be handled by the host using feature bits in the usual way: the host should detect a legacy guest by checking the VIRTIO_NET_F_RFS feature. If VIRTIO_NET_F_RFS is acked, the cvq is vq max_virtqueue_pairs * 2. If it's not acked, the cvq is vq 2.
Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
On 12/03/2012 05:47 PM, Michael S. Tsirkin wrote:
> On Mon, Dec 03, 2012 at 02:05:27PM +0800, Jason Wang wrote:
> > On Monday, December 03, 2012 12:34:08 PM Rusty Russell wrote:
> > > I thought MST changed the proposed spec to make the control queue
> > > always the last one, so this logic becomes trivial.
> >
> > But it may break the support of legacy guests. If we boot a legacy
> > single-queue guest on a 2-queue virtio-net device, it may think vq 2
> > is the cvq, which is in fact rx1.
>
> Legacy guest support should be handled by the host using feature bits in
> the usual way: the host should detect a legacy guest by checking the
> VIRTIO_NET_F_RFS feature. If VIRTIO_NET_F_RFS is acked, the cvq is vq
> max_virtqueue_pairs * 2. If it's not acked, the cvq is vq 2.

We could, but we wouldn't gain much from this. Furthermore, we would also need to do dynamic creation/destruction of virtqueues during feature negotiation, which does not seem to be supported in qemu now.
Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
On Tue, Nov 27, 2012 at 06:15:59PM +0800, Jason Wang wrote:
> -	if (!try_fill_recv(&vi->rq, GFP_KERNEL))
> -		schedule_delayed_work(&vi->rq.refill, 0);
> +	for (i = 0; i < vi->max_queue_pairs; i++)
> +		if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
> +			schedule_delayed_work(&vi->rq[i].refill, 0);
>
>  	mutex_lock(&vi->config_lock);
>  	vi->config_enable = true;
>  	mutex_unlock(&vi->config_lock);
>
> +	BUG_ON(virtnet_set_queues(vi));
> +
>  	return 0;
>  }
>  #endif

Also, crashing when the device nacks the command is not nice. In this case it seems we can just switch to single-queue mode, which should always be safe.
Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
On 12/03/2012 06:14 PM, Michael S. Tsirkin wrote:
> On Tue, Nov 27, 2012 at 06:15:59PM +0800, Jason Wang wrote:
> > +	BUG_ON(virtnet_set_queues(vi));
> [...]
>
> Also, crashing when the device nacks the command is not nice. In this
> case it seems we can just switch to single-queue mode, which should
> always be safe.

Not sure it's safe; it depends on the reason why the call fails. If we are left in a state where the driver only uses a single queue but the device uses multiple queues, we may still lose the network.
Re: [PATCH v4 08/14] KVM: ARM: World-switch implementation
On 30/11/12 18:49, Christoffer Dall wrote:
> On Fri, Nov 30, 2012 at 12:14 PM, Will Deacon will.dea...@arm.com wrote:
> > On Fri, Nov 30, 2012 at 04:47:40PM +0000, Christoffer Dall wrote:
> > > On Fri, Nov 30, 2012 at 10:15 AM, Will Deacon will.dea...@arm.com wrote:
> > > > At this point, VM1 is running and VM0:VCPU1 is running. VM0:VCPU0 is
> > > > not running because physical CPU0 is handling an interrupt. The
> > > > problem is that when VCPU0 *is* resumed, it will update the VMID of
> > > > VM0 and could be scheduled in parallel with VCPU1 but with a
> > > > different VMID.
> > > >
> > > > How do you avoid this in the current code?
> > >
> > > I don't. Nice catch. Please apply your interesting brain to the
> > > following fix:)
> >
> > I'm far too sober to look at your patch right now, but I'll think about
> > it over the weekend [I can't break it at a quick glance] :)
> >
> > In the meantime, can you think about whether the TLB operations need to
> > run on every CPU please?
>
> they don't, we can invalidate the TLB and the icache using the inner
> shareability domain. Here's a patch:
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index ad1390f..df1b753 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -146,6 +146,7 @@ struct kvm_one_reg;
>  int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
>  int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
>  u64 kvm_call_hyp(void *hypfn, ...);
> +void force_vm_exit(const cpumask_t *mask);
>
>  #define KVM_ARCH_WANT_MMU_NOTIFIER
>  struct kvm;
>
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index c4f631e..674592e 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -405,9 +405,14 @@ int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
>  	return v->mode == IN_GUEST_MODE;
>  }
>
> -static void reset_vm_context(void *info)
> +/* Just ensure a guest exit from a particular CPU */
> +static void exit_vm_noop(void *info)
>  {
> -	kvm_call_hyp(__kvm_flush_vm_context);
> +}
> +
> +void force_vm_exit(const cpumask_t *mask)
> +{
> +	smp_call_function_many(mask, exit_vm_noop, NULL, true);
>  }

Care to update the do_nothing() call in emulate.c to use this as well?

	M.
--
Jazz is not dead. It just smells funny...
Re: [PATCH v4 05/13] ARM: KVM: VGIC accept vcpu and dist base addresses from user space
On Sat, Dec 01, 2012 at 02:52:13AM +0000, Christoffer Dall wrote:
> On Wed, Nov 28, 2012 at 8:11 AM, Will Deacon will.dea...@arm.com wrote:
> > On Sat, Nov 10, 2012 at 03:44:51PM +0000, Christoffer Dall wrote:
> > > +		kvm->arch.vgic.vgic_dist_base = addr;
> > > +		break;
> > > +	case KVM_VGIC_V2_ADDR_TYPE_CPU:
> > > +		if (!IS_VGIC_ADDR_UNDEF(vgic->vgic_cpu_base))
> > > +			return -EEXIST;
> > > +		if (addr + VGIC_CPU_SIZE < addr)
> > > +			return -EINVAL;
> > > +		kvm->arch.vgic.vgic_cpu_base = addr;
> > > +		break;
> > > +	default:
> > > +		r = -ENODEV;
> > > +	}
> > > +
> > > +	if (vgic_ioaddr_overlap(kvm)) {
> > > +		kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
> > > +		kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
> > > +		return -EINVAL;
> >
> > Perhaps we could put all the address checking in one place, so that the
> > wrapping round zero checks and the overlap checks can be in the same
> > function?
>
> Like this (?):

Almost:

> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 7f057a2..0b6b95e 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2163,6 +2163,7 @@ Errors:
>    ENXIO:  Device not supported on current system
>    EEXIST: Address already set
>    E2BIG:  Address outside guest physical address space
> +  EBUSY:  Address overlaps with other device range
>
>  struct kvm_device_address {
>  	__u64 id;
>
> diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
> index f697c14..660fe24 100644
> --- a/arch/arm/kvm/vgic.c
> +++ b/arch/arm/kvm/vgic.c
> @@ -1230,11 +1230,28 @@ static bool vgic_ioaddr_overlap(struct kvm *kvm)
>  	phys_addr_t cpu = kvm->arch.vgic.vgic_cpu_base;
>
>  	if (IS_VGIC_ADDR_UNDEF(dist) || IS_VGIC_ADDR_UNDEF(cpu))
> -		return false;
> +		return 0;
>  	if ((dist <= cpu && dist + VGIC_DIST_SIZE > cpu) ||
>  	    (cpu <= dist && cpu + VGIC_CPU_SIZE > dist))
> -		return true;
> -	return false;
> +		return -EBUSY;
> +	return 0;
> +}
> +
> +static int vgic_ioaddr_assign(struct kvm *kvm, phys_addr_t *ioaddr,
> +			      phys_addr_t addr, phys_addr_t size)
> +{
> +	int ret;
> +
> +	if (!IS_VGIC_ADDR_UNDEF(*ioaddr))
> +		return -EEXIST;
> +	if (addr + size < addr)
> +		return -EINVAL;
> +
> +	ret = vgic_ioaddr_overlap(kvm);
> +	if (ret)
> +		return ret;
> +	*ioaddr = addr;
> +	return ret;
> +}
>
>  int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr)
> @@ -1251,29 +1268,21 @@ int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr)
>  	mutex_lock(&kvm->lock);
>  	switch (type) {
>  	case KVM_VGIC_V2_ADDR_TYPE_DIST:
> -		if (!IS_VGIC_ADDR_UNDEF(vgic->vgic_dist_base))
> -			return -EEXIST;
> -		if (addr + VGIC_DIST_SIZE < addr)
> -			return -EINVAL;
> -		kvm->arch.vgic.vgic_dist_base = addr;
> +		r = vgic_ioaddr_assign(kvm, &vgic->vgic_dist_base,
> +				       addr, VGIC_DIST_SIZE);
>  		break;
>  	case KVM_VGIC_V2_ADDR_TYPE_CPU:
>  		if (!IS_VGIC_ADDR_UNDEF(vgic->vgic_cpu_base))
>  			return -EEXIST;
>  		if (addr + VGIC_CPU_SIZE < addr)
>  			return -EINVAL;
> -		kvm->arch.vgic.vgic_cpu_base = addr;
> +		r = vgic_ioaddr_assign(kvm, &vgic->vgic_cpu_base,
> +				       addr, VGIC_CPU_SIZE);

You've left the checking in here, so now everything is checked twice. Also, you should probably touch base with Marc, as I'm under the impression that both of you are looking at addressing my comments, so it would be good to avoid tripping over each other on this.

Will
[PATCHv5] virtio-spec: virtio network device RFS support
Add RFS support to the virtio network device: a new feature flag VIRTIO_NET_F_RFS for this feature, a new configuration field max_virtqueue_pairs to detect the supported number of virtqueues, and a new command VIRTIO_NET_CTRL_RFS to program packet steering for unidirectional protocols.

---

Changes from v5:
- Address Rusty's comments. Changes are only in the text, not the ideas.
- Some minor formatting changes.

Changes from v4:
- Address Jason's comments.
- Have the configuration specify the number of VQ pairs, not pairs - 1.

Changes from v3:
- Rename multiqueue -> rfs since this is what we support.
- Be more explicit about what the driver should do.
- Simplify the layout, making VQ functionality depend on the feature.
- Remove unused commands; only leave in programming the # of queues.

Changes from v2 (addressing Jason's comments on v2):
- Changed STEERING_HOST to STEERING_RX_FOLLOWS_TX: this is both clearer and easier to support. It does not look like we need a separate steering command, since the host can just watch tx packets as they go.
- Moved the RX and TX steering sections near each other.
- Added motivation for the other changes in v2.

Changes from Jason's rfc:
- Reserved vq 3: this makes all rx vqs even and tx vqs odd, which looks nicer to me.
- Documented packet steering and added a generalized steering programming command. Current modes are single queue and host-driven multiqueue, but I envision support for guest-driven multiqueue in the future.
- Made the default vqs unused when in mq mode: this wastes some memory, but makes it more efficient to switch between modes, as we can avoid this causing packet reordering.

Rusty, could you please take a look and comment soon? If this looks OK to everyone, we can proceed with finalizing the implementation. Would be nice to try and put it in 3.8.
 virtio-spec.lyx | 310 ++--
 1 file changed, 301 insertions(+), 9 deletions(-)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 83f2771..119925c 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -59,6 +59,7 @@
 \author -608949062 Rusty Russell,,,
 \author -385801441 Cornelia Huck cornelia.h...@de.ibm.com
 \author 1531152142 Paolo Bonzini,,,
+\author 1986246365 Michael S. Tsirkin
 \end_header

 \begin_body
@@ -4170,9 +4171,46 @@ ID 1
 \end_layout

 \begin_layout Description
-Virtqueues 0:receiveq.
- 1:transmitq.
- 2:controlq
+Virtqueues 0:receiveq
+\change_inserted 1986246365 1352742829
+0
+\change_unchanged
+.
+ 1:transmitq
+\change_inserted 1986246365 1352742832
+0
+\change_deleted 1986246365 1352742947
+.
+
+\change_inserted 1986246365 1352742952
+.
+
+ 2N
+\begin_inset Foot
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1354531595
+N=0 if VIRTIO_NET_F_RFS is not negotiated, otherwise N is derived from
+\emph on
+max_virtqueue_pairs
+\emph default
+ control
+\emph on
+
+\emph default
+field.
+
+\end_layout
+
+\end_inset
+
+: receivqN.
+ 2N+1: transmitqN.
+ 2N+
+\change_unchanged
+2:controlq
 \begin_inset Foot
 status open
@@ -4343,6 +4381,16 @@ VIRTIO_NET_F_CTRL_VLAN
 \begin_layout Description
 VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous packets.
+\change_inserted 1986246365 1352742767
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1986246365 1352742808
+VIRTIO_NET_F_RFS(22) Device supports Receive Flow Steering.
+\change_unchanged
+
 \end_layout

 \end_deeper
@@ -4355,11 +4403,45 @@ configuration
 \begin_inset space ~
 \end_inset

-layout Two configuration fields are currently defined.
+layout
+\change_deleted 1986246365 1352743300
+Two
+\change_inserted 1986246365 1354531413
+Three
+\change_unchanged
+ configuration fields are currently defined.
  The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC
  is set), and the status field only exists if VIRTIO_NET_F_STATUS is set.
 Two read-only bits are currently defined for the status field: VIRTIO_NET_S_LIN
 K_UP and VIRTIO_NET_S_ANNOUNCE.
+
+\change_inserted 1986246365 1354531470
+ The following read-only field,
+\emph on
+max_virtqueue_pairs
+\emph default
+ only exists if VIRTIO_NET_F_RFS is set.
+ This field specifies the maximum number of each of transmit and receive
+ virtqueues (receiveq0..receiveq
+\emph on
+N
+\emph default
+ and transmitq0..transmitq
+\emph on
+N
+\emph default
+ respectively;
+\emph on
+N
+\emph default
+=
+\emph on
+max_virtqueue_pairs - 1
+\emph default
+) that can be configured once VIRTIO_NET_F_RFS is negotiated.
+ Legal values for this field are 1 to 8000h.
+
+\change_unchanged

 \begin_inset listings
 inline false
@@ -4392,6 +4474,17 @@ struct virtio_net_config {
 \begin_layout Plain Layout

 	u16 status;
+\change_inserted 1986246365 1354531427
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1354531437
+
+u16 max_virtqueue_pairs;
+\change_unchanged
+
 \end_layout

 \begin_layout Plain Layout
@@ -4410,7 +4503,24 @@ Device Initialization
Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
On Mon, Dec 03, 2012 at 06:30:49PM +0800, Jason Wang wrote:
> On 12/03/2012 06:14 PM, Michael S. Tsirkin wrote:
> > Also, crashing when the device nacks the command is not nice. In this
> > case it seems we can just switch to single-queue mode, which should
> > always be safe.
>
> Not sure it's safe; it depends on the reason why the call fails. If we
> are left in a state where the driver only uses a single queue but the
> device uses multiple queues, we may still lose the network.

Not the way the driver is currently written -- you'll happily process incoming packets from all queues, so no problem?

--
MST
Re: [PATCH v4 09/14] KVM: ARM: Emulation framework and CP15 emulation
On Fri, Nov 30, 2012 at 08:22:28PM +0000, Christoffer Dall wrote:
> On Mon, Nov 19, 2012 at 10:01 AM, Will Deacon will.dea...@arm.com wrote:
> > On Sat, Nov 10, 2012 at 03:43:13PM +0000, Christoffer Dall wrote:
> > This should probably be 0xff and also use the macros that Lorenzo is
> > introducing:
> > http://lists.infradead.org/pipermail/linux-arm-kernel/2012-November/132977.html
>
> in the A15 TRM bits [7:2] are reserved, so we really only care about
> bits [1:0], and this file is A15 specific, but if you prefer, I can
> merge this:

The upper bits are RAZ, so yes, please do merge something like this.

> diff --git a/arch/arm/kvm/coproc_a15.c b/arch/arm/kvm/coproc_a15.c
> index 55cb4a3..685063a 100644
> --- a/arch/arm/kvm/coproc_a15.c
> +++ b/arch/arm/kvm/coproc_a15.c
> @@ -24,8 +24,6 @@
>  #include <asm/kvm_coproc.h>
>  #include <linux/init.h>
>
> -#define MPIDR_CPUID	0x3
> -
>  static void reset_mpidr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
>  {
>  	/*
> @@ -35,8 +33,8 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
>  	 * revealing the underlying hardware properties is likely to
>  	 * be the best choice).
>  	 */
> -	vcpu->arch.cp15[c0_MPIDR] = (read_cpuid_mpidr() & ~MPIDR_CPUID)
> -				    | (vcpu->vcpu_id & MPIDR_CPUID);
> +	vcpu->arch.cp15[c0_MPIDR] = (read_cpuid_mpidr() & ~MPIDR_LEVEL_MASK)
> +				    | (vcpu->vcpu_id & MPIDR_LEVEL_MASK);
>  }
>
>  #include "coproc.h"

Will
Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
On Mon, Dec 03, 2012 at 06:01:58PM +0800, Jason Wang wrote:
> On 12/03/2012 05:47 PM, Michael S. Tsirkin wrote:
> > [...]
> > Legacy guest support should be handled by the host using feature bits
> > in the usual way: the host should detect a legacy guest by checking
> > the VIRTIO_NET_F_RFS feature. If VIRTIO_NET_F_RFS is acked, the cvq is
> > vq max_virtqueue_pairs * 2. If it's not acked, the cvq is vq 2.
>
> We could, but we didn't gain much from this.

It just seems cleaner and easier to understand.

> Furthermore, we need also do the dynamic creation/destroying of
> virtqueues during feature negotiation which seems not supported in qemu
> now.

It's not *done* in qemu now, but it seems easy: just call virtio_add_queue for vq 2 and on from virtio_net_set_features. As features can be modified multiple times, we should add virtio_del_queue and call that beforehand to get to a known state (two vqs).
--
MST
Re: [net-next rfc v7 1/3] virtio-net: separate fields of sending/receiving queue from virtnet_info
On Mon, Dec 03, 2012 at 01:15:01PM +0800, Jason Wang wrote:
> > > +
> > > +	/* Work struct for refilling if we run low on memory. */
> > > +	struct delayed_work refill;
> >
> > I can't really see the justification for a refill per queue. Just have
> > one work iterate all the queues if it happens, unless it happens often
> > (in which case, we need to look harder at this anyway).
>
> But during this kind of iteration, we may need to enable/disable the
> napi regardless of whether the receive queue has lots to be refilled.
> This may add extra latency.

We are running from the timer, so latency is not a concern I think.

--
MST
[RFC PATCH 1/1] KVM: ARM: add vgic state save and restore support
From: Dong Aisheng dong.aish...@linaro.org

Add vgic state save and restore via KVM_SET_IRQCHIP/KVM_GET_IRQCHIP.

Signed-off-by: Dong Aisheng dong.aish...@linaro.org
---
The patch mainly implements the vgic state save and restore functions. I'm not sure the current implementation method is the most proper way, so I'd like to send out the patch as an RFC on the direction and on the issues I met during the implementation.

There are mainly two ways to implement such a function, and each way appears to have its own issues.

1) First I tried using the ONE_REG interface, through which user space could read all the virtual gic dist registers and the gic virtual interface control registers for saving gic state. (The virtual cpu interface registers are not exported, since their state can be derived from the GICH_VMCR register, which already reflects the basic cpu interface state.) The way to do this could be to manually emulate MMIO access in the kernel so as to re-use the vgic_handle_mmio code, since the in-kernel vgic code does not maintain gic state in register format and we cannot simply memcpy the kernel gic registers to user space.

The problem with this approach is that emulating register access looks a bit inefficient, and via the current ONE_REG interface we do not know which CPU is performing the register access, so banked registers are not supported well, e.g. some banked dist registers and the banked cpu virtual interface control registers. So I'm not sure we should still follow this way.

2) Another way is, as on x86, to use the standard KVM_SET_IRQCHIP/KVM_GET_IRQCHIP ioctls to save and restore irq chip state, e.g. arch/x86/kvm/x86.c. This method does not have the banked-register issue, since it can all be handled by software, and it looks simpler than the first way. The catch is that the x86 kvm irqchip driver uses almost the same data structure as user space to represent irqchip state, which can then be easily copied to user space.
But the arm vgic driver uses a different data structure: e.g. the qemu gic common code uses gic_state, while the in-kernel vgic code uses a number of vgic_bitmaps. We could either read all the in-kernel vgic_bitmaps from user space and manually convert them to gic_state (so that user space can re-use the existing common state-handling code in qemu), or convert to the gic_state format in the kernel and then export that to user space. But neither way looks good, and both look inefficient, so I cannot see enough reason to do so.

The current implementation in this patch does not do any conversion between the two state formats; it simply exports what the kernel already uses to represent gic state to user space via KVM_GET_IRQCHIP. That means we do not use much of the common state-handling code in arm_gic_common.c in qemu. However, compared to the first method, the disadvantage of the second method is its side effect: once the kernel vgic state code changes, the user space code may also need to change accordingly, since user space uses the kernel state definition.

Another known issue in the current patch: VGIC_MAX_CPUS, which defines the size of the CPU data in the dist state and is used in the header file arch/arm/include/asm/kvm.h (which will also be used by qemu), is dynamically defined in the kernel. It looks like we should not export it to user space, and we need to find a way to fix that. Besides that, there are also some header file cross-inclusion issues that need to be resolved.

I'd like to know your suggestions on which way is better, and your comments on the issues I mentioned above.
---
 Documentation/virtual/kvm/api.txt |    6 +-
 arch/arm/include/asm/kvm.h        |   11 +++
 arch/arm/include/asm/kvm_vgic.h   |   47 +
 arch/arm/kernel/asm-offsets.c     |   16 ++--
 arch/arm/kvm/arm.c                |   83 ++-
 arch/arm/kvm/interrupts_head.S    |    6 +-
 arch/arm/kvm/vgic.c               |  134 ++--
 include/linux/kvm.h               |    3 +
 include/linux/kvm_host.h          |    3 +-
 9 files changed, 213 insertions(+), 96 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 7695425..e736689 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -654,7 +654,7 @@ struct kvm_irq_level {
 4.26 KVM_GET_IRQCHIP

 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, arm
 Type: vm ioctl
 Parameters: struct kvm_irqchip (in/out)
 Returns: 0 on success, -1 on error
@@ -669,6 +669,7 @@ struct kvm_irqchip {
 		char dummy[512];  /* reserving space */
 		struct kvm_pic_state pic;
 		struct kvm_ioapic_state ioapic;
+		struct kvm_vgic_state vgic;
 	} chip;
 };

@@ -676,7 +677,7 @@ struct kvm_irqchip {
 4.27 KVM_SET_IRQCHIP

 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, arm
 Type: vm ioctl
 Parameters: struct kvm_irqchip (in)
 Returns: 0 on success, -1 on error
@@ -691,6 +692,7 @@
Re: [net-next rfc v7 3/3] virtio-net: change the number of queues through ethtool
On Mon, Dec 03, 2012 at 02:09:28PM +0800, Jason Wang wrote: On Sunday, December 02, 2012 06:09:06 PM Michael S. Tsirkin wrote: On Tue, Nov 27, 2012 at 06:16:00PM +0800, Jason Wang wrote: This patch implements the {set|get}_channels methods of ethtool to allow the user to change the number of queues dynamically when the device is running. This lets the user configure it on demand. Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/net/virtio_net.c | 41 + 1 files changed, 41 insertions(+), 0 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index bcaa6e5..f08ec2a 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1578,10 +1578,51 @@ static struct virtio_driver virtio_net_driver = { #endif }; +/* TODO: Eliminate OOO packets during switching */ +static int virtnet_set_channels(struct net_device *dev, + struct ethtool_channels *channels) +{ + struct virtnet_info *vi = netdev_priv(dev); + u16 queue_pairs = channels->combined_count; By the way, shouldn't this be combined_count / 2? And below, channels->max_combined = vi->max_queue_pairs * 2; ? + + /* We don't support separate rx/tx channels. + * We don't allow setting 'other' channels. + */ + if (channels->rx_count || channels->tx_count || channels->other_count) + return -EINVAL; + + /* Only two modes are supported currently */ + if (queue_pairs != vi->max_queue_pairs && queue_pairs != 1) + return -EINVAL; + Why the limitation? Not sure a value between 1 and max_queue_pairs is useful. But anyway, I can remove this limitation. I guess a reasonable value would be the number of active CPUs, which can change with CPU hotplug. Also, how does userspace discover what the legal values are? Userspace only checks whether the value is greater than max_queue_pairs. Exactly. One can query max combined, but the fact that configuring any less is impossible might be surprising. 
+ vi->curr_queue_pairs = queue_pairs; + BUG_ON(virtnet_set_queues(vi)); + + netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs); + netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs); + + return 0; +} + +static void virtnet_get_channels(struct net_device *dev, + struct ethtool_channels *channels) +{ + struct virtnet_info *vi = netdev_priv(dev); + + channels->combined_count = vi->curr_queue_pairs; + channels->max_combined = vi->max_queue_pairs; + channels->max_other = 0; + channels->rx_count = 0; + channels->tx_count = 0; + channels->other_count = 0; +} + static const struct ethtool_ops virtnet_ethtool_ops = { .get_drvinfo = virtnet_get_drvinfo, .get_link = ethtool_op_get_link, .get_ringparam = virtnet_get_ringparam, + .set_channels = virtnet_set_channels, + .get_channels = virtnet_get_channels, }; static int __init init(void) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
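For reference, the semantics being debated above (combined-only channels, plus the 1-or-max limitation Jason agrees to drop) can be summarized in a small user-space sketch; all names and return values here are illustrative, not the actual driver code:

```c
#include <stdint.h>

/* Illustrative sketch of the validation rules discussed above -- not the
 * actual virtio_net code.  virtio-net exposes only "combined" channels,
 * so the rx/tx/other counts must be zero; with the limitation under
 * debate, only 1 or the device maximum is accepted for combined_count. */
struct channels_req {
	uint32_t rx_count;
	uint32_t tx_count;
	uint32_t other_count;
	uint32_t combined_count;
};

static int check_channels(const struct channels_req *req,
			  uint16_t max_queue_pairs)
{
	/* separate rx/tx channels and 'other' channels are unsupported */
	if (req->rx_count || req->tx_count || req->other_count)
		return -1;
	/* only the two extremes are accepted in this version of the patch */
	if (req->combined_count != 1 && req->combined_count != max_queue_pairs)
		return -1;
	return 0;
}
```

From the shell, the corresponding knobs are queried with `ethtool -l <dev>` and set with `ethtool -L <dev> combined <n>`.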
Re: [PATCH v3 3/4] x86, apicv: add virtual interrupt delivery support
On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: Virtual interrupt delivery avoids the need for KVM to inject vAPIC interrupts manually; that is fully taken care of by the hardware. This needs some special awareness in the existing interrupt injection path: - for a pending interrupt, instead of direct injection, we may need to update architecture-specific indicators before resuming to the guest. - A pending interrupt which is masked by the ISR should also be considered in the above update action, since hardware will decide when to inject it at the right time. The current has_interrupt and get_interrupt only return a valid vector from the injection p.o.v. Most of my previous comments still apply. Signed-off-by: Yang Zhang yang.z.zh...@intel.com Signed-off-by: Kevin Tian kevin.t...@intel.com --- arch/x86/include/asm/kvm_host.h |4 + arch/x86/include/asm/vmx.h | 11 +++ arch/x86/kvm/irq.c | 53 ++- arch/x86/kvm/lapic.c| 56 +--- arch/x86/kvm/lapic.h|6 ++ arch/x86/kvm/svm.c | 19 + arch/x86/kvm/vmx.c | 140 ++- arch/x86/kvm/x86.c | 34 -- virt/kvm/ioapic.c |1 + 9 files changed, 291 insertions(+), 33 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index dc87b65..e5352c8 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -697,6 +697,10 @@ struct kvm_x86_ops { void (*enable_nmi_window)(struct kvm_vcpu *vcpu); void (*enable_irq_window)(struct kvm_vcpu *vcpu); void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); + void (*update_irq)(struct kvm_vcpu *vcpu); + void (*set_eoi_exitmap)(struct kvm_vcpu *vcpu, int vector, + int trig_mode, int always_set); int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); int (*get_tdp_level)(void); u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 21101b6..1003341 100644 --- a/arch/x86/include/asm/vmx.h +++ 
b/arch/x86/include/asm/vmx.h @@ -62,6 +62,7 @@ #define EXIT_REASON_MCE_DURING_VMENTRY 41 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43 #define EXIT_REASON_APIC_ACCESS 44 +#define EXIT_REASON_EOI_INDUCED 45 #define EXIT_REASON_EPT_VIOLATION 48 #define EXIT_REASON_EPT_MISCONFIG 49 #define EXIT_REASON_WBINVD 54 @@ -143,6 +144,7 @@ #define SECONDARY_EXEC_WBINVD_EXITING 0x0040 #define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x0080 #define SECONDARY_EXEC_APIC_REGISTER_VIRT 0x0100 +#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x0200 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x0400 #define SECONDARY_EXEC_ENABLE_INVPCID 0x1000 @@ -180,6 +182,7 @@ enum vmcs_field { GUEST_GS_SELECTOR = 0x080a, GUEST_LDTR_SELECTOR = 0x080c, GUEST_TR_SELECTOR = 0x080e, + GUEST_INTR_STATUS = 0x0810, HOST_ES_SELECTOR = 0x0c00, HOST_CS_SELECTOR = 0x0c02, HOST_SS_SELECTOR = 0x0c04, @@ -207,6 +210,14 @@ enum vmcs_field { APIC_ACCESS_ADDR_HIGH = 0x2015, EPT_POINTER = 0x201a, EPT_POINTER_HIGH = 0x201b, + EOI_EXIT_BITMAP0 = 0x201c, + EOI_EXIT_BITMAP0_HIGH = 0x201d, + EOI_EXIT_BITMAP1 = 0x201e, + EOI_EXIT_BITMAP1_HIGH = 0x201f, + EOI_EXIT_BITMAP2 = 0x2020, + EOI_EXIT_BITMAP2_HIGH = 0x2021, + EOI_EXIT_BITMAP3 = 0x2022, + EOI_EXIT_BITMAP3_HIGH = 0x2023, GUEST_PHYSICAL_ADDRESS = 0x2400, GUEST_PHYSICAL_ADDRESS_HIGH = 0x2401, VMCS_LINK_POINTER = 0x2800, diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index 7e06ba1..f782788 100644 --- a/arch/x86/kvm/irq.c +++ b/arch/x86/kvm/irq.c @@ -43,45 +43,64 @@ EXPORT_SYMBOL(kvm_cpu_has_pending_timer); */ int kvm_cpu_has_interrupt(struct kvm_vcpu *v) { - struct kvm_pic *s; - if (!irqchip_in_kernel(v->kvm)) return v->arch.interrupt.pending; - if (kvm_apic_has_interrupt(v) == -1) { /* LAPIC */ - if (kvm_apic_accept_pic_intr(v)) { - s = pic_irqchip(v->kvm); /* PIC */ - return s->output; - } else - return 0; - } + if (kvm_apic_has_interrupt(v) == -1) /* LAPIC */ + return
Re: [RFC PATCH 1/1] KVM: ARM: add vgic state save and restore support
On 3 December 2012 10:36, Dong Aisheng b29...@freescale.com wrote: The patch is mainly for implementing the vgic state save and restore function. I'm not sure the current implementation method is the most proper way, so I'd like to send out the patch for RFC on the direction and on the issues I met during the implementation. There are mainly two ways to implement such a function, and the two ways have different issues. 1) Firstly I tried using the ONE_REG interface, so that user space could read all the virtual GIC distributor registers and the GIC virtual interface control registers to save the GIC state. (The virtual CPU interface registers are not exported, since their state can be derived from the GICH_VMCR register, which already reflects the basic CPU interface state.) The way to do this could be to manually emulate MMIO accesses in the kernel to reuse the vgic_handle_mmio code, since the vgic code in the kernel does not maintain the GIC state in register format and we cannot simply memcpy the kernel GIC registers to user space. For this way, the problem is that it looks a bit inefficient to emulate register accesses Efficiency is really not a significant concern here, because state save and restore only happens when doing VM migration, which is already a slow process (mostly dominated by the costs of moving the VM memory from one machine to another). and via the current ONE_REG interface we do not know which CPU is performing the register access, so banked registers are not supported well, Actually you do, because it's a vcpu ioctl. That does raise a different question, though. ONE_REG is currently aimed as a vcpu ioctl for CPU state save/restore -- how does it need to change to handle device state save/restore where the device is not per-cpu? e.g. some banked dist registers and banked cpu virtual interfaces control registers. So I'm not sure we should still follow this way. You can handle banked registers the same way we handle the cache ID registers in the main CPU. 
In general ONE_REG is not supposed to expose direct access to registers which behave like the hardware registers: it is supposed to expose access to registers which merely store state, ie which can be idempotently saved and loaded. Sometimes this means the kernel has to invent its own registers which don't have side effects. 2) Another way is, as on x86, to use the standard KVM_SET_IRQCHIP/KVM_GET_IRQCHIP ioctls to save and restore the irq chip state, e.g. arch/x86/kvm/x86.c. This method does not have the banked-registers issue, since it can all be handled in software, and it looks simpler than the first way. The problem is that the x86 kvm irqchip driver uses almost the same data structure as user space to represent irqchip state, which can then easily be copied to user space. But the ARM vgic driver uses a different data structure: the QEMU GIC common code keeps a gic_state, while the in-kernel vgic code keeps a number of vgic_bitmaps. We could either read all the in-kernel vgic_bitmaps from user space and manually convert them to gic_state (to reuse the existing common state-handling code in QEMU), or convert to the gic_state format in the kernel and then export that to user space, but neither way looks good, and both look inefficient, so I cannot see enough reason to do so. Somebody needs to do this conversion, or TCG-to-KVM migrations won't work. I don't have a strong opinion whether it ought to be userspace or kernel; I think the deciding factor is probably going to be which is the easiest for the kernel to support into the future as a stable ABI (even as GIC versions change etc). The current implementation in this patch does not do any conversion between the two different state formats; it simply exports what we have in the kernel to represent the GIC state to user space via KVM_GET_IRQCHIP, which means we do not reuse much of the common state-handling code in arm_gic_common.c in QEMU. This is definitely the wrong approach, I'm afraid. 
I still strongly feel we should be using the standard ONE_REG interfaces here (though there is perhaps an argument to be made against this, along the lines of my remarks above about it being a vcpu ioctl). However, compared to the first method, the disadvantage of the second method is its side effect: once the kernel vgic state code changes, the user space code may also need to change accordingly, since user space is using the kernel's state definition. Any approach that doesn't maintain binary compatibility across kernel versions is definitely not going to work. In particular you have to be able to accept state that may have been saved out by a different older kernel version (for doing VM migrations between host machines with different kernels installed). By far the largest part of the save/restore work here is figuring out what the right state to expose to userspace is so we can retain that compatibility guarantee. Whether we then use ONE_REG or a single struct as
Re: KVM VMX: register state after reset violates spec
Thus spake Gleb Natapov g...@redhat.com: On Thu, Nov 29, 2012 at 03:07:38PM +0100, Julian Stecklina wrote: Hello, we have noticed that at least on 3.6.8 with VMX, after a VCPU has been reset via the INIT-SIPI-SIPI sequence, its register state violates Intel's specification. [...] Shouldn't vmx_vcpu_reset actively clear those registers? And from a quick glance at the SVM code, the problem might exist there, too. It should, so why not move the fix to kvm_vcpu_reset() so it will work for both. Also, what about R8-R15? The Intel SDM says nothing about them in the section you mention, but in Volume 1, section 3.4.1.1, it says: [...] I take it that they are undefined on the first transition to 64-bit mode too. The AMD spec says that they should be zeroed on reset, so let's do that. Also, SVM does not set EDX to the correct value on reset. I'll post a revised patch later today. Julian 
Re: [PATCH v4 13/14] KVM: ARM: Handle guest faults in KVM
On Fri, Nov 30, 2012 at 09:40:37PM +, Christoffer Dall wrote: On Mon, Nov 19, 2012 at 10:07 AM, Will Deacon will.dea...@arm.com wrote: Why are PIPT caches affected by this? The virtual address is irrelevant. The comment is slightly misleading, and I'll update it. Just so we're clear, this is the culprit: 1. guest uses page X, containing instruction A 2. page X gets swapped out 3. host uses page X, containing instruction B 4. instruction B enters i-cache at page X's cache line 5. page X gets swapped out 6. guest swaps page X back in 7. guest executes instruction B from cache, should execute instruction A Ok, that's clearer. Thanks for the explanation. The point is that with PIPT we can flush only that page from the icache using the host virtual address, as the MMU will do the translation on the fly. In the VIPT we have to nuke the whole thing (unless we . Unless we what? Could we flush using the host VA + all virtual aliases instead? +* (or another VM) may have used this page at the same virtual address +* as this guest, and we read incorrect data from the icache. If +* we're using a PIPT cache, we can invalidate just that page, but if +* we are using a VIPT cache we need to invalidate the entire icache - +* damn shame - as written in the ARM ARM (DDI 0406C - Page B3-1384) +*/ + if (icache_is_pipt()) { + unsigned long hva = gfn_to_hva(kvm, gfn); + __cpuc_coherent_user_range(hva, hva + PAGE_SIZE); + } else if (!icache_is_vivt_asid_tagged()) { + /* any kind of VIPT cache */ + __flush_icache_all(); + } so what if it *is* vivt_asid_tagged? Surely that necessitates nuking the thing, unless it's VMID tagged as well (does that even exist?). see page B3-1392 in the ARM ARM, if it's vivt_asid_tagged it is also vmid tagged. Great. 
+ write_fault = false; + else + write_fault = true; + + if (fault_status == FSC_PERM && !write_fault) { + kvm_err("Unexpected L2 read permission error\n"); + return -EFAULT; + } + + /* We need minimum second+third level pages */ + ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS); + if (ret) + return ret; + + mmu_seq = vcpu->kvm->mmu_notifier_seq; + smp_rmb(); What's this barrier for, and why isn't there a write barrier paired with it? The read barrier is to ensure that mmu_notifier_seq is read before we call gfn_to_pfn_prot (which is essentially get_user_pages), so that we don't end up with a page which was unmapped by an MMU notifier before we grab the spinlock, and whose unmap we would never see. I also added a comment explaining it in the patch below. There is a write barrier paired with it, see virt/kvm/kvm_main.c, specifically kvm_mmu_notifier_invalidate_page (the spin_unlock), and kvm_mmu_notifier_invalidate_range_end. Ok, so it sounds like a homebrew seqlock implementation. Any reason KVM isn't just using those directly? Will 
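The mmu_notifier_seq dance Will and Christoffer discuss above is essentially a hand-rolled sequence counter: sample the counter, do the lock-free lookup, then check under the lock whether an invalidation slipped in between. A minimal single-threaded sketch of the idea, with illustrative names rather than KVM's actual helpers (and without the real barriers and locking):

```c
/* Illustrative sketch of the mmu_notifier_seq retry pattern -- not KVM's
 * real code.  A reader snapshots the counter before its page lookup; the
 * invalidate side bumps it; a changed counter tells the reader that the
 * page it looked up may have been unmapped, so it must retry. */
static unsigned long mmu_seq;	/* stands in for kvm->mmu_notifier_seq */

static void invalidate(void)
{
	/* in KVM this increment is ordered by mmu_lock and write barriers */
	mmu_seq++;
}

static int lookup_is_stale(unsigned long snapshot)
{
	/* in the real code this check runs under the lock, paired with
	 * the smp_rmb() after the snapshot, before installing the pte */
	return mmu_seq != snapshot;
}
```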
[Bug 50891] The smp_affinity cannot work correctly on guest os when PCI passthrough device using msi/msi-x with KVM
https://bugzilla.kernel.org/show_bug.cgi?id=50891 liyi yiliker...@gmail.com changed: Status: NEW -> RESOLVED; Resolution: INVALID --- Comment #14 from liyi yiliker...@gmail.com 2012-12-03 13:18:21 --- Sorry to disturb all of you. This issue is fixed in the latest qemu-kvm version. 
Re: [RFC PATCH 1/1] KVM: ARM: add vgic state save and restore support
On 3 December 2012 12:02, Peter Maydell peter.mayd...@linaro.org wrote: By far the largest part of the save/restore work here is figuring out what the right state to expose to userspace is so we can retain that compatibility guarantee. Some further thoughts on this... What we're really providing the guest here is a hardware-accelerated software emulation of a no-virtualization GICv2. So it's better for the state we expose to userspace to be the state of that emulated hardware, and not to expose details of exactly what that hardware acceleration is. The hw accel is a property of the host CPU, not of the guest VM environment, so it could be different on different kernels or host CPUs, which would make migration potentially tedious. For instance, consider migration where the source machine has a VGIC with more list registers than the destination machine. Obviously you could convert between different representations in a number of places (source kernel, source qemu, destination qemu, destination kernel), but I think it makes sense for the canonical representation to be the guest-visible representation of the emulated hardware, rather than privileging any one acceleration implementation above another. -- PMM 
Re: [PATCH v4 07/13] ARM: KVM: VGIC virtual CPU interface management
Hi Marc, I've managed to look at some more of the vgic code, so here is some more feedback. I've still not got to the end of the series, but there's light at the end of the tunnel... On Sat, Nov 10, 2012 at 03:45:05PM +, Christoffer Dall wrote: From: Marc Zyngier marc.zyng...@arm.com Add VGIC virtual CPU interface code, picking pending interrupts from the distributor and stashing them in the VGIC control interface list registers. Signed-off-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/include/asm/kvm_vgic.h | 41 +++ arch/arm/kvm/vgic.c | 226 +++ 2 files changed, 266 insertions(+), 1 deletion(-) diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h index 9e60b1d..7229324 100644 --- a/arch/arm/include/asm/kvm_vgic.h +++ b/arch/arm/include/asm/kvm_vgic.h @@ -193,8 +193,45 @@ struct vgic_dist { }; struct vgic_cpu { +#ifdef CONFIG_KVM_ARM_VGIC + /* per IRQ to LR mapping */ + u8 vgic_irq_lr_map[VGIC_NR_IRQS]; per IRQ? + + /* Pending interrupts on this VCPU */ + DECLARE_BITMAP( pending, VGIC_NR_IRQS); + + /* Bitmap of used/free list registers */ + DECLARE_BITMAP( lr_used, 64); + + /* Number of list registers on this CPU */ + int nr_lr; + + /* CPU vif control registers for world switch */ + u32 vgic_hcr; + u32 vgic_vmcr; + u32 vgic_misr; /* Saved only */ + u32 vgic_eisr[2]; /* Saved only */ + u32 vgic_elrsr[2]; /* Saved only */ + u32 vgic_apr; + u32 vgic_lr[64];/* A15 has only 4... */ +#endif }; Looks like we should have a #define for the maximum number of list registers, so we keep vgic_lr and lr_user in sync. 
+#define VGIC_HCR_EN (1 << 0) +#define VGIC_HCR_UIE (1 << 1) + +#define VGIC_LR_VIRTUALID (0x3ff << 0) +#define VGIC_LR_PHYSID_CPUID (7 << 10) +#define VGIC_LR_STATE (3 << 28) +#define VGIC_LR_PENDING_BIT (1 << 28) +#define VGIC_LR_ACTIVE_BIT (1 << 29) +#define VGIC_LR_EOI (1 << 19) + +#define VGIC_MISR_EOI (1 << 0) +#define VGIC_MISR_U (1 << 1) + +#define LR_EMPTY 0xff + Could stick these in asm/hardware/gic.h. I know they're not used by the gic driver, but they're the same piece of architecture so it's probably worth keeping in one place. You'd probably also want a s/VGIC/GICH/ struct kvm; struct kvm_vcpu; struct kvm_run; @@ -202,9 +239,13 @@ struct kvm_exit_mmio; #ifdef CONFIG_KVM_ARM_VGIC int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr); +void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu); +void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu); +int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu); bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, struct kvm_exit_mmio *mmio); +#define irqchip_in_kernel(k) (!!((k)->arch.vgic.vctrl_base)) #else static inline int kvm_vgic_hyp_init(void) { diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c index 82feee8..d7cdec5 100644 --- a/arch/arm/kvm/vgic.c +++ b/arch/arm/kvm/vgic.c @@ -587,7 +587,25 @@ static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, u32 reg) static int compute_pending_for_cpu(struct kvm_vcpu *vcpu) { - return 0; + struct vgic_dist *dist = &vcpu->kvm->arch.vgic; + unsigned long *pending, *enabled, *pend; + int vcpu_id; + + vcpu_id = vcpu->vcpu_id; + pend = vcpu->arch.vgic_cpu.pending; + + pending = vgic_bitmap_get_cpu_map(&dist->irq_state, vcpu_id); + enabled = vgic_bitmap_get_cpu_map(&dist->irq_enabled, vcpu_id); + bitmap_and(pend, pending, enabled, 32); pend and pending! vcpu_pending and dist_pending? 
+ + pending = vgic_bitmap_get_shared_map(&dist->irq_state); + enabled = vgic_bitmap_get_shared_map(&dist->irq_enabled); + bitmap_and(pend + 1, pending, enabled, VGIC_NR_SHARED_IRQS); + bitmap_and(pend + 1, pend + 1, + vgic_bitmap_get_shared_map(&dist->irq_spi_target[vcpu_id]), + VGIC_NR_SHARED_IRQS); + + return (find_first_bit(pend, VGIC_NR_IRQS) < VGIC_NR_IRQS); } /* @@ -613,6 +631,212 @@ static void vgic_update_state(struct kvm *kvm) } } +#define LR_PHYSID(lr) (((lr) & VGIC_LR_PHYSID_CPUID) >> 10) Is VGIC_LR_PHYSID_CPUID wide enough for this? The CPUID is only 3 bits, but the interrupt ID could be larger. Or do you not support hardware interrupt forwarding? (in which case, LR_PHYSID is a misleading name). +#define MK_LR_PEND(src, irq) (VGIC_LR_PENDING_BIT | ((src) << 10) | (irq)) +/* + * Queue an interrupt to a CPU virtual interface. Return true on success, + * or false if it wasn't possible
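With the shift and mask operators restored, the two list-register macros above encode and decode the GICv2 GICH_LR layout (virtual interrupt ID in bits [9:0], source CPU ID in bits [12:10], pending state in bit 28). A standalone restatement for clarity -- these mirror the patch's macros, not replace them:

```c
/* The list-register helpers from the patch, restated with their
 * operators intact.  Field layout follows the GICv2 GICH_LR register:
 * virtual interrupt ID in [9:0], source CPU ID in [12:10], pending
 * state in bit 28. */
#define VGIC_LR_VIRTUALID	(0x3ff << 0)
#define VGIC_LR_PHYSID_CPUID	(7 << 10)
#define VGIC_LR_PENDING_BIT	(1 << 28)

/* extract the source CPU ID from a list-register value */
#define LR_PHYSID(lr)		(((lr) & VGIC_LR_PHYSID_CPUID) >> 10)
/* build a pending list-register value for SGI source 'src', irq 'irq' */
#define MK_LR_PEND(src, irq)	(VGIC_LR_PENDING_BIT | ((src) << 10) | (irq))
```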
Re: [PATCH v4 08/13] ARM: KVM: vgic: retire queued, disabled interrupts
On Sat, Nov 10, 2012 at 03:45:11PM +, Christoffer Dall wrote: From: Marc Zyngier marc.zyng...@arm.com An interrupt may have been disabled after being made pending on the CPU interface (the classic case is a timer running while we're rebooting the guest - the interrupt would kick as soon as the CPU interface gets enabled, with deadly consequences). The solution is to examine already active LRs, and check the interrupt is still enabled. If not, just retire it. Signed-off-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/kvm/vgic.c | 30 ++ 1 file changed, 30 insertions(+) diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c index d7cdec5..dda5623 100644 --- a/arch/arm/kvm/vgic.c +++ b/arch/arm/kvm/vgic.c @@ -633,6 +633,34 @@ static void vgic_update_state(struct kvm *kvm) #define LR_PHYSID(lr) (((lr) & VGIC_LR_PHYSID_CPUID) >> 10) #define MK_LR_PEND(src, irq) (VGIC_LR_PENDING_BIT | ((src) << 10) | (irq)) + +/* + * An interrupt may have been disabled after being made pending on the + * CPU interface (the classic case is a timer running while we're + * rebooting the guest - the interrupt would kick as soon as the CPU + * interface gets enabled, with deadly consequences). + * + * The solution is to examine already active LRs, and check the + * interrupt is still enabled. If not, just retire it. + */ +static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu) +{ + struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; + struct vgic_dist *dist = &vcpu->kvm->arch.vgic; + int lr; + + for_each_set_bit(lr, vgic_cpu->lr_used, vgic_cpu->nr_lr) { + int irq = vgic_cpu->vgic_lr[lr] & VGIC_LR_VIRTUALID; + + if (!vgic_bitmap_get_irq_val(&dist->irq_enabled, + vcpu->vcpu_id, irq)) { + vgic_cpu->vgic_irq_lr_map[irq] = LR_EMPTY; + clear_bit(lr, vgic_cpu->lr_used); + vgic_cpu->vgic_lr[lr] &= ~VGIC_LR_STATE; + } + } +} + /* * Queue an interrupt to a CPU virtual interface. Return true on success, * or false if it wasn't possible to queue it. 
@@ -696,6 +724,8 @@ static void __kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu) vcpu_id = vcpu->vcpu_id; + vgic_retire_disabled_irqs(vcpu); Wouldn't it be better to do this when the interrupt is disabled, rather than do the checking in the sync_to_cpu path? Will 
Re: [PATCH v4 09/13] ARM: KVM: VGIC interrupt injection
On Sat, Nov 10, 2012 at 03:45:18PM +, Christoffer Dall wrote: From: Marc Zyngier marc.zyng...@arm.com Plug the interrupt injection code. Interrupts can now be generated from user space. Signed-off-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/include/asm/kvm_vgic.h |8 +++ arch/arm/kvm/arm.c | 29 + arch/arm/kvm/vgic.c | 90 +++ 3 files changed, 127 insertions(+) diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h index 7229324..6e3d303 100644 --- a/arch/arm/include/asm/kvm_vgic.h +++ b/arch/arm/include/asm/kvm_vgic.h @@ -241,6 +241,8 @@ struct kvm_exit_mmio; int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr); void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu); void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu); +int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num, + bool level); int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu); bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, struct kvm_exit_mmio *mmio); @@ -271,6 +273,12 @@ static inline void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu) {} static inline void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu) {} static inline void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu) {} +static inline int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, + const struct kvm_irq_level *irq) +{ + return 0; +} + static inline int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu) { return 0; diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 3ac1aab..f43da01 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -764,10 +764,31 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level) switch (irq_type) { case KVM_ARM_IRQ_TYPE_CPU: + if (irqchip_in_kernel(kvm)) + return -ENXIO; + if (irq_num KVM_ARM_IRQ_CPU_FIQ) return -EINVAL; return vcpu_interrupt_line(vcpu, irq_num, level); +#ifdef CONFIG_KVM_ARM_VGIC + case KVM_ARM_IRQ_TYPE_PPI: + if 
(!irqchip_in_kernel(kvm)) + return -ENXIO; + + if (irq_num < 16 || irq_num > 31) + return -EINVAL; It's our favourite two numbers again! :) + + return kvm_vgic_inject_irq(kvm, vcpu->vcpu_id, irq_num, level); + case KVM_ARM_IRQ_TYPE_SPI: + if (!irqchip_in_kernel(kvm)) + return -ENXIO; + + if (irq_num < 32 || irq_num > KVM_ARM_IRQ_GIC_MAX) + return -EINVAL; + + return kvm_vgic_inject_irq(kvm, 0, irq_num, level); +#endif } return -EINVAL; @@ -849,6 +870,14 @@ long kvm_arch_vm_ioctl(struct file *filp, void __user *argp = (void __user *)arg; switch (ioctl) { +#ifdef CONFIG_KVM_ARM_VGIC + case KVM_CREATE_IRQCHIP: { + if (vgic_present) + return kvm_vgic_create(kvm); + else + return -EINVAL; ENXIO? At least, that's what you use when setting the GIC addresses. + } +#endif case KVM_SET_DEVICE_ADDRESS: { struct kvm_device_address dev_addr; diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c index dda5623..70040bb 100644 --- a/arch/arm/kvm/vgic.c +++ b/arch/arm/kvm/vgic.c @@ -75,6 +75,7 @@ #define ACCESS_WRITE_MASK(x) ((x) & (3 << 1)) static void vgic_update_state(struct kvm *kvm); +static void vgic_kick_vcpus(struct kvm *kvm); static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, u32 reg); static inline int vgic_irq_is_edge(struct vgic_dist *dist, int irq) @@ -542,6 +543,9 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, struct kvm_exi kvm_prepare_mmio(run, mmio); kvm_handle_mmio_return(vcpu, run); + if (updated_state) + vgic_kick_vcpus(vcpu->kvm); + return true; } @@ -867,6 +871,92 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu) return test_bit(vcpu->vcpu_id, &dist->irq_pending_on_cpu); } +static void vgic_kick_vcpus(struct kvm *kvm) +{ + struct kvm_vcpu *vcpu; + int c; + + /* + * We've injected an interrupt, time to find out who deserves + * a good kick... 
+ */ + kvm_for_each_vcpu(c, vcpu, kvm) { + if (kvm_vgic_vcpu_pending_irq(vcpu)) + kvm_vcpu_kick(vcpu); + } +} + +static bool vgic_update_irq_state(struct kvm *kvm, int cpuid, + unsigned int irq_num, bool level) +{ + struct vgic_dist *dist = &kvm->arch.vgic; + struct kvm_vcpu *vcpu; + int is_edge, is_level, state; + int enabled; + bool
Re: [PATCH v4 10/13] ARM: KVM: VGIC control interface world switch
On Sat, Nov 10, 2012 at 03:45:25PM +, Christoffer Dall wrote: From: Marc Zyngier marc.zyng...@arm.com Enable the VGIC control interface to be save-restored on world switch. Signed-off-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/include/asm/kvm_arm.h | 12 +++ arch/arm/kernel/asm-offsets.c | 12 +++ arch/arm/kvm/interrupts_head.S | 68 3 files changed, 92 insertions(+) diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h index 246afd7..8f5dd22 100644 --- a/arch/arm/include/asm/kvm_arm.h +++ b/arch/arm/include/asm/kvm_arm.h @@ -192,4 +192,16 @@ #define HSR_EC_DABT (0x24) #define HSR_EC_DABT_HYP (0x25) +/* GICH offsets */ +#define GICH_HCR 0x0 +#define GICH_VTR 0x4 +#define GICH_VMCR0x8 +#define GICH_MISR0x10 +#define GICH_EISR0 0x20 +#define GICH_EISR1 0x24 +#define GICH_ELRSR0 0x30 +#define GICH_ELRSR1 0x34 +#define GICH_APR 0xf0 +#define GICH_LR0 0x100 Similar thing to the other new gic defines -- they're probably better off in gic.h + #endif /* __ARM_KVM_ARM_H__ */ diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c index 95cab37..39b6221 100644 --- a/arch/arm/kernel/asm-offsets.c +++ b/arch/arm/kernel/asm-offsets.c @@ -167,6 +167,18 @@ int main(void) DEFINE(VCPU_HxFAR, offsetof(struct kvm_vcpu, arch.hxfar)); DEFINE(VCPU_HPFAR, offsetof(struct kvm_vcpu, arch.hpfar)); DEFINE(VCPU_HYP_PC,offsetof(struct kvm_vcpu, arch.hyp_pc)); +#ifdef CONFIG_KVM_ARM_VGIC + DEFINE(VCPU_VGIC_CPU, offsetof(struct kvm_vcpu, arch.vgic_cpu)); + DEFINE(VGIC_CPU_HCR, offsetof(struct vgic_cpu, vgic_hcr)); + DEFINE(VGIC_CPU_VMCR, offsetof(struct vgic_cpu, vgic_vmcr)); + DEFINE(VGIC_CPU_MISR, offsetof(struct vgic_cpu, vgic_misr)); + DEFINE(VGIC_CPU_EISR, offsetof(struct vgic_cpu, vgic_eisr)); + DEFINE(VGIC_CPU_ELRSR, offsetof(struct vgic_cpu, vgic_elrsr)); + DEFINE(VGIC_CPU_APR, offsetof(struct vgic_cpu, vgic_apr)); + DEFINE(VGIC_CPU_LR,offsetof(struct vgic_cpu, vgic_lr)); + 
DEFINE(VGIC_CPU_NR_LR, offsetof(struct vgic_cpu, nr_lr)); + DEFINE(KVM_VGIC_VCTRL, offsetof(struct kvm, arch.vgic.vctrl_base)); +#endif DEFINE(KVM_VTTBR, offsetof(struct kvm, arch.vttbr)); #endif return 0; diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S index 2ac8b4a..c2423d8 100644 --- a/arch/arm/kvm/interrupts_head.S +++ b/arch/arm/kvm/interrupts_head.S @@ -341,6 +341,45 @@ * @vcpup: Register pointing to VCPU struct */ .macro save_vgic_state vcpup +#ifdef CONFIG_KVM_ARM_VGIC + /* Get VGIC VCTRL base into r2 */ + ldr r2, [\vcpup, #VCPU_KVM] + ldr r2, [r2, #KVM_VGIC_VCTRL] + cmp r2, #0 + beq 2f + + /* Compute the address of struct vgic_cpu */ + add r11, \vcpup, #VCPU_VGIC_CPU Given that we're dealing with constants, it would be more efficient to express this addition as part of the immediate offset and let gas spit out the final computed address for the stores below. + + /* Save all interesting registers */ + ldr r3, [r2, #GICH_HCR] + ldr r4, [r2, #GICH_VMCR] + ldr r5, [r2, #GICH_MISR] + ldr r6, [r2, #GICH_EISR0] + ldr r7, [r2, #GICH_EISR1] + ldr r8, [r2, #GICH_ELRSR0] + ldr r9, [r2, #GICH_ELRSR1] + ldr r10, [r2, #GICH_APR] + + str r3, [r11, #VGIC_CPU_HCR] + str r4, [r11, #VGIC_CPU_VMCR] + str r5, [r11, #VGIC_CPU_MISR] + str r6, [r11, #VGIC_CPU_EISR] + str r7, [r11, #(VGIC_CPU_EISR + 4)] + str r8, [r11, #VGIC_CPU_ELRSR] + str r9, [r11, #(VGIC_CPU_ELRSR + 4)] + str r10, [r11, #VGIC_CPU_APR] + + /* Save list registers */ + add r2, r2, #GICH_LR0 + add r3, r11, #VGIC_CPU_LR + ldr r4, [r11, #VGIC_CPU_NR_LR] +1: ldr r6, [r2], #4 + str r6, [r3], #4 + subs r4, r4, #1 + bne 1b +2: +#endif .endm Will 
Re: [PATCH v9 1/2] x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessary
On Tue, Nov 27, 2012 at 02:18:47PM +0200, Gleb Natapov wrote: Eric, can you ACK it? Eric, ping. On Tue, Nov 27, 2012 at 11:26:02AM +0800, Zhang Yanfei wrote: This patch provides a way to VMCLEAR VMCSs related to guests on all cpus before executing the VMXOFF when doing kdump. This is used to ensure the VMCSs in the vmcore updated and non-corrupted. Signed-off-by: Zhang Yanfei zhangyan...@cn.fujitsu.com --- arch/x86/include/asm/kexec.h |2 ++ arch/x86/kernel/crash.c | 25 + 2 files changed, 27 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 317ff17..28feeba 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -163,6 +163,8 @@ struct kimage_arch { }; #endif +extern void (*crash_vmclear_loaded_vmcss)(void); + #endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_KEXEC_H */ diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 13ad899..4a2a12f 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -16,6 +16,7 @@ #include linux/delay.h #include linux/elf.h #include linux/elfcore.h +#include linux/module.h #include asm/processor.h #include asm/hardirq.h @@ -29,6 +30,20 @@ #include asm/virtext.h int in_crash_kexec; + +/* + * This is used to VMCLEAR all VMCSs loaded on the + * processor. And when loading kvm_intel module, the + * callback function pointer will be assigned. + */ +void (*crash_vmclear_loaded_vmcss)(void) = NULL; +EXPORT_SYMBOL_GPL(crash_vmclear_loaded_vmcss); + +static inline void cpu_emergency_vmclear_loaded_vmcss(void) +{ + if (crash_vmclear_loaded_vmcss) + crash_vmclear_loaded_vmcss(); +} #if defined(CONFIG_SMP) defined(CONFIG_X86_LOCAL_APIC) @@ -46,6 +61,11 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs) #endif crash_save_cpu(regs, cpu); + /* +* VMCLEAR VMCSs loaded on all cpus if needed. +*/ + cpu_emergency_vmclear_loaded_vmcss(); + /* Disable VMX or SVM if needed. * * We need to disable virtualization on all CPUs. 
@@ -88,6 +108,11 @@ void native_machine_crash_shutdown(struct pt_regs *regs)

 	kdump_nmi_shootdown_cpus();

+	/*
+	 * VMCLEAR VMCSs loaded on this cpu if needed.
+	 */
+	cpu_emergency_vmclear_loaded_vmcss();
+
 	/* Booting kdump kernel with VMX or SVM enabled won't work,
 	 * because (among other limitations) we can't disable paging
 	 * with the virt flags.

-- 1.7.1

-- Gleb.
[PATCH v2] KVM: x86: Make register state after reset conform to specification
Floating point initialization is moved to kvm_arch_vcpu_init. Added general purpose register clearing to the same function. SVM code now properly initializes EDX. Signed-off-by: Julian Stecklina jstec...@os.inf.tu-dresden.de --- arch/x86/kvm/cpuid.c | 1 + arch/x86/kvm/svm.c | 7 +-- arch/x86/kvm/vmx.c | 7 +-- arch/x86/kvm/x86.c | 12 +++- 4 files changed, 18 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0595f13..aa468c2 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -659,6 +659,7 @@ void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx) } else *eax = *ebx = *ecx = *edx = 0; } +EXPORT_SYMBOL_GPL(kvm_cpuid); void kvm_emulate_cpuid(struct kvm_vcpu *vcpu) { diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index baead95..642213a 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -20,6 +20,7 @@ #include mmu.h #include kvm_cache_regs.h #include x86.h +#include cpuid.h #include linux/module.h #include linux/mod_devicetable.h @@ -1190,6 +1191,7 @@ static void init_vmcb(struct vcpu_svm *svm) static int svm_vcpu_reset(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); + u32 dummy, eax; init_vmcb(svm); @@ -1198,8 +1200,9 @@ static int svm_vcpu_reset(struct kvm_vcpu *vcpu) svm-vmcb-save.cs.base = svm-vcpu.arch.sipi_vector 12; svm-vmcb-save.cs.selector = svm-vcpu.arch.sipi_vector 8; } - vcpu-arch.regs_avail = ~0; - vcpu-arch.regs_dirty = ~0; + + kvm_cpuid(vcpu, eax, dummy, dummy, dummy); + kvm_register_write(vcpu, VCPU_REGS_RDX, eax); return 0; } diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index ff66a3b..363fb03 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3942,7 +3942,7 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu) u64 msr; int ret; - vcpu-arch.regs_avail = ~((1 VCPU_REGS_RIP) | (1 VCPU_REGS_RSP)); + vcpu-arch.regs_avail = ~(1 VCPU_REGS_RIP); vmx-rmode.vm86_active = 0; @@ -3955,10 +3955,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu) 
msr |= MSR_IA32_APICBASE_BSP; kvm_set_apic_base(vmx-vcpu, msr); - ret = fx_init(vmx-vcpu); - if (ret != 0) - goto out; - vmx_segment_cache_clear(vmx); seg_setup(VCPU_SREG_CS); @@ -3999,7 +3995,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu) kvm_rip_write(vcpu, 0xfff0); else kvm_rip_write(vcpu, 0); - kvm_register_write(vcpu, VCPU_REGS_RSP, 0); vmcs_writel(GUEST_DR7, 0x400); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2966c84..9640ea5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6045,6 +6045,8 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu) { + int ret; + atomic_set(vcpu-arch.nmi_queued, 0); vcpu-arch.nmi_pending = 0; vcpu-arch.nmi_injected = false; @@ -6066,7 +6068,15 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu) kvm_pmu_reset(vcpu); - return kvm_x86_ops-vcpu_reset(vcpu); + memset(vcpu-arch.regs, 0, sizeof(vcpu-arch.regs)); + vcpu-arch.regs_avail = ~0; + vcpu-arch.regs_dirty = ~0; + + if ((ret = fx_init(vcpu)) != 0) goto out; + + ret = kvm_x86_ops-vcpu_reset(vcpu); +out: + return ret; } int kvm_arch_hardware_enable(void *garbage) -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM VMX: register state after reset violates spec
Thus spake Gleb Natapov g...@redhat.com: It should, so why not move the fix to kvm_vcpu_reset() so it will work for both. Also what about R8-R15? Intel SDM says nothing about them in the section you mention, but in Volume 1 section 3.4.1.1 it says: [...] I take it that they are undefined on the first transition to 64-bit mode too. AMD spec says that they should be zeroed on reset, so let's do that. Also SVM does not set EDX to the correct value on reset. It should be: I have posted a new version of the patch taking your suggestions into account. The VMX version is working for me. I could not test it on AMD hardware, though. Julian
Re: [PATCH v4 07/13] ARM: KVM: VGIC virtual CPU interface management
On 03/12/12 13:23, Will Deacon wrote: Hi Marc, I've managed to look at some more of the vgic code, so here is some more feedback. I've still not got to the end of the series, but there's light at the end of the tunnel... On Sat, Nov 10, 2012 at 03:45:05PM +, Christoffer Dall wrote: From: Marc Zyngier marc.zyng...@arm.com Add VGIC virtual CPU interface code, picking pending interrupts from the distributor and stashing them in the VGIC control interface list registers. Signed-off-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/include/asm/kvm_vgic.h | 41 +++ arch/arm/kvm/vgic.c | 226 +++ 2 files changed, 266 insertions(+), 1 deletion(-) diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h index 9e60b1d..7229324 100644 --- a/arch/arm/include/asm/kvm_vgic.h +++ b/arch/arm/include/asm/kvm_vgic.h @@ -193,8 +193,45 @@ struct vgic_dist { }; struct vgic_cpu { +#ifdef CONFIG_KVM_ARM_VGIC + /* per IRQ to LR mapping */ + u8 vgic_irq_lr_map[VGIC_NR_IRQS]; per IRQ? Yes. We need to track which IRQ maps to which LR (so we can piggyback a pending interrupt on an active one). + + /* Pending interrupts on this VCPU */ + DECLARE_BITMAP( pending, VGIC_NR_IRQS); + + /* Bitmap of used/free list registers */ + DECLARE_BITMAP( lr_used, 64); + + /* Number of list registers on this CPU */ + int nr_lr; + + /* CPU vif control registers for world switch */ + u32 vgic_hcr; + u32 vgic_vmcr; + u32 vgic_misr; /* Saved only */ + u32 vgic_eisr[2]; /* Saved only */ + u32 vgic_elrsr[2]; /* Saved only */ + u32 vgic_apr; + u32 vgic_lr[64];/* A15 has only 4... */ +#endif }; Looks like we should have a #define for the maximum number of list registers, so we keep vgic_lr and lr_user in sync. Indeed. 
+#define VGIC_HCR_EN(1 0) +#define VGIC_HCR_UIE (1 1) + +#define VGIC_LR_VIRTUALID (0x3ff 0) +#define VGIC_LR_PHYSID_CPUID (7 10) +#define VGIC_LR_STATE (3 28) +#define VGIC_LR_PENDING_BIT(1 28) +#define VGIC_LR_ACTIVE_BIT (1 29) +#define VGIC_LR_EOI(1 19) + +#define VGIC_MISR_EOI (1 0) +#define VGIC_MISR_U(1 1) + +#define LR_EMPTY 0xff + Could stick these in asm/hardware/gic.h. I know they're not used by the gic driver, but they're the same piece of architecture so it's probably worth keeping in one place. This is on my list of things to do once the GIC code is shared between arm and arm64. Could do it earlier if that makes more sense. You'd probably also want a s/VGIC/GICH/ Sure. struct kvm; struct kvm_vcpu; struct kvm_run; @@ -202,9 +239,13 @@ struct kvm_exit_mmio; #ifdef CONFIG_KVM_ARM_VGIC int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr); +void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu); +void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu); +int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu); bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, struct kvm_exit_mmio *mmio); +#define irqchip_in_kernel(k) (!!((k)-arch.vgic.vctrl_base)) #else static inline int kvm_vgic_hyp_init(void) { diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c index 82feee8..d7cdec5 100644 --- a/arch/arm/kvm/vgic.c +++ b/arch/arm/kvm/vgic.c @@ -587,7 +587,25 @@ static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, u32 reg) static int compute_pending_for_cpu(struct kvm_vcpu *vcpu) { - return 0; + struct vgic_dist *dist = vcpu-kvm-arch.vgic; + unsigned long *pending, *enabled, *pend; + int vcpu_id; + + vcpu_id = vcpu-vcpu_id; + pend = vcpu-arch.vgic_cpu.pending; + + pending = vgic_bitmap_get_cpu_map(dist-irq_state, vcpu_id); + enabled = vgic_bitmap_get_cpu_map(dist-irq_enabled, vcpu_id); + bitmap_and(pend, pending, enabled, 32); pend and pending! vcpu_pending and dist_pending? A lot of that code has already been reworked. 
See: https://lists.cs.columbia.edu/pipermail/kvmarm/2012-November/004138.html + + pending = vgic_bitmap_get_shared_map(dist-irq_state); + enabled = vgic_bitmap_get_shared_map(dist-irq_enabled); + bitmap_and(pend + 1, pending, enabled, VGIC_NR_SHARED_IRQS); + bitmap_and(pend + 1, pend + 1, + vgic_bitmap_get_shared_map(dist-irq_spi_target[vcpu_id]), + VGIC_NR_SHARED_IRQS); + + return (find_first_bit(pend, VGIC_NR_IRQS) VGIC_NR_IRQS); } /* @@ -613,6 +631,212 @@ static void vgic_update_state(struct kvm *kvm) } } +#define LR_PHYSID(lr) (((lr)
Re: [PATCH v4 09/13] ARM: KVM: VGIC interrupt injection
On 03/12/12 13:25, Will Deacon wrote: On Sat, Nov 10, 2012 at 03:45:18PM +, Christoffer Dall wrote: From: Marc Zyngier marc.zyng...@arm.com Plug the interrupt injection code. Interrupts can now be generated from user space. Signed-off-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/include/asm/kvm_vgic.h |8 +++ arch/arm/kvm/arm.c | 29 + arch/arm/kvm/vgic.c | 90 +++ 3 files changed, 127 insertions(+) diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h index 7229324..6e3d303 100644 --- a/arch/arm/include/asm/kvm_vgic.h +++ b/arch/arm/include/asm/kvm_vgic.h @@ -241,6 +241,8 @@ struct kvm_exit_mmio; int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr); void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu); void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu); +int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num, +bool level); int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu); bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, struct kvm_exit_mmio *mmio); @@ -271,6 +273,12 @@ static inline void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu) {} static inline void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu) {} static inline void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu) {} +static inline int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, + const struct kvm_irq_level *irq) +{ +return 0; +} + static inline int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu) { return 0; diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 3ac1aab..f43da01 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -764,10 +764,31 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level) switch (irq_type) { case KVM_ARM_IRQ_TYPE_CPU: +if (irqchip_in_kernel(kvm)) +return -ENXIO; + if (irq_num KVM_ARM_IRQ_CPU_FIQ) return -EINVAL; return vcpu_interrupt_line(vcpu, irq_num, level); +#ifdef CONFIG_KVM_ARM_VGIC +case 
KVM_ARM_IRQ_TYPE_PPI: +if (!irqchip_in_kernel(kvm)) +return -ENXIO; + +if (irq_num 16 || irq_num 31) +return -EINVAL; It's our favourite two numbers again! :) I already fixed a number of them. Probably missed this one though. + +return kvm_vgic_inject_irq(kvm, vcpu-vcpu_id, irq_num, level); +case KVM_ARM_IRQ_TYPE_SPI: +if (!irqchip_in_kernel(kvm)) +return -ENXIO; + +if (irq_num 32 || irq_num KVM_ARM_IRQ_GIC_MAX) +return -EINVAL; + +return kvm_vgic_inject_irq(kvm, 0, irq_num, level); +#endif } return -EINVAL; @@ -849,6 +870,14 @@ long kvm_arch_vm_ioctl(struct file *filp, void __user *argp = (void __user *)arg; switch (ioctl) { +#ifdef CONFIG_KVM_ARM_VGIC +case KVM_CREATE_IRQCHIP: { +if (vgic_present) +return kvm_vgic_create(kvm); +else +return -EINVAL; ENXIO? At least, that's what you use when setting the GIC addresses. -EINVAL seems to be one of the values other archs are using. -ENXIO is not one of them for KVM_CREATE_IRQCHIP. Doesn't mean they are right, but for the sake of keeping userspace happy, I'm not really inclined to change this. Christoffer? 
+} +#endif case KVM_SET_DEVICE_ADDRESS: { struct kvm_device_address dev_addr; diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c index dda5623..70040bb 100644 --- a/arch/arm/kvm/vgic.c +++ b/arch/arm/kvm/vgic.c @@ -75,6 +75,7 @@ #define ACCESS_WRITE_MASK(x)((x) (3 1)) static void vgic_update_state(struct kvm *kvm); +static void vgic_kick_vcpus(struct kvm *kvm); static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, u32 reg); static inline int vgic_irq_is_edge(struct vgic_dist *dist, int irq) @@ -542,6 +543,9 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, struct kvm_exi kvm_prepare_mmio(run, mmio); kvm_handle_mmio_return(vcpu, run); +if (updated_state) +vgic_kick_vcpus(vcpu-kvm); + return true; } @@ -867,6 +871,92 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu) return test_bit(vcpu-vcpu_id, dist-irq_pending_on_cpu); } +static void vgic_kick_vcpus(struct kvm *kvm) +{ +struct kvm_vcpu *vcpu; +int c; + +/* + * We've injected an interrupt, time to find out who deserves + * a good kick... + */ +kvm_for_each_vcpu(c, vcpu, kvm) { +if (kvm_vgic_vcpu_pending_irq(vcpu)) +
Re: [PATCH v4 10/13] ARM: KVM: VGIC control interface world switch
On 03/12/12 13:31, Will Deacon wrote: On Sat, Nov 10, 2012 at 03:45:25PM +, Christoffer Dall wrote: From: Marc Zyngier marc.zyng...@arm.com Enable the VGIC control interface to be save-restored on world switch. Signed-off-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/include/asm/kvm_arm.h | 12 +++ arch/arm/kernel/asm-offsets.c | 12 +++ arch/arm/kvm/interrupts_head.S | 68 3 files changed, 92 insertions(+) diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h index 246afd7..8f5dd22 100644 --- a/arch/arm/include/asm/kvm_arm.h +++ b/arch/arm/include/asm/kvm_arm.h @@ -192,4 +192,16 @@ #define HSR_EC_DABT (0x24) #define HSR_EC_DABT_HYP (0x25) +/* GICH offsets */ +#define GICH_HCR0x0 +#define GICH_VTR0x4 +#define GICH_VMCR 0x8 +#define GICH_MISR 0x10 +#define GICH_EISR0 0x20 +#define GICH_EISR1 0x24 +#define GICH_ELRSR0 0x30 +#define GICH_ELRSR1 0x34 +#define GICH_APR0xf0 +#define GICH_LR00x100 Similar thing to the other new gic defines -- they're probably better off in gic.h Agreed. 
+ #endif /* __ARM_KVM_ARM_H__ */ diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c index 95cab37..39b6221 100644 --- a/arch/arm/kernel/asm-offsets.c +++ b/arch/arm/kernel/asm-offsets.c @@ -167,6 +167,18 @@ int main(void) DEFINE(VCPU_HxFAR,offsetof(struct kvm_vcpu, arch.hxfar)); DEFINE(VCPU_HPFAR,offsetof(struct kvm_vcpu, arch.hpfar)); DEFINE(VCPU_HYP_PC, offsetof(struct kvm_vcpu, arch.hyp_pc)); +#ifdef CONFIG_KVM_ARM_VGIC + DEFINE(VCPU_VGIC_CPU, offsetof(struct kvm_vcpu, arch.vgic_cpu)); + DEFINE(VGIC_CPU_HCR, offsetof(struct vgic_cpu, vgic_hcr)); + DEFINE(VGIC_CPU_VMCR, offsetof(struct vgic_cpu, vgic_vmcr)); + DEFINE(VGIC_CPU_MISR, offsetof(struct vgic_cpu, vgic_misr)); + DEFINE(VGIC_CPU_EISR, offsetof(struct vgic_cpu, vgic_eisr)); + DEFINE(VGIC_CPU_ELRSR,offsetof(struct vgic_cpu, vgic_elrsr)); + DEFINE(VGIC_CPU_APR, offsetof(struct vgic_cpu, vgic_apr)); + DEFINE(VGIC_CPU_LR, offsetof(struct vgic_cpu, vgic_lr)); + DEFINE(VGIC_CPU_NR_LR,offsetof(struct vgic_cpu, nr_lr)); + DEFINE(KVM_VGIC_VCTRL,offsetof(struct kvm, arch.vgic.vctrl_base)); +#endif DEFINE(KVM_VTTBR, offsetof(struct kvm, arch.vttbr)); #endif return 0; diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S index 2ac8b4a..c2423d8 100644 --- a/arch/arm/kvm/interrupts_head.S +++ b/arch/arm/kvm/interrupts_head.S @@ -341,6 +341,45 @@ * @vcpup: Register pointing to VCPU struct */ .macro save_vgic_state vcpup +#ifdef CONFIG_KVM_ARM_VGIC +/* Get VGIC VCTRL base into r2 */ +ldr r2, [\vcpup, #VCPU_KVM] +ldr r2, [r2, #KVM_VGIC_VCTRL] +cmp r2, #0 +beq 2f + +/* Compute the address of struct vgic_cpu */ +add r11, \vcpup, #VCPU_VGIC_CPU Given that we're dealing with constants, it would be more efficient to express this addition as part of the immediate offset and let gas spit out the final computed address for the stores below. We had that for a while, and with the kvm_vcpu structure growing, we ended up having fields out of the reach of an immediate offset. 
Is kvm_vcpu too big? Yes. + +/* Save all interesting registers */ +ldr r3, [r2, #GICH_HCR] +ldr r4, [r2, #GICH_VMCR] +ldr r5, [r2, #GICH_MISR] +ldr r6, [r2, #GICH_EISR0] +ldr r7, [r2, #GICH_EISR1] +ldr r8, [r2, #GICH_ELRSR0] +ldr r9, [r2, #GICH_ELRSR1] +ldr r10, [r2, #GICH_APR] + +str r3, [r11, #VGIC_CPU_HCR] +str r4, [r11, #VGIC_CPU_VMCR] +str r5, [r11, #VGIC_CPU_MISR] +str r6, [r11, #VGIC_CPU_EISR] +str r7, [r11, #(VGIC_CPU_EISR + 4)] +str r8, [r11, #VGIC_CPU_ELRSR] +str r9, [r11, #(VGIC_CPU_ELRSR + 4)] +str r10, [r11, #VGIC_CPU_APR] + +/* Save list registers */ +add r2, r2, #GICH_LR0 +add r3, r11, #VGIC_CPU_LR +ldr r4, [r11, #VGIC_CPU_NR_LR] +1: ldr r6, [r2], #4 +str r6, [r3], #4 +subsr4, r4, #1 +bne 1b +2: +#endif .endm Will -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 07/13] ARM: KVM: VGIC virtual CPU interface management
On Mon, Dec 03, 2012 at 02:11:03PM +, Marc Zyngier wrote: On 03/12/12 13:23, Will Deacon wrote: +#define VGIC_HCR_EN(1 0) +#define VGIC_HCR_UIE (1 1) + +#define VGIC_LR_VIRTUALID (0x3ff 0) +#define VGIC_LR_PHYSID_CPUID (7 10) +#define VGIC_LR_STATE (3 28) +#define VGIC_LR_PENDING_BIT(1 28) +#define VGIC_LR_ACTIVE_BIT (1 29) +#define VGIC_LR_EOI(1 19) + +#define VGIC_MISR_EOI (1 0) +#define VGIC_MISR_U(1 1) + +#define LR_EMPTY 0xff + Could stick these in asm/hardware/gic.h. I know they're not used by the gic driver, but they're the same piece of architecture so it's probably worth keeping in one place. This is on my list of things to do once the GIC code is shared between arm and arm64. Could do it earlier if that makes more sense. Might as well as I found some others in a later patch too. static int compute_pending_for_cpu(struct kvm_vcpu *vcpu) { - return 0; + struct vgic_dist *dist = vcpu-kvm-arch.vgic; + unsigned long *pending, *enabled, *pend; + int vcpu_id; + + vcpu_id = vcpu-vcpu_id; + pend = vcpu-arch.vgic_cpu.pending; + + pending = vgic_bitmap_get_cpu_map(dist-irq_state, vcpu_id); + enabled = vgic_bitmap_get_cpu_map(dist-irq_enabled, vcpu_id); + bitmap_and(pend, pending, enabled, 32); pend and pending! vcpu_pending and dist_pending? A lot of that code has already been reworked. See: https://lists.cs.columbia.edu/pipermail/kvmarm/2012-November/004138.html Argh, too much code! Ok, as long as it's being looked at. + + pending = vgic_bitmap_get_shared_map(dist-irq_state); + enabled = vgic_bitmap_get_shared_map(dist-irq_enabled); + bitmap_and(pend + 1, pending, enabled, VGIC_NR_SHARED_IRQS); + bitmap_and(pend + 1, pend + 1, + vgic_bitmap_get_shared_map(dist-irq_spi_target[vcpu_id]), + VGIC_NR_SHARED_IRQS); + + return (find_first_bit(pend, VGIC_NR_IRQS) VGIC_NR_IRQS); } /* @@ -613,6 +631,212 @@ static void vgic_update_state(struct kvm *kvm) } } +#define LR_PHYSID(lr) (((lr) VGIC_LR_PHYSID_CPUID) 10) Is VGIC_LR_PHYSID_CPUID wide enough for this? 
The CPUID is only 3 bits, but the interrupt ID could be larger. Or do you not supported hardware interrupt forwarding? (in which case, LR_PHYSID is a misleading name). Hardware interrupt forwarding is not supported. PHYSID is the name of the actual field in the spec, hence the name of the macro. LR_CPUID? Sure. + kvm_debug(LR%d piggyback for IRQ%d %x\n, lr, irq, vgic_cpu-vgic_lr[lr]); + BUG_ON(!test_bit(lr, vgic_cpu-lr_used)); + vgic_cpu-vgic_lr[lr] |= VGIC_LR_PENDING_BIT; + if (is_level) + vgic_cpu-vgic_lr[lr] |= VGIC_LR_EOI; + return true; + } + + /* Try to use another LR for this interrupt */ + lr = find_first_bit((unsigned long *)vgic_cpu-vgic_elrsr, + vgic_cpu-nr_lr); + if (lr = vgic_cpu-nr_lr) + return false; + + kvm_debug(LR%d allocated for IRQ%d %x\n, lr, irq, sgi_source_id); + vgic_cpu-vgic_lr[lr] = MK_LR_PEND(sgi_source_id, irq); + if (is_level) + vgic_cpu-vgic_lr[lr] |= VGIC_LR_EOI; + + vgic_cpu-vgic_irq_lr_map[irq] = lr; + clear_bit(lr, (unsigned long *)vgic_cpu-vgic_elrsr); + set_bit(lr, vgic_cpu-lr_used); + + return true; +} I can't help but feel that this could be made cleaner by moving the level-specific EOI handling out into a separate function. Do you mean having two functions, one for edge and the other for level? Seems overkill to me. I could move the if (is_level) ... to a common spot though. Indeed, you could just have something like vgic_eoi_irq and call that in one place, letting that function do the level check. + +/* + * Fill the list registers with pending interrupts before running the + * guest. + */ +static void __kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu) +{ + struct vgic_cpu *vgic_cpu = vcpu-arch.vgic_cpu; + struct vgic_dist *dist = vcpu-kvm-arch.vgic; + unsigned long *pending; + int i, c, vcpu_id; + int overflow = 0; + + vcpu_id = vcpu-vcpu_id; + + /* +* We may not have any pending interrupt, or the interrupts +* may have been serviced from another vcpu. In all cases, +* move along. 
+*/ + if (!kvm_vgic_vcpu_pending_irq(vcpu)) { + pr_debug(CPU%d has no pending interrupt\n, vcpu_id); + goto epilog; + } + + /* SGIs */ + pending = vgic_bitmap_get_cpu_map(dist-irq_state, vcpu_id); + for_each_set_bit(i,
Re: [PATCH v4 07/13] ARM: KVM: VGIC virtual CPU interface management
[...]

+
+			clear_bit(c, &sources);
+		}
+
+		if (!sources)
+			clear_bit(i, pending);

What does this signify and how does it happen? An SGI without a source sounds pretty weird... See the clear_bit() just above. Once all the sources for this SGI are cleared, we can make the interrupt not pending anymore. Every time I read the code, I get completely bogged down trying to understand this case and I tell myself we should put a comment here, then I understand why it happens and I think, oh it's obvious, no comment needed, but now I (almost) forgot again. Could we add a comment? -Christoffer
Re: [PATCH v4 09/13] ARM: KVM: VGIC interrupt injection
[...]

+
+static bool vgic_update_irq_state(struct kvm *kvm, int cpuid,
+				  unsigned int irq_num, bool level)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	struct kvm_vcpu *vcpu;
+	int is_edge, is_level, state;
+	int enabled;
+	bool ret = true;
+
+	spin_lock(&dist->lock);
+
+	is_edge = vgic_irq_is_edge(dist, irq_num);
+	is_level = !is_edge;
+	state = vgic_bitmap_get_irq_val(&dist->irq_state, cpuid, irq_num);
+
+	/*
+	 * Only inject an interrupt if:
+	 * - level triggered and we change level
+	 * - edge triggered and we have a rising edge
+	 */
+	if ((is_level && !(state ^ level)) || (is_edge && (state || !level))) {
+		ret = false;
+		goto out;
+	}

Eek, more of the edge/level combo. Can this be restructured so that we have vgic_update_{edge,level}_irq_state, which are called from here appropriately? I'll have a look. Oh, you're no fun anymore. That if statement is one of the funniest pieces of this code. -Christoffer
Re: [PATCH v4 13/14] KVM: ARM: Handle guest faults in KVM
On Mon, Dec 3, 2012 at 8:06 AM, Will Deacon will.dea...@arm.com wrote: On Fri, Nov 30, 2012 at 09:40:37PM +, Christoffer Dall wrote: On Mon, Nov 19, 2012 at 10:07 AM, Will Deacon will.dea...@arm.com wrote: Why are PIPT caches affected by this? The virtual address is irrelevant. The comment is slightly misleading, and I'll update it. Just so we're clear, this is the culprit:

1. guest uses page X, containing instruction A
2. page X gets swapped out
3. host uses page X, containing instruction B
4. instruction B enters i-cache at page X's cache line
5. page X gets swapped out
6. guest swaps page X back in
7. guest executes instruction B from cache, should execute instruction A

Ok, that's clearer. Thanks for the explanation. The point is that with PIPT we can flush only that page from the icache using the host virtual address, as the MMU will do the translation on the fly. In the VIPT case we have to nuke the whole thing (unless we . Unless we what? Could we flush using the host VA + all virtual aliases instead? You'd have to know all the virtual addresses of the guest(s) mapping that physical page and then flush all the aliases of those addresses, which we don't know at this time. What we can do (down the road) is to mark the pages as XN, catch the fault, get the virtual fault address, and then flush that single page. The tradeoffs need to be measured before implementing this imho, and it is an optimization we can add later. -Christoffer
Re: [PATCH v4 08/14] KVM: ARM: World-switch implementation
On Mon, Dec 3, 2012 at 5:33 AM, Marc Zyngier marc.zyng...@arm.com wrote: On 30/11/12 18:49, Christoffer Dall wrote: On Fri, Nov 30, 2012 at 12:14 PM, Will Deacon will.dea...@arm.com wrote: On Fri, Nov 30, 2012 at 04:47:40PM +, Christoffer Dall wrote: On Fri, Nov 30, 2012 at 10:15 AM, Will Deacon will.dea...@arm.com wrote: At this point, VM1 is running and VM0:VCPU1 is running. VM0:VCPU0 is not running because physical CPU0 is handling an interrupt. The problem is that when VCPU0 *is* resumed, it will update the VMID of VM0 and could be scheduled in parallel with VCPU1 but with a different VMID. How do you avoid this in the current code? I don't. Nice catch. Please apply your interesting brain to the following fix:) I'm far too sober to look at your patch right now, but I'll think about it over the weekend [I can't break it at a quick glance] :) In the meantime, can you think about whether the TLB operations need to run on every CPU please? they don't we can invalidate the TLB and the icache using the inner shareability domain. 
Here's a patch: diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index ad1390f..df1b753 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -146,6 +146,7 @@ struct kvm_one_reg; int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg); int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg); u64 kvm_call_hyp(void *hypfn, ...); +void force_vm_exit(const cpumask_t *mask); #define KVM_ARCH_WANT_MMU_NOTIFIER struct kvm; diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index c4f631e..674592e 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -405,9 +405,14 @@ int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v) return v-mode == IN_GUEST_MODE; } -static void reset_vm_context(void *info) +/* Just ensure a guest exit from a particular CPU */ +static void exit_vm_noop(void *info) { - kvm_call_hyp(__kvm_flush_vm_context); +} + +void force_vm_exit(const cpumask_t *mask) +{ + smp_call_function_many(mask, exit_vm_noop, NULL, true); } Care to update the do_nothing() call in emulate.c to use this as well? yes, was actually already done: commit d1ff14417724e9062ec9d585bf3ef0537e12b54d Author: Christoffer Dall c.d...@virtualopensystems.com Date: Fri Nov 30 13:37:01 2012 -0500 KVM: ARM: Reuse force_vm_exit from copy_current_insn The VMID management code also needs to force VM exits, so reuse that functionality from both places. Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c index 60e7ec0..ad743b7 100644 --- a/arch/arm/kvm/emulate.c +++ b/arch/arm/kvm/emulate.c @@ -221,11 +221,6 @@ static bool copy_from_guest_va(struct kvm_vcpu *vcpu, return true; } -/* Just ensure we're not running the guest. */ -static void do_nothing(void *info) -{ -} - /* * We have to be very careful copying memory from a running (ie. SMP) guest. * Another CPU may remap the page (eg. 
swap out a userspace text page) as we

@@ -257,8 +252,9 @@ static bool copy_current_insn(struct kvm_vcpu *vcpu, unsigned long *instr)

 	/* Kick out any which are still running. */
 	kvm_for_each_vcpu(i, v, vcpu->kvm) {
 		/* Guest could exit now, making cpu wrong. That's OK. */
-		if (kvm_vcpu_exiting_guest_mode(v) == IN_GUEST_MODE)
-			smp_call_function_single(v->cpu, do_nothing, NULL, 1);
+		if (kvm_vcpu_exiting_guest_mode(v) == IN_GUEST_MODE) {
+			force_vm_exit(get_cpu_mask(v->cpu));
+		}
 	}
Re: [PATCH v4 07/13] ARM: KVM: VGIC virtual CPU interface management
On 03/12/12 14:34, Will Deacon wrote: On Mon, Dec 03, 2012 at 02:11:03PM +, Marc Zyngier wrote: On 03/12/12 13:23, Will Deacon wrote: +#define VGIC_HCR_EN(1 0) +#define VGIC_HCR_UIE (1 1) + +#define VGIC_LR_VIRTUALID (0x3ff 0) +#define VGIC_LR_PHYSID_CPUID (7 10) +#define VGIC_LR_STATE (3 28) +#define VGIC_LR_PENDING_BIT(1 28) +#define VGIC_LR_ACTIVE_BIT (1 29) +#define VGIC_LR_EOI(1 19) + +#define VGIC_MISR_EOI (1 0) +#define VGIC_MISR_U(1 1) + +#define LR_EMPTY 0xff + Could stick these in asm/hardware/gic.h. I know they're not used by the gic driver, but they're the same piece of architecture so it's probably worth keeping in one place. This is on my list of things to do once the GIC code is shared between arm and arm64. Could do it earlier if that makes more sense. Might as well as I found some others in a later patch too. static int compute_pending_for_cpu(struct kvm_vcpu *vcpu) { - return 0; + struct vgic_dist *dist = vcpu-kvm-arch.vgic; + unsigned long *pending, *enabled, *pend; + int vcpu_id; + + vcpu_id = vcpu-vcpu_id; + pend = vcpu-arch.vgic_cpu.pending; + + pending = vgic_bitmap_get_cpu_map(dist-irq_state, vcpu_id); + enabled = vgic_bitmap_get_cpu_map(dist-irq_enabled, vcpu_id); + bitmap_and(pend, pending, enabled, 32); pend and pending! vcpu_pending and dist_pending? A lot of that code has already been reworked. See: https://lists.cs.columbia.edu/pipermail/kvmarm/2012-November/004138.html Argh, too much code! Ok, as long as it's being looked at. 
+ + pending = vgic_bitmap_get_shared_map(&dist->irq_state); + enabled = vgic_bitmap_get_shared_map(&dist->irq_enabled); + bitmap_and(pend + 1, pending, enabled, VGIC_NR_SHARED_IRQS); + bitmap_and(pend + 1, pend + 1, + vgic_bitmap_get_shared_map(&dist->irq_spi_target[vcpu_id]), + VGIC_NR_SHARED_IRQS); + + return (find_first_bit(pend, VGIC_NR_IRQS) < VGIC_NR_IRQS); } /* @@ -613,6 +631,212 @@ static void vgic_update_state(struct kvm *kvm) } } +#define LR_PHYSID(lr) (((lr) & VGIC_LR_PHYSID_CPUID) >> 10) Is VGIC_LR_PHYSID_CPUID wide enough for this? The CPUID is only 3 bits, but the interrupt ID could be larger. Or do you not support hardware interrupt forwarding? (in which case, LR_PHYSID is a misleading name). Hardware interrupt forwarding is not supported. PHYSID is the name of the actual field in the spec, hence the name of the macro. LR_CPUID? Sure. + kvm_debug("LR%d piggyback for IRQ%d %x\n", lr, irq, vgic_cpu->vgic_lr[lr]); + BUG_ON(!test_bit(lr, vgic_cpu->lr_used)); + vgic_cpu->vgic_lr[lr] |= VGIC_LR_PENDING_BIT; + if (is_level) + vgic_cpu->vgic_lr[lr] |= VGIC_LR_EOI; + return true; + } + + /* Try to use another LR for this interrupt */ + lr = find_first_bit((unsigned long *)vgic_cpu->vgic_elrsr, + vgic_cpu->nr_lr); + if (lr >= vgic_cpu->nr_lr) + return false; + + kvm_debug("LR%d allocated for IRQ%d %x\n", lr, irq, sgi_source_id); + vgic_cpu->vgic_lr[lr] = MK_LR_PEND(sgi_source_id, irq); + if (is_level) + vgic_cpu->vgic_lr[lr] |= VGIC_LR_EOI; + + vgic_cpu->vgic_irq_lr_map[irq] = lr; + clear_bit(lr, (unsigned long *)vgic_cpu->vgic_elrsr); + set_bit(lr, vgic_cpu->lr_used); + + return true; +} I can't help but feel that this could be made cleaner by moving the level-specific EOI handling out into a separate function. Do you mean having two functions, one for edge and the other for level? Seems overkill to me. I could move the if (is_level) ... to a common spot though. Indeed, you could just have something like vgic_eoi_irq and call that in one place, letting that function do the level check.
+ +/* + * Fill the list registers with pending interrupts before running the + * guest. + */ +static void __kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu) +{ + struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; + struct vgic_dist *dist = &vcpu->kvm->arch.vgic; + unsigned long *pending; + int i, c, vcpu_id; + int overflow = 0; + + vcpu_id = vcpu->vcpu_id; + + /* +* We may not have any pending interrupt, or the interrupts +* may have been serviced from another vcpu. In all cases, +* move along. +*/ + if (!kvm_vgic_vcpu_pending_irq(vcpu)) { + pr_debug("CPU%d has no pending interrupt\n", vcpu_id); + goto epilog; + } + + /* SGIs */ + pending = vgic_bitmap_get_cpu_map(&dist->irq_state, vcpu_id); + for_each_set_bit(i, vgic_cpu->pending, 16) { + unsigned long sources; +
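The list-register field layout quoted in this thread can be exercised in plain C. Below is a hedged userspace sketch: mk_lr_pend and lr_cpuid are hypothetical stand-ins for the patch's MK_LR_PEND macro and the LR_CPUID name proposed in the review, built only from the VGIC_LR_* masks shown in the patch, not the actual kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Field masks as defined in the patch above. */
#define VGIC_LR_VIRTUALID    (0x3ff << 0)
#define VGIC_LR_PHYSID_CPUID (7 << 10)
#define VGIC_LR_PENDING_BIT  (1 << 28)

/* Hypothetical helper mirroring MK_LR_PEND: pack source cpu + irq. */
static uint32_t mk_lr_pend(uint32_t src_cpu, uint32_t irq)
{
	return VGIC_LR_PENDING_BIT |
	       ((src_cpu << 10) & VGIC_LR_PHYSID_CPUID) |
	       (irq & VGIC_LR_VIRTUALID);
}

/* The rename suggested in the review: extract the 3-bit CPUID field. */
static uint32_t lr_cpuid(uint32_t lr)
{
	return (lr & VGIC_LR_PHYSID_CPUID) >> 10;
}

/* Extract the virtual interrupt ID from a list register value. */
static uint32_t lr_irq(uint32_t lr)
{
	return lr & VGIC_LR_VIRTUALID;
}
```

A round trip through mk_lr_pend and the extractors shows why the field is wide enough for a CPUID but not for a hardware interrupt ID, as discussed above.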
KVM call agenda for 2012-12-04
Hi, Please send in any agenda topics you are interested in. - migration troubles from 1.2 -> 1.3 due to qemu-kvm integration Later, Juan.
Re: [PATCH] KVM: PPC: Make EPCR a valid field for booke64 and bookehv
On 12/01/2012 07:58:25 AM, Alexander Graf wrote: In BookE, EPCR is defined and valid when either the HV or the 64bit category are implemented. Reflect this in the field definition. Today the only KVM target on 64bit is HV enabled, so there is no change in actual source code, but this keeps the code closer to the spec and doesn't build up artificial road blocks for a PR KVM on 64bit. [snip] Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |9 +++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 62fbd38..3480526 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -405,14 +405,19 @@ struct kvm_vcpu_arch { #ifdef CONFIG_KVM_BOOKE_HV u32 host_mas4; u32 host_mas6; - u32 shadow_epcr; - u32 epcr; u32 shadow_msrp; u32 eplc; u32 epsc; u32 oldpir; #endif +#if defined(CONFIG_BOOKE) +#if defined(CONFIG_KVM_BOOKE_HV) || defined(CONFIG_64BIT) + u32 shadow_epcr; + u32 epcr; +#endif +#endif On a PR-mode implementation, why would we have a shadow_epcr? It would always be set based on the host kernel, just like when running any other userspace process. -Scott
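The nested #if structure the patch introduces can be illustrated standalone. In this sketch the CONFIG_* symbols are defined locally for the demo (normally they come from Kconfig), modelling the hypothetical PR-mode 64-bit build the commit message mentions: CONFIG_KVM_BOOKE_HV is unset, yet the epcr fields are still compiled in because CONFIG_64BIT is set.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for kernel config symbols, defined here only for the demo. */
#define CONFIG_BOOKE 1
#define CONFIG_64BIT 1
/* CONFIG_KVM_BOOKE_HV deliberately left undefined: a PR-mode build. */

struct vcpu_arch_demo {
	uint32_t host_mas4;
#if defined(CONFIG_BOOKE)
#if defined(CONFIG_KVM_BOOKE_HV) || defined(CONFIG_64BIT)
	/* Present when either the HV or the 64bit category is implemented. */
	uint32_t shadow_epcr;
	uint32_t epcr;
#endif
#endif
};
```

With both demo symbols toggled off, the struct would shrink by two u32 fields, which is exactly the conditional-compilation effect the patch is after.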
[PATCH] vhost-net: initialize zcopy packet counters
These packet counters are used to drive the zerocopy selection heuristic so nothing too bad happens if they are off a bit - and they are also reset once in a while. But it's cleaner to clear them when backend is set so that we start in a known state. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/vhost/net.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 67898fa..ff6c9199 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -823,6 +823,9 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) r = vhost_init_used(vq); if (r) goto err_vq; + + n->tx_packets = 0; + n->tx_zcopy_err = 0; } mutex_unlock(&vq->mutex); -- MST
Re: [PATCH 1/2] vfio powerpc: enabled on powernv platform
On Mon, 2012-12-03 at 13:52 +1100, Alexey Kardashevskiy wrote: This patch initializes IOMMU groups based on the IOMMU configuration discovered during the PCI scan on POWERNV (POWER non virtualized) platform. The IOMMU groups are to be used later by VFIO driver (PCI pass through). It also implements an API for mapping/unmapping pages for guest PCI drivers and providing DMA window properties. This API is going to be used later by QEMU-VFIO to handle h_put_tce hypercalls from the KVM guest. Although this driver has been tested only on the POWERNV platform, it should work on any platform which supports TCE tables. To enable VFIO on POWER, enable SPAPR_TCE_IOMMU config option and configure VFIO as required. Cc: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- arch/powerpc/include/asm/iommu.h |9 ++ arch/powerpc/kernel/iommu.c | 186 ++ arch/powerpc/platforms/powernv/pci.c | 135 drivers/iommu/Kconfig|8 ++ 4 files changed, 338 insertions(+) diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h index cbfe678..5c7087a 100644 --- a/arch/powerpc/include/asm/iommu.h +++ b/arch/powerpc/include/asm/iommu.h @@ -76,6 +76,9 @@ struct iommu_table { struct iommu_pool large_pool; struct iommu_pool pools[IOMMU_NR_POOLS]; unsigned long *it_map; /* A simple allocation bitmap for now */ +#ifdef CONFIG_IOMMU_API + struct iommu_group *it_group; +#endif }; struct scatterlist; @@ -147,5 +150,11 @@ static inline void iommu_restore(void) } #endif +extern long iommu_clear_tces(struct iommu_table *tbl, unsigned long entry, + unsigned long pages); +extern long iommu_put_tces(struct iommu_table *tbl, unsigned long entry, + uint64_t tce, enum dma_data_direction direction, + unsigned long pages); + #endif /* __KERNEL__ */ #endif /* _ASM_IOMMU_H */ diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index ff5a6ce..2738aa4 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -44,6 +44,7 
@@ #include <asm/kdump.h> #include <asm/fadump.h> #include <asm/vio.h> +#include <asm/tce.h> #define DBG(...) @@ -856,3 +857,188 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size, free_pages((unsigned long)vaddr, get_order(size)); } } + +#ifdef CONFIG_IOMMU_API +/* + * SPAPR TCE API + */ + +/* + * Returns the number of used IOMMU pages (4K) within + * the same system page (4K or 64K). + * bitmap_weight is not used as it does not support bigendian maps. + */ +static int syspage_weight(unsigned long *map, unsigned long entry) +{ + int ret = 0, nbits = PAGE_SIZE/IOMMU_PAGE_SIZE; + + /* Aligns TCE entry number to system page boundary */ + entry &= PAGE_MASK >> IOMMU_PAGE_SHIFT; + + /* Count used 4K pages */ + while (nbits--) + ret += (test_bit(entry++, map) == 0) ? 0 : 1; + + return ret; +} + +static void tce_flush(struct iommu_table *tbl) +{ + /* Flush/invalidate TLB caches if necessary */ + if (ppc_md.tce_flush) + ppc_md.tce_flush(tbl); + + /* Make sure updates are seen by hardware */ + mb(); +} + +/* + * iommu_clear_tces clears tces and returned the number of system pages + * which it called put_page() on + */ +static long clear_tces_nolock(struct iommu_table *tbl, unsigned long entry, + unsigned long pages) +{ + int i, retpages = 0; + unsigned long oldtce, oldweight; + struct page *page; + + for (i = 0; i < pages; ++i) { + oldtce = ppc_md.tce_get(tbl, entry + i); + ppc_md.tce_free(tbl, entry + i, 1); + + oldweight = syspage_weight(tbl->it_map, entry); + __clear_bit(entry - tbl->it_offset, tbl->it_map); + + if (!(oldtce & (TCE_PCI_WRITE | TCE_PCI_READ))) + continue; Could this happen earlier, above syspage_weight() and __clear_bit()?
+ + page = pfn_to_page(oldtce >> PAGE_SHIFT); + + WARN_ON(!page); + if (!page) + continue; + + if (oldtce & TCE_PCI_WRITE) + SetPageDirty(page); + + put_page(page); + + /* That was the last IOMMU page within the system page */ + if ((oldweight == 1) && !syspage_weight(tbl->it_map, entry)) + ++retpages; If you used __test_and_clear_bit() above I think you could avoid this 2nd call to syspage_weight. A minor optimization though. + } + + return retpages; +} + +/* + * iommu_clear_tces clears tces and returned the number + / of released system pages + */ Something bad happened to your comments here. +long iommu_clear_tces(struct iommu_table *tbl,
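The syspage_weight() logic under review - counting how many 4K IOMMU pages are still in use inside one system page - can be modelled in userspace. This is a sketch, not the kernel code: the kernel bitmap is replaced by a plain bool array, and the 64K system / 4K IOMMU page sizes are assumed for the demo.

```c
#include <assert.h>
#include <stdbool.h>

#define SYS_PAGE_SHIFT   16   /* 64K system pages (assumed for the demo) */
#define IOMMU_PAGE_SHIFT 12   /* 4K IOMMU pages */
#define PER_SYS_PAGE (1 << (SYS_PAGE_SHIFT - IOMMU_PAGE_SHIFT))   /* 16 */

/* A plain bool map stands in for the kernel's it_map bitmap. */
static int syspage_weight(const bool *map, unsigned long entry)
{
	int ret = 0, nbits = PER_SYS_PAGE;

	/* Align the TCE entry number down to a system page boundary. */
	entry &= ~(unsigned long)(PER_SYS_PAGE - 1);

	/* Count used 4K IOMMU pages within that system page. */
	while (nbits--)
		ret += map[entry++] ? 1 : 0;

	return ret;
}
```

When the weight of a system page drops to zero its pinned page can be released, which is what the `oldweight == 1` check above is detecting; the reviewer's __test_and_clear_bit() suggestion would fold the second count into the clear.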
Re: [PATCH] KVM: PPC: Make EPCR a valid field for booke64 and bookehv
On 03.12.2012, at 17:47, Scott Wood wrote: On 12/01/2012 07:58:25 AM, Alexander Graf wrote: In BookE, EPCR is defined and valid when either the HV or the 64bit category are implemented. Reflect this in the field definition. Today the only KVM target on 64bit is HV enabled, so there is no change in actual source code, but this keeps the code closer to the spec and doesn't build up artificial road blocks for a PR KVM on 64bit. [snip] Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |9 +++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 62fbd38..3480526 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -405,14 +405,19 @@ struct kvm_vcpu_arch { #ifdef CONFIG_KVM_BOOKE_HV u32 host_mas4; u32 host_mas6; -u32 shadow_epcr; -u32 epcr; u32 shadow_msrp; u32 eplc; u32 epsc; u32 oldpir; #endif +#if defined(CONFIG_BOOKE) +#if defined(CONFIG_KVM_BOOKE_HV) || defined(CONFIG_64BIT) +u32 shadow_epcr; +u32 epcr; +#endif +#endif On a PR-mode implementation, why would we have a shadow_epcr? It would always be set based on the host kernel, just like when running any other userspace process. Right - we could simply set MSR_CM. I'll move shadow_epcr back into the HV only bit above. Alex
Re: [PATCH 2/2] vfio powerpc: implemented IOMMU driver for VFIO
On Mon, 2012-12-03 at 13:52 +1100, Alexey Kardashevskiy wrote: VFIO implements platform independent stuff such as a PCI driver, BAR access (via read/write on a file descriptor or direct mapping when possible) and IRQ signaling. The platform dependent part includes IOMMU initialization and handling. This patch implements an IOMMU driver for VFIO which does mapping/unmapping pages for the guest IO and provides information about DMA window (required by a POWERPC guest). The counterpart in QEMU is required to support this functionality. Cc: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- drivers/vfio/Kconfig|6 + drivers/vfio/Makefile |1 + drivers/vfio/vfio_iommu_spapr_tce.c | 350 +++ include/linux/vfio.h| 26 +++ 4 files changed, 383 insertions(+) create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index 7cd5dec..b464687 100644 --- a/drivers/vfio/Kconfig +++ b/drivers/vfio/Kconfig @@ -3,10 +3,16 @@ config VFIO_IOMMU_TYPE1 depends on VFIO default n +config VFIO_IOMMU_SPAPR_TCE + tristate + depends on VFIO && SPAPR_TCE_IOMMU + default n + menuconfig VFIO tristate "VFIO Non-Privileged userspace driver framework" depends on IOMMU_API select VFIO_IOMMU_TYPE1 if X86 + select VFIO_IOMMU_SPAPR_TCE if PPC_POWERNV help VFIO provides a framework for secure userspace device drivers. See Documentation/vfio.txt for more details.
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index 2398d4a..72bfabc 100644 --- a/drivers/vfio/Makefile +++ b/drivers/vfio/Makefile @@ -1,3 +1,4 @@ obj-$(CONFIG_VFIO) += vfio.o obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o +obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o obj-$(CONFIG_VFIO_PCI) += pci/ diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c new file mode 100644 index 000..806ad9f --- /dev/null +++ b/drivers/vfio/vfio_iommu_spapr_tce.c @@ -0,0 +1,350 @@ +/* + * VFIO: IOMMU DMA mapping support for TCE on POWER + * + * Copyright (C) 2012 IBM Corp. All rights reserved. + * Author: Alexey Kardashevskiy a...@ozlabs.ru + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Derived from original vfio_iommu_type1.c: + * Copyright (C) 2012 Red Hat, Inc. All rights reserved. 
+ * Author: Alex Williamson alex.william...@redhat.com + */ + +#include <linux/module.h> +#include <linux/pci.h> +#include <linux/slab.h> +#include <linux/uaccess.h> +#include <linux/err.h> +#include <linux/vfio.h> +#include <asm/iommu.h> + +#define DRIVER_VERSION "0.1" +#define DRIVER_AUTHOR "a...@ozlabs.ru" +#define DRIVER_DESC "VFIO IOMMU SPAPR TCE" + +static void tce_iommu_detach_group(void *iommu_data, + struct iommu_group *iommu_group); + +/* + * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation + */ + +/* + * This code handles mapping and unmapping of user data buffers + * into DMA'ble space using the IOMMU + */ + +#define NPAGE_TO_SIZE(npage) ((size_t)(npage) << PAGE_SHIFT) + +struct vwork { + struct mm_struct *mm; + long npage; + struct work_struct work; +}; + +/* delayed decrement/increment for locked_vm */ +static void lock_acct_bg(struct work_struct *work) +{ + struct vwork *vwork = container_of(work, struct vwork, work); + struct mm_struct *mm; + + mm = vwork->mm; + down_write(&mm->mmap_sem); + mm->locked_vm += vwork->npage; + up_write(&mm->mmap_sem); + mmput(mm); + kfree(vwork); +} + +static void lock_acct(long npage) +{ + struct vwork *vwork; + struct mm_struct *mm; + + if (!current->mm) + return; /* process exited */ + + if (down_write_trylock(&current->mm->mmap_sem)) { + current->mm->locked_vm += npage; + up_write(&current->mm->mmap_sem); + return; + } + + /* + * Couldn't get mmap_sem lock, so must setup to update + * mm->locked_vm later. If locked_vm were atomic, we + * wouldn't need this silliness + */ + vwork = kmalloc(sizeof(struct vwork), GFP_KERNEL); + if (!vwork) + return; + mm = get_task_mm(current); + if (!mm) { + kfree(vwork); + return; + } + INIT_WORK(&vwork->work, lock_acct_bg); + vwork->mm = mm; + vwork->npage = npage; + schedule_work(&vwork->work); +} + +/* + * The container descriptor supports only a single group per container. + * Required by the API as the container is not supplied with the IOMMU group + * at the moment of
Re: [PATCH] vhost-net: initialize zcopy packet counters
From: Michael S. Tsirkin m...@redhat.com Date: Mon, 3 Dec 2012 19:31:51 +0200 These packet counters are used to drive the zerocopy selection heuristic so nothing too bad happens if they are off a bit - and they are also reset once in a while. But it's cleaner to clear them when backend is set so that we start in a known state. Signed-off-by: Michael S. Tsirkin m...@redhat.com Applied to net-next, thanks.
Re: [PATCH v4 09/14] KVM: ARM: Emulation framework and CP15 emulation
On Mon, Dec 3, 2012 at 6:05 AM, Will Deacon will.dea...@arm.com wrote: On Fri, Nov 30, 2012 at 08:22:28PM +, Christoffer Dall wrote: On Mon, Nov 19, 2012 at 10:01 AM, Will Deacon will.dea...@arm.com wrote: On Sat, Nov 10, 2012 at 03:43:13PM +, Christoffer Dall wrote: This should probably be 0xff and also use the macros that Lorenzo is introducing: http://lists.infradead.org/pipermail/linux-arm-kernel/2012-November/132977.html in the A15 TRM bits [7:2] are reserved, so we really only care about bits [1:0], and this file is A15 specific, but if you prefer, I can merge this: The upper bits are RAZ, so yes, please do merge something like this. I did, thanks!
Re: [PATCH v4 09/13] ARM: KVM: VGIC interrupt injection
On Mon, Dec 3, 2012 at 9:21 AM, Marc Zyngier marc.zyng...@arm.com wrote: On 03/12/12 13:25, Will Deacon wrote: On Sat, Nov 10, 2012 at 03:45:18PM +, Christoffer Dall wrote: From: Marc Zyngier marc.zyng...@arm.com Plug the interrupt injection code. Interrupts can now be generated from user space. Signed-off-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/include/asm/kvm_vgic.h |8 +++ arch/arm/kvm/arm.c | 29 + arch/arm/kvm/vgic.c | 90 +++ 3 files changed, 127 insertions(+) diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h index 7229324..6e3d303 100644 --- a/arch/arm/include/asm/kvm_vgic.h +++ b/arch/arm/include/asm/kvm_vgic.h @@ -241,6 +241,8 @@ struct kvm_exit_mmio; int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr); void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu); void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu); +int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num, +bool level); int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu); bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, struct kvm_exit_mmio *mmio); @@ -271,6 +273,12 @@ static inline void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu) {} static inline void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu) {} static inline void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu) {} +static inline int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, + const struct kvm_irq_level *irq) +{ +return 0; +} + static inline int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu) { return 0; diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 3ac1aab..f43da01 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -764,10 +764,31 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level) switch (irq_type) { case KVM_ARM_IRQ_TYPE_CPU: +if (irqchip_in_kernel(kvm)) +return -ENXIO; + if (irq_num > KVM_ARM_IRQ_CPU_FIQ) return -EINVAL; return
vcpu_interrupt_line(vcpu, irq_num, level); +#ifdef CONFIG_KVM_ARM_VGIC +case KVM_ARM_IRQ_TYPE_PPI: +if (!irqchip_in_kernel(kvm)) +return -ENXIO; + +if (irq_num < 16 || irq_num > 31) +return -EINVAL; It's our favourite two numbers again! :) I already fixed a number of them. Probably missed this one though. + +return kvm_vgic_inject_irq(kvm, vcpu->vcpu_id, irq_num, level); +case KVM_ARM_IRQ_TYPE_SPI: +if (!irqchip_in_kernel(kvm)) +return -ENXIO; + +if (irq_num < 32 || irq_num > KVM_ARM_IRQ_GIC_MAX) +return -EINVAL; + +return kvm_vgic_inject_irq(kvm, 0, irq_num, level); +#endif } return -EINVAL; @@ -849,6 +870,14 @@ long kvm_arch_vm_ioctl(struct file *filp, void __user *argp = (void __user *)arg; switch (ioctl) { +#ifdef CONFIG_KVM_ARM_VGIC +case KVM_CREATE_IRQCHIP: { +if (vgic_present) +return kvm_vgic_create(kvm); +else +return -EINVAL; ENXIO? At least, that's what you use when setting the GIC addresses. -EINVAL seems to be one of the values other archs are using. -ENXIO is not one of them for KVM_CREATE_IRQCHIP. Doesn't mean they are right, but for the sake of keeping userspace happy, I'm not really inclined to change this. We don't have user space code relying on this, and EINVAL is misleading, so let's use ENXIO to be consistent with SET_DEVICE_ADDRESS. No error values are specified in the API docs, so we should use the most appropriate one. You fix? -Christoffer
Re: [PATCH v4 09/13] ARM: KVM: VGIC interrupt injection
On 03/12/12 19:13, Christoffer Dall wrote: On Mon, Dec 3, 2012 at 9:21 AM, Marc Zyngier marc.zyng...@arm.com wrote: On 03/12/12 13:25, Will Deacon wrote: On Sat, Nov 10, 2012 at 03:45:18PM +, Christoffer Dall wrote: From: Marc Zyngier marc.zyng...@arm.com Plug the interrupt injection code. Interrupts can now be generated from user space. Signed-off-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/include/asm/kvm_vgic.h |8 +++ arch/arm/kvm/arm.c | 29 + arch/arm/kvm/vgic.c | 90 +++ 3 files changed, 127 insertions(+) diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h index 7229324..6e3d303 100644 --- a/arch/arm/include/asm/kvm_vgic.h +++ b/arch/arm/include/asm/kvm_vgic.h @@ -241,6 +241,8 @@ struct kvm_exit_mmio; int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr); void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu); void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu); +int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num, +bool level); int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu); bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, struct kvm_exit_mmio *mmio); @@ -271,6 +273,12 @@ static inline void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu) {} static inline void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu) {} static inline void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu) {} +static inline int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, + const struct kvm_irq_level *irq) +{ +return 0; +} + static inline int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu) { return 0; diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 3ac1aab..f43da01 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -764,10 +764,31 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level) switch (irq_type) { case KVM_ARM_IRQ_TYPE_CPU: +if (irqchip_in_kernel(kvm)) +return -ENXIO; + if (irq_num > KVM_ARM_IRQ_CPU_FIQ)
return -EINVAL; return vcpu_interrupt_line(vcpu, irq_num, level); +#ifdef CONFIG_KVM_ARM_VGIC +case KVM_ARM_IRQ_TYPE_PPI: +if (!irqchip_in_kernel(kvm)) +return -ENXIO; + +if (irq_num < 16 || irq_num > 31) +return -EINVAL; It's our favourite two numbers again! :) I already fixed a number of them. Probably missed this one though. + +return kvm_vgic_inject_irq(kvm, vcpu->vcpu_id, irq_num, level); +case KVM_ARM_IRQ_TYPE_SPI: +if (!irqchip_in_kernel(kvm)) +return -ENXIO; + +if (irq_num < 32 || irq_num > KVM_ARM_IRQ_GIC_MAX) +return -EINVAL; + +return kvm_vgic_inject_irq(kvm, 0, irq_num, level); +#endif } return -EINVAL; @@ -849,6 +870,14 @@ long kvm_arch_vm_ioctl(struct file *filp, void __user *argp = (void __user *)arg; switch (ioctl) { +#ifdef CONFIG_KVM_ARM_VGIC +case KVM_CREATE_IRQCHIP: { +if (vgic_present) +return kvm_vgic_create(kvm); +else +return -EINVAL; ENXIO? At least, that's what you use when setting the GIC addresses. -EINVAL seems to be one of the values other archs are using. -ENXIO is not one of them for KVM_CREATE_IRQCHIP. Doesn't mean they are right, but for the sake of keeping userspace happy, I'm not really inclined to change this. We don't have user space code relying on this, and EINVAL is misleading, so let's use ENXIO to be consistent with SET_DEVICE_ADDRESS. No error values are specified in the API docs, so we should use the most appropriate one. You fix? Yes, I'll fix it as part of the whole vgic series. M. -- Jazz is not dead. It just smells funny...
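The PPI/SPI range checks being reviewed boil down to a small classifier over the GIC interrupt number space. A hedged sketch follows; KVM_ARM_IRQ_GIC_MAX is given an assumed value here, the real one comes from the KVM ABI headers.

```c
#include <assert.h>

/* Assumed ceiling for the demo; the real value is defined by the KVM ABI. */
#define KVM_ARM_IRQ_GIC_MAX 127

enum irq_class { IRQ_INVALID, IRQ_PPI, IRQ_SPI };

/* Mirrors the validity checks in kvm_vm_ioctl_irq_line() above. */
static enum irq_class classify_gic_irq(int irq_num)
{
	if (irq_num >= 16 && irq_num <= 31)
		return IRQ_PPI;    /* per-cpu: routed via a vcpu_id */
	if (irq_num >= 32 && irq_num <= KVM_ARM_IRQ_GIC_MAX)
		return IRQ_SPI;    /* shared: routed via the distributor */
	return IRQ_INVALID;        /* 0-15 (SGIs) are rejected by this path */
}
```

This makes the reviewers' point concrete: the magic 16/31/32 constants define the PPI and SPI windows, which is why hoisting them into named macros was requested.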
Re: [PATCH 3/3] KVM: x86: improve reexecute_instruction
On Mon, Dec 03, 2012 at 04:33:01PM +0800, Xiao Guangrong wrote: Hi Marcelo, Thanks for your patience. I was reading your reply over and over again, I would like to argue it more :). Please see below. On 11/29/2012 08:21 AM, Marcelo Tosatti wrote: https://lkml.org/lkml/2012/11/17/75 Does unshadowing work with large sptes at reexecute_instruction? That is, do we nuke any large read-only sptes that might be causing a certain gfn to be read-only? That is, following the sequence there, is the large read-only spte properly destroyed? Actually, sptes can not prevent a gfn becoming writable, that means the gfn can become writable *even if* there exists a spte which maps to the gfn. The condition that can prevent a gfn becoming writable is that the gfn has been shadowed: the gfn can not become writable if there exists a sp with sp.gfn = gfn. Note, drop_spte does not remove any sp. So, either destroying the spte or keeping the spte does not have any effect on gfn write-protection. If I am lucky enough and my point is right, the current code can do some optimizations as below: if (level > PT_PAGE_TABLE_LEVEL && - has_wrprotected_page(vcpu->kvm, gfn, level)) { - ret = 1; - drop_spte(vcpu->kvm, sptep); - goto done; - } + has_wrprotected_page(vcpu->kvm, gfn, level)) + return 0; 1): we can return 0 instead of 1 to avoid unnecessary emulation. vcpu will refault again then kvm will use a small page. So on refault the large spte is nuked. That works, yes. 2): need not do any change on the spte.
Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case
On Wed, Nov 28, 2012 at 10:40:56AM +0530, Raghavendra K T wrote: On 11/28/2012 06:42 AM, Marcelo Tosatti wrote: Don't understand the reasoning behind why 3 is a good choice. Here is where I came from. (explaining from scratch for completeness, forgive me :)) In moderate overcommits, we can falsely exit from the ple handler even when we have a preempted task of the same VM waiting on other cpus. To reduce this problem, we try a few times before exiting. The problem boils down to: what is the probability that we exit the ple handler even when we have more than 1 task on other cpus. The theoretical worst case should be around 1.5x overcommit (as also pointed out by Andrew Theurer). [But the practical worst case may be around 2x,3x overcommits as indicated by the results for the patch series] So if p is the probability of finding rq length one on a particular cpu, and if we do n tries, then the probability of exiting the ple handler is: p^(n+1) [ because we would have come across one source with rq length 1 and n target cpu rqs with length 1 ] so:
num tries : probability of aborting ple handler (1.5x overcommit)
1 : 1/4
2 : 1/8
3 : 1/16
We can increase this probability with more tries, but the problem is the overhead. Also, if we have tried three times that means we would have iterated over 3 good eligible vcpus along with many non-eligible candidates. In the worst case if we iterate over all the vcpus, we reduce 1x performance and overcommit performance gets hit. [ as in results ]. I have tried num_tries = 1,2,3 and n already ( not 4 yet). So I concluded 3 is enough. In fact I have also run kernbench and hackbench which are giving 5-20% improvement. [ As a side note, I also thought how about having num_tries = f(n) = ceil ( log(num_online_cpus)/2 ) But I thought the calculation is too much overhead and also there is no point in probably making it dependent on online cpus ] Please let me know if you are happy with this rationale, or correct me if you foresee some problem.
(In fact Avi's and Rik's concern about false exiting made me arrive at the 'try' logic which I did not have earlier). I am currently trying out 1.5x overcommit and will post the result. Raghavendra Makes sense to me. Thanks.
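Raghavendra's p^(n+1) estimate is easy to check numerically. A small sketch, assuming p = 1/2 for the 1.5x overcommit case, which reproduces the 1/4, 1/8, 1/16 figures in the table above:

```c
#include <assert.h>
#include <math.h>

/*
 * Probability that the PLE handler aborts even though eligible vcpus
 * exist elsewhere: p^(n+1), where p is the chance that a cpu's runqueue
 * has length one and n is the number of tries (per the reasoning above).
 */
static double abort_probability(double p, int tries)
{
	double r = p;   /* the source cpu with rq length 1 */

	while (tries--)
		r *= p; /* each try hits another rq of length 1 */

	return r;
}
```

For n = 3 this gives 1/16, i.e. roughly a 6% chance of a false abort, which is the trade-off against the iteration overhead that motivated stopping at 3.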
Re: [RFC PATCH 0/2] mm: Add ability to monitor task's memory changes
On Mon, Dec 03, 2012 at 12:36:33PM +0400, Glauber Costa wrote: On 11/30/2012 09:55 PM, Pavel Emelyanov wrote: Hello, This is an attempt to implement support for memory snapshot for the checkpoint-restore project (http://criu.org). To create a dump of an application(s) we save all the information about it to files. No surprise, the biggest part of such dump is the contents of tasks' memory. However, in some usage scenarios it's not required to get _all_ the task memory while creating a dump. For example, when doing periodical dumps it's required to take a full memory dump only at the first step and then take incremental changes of memory. Another example is live migration. In the simplest form it looks like -- create dump, copy it on the remote node then restore tasks from dump files. While all this dump-copy-restore thing goes on, all the processes must be stopped. However, if we can monitor how tasks change their memory, we can dump and copy it in smaller chunks, periodically updating it and thus freezing tasks only at the very end for the very short time to pick up the recent changes. That said, some help from kernel to watch how processes modify the contents of their memory is required. I'd like to propose one possible solution of this task -- with the help of page-faults and trace events. Briefly the approach is -- remap some memory regions as read-only, get the #pf on task's attempt to modify the memory and issue a trace event of that. Since we're only interested in parts of memory of some tasks, make it possible to mark the vmas we're interested in and issue events for them only. Also, to be aware of tasks unmapping the vma-s being watched, also issue an event when the marked vma is removed (and for symmetry -- an event when a vma is marked). What do you think about this approach? Is this way of supporting mem snapshot OK for you, or should we invent some better one?
The page fault mechanism is pretty obvious - anything that deals with dirty pages will end up having to do this. So there is nothing crazy about this. What concerns me, however, is that should this go in, we'll have two dirty mem loggers in the kernel: one to support CRIU, one to support KVM. And the worst part: They have the exact same purpose!! So to begin with, I think one thing to consider would be to generalize KVM's dirty memory notification so it can work on a normal process memory region. The KVM api requires a memory slot to be passed, something we are unlikely to have. But KVM can easily keep its API and use an alternate mechanism, that's trivial... Generally speaking, KVM will do polling with this ioctl. I prefer your tracing mechanism better. The only difference is that KVM tends to transfer large chunks of memory in some loads - in the high gigs range. So the proposed tracing API should be able to optionally batch requests within a time frame. It would also be good to hear what the KVM guys think of it as well. There are significant differences. KVM's dirty logging works for guest translations (NPT/shadow) and is optimized for specific use cases. Above is about dirty logging of userspace memory areas.
[PATCH] KVM: MMU: optimize for set_spte
There are two cases where we need to adjust the page size in set_spte: 1) another vcpu creates a new sp in the window between mapping_level() and acquiring the mmu-lock; 2) the new sp is created by the faulting vcpu itself (page-fault path) because the guest uses the target gfn as its page table. In the current code, set_spte drops the spte and emulates the access in these cases, which does not work well: - in case 1, it may destroy the mapping established by the other vcpu and do expensive instruction emulation; - in case 2, it may emulate the access even when the guest is not accessing the page used as a page table. For example, if 0~2M is a huge page in the guest and only page 3 inside it is used as a page table, guest reads/writes on the other pages still cause instruction emulation. Both cases can be fixed by allowing the guest to retry the access: it will refault, and then we can establish the mapping by using a small page. Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
---
 arch/x86/kvm/mmu.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b875a9e..01d7c2a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2382,12 +2382,20 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		    || (!vcpu->arch.mmu.direct_map && write_fault
 			&& !is_write_protection(vcpu) && !user_fault)) {
 
+		/*
+		 * There are two cases:
+		 * - one is when another vcpu creates a new sp in the window
+		 *   between mapping_level() and acquiring mmu-lock.
+		 * - the other is when the new sp is created by the fault
+		 *   itself (page-fault path) because the guest uses the
+		 *   target gfn as its page table.
+		 * Both cases can be fixed by allowing the guest to retry
+		 * the access; it will refault, then we can establish the
+		 * mapping by using a small page.
+		 */
 		if (level > PT_PAGE_TABLE_LEVEL &&
-		    has_wrprotected_page(vcpu->kvm, gfn, level)) {
-			ret = 1;
-			drop_spte(vcpu->kvm, sptep);
+		    has_wrprotected_page(vcpu->kvm, gfn, level))
 			goto done;
-		}
 
 		spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
-- 
1.7.7.6
[RFC PATCH 5/6] kvm: Re-introduce memslots->nmemslots
struct kvm_memory_slot is currently 52 bytes (LP64), not counting the arch data. On x86 this means the memslot array to support a tiny 32+3 entries (user+private) is over 2k. We'd like to support more slots so that we can support more assigned devices, but it doesn't make sense to penalize everyone by using a statically allocated array. This allows us to start introducing a growable array. Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 arch/ia64/kvm/kvm-ia64.c     |    2 +-
 arch/powerpc/kvm/book3s_hv.c |    2 +-
 arch/x86/kvm/vmx.c           |    1 +
 arch/x86/kvm/x86.c           |    4 +++-
 include/linux/kvm_host.h     |    9 ++++++---
 virt/kvm/kvm_main.c          |   10 ++++++----
 6 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 012e5dd..96401b5 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1836,7 +1836,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	memslot = id_to_memslot(kvm->memslots, log->slot);
 	r = -ENOENT;
-	if (!memslot->dirty_bitmap)
+	if (!memslot || !memslot->dirty_bitmap)
 		goto out;
 
 	kvm_ia64_sync_dirty_log(kvm, memslot);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 56067db..0417190 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1267,7 +1267,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 	memslot = id_to_memslot(kvm->memslots, log->slot);
 	r = -ENOENT;
-	if (!memslot->dirty_bitmap)
+	if (!memslot || !memslot->dirty_bitmap)
 		goto out;
 
 	n = kvm_dirty_bitmap_bytes(memslot);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2bb9157..07fdd90 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2751,6 +2751,7 @@ static gva_t rmode_tss_base(struct kvm *kvm)
 	slots = kvm_memslots(kvm);
 	slot = id_to_memslot(slots, KVM_PRIVATE_MEM_SLOTS);
+	BUG_ON(!slot);
 	base_gfn = slot->base_gfn + slot->npages - 3;
 
 	return base_gfn << PAGE_SHIFT;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8765485..53fe9b2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3139,9 +3139,11 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 		goto out;
 
 	memslot = id_to_memslot(kvm->memslots, log->slot);
+	r = -ENOENT;
+	if (!memslot)
+		goto out;
 	dirty_bitmap = memslot->dirty_bitmap;
-	r = -ENOENT;
 	if (!dirty_bitmap)
 		goto out;
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7b3d5c4..1955a4e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -313,6 +313,7 @@ struct kvm_irq_routing_table {};
  * to get the memslot by its id.
  */
 struct kvm_memslots {
+	int nmemslots;
 	u64 generation;
 	struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
 };
@@ -397,7 +398,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
 
 #define kvm_for_each_memslot(memslot, slots)	\
 	for (memslot = &slots->memslots[0];	\
-	     memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\
+	     memslot < slots->memslots + slots->nmemslots && memslot->npages;\
 		memslot++)
 
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id);
@@ -424,10 +425,12 @@ static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
 static inline struct kvm_memory_slot *
 id_to_memslot(struct kvm_memslots *slots, int id)
 {
-	int index = slots->memslots[id].id_to_index;
 	struct kvm_memory_slot *slot;
 
-	slot = &slots->memslots[index];
+	if (id >= slots->nmemslots)
+		return NULL;
+
+	slot = &slots->memslots[slots->memslots[id].id_to_index];
 
 	WARN_ON(slot->id != id);
 	return slot;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3ce2664..ebd3960 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -444,7 +444,9 @@ static void kvm_init_memslots_id(struct kvm *kvm)
 	int i;
 	struct kvm_memslots *slots = kvm->memslots;
 
-	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
+	slots->nmemslots = KVM_MEM_SLOTS_NUM;
+
+	for (i = 0; i < kvm->memslots->nmemslots; i++)
 		slots->memslots[i].id_to_index = slots->memslots[i].id = i;
 }
 
@@ -658,10 +660,10 @@ static void sort_memslots(struct kvm_memslots *slots)
 {
 	int i;
 
-	sort(slots->memslots, KVM_MEM_SLOTS_NUM,
+	sort(slots->memslots, slots->nmemslots,
 	      sizeof(struct kvm_memory_slot), cmp_memslot, NULL);
 
-	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
+	for (i = 0; i < slots->nmemslots; i++)
 		slots->memslots[slots->memslots[i].id].id_to_index = i;
 }
 
@@ -898,7 +900,7 @@ int kvm_get_dirty_log(struct kvm
[RFC PATCH 6/6] kvm: Allow memory slots to grow
Start with zero and grow up to KVM_MEM_SLOTS_NUM. A modest guest without device assignment likely uses around 1/4 of the total entries. We don't attempt to shrink the array when slots are released. Both x86 and powerpc still have some statically sized elements elsewhere, but this covers the bulk of the memory used. Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 include/linux/kvm_host.h |    2 +-
 virt/kvm/kvm_main.c      |   62 ++++++++++++++++++++++++++++++---------------
 2 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1955a4e..effc800 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -315,7 +315,7 @@ struct kvm_irq_routing_table {};
 struct kvm_memslots {
 	int nmemslots;
 	u64 generation;
-	struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
+	struct kvm_memory_slot memslots[];
 };
 
 struct kvm {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ebd3960..fa4df50 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -439,17 +439,6 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
 #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
 
-static void kvm_init_memslots_id(struct kvm *kvm)
-{
-	int i;
-	struct kvm_memslots *slots = kvm->memslots;
-
-	slots->nmemslots = KVM_MEM_SLOTS_NUM;
-
-	for (i = 0; i < kvm->memslots->nmemslots; i++)
-		slots->memslots[i].id_to_index = slots->memslots[i].id = i;
-}
-
 static struct kvm *kvm_create_vm(unsigned long type)
 {
 	int r, i;
@@ -475,7 +464,6 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
 	if (!kvm->memslots)
 		goto out_err_nosrcu;
-	kvm_init_memslots_id(kvm);
 	if (init_srcu_struct(&kvm->srcu))
 		goto out_err_nosrcu;
 	for (i = 0; i < KVM_NR_BUSES; i++) {
@@ -696,6 +684,40 @@ static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
 	return 0;
 }
 
+struct kvm_memslots *__kvm_dup_and_grow_memslots(struct kvm_memslots *oldslots,
+						 int slot)
+{
+	int nmemslots;
+	struct kvm_memslots *newslots;
+
+	nmemslots = (slot >= oldslots->nmemslots) ?
+		    slot + 1 : oldslots->nmemslots;
+
+	newslots = kmalloc(sizeof(struct kvm_memslots) +
+			   nmemslots * sizeof(struct kvm_memory_slot),
+			   GFP_KERNEL);
+	if (!newslots)
+		return NULL;
+
+	memcpy(newslots, oldslots, sizeof(struct kvm_memslots) +
+	       oldslots->nmemslots * sizeof(struct kvm_memory_slot));
+
+	if (nmemslots != oldslots->nmemslots) {
+		int i;
+		memset(&newslots->memslots[oldslots->nmemslots], 0,
+		       (nmemslots - oldslots->nmemslots) *
+		       sizeof(struct kvm_memory_slot));
+
+		for (i = oldslots->nmemslots; i < nmemslots; i++) {
+			newslots->memslots[i].id_to_index = i;
+			newslots->memslots[i].id = i;
+		}
+
+		newslots->nmemslots = nmemslots;
+	}
+
+	return newslots;
+}
+
 /*
  * Allocate some memory and give it an address in the guest physical address
  * space.
@@ -711,8 +733,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	int r;
 	gfn_t base_gfn;
 	unsigned long npages;
-	struct kvm_memory_slot *memslot, *slot;
-	struct kvm_memory_slot old, new;
+	struct kvm_memory_slot *memslot = NULL, *slot;
+	struct kvm_memory_slot old = {}, new = {};
 	struct kvm_memslots *slots, *old_memslots;
 
 	r = check_memory_region_flags(mem);
@@ -737,7 +759,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
 		goto out;
 
-	memslot = id_to_memslot(kvm->memslots, mem->slot);
 	base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
 	npages = mem->memory_size >> PAGE_SHIFT;
 
@@ -748,7 +769,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	if (!npages)
 		mem->flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
 
-	new = old = *memslot;
+	if (mem->slot < kvm->memslots->nmemslots) {
+		memslot = id_to_memslot(kvm->memslots, mem->slot);
+		new = old = *memslot;
+	}
 
 	new.id = mem->slot;
 	new.base_gfn = base_gfn;
@@ -796,8 +820,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		struct kvm_memory_slot *slot;
 
 		r = -ENOMEM;
-		slots = kmemdup(kvm->memslots, sizeof(struct kvm_memslots),
-				GFP_KERNEL);
+		slots = __kvm_dup_and_grow_memslots(kvm->memslots, mem->slot);
 		if (!slots)
 			goto out_free;
 		slot = id_to_memslot(slots, mem->slot);
@@ -832,8 +855,7 @@ int
[RFC PATCH 0/6] kvm: Growable memory slot array
Memory slots are currently a fixed resource with a relatively small limit. When using PCI device assignment in a qemu guest it's fairly easy to exhaust the number of available slots. I posted patches exploring growing the number of memory slots a while ago, but it was prior to caching memory slot array misses and therefore had potentially poor performance. Now that we do that, Avi seemed receptive to increasing the memory slot array to arbitrary lengths. I think we still don't want to impose unnecessary kernel memory consumption on guests not making use of this, so I present again a growable memory slot array. A couple of notes/questions: in the previous version we had a kvm_arch_flush_shadow() call when we increased the number of slots, and I'm not sure if this is still necessary. I had also made the x86-specific slot_bitmap dynamically grow as well and switch between a direct bitmap and an indirect pointer to a bitmap; that may have contributed to needing the flush. I haven't done that yet here because it seems like an unnecessary complication if we have a max on the order of 512 or 1024 entries. A bit per slot isn't a lot of overhead. If we want to go higher, maybe we should make it switch. That leads to the final question: since this does allow consumption of extra kernel memory, we need an upper bound. What should it be? A PCI bus filled with assigned devices can theoretically use up to 2048 slots (32 devices * 8 functions * (6 BARs + ROM + possibly split MSI-X BAR)). For this RFC, I don't change the max, just make it grow up to 32 user slots. Untested on anything but x86 so far.
Thanks,

Alex

---

Alex Williamson (6):
      kvm: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS
      kvm: Make KVM_PRIVATE_MEM_SLOTS optional
      kvm: Merge id_to_index into memslots
      kvm: Move private memory slots to start of memslots array
      kvm: Re-introduce memslots->nmemslots
      kvm: Allow memory slots to grow

 arch/ia64/include/asm/kvm_host.h    |    4 --
 arch/ia64/kvm/kvm-ia64.c            |    6 +--
 arch/powerpc/include/asm/kvm_host.h |    6 +--
 arch/powerpc/kvm/book3s_hv.c        |    4 +-
 arch/s390/include/asm/kvm_host.h    |    4 --
 arch/x86/include/asm/kvm_host.h     |    8 ++-
 arch/x86/include/asm/vmx.h          |    6 +--
 arch/x86/kvm/vmx.c                  |    3 +
 arch/x86/kvm/x86.c                  |   10 +++-
 include/linux/kvm_host.h            |   20 ++++++---
 virt/kvm/kvm_main.c                 |   83 ++++++++++++++++++-----------
 11 files changed, 93 insertions(+), 61 deletions(-)
[RFC PATCH 1/6] kvm: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS
It's easy to confuse KVM_MEMORY_SLOTS and KVM_MEM_SLOTS_NUM. One is the user accessible slots and the other is user + private. Make this more obvious. Signed-off-by: Alex Williamson alex.william...@redhat.com --- arch/ia64/include/asm/kvm_host.h|2 +- arch/ia64/kvm/kvm-ia64.c|2 +- arch/powerpc/include/asm/kvm_host.h |4 ++-- arch/powerpc/kvm/book3s_hv.c|2 +- arch/s390/include/asm/kvm_host.h|2 +- arch/x86/include/asm/kvm_host.h |4 ++-- arch/x86/include/asm/vmx.h |6 +++--- arch/x86/kvm/x86.c |6 +++--- include/linux/kvm_host.h|2 +- virt/kvm/kvm_main.c |8 10 files changed, 19 insertions(+), 19 deletions(-) diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h index 6d6a5ac..48d7b0e 100644 --- a/arch/ia64/include/asm/kvm_host.h +++ b/arch/ia64/include/asm/kvm_host.h @@ -23,7 +23,7 @@ #ifndef __ASM_KVM_HOST_H #define __ASM_KVM_HOST_H -#define KVM_MEMORY_SLOTS 32 +#define KVM_USER_MEM_SLOTS 32 /* memory slots that does not exposed to userspace */ #define KVM_PRIVATE_MEM_SLOTS 4 diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index 8b3a9c0..f1a46bd 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -1831,7 +1831,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, mutex_lock(kvm-slots_lock); r = -EINVAL; - if (log-slot = KVM_MEMORY_SLOTS) + if (log-slot = KVM_USER_MEM_SLOTS) goto out; memslot = id_to_memslot(kvm-memslots, log-slot); diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 28e8f5e..5eb1dd8 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -37,10 +37,10 @@ #define KVM_MAX_VCPUS NR_CPUS #define KVM_MAX_VCORES NR_CPUS -#define KVM_MEMORY_SLOTS 32 +#define KVM_USER_MEM_SLOTS 32 /* memory slots that does not exposed to userspace */ #define KVM_PRIVATE_MEM_SLOTS 4 -#define KVM_MEM_SLOTS_NUM (KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS) +#define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS) #ifdef 
CONFIG_KVM_MMIO #define KVM_COALESCED_MMIO_PAGE_OFFSET 1 diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 721d460..75ce80e 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1262,7 +1262,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) mutex_lock(kvm-slots_lock); r = -EINVAL; - if (log-slot = KVM_MEMORY_SLOTS) + if (log-slot = KVM_USER_MEM_SLOTS) goto out; memslot = id_to_memslot(kvm-memslots, log-slot); diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h index b784154..ac33432 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -20,7 +20,7 @@ #include asm/cpu.h #define KVM_MAX_VCPUS 64 -#define KVM_MEMORY_SLOTS 32 +#define KVM_USER_MEM_SLOTS 32 /* memory slots that does not exposed to userspace */ #define KVM_PRIVATE_MEM_SLOTS 4 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index b2e11f4..e619519 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -31,10 +31,10 @@ #define KVM_MAX_VCPUS 254 #define KVM_SOFT_MAX_VCPUS 160 -#define KVM_MEMORY_SLOTS 32 +#define KVM_USER_MEM_SLOTS 32 /* memory slots that does not exposed to userspace */ #define KVM_PRIVATE_MEM_SLOTS 4 -#define KVM_MEM_SLOTS_NUM (KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS) +#define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS) #define KVM_MMIO_SIZE 16 diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 36ec21c..72932d2 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -427,9 +427,9 @@ enum vmcs_field { #define AR_RESERVD_MASK 0xfffe0f00 -#define TSS_PRIVATE_MEMSLOT(KVM_MEMORY_SLOTS + 0) -#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT (KVM_MEMORY_SLOTS + 1) -#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT (KVM_MEMORY_SLOTS + 2) +#define TSS_PRIVATE_MEMSLOT(KVM_USER_MEM_SLOTS + 0) +#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT 
(KVM_USER_MEM_SLOTS + 1) +#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT (KVM_USER_MEM_SLOTS + 2) #define VMX_NR_VPIDS (1 16) #define VMX_VPID_EXTENT_SINGLE_CONTEXT 1 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4f76417..1aa3fae 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2204,7 +2204,7 @@ int kvm_dev_ioctl_check_extension(long ext) r = KVM_MAX_VCPUS; break; case KVM_CAP_NR_MEMSLOTS: - r = KVM_MEMORY_SLOTS; + r = KVM_USER_MEM_SLOTS; break; case KVM_CAP_PV_MMU:/* obsolete */
[RFC PATCH 2/6] kvm: Make KVM_PRIVATE_MEM_SLOTS optional
Seems like everyone copied x86 and defined 4 private memory slots that never actually get used. Even x86 only uses 3 of the 4. These aren't exposed so there's no need to add padding. Signed-off-by: Alex Williamson alex.william...@redhat.com --- arch/ia64/include/asm/kvm_host.h|2 -- arch/powerpc/include/asm/kvm_host.h |4 +--- arch/s390/include/asm/kvm_host.h|2 -- arch/x86/include/asm/kvm_host.h |4 ++-- include/linux/kvm_host.h|4 5 files changed, 7 insertions(+), 9 deletions(-) diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h index 48d7b0e..cfa7498 100644 --- a/arch/ia64/include/asm/kvm_host.h +++ b/arch/ia64/include/asm/kvm_host.h @@ -24,8 +24,6 @@ #define __ASM_KVM_HOST_H #define KVM_USER_MEM_SLOTS 32 -/* memory slots that does not exposed to userspace */ -#define KVM_PRIVATE_MEM_SLOTS 4 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1 diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 5eb1dd8..23ca70d 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -38,9 +38,7 @@ #define KVM_MAX_VCPUS NR_CPUS #define KVM_MAX_VCORES NR_CPUS #define KVM_USER_MEM_SLOTS 32 -/* memory slots that does not exposed to userspace */ -#define KVM_PRIVATE_MEM_SLOTS 4 -#define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS) +#define KVM_MEM_SLOTS_NUM KVM_USER_MEM_SLOTS #ifdef CONFIG_KVM_MMIO #define KVM_COALESCED_MMIO_PAGE_OFFSET 1 diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h index ac33432..711c5ab 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -21,8 +21,6 @@ #define KVM_MAX_VCPUS 64 #define KVM_USER_MEM_SLOTS 32 -/* memory slots that does not exposed to userspace */ -#define KVM_PRIVATE_MEM_SLOTS 4 struct sca_entry { atomic_t scn; diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index e619519..ce8b037 100644 --- a/arch/x86/include/asm/kvm_host.h +++ 
b/arch/x86/include/asm/kvm_host.h @@ -32,8 +32,8 @@ #define KVM_MAX_VCPUS 254 #define KVM_SOFT_MAX_VCPUS 160 #define KVM_USER_MEM_SLOTS 32 -/* memory slots that does not exposed to userspace */ -#define KVM_PRIVATE_MEM_SLOTS 4 +/* memory slots that are not exposed to userspace */ +#define KVM_PRIVATE_MEM_SLOTS 3 #define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS) #define KVM_MMIO_SIZE 16 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index fb9354d..bf8380f 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -298,6 +298,10 @@ struct kvm_irq_routing_table {}; #endif +#ifndef KVM_PRIVATE_MEM_SLOTS +#define KVM_PRIVATE_MEM_SLOTS 0 +#endif + #ifndef KVM_MEM_SLOTS_NUM #define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS) #endif -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH 3/6] kvm: Merge id_to_index into memslots
This allows us to resize this structure and therefore the number of memslots as part of the RCU update. Signed-off-by: Alex Williamson alex.william...@redhat.com --- include/linux/kvm_host.h |5 ++--- virt/kvm/kvm_main.c |4 ++-- 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index bf8380f..7b3d5c4 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -257,6 +257,7 @@ struct kvm_memory_slot { unsigned long userspace_addr; int user_alloc; int id; + int id_to_index; }; static inline unsigned long kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot) @@ -314,8 +315,6 @@ struct kvm_irq_routing_table {}; struct kvm_memslots { u64 generation; struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM]; - /* The mapping table from slot id to the index in memslots[]. */ - int id_to_index[KVM_MEM_SLOTS_NUM]; }; struct kvm { @@ -425,7 +424,7 @@ static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm) static inline struct kvm_memory_slot * id_to_memslot(struct kvm_memslots *slots, int id) { - int index = slots-id_to_index[id]; + int index = slots-memslots[id].id_to_index; struct kvm_memory_slot *slot; slot = slots-memslots[index]; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 48d3ede..4f8ae4b 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -445,7 +445,7 @@ static void kvm_init_memslots_id(struct kvm *kvm) struct kvm_memslots *slots = kvm-memslots; for (i = 0; i KVM_MEM_SLOTS_NUM; i++) - slots-id_to_index[i] = slots-memslots[i].id = i; + slots-memslots[i].id_to_index = slots-memslots[i].id = i; } static struct kvm *kvm_create_vm(unsigned long type) @@ -662,7 +662,7 @@ static void sort_memslots(struct kvm_memslots *slots) sizeof(struct kvm_memory_slot), cmp_memslot, NULL); for (i = 0; i KVM_MEM_SLOTS_NUM; i++) - slots-id_to_index[slots-memslots[i].id] = i; + slots-memslots[slots-memslots[i].id].id_to_index = i; } void update_memslots(struct kvm_memslots *slots, 
struct kvm_memory_slot *new)
[RFC PATCH 4/6] kvm: Move private memory slots to start of memslots array
In order to make the memslots array grow on demand, move the private slots to the lower indexes of the array. The private slots are assumed likely to be in use, so if we didn't do this we'd end up allocating the full memslots array all the time. Signed-off-by: Alex Williamson alex.william...@redhat.com --- arch/ia64/kvm/kvm-ia64.c |4 ++-- arch/powerpc/kvm/book3s_hv.c |2 +- arch/x86/include/asm/vmx.h |6 +++--- arch/x86/kvm/vmx.c |2 +- arch/x86/kvm/x86.c |2 +- virt/kvm/kvm_main.c | 15 ++- 6 files changed, 18 insertions(+), 13 deletions(-) diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index f1a46bd..012e5dd 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -949,7 +949,7 @@ long kvm_arch_vm_ioctl(struct file *filp, r = -EFAULT; if (copy_from_user(kvm_mem, argp, sizeof kvm_mem)) goto out; - kvm_userspace_mem.slot = kvm_mem.slot; + kvm_userspace_mem.slot = kvm_mem.slot + KVM_PRIVATE_MEM_SLOTS; kvm_userspace_mem.flags = kvm_mem.flags; kvm_userspace_mem.guest_phys_addr = kvm_mem.guest_phys_addr; @@ -1831,7 +1831,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, mutex_lock(kvm-slots_lock); r = -EINVAL; - if (log-slot = KVM_USER_MEM_SLOTS) + if (log-slot = KVM_MEM_SLOTS_NUM) goto out; memslot = id_to_memslot(kvm-memslots, log-slot); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 75ce80e..56067db 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1262,7 +1262,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) mutex_lock(kvm-slots_lock); r = -EINVAL; - if (log-slot = KVM_USER_MEM_SLOTS) + if (log-slot = KVM_MEM_SLOTS_NUM) goto out; memslot = id_to_memslot(kvm-memslots, log-slot); diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 72932d2..97bcd7d 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -427,9 +427,9 @@ enum vmcs_field { #define AR_RESERVD_MASK 0xfffe0f00 -#define 
TSS_PRIVATE_MEMSLOT(KVM_USER_MEM_SLOTS + 0) -#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT (KVM_USER_MEM_SLOTS + 1) -#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT (KVM_USER_MEM_SLOTS + 2) +#define TSS_PRIVATE_MEMSLOT0 +#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT 1 +#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT 2 #define VMX_NR_VPIDS (1 16) #define VMX_VPID_EXTENT_SINGLE_CONTEXT 1 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index f858159..2bb9157 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2750,7 +2750,7 @@ static gva_t rmode_tss_base(struct kvm *kvm) gfn_t base_gfn; slots = kvm_memslots(kvm); - slot = id_to_memslot(slots, 0); + slot = id_to_memslot(slots, KVM_PRIVATE_MEM_SLOTS); base_gfn = slot-base_gfn + slot-npages - 3; return base_gfn PAGE_SHIFT; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1aa3fae..8765485 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6449,7 +6449,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, int map_flags = MAP_PRIVATE | MAP_ANONYMOUS; /* Prevent internal slot pages from being moved by fork()/COW. 
*/ - if (memslot-id = KVM_USER_MEM_SLOTS) + if (memslot-id KVM_PRIVATE_MEM_SLOTS) map_flags = MAP_SHARED | MAP_ANONYMOUS; /*To keep backward compatibility with older userspace, diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 4f8ae4b..3ce2664 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -761,7 +761,7 @@ int __kvm_set_memory_region(struct kvm *kvm, /* Check for overlaps */ r = -EEXIST; kvm_for_each_memslot(slot, kvm-memslots) { - if (slot-id = KVM_USER_MEM_SLOTS || slot == memslot) + if (slot-id KVM_PRIVATE_MEM_SLOTS || slot == memslot) continue; if (!((base_gfn + npages = slot-base_gfn) || (base_gfn = slot-base_gfn + slot-npages))) @@ -879,7 +879,7 @@ int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, kvm_userspace_memory_region *mem, int user_alloc) { - if (mem-slot = KVM_USER_MEM_SLOTS) + if (mem-slot = KVM_MEM_SLOTS_NUM) return -EINVAL; return kvm_set_memory_region(kvm, mem, user_alloc); } @@ -893,7 +893,7 @@ int kvm_get_dirty_log(struct kvm *kvm, unsigned long any = 0; r = -EINVAL; - if (log-slot = KVM_USER_MEM_SLOTS) + if
Re: [net-next rfc v7 3/3] virtio-net: change the number of queues through ethtool
On Mon, 2012-12-03 at 13:25 +0200, Michael S. Tsirkin wrote: On Mon, Dec 03, 2012 at 02:09:28PM +0800, Jason Wang wrote: On Sunday, December 02, 2012 06:09:06 PM Michael S. Tsirkin wrote: On Tue, Nov 27, 2012 at 06:16:00PM +0800, Jason Wang wrote: This patch implements the {set|get}_channels methods of ethtool to allow the user to change the number of queues dynamically when the device is running. This would let the user configure it on demand. Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/virtio_net.c |   41 +++++++++++++++++++++++++++++++++++++++++
 1 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index bcaa6e5..f08ec2a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1578,10 +1578,51 @@ static struct virtio_driver virtio_net_driver = {
 #endif
 };
 
+/* TODO: Eliminate OOO packets during switching */
+static int virtnet_set_channels(struct net_device *dev,
+				struct ethtool_channels *channels)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	u16 queue_pairs = channels->combined_count;

By the way, shouldn't this be combined_count / 2? And below, channels->max_combined = vi->max_queue_pairs * 2;? [...]

In this ethtool API, 'channel' means an IRQ and the set of queues that trigger it. So each ethtool channel will correspond to one queue pair and not one virtio channel.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
Re: [RFC PATCH 1/1] KVM: ARM: add vgic state save and restore support
Peter Maydell peter.mayd...@linaro.org writes: On 3 December 2012 10:36, Dong Aisheng b29...@freescale.com wrote: and via the current ONE_REG interface we do not know which CPU is performing the register access, so the banked registers are not supported well. Actually you do, because it's a vcpu ioctl. That does raise a different question, though. ONE_REG is currently aimed as a vcpu ioctl for CPU state save/restore -- how does it need to change to handle device state save/restore where the device is not per-cpu? Good question. I'd prefer to stretch the existing interface rather than create an identical one for non-per-cpu resources, but I could be swayed if there were many other cases. The simplest method is to give it a new type and mirror the non-per-cpu regs on all vcpus. Then a completely naive save/restore implementation will Just Work, with redundant data. Or we could pick a CPU, but that means your discovery logic becomes more complex unless you know which CPU it is (lowest number?). Cheers, Rusty.
[PATCH] Maybe avoid a IPI between the cpu cores
Inject the interrupt to the vcpu whose vcpu->cpu is the same as the current cpu. This may avoid an IPI between cpu cores.

Signed-off-by: Yi Li yiliker...@gmail.com
---
--- linux/virt/kvm/irq_comm.c	2012-12-04 10:14:57.711024619 +0800
+++ linux/virt/kvm/irq_comm.c	2012-12-04 11:01:27.728859597 +0800
@@ -64,9 +64,10 @@ inline static bool kvm_is_dm_lowest_prio
 int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq)
 {
-	int i, r = -1;
+	int i, cpu, r = -1;
 	struct kvm_vcpu *vcpu, *lowest = NULL;
 
+	cpu = get_cpu();
 	if (irq->dest_mode == 0 && irq->dest_id == 0xff &&
 			kvm_is_dm_lowest_prio(irq)) {
 		printk(KERN_INFO "kvm: apic: phys broadcast and lowest prio\n");
@@ -89,13 +90,17 @@ int kvm_irq_delivery_to_apic(struct kvm
 				r = 0;
 			r += kvm_apic_set_irq(vcpu, irq);
 		} else if (kvm_lapic_enabled(vcpu)) {
-			if (!lowest)
+			if (vcpu->cpu == cpu) {
+				lowest = vcpu;
+				break;
+			} else if (!lowest)
 				lowest = vcpu;
 			else if (kvm_apic_compare_prio(vcpu, lowest) < 0)
 				lowest = vcpu;
 		}
 	}
-
+	put_cpu();
 	if (lowest)
 		r = kvm_apic_set_irq(lowest, irq);

YiLi

Thanks
Re: [net-next rfc v7 1/3] virtio-net: separate fields of sending/receiving queue from virtnet_info
Jason Wang jasow...@redhat.com writes: On Monday, December 03, 2012 12:25:42 PM Rusty Russell wrote:
+
+	/* Work struct for refilling if we run low on memory. */
+	struct delayed_work refill;
I can't really see the justification for a refill per queue. Just have one work iterate all the queues if it happens, unless it happens often (in which case, we need to look harder at this anyway). But during this kind of iteration, we may need to enable/disable the napi regardless of whether the receive queue has lots to be refilled. This may add extra latency. Sure, but does it actually happen? We only use the work when we run out of memory. If this happens in normal behaviour we need to change something else... Thanks, Rusty.
Re: [PATCH] Maybe avoid a IPI between the cpu cores
Maybe the patch risks breaking the lapic delivery logic; it just tries to avoid an IPI between cpu cores as soon as possible. Thanks YiLi 2012/12/4 yi li yiliker...@gmail.com: The cpu injects the interrupt into a vcpu whose vcpu->cpu matches the current cpu, which may avoid an IPI between cpu cores. Signed-off-by: Yi Li yiliker...@gmail.com --- linux/virt/kvm/irq_comm.c 2012-12-04 10:14:57.711024619 +0800 +++ linux/virt/kvm/irq_comm.c 2012-12-04 11:01:27.728859597 +0800 @@ -64,9 +64,10 @@ inline static bool kvm_is_dm_lowest_prio int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, struct kvm_lapic_irq *irq) { - int i, r = -1; + int i, cpu, r = -1; struct kvm_vcpu *vcpu, *lowest = NULL; + cpu = get_cpu(); if (irq->dest_mode == 0 && irq->dest_id == 0xff && kvm_is_dm_lowest_prio(irq)) { printk(KERN_INFO "kvm: apic: phys broadcast and lowest prio\n"); @@ -89,13 +90,17 @@ int kvm_irq_delivery_to_apic(struct kvm r = 0; r += kvm_apic_set_irq(vcpu, irq); } else if (kvm_lapic_enabled(vcpu)) { - if (!lowest) + if (vcpu->cpu == cpu) { + lowest = vcpu; + break; + } + else if (!lowest) lowest = vcpu; else if (kvm_apic_compare_prio(vcpu, lowest) < 0) lowest = vcpu; } } - + put_cpu(); if (lowest) r = kvm_apic_set_irq(lowest, irq); YiLi Thanks
RE: [PATCH v3 1/4] x86: PIT connects to pin 2 of IOAPIC
Gleb Natapov wrote on 2012-12-03: On Mon, Dec 03, 2012 at 03:01:01PM +0800, Yang Zhang wrote: When PIT connects to IOAPIC, it routes to pin 2, not pin 0. This hack didn't work for a long time and nobody complained. It should be safe to get rid of it. This hack is used to work around an issue with RHEL4. Not sure whether anyone still uses RHEL4. Signed-off-by: Yang Zhang yang.z.zh...@intel.com --- virt/kvm/ioapic.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index cfb7e4d..166c450 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -181,7 +181,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq) #ifdef CONFIG_X86 /* Always delivery PIT interrupt to vcpu 0 */ -if (irq == 0) { +if (irq == 2) { irqe.dest_mode = 0; /* Physical mode. */ /* need to read apic_id from apic register since * it can be rewritten */ -- 1.7.1 -- Gleb. Best regards, Yang
[PATCH 0/2] Remove unused bits checking for EPT logic.
From: Zhang Xiantao xiantao.zh...@intel.com These two bits (bit 6 in exit_qualification, and bit 24 in VMX_EPT_VPID_CAP_MSR) are not defined in the SDM, so remove them for consistency. Zhang Xiantao (2): kvm: remove unnecessary bit checking for ept violation kvm: don't use bit24 for detecting address-specific invalidation capability arch/x86/include/asm/vmx.h |3 +-- arch/x86/kvm/vmx.c | 21 - 2 files changed, 1 insertions(+), 23 deletions(-)
[PATCH 1/2] kvm: remove unnecessary bit checking for ept violation
From: Zhang Xiantao xiantao.zh...@intel.com Bit 6 in EPT vmexit's exit qualification is not defined in SDM, so remove it. Signed-off-by: Zhang Xiantao xiantao.zh...@intel.com --- arch/x86/kvm/vmx.c |5 - 1 files changed, 0 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index ad6b1dd..d077996 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4827,11 +4827,6 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu) exit_qualification = vmcs_readl(EXIT_QUALIFICATION); - if (exit_qualification & (1 << 6)) { - printk(KERN_ERR "EPT: GPA exceeds GAW!\n"); - return -EINVAL; - } - gla_validity = (exit_qualification >> 7) & 0x3; if (gla_validity != 0x3 && gla_validity != 0x1 && gla_validity != 0) { printk(KERN_ERR "EPT: Handling EPT violation failed!\n"); -- 1.7.1
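The bit tests in the hunk above can be checked in isolation with a small userspace sketch. The helper names are invented here; the masks and shifts mirror the diff:

```c
#include <stdint.h>

/* Mirrors the bit decoding in handle_ept_violation() above, done in
 * plain C so the masking and shifting are easy to verify standalone. */

/* The "GPA exceeds GAW" test on bit 6 -- the check the patch removes
 * because the bit is not defined in the SDM. */
static inline int gaw_exceeded(uint64_t exit_qualification)
{
    return (exit_qualification & (1ULL << 6)) != 0;
}

/* The two-bit guest-linear-address validity field at bits 8:7. */
static inline unsigned gla_validity(uint64_t exit_qualification)
{
    return (unsigned)((exit_qualification >> 7) & 0x3);
}
```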
[PATCH 2/2] kvm: don't use bit24 for detecting address-specific invalidation capability
From: Zhang Xiantao xiantao.zh...@intel.com Bit 24 in VMX_EPT_VPID_CAP_MSR is not used for address-specific invalidation capability reporting, so remove it from KVM to avoid conflicts in future. Signed-off-by: Zhang Xiantao xiantao.zh...@intel.com --- arch/x86/include/asm/vmx.h |3 +-- arch/x86/kvm/vmx.c | 16 2 files changed, 1 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 36ec21c..c2d56b3 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -445,8 +445,7 @@ enum vmcs_field { #define VMX_EPTP_WB_BIT (1ull << 14) #define VMX_EPT_2MB_PAGE_BIT (1ull << 16) #define VMX_EPT_1GB_PAGE_BIT (1ull << 17) -#define VMX_EPT_AD_BIT (1ull << 21) -#define VMX_EPT_EXTENT_INDIVIDUAL_BIT (1ull << 24) +#define VMX_EPT_AD_BIT (1ull << 21) #define VMX_EPT_EXTENT_CONTEXT_BIT (1ull << 25) #define VMX_EPT_EXTENT_GLOBAL_BIT (1ull << 26) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index d077996..81b1aa6 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -802,11 +802,6 @@ static inline bool cpu_has_vmx_ept_ad_bits(void) return vmx_capability.ept & VMX_EPT_AD_BIT; } -static inline bool cpu_has_vmx_invept_individual_addr(void) -{ - return vmx_capability.ept & VMX_EPT_EXTENT_INDIVIDUAL_BIT; -} - static inline bool cpu_has_vmx_invept_context(void) { return vmx_capability.ept & VMX_EPT_EXTENT_CONTEXT_BIT; @@ -1051,17 +1046,6 @@ static inline void ept_sync_context(u64 eptp) } } -static inline void ept_sync_individual_addr(u64 eptp, gpa_t gpa) -{ - if (enable_ept) { - if (cpu_has_vmx_invept_individual_addr()) - __invept(VMX_EPT_EXTENT_INDIVIDUAL_ADDR, - eptp, gpa); - else - ept_sync_context(eptp); - } -} - static __always_inline unsigned long vmcs_readl(unsigned long field) { unsigned long value; -- 1.7.1
RE: [PATCH v3 3/4] x86, apicv: add virtual interrupt delivery support
Gleb Natapov wrote on 2012-12-03: On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: Virtual interrupt delivery avoids KVM having to inject vAPIC interrupts manually; that is fully taken care of by the hardware. This needs some special awareness in the existing interrupt injection path: - for a pending interrupt, instead of direct injection, we may need to update architecture-specific indicators before resuming to the guest. - A pending interrupt which is masked by ISR should also be considered in the above update action, since hardware will decide when to inject it at the right time. The current has_interrupt and get_interrupt only return a valid vector from the injection p.o.v. Most of my previous comments still apply. Signed-off-by: Yang Zhang yang.z.zh...@intel.com Signed-off-by: Kevin Tian kevin.t...@intel.com --- arch/x86/include/asm/kvm_host.h |4 + arch/x86/include/asm/vmx.h | 11 +++ arch/x86/kvm/irq.c | 53 ++- arch/x86/kvm/lapic.c| 56 +--- arch/x86/kvm/lapic.h|6 ++ arch/x86/kvm/svm.c | 19 + arch/x86/kvm/vmx.c | 140 ++- arch/x86/kvm/x86.c | 34 -- virt/kvm/ioapic.c |1 + 9 files changed, 291 insertions(+), 33 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index dc87b65..e5352c8 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -697,6 +697,10 @@ struct kvm_x86_ops { void (*enable_nmi_window)(struct kvm_vcpu *vcpu); void (*enable_irq_window)(struct kvm_vcpu *vcpu); void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); +int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); +void (*update_irq)(struct kvm_vcpu *vcpu); +void (*set_eoi_exitmap)(struct kvm_vcpu *vcpu, int vector, +int trig_mode, int always_set); int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); int (*get_tdp_level)(void); u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 21101b6..1003341 100644 ---
a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -62,6 +62,7 @@ #define EXIT_REASON_MCE_DURING_VMENTRY 41 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43 #define EXIT_REASON_APIC_ACCESS 44 +#define EXIT_REASON_EOI_INDUCED 45 #define EXIT_REASON_EPT_VIOLATION 48 #define EXIT_REASON_EPT_MISCONFIG 49 #define EXIT_REASON_WBINVD 54 @@ -143,6 +144,7 @@ #define SECONDARY_EXEC_WBINVD_EXITING 0x0040 #define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x0080 #define SECONDARY_EXEC_APIC_REGISTER_VIRT 0x0100 +#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x0200 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x0400 #define SECONDARY_EXEC_ENABLE_INVPCID 0x1000 @@ -180,6 +182,7 @@ enum vmcs_field { GUEST_GS_SELECTOR = 0x080a, GUEST_LDTR_SELECTOR = 0x080c, GUEST_TR_SELECTOR = 0x080e, + GUEST_INTR_STATUS = 0x0810, HOST_ES_SELECTOR = 0x0c00, HOST_CS_SELECTOR = 0x0c02, HOST_SS_SELECTOR = 0x0c04, @@ -207,6 +210,14 @@ enum vmcs_field { APIC_ACCESS_ADDR_HIGH = 0x2015, EPT_POINTER = 0x201a, EPT_POINTER_HIGH = 0x201b, +EOI_EXIT_BITMAP0 = 0x201c, +EOI_EXIT_BITMAP0_HIGH = 0x201d, +EOI_EXIT_BITMAP1 = 0x201e, +EOI_EXIT_BITMAP1_HIGH = 0x201f, +EOI_EXIT_BITMAP2 = 0x2020, +EOI_EXIT_BITMAP2_HIGH = 0x2021, +EOI_EXIT_BITMAP3 = 0x2022, +EOI_EXIT_BITMAP3_HIGH = 0x2023, GUEST_PHYSICAL_ADDRESS = 0x2400, GUEST_PHYSICAL_ADDRESS_HIGH = 0x2401, VMCS_LINK_POINTER = 0x2800, diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index 7e06ba1..f782788 100644 --- a/arch/x86/kvm/irq.c +++ b/arch/x86/kvm/irq.c @@ -43,45 +43,64 @@ EXPORT_SYMBOL(kvm_cpu_has_pending_timer); */ int kvm_cpu_has_interrupt(struct kvm_vcpu *v) { -struct kvm_pic *s; - if (!irqchip_in_kernel(v->kvm)) return v->arch.interrupt.pending; -if (kvm_apic_has_interrupt(v) == -1) { /* LAPIC */ -if (kvm_apic_accept_pic_intr(v)) { -s = pic_irqchip(v->kvm); /* PIC */ -return s->output; -} else -return 0; -} +if (kvm_apic_has_interrupt(v) == -1) /* LAPIC */ +return kvm_cpu_has_extint(v); /* non-APIC */ return
Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
On Mon, Dec 03, 2012 at 06:30:49PM +0800, Jason Wang wrote: On 12/03/2012 06:14 PM, Michael S. Tsirkin wrote: On Tue, Nov 27, 2012 at 06:15:59PM +0800, Jason Wang wrote: -if (!try_fill_recv(&vi->rq, GFP_KERNEL)) -schedule_delayed_work(&vi->rq.refill, 0); +for (i = 0; i < vi->max_queue_pairs; i++) +if (!try_fill_recv(&vi->rq[i], GFP_KERNEL)) +schedule_delayed_work(&vi->rq[i].refill, 0); mutex_lock(&vi->config_lock); vi->config_enable = true; mutex_unlock(&vi->config_lock); +BUG_ON(virtnet_set_queues(vi)); + return 0; } #endif Also, crashing on device nack of a command is not nice. In this case it seems we can just switch to single-queue mode, which should always be safe. Not sure it's safe. It depends on the reason why this call fails. If we are left in a state where the driver only uses a single queue but the device uses multiple queues, we may still lose the network. Looks like we won't: napi will stay enabled on all queues so we will process incoming packets. -- MST
Re: [RFC PATCH 0/2] mm: Add ability to monitor task's memory changes
On 12/04/2012 12:16 AM, Marcelo Tosatti wrote: On Mon, Dec 03, 2012 at 12:36:33PM +0400, Glauber Costa wrote: On 11/30/2012 09:55 PM, Pavel Emelyanov wrote: Hello, This is an attempt to implement support for memory snapshot for the checkpoint-restore project (http://criu.org). To create a dump of an application(s) we save all the information about it to files. No surprise, the biggest part of such dump is the contents of tasks' memory. However, in some usage scenarios it's not required to get _all_ the task memory while creating a dump. For example, when doing periodical dumps it's only required to take a full memory dump at the first step and then take incremental changes of memory. Another example is live migration. In the simplest form it looks like -- create dump, copy it to the remote node, then restore tasks from dump files. While all this dump-copy-restore thing goes on, all the processes must be stopped. However, if we can monitor how tasks change their memory, we can dump and copy it in smaller chunks, periodically updating it and thus freezing tasks only at the very end for a very short time to pick up the recent changes. That said, some help from the kernel to watch how processes modify the contents of their memory is required. I'd like to propose one possible solution of this task -- with the help of page-faults and trace events. Briefly the approach is -- remap some memory regions as read-only, get the #pf on the task's attempt to modify the memory and issue a trace event of that. Since we're only interested in parts of memory of some tasks, make it possible to mark the vmas we're interested in and issue events for them only. Also, to be aware of tasks unmapping the vma-s being watched, also issue an event when the marked vma is removed (and for symmetry -- an event when a vma is marked). What do you think about this approach? Is this way of supporting mem snapshot OK for you, or should we invent some better one?
The page fault mechanism is pretty obvious - anything that deals with dirty pages will end up having to do this. So there is nothing crazy about this. What concerns me, however, is that should this go in, we'll have two dirty mem loggers in the kernel: one to support CRIU, one to support KVM. And the worst part: they have the exact same purpose!! So to begin with, I think one thing to consider would be to generalize KVM's dirty memory notification so it can work on a normal process memory region. The KVM api requires a memory slot to be passed, something we are unlikely to have. But KVM can easily keep its API and use an alternate mechanism, that's trivial... Generally speaking, KVM will do polling with this ioctl. I prefer your tracing mechanism better. The only difference is that KVM tends to transfer large chunks of memory in some loads - in the high gigs range. So the proposed tracing API should be able to optionally batch requests within a time frame. It would also be good to hear what the KVM guys think of it as well. There are significant differences. KVM's dirty logging works for guest translations (NPT/shadow) and is optimized for specific use cases. Above is about dirty logging of userspace memory areas. This is envelope. At the end, KVM is tracking pages, regardless of their format, and wants to know when pages are dirty, and which pages are dirty. What you are saying, for example, is that if Linux had dirty page tracking for userspace memory, kvm could not make use of it? IIRC qemu's migration algorithm, a notification of dirty user pages - plus an eventual extra trickery - would do just fine.
Re: [PATCH] KVM: PPC: Make EPCR a valid field for booke64 and bookehv
On 12/01/2012 07:58:25 AM, Alexander Graf wrote: In BookE, EPCR is defined and valid when either the HV or the 64bit category are implemented. Reflect this in the field definition. Today the only KVM target on 64bit is HV enabled, so there is no change in actual source code, but this keeps the code closer to the spec and doesn't build up artificial road blocks for a PR KVM on 64bit. [snip] Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |9 +++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 62fbd38..3480526 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -405,14 +405,19 @@ struct kvm_vcpu_arch { #ifdef CONFIG_KVM_BOOKE_HV u32 host_mas4; u32 host_mas6; - u32 shadow_epcr; - u32 epcr; u32 shadow_msrp; u32 eplc; u32 epsc; u32 oldpir; #endif +#if defined(CONFIG_BOOKE) +#if defined(CONFIG_KVM_BOOKE_HV) || defined(CONFIG_64BIT) + u32 shadow_epcr; + u32 epcr; +#endif +#endif On a PR-mode implementation, why would we have a shadow_epcr? It would always be set based on the host kernel, just like when running any other userspace process. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Make EPCR a valid field for booke64 and bookehv
On 03.12.2012, at 17:47, Scott Wood wrote: On 12/01/2012 07:58:25 AM, Alexander Graf wrote: In BookE, EPCR is defined and valid when either the HV or the 64bit category are implemented. Reflect this in the field definition. Today the only KVM target on 64bit is HV enabled, so there is no change in actual source code, but this keeps the code closer to the spec and doesn't build up artificial road blocks for a PR KVM on 64bit. [snip] Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |9 +++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 62fbd38..3480526 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -405,14 +405,19 @@ struct kvm_vcpu_arch { #ifdef CONFIG_KVM_BOOKE_HV u32 host_mas4; u32 host_mas6; -u32 shadow_epcr; -u32 epcr; u32 shadow_msrp; u32 eplc; u32 epsc; u32 oldpir; #endif +#if defined(CONFIG_BOOKE) +#if defined(CONFIG_KVM_BOOKE_HV) || defined(CONFIG_64BIT) +u32 shadow_epcr; +u32 epcr; +#endif +#endif On a PR-mode implementation, why would we have a shadow_epcr? It would always be set based on the host kernel, just like when running any other userspace process. Right - we could simply set MSR_CM. I'll move shadow_epcr back into the HV only bit above. Alex