Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/10, Jeremy Fitzhardinge wrote:
> On 02/10/2015 05:26 AM, Oleg Nesterov wrote:
>> On 02/10, Raghavendra K T wrote:
>>> Unfortunately xadd could result in head overflow as tail is high. The
>>> other option was repeated cmpxchg, which is bad I believe. Any
>>> suggestions?
>>
>> Stupid question... what if we simply move SLOWPATH from .tail to .head?
>> In this case arch_spin_unlock() could do xadd(tickets.head) and check
>> the result.
>
> Well, right now, tail is manipulated by locked instructions by CPUs
> who are contending for the ticketlock, but head can be manipulated
> unlocked by the CPU which currently owns the ticketlock. If SLOWPATH
> moved into head, then non-owner CPUs would be touching head, requiring
> everyone to use locked instructions on it.

That's the theory, but I don't see much (any?) code which depends on that.

> Ideally we could find a way so that pv ticketlocks could use a plain
> unlocked add for the unlock like the non-pv case, but I just don't see
> a way to do it.

I agree, and I have to admit I am not sure I fully understand why unlock
uses the locked add. Except we need a barrier to avoid the race with the
enter_slowpath() users, of course. Perhaps this is the only reason?

Anyway, I suggested this to avoid the overflow if we use xadd(), and I
guess we need the locked insn anyway if we want to eliminate the unsafe
read-after-unlock...

>> BTW. If we move "clear slowpath" into the lock path, then probably
>> trylock should be changed too? Something like below, we just need to
>> clear SLOWPATH before cmpxchg.
>
> How important / widely used is trylock these days?

I am not saying this is that important. Just this looks more consistent
imo and we can do this for free.

Oleg.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
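[Aside: the flag-in-head unlock discussed above can be sketched in
user-space C. This is a model only — the plain add below does *not*
represent xadd's atomicity, and the names and constants are illustrative,
not the kernel's. It only shows why checking the pre-add head value lets
the unlocker see SLOWPATH without ever touching .tail:]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: SLOWPATH lives in .head instead of .tail.
 * TICKET_LOCK_INC is 2 so the low bit of head stays free for the flag. */
#define TICKET_SLOWPATH_FLAG	((uint16_t)1)
#define TICKET_LOCK_INC		((uint16_t)2)

struct tickets { uint16_t head, tail; };

/* Model of arch_spin_unlock() with the flag in .head: bump head and
 * look at the value it had *before* the add (what xadd would return).
 * NOT atomic -- illustration only. */
static uint16_t unlock_head(struct tickets *t)
{
	uint16_t old = t->head;		/* stands in for xadd(&head, INC) */
	t->head += TICKET_LOCK_INC;
	return old;
}
```

Since only .head is modified, the increment cannot overflow into .tail,
which is the overflow Raghavendra was worried about with xadd on the
combined word.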
Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/11/2015 11:08 PM, Oleg Nesterov wrote:
> On 02/11, Raghavendra K T wrote:
>> On 02/10/2015 06:56 PM, Oleg Nesterov wrote:
>>> In this case __ticket_check_and_clear_slowpath() really needs to
>>> cmpxchg the whole .head_tail. Plus obviously more boring changes.
>>> This needs a separate patch even _if_ this can work.
>>
>> Correct, but apart from this, before doing xadd in unlock, we would
>> have to make sure the lsb bit is cleared so that we can live with 1 bit
>> overflow to tail which is unused. Now either or both of the head, tail
>> lsb bits may be set after unlock.
>
> Sorry, can't understand... could you spell?
>
> If TICKET_SLOWPATH_FLAG lives in .head, arch_spin_unlock() could simply do
>
>	head = xadd(&lock->tickets.head, TICKET_LOCK_INC);
>	if (head & TICKET_SLOWPATH_FLAG)
>		__ticket_unlock_kick(head);
>
> so it can't overflow to .tail?

You are right. I totally forgot we can get rid of the tail operations :)

> And if we do this, probably it makes sense to add something like
>
>	bool tickets_equal(__ticket_t one, __ticket_t two)
>	{
>		return !((one ^ two) & ~TICKET_SLOWPATH_FLAG);
>	}

Very nice idea. I was tired of the ~TICKET_SLOWPATH_FLAG usage all over
in the current (complex :)) implementation. These two suggestions help
a lot.

> and change kvm_lock_spinning() to use tickets_equal(tickets.head, want),
> plus it can have more users in asm/spinlock.h.
[PATCH -v5 6/5] context_tracking: fix exception_enter when already in IN_KERNEL
If exception_enter happens when already in the IN_KERNEL state, the code
still calls context_tracking_exit, which ends up in rcu_eqs_exit_common,
which explodes with a WARN_ON when it is called in a situation where
dynticks are not enabled.

This can be avoided by having exception_enter only switch to the
IN_KERNEL state if the current state is not already IN_KERNEL.

Signed-off-by: Rik van Riel <r...@redhat.com>
Reported-by: Luiz Capitulino <lcapitul...@redhat.com>
---
Frederic, you will want this bonus patch, too :)

Thanks to Luiz for finding this one. Whatever I was running did not
trigger this issue...

 include/linux/context_tracking.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index b65fd1420e53..9da230406e8c 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -37,7 +37,8 @@ static inline enum ctx_state exception_enter(void)
 		return 0;
 
 	prev_ctx = this_cpu_read(context_tracking.state);
-	context_tracking_exit(prev_ctx);
+	if (prev_ctx != IN_KERNEL)
+		context_tracking_exit(prev_ctx);
 
 	return prev_ctx;
 }
Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/11, Raghavendra K T wrote:
> On 02/10/2015 06:56 PM, Oleg Nesterov wrote:
>> In this case __ticket_check_and_clear_slowpath() really needs to
>> cmpxchg the whole .head_tail. Plus obviously more boring changes.
>> This needs a separate patch even _if_ this can work.
>
> Correct, but apart from this, before doing xadd in unlock, we would
> have to make sure the lsb bit is cleared so that we can live with 1 bit
> overflow to tail which is unused. Now either or both of the head, tail
> lsb bits may be set after unlock.

Sorry, can't understand... could you spell?

If TICKET_SLOWPATH_FLAG lives in .head, arch_spin_unlock() could simply do

	head = xadd(&lock->tickets.head, TICKET_LOCK_INC);
	if (head & TICKET_SLOWPATH_FLAG)
		__ticket_unlock_kick(head);

so it can't overflow to .tail? But probably I missed your concern.

And if we do this, probably it makes sense to add something like

	bool tickets_equal(__ticket_t one, __ticket_t two)
	{
		return !((one ^ two) & ~TICKET_SLOWPATH_FLAG);
	}

and change kvm_lock_spinning() to use tickets_equal(tickets.head, want),
plus it can have more users in asm/spinlock.h.

Oleg.
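[Aside: the tickets_equal() helper suggested above is easy to exercise on
its own — it compares two ticket values while masking out the slowpath
bit, so callers no longer need to scatter ~TICKET_SLOWPATH_FLAG around.
The sketch below uses the names from the mail; the version that lands in
the kernel may differ in detail:]

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t __ticket_t;

#define TICKET_SLOWPATH_FLAG	((__ticket_t)1)

/* Equal modulo the slowpath bit: XOR exposes differing bits, then the
 * flag bit is masked away before testing for zero. */
static inline int tickets_equal(__ticket_t one, __ticket_t two)
{
	return !((one ^ two) & ~TICKET_SLOWPATH_FLAG);
}
```

With this, kvm_lock_spinning()'s "is my ticket up?" check becomes a
single tickets_equal(tickets.head, want) regardless of whether the
slowpath flag happens to be set.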
nSVM: Booting L2 results in L1 hang and a skip_emulated_instruction
Hi,

This was tested with the kernel (kernel-3.19.0-1.fc22) and QEMU
(qemu-2.2.0-5.fc22) on L0 and L1.

Description
-----------

Inside L1, boot a nested KVM guest (L2). Instead of a full blown guest,
let's use `qemu-sanity-check` with KVM:

    $ qemu-sanity-check --accel=kvm

Which gives you this CLI (run from a different shell), confirming that
the L2 guest is indeed running on KVM (and not TCG):

    $ ps -ef | grep -i qemu
    root  763  762 35 11:49 ttyS0  00:00:00 qemu-system-x86_64 -nographic
    -nodefconfig -nodefaults -machine accel=kvm -no-reboot -serial
    file:/tmp/tmp.rl3naPaCkZ.out -kernel /boot/vmlinuz-3.19.0-1.fc21.x86_64
    -initrd /usr/lib64/qemu-sanity-check/initrd -append "console=ttyS0
    oops=panic panic=-1"

Which results in:

(a) L1 (guest hypervisor) completely hangs and is unresponsive. But when
    I query libvirt (`virsh list`), the guest is still reported as
    'running'.

(b) On L0, I notice a ton of these messages:

    skip_emulated_instruction: ip 0xffec next 0x8105e964

I can get `dmesg`, `dmidecode`, `x86info -a` on L0 and L1 if it helps in
narrowing down the issue.

Related bug and reproducer details
----------------------------------

https://bugzilla.redhat.com/show_bug.cgi?id=1191665 -- Nested KVM with
AMD: L2 (nested guest) fails with "divide error: [#1] SMP"

--
/kashyap
Re: [PATCH -v5 6/5] context_tracking: fix exception_enter when already in IN_KERNEL
On Wed, Feb 11, 2015 at 02:43:19PM -0500, Rik van Riel wrote:
> If exception_enter happens when already in the IN_KERNEL state, the
> code still calls context_tracking_exit, which ends up in
> rcu_eqs_exit_common, which explodes with a WARN_ON when it is called
> in a situation where dynticks are not enabled.
>
> This can be avoided by having exception_enter only switch to the
> IN_KERNEL state if the current state is not already IN_KERNEL.

Ugh...  Time to formally verify, sounds like...

							Thanx, Paul

> Signed-off-by: Rik van Riel <r...@redhat.com>
> Reported-by: Luiz Capitulino <lcapitul...@redhat.com>
> ---
> Frederic, you will want this bonus patch, too :)
>
> Thanks to Luiz for finding this one. Whatever I was running did not
> trigger this issue...
>
>  include/linux/context_tracking.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> index b65fd1420e53..9da230406e8c 100644
> --- a/include/linux/context_tracking.h
> +++ b/include/linux/context_tracking.h
> @@ -37,7 +37,8 @@ static inline enum ctx_state exception_enter(void)
>  		return 0;
>  
>  	prev_ctx = this_cpu_read(context_tracking.state);
> -	context_tracking_exit(prev_ctx);
> +	if (prev_ctx != IN_KERNEL)
> +		context_tracking_exit(prev_ctx);
>  
>  	return prev_ctx;
>  }
Re: [PATCH] virtual: Documentation: simplify and generalize paravirt_ops.txt
Luis R. Rodriguez <mcg...@do-not-panic.com> writes:
> From: Luis R. Rodriguez <mcg...@suse.com>
>
> The general documentation we have for pv_ops is currently present in
> the IA64 docs, but since this documentation covers IA64 Xen enablement,
> and IA64 Xen support got ripped out a while ago through commit
> d52eefb47 (present since v3.14-rc1), let's just simplify, generalize
> and move the pv_ops documentation to a shared place.

OK, I've applied this.

Thanks,
Rusty.
Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/11/2015 09:24 AM, Oleg Nesterov wrote:
> I agree, and I have to admit I am not sure I fully understand why
> unlock uses the locked add. Except we need a barrier to avoid the race
> with the enter_slowpath() users, of course. Perhaps this is the only
> reason?

Right now it needs to be a locked operation to prevent read-reordering.
x86 memory ordering rules state that all writes are seen in a globally
consistent order, and are globally ordered wrt reads *on the same
addresses*, but reads to different addresses can be reordered wrt writes.

So, if the unlocking add were not a locked operation:

	__add(&lock->tickets.head, TICKET_LOCK_INC);	/* not locked */

	if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
		__ticket_unlock_slowpath(lock, prev);

then the read of lock->tickets.tail can be reordered before the unlock,
which introduces a race:

	/* read reordered here */
	if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) /* false */
		/* ... */;

	/* other CPU sets SLOWPATH and blocks */

	__add(&lock->tickets.head, TICKET_LOCK_INC);	/* not locked */

	/* other CPU hung */

So it doesn't *have* to be a locked operation. This should also work:

	__add(&lock->tickets.head, TICKET_LOCK_INC);	/* not locked */

	lfence();				/* prevent read reordering */
	if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
		__ticket_unlock_slowpath(lock, prev);

but in practice a locked add is cheaper than an lfence (or at least was).

This *might* be OK, but I think it's on dubious ground:

	__add(&lock->tickets.head, TICKET_LOCK_INC);	/* not locked */

	/* read overlaps write, and so is ordered */
	if (unlikely(lock->head_tail & (TICKET_SLOWPATH_FLAG << TICKET_SHIFT)))
		__ticket_unlock_slowpath(lock, prev);

because I think Intel and AMD differed in interpretation about how
overlapping but different-sized reads/writes are ordered (or it simply
isn't architecturally defined).

If the slowpath flag is moved to head, then it would always have to be
locked anyway, because it needs to be atomic against other CPUs' RMW
operations setting the flag.

	J
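[Aside: the third variant above tests the tail's slowpath bit through the
combined head_tail word. The actual ordering subtlety concerns real
concurrent x86 accesses and cannot be demonstrated in plain
single-threaded C; the sketch below only shows the overlapping-view bit
arithmetic. The 16-bit halves and the TICKET_SHIFT value are assumptions,
not taken from the patch:]

```c
#include <assert.h>
#include <stdint.h>

#define TICKET_SLOWPATH_FLAG	((uint16_t)1)
#define TICKET_SHIFT		16	/* bits per ticket half, assumed */

union ticket_lock {
	uint32_t head_tail;			 /* whole-word view */
	struct { uint16_t head, tail; } tickets; /* little-endian layout */
};

/* Test the tail's slowpath bit via a single read of the whole word,
 * i.e. at bit position TICKET_SHIFT rather than bit 0 of .tail. */
static int tail_in_slowpath(const union ticket_lock *l)
{
	return !!(l->head_tail &
		  ((uint32_t)TICKET_SLOWPATH_FLAG << TICKET_SHIFT));
}
```

The point of the variant is that this one 32-bit read overlaps the 16-bit
write to .head, which (the mail argues) may or may not be ordered against
it depending on how the vendors interpret overlapping accesses of
different sizes.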
[RFC v2 2/4] KVM: arm: vgic: fix state machine for forwarded IRQ
Fix multiple injection of level sensitive forwarded IRQs. With the
current code, the second injection fails since the state bitmaps are
not reset (process_maintenance is not called anymore).

The new implementation follows these principles:
- a forwarded IRQ only can be sampled when it is pending,
- when queueing the IRQ (programming the LR), the pending state is
  removed, as for edge sensitive IRQs,
- an injection of a forwarded IRQ is considered always valid since it
  comes from the HW and the level always is 1.

Signed-off-by: Eric Auger <eric.au...@linaro.org>

---

v1 -> v2:
- integration in new vgic_can_sample_irq
- remove the pending state when programming the LR
---
 virt/kvm/arm/vgic.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index cd00cf2..433ecba 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -361,7 +361,10 @@ static void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq)
 
 static bool vgic_can_sample_irq(struct kvm_vcpu *vcpu, int irq)
 {
-	return vgic_irq_is_edge(vcpu, irq) || !vgic_irq_is_queued(vcpu, irq);
+	bool is_forwarded = (vgic_get_phys_irq(vcpu, irq) >= 0);
+
+	return vgic_irq_is_edge(vcpu, irq) || !vgic_irq_is_queued(vcpu, irq) ||
+		(is_forwarded && vgic_dist_irq_is_pending(vcpu, irq));
 }
 
 static u32 mmio_data_read(struct kvm_exit_mmio *mmio, u32 mask)
@@ -1296,6 +1299,7 @@ static bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq)
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 	struct vgic_lr vlr;
 	int lr;
+	bool is_forwarded = (vgic_get_phys_irq(vcpu, irq) >= 0);
 
 	/* Sanitize the input... */
 	BUG_ON(sgi_source_id & ~7);
@@ -1331,7 +1335,7 @@ static bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq)
 	vlr.irq = irq;
 	vlr.source = sgi_source_id;
 	vlr.state = LR_STATE_PENDING;
-	if (!vgic_irq_is_edge(vcpu, irq))
+	if (!vgic_irq_is_edge(vcpu, irq) && !is_forwarded)
 		vlr.state |= LR_EOI_INT;
 
 	vgic_set_lr(vcpu, lr, vlr);
@@ -1372,11 +1376,12 @@ static bool vgic_queue_sgi(struct kvm_vcpu *vcpu, int irq)
 
 static bool vgic_queue_hwirq(struct kvm_vcpu *vcpu, int irq)
 {
+	bool is_forwarded = (vgic_get_phys_irq(vcpu, irq) >= 0);
+
 	if (!vgic_can_sample_irq(vcpu, irq))
 		return true; /* level interrupt, already queued */
 
 	if (vgic_queue_irq(vcpu, 0, irq)) {
-		if (vgic_irq_is_edge(vcpu, irq)) {
+		if (vgic_irq_is_edge(vcpu, irq) || is_forwarded) {
 			vgic_dist_irq_clear_pending(vcpu, irq);
 			vgic_cpu_irq_clear(vcpu, irq);
 		} else {
@@ -1626,14 +1631,17 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
 	int edge_triggered, level_triggered;
 	int enabled;
 	bool ret = true;
+	bool is_forwarded;
 
 	spin_lock(&dist->lock);
 
 	vcpu = kvm_get_vcpu(kvm, cpuid);
+	is_forwarded = (vgic_get_phys_irq(vcpu, irq_num) >= 0);
+
 	edge_triggered = vgic_irq_is_edge(vcpu, irq_num);
 	level_triggered = !edge_triggered;
 
-	if (!vgic_validate_injection(vcpu, irq_num, level)) {
+	if (!vgic_validate_injection(vcpu, irq_num, level) && !is_forwarded) {
 		ret = false;
 		goto out;
 	}
-- 
1.9.1
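[Aside: the sampling rule this patch introduces in vgic_can_sample_irq()
can be checked as a stand-alone truth table. In the sketch below, plain
booleans stand in for the vgic_* predicates on kernel state; this is a
model of the condition only, not code from the patch:]

```c
#include <assert.h>
#include <stdbool.h>

/* An IRQ may be sampled if it is edge triggered, or not yet queued, or
 * (the new case) forwarded *and* still pending. */
static bool can_sample(bool is_edge, bool is_queued, bool is_forwarded,
		       bool is_pending)
{
	return is_edge || !is_queued || (is_forwarded && is_pending);
}
```

The last disjunct is exactly what makes a second injection of a
forwarded level IRQ succeed: even while queued, a forwarded IRQ that is
pending again may be sampled.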
[RFC v2 4/4] KVM: arm: vgic: cleanup forwarded IRQs on destroy
When the VGIC is destroyed it must take care of:
- restoring the forwarded IRQs to the non forwarded state,
- deactivating the IRQ in case the guest left without doing it,
- cleaning the nodes of the phys_map rbtree.

Signed-off-by: Eric Auger <eric.au...@linaro.org>

---

v1 -> v2:
- remove vgic_clean_irq_phys_map call in kvm_vgic_destroy (useless
  since already called in kvm_vgic_vcpu_destroy)
---
 virt/kvm/arm/vgic.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index dd72ca2..ace8e46 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -32,6 +32,7 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <linux/spinlock.h>
 
 /*
  * How the whole thing works (courtesy of Christoffer Dall):
@@ -103,6 +104,8 @@ static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
 static void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
 static void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
+static void vgic_clean_irq_phys_map(struct kvm_vcpu *vcpu,
+				    struct rb_root *root);
 
 static const struct vgic_ops *vgic_ops;
 static const struct vgic_params *vgic;
@@ -1819,6 +1822,36 @@ static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
 	return NULL;
 }
 
+static void vgic_clean_irq_phys_map(struct kvm_vcpu *vcpu,
+				    struct rb_root *root)
+{
+	unsigned long flags;
+
+	while (1) {
+		struct rb_node *node = rb_first(root);
+		struct irq_phys_map *map;
+		struct irq_desc *desc;
+		struct irq_data *d;
+		struct irq_chip *chip;
+
+		if (!node)
+			break;
+
+		map = container_of(node, struct irq_phys_map, node);
+		desc = irq_to_desc(map->phys_irq);
+
+		raw_spin_lock_irqsave(&desc->lock, flags);
+		d = &desc->irq_data;
+		chip = desc->irq_data.chip;
+		irqd_clr_irq_forwarded(d);
+		chip->irq_eoi(d);
+		raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+		rb_erase(node, root);
+		kfree(map);
+	}
+}
+
 int vgic_get_phys_irq(struct kvm_vcpu *vcpu, int virt_irq)
 {
 	struct irq_phys_map *map;
@@ -1861,6 +1894,7 @@ void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 
+	vgic_clean_irq_phys_map(vcpu, &vgic_cpu->irq_phys_map);
 	kfree(vgic_cpu->pending_shared);
 	kfree(vgic_cpu->vgic_irq_lr_map);
 	vgic_cpu->pending_shared = NULL;
-- 
1.9.1
[RFC v2 3/4] KVM: arm: vgic: add forwarded irq rbtree lock
Add a lock related to the rb tree manipulation. The rb tree can be
searched in one thread (irqfd handler for instance) and map/unmap may
happen in another.

Signed-off-by: Eric Auger <eric.au...@linaro.org>

---

v2 -> v3:
- re-arrange lock sequence in vgic_map_phys_irq
---
 include/kvm/arm_vgic.h |  1 +
 virt/kvm/arm/vgic.c    | 56 ++++++++++++++++++++++++++++----------------
 2 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 1a49108..ad7229b 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -220,6 +220,7 @@ struct vgic_dist {
 	unsigned long *irq_pending_on_cpu;
 
 	struct rb_root irq_phys_map;
+	spinlock_t rb_tree_lock;
 #endif
 };
 
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 433ecba..dd72ca2 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1756,9 +1756,22 @@ static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
 
 int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
 {
-	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
-	struct rb_node **new = &root->rb_node, *parent = NULL;
+	struct rb_root *root;
+	struct rb_node **new, *parent = NULL;
 	struct irq_phys_map *new_map;
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+
+	root = vgic_get_irq_phys_map(vcpu, virt_irq);
+	new = &root->rb_node;
+
+	new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
+	if (!new_map)
+		return -ENOMEM;
+
+	new_map->virt_irq = virt_irq;
+	new_map->phys_irq = phys_irq;
+
+	spin_lock(&dist->rb_tree_lock);
 
 	/* Boilerplate rb_tree code */
 	while (*new) {
@@ -1770,19 +1783,16 @@ int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
 			new = &(*new)->rb_left;
 		else if (this->virt_irq > virt_irq)
 			new = &(*new)->rb_right;
-		else
+		else {
+			kfree(new_map);
+			spin_unlock(&dist->rb_tree_lock);
 			return -EEXIST;
+		}
 	}
 
-	new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
-	if (!new_map)
-		return -ENOMEM;
-
-	new_map->virt_irq = virt_irq;
-	new_map->phys_irq = phys_irq;
-
 	rb_link_node(&new_map->node, parent, new);
 	rb_insert_color(&new_map->node, root);
+	spin_unlock(&dist->rb_tree_lock);
 
 	return 0;
 }
@@ -1811,24 +1821,39 @@ static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
 
 int vgic_get_phys_irq(struct kvm_vcpu *vcpu, int virt_irq)
 {
-	struct irq_phys_map *map = vgic_irq_map_search(vcpu, virt_irq);
+	struct irq_phys_map *map;
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+	int ret;
+
+	spin_lock(&dist->rb_tree_lock);
+	map = vgic_irq_map_search(vcpu, virt_irq);
 
 	if (map)
-		return map->phys_irq;
+		ret = map->phys_irq;
+	else
+		ret = -ENOENT;
+
+	spin_unlock(&dist->rb_tree_lock);
+	return ret;
 
-	return -ENOENT;
 }
 
 int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
 {
-	struct irq_phys_map *map = vgic_irq_map_search(vcpu, virt_irq);
+	struct irq_phys_map *map;
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+
+	spin_lock(&dist->rb_tree_lock);
+
+	map = vgic_irq_map_search(vcpu, virt_irq);
 
 	if (map && map->phys_irq == phys_irq) {
 		rb_erase(&map->node, vgic_get_irq_phys_map(vcpu, virt_irq));
 		kfree(map);
+		spin_unlock(&dist->rb_tree_lock);
 		return 0;
 	}
-
+	spin_unlock(&dist->rb_tree_lock);
 	return -ENOENT;
 }
 
@@ -2071,6 +2096,7 @@ int kvm_vgic_create(struct kvm *kvm)
 	ret = 0;
 
 	spin_lock_init(&kvm->arch.vgic.lock);
+	spin_lock_init(&kvm->arch.vgic.rb_tree_lock);
 	kvm->arch.vgic.in_kernel = true;
 	kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
 	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
-- 
1.9.1
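[Aside: the re-arranged lock sequence in vgic_map_phys_irq follows the
classic "allocate before taking the spinlock" pattern, since a sleeping
kzalloc(GFP_KERNEL) is not allowed under a spinlock; the duplicate path
must then free the speculative allocation. A user-space analog of that
shape (a pthread mutex standing in for the spinlock, a single slot
standing in for the rb tree; all names illustrative):]

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;

struct map { int virt_irq, phys_irq; };
static struct map *slot;	/* pretend one-entry "tree" */

static int map_insert(int virt_irq, int phys_irq)
{
	/* Allocate *before* taking the lock, as the patch does. */
	struct map *new_map = calloc(1, sizeof(*new_map));

	if (!new_map)
		return -1;	/* -ENOMEM in the kernel code */
	new_map->virt_irq = virt_irq;
	new_map->phys_irq = phys_irq;

	pthread_mutex_lock(&map_lock);
	if (slot && slot->virt_irq == virt_irq) {
		pthread_mutex_unlock(&map_lock);
		free(new_map);	/* entry already exists: drop our copy */
		return -2;	/* -EEXIST */
	}
	slot = new_map;
	pthread_mutex_unlock(&map_lock);
	return 0;
}
```

The cost of the pattern is a wasted allocation on the duplicate path,
which is the rare case; the benefit is that no sleeping call ever runs
with the lock held.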
[RFC v2 1/4] chip.c: complete the forwarded IRQ in case the handler is not reached
With the current handle_fasteoi_irq implementation, in case
irqd_irq_disabled is true (disable_irq was called) or !irq_may_run, the
IRQ is not completed; only the running priority is dropped. In those
cases, the IRQ will never be forwarded and hence will never be
deactivated by anyone else.

Signed-off-by: Eric Auger <eric.au...@linaro.org>
---
 kernel/irq/chip.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 2f9571b..f12cce6 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -561,8 +561,12 @@ handle_fasteoi_irq(unsigned int irq, struct irq_desc *desc)
 	raw_spin_unlock(&desc->lock);
 	return;
 out:
-	if (!(chip->flags & IRQCHIP_EOI_IF_HANDLED))
-		eoi_irq(desc, chip);
+	if (!(chip->flags & IRQCHIP_EOI_IF_HANDLED)) {
+		if (chip->irq_priority_drop)
+			chip->irq_priority_drop(&desc->irq_data);
+		if (chip->irq_eoi)
+			chip->irq_eoi(&desc->irq_data);
+	}
 	raw_spin_unlock(&desc->lock);
 }
 EXPORT_SYMBOL_GPL(handle_fasteoi_irq);
-- 
1.9.1
[RFC v2 0/4] chip/vgic adaptations for forwarded irq
This series proposes some fixes that appeared to be necessary to
integrate IRQ forwarding in KVM/VFIO:

- deactivation of the forwarded IRQ in the irq_disabled case
- a specific handling of forwarded IRQs in the VGIC state machine
- deactivation of the physical IRQ and unforwarding on vgic destruction
- rb_tree lock in vgic.c

Integrated pieces can be found at
ssh://git.linaro.org/people/eric.auger/linux.git
on branch irqfd_integ_v9.

v1 -> v2:
- change title of the series (formerly "vgic additions for forwarded irq")
- [RFC 4/4] "KVM: arm: vgic: handle irqfd forwarded IRQ injection before
  vgic readiness" now handled in ARM irqfd series
- add chip.c patch file

Eric Auger (4):
  chip.c: complete the forwarded IRQ in case the handler is not reached
  KVM: arm: vgic: fix state machine for forwarded IRQ
  KVM: arm: vgic: add forwarded irq rbtree lock
  KVM: arm: vgic: cleanup forwarded IRQs on destroy

 include/kvm/arm_vgic.h |   1 +
 kernel/irq/chip.c      |   8 +++-
 virt/kvm/arm/vgic.c    | 106 ++++++++++++++++++++++++++++++++-----------
 3 files changed, 94 insertions(+), 21 deletions(-)
[RFC v4 04/13] KVM: kvm-vfio: User API for IRQ forwarding
This patch adds and documents a new KVM_DEV_VFIO_DEVICE group and 2
device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ and
KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be able to set a
VFIO device IRQ as forwarded or not forwarded. The command takes as
argument a handle to a new struct named kvm_vfio_dev_irq.

Signed-off-by: Eric Auger <eric.au...@linaro.org>

---

v3 -> v4:
- rename kvm_arch_forwarded_irq into kvm_vfio_dev_irq
- some rewording in commit message
- document forwarding restrictions and remove unforwarding ones

v2 -> v3:
- rework vfio kvm device documentation
- reword commit message and title
- add subindex in kvm_arch_forwarded_irq to be closer to VFIO API
- forwarding state can only be changed when VFIO IRQ signaling is off

v1 -> v2:
- struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
  to include/uapi/linux/kvm.h; also irq_index renamed into index and
  guest_irq renamed into gsi
- ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
---
 Documentation/virtual/kvm/devices/vfio.txt | 34 ++++++++++++++++++------
 include/uapi/linux/kvm.h                   | 12 +++++++++
 2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
index ef51740..6186e6d 100644
--- a/Documentation/virtual/kvm/devices/vfio.txt
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -4,15 +4,20 @@ VFIO virtual device
 Device types supported:
   KVM_DEV_TYPE_VFIO
 
-Only one VFIO instance may be created per VM.  The created device
-tracks VFIO groups in use by the VM and features of those groups
-important to the correctness and acceleration of the VM.  As groups
-are enabled and disabled for use by the VM, KVM should be updated
-about their presence.  When registered with KVM, a reference to the
-VFIO-group is held by KVM.
+Only one VFIO instance may be created per VM.
+
+The created device tracks VFIO groups in use by the VM and features
+of those groups important to the correctness and acceleration of
+the VM.  As groups are enabled and disabled for use by the VM, KVM
+should be updated about their presence.  When registered with KVM,
+a reference to the VFIO-group is held by KVM.
+
+The device also enables to control some IRQ settings of VFIO devices:
+forwarding/posting.
 
 Groups:
   KVM_DEV_VFIO_GROUP
+  KVM_DEV_VFIO_DEVICE
 
 KVM_DEV_VFIO_GROUP attributes:
   KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
@@ -20,3 +25,20 @@ KVM_DEV_VFIO_GROUP attributes:
 
 For each, kvm_device_attr.addr points to an int32_t file descriptor
 for the VFIO group.
+
+KVM_DEV_VFIO_DEVICE attributes:
+  KVM_DEV_VFIO_DEVICE_FORWARD_IRQ: set a VFIO device IRQ as forwarded
+  KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: set a VFIO device IRQ as not forwarded
+
+For each, kvm_device_attr.addr points to a kvm_vfio_dev_irq struct.
+
+When forwarded, a physical IRQ is completed by the guest and not by the
+host.  This requires HW support in the interrupt controller.
+
+Forwarding can only be set when the corresponding VFIO IRQ is not masked
+(be it through the VFIO_DEVICE_SET_IRQS command or as a consequence of
+this IRQ being currently handled) or active at interrupt controller
+level.  In such a situation, -EAGAIN is returned.  It is advised to set
+the forwarding before the VFIO signaling is set up; this avoids trial
+and errors.
+
+Unforwarding can happen at any time.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a37fd12..d1a6496 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -938,6 +938,9 @@ struct kvm_device_attr {
 #define  KVM_DEV_VFIO_GROUP			1
 #define   KVM_DEV_VFIO_GROUP_ADD			1
 #define   KVM_DEV_VFIO_GROUP_DEL			2
+#define  KVM_DEV_VFIO_DEVICE			2
+#define   KVM_DEV_VFIO_DEVICE_FORWARD_IRQ		1
+#define   KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ		2
 
 enum kvm_device_type {
 	KVM_DEV_TYPE_FSL_MPIC_20	= 1,
@@ -955,6 +958,15 @@ enum kvm_device_type {
 	KVM_DEV_TYPE_MAX,
 };
 
+struct kvm_vfio_dev_irq {
+	__u32	argsz;		/* structure length */
+	__u32	fd;		/* file descriptor of the VFIO device */
+	__u32	index;		/* VFIO device IRQ index */
+	__u32	start;		/* start of subindex range */
+	__u32	count;		/* size of subindex range */
+	__u32	gsi[];		/* gsi, ie. virtual IRQ number */
+};
+
 /*
  * ioctls for VM fds
  */
-- 
1.9.1
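[Aside: kvm_vfio_dev_irq ends in a flexible array member, so userspace
has to size argsz as the fixed header plus one gsi entry per subindex in
[start, start + count). A sketch of that expected usage — the struct is
copied from the patch, but the helper and its name are illustrative, not
part of the API:]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t __u32;

/* Copy of the uapi struct added by the patch. */
struct kvm_vfio_dev_irq {
	__u32	argsz;		/* structure length */
	__u32	fd;		/* file descriptor of the VFIO device */
	__u32	index;		/* VFIO device IRQ index */
	__u32	start;		/* start of subindex range */
	__u32	count;		/* size of subindex range */
	__u32	gsi[];		/* one virtual IRQ number per subindex */
};

/* Total size to report in argsz: fixed header + count gsi entries.
 * sizeof() does not include the flexible array, so add it explicitly. */
static size_t dev_irq_argsz(__u32 count)
{
	return sizeof(struct kvm_vfio_dev_irq) + count * sizeof(__u32);
}
```

A caller forwarding three subindices would allocate dev_irq_argsz(3)
bytes, fill start = 0, count = 3 and gsi[0..2], and pass the buffer via
kvm_device_attr.addr.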
[RFC v4 11/13] kvm: arm: implement kvm_arch_halt_guest and kvm_arch_resume_guest
This patch defines __KVM_HAVE_ARCH_HALT_GUEST and implements
kvm_arch_halt_guest and kvm_arch_resume_guest for ARM. On halt, the
guest is forced to exit and prevented from being re-entered.

Signed-off-by: Eric Auger <eric.au...@linaro.org>
---
 arch/arm/include/asm/kvm_host.h |  4 ++++
 arch/arm/kvm/arm.c              | 32 +++++++++++++++++++++++++++++---
 2 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 87f0921..a9f2c31 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -28,6 +28,7 @@
 #include <kvm/arm_arch_timer.h>
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
+#define __KVM_HAVE_ARCH_HALT_GUEST
 
 #if defined(CONFIG_KVM_ARM_MAX_VCPUS)
 #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
@@ -131,6 +132,9 @@ struct kvm_vcpu_arch {
 	/* vcpu power-off state */
 	bool power_off;
 
+	/* Don't run the guest */
+	bool pause;
+
 	/* IO related fields */
 	struct kvm_decode mmio_decode;
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 6f63ab7..6c743a1 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -455,11 +455,36 @@ bool kvm_arch_intc_initialized(struct kvm *kvm)
 	return vgic_initialized(kvm);
 }
 
+void kvm_arch_halt_guest(struct kvm *kvm)
+{
+	int i;
+	struct kvm_vcpu *vcpu;
+
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		vcpu->arch.pause = true;
+	force_vm_exit(cpu_all_mask);
+}
+
+void kvm_arch_resume_guest(struct kvm *kvm)
+{
+	int i;
+	struct kvm_vcpu *vcpu;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
+
+		vcpu->arch.pause = false;
+		wake_up_interruptible(wq);
+	}
+}
+
+
 static void vcpu_pause(struct kvm_vcpu *vcpu)
 {
 	wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
 
-	wait_event_interruptible(*wq, !vcpu->arch.power_off);
+	wait_event_interruptible(*wq, ((!vcpu->arch.power_off) &&
+				       (!vcpu->arch.pause)));
 }
 
 static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
@@ -509,7 +534,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 		update_vttbr(vcpu->kvm);
 
-		if (vcpu->arch.power_off)
+		if (vcpu->arch.power_off || vcpu->arch.pause)
 			vcpu_pause(vcpu);
 
 		kvm_vgic_flush_hwstate(vcpu);
@@ -527,7 +552,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			run->exit_reason = KVM_EXIT_INTR;
 		}
 
-		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
+		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
+		    vcpu->arch.pause) {
 			kvm_timer_sync_hwstate(vcpu);
 			local_irq_enable();
 			kvm_timer_finish_sync(vcpu);
-- 
1.9.1
[RFC v4 03/13] VFIO: platform: single handler using function pointer
A single handler is now registered whatever the use case, automasked or not. A function pointer is set according to the desired behavior, and the root handler calls this function. The IRQ lock is taken/released in the root handler. eventfd_signal can be called from regions that are not allowed to sleep.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v4: creation
---
 drivers/vfio/platform/vfio_platform_irq.c     | 21 +++--
 drivers/vfio/platform/vfio_platform_private.h |  1 +
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 132bb3f..8eb65c1 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -147,11 +147,8 @@ static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev,
 static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
 {
 	struct vfio_platform_irq *irq_ctx = dev_id;
-	unsigned long flags;
 	int ret = IRQ_NONE;
 
-	spin_lock_irqsave(&irq_ctx->lock, flags);
-
 	if (!irq_ctx->masked) {
 		ret = IRQ_HANDLED;
 
@@ -160,8 +157,6 @@ static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
 		irq_ctx->masked = true;
 	}
 
-	spin_unlock_irqrestore(&irq_ctx->lock, flags);
-
 	if (ret == IRQ_HANDLED)
 		eventfd_signal(irq_ctx->trigger, 1);
 
@@ -177,6 +172,19 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static irqreturn_t vfio_handler(int irq, void *dev_id)
+{
+	struct vfio_platform_irq *irq_ctx = dev_id;
+	unsigned long flags;
+	irqreturn_t ret;
+
+	spin_lock_irqsave(&irq_ctx->lock, flags);
+	ret = irq_ctx->handler(irq, dev_id);
+	spin_unlock_irqrestore(&irq_ctx->lock, flags);
+
+	return ret;
+}
+
 static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
 			    int fd, irq_handler_t handler)
 {
@@ -206,9 +214,10 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
 	}
 
 	irq->trigger = trigger;
+	irq->handler = handler;
 
 	irq_set_status_flags(irq->hwirq, IRQ_NOAUTOEN);
-	ret = request_irq(irq->hwirq, handler, 0, irq->name, irq);
+	ret = request_irq(irq->hwirq, vfio_handler, 0, irq->name, irq);
 	if (ret) {
 		kfree(irq->name);
 		eventfd_ctx_put(trigger);

diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 5d31e04..eb91deb 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -37,6 +37,7 @@ struct vfio_platform_irq {
 	spinlock_t		lock;
 	struct virqfd		*unmask;
 	struct virqfd		*mask;
+	irqreturn_t		(*handler)(int irq, void *dev_id);
 };
 
 struct vfio_platform_region {
-- 
1.9.1
--
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
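The dispatch pattern introduced above — a single root handler registered with request_irq() that takes the per-IRQ lock and calls through a function pointer — can be sketched as a plain userspace model. This is a toy illustration, not kernel code: the `model_irq` struct and handler names are invented, and a pthread mutex stands in for the kernel spinlock.

```c
#include <pthread.h>
#include <assert.h>

enum irqreturn { IRQ_NONE, IRQ_HANDLED };

struct model_irq {
	pthread_mutex_t lock;          /* stands in for the kernel spinlock */
	int masked;
	int signaled;                  /* stands in for eventfd_signal() */
	enum irqreturn (*handler)(struct model_irq *);
};

/* automasked variant: mask the line before signaling (level-sensitive IRQs) */
enum irqreturn automasked_handler(struct model_irq *irq)
{
	if (irq->masked)
		return IRQ_NONE;       /* already masked: spurious, ignore */
	irq->masked = 1;
	irq->signaled++;
	return IRQ_HANDLED;
}

/* plain variant: just signal (no automasking needed) */
enum irqreturn plain_handler(struct model_irq *irq)
{
	irq->signaled++;
	return IRQ_HANDLED;
}

/* the single root handler: lock, dispatch through the pointer, unlock */
enum irqreturn root_handler(struct model_irq *irq)
{
	enum irqreturn ret;

	pthread_mutex_lock(&irq->lock);
	ret = irq->handler(irq);
	pthread_mutex_unlock(&irq->lock);
	return ret;
}
```

Swapping behaviors then only requires updating the `handler` pointer under the lock, which is exactly what later patches in the series exploit when an IRQ becomes forwarded.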
[RFC v4 02/13] VFIO: platform: test forwarded state when selecting IRQ handler
In case the IRQ is forwarded, the VFIO platform IRQ handler does not need to disable the IRQ anymore. When setting the IRQ handler we now also test the forwarded state: in case the IRQ is forwarded we select vfio_irq_handler.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v3 -> v4:
- change title

v2 -> v3:
- forwarded state was tested in the handler. Now the forwarded state
  is tested before setting the handler. This definitively limits the
  dynamics of forwarded state changes but I don't think there is a use
  case where we need to be able to change the state at any time.

Conflicts:
	drivers/vfio/platform/vfio_platform_irq.c
---
 drivers/vfio/platform/vfio_platform_irq.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 88bba57..132bb3f 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -229,8 +229,13 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
 {
 	struct vfio_platform_irq *irq = &vdev->irqs[index];
 	irq_handler_t handler;
+	struct irq_data *d;
+	bool is_forwarded;
 
-	if (vdev->irqs[index].flags & VFIO_IRQ_INFO_AUTOMASKED)
+	d = irq_get_irq_data(irq->hwirq);
+	is_forwarded = irqd_irq_forwarded(d);
+
+	if (vdev->irqs[index].flags & VFIO_IRQ_INFO_AUTOMASKED && !is_forwarded)
 		handler = vfio_automasked_irq_handler;
 	else
 		handler = vfio_irq_handler;
-- 
1.9.1
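The selection rule in this patch is small but central: automasking is only needed when the IRQ is level-sensitive (AUTOMASKED) and not forwarded, because a forwarded IRQ is deactivated by the guest, so the host no longer has to mask it. A minimal userspace sketch of that decision (the flag value is the one I believe uapi vfio.h uses, assumed here for illustration):

```c
#include <stdbool.h>
#include <assert.h>

#define VFIO_IRQ_INFO_AUTOMASKED (1 << 2)  /* assumed flag value */

enum which_handler { PLAIN_HANDLER, AUTOMASKED_HANDLER };

/* mirrors the test in vfio_platform_set_irq_trigger(): automask only
 * level-sensitive IRQs that the host still has to complete itself */
enum which_handler select_handler(unsigned flags, bool is_forwarded)
{
	if ((flags & VFIO_IRQ_INFO_AUTOMASKED) && !is_forwarded)
		return AUTOMASKED_HANDLER;
	return PLAIN_HANDLER;
}
```

Since the choice is made when signaling is set up rather than on every interrupt, changing the forwarded state later requires re-selecting the handler, which patch 06 makes possible via vfio_external_set_automasked.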
[RFC v4 13/13] KVM: arm: vgic: forwarding control
This patch sets __KVM_HAVE_ARCH_KVM_VFIO_FORWARD and implements kvm_arch_set_forward for ARM. As a result the KVM-VFIO device now allows to forward/unforward a VFIO device IRQ on ARM. kvm_arch_set_forward and kvm_arch_unset_forward mostly take care of VGIC programming: physical IRQ/guest IRQ mapping, list register cleanup, VGIC state machine. Signed-off-by: Eric Auger eric.au...@linaro.org --- v3 - v4: - code originally located in kvm_vfio_arm.c - kvm_arch_vfio_{set|unset}_forward renamed into kvm_arch_{set|unset}_forward - split into 2 functions (set/unset) since unset does not fail anymore - unset can be invoked at whatever time. Extra care is taken to handle transition in VGIC state machine, LR cleanup, ... v2 - v3: - renaming of kvm_arch_set_fwd_state into kvm_arch_vfio_set_forward - takes a bool arg instead of kvm_fwd_irq_action enum - removal of KVM_VFIO_IRQ_CLEANUP - platform device check now happens here - more precise errors returned - irq_eoi handled externally to this patch (VGIC) - correct enable_irq bug done twice - reword the commit message - correct check of platform_bus_type - use raw_spin_lock_irqsave and check the validity of the handler --- arch/arm/include/asm/kvm_host.h | 1 + virt/kvm/arm/vgic.c | 190 2 files changed, 191 insertions(+) diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index a9f2c31..6e8be2b 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -29,6 +29,7 @@ #define __KVM_HAVE_ARCH_INTC_INITIALIZED #define __KVM_HAVE_ARCH_HALT_GUEST +#define __KVM_HAVE_ARCH_KVM_VFIO_FORWARD #if defined(CONFIG_KVM_ARM_MAX_VCPUS) #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index ace8e46..81bb2f2 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -2691,3 +2691,193 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e, { return 0; } + +/** + * kvm_arch_set_forward - Set forwarding for a given IRQ + * + * @kvm: handle 
to the VM + * @host_irq: physical IRQ number + * @guest_irq: virtual IRQ number + * + * This function is supposed to be called only if the IRQ + * is not in progress: ie. not active at VGIC level and not + * currently under injection in the KVM. + */ +int kvm_arch_set_forward(struct kvm *kvm, unsigned int host_irq, +unsigned int guest_irq) +{ + irq_hw_number_t gic_irq; + struct irq_desc *desc = irq_to_desc(host_irq); + struct irq_data *d; + unsigned long flags; + struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, 0); + int ret = 0; + int spi_id = guest_irq + VGIC_NR_PRIVATE_IRQS; + struct vgic_dist *dist = kvm-arch.vgic; + + if (!vcpu) + return 0; + + spin_lock(dist-lock); + + raw_spin_lock_irqsave(desc-lock, flags); + d = desc-irq_data; + gic_irq = irqd_to_hwirq(d); + irqd_set_irq_forwarded(d); + /* +* next physical IRQ will be be handled as forwarded +* by the host (priority drop only) +*/ + + raw_spin_unlock_irqrestore(desc-lock, flags); + + /* +* need to release the dist spin_lock here since +* vgic_map_phys_irq can sleep +*/ + spin_unlock(dist-lock); + ret = vgic_map_phys_irq(vcpu, spi_id, (int)gic_irq); + /* +* next guest_irq injection will be considered as +* forwarded and next flush will program LR +* without maintenance IRQ but with HW bit set +*/ + return ret; +} + +/** + * kvm_arch_unset_forward - Unset forwarding for a given IRQ + * + * @kvm: handle to the VM + * @host_irq: physical IRQ number + * @guest_irq: virtual IRQ number + * @active: returns whether the physical IRQ is active + * + * This function must be called when the host_irq is disabled + * and guest has been exited and prevented from being re-entered. 
+ * + */ +void kvm_arch_unset_forward(struct kvm *kvm, + unsigned int host_irq, + unsigned int guest_irq, + bool *active) +{ + struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, 0); + struct vgic_cpu *vgic_cpu = vcpu-arch.vgic_cpu; + struct vgic_dist *dist = kvm-arch.vgic; + int ret, lr; + struct vgic_lr vlr; + struct irq_desc *desc = irq_to_desc(host_irq); + struct irq_data *d; + unsigned long flags; + irq_hw_number_t gic_irq; + int spi_id = guest_irq + VGIC_NR_PRIVATE_IRQS; + struct irq_chip *chip; + bool queued, needs_deactivate = true; + + spin_lock(dist-lock); + + irq_get_irqchip_state(host_irq, IRQCHIP_STATE_ACTIVE, active); + + if (!vcpu) + goto out; + + raw_spin_lock_irqsave(desc-lock, flags); + d = irq_desc_get_irq_data(desc); + gic_irq = irqd_to_hwirq(d); + raw_spin_unlock_irqrestore(desc-lock, flags); +
[RFC v4 06/13] VFIO: platform: add vfio_external_{mask|is_active|set_automasked}
This patch introduces 3 new external functions that perform actions on VFIO platform devices:
- mask a VFIO IRQ
- get the active status of a VFIO IRQ (active at interrupt controller
  level or masked by the level-sensitive automasking)
- change the automasked property and the VFIO handler

Note there is no way to discriminate between user-space masking and automasked handler masking. As a consequence, is_active will return true in case the IRQ was masked by user-space.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v4: creation
---
 drivers/vfio/platform/vfio_platform_irq.c | 43 +++
 include/linux/vfio.h                      | 14 ++
 2 files changed, 57 insertions(+)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 8eb65c1..49994cb 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -231,6 +231,49 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
 	return 0;
 }
 
+void vfio_external_mask(struct vfio_platform_device *vdev, int index)
+{
+	vfio_platform_mask(&vdev->irqs[index]);
+}
+EXPORT_SYMBOL_GPL(vfio_external_mask);
+
+bool vfio_external_is_active(struct vfio_platform_device *vdev, int index)
+{
+	unsigned long flags;
+	struct vfio_platform_irq *irq = &vdev->irqs[index];
+	bool active, masked, outstanding;
+	int ret;
+
+	spin_lock_irqsave(&irq->lock, flags);
+
+	ret = irq_get_irqchip_state(irq->hwirq, IRQCHIP_STATE_ACTIVE, &active);
+	BUG_ON(ret);
+	masked = irq->masked;
+	outstanding = active || masked;
+
+	spin_unlock_irqrestore(&irq->lock, flags);
+	return outstanding;
+}
+EXPORT_SYMBOL_GPL(vfio_external_is_active);
+
+void vfio_external_set_automasked(struct vfio_platform_device *vdev,
+				  int index, bool automasked)
+{
+	unsigned long flags;
+	struct vfio_platform_irq *irq = &vdev->irqs[index];
+
+	spin_lock_irqsave(&irq->lock, flags);
+	if (automasked) {
+		irq->flags |= VFIO_IRQ_INFO_AUTOMASKED;
+		irq->handler = vfio_automasked_irq_handler;
+	} else {
+		irq->flags &= ~VFIO_IRQ_INFO_AUTOMASKED;
+		irq->handler = vfio_irq_handler;
+	}
+	spin_unlock_irqrestore(&irq->lock, flags);
+}
+EXPORT_SYMBOL_GPL(vfio_external_set_automasked);
+
 static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
 					 unsigned index, unsigned start,
 					 unsigned count, uint32_t flags,

diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 77c334b..e04ca93 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -103,6 +103,20 @@ extern struct vfio_device *vfio_device_get_external_user(struct file *filep);
 extern void vfio_device_put_external_user(struct vfio_device *vdev);
 extern struct device *vfio_external_base_device(struct vfio_device *vdev);
 
+struct vfio_platform_device;
+extern void vfio_external_mask(struct vfio_platform_device *vdev, int index);
+/*
+ * returns whether the VFIO IRQ is active:
+ * true if not yet deactivated at interrupt controller level or if
+ * automasked (level sensitive IRQ). Unfortunately there is no way to
+ * discriminate between handler auto-masking and user-space masking
+ */
+extern bool vfio_external_is_active(struct vfio_platform_device *vdev,
+				    int index);
+
+extern void vfio_external_set_automasked(struct vfio_platform_device *vdev,
+					 int index, bool automasked);
+
 struct pci_dev;
 #ifdef CONFIG_EEH
 extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
-- 
1.9.1
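The two helpers' logic can be sketched in userspace C. This is a toy model under stated assumptions: the `model_*` names are invented, and the AUTOMASKED flag value is assumed; the point is the `active || masked` test (with its documented inability to tell user-space masking from automasking) and the flag/handler flip.

```c
#include <stdbool.h>
#include <assert.h>

#define VFIO_IRQ_INFO_AUTOMASKED (1 << 2)  /* assumed flag value */

/* models vfio_external_is_active(): the IRQ counts as outstanding if
 * it is still active at the interrupt controller OR masked.  Handler
 * auto-masking and user-space masking share the same 'masked' bit,
 * hence they cannot be discriminated here. */
bool model_is_active(bool active_at_gic, bool masked)
{
	return active_at_gic || masked;
}

/* models vfio_external_set_automasked(): flip the AUTOMASKED flag;
 * the real code also swaps the handler function pointer under the lock */
unsigned model_set_automasked(unsigned flags, bool automasked)
{
	if (automasked)
		flags |= VFIO_IRQ_INFO_AUTOMASKED;
	else
		flags &= ~VFIO_IRQ_INFO_AUTOMASKED;
	return flags;
}
```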
[RFC v4 05/13] VFIO: external user API for interaction with vfio devices
The VFIO external user API is enriched with 3 new functions that allow a kernel user external to VFIO to retrieve information from a VFIO device:
- vfio_device_get_external_user returns a vfio device from its fd and
  increments its reference counter
- vfio_device_put_external_user decrements the reference counter
- vfio_external_base_device returns a handle to the struct device

---

v3 -> v4:
- change the commit title

v2 -> v3:
- reword the commit message

v1 -> v2:
- vfio_external_get_base_device renamed into vfio_external_base_device
- vfio_external_get_type removed

Signed-off-by: Eric Auger eric.au...@linaro.org
---
 drivers/vfio/vfio.c  | 24 ++++++++++++++++++++++++
 include/linux/vfio.h |  3 +++
 2 files changed, 27 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 8e84471..282814e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1401,6 +1401,30 @@ void vfio_group_put_external_user(struct vfio_group *group)
 }
 EXPORT_SYMBOL_GPL(vfio_group_put_external_user);
 
+struct vfio_device *vfio_device_get_external_user(struct file *filep)
+{
+	struct vfio_device *vdev = filep->private_data;
+
+	if (filep->f_op != &vfio_device_fops)
+		return ERR_PTR(-EINVAL);
+
+	vfio_device_get(vdev);
+	return vdev;
+}
+EXPORT_SYMBOL_GPL(vfio_device_get_external_user);
+
+void vfio_device_put_external_user(struct vfio_device *vdev)
+{
+	vfio_device_put(vdev);
+}
+EXPORT_SYMBOL_GPL(vfio_device_put_external_user);
+
+struct device *vfio_external_base_device(struct vfio_device *vdev)
+{
+	return vdev->dev;
+}
+EXPORT_SYMBOL_GPL(vfio_external_base_device);
+
 int vfio_external_user_iommu_id(struct vfio_group *group)
 {
 	return iommu_group_id(group->iommu_group);

diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 5d45081..77c334b 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -99,6 +99,9 @@ extern void vfio_group_put_external_user(struct vfio_group *group);
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
 extern long vfio_external_check_extension(struct vfio_group *group,
 					  unsigned long arg);
+extern struct vfio_device *vfio_device_get_external_user(struct file *filep);
+extern void vfio_device_put_external_user(struct vfio_device *vdev);
+extern struct device *vfio_external_base_device(struct vfio_device *vdev);
 
 struct pci_dev;
 #ifdef CONFIG_EEH
-- 
1.9.1
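The get/put pair above follows the usual external-user contract: validate that the fd really is a vfio device fd, then pin the device with a reference for as long as the handle is held. A toy userspace model (all `model_*` and `FOPS_*` names are invented; the real code compares `filep->f_op` against `vfio_device_fops` and returns `ERR_PTR(-EINVAL)`):

```c
#include <stddef.h>
#include <assert.h>

enum { FOPS_VFIO_DEVICE = 1, FOPS_OTHER = 2 };

struct model_vfio_device {
	int f_op;       /* stands in for the filep->f_op identity check */
	int refcount;
};

/* models vfio_device_get_external_user(): reject non-vfio fds,
 * otherwise take a reference and hand out the device */
struct model_vfio_device *model_get_external_user(struct model_vfio_device *d)
{
	if (d->f_op != FOPS_VFIO_DEVICE)
		return NULL;    /* real code: ERR_PTR(-EINVAL) */
	d->refcount++;
	return d;
}

/* models vfio_device_put_external_user(): drop the reference */
void model_put_external_user(struct model_vfio_device *d)
{
	d->refcount--;
}
```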
[RFC v4 12/13] KVM: kvm-vfio: generic forwarding control
This patch introduces a new KVM_DEV_VFIO_DEVICE group. This is a new control channel which enables KVM to cooperate with viable VFIO devices. The patch introduces 2 attributes for this group: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. Their purpose is to turn a VFIO device IRQ into a forwarded IRQ and respectively unset the feature. The generic part introduced here interact with VFIO, genirq, KVM while the architecture specific part mostly takes care of the virtual interrupt controller programming. Architecture specific implementation is enabled when __KVM_HAVE_ARCH_KVM_VFIO_FORWARD is set. When not set those functions are void. Signed-off-by: Eric Auger eric.au...@linaro.org --- v3 - v4: - use new kvm_vfio_dev_irq struct - improve error handling according to Alex comments - full rework or generic/arch specific split to accomodate for unforward that never fails - kvm_vfio_get_vfio_device and kvm_vfio_put_vfio_device removed from that patch file and introduced before (since also used by Feng) - guard kvm_vfio_control_irq_forward call with __KVM_HAVE_ARCH_KVM_VFIO_FORWARD v2 - v3: - add API comments in kvm_host.h - improve the commit message - create a private kvm_vfio_fwd_irq struct - fwd_irq_action replaced by a bool and removal of VFIO_IRQ_CLEANUP. This latter action will be handled in vgic. - add a vfio_device handle argument to kvm_arch_set_fwd_state. The goal is to move platform specific stuff in architecture specific code. - kvm_arch_set_fwd_state renamed into kvm_arch_vfio_set_forward - increment the ref counter each time we do an IRQ forwarding and decrement this latter each time one IRQ forward is unset. Simplifies the whole ref counting. 
- simplification of list handling: create, search, removal v1 - v2: - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD - original patch file separated into 2 parts: generic part moved in vfio.c and ARM specific part(kvm_arch_set_fwd_state) --- include/linux/kvm_host.h | 47 +++ virt/kvm/vfio.c | 311 ++- 2 files changed, 355 insertions(+), 3 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 81c93de..f2bc192 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1048,6 +1048,15 @@ struct kvm_device_ops { unsigned long arg); }; +/* internal self-contained structure describing a forwarded IRQ */ +struct kvm_fwd_irq { + struct kvm *kvm; /* VM to inject the GSI into */ + struct vfio_device *vdev; /* vfio device the IRQ belongs to */ + __u32 index; /* VFIO device IRQ index */ + __u32 subindex; /* VFIO device IRQ subindex */ + __u32 gsi; /* gsi, ie. virtual IRQ number */ +}; + void kvm_device_get(struct kvm_device *dev); void kvm_device_put(struct kvm_device *dev); struct kvm_device *kvm_device_from_filp(struct file *filp); @@ -1069,6 +1078,44 @@ inline void kvm_arch_resume_guest(struct kvm *kvm) {} #endif +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD +/** + * kvm_arch_set_forward - Sets forwarding for a given IRQ + * + * @kvm: handle to the VM + * @host_irq: physical IRQ number + * @guest_irq: virtual IRQ number + * returns 0 on success, 0 on failure + */ +int kvm_arch_set_forward(struct kvm *kvm, +unsigned int host_irq, unsigned int guest_irq); + +/** + * kvm_arch_unset_forward - Unsets forwarding for a given IRQ + * + * @kvm: handle to the VM + * @host_irq: physical IRQ number + * @guest_irq: virtual IRQ number + * @active: returns whether the IRQ is active + */ +void kvm_arch_unset_forward(struct kvm *kvm, + unsigned int host_irq, + unsigned int guest_irq, + bool *active); + +#else +static inline int kvm_arch_set_forward(struct kvm *kvm, + unsigned int host_irq, + unsigned int guest_irq) +{ + return 
-ENOENT; +} +static inline void kvm_arch_unset_forward(struct kvm *kvm, + unsigned int host_irq, + unsigned int guest_irq, + bool *active) {} +#endif + #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val) diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c index c995e51..4847597 100644 --- a/virt/kvm/vfio.c +++ b/virt/kvm/vfio.c @@ -19,14 +19,30 @@ #include linux/uaccess.h #include linux/vfio.h #include vfio.h +#include linux/platform_device.h +#include linux/irq.h +#include linux/spinlock.h +#include linux/interrupt.h +#include linux/delay.h + +#define DEBUG_FORWARD +#define DEBUG_UNFORWARD struct kvm_vfio_group { struct list_head node; struct vfio_group *vfio_group; }; +/* private linkable kvm_fwd_irq struct */ +struct kvm_vfio_fwd_irq_node {
[RFC v4 07/13] KVM: kvm-vfio: wrappers to VFIO external API device helpers
Provide wrapper functions that allow KVM-VFIO device code to interact with a vfio device: - kvm_vfio_device_get_external_user gets a handle to a struct vfio_device from the vfio device file descriptor and increments its reference counter, - kvm_vfio_device_put_external_user decrements the reference counter to a vfio device, - kvm_vfio_external_base_device returns a handle to the struct device of the vfio device. Also kvm_vfio_get_vfio_device and kvm_vfio_put_vfio_device helpers are introduced. Signed-off-by: Eric Auger eric.au...@linaro.org --- v3 - v4: - wrappers are no more exposed in kvm_host and become kvm/vfio.c static functions - added kvm_vfio_get_vfio_device/kvm_vfio_put_vfio_device in that patch file v2 - v3: - reword the commit message and title v1 - v2: - kvm_vfio_external_get_base_device renamed into kvm_vfio_external_base_device - kvm_vfio_external_get_type removed --- virt/kvm/vfio.c | 74 + 1 file changed, 74 insertions(+) diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c index 620e37f..80a45e4 100644 --- a/virt/kvm/vfio.c +++ b/virt/kvm/vfio.c @@ -60,6 +60,80 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group) symbol_put(vfio_group_put_external_user); } +static struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep) +{ + struct vfio_device *vdev; + struct vfio_device *(*fn)(struct file *); + + fn = symbol_get(vfio_device_get_external_user); + if (!fn) + return ERR_PTR(-EINVAL); + + vdev = fn(filep); + + symbol_put(vfio_device_get_external_user); + + return vdev; +} + +static void kvm_vfio_device_put_external_user(struct vfio_device *vdev) +{ + void (*fn)(struct vfio_device *); + + fn = symbol_get(vfio_device_put_external_user); + if (!fn) + return; + + fn(vdev); + + symbol_put(vfio_device_put_external_user); +} + +static struct device *kvm_vfio_external_base_device(struct vfio_device *vdev) +{ + struct device *(*fn)(struct vfio_device *); + struct device *dev; + + fn = symbol_get(vfio_external_base_device); + 
if (!fn) + return NULL; + + dev = fn(vdev); + + symbol_put(vfio_external_base_device); + + return dev; +} + +/** + * kvm_vfio_get_vfio_device - Returns a handle to a vfio-device + * + * Checks it is a valid vfio device and increments its reference counter + * @fd: file descriptor of the vfio platform device + */ +static struct vfio_device *kvm_vfio_get_vfio_device(int fd) +{ + struct fd f = fdget(fd); + struct vfio_device *vdev; + + if (!f.file) + return ERR_PTR(-EINVAL); + vdev = kvm_vfio_device_get_external_user(f.file); + fdput(f); + return vdev; +} + +/** + * kvm_vfio_put_vfio_device: decrements the reference counter of the + * vfio platform * device + * + * @vdev: vfio_device handle to release + */ +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev) +{ + kvm_vfio_device_put_external_user(vdev); +} + static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group) { long (*fn)(struct vfio_group *, unsigned long); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
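The wrappers above all repeat one pattern: symbol_get() to resolve a VFIO symbol at runtime (so kvm has no hard link-time dependency on the vfio module), a graceful bail-out if the module is absent, the call, then symbol_put() to drop the module reference. A userspace sketch of that pattern, with a toy string-keyed lookup standing in for the kernel's symbol resolution (all names here are invented for illustration):

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

typedef int (*fn_t)(int);

int double_it(int x) { return 2 * x; }

/* toy symbol table; the kernel resolves this from loaded modules */
fn_t lookup(const char *name)
{
	if (strcmp(name, "double_it") == 0)
		return double_it;
	return NULL;    /* "module not loaded" */
}

/* the symbol_get()/symbol_put() dance, modeled */
int call_optional(const char *name, int arg, int fallback)
{
	fn_t fn = lookup(name);     /* symbol_get() */
	int ret;

	if (!fn)
		return fallback;    /* provider absent: degrade gracefully */
	ret = fn(arg);
	/* symbol_put() would drop the module reference here */
	return ret;
}
```

The design choice is deliberate: because every call site resolves and releases the symbol, the vfio module can be loaded or unloaded independently of kvm.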
[RFC v4 10/13] kvm: introduce kvm_arch_halt_guest and kvm_arch_resume_guest
This API allows to:
- exit the guest and avoid re-entering it
- resume the guest execution

Signed-off-by: Eric Auger eric.au...@linaro.org
---
 include/linux/kvm_host.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7f5858d..81c93de 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1057,6 +1057,18 @@ void kvm_unregister_device_ops(u32 type);
 extern struct kvm_device_ops kvm_mpic_ops;
 extern struct kvm_device_ops kvm_xics_ops;
 
+#ifdef __KVM_HAVE_ARCH_HALT_GUEST
+
+void kvm_arch_halt_guest(struct kvm *kvm);
+void kvm_arch_resume_guest(struct kvm *kvm);
+
+#else
+
+static inline void kvm_arch_halt_guest(struct kvm *kvm) {}
+static inline void kvm_arch_resume_guest(struct kvm *kvm) {}
+
+#endif
+
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
 static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
-- 
1.9.1
--
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC v4 09/13] KVM: arm: rename pause into power_off
The kvm_vcpu_arch pause field is renamed into power_off to prepare for the introduction of a new pause field. Signed-off-by: Eric Auger eric.au...@linaro.org --- arch/arm/include/asm/kvm_host.h | 4 ++-- arch/arm/kvm/arm.c | 10 +- arch/arm/kvm/psci.c | 10 +- 3 files changed, 12 insertions(+), 12 deletions(-) diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 9cbcc53..87f0921 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -128,8 +128,8 @@ struct kvm_vcpu_arch { * here. */ - /* Don't run the guest on this vcpu */ - bool pause; + /* vcpu power-off state*/ + bool power_off; /* IO related fields */ struct kvm_decode mmio_decode; diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 61586a3..6f63ab7 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -459,7 +459,7 @@ static void vcpu_pause(struct kvm_vcpu *vcpu) { wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu); - wait_event_interruptible(*wq, !vcpu-arch.pause); + wait_event_interruptible(*wq, !vcpu-arch.power_off); } static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu) @@ -509,7 +509,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) update_vttbr(vcpu-kvm); - if (vcpu-arch.pause) + if (vcpu-arch.power_off) vcpu_pause(vcpu); kvm_vgic_flush_hwstate(vcpu); @@ -731,12 +731,12 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu, vcpu_reset_hcr(vcpu); /* -* Handle the start in power-off case by marking the VCPU as paused. +* Handle the start in power-off case. 
*/ if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu-arch.features)) - vcpu-arch.pause = true; + vcpu-arch.power_off = true; else - vcpu-arch.pause = false; + vcpu-arch.power_off = false; return 0; } diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c index 58cb324..ec0bd13 100644 --- a/arch/arm/kvm/psci.c +++ b/arch/arm/kvm/psci.c @@ -60,7 +60,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu) static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu) { - vcpu-arch.pause = true; + vcpu-arch.power_off = true; } static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu) @@ -92,7 +92,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu) */ if (!vcpu) return PSCI_RET_INVALID_PARAMS; - if (!vcpu-arch.pause) { + if (!vcpu-arch.power_off) { if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1) return PSCI_RET_ALREADY_ON; else @@ -120,7 +120,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu) * the general puspose registers are undefined upon CPU_ON. */ *vcpu_reg(vcpu, 0) = context_id; - vcpu-arch.pause = false; + vcpu-arch.power_off = false; smp_mb(); /* Make sure the above is visible */ wq = kvm_arch_vcpu_wq(vcpu); @@ -157,7 +157,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu) kvm_for_each_vcpu(i, tmp, kvm) { mpidr = kvm_vcpu_get_mpidr(tmp); if (((mpidr target_affinity_mask) == target_affinity) - !tmp-arch.pause) { + !tmp-arch.power_off) { return PSCI_0_2_AFFINITY_LEVEL_ON; } } @@ -180,7 +180,7 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type) * re-initialized. */ kvm_for_each_vcpu(i, tmp, vcpu-kvm) { - tmp-arch.pause = true; + tmp-arch.power_off = true; kvm_vcpu_kick(tmp); } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC v4 08/13] KVM: kvm-vfio: wrappers for vfio_external_{mask|is_active|set_automasked}
Those 3 new wrapper functions call the respective VFIO external functions. Signed-off-by: Eric Auger eric.au...@linaro.org --- v4: creation --- include/linux/vfio.h | 8 +++- virt/kvm/vfio.c | 44 2 files changed, 47 insertions(+), 5 deletions(-) diff --git a/include/linux/vfio.h b/include/linux/vfio.h index e04ca93..565f5f7 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -106,14 +106,12 @@ extern struct device *vfio_external_base_device(struct vfio_device *vdev); struct vfio_platform_device; extern void vfio_external_mask(struct vfio_platform_device *vdev, int index); /* - * returns whether the VFIO IRQ is active: - * true if not yet deactivated at interrupt controller level or if - * automasked (level sensitive IRQ). Unfortunately there is no way to - * discriminate between handler auto-masking and user-space masking + * returns whether the VFIO IRQ is active at interrupt controller level + * or VFIO-masked. Note that if the use-space masked the IRQ index it + * cannot be discriminated from automasked handler situation. 
*/ extern bool vfio_external_is_active(struct vfio_platform_device *vdev, int index); - extern void vfio_external_set_automasked(struct vfio_platform_device *vdev, int index, bool automasked); diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c index 80a45e4..c995e51 100644 --- a/virt/kvm/vfio.c +++ b/virt/kvm/vfio.c @@ -134,6 +134,50 @@ static void kvm_vfio_put_vfio_device(struct vfio_device *vdev) kvm_vfio_device_put_external_user(vdev); } +bool kvm_vfio_external_is_active(struct vfio_platform_device *vpdev, +int index) +{ + bool (*fn)(struct vfio_platform_device *, int index); + bool active; + + fn = symbol_get(vfio_external_is_active); + if (!fn) + return -1; + + active = fn(vpdev, index); + + symbol_put(vfio_external_is_active); + return active; +} + +void kvm_vfio_external_mask(struct vfio_platform_device *vpdev, + int index) +{ + void (*fn)(struct vfio_platform_device *, int index); + + fn = symbol_get(vfio_external_mask); + if (!fn) + return; + + fn(vpdev, index); + + symbol_put(vfio_external_mask); +} + +void kvm_vfio_external_set_automasked(struct vfio_platform_device *vpdev, + int index, bool automasked) +{ + void (*fn)(struct vfio_platform_device *, int index, bool automasked); + + fn = symbol_get(vfio_external_set_automasked); + if (!fn) + return; + + fn(vpdev, index, automasked); + + symbol_put(vfio_external_set_automasked); +} + static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group) { long (*fn)(struct vfio_group *, unsigned long); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC v4 01/13] KVM: arm/arm64: Enable the KVM-VFIO device
From: Kim Phillips kim.phill...@linaro.org The KVM-VFIO device is used by the QEMU VFIO device. It is used to record the list of in-use VFIO groups so that KVM can manipulate them. With this series, it will also be used to record the forwarded IRQs. Signed-off-by: Kim Phillips kim.phill...@linaro.org Signed-off-by: Eric Auger eric.au...@linaro.org --- v4 - v5: - reword the commit message to explain both usages of the KVM-VFIO device in QEMU - squash both arm and arm64 enables --- arch/arm/kvm/Kconfig| 1 + arch/arm/kvm/Makefile | 2 +- arch/arm64/kvm/Kconfig | 1 + arch/arm64/kvm/Makefile | 2 +- 4 files changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index 7db7df4..0ddb745 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -25,6 +25,7 @@ config KVM select KVM_ARM_HOST select SRCU depends on ARM_VIRT_EXT ARM_LPAE + select KVM_VFIO select HAVE_KVM_EVENTFD ---help--- Support hosting virtualized guest machines. You will also diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile index 859db09..ea1fa76 100644 --- a/arch/arm/kvm/Makefile +++ b/arch/arm/kvm/Makefile @@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt) AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt) KVM := ../../../virt/kvm -kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o +kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o obj-y += kvm-arm.o init.o interrupts.o obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index 0965056..b73fba8 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -27,6 +27,7 @@ config KVM select KVM_ARM_VGIC select KVM_ARM_TIMER select SRCU + select KVM_VFIO select HAVE_KVM_EVENTFD ---help--- Support hosting virtualized guest machines. 
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile index 2e6b827..81ed091 100644 --- a/arch/arm64/kvm/Makefile +++ b/arch/arm64/kvm/Makefile @@ -11,7 +11,7 @@ ARM=../../../arch/arm/kvm obj-$(CONFIG_KVM_ARM_HOST) += kvm.o -kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o +kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/arm.o $(ARM)/mmu.o $(ARM)/mmio.o kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC v4 00/13] KVM-VFIO IRQ forward control
This series proposes an integration of ARM: Forwarding physical interrupts to a guest VM (http://lwn.net/Articles/603514/) in KVM. It enables to set/unset forwarding for a VFIO platform device IRQ. A forwarded IRQ is deactivated by the guest and not by the host. When the guest deactivates the associated virtual IRQ, the interrupt controller automatically completes the physical IRQ. Obviously this requires some HW support in the interrupt controller. This is the case for ARM GIC. The direct benefit is that, for a level sensitive IRQ, a VM exit can be avoided on forwarded IRQ completion. When the IRQ is forwarded, the VFIO platform driver does not need to mask the physical IRQ anymore before signaling the eventfd. Indeed genirq lowers the running priority, enabling other physical IRQ to hit except that one. Besides, the injection still is based on irqfd triggering. The only impact on irqfd process is resamplefd is not called anymore on virtual IRQ completion since deactivation is not trapped by KVM. The current integration is based on an extension of the KVM-VFIO device, previously used by KVM to interact with VFIO groups. The patch series now enables KVM to directly interact with a VFIO platform device. The VFIO external API was extended for that purpose. The IRQ forward programming is architecture specific (virtual interrupt controller programming basically). However the whole infrastructure is kept generic. from a user point of view, the functionality is provided through a new KVM-VFIO group named KVM_DEV_VFIO_DEVICE and 2 associated attributes: - KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, - KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The capability can be checked with KVM_HAS_DEVICE_ATTR. Forwarding must be activated when the VFIO IRQ is not active at physical level or being under injection into the guest (VFIO masked) Forwarding can be unset at any time. 
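The cover letter's core claim — that forwarding saves a VM exit on completion of a level-sensitive IRQ — can be captured in a tiny model. This is a conceptual sketch only (`level_irq` and `guest_completes` are invented names): non-forwarded, the guest's virtual IRQ completion is trapped so the host resamplefd path can unmask the physical line; forwarded, the guest's deactivate reaches the GIC directly via the HW bit in the list register.

```c
#include <stdbool.h>
#include <assert.h>

struct level_irq {
	bool forwarded;
	int vm_exits_on_completion;   /* exits taken just to complete the IRQ */
};

/* what happens when the guest completes (deactivates) the virtual IRQ */
void guest_completes(struct level_irq *irq)
{
	if (!irq->forwarded)
		irq->vm_exits_on_completion++;  /* resamplefd path: trap, then unmask */
	/* forwarded: the GIC deactivates the physical IRQ when the guest
	 * deactivates the virtual one -- no trap, no host involvement */
}
```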
---
This patch series has the following dependencies:
- "ARM: Forwarding physical interrupts to a guest VM" (http://lwn.net/Articles/603514/)
- [PATCH v13 00/18] "VFIO support for platform devices", VOSYS (http://www.spinics.net/lists/kvm-arm/msg13414.html)
- [PATCH v3 0/6] "vfio: type1: support for ARM SMMUs with VFIO_IOMMU_TYPE1", VOSYS (http://www.spinics.net/lists/kvm-arm/msg11738.html)
- [RFC v2] "chip/vgic adaptations for forwarded irq"

Integrated pieces can be found at ssh://git.linaro.org/people/eric.auger/linux.git on branch irqfd_integ_v9.

This was tested on Calxeda Midway, assigning the xgmac main IRQ. Unforward was tested by doing periodic forward/unforward with random offsets, while using netcat traffic to make sure unforward often occurs while the IRQ is in progress.

v3 -> v4:
- reverted to RFC again due to lots of changes, the extra complexity induced by the new set/unset_forward implementation, and dependencies on RFC patches
- the kvm_vfio_dev_irq struct is used at user level to pass the parameters to KVM-VFIO KVM_DEV_VFIO_DEVICE/KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ; shared with Intel posted IRQs
- unforward now can happen at any time with no constraint and cannot fail
- new VFIO platform external functions introduced: vfio_external_set_automasked, vfio_external_mask, vfio_external_is_active
- introduce a modality to force the guest to exit and prevent it from being re-entered, and rename the older ARM "pause" modality into "power-off" (related to PSCI power-off start)
- kvm_vfio_arm.c no longer exists; architecture-specific code is moved into arm/gic.c. This code is not that much VFIO dependent anymore, although some references still exist in comments
- 2 separate architecture-specific functions for set and unset (only one has a return value)
v2 -> v3:
- the kvm_fwd_irq_action enum was replaced by a bool (KVM_VFIO_IRQ_CLEANUP does not exist anymore)
- a new struct local to vfio.c was introduced to wrap kvm_fwd_irq and make it linkable: kvm_vfio_fwd_irq_node
- kvm_fwd_irq now is self-contained (includes struct vfio_device *)
- a single list of kvm_vfio_fwd_irq_node is used instead of having a list of devices and a list of forwarded IRQs per device; having 2 lists brought extra complexity
- the VFIO device ref counter is incremented each time a new IRQ is forwarded; it is not attempted anymore to hold a single reference whatever the number of forwarded IRQs
- subindex added on top of index to be closer to the VFIO API
- platform device check moved into the arm-specific implementation
- enable the KVM-VFIO device for arm64
- forwarded state change only can happen while the VFIO IRQ handler is not set; in other words, when the VFIO IRQ signaling is not set

v1 -> v2:
- forward control is moved from the architecture-specific file into the generic vfio.c module; only kvm_arch_set_fwd_state remains architecture specific
- integrate Kim's patch which enables KVM-VFIO for ARM
- fix vgic state bypass in vgic_queue_hwirq
- struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h to include/uapi/linux/kvm.h; also irq_index renamed into index and guest_irq
Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/10/2015 06:56 PM, Oleg Nesterov wrote: On 02/10, Raghavendra K T wrote: On 02/10/2015 06:23 AM, Linus Torvalds wrote:

	add_smp(&lock->tickets.head, TICKET_LOCK_INC);
	if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) ..

into something like

	val = xadd(&lock->tickets.head_tail, TICKET_LOCK_INC << TICKET_SHIFT);
	if (unlikely(val & TICKET_SLOWPATH_FLAG)) ...

would be the right thing to do. Somebody should just check that I got that shift right, and that the tail is in the high bytes (head really needs to be high to work, if it's in the low byte(s) the xadd would overflow from head into tail which would be wrong).

Unfortunately xadd could result in head overflow as tail is high. The other option was repeated cmpxchg, which is bad I believe. Any suggestions?

Stupid question... what if we simply move SLOWPATH from .tail to .head? In this case arch_spin_unlock() could do xadd(tickets.head) and check the result.

It is a good idea. Trying this now.

In this case __ticket_check_and_clear_slowpath() really needs to cmpxchg the whole .head_tail. Plus obviously more boring changes. This needs a separate patch even _if_ this can work.

Correct, but apart from this, before doing the xadd in unlock, we would have to make sure the lsb bit is cleared so that we can live with a 1 bit overflow to tail, which is unused. Now either or both of the head and tail lsb bits may be set after unlock.
[PATCH kvm-unit-tests] x86: cmpxchg8b: new 32-bit only testcase
This is similar to emulator.c, which does not run on 32-bit systems. The bug happens (due to kvm_mmu_page_fault's call to the emulator) during Windows 7 boot.

Reported-by: Erik Rull erik.r...@rdsoftware.de
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 config/config-i386.mak |  4 +++-
 x86/cmpxchg8b.c        | 34 ++++++++++++++++++++++++++++++++++
 x86/run                |  2 +-
 3 files changed, 38 insertions(+), 2 deletions(-)
 create mode 100644 x86/cmpxchg8b.c

diff --git a/config/config-i386.mak b/config/config-i386.mak
index 503a3be..691381c 100644
--- a/config/config-i386.mak
+++ b/config/config-i386.mak
@@ -3,9 +3,11 @@ bits = 32
 ldarch = elf32-i386
 CFLAGS += -I $(KERNELDIR)/include

-tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat
+tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat \
+	$(TEST_DIR)/cmpxchg8b.flat

 include config/config-x86-common.mak

+$(TEST_DIR)/cmpxchg8b.elf: $(cstart.o) $(TEST_DIR)/cmpxchg8b.o
 $(TEST_DIR)/taskswitch.elf: $(cstart.o) $(TEST_DIR)/taskswitch.o
 $(TEST_DIR)/taskswitch2.elf: $(cstart.o) $(TEST_DIR)/taskswitch2.o
diff --git a/x86/cmpxchg8b.c b/x86/cmpxchg8b.c
new file mode 100644
index 000..ceb0cf8
--- /dev/null
+++ b/x86/cmpxchg8b.c
@@ -0,0 +1,34 @@
+#include "ioram.h"
+#include "vm.h"
+#include "libcflat.h"
+#include "desc.h"
+#include "types.h"
+#include "processor.h"
+
+#define memset __builtin_memset
+#define TESTDEV_IO_PORT 0xe0
+
+static void test_cmpxchg8b(u32 *mem)
+{
+	mem[1] = 2;
+	mem[0] = 1;
+	asm("push %%ebx\n"
+	    "mov %[ebx_val], %%ebx\n"
+	    "lock cmpxchg8b (%0)\n"
+	    "pop %%ebx"
+	    : : "D" (mem), "d" (2), "a" (1), "c" (4), [ebx_val] "i" (3)
+	    : "memory");
+	report("cmpxchg8b", mem[0] == 3 && mem[1] == 4);
+}
+
+int main()
+{
+	void *mem;
+
+	setup_vm();
+	setup_idt();
+	mem = alloc_vpages(1);
+	install_page((void *)read_cr3(), IORAM_BASE_PHYS, mem);
+
+	test_cmpxchg8b(mem);
+	return report_summary();
+}
diff --git a/x86/run b/x86/run
index 646c577..af37eb4 100755
--- a/x86/run
+++ b/x86/run
@@ -33,7 +33,7 @@ else
 	pc_testdev="-device testdev,chardev=testlog -chardev file,id=testlog,path=msr.out"
 fi

-command="${qemu} -enable-kvm $pc_testdev -display none -serial stdio $pci_testdev -kernel"
+command="${qemu} -enable-kvm $pc_testdev -vnc none -serial stdio $pci_testdev -kernel"
 echo ${command} "$@"
 ${command} "$@"
 ret=$?
--
2.1.0
[Bug 92291] kvm/guest crashes when smp 1 with AMD FX8300; with host kernel oops from abrt as well
https://bugzilla.kernel.org/show_bug.cgi?id=92291 --- Comment #8 from Mark kernelbugzilla.org.mark...@dfgh.net --- Created attachment 166461 --> https://bugzilla.kernel.org/attachment.cgi?id=166461&action=edit dmesg -- You are receiving this mail because: You are watching the assignee of the bug.
[Bug 92291] kvm/guest crashes when smp 1 with AMD FX8300; with host kernel oops from abrt as well
https://bugzilla.kernel.org/show_bug.cgi?id=92291 --- Comment #9 from Mark kernelbugzilla.org.mark...@dfgh.net --- I'll try both of your suggestions, thanks
Re: nSVM: Booting L2 results in L1 hang and a skip_emulated_instruction
On 2015-02-11 19:12, Kashyap Chamarthy wrote:

Hi,

This was tested with kernel (kernel-3.19.0-1.fc22) and QEMU (qemu-2.2.0-5.fc22) on L0 and L1.

Description
---
Inside L1, boot a nested KVM guest (L2). Instead of a full-blown guest, let's use `qemu-sanity-check` with KVM:

  $ qemu-sanity-check --accel=kvm

which, from a different shell, gives you this CLI confirming that the L2 guest is indeed running on KVM (and not TCG):

  $ ps -ef | grep -i qemu
  root 763 762 35 11:49 ttyS0 00:00:00 qemu-system-x86_64 -nographic -nodefconfig -nodefaults -machine accel=kvm -no-reboot -serial file:/tmp/tmp.rl3naPaCkZ.out -kernel /boot/vmlinuz-3.19.0-1.fc21.x86_64 -initrd /usr/lib64/qemu-sanity-check/initrd -append console=ttyS0 oops=panic panic=-1

Which results in:

(a) L1 (guest hypervisor) completely hangs and is unresponsive. But when I query libvirt (`virsh list`), the guest is still reported as 'running'.

(b) On L0, I notice a ton of these messages:

  skip_emulated_instruction: ip 0xffec next 0x8105e964

I can get `dmesg`, `dmidecode`, `x86info -a` on L0 and L1 if it helps in narrowing down the issue.

Related bug and reproducer details
--
https://bugzilla.redhat.com/show_bug.cgi?id=1191665 -- Nested KVM with AMD: L2 (nested guest) fails with divide error: [#1] SMP

Is this a regression (of the kernel)? If so, can you bisect to the commit that introduced it?

Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
[PATCH] KVM: fix possible coalesced_mmio_ring page leaks.
It forgets to free the coalesced_mmio_ring page when anon_inode_getfd fails.

Signed-off-by: Xiubo Li lixi...@cmss.chinamobile.com
---
 virt/kvm/kvm_main.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8579f18..85e8106 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2784,16 +2784,22 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
 		return PTR_ERR(kvm);
 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
 	r = kvm_coalesced_mmio_init(kvm);
-	if (r < 0) {
-		kvm_put_kvm(kvm);
-		return r;
-	}
+	if (r < 0)
+		goto out_put_kvm;
 #endif
 	r = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR | O_CLOEXEC);
 	if (r < 0)
-		kvm_put_kvm(kvm);
+		goto out_mmio_free;

 	return r;
+
+out_mmio_free:
+#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
+	kvm_coalesced_mmio_free(kvm);
+#endif
+out_put_kvm:
+	kvm_put_kvm(kvm);
+	return r;
 }

 static long kvm_dev_ioctl(struct file *filp,
--
1.9.1