Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Oleg Nesterov
On 02/10, Jeremy Fitzhardinge wrote:

 On 02/10/2015 05:26 AM, Oleg Nesterov wrote:
  On 02/10, Raghavendra K T wrote:
  Unfortunately xadd could result in head overflow as tail is high.
 
  The other option was repeated cmpxchg which is bad I believe.
  Any suggestions?
  Stupid question... what if we simply move SLOWPATH from .tail to .head?
  In this case arch_spin_unlock() could do xadd(tickets.head) and check
  the result

 Well, right now, tail is manipulated by locked instructions by CPUs
 who are contending for the ticketlock, but head can be manipulated
 unlocked by the CPU which currently owns the ticketlock. If SLOWPATH
 moved into head, then non-owner CPUs would be touching head, requiring
 everyone to use locked instructions on it.

 That's the theory, but I don't see much (any?) code which depends on that.

 Ideally we could find a way so that pv ticketlocks could use a plain
 unlocked add for the unlock like the non-pv case, but I just don't see a
 way to do it.

I agree, and I have to admit I am not sure I fully understand why unlock
uses the locked add. Except we need a barrier to avoid the race with the
enter_slowpath() users, of course. Perhaps this is the only reason?

Anyway, I suggested this to avoid the overflow if we use xadd(), and I
guess we need the locked insn anyway if we want to eliminate the unsafe
read-after-unlock...

  BTW, if we move clearing SLOWPATH into the lock path, then probably trylock
  should be changed too? Something like below; we just need to clear SLOWPATH
  before the cmpxchg.

 How important / widely used is trylock these days?

I am not saying this is that important. It just looks more consistent, IMO,
and we can do it for free.

Oleg.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Raghavendra K T

On 02/11/2015 11:08 PM, Oleg Nesterov wrote:

On 02/11, Raghavendra K T wrote:


On 02/10/2015 06:56 PM, Oleg Nesterov wrote:


In this case __ticket_check_and_clear_slowpath() really needs to cmpxchg
the whole .head_tail. Plus obviously more boring changes. This needs a
separate patch even _if_ this can work.


Correct, but apart from this, before doing the xadd in unlock,
we would have to make sure the LSB bit is cleared so that we can live with a
1-bit overflow to tail, which is unused. Now either or both of the head and
tail LSB bits may be set after unlock.


Sorry, can't understand... could you spell?

If TICKET_SLOWPATH_FLAG lives in .head arch_spin_unlock() could simply do

	head = xadd(&lock->tickets.head, TICKET_LOCK_INC);

	if (head & TICKET_SLOWPATH_FLAG)
		__ticket_unlock_kick(head);

so it can't overflow to .tail?



You are right.
I totally forgot we can get rid of tail operations :)



And if we do this, probably it makes sense to add something like

	bool tickets_equal(__ticket_t one, __ticket_t two)
	{
		return !((one ^ two) & ~TICKET_SLOWPATH_FLAG);
	}



Very nice idea. I was tired of the ~TICKET_SLOWPATH_FLAG usage all over in
the current (complex :)) implementation. These two suggestions help
a lot.


and change kvm_lock_spinning() to use tickets_equal(tickets.head, want), plus
it can have more users in asm/spinlock.h.




[PATCH -v5 6/5] context_tracking: fix exception_enter when already in IN_KERNEL

2015-02-11 Thread Rik van Riel
If exception_enter happens when already in IN_KERNEL state, the
code still calls context_tracking_exit, which ends up in
rcu_eqs_exit_common, which explodes with a WARN_ON when it is
called in a situation where dynticks are not enabled.

This can be avoided by having exception_enter only switch to
IN_KERNEL state if the current state is not already IN_KERNEL.

Signed-off-by: Rik van Riel r...@redhat.com
Reported-by: Luiz Capitulino lcapitul...@redhat.com
---
Frederic, you will want this bonus patch, too :)

Thanks to Luiz for finding this one. Whatever I was running did not
trigger this issue...

 include/linux/context_tracking.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index b65fd1420e53..9da230406e8c 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -37,7 +37,8 @@ static inline enum ctx_state exception_enter(void)
 		return 0;
 
 	prev_ctx = this_cpu_read(context_tracking.state);
-	context_tracking_exit(prev_ctx);
+	if (prev_ctx != IN_KERNEL)
+		context_tracking_exit(prev_ctx);
 
 	return prev_ctx;
 }



Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Oleg Nesterov
On 02/11, Raghavendra K T wrote:

 On 02/10/2015 06:56 PM, Oleg Nesterov wrote:

 In this case __ticket_check_and_clear_slowpath() really needs to cmpxchg
 the whole .head_tail. Plus obviously more boring changes. This needs a
 separate patch even _if_ this can work.

 Correct, but apart from this, before doing the xadd in unlock,
 we would have to make sure the LSB bit is cleared so that we can live with a
 1-bit overflow to tail, which is unused. Now either or both of the head and
 tail LSB bits may be set after unlock.

Sorry, can't understand... could you spell?

If TICKET_SLOWPATH_FLAG lives in .head arch_spin_unlock() could simply do

	head = xadd(&lock->tickets.head, TICKET_LOCK_INC);

	if (head & TICKET_SLOWPATH_FLAG)
		__ticket_unlock_kick(head);

so it can't overflow to .tail?

But probably I missed your concern.



And if we do this, probably it makes sense to add something like

	bool tickets_equal(__ticket_t one, __ticket_t two)
	{
		return !((one ^ two) & ~TICKET_SLOWPATH_FLAG);
	}

and change kvm_lock_spinning() to use tickets_equal(tickets.head, want), plus
it can have more users in asm/spinlock.h.

Oleg.



nSVM: Booting L2 results in L1 hang and a skip_emulated_instruction

2015-02-11 Thread Kashyap Chamarthy
Hi, 

This was tested with kernel (kernel-3.19.0-1.fc22) and QEMU (qemu-2.2.0-5.fc22)
on L0 & L1.


Description
---

Inside L1, boot a nested KVM guest (L2). Instead of a full-blown
guest, let's use `qemu-sanity-check` with KVM:

$ qemu-sanity-check --accel=kvm

Which gives you the command line below (run `ps` from a different shell);
it confirms that the L2 guest is indeed running on KVM (and not TCG):

  $ ps -ef | grep -i qemu
  root   763   762 35 11:49 ttyS0    00:00:00 qemu-system-x86_64 -nographic 
-nodefconfig -nodefaults -machine accel=kvm -no-reboot -serial 
file:/tmp/tmp.rl3naPaCkZ.out -kernel /boot/vmlinuz-3.19.0-1.fc21.x86_64 -initrd 
/usr/lib64/qemu-sanity-check/initrd -append console=ttyS0 oops=panic panic=-1


Which results in:

  (a) L1 (guest hypervisor) completely hangs and is unresponsive. But
  when I query libvirt (`virsh list`), the guest is still reported
  as 'running'.

  (b) On L0, I notice a ton of these messages:

skip_emulated_instruction: ip 0xffec next 0x8105e964


I can get `dmesg`, `dmidecode`, `x86info -a` output from L0 and L1 if it helps
in narrowing down the issue.


Related bug and reproducer details
--


https://bugzilla.redhat.com/show_bug.cgi?id=1191665 --  Nested KVM with
AMD: L2 (nested guest) fails with divide error:  [#1] SMP




-- 
/kashyap


Re: [PATCH -v5 6/5] context_tracking: fix exception_enter when already in IN_KERNEL

2015-02-11 Thread Paul E. McKenney
On Wed, Feb 11, 2015 at 02:43:19PM -0500, Rik van Riel wrote:
 If exception_enter happens when already in IN_KERNEL state, the
 code still calls context_tracking_exit, which ends up in
 rcu_eqs_exit_common, which explodes with a WARN_ON when it is
 called in a situation where dynticks are not enabled.
 
 This can be avoided by having exception_enter only switch to
 IN_KERNEL state if the current state is not already IN_KERNEL.

Ugh...  Time to formally verify, sounds like...

Thanx, Paul

 Signed-off-by: Rik van Riel r...@redhat.com
 Reported-by: Luiz Capitulino lcapitul...@redhat.com
 ---
 Frederic, you will want this bonus patch, too :)
 
 Thanks to Luiz for finding this one. Whatever I was running did not
 trigger this issue...
 
  include/linux/context_tracking.h | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/include/linux/context_tracking.h 
 b/include/linux/context_tracking.h
 index b65fd1420e53..9da230406e8c 100644
 --- a/include/linux/context_tracking.h
 +++ b/include/linux/context_tracking.h
 @@ -37,7 +37,8 @@ static inline enum ctx_state exception_enter(void)
   return 0;
 
   prev_ctx = this_cpu_read(context_tracking.state);
 - context_tracking_exit(prev_ctx);
 + if (prev_ctx != IN_KERNEL)
 + context_tracking_exit(prev_ctx);
 
   return prev_ctx;
  }
 



Re: [PATCH] virtual: Documentation: simplify and generalize paravirt_ops.txt

2015-02-11 Thread Rusty Russell
Luis R. Rodriguez mcg...@do-not-panic.com writes:
 From: Luis R. Rodriguez mcg...@suse.com

 The general documentation we have for pv_ops is currently present
 in the IA64 docs, but since this documentation covers IA64 Xen
 enablement, and IA64 Xen support got ripped out a while ago
 through commit d52eefb47 (present since v3.14-rc1), let's just
 simplify, generalize and move the pv_ops documentation to a
 shared place.

OK, I've applied this.

Thanks,
Rusty.


Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Jeremy Fitzhardinge

On 02/11/2015 09:24 AM, Oleg Nesterov wrote:
 I agree, and I have to admit I am not sure I fully understand why
 unlock uses the locked add. Except we need a barrier to avoid the race
 with the enter_slowpath() users, of course. Perhaps this is the only
 reason?

Right now it needs to be a locked operation to prevent read-reordering.
x86 memory ordering rules state that all writes are seen in a globally
consistent order, and are globally ordered wrt reads *on the same
addresses*, but reads to different addresses can be reordered wrt writes.

So, if the unlocking add were not a locked operation:

	__add(&lock->tickets.head, TICKET_LOCK_INC);	/* not locked */

	if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
		__ticket_unlock_slowpath(lock, prev);

Then the read of lock->tickets.tail can be reordered before the unlock,
which introduces a race:

	/* read reordered here */
	if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) /* false */
		/* ... */;

	/* other CPU sets SLOWPATH and blocks */

	__add(&lock->tickets.head, TICKET_LOCK_INC);	/* not locked */

	/* other CPU hung */

So it doesn't *have* to be a locked operation. This should also work:

	__add(&lock->tickets.head, TICKET_LOCK_INC);	/* not locked */

	lfence();	/* prevent read reordering */
	if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
		__ticket_unlock_slowpath(lock, prev);

but in practice a locked add is cheaper than an lfence (or at least was).

This *might* be OK, but I think it's on dubious ground:

	__add(&lock->tickets.head, TICKET_LOCK_INC);	/* not locked */

	/* read overlaps write, and so is ordered */
	if (unlikely(lock->head_tail & (TICKET_SLOWPATH_FLAG << TICKET_SHIFT)))
		__ticket_unlock_slowpath(lock, prev);

because I think Intel and AMD differed in their interpretation of how
overlapping but different-sized reads & writes are ordered (or it simply
isn't architecturally defined).

If the slowpath flag is moved to head, then it would always have to be
locked anyway, because it needs to be atomic against other CPU's RMW
operations setting the flag.

J


[RFC v2 2/4] KVM: arm: vgic: fix state machine for forwarded IRQ

2015-02-11 Thread Eric Auger
Fix multiple injection of level sensitive forwarded IRQs.
With current code, the second injection fails since the state bitmaps
are not reset (process_maintenance is not called anymore).

New implementation follows those principles:
- A forwarded IRQ can only be sampled when it is pending
- when queueing the IRQ (programming the LR), the pending state is removed,
  as for edge-sensitive IRQs
- an injection of a forwarded IRQ is always considered valid, since it
  comes from the HW and the level is always 1.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v1 -> v2:
- integration in new vgic_can_sample_irq
- remove the pending state when programming the LR
---
 virt/kvm/arm/vgic.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index cd00cf2..433ecba 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -361,7 +361,10 @@ static void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq)
 
 static bool vgic_can_sample_irq(struct kvm_vcpu *vcpu, int irq)
 {
-	return vgic_irq_is_edge(vcpu, irq) || !vgic_irq_is_queued(vcpu, irq);
+	bool is_forwarded = (vgic_get_phys_irq(vcpu, irq) >= 0);
+
+	return vgic_irq_is_edge(vcpu, irq) || !vgic_irq_is_queued(vcpu, irq) ||
+		(is_forwarded && vgic_dist_irq_is_pending(vcpu, irq));
 }
 
 static u32 mmio_data_read(struct kvm_exit_mmio *mmio, u32 mask)
@@ -1296,6 +1299,7 @@ static bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq)
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 	struct vgic_lr vlr;
 	int lr;
+	bool is_forwarded = (vgic_get_phys_irq(vcpu, irq) >= 0);
 
 	/* Sanitize the input... */
 	BUG_ON(sgi_source_id & ~7);
@@ -1331,7 +1335,7 @@ static bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq)
 	vlr.irq = irq;
 	vlr.source = sgi_source_id;
 	vlr.state = LR_STATE_PENDING;
-	if (!vgic_irq_is_edge(vcpu, irq))
+	if (!vgic_irq_is_edge(vcpu, irq) && !is_forwarded)
vlr.state |= LR_EOI_INT;
 
vgic_set_lr(vcpu, lr, vlr);
@@ -1372,11 +1376,12 @@ static bool vgic_queue_sgi(struct kvm_vcpu *vcpu, int irq)
 
 static bool vgic_queue_hwirq(struct kvm_vcpu *vcpu, int irq)
 {
+	bool is_forwarded = (vgic_get_phys_irq(vcpu, irq) >= 0);
if (!vgic_can_sample_irq(vcpu, irq))
return true; /* level interrupt, already queued */
 
if (vgic_queue_irq(vcpu, 0, irq)) {
-   if (vgic_irq_is_edge(vcpu, irq)) {
+   if (vgic_irq_is_edge(vcpu, irq) || is_forwarded) {
vgic_dist_irq_clear_pending(vcpu, irq);
vgic_cpu_irq_clear(vcpu, irq);
} else {
@@ -1626,14 +1631,17 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
 	int edge_triggered, level_triggered;
 	int enabled;
 	bool ret = true;
+	bool is_forwarded;
 
 	spin_lock(&dist->lock);
 
 	vcpu = kvm_get_vcpu(kvm, cpuid);
+	is_forwarded = (vgic_get_phys_irq(vcpu, irq_num) >= 0);
+
edge_triggered = vgic_irq_is_edge(vcpu, irq_num);
level_triggered = !edge_triggered;
 
-	if (!vgic_validate_injection(vcpu, irq_num, level)) {
+	if (!vgic_validate_injection(vcpu, irq_num, level) && !is_forwarded) {
ret = false;
goto out;
}
-- 
1.9.1



[RFC v2 4/4] KVM: arm: vgic: cleanup forwarded IRQs on destroy

2015-02-11 Thread Eric Auger
When the VGIC is destroyed it must take care of:
- restoring the forwarded IRQs to the non-forwarded state,
- deactivating the IRQ in case the guest left without doing it,
- cleaning the nodes of the phys_map rbtree.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v1 -> v2:
- remove vgic_clean_irq_phys_map call in kvm_vgic_destroy
  (useless since already called in kvm_vgic_vcpu_destroy)
---
 virt/kvm/arm/vgic.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index dd72ca2..ace8e46 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -32,6 +32,7 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <linux/spinlock.h>
 
 /*
  * How the whole thing works (courtesy of Christoffer Dall):
@@ -103,6 +104,8 @@ static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
 static void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
 static void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
+static void vgic_clean_irq_phys_map(struct kvm_vcpu *vcpu,
+				    struct rb_root *root);
 
 static const struct vgic_ops *vgic_ops;
 static const struct vgic_params *vgic;
@@ -1819,6 +1822,36 @@ static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
 	return NULL;
 }
 
+static void vgic_clean_irq_phys_map(struct kvm_vcpu *vcpu,
+   struct rb_root *root)
+{
+   unsigned long flags;
+
+   while (1) {
+   struct rb_node *node = rb_first(root);
+   struct irq_phys_map *map;
+   struct irq_desc *desc;
+   struct irq_data *d;
+   struct irq_chip *chip;
+
+   if (!node)
+   break;
+
+		map = container_of(node, struct irq_phys_map, node);
+		desc = irq_to_desc(map->phys_irq);
+
+		raw_spin_lock_irqsave(&desc->lock, flags);
+		d = &desc->irq_data;
+		chip = desc->irq_data.chip;
+		irqd_clr_irq_forwarded(d);
+		chip->irq_eoi(d);
+		raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+   rb_erase(node, root);
+   kfree(map);
+   }
+}
+
 int vgic_get_phys_irq(struct kvm_vcpu *vcpu, int virt_irq)
 {
struct irq_phys_map *map;
@@ -1861,6 +1894,7 @@ void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 
+	vgic_clean_irq_phys_map(vcpu, &vgic_cpu->irq_phys_map);
 	kfree(vgic_cpu->pending_shared);
 	kfree(vgic_cpu->vgic_irq_lr_map);
 	vgic_cpu->pending_shared = NULL;
-- 
1.9.1



[RFC v2 3/4] KVM: arm: vgic: add forwarded irq rbtree lock

2015-02-11 Thread Eric Auger
Add a lock related to the rb tree manipulation. The rb tree can be
searched in one thread (irqfd handler for instance) and map/unmap
may happen in another.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v2 -> v3:
re-arrange lock sequence in vgic_map_phys_irq
---
 include/kvm/arm_vgic.h |  1 +
 virt/kvm/arm/vgic.c| 56 --
 2 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 1a49108..ad7229b 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -220,6 +220,7 @@ struct vgic_dist {
unsigned long   *irq_pending_on_cpu;
 
struct rb_root  irq_phys_map;
+   spinlock_t  rb_tree_lock;
 #endif
 };
 
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 433ecba..dd72ca2 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1756,9 +1756,22 @@ static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
 
 int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
 {
-	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
-	struct rb_node **new = &root->rb_node, *parent = NULL;
+	struct rb_root *root;
+	struct rb_node **new, *parent = NULL;
 	struct irq_phys_map *new_map;
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+
+	root = vgic_get_irq_phys_map(vcpu, virt_irq);
+	new = &root->rb_node;
+
+	new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
+	if (!new_map)
+		return -ENOMEM;
+
+	new_map->virt_irq = virt_irq;
+	new_map->phys_irq = phys_irq;
+
+	spin_lock(&dist->rb_tree_lock);
 
/* Boilerplate rb_tree code */
while (*new) {
@@ -1770,19 +1783,16 @@ int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
 			new = &(*new)->rb_left;
 		else if (this->virt_irq < virt_irq)
 			new = &(*new)->rb_right;
-		else
+		else {
+			kfree(new_map);
+			spin_unlock(&dist->rb_tree_lock);
 			return -EEXIST;
+		}
 	}
 
-	new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
-	if (!new_map)
-		return -ENOMEM;
-
-	new_map->virt_irq = virt_irq;
-	new_map->phys_irq = phys_irq;
-
 	rb_link_node(&new_map->node, parent, new);
 	rb_insert_color(&new_map->node, root);
+	spin_unlock(&dist->rb_tree_lock);
 
 	return 0;
 }
@@ -1811,24 +1821,39 @@ static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
 
 int vgic_get_phys_irq(struct kvm_vcpu *vcpu, int virt_irq)
 {
-	struct irq_phys_map *map = vgic_irq_map_search(vcpu, virt_irq);
+	struct irq_phys_map *map;
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+	int ret;
+
+	spin_lock(&dist->rb_tree_lock);
+	map = vgic_irq_map_search(vcpu, virt_irq);
 
 	if (map)
-		return map->phys_irq;
+		ret = map->phys_irq;
+	else
+		ret = -ENOENT;
+
+	spin_unlock(&dist->rb_tree_lock);
+	return ret;
 
-	return -ENOENT;
 }
 
 int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
 {
-	struct irq_phys_map *map = vgic_irq_map_search(vcpu, virt_irq);
+	struct irq_phys_map *map;
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+
+	spin_lock(&dist->rb_tree_lock);
+
+	map = vgic_irq_map_search(vcpu, virt_irq);
 
 	if (map && map->phys_irq == phys_irq) {
 		rb_erase(&map->node, vgic_get_irq_phys_map(vcpu, virt_irq));
 		kfree(map);
+		spin_unlock(&dist->rb_tree_lock);
 		return 0;
 	}
-
+	spin_unlock(&dist->rb_tree_lock);
 	return -ENOENT;
 }
 
@@ -2071,6 +2096,7 @@ int kvm_vgic_create(struct kvm *kvm)
 	ret = 0;
 
 	spin_lock_init(&kvm->arch.vgic.lock);
+	spin_lock_init(&kvm->arch.vgic.rb_tree_lock);
 	kvm->arch.vgic.in_kernel = true;
 	kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
 	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
-- 
1.9.1



[RFC v2 1/4] chip.c: complete the forwarded IRQ in case the handler is not reached

2015-02-11 Thread Eric Auger
With the current handle_fasteoi_irq implementation, in case irqd_irq_disabled
is true (disable_irq was called) or !irq_may_run, the IRQ is not completed;
only the running priority is dropped. In those cases, the IRQ will
never be forwarded and hence will never be deactivated by anyone else.

Signed-off-by: Eric Auger eric.au...@linaro.org
---
 kernel/irq/chip.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 2f9571b..f12cce6 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -561,8 +561,12 @@ handle_fasteoi_irq(unsigned int irq, struct irq_desc *desc)
 	raw_spin_unlock(&desc->lock);
 	return;
 out:
-	if (!(chip->flags & IRQCHIP_EOI_IF_HANDLED))
-		eoi_irq(desc, chip);
+	if (!(chip->flags & IRQCHIP_EOI_IF_HANDLED)) {
+		if (chip->irq_priority_drop)
+			chip->irq_priority_drop(&desc->irq_data);
+		if (chip->irq_eoi)
+			chip->irq_eoi(&desc->irq_data);
+	}
 	raw_spin_unlock(&desc->lock);
 }
 EXPORT_SYMBOL_GPL(handle_fasteoi_irq);
-- 
1.9.1



[RFC v2 0/4] chip/vgic adaptations for forwarded irq

2015-02-11 Thread Eric Auger
This series proposes some fixes that appeared to be necessary
to integrate IRQ forwarding in KVM/VFIO.

- deactivation of the forwarded IRQ in irq_disabled case
- a specific handling of forwarded IRQ into the VGIC state machine.
- deactivation of physical IRQ and unforwarding on vgic destruction
- rb_tree lock in vgic.c

Integrated pieces can be found at
ssh://git.linaro.org/people/eric.auger/linux.git
on branch irqfd_integ_v9

v1 -> v2:
- change title of the series (formerly vgic additions for forwarded irq)
- [RFC 4/4] KVM: arm: vgic: handle irqfd forwarded IRQ injection
  before vgic readiness now handled in ARM irqfd series
- add chip.c patch file

Eric Auger (4):
  chip.c: complete the forwarded IRQ in case the handler is not reached
  KVM: arm: vgic: fix state machine for forwarded IRQ
  KVM: arm: vgic: add forwarded irq rbtree lock
  KVM: arm: vgic: cleanup forwarded IRQs on destroy

 include/kvm/arm_vgic.h |   1 +
 kernel/irq/chip.c  |   8 +++-
 virt/kvm/arm/vgic.c| 106 -
 3 files changed, 94 insertions(+), 21 deletions(-)

-- 
1.9.1



[RFC v4 04/13] KVM: kvm-vfio: User API for IRQ forwarding

2015-02-11 Thread Eric Auger
This patch adds and documents a new KVM_DEV_VFIO_DEVICE group
and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ and
KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be able
to set a VFIO device IRQ as forwarded or not forwarded.
The command takes as argument a handle to a new struct named
kvm_vfio_dev_irq.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v3 -> v4:
- rename kvm_arch_forwarded_irq into kvm_vfio_dev_irq
- some rewording in commit message
- document forwarding restrictions and remove unforwarding ones

v2 -> v3:
- rework vfio kvm device documentation
- reword commit message and title
- add subindex in kvm_arch_forwarded_irq to be closer to VFIO API
- forwarding state can only be changed while VFIO IRQ signaling is off

v1 -> v2:
- struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
  to include/uapi/linux/kvm.h
  also irq_index renamed into index and guest_irq renamed into gsi
- ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
---
 Documentation/virtual/kvm/devices/vfio.txt | 34 --
 include/uapi/linux/kvm.h   | 12 +++
 2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
index ef51740..6186e6d 100644
--- a/Documentation/virtual/kvm/devices/vfio.txt
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -4,15 +4,20 @@ VFIO virtual device
 Device types supported:
   KVM_DEV_TYPE_VFIO
 
-Only one VFIO instance may be created per VM.  The created device
-tracks VFIO groups in use by the VM and features of those groups
-important to the correctness and acceleration of the VM.  As groups
-are enabled and disabled for use by the VM, KVM should be updated
-about their presence.  When registered with KVM, a reference to the
-VFIO-group is held by KVM.
+Only one VFIO instance may be created per VM.
+
+The created device tracks VFIO groups in use by the VM and features
+of those groups important to the correctness and acceleration of
+the VM.  As groups are enabled and disabled for use by the VM, KVM
+should be updated about their presence.  When registered with KVM,
+a reference to the VFIO-group is held by KVM.
+
+The device also makes it possible to control some IRQ settings of VFIO devices:
+forwarding/posting.
 
 Groups:
   KVM_DEV_VFIO_GROUP
+  KVM_DEV_VFIO_DEVICE
 
 KVM_DEV_VFIO_GROUP attributes:
   KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
@@ -20,3 +25,20 @@ KVM_DEV_VFIO_GROUP attributes:
 
 For each, kvm_device_attr.addr points to an int32_t file descriptor
 for the VFIO group.
+
+KVM_DEV_VFIO_DEVICE attributes:
+  KVM_DEV_VFIO_DEVICE_FORWARD_IRQ: set a VFIO device IRQ as forwarded
+  KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: set a VFIO device IRQ as not forwarded
+
+For each, kvm_device_attr.addr points to a kvm_vfio_dev_irq struct.
+
+When forwarded, a physical IRQ is completed by the guest and not by the
+host. This requires HW support in the interrupt controller.
+
+Forwarding can only be set when the corresponding VFIO IRQ is not masked
+(whether through the VFIO_DEVICE_SET_IRQS command or as a consequence of this
+IRQ being currently handled) or active at interrupt controller level.
+In such a situation, -EAGAIN is returned. It is advised to set the
+forwarding before the VFIO signaling is set up; this avoids trial and error.
+
+Unforwarding can happen at any time.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a37fd12..d1a6496 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -938,6 +938,9 @@ struct kvm_device_attr {
 #define  KVM_DEV_VFIO_GROUP1
 #define   KVM_DEV_VFIO_GROUP_ADD   1
 #define   KVM_DEV_VFIO_GROUP_DEL   2
+#define  KVM_DEV_VFIO_DEVICE   2
+#define   KVM_DEV_VFIO_DEVICE_FORWARD_IRQ  1
+#define   KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ2
 
 enum kvm_device_type {
KVM_DEV_TYPE_FSL_MPIC_20= 1,
@@ -955,6 +958,15 @@ enum kvm_device_type {
KVM_DEV_TYPE_MAX,
 };
 
+struct kvm_vfio_dev_irq {
+   __u32   argsz;  /* structure length */
+   __u32   fd; /* file descriptor of the VFIO device */
+   __u32   index;  /* VFIO device IRQ index */
+   __u32   start;  /* start of subindex range */
+   __u32   count;  /* size of subindex range */
+   __u32   gsi[];  /* gsi, ie. virtual IRQ number */
+};
+
 /*
  * ioctls for VM fds
  */
-- 
1.9.1



[RFC v4 11/13] kvm: arm: implement kvm_arch_halt_guest and kvm_arch_resume_guest

2015-02-11 Thread Eric Auger
This patch defines __KVM_HAVE_ARCH_HALT_GUEST and implements
kvm_arch_halt_guest and kvm_arch_resume_guest for ARM.

On halt, the guest is forced to exit and prevented from being
re-entered.

Signed-off-by: Eric Auger eric.au...@linaro.org
---
 arch/arm/include/asm/kvm_host.h |  4 
 arch/arm/kvm/arm.c  | 32 +---
 2 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 87f0921..a9f2c31 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -28,6 +28,7 @@
 #include <kvm/arm_arch_timer.h>
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
+#define __KVM_HAVE_ARCH_HALT_GUEST
 
 #if defined(CONFIG_KVM_ARM_MAX_VCPUS)
 #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
@@ -131,6 +132,9 @@ struct kvm_vcpu_arch {
 	/* vcpu power-off state */
 	bool power_off;
 
+	/* Don't run the guest */
+	bool pause;
+
/* IO related fields */
struct kvm_decode mmio_decode;
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 6f63ab7..6c743a1 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -455,11 +455,36 @@ bool kvm_arch_intc_initialized(struct kvm *kvm)
return vgic_initialized(kvm);
 }
 
+void kvm_arch_halt_guest(struct kvm *kvm)
+{
+	int i;
+	struct kvm_vcpu *vcpu;
+
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		vcpu->arch.pause = true;
+	force_vm_exit(cpu_all_mask);
+}
+
+void kvm_arch_resume_guest(struct kvm *kvm)
+{
+	int i;
+	struct kvm_vcpu *vcpu;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
+
+		vcpu->arch.pause = false;
+		wake_up_interruptible(wq);
+	}
+}
+
+
 static void vcpu_pause(struct kvm_vcpu *vcpu)
 {
 	wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
 
-	wait_event_interruptible(*wq, !vcpu->arch.power_off);
+	wait_event_interruptible(*wq, ((!vcpu->arch.power_off) &&
+				       (!vcpu->arch.pause)));
 }
 
 static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
@@ -509,7 +534,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 	update_vttbr(vcpu->kvm);
 
-	if (vcpu->arch.power_off)
+	if (vcpu->arch.power_off || vcpu->arch.pause)
 		vcpu_pause(vcpu);
 
kvm_vgic_flush_hwstate(vcpu);
@@ -527,7 +552,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 run->exit_reason = KVM_EXIT_INTR;
 }
 
-   if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
+   if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
+   vcpu->arch.pause) {
kvm_timer_sync_hwstate(vcpu);
local_irq_enable();
kvm_timer_finish_sync(vcpu);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC v4 03/13] VFIO: platform: single handler using function pointer

2015-02-11 Thread Eric Auger
A single handler is now registered whatever the use case: automasked
or not. A function pointer is set according to the desired behavior
and the handler calls this function.

The irq lock is taken/released in the root handler. eventfd_signal can
be called in regions not allowed to sleep.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v4: creation
---
 drivers/vfio/platform/vfio_platform_irq.c     | 21 +++++++++++++++------
 drivers/vfio/platform/vfio_platform_private.h |  1 +
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 132bb3f..8eb65c1 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -147,11 +147,8 @@ static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev,
 static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
 {
struct vfio_platform_irq *irq_ctx = dev_id;
-   unsigned long flags;
int ret = IRQ_NONE;
 
-   spin_lock_irqsave(&irq_ctx->lock, flags);
-
 if (!irq_ctx->masked) {
ret = IRQ_HANDLED;
 
@@ -160,8 +157,6 @@ static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
 irq_ctx->masked = true;
}
 
-   spin_unlock_irqrestore(&irq_ctx->lock, flags);
-
if (ret == IRQ_HANDLED)
 eventfd_signal(irq_ctx->trigger, 1);
 
@@ -177,6 +172,19 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
+static irqreturn_t vfio_handler(int irq, void *dev_id)
+{
+   struct vfio_platform_irq *irq_ctx = dev_id;
+   unsigned long flags;
+   irqreturn_t ret;
+
+   spin_lock_irqsave(&irq_ctx->lock, flags);
+   ret = irq_ctx->handler(irq, dev_id);
+   spin_unlock_irqrestore(&irq_ctx->lock, flags);
+
+   return ret;
+}
+
 static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
int fd, irq_handler_t handler)
 {
@@ -206,9 +214,10 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
}
 
 irq->trigger = trigger;
+   irq->handler = handler;
 
 irq_set_status_flags(irq->hwirq, IRQ_NOAUTOEN);
-   ret = request_irq(irq->hwirq, handler, 0, irq->name, irq);
+   ret = request_irq(irq->hwirq, vfio_handler, 0, irq->name, irq);
 if (ret) {
 kfree(irq->name);
eventfd_ctx_put(trigger);
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 5d31e04..eb91deb 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -37,6 +37,7 @@ struct vfio_platform_irq {
spinlock_t  lock;
struct virqfd   *unmask;
struct virqfd   *mask;
+   irqreturn_t (*handler)(int irq, void *dev_id);
 };
 
 struct vfio_platform_region {
-- 
1.9.1



[RFC v4 02/13] VFIO: platform: test forwarded state when selecting IRQ handler

2015-02-11 Thread Eric Auger
In case the IRQ is forwarded, the VFIO platform IRQ handler does not
need to disable the IRQ anymore.

When setting the IRQ handler we now also test the forwarded state. In
case the IRQ is forwarded we select the vfio_irq_handler.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v3 -> v4:
- change title

v2 -> v3:
- forwarded state was tested in the handler. Now the forwarded state
  is tested before setting the handler. This definitely limits
  the dynamics of forwarded state changes but I don't think there is
  a use case where we need to be able to change the state at any time.

Conflicts:
drivers/vfio/platform/vfio_platform_irq.c
---
 drivers/vfio/platform/vfio_platform_irq.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 88bba57..132bb3f 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -229,8 +229,13 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
 {
 struct vfio_platform_irq *irq = &vdev->irqs[index];
irq_handler_t handler;
+   struct irq_data *d;
+   bool is_forwarded;
 
-   if (vdev->irqs[index].flags & VFIO_IRQ_INFO_AUTOMASKED)
+   d = irq_get_irq_data(irq->hwirq);
+   is_forwarded = irqd_irq_forwarded(d);
+
+   if (vdev->irqs[index].flags & VFIO_IRQ_INFO_AUTOMASKED && !is_forwarded)
handler = vfio_automasked_irq_handler;
else
handler = vfio_irq_handler;
-- 
1.9.1



[RFC v4 13/13] KVM: arm: vgic: forwarding control

2015-02-11 Thread Eric Auger
This patch sets __KVM_HAVE_ARCH_KVM_VFIO_FORWARD and implements
kvm_arch_set_forward for ARM.

As a result the KVM-VFIO device now allows forwarding/unforwarding a
VFIO device IRQ on ARM.

kvm_arch_set_forward and kvm_arch_unset_forward mostly take care of
VGIC programming: physical IRQ/guest IRQ mapping, list register cleanup,
VGIC state machine.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v3 -> v4:
- code originally located in kvm_vfio_arm.c
- kvm_arch_vfio_{set|unset}_forward renamed into
  kvm_arch_{set|unset}_forward
- split into 2 functions (set/unset) since unset does not fail anymore
- unset can be invoked at any time. Extra care is taken to handle
  transition in VGIC state machine, LR cleanup, ...

v2 -> v3:
- renaming of kvm_arch_set_fwd_state into kvm_arch_vfio_set_forward
- takes a bool arg instead of kvm_fwd_irq_action enum
- removal of KVM_VFIO_IRQ_CLEANUP
- platform device check now happens here
- more precise errors returned
- irq_eoi handled externally to this patch (VGIC)
- correct enable_irq bug done twice
- reword the commit message
- correct check of platform_bus_type
- use raw_spin_lock_irqsave and check the validity of the handler
---
 arch/arm/include/asm/kvm_host.h |   1 +
 virt/kvm/arm/vgic.c | 190 
 2 files changed, 191 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index a9f2c31..6e8be2b 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -29,6 +29,7 @@
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
 #define __KVM_HAVE_ARCH_HALT_GUEST
+#define __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
 
 #if defined(CONFIG_KVM_ARM_MAX_VCPUS)
 #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index ace8e46..81bb2f2 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2691,3 +2691,193 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
 {
return 0;
 }
+
+/**
+ * kvm_arch_set_forward - Set forwarding for a given IRQ
+ *
+ * @kvm: handle to the VM
+ * @host_irq: physical IRQ number
+ * @guest_irq: virtual IRQ number
+ *
+ * This function is supposed to be called only if the IRQ
+ * is not in progress: ie. not active at VGIC level and not
+ * currently under injection in the KVM.
+ */
+int kvm_arch_set_forward(struct kvm *kvm, unsigned int host_irq,
+unsigned int guest_irq)
+{
+   irq_hw_number_t gic_irq;
+   struct irq_desc *desc = irq_to_desc(host_irq);
+   struct irq_data *d;
+   unsigned long flags;
+   struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, 0);
+   int ret = 0;
+   int spi_id = guest_irq + VGIC_NR_PRIVATE_IRQS;
+   struct vgic_dist *dist = &kvm->arch.vgic;
+
+   if (!vcpu)
+   return 0;
+
+   spin_lock(&dist->lock);
+
+   raw_spin_lock_irqsave(&desc->lock, flags);
+   d = &desc->irq_data;
+   gic_irq = irqd_to_hwirq(d);
+   irqd_set_irq_forwarded(d);
+   /*
+* next physical IRQ will be handled as forwarded
+* by the host (priority drop only)
+*/
+
+   raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+   /*
+* need to release the dist spin_lock here since
+* vgic_map_phys_irq can sleep
+*/
+   spin_unlock(&dist->lock);
+   ret = vgic_map_phys_irq(vcpu, spi_id, (int)gic_irq);
+   /*
+* next guest_irq injection will be considered as
+* forwarded and next flush will program LR
+* without maintenance IRQ but with HW bit set
+*/
+   return ret;
+}
+
+/**
+ * kvm_arch_unset_forward - Unset forwarding for a given IRQ
+ *
+ * @kvm: handle to the VM
+ * @host_irq: physical IRQ number
+ * @guest_irq: virtual IRQ number
+ * @active: returns whether the physical IRQ is active
+ *
+ * This function must be called when the host_irq is disabled
+ * and guest has been exited and prevented from being re-entered.
+ *
+ */
+void kvm_arch_unset_forward(struct kvm *kvm,
+   unsigned int host_irq,
+   unsigned int guest_irq,
+   bool *active)
+{
+   struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, 0);
+   struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+   struct vgic_dist *dist = &kvm->arch.vgic;
+   int ret, lr;
+   struct vgic_lr vlr;
+   struct irq_desc *desc = irq_to_desc(host_irq);
+   struct irq_data *d;
+   unsigned long flags;
+   irq_hw_number_t gic_irq;
+   int spi_id = guest_irq + VGIC_NR_PRIVATE_IRQS;
+   struct irq_chip *chip;
+   bool queued, needs_deactivate = true;
+
+   spin_lock(&dist->lock);
+
+   irq_get_irqchip_state(host_irq, IRQCHIP_STATE_ACTIVE, active);
+
+   if (!vcpu)
+   goto out;
+
+   raw_spin_lock_irqsave(&desc->lock, flags);
+   d = irq_desc_get_irq_data(desc);
+   gic_irq = irqd_to_hwirq(d);
+   raw_spin_unlock_irqrestore(&desc->lock, flags);
+

[RFC v4 06/13] VFIO: platform: add vfio_external_{mask|is_active|set_automasked}

2015-02-11 Thread Eric Auger
Introduces 3 new external functions aimed at performing some actions
on VFIO platform devices:
- mask a VFIO IRQ
- get the active status of a VFIO IRQ (active at interrupt
  controller level or masked by the level-sensitive automasking).
- change the automasked property and the VFIO handler

Note there is no way to discriminate between user-space
masking and automasked handler masking. As a consequence, is_active
will return true in case the IRQ was masked by the user-space.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

V4: creation
---
 drivers/vfio/platform/vfio_platform_irq.c | 43 +++++++++++++++++++++++++++++++
 include/linux/vfio.h                      | 14 ++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 8eb65c1..49994cb 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -231,6 +231,49 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
return 0;
 }
 
+void vfio_external_mask(struct vfio_platform_device *vdev, int index)
+{
+   vfio_platform_mask(&vdev->irqs[index]);
+}
+EXPORT_SYMBOL_GPL(vfio_external_mask);
+
+bool vfio_external_is_active(struct vfio_platform_device *vdev, int index)
+{
+   unsigned long flags;
+   struct vfio_platform_irq *irq = &vdev->irqs[index];
+   bool active, masked, outstanding;
+   int ret;
+
+   spin_lock_irqsave(&irq->lock, flags);
+
+   ret = irq_get_irqchip_state(irq->hwirq, IRQCHIP_STATE_ACTIVE, &active);
+   BUG_ON(ret);
+   masked = irq->masked;
+   outstanding = active || masked;
+
+   spin_unlock_irqrestore(&irq->lock, flags);
+   return outstanding;
+}
+EXPORT_SYMBOL_GPL(vfio_external_is_active);
+
+void vfio_external_set_automasked(struct vfio_platform_device *vdev,
+ int index, bool automasked)
+{
+   unsigned long flags;
+   struct vfio_platform_irq *irq = &vdev->irqs[index];
+
+   spin_lock_irqsave(&irq->lock, flags);
+   if (automasked) {
+   irq->flags |= VFIO_IRQ_INFO_AUTOMASKED;
+   irq->handler = vfio_automasked_irq_handler;
+   } else {
+   irq->flags &= ~VFIO_IRQ_INFO_AUTOMASKED;
+   irq->handler = vfio_irq_handler;
+   }
+   spin_unlock_irqrestore(&irq->lock, flags);
+}
+EXPORT_SYMBOL_GPL(vfio_external_set_automasked);
+
 static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
 unsigned index, unsigned start,
 unsigned count, uint32_t flags,
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 77c334b..e04ca93 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -103,6 +103,20 @@ extern struct vfio_device *vfio_device_get_external_user(struct file *filep);
 extern void vfio_device_put_external_user(struct vfio_device *vdev);
 extern struct device *vfio_external_base_device(struct vfio_device *vdev);
 
+struct vfio_platform_device;
+extern void vfio_external_mask(struct vfio_platform_device *vdev, int index);
+/*
+ * returns whether the VFIO IRQ is active:
+ * true if not yet deactivated at interrupt controller level or if
+ * automasked (level sensitive IRQ). Unfortunately there is no way to
+ * discriminate between handler auto-masking and user-space masking
+ */
+extern bool vfio_external_is_active(struct vfio_platform_device *vdev,
+   int index);
+
+extern void vfio_external_set_automasked(struct vfio_platform_device *vdev,
+int index, bool automasked);
+
 struct pci_dev;
 #ifdef CONFIG_EEH
 extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
-- 
1.9.1



[RFC v4 05/13] VFIO: external user API for interaction with vfio devices

2015-02-11 Thread Eric Auger
The VFIO external user API is enriched with 3 new functions that
allow a kernel user external to VFIO to retrieve some information
from a VFIO device.

- vfio_device_get_external_user gets a vfio device from
  its fd and increments its reference counter
- vfio_device_put_external_user decrements the reference counter
- vfio_external_base_device returns a handle to the struct device

---

v3 -> v4:
- change the commit title

v2 -> v3:
- reword the commit message

v1 -> v2:

- vfio_external_get_base_device renamed into vfio_external_base_device
- vfio_external_get_type removed

Signed-off-by: Eric Auger eric.au...@linaro.org
---
 drivers/vfio/vfio.c  | 24 ++++++++++++++++++++++++
 include/linux/vfio.h |  3 +++
 2 files changed, 27 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 8e84471..282814e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1401,6 +1401,30 @@ void vfio_group_put_external_user(struct vfio_group *group)
 }
 EXPORT_SYMBOL_GPL(vfio_group_put_external_user);
 
+struct vfio_device *vfio_device_get_external_user(struct file *filep)
+{
+   struct vfio_device *vdev = filep->private_data;
+
+   if (filep->f_op != &vfio_device_fops)
+   return ERR_PTR(-EINVAL);
+
+   vfio_device_get(vdev);
+   return vdev;
+}
+EXPORT_SYMBOL_GPL(vfio_device_get_external_user);
+
+void vfio_device_put_external_user(struct vfio_device *vdev)
+{
+   vfio_device_put(vdev);
+}
+EXPORT_SYMBOL_GPL(vfio_device_put_external_user);
+
+struct device *vfio_external_base_device(struct vfio_device *vdev)
+{
+   return vdev->dev;
+}
+EXPORT_SYMBOL_GPL(vfio_external_base_device);
+
 int vfio_external_user_iommu_id(struct vfio_group *group)
 {
 return iommu_group_id(group->iommu_group);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 5d45081..77c334b 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -99,6 +99,9 @@ extern void vfio_group_put_external_user(struct vfio_group *group);
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
 extern long vfio_external_check_extension(struct vfio_group *group,
  unsigned long arg);
+extern struct vfio_device *vfio_device_get_external_user(struct file *filep);
+extern void vfio_device_put_external_user(struct vfio_device *vdev);
+extern struct device *vfio_external_base_device(struct vfio_device *vdev);
 
 struct pci_dev;
 #ifdef CONFIG_EEH
-- 
1.9.1



[RFC v4 12/13] KVM: kvm-vfio: generic forwarding control

2015-02-11 Thread Eric Auger
This patch introduces a new KVM_DEV_VFIO_DEVICE group.

This is a new control channel which enables KVM to cooperate with
viable VFIO devices.

The patch introduces 2 attributes for this group:
KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
Their purpose is to turn a VFIO device IRQ into a forwarded IRQ and,
respectively, to unset the feature.

The generic part introduced here interacts with VFIO, genirq and KVM, while
the architecture specific part mostly takes care of the virtual interrupt
controller programming.

Architecture specific implementation is enabled when
__KVM_HAVE_ARCH_KVM_VFIO_FORWARD is set. When not set those
functions are void.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v3 -> v4:
- use new kvm_vfio_dev_irq struct
- improve error handling according to Alex comments
- full rework of the generic/arch-specific split to accommodate
  unforward that never fails
- kvm_vfio_get_vfio_device and kvm_vfio_put_vfio_device removed from
  that patch file and introduced before (since also used by Feng)
- guard kvm_vfio_control_irq_forward call with
  __KVM_HAVE_ARCH_KVM_VFIO_FORWARD

v2 -> v3:
- add API comments in kvm_host.h
- improve the commit message
- create a private kvm_vfio_fwd_irq struct
- fwd_irq_action replaced by a bool and removal of VFIO_IRQ_CLEANUP. This
  latter action will be handled in vgic.
- add a vfio_device handle argument to kvm_arch_set_fwd_state. The goal is
  to move platform specific stuff in architecture specific code.
- kvm_arch_set_fwd_state renamed into kvm_arch_vfio_set_forward
- increment the ref counter each time we do an IRQ forwarding and decrement
  this latter each time one IRQ forward is unset. Simplifies the whole
  ref counting.
- simplification of list handling: create, search, removal

v1 -> v2:
- __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
- original patch file separated into 2 parts: generic part moved in vfio.c
  and ARM specific part(kvm_arch_set_fwd_state)
---
 include/linux/kvm_host.h |  47 +++++++++++++++++++++
 virt/kvm/vfio.c          | 311 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 355 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 81c93de..f2bc192 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1048,6 +1048,15 @@ struct kvm_device_ops {
  unsigned long arg);
 };
 
+/* internal self-contained structure describing a forwarded IRQ */
+struct kvm_fwd_irq {
+   struct kvm *kvm; /* VM to inject the GSI into */
+   struct vfio_device *vdev; /* vfio device the IRQ belongs to */
+   __u32 index; /* VFIO device IRQ index */
+   __u32 subindex; /* VFIO device IRQ subindex */
+   __u32 gsi; /* gsi, ie. virtual IRQ number */
+};
+
 void kvm_device_get(struct kvm_device *dev);
 void kvm_device_put(struct kvm_device *dev);
 struct kvm_device *kvm_device_from_filp(struct file *filp);
@@ -1069,6 +1078,44 @@ inline void kvm_arch_resume_guest(struct kvm *kvm) {}
 
 #endif
 
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+/**
+ * kvm_arch_set_forward - Sets forwarding for a given IRQ
+ *
+ * @kvm: handle to the VM
+ * @host_irq: physical IRQ number
+ * @guest_irq: virtual IRQ number
+ * returns 0 on success, < 0 on failure
+ */
+int kvm_arch_set_forward(struct kvm *kvm,
+unsigned int host_irq, unsigned int guest_irq);
+
+/**
+ * kvm_arch_unset_forward - Unsets forwarding for a given IRQ
+ *
+ * @kvm: handle to the VM
+ * @host_irq: physical IRQ number
+ * @guest_irq: virtual IRQ number
+ * @active: returns whether the IRQ is active
+ */
+void kvm_arch_unset_forward(struct kvm *kvm,
+   unsigned int host_irq,
+   unsigned int guest_irq,
+   bool *active);
+
+#else
+static inline int kvm_arch_set_forward(struct kvm *kvm,
+  unsigned int host_irq,
+  unsigned int guest_irq)
+{
+   return -ENOENT;
+}
+static inline void kvm_arch_unset_forward(struct kvm *kvm,
+ unsigned int host_irq,
+ unsigned int guest_irq,
+ bool *active) {}
+#endif
+
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
 static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index c995e51..4847597 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -19,14 +19,30 @@
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
 #include "vfio.h"
+#include <linux/platform_device.h>
+#include <linux/irq.h>
+#include <linux/spinlock.h>
+#include <linux/interrupt.h>
+#include <linux/delay.h>
+
+#define DEBUG_FORWARD
+#define DEBUG_UNFORWARD
 
 struct kvm_vfio_group {
struct list_head node;
struct vfio_group *vfio_group;
 };
 
+/* private linkable kvm_fwd_irq struct */
+struct kvm_vfio_fwd_irq_node {

[RFC v4 07/13] KVM: kvm-vfio: wrappers to VFIO external API device helpers

2015-02-11 Thread Eric Auger
Provide wrapper functions that allow KVM-VFIO device code to
interact with a vfio device:
- kvm_vfio_device_get_external_user gets a handle to a struct
  vfio_device from the vfio device file descriptor and increments
  its reference counter,
- kvm_vfio_device_put_external_user decrements the reference counter
  to a vfio device,
- kvm_vfio_external_base_device returns a handle to the struct device
  of the vfio device.

Also kvm_vfio_get_vfio_device and kvm_vfio_put_vfio_device helpers
are introduced.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v3 -> v4:
- wrappers are no more exposed in kvm_host and become kvm/vfio.c static
  functions
- added kvm_vfio_get_vfio_device/kvm_vfio_put_vfio_device in that
  patch file

v2 -> v3:
- reword the commit message and title

v1 -> v2:
- kvm_vfio_external_get_base_device renamed into
  kvm_vfio_external_base_device
- kvm_vfio_external_get_type removed
---
 virt/kvm/vfio.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 620e37f..80a45e4 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -60,6 +60,80 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
symbol_put(vfio_group_put_external_user);
 }
 
+static struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep)
+{
+   struct vfio_device *vdev;
+   struct vfio_device *(*fn)(struct file *);
+
+   fn = symbol_get(vfio_device_get_external_user);
+   if (!fn)
+   return ERR_PTR(-EINVAL);
+
+   vdev = fn(filep);
+
+   symbol_put(vfio_device_get_external_user);
+
+   return vdev;
+}
+
+static void kvm_vfio_device_put_external_user(struct vfio_device *vdev)
+{
+   void (*fn)(struct vfio_device *);
+
+   fn = symbol_get(vfio_device_put_external_user);
+   if (!fn)
+   return;
+
+   fn(vdev);
+
+   symbol_put(vfio_device_put_external_user);
+}
+
+static struct device *kvm_vfio_external_base_device(struct vfio_device *vdev)
+{
+   struct device *(*fn)(struct vfio_device *);
+   struct device *dev;
+
+   fn = symbol_get(vfio_external_base_device);
+   if (!fn)
+   return NULL;
+
+   dev = fn(vdev);
+
+   symbol_put(vfio_external_base_device);
+
+   return dev;
+}
+
+/**
+ * kvm_vfio_get_vfio_device - Returns a handle to a vfio-device
+ *
+ * Checks it is a valid vfio device and increments its reference counter
+ * @fd: file descriptor of the vfio platform device
+ */
+static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
+{
+   struct fd f = fdget(fd);
+   struct vfio_device *vdev;
+
+   if (!f.file)
+   return ERR_PTR(-EINVAL);
+   vdev = kvm_vfio_device_get_external_user(f.file);
+   fdput(f);
+   return vdev;
+}
+
+/**
+ * kvm_vfio_put_vfio_device: decrements the reference counter of the
+ * vfio platform device
+ *
+ * @vdev: vfio_device handle to release
+ */
+static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
+{
+   kvm_vfio_device_put_external_user(vdev);
+}
+
 static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
 {
long (*fn)(struct vfio_group *, unsigned long);
-- 
1.9.1



[RFC v4 10/13] kvm: introduce kvm_arch_halt_guest and kvm_arch_resume_guest

2015-02-11 Thread Eric Auger
This API allows one to:
- exit the guest and avoid re-entering it
- resume the guest execution

Signed-off-by: Eric Auger eric.au...@linaro.org
---
 include/linux/kvm_host.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7f5858d..81c93de 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1057,6 +1057,18 @@ void kvm_unregister_device_ops(u32 type);
 extern struct kvm_device_ops kvm_mpic_ops;
 extern struct kvm_device_ops kvm_xics_ops;
 
+#ifdef __KVM_HAVE_ARCH_HALT_GUEST
+
+void kvm_arch_halt_guest(struct kvm *kvm);
+void kvm_arch_resume_guest(struct kvm *kvm);
+
+#else
+
+inline void kvm_arch_halt_guest(struct kvm *kvm) {}
+inline void kvm_arch_resume_guest(struct kvm *kvm) {}
+
+#endif
+
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
 static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
-- 
1.9.1



[RFC v4 09/13] KVM: arm: rename pause into power_off

2015-02-11 Thread Eric Auger
The kvm_vcpu_arch pause field is renamed into power_off to prepare
for the introduction of a new pause field.

Signed-off-by: Eric Auger eric.au...@linaro.org
---
 arch/arm/include/asm/kvm_host.h |  4 ++--
 arch/arm/kvm/arm.c              | 10 +++++-----
 arch/arm/kvm/psci.c             | 10 +++++-----
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 9cbcc53..87f0921 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -128,8 +128,8 @@ struct kvm_vcpu_arch {
 * here.
 */
 
-   /* Don't run the guest on this vcpu */
-   bool pause;
+   /* vcpu power-off state*/
+   bool power_off;
 
/* IO related fields */
struct kvm_decode mmio_decode;
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 61586a3..6f63ab7 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -459,7 +459,7 @@ static void vcpu_pause(struct kvm_vcpu *vcpu)
 {
wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
 
-   wait_event_interruptible(*wq, !vcpu->arch.pause);
+   wait_event_interruptible(*wq, !vcpu->arch.power_off);
 }
 
 static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
@@ -509,7 +509,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 update_vttbr(vcpu->kvm);
 
-   if (vcpu->arch.pause)
+   if (vcpu->arch.power_off)
 vcpu_pause(vcpu);
 
kvm_vgic_flush_hwstate(vcpu);
@@ -731,12 +731,12 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
vcpu_reset_hcr(vcpu);
 
/*
-* Handle the start in power-off case by marking the VCPU as paused.
+* Handle the start in power-off case.
 */
 if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
-   vcpu->arch.pause = true;
+   vcpu->arch.power_off = true;
 else
-   vcpu->arch.pause = false;
+   vcpu->arch.power_off = false;
 
return 0;
 }
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index 58cb324..ec0bd13 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -60,7 +60,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
 
 static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
 {
-   vcpu->arch.pause = true;
+   vcpu->arch.power_off = true;
 }
 
 static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
@@ -92,7 +92,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 */
if (!vcpu)
return PSCI_RET_INVALID_PARAMS;
-   if (!vcpu->arch.pause) {
+   if (!vcpu->arch.power_off) {
if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
return PSCI_RET_ALREADY_ON;
else
@@ -120,7 +120,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 * the general purpose registers are undefined upon CPU_ON.
 */
*vcpu_reg(vcpu, 0) = context_id;
-   vcpu->arch.pause = false;
+   vcpu->arch.power_off = false;
smp_mb();   /* Make sure the above is visible */
 
wq = kvm_arch_vcpu_wq(vcpu);
@@ -157,7 +157,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
kvm_for_each_vcpu(i, tmp, kvm) {
mpidr = kvm_vcpu_get_mpidr(tmp);
 if (((mpidr & target_affinity_mask) == target_affinity) &&
-   !tmp->arch.pause) {
+   !tmp->arch.power_off) {
return PSCI_0_2_AFFINITY_LEVEL_ON;
}
}
@@ -180,7 +180,7 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
 * re-initialized.
 */
 kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
-   tmp->arch.pause = true;
+   tmp->arch.power_off = true;
kvm_vcpu_kick(tmp);
}
 
-- 
1.9.1



[RFC v4 08/13] KVM: kvm-vfio: wrappers for vfio_external_{mask|is_active|set_automasked}

2015-02-11 Thread Eric Auger
Those 3 new wrapper functions call the respective VFIO external
functions.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v4: creation
---
 include/linux/vfio.h |  8 +++-----
 virt/kvm/vfio.c      | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e04ca93..565f5f7 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -106,14 +106,12 @@ extern struct device *vfio_external_base_device(struct vfio_device *vdev);
 struct vfio_platform_device;
 extern void vfio_external_mask(struct vfio_platform_device *vdev, int index);
 /*
- * returns whether the VFIO IRQ is active:
- * true if not yet deactivated at interrupt controller level or if
- * automasked (level sensitive IRQ). Unfortunately there is no way to
- * discriminate between handler auto-masking and user-space masking
+ * returns whether the VFIO IRQ is active at interrupt controller level
+ * or VFIO-masked. Note that if user-space masked the IRQ index, it
+ * cannot be discriminated from automasked handler situation.
  */
 extern bool vfio_external_is_active(struct vfio_platform_device *vdev,
int index);
-
 extern void vfio_external_set_automasked(struct vfio_platform_device *vdev,
 int index, bool automasked);
 
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 80a45e4..c995e51 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -134,6 +134,50 @@ static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
kvm_vfio_device_put_external_user(vdev);
 }
 
+bool kvm_vfio_external_is_active(struct vfio_platform_device *vpdev,
+int index)
+{
+   bool (*fn)(struct vfio_platform_device *, int index);
+   bool active;
+
+   fn = symbol_get(vfio_external_is_active);
+   if (!fn)
+   return -1;
+
+   active = fn(vpdev, index);
+
+   symbol_put(vfio_external_is_active);
+   return active;
+}
+
+void kvm_vfio_external_mask(struct vfio_platform_device *vpdev,
+   int index)
+{
+   void (*fn)(struct vfio_platform_device *, int index);
+
+   fn = symbol_get(vfio_external_mask);
+   if (!fn)
+   return;
+
+   fn(vpdev, index);
+
+   symbol_put(vfio_external_mask);
+}
+
+void kvm_vfio_external_set_automasked(struct vfio_platform_device *vpdev,
+ int index, bool automasked)
+{
+   void (*fn)(struct vfio_platform_device *, int index, bool automasked);
+
+   fn = symbol_get(vfio_external_set_automasked);
+   if (!fn)
+   return;
+
+   fn(vpdev, index, automasked);
+
+   symbol_put(vfio_external_set_automasked);
+}
+
 static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
 {
long (*fn)(struct vfio_group *, unsigned long);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC v4 01/13] KVM: arm/arm64: Enable the KVM-VFIO device

2015-02-11 Thread Eric Auger
From: Kim Phillips kim.phill...@linaro.org

The KVM-VFIO device is used by the QEMU VFIO device. It records the
list of in-use VFIO groups so that KVM can manipulate them. With this
series, it will also be used to record the forwarded IRQs.

Signed-off-by: Kim Phillips kim.phill...@linaro.org
Signed-off-by: Eric Auger eric.au...@linaro.org

---

v4 -> v5:
- reword the commit message to explain both usages of the KVM-VFIO
  device in QEMU
- squash both arm and arm64 enables
---
 arch/arm/kvm/Kconfig| 1 +
 arch/arm/kvm/Makefile   | 2 +-
 arch/arm64/kvm/Kconfig  | 1 +
 arch/arm64/kvm/Makefile | 2 +-
 4 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 7db7df4..0ddb745 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -25,6 +25,7 @@ config KVM
select KVM_ARM_HOST
select SRCU
depends on ARM_VIRT_EXT && ARM_LPAE
+   select KVM_VFIO
select HAVE_KVM_EVENTFD
---help---
  Support hosting virtualized guest machines. You will also
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index 859db09..ea1fa76 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt)
 AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt)
 
 KVM := ../../../virt/kvm
-kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o
+kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 0965056..b73fba8 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -27,6 +27,7 @@ config KVM
select KVM_ARM_VGIC
select KVM_ARM_TIMER
select SRCU
+   select KVM_VFIO
select HAVE_KVM_EVENTFD
---help---
  Support hosting virtualized guest machines.
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 2e6b827..81ed091 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -11,7 +11,7 @@ ARM=../../../arch/arm/kvm
 
 obj-$(CONFIG_KVM_ARM_HOST) += kvm.o
 
-kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o
+kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/arm.o $(ARM)/mmu.o $(ARM)/mmio.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
 
-- 
1.9.1



[RFC v4 00/13] KVM-VFIO IRQ forward control

2015-02-11 Thread Eric Auger
This series proposes an integration of "ARM: Forwarding physical
interrupts to a guest VM" (http://lwn.net/Articles/603514/) into
KVM.

It makes it possible to set/unset forwarding for a VFIO platform device IRQ.

A forwarded IRQ is deactivated by the guest and not by the host.
When the guest deactivates the associated virtual IRQ, the interrupt
controller automatically completes the physical IRQ. Obviously
this requires some HW support in the interrupt controller. This is
the case for the ARM GIC.

The direct benefit is that, for a level sensitive IRQ, a VM exit
can be avoided on forwarded IRQ completion.

When the IRQ is forwarded, the VFIO platform driver does not need to
mask the physical IRQ anymore before signaling the eventfd. Indeed
genirq lowers the running priority, enabling other physical IRQs to fire,
except that one.

Besides, the injection is still based on irqfd triggering. The only
impact on the irqfd process is that resamplefd is no longer called on
virtual IRQ completion, since deactivation is not trapped by KVM.

The current integration is based on an extension of the KVM-VFIO
device, previously used by KVM to interact with VFIO groups. The
patch series now enables KVM to directly interact with a VFIO
platform device. The VFIO external API was extended for that purpose.

The IRQ forward programming is architecture specific (virtual interrupt
controller programming basically). However the whole infrastructure is
kept generic.

From a user point of view, the functionality is provided through a
new KVM-VFIO group named KVM_DEV_VFIO_DEVICE and 2 associated
attributes:
- KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
- KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.

The capability can be checked with KVM_HAS_DEVICE_ATTR.

Forwarding must be activated when the VFIO IRQ is neither active at the
physical level nor under injection into the guest (VFIO-masked).
Forwarding can be unset at any time.

---

This patch series has the following dependencies:
- ARM: Forwarding physical interrupts to a guest VM
  (http://lwn.net/Articles/603514/)
- [PATCH v13 00/18] VFIO support for platform devices, VOSYS
  (http://www.spinics.net/lists/kvm-arm/msg13414.html)
- [PATCH v3 0/6] vfio: type1: support for ARM SMMUS with VFIO_IOMMU_TYPE1,
  VOSYS (http://www.spinics.net/lists/kvm-arm/msg11738.html)
- [RFC v2] chip/vgic adaptations for forwarded irq

Integrated pieces can be found at
ssh://git.linaro.org/people/eric.auger/linux.git
on branch irqfd_integ_v9

This was tested on Calxeda Midway, assigning the xgmac main IRQ.
Unforward was tested doing periodic forward/unforward with random offsets,
while using netcat traffic to make sure unforward often occurs while the
IRQ is in progress.

v3 -> v4:
- revert as RFC again due to lots of changes, extra complexity induced
  by new set/unset_forward implementation, and dependencies on RFC patches
- kvm_vfio_dev_irq struct is used at user level to pass the parameters
  to KVM-VFIO KVM_DEV_VFIO_DEVICE/KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. Shared
  with Intel posted IRQs.
- unforward can now happen at any time with no constraint and cannot fail
- new VFIO platform external functions introduced:
  vfio_external_set_automasked, vfio_external_mask, vfio_external_is_active,
- introduce a modality to force the guest to exit & prevent it from being
  re-entered, and rename the older ARM pause modality into power-off
  (related to PSCI power-off start)
- kvm_vfio_arm.c no longer exists; architecture-specific code moved into
  arm/gic.c. This code is not that VFIO-dependent anymore, although some
  references still exist in comments.
- 2 separate architecture specific functions for set and unset (only one
  has a return value).

v2 -> v3:
- kvm_fwd_irq_action enum replaced by a bool (KVM_VFIO_IRQ_CLEANUP does not
  exist anymore)
- a new struct local to vfio.c was introduced to wrap kvm_fw_irq and make it
  linkable: kvm_vfio_fwd_irq_node
- kvm_fwd_irq now is self-contained (includes struct vfio_device *)
- a single list of kvm_vfio_fwd_irq_node is used instead of having
  a list of devices and a list of forward irq per device. Having 2 lists
  brought extra complexity.
- the VFIO device ref counter is incremented each time a new IRQ is forwarded.
  We no longer attempt to hold a single reference regardless of the number
  of forwarded IRQs.
- subindex added on top of index to be closer to VFIO API
- platform device check moved in the arm specific implementation
- enable the KVM-VFIO device for arm64
- a forwarded state change can only happen while the VFIO IRQ handler is not
  set; in other words, while the VFIO IRQ signaling is not set up.

v1 -> v2:
- forward control is moved from architecture specific file into generic
  vfio.c module.
  only kvm_arch_set_fwd_state remains architecture specific
- integrate Kim's patch which enables KVM-VFIO for ARM
- fix vgic state bypass in vgic_queue_hwirq
- struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
  to include/uapi/linux/kvm.h
  also irq_index renamed into index and guest_irq 

Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Raghavendra K T

On 02/10/2015 06:56 PM, Oleg Nesterov wrote:

On 02/10, Raghavendra K T wrote:


On 02/10/2015 06:23 AM, Linus Torvalds wrote:


	add_smp(&lock->tickets.head, TICKET_LOCK_INC);
	if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) ..

into something like

	val = xadd(&lock->ticket.head_tail, TICKET_LOCK_INC << TICKET_SHIFT);
	if (unlikely(val & TICKET_SLOWPATH_FLAG)) ...

would be the right thing to do. Somebody should just check that I got
that shift right, and that the tail is in the high bytes (head really
needs to be high to work, if it's in the low byte(s) the xadd would
overflow from head into tail which would be wrong).


Unfortunately xadd could result in head overflow as tail is high.

The other option was repeated cmpxchg which is bad I believe.
Any suggestions?


Stupid question... what if we simply move SLOWPATH from .tail to .head?
In this case arch_spin_unlock() could do xadd(tickets.head) and check
the result


It is a good idea. Trying this now.


In this case __ticket_check_and_clear_slowpath() really needs to cmpxchg
the whole .head_tail. Plus obviously more boring changes. This needs a
separate patch even _if_ this can work.


Correct. But apart from this, before doing the xadd in unlock we would
have to make sure the lsb bit is cleared, so that we can live with a
1-bit overflow into the tail, which is unused. Right now the lsb bit of
either or both of head and tail may be set after unlock.



[PATCH kvm-unit-tests] x86: cmpxchg8b: new 32-bit only testcase

2015-02-11 Thread Paolo Bonzini
This is similar to emulator.c, which does not run on 32-bit systems.
The bug it catches happens (due to kvm_mmu_page_fault's call to the
emulator) during Windows 7 boot.

Reported-by: Erik Rull erik.r...@rdsoftware.de
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 config/config-i386.mak |  4 +++-
 x86/cmpxchg8b.c| 34 ++
 x86/run|  2 +-
 3 files changed, 38 insertions(+), 2 deletions(-)
 create mode 100644 x86/cmpxchg8b.c

diff --git a/config/config-i386.mak b/config/config-i386.mak
index 503a3be..691381c 100644
--- a/config/config-i386.mak
+++ b/config/config-i386.mak
@@ -3,9 +3,11 @@ bits = 32
 ldarch = elf32-i386
 CFLAGS += -I $(KERNELDIR)/include
 
-tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat
+tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat \
+   $(TEST_DIR)/cmpxchg8b.flat
 
 include config/config-x86-common.mak
 
+$(TEST_DIR)/cmpxchg8b.elf: $(cstart.o) $(TEST_DIR)/cmpxchg8b.o
 $(TEST_DIR)/taskswitch.elf: $(cstart.o) $(TEST_DIR)/taskswitch.o
 $(TEST_DIR)/taskswitch2.elf: $(cstart.o) $(TEST_DIR)/taskswitch2.o
diff --git a/x86/cmpxchg8b.c b/x86/cmpxchg8b.c
new file mode 100644
index 000..ceb0cf8
--- /dev/null
+++ b/x86/cmpxchg8b.c
@@ -0,0 +1,34 @@
+#include "ioram.h"
+#include "vm.h"
+#include "libcflat.h"
+#include "desc.h"
+#include "types.h"
+#include "processor.h"
+
+#define memset __builtin_memset
+#define TESTDEV_IO_PORT 0xe0
+
+static void test_cmpxchg8b(u32 *mem)
+{
+mem[1] = 2;
+mem[0] = 1;
+    asm("push %%ebx\n"
+        "mov %[ebx_val], %%ebx\n"
+        "lock cmpxchg8b (%0)\n"
+        "pop %%ebx" : : "D" (mem),
+        "d" (2), "a" (1), "c" (4), [ebx_val] "i" (3) : "memory");
+    report("cmpxchg8b", mem[0] == 3 && mem[1] == 4);
+}
+
+int main()
+{
+   void *mem;
+
+   setup_vm();
+   setup_idt();
+   mem = alloc_vpages(1);
+   install_page((void *)read_cr3(), IORAM_BASE_PHYS, mem);
+
+   test_cmpxchg8b(mem);
+   return report_summary();
+}
diff --git a/x86/run b/x86/run
index 646c577..af37eb4 100755
--- a/x86/run
+++ b/x86/run
@@ -33,7 +33,7 @@ else
pc_testdev="-device testdev,chardev=testlog -chardev file,id=testlog,path=msr.out"
 fi
 
-command="${qemu} -enable-kvm $pc_testdev -display none -serial stdio $pci_testdev -kernel"
+command="${qemu} -enable-kvm $pc_testdev -vnc none -serial stdio $pci_testdev -kernel"
 echo ${command} $@
 ${command} $@
 ret=$?
-- 
2.1.0



[Bug 92291] kvm/guest crashes when smp 1 with AMD FX8300; with host kernel oops from abrt as well

2015-02-11 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #8 from Mark kernelbugzilla.org.mark...@dfgh.net ---
Created attachment 166461
  --> https://bugzilla.kernel.org/attachment.cgi?id=166461&action=edit
dmesg

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


[Bug 92291] kvm/guest crashes when smp 1 with AMD FX8300; with host kernel oops from abrt as well

2015-02-11 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #9 from Mark kernelbugzilla.org.mark...@dfgh.net ---
I'll try both of your suggestions, thanks

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


Re: nSVM: Booting L2 results in L1 hang and a skip_emulated_instruction

2015-02-11 Thread Jan Kiszka
On 2015-02-11 19:12, Kashyap Chamarthy wrote:
 Hi, 
 
 This was tested with kernel (kernel-3.19.0-1.fc22) and QEMU (qemu-2.2.0-5.fc22)
 on L0 & L1.
 
 
 Description
 ---
 
 Inside L1, boot a nested KVM guest (L2) . Instead of a full blown
 guest, let's use `qemu-sanity-check` with KVM:
 
 $ qemu-sanity-check --accel=kvm
 
 Which gives you this CLI (run from a different shell), confirming
 that the L2 guest is indeed running on KVM (and not TCG):
 
   $ ps -ef | grep -i qemu
   root   763   762 35 11:49 ttyS000:00:00 qemu-system-x86_64 
 -nographic -nodefconfig -nodefaults -machine accel=kvm -no-reboot -serial 
 file:/tmp/tmp.rl3naPaCkZ.out -kernel /boot/vmlinuz-3.19.0-1.fc21.x86_64 
 -initrd /usr/lib64/qemu-sanity-check/initrd -append console=ttyS0 oops=panic 
 panic=-1
 
 
 Which results in:
 
   (a) L1 (guest hypervisor) completely hangs and is unresponsive. But
   when I query libvirt, (`virsh list`) the guest is still reported
   as 'running'
 
   (b) On L0, I notice a ton of these messages:
 
 skip_emulated_instruction: ip 0xffec next 0x8105e964
 
 
 I can get `dmesg`, `dmidecode` , `x86info -a` on L0 and L1 if it helps
 in narrowing down the issue.
 
 
 Related bug and reproducer details
 --
 
 
 https://bugzilla.redhat.com/show_bug.cgi?id=1191665 --  Nested KVM with
 AMD: L2 (nested guest) fails with divide error:  [#1] SMP
 
 

Is this a regression (of the kernel)? If so, can you bisect to the
commit that introduced it?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


[PATCH] KVM: fix possible coalesced_mmio_ring page leaks.

2015-02-11 Thread Xiubo Li
The coalesced_mmio_ring page is not freed when anon_inode_getfd()
fails.

Signed-off-by: Xiubo Li lixi...@cmss.chinamobile.com
---
 virt/kvm/kvm_main.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8579f18..85e8106 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2784,16 +2784,22 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
return PTR_ERR(kvm);
 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
r = kvm_coalesced_mmio_init(kvm);
-	if (r < 0) {
-		kvm_put_kvm(kvm);
-		return r;
-	}
+	if (r < 0)
+		goto out_put_kvm;
 #endif
 	r = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR | O_CLOEXEC);
 	if (r < 0)
-		kvm_put_kvm(kvm);
+		goto out_mmio_free;
 
return r;
+
+out_mmio_free:
+#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
+   kvm_coalesced_mmio_free(kvm);
+#endif
+out_put_kvm:
+   kvm_put_kvm(kvm);
+   return r;
 }
 
 static long kvm_dev_ioctl(struct file *filp,
-- 
1.9.1


