date:20150319

Re: [PATCH 8/9] qspinlock: Generic paravirt support

2015-03-19 Thread Waiman Long


On 03/19/2015 08:25 AM, Peter Zijlstra wrote:

On Thu, Mar 19, 2015 at 11:12:42AM +0100, Peter Zijlstra wrote:

So I was now thinking of hashing the lock pointer; let me go and quickly
put something together.

A little something like so; ideally we'd allocate the hashtable since
NR_CPUS is kinda bloated, but it shows the idea I think.

And while this has loops in (the rehashing thing) their fwd progress
does not depend on other CPUs.

And I suspect that for the typical lock contention scenarios it's
unlikely we ever really get into long rehashing chains.

---
  include/linux/lfsr.h                |   49
  kernel/locking/qspinlock_paravirt.h |  143

  2 files changed, 178 insertions(+), 14 deletions(-)


This is a much better alternative.


--- /dev/null
+++ b/include/linux/lfsr.h
@@ -0,0 +1,49 @@
+#ifndef _LINUX_LFSR_H
+#define _LINUX_LFSR_H
+
+/*
+ * Simple Binary Galois Linear Feedback Shift Register
+ *
+ * http://en.wikipedia.org/wiki/Linear_feedback_shift_register
+ *
+ */
+
+extern void __lfsr_needs_more_taps(void);
+
+static __always_inline u32 lfsr_taps(int bits)
+{
+   if (bits ==  1) return 0x0001;
+   if (bits ==  2) return 0x0001;
+   if (bits ==  3) return 0x0003;
+   if (bits ==  4) return 0x0009;
+   if (bits ==  5) return 0x0012;
+   if (bits ==  6) return 0x0021;
+   if (bits ==  7) return 0x0041;
+   if (bits ==  8) return 0x008E;
+   if (bits ==  9) return 0x0108;
+   if (bits == 10) return 0x0204;
+   if (bits == 11) return 0x0402;
+   if (bits == 12) return 0x0829;
+   if (bits == 13) return 0x100D;
+   if (bits == 14) return 0x2015;
+
+   /*
+    * For more taps see:
+    *   http://users.ece.cmu.edu/~koopman/lfsr/index.html
+    */
+   __lfsr_needs_more_taps();
+
+   return 0;
+}
+
+static inline u32 lfsr(u32 val, int bits)
+{
+   u32 bit = val & 1;
+
+   val >>= 1;
+   if (bit)
+       val ^= lfsr_taps(bits);
+   return val;
+}
+
+#endif /* _LINUX_LFSR_H */
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -2,6 +2,9 @@
  #error do not include this file
  #endif

+#include <linux/hash.h>
+#include <linux/lfsr.h>
+
  /*
   * Implement paravirt qspinlocks; the general idea is to halt the vcpus instead
   * of spinning them.
@@ -107,7 +110,120 @@ static void pv_kick_node(struct mcs_spin
pv_kick(pn->cpu);
  }

-static DEFINE_PER_CPU(struct qspinlock *, __pv_lock_wait);
+/*
+ * Hash table using open addressing with an LFSR probe sequence.
+ *
+ * Since we should not be holding locks from NMI context (very rare indeed) the
+ * max load factor is 0.75, which is around the point where open addressing
+ * breaks down.
+ *
+ * Instead of probing just the immediate bucket we probe all buckets in the
+ * same cacheline.
+ *
+ * http://en.wikipedia.org/wiki/Hash_table#Open_addressing
+ *
+ */
+
+#define HB_RESERVED    ((struct qspinlock *)1)
+
+struct pv_hash_bucket {
+   struct qspinlock *lock;
+   int cpu;
+};
+
+/*
+ * XXX dynamic allocate using nr_cpu_ids instead...
+ */
+#define PV_LOCK_HASH_BITS  (2 + NR_CPUS_BITS)
+


As said here, we should make it dynamically allocated depending on 
num_possible_cpus().
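One way to follow that suggestion, sketched here under assumptions: size the table at boot from the number of possible CPUs, keeping the patch's ratio of four buckets per CPU (`2 + NR_CPUS_BITS`) and its floor of 2^6 buckets. The helper names below are hypothetical; in the kernel the CPU count would come from num_possible_cpus() and the buckets from a boot-time allocation, neither of which is shown.

```c
#include <assert.h>

/* Smallest b such that 2^b >= n (n = 0 or 1 gives 0); capped at 31. */
static unsigned int ceil_log2_u32(unsigned int n)
{
	unsigned int b = 0;

	while (b < 31 && (1u << b) < n)
		b++;
	return b;
}

/*
 * Hypothetical helper: hash order for @ncpus possible CPUs, i.e. four
 * buckets per CPU rounded up to a power of two, never below 2^6 buckets.
 */
static unsigned int pv_hash_bits_for(unsigned int ncpus)
{
	unsigned int bits = 2 + ceil_log2_u32(ncpus);

	return bits < 6 ? 6 : bits;
}

/* Number of buckets to allocate for @ncpus possible CPUs. */
static unsigned int pv_hash_size_for(unsigned int ncpus)
{
	return 1u << pv_hash_bits_for(ncpus);
}
```

With 64 possible CPUs this gives 256 buckets; small machines stay at the 64-bucket floor.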



+#if PV_LOCK_HASH_BITS < 6
+#undef PV_LOCK_HASH_BITS
+#define PV_LOCK_HASH_BITS  6
+#endif
+
+#define PV_LOCK_HASH_SIZE  (1 << PV_LOCK_HASH_BITS)
+
+static struct pv_hash_bucket __pv_lock_hash[PV_LOCK_HASH_SIZE] cacheline_aligned;
+
+#define PV_HB_PER_LINE (SMP_CACHE_BYTES / sizeof(struct pv_hash_bucket))
+
+static inline u32 hash_align(u32 hash)
+{
+   return hash & ~(PV_HB_PER_LINE - 1);
+}
+
+static struct qspinlock **pv_hash(struct qspinlock *lock)
+{
+   u32 hash = hash_ptr(lock, PV_LOCK_HASH_BITS);
+   struct pv_hash_bucket *hb, *end;
+
+   if (!hash)
+   hash = 1;
+
+   hb = &__pv_lock_hash[hash_align(hash)];
+   for (;;) {
+       for (end = hb + PV_HB_PER_LINE; hb < end; hb++) {
+           if (!cmpxchg(&hb->lock, NULL, HB_RESERVED)) {
+               WRITE_ONCE(hb->cpu, smp_processor_id());
+               /*
+                * Since we must read lock first and cpu
+                * second, we must write cpu first and lock
+                * second, therefore use HB_RESERVED to mark an
+                * entry in use before writing the values.
+                *
+                * This can cause hb_hash_find() to not find a
+                * cpu even though _Q_SLOW_VAL, this is not a
+                * problem since we re-check l->locked before
+                * going to sleep and the unlock will have
+                * cleared l->locked already.
+                */
+

Re: [Patch v5] x86: irq_comm: Add check for RH bit in kvm_set_msi_irq

2015-03-19 Thread James Sullivan

On 03/19/2015 07:00 AM, Radim Krčmář wrote:
 2015-03-18 22:09-0300, Marcelo Tosatti:
 See native_compose_msi_msg:
 ((apic->irq_dest_mode == 0) ?
 MSI_ADDR_DEST_MODE_PHYSICAL :
 MSI_ADDR_DEST_MODE_LOGICAL) |
 ((apic->irq_delivery_mode != dest_LowestPrio) ?
 MSI_ADDR_REDIRECTION_CPU :
 MSI_ADDR_REDIRECTION_LOWPRI) |
 So it does configure DM = MSI_ADDR_DEST_MODE_LOGICAL
 and RH = MSI_ADDR_REDIRECTION_LOWPRI.
 ...and yet this is a good counterexample against my argument :)
 
 (It could be just to make the code nicer ... the developer might have
  known how real hardware will handle it.)
 
 What I think I'll do is revert this particular change so that dest_mode is
 set independently of RH. While I'm not entirely convinced that this is the
 intended interpretation, I do think that consistency with the existing logic
 is probably desirable for the time being. If I can get closure on the matter
 I'll re-submit that change, but for the time being I will undo it.
 Just write MSI-X table entries on real hardware (say: modify
 native_compose_msi_msg or MSI-X equivalent), with all RH/DM
 combinations, and see what behaviour
 comes up?  
 
 I second this idea.
 (We'd also get to know how RH interacts with delivery mode.)
 

I played around with native_compose_msi_msg and discovered the following:

* dm=0, rh=0 => Physical Destination Mode
* dm=0, rh=1 => Failed delivery
* dm=1, rh=0 => Logical Destination Mode, No Redirection
* dm=1, rh=1 => Logical Destination Mode, Redirection

So it seems to be the case that logical destination mode is used whenever
DM=1, regardless of RH. Furthermore, the case where DM=0 and RH=1 is
undefined, as was indicated in the closing response to the thread in
https://software.intel.com/en-us/forums/topic/23 :

...X86 supports logical mode but does not support redirection of physical mode
interrupts. I think they should have just said RH=1 is not defined in physical
mode, but instead they are trying to tell you what the restrictions are if you
set it on.

The default config for PCI devices seems to be DM=1,RH=1.
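The four observations above can be restated as a small decoder. This is only a summary of what was measured on this hardware, not the SDM's definition, and `enum msi_dest` and the helper names are illustrative. In the MSI address register, DM is bit 2 and RH is bit 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum msi_dest {
	MSI_DEST_PHYSICAL,
	MSI_DEST_LOGICAL,
	MSI_DEST_LOGICAL_REDIRECTED,
	MSI_DEST_FAILED,	/* DM=0, RH=1: delivery failed in the test */
};

/* Outcome for each DM/RH combination, as observed in the experiment. */
static enum msi_dest msi_observed_dest(bool dm, bool rh)
{
	if (dm)
		return rh ? MSI_DEST_LOGICAL_REDIRECTED : MSI_DEST_LOGICAL;
	return rh ? MSI_DEST_FAILED : MSI_DEST_PHYSICAL;
}

/* Same decoding, starting from a raw MSI address (DM = bit 2, RH = bit 3). */
static enum msi_dest msi_observed_dest_from_addr(uint32_t addr)
{
	return msi_observed_dest((addr >> 2) & 1, (addr >> 3) & 1);
}
```

Read this way, DM alone selects logical mode; RH only adds redirection on top of it, and RH without DM did not deliver.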

-James
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Bug 93251] qemu-kvm guests randomly hangs after reboot command in guest

2015-03-19 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=93251

--- Comment #5 from Thomas Stein himbe...@meine-oma.de ---
Hi.

Reverted the commit just now. I still have to boot this kernel. More tomorrow.

Thanks and cheers
t.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [Qemu-devel] PCI passthrough of 40G ethernet interface (Openstack/KVM)

2015-03-19 Thread jacob jacob

On Thu, Mar 19, 2015 at 5:53 PM, jacob jacob opstk...@gmail.com wrote:
 On Thu, Mar 19, 2015 at 5:42 PM, Shannon Nelson
 shannon.nel...@intel.com wrote:
 On Thu, Mar 19, 2015 at 2:04 PM, jacob jacob opstk...@gmail.com wrote:
 I have updated to latest firmware and still no luck.

 [...]

 [   61.554132] i40e :00:06.0 eth2: the driver failed to link
 because an unqualified module was detected. 
 [   61.555331] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready


 So I assume you're getting traffic on the other port and it doesn't
 complain about unqualified module?  Does the problem move if you
 swap the cables?  The usual problem here is a QSFP connector that
 isn't compatible with the NIC.  I don't have a pointer handy to the
 official list, but you should be able to get that from your NIC
 supplier.

 No. That is not the case.

 The traffic counters are for the dpdk test run which uses igb_uio driver.

 As I mentioned earlier, I have tried all combinations of cable, PCI slot,
 card, etc. to isolate any hardware issues.

Everything works fine on the host and hence the
modules are verified to be working.
The issue is seen only when the device is passed through to a VM
running on the same host.

Re: [PATCH/RFC 4/9] KVM: s390: Add MEMOP ioctls for reading/writing guest memory

2015-03-19 Thread Marcelo Tosatti

On Thu, Mar 19, 2015 at 02:15:18PM +0100, Thomas Huth wrote:
  Date: Wed, 18 Mar 2015 21:23:48 -0300
  From: Marcelo Tosatti mtosa...@redhat.com

  On Mon, Mar 16, 2015 at 09:51:40AM +0100, Christian Borntraeger wrote:
   From: Thomas Huth th...@linux.vnet.ibm.com

   On s390, we've got to make sure to hold the IPTE lock while accessing
   logical memory. So let's add an ioctl for reading and writing logical
   memory to provide this feature for userspace, too.
 ...

  What I was wondering is why you can't translate the address
  in the kernel and let userspace perform the actual read/write?

 The idea here is to protect the read/write access with the ipte-lock, too.
 That way, the whole address translation _and_ the read/write access are
 protected together against invalidate-page-table operations from other
 CPUs, so the whole memory access looks atomic for other VCPUs. And since
 we do not want to expose the ipte lock directly to user space, both have
 to be done in the kernel.
 We already had a long internal discussion about this in our team, and
 indeed, if the ipte-lock were the only problem that we face on s390,
 we might also come up with a solution where the memory read/write access
 is done in userspace instead. However, for full architecture compliance,
 we will later have to support the so-called storage keys during memory
 accesses, too, and this can hardly be done accurately and safely from
 userspace. So I'm afraid it's somewhat ugly that we have to provide an
 ioctl here to read/write the guest memory, but it's the only feasible
 solution that I could think of.

I see, thanks.
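Thomas's argument, that address translation and the data copy must happen under the same lock so other VCPUs see one atomic access, can be modelled in userspace. The sketch below is a toy under stated assumptions: a C11 `atomic_flag` spinlock stands in for the IPTE lock, a tiny array stands in for the guest page table, and `struct toy_guest` and its helpers are hypothetical names, not the KVM_S390 memop interface.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 16u
#define TOY_PAGES     4u

struct toy_guest {
	atomic_flag ipte_lock;			/* stand-in for the IPTE lock */
	uint32_t pte[TOY_PAGES];		/* logical page -> "physical" page */
	uint8_t mem[TOY_PAGES * TOY_PAGE_SIZE];	/* "physical" guest memory */
};

static void toy_lock(struct toy_guest *g)
{
	while (atomic_flag_test_and_set_explicit(&g->ipte_lock, memory_order_acquire))
		;	/* spin */
}

static void toy_unlock(struct toy_guest *g)
{
	atomic_flag_clear_explicit(&g->ipte_lock, memory_order_release);
}

/* Page-table update: the kind of operation the lock keeps out of a memop. */
static void toy_remap(struct toy_guest *g, uint32_t page, uint32_t phys)
{
	toy_lock(g);
	g->pte[page] = phys;
	toy_unlock(g);
}

/*
 * Translate @gaddr and copy @len bytes into @buf as one locked step.
 * Returns 0, or -1 if the range is outside the guest or crosses a page
 * (kept simple for the sketch).
 */
static int toy_memop_read(struct toy_guest *g, uint32_t gaddr, void *buf, uint32_t len)
{
	uint32_t page = gaddr / TOY_PAGE_SIZE, off = gaddr % TOY_PAGE_SIZE;

	if (page >= TOY_PAGES || off + len > TOY_PAGE_SIZE)
		return -1;

	toy_lock(g);
	memcpy(buf, &g->mem[g->pte[page] * TOY_PAGE_SIZE + off], len);
	toy_unlock(g);
	return 0;
}
```

Because toy_remap() takes the same lock, a remap can land before or after a toy_memop_read(), but never between its translation and its copy.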


Re: [PATCH 9/9] qspinlock,x86,kvm: Implement KVM support for paravirt qspinlock

2015-03-19 Thread Peter Zijlstra

On Wed, Mar 18, 2015 at 10:45:55PM -0400, Waiman Long wrote:
 On 03/16/2015 09:16 AM, Peter Zijlstra wrote:
 I do have some concern about this call site patching mechanism as the
 modification is not atomic. The spin_unlock() calls are in many places in
 the kernel. There is a possibility that a thread is calling a certain
 spin_unlock call site while it is being patched by another one with the
 alternative() function call.
 
 So far, I don't see any problem with bare metal where paravirt_patch_insns()
 is used to patch it to the move instruction. However, in a virtual guest
 environment where paravirt_patch_call() was used, there were situations
 where the system panicked because of a page fault on some invalid memory in
 the kthread. If you look at paravirt_patch_call(), you will see:
 
 :
 b->opcode = 0xe8; /* call */
 b->delta = delta;
 
 If another CPU reads the instruction at the call site at the right moment,
 it will get the modified call instruction, but not the new delta value. It
 will then jump to a random location. I believe that was causing the system
 panic that I saw.
 
 So I think it is kind of risky to use it here unless we can guarantee that
 call site patching is atomic wrt other CPUs.

Just look at where the patching is done:

init/main.c:start_kernel()
  check_bugs()
alternative_instructions()
  apply_paravirt()

We're UP and not holding any locks, disable IRQs (see text_poke_early())
and have NMIs 'disabled'.
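For reference, the five bytes paravirt_patch_call() writes form an x86 `call rel32`: opcode 0xe8 followed by a signed 32-bit displacement measured from the end of the instruction. The userspace sketch below (helper names are hypothetical) encodes and decodes that form, which makes the failure mode Waiman describes concrete: a reader that sees the new opcode with stale displacement bytes computes an unrelated target. Per Peter's reply, apply_paravirt() runs while only one CPU is up, so no such concurrent reader exists at that point.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Encode "call target" located at address @site into @buf:
 * byte 0 is the opcode 0xe8, bytes 1..4 the rel32 displacement,
 * counted from the end of the instruction (site + 5).
 * Returns 0, or -1 if the target is out of rel32 range.
 */
static int encode_call_rel32(uint8_t buf[5], uint64_t site, uint64_t target)
{
	int64_t delta = (int64_t)(target - (site + 5));
	int32_t rel;

	if (delta < INT32_MIN || delta > INT32_MAX)
		return -1;
	rel = (int32_t)delta;
	buf[0] = 0xe8;
	memcpy(&buf[1], &rel, sizeof(rel));	/* host byte order; little-endian on x86 */
	return 0;
}

/* Target a CPU would branch to when executing the 5 bytes at @site. */
static uint64_t decode_call_target(const uint8_t buf[5], uint64_t site)
{
	int32_t rel;

	memcpy(&rel, &buf[1], sizeof(rel));
	return site + 5 + (int64_t)rel;
}
```

A half-written instruction, new opcode but displacement bytes still zero, decodes to site + 5 rather than the intended target.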

Re: [Qemu-devel] PCI passthrough of 40G ethernet interface (Openstack/KVM)

2015-03-19 Thread Stefan Assmann

On 18.03.2015 23:06, Shannon Nelson wrote:
 On Wed, Mar 18, 2015 at 3:01 PM, Shannon Nelson
 shannon.nel...@intel.com wrote:


 On Wed, Mar 18, 2015 at 8:40 AM, jacob jacob opstk...@gmail.com wrote:

 On Wed, Mar 18, 2015 at 11:24 AM, Bandan Das b...@redhat.com wrote:

 Actually, Stefan suggests that support for this card is still sketchy
 and your best bet is to try out net-next
 http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git

 Also, could you please post more information about your hardware setup
 (chipset/processor/firmware version on the card etc) ?

 Host CPU : Model name:Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz

 Manufacturer Part Number:  XL710QDA1BLK
 Ethernet controller: Intel Corporation Ethernet Controller XL710 for
 40GbE QSFP+ (rev 01)
  #ethtool -i enp9s0
 driver: i40e
 version: 1.2.6-k
 firmware-version: f4.22 a1.1 n04.24 e800013fd
 bus-info: :09:00.0
 supports-statistics: yes
 supports-test: yes
 supports-eeprom-access: yes
 supports-register-dump: yes
 supports-priv-flags: no

 
 Jacob,
 
 It looks like you're using a NIC with the e800013fd firmware from last
 summer, and from a separate message that you saw these issues with
 both the 1.2.2-k and the 1.2.37 version drivers.  I suggest the next
 step would be to update the NIC firmware as there are some performance
 and stability updates available that deal with similar issues.  Please
 see the Intel Networking support webpage at
 https://downloadcenter.intel.com/download/24769 and look for the
 NVMUpdatePackage.zip.  This should take care of several of the things
 Stefan might describe as sketchy :-).

Interesting, the following might explain why my XL710 feels a bit
sketchy then. ;-)
# ethtool -i p4p1
driver: i40e
version: 1.2.37-k
firmware-version: f4.22.26225 a1.1 n4.24 e12ef
Looks like the firmware on this NIC is even older.

I tried to update the firmware with nvmupdate64e and the first thing I
noticed is that you cannot update the firmware even with today's Linux
git. The tool errors out because it cannot access the NVM. Only with a
recent net-next kernel was I able to update the firmware.
ethtool -i p4p1
driver: i40e
version: 1.2.37-k
firmware-version: f4.33.31377 a1.2 n4.42 e1932

However during the update I got a lot of errors in dmesg.
[  301.796664] i40e :82:00.0: ARQ Error: Unknown event 0x0702 received
[  301.893933] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received
[  302.005223] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received
[...]
[  387.884635] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received
[  387.896862] i40e :82:00.0: ARQ Overflow Error detected
[  387.902995] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received
[...]
[  391.583799] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
[  391.714217] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
[  391.842656] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
[  391.973080] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
[  392.107586] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
[  392.244140] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
[  392.373966] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received

Not sure if that flash was actually successful or not.

  Stefan

Re: [PATCH 8/9] qspinlock: Generic paravirt support

2015-03-19 Thread Peter Zijlstra

On Wed, Mar 18, 2015 at 04:50:37PM -0400, Waiman Long wrote:
 +this_cpu_write(__pv_lock_wait, lock);
 
 We may run into the same problem of needing to have 4 queue nodes per CPU.
 If an interrupt happens just after the write and before the actual wait and
 it goes through the same sequence, it will overwrite the __pv_lock_wait[]
 entry. So we may have a lost wakeup. That is why the pvticket lock code did
 that just before the actual wait with interrupts disabled. We probably
 can't disable interrupts here. So we may need to move the actual write to
 the KVM and Xen code if we keep the current logic.

Hmm indeed. So I didn't actually mean to keep this code, but yes I
missed that.
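The overwrite Waiman describes can be reduced to a single-threaded model: with one wait slot per CPU, an interrupt that enters the same slow path replaces the outer context's lock pointer before that context halts, so a later search for the outer lock finds nothing. One slot per nesting level, in the spirit of the four MCS nodes per CPU, keeps both entries. The variables and helpers below are hypothetical and model a single CPU.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NEST 4	/* task, softirq, hardirq, NMI */

/* One slot per CPU: the scheme that can lose a wakeup. */
static const void *wait_slot;

/* One slot per nesting level, indexed by a per-CPU nesting count. */
static const void *wait_slots[MAX_NEST];
static int nest;

static void single_slot_wait_begin(const void *lock) { wait_slot = lock; }

static int single_slot_waits_on(const void *lock) { return wait_slot == lock; }

static void nested_wait_begin(const void *lock)
{
	assert(nest < MAX_NEST);
	wait_slots[nest++] = lock;
}

static void nested_wait_end(void) { wait_slots[--nest] = NULL; }

/* Would an unlocker scanning for @lock find a waiter on this CPU? */
static int nested_waits_on(const void *lock)
{
	for (int i = 0; i < nest; i++)
		if (wait_slots[i] == lock)
			return 1;
	return 0;
}
```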

 +/*
 + * At this point the memory pointed at by lock can be freed/reused,
 + * however we can still use the pointer value to search in our cpu
 + * array.
 + *
 + * XXX: get rid of this loop
 + */
 +for_each_possible_cpu(cpu) {
 +if (per_cpu(__pv_lock_wait, cpu) == lock)
 +pv_kick(cpu);
 +}
 +}
 
 I do want to get rid of this loop too. On average, we have to scan about
 half the number of CPUs available. So it isn't that different
 performance-wise compared with my original idea of following the list from
 tail to head. And how about your idea of propagating the current head down
 the linked list?

Yeah, so I was half done with that patch when I fell asleep on the
flight home. Didn't get around to 'fixing' it. It applies and builds but
doesn't actually work.

_However_ I'm not sure I actually like it much.

The problem I have with it is these wait loops; they can generate the
same problem we're trying to avoid.

So I was now thinking of hashing the lock pointer; let me go and quickly
put something together.
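The head propagation in the patch that follows ends with a conditional store: the predecessor's head is copied only if this node's head is still NULL, so a newer head installed concurrently (by the pv_set_head() the patch comment mentions) is not overwritten. A C11 sketch of just that step, with a simplified hypothetical `struct node`, and without the patch's preceding wait for the predecessor's head to become non-NULL:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
	_Atomic(struct node *) head;
};

/*
 * Copy @prev's head into @n, but only if @n->head is still NULL.
 * Returns 1 if the value was propagated, 0 if @n already had a head.
 */
static int propagate_head(struct node *n, struct node *prev)
{
	struct node *expected = NULL;
	struct node *h = atomic_load(&prev->head);

	return atomic_compare_exchange_strong(&n->head, &expected, h);
}
```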

---
 kernel/locking/qspinlock.c          |    8 +-
 kernel/locking/qspinlock_paravirt.h |  124
 2 files changed, 101 insertions(+), 31 deletions(-)

--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -246,10 +246,10 @@ static __always_inline void set_locked(s
  */
 
 static __always_inline void __pv_init_node(struct mcs_spinlock *node) { }
-static __always_inline void __pv_wait_node(struct mcs_spinlock *node) { }
+static __always_inline void __pv_wait_node(u32 old, struct mcs_spinlock *node) { }
 static __always_inline void __pv_kick_node(struct mcs_spinlock *node) { }
 
-static __always_inline void __pv_wait_head(struct qspinlock *lock) { }
+static __always_inline void __pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node) { }
 
 #define pv_enabled()   false
 
@@ -399,7 +399,7 @@ void queue_spin_lock_slowpath(struct qsp
prev = decode_tail(old);
WRITE_ONCE(prev->next, node);
 
-   pv_wait_node(node);
+   pv_wait_node(old, node);
arch_mcs_spin_lock_contended(&node->locked);
}
 
@@ -414,7 +414,7 @@ void queue_spin_lock_slowpath(struct qsp
 * sequentiality; this is because the set_locked() function below
 * does not imply a full barrier.
 */
-   pv_wait_head(lock);
+   pv_wait_head(lock, node);
while ((val = smp_load_acquire(&lock->val.counter)) & _Q_LOCKED_PENDING_MASK)
cpu_relax();
 
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -29,8 +29,14 @@ struct pv_node {
 
int cpu;
u8  state;
+   struct pv_node  *head;
 };
 
+static inline struct pv_node *pv_decode_tail(u32 tail)
+{
+   return (struct pv_node *)decode_tail(tail);
+}
+
 /*
  * Initialize the PV part of the mcs_spinlock node.
  */
@@ -42,17 +48,49 @@ static void pv_init_node(struct mcs_spin
 
pn->cpu = smp_processor_id();
pn->state = vcpu_running;
+   pn->head = NULL;
+}
+
+static void pv_propagate_head(struct pv_node *pn, struct pv_node *ppn)
+{
+   /*
+    * When we race against the first waiter or ourselves we have to wait
+    * until the previous link updates its head pointer before we can
+    * propagate it.
+    */
+   while (!READ_ONCE(ppn->head)) {
+       /*
+        * queue_spin_lock_slowpath could have been waiting for the
+        * node->next store above before setting node->locked.
+        */
+       if (ppn->mcs.locked)
+           return;
+
+       cpu_relax();
+   }
+   /*
+    * If we interleave such that the store from pv_set_head() happens
+    * between our load and store we could have over-written the new head
+    * value.
+    *
+    * Therefore use cmpxchg to only propagate the old head value if our
+    * head value is unmodified.
+    */
+   (void)cmpxchg(&pn->head, NULL, READ_ONCE(ppn->head));
 }
 
 /*
  * Wait for node->locked to become true, halt the vcpu after a short spin.
  * pv_kick_node() is used to wake the

[3.16.y-ckt stable] Patch KVM: MIPS: Fix trace event to save PC directly has been added to staging queue

2015-03-19 Thread Luis Henriques

This is a note to let you know that I have just added a patch titled

KVM: MIPS: Fix trace event to save PC directly

to the linux-3.16.y-queue branch of the 3.16.y-ckt extended stable tree 
which can be found at:

 
http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.16.y-queue

This patch is scheduled to be released in version 3.16.7-ckt9.

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 3.16.y-ckt tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Luis

--

From d48fc17717a72401605b1f7f07ae96926e524c6f Mon Sep 17 00:00:00 2001
From: James Hogan james.ho...@imgtec.com
Date: Tue, 24 Feb 2015 11:46:20 +
Subject: KVM: MIPS: Fix trace event to save PC directly

commit b3cffac04eca9af46e1e23560a8ee22b1bd36d43 upstream.

Currently the guest exit trace event saves the VCPU pointer to the
structure, and the guest PC is retrieved by dereferencing it when the
event is printed rather than directly from the trace record. This isn't
safe as the printing may occur long afterwards, after the PC has changed
and potentially after the VCPU has been freed. Usually this results in
the same (wrong) PC being printed for multiple trace events. It also
isn't portable as userland has no way to access the VCPU data structure
when interpreting the trace record itself.

Let's save the actual PC in the structure so that the correct value is
accessible later.
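The difference between the two TP_STRUCT__entry layouts can be shown with plain structs standing in for trace records. This is an illustration only, with no tracepoint machinery; `struct vcpu` and the record helpers are hypothetical.

```c
#include <assert.h>

struct vcpu {
	unsigned long pc;
};

/* Old layout: the record keeps a pointer and dereferences it when printed. */
struct exit_rec_ptr {
	const struct vcpu *vcpu;
	unsigned int reason;
};

/* Fixed layout: the record copies the PC at the time of the exit. */
struct exit_rec_val {
	unsigned long pc;
	unsigned int reason;
};

static struct exit_rec_ptr record_ptr(const struct vcpu *v, unsigned int reason)
{
	struct exit_rec_ptr r = { v, reason };
	return r;
}

static struct exit_rec_val record_val(const struct vcpu *v, unsigned int reason)
{
	struct exit_rec_val r = { v->pc, reason };
	return r;
}
```

Once the guest runs on, the pointer-based record reports whatever the PC is at print time, while the copied value stays tied to the exit; the copy is also readable by userland tools that cannot follow a kernel pointer.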

Fixes: 669e846e6c4e ("KVM/MIPS32: MIPS arch specific APIs for KVM")
Signed-off-by: James Hogan james.ho...@imgtec.com
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Ralf Baechle r...@linux-mips.org
Cc: Marcelo Tosatti mtosa...@redhat.com
Cc: Gleb Natapov g...@kernel.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@redhat.com
Cc: linux-m...@linux-mips.org
Cc: kvm@vger.kernel.org
Acked-by: Steven Rostedt rost...@goodmis.org
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Luis Henriques luis.henriq...@canonical.com
---
 arch/mips/kvm/trace.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/mips/kvm/trace.h b/arch/mips/kvm/trace.h
index bc9e0f406c08..e51621e36152 100644
--- a/arch/mips/kvm/trace.h
+++ b/arch/mips/kvm/trace.h
@@ -26,18 +26,18 @@ TRACE_EVENT(kvm_exit,
TP_PROTO(struct kvm_vcpu *vcpu, unsigned int reason),
TP_ARGS(vcpu, reason),
TP_STRUCT__entry(
-   __field(struct kvm_vcpu *, vcpu)
+   __field(unsigned long, pc)
__field(unsigned int, reason)
),

TP_fast_assign(
-   __entry->vcpu = vcpu;
+   __entry->pc = vcpu->arch.pc;
__entry->reason = reason;
),

TP_printk("[%s]PC: 0x%08lx",
  kvm_mips_exit_types_str[__entry->reason],
- __entry->vcpu->arch.pc)
+ __entry->pc)
 );

 #endif /* _TRACE_KVM_H */

Re: [PATCH 8/9] qspinlock: Generic paravirt support

2015-03-19 Thread Peter Zijlstra

On Thu, Mar 19, 2015 at 11:12:42AM +0100, Peter Zijlstra wrote:
 So I was now thinking of hashing the lock pointer; let me go and quickly
 put something together.

A little something like so; ideally we'd allocate the hashtable since
NR_CPUS is kinda bloated, but it shows the idea I think.

And while this has loops in (the rehashing thing) their fwd progress
does not depend on other CPUs.

And I suspect that for the typical lock contention scenarios it's
unlikely we ever really get into long rehashing chains.
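The forward-progress claim rests on the LFSR: started from any nonzero value, a maximal-length register visits every nonzero n-bit value before repeating, so a probe sequence driven by it reaches every cache line of the table within a bounded number of steps, independent of other CPUs. The userspace check below reuses the patch's 4..14-bit tap masks (a right-shifting Galois register can only reach all states when bit n-1 of the mask is set) and measures the period; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Tap masks copied from the 4..14-bit entries of the patch's lfsr_taps(). */
static uint32_t taps_for(int bits)
{
	static const uint32_t taps[15] = {
		[4] = 0x0009, [5] = 0x0012, [6] = 0x0021, [7] = 0x0041,
		[8] = 0x008E, [9] = 0x0108, [10] = 0x0204, [11] = 0x0402,
		[12] = 0x0829, [13] = 0x100D, [14] = 0x2015,
	};

	return (bits >= 4 && bits <= 14) ? taps[bits] : 0;
}

/* One step of a right-shifting Galois LFSR, same form as the patch's lfsr(). */
static uint32_t lfsr_step(uint32_t val, int bits)
{
	uint32_t bit = val & 1;

	val >>= 1;
	if (bit)
		val ^= taps_for(bits);
	return val;
}

/* Steps until @start recurs; 2^bits - 1 means every nonzero state was visited. */
static uint32_t lfsr_period(uint32_t start, int bits)
{
	uint32_t val = start, n = 0;

	do {
		val = lfsr_step(val, bits);
		n++;
	} while (val != start && n <= (1u << bits));
	return n;
}
```

Zero maps to itself, which is why pv_hash() replaces a zero hash with 1 before probing.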

---
 include/linux/lfsr.h                |   49
 kernel/locking/qspinlock_paravirt.h |  143
 2 files changed, 178 insertions(+), 14 deletions(-)

--- /dev/null
+++ b/include/linux/lfsr.h
@@ -0,0 +1,49 @@
+#ifndef _LINUX_LFSR_H
+#define _LINUX_LFSR_H
+
+/*
+ * Simple Binary Galois Linear Feedback Shift Register
+ *
+ * http://en.wikipedia.org/wiki/Linear_feedback_shift_register
+ *
+ */
+
+extern void __lfsr_needs_more_taps(void);
+
+static __always_inline u32 lfsr_taps(int bits)
+{
+   if (bits ==  1) return 0x0001;
+   if (bits ==  2) return 0x0001;
+   if (bits ==  3) return 0x0003;
+   if (bits ==  4) return 0x0009;
+   if (bits ==  5) return 0x0012;
+   if (bits ==  6) return 0x0021;
+   if (bits ==  7) return 0x0041;
+   if (bits ==  8) return 0x008E;
+   if (bits ==  9) return 0x0108;
+   if (bits == 10) return 0x0204;
+   if (bits == 11) return 0x0402;
+   if (bits == 12) return 0x0829;
+   if (bits == 13) return 0x100D;
+   if (bits == 14) return 0x2015;
+
+   /*
+    * For more taps see:
+    *   http://users.ece.cmu.edu/~koopman/lfsr/index.html
+    */
+   __lfsr_needs_more_taps();
+
+   return 0;
+}
+
+static inline u32 lfsr(u32 val, int bits)
+{
+   u32 bit = val & 1;
+
+   val >>= 1;
+   if (bit)
+       val ^= lfsr_taps(bits);
+   return val;
+}
+
+#endif /* _LINUX_LFSR_H */
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -2,6 +2,9 @@
 #error do not include this file
 #endif
 
+#include <linux/hash.h>
+#include <linux/lfsr.h>
+
 /*
  * Implement paravirt qspinlocks; the general idea is to halt the vcpus instead
  * of spinning them.
@@ -107,7 +110,120 @@ static void pv_kick_node(struct mcs_spin
pv_kick(pn->cpu);
 }
 
-static DEFINE_PER_CPU(struct qspinlock *, __pv_lock_wait);
+/*
+ * Hash table using open addressing with an LFSR probe sequence.
+ *
+ * Since we should not be holding locks from NMI context (very rare indeed) the
+ * max load factor is 0.75, which is around the point where open addressing
+ * breaks down.
+ *
+ * Instead of probing just the immediate bucket we probe all buckets in the
+ * same cacheline.
+ *
+ * http://en.wikipedia.org/wiki/Hash_table#Open_addressing
+ *
+ */
+
+#define HB_RESERVED    ((struct qspinlock *)1)
+
+struct pv_hash_bucket {
+   struct qspinlock *lock;
+   int cpu;
+};
+
+/*
+ * XXX dynamic allocate using nr_cpu_ids instead...
+ */
+#define PV_LOCK_HASH_BITS  (2 + NR_CPUS_BITS)
+
+#if PV_LOCK_HASH_BITS < 6
+#undef PV_LOCK_HASH_BITS
+#define PV_LOCK_HASH_BITS  6
+#endif
+
+#define PV_LOCK_HASH_SIZE  (1 << PV_LOCK_HASH_BITS)
+
+static struct pv_hash_bucket __pv_lock_hash[PV_LOCK_HASH_SIZE] cacheline_aligned;
+
+#define PV_HB_PER_LINE (SMP_CACHE_BYTES / sizeof(struct pv_hash_bucket))
+
+static inline u32 hash_align(u32 hash)
+{
+   return hash & ~(PV_HB_PER_LINE - 1);
+}
+
+static struct qspinlock **pv_hash(struct qspinlock *lock)
+{
+   u32 hash = hash_ptr(lock, PV_LOCK_HASH_BITS);
+   struct pv_hash_bucket *hb, *end;
+
+   if (!hash)
+   hash = 1;
+
+   hb = &__pv_lock_hash[hash_align(hash)];
+   for (;;) {
+       for (end = hb + PV_HB_PER_LINE; hb < end; hb++) {
+           if (!cmpxchg(&hb->lock, NULL, HB_RESERVED)) {
+               WRITE_ONCE(hb->cpu, smp_processor_id());
+               /*
+                * Since we must read lock first and cpu
+                * second, we must write cpu first and lock
+                * second, therefore use HB_RESERVED to mark an
+                * entry in use before writing the values.
+                *
+                * This can cause hb_hash_find() to not find a
+                * cpu even though _Q_SLOW_VAL, this is not a
+                * problem since we re-check l->locked before
+                * going to sleep and the unlock will have
+                * cleared l->locked already.
+                */
+               smp_wmb(); /* matches rmb from pv_hash_find */
+               WRITE_ONCE(hb->lock, lock);
+               goto done;
+           }
+
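A userspace model of the same scheme, under stated assumptions: 64 buckets, four buckets per cache line, a multiplicative hash standing in for hash_ptr(), and advancing to the next line with the 6-bit LFSR. The quoted patch is cut off before its rehash step, so that step is an assumption here, as are all names. A bucket is claimed by a compare-and-exchange that succeeds only when the bucket is empty (NULL); the cpu is stored next and the lock pointer is then published with release ordering, mirroring the HB_RESERVED plus smp_wmb() sequence, while the lookup uses acquire ordering to match.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_BITS   6			/* 64 buckets, the patch's minimum */
#define HASH_SIZE   (1u << HASH_BITS)
#define HB_PER_LINE 4u			/* assumed buckets per cache line */
#define HB_RESERVED ((void *)1)

struct bucket {
	_Atomic(void *) lock;
	_Atomic int cpu;
};

static struct bucket table[HASH_SIZE];

/* Hypothetical stand-in for hash_ptr(): top HASH_BITS bits of a 64-bit multiply. */
static uint32_t hash_ptr_sketch(const void *p)
{
	uint64_t x = (uint64_t)(uintptr_t)p * 0x9E3779B97F4A7C15ull;

	return (uint32_t)(x >> (64 - HASH_BITS));
}

/* Galois LFSR step with the patch's 6-bit tap mask (0x21). */
static uint32_t lfsr6(uint32_t val)
{
	uint32_t bit = val & 1;

	val >>= 1;
	if (bit)
		val ^= 0x21;
	return val;
}

static uint32_t first_probe(const void *lock)
{
	uint32_t hash = hash_ptr_sketch(lock);

	return hash ? hash : 1;		/* 0 is a fixed point of the LFSR */
}

/* First bucket of the cache line that @hash falls in. */
static struct bucket *line_of(uint32_t hash)
{
	return &table[hash & ~(HB_PER_LINE - 1)];
}

/* Claim a bucket for @lock on behalf of @cpu; NULL if the table is full. */
static struct bucket *hash_insert(void *lock, int cpu)
{
	uint32_t hash = first_probe(lock);

	for (uint32_t step = 0; step < HASH_SIZE; step++, hash = lfsr6(hash)) {
		struct bucket *hb = line_of(hash);

		for (uint32_t i = 0; i < HB_PER_LINE; i++, hb++) {
			void *expected = NULL;

			/* Succeeds only if the bucket was empty; it is now ours. */
			if (atomic_compare_exchange_strong(&hb->lock, &expected, HB_RESERVED)) {
				atomic_store_explicit(&hb->cpu, cpu, memory_order_relaxed);
				/* Release: cpu becomes visible before the lock pointer. */
				atomic_store_explicit(&hb->lock, lock, memory_order_release);
				return hb;
			}
		}
	}
	return NULL;
}

/* Return the cpu recorded for @lock, or -1 if it is not in the table. */
static int hash_find(const void *lock)
{
	uint32_t hash = first_probe(lock);

	for (uint32_t step = 0; step < HASH_SIZE; step++, hash = lfsr6(hash)) {
		struct bucket *hb = line_of(hash);

		for (uint32_t i = 0; i < HB_PER_LINE; i++, hb++) {
			/* Acquire pairs with the release store in hash_insert(). */
			if (atomic_load_explicit(&hb->lock, memory_order_acquire) == lock)
				return atomic_load_explicit(&hb->cpu, memory_order_relaxed);
		}
	}
	return -1;
}
```

Because the 6-bit LFSR visits all 63 nonzero values, 64 probe steps touch every cache line, so an insert fails only when all buckets are taken. The lookup here never stops early at an empty bucket, so a miss scans the whole sequence.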

Re: [Patch v5] x86: irq_comm: Add check for RH bit in kvm_set_msi_irq

2015-03-19 Thread Radim Krčmář

2015-03-18 22:09-0300, Marcelo Tosatti:
   See native_compose_msi_msg:
   ((apic->irq_dest_mode == 0) ?
   MSI_ADDR_DEST_MODE_PHYSICAL :
   MSI_ADDR_DEST_MODE_LOGICAL) |
   ((apic->irq_delivery_mode != dest_LowestPrio) ?
   MSI_ADDR_REDIRECTION_CPU :
   MSI_ADDR_REDIRECTION_LOWPRI) |
   So it does configure DM = MSI_ADDR_DEST_MODE_LOGICAL
   and RH = MSI_ADDR_REDIRECTION_LOWPRI.
  ...and yet this is a good counterexample against my argument :)

(It could be just to make the code nicer ... the developer might have
 known how real hardware will handle it.)

  What I think I'll do is revert this particular change so that dest_mode is
  set independently of RH. While I'm not entirely convinced that this is the
  intended interpretation, I do think that consistency with the existing logic
  is probably desirable for the time being. If I can get closure on the matter
  I'll re-submit that change, but for the time being I will undo it.
 Just write MSI-X table entries on real hardware (say: modify
 native_compose_msi_msg or MSI-X equivalent), with all RH/DM
 combinations, and see what behaviour
 comes up?  

I second this idea.
(We'd also get to know how RH interacts with delivery mode.)

https://software.intel.com/en-us/forums/topic/23
said that DM=1+RH=0 delivers to physical:

  The exact quote from 10.11.1 is "When RH is 0, the interrupt is
  directed to the processor listed in the Destination ID field."
  This does not specify if physical or logical addressing mode is used.

  Experimentation shows that physical addressing mode is used
  with RH equal to zero.

and it also mentioned a disturbing behavior, which I chose to ignore:

  10.11.1 goes on to say that "When RH is 1 and the physical destination
  mode is used [i.e., DM = 0], the Destination ID field must not be set
  to 0xFF; it must point to a processor that is present and enabled to
  receive the interrupt."

  This would seem to be the exact same case as RH equal to zero;
  there, DM is ignored: "If RH is 0, then the DM bit is
  ignored and the message is sent ahead independent of whether
  the physical or logical destination mode is used."

  However, changing RH to 1 and DM to zero fails to send the message
  to the physical processor.

[Bug 82211] Cannot boot Xen under KVM with X2APIC enabled

2015-03-19 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=82211

--- Comment #12 from Zhou, Chao chao.z...@intel.com ---
(In reply to Radim Krčmář from comment #11)
 Should be fixed with KVM: x86: call irq notifiers with directed EOI,
 (http://www.spinics.net/lists/kernel/msg1949367.html)
 
 can you check if it is?
 
 Thanks.

test this patch with kvm.git:4ff6f8e61eb7f96d3ca535c6d240f863ccd6fb7d
kernel version: 4.0.0-rc1+
qemu.git:cd232acfa0d70002fed89e9293f04afda577a513
test on Haswell_EP and Ivytown_EP

The L1 guest (Xen on KVM) boots up and works fine.


Re: [PATCH] KVM: x86: call irq notifiers with directed EOI

2015-03-19 Thread Paolo Bonzini



On 18/03/2015 19:38, Radim Krčmář wrote:
 kvm_ioapic_update_eoi() wasn't called if directed EOI was enabled.
 We need to do that for irq notifiers.  (Like with edge interrupts.)
 
 Fix it by skipping EOI broadcast only.
 
 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=82211
 Signed-off-by: Radim Krčmář rkrc...@redhat.com
 ---
  arch/x86/kvm/ioapic.c | 4 +++-
  arch/x86/kvm/lapic.c  | 3 +--
  2 files changed, 4 insertions(+), 3 deletions(-)
 
 diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
 index b1947e0f3e10..46d4449772bc 100644
 --- a/arch/x86/kvm/ioapic.c
 +++ b/arch/x86/kvm/ioapic.c
 @@ -422,6 +422,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
   struct kvm_ioapic *ioapic, int vector, int trigger_mode)
  {
   int i;
 + struct kvm_lapic *apic = vcpu->arch.apic;
  
   for (i = 0; i < IOAPIC_NUM_PINS; i++) {
   union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
 @@ -443,7 +444,8 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
   kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
   spin_lock(&ioapic->lock);
  
 - if (trigger_mode != IOAPIC_LEVEL_TRIG)
 + if (trigger_mode != IOAPIC_LEVEL_TRIG ||
 + kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
   continue;
  
   ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index bd4e34de24c7..4ee827d7bf36 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -833,8 +833,7 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
  
  static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
  {
 - if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
 - kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
 + if (kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {

Can you look into making kvm_ioapic_handles_vector inline in ioapic.h?

Anyway,

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Paolo

   int trigger_mode;
   if (apic_test_vector(vector, apic->regs + APIC_TMR))
   trigger_mode = IOAPIC_LEVEL_TRIG;
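The fix can be summarized as a small decision. IRQ ack notifiers must run on every EOI, as they already did for edge interrupts. Only the EOI broadcast to the IOAPIC is skipped when the guest has enabled directed EOI in the Spurious-Interrupt Vector register. The following standalone C model of that decision is not KVM code. The struct, fields, and function name are hypothetical, and the SPIV bit value (bit 12) is an assumption that mirrors APIC_SPIV_DIRECTED_EOI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPIV_DIRECTED_EOI (1u << 12)  /* assumed: mirrors APIC_SPIV_DIRECTED_EOI */

struct eoi_result {
    bool notified;   /* irq ack notifiers were called */
    bool broadcast;  /* IOAPIC-side EOI handling (remote IRR clear) was done */
};

/* Hypothetical model of the patched flow for one IOAPIC pin matching the vector. */
static struct eoi_result model_ioapic_eoi(uint32_t spiv, bool level_triggered)
{
    struct eoi_result r = { .notified = true, .broadcast = false };

    /* After the patch, notifiers run even with directed EOI enabled;
     * only the broadcast part is skipped. */
    if (level_triggered && !(spiv & SPIV_DIRECTED_EOI))
        r.broadcast = true;
    return r;
}
```

Before the patch, the lapic side skipped the whole IOAPIC EOI path when directed EOI was set, so `notified` would also have been false in that case. That missing notification is the bug this patch fixes.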
 

Re: regression: nested: L1 3.15+ fails to load kvm-intel on L0 3.10

2015-03-19 Thread Stefan Bader

On 18.03.2015 10:18, Paolo Bonzini wrote:
 
 
 On 18/03/2015 09:46, Stefan Bader wrote:

 Regardless of that, I wonder whether the below (this version untested) sounds
 acceptable for upstream? At least it would make debugging much simpler. :)

 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -2953,8 +2953,11 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ct
 ctl |= vmx_msr_low;  /* bit == 1 in low word  == must be one  */

 /* Ensure minimum (required) set of control bits are supported. */
 -   if (ctl_min & ~ctl)
 +   if (ctl_min & ~ctl) {
 +   printk(KERN_ERR "vmx: msr(%08x) does not match requirements. "
 +   "req=%08x cur=%08x\n", msr, ctl_min, ctl);
 return -EIO;
 +   }

 *result = ctl;
 return 0;
 
 Yes, this is nice.  Maybe -ENODEV.
 
 Also, a minimal patch for Ubuntu would probably be:
 
 @@ -2850,7 +2851,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 vmx_capability.ept, vmx_capability.vpid);
   }
  
 - min = 0;
 + min = VM_EXIT_SAVE_DEBUG_CONTROLS;
  #ifdef CONFIG_X86_64
   min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
  #endif
 
 but I don't think it's a good idea to add it to stable kernels.

Sorry, I got a bit confused in my assumptions. The change above does cause
guests to fail, but my statement that this is caused by older host kernels
was wrong, and I should have known better.

The actual case is hosts running 3.2, which allowed nested VMX to be used
(maybe not perfectly, but at least well enough): there, guest kernels 3.15+
break. Running 3.13 on the host has no issues.

Comparing the rdmsr values of guests between those two host kernels, I found
that on 3.2 the exit control msr was very sparsely initialized. And looking at
the changes between 3.2 and 3.13 I found

commit 33fb20c39e98b90813b5ab2d9a0d6faa6300caca
Author: Jan Kiszka jan.kis...@siemens.com
Date:   Wed Mar 6 15:44:03 2013 +0100

KVM: nVMX: Fix content of MSR_IA32_VMX_ENTRY/EXIT_CTLS

This was added in 3.10, so the affected kernels range from 3.10 back to when
nested VMX became somewhat usable. For 3.2, Ben (and obviously we) would be
affected. Apart from that, I believe only 3.4 still has an active longterm
branch. At least that change looks safer for stable, since it corrects
things rather than adding a feature. I was able to cherry-pick it into a
3.2 kernel, and a 3.16 guest can then successfully load the kvm-intel module
again, of course with the same shortcomings as before.

-Stefan
 
 Paolo
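The failure Stefan describes comes from the capability check in adjust_vmx_controls() quoted above. A control bit is usable only if the MSR's high word allows it to be 1, and any required bit (ctl_min) that is not allowed makes the function fail. Below is a minimal standalone C sketch of that check. It assumes VM_EXIT_SAVE_DEBUG_CONTROLS is bit 2 and VM_EXIT_HOST_ADDR_SPACE_SIZE is bit 9 of the VM-exit controls. The function name is hypothetical, and a plain 64-bit argument stands in for rdmsr().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define VM_EXIT_SAVE_DEBUG_CONTROLS  0x00000004u  /* bit 2 (assumed) */
#define VM_EXIT_HOST_ADDR_SPACE_SIZE 0x00000200u  /* bit 9 (assumed) */

/*
 * Sketch of the adjust_vmx_controls() logic: the low 32 bits of the
 * capability MSR give bits that must be 1, the high 32 bits give bits
 * that may be 1. 'msr_value' stands in for rdmsr(msr, low, high).
 */
static int adjust_controls_sketch(uint32_t ctl_min, uint32_t ctl_opt,
                                  uint64_t msr_value, uint32_t *result)
{
    uint32_t vmx_msr_low = (uint32_t)msr_value;
    uint32_t vmx_msr_high = (uint32_t)(msr_value >> 32);
    uint32_t ctl = ctl_min | ctl_opt;

    ctl &= vmx_msr_high; /* bit == 0 in high word ==> must be zero */
    ctl |= vmx_msr_low;  /* bit == 1 in low word  ==> must be one  */

    /* Ensure the minimum (required) set of control bits is supported. */
    if (ctl_min & ~ctl) {
        fprintf(stderr, "vmx: does not match requirements. req=%08x cur=%08x\n",
                (unsigned)ctl_min, (unsigned)ctl);
        return -EIO;
    }
    *result = ctl;
    return 0;
}
```

For example, if the high word allows only bit 9 (a sparsely initialized exit-control MSR), requiring bit 2 as well makes the sketch return -EIO. This is the situation the proposed printk would make visible.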
 





Re: [PATCH] vfio: Split virqfd into a separate module for vfio bus drivers

2015-03-19 Thread Paul Bolle

On Wed, 2015-03-18 at 11:27 -0600, Alex Williamson wrote:
 --- a/drivers/vfio/virqfd.c
 +++ b/drivers/vfio/virqfd.c

 +#define DRIVER_VERSION  "0.1"
 +#define DRIVER_AUTHOR   "Alex Williamson <alex.william...@redhat.com>"
 +#define DRIVER_DESC "IRQFD support for VFIO bus drivers"

 +MODULE_VERSION(DRIVER_VERSION);
 +MODULE_LICENSE("GPL v2");
 +MODULE_AUTHOR(DRIVER_AUTHOR);
 +MODULE_DESCRIPTION(DRIVER_DESC);

Why bother with those three defines? They're all used just once, aren't
they?


Paul Bolle


Re: [GSoC] project proposal

2015-03-19 Thread Paolo Bonzini



On 19/03/2015 19:38, Catalin Vasile wrote:
 I have submitted my application on the official GSoC site.
 Do I also have to submit it on this discussion list or anywhere else?

No, I can see the application.  Thanks!

Paolo

Re: [PATCH] KVM: x86: call irq notifiers with directed EOI

2015-03-19 Thread Radim Krčmář

2015-03-19 18:24+0100, Paolo Bonzini:
  diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
  @@ -833,8 +833,7 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
  -   if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
  -   kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
  +   if (kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
 
 Can you look into making kvm_ioapic_handles_vector inline in ioapic.h?

Will do.

 Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Thanks.

[Bug 93251] qemu-kvm guests randomly hangs after reboot command in guest

2015-03-19 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=93251

Thomas Stein himbe...@meine-oma.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |himbe...@meine-oma.de

--- Comment #3 from Thomas Stein himbe...@meine-oma.de ---
Hello.

I have the very same error on my kvm machines.

Mar 19 20:35:07 hn kernel: qemu-system-x86: page allocation failure: order:4, mode:0x40d0
Mar 19 20:35:07 hn kernel: CPU: 0 PID: 7029 Comm: qemu-system-x86 Not tainted 3.19.2 #1
Mar 19 20:35:07 hn kernel: Hardware name: System manufacturer System Product Name/P8B WS, BIOS 0802 08/29/2011
Mar 19 20:35:07 hn kernel:   88038857f9d8 81711fce 0001
Mar 19 20:35:07 hn kernel:  40d0 88038857fa68 81108d4e 88042fded838
Mar 19 20:35:07 hn kernel:  0380 03c0 0286 ffff
Mar 19 20:35:07 hn kernel: Call Trace:
Mar 19 20:35:07 hn kernel:  [81711fce] dump_stack+0x45/0x57
Mar 19 20:35:07 hn kernel:  [81108d4e] warn_alloc_failed+0xce/0x120
Mar 19 20:35:07 hn kernel:  [8110bc24] __alloc_pages_nodemask+0x614/0x850
Mar 19 20:35:07 hn kernel:  [8140da5f] ? megasas_fire_cmd_gen2+0x3f/0x50
Mar 19 20:35:07 hn kernel:  [8114783c] alloc_pages_current+0x8c/0x100
Mar 19 20:35:07 hn kernel:  [81108eac] alloc_kmem_pages+0x2c/0xd0
Mar 19 20:35:07 hn kernel:  [811227f3] kmalloc_order+0x13/0x50
Mar 19 20:35:07 hn kernel:  [8112284f] kmalloc_order_trace+0x1f/0x90
Mar 19 20:35:07 hn kernel:  [811511b9] __kmalloc_track_caller+0x159/0x250
Mar 19 20:35:07 hn kernel:  [8111ce1b] kmemdup+0x1b/0x40
Mar 19 20:35:07 hn kernel:  [a0004aab] __kvm_set_memory_region+0x1eb/0x950 [kvm]
Mar 19 20:35:07 hn kernel:  [8115aca0] ? mem_cgroup_uncharge_swap+0x20/0xa0
Mar 19 20:35:07 hn kernel:  [8113e3a8] ? swap_entry_free+0x198/0x310
Mar 19 20:35:07 hn kernel:  [a000523a] kvm_set_memory_region+0x2a/0x50 [kvm]
Mar 19 20:35:07 hn kernel:  [a000577f] kvm_vm_ioctl+0x51f/0x760 [kvm]
Mar 19 20:35:07 hn kernel:  [81170d30] do_vfs_ioctl+0x2e0/0x4e0
Mar 19 20:35:07 hn kernel:  [8107666f] ? vtime_account_user+0x3f/0x60
Mar 19 20:35:07 hn kernel:  [81170fb1] SyS_ioctl+0x81/0xa0
Mar 19 20:35:07 hn kernel:  [81719ff2] system_call_fastpath+0x12/0x17

Is anyone working on this?

thanks and best regards
t.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: regression: nested: L1 3.15+ fails to load kvm-intel on L0 3.10

2015-03-19 Thread Paolo Bonzini



On 19/03/2015 20:58, Stefan Bader wrote:
 This was added in 3.10. So the range of kernels affected 3.10 back
 to when nested vmx became somewhat usable. For 3.2 Ben (and
 obviously us) would be affected. Apart from that, I believe, it is
 only 3.4 which has an active longterm. At least that change looks
 safer for stable as it sounds like correcting things and not adding
 a feature. I was able to cherry-pick that into a 3.2 kernel and
 then a 3.16 guest successfully can load the kvm-intel module again,
 of course with the same shortcomings as before.

Feel free to backport whatever you want to distro kernels.  But I'm
going to NACK for stable@ anything that is related to nested virt.

The code has changed so much that I simply cannot do a meaningful
review of most patches when applied to old codebases.

Paolo

Re: [PATCH] vfio: Split virqfd into a separate module for vfio bus drivers

2015-03-19 Thread Alex Williamson

On Thu, 2015-03-19 at 20:32 +0100, Paul Bolle wrote:
 On Wed, 2015-03-18 at 11:27 -0600, Alex Williamson wrote:
  --- a/drivers/vfio/virqfd.c
  +++ b/drivers/vfio/virqfd.c
 
  +#define DRIVER_VERSION  "0.1"
  +#define DRIVER_AUTHOR   "Alex Williamson <alex.william...@redhat.com>"
  +#define DRIVER_DESC "IRQFD support for VFIO bus drivers"
 
  +MODULE_VERSION(DRIVER_VERSION);
  +MODULE_LICENSE("GPL v2");
  +MODULE_AUTHOR(DRIVER_AUTHOR);
  +MODULE_DESCRIPTION(DRIVER_DESC);
 
 Why bother with those three defines? They're all used just once, aren't
 they?

Well, at some point when I was doing vfio it seemed like a good idea and
I copied it from another driver.  The core vfio module printed the
description and version on driver loading, but the other modules that
comprise vfio just followed along for consistency.  Now we even have a
few more modules that comprise vfio, and they all follow this same
standard, even though they don't use the defines elsewhere.  The macro
names are pretty standard, so I don't think I'm infringing on any name
space by using them, it seems harmless otherwise and maintains
consistency within the driver.  Is it more valuable to remove a few
lines of source code with no net effect on the resulting output?
Besides, look at how much more aesthetically pleasing the above is
versus this:

MODULE_VERSION("0.1");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Alex Williamson <alex.william...@redhat.com>");
MODULE_DESCRIPTION("IRQFD support for VFIO bus drivers");

;)

Thanks,

Alex


Re: [PATCH/RFC 4/9] KVM: s390: Add MEMOP ioctls for reading/writing guest memory

2015-03-19 Thread Thomas Huth

 Date: Wed, 18 Mar 2015 21:23:48 -0300
 From: Marcelo Tosatti mtosa...@redhat.com

 On Mon, Mar 16, 2015 at 09:51:40AM +0100, Christian Borntraeger wrote:
  From: Thomas Huth th...@linux.vnet.ibm.com

  On s390, we've got to make sure to hold the IPTE lock while accessing
  logical memory. So let's add an ioctl for reading and writing logical
  memory to provide this feature for userspace, too.
...

 What I was wondering is why you can't translate the address
 in the kernel and let userspace perform the actual read/write?

The idea here is to protect the read/write access with the ipte-lock, too.
That way, the whole address translation _and_ the read/write access are
protected together against invalidate-page-table operations from other
CPUs, so the whole memory access looks atomic to other VCPUs. And since we
do not want to expose the ipte lock directly to user space, both have to be
done in the kernel.

We already had a long internal discussion about this in our team, and
indeed, if the ipte-lock were the only problem that we face on s390,
we might also come up with a solution where the memory read/write access
is done in userspace instead. However, for full architecture compliance,
we will later have to support the so-called storage keys during memory
accesses, too, and this can hardly be done accurately and safely from
userspace. So I'm afraid it's somewhat ugly that we have to provide an
ioctl here to read/write the guest memory, but it's the only feasible
solution that I could think of.

 Thomas
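Thomas's argument is that translation and access must sit in one critical section. Below is a minimal userspace C sketch of that idea. A pthread mutex stands in for the s390 IPTE lock, and a toy one-level page table stands in for real DAT translation. All names, sizes, and the page-table format are hypothetical, storage keys are not modeled, and this is not the KVM MEMOP implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 16u   /* toy page size for the sketch */
#define NR_PAGES  4u

static uint8_t guest_mem[NR_PAGES * PAGE_SIZE];  /* "absolute" guest memory */
static int page_table[NR_PAGES];                 /* logical page -> frame, -1 = invalid */
static pthread_mutex_t ipte_lock = PTHREAD_MUTEX_INITIALIZER; /* IPTE lock stand-in */

/* Invalidation (e.g. by another VCPU) takes the same lock. */
static void invalidate_page(unsigned lpage)
{
    pthread_mutex_lock(&ipte_lock);
    page_table[lpage] = -1;
    pthread_mutex_unlock(&ipte_lock);
}

/*
 * Read 'len' bytes at logical address 'laddr'. Translation and copy happen
 * under one lock, so no invalidation can slip in between them.
 * Returns 0 on success, -1 on a translation fault or bad range.
 */
static int memop_read(uint32_t laddr, void *buf, size_t len)
{
    unsigned lpage = laddr / PAGE_SIZE;
    uint32_t off = laddr % PAGE_SIZE;
    int ret = -1;

    /* Keep the toy simple: no accesses that cross a page boundary. */
    if (lpage >= NR_PAGES || off + len > PAGE_SIZE)
        return -1;

    pthread_mutex_lock(&ipte_lock);
    if (page_table[lpage] >= 0) {
        memcpy(buf, &guest_mem[page_table[lpage] * PAGE_SIZE + off], len);
        ret = 0;
    }
    pthread_mutex_unlock(&ipte_lock);
    return ret;
}
```

Because both paths take the same lock, a concurrent invalidate_page() either completes before the translation or waits until the copy has finished. In this model, the access therefore looks atomic to other VCPUs.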


Re: [Qemu-devel] PCI passthrough of 40G ethernet interface (Openstack/KVM)

2015-03-19 Thread jacob jacob

I was going to try this on Fedora 21... now I'm not so sure that makes
much sense.

On Thu, Mar 19, 2015 at 4:15 AM, Stefan Assmann sassm...@redhat.com wrote:
 On 18.03.2015 23:06, Shannon Nelson wrote:
 On Wed, Mar 18, 2015 at 3:01 PM, Shannon Nelson
 shannon.nel...@intel.com wrote:


 On Wed, Mar 18, 2015 at 8:40 AM, jacob jacob opstk...@gmail.com wrote:

 On Wed, Mar 18, 2015 at 11:24 AM, Bandan Das b...@redhat.com wrote:

 Actually, Stefan suggests that support for this card is still sketchy
 and your best bet is to try out net-next
 http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git

 Also, could you please post more information about your hardware setup
 (chipset/processor/firmware version on the card etc) ?

 Host CPU : Model name:Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz

 Manufacturer Part Number:  XL710QDA1BLK
 Ethernet controller: Intel Corporation Ethernet Controller XL710 for
 40GbE QSFP+ (rev 01)
  #ethtool -i enp9s0
 driver: i40e
 version: 1.2.6-k
 firmware-version: f4.22 a1.1 n04.24 e800013fd
 bus-info: :09:00.0
 supports-statistics: yes
 supports-test: yes
 supports-eeprom-access: yes
 supports-register-dump: yes
 supports-priv-flags: no


 Jacob,

 It looks like you're using a NIC with the e800013fd firmware from last
 summer, and from a separate message that you saw these issues with
 both the 1.2.2-k and the 1.2.37 version drivers.  I suggest the next
 step would be to update the NIC firmware as there are some performance
 and stability updates available that deal with similar issues.  Please
 see the Intel Networking support webpage at
 https://downloadcenter.intel.com/download/24769 and look for the
 NVMUpdatePackage.zip.  This should take care of several of the things
 Stefan might describe as sketchy :-).

 Interesting, the following might explain why my XL710 feels a bit
 sketchy then. ;-)
 # ethtool -i p4p1
 driver: i40e
 version: 1.2.37-k
 firmware-version: f4.22.26225 a1.1 n4.24 e12ef
 Looks like the firmware on this NIC is even older.

 I tried to update the firmware with nvmupdate64e and the first thing I
 noticed is that you cannot update the firmware even with today's Linux
 git. The tool errors out because it cannot access the NVM. Only with a
 recent net-next kernel was I able to update the firmware.
 ethtool -i p4p1
 driver: i40e
 version: 1.2.37-k
 firmware-version: f4.33.31377 a1.2 n4.42 e1932

 However during the update I got a lot of errors in dmesg.
 [  301.796664] i40e :82:00.0: ARQ Error: Unknown event 0x0702 received
 [  301.893933] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received
 [  302.005223] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received
 [...]
 [  387.884635] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received
 [  387.896862] i40e :82:00.0: ARQ Overflow Error detected
 [  387.902995] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received
 [...]
 [  391.583799] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  391.714217] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  391.842656] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  391.973080] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  392.107586] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  392.244140] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  392.373966] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received

 Not sure if that flash was actually successful or not.

   Stefan

Re: [Qemu-devel] PCI passthrough of 40G ethernet interface (Openstack/KVM)

2015-03-19 Thread Stefan Assmann

On 19.03.2015 15:04, jacob jacob wrote:
 Hi Stefan,
 have you been able to get PCI passthrough working without any issues
 after the upgrade?

My XL710 fails to transfer regular TCP traffic (netperf). If that works
for you then you're already one step ahead of me. Afraid I can't help
you there.

  Stefan

 Thanks
 Jacob
 


Re: [Qemu-devel] PCI passthrough of 40G ethernet interface (Openstack/KVM)

2015-03-19 Thread jacob jacob

Hi Stefan,
have you been able to get PCI passthrough working without any issues
after the upgrade?
Thanks
Jacob


Re: [PATCH 8/9] qspinlock: Generic paravirt support

2015-03-19 Thread Peter Zijlstra

On Thu, Mar 19, 2015 at 01:25:36PM +0100, Peter Zijlstra wrote:
 +static struct qspinlock **pv_hash(struct qspinlock *lock)
 +{
 + u32 hash = hash_ptr(lock, PV_LOCK_HASH_BITS);
 + struct pv_hash_bucket *hb, *end;
 +
 + if (!hash)
 + hash = 1;
 +
 + hb = &__pv_lock_hash[hash_align(hash)];
 + for (;;) {
 + for (end = hb + PV_HB_PER_LINE; hb < end; hb++) {
 + if (cmpxchg(&hb->lock, NULL, HB_RESERVED)) {

That should be: !cmpxchg(); a bit disturbing that that booted.

 + WRITE_ONCE(hb->cpu, smp_processor_id());
 + /*
 +  * Since we must read lock first and cpu
 +  * second, we must write cpu first and lock
 +  * second, therefore use HB_RESERVE to mark an
 +  * entry in use before writing the values.
 +  *
 +  * This can cause hb_hash_find() to not find a
 +  * cpu even though _Q_SLOW_VAL, this is not a
 +  * problem since we re-check l->locked before
 +  * going to sleep and the unlock will have
 +  * cleared l->locked already.
 +  */
 + smp_wmb(); /* matches rmb from pv_hash_find */
 + WRITE_ONCE(hb->lock, lock);
 + goto done;
 + }
 + }
 +
 + hash = lfsr(hash, PV_LOCK_HASH_BITS);
 + hb = &__pv_lock_hash[hash_align(hash)];
 + }
 +
 +done:
 + return &hb->lock;
 +}
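To make the probing scheme concrete, here is a minimal single-file C11 sketch of the same idea. Buckets are grouped per cache line, a slot is claimed with compare-and-swap, and an LFSR step picks the next line when one is full. This is not the kernel code. The table sizes, the pointer hash (a stand-in for hash_ptr()), and all names are hypothetical. The 4-bit LFSR uses tap value 0x9, the value lfsr_taps(4) returns in the lfsr.h header proposed earlier in this thread. Note that C11 atomic_compare_exchange_strong() returns true on success, whereas the kernel's cmpxchg() returns the old value. That difference is why the kernel test needs the '!' Peter points out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_BITS   4u                 /* hypothetical: 16 buckets */
#define HASH_SIZE   (1u << HASH_BITS)
#define HB_PER_LINE 4u                 /* hypothetical: buckets per cache line */

struct qlock { int locked; };          /* stand-in for struct qspinlock */

struct hash_bucket {
    _Atomic(struct qlock *) lock;
    int cpu;
};

static struct hash_bucket lock_hash[HASH_SIZE];
static struct qlock reserved;          /* its address serves as HB_RESERVED */
#define HB_RESERVED (&reserved)

/* Galois LFSR step, same shape as lfsr() in lfsr.h, with 4-bit taps 0x9. */
static uint32_t lfsr4(uint32_t val)
{
    uint32_t bit = val & 1;

    val >>= 1;
    if (bit)
        val ^= 0x9;
    return val;
}

/* Hypothetical stand-in for hash_ptr(): multiplicative hash to HASH_BITS bits. */
static uint32_t hash_lock(const struct qlock *l)
{
    uint32_t h = (uint32_t)(((uint64_t)(uintptr_t)l * 0x9E3779B97F4A7C15ull)
                            >> (64 - HASH_BITS));

    return h ? h : 1;                  /* 0 is a fixed point of the LFSR */
}

/* First bucket of the line 'hash' selects (both sizes are powers of two). */
static struct hash_bucket *line_of(uint32_t hash)
{
    return &lock_hash[hash & (HASH_SIZE - HB_PER_LINE)];
}

/* Claim a bucket for 'l'; assumes the table never fills (else loops forever). */
static struct hash_bucket *hash_insert(struct qlock *l, int cpu)
{
    uint32_t hash = hash_lock(l);

    for (;;) {
        struct hash_bucket *hb = line_of(hash), *end = hb + HB_PER_LINE;

        for (; hb < end; hb++) {
            struct qlock *expected = NULL;

            /* True only if the slot was empty and is now reserved. */
            if (atomic_compare_exchange_strong(&hb->lock, &expected, HB_RESERVED)) {
                hb->cpu = cpu;                        /* write cpu first... */
                atomic_store_explicit(&hb->lock, l,   /* ...then publish lock */
                                      memory_order_release);
                return hb;
            }
        }
        hash = lfsr4(hash);            /* line full: probe another line */
    }
}

/* Return the cpu recorded for 'l', or -1 if no bucket holds it. */
static int hash_find(const struct qlock *l)
{
    uint32_t hash = hash_lock(l);

    /* The LFSR period (15) covers every line within HASH_SIZE probes. */
    for (unsigned probes = 0; probes < HASH_SIZE; probes++) {
        struct hash_bucket *hb = line_of(hash), *end = hb + HB_PER_LINE;

        for (; hb < end; hb++)
            if (atomic_load_explicit(&hb->lock, memory_order_acquire) == l)
                return hb->cpu;
        hash = lfsr4(hash);
    }
    return -1;
}
```

The lookup follows the same LFSR sequence as the insert, so every entry can be found again. A maximal-length LFSR visits every nonzero state, which is why the rehash chain is guaranteed to reach every line without depending on other CPUs.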

Re: [Xen-devel] [PATCH 0/9] qspinlock stuff -v15

2015-03-19 Thread David Vrabel

On 16/03/15 13:16, Peter Zijlstra wrote:
 
 I feel that if someone were to do a Xen patch we can go ahead and merge this
 stuff (finally!).

This seems to work for me, but I haven't had time to test it more
thoroughly.

You can fold this into your series.

There doesn't seem to be a way to disable QUEUE_SPINLOCKS when supported by
the arch, is this intentional?  If so, the existing ticketlock code could go.

David

8--
x86/xen: paravirt support for qspinlocks

Provide the wait and kick ops necessary for paravirt-aware queue
spinlocks.

Signed-off-by: David Vrabel david.vra...@citrix.com
---
 arch/x86/xen/spinlock.c |   40 +---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 956374c..b019b2a 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -95,17 +95,43 @@ static inline void spin_time_accum_blocked(u64 start)
 }
 #endif  /* CONFIG_XEN_DEBUG_FS */
 
+static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
+static DEFINE_PER_CPU(char *, irq_name);
+static bool xen_pvspin = true;
+
+#ifdef CONFIG_QUEUE_SPINLOCK
+
+#include <asm/qspinlock.h>
+
+PV_CALLEE_SAVE_REGS_THUNK(__pv_queue_spin_unlock);
+
+static void xen_qlock_wait(u8 *ptr, u8 val)
+{
+   int irq = __this_cpu_read(lock_kicker_irq);
+
+   xen_clear_irq_pending(irq);
+
+   barrier();
+
+   if (READ_ONCE(*ptr) == val)
+   xen_poll_irq(irq);
+}
+
+static void xen_qlock_kick(int cpu)
+{
+   xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
+}
+
+#else
+
 struct xen_lock_waiting {
struct arch_spinlock *lock;
__ticket_t want;
 };
 
-static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
-static DEFINE_PER_CPU(char *, irq_name);
 static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
 static cpumask_t waiting_cpus;
 
-static bool xen_pvspin = true;
 __visible void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 {
int irq = __this_cpu_read(lock_kicker_irq);
@@ -217,6 +243,7 @@ static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
}
}
 }
+#endif /* !QUEUE_SPINLOCK */
 
 static irqreturn_t dummy_handler(int irq, void *dev_id)
 {
@@ -280,8 +307,15 @@ void __init xen_init_spinlocks(void)
return;
}
printk(KERN_DEBUG "xen: PV spinlocks enabled\n");
+#ifdef CONFIG_QUEUE_SPINLOCK
+   pv_lock_ops.queue_spin_lock_slowpath = __pv_queue_spin_lock_slowpath;
+   pv_lock_ops.queue_spin_unlock = PV_CALLEE_SAVE(__pv_queue_spin_unlock);
+   pv_lock_ops.wait = xen_qlock_wait;
+   pv_lock_ops.kick = xen_qlock_kick;
+#else
pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
pv_lock_ops.unlock_kick = xen_unlock_kick;
+#endif
 }
 
 /*
-- 
1.7.10.4
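In xen_qlock_wait() above, the order is: clear the pending event, then re-check the lock byte, and only then poll. This ordering is what prevents a lost wakeup, assuming (as in this model) that the waker updates the byte before kicking. A kick that arrives after the clear leaves the event pending, so the poll returns. A kick that arrived before the clear is harmless, because the re-check already sees the new value. Below is a minimal single-threaded C11 model of that ordering. The event-channel struct and helpers are hypothetical stand-ins for xen_clear_irq_pending(), xen_poll_irq() and xen_send_IPI_one(). The poll is bounded only so the sketch can terminate. The kernel code uses barrier() between the clear and the re-check, whereas the model uses a sequentially consistent fence for simplicity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-CPU Xen event channel. */
struct evtchn {
    atomic_bool pending;
    unsigned polls;           /* how many times the waiter actually polled */
};

/* Waker side: update the lock byte first, then kick. */
static void kick(atomic_uchar *ptr, unsigned char newval, struct evtchn *e)
{
    atomic_store(ptr, newval);
    atomic_store(&e->pending, true);
}

/* Stand-in for xen_poll_irq(): bounded spin until the event is pending. */
static bool poll_evtchn(struct evtchn *e, unsigned max_spins)
{
    e->polls++;
    for (unsigned i = 0; i < max_spins; i++)
        if (atomic_load(&e->pending))
            return true;
    return false;             /* a real poll would block here instead */
}

/* Model of xen_qlock_wait(): returns true if it had to poll. */
static bool qlock_wait(atomic_uchar *ptr, unsigned char val, struct evtchn *e)
{
    atomic_store(&e->pending, false);            /* clear any stale event */
    atomic_thread_fence(memory_order_seq_cst);   /* clear before re-check */
    if (atomic_load(ptr) == val) {               /* still need to wait? */
        poll_evtchn(e, 1000);
        return true;
    }
    return false;
}
```

If the unlock and its kick both happen before the wait, the stale event is discarded. The re-check then sees the changed byte, so the waiter does not poll at all.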


Re: [Xen-devel] [PATCH 0/9] qspinlock stuff -v15

2015-03-19 Thread Peter Zijlstra

On Thu, Mar 19, 2015 at 06:01:34PM +, David Vrabel wrote:
 This seems work for me, but I've not got time to give it a more thorough
 testing.
 
 You can fold this into your series.

Thanks!

 There doesn't seem to be a way to disable QUEUE_SPINLOCKS when supported by
 the arch, is this intentional?  If so, the existing ticketlock code could go.

Yeah, it's left as a rudiment such that if we find issues with the
qspinlock code we can 'revert' with a trivial patch. If no issues show
up we can rip out all the old code in a subsequent release.

[RFC v5 06/13] VFIO: platform: add vfio_external_{mask|is_active|set_automasked}

2015-03-19 Thread Eric Auger

Introduce three new external functions that perform the following actions
on VFIO platform devices:
- mask a VFIO IRQ
- get the active status of a VFIO IRQ (active at interrupt
  controller level or masked by the level-sensitive automasking).
- change the automasked property and the VFIO handler

Note that there is no way to discriminate between user-space
masking and automasked handler masking. As a consequence, is_active
will return true if the IRQ was masked by user space.
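The limitation noted above can be illustrated with a small standalone C model of vfio_external_is_active(). Because the result is computed as active || masked, an IRQ masked from user space is indistinguishable from one masked by the automasked handler. In this sketch, the struct, its fields, and the handler functions are hypothetical simplifications of struct vfio_platform_irq. Locking is omitted, and the AUTOMASKED flag value is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IRQ_INFO_AUTOMASKED (1u << 2)  /* assumed to mirror VFIO_IRQ_INFO_AUTOMASKED */

typedef void (*irq_handler_fn)(void);
static void plain_handler(void) { }      /* stand-in for vfio_irq_handler */
static void automasked_handler(void) { } /* stand-in for vfio_automasked_irq_handler */

/* Hypothetical, lock-free simplification of struct vfio_platform_irq. */
struct platform_irq {
    uint32_t flags;
    bool masked;              /* set by user-space masking OR by the automasked handler */
    bool hw_active;           /* what the irqchip reports as ACTIVE */
    irq_handler_fn handler;
};

static bool external_is_active(const struct platform_irq *irq)
{
    /* Cannot tell who masked the IRQ, only that it is masked. */
    return irq->hw_active || irq->masked;
}

static void external_set_automasked(struct platform_irq *irq, bool automasked)
{
    if (automasked) {
        irq->flags |= IRQ_INFO_AUTOMASKED;
        irq->handler = automasked_handler;
    } else {
        irq->flags &= ~IRQ_INFO_AUTOMASKED;
        irq->handler = plain_handler;
    }
}
```

A caller that masks the IRQ from user space and then asks whether it is active gets true, even though the hardware line is idle.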

Signed-off-by: Eric Auger eric.au...@linaro.org

---

V4: creation
---
 drivers/vfio/platform/vfio_platform_irq.c | 43 +++
 include/linux/vfio.h  | 14 ++
 2 files changed, 57 insertions(+)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 8eb65c1..49994cb 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -231,6 +231,49 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
return 0;
 }
 
+void vfio_external_mask(struct vfio_platform_device *vdev, int index)
+{
+   vfio_platform_mask(&vdev->irqs[index]);
+}
+EXPORT_SYMBOL_GPL(vfio_external_mask);
+
+bool vfio_external_is_active(struct vfio_platform_device *vdev, int index)
+{
+   unsigned long flags;
+   struct vfio_platform_irq *irq = &vdev->irqs[index];
+   bool active, masked, outstanding;
+   int ret;
+
+   spin_lock_irqsave(&irq->lock, flags);
+
+   ret = irq_get_irqchip_state(irq->hwirq, IRQCHIP_STATE_ACTIVE, &active);
+   BUG_ON(ret);
+   masked = irq->masked;
+   outstanding = active || masked;
+
+   spin_unlock_irqrestore(&irq->lock, flags);
+   return outstanding;
+}
+EXPORT_SYMBOL_GPL(vfio_external_is_active);
+
+void vfio_external_set_automasked(struct vfio_platform_device *vdev,
+ int index, bool automasked)
+{
+   unsigned long flags;
+   struct vfio_platform_irq *irq = &vdev->irqs[index];
+
+   spin_lock_irqsave(&irq->lock, flags);
+   if (automasked) {
+   irq->flags |= VFIO_IRQ_INFO_AUTOMASKED;
+   irq->handler = vfio_automasked_irq_handler;
+   } else {
+   irq->flags &= ~VFIO_IRQ_INFO_AUTOMASKED;
+   irq->handler = vfio_irq_handler;
+   }
+   spin_unlock_irqrestore(&irq->lock, flags);
+}
+EXPORT_SYMBOL_GPL(vfio_external_set_automasked);
+
 static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
 unsigned index, unsigned start,
 unsigned count, uint32_t flags,
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index b18c38f..7aa6330 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -105,6 +105,20 @@ extern struct vfio_device *vfio_device_get_external_user(struct file *filep);
 extern void vfio_device_put_external_user(struct vfio_device *vdev);
 extern struct device *vfio_external_base_device(struct vfio_device *vdev);
 
+struct vfio_platform_device;
+extern void vfio_external_mask(struct vfio_platform_device *vdev, int index);
+/*
+ * Returns whether the VFIO IRQ is active:
+ * true if not yet deactivated at interrupt controller level or if
+ * automasked (level-sensitive IRQ). Unfortunately there is no way to
+ * discriminate between handler auto-masking and user-space masking.
+extern bool vfio_external_is_active(struct vfio_platform_device *vdev,
+   int index);
+
+extern void vfio_external_set_automasked(struct vfio_platform_device *vdev,
+int index, bool automasked);
+
 struct pci_dev;
 #ifdef CONFIG_EEH
 extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
-- 
1.9.1

[RFC v5 12/13] KVM: kvm-vfio: generic forwarding control

2015-03-19 Thread Eric Auger

This patch introduces a new KVM_DEV_VFIO_DEVICE group.

This is a new control channel which enables KVM to cooperate with
viable VFIO devices.

The patch introduces 2 attributes for this group:
KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
Their purpose is, respectively, to turn a VFIO device IRQ into a
forwarded IRQ and to unset that feature.

The generic part introduced here interacts with VFIO, genirq and KVM,
while the architecture-specific part mostly takes care of the virtual
interrupt controller programming.

The architecture-specific implementation is enabled when
__KVM_HAVE_ARCH_KVM_VFIO_FORWARD is defined. When it is not, those
functions are stubbed out (set returns -ENOENT, unset does nothing).
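The compile-time opt-in described above follows a common kernel idiom. An architecture defines a feature macro and provides real functions. Otherwise static inline stubs keep the generic caller compiling unchanged. The sketch below models that idiom in user space. DEMO_HAVE_ARCH_FORWARD and the demo_* functions are hypothetical stand-ins for __KVM_HAVE_ARCH_KVM_VFIO_FORWARD and kvm_arch_{set,unset}_forward, and they omit the struct kvm argument.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * DEMO_HAVE_ARCH_FORWARD is a hypothetical stand-in for
 * __KVM_HAVE_ARCH_KVM_VFIO_FORWARD. It is left undefined here, so the
 * stubs below are compiled in.
 */
#ifdef DEMO_HAVE_ARCH_FORWARD
int demo_arch_set_forward(unsigned int host_irq, unsigned int guest_irq);
void demo_arch_unset_forward(unsigned int host_irq, unsigned int guest_irq,
			     bool *active);
#else
/* No arch support: setting fails cleanly... */
static inline int demo_arch_set_forward(unsigned int host_irq,
					unsigned int guest_irq)
{
	(void)host_irq;
	(void)guest_irq;
	return -ENOENT;
}

/* ...and unsetting is a no-op, since unforward must never fail. */
static inline void demo_arch_unset_forward(unsigned int host_irq,
					   unsigned int guest_irq,
					   bool *active)
{
	(void)host_irq;
	(void)guest_irq;
	(void)active;   /* left untouched, like the kernel stub */
}
#endif

/* Generic caller: identical source whether or not the arch opted in. */
static int demo_forward_irq(unsigned int host_irq, unsigned int guest_irq)
{
	return demo_arch_set_forward(host_irq, guest_irq);
}
```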

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v3 - v4:
- use new kvm_vfio_dev_irq struct
- improve error handling according to Alex comments
- full rework of the generic/arch-specific split to accommodate
  an unforward that never fails
- kvm_vfio_get_vfio_device and kvm_vfio_put_vfio_device removed from
  that patch file and introduced before (since also used by Feng)
- guard kvm_vfio_control_irq_forward call with
  __KVM_HAVE_ARCH_KVM_VFIO_FORWARD

v2 - v3:
- add API comments in kvm_host.h
- improve the commit message
- create a private kvm_vfio_fwd_irq struct
- fwd_irq_action replaced by a bool and removal of VFIO_IRQ_CLEANUP. This
  latter action will be handled in vgic.
- add a vfio_device handle argument to kvm_arch_set_fwd_state. The goal is
  to move platform specific stuff in architecture specific code.
- kvm_arch_set_fwd_state renamed into kvm_arch_vfio_set_forward
- increment the ref counter each time we do an IRQ forwarding and decrement
  it each time an IRQ forward is unset. This simplifies the whole
  ref counting.
- simplification of list handling: create, search, removal

v1 - v2:
- __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
- original patch file separated into 2 parts: generic part moved into vfio.c
  and ARM-specific part (kvm_arch_set_fwd_state)
---
 include/linux/kvm_host.h |  47 +++
 virt/kvm/vfio.c  | 311 ++-
 2 files changed, 355 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f4e1829..8f17f87 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1042,6 +1042,15 @@ struct kvm_device_ops {
  unsigned long arg);
 };
 
+/* internal self-contained structure describing a forwarded IRQ */
+struct kvm_fwd_irq {
+   struct kvm *kvm; /* VM to inject the GSI into */
+   struct vfio_device *vdev; /* vfio device the IRQ belongs to */
+   __u32 index; /* VFIO device IRQ index */
+   __u32 subindex; /* VFIO device IRQ subindex */
+   __u32 gsi; /* gsi, i.e. virtual IRQ number */
+};
+
 void kvm_device_get(struct kvm_device *dev);
 void kvm_device_put(struct kvm_device *dev);
 struct kvm_device *kvm_device_from_filp(struct file *filp);
@@ -1065,6 +1074,44 @@ inline void kvm_arch_resume_guest(struct kvm *kvm) {}
 
 #endif
 
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+/**
+ * kvm_arch_set_forward - Sets forwarding for a given IRQ
+ *
+ * @kvm: handle to the VM
+ * @host_irq: physical IRQ number
+ * @guest_irq: virtual IRQ number
+ * returns 0 on success, < 0 on failure
+ */
+int kvm_arch_set_forward(struct kvm *kvm,
+unsigned int host_irq, unsigned int guest_irq);
+
+/**
+ * kvm_arch_unset_forward - Unsets forwarding for a given IRQ
+ *
+ * @kvm: handle to the VM
+ * @host_irq: physical IRQ number
+ * @guest_irq: virtual IRQ number
+ * @active: returns whether the IRQ is active
+ */
+void kvm_arch_unset_forward(struct kvm *kvm,
+   unsigned int host_irq,
+   unsigned int guest_irq,
+   bool *active);
+
+#else
+static inline int kvm_arch_set_forward(struct kvm *kvm,
+  unsigned int host_irq,
+  unsigned int guest_irq)
+{
+   return -ENOENT;
+}
+static inline void kvm_arch_unset_forward(struct kvm *kvm,
+ unsigned int host_irq,
+ unsigned int guest_irq,
+ bool *active) {}
+#endif
+
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
 static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index c995e51..4847597 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -19,14 +19,30 @@
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
 #include "vfio.h"
+#include <linux/platform_device.h>
+#include <linux/irq.h>
+#include <linux/spinlock.h>
+#include <linux/interrupt.h>
+#include <linux/delay.h>
+
+#define DEBUG_FORWARD
+#define DEBUG_UNFORWARD
 
 struct kvm_vfio_group {
struct list_head node;
struct vfio_group *vfio_group;
 };
 
+/* private linkable kvm_fwd_irq struct */
+struct kvm_vfio_fwd_irq_node {

[RFC v5 07/13] KVM: kvm-vfio: wrappers to VFIO external API device helpers

2015-03-19 Thread Eric Auger

Provide wrapper functions that allow KVM-VFIO device code to
interact with a vfio device:
- kvm_vfio_device_get_external_user gets a handle to a struct
  vfio_device from the vfio device file descriptor and increments
  its reference counter,
- kvm_vfio_device_put_external_user decrements the reference counter
  to a vfio device,
- kvm_vfio_external_base_device returns a handle to the struct device
  of the vfio device.

Also kvm_vfio_get_vfio_device and kvm_vfio_put_vfio_device helpers
are introduced.
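Each wrapper above uses the same pattern. It resolves the VFIO function with symbol_get(), which pins the exporting module or yields NULL if it is not loaded. It then calls the function and drops the pin with symbol_put(). The sketch below models that get/call/put shape in user space. demo_module, demo_symbol_get and demo_symbol_put are hypothetical stand-ins, and the NULL-symbol path returns -EINVAL just as kvm_vfio_device_get_external_user does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*demo_get_fn)(int fd);

/* Hypothetical model of a module exporting one symbol. */
struct demo_module {
	int refcount;           /* pins held by callers */
	demo_get_fn get_user;   /* NULL models "module not loaded" */
};

/* Like symbol_get(): NULL if absent, otherwise pin and return it. */
static demo_get_fn demo_symbol_get(struct demo_module *m)
{
	if (!m->get_user)
		return NULL;
	m->refcount++;
	return m->get_user;
}

/* Like symbol_put(): drop the pin taken by demo_symbol_get(). */
static void demo_symbol_put(struct demo_module *m)
{
	m->refcount--;
}

/* Same shape as kvm_vfio_device_get_external_user(): get, call, put. */
static int demo_wrapper(struct demo_module *m, int fd)
{
	demo_get_fn fn = demo_symbol_get(m);
	int ret;

	if (!fn)
		return -EINVAL;
	ret = fn(fd);
	demo_symbol_put(m);
	return ret;
}

/* Hypothetical exported function. */
static int demo_vfio_get_user(int fd)
{
	return fd * 10;
}
```

The module pin is held only for the duration of the call. The long-lived reference is the one the VFIO function itself takes on the device.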

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v3 - v4:
- wrappers are no longer exposed in kvm_host and become static
  functions in kvm/vfio.c
- added kvm_vfio_get_vfio_device/kvm_vfio_put_vfio_device in that
  patch file

v2 - v3:
- reword the commit message and title

v1 - v2:
- kvm_vfio_external_get_base_device renamed into
  kvm_vfio_external_base_device
- kvm_vfio_external_get_type removed
---
 virt/kvm/vfio.c | 74 +
 1 file changed, 74 insertions(+)

diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 620e37f..80a45e4 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -60,6 +60,80 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
symbol_put(vfio_group_put_external_user);
 }
 
+static struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep)
+{
+   struct vfio_device *vdev;
+   struct vfio_device *(*fn)(struct file *);
+
+   fn = symbol_get(vfio_device_get_external_user);
+   if (!fn)
+   return ERR_PTR(-EINVAL);
+
+   vdev = fn(filep);
+
+   symbol_put(vfio_device_get_external_user);
+
+   return vdev;
+}
+
+static void kvm_vfio_device_put_external_user(struct vfio_device *vdev)
+{
+   void (*fn)(struct vfio_device *);
+
+   fn = symbol_get(vfio_device_put_external_user);
+   if (!fn)
+   return;
+
+   fn(vdev);
+
+   symbol_put(vfio_device_put_external_user);
+}
+
+static struct device *kvm_vfio_external_base_device(struct vfio_device *vdev)
+{
+   struct device *(*fn)(struct vfio_device *);
+   struct device *dev;
+
+   fn = symbol_get(vfio_external_base_device);
+   if (!fn)
+   return NULL;
+
+   dev = fn(vdev);
+
+   symbol_put(vfio_external_base_device);
+
+   return dev;
+}
+
+/**
+ * kvm_vfio_get_vfio_device - Returns a handle to a vfio-device
+ *
+ * Checks that it is a valid vfio device and increments its reference counter
+ * @fd: file descriptor of the vfio platform device
+ */
+static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
+{
+   struct fd f = fdget(fd);
+   struct vfio_device *vdev;
+
+   if (!f.file)
+   return ERR_PTR(-EINVAL);
+   vdev = kvm_vfio_device_get_external_user(f.file);
+   fdput(f);
+   return vdev;
+}
+
+/**
+ * kvm_vfio_put_vfio_device - Decrements the reference counter of the
+ * vfio platform device
+ *
+ * @vdev: vfio_device handle to release
+ */
+static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
+{
+   kvm_vfio_device_put_external_user(vdev);
+}
+
 static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
 {
long (*fn)(struct vfio_group *, unsigned long);
-- 
1.9.1

[RFC v5 13/13] KVM: arm/arm64: vgic: forwarding control

2015-03-19 Thread Eric Auger

This patch sets __KVM_HAVE_ARCH_KVM_VFIO_FORWARD and implements
kvm_arch_set_forward for ARM/ARM64.

As a result, the KVM-VFIO device can now forward/unforward a
VFIO device IRQ on ARM.

kvm_arch_set_forward and kvm_arch_unset_forward mostly take care of
VGIC programming: physical IRQ/guest IRQ mapping, list register cleanup,
VGIC state machine.
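kvm_arch_set_forward derives the VGIC interrupt ID from the guest IRQ as guest_irq + VGIC_NR_PRIVATE_IRQS. The 32 private IDs are 16 SGIs and 16 PPIs, so guest SPIs start at 32. It then records a virtual-to-physical mapping through vgic_map_phys_irq. The sketch below models only that offset and mapping with a plain table. The table size, demo_* names and the -EBUSY "already forwarded" check are hypothetical, and the sketch leaves out locking, LR cleanup and the HW bit.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_NR_PRIVATE_IRQS 32  /* 16 SGIs + 16 PPIs, as VGIC_NR_PRIVATE_IRQS */
#define DEMO_NR_IRQS 256         /* hypothetical table size */

/* phys_of[virt_id] = physical GIC hwirq, or -1 when not forwarded. */
static int phys_of[DEMO_NR_IRQS];

static void demo_map_init(void)
{
	for (int i = 0; i < DEMO_NR_IRQS; i++)
		phys_of[i] = -1;
}

/* Hypothetical analogue of kvm_arch_set_forward(): guest SPI -> phys IRQ. */
static int demo_set_forward(unsigned int guest_irq, int gic_irq)
{
	unsigned int spi_id = guest_irq + DEMO_NR_PRIVATE_IRQS;

	if (spi_id >= DEMO_NR_IRQS)
		return -EINVAL;
	if (phys_of[spi_id] != -1)
		return -EBUSY;   /* assumption: refuse to remap */
	phys_of[spi_id] = gic_irq;
	return 0;
}

/* Analogue of the unset path: it cannot fail, it just drops the mapping. */
static void demo_unset_forward(unsigned int guest_irq)
{
	unsigned int spi_id = guest_irq + DEMO_NR_PRIVATE_IRQS;

	if (spi_id < DEMO_NR_IRQS)
		phys_of[spi_id] = -1;
}
```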

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v4 - v5:
- fix arm64 compilation issues, i.e. also defines
  __KVM_HAVE_ARCH_HALT_GUEST for arm64

v3 - v4:
- code originally located in kvm_vfio_arm.c
- kvm_arch_vfio_{set|unset}_forward renamed into
  kvm_arch_{set|unset}_forward
- split into 2 functions (set/unset) since unset does not fail anymore
- unset can be invoked at whatever time. Extra care is taken to handle
  transition in VGIC state machine, LR cleanup, ...

v2 - v3:
- renaming of kvm_arch_set_fwd_state into kvm_arch_vfio_set_forward
- takes a bool arg instead of kvm_fwd_irq_action enum
- removal of KVM_VFIO_IRQ_CLEANUP
- platform device check now happens here
- more precise errors returned
- irq_eoi handled externally to this patch (VGIC)
- correct enable_irq bug done twice
- reword the commit message
- correct check of platform_bus_type
- use raw_spin_lock_irqsave and check the validity of the handler
---
 arch/arm/include/asm/kvm_host.h   |   1 +
 arch/arm64/include/asm/kvm_host.h |   1 +
 virt/kvm/arm/vgic.c   | 190 ++
 3 files changed, 192 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index b829f93..8e3fd7f 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -29,6 +29,7 @@
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
 #define __KVM_HAVE_ARCH_HALT_GUEST
+#define __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
 
 #if defined(CONFIG_KVM_ARM_MAX_VCPUS)
 #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ffcc698..9be392a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -30,6 +30,7 @@
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
 #define __KVM_HAVE_ARCH_HALT_GUEST
+#define __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
 
 #if defined(CONFIG_KVM_ARM_MAX_VCPUS)
 #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index cc9ff63..8a396f9 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2117,3 +2117,193 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
 {
return 0;
 }
+
+/**
+ * kvm_arch_set_forward - Set forwarding for a given IRQ
+ *
+ * @kvm: handle to the VM
+ * @host_irq: physical IRQ number
+ * @guest_irq: virtual IRQ number
+ *
+ * This function is supposed to be called only if the IRQ
+ * is not in progress, i.e. not active at VGIC level and not
+ * currently being injected by KVM.
+ */
+int kvm_arch_set_forward(struct kvm *kvm, unsigned int host_irq,
+unsigned int guest_irq)
+{
+   irq_hw_number_t gic_irq;
+   struct irq_desc *desc = irq_to_desc(host_irq);
+   struct irq_data *d;
+   unsigned long flags;
+   struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, 0);
+   int ret = 0;
+   int spi_id = guest_irq + VGIC_NR_PRIVATE_IRQS;
+   struct vgic_dist *dist = &kvm->arch.vgic;
+
+   if (!vcpu)
+   return 0;
+
+   spin_lock(&dist->lock);
+
+   raw_spin_lock_irqsave(&desc->lock, flags);
+   d = &desc->irq_data;
+   gic_irq = irqd_to_hwirq(d);
+   irqd_set_irq_forwarded(d);
+   /*
+    * next physical IRQ will be handled as forwarded
+    * by the host (priority drop only)
+    */
+
+   raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+   /*
+    * need to release the dist spin_lock here since
+    * vgic_map_phys_irq can sleep
+    */
+   spin_unlock(&dist->lock);
+   ret = vgic_map_phys_irq(vcpu, spi_id, (int)gic_irq);
+   /*
+* next guest_irq injection will be considered as
+* forwarded and next flush will program LR
+* without maintenance IRQ but with HW bit set
+*/
+   return ret;
+}
+
+/**
+ * kvm_arch_unset_forward - Unset forwarding for a given IRQ
+ *
+ * @kvm: handle to the VM
+ * @host_irq: physical IRQ number
+ * @guest_irq: virtual IRQ number
+ * @active: returns whether the physical IRQ is active
+ *
+ * This function must be called when the host_irq is disabled
+ * and the guest has exited and is prevented from being re-entered.
+ *
+ */
+void kvm_arch_unset_forward(struct kvm *kvm,
+   unsigned int host_irq,
+   unsigned int guest_irq,
+   bool *active)
+{
+   struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, 0);
+   struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+   struct vgic_dist *dist = &kvm->arch.vgic;
+   int ret, lr;
+   struct vgic_lr vlr;
+   struct irq_desc *desc =

[RFC v5 05/13] VFIO: external user API for interaction with vfio devices

2015-03-19 Thread Eric Auger

The VFIO external user API is enriched with 3 new functions that
allow a kernel user external to VFIO to retrieve some information
from a VFIO device.

- vfio_device_get_external_user gets a vfio device from its fd and
  increments its reference counter
- vfio_device_put_external_user decrements the reference counter
- vfio_external_base_device returns a handle to the struct device

---

v3 - v4:
- change the commit title

v2 - v3:
- reword the commit message

v1 - v2:

- vfio_external_get_base_device renamed into vfio_external_base_device
- vfio_external_get_type removed

Signed-off-by: Eric Auger eric.au...@linaro.org
---
 drivers/vfio/vfio.c  | 24 
 include/linux/vfio.h |  3 +++
 2 files changed, 27 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 23ba12a..33ae8c5 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1490,6 +1490,30 @@ void vfio_group_put_external_user(struct vfio_group *group)
 }
 EXPORT_SYMBOL_GPL(vfio_group_put_external_user);
 
+struct vfio_device *vfio_device_get_external_user(struct file *filep)
+{
+   struct vfio_device *vdev = filep->private_data;
+
+   if (filep->f_op != &vfio_device_fops)
+   return ERR_PTR(-EINVAL);
+
+   vfio_device_get(vdev);
+   return vdev;
+}
+EXPORT_SYMBOL_GPL(vfio_device_get_external_user);
+
+void vfio_device_put_external_user(struct vfio_device *vdev)
+{
+   vfio_device_put(vdev);
+}
+EXPORT_SYMBOL_GPL(vfio_device_put_external_user);
+
+struct device *vfio_external_base_device(struct vfio_device *vdev)
+{
+   return vdev->dev;
+}
+EXPORT_SYMBOL_GPL(vfio_external_base_device);
+
 int vfio_external_user_iommu_id(struct vfio_group *group)
 {
return iommu_group_id(group->iommu_group);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 683b514..b18c38f 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -101,6 +101,9 @@ extern void vfio_group_put_external_user(struct vfio_group *group);
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
 extern long vfio_external_check_extension(struct vfio_group *group,
  unsigned long arg);
+extern struct vfio_device *vfio_device_get_external_user(struct file *filep);
+extern void vfio_device_put_external_user(struct vfio_device *vdev);
+extern struct device *vfio_external_base_device(struct vfio_device *vdev);
 
 struct pci_dev;
 #ifdef CONFIG_EEH
-- 
1.9.1

[RFC v5 09/13] KVM: arm: rename pause into power_off

2015-03-19 Thread Eric Auger

The kvm_vcpu_arch pause field is renamed to power_off to prepare
for the introduction of a new pause field.

Signed-off-by: Eric Auger eric.au...@linaro.org

v4 - v5:
- fix compilation issue on arm64 (add power_off field in kvm_host.h)
---
 arch/arm/include/asm/kvm_host.h   |  4 ++--
 arch/arm/kvm/arm.c| 10 +-
 arch/arm/kvm/psci.c   | 10 +-
 arch/arm64/include/asm/kvm_host.h |  4 ++--
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 902a7d1..79a919c 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -129,8 +129,8 @@ struct kvm_vcpu_arch {
 * here.
 */
 
-   /* Don't run the guest on this vcpu */
-   bool pause;
+   /* vcpu power-off state */
+   bool power_off;
 
/* IO related fields */
struct kvm_decode mmio_decode;
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index c3a2dc6..08e12dc 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -458,7 +458,7 @@ static void vcpu_pause(struct kvm_vcpu *vcpu)
 {
wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
 
-   wait_event_interruptible(*wq, !vcpu->arch.pause);
+   wait_event_interruptible(*wq, !vcpu->arch.power_off);
 }
 
 static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
@@ -508,7 +508,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
update_vttbr(vcpu->kvm);
 
-   if (vcpu->arch.pause)
+   if (vcpu->arch.power_off)
vcpu_pause(vcpu);
 
kvm_vgic_flush_hwstate(vcpu);
@@ -730,12 +730,12 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
vcpu_reset_hcr(vcpu);
 
/*
-* Handle the start in power-off case by marking the VCPU as paused.
+* Handle the start in power-off case.
 */
if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
-   vcpu->arch.pause = true;
+   vcpu->arch.power_off = true;
else
-   vcpu->arch.pause = false;
+   vcpu->arch.power_off = false;
 
return 0;
 }
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index 02fa8ef..367b131 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -61,7 +61,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
 
 static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
 {
-   vcpu->arch.pause = true;
+   vcpu->arch.power_off = true;
 }
 
 static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
@@ -85,7 +85,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 */
if (!vcpu)
return PSCI_RET_INVALID_PARAMS;
-   if (!vcpu->arch.pause) {
+   if (!vcpu->arch.power_off) {
if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
return PSCI_RET_ALREADY_ON;
else
@@ -113,7 +113,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 * the general puspose registers are undefined upon CPU_ON.
 */
*vcpu_reg(vcpu, 0) = context_id;
-   vcpu->arch.pause = false;
+   vcpu->arch.power_off = false;
smp_mb();   /* Make sure the above is visible */
 
wq = kvm_arch_vcpu_wq(vcpu);
@@ -150,7 +150,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
kvm_for_each_vcpu(i, tmp, kvm) {
mpidr = kvm_vcpu_get_mpidr_aff(tmp);
if (((mpidr & target_affinity_mask) == target_affinity) &&
-   !tmp->arch.pause) {
+   !tmp->arch.power_off) {
return PSCI_0_2_AFFINITY_LEVEL_ON;
}
}
@@ -173,7 +173,7 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
 * re-initialized.
 */
kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
-   tmp->arch.pause = true;
+   tmp->arch.power_off = true;
kvm_vcpu_kick(tmp);
}
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 967fb1c..a713b0c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -122,8 +122,8 @@ struct kvm_vcpu_arch {
 * here.
 */
 
-   /* Don't run the guest */
-   bool pause;
+   /* vcpu power-off state */
+   bool power_off;
 
/* IO related fields */
struct kvm_decode mmio_decode;
-- 
1.9.1

[RFC v5 11/13] kvm: arm/arm64: implement kvm_arch_halt_guest and kvm_arch_resume_guest

2015-03-19 Thread Eric Auger

This patch defines __KVM_HAVE_ARCH_HALT_GUEST and implements
kvm_arch_halt_guest and kvm_arch_resume_guest for ARM.

On halt, the guest is forced to exit and prevented from being
re-entered.
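A vcpu may enter the guest only when neither flag is set. Keeping the host-requested pause separate from the guest-owned PSCI power_off state means resume cannot accidentally power on a vcpu that the guest switched off. The sketch below models only this state logic in user space. demo_vcpu, DEMO_NR_VCPUS and the demo_* functions are hypothetical. The real kvm_arch_halt_guest also forces a VM exit with force_vm_exit(), and kvm_arch_resume_guest wakes each vcpu's wait queue.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_VCPUS 4  /* hypothetical */

struct demo_vcpu {
	bool power_off;  /* PSCI power state, driven by the guest */
	bool pause;      /* host-requested halt, driven by KVM */
};

static struct demo_vcpu vcpus[DEMO_NR_VCPUS];

/* The condition vcpu_pause() waits for: run only if neither flag is set. */
static bool demo_may_enter(const struct demo_vcpu *v)
{
	return !v->power_off && !v->pause;
}

/* Like kvm_arch_halt_guest(), minus the forced exit: pause every vcpu. */
static void demo_halt_guest(void)
{
	for (int i = 0; i < DEMO_NR_VCPUS; i++)
		vcpus[i].pause = true;
}

/* Like kvm_arch_resume_guest(), minus the wake-up: clear pause only. */
static void demo_resume_guest(void)
{
	for (int i = 0; i < DEMO_NR_VCPUS; i++)
		vcpus[i].pause = false;
}
```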

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v4 - v5: add arm64 support
- also defines __KVM_HAVE_ARCH_HALT_GUEST for arm64
- add pause field
---
 arch/arm/include/asm/kvm_host.h   |  4 
 arch/arm/kvm/arm.c| 32 +---
 arch/arm64/include/asm/kvm_host.h |  4 
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 79a919c..b829f93 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -28,6 +28,7 @@
 #include <kvm/arm_arch_timer.h>
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
+#define __KVM_HAVE_ARCH_HALT_GUEST
 
 #if defined(CONFIG_KVM_ARM_MAX_VCPUS)
 #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
@@ -132,6 +133,9 @@ struct kvm_vcpu_arch {
/* vcpu power-off state */
bool power_off;
 
+   /* Don't run the guest */
+   bool pause;
+
/* IO related fields */
struct kvm_decode mmio_decode;
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 08e12dc..ebc0b55 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -454,11 +454,36 @@ bool kvm_arch_intc_initialized(struct kvm *kvm)
return vgic_initialized(kvm);
 }
 
+void kvm_arch_halt_guest(struct kvm *kvm)
+{
+   int i;
+   struct kvm_vcpu *vcpu;
+
+   kvm_for_each_vcpu(i, vcpu, kvm)
+   vcpu->arch.pause = true;
+   force_vm_exit(cpu_all_mask);
+}
+
+void kvm_arch_resume_guest(struct kvm *kvm)
+{
+   int i;
+   struct kvm_vcpu *vcpu;
+
+   kvm_for_each_vcpu(i, vcpu, kvm) {
+   wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
+
+   vcpu->arch.pause = false;
+   wake_up_interruptible(wq);
+   }
+}
+
+
 static void vcpu_pause(struct kvm_vcpu *vcpu)
 {
wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
 
-   wait_event_interruptible(*wq, !vcpu->arch.power_off);
+   wait_event_interruptible(*wq, ((!vcpu->arch.power_off) &&
+  (!vcpu->arch.pause)));
 }
 
 static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
@@ -508,7 +533,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
update_vttbr(vcpu->kvm);
 
-   if (vcpu->arch.power_off)
+   if (vcpu->arch.power_off || vcpu->arch.pause)
vcpu_pause(vcpu);
 
kvm_vgic_flush_hwstate(vcpu);
@@ -526,7 +551,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
run->exit_reason = KVM_EXIT_INTR;
}
 
-   if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
+   if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
+   vcpu->arch.pause) {
kvm_timer_sync_hwstate(vcpu);
local_irq_enable();
kvm_timer_finish_sync(vcpu);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index a713b0c..ffcc698 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -29,6 +29,7 @@
 #include <asm/kvm_mmio.h>
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
+#define __KVM_HAVE_ARCH_HALT_GUEST
 
 #if defined(CONFIG_KVM_ARM_MAX_VCPUS)
 #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
@@ -125,6 +126,9 @@ struct kvm_vcpu_arch {
/* vcpu power-off state */
bool power_off;
 
+   /* Don't run the guest */
+   bool pause;
+
/* IO related fields */
struct kvm_decode mmio_decode;
 
-- 
1.9.1

[RFC v5 02/13] VFIO: platform: test forwarded state when selecting IRQ handler

2015-03-19 Thread Eric Auger

In case the IRQ is forwarded, the VFIO platform IRQ handler does not
need to disable the IRQ anymore.

When setting the IRQ handler we now also test the forwarded state. In
case the IRQ is forwarded we select the vfio_irq_handler.
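The handler selection described above reduces to a single predicate. Use the masking handler for an automasked (level-sensitive) IRQ unless it is forwarded, because the guest then completes the IRQ and the host must not disable it. A minimal user-space sketch follows. DEMO_IRQ_INFO_AUTOMASKED mirrors the VFIO_IRQ_INFO_AUTOMASKED bit (1 << 2) from the VFIO uapi header, and the enum stands in for the two real handler functions.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_IRQ_INFO_AUTOMASKED (1u << 2)  /* mirrors VFIO_IRQ_INFO_AUTOMASKED */

/* Hypothetical tags for vfio_irq_handler / vfio_automasked_irq_handler. */
enum demo_handler { DEMO_PLAIN_HANDLER, DEMO_AUTOMASKED_HANDLER };

/* Same test as vfio_platform_set_irq_trigger() after this patch. */
static enum demo_handler demo_select_handler(unsigned int flags,
					     bool is_forwarded)
{
	if ((flags & DEMO_IRQ_INFO_AUTOMASKED) && !is_forwarded)
		return DEMO_AUTOMASKED_HANDLER;
	return DEMO_PLAIN_HANDLER;
}
```

Because the check runs only when the trigger is set up, a change of forwarded state takes effect the next time the handler is installed, as the v2 -> v3 note below explains.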

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v3 - v4:
- change title

v2 - v3:
- forwarded state was tested in the handler. Now the forwarded state
  is tested before setting the handler. This definitely limits
  the dynamics of forwarded state changes but I don't think there is
  a use case where we need to be able to change the state at any time.

Conflicts:
drivers/vfio/platform/vfio_platform_irq.c
---
 drivers/vfio/platform/vfio_platform_irq.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 88bba57..132bb3f 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -229,8 +229,13 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
 {
struct vfio_platform_irq *irq = &vdev->irqs[index];
irq_handler_t handler;
+   struct irq_data *d;
+   bool is_forwarded;
 
-   if (vdev->irqs[index].flags & VFIO_IRQ_INFO_AUTOMASKED)
+   d = irq_get_irq_data(irq->hwirq);
+   is_forwarded = irqd_irq_forwarded(d);
+
+   if (vdev->irqs[index].flags & VFIO_IRQ_INFO_AUTOMASKED && !is_forwarded)
handler = vfio_automasked_irq_handler;
else
handler = vfio_irq_handler;
-- 
1.9.1

[RFC v5 01/13] KVM: arm/arm64: Enable the KVM-VFIO device

2015-03-19 Thread Eric Auger

From: Kim Phillips kim.phill...@linaro.org

The KVM-VFIO device is used by the QEMU VFIO device. It is used to
record the list of in-use VFIO groups so that KVM can manipulate
them. With this series, it will also be used to record the forwarded
IRQs.

Signed-off-by: Kim Phillips kim.phill...@linaro.org
Signed-off-by: Eric Auger eric.au...@linaro.org

---

v4 - v5:
- reword the commit message to explain both usages of the KVM-VFIO
  device in QEMU
- squash both arm and arm64 enables
---
 arch/arm/kvm/Kconfig| 1 +
 arch/arm/kvm/Makefile   | 2 +-
 arch/arm64/kvm/Kconfig  | 1 +
 arch/arm64/kvm/Makefile | 2 +-
 4 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index f1f79d1..bfb915d 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -28,6 +28,7 @@ config KVM
select KVM_GENERIC_DIRTYLOG_READ_PROTECT
select SRCU
select MMU_NOTIFIER
+   select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index a093bf1..a4dbb97 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt)
 AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt)
 
 KVM := ../../../virt/kvm
-kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o
+kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 5105e29..bfffe8f 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -28,6 +28,7 @@ config KVM
select KVM_ARM_HOST
select KVM_GENERIC_DIRTYLOG_READ_PROTECT
select SRCU
+   select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
---help---
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index b22c636..7104ca1 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -11,7 +11,7 @@ ARM=../../../arch/arm/kvm
 
 obj-$(CONFIG_KVM_ARM_HOST) += kvm.o
 
-kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o
+kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/arm.o $(ARM)/mmu.o $(ARM)/mmio.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
 
-- 
1.9.1

[RFC v5 04/13] KVM: kvm-vfio: User API for IRQ forwarding

2015-03-19 Thread Eric Auger

This patch adds and documents a new KVM_DEV_VFIO_DEVICE group
and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. Their purpose is to set a
VFIO device IRQ as forwarded or not forwarded.
The command takes as argument a handle to a new struct named
kvm_vfio_dev_irq.
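The documentation added by this patch restricts forwarding to IRQs that are neither masked nor active at interrupt controller level, and returns -EAGAIN otherwise. Unforwarding has no such restriction. The sketch below models that attribute dispatch in user space. Everything in it is hypothetical: the constant values, demo_irq_state, and the -ENXIO return for unknown groups or attributes. The sketch handles only the device group and does not use the real kvm_vfio_dev_irq layout.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical constants; the real values live in the patched uapi header. */
enum { DEMO_DEV_VFIO_GROUP = 1, DEMO_DEV_VFIO_DEVICE = 2 };
enum { DEMO_FORWARD_IRQ = 1, DEMO_UNFORWARD_IRQ = 2 };

struct demo_irq_state {
	bool masked;     /* via VFIO_DEVICE_SET_IRQS or while being handled */
	bool active;     /* active at interrupt controller level */
	bool forwarded;
};

static int demo_set_attr(unsigned int group, unsigned int attr,
			 struct demo_irq_state *irq)
{
	if (group != DEMO_DEV_VFIO_DEVICE)
		return -ENXIO;   /* this sketch models only the device group */

	switch (attr) {
	case DEMO_FORWARD_IRQ:
		/* Documented restriction: no forwarding while in flight. */
		if (irq->masked || irq->active)
			return -EAGAIN;
		irq->forwarded = true;
		return 0;
	case DEMO_UNFORWARD_IRQ:
		irq->forwarded = false;  /* unforward has no such restriction */
		return 0;
	default:
		return -ENXIO;
	}
}
```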

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v3 - v4:
- rename kvm_arch_forwarded_irq into kvm_vfio_dev_irq
- some rewording in commit message
- document forwarding restrictions and remove unforwarding ones

v2 - v3:
- rework vfio kvm device documentation
- reword commit message and title
- add subindex in kvm_arch_forwarded_irq to be closer to VFIO API
- forwarding state can only be changed when VFIO IRQ signaling is off

v1 - v2:
- struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
  to include/uapi/linux/kvm.h
  also irq_index renamed into index and guest_irq renamed into gsi
- ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
---
 Documentation/virtual/kvm/devices/vfio.txt | 34 --
 include/uapi/linux/kvm.h   | 12 +++
 2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
index ef51740..6186e6d 100644
--- a/Documentation/virtual/kvm/devices/vfio.txt
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -4,15 +4,20 @@ VFIO virtual device
 Device types supported:
   KVM_DEV_TYPE_VFIO
 
-Only one VFIO instance may be created per VM.  The created device
-tracks VFIO groups in use by the VM and features of those groups
-important to the correctness and acceleration of the VM.  As groups
-are enabled and disabled for use by the VM, KVM should be updated
-about their presence.  When registered with KVM, a reference to the
-VFIO-group is held by KVM.
+Only one VFIO instance may be created per VM.
+
+The created device tracks VFIO groups in use by the VM and features
+of those groups important to the correctness and acceleration of
+the VM.  As groups are enabled and disabled for use by the VM, KVM
+should be updated about their presence.  When registered with KVM,
+a reference to the VFIO-group is held by KVM.
+
+The device also allows controlling some IRQ settings of VFIO devices:
+forwarding/posting.
 
 Groups:
   KVM_DEV_VFIO_GROUP
+  KVM_DEV_VFIO_DEVICE
 
 KVM_DEV_VFIO_GROUP attributes:
   KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
@@ -20,3 +25,20 @@ KVM_DEV_VFIO_GROUP attributes:
 
 For each, kvm_device_attr.addr points to an int32_t file descriptor
 for the VFIO group.
+
+KVM_DEV_VFIO_DEVICE attributes:
+  KVM_DEV_VFIO_DEVICE_FORWARD_IRQ: set a VFIO device IRQ as forwarded
+  KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: set a VFIO device IRQ as not forwarded
+
+For each, kvm_device_attr.addr points to a kvm_vfio_dev_irq struct.
+
+When forwarded, a physical IRQ is completed by the guest and not by the
+host. This requires HW support in the interrupt controller.
+
+Forwarding can only be set when the corresponding VFIO IRQ is neither masked
+(whether through the VFIO_DEVICE_SET_IRQS command or because the IRQ is
+currently being handled) nor active at interrupt controller level.
+Otherwise, -EAGAIN is returned. It is advised to set up forwarding before
+VFIO signaling is set up; this avoids trial and error.
+
+Unforwarding can happen at any time.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 8055706..1b78dd3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -945,6 +945,9 @@ struct kvm_device_attr {
 #define  KVM_DEV_VFIO_GROUP			1
 #define   KVM_DEV_VFIO_GROUP_ADD   1
 #define   KVM_DEV_VFIO_GROUP_DEL   2
+#define  KVM_DEV_VFIO_DEVICE   2
+#define   KVM_DEV_VFIO_DEVICE_FORWARD_IRQ  1
+#define   KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ	2
 
 enum kvm_device_type {
KVM_DEV_TYPE_FSL_MPIC_20	= 1,
@@ -964,6 +967,15 @@ enum kvm_device_type {
KVM_DEV_TYPE_MAX,
 };
 
+struct kvm_vfio_dev_irq {
+   __u32   argsz;  /* structure length */
+   __u32   fd; /* file descriptor of the VFIO device */
+   __u32   index;  /* VFIO device IRQ index */
+   __u32   start;  /* start of subindex range */
+   __u32   count;  /* size of subindex range */
+   __u32   gsi[];  /* gsi, ie. virtual IRQ number */
+};
+
 /*
  * ioctls for VM fds
  */
-- 
1.9.1
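The forwarding rule documented in this patch (refuse with -EAGAIN while the IRQ is masked or active, allow unforwarding at any time) can be sketched as a small state check. This is a userspace illustration under assumptions: struct fwd_irq_state, try_forward() and unforward() are hypothetical names, and the two booleans stand in for whatever state the real implementation queries.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of one VFIO IRQ; field names are illustrative. */
struct fwd_irq_state {
	bool vfio_masked;    /* masked via VFIO_DEVICE_SET_IRQS or while being handled */
	bool irqchip_active; /* still active at interrupt controller level */
	bool forwarded;
};

/* Setting forwarding is refused with -EAGAIN while the IRQ is masked or active. */
static int try_forward(struct fwd_irq_state *s)
{
	if (s->vfio_masked || s->irqchip_active)
		return -EAGAIN;
	s->forwarded = true;
	return 0;
}

/* Unforwarding is documented as possible at any time, so it cannot fail. */
static void unforward(struct fwd_irq_state *s)
{
	s->forwarded = false;
}
```

A caller that gets -EAGAIN would retry later, which is why the documentation recommends setting forwarding before VFIO signaling is configured.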


[RFC v5 03/13] VFIO: platform: single handler using function pointer

2015-03-19 Thread Eric Auger

A single handler is now registered regardless of the use case (automasked
or not). A function pointer is set according to the desired behavior,
and the root handler calls that function.

The IRQ lock is taken and released in the root handler. This is fine
because eventfd_signal can be called from contexts that are not allowed
to sleep.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v4: creation
---
 drivers/vfio/platform/vfio_platform_irq.c | 21 +++--
 drivers/vfio/platform/vfio_platform_private.h |  1 +
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 132bb3f..8eb65c1 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -147,11 +147,8 @@ static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev,
 static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
 {
struct vfio_platform_irq *irq_ctx = dev_id;
-   unsigned long flags;
int ret = IRQ_NONE;
 
-   spin_lock_irqsave(&irq_ctx->lock, flags);
-
if (!irq_ctx->masked) {
ret = IRQ_HANDLED;
 
@@ -160,8 +157,6 @@ static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
irq_ctx->masked = true;
}
 
-   spin_unlock_irqrestore(&irq_ctx->lock, flags);
-
if (ret == IRQ_HANDLED)
eventfd_signal(irq_ctx->trigger, 1);
 
@@ -177,6 +172,19 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
+static irqreturn_t vfio_handler(int irq, void *dev_id)
+{
+   struct vfio_platform_irq *irq_ctx = dev_id;
+   unsigned long flags;
+   irqreturn_t ret;
+
+   spin_lock_irqsave(&irq_ctx->lock, flags);
+   ret = irq_ctx->handler(irq, dev_id);
+   spin_unlock_irqrestore(&irq_ctx->lock, flags);
+
+   return ret;
+}
+
 static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
int fd, irq_handler_t handler)
 {
@@ -206,9 +214,10 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
}
 
irq->trigger = trigger;
+   irq->handler = handler;
 
irq_set_status_flags(irq->hwirq, IRQ_NOAUTOEN);
-   ret = request_irq(irq->hwirq, handler, 0, irq->name, irq);
+   ret = request_irq(irq->hwirq, vfio_handler, 0, irq->name, irq);
if (ret) {
kfree(irq->name);
eventfd_ctx_put(trigger);
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 5d31e04..eb91deb 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -37,6 +37,7 @@ struct vfio_platform_irq {
spinlock_t  lock;
struct virqfd   *unmask;
struct virqfd   *mask;
+   irqreturn_t (*handler)(int irq, void *dev_id);
 };
 
 struct vfio_platform_region {
-- 
1.9.1
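The dispatch scheme described in this patch, one root handler that takes the lock and calls a per-IRQ function pointer, can be sketched in plain C11. This is a userspace illustration, not the VFIO code: toy_spinlock_t, struct toy_irq and the toy_* handlers are hypothetical, the `signaled` counter stands in for eventfd_signal(), and irqreturn_t is redefined locally.

```c
#include <stdatomic.h>

/* Local stand-ins for the kernel's irqreturn_t values (simplified). */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

/* Minimal test-and-set spinlock standing in for spinlock_t. */
typedef struct { atomic_flag f; } toy_spinlock_t;

static void toy_spin_lock(toy_spinlock_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire))
		; /* spin until the holder clears the flag */
}

static void toy_spin_unlock(toy_spinlock_t *l)
{
	atomic_flag_clear_explicit(&l->f, memory_order_release);
}

struct toy_irq {
	toy_spinlock_t lock;
	int masked;
	int signaled; /* counts stand-in "eventfd_signal" calls */
	irqreturn_t (*handler)(int irq, void *dev_id);
};

/* Automasked flavour: mask and signal once; runs with the lock held. */
static irqreturn_t toy_automasked_handler(int irq, void *dev_id)
{
	struct toy_irq *ctx = dev_id;

	(void)irq;
	if (ctx->masked)
		return IRQ_NONE;
	ctx->masked = 1;
	ctx->signaled++;
	return IRQ_HANDLED;
}

/* Non-masking flavour: signal on every interrupt. */
static irqreturn_t toy_plain_handler(int irq, void *dev_id)
{
	struct toy_irq *ctx = dev_id;

	(void)irq;
	ctx->signaled++;
	return IRQ_HANDLED;
}

/* Single root handler: take the lock, dispatch through the pointer, release. */
static irqreturn_t toy_root_handler(int irq, void *dev_id)
{
	struct toy_irq *ctx = dev_id;
	irqreturn_t ret;

	toy_spin_lock(&ctx->lock);
	ret = ctx->handler(irq, dev_id);
	toy_spin_unlock(&ctx->lock);
	return ret;
}
```

Because the lock lives in the root handler, the per-flavour handlers no longer need their own locking, and a later patch can swap the function pointer without re-requesting the IRQ.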


[RFC v5 08/13] KVM: kvm-vfio: wrappers for vfio_external_{mask|is_active|set_automasked}

2015-03-19 Thread Eric Auger

These three new wrapper functions call the respective VFIO external
functions.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v4: creation
---
 include/linux/vfio.h |  8 +++-
 virt/kvm/vfio.c  | 44 
 2 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 7aa6330..78c1202 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -108,14 +108,12 @@ extern struct device *vfio_external_base_device(struct vfio_device *vdev);
 struct vfio_platform_device;
 extern void vfio_external_mask(struct vfio_platform_device *vdev, int index);
 /*
- * returns whether the VFIO IRQ is active:
- * true if not yet deactivated at interrupt controller level or if
- * automasked (level sensitive IRQ). Unfortunately there is no way to
- * discriminate between handler auto-masking and user-space masking
+ * returns whether the VFIO IRQ is active at interrupt controller level
+ * or VFIO-masked. Note that if user space masked the IRQ index, this
+ * cannot be distinguished from the automasked handler situation.
  */
 extern bool vfio_external_is_active(struct vfio_platform_device *vdev,
int index);
-
 extern void vfio_external_set_automasked(struct vfio_platform_device *vdev,
 int index, bool automasked);
 
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 80a45e4..c995e51 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -134,6 +134,50 @@ static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
kvm_vfio_device_put_external_user(vdev);
 }
 
+bool kvm_vfio_external_is_active(struct vfio_platform_device *vpdev,
+int index)
+{
+   bool (*fn)(struct vfio_platform_device *, int index);
+   bool active;
+
+   fn = symbol_get(vfio_external_is_active);
+   if (!fn)
+   return true; /* symbol unavailable: report the IRQ as active */
+
+   active = fn(vpdev, index);
+
+   symbol_put(vfio_external_is_active);
+   return active;
+}
+
+void kvm_vfio_external_mask(struct vfio_platform_device *vpdev,
+   int index)
+{
+   void (*fn)(struct vfio_platform_device *, int index);
+
+   fn = symbol_get(vfio_external_mask);
+   if (!fn)
+   return;
+
+   fn(vpdev, index);
+
+   symbol_put(vfio_external_mask);
+}
+
+void kvm_vfio_external_set_automasked(struct vfio_platform_device *vpdev,
+ int index, bool automasked)
+{
+   void (*fn)(struct vfio_platform_device *, int index, bool automasked);
+
+   fn = symbol_get(vfio_external_set_automasked);
+   if (!fn)
+   return;
+
+   fn(vpdev, index, automasked);
+
+   symbol_put(vfio_external_set_automasked);
+}
+
 static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
 {
long (*fn)(struct vfio_group *, unsigned long);
-- 
1.9.1
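The wrapper pattern in this patch (look the symbol up, fall back if the provider is missing, call it, then drop the reference) can be sketched without the module loader. struct toy_symbol, toy_symbol_get() and toy_symbol_put() are hypothetical stand-ins for symbol_get()/symbol_put(), and the value returned when the provider is missing is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

typedef bool (*is_active_fn)(int index);

/* Hypothetical stand-in for an exported symbol: a function pointer plus a
 * reference count. fn == NULL models "providing module not loaded". */
struct toy_symbol {
	is_active_fn fn;
	int refcnt;
};

/* Like symbol_get(): returns NULL if unavailable, otherwise pins the provider. */
static is_active_fn toy_symbol_get(struct toy_symbol *s)
{
	if (s->fn)
		s->refcnt++;
	return s->fn;
}

/* Like symbol_put(): drops the reference taken by toy_symbol_get(). */
static void toy_symbol_put(struct toy_symbol *s)
{
	s->refcnt--;
}

static bool toy_external_is_active(struct toy_symbol *s, int index)
{
	is_active_fn fn = toy_symbol_get(s);
	bool active;

	if (!fn)
		return true; /* assumption: treat a missing provider as "active" */

	active = fn(index);
	toy_symbol_put(s);
	return active;
}

/* Example provider: even indexes report active. */
static bool even_is_active(int index)
{
	return index % 2 == 0;
}
```

The reference is held only for the duration of the call, so the provider can be unloaded between calls; every successful get is paired with exactly one put.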


[RFC v5 10/13] kvm: introduce kvm_arch_halt_guest and kvm_arch_resume_guest

2015-03-19 Thread Eric Auger

This API allows the caller to:
- force the guest to exit and prevent it from being re-entered
- resume guest execution

Signed-off-by: Eric Auger eric.au...@linaro.org
---
 include/linux/kvm_host.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ae9c720..f4e1829 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1053,6 +1053,18 @@ extern struct kvm_device_ops kvm_xics_ops;
 extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
 extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
 
+#ifdef __KVM_HAVE_ARCH_HALT_GUEST
+
+void kvm_arch_halt_guest(struct kvm *kvm);
+void kvm_arch_resume_guest(struct kvm *kvm);
+
+#else
+
+static inline void kvm_arch_halt_guest(struct kvm *kvm) {}
+static inline void kvm_arch_resume_guest(struct kvm *kvm) {}
+
+#endif
+
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
 static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
-- 
1.9.1
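The halt/resume semantics (force the guest out and keep it from re-entering until resumed) can be sketched with a per-VM flag that the vCPU entry path checks. struct toy_vm and the toy_* functions are hypothetical; a real implementation would also kick running vCPUs out of guest mode, which this sketch only notes in a comment.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-VM state; not KVM's struct kvm. */
struct toy_vm {
	atomic_bool halted;  /* set by halt, cleared by resume */
	int guest_entries;   /* how many times a vCPU entered the guest */
};

static void toy_halt_guest(struct toy_vm *vm)
{
	atomic_store(&vm->halted, true);
	/* A real implementation would also force running vCPUs to exit here. */
}

static void toy_resume_guest(struct toy_vm *vm)
{
	atomic_store(&vm->halted, false);
}

/* One iteration of a vCPU run loop: refuse to (re-)enter while halted. */
static bool toy_vcpu_try_enter(struct toy_vm *vm)
{
	if (atomic_load(&vm->halted))
		return false;
	vm->guest_entries++;
	return true;
}
```

When the architecture does not define __KVM_HAVE_ARCH_HALT_GUEST, the header falls back to empty stubs, so generic callers can invoke halt/resume unconditionally.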


[RFC v5 00/13] KVM-VFIO IRQ forward control

2015-03-19 Thread Eric Auger

This series proposes an integration of "ARM: Forwarding physical
interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
KVM.

It allows forwarding to be set or unset for a VFIO platform device IRQ.

A forwarded IRQ is deactivated by the guest and not by the host.
When the guest deactivates the associated virtual IRQ, the interrupt
controller automatically completes the physical IRQ. Obviously
this requires some HW support in the interrupt controller. This is
the case for ARM GICv2.

The direct benefit is that, for a level sensitive IRQ, a VM exit
can be avoided on forwarded IRQ completion.

When the IRQ is forwarded, the VFIO platform driver no longer needs to
mask the physical IRQ before signaling the eventfd. Indeed, genirq
lowers the running priority, allowing other physical IRQs to hit,
except that one.

Injection is still based on irqfd triggering. The only impact on the
irqfd process is that resamplefd is no longer called on virtual IRQ
completion, since deactivation is not trapped by KVM.

The current integration is based on an extension of the KVM-VFIO
device, previously used by KVM to interact with VFIO groups. The
patch series now enables KVM to directly interact with a VFIO
platform device. The VFIO external API was extended for that purpose.

The IRQ forward programming is architecture specific (virtual interrupt
controller programming basically). However the whole infrastructure is
kept generic.

From a user point of view, the functionality is provided through a
new KVM-VFIO group named KVM_DEV_VFIO_DEVICE and two associated
attributes:
- KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
- KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.

The capability can be checked with KVM_HAS_DEVICE_ATTR.

Forwarding can only be set when the VFIO IRQ is neither active at the
physical level nor under injection into the guest (VFIO-masked).
Forwarding can be unset at any time.
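From user space, a FORWARD_IRQ request would be a struct kvm_device_attr whose addr points to a variable-length kvm_vfio_dev_irq. The sketch below only builds that request. It uses local mirrors of the structures (struct kvm_device_attr has this layout in <linux/kvm.h>; kvm_vfio_dev_irq and the two constants are taken from this RFC and are assumed not to exist in installed headers), and build_forward_attr() is a hypothetical helper.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of struct kvm_device_attr from <linux/kvm.h>. */
struct my_kvm_device_attr {
	uint32_t flags;
	uint32_t group;
	uint64_t attr;
	uint64_t addr;
};

/* Local mirror of the RFC's struct kvm_vfio_dev_irq (not in mainline headers). */
struct my_kvm_vfio_dev_irq {
	uint32_t argsz;  /* structure length, including gsi[] */
	uint32_t fd;     /* VFIO device file descriptor */
	uint32_t index;  /* VFIO device IRQ index */
	uint32_t start;  /* start of subindex range */
	uint32_t count;  /* size of subindex range */
	uint32_t gsi[];  /* one guest IRQ number per subindex */
};

#define RFC_KVM_DEV_VFIO_DEVICE             2
#define RFC_KVM_DEV_VFIO_DEVICE_FORWARD_IRQ 1

/* Build a FORWARD_IRQ request for `count` subindexes starting at `start`.
 * On success the caller owns *out_irq and must free() it. */
static int build_forward_attr(int vfio_device_fd, uint32_t index,
			      uint32_t start, uint32_t count,
			      const uint32_t *gsis,
			      struct my_kvm_device_attr *attr,
			      struct my_kvm_vfio_dev_irq **out_irq)
{
	size_t sz = sizeof(struct my_kvm_vfio_dev_irq) + count * sizeof(uint32_t);
	struct my_kvm_vfio_dev_irq *irq = calloc(1, sz);

	if (!irq)
		return -1;
	irq->argsz = (uint32_t)sz;
	irq->fd = (uint32_t)vfio_device_fd;
	irq->index = index;
	irq->start = start;
	irq->count = count;
	memcpy(irq->gsi, gsis, count * sizeof(uint32_t));

	memset(attr, 0, sizeof(*attr));
	attr->group = RFC_KVM_DEV_VFIO_DEVICE;
	attr->attr = RFC_KVM_DEV_VFIO_DEVICE_FORWARD_IRQ;
	attr->addr = (uint64_t)(uintptr_t)irq;
	*out_irq = irq;
	return 0;
}
```

The attribute would then be passed with ioctl(kvm_vfio_device_fd, KVM_SET_DEVICE_ATTR, &attr). KVM_HAS_DEVICE_ATTR with the same group and attr can be issued first to check for the capability, and an EAGAIN failure means the IRQ was masked or active at that moment.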

---

This patch series has the following dependencies:
- ARM: Forwarding physical interrupts to a guest VM
  (http://lwn.net/Articles/603514/)
- [PATCH v4 0/3] genirq: Saving/restoring the irqchip state of an irq line
- [RFC v2] chip/vgic adaptations for forwarded irq

Integrated pieces can be found at
ssh://git.linaro.org/people/eric.auger/linux.git
on branch vfio_integ_v11_kernel

This was tested on Calxeda Midway, assigning the xgmac main IRQ.
Unforward was tested by doing periodic forward/unforward with random
offsets, while using netcat traffic to make sure unforward often occurs
while the IRQ is in progress.

v4 - v5:
- fix arm64 compilation issues
  - arch/arm64/include/asm/kvm_host.h now defines
x __KVM_HAVE_ARCH_KVM_VFIO_FORWARD for arm64
x __KVM_HAVE_ARCH_HALT_GUEST
x and features pause renamed into power_off

v3 - v4:
- revert as RFC again due to lots of changes, extra complexity induced
  by new set/unset_forward implementation, and dependencies on RFC patches
- kvm_vfio_dev_irq struct is used at user level to pass the parameters
  to KVM-VFIO KVM_DEV_VFIO_DEVICE/KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. Shared
  with Intel posted IRQs.
- unforward now can happen any time with no constraint and cannot fail
- new VFIO platform external functions introduced:
  vfio_external_set_automasked, vfio_external_mask, vfio_external_is_active
- introduce a modality to force the guest to exit and prevent it from being
  re-entered, and rename the older ARM pause modality into power-off
  (related to PSCI power-off start)
- kvm_vfio_arm.c no longer exists. Architecture-specific code is moved into
  arm/gic.c. This code is not very VFIO dependent anymore, although
  some references still exist in comments.
- 2 separate architecture specific functions for set and unset (only one
  has a return value).

v2 - v3:
- kvm_fwd_irq_action enum replaced by a bool (KVM_VFIO_IRQ_CLEANUP does not
  exist anymore)
- a new struct local to vfio.c was introduced to wrap kvm_fwd_irq and make it
  linkable: kvm_vfio_fwd_irq_node
- kvm_fwd_irq now is self-contained (includes struct vfio_device *)
- a single list of kvm_vfio_fwd_irq_node is used instead of having
  a list of devices and a list of forwarded irqs per device. Having 2 lists
  brought extra complexity.
- the VFIO device ref counter is incremented each time a new IRQ is forwarded.
  It is not attempted anymore to hold a single reference whatever the number
  of forwarded IRQs.
- subindex added on top of index to be closer to VFIO API
- platform device check moved in the arm specific implementation
- enable the KVM-VFIO device for arm64
- forwarded state change can only happen while the VFIO IRQ handler is not
  set; in other words, when VFIO IRQ signaling is not set up.

v1 - v2:
- forward control is moved from architecture specific file into generic
  vfio.c module.
  only kvm_arch_set_fwd_state remains architecture specific
- integrate Kim's patch which enables KVM-VFIO for ARM
- fix vgic state bypass in vgic_queue_hwirq
- struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
  to include/uapi/linux/kvm.h

Re: [GSoC] project proposal

2015-03-19 Thread Catalin Vasile

I have submitted my application on the official GSoC site.
Do I also have to submit it on this discussion list or anywhere else?

On Wed, Mar 18, 2015 at 10:59 PM, Paolo Bonzini pbonz...@redhat.com wrote:


 On 18/03/2015 18:05, Catalin Vasile wrote:
 cryptodev is not merged into upstream from what I know.

 Yes, but QEMU runs on non-Linux platforms too.  Of course doing
 vhost+driver or gnutls+driver would be already more than enough for the
 summer.

 In any case, just put all the justification in your application.  Thanks
 for participating in QEMU's GSoC!

 Paolo

 gnutls can use cryptodev and AF_ALG as crypto engines.
 From some benchmarks (that can also be found on cryptodev's webpage)
 you can see AF_ALG has a lot of overhead over a standalone misc/char
 device.

 On Wed, Mar 18, 2015 at 6:42 PM, Paolo Bonzini pbonz...@redhat.com wrote:


 On 18/03/2015 17:01, Catalin Vasile wrote:
 To be more exact, I want to make a virtio-crypto device to emulate a
 virtual cryptographic offloading device that will send jobs from the
 guest to a vhost that will process the jobs. This mechanism will link
 CryptoAPI from the guest to the CryptoAPI from the host. This way,
 whatever is beneath CryptoAPI from the host will be used as
 offloading for the guest.
 Is there a mentor interested in getting involved in this kind of project?

 I think it's very likely that you'll find a mentor.  Please submit a
 proposal, also detailing the advantage of vhost over a userspace
 solution (using any of gnutls, AF_ALG, cryptodev).

 Paolo

Re: [PATCH] vfio: Split virqfd into a separate module for vfio bus drivers

2015-03-19 Thread Paul Bolle

On Thu, 2015-03-19 at 14:12 -0600, Alex Williamson wrote:
 Well, at some point when I was doing vfio it seemed like a good idea and
 I copied it from another driver.

For some time now I have this idea that there's a Linux kernel module
template somewhere that uses defines like the ones you're using. That
all started when I noticed drivers defining DRIVER_LICENSE. (I really
like the name of that define.) Because every driver defining it ends up
using it only once.

 Is it more valuable to remove a few lines of source code with no net
 effect on the resulting output?

We should discuss this from the opposite direction: why is this patch
adding a few lines with no obvious benefit?

 Besides, look at how much more aesthetically pleasing the above is
 versus this:
 
 MODULE_VERSION("0.1");
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Alex Williamson <alex.william...@redhat.com>");
 MODULE_DESCRIPTION("IRQFD support for VFIO bus drivers");
 
 ;)

I don't do smileys. Perhaps that's why I never know what to think when
someone uses them. Anyhow, sure, my comment is extremely trivial, but I
do think I should raise this point just once. Because, well, ...
because!


Paul Bolle


[PATCH] KVM: x86: inline kvm_ioapic_handles_vector()

2015-03-19 Thread Radim Krčmář

The overhead of a function call is not justified for a function this
small and this frequently executed.

Suggested-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
  I'm not very fond of that smp_rmb(): there is no real synchronization
  against update_handled_vectors(), so the only point I see is to drop the
  cached value of handled_vectors, which seems like a bad use of LFENCE.

  Am I missing something?

  Thanks.

 arch/x86/kvm/ioapic.c | 7 ---
 arch/x86/kvm/ioapic.h | 8 +++-
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 1522bab6bcff..f04986e1a0b0 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -475,13 +475,6 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
}
 }
 
-bool kvm_ioapic_handles_vector(struct kvm *kvm, int vector)
-{
-   struct kvm_ioapic *ioapic = kvm->arch.vioapic;
-   smp_rmb();
-   return test_bit(vector, ioapic->handled_vectors);
-}
-
 void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
 {
struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index 38d8402ea65c..6e265cfcd86a 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -98,13 +98,19 @@ static inline struct kvm_ioapic *ioapic_irqchip(struct kvm *kvm)
return kvm->arch.vioapic;
 }
 
+static inline bool kvm_ioapic_handles_vector(struct kvm *kvm, int vector)
+{
+   struct kvm_ioapic *ioapic = kvm->arch.vioapic;
+   smp_rmb();
+   return test_bit(vector, ioapic->handled_vectors);
+}
+
 void kvm_rtc_eoi_tracking_restore_one(struct kvm_vcpu *vcpu);
 bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
int short_hand, unsigned int dest, int dest_mode);
 int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2);
 void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector,
int trigger_mode);
-bool kvm_ioapic_handles_vector(struct kvm *kvm, int vector);
 int kvm_ioapic_init(struct kvm *kvm);
 void kvm_ioapic_destroy(struct kvm *kvm);
 int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
-- 
2.3.3
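Inlining turns kvm_ioapic_handles_vector() into a barrier plus a single bit test. A userspace analogue can be sketched in C11 under assumptions: struct toy_ioapic and the toy_* bit helpers are hypothetical, and atomic_thread_fence(memory_order_acquire) is used only as a rough stand-in for smp_rmb().

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_NR_VECTORS 256
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define TOY_BITMAP_LONGS \
	((TOY_NR_VECTORS + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/* Hypothetical IOAPIC state: one bit per interrupt vector. */
struct toy_ioapic {
	unsigned long handled_vectors[TOY_BITMAP_LONGS];
};

static inline void toy_set_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / TOY_BITS_PER_LONG] |= 1UL << (nr % TOY_BITS_PER_LONG);
}

static inline bool toy_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (addr[nr / TOY_BITS_PER_LONG] >> (nr % TOY_BITS_PER_LONG)) & 1UL;
}

/* The whole body is a fence plus one load and shift, far cheaper than a call. */
static inline bool toy_ioapic_handles_vector(const struct toy_ioapic *ioapic,
					     unsigned int vector)
{
	atomic_thread_fence(memory_order_acquire); /* rough stand-in for smp_rmb() */
	return toy_test_bit(vector, ioapic->handled_vectors);
}
```

Defining the helper as static inline in the header lets every caller in lapic.c get the body directly, which is the point of moving it out of ioapic.c.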


Re: [Qemu-devel] PCI passthrough of 40G ethernet interface (Openstack/KVM)

2015-03-19 Thread jacob jacob

I have updated to latest firmware and still no luck.


]# ethtool -i eth1
driver: i40e
version: 1.2.37
firmware-version: f4.33.31377 a1.2 n4.42 e1930
bus-info: :00:05.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes


Seeing similar results as before:
1) Everything works fine on the host (used i40e version 1.2.37 and dpdk 1.8.0)

2) In the VM, I tried both i40e driver version 1.2.37 and dpdk 1.8.0, and data
TX fails (I have two 40G interfaces passed through to a VM). I now see the
following error in the VM, which looks interesting:

[5.449672] i40e :00:06.0: f4.33.31377 a1.2 n4.42 e1930
[5.525061] i40e :00:06.0: FCoE capability is disabled
[5.528786] i40e :00:06.0: MAC address: 68:05:ca:2e:80:50
[5.534491] i40e :00:06.0: SAN MAC: 68:05:ca:2e:80:54
[5.544081] i40e :00:06.0: AQ Querying DCB configuration failed: aq_err 1
[5.545870] i40e :00:06.0: DCB init failed -53, disabled
[5.547462] i40e :00:06.0: fcoe queues = 0
[5.548970] i40e :00:06.0: irq 43 for MSI/MSI-X
[5.548987] i40e :00:06.0: irq 44 for MSI/MSI-X
[5.549012] i40e :00:06.0: irq 45 for MSI/MSI-X
[5.549028] i40e :00:06.0: irq 46 for MSI/MSI-X
[5.549044] i40e :00:06.0: irq 47 for MSI/MSI-X
[5.549059] i40e :00:06.0: irq 48 for MSI/MSI-X
[5.549074] i40e :00:06.0: irq 49 for MSI/MSI-X
[5.549089] i40e :00:06.0: irq 50 for MSI/MSI-X
[5.549103] i40e :00:06.0: irq 51 for MSI/MSI-X
[5.549117] i40e :00:06.0: irq 52 for MSI/MSI-X
[5.549132] i40e :00:06.0: irq 53 for MSI/MSI-X
[5.549146] i40e :00:06.0: irq 54 for MSI/MSI-X
[5.549160] i40e :00:06.0: irq 55 for MSI/MSI-X
[5.549174] i40e :00:06.0: irq 56 for MSI/MSI-X
[5.579062] i40e :00:06.0: enabling bridge mode: VEB
[5.615344] i40e :00:06.0: PHC enabled
[5.636028] i40e :00:06.0: PCI-Express: Speed 8.0GT/s Width x8
[5.639822] audit: type=1305 audit(1426797692.463:4): audit_pid=345
old=0 auid=4294967295 ses=4294967295
subj=system_u:system_r:auditd_t:s0 res=1
[5.651225] i40e :00:06.0: Features: PF-id[0] VFs: 128 VSIs:
130 QP: 4 RX: 1BUF RSS FD_ATR FD_SB NTUPLE PTP
[   12.720451] SELinux: initialized (dev tmpfs, type tmpfs), uses
transition SIDs
[   15.909477] SELinux: initialized (dev tmpfs, type tmpfs), uses
transition SIDs
[   61.553491] i40e :00:06.0 eth2: NIC Link is Down
[   61.554132] i40e :00:06.0 eth2: the driver failed to link
because an unqualified module was detected. 
[   61.555331] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready



With dpdk, I see the following output in the VM:

testpmd stop
Telling cores to stop...
Waiting for lcores to finish...

  -- Forward statistics for port 0  --
  RX-packets: 41328971   RX-dropped: 0 RX-total: 41328971
  TX-packets: 0  TX-dropped: 0 TX-total: 0
  

  -- Forward statistics for port 1  --
  RX-packets: 0  RX-dropped: 0 RX-total: 0
  TX-packets: 41328972   TX-dropped: 0 TX-total: 41328972
  

  +++ Accumulated forward statistics for all ports+++
  RX-packets: 41328971   RX-dropped: 0 RX-total: 41328971
  TX-packets: 41328972   TX-dropped: 0 TX-total: 41328972
  


Here it can be seen that one of the ports transmits just fine. I have
verified that it is not a card, PCI port or other hardware issue.

On Thu, Mar 19, 2015 at 4:15 AM, Stefan Assmann sassm...@redhat.com wrote:
 On 18.03.2015 23:06, Shannon Nelson wrote:
 On Wed, Mar 18, 2015 at 3:01 PM, Shannon Nelson
 shannon.nel...@intel.com wrote:


 On Wed, Mar 18, 2015 at 8:40 AM, jacob jacob opstk...@gmail.com wrote:

 On Wed, Mar 18, 2015 at 11:24 AM, Bandan Das b...@redhat.com wrote:

 Actually, Stefan suggests that support for this card is still sketchy
 and your best bet is to try out net-next
 http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git

 Also, could you please post more information about your hardware setup
 (chipset/processor/firmware version on the card etc) ?

 Host CPU : Model name:Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz

 Manufacturer Part Number:  XL710QDA1BLK
 Ethernet controller: Intel Corporation Ethernet Controller XL710 for
 40GbE QSFP+ (rev 01)
  #ethtool -i enp9s0
 driver: i40e
 version: 1.2.6-k
 firmware-version: f4.22 a1.1 n04.24 e800013fd
 bus-info: :09:00.0
 supports-statistics: yes
 supports-test: yes
 supports-eeprom-access: yes
 supports-register-dump: yes
 supports-priv-flags: no


 Jacob,

 It looks like you're using a NIC with the

Re: [PATCH 9/9] qspinlock,x86,kvm: Implement KVM support for paravirt qspinlock

2015-03-19 Thread Waiman Long


On 03/19/2015 06:01 AM, Peter Zijlstra wrote:

On Wed, Mar 18, 2015 at 10:45:55PM -0400, Waiman Long wrote:

On 03/16/2015 09:16 AM, Peter Zijlstra wrote:
I do have some concern about this call site patching mechanism as the
modification is not atomic. The spin_unlock() calls are in many places in
the kernel. There is a possibility that a thread is calling a certain
spin_unlock call site while it is being patched by another one with the
alternative() function call.

So far, I don't see any problem with bare metal where paravirt_patch_insns()
is used to patch it to the move instruction. However, in a virtual guest
environment where paravirt_patch_call() was used, there were situations
where the system panicked because of a page fault on some invalid memory in
the kthread. If you look at paravirt_patch_call(), you will see:

 :
b->opcode = 0xe8; /* call */
b->delta = delta;

If another CPU reads the instruction at the call site at the right moment,
it will get the modified call instruction, but not the new delta value. It
will then jump to a random location. I believe that was causing the system
panic that I saw.

So I think it is kind of risky to use it here unless we can guarantee that
call site patching is atomic wrt other CPUs.
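The race described above comes from the layout of an x86 near call: one opcode byte (0xe8) followed by a 32-bit displacement relative to the next instruction. A sketch of that encoding, under assumptions: struct call_insn (packed with a GCC/Clang attribute), call_delta() and patch_call_two_stores() are hypothetical, and the site address is just a number rather than real code memory. It writes the two fields with separate stores, as in the quoted paravirt_patch_call() fragment, which is what allows a concurrent reader to see the new opcode with a stale displacement.

```c
#include <stdint.h>

/* x86 near call: opcode 0xe8 followed by a signed 32-bit displacement that is
 * relative to the address of the next instruction (site + 5). */
struct __attribute__((packed)) call_insn {
	uint8_t opcode;
	int32_t delta;
};

static int32_t call_delta(uint64_t site, uint64_t target)
{
	int64_t next = (int64_t)(site + sizeof(struct call_insn));

	return (int32_t)((int64_t)target - next);
}

/* Two separate stores: a reader that fetches the bytes between them sees the
 * new opcode paired with whatever displacement bytes were there before. */
static void patch_call_two_stores(struct call_insn *b, uint64_t site,
				  uint64_t target)
{
	b->opcode = 0xe8;
	b->delta = call_delta(site, target);
}
```

As noted in the reply, this only matters if another CPU can execute the site while it is being written; patching done in start_kernel() before SMP bring-up does not have that exposure.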

Just look at where the patching is done:

init/main.c:start_kernel()
   check_bugs()
 alternative_instructions()
   apply_paravirt()

We're UP and not holding any locks, disable IRQs (see text_poke_early())
and have NMIs 'disabled'.


You are probably right. The initial apply_paravirt() was done before the 
SMP boot. Subsequent ones were at kernel module load time. I put a 
counter in the __native_queue_spin_unlock() and it registered 26949 
unlock calls in a 16-cpu guest before it got patched out.


The panic that I observed before might be due to some coding error of my 
own.


-Longman

[Bug 93251] qemu-kvm guests randomly hangs after reboot command in guest

2015-03-19 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=93251

Bandan Das b...@makefile.in changed:

   What|Removed |Added

 CC||b...@makefile.in

--- Comment #4 from Bandan Das b...@makefile.in ---
(In reply to Thomas Stein from comment #3)
...
 Is anyone working on this?
 
 thanks and best regards
 t.

Does reverting the commit suggested in comment 2 work for you?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [Qemu-devel] PCI passthrough of 40G ethernet interface (Openstack/KVM)

2015-03-19 Thread Shannon Nelson

On Thu, Mar 19, 2015 at 2:04 PM, jacob jacob opstk...@gmail.com wrote:
 I have updated to latest firmware and still no luck.

[...]

 [   61.554132] i40e :00:06.0 eth2: the driver failed to link
 because an unqualified module was detected. 
 [   61.555331] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready


So I assume you're getting traffic on the other port and it doesn't
complain about unqualified module?  Does the problem move if you
swap the cables?  The usual problem here is a QSFP connector that
isn't compatible with the NIC.  I don't have a pointer handy to the
official list, but you should be able to get that from your NIC
supplier.

sln

Re: [PATCH] KVM: x86: call irq notifiers with directed EOI

2015-03-19 Thread Marcelo Tosatti

On Wed, Mar 18, 2015 at 07:38:22PM +0100, Radim Krčmář wrote:
 kvm_ioapic_update_eoi() wasn't called if directed EOI was enabled.
 We need to do that for irq notifiers.  (Like with edge interrupts.)
 
 Fix it by skipping EOI broadcast only.
 
 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=82211
 Signed-off-by: Radim Krčmář rkrc...@redhat.com
 ---
  arch/x86/kvm/ioapic.c | 4 +++-
  arch/x86/kvm/lapic.c  | 3 +--
  2 files changed, 4 insertions(+), 3 deletions(-)
 
 diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
 index b1947e0f3e10..46d4449772bc 100644
 --- a/arch/x86/kvm/ioapic.c
 +++ b/arch/x86/kvm/ioapic.c
 @@ -422,6 +422,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
   struct kvm_ioapic *ioapic, int vector, int trigger_mode)
  {
   int i;
 + struct kvm_lapic *apic = vcpu->arch.apic;
  
   for (i = 0; i < IOAPIC_NUM_PINS; i++) {
   union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
 @@ -443,7 +444,8 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
   kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
   spin_lock(&ioapic->lock);
  
 - if (trigger_mode != IOAPIC_LEVEL_TRIG)
 + if (trigger_mode != IOAPIC_LEVEL_TRIG ||
 + kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
   continue;

Don't you have to handle kvm_ioapic_eoi_inject_work as well?

   ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);

This assert can now fail? 

 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index bd4e34de24c7..4ee827d7bf36 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -833,8 +833,7 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
  
  static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
  {
 - if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
 - kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
 + if (kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
   int trigger_mode;
   if (apic_test_vector(vector, apic->regs + APIC_TMR))
   trigger_mode = IOAPIC_LEVEL_TRIG;
 -- 
 2.3.3
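The behavior the patch aims for in __kvm_ioapic_update_eoi() can be summarized as a small decision function: irq notifiers always run, while the remaining level-triggered EOI handling is skipped for edge-triggered entries and when directed EOI is enabled in APIC_SPIV. This is a hypothetical sketch (eoi_decide(), struct eoi_actions and the TOY_* constants are not KVM names) and it does not model kvm_ioapic_eoi_inject_work or the ASSERT questioned in the review.

```c
#include <stdbool.h>

#define TOY_IOAPIC_EDGE_TRIG  0
#define TOY_IOAPIC_LEVEL_TRIG 1

struct eoi_actions {
	bool notify_acked; /* run irq notifiers (kvm_notify_acked_irq() in KVM) */
	bool level_eoi;    /* continue with the level-triggered EOI handling */
};

/* Per matching redirection entry, after the patch. */
static struct eoi_actions eoi_decide(int trigger_mode, bool directed_eoi)
{
	struct eoi_actions a = { .notify_acked = true, .level_eoi = false };

	/* Notifiers run unconditionally; only the level-triggered part is skipped. */
	if (trigger_mode != TOY_IOAPIC_LEVEL_TRIG || directed_eoi)
		return a;

	a.level_eoi = true;
	return a;
}
```

Before the patch, kvm_ioapic_send_eoi() returned early under directed EOI, so neither action happened; the fix moves the directed-EOI check past the notifier call.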
 

Re: [Qemu-devel] PCI passthrough of 40G ethernet interface (Openstack/KVM)

2015-03-19 Thread jacob jacob

On Thu, Mar 19, 2015 at 5:42 PM, Shannon Nelson
shannon.nel...@intel.com wrote:
 On Thu, Mar 19, 2015 at 2:04 PM, jacob jacob opstk...@gmail.com wrote:
 I have updated to latest firmware and still no luck.

 [...]

 [   61.554132] i40e :00:06.0 eth2: the driver failed to link because an unqualified module was detected.
 [   61.555331] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready


 So I assume you're getting traffic on the other port and it doesn't
 complain about unqualified module?  Does the problem move if you
 swap the cables?  The usual problem here is a QSFP connector that
 isn't compatible with the NIC.  I don't have a pointer handy to the
 official list, but you should be able to get that from your NIC
 supplier.

No. That is not the case.

The traffic counters are for the DPDK test run, which uses the igb_uio driver.

As I mentioned earlier, I have tried all combinations of cable, PCI slot,
card, etc. to isolate any hardware issues.

Re: [Qemu-devel] PCI passthrough of 40G ethernet interface (Openstack/KVM)

2015-03-19 Thread Shannon Nelson

On Thu, Mar 19, 2015 at 1:15 AM, Stefan Assmann sassm...@redhat.com wrote:
 Interesting, the following might explain why my XL710 feels a bit
 sketchy then. ;-)
 # ethtool -i p4p1
 driver: i40e
 version: 1.2.37-k
 firmware-version: f4.22.26225 a1.1 n4.24 e12ef
 Looks like the firmware on this NIC is even older.

 I tried to update the firmware with nvmupdate64e and the first thing I
 noticed is that you cannot update the firmware even with today's Linux
 git. The tool errors out because it cannot access the NVM. Only with a
 recent net-next kernel was I able to update the firmware.
 ethtool -i p4p1
 driver: i40e
 version: 1.2.37-k
 firmware-version: f4.33.31377 a1.2 n4.42 e1932

 However, during the update I got a lot of errors in dmesg.
 [  301.796664] i40e :82:00.0: ARQ Error: Unknown event 0x0702 received
 [  301.893933] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received
 [  302.005223] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received
 [...]
 [  387.884635] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received
 [  387.896862] i40e :82:00.0: ARQ Overflow Error detected
 [  387.902995] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received

These are bogus messages that should not have been generated; it seems
there's a case statement missing in the upstream code.  I'll get a
patch together for this later today.
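To illustrate the kind of bug described here, below is a hypothetical model of an admin receive queue (ARQ) event dispatcher; it is not the i40e source. The opcode values 0x0702 and 0x0703 are the ones in the log above, while reading them as NVM erase/update completions is an assumption. Any opcode without a matching case falls through to the "Unknown event" message.

```c
#include <stdint.h>
#include <stdio.h>

enum arq_result { ARQ_HANDLED, ARQ_UNKNOWN };

/*
 * Hypothetical model of an ARQ event dispatcher.
 * 0x0702 and 0x0703 are the opcodes from the log above; reading them as
 * NVM erase/update completions is an assumption.
 */
static enum arq_result arq_dispatch(uint16_t opcode)
{
	switch (opcode) {
	case 0x0702:	/* assumed: NVM erase completion */
	case 0x0703:	/* assumed: NVM update completion */
		/* Expected while an NVM update is running: handled quietly. */
		return ARQ_HANDLED;
	default:
		/* Without the two cases above, every NVM completion ends up here. */
		fprintf(stderr, "ARQ Error: Unknown event 0x%04x received\n",
			(unsigned int)opcode);
		return ARQ_UNKNOWN;
	}
}
```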

 [...]
 [  391.583799] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  391.714217] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  391.842656] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  391.973080] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  392.107586] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  392.244140] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  392.373966] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received

These are an annoying result of a user tool issue.


 Not sure if that flash was actually successful or not.

Since the driver was able to restart and report the new version
information through ethtool -i, the update was successful.  You should
not have to power-cycle your system after an NVM update, but it's a
good idea anyway, especially when updating from firmware as old as
yours.
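Since success here comes down to ethtool -i reporting a newer version after the update, the before/after strings quoted in this thread can be compared programmatically. A minimal sketch, assuming the f/a/n/e fields are the firmware, admin-queue API, NVM image, and EETRACK versions, with the last two printed in hex; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical container for the fields of an i40e "firmware-version" value. */
struct fw_version {
	unsigned fw_maj, fw_min, fw_build;	/* "f" field: firmware */
	unsigned api_maj, api_min;		/* "a" field: admin queue API */
	unsigned nvm_maj, nvm_min;		/* "n" field: NVM image, hex (assumed) */
	unsigned eetrack;			/* "e" field: EETRACK id, hex (assumed) */
};

/* Parse e.g. "f4.33.31377 a1.2 n4.42 e1932"; true only if all 8 fields match. */
static bool parse_fw_version(const char *s, struct fw_version *v)
{
	return sscanf(s, "f%u.%u.%u a%u.%u n%x.%x e%x",
		      &v->fw_maj, &v->fw_min, &v->fw_build,
		      &v->api_maj, &v->api_min,
		      &v->nvm_maj, &v->nvm_min, &v->eetrack) == 8;
}

/* True if the NVM image version in b is newer than the one in a. */
static bool nvm_is_newer(const struct fw_version *a, const struct fw_version *b)
{
	if (b->nvm_maj != a->nvm_maj)
		return b->nvm_maj > a->nvm_maj;
	return b->nvm_min > a->nvm_min;
}
```

Fed the two strings from Stefan's mail ("f4.22.26225 a1.1 n4.24 e12ef" before, "f4.33.31377 a1.2 n4.42 e1932" after), nvm_is_newer() reports the second image as newer.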

sln

Re: [PATCH 08/12] KVM: arm/arm64: implement kvm_io_bus MMIO handling for the VGIC

2015-03-19 Thread Andre Przywara

Hej Christoffer,

On 14/03/15 14:27, Christoffer Dall wrote:
 On Fri, Mar 13, 2015 at 04:10:08PM +, Andre Przywara wrote:
 Currently we use a lot of VGIC specific code to do the MMIO
 dispatching.
 Use the previous reworks to add kvm_io_bus style MMIO handlers.

 Those are not yet called by the MMIO abort handler, and the actual
 VGIC emulator functions do not make use of them yet; this will be
 enabled with the following patches.

 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  include/kvm/arm_vgic.h |   9
  virt/kvm/arm/vgic.c    | 111
  virt/kvm/arm/vgic.h    |   7 +++
  3 files changed, 127 insertions(+)

 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index b81630b..4bfc6a3 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -24,6 +24,7 @@
  #include <linux/irqreturn.h>
  #include <linux/spinlock.h>
  #include <linux/types.h>
 +#include <kvm/iodev.h>

  #define VGIC_NR_IRQS_LEGACY 256
  #define VGIC_NR_SGIS 16
 @@ -147,6 +148,14 @@ struct vgic_vm_ops {
  int (*map_resources)(struct kvm *, const struct vgic_params *);
  };

 +struct vgic_io_device {
 +gpa_t addr;
 +int len;
 +const struct vgic_io_range *reg_ranges;
 +struct kvm_vcpu *redist_vcpu;
 +struct kvm_io_device dev;
 +};
 +
  struct vgic_dist {
  spinlock_t  lock;
  bool        in_kernel;
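struct vgic_io_device embeds a struct kvm_io_device, so a callback that only receives a pointer to the embedded member can recover the outer device with container_of. vgic_handle_mmio_access does this further down in this patch before looking up the register range and rebasing the offset. Below is a userspace sketch of that pattern with stand-in types (io_device, io_range, my_io_device, find_range, demo_read, and dev_read are all hypothetical) and a local definition of container_of; the range check is modeled on, not copied from, vgic_find_range.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for struct kvm_io_device: the only thing the bus hands to callbacks. */
struct io_device { int dummy; };

/* Stand-in for struct vgic_io_range; tables end with an entry whose len is 0. */
struct io_range {
	uint64_t base, len;
	uint32_t (*read)(uint64_t offset);	/* offset is relative to base */
};

/* Mirrors the layout idea of struct vgic_io_device: generic part embedded. */
struct my_io_device {
	uint64_t addr;			/* guest physical base of the register frame */
	const struct io_range *ranges;
	struct io_device dev;		/* embedded, not a pointer */
};

/* Return the range that fully contains [offset, offset + len), or NULL. */
static const struct io_range *find_range(const struct io_range *r,
					 uint32_t len, uint64_t offset)
{
	for (; r->len; r++)
		if (offset >= r->base && offset + len <= r->base + r->len)
			return r;
	return NULL;
}

/* Demo handler: echoes the rebased offset so the rebasing is observable. */
static uint32_t demo_read(uint64_t offset)
{
	return 0x1000u + (uint32_t)offset;
}

/* Read callback: receives only the embedded member, recovers the outer device. */
static int dev_read(struct io_device *this, uint64_t addr, uint32_t len,
		    uint32_t *val)
{
	struct my_io_device *iodev = container_of(this, struct my_io_device, dev);
	uint64_t offset = addr - iodev->addr;	/* offset within the frame */
	const struct io_range *range = find_range(iodev->ranges, len, offset);

	if (!range || !range->read)
		return -1;			/* the patch returns -ENXIO here */
	*val = range->read(offset - range->base);	/* rebase, as the patch does */
	return 0;
}
```

The point of embedding rather than pointing is that the generic bus code never needs to know about the outer type; the outer device is recovered by fixed pointer arithmetic.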
 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index 7aae19b..71389b8 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -32,6 +32,8 @@
  #include <asm/kvm_arm.h>
  #include <asm/kvm_mmu.h>
  #include <trace/events/kvm.h>
 +#include <asm/kvm.h>
 +#include <kvm/iodev.h>

  /*
   * How the whole thing works (courtesy of Christoffer Dall):
 @@ -774,6 +776,66 @@ bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
  }

  /**
 + * vgic_handle_mmio_access - handle an in-kernel MMIO access
 + * This is called by the read/write KVM IO device wrappers below.
 + * @vcpu:   pointer to the vcpu performing the access
 + * @this:   pointer to the KVM IO device in charge
 + * @addr:   guest physical address of the access
 + * @len:        size of the access
 + * @val:        pointer to the data region
 + * @is_write:   read or write access
 + *
 + * returns true if the MMIO access could be performed
 + */
 +static int vgic_handle_mmio_access(struct kvm_vcpu *vcpu,
 +   struct kvm_io_device *this, gpa_t addr,
 +   int len, void *val, bool is_write)
 +{
 +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 +	struct vgic_io_device *iodev = container_of(this,
 +						    struct vgic_io_device, dev);
 +	struct kvm_run *run = vcpu->run;
 +	const struct vgic_io_range *range;
 +	struct kvm_exit_mmio mmio;
 +	bool updated_state;
 +	gpa_t offset;
 +
 +	offset = addr - iodev->addr;
 +	range = vgic_find_range(iodev->reg_ranges, len, offset);
 +	if (unlikely(!range || !range->handle_mmio)) {
 +		pr_warn("Unhandled access %d %08llx %d\n", is_write, addr, len);
 +		return -ENXIO;
 +	}
 +
 +	mmio.phys_addr = addr;
 +	mmio.len = len;
 +	mmio.is_write = is_write;
 +	if (is_write)
 +		memcpy(mmio.data, val, len);
 +	mmio.private = iodev->redist_vcpu;
 +
 +	spin_lock(&dist->lock);
 +	offset -= range->base;
 +	if (vgic_validate_access(dist, range, offset)) {
 +		updated_state = call_range_handler(vcpu, &mmio, offset, range);
 +		if (!is_write)
 +			memcpy(val, mmio.data, len);
 +	} else {
 +		if (!is_write)
 +			memset(val, 0, len);
 +		updated_state = false;
 +	}
 +	spin_unlock(&dist->lock);
 +	kvm_prepare_mmio(run, &mmio);

 we're not the only user of kvm_exit_mmio I believe, so we could rename

(assuming you mean we _are_ the only user here, which I can acknowledge)

 this to vgic_io as well and you could change the mmio.data array to be a
 void *val pointer, which just gets set to the pointer passed into this
 function (which I think points to the kvm_run struct's data array) and
 you can avoid all these memcopies, right?

That does sound tempting, but the comment on the struct kvm_exit_mmio
declaration reads:
/*
 * The in-kernel MMIO emulation code wants to use a copy of run->mmio,
 * which is an anonymous type. Use our own type instead.
 */
As I understand it, the structure was introduced so as _not_ to use the
same memory but a copy instead. Do you remember the reason for this? And
in what sense is this type anonymous? It's even in a uapi header.
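To make the trade-off concrete, the two designs can be contrasted in a userspace sketch: a private copy of the data (as the patch does with mmio.data) versus a pointer aliasing the caller's buffer (the void *val suggestion). The struct and function names are hypothetical, the 8-byte buffer size is an assumption, and the toy handler simply fills every byte with 0xab on reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Variant 1 (as in the patch): the access carries its own copy of the data. */
struct mmio_copy {
	uint64_t phys_addr;
	uint8_t data[8];	/* assumed size: enough for a 64-bit access */
	uint32_t len;
	bool is_write;
};

/* Variant 2 (the suggestion): the access points at the caller's buffer. */
struct mmio_alias {
	uint64_t phys_addr;
	void *val;		/* hypothetical field aliasing the caller's buffer */
	uint32_t len;
	bool is_write;
};

/* Toy register handlers: a read fills every byte with 0xab. */
static void handle_copy(struct mmio_copy *m)
{
	if (!m->is_write)
		memset(m->data, 0xab, m->len);
}

static void handle_alias(struct mmio_alias *m)
{
	if (!m->is_write)
		memset(m->val, 0xab, m->len);
}

/* Read path with a private copy: one extra memcpy back to the caller. */
static void read_via_copy(uint64_t addr, void *val, uint32_t len)
{
	struct mmio_copy m = { .phys_addr = addr, .len = len, .is_write = false };

	assert(len <= sizeof(m.data));
	handle_copy(&m);
	memcpy(val, m.data, len);	/* copy out, as the patch does for reads */
}

/* Read path with aliasing: the handler writes straight into val. */
static void read_via_alias(uint64_t addr, void *val, uint32_t len)
{
	struct mmio_alias m = { .phys_addr = addr, .val = val,
				.len = len, .is_write = false };

	handle_alias(&m);		/* no copy; val must outlive the access */
}
```

Both paths leave the same bytes in the caller's buffer. One possible reason for the copy, not confirmed in this thread, is that it keeps the handler independent of the lifetime and layout of the caller's buffer; the alias variant saves the memcpy but requires val to stay valid for the whole access.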

Looking briefly at the code, we do quite a few memcpys along the way.
I am about to go all the way down into that ARM MMIO handling cave now
to check this (Marc, if I don't show up again after a few hours,
please come and rescue me ;-)

Cheers,
Andre.
