Re: [PATCH -v5 6/5] context_tracking: fix exception_enter when already in IN_KERNEL

2015-02-12 Thread Rik van Riel

On 02/12/2015 12:00 PM, Frederic Weisbecker wrote:
 On Thu, Feb 12, 2015 at 10:47:10AM -0500, Rik van Riel wrote:
 
 On 02/12/2015 10:42 AM, Frederic Weisbecker wrote:
 On Wed, Feb 11, 2015 at 02:43:19PM -0500, Rik van Riel wrote:
 If exception_enter happens when already in IN_KERNEL state,
 the code still calls context_tracking_exit, which ends up in
  rcu_eqs_exit_common, which explodes with a WARN_ON when it
 is called in a situation where dynticks are not enabled.
 
 Fortunately context_tracking_exit() already has a current_state
 == IN_KERNEL check so this shouldn't be a problem.
 
 No, it had a hard-coded current_state == IN_USER check, which
 is very close, but ...
 
 ... I replaced that with a state argument, and forgot to ensure
 that it never gets called with state == IN_KERNEL. This patch
 fixes that.
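 
 For reference, the guarded call ends up as (sketch only, matching the
 patch quoted later in this digest):
 
	prev_ctx = this_cpu_read(context_tracking.state);
	if (prev_ctx != IN_KERNEL)
		/* only exit user/guest context; IN_KERNEL needs no exit */
		context_tracking_exit(prev_ctx);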
 
 Ah that's right! Well I'm going to merge this patch to 1/5 then to 
 avoid breaking bisection.

Thank you, Frederic!

-- 
All rights reversed


[Bug 92291] kvm/guest crashes when smp 1 with AMD FX8300; with host kernel oops from abrt as well

2015-02-12 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #10 from Mark kernelbugzilla.org.mark...@dfgh.net ---
well thanks very much guys, I'll cautiously say that a patched guest kernel
seems to resolve it :-)

the bug seemed to appear even when the host is untainted; serial log says
[  375.989736] divide error: 0000 [#1] SMP 

that's a Kubuntu partition that manages without the proprietary drivers while
Fedora won't seem to give graphics properly with nouveau;

not entirely satisfactory of course given that I've now had to modify a virtual
machine that it was important to try to avoid modifying for testing purposes;
although definitely better than crashing :-)

do we keep the bug report open so that the host kvm handling bug gets fixed too
/ is this the right place to report that?



[PATCH v2 3/4] KVM: x86: avoid logical_map when it is invalid

2015-02-12 Thread Radim Krčmář
We want to support mixed modes and the easiest solution is to avoid
optimizing those weird and unlikely scenarios.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 v2
 - optimize and name a check for valid map [Paolo]
 - don't use cluster id with invalid map [Paolo]
 - define KVM_APIC_MODE_* closer to kvm_apic_map
   (personal preference)
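
 The validity check added below is the usual power-of-two test: mode has
 at most one KVM_APIC_MODE_* bit set iff mode & (mode - 1) == 0.  Worked
 examples (values taken from the defines in this patch):

	mode = 0             ->  0 & -1  ==  0  -> valid (no APIC configured yet)
	mode = 8 (flat)      ->  8 &  7  ==  0  -> valid
	mode = 8|16 (mixed)  -> 24 & 23  == 16  -> invalid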

 arch/x86/include/asm/kvm_host.h |  5 +
 arch/x86/kvm/lapic.c| 24 +++-
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ea673e18b20f..63a4e03fbdcf 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -550,8 +550,13 @@ struct kvm_arch_memory_slot {
struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
 };
 
+#define KVM_APIC_MODE_XAPIC_CLUSTER  4
+#define KVM_APIC_MODE_XAPIC_FLAT 8
+#define KVM_APIC_MODE_X2APIC		16
+
 struct kvm_apic_map {
struct rcu_head rcu;
+   u8 mode;
u8 ldr_bits;
/* fields bellow are used to decode ldr values in different modes */
u32 cid_shift, cid_mask, lid_mask;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 729bd7714790..88e2bf3be235 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -133,6 +133,14 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
 }
 
+/* The logical map is definitely wrong if we have multiple
+ * modes at the same time.  (Physical map is always right.)
+ */
+static inline bool kvm_apic_logical_map_valid(struct kvm_apic_map *map)
+{
+   return !(map->mode & (map->mode - 1));
+}
+
 static void recalculate_apic_map(struct kvm *kvm)
 {
struct kvm_apic_map *new, *old = NULL;
@@ -162,16 +170,19 @@ static void recalculate_apic_map(struct kvm *kvm)
new->ldr_bits = 32;
new->cid_shift = 16;
new->cid_mask = new->lid_mask = 0xffff;
+   new->mode |= KVM_APIC_MODE_X2APIC;
} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
if (kvm_apic_get_reg(apic, APIC_DFR) ==
APIC_DFR_CLUSTER) {
new->cid_shift = 4;
new->cid_mask = 0xf;
new->lid_mask = 0xf;
+   new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
} else {
new->cid_shift = 8;
new->cid_mask = 0;
new->lid_mask = 0xff;
+   new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
}
}
 
@@ -198,6 +209,10 @@ static void recalculate_apic_map(struct kvm *kvm)
 
if (aid < ARRAY_SIZE(new->phys_map))
new->phys_map[aid] = apic;
+
+   if (!kvm_apic_logical_map_valid(new))
+   continue;
+
if (lid && cid < ARRAY_SIZE(new->logical_map))
new->logical_map[cid][ffs(lid) - 1] = apic;
}
@@ -715,7 +730,14 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
dst = &map->phys_map[irq->dest_id];
} else {
u32 mda = irq->dest_id << (32 - map->ldr_bits);
-   u16 cid = apic_cluster_id(map, mda);
+   u16 cid;
+
+   if (!kvm_apic_logical_map_valid(map)) {
+   ret = false;
+   goto out;
+   }
+
+   cid = apic_cluster_id(map, mda);
 
if (cid >= ARRAY_SIZE(map->logical_map))
goto out;
-- 
2.3.0



[PATCH v2 1/4] KVM: x86: use MDA for interrupt matching

2015-02-12 Thread Radim Krčmář
In mixed modes, we mustn't deliver xAPIC IPIs like x2APIC and vice versa.
Instead of preserving the information in apic_send_ipi(), we regain it
by converting all destinations into correct MDA in the slow path.
This allows easier reasoning about subsequent matching.

Our kvm_apic_broadcast() had an interesting design decision: it didn't
consider IOxAPIC 0xff as broadcast in x2APIC mode ...
everything worked because IOxAPIC can't set that in physical mode and
logical mode considered it as a message for first 8 VCPUs.
This patch interprets IOxAPIC 0xff as x2APIC broadcast.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 v2
 - accept two 'struct lapic *' in kvm_apic_mda() for nicer code [Paolo]
 - removed XXX comment about checking if x2APIC broadcast is accepted
   (it is, and the code also accepts any x2APIC message starting with
0xff, like v1, for simplicity.)
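
 To make the first quirk concrete: SET_APIC_DEST_FIELD() in apicdef.h is
 ((x) << 24), so an xAPIC destination of 0xff becomes MDA 0xff000000,
 while an x2APIC MDA is simply the 32-bit destination used as-is:

	mda = SET_APIC_DEST_FIELD(0xff);	/* xAPIC: 0xff000000 */
	mda = dest_id;				/* x2APIC: unchanged */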


 arch/x86/kvm/lapic.c | 40 +---
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 86609c15726f..4812bde5090c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -585,15 +585,23 @@ static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr)
apic_update_ppr(apic);
 }
 
-static bool kvm_apic_broadcast(struct kvm_lapic *apic, u32 dest)
+static bool kvm_apic_broadcast(struct kvm_lapic *apic, u32 mda)
 {
-   return dest == (apic_x2apic_mode(apic) ?
-   X2APIC_BROADCAST : APIC_BROADCAST);
+   if (apic_x2apic_mode(apic))
+   return mda == X2APIC_BROADCAST;
+
+   return GET_APIC_DEST_FIELD(mda) == APIC_BROADCAST;
 }
 
-static bool kvm_apic_match_physical_addr(struct kvm_lapic *apic, u32 dest)
+static bool kvm_apic_match_physical_addr(struct kvm_lapic *apic, u32 mda)
 {
-   return kvm_apic_id(apic) == dest || kvm_apic_broadcast(apic, dest);
+   if (kvm_apic_broadcast(apic, mda))
+   return true;
+
+   if (apic_x2apic_mode(apic))
+   return mda == kvm_apic_id(apic);
+
+   return mda == SET_APIC_DEST_FIELD(kvm_apic_id(apic));
 }
 
 static bool kvm_apic_match_logical_addr(struct kvm_lapic *apic, u32 mda)
@@ -610,6 +618,7 @@ static bool kvm_apic_match_logical_addr(struct kvm_lapic *apic, u32 mda)
(logical_id & mda & 0xffff) != 0;
 
logical_id = GET_APIC_LOGICAL_ID(logical_id);
+   mda = GET_APIC_DEST_FIELD(mda);
 
switch (kvm_apic_get_reg(apic, APIC_DFR)) {
case APIC_DFR_FLAT:
@@ -624,10 +633,27 @@ static bool kvm_apic_match_logical_addr(struct kvm_lapic *apic, u32 mda)
}
 }
 
+/* KVM APIC implementation has two quirks
+ *  - dest always begins at 0 while xAPIC MDA has offset 24,
+ *  - IOxAPIC messages have to be delivered (directly) to x2APIC.
+ */
+static u32 kvm_apic_mda(unsigned int dest_id, struct kvm_lapic *source,
+  struct kvm_lapic *target)
+{
+   bool ipi = source != NULL;
+   bool x2apic_mda = apic_x2apic_mode(ipi ? source : target);
+
+   if (!ipi && dest_id == APIC_BROADCAST && x2apic_mda)
+   return X2APIC_BROADCAST;
+
+   return x2apic_mda ? dest_id : SET_APIC_DEST_FIELD(dest_id);
+}
+
 bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
   int short_hand, unsigned int dest, int dest_mode)
 {
struct kvm_lapic *target = vcpu->arch.apic;
+   u32 mda = kvm_apic_mda(dest, source, target);
 
apic_debug("target %p, source %p, dest 0x%x, "
   "dest_mode 0x%x, short_hand 0x%x\n",
@@ -637,9 +663,9 @@ bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
switch (short_hand) {
case APIC_DEST_NOSHORT:
if (dest_mode == APIC_DEST_PHYSICAL)
-   return kvm_apic_match_physical_addr(target, dest);
+   return kvm_apic_match_physical_addr(target, mda);
else
-   return kvm_apic_match_logical_addr(target, dest);
+   return kvm_apic_match_logical_addr(target, mda);
case APIC_DEST_SELF:
return target == source;
case APIC_DEST_ALLINC:
-- 
2.3.0



[PATCH v2 4/4] KVM: x86: simplify kvm_apic_map

2015-02-12 Thread Radim Krčmář
recalculate_apic_map() uses two passes over all VCPUs.  This is a relic
from time when we selected a global mode in the first pass and set up
the optimized table in the second pass (to have a consistent mode).

Recent changes made mixed mode unoptimized and we can do it in one pass.
Format of logical MDA is a function of the mode, so we encode it in
apic_logical_id() and drop obsoleted variables from the struct.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 v2
 - use different optimization to shorten apic_logical_id [Paolo]
 - move apic_logical_id to lapic.c [Paolo]
 - keep apic map validity check separate from apic_logical_id
   (It can be done now as it's optimized in a different way.)
 - rename apic_logical_id parameter from ldr to dest_id
   (To avoid hinting that it has anything to do with real APIC.)
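
 The encoding trick: each KVM_APIC_MODE_* value doubles as the cluster
 shift and the bitmask width, so apic_logical_id() needs no per-mode
 branches.  A worked example, assuming xAPIC cluster mode (mode == 4)
 and dest_id == 0x2a:

	cid = 0x2a >> 4             = 0x2;	/* cluster 2 */
	lid = 0x2a & ((1 << 4) - 1) = 0xa;	/* CPUs 1 and 3 of the cluster */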

 arch/x86/include/asm/kvm_host.h |  3 --
 arch/x86/kvm/lapic.c| 75 ++---
 arch/x86/kvm/lapic.h| 15 -
 3 files changed, 25 insertions(+), 68 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 63a4e03fbdcf..c55dcaa5aead 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -557,9 +557,6 @@ struct kvm_arch_memory_slot {
 struct kvm_apic_map {
struct rcu_head rcu;
u8 mode;
-   u8 ldr_bits;
-   /* fields bellow are used to decode ldr values in different modes */
-   u32 cid_shift, cid_mask, lid_mask;
struct kvm_lapic *phys_map[256];
/* first index is cluster id second is cpu id in a cluster */
struct kvm_lapic *logical_map[16][16];
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 88e2bf3be235..3661f79642c9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -141,6 +141,17 @@ static inline bool kvm_apic_logical_map_valid(struct kvm_apic_map *map)
return !(map->mode & (map->mode - 1));
 }
 
+static inline void
+apic_logical_id(struct kvm_apic_map *map, u32 dest_id, u16 *cid, u16 *lid)
+{
+   BUILD_BUG_ON(KVM_APIC_MODE_XAPIC_CLUSTER !=  4);
+   BUILD_BUG_ON(KVM_APIC_MODE_XAPIC_FLAT    !=  8);
+   BUILD_BUG_ON(KVM_APIC_MODE_X2APIC        != 16);
+
+   *cid = dest_id >> map->mode;
+   *lid = dest_id & ((1 << map->mode) - 1);
+}
+
 static void recalculate_apic_map(struct kvm *kvm)
 {
struct kvm_apic_map *new, *old = NULL;
@@ -154,49 +165,6 @@ static void recalculate_apic_map(struct kvm *kvm)
if (!new)
goto out;
 
-   new->ldr_bits = 8;
-   /* flat mode is default */
-   new->cid_shift = 8;
-   new->cid_mask = 0;
-   new->lid_mask = 0xff;
-
-   kvm_for_each_vcpu(i, vcpu, kvm) {
-   struct kvm_lapic *apic = vcpu->arch.apic;
-
-   if (!kvm_apic_present(vcpu))
-   continue;
-
-   if (apic_x2apic_mode(apic)) {
-   new->ldr_bits = 32;
-   new->cid_shift = 16;
-   new->cid_mask = new->lid_mask = 0xffff;
-   new->mode |= KVM_APIC_MODE_X2APIC;
-   } else if (kvm_apic_get_reg(apic, APIC_LDR)) {
-   if (kvm_apic_get_reg(apic, APIC_DFR) ==
-   APIC_DFR_CLUSTER) {
-   new->cid_shift = 4;
-   new->cid_mask = 0xf;
-   new->lid_mask = 0xf;
-   new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
-   } else {
-   new->cid_shift = 8;
-   new->cid_mask = 0;
-   new->lid_mask = 0xff;
-   new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
-   }
-   }
-
-   /*
-* All APICs have to be configured in the same mode by an OS.
-* We take advatage of this while building logical id loockup
-* table. After reset APICs are in software disabled mode, so if
-* we find apic with different setting we assume this is the mode
-* OS wants all apics to be in; build lookup table accordingly.
-*/
-   if (kvm_apic_sw_enabled(apic))
-   break;
-   }
-
kvm_for_each_vcpu(i, vcpu, kvm) {
struct kvm_lapic *apic = vcpu->arch.apic;
u16 cid, lid;
@@ -204,15 +172,25 @@ static void recalculate_apic_map(struct kvm *kvm)
 
aid = kvm_apic_id(apic);
ldr = kvm_apic_get_reg(apic, APIC_LDR);
-   cid = apic_cluster_id(new, ldr);
-   lid = apic_logical_id(new, ldr);
 
if (aid < ARRAY_SIZE(new->phys_map))
new->phys_map[aid] = apic;
 
-   if (!kvm_apic_logical_map_valid(new))
+   if (apic_x2apic_mode(apic)) {
+  

[PATCH v2 2/4] KVM: x86: fix mixed APIC mode broadcast

2015-02-12 Thread Radim Krčmář
Broadcast allowed only one global APIC mode, but mixed modes are
theoretically possible.  x2APIC IPI doesn't mean 0xff as broadcast,
the rest does.

x2APIC broadcasts are accepted by xAPIC.  If we take SDM to be logical,
even addresses beginning with 0xff should be accepted, but real hardware
disagrees.  This patch aims for simple code by considering most of real
behavior as undefined.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 v2
 - don't deliver all phys messages beginning with 0xff to xAPIC
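
 For reference, the two broadcast constants being compared are, from
 arch/x86/kvm/lapic.h:

	#define APIC_BROADCAST			0xFF
	#define X2APIC_BROADCAST		0xFFFFFFFFul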

 arch/x86/include/asm/kvm_host.h | 2 +-
 arch/x86/kvm/lapic.c| 9 -
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a236e39cc385..ea673e18b20f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -554,7 +554,7 @@ struct kvm_apic_map {
struct rcu_head rcu;
u8 ldr_bits;
/* fields bellow are used to decode ldr values in different modes */
-   u32 cid_shift, cid_mask, lid_mask, broadcast;
+   u32 cid_shift, cid_mask, lid_mask;
struct kvm_lapic *phys_map[256];
/* first index is cluster id second is cpu id in a cluster */
struct kvm_lapic *logical_map[16][16];
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4812bde5090c..729bd7714790 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -151,7 +151,6 @@ static void recalculate_apic_map(struct kvm *kvm)
new->cid_shift = 8;
new->cid_mask = 0;
new->lid_mask = 0xff;
-   new->broadcast = APIC_BROADCAST;
 
kvm_for_each_vcpu(i, vcpu, kvm) {
struct kvm_lapic *apic = vcpu->arch.apic;
@@ -163,7 +162,6 @@ static void recalculate_apic_map(struct kvm *kvm)
new->ldr_bits = 32;
new->cid_shift = 16;
new->cid_mask = new->lid_mask = 0xffff;
-   new->broadcast = X2APIC_BROADCAST;
} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
if (kvm_apic_get_reg(apic, APIC_DFR) ==
APIC_DFR_CLUSTER) {
@@ -687,6 +685,7 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic **dst;
int i;
bool ret = false;
+   bool x2apic_ipi = src && apic_x2apic_mode(src);
 
*r = -1;
 
@@ -698,15 +697,15 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
if (irq->shorthand)
return false;
 
+   if (irq->dest_id == (x2apic_ipi ? X2APIC_BROADCAST : APIC_BROADCAST))
+   return false;
+
rcu_read_lock();
map = rcu_dereference(kvm->arch.apic_map);
 
if (!map)
goto out;
 
-   if (irq->dest_id == map->broadcast)
-   goto out;
-
ret = true;
 
if (irq->dest_mode == APIC_DEST_PHYSICAL) {
-- 
2.3.0



[PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode)

2015-02-12 Thread Radim Krčmář
Each patch has a diff from v1, here is only a prologue on the mythical
mixed xAPIC and x2APIC mode:

There is one interesting alias in xAPIC and x2APIC ICR destination, the
0xff000000, which is a broadcast in xAPIC and either a physical message
to a high x2APIC ID or a message to an empty set in x2APIC cluster mode.

This corner case in mixed mode can be triggered by a weird message
 1) when we have no x2APIC ID 0xff000000, but send an x2APIC message there

or after something that isn't forbidden in SDM most likely because they
didn't think that anyone would ever consider it
 2) we have x2APIC ID 0xff000000 and reset other x2APICs into xAPIC mode

Current KVM doesn't need to consider (2), so there is only a slim chance
that some hobbyist OS pushes the specification to its limits.

The problem is that SDM for xAPIC doesn't believably specify[1] if all
messages beginning with 0xff are considered as broadcasts, 10.6.2.1:
  "A broadcast IPI (bits 28-31 of the MDA are 1's)"

No volunteer came to clear this up, so I hacked Linux to have one x2APIC
core between xAPIC ones.  Physical IPI to 0xff000000 got delivered only
to CPU0, like most other destinations, Logical IPI to 0xff000000 got
dropped, and only 0xffffffff worked as a broadcast in both modes.

I think it is because Intel never considered mixed mode to be valid, and
the delivery we saw might be an emergent feature of QPI routing.
Luckily, broadcast from xAPIC isn't delivered to x2APIC.

Real exploration would require greater modifications to Linux (ideally
writing a custom kernel), so this series implements something that makes
some sense and isn't too far from reality.


Radim Krčmář (4):
  KVM: x86: use MDA for interrupt matching
  KVM: x86: fix mixed APIC mode broadcast
  KVM: x86: avoid logical_map when it is invalid
  KVM: x86: simplify kvm_apic_map

 arch/x86/include/asm/kvm_host.h |   8 ++-
 arch/x86/kvm/lapic.c| 138 +++-
 arch/x86/kvm/lapic.h|  15 -
 3 files changed, 85 insertions(+), 76 deletions(-)

-- 
2.3.0



[PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T
Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
prev = *lock;
add_smp(&lock->tickets.head, TICKET_LOCK_INC);

/* add_smp() is a full mb() */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
__ticket_unlock_slowpath(lock, prev);


which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the add_smp() and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg).
2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus).
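
In other words, the unlock fast path becomes a single xadd whose return
value tells us whether a kick is needed; roughly (sketch reconstructed
from the description above, not the verbatim patch):

	__ticket_t head = xadd(&lock->tickets.head, TICKET_LOCK_INC);
	if (unlikely(head & TICKET_SLOWPATH_FLAG)) {
		head &= ~TICKET_SLOWPATH_FLAG;
		__ticket_unlock_kick(lock, head + TICKET_LOCK_INC);
	}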

Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
 benchmark overcommit %improve
 kernbench  1x   -0.13
 kernbench  2x0.02
 dbench 1x   -1.77
 dbench 2x   -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]

Reported-by: Sasha Levin sasha.le...@oracle.com
Suggested-by: Linus Torvalds torva...@linux-foundation.org
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/spinlock.h | 87 -
 arch/x86/kernel/kvm.c   |  4 +-
 2 files changed, 45 insertions(+), 46 deletions(-)

potential TODO:
 * Should the whole patch be split into: 1. move slowpath flag,
   2. fix memory corruption in completion problem?
 * Maybe we could directly pass inc for __ticket_check_and_clear_slowpath,
   but I hope the current code is more readable.

Changes since V2:
  - Move the slowpath flag to head, this enables xadd usage in unlock code
and in turn we can get rid of read/write after unlock (Oleg)
  - usage of ticket_equals (Oleg)

Changes since V1:
  - Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: 
Jeremy).
  - Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy)
  - clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison.
 Note: The current implementation is still based on avoiding writes after
  unlock; we could still have a potential invalid memory read. (Sasha)

 Result: 
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)

base = 3_19_rc7

3_19_rc7_spinfix_v3
+---+---+---++---+
 kernbench (Time taken in sec lower is better)
+---+---+---++---+
 base   %stdevpatched  %stdev  %improve
+---+---+---++---+
1x   54.2300   3.0652   54.3008   4.0366   -0.13056
2x   90.1883   5.5509   90.1650   6.4336    0.02583
+---+---+---++---+
+---+---+---++---+
dbench (Throughput higher is better) 
+---+---+---++---+
 base   %stdevpatched  %stdev  %improve
+---+---+---++---+
1x   7029.9188    2.5952   6905.0712    4.4737   -1.77595
2x   3254.2075   14.8291   3233.7137   26.8784   -0.62976
+---+---+---++---+

 (here is the result I got from the patches; I believe there may
 be some small overhead from xadd etc., but overall it looks fine,
 though a thorough test may be needed)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 625660f..9697b45 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -46,7 +46,7 @@ static __always_inline bool static_key_false(struct static_key *key);
 
 static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
-   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
+   set_bit(0, (volatile unsigned long *)&lock->tickets.head);
 }
 
 #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
@@ -60,10 +60,30 @@ static inline void __ticket_unlock_kick(arch_spinlock_t *lock,
 }
 
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
+static inline int  __tickets_equal(__ticket_t one, __ticket_t two)
+{
+   return !((one ^ two) & ~TICKET_SLOWPATH_FLAG);
+}
+
+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock,
+   __ticket_t head)
+{
+   if (head & TICKET_SLOWPATH_FLAG) {
+   arch_spinlock_t old, new;
+
+   old.tickets.head = head;
+   new.tickets.head = head & ~TICKET_SLOWPATH_FLAG;
+   old.tickets.tail = new.tickets.head + TICKET_LOCK_INC;
+   new.tickets.tail = old.tickets.tail;
+
+   /* try to clear slowpath flag when there are no contenders */

[PATCH kvm-unit-test] tscdeadline_latency: add option to exit on max latency

2015-02-12 Thread Marcelo Tosatti

Add option to exit in case latency exceeds a given maximum. 

It's useful to test host latency without the complexity of a
guest.

Example command line:

echo 1 > /sys/kernel/debug/tracing/tracing_on; taskset -c 3 qemu-kvm
-enable-kvm -device testdev,chardev=testlog -chardev
file,id=testlog,path=msr.out -display none -serial stdio -kernel
x86/tscdeadline_latency.flat -cpu host -append "20 1 120050" >
/tmp/a; echo 0 > /sys/kernel/debug/tracing/tracing_on

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/x86/tscdeadline_latency.c b/x86/tscdeadline_latency.c
index 9f255c0..14ab8c8 100644
--- a/x86/tscdeadline_latency.c
+++ b/x86/tscdeadline_latency.c
@@ -9,6 +9,17 @@
  * hist(latency, 50)
  */
 
+/*
+ * for host tracing of breakmax option:
+ *
+ * # cd /sys/kernel/debug/tracing/
+ * # echo x86-tsc > trace_clock
+ * # echo "kvm_exit kvm_entry kvm_msr" > set_event
+ * # echo "sched_switch $extratracepoints" >> set_event
+ * # echo apic_timer_fn > set_ftrace_filter
+ * # echo function > current_tracer
+ */
+
 #include libcflat.h
 #include apic.h
 #include vm.h
@@ -37,6 +48,8 @@ int delta;
#define TABLE_SIZE 10000
 u64 table[TABLE_SIZE];
 volatile int table_idx;
+volatile int hitmax = 0;
+int breakmax = 0;
 
 static void tsc_deadline_timer_isr(isr_regs_t *regs)
 {
@@ -46,6 +59,12 @@ static void tsc_deadline_timer_isr(isr_regs_t *regs)
if (table_idx < TABLE_SIZE && tdt_count > 1)
 table[table_idx++] = now - exptime;
 
+if (breakmax && tdt_count > 1 && (now - exptime) > breakmax) {
+hitmax = 1;
+apic_write(APIC_EOI, 0);
+return;
+}
+
 exptime = now+delta;
 wrmsr(MSR_IA32_TSCDEADLINE, now+delta);
 apic_write(APIC_EOI, 0);
@@ -97,16 +116,21 @@ int main(int argc, char **argv)
 mask_pic_interrupts();
 
 delta = argc <= 1 ? 200000 : atol(argv[1]);
-size = argc <= 2 ? TABLE_SIZE : atol(argv[1]);
+size = argc <= 2 ? TABLE_SIZE : atol(argv[2]);
+breakmax = argc <= 3 ? 0 : atol(argv[3]);
printf("breakmax=%d\n", breakmax);
 test_tsc_deadline_timer();
 irq_enable();
 
 do {
asm volatile("hlt");
-} while (table_idx < size);
+} while (!hitmax && table_idx < size);

-for (i = 0; i < size; i++)
+for (i = 0; i < table_idx; i++) {
+if (hitmax && i == table_idx-1)
+printf("hit max: %d  ", breakmax);
printf("latency: %d\n", table[i]);
+}
 
 return report_summary();
 }


[PATCH kvm-unit-test v2] tscdeadline_latency: add option to exit on max latency

2015-02-12 Thread Marcelo Tosatti

v2: fix typo

---

Add option to exit in case latency exceeds a given maximum.

Its useful to test host latency without the complexity of a
guest.

Example command line:

echo 1 > /sys/kernel/debug/tracing/tracing_on; taskset -c 3 qemu-kvm
-enable-kvm -device testdev,chardev=testlog -chardev
file,id=testlog,path=msr.out -display none -serial stdio -kernel
x86/tscdeadline_latency.flat -cpu host -append "20 1 120050" >
/tmp/a; echo 0 > /sys/kernel/debug/tracing/tracing_on

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/x86/tscdeadline_latency.c b/x86/tscdeadline_latency.c
index 9f255c0..14ab8c8 100644
--- a/x86/tscdeadline_latency.c
+++ b/x86/tscdeadline_latency.c
@@ -9,6 +9,17 @@
  * hist(latency, 50)
  */
 
+/*
+ * for host tracing of breakmax option:
+ *
+ * # cd /sys/kernel/debug/tracing/
+ * # echo x86-tsc > trace_clock
+ * # echo "kvm_exit kvm_entry kvm_msr" > set_event
+ * # echo "sched_switch $extratracepoints" >> set_event
+ * # echo apic_timer_fn > set_ftrace_filter
+ * # echo function > current_tracer
+ */
+
 #include libcflat.h
 #include apic.h
 #include vm.h
@@ -37,6 +48,8 @@ int delta;
#define TABLE_SIZE 10000
 u64 table[TABLE_SIZE];
 volatile int table_idx;
+volatile int hitmax = 0;
+int breakmax = 0;
 
 static void tsc_deadline_timer_isr(isr_regs_t *regs)
 {
@@ -46,6 +59,12 @@ static void tsc_deadline_timer_isr(isr_regs_t *regs)
if (table_idx < TABLE_SIZE && tdt_count > 1)
 table[table_idx++] = now - exptime;
 
+if (breakmax && tdt_count > 1 && (now - exptime) > breakmax) {
+hitmax = 1;
+apic_write(APIC_EOI, 0);
+return;
+}
+
 exptime = now+delta;
 wrmsr(MSR_IA32_TSCDEADLINE, now+delta);
 apic_write(APIC_EOI, 0);
@@ -97,16 +116,21 @@ int main(int argc, char **argv)
 mask_pic_interrupts();
 
 delta = argc <= 1 ? 200000 : atol(argv[1]);
-size = argc <= 2 ? TABLE_SIZE : atol(argv[1]);
+size = argc <= 2 ? TABLE_SIZE : atol(argv[2]);
+breakmax = argc <= 3 ? 0 : atol(argv[3]);
+printf("breakmax=%d\n", breakmax);
 test_tsc_deadline_timer();
 irq_enable();
 
 do {
asm volatile("hlt");
-} while (table_idx < size);
+} while (!hitmax && table_idx < size);

-for (i = 0; i < size; i++)
+for (i = 0; i < table_idx; i++) {
+if (hitmax && i == table_idx-1)
+printf("hit max: %d  ", breakmax);
printf("latency: %d\n", table[i]);
+}
 
 return report_summary();
 }



Google Summer of Code application submitted

2015-02-12 Thread Stefan Hajnoczi
Hi,
I have submitted our Google Summer of Code application for QEMU,
libvirt, and KVM.

Accepted organizations will be announced on March 2nd at 19:00 UTC on
http://google-melange.com/.

You can still add project ideas if you wish to mentor:
http://qemu-project.org/Google_Summer_of_Code_2015

Mentors must be active contributors and willing to commit 5 hours per
week during the summer.

Please let me know if you have any questions.

Stefan


VFIO iommu page size masking

2015-02-12 Thread Alexander Graf
Hi Alex,

While trying to get VFIO-PCI working on AArch64 (with 64k page size), I
stumbled over the following piece of code:

 static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
 {
 struct vfio_domain *domain;
 unsigned long bitmap = PAGE_MASK;
 
 mutex_lock(&iommu->lock);
 list_for_each_entry(domain, &iommu->domain_list, next)
 bitmap &= domain->domain->ops->pgsize_bitmap;
 mutex_unlock(&iommu->lock);
 
 return bitmap;
 }

The SMMU page mask is

[3.054302] arm-smmu e0a0.smmu:  Supported page sizes: 0x40201000

but after this function, we end up supporting only 2MB pages and above.
The reason for that is simple: You restrict the bitmap to PAGE_MASK and
above.
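
(Concretely: with 64k pages PAGE_MASK is ~0xffff, so 0x40201000 & PAGE_MASK
== 0x40200000; the 4k bit is dropped and only the 2M and 1G sizes survive.)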

Now the big question is why you're doing that. I don't see why it would
be a problem if the IOMMU maps a page in smaller chunks.

So I tried to patch the code above with s/PAGE_MASK/1UL/ and everything
seems to run fine. But maybe we're now lacking some sanity checks?


Alex


Re: arm: warning at virt/kvm/arm/vgic.c:1468

2015-02-12 Thread Christoffer Dall
Hi Jan,

On Sun, Feb 08, 2015 at 08:48:09AM +0100, Jan Kiszka wrote:
 Hi,
 
 after fixing the VM_BUG_ON, my QEMU guest on the Jetson TK1 generally
 refuses to boot. Once in a while it does, but quickly gets stuck again.
 In one case I found this in the kernel log (never happened again so
 far):
 
 [  762.022874] WARNING: CPU: 1 PID: 972 at 
 ../arch/arm/kvm/../../../virt/kvm/arm/vgic.c:1468 
 kvm_vgic_sync_hwstate+0x314/0x344()
 [  762.022884] Modules linked in:
 [  762.022902] CPU: 1 PID: 972 Comm: qemu-system-arm Not tainted 
 3.19.0-rc7-00221-gfd7a168-dirty #13
 [  762.022911] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
 [  762.022937] [c0025d80] (unwind_backtrace) from [c00215d8] 
 (show_stack+0x10/0x14)
 [  762.022958] [c00215d8] (show_stack) from [c065f494] 
 (dump_stack+0x98/0xd8)
 [  762.022976] [c065f494] (dump_stack) from [c0035f38] 
 (warn_slowpath_common+0x80/0xb0)
 [  762.022991] [c0035f38] (warn_slowpath_common) from [c0036004] 
 (warn_slowpath_null+0x1c/0x24)
 [  762.023007] [c0036004] (warn_slowpath_null) from [c001c3c4] 
 (kvm_vgic_sync_hwstate+0x314/0x344)
 [  762.023024] [c001c3c4] (kvm_vgic_sync_hwstate) from [c00147c0] 
 (kvm_arch_vcpu_ioctl_run+0x210/0x400)
 [  762.023041] [c00147c0] (kvm_arch_vcpu_ioctl_run) from [c001063c] 
 (kvm_vcpu_ioctl+0x2e4/0x6ec)
 [  762.023059] [c001063c] (kvm_vcpu_ioctl) from [c01098a4] 
 (do_vfs_ioctl+0x40c/0x600)
 [  762.023076] [c01098a4] (do_vfs_ioctl) from [c0109acc] 
 (SyS_ioctl+0x34/0x5c)
 [  762.023091] [c0109acc] (SyS_ioctl) from [c001e320] 
 (ret_fast_syscall+0x0/0x34)

so this means your guest caused a maintenance interrupt and the bit is
set in the GICH_EISR for the LR in question but the link register state
is not 0, which is in direct violation of the GIC spec.  Hmmm.
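
(For context: GICv2 only sets an EISR bit for a list register whose state
field is 00 (invalid) with EOI requested and HW clear, which is why a
non-zero LR state there triggers the WARN.)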

You're not doing any IRQ forwarding stuff or device passthrough here are
you?

 
 
 BTW, KVM tracing support on ARM seems like it requires some care. E.g.:
 kvm_exit does not report an exit reason. The in-kernel vgic also seems
 to lack instrumentation. Unfortunate. Tracing is usually the first stop
 when KVM is stuck on a guest.

I know, the exit reason is on my todo list, and Alex B is sitting on
trace patches for the gic.  Coming soon to a git repo near you.

-Christoffer




Re: [PATCH] ARM: KVM: Fix size check in __coherent_cache_guest_page

2015-02-12 Thread Christoffer Dall
On Sat, Feb 07, 2015 at 10:21:20PM +0100, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com
 
 The check is supposed to catch page-unaligned sizes, not the inverse.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Thanks, applied.

-Christoffer




Re: arm: warning at virt/kvm/arm/vgic.c:1468

2015-02-12 Thread Jan Kiszka
Hi Christoffer,

On 2015-02-13 05:46, Christoffer Dall wrote:
 Hi Jan,
 
 On Sun, Feb 08, 2015 at 08:48:09AM +0100, Jan Kiszka wrote:
 Hi,

 after fixing the VM_BUG_ON, my QEMU guest on the Jetson TK1 generally
 refuses to boot. Once in a while it does, but quickly gets stuck again.
 In one case I found this in the kernel log (never happened again so
 far):

 [  762.022874] WARNING: CPU: 1 PID: 972 at 
 ../arch/arm/kvm/../../../virt/kvm/arm/vgic.c:1468 
 kvm_vgic_sync_hwstate+0x314/0x344()
 [  762.022884] Modules linked in:
 [  762.022902] CPU: 1 PID: 972 Comm: qemu-system-arm Not tainted 
 3.19.0-rc7-00221-gfd7a168-dirty #13
 [  762.022911] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
 [  762.022937] [c0025d80] (unwind_backtrace) from [c00215d8] 
 (show_stack+0x10/0x14)
 [  762.022958] [c00215d8] (show_stack) from [c065f494] 
 (dump_stack+0x98/0xd8)
 [  762.022976] [c065f494] (dump_stack) from [c0035f38] 
 (warn_slowpath_common+0x80/0xb0)
 [  762.022991] [c0035f38] (warn_slowpath_common) from [c0036004] 
 (warn_slowpath_null+0x1c/0x24)
 [  762.023007] [c0036004] (warn_slowpath_null) from [c001c3c4] 
 (kvm_vgic_sync_hwstate+0x314/0x344)
 [  762.023024] [c001c3c4] (kvm_vgic_sync_hwstate) from [c00147c0] 
 (kvm_arch_vcpu_ioctl_run+0x210/0x400)
 [  762.023041] [c00147c0] (kvm_arch_vcpu_ioctl_run) from [c001063c] 
 (kvm_vcpu_ioctl+0x2e4/0x6ec)
 [  762.023059] [c001063c] (kvm_vcpu_ioctl) from [c01098a4] 
 (do_vfs_ioctl+0x40c/0x600)
 [  762.023076] [c01098a4] (do_vfs_ioctl) from [c0109acc] 
 (SyS_ioctl+0x34/0x5c)
 [  762.023091] [c0109acc] (SyS_ioctl) from [c001e320] 
 (ret_fast_syscall+0x0/0x34)
 
 so this means your guest caused a maintenance interrupt and the bit is
 set in the GICH_EISR for the LR in question but the link register state
 is not 0, which is in direct violation of the GIC spec.  Hmmm.
 
 You're not doing any IRQ forwarding stuff or device passthrough here are
 you?

No, just boring emulation. The command line is

qemu-system-arm -machine vexpress-a15 -kernel zImage -serial mon:stdio
-append 'console=ttyAMA0 root=/dev/mmcblk0 rw' -snapshot -sd
OpenSuse13-1_arm.img -dtb vexpress-v2p-ca15-tc1.dtb -s -enable-kvm

 


 BTW, KVM tracing support on ARM seems like it requires some care. E.g.:
 kvm_exit does not report an exit reason. The in-kernel vgic also seems
 to lack instrumentation. Unfortunate. Tracing is usually the first stop
 when KVM is stuck on a guest.
 
 I know, the exit reason is on my todo list, and Alex B is sitting on
 trace patches for the gic.  Coming soon to a git repo near you.

Cool, looking forward.

Next thing I noticed is that guest debugging via qemu causes troubles in
kvm mode. For some reason, qemu is unable to write soft-breakpoints,
thus not even a single-step works. Also known?

Jan






Re: arm: warning at virt/kvm/arm/vgic.c:1468

2015-02-12 Thread Alex Bennée

Alex Bennée alex.ben...@linaro.org writes:

 Christoffer Dall christoffer.d...@linaro.org writes:
snip
 On Sun, Feb 08, 2015 at 08:48:09AM +0100, Jan Kiszka wrote:
snip
 BTW, KVM tracing support on ARM seems like it requires some care. E.g.:
 kvm_exit does not report an exit reason. The in-kernel vgic also seems
 to lack instrumentation. Unfortunate. Tracing is usually the first stop
 when KVM is stuck on a guest.

 I know, the exit reason is on my todo list, and Alex B is sitting on
 trace patches for the gic.  Coming soon to a git repo near your.

 For the impatient the raw patches are in:

 git.linaro.org/people/alex.bennee/linux.git
 migration/v3.19-rc7-improve-tracing

OK try tracing/kvm-exit-entry for something cleaner.

-- 
Alex Bennée


[PATCH V4] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T
Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
prev = *lock;
add_smp(&lock->tickets.head, TICKET_LOCK_INC);

/* add_smp() is a full mb() */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
__ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the add_smp() and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg):
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock.  Note that keeping the slowpath
flag set would result in unnecessary kicks.
By moving the slowpath flag from the tail to the head ticket we also avoid
the need to access both the head and tail tickets on unlock.

2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus):
We further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.
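
Since the flag now lives in the head ticket, every head/tail comparison
has to ignore that bit; that is what the __tickets_equal() helper from
the diff (quoted in full in the V3 posting earlier in this digest) does:

	static inline int  __tickets_equal(__ticket_t one, __ticket_t two)
	{
		/* equal modulo the slowpath bit */
		return !((one ^ two) & ~TICKET_SLOWPATH_FLAG);
	}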

Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
 benchmark overcommit %improve
 kernbench  1x   -0.13
 kernbench  2x0.02
 dbench 1x   -1.77
 dbench 2x   -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]
[PeterZ: Detailed changelog]

Reported-by: Sasha Levin sasha.le...@oracle.com
Suggested-by: Linus Torvalds torva...@linux-foundation.org
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/spinlock.h | 91 -
 arch/x86/kernel/kvm.c   | 10 +++--
 arch/x86/xen/spinlock.c | 10 +++--
 3 files changed, 56 insertions(+), 55 deletions(-)

potential TODO:
 * Should the whole patch be split into: 1. move slowpath flag,
   2. fix memory corruption in completion problem?

Changes since V3:
  - Detailed changelog (PeterZ)
  - Replace ACCESS_ONCE with READ_ONCE (oleg)
  - Add xen changes (Oleg)
  - Correct break logic in unlock_wait() (Oleg)

Changes since V2:
  - Move the slowpath flag to head, this enables xadd usage in unlock code
and in turn we can get rid of read/write after unlock (Oleg)
  - usage of ticket_equals (Oleg)

Changes since V1:
  - Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: 
Jeremy).
  - Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy)
  - clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison.
 Note: The current implementation is still based on avoiding writes after
  unlock; we could still have a potential invalid memory read. (Sasha)

 Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
base = 3_19_rc7

3_19_rc7_spinfix_v3
+---+---+---++---+
 kernbench (Time taken in sec lower is better)
+---+---+---++---+
 base   %stdevpatched  %stdev  %improve
+---+---+---++---+
1x   54.2300   3.0652   54.3008   4.0366   -0.13056
2x   90.1883   5.5509   90.1650   6.4336    0.02583
+---+---+---++---+
+---+---+---++---+
dbench (Throughput higher is better)
+---+---+---++---+
 base   %stdevpatched  %stdev  %improve
+---+---+---++---+
1x   7029.9188    2.5952   6905.0712    4.4737   -1.77595
2x   3254.2075   14.8291   3233.7137   26.8784   -0.62976
+---+---+---++---+

 (here is the result I got from the patches; I believe there may
 be some small overhead from xadd etc., but overall it looks fine,
 though a thorough test may be needed)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 625660f..646a1a3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -46,7 +46,7 @@ static __always_inline bool static_key_false(struct static_key *key);
 
 static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
-   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
+   set_bit(0, (volatile unsigned long *)&lock->tickets.head);
 }
 
 #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
@@ -60,10 +60,30 @@ static inline 

Re: arm: warning at virt/kvm/arm/vgic.c:1468

2015-02-12 Thread Alex Bennée

Christoffer Dall christoffer.d...@linaro.org writes:

 Hi Jan,

 On Sun, Feb 08, 2015 at 08:48:09AM +0100, Jan Kiszka wrote:
 Hi,
 
 after fixing the VM_BUG_ON, my QEMU guest on the Jetson TK1 generally
 refuses to boot. Once in a while it does, but quickly gets stuck again.
 In one case I found this in the kernel log (never happened again so
 far):
 
 [  762.022874] WARNING: CPU: 1 PID: 972 at 
 ../arch/arm/kvm/../../../virt/kvm/arm/vgic.c:1468 
 kvm_vgic_sync_hwstate+0x314/0x344()
 [  762.022884] Modules linked in:
 [  762.022902] CPU: 1 PID: 972 Comm: qemu-system-arm Not tainted 
 3.19.0-rc7-00221-gfd7a168-dirty #13
 [  762.022911] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
 [  762.022937] [c0025d80] (unwind_backtrace) from [c00215d8] 
 (show_stack+0x10/0x14)
 [  762.022958] [c00215d8] (show_stack) from [c065f494] 
 (dump_stack+0x98/0xd8)
 [  762.022976] [c065f494] (dump_stack) from [c0035f38] 
 (warn_slowpath_common+0x80/0xb0)
 [  762.022991] [c0035f38] (warn_slowpath_common) from [c0036004] 
 (warn_slowpath_null+0x1c/0x24)
 [  762.023007] [c0036004] (warn_slowpath_null) from [c001c3c4] 
 (kvm_vgic_sync_hwstate+0x314/0x344)
 [  762.023024] [c001c3c4] (kvm_vgic_sync_hwstate) from [c00147c0] 
 (kvm_arch_vcpu_ioctl_run+0x210/0x400)
 [  762.023041] [c00147c0] (kvm_arch_vcpu_ioctl_run) from [c001063c] 
 (kvm_vcpu_ioctl+0x2e4/0x6ec)
 [  762.023059] [c001063c] (kvm_vcpu_ioctl) from [c01098a4] 
 (do_vfs_ioctl+0x40c/0x600)
 [  762.023076] [c01098a4] (do_vfs_ioctl) from [c0109acc] 
 (SyS_ioctl+0x34/0x5c)
 [  762.023091] [c0109acc] (SyS_ioctl) from [c001e320] 
 (ret_fast_syscall+0x0/0x34)

 so this means your guest caused a maintenance interrupt and the bit is
 set in the GICH_EISR for the LR in question but the link register state
  is not 0, which is in direct violation of the GIC spec.  Hmmm.

 You're not doing any IRQ forwarding stuff or device passthrough here are
 you?

 
 
 BTW, KVM tracing support on ARM seems like it requires some care. E.g.:
 kvm_exit does not report an exit reason. The in-kernel vgic also seems
 to lack instrumentation. Unfortunate. Tracing is usually the first stop
 when KVM is stuck on a guest.

 I know, the exit reason is on my todo list, and Alex B is sitting on
  trace patches for the gic.  Coming soon to a git repo near you.

For the impatient the raw patches are in:

git.linaro.org/people/alex.bennee/linux.git
migration/v3.19-rc7-improve-tracing

But I'll be cleaning the tracing ones up and separating them from the
rest over the next few days.


 -Christoffer

-- 
Alex Bennée


Re: arm: warning at virt/kvm/arm/vgic.c:1468

2015-02-12 Thread Christoffer Dall
On Fri, Feb 13, 2015 at 07:21:20AM +0100, Jan Kiszka wrote:
 Hi Christoffer,
 
 On 2015-02-13 05:46, Christoffer Dall wrote:
  Hi Jan,
  
  On Sun, Feb 08, 2015 at 08:48:09AM +0100, Jan Kiszka wrote:
  Hi,
 
  after fixing the VM_BUG_ON, my QEMU guest on the Jetson TK1 generally
  refuses to boot. Once in a while it does, but quickly gets stuck again.
  In one case I found this in the kernel log (never happened again so
  far):
 
  [  762.022874] WARNING: CPU: 1 PID: 972 at 
  ../arch/arm/kvm/../../../virt/kvm/arm/vgic.c:1468 
  kvm_vgic_sync_hwstate+0x314/0x344()
  [  762.022884] Modules linked in:
  [  762.022902] CPU: 1 PID: 972 Comm: qemu-system-arm Not tainted 
  3.19.0-rc7-00221-gfd7a168-dirty #13
  [  762.022911] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
  [  762.022937] [c0025d80] (unwind_backtrace) from [c00215d8] 
  (show_stack+0x10/0x14)
  [  762.022958] [c00215d8] (show_stack) from [c065f494] 
  (dump_stack+0x98/0xd8)
  [  762.022976] [c065f494] (dump_stack) from [c0035f38] 
  (warn_slowpath_common+0x80/0xb0)
  [  762.022991] [c0035f38] (warn_slowpath_common) from [c0036004] 
  (warn_slowpath_null+0x1c/0x24)
  [  762.023007] [c0036004] (warn_slowpath_null) from [c001c3c4] 
  (kvm_vgic_sync_hwstate+0x314/0x344)
  [  762.023024] [c001c3c4] (kvm_vgic_sync_hwstate) from [c00147c0] 
  (kvm_arch_vcpu_ioctl_run+0x210/0x400)
  [  762.023041] [c00147c0] (kvm_arch_vcpu_ioctl_run) from [c001063c] 
  (kvm_vcpu_ioctl+0x2e4/0x6ec)
  [  762.023059] [c001063c] (kvm_vcpu_ioctl) from [c01098a4] 
  (do_vfs_ioctl+0x40c/0x600)
  [  762.023076] [c01098a4] (do_vfs_ioctl) from [c0109acc] 
  (SyS_ioctl+0x34/0x5c)
  [  762.023091] [c0109acc] (SyS_ioctl) from [c001e320] 
  (ret_fast_syscall+0x0/0x34)
  
  so this means your guest caused a maintenance interrupt and the bit is
  set in the GICH_EISR for the LR in question but the link register state
   is not 0, which is in direct violation of the GIC spec.  Hmmm.
  
  You're not doing any IRQ forwarding stuff or device passthrough here are
  you?
 
 No, just boring emulation. The command line is
 
  qemu-system-arm -machine vexpress-a15 -kernel zImage -serial mon:stdio
 -append 'console=ttyAMA0 root=/dev/mmcblk0 rw' -snapshot -sd
 OpenSuse13-1_arm.img -dtb vexpress-v2p-ca15-tc1.dtb -s -enable-kvm
 
  
 
 
  BTW, KVM tracing support on ARM seems like it requires some care. E.g.:
  kvm_exit does not report an exit reason. The in-kernel vgic also seems
  to lack instrumentation. Unfortunate. Tracing is usually the first stop
  when KVM is stuck on a guest.
  
  I know, the exit reason is on my todo list, and Alex B is sitting on
   trace patches for the gic.  Coming soon to a git repo near you.
 
 Cool, looking forward.
 
 Next thing I noticed is that guest debugging via qemu causes troubles in
 kvm mode. For some reason, qemu is unable to write soft-breakpoints,
 thus not even a single-step works. Also known?
 
Yes, Alex Bennee is working on this.

-Christoffer




Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Oleg Nesterov
On 02/12, Raghavendra K T wrote:

  @@ -191,8 +189,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
    * We need to check "unlocked" in a loop, tmp.head == head
    * can be false positive because of overflow.
    */
  - if (tmp.head == (tmp.tail & ~TICKET_SLOWPATH_FLAG) ||
 - tmp.head != head)
 + if (__tickets_equal(tmp.head, tmp.tail) || tmp.head != head)
   break;

Ah, it seems that tmp.head != head should be turned into
!__tickets_equal(), no?

Suppose that TICKET_SLOWPATH_FLAG is set after the first ACCESS_ONCE(head),
then tmp.head != head will be true before the first unlock we are waiting
for.
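
IOW the corrected wait loop would read (sketch):

	for (;;) {
		struct __raw_tickets tmp = READ_ONCE(lock->tickets);
		/* unlocked, or the head we snapshotted has moved on */
		if (__tickets_equal(tmp.head, tmp.tail) ||
		    !__tickets_equal(tmp.head, head))
			break;
		cpu_relax();
	}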

And perhaps you can turn these ACCESS_ONCE into READ_ONCE as well.

Oleg.



Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T

On 02/12/2015 07:20 PM, Oleg Nesterov wrote:

On 02/12, Raghavendra K T wrote:


@@ -191,8 +189,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
 * We need to check unlocked in a loop, tmp.head == head
 * can be false positive because of overflow.
 */
-   if (tmp.head == (tmp.tail & ~TICKET_SLOWPATH_FLAG) ||
-   tmp.head != head)
+   if (__tickets_equal(tmp.head, tmp.tail) || tmp.head != head)
break;


Ah, it seems that tmp.head != head should be turned into
!__tickets_equal(), no?

Suppose that TICKET_SLOWPATH_FLAG is set after the first ACCESS_ONCE(head),
then tmp.head != head will be true before the first unlock we are waiting
for.


Good catch. Otherwise we would wrongly break out even when somebody
does halt wait.



And perhaps you can turn these ACCESS_ONCE into READ_ONCE as well.


Yes again :)



Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Oleg Nesterov
Damn, sorry for noise, forgot to mention...

On 02/12, Raghavendra K T wrote:

 +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock,
 + __ticket_t head)
 +{
 + if (head & TICKET_SLOWPATH_FLAG) {
 + arch_spinlock_t old, new;
 +
 + old.tickets.head = head;
 + new.tickets.head = head & ~TICKET_SLOWPATH_FLAG;
 + old.tickets.tail = new.tickets.head + TICKET_LOCK_INC;
 + new.tickets.tail = old.tickets.tail;
 +
 + /* try to clear slowpath flag when there are no contenders */
 + cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
 + }
 +}

...

 +clear_slowpath:
 + if (TICKET_SLOWPATH_FLAG)
 + __ticket_check_and_clear_slowpath(lock, inc.head);

I think you can remove this if (TICKET_SLOWPATH_FLAG) check. If it is
defined as zero, gcc should optimize out the code under if (head & 0).

But this is purely cosmetic and subjective, feel free to ignore.

Oleg.



Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T

On 02/12/2015 07:07 PM, Oleg Nesterov wrote:

On 02/12, Raghavendra K T wrote:


@@ -772,7 +773,8 @@ __visible void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 * check again make sure it didn't become free while
 * we weren't looking.
 */
-   if (ACCESS_ONCE(lock->tickets.head) == want) {
+   head = ACCESS_ONCE(lock->tickets.head);
+   if (__tickets_equal(head, want)) {
add_stats(TAKEN_SLOW_PICKUP, 1);


While at it, perhaps it makes sense to s/ACCESS_ONCE/READ_ONCE/ but this
is cosmetic.


yes.. will do.



We also need to change another user of enter_slow_path, xen_lock_spinning()
in arch/x86/xen/spinlock.c.



Had it in mind but forgot before sending. Will update and resend.


Other than that looks correct at first glance... but this is up to
maintainers.



Thanks



Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T

On 02/12/2015 07:32 PM, Oleg Nesterov wrote:

Damn, sorry for noise, forgot to mention...

On 02/12, Raghavendra K T wrote:


+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock,
+   __ticket_t head)
+{
+   if (head & TICKET_SLOWPATH_FLAG) {
+   arch_spinlock_t old, new;
+
+   old.tickets.head = head;
+   new.tickets.head = head & ~TICKET_SLOWPATH_FLAG;
+   old.tickets.tail = new.tickets.head + TICKET_LOCK_INC;
+   new.tickets.tail = old.tickets.tail;
+
+   /* try to clear slowpath flag when there are no contenders */
+   cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
+   }
+}


...


+clear_slowpath:
+   if (TICKET_SLOWPATH_FLAG)
+   __ticket_check_and_clear_slowpath(lock, inc.head);


I think you can remove this if (TICKET_SLOWPATH_FLAG) check. If it is
defined as zero, gcc should optimize out the code under if (head & 0).



right, the above if ( ) is unnecessary; though we would end up with the
same code, getting rid of it makes the code cleaner.



Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Oleg Nesterov
On 02/12, Raghavendra K T wrote:

@@ -772,7 +773,8 @@ __visible void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
* check again make sure it didn't become free while
* we weren't looking.
*/
 - if (ACCESS_ONCE(lock->tickets.head) == want) {
 + head = ACCESS_ONCE(lock->tickets.head);
 + if (__tickets_equal(head, want)) {
   add_stats(TAKEN_SLOW_PICKUP, 1);

While at it, perhaps it makes sense to s/ACCESS_ONCE/READ_ONCE/ but this
is cosmetic.

We also need to change another user of enter_slow_path, xen_lock_spinning()
in arch/x86/xen/spinlock.c.

Other than that looks correct at first glance... but this is up to
maintainers.

Oleg.



Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Peter Zijlstra
On Thu, Feb 12, 2015 at 05:17:27PM +0530, Raghavendra K T wrote:
 Paravirt spinlock clears slowpath flag after doing unlock.
 As explained by Linus currently it does:
 prev = *lock;
 add_smp(&lock->tickets.head, TICKET_LOCK_INC);
 
 /* add_smp() is a full mb() */
 
 if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
 __ticket_unlock_slowpath(lock, prev);
 
 
 which is *exactly* the kind of things you cannot do with spinlocks,
 because after you've done the add_smp() and released the spinlock
 for the fast-path, you can't access the spinlock any more.  Exactly
 because a fast-path lock might come in, and release the whole data
 structure.
 
 Linus suggested that we should not do any writes to lock after unlock(),
 and we can move slowpath clearing to fastpath lock.
 
 So this patch implements the fix with:
 1. Moving slowpath flag to head (Oleg).
 2. use xadd to avoid read/write after unlock that checks the need for
 unlock_kick (Linus).

Maybe spend a few more words explaining these things; something like:

Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, as long as we clear it again on the first
(try)lock -- this removes the write after unlock.

By moving the slowpath flag from the tail to the head ticket we avoid
the need to access both the head and tail tickets on unlock.

We can further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.

Its 'obvious' now, but maybe not so much after we've all not looked at
this for a few months.


Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T

On 02/12/2015 08:30 PM, Peter Zijlstra wrote:

On Thu, Feb 12, 2015 at 05:17:27PM +0530, Raghavendra K T wrote:

[...]


Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg).
2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus).


Maybe spend a few more words explaining these things; something like:

Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, as long as we clear it again on the first
(try)lock -- this removes the write after unlock.


Nit: I'll reword this a bit, since a stale slowpath flag only results in
an unnecessary kick and is otherwise harmless IMO.
Something like:
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock
-- this removes the write after unlock.  Note that keeping the slowpath
flag set would result in unnecessary kicks.



By moving the slowpath flag from the tail to the head ticket we avoid
the need to access both the head and tail tickets on unlock.

We can further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.

It's 'obvious' now, but maybe not so much after we've all not looked at
this for a few months.



Absolutely correct. Thanks Peter for the detailed and very helpful
writeup.



Re: [PATCH -v5 6/5] context_tracking: fix exception_enter when already in IN_KERNEL

2015-02-12 Thread Frederic Weisbecker
On Wed, Feb 11, 2015 at 02:43:19PM -0500, Rik van Riel wrote:
 If exception_enter happens when already in IN_KERNEL state, the
 code still calls context_tracking_exit, which ends up in
 rcu_eqs_exit_common, which explodes with a WARN_ON when it is
 called in a situation where dynticks are not enabled.

Fortunately context_tracking_exit() already has a current_state == IN_KERNEL
check so this shouldn't be a problem.

Meanwhile I'll still take the patch, it's better to handle that
from the caller.

Thanks.

 
 This can be avoided by having exception_enter only switch to
 IN_KERNEL state if the current state is not already IN_KERNEL.
 
 Signed-off-by: Rik van Riel r...@redhat.com
 Reported-by: Luiz Capitulino lcapitul...@redhat.com
 ---
 Frederic, you will want this bonus patch, too :)
 
 Thanks to Luiz for finding this one. Whatever I was running did not
 trigger this issue...
 
  include/linux/context_tracking.h | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/include/linux/context_tracking.h 
 b/include/linux/context_tracking.h
 index b65fd1420e53..9da230406e8c 100644
 --- a/include/linux/context_tracking.h
 +++ b/include/linux/context_tracking.h
 @@ -37,7 +37,8 @@ static inline enum ctx_state exception_enter(void)
   return 0;
  
   prev_ctx = this_cpu_read(context_tracking.state);
 - context_tracking_exit(prev_ctx);
 + if (prev_ctx != IN_KERNEL)
 + context_tracking_exit(prev_ctx);
  
   return prev_ctx;
  }
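
For context, the helper then reads roughly as follows (reconstructed for
illustration from the hunk above; the enabled-check and field names match
the surrounding 3.19-era header, everything else is unchanged):

static inline enum ctx_state exception_enter(void)
{
	enum ctx_state prev_ctx;

	if (!context_tracking_is_enabled())
		return 0;

	prev_ctx = this_cpu_read(context_tracking.state);
	/* Only exit the previous context if we were not already in
	 * IN_KERNEL; context_tracking_exit(IN_KERNEL) would reach
	 * rcu_eqs_exit_common() and trip its WARN_ON. */
	if (prev_ctx != IN_KERNEL)
		context_tracking_exit(prev_ctx);

	return prev_ctx;
}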
 


Re: [PATCH -v5 0/5] rcu,nohz,kvm: use RCU extended quiescent state when running KVM guest

2015-02-12 Thread Frederic Weisbecker
On Tue, Feb 10, 2015 at 03:27:49PM -0500, r...@redhat.com wrote:
 When running a KVM guest on a system with NOHZ_FULL enabled, and the
 KVM guest running with idle=poll mode, we still get wakeups of the
 rcuos/N threads.
 
 This problem has already been solved for user space by telling the
 RCU subsystem that the CPU is in an extended quiescent state while
 running user space code.
 
 This patch series extends that code a little bit to make it usable
 to track KVM guest space, too.
 
 I tested the code by booting a KVM guest with idle=poll, on a system
 with NOHZ_FULL enabled on most CPUs, and a VCPU thread bound to a
 CPU. In a 10 second interval, rcuos/N threads on other CPUs got woken
 up several times, while the rcuos thread on the CPU running the bound
 and always running VCPU thread never got woken up once.
 
 Thanks to Christian Borntraeger, Paul McKenney, Paolo Bonzini,
 Frederic Weisbecker, and Will Deacon for reviewing and improving
 earlier versions of this patch series.
 

So the patchset looks good, thanks everyone. I'm applying the series
and will see if anything explodes.

I'll wait until the end of the merge window to push it to Ingo.

Thanks.


[GIT PULL] KVM changes for 3.20, except PPC

2015-02-12 Thread Paolo Bonzini
Linus,

The following changes since commit 056bb5f51c357ee00046fde4929a03468ff45e7a:

  arm64: kvm: decode ESR_ELx.EC when reporting exceptions (2015-01-15 12:24:52 +0000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to 6557bada461afeaa920a189fae2cff7c8fdce39f:

  KVM: ia64: drop kvm.h from installed user headers (2015-02-10 09:02:10 +0100)

I placed the correct conflict resolution in branch merge-resolution-3.20
of the KVM repository (based on commit 8cc748aa76c9 from your tree).
In hindsight, I probably should have just delayed the last batch of
ARM/ARM64 patches to the 3.20 merge window.

The KVM/PPC changes will come in through the PPC maintainers, because
I haven't received them yet and I might end up being offline for some
part of next week.


Fairly small update, but there are some interesting new features.

Common: Optional support for adding a small amount of polling on each HLT
instruction executed in the guest (or equivalent for other architectures).
This can improve latency up to 50% in some scenarios (e.g. O_DSYNC writes
or TCP_RR netperf tests).  This also has to be enabled manually for now,
but the plan is to auto-tune this in the future.

ARM/ARM64: the highlights are support for GICv3 emulation and dirty page
tracking

s390: several optimizations and bugfixes.  Also a first: a feature
exposed by KVM (UUID and long guest name in /proc/sysinfo) before
it is available in IBM's hypervisor! :)

MIPS: Bugfixes.

x86: Support for PML (page modification logging, a new feature in
Broadwell Xeons that speeds up dirty page tracking), nested virtualization
improvements (nested APICv---a nice optimization), usual round of emulation
fixes.  There is also a new option to reduce latency of the TSC deadline
timer in the guest; this needs to be tuned manually.

Some commits are common between this pull and Catalin's; I see you
have already included his tree.

ARM has other conflicts where functions are added in the same place
by 3.19-rc and 3.20 patches.  These are not large though, and entirely
within KVM.


Andre Przywara (21):
  ARM: KVM: extend WFI tracepoint to differentiate between wfi and wfe
  arm/arm64: KVM: rework MPIDR assignment and add accessors
  arm/arm64: KVM: pass down user space provided GIC type into vGIC code
  arm/arm64: KVM: refactor vgic_handle_mmio() function
  arm/arm64: KVM: wrap 64 bit MMIO accesses with two 32 bit ones
  arm/arm64: KVM: introduce per-VM ops
  arm/arm64: KVM: move kvm_register_device_ops() into vGIC probing
  arm/arm64: KVM: dont rely on a valid GICH base address
  arm/arm64: KVM: make the maximum number of vCPUs a per-VM value
  arm/arm64: KVM: make the value of ICC_SRE_EL1 a per-VM variable
  arm/arm64: KVM: refactor MMIO accessors
  arm/arm64: KVM: refactor/wrap vgic_set/get_attr()
  arm/arm64: KVM: add vgic.h header file
  arm/arm64: KVM: split GICv2 specific emulation code from vgic.c
  arm/arm64: KVM: add opaque private pointer to MMIO data
  arm/arm64: KVM: add virtual GICv3 distributor emulation
  arm64: GICv3: introduce symbolic names for GICv3 ICC_SGI1R_EL1 fields
  arm64: KVM: add SGI generation register emulation
  arm/arm64: KVM: enable kernel side of GICv3 emulation
  arm/arm64: KVM: allow userland to request a virtual GICv3
  arm/arm64: KVM: force alignment of VGIC dist/CPU/redist addresses

Borislav Petkov (1):
  kvm: Fix CR3_PCID_INVD type on 32-bit

Christian Borntraeger (4):
  KVM: s390: make local function static
  KVM: s390: no need to hold the kvm-mutex for floating interrupts
  KVM: s390: reenable LPP facility
  KVM: Disable compat ioctl for s390

Christoffer Dall (2):
  arm/arm64: KVM: Fixup incorrect config symbol in comment
  KVM: Remove unused config symbol

David Hildenbrand (13):
  KVM: s390: prevent sleep duration underflows in handle_wait()
  KVM: s390: base hrtimer on a monotonic clock
  KVM: s390: forward hrtimer if guest ckc not pending yet
  KVM: s390: new parameter for SIGP STOP irqs
  KVM: s390: handle stop irqs without action_bits
  KVM: s390: a VCPU may only stop when no interrupts are left pending
  KVM: s390: SIGP SET PREFIX cleanup
  s390/sclp: introduce check for the SIGP Interpretation Facility
  KVM: s390: only one external call may be pending at a time
  KVM: s390: clear the pfault queue if user space sets the invalid token
  KVM: s390: forward most SIGP orders to user space
  KVM: s390: avoid memory leaks if __inject_vm() fails
  KVM: s390: floating irqs: fix user triggerable endless loop

Dominik Dingel (3):
  KVM: remove unneeded return value of vcpu_postcreate
  KVM: s390: move vcpu specific initalization to a later point
  

Re: [PATCH -v5 6/5] context_tracking: fix exception_enter when already in IN_KERNEL

2015-02-12 Thread Rik van Riel

On 02/12/2015 10:42 AM, Frederic Weisbecker wrote:
 On Wed, Feb 11, 2015 at 02:43:19PM -0500, Rik van Riel wrote:
 If exception_enter happens when already in IN_KERNEL state, the 
 code still calls context_tracking_exit, which ends up in 
 rcu_eqs_exit_common, which explodes with a WARN_ON when it is 
 called in a situation where dynticks are not enabled.
 
 Fortunately context_tracking_exit() already has a current_state ==
 IN_KERNEL check so this shouldn't be a problem.

No, it had a hard-coded current_state == IN_USER check,
which is very close, but ...

... I replaced that with a state argument, and forgot to
ensure that it never gets called with state == IN_KERNEL.
This patch fixes that.
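
For illustration, the callee after 1/5 has roughly this shape -- a
simplified reconstruction with the vtime/trace details omitted, so treat
it as a sketch rather than the actual source:

void context_tracking_exit(enum ctx_state state)
{
	unsigned long flags;

	if (!context_tracking_is_enabled())
		return;

	local_irq_save(flags);
	/* The old hard-coded IN_USER test became a comparison against
	 * the caller-supplied state, so calling this with IN_KERNEL
	 * while already in IN_KERNEL matches and reaches the RCU exit
	 * path that fires the WARN_ON. */
	if (__this_cpu_read(context_tracking.state) == state) {
		rcu_user_exit();
		__this_cpu_write(context_tracking.state, IN_KERNEL);
	}
	local_irq_restore(flags);
}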

 Meanwhile I'll still take the patch, it's better to handle that 
 from the caller.

Thanks.

- -- 
All rights reversed


Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Oleg Nesterov
On 02/11, Jeremy Fitzhardinge wrote:

 On 02/11/2015 09:24 AM, Oleg Nesterov wrote:
  I agree, and I have to admit I am not sure I fully understand why
  unlock uses the locked add. Except we need a barrier to avoid the race
  with the enter_slowpath() users, of course. Perhaps this is the only
  reason?

 Right now it needs to be a locked operation to prevent read-reordering.
 x86 memory ordering rules state that all writes are seen in a globally
 consistent order, and are globally ordered wrt reads *on the same
 addresses*, but reads to different addresses can be reordered wrt writes.

 So, if the unlocking add were not a locked operation:

 __add(&lock->tickets.head, TICKET_LOCK_INC);  /* not locked */

 if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
 	__ticket_unlock_slowpath(lock, prev);

 Then the read of lock->tickets.tail can be reordered before the unlock,
 which introduces a race:

Yes, yes, thanks, but this is what I meant. We need a barrier even if,
as Linus mentioned, every store is a release.

 This *might* be OK, but I think it's on dubious ground:

 __add(&lock->tickets.head, TICKET_LOCK_INC);  /* not locked */

 	/* read overlaps write, and so is ordered */
 if (unlikely(lock->head_tail & (TICKET_SLOWPATH_FLAG << TICKET_SHIFT)))
 	__ticket_unlock_slowpath(lock, prev);

 because I think Intel and AMD differed in interpretation about how
 overlapping but different-sized reads & writes are ordered (or it simply
 isn't architecturally defined).

I can't comment, I simply do not know how the hardware works.

 If the slowpath flag is moved to head, then it would always have to be
 locked anyway, because it needs to be atomic against other CPU's RMW
 operations setting the flag.

Yes, this is true.

But again, if we want to avoid the read-after-unlock, we need to update
the lock and read the SLOWPATH flag atomically; it seems that we can't
avoid the locked insn.
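
To spell out the reordering race in question (a comment-form sketch; the
CPU labels and interleaving are my illustration, not text from the
thread):

/*
 * With a plain (unlocked) add on the unlock side, the tail read may be
 * hoisted above the head store:
 *
 *   CPU0 (unlocker)                        CPU1 (waiter)
 *   t = lock->tickets.tail;
 *                                          sets TICKET_SLOWPATH_FLAG in
 *                                          tail, then halts waiting for
 *                                          a kick
 *   lock->tickets.head += TICKET_LOCK_INC;    (lock now free)
 *   if (t & TICKET_SLOWPATH_FLAG)
 *           no kick: t is stale            CPU1 misses its wakeup
 *
 * A locked RMW on the unlock side forbids this reordering, and folding
 * the head update and the flag read into one atomic insn also removes
 * any access to the lock word after the release.
 */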

Oleg.



Re: [PATCH -v5 6/5] context_tracking: fix exception_enter when already in IN_KERNEL

2015-02-12 Thread Frederic Weisbecker
On Thu, Feb 12, 2015 at 10:47:10AM -0500, Rik van Riel wrote:
 
 On 02/12/2015 10:42 AM, Frederic Weisbecker wrote:
  On Wed, Feb 11, 2015 at 02:43:19PM -0500, Rik van Riel wrote:
  If exception_enter happens when already in IN_KERNEL state, the 
  code still calls context_tracking_exit, which ends up in 
  rcu_eqs_exit_common, which explodes with a WARN_ON when it is 
  called in a situation where dynticks are not enabled.
  
  Fortunately context_tracking_exit() already has a current_state ==
  IN_KERNEL check so this shouldn't be a problem.
 
 No, it had a hard-coded current_state == IN_USER check,
 which is very close, but ...
 
 ... I replaced that with a state argument, and forgot to
 ensure that it never gets called with state == IN_KERNEL.
 This patch fixes that.

Ah, that's right! Well, I'm going to merge this patch into 1/5 then, to
avoid breaking bisection.

Thanks.