[tip: irq/core] irq: Simplify condition in irq_matrix_reserve()

2021-03-17 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the irq/core branch of tip:

Commit-ID: 2c6b02185cc608c19a22691fadc6ca2cd114c286
Gitweb:        https://git.kernel.org/tip/2c6b02185cc608c19a22691fadc6ca2cd114c286
Author:        Juergen Gross
AuthorDate:    Thu, 11 Feb 2021 08:09:53 +01:00
Committer: Thomas Gleixner 
CommitterDate: Wed, 17 Mar 2021 21:44:01 +01:00

irq: Simplify condition in irq_matrix_reserve()

The if condition in irq_matrix_reserve() can be much simpler.

While at it fix a typo in the comment.

Signed-off-by: Juergen Gross 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20210211070953.5914-1-jgr...@suse.com

---
 kernel/irq/matrix.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 7a9465f..6f8b1d1 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -337,15 +337,14 @@ void irq_matrix_assign(struct irq_matrix *m, unsigned int 
bit)
  * irq_matrix_reserve - Reserve interrupts
  * @m: Matrix pointer
  *
- * This is merily a book keeping call. It increments the number of globally
+ * This is merely a book keeping call. It increments the number of globally
  * reserved interrupt bits w/o actually allocating them. This allows to
  * setup interrupt descriptors w/o assigning low level resources to it.
  * The actual allocation happens when the interrupt gets activated.
  */
 void irq_matrix_reserve(struct irq_matrix *m)
 {
-   if (m->global_reserved <= m->global_available &&
-   m->global_reserved + 1 > m->global_available)
+   if (m->global_reserved == m->global_available)
pr_warn("Interrupt reservation exceeds available resources\n");
 
m->global_reserved++;
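
Why the simpler test is equivalent: for the unsigned counters involved,
"reserved <= available && reserved + 1 > available" can only hold when the two
values are equal, so the compound condition collapses to an equality check.
A standalone sketch verifying this (plain C, not part of the patch):

#include <assert.h>

int main(void)
{
	for (unsigned int available = 0; available < 64; available++) {
		for (unsigned int reserved = 0; reserved < 64; reserved++) {
			/* old condition in irq_matrix_reserve() */
			int old_warn = reserved <= available &&
				       reserved + 1 > available;
			/* new condition */
			int new_warn = reserved == available;

			assert(old_warn == new_warn);
		}
	}
	return 0;
}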


[tip: x86/alternatives] x86/alternative: Support not-feature

2021-03-12 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID: dda7bb76484978316bb412a353789ebc5901de36
Gitweb:        https://git.kernel.org/tip/dda7bb76484978316bb412a353789ebc5901de36
Author:        Juergen Gross
AuthorDate:    Thu, 11 Mar 2021 15:23:10 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 11 Mar 2021 16:44:01 +01:00

x86/alternative: Support not-feature

Add support for alternative patching for the case a feature is not
present on the current CPU. For users of ALTERNATIVE() and friends, an
inverted feature is specified by applying the ALT_NOT() macro to it,
e.g.:

  ALTERNATIVE(old, new, ALT_NOT(feature));

Committer note:

The decision to encode the NOT-bit in the feature bit itself is because
a future change which would make objtool generate such alternative
calls, would keep the code in objtool itself fairly simple.

Also, this allows for the alternative macros to support the NOT feature
without having to change them.

Finally, the u16 cpuid member encoding the X86_FEATURE_ flags is not an
ABI so if more bits are needed, cpuid itself can be enlarged or a flags
field can be added to struct alt_instr after having considered the size
growth in either cases.

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210311142319.4723-6-jgr...@suse.com
---
 arch/x86/include/asm/alternative.h |  3 +++
 arch/x86/kernel/alternative.c  | 20 +++-
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h 
b/arch/x86/include/asm/alternative.h
index 53f295f..649e56f 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -6,6 +6,9 @@
 #include 
 #include 
 
+#define ALTINSTR_FLAG_INV  (1 << 15)
+#define ALT_NOT(feat)  ((feat) | ALTINSTR_FLAG_INV)
+
 #ifndef __ASSEMBLY__
 
 #include 
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 8d778e4..133b549 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -388,21 +388,31 @@ void __init_or_module noinline apply_alternatives(struct 
alt_instr *start,
 */
for (a = start; a < end; a++) {
int insn_buff_sz = 0;
+   /* Mask away "NOT" flag bit for feature to test. */
+   u16 feature = a->cpuid & ~ALTINSTR_FLAG_INV;
 
instr = (u8 *)&a->instr_offset + a->instr_offset;
replacement = (u8 *)&a->repl_offset + a->repl_offset;
BUG_ON(a->instrlen > sizeof(insn_buff));
-   BUG_ON(a->cpuid >= (NCAPINTS + NBUGINTS) * 32);
-   if (!boot_cpu_has(a->cpuid)) {
+   BUG_ON(feature >= (NCAPINTS + NBUGINTS) * 32);
+
+   /*
+* Patch if either:
+* - feature is present
+* - feature not present but ALTINSTR_FLAG_INV is set to mean,
+*   patch if feature is *NOT* present.
+*/
+   if (!boot_cpu_has(feature) == !(a->cpuid & ALTINSTR_FLAG_INV)) {
if (a->padlen > 1)
optimize_nops(a, instr);
 
continue;
}
 
-   DPRINTK("feat: %d*32+%d, old: (%pS (%px) len: %d), repl: (%px, 
len: %d), pad: %d",
-   a->cpuid >> 5,
-   a->cpuid & 0x1f,
+   DPRINTK("feat: %s%d*32+%d, old: (%pS (%px) len: %d), repl: 
(%px, len: %d), pad: %d",
+   (a->cpuid & ALTINSTR_FLAG_INV) ? "!" : "",
+   feature >> 5,
+   feature & 0x1f,
instr, instr, a->instrlen,
replacement, a->replacementlen, a->padlen);
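
The resulting patch/skip decision is symmetric in the feature bit and the
ALT_NOT() flag: patch when the feature is present and the entry is a plain
one, or when the feature is absent and the entry was marked with ALT_NOT().
A small standalone model of that decision (plain C, mirroring but not taken
from apply_alternatives()):

#include <assert.h>
#include <stdbool.h>

#define ALTINSTR_FLAG_INV	(1 << 15)
#define ALT_NOT(feat)		((feat) | ALTINSTR_FLAG_INV)

/* Inverse of the "continue" test above: patch iff presence != inversion. */
static bool should_patch(unsigned int cpuid_field, bool feature_present)
{
	bool inverted = cpuid_field & ALTINSTR_FLAG_INV;

	return feature_present != inverted;
}

int main(void)
{
	unsigned int feat = 42;	/* arbitrary feature number, for illustration */

	assert( should_patch(feat, true));		/* plain entry, feature set   */
	assert(!should_patch(feat, false));		/* plain entry, feature clear */
	assert(!should_patch(ALT_NOT(feat), true));	/* ALT_NOT, feature set       */
	assert( should_patch(ALT_NOT(feat), false));	/* ALT_NOT, feature clear     */
	return 0;
}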
 


[tip: x86/alternatives] x86/alternative: Merge include files

2021-03-12 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID: 5e21a3ecad1500e35b46701e7f3f232e15d78e69
Gitweb:        https://git.kernel.org/tip/5e21a3ecad1500e35b46701e7f3f232e15d78e69
Author:        Juergen Gross
AuthorDate:    Thu, 11 Mar 2021 15:23:06 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 11 Mar 2021 15:58:02 +01:00

x86/alternative: Merge include files

Merge arch/x86/include/asm/alternative-asm.h into
arch/x86/include/asm/alternative.h in order to make it easier to use
common definitions later.

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210311142319.4723-2-jgr...@suse.com
---
 arch/x86/entry/entry_32.S|   2 +-
 arch/x86/entry/vdso/vdso32/system_call.S |   2 +-
 arch/x86/include/asm/alternative-asm.h   | 114 +--
 arch/x86/include/asm/alternative.h   | 112 +-
 arch/x86/include/asm/nospec-branch.h |   1 +-
 arch/x86/include/asm/smap.h  |   5 +-
 arch/x86/lib/atomic64_386_32.S   |   2 +-
 arch/x86/lib/atomic64_cx8_32.S   |   2 +-
 arch/x86/lib/copy_page_64.S  |   2 +-
 arch/x86/lib/copy_user_64.S  |   2 +-
 arch/x86/lib/memcpy_64.S |   2 +-
 arch/x86/lib/memmove_64.S|   2 +-
 arch/x86/lib/memset_64.S |   2 +-
 arch/x86/lib/retpoline.S |   2 +-
 14 files changed, 120 insertions(+), 132 deletions(-)
 delete mode 100644 arch/x86/include/asm/alternative-asm.h

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index df8c017..4e079f2 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -40,7 +40,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/x86/entry/vdso/vdso32/system_call.S 
b/arch/x86/entry/vdso/vdso32/system_call.S
index de1fff7..d6a6080 100644
--- a/arch/x86/entry/vdso/vdso32/system_call.S
+++ b/arch/x86/entry/vdso/vdso32/system_call.S
@@ -6,7 +6,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
.text
.globl __kernel_vsyscall
diff --git a/arch/x86/include/asm/alternative-asm.h 
b/arch/x86/include/asm/alternative-asm.h
deleted file mode 100644
index 464034d..000
--- a/arch/x86/include/asm/alternative-asm.h
+++ /dev/null
@@ -1,114 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_X86_ALTERNATIVE_ASM_H
-#define _ASM_X86_ALTERNATIVE_ASM_H
-
-#ifdef __ASSEMBLY__
-
-#include 
-
-#ifdef CONFIG_SMP
-   .macro LOCK_PREFIX
-672:   lock
-   .pushsection .smp_locks,"a"
-   .balign 4
-   .long 672b - .
-   .popsection
-   .endm
-#else
-   .macro LOCK_PREFIX
-   .endm
-#endif
-
-/*
- * objtool annotation to ignore the alternatives and only consider the original
- * instruction(s).
- */
-.macro ANNOTATE_IGNORE_ALTERNATIVE
-   .Lannotate_\@:
-   .pushsection .discard.ignore_alts
-   .long .Lannotate_\@ - .
-   .popsection
-.endm
-
-/*
- * Issue one struct alt_instr descriptor entry (need to put it into
- * the section .altinstructions, see below). This entry contains
- * enough information for the alternatives patching code to patch an
- * instruction. See apply_alternatives().
- */
-.macro altinstruction_entry orig alt feature orig_len alt_len pad_len
-   .long \orig - .
-   .long \alt - .
-   .word \feature
-   .byte \orig_len
-   .byte \alt_len
-   .byte \pad_len
-.endm
-
-/*
- * Define an alternative between two instructions. If @feature is
- * present, early code in apply_alternatives() replaces @oldinstr with
- * @newinstr. ".skip" directive takes care of proper instruction padding
- * in case @newinstr is longer than @oldinstr.
- */
-.macro ALTERNATIVE oldinstr, newinstr, feature
-140:
-   \oldinstr
-141:
-   .skip -(((144f-143f)-(141b-140b)) > 0) * ((144f-143f)-(141b-140b)),0x90
-142:
-
-   .pushsection .altinstructions,"a"
-   altinstruction_entry 140b,143f,\feature,142b-140b,144f-143f,142b-141b
-   .popsection
-
-   .pushsection .altinstr_replacement,"ax"
-143:
-   \newinstr
-144:
-   .popsection
-.endm
-
-#define old_len141b-140b
-#define new_len1   144f-143f
-#define new_len2   145f-144f
-
-/*
- * gas compatible max based on the idea from:
- * http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
- *
- * The additional "-" is needed because gas uses a "true" value of -1.
- */
-#define alt_max_short(a, b)((a) ^ (((a) ^ (b)) & -(-((a) < (b)
-
-
-/*
- * Same as ALTERNATIVE macro above but for two alternatives. If CPU
- * has @feature1, it replaces @oldinstr with @newinstr1. If CPU has
- * @feature2, it replaces @oldinstr with @feature2.
- */
-.macro ALTERNATIVE_2 oldinstr, newinstr1, feature1, newinstr2, feature2
-140:
-   \oldinstr
-141:
-   .skip -((alt_max_short(new_len1, new_len2) - (old_len)) > 0) * \

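The alt_max_short() helper above is the classic branchless max: XORing in
(a ^ b) under an all-ones mask when a < b yields b, and leaves a untouched
otherwise; the extra leading '-' exists only because gas evaluates a true
comparison as -1 rather than 1. A plain-C rendering of the same trick
(illustrative, not the assembler macro itself):

#include <assert.h>

static unsigned int bithack_max(unsigned int a, unsigned int b)
{
	/* -(a < b) is 0 or all-ones in C, selecting either a or b */
	return a ^ ((a ^ b) & -(unsigned int)(a < b));
}

int main(void)
{
	assert(bithack_max(3, 7) == 7);
	assert(bithack_max(7, 3) == 7);
	assert(bithack_max(5, 5) == 5);
	return 0;
}
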
[tip: x86/alternatives] static_call: Add function to query current function

2021-03-12 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID: 6ea312d95e0226b306bb4b8ee3a0727d880378cb
Gitweb:        https://git.kernel.org/tip/6ea312d95e0226b306bb4b8ee3a0727d880378cb
Author:        Juergen Gross
AuthorDate:    Thu, 11 Mar 2021 15:23:08 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 11 Mar 2021 16:12:33 +01:00

static_call: Add function to query current function

Some users of paravirtualized functions need to query which function
has been specified in a pv_ops vector element. In order to be able to
switch such paravirtualized functions to static_calls instead, there
needs to be a function to query the function which will be called via
static_call().

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20210311142319.4723-4-jgr...@suse.com
---
 include/linux/static_call.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/include/linux/static_call.h b/include/linux/static_call.h
index 76b8812..e01b61a 100644
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -20,6 +20,7 @@
  *   static_call(name)(args...);
  *   static_call_cond(name)(args...);
  *   static_call_update(name, func);
+ *   static_call_query(name);
  *
  * Usage example:
  *
@@ -91,6 +92,10 @@
  *
  *   which will include the required value tests to avoid NULL-pointer
  *   dereferences.
+ *
+ *   To query which function is currently set to be called, use:
+ *
+ *   func = static_call_query(name);
  */
 
 #include 
@@ -118,6 +123,8 @@ extern void arch_static_call_transform(void *site, void 
*tramp, void *func, bool
 STATIC_CALL_TRAMP_ADDR(name), func);   \
 })
 
+#define static_call_query(name) (READ_ONCE(STATIC_CALL_KEY(name).func))
+
 #ifdef CONFIG_HAVE_STATIC_CALL_INLINE
 
 extern int __init static_call_init(void);
@@ -191,6 +198,7 @@ static inline int static_call_init(void) { return 0; }
};  \
ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name)
 
+
 #define static_call_cond(name) (void)__static_call(name)
 
 static inline
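
For context, a minimal usage sketch of the new helper (hypothetical key and
function names, kernel-style fragment rather than a complete module):

#include <linux/static_call.h>
#include <linux/printk.h>
#include <linux/types.h>

static u64 dummy_clock(void)
{
	return 0;
}

DEFINE_STATIC_CALL(pv_clock, dummy_clock);

static void report_clock_source(void)
{
	/* Returns the function currently wired up: &dummy_clock here, until
	 * somebody switches it with static_call_update(pv_clock, ...). */
	void *func = static_call_query(pv_clock);

	pr_info("pv_clock currently calls %ps\n", func);
}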


[tip: x86/alternatives] static_call: Move struct static_call_key definition to static_call_types.h

2021-03-12 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID: b046664872dd78a8bebe3d5f3bb9da9baa93f5ca
Gitweb:        https://git.kernel.org/tip/b046664872dd78a8bebe3d5f3bb9da9baa93f5ca
Author:        Juergen Gross
AuthorDate:    Thu, 11 Mar 2021 15:23:07 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 11 Mar 2021 16:04:39 +01:00

static_call: Move struct static_call_key definition to static_call_types.h

Having the definition of static_call() in static_call_types.h makes
no sense as long struct static_call_key isn't defined there, as the
generic implementation of static_call() is referencing this structure.

So move the definition of struct static_call_key to static_call_types.h.

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20210311142319.4723-3-jgr...@suse.com
---
 include/linux/static_call.h | 18 --
 include/linux/static_call_types.h   | 18 ++
 tools/include/linux/static_call_types.h | 18 ++
 3 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/include/linux/static_call.h b/include/linux/static_call.h
index 85ecc78..76b8812 100644
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -128,16 +128,6 @@ struct static_call_mod {
struct static_call_site *sites;
 };
 
-struct static_call_key {
-   void *func;
-   union {
-   /* bit 0: 0 = mods, 1 = sites */
-   unsigned long type;
-   struct static_call_mod *mods;
-   struct static_call_site *sites;
-   };
-};
-
 /* For finding the key associated with a trampoline */
 struct static_call_tramp_key {
s32 tramp;
@@ -187,10 +177,6 @@ extern long __static_call_return0(void);
 
 static inline int static_call_init(void) { return 0; }
 
-struct static_call_key {
-   void *func;
-};
-
 #define __DEFINE_STATIC_CALL(name, _func, _func_init)  \
DECLARE_STATIC_CALL(name, _func);   \
struct static_call_key STATIC_CALL_KEY(name) = {\
@@ -243,10 +229,6 @@ static inline long __static_call_return0(void)
 
 static inline int static_call_init(void) { return 0; }
 
-struct static_call_key {
-   void *func;
-};
-
 static inline long __static_call_return0(void)
 {
return 0;
diff --git a/include/linux/static_call_types.h 
b/include/linux/static_call_types.h
index ae5662d..5a00b8b 100644
--- a/include/linux/static_call_types.h
+++ b/include/linux/static_call_types.h
@@ -58,11 +58,25 @@ struct static_call_site {
__raw_static_call(name);\
 })
 
+struct static_call_key {
+   void *func;
+   union {
+   /* bit 0: 0 = mods, 1 = sites */
+   unsigned long type;
+   struct static_call_mod *mods;
+   struct static_call_site *sites;
+   };
+};
+
 #else /* !CONFIG_HAVE_STATIC_CALL_INLINE */
 
 #define __STATIC_CALL_ADDRESSABLE(name)
 #define __static_call(name)__raw_static_call(name)
 
+struct static_call_key {
+   void *func;
+};
+
 #endif /* CONFIG_HAVE_STATIC_CALL_INLINE */
 
 #ifdef MODULE
@@ -77,6 +91,10 @@ struct static_call_site {
 
 #else
 
+struct static_call_key {
+   void *func;
+};
+
 #define static_call(name)  \
((typeof(STATIC_CALL_TRAMP(name))*)(STATIC_CALL_KEY(name).func))
 
diff --git a/tools/include/linux/static_call_types.h 
b/tools/include/linux/static_call_types.h
index ae5662d..5a00b8b 100644
--- a/tools/include/linux/static_call_types.h
+++ b/tools/include/linux/static_call_types.h
@@ -58,11 +58,25 @@ struct static_call_site {
__raw_static_call(name);\
 })
 
+struct static_call_key {
+   void *func;
+   union {
+   /* bit 0: 0 = mods, 1 = sites */
+   unsigned long type;
+   struct static_call_mod *mods;
+   struct static_call_site *sites;
+   };
+};
+
 #else /* !CONFIG_HAVE_STATIC_CALL_INLINE */
 
 #define __STATIC_CALL_ADDRESSABLE(name)
 #define __static_call(name)__raw_static_call(name)
 
+struct static_call_key {
+   void *func;
+};
+
 #endif /* CONFIG_HAVE_STATIC_CALL_INLINE */
 
 #ifdef MODULE
@@ -77,6 +91,10 @@ struct static_call_site {
 
 #else
 
+struct static_call_key {
+   void *func;
+};
+
 #define static_call(name)  \
((typeof(STATIC_CALL_TRAMP(name))*)(STATIC_CALL_KEY(name).func))
 


[tip: x86/alternatives] x86/alternative: Support ALTERNATIVE_TERNARY

2021-03-12 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID: e208b3c4a9748b2c17aa09ba663b5096ccf82dce
Gitweb:        https://git.kernel.org/tip/e208b3c4a9748b2c17aa09ba663b5096ccf82dce
Author:        Juergen Gross
AuthorDate:    Thu, 11 Mar 2021 15:23:11 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 11 Mar 2021 16:57:31 +01:00

x86/alternative: Support ALTERNATIVE_TERNARY

Add ALTERNATIVE_TERNARY support for replacing an initial instruction
with either of two instructions depending on a feature:

  ALTERNATIVE_TERNARY "default_instr", FEATURE_NR,
  "feature_on_instr", "feature_off_instr"

which will start with "default_instr" and at patch time will,
depending on FEATURE_NR being set or not, patch that with either
"feature_on_instr" or "feature_off_instr".

 [ bp: Add comment ontop. ]

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20210311142319.4723-7-jgr...@suse.com
---
 arch/x86/include/asm/alternative.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/alternative.h 
b/arch/x86/include/asm/alternative.h
index 649e56f..17b3609 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -179,6 +179,11 @@ static inline int alternatives_text_reserved(void *start, 
void *end)
ALTINSTR_REPLACEMENT(newinstr2, 2)  \
".popsection\n"
 
+/* If @feature is set, patch in @newinstr_yes, otherwise @newinstr_no. */
+#define ALTERNATIVE_TERNARY(oldinstr, feature, newinstr_yes, newinstr_no) \
+   ALTERNATIVE_2(oldinstr, newinstr_no, X86_FEATURE_ALWAYS,\
+ newinstr_yes, feature)
+
 #define ALTERNATIVE_3(oldinsn, newinsn1, feat1, newinsn2, feat2, newinsn3, 
feat3) \
OLDINSTR_3(oldinsn, 1, 2, 3)
\
".pushsection .altinstructions,\"a\"\n" 
\
@@ -210,6 +215,9 @@ static inline int alternatives_text_reserved(void *start, 
void *end)
 #define alternative_2(oldinstr, newinstr1, feature1, newinstr2, feature2) \
asm_inline volatile(ALTERNATIVE_2(oldinstr, newinstr1, feature1, 
newinstr2, feature2) ::: "memory")
 
+#define alternative_ternary(oldinstr, feature, newinstr_yes, newinstr_no) \
+   asm_inline volatile(ALTERNATIVE_TERNARY(oldinstr, feature, 
newinstr_yes, newinstr_no) ::: "memory")
+
 /*
  * Alternative inline assembly with input.
  *
@@ -380,6 +388,11 @@ static inline int alternatives_text_reserved(void *start, 
void *end)
.popsection
 .endm
 
+/* If @feature is set, patch in @newinstr_yes, otherwise @newinstr_no. */
+#define ALTERNATIVE_TERNARY(oldinstr, feature, newinstr_yes, newinstr_no) \
+   ALTERNATIVE_2 oldinstr, newinstr_no, X86_FEATURE_ALWAYS,\
+   newinstr_yes, feature
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_ALTERNATIVE_H */
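
Worked expansion, using the _static_cpu_has() conversion later in this series
as the concrete instance (a sketch of what the macro unfolds into, not
additional code):

	ALTERNATIVE_TERNARY("jmp 6f", %P[feature], "", "jmp %l[t_no]")

expands to

	ALTERNATIVE_2("jmp 6f",		/* initial instruction */
		      "jmp %l[t_no]",	/* newinstr_no, keyed on X86_FEATURE_ALWAYS */
		      X86_FEATURE_ALWAYS,
		      "",		/* newinstr_yes, keyed on the tested feature */
		      feature)

X86_FEATURE_ALWAYS is set on every CPU, so the initial instruction never
survives patching; the entry for the tested feature is emitted second and
therefore wins whenever that feature is present.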


[tip: x86/alternatives] x86/paravirt: Switch time pvops functions to use static_call()

2021-03-12 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID: a0e2bf7cb7006b5a58ee81f4da4fe575875f2781
Gitweb:        https://git.kernel.org/tip/a0e2bf7cb7006b5a58ee81f4da4fe575875f2781
Author:        Juergen Gross
AuthorDate:    Thu, 11 Mar 2021 15:23:09 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 11 Mar 2021 16:17:52 +01:00

x86/paravirt: Switch time pvops functions to use static_call()

The time pvops functions are the only ones left which might be
used in 32-bit mode and which return a 64-bit value.

Switch them to use the static_call() mechanism instead of pvops, as
this allows quite some simplification of the pvops implementation.

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20210311142319.4723-5-jgr...@suse.com
---
 arch/arm/include/asm/paravirt.h   | 14 +-
 arch/arm/kernel/paravirt.c|  9 +++--
 arch/arm64/include/asm/paravirt.h | 14 +-
 arch/arm64/kernel/paravirt.c  | 13 +
 arch/x86/Kconfig  |  1 +-
 arch/x86/include/asm/mshyperv.h   |  2 +-
 arch/x86/include/asm/paravirt.h   | 15 ---
 arch/x86/include/asm/paravirt_types.h |  6 +--
 arch/x86/kernel/cpu/vmware.c  |  5 +++--
 arch/x86/kernel/kvm.c |  2 +-
 arch/x86/kernel/kvmclock.c|  2 +-
 arch/x86/kernel/paravirt.c| 13 +
 arch/x86/kernel/tsc.c |  3 ++-
 arch/x86/xen/time.c   | 26 +-
 drivers/xen/time.c|  3 ++-
 15 files changed, 71 insertions(+), 57 deletions(-)

diff --git a/arch/arm/include/asm/paravirt.h b/arch/arm/include/asm/paravirt.h
index cdbf02d..95d5b0d 100644
--- a/arch/arm/include/asm/paravirt.h
+++ b/arch/arm/include/asm/paravirt.h
@@ -3,23 +3,19 @@
 #define _ASM_ARM_PARAVIRT_H
 
 #ifdef CONFIG_PARAVIRT
+#include 
+
 struct static_key;
 extern struct static_key paravirt_steal_enabled;
 extern struct static_key paravirt_steal_rq_enabled;
 
-struct pv_time_ops {
-   unsigned long long (*steal_clock)(int cpu);
-};
-
-struct paravirt_patch_template {
-   struct pv_time_ops time;
-};
+u64 dummy_steal_clock(int cpu);
 
-extern struct paravirt_patch_template pv_ops;
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
 
 static inline u64 paravirt_steal_clock(int cpu)
 {
-   return pv_ops.time.steal_clock(cpu);
+   return static_call(pv_steal_clock)(cpu);
 }
 #endif
 
diff --git a/arch/arm/kernel/paravirt.c b/arch/arm/kernel/paravirt.c
index 4cfed91..7dd9806 100644
--- a/arch/arm/kernel/paravirt.c
+++ b/arch/arm/kernel/paravirt.c
@@ -9,10 +9,15 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 struct static_key paravirt_steal_enabled;
 struct static_key paravirt_steal_rq_enabled;
 
-struct paravirt_patch_template pv_ops;
-EXPORT_SYMBOL_GPL(pv_ops);
+static u64 native_steal_clock(int cpu)
+{
+   return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
diff --git a/arch/arm64/include/asm/paravirt.h 
b/arch/arm64/include/asm/paravirt.h
index cf3a0fd..9aa193e 100644
--- a/arch/arm64/include/asm/paravirt.h
+++ b/arch/arm64/include/asm/paravirt.h
@@ -3,23 +3,19 @@
 #define _ASM_ARM64_PARAVIRT_H
 
 #ifdef CONFIG_PARAVIRT
+#include 
+
 struct static_key;
 extern struct static_key paravirt_steal_enabled;
 extern struct static_key paravirt_steal_rq_enabled;
 
-struct pv_time_ops {
-   unsigned long long (*steal_clock)(int cpu);
-};
-
-struct paravirt_patch_template {
-   struct pv_time_ops time;
-};
+u64 dummy_steal_clock(int cpu);
 
-extern struct paravirt_patch_template pv_ops;
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
 
 static inline u64 paravirt_steal_clock(int cpu)
 {
-   return pv_ops.time.steal_clock(cpu);
+   return static_call(pv_steal_clock)(cpu);
 }
 
 int __init pv_time_init(void);
diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
index c07d7a0..75fed44 100644
--- a/arch/arm64/kernel/paravirt.c
+++ b/arch/arm64/kernel/paravirt.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -26,8 +27,12 @@
 struct static_key paravirt_steal_enabled;
 struct static_key paravirt_steal_rq_enabled;
 
-struct paravirt_patch_template pv_ops;
-EXPORT_SYMBOL_GPL(pv_ops);
+static u64 native_steal_clock(int cpu)
+{
+   return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
 
 struct pv_time_stolen_time_region {
struct pvclock_vcpu_stolen_time *kaddr;
@@ -45,7 +50,7 @@ static int __init parse_no_stealacc(char *arg)
 early_param("no-steal-acc", parse_no_stealacc);
 
 /* return stolen time in ns by asking the hypervisor */
-static u64 pv_steal_clock(int cpu)
+static u64 para_steal_clock(int cpu)
 {
struct pv_time_stolen_time_region *reg;
 
@@ -150,7 +155,7 @@ int __init pv_time_init(void)
if (ret)
  

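The resulting pattern: generic code keeps calling paravirt_steal_clock(),
which is now a static_call() invocation, and each environment installs its
own implementation once at init time. An illustrative sketch of the
registration side (hypothetical names; the real update sites are in the
Xen/Hyper-V/KVM files listed in the diffstat above):

#include <linux/types.h>
#include <linux/init.h>
#include <linux/static_call.h>
#include <asm/paravirt.h>

static u64 my_hv_steal_clock(int cpu)
{
	/* e.g. read stolen time for @cpu from a hypervisor-shared page */
	return 0;
}

static void __init my_hv_time_init(void)
{
	/* replace the always-zero native_steal_clock() default */
	static_call_update(pv_steal_clock, my_hv_steal_clock);
}
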
[tip: x86/alternatives] x86/paravirt: Remove no longer needed 32-bit pvops cruft

2021-03-12 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID: 33634e42e38be61f320183dfc264b9caba292d4e
Gitweb:        https://git.kernel.org/tip/33634e42e38be61f320183dfc264b9caba292d4e
Author:        Juergen Gross
AuthorDate:    Thu, 11 Mar 2021 15:23:14 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 11 Mar 2021 19:51:55 +01:00

x86/paravirt: Remove no longer needed 32-bit pvops cruft

PVOP_VCALL4() is only used for Xen PV, while PVOP_CALL4() isn't used
at all. Keep PVOP_CALL4() for 64 bits due to symmetry reasons.

This allows to remove the 32-bit definitions of those macros leading
to a substantial simplification of the paravirt macros, as those were
the only ones needing non-empty "pre" and "post" parameters.

PVOP_CALLEE2() and PVOP_VCALLEE2() are used nowhere, so remove them.

Another no longer needed case is special handling of return types
larger than unsigned long. Replace that with a BUILD_BUG_ON().

DISABLE_INTERRUPTS() is used in 32-bit code only, so it can just be
replaced by cli.

INTERRUPT_RETURN in 32-bit code can be replaced by iret.

ENABLE_INTERRUPTS is used nowhere, so it can be removed.

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20210311142319.4723-10-jgr...@suse.com
---
 arch/x86/entry/entry_32.S |   4 +-
 arch/x86/include/asm/irqflags.h   |   5 +-
 arch/x86/include/asm/paravirt.h   |  35 +
 arch/x86/include/asm/paravirt_types.h | 112 +++--
 arch/x86/kernel/asm-offsets.c |   2 +-
 5 files changed, 35 insertions(+), 123 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 4e079f2..96f0848 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -430,7 +430,7 @@
 * will soon execute iret and the tracer was already set to
 * the irqstate after the IRET:
 */
-   DISABLE_INTERRUPTS(CLBR_ANY)
+   cli
lss (%esp), %esp/* switch to espfix segment */
 .Lend_\@:
 #endif /* CONFIG_X86_ESPFIX32 */
@@ -1077,7 +1077,7 @@ restore_all_switch_stack:
 * when returning from IPI handler and when returning from
 * scheduler to user-space.
 */
-   INTERRUPT_RETURN
+   iret
 
 .section .fixup, "ax"
 SYM_CODE_START(asm_iret_error)
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 144d70e..a0efbcd 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -109,9 +109,6 @@ static __always_inline unsigned long 
arch_local_irq_save(void)
 }
 #else
 
-#define ENABLE_INTERRUPTS(x)   sti
-#define DISABLE_INTERRUPTS(x)  cli
-
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(x)  pushfq; popq %rax
@@ -119,8 +116,6 @@ static __always_inline unsigned long 
arch_local_irq_save(void)
 
 #define INTERRUPT_RETURN   jmp native_iret
 
-#else
-#define INTERRUPT_RETURN   iret
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index def450f..a780509 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -719,6 +719,7 @@ extern void default_banner(void);
.if ((~(set)) & mask); pop %reg; .endif
 
 #ifdef CONFIG_X86_64
+#ifdef CONFIG_PARAVIRT_XXL
 
 #define PV_SAVE_REGS(set)  \
COND_PUSH(set, CLBR_RAX, rax);  \
@@ -744,46 +745,12 @@ extern void default_banner(void);
 #define PARA_PATCH(off)((off) / 8)
 #define PARA_SITE(ptype, ops)  _PVSITE(ptype, ops, .quad, 8)
 #define PARA_INDIRECT(addr)*addr(%rip)
-#else
-#define PV_SAVE_REGS(set)  \
-   COND_PUSH(set, CLBR_EAX, eax);  \
-   COND_PUSH(set, CLBR_EDI, edi);  \
-   COND_PUSH(set, CLBR_ECX, ecx);  \
-   COND_PUSH(set, CLBR_EDX, edx)
-#define PV_RESTORE_REGS(set)   \
-   COND_POP(set, CLBR_EDX, edx);   \
-   COND_POP(set, CLBR_ECX, ecx);   \
-   COND_POP(set, CLBR_EDI, edi);   \
-   COND_POP(set, CLBR_EAX, eax)
-
-#define PARA_PATCH(off)((off) / 4)
-#define PARA_SITE(ptype, ops)  _PVSITE(ptype, ops, .long, 4)
-#define PARA_INDIRECT(addr)*%cs:addr
-#endif
 
-#ifdef CONFIG_PARAVIRT_XXL
 #define INTERRUPT_RETURN   \
PARA_SITE(PARA_PATCH(PV_CPU_iret),  \
  ANNOTATE_RETPOLINE_SAFE;  \
  jmp PARA_INDIRECT(pv_ops+PV_CPU_iret);)
 
-#define DISABLE_INTERRUPTS(clobbers)   \
-   PARA_SITE(PARA_PATCH(PV_IRQ_irq_disable),   \
- PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE);\
- ANNOTATE_RETPOLINE_SAFE;  \
- 

[tip: x86/alternatives] x86/alternative: Use ALTERNATIVE_TERNARY() in _static_cpu_has()

2021-03-12 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID: 2fe2a2c7a97c9bc32acc79154b75e754280f7867
Gitweb:        https://git.kernel.org/tip/2fe2a2c7a97c9bc32acc79154b75e754280f7867
Author:        Juergen Gross
AuthorDate:    Thu, 11 Mar 2021 15:23:12 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 11 Mar 2021 19:33:43 +01:00

x86/alternative: Use ALTERNATIVE_TERNARY() in _static_cpu_has()

_static_cpu_has() contains a completely open coded version of
ALTERNATIVE_TERNARY(). Replace that with the macro instead.

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210311142319.4723-8-jgr...@suse.com
---
 arch/x86/include/asm/cpufeature.h | 41 ++
 1 file changed, 9 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index 1728d4c..16a51e7 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 
 enum cpuid_leafs
 {
@@ -175,39 +176,15 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned 
int bit);
  */
 static __always_inline bool _static_cpu_has(u16 bit)
 {
-   asm_volatile_goto("1: jmp 6f\n"
-"2:\n"
-".skip -(((5f-4f) - (2b-1b)) > 0) * "
-"((5f-4f) - (2b-1b)),0x90\n"
-"3:\n"
-".section .altinstructions,\"a\"\n"
-" .long 1b - .\n"  /* src offset */
-" .long 4f - .\n"  /* repl offset */
-" .word %P[always]\n"  /* always replace */
-" .byte 3b - 1b\n" /* src len */
-" .byte 5f - 4f\n" /* repl len */
-" .byte 3b - 2b\n" /* pad len */
-".previous\n"
-".section .altinstr_replacement,\"ax\"\n"
-"4: jmp %l[t_no]\n"
-"5:\n"
-".previous\n"
-".section .altinstructions,\"a\"\n"
-" .long 1b - .\n"  /* src offset */
-" .long 0\n"   /* no replacement */
-" .word %P[feature]\n" /* feature bit */
-" .byte 3b - 1b\n" /* src len */
-" .byte 0\n"   /* repl len */
-" .byte 0\n"   /* pad len */
-".previous\n"
-".section .altinstr_aux,\"ax\"\n"
-"6:\n"
-" testb %[bitnum],%[cap_byte]\n"
-" jnz %l[t_yes]\n"
-" jmp %l[t_no]\n"
-".previous\n"
+   asm_volatile_goto(
+   ALTERNATIVE_TERNARY("jmp 6f", %P[feature], "", "jmp %l[t_no]")
+   ".section .altinstr_aux,\"ax\"\n"
+   "6:\n"
+   " testb %[bitnum],%[cap_byte]\n"
+   " jnz %l[t_yes]\n"
+   " jmp %l[t_no]\n"
+   ".previous\n"
 : : [feature]  "i" (bit),
-[always]   "i" (X86_FEATURE_ALWAYS),
 [bitnum]   "i" (1 << (bit & 7)),
 [cap_byte] "m" (((const char 
*)boot_cpu_data.x86_capability)[bit >> 3])
 : : t_yes, t_no);


[tip: x86/alternatives] x86/paravirt: Add new features for paravirt patching

2021-03-12 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID: 4e6292114c741221479046515b1aa8145cf1e3f6
Gitweb:        https://git.kernel.org/tip/4e6292114c741221479046515b1aa8145cf1e3f6
Author:        Juergen Gross
AuthorDate:    Thu, 11 Mar 2021 15:23:13 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 11 Mar 2021 19:51:49 +01:00

x86/paravirt: Add new features for paravirt patching

For being able to switch paravirt patching from special cased custom
code sequences to ALTERNATIVE handling some X86_FEATURE_* are needed
as new features. This enables to have the standard indirect pv call
as the default code and to patch that with the non-Xen custom code
sequence via ALTERNATIVE patching later.

Make sure paravirt patching is performed before alternatives patching.

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20210311142319.4723-9-jgr...@suse.com
---
 arch/x86/include/asm/cpufeatures.h   |  2 ++-
 arch/x86/include/asm/paravirt.h  | 10 +-
 arch/x86/kernel/alternative.c| 30 +--
 arch/x86/kernel/paravirt-spinlocks.c |  9 -
 4 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index cc96e26..b440c95 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -236,6 +236,8 @@
 #define X86_FEATURE_EPT_AD ( 8*32+17) /* Intel Extended Page Table 
access-dirty bit */
 #define X86_FEATURE_VMCALL ( 8*32+18) /* "" Hypervisor supports 
the VMCALL instruction */
 #define X86_FEATURE_VMW_VMMCALL( 8*32+19) /* "" VMware prefers 
VMMCALL hypercall instruction */
+#define X86_FEATURE_PVUNLOCK   ( 8*32+20) /* "" PV unlock function */
+#define X86_FEATURE_VCPUPREEMPT( 8*32+21) /* "" PV 
vcpu_is_preempted function */
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE   ( 9*32+ 0) /* RDFSBASE, WRFSBASE, 
RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 6408fd0..def450f 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -45,6 +45,10 @@ static inline u64 paravirt_steal_clock(int cpu)
return static_call(pv_steal_clock)(cpu);
 }
 
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init paravirt_set_cap(void);
+#endif
+
 /* The paravirtualized I/O functions */
 static inline void slow_down_io(void)
 {
@@ -809,5 +813,11 @@ static inline void paravirt_arch_exit_mmap(struct 
mm_struct *mm)
 {
 }
 #endif
+
+#ifndef CONFIG_PARAVIRT_SPINLOCKS
+static inline void paravirt_set_cap(void)
+{
+}
+#endif
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PARAVIRT_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 133b549..76ad4ce 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 int __read_mostly alternatives_patched;
 
@@ -733,6 +734,33 @@ void __init alternative_instructions(void)
 * patching.
 */
 
+   /*
+* Paravirt patching and alternative patching can be combined to
+* replace a function call with a short direct code sequence (e.g.
+* by setting a constant return value instead of doing that in an
+* external function).
+* In order to make this work the following sequence is required:
+* 1. set (artificial) features depending on used paravirt
+*functions which can later influence alternative patching
+* 2. apply paravirt patching (generally replacing an indirect
+*function call with a direct one)
+* 3. apply alternative patching (e.g. replacing a direct function
+*call with a custom code sequence)
+* Doing paravirt patching after alternative patching would clobber
+* the optimization of the custom code with a function call again.
+*/
+   paravirt_set_cap();
+
+   /*
+* First patch paravirt functions, such that we overwrite the indirect
+* call with the direct call.
+*/
+   apply_paravirt(__parainstructions, __parainstructions_end);
+
+   /*
+* Then patch alternatives, such that those paravirt calls that are in
+* alternatives can be overwritten by their immediate fragments.
+*/
apply_alternatives(__alt_instructions, __alt_instructions_end);
 
 #ifdef CONFIG_SMP
@@ -751,8 +779,6 @@ void __init alternative_instructions(void)
}
 #endif
 
-   apply_paravirt(__parainstructions, __parainstructions_end);
-
restart_nmi();
alternatives_patched = 1;
 }
diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c
index 4f75d0c..9e1ea99 100644
---
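
How the new synthetic features get used: paravirt_set_cap() is meant to force
them on when the corresponding pv ops are not the native implementations, so
that the later ALTERNATIVE patching keeps the paravirt variants. A sketch of
that shape (assumption, not quoted from the patch; pv_is_native_spin_unlock()
and pv_is_native_vcpu_is_preempted() are the existing helpers in
paravirt-spinlocks.c):

/*
 * Illustrative sketch only: turn the new bits on when the pv ops do *not*
 * point at the native implementations.
 */
void __init paravirt_set_cap(void)
{
	if (!pv_is_native_spin_unlock())
		setup_force_cpu_cap(X86_FEATURE_PVUNLOCK);

	if (!pv_is_native_vcpu_is_preempted())
		setup_force_cpu_cap(X86_FEATURE_VCPUPREEMPT);
}
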

[tip: x86/alternatives] x86/paravirt: Add new PVOP_ALT* macros to support pvops in ALTERNATIVEs

2021-03-12 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID: 00aa3193ab7a04b25bb8c68e377815696eb5bf56
Gitweb:        https://git.kernel.org/tip/00aa3193ab7a04b25bb8c68e377815696eb5bf56
Author:        Juergen Gross
AuthorDate:    Thu, 11 Mar 2021 15:23:17 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 11 Mar 2021 20:05:44 +01:00

x86/paravirt: Add new PVOP_ALT* macros to support pvops in ALTERNATIVEs

Instead of using paravirt patching for custom code sequences add
support for using ALTERNATIVE handling combined with paravirt call
patching.

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20210311142319.4723-13-jgr...@suse.com
---
 arch/x86/include/asm/paravirt_types.h | 49 +-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 0afdac8..0ed9762 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -477,44 +477,91 @@ int paravirt_disable_iospace(void);
ret;\
})
 
+#define PVOP_ALT_CALL(ret, op, alt, cond, clbr, call_clbr, \
+ extra_clbr, ...)  \
+   ({  \
+   PVOP_CALL_ARGS; \
+   PVOP_TEST_NULL(op); \
+   asm volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),   \
+alt, cond) \
+: call_clbr, ASM_CALL_CONSTRAINT   \
+: paravirt_type(op),   \
+  paravirt_clobber(clbr),  \
+  ##__VA_ARGS__\
+: "memory", "cc" extra_clbr);  \
+   ret;\
+   })
+
 #define __PVOP_CALL(rettype, op, ...)  \
PVOP_CALL(PVOP_RETVAL(rettype), op, CLBR_ANY,   \
  PVOP_CALL_CLOBBERS, EXTRA_CLOBBERS, ##__VA_ARGS__)
 
+#define __PVOP_ALT_CALL(rettype, op, alt, cond, ...)   \
+   PVOP_ALT_CALL(PVOP_RETVAL(rettype), op, alt, cond, CLBR_ANY,\
+ PVOP_CALL_CLOBBERS, EXTRA_CLOBBERS,   \
+ ##__VA_ARGS__)
+
 #define __PVOP_CALLEESAVE(rettype, op, ...)\
PVOP_CALL(PVOP_RETVAL(rettype), op.func, CLBR_RET_REG,  \
  PVOP_CALLEE_CLOBBERS, , ##__VA_ARGS__)
 
+#define __PVOP_ALT_CALLEESAVE(rettype, op, alt, cond, ...) \
+   PVOP_ALT_CALL(PVOP_RETVAL(rettype), op.func, alt, cond, \
+ CLBR_RET_REG, PVOP_CALLEE_CLOBBERS, , ##__VA_ARGS__)
+
+
 #define __PVOP_VCALL(op, ...)  \
(void)PVOP_CALL(, op, CLBR_ANY, PVOP_VCALL_CLOBBERS,\
   VEXTRA_CLOBBERS, ##__VA_ARGS__)
 
+#define __PVOP_ALT_VCALL(op, alt, cond, ...)   \
+   (void)PVOP_ALT_CALL(, op, alt, cond, CLBR_ANY,  \
+   PVOP_VCALL_CLOBBERS, VEXTRA_CLOBBERS,   \
+   ##__VA_ARGS__)
+
 #define __PVOP_VCALLEESAVE(op, ...)\
(void)PVOP_CALL(, op.func, CLBR_RET_REG,\
- PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__)
+   PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__)
 
+#define __PVOP_ALT_VCALLEESAVE(op, alt, cond, ...) \
+   (void)PVOP_ALT_CALL(, op.func, alt, cond, CLBR_RET_REG, \
+   PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__)
 
 
 #define PVOP_CALL0(rettype, op)
\
__PVOP_CALL(rettype, op)
 #define PVOP_VCALL0(op)
\
__PVOP_VCALL(op)
+#define PVOP_ALT_CALL0(rettype, op, alt, cond) \
+   __PVOP_ALT_CALL(rettype, op, alt, cond)
+#define PVOP_ALT_VCALL0(op, alt, cond) \
+   __PVOP_ALT_VCALL(op, alt, cond)
 
 #define PVOP_CALLEE0(rettype, op)  \
__PVOP_CALLEESAVE(rettype, op)
 #define PVOP_VCALLEE0(op)  \
__PVOP_VCALLEESAVE(op)
+#define PVOP_ALT_CALLEE0(rettype, op, alt, cond)   \
+   __PVOP_ALT_CALLEESAVE(rettype, op, alt, cond)
+#define PVOP_ALT_VCALLEE0(op, alt, cond)

[tip: x86/alternatives] x86/paravirt: Switch iret pvops to ALTERNATIVE

2021-03-12 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID: ae755b5a45482b5de4d96d6f35823076af77445e
Gitweb:        https://git.kernel.org/tip/ae755b5a45482b5de4d96d6f35823076af77445e
Author:        Juergen Gross
AuthorDate:    Thu, 11 Mar 2021 15:23:16 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 11 Mar 2021 19:58:54 +01:00

x86/paravirt: Switch iret pvops to ALTERNATIVE

The iret paravirt op is rather special as it is using a jmp instead
of a call instruction. Switch it to ALTERNATIVE.

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20210311142319.4723-12-jgr...@suse.com
---
 arch/x86/include/asm/paravirt.h   |  6 +++---
 arch/x86/include/asm/paravirt_types.h |  5 +
 arch/x86/kernel/asm-offsets.c |  5 +-
 arch/x86/kernel/paravirt.c| 26 ++
 arch/x86/xen/enlighten_pv.c   |  3 +--
 5 files changed, 7 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index a780509..913acf7 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -747,9 +747,9 @@ extern void default_banner(void);
 #define PARA_INDIRECT(addr)*addr(%rip)
 
 #define INTERRUPT_RETURN   \
-   PARA_SITE(PARA_PATCH(PV_CPU_iret),  \
- ANNOTATE_RETPOLINE_SAFE;  \
- jmp PARA_INDIRECT(pv_ops+PV_CPU_iret);)
+   ANNOTATE_RETPOLINE_SAFE;\
+   ALTERNATIVE_TERNARY("jmp *paravirt_iret(%rip);",\
+   X86_FEATURE_XENPV, "jmp xen_iret;", "jmp native_iret;")
 
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(clobbers)\
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 45bd216..0afdac8 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -151,10 +151,6 @@ struct pv_cpu_ops {
 
u64 (*read_pmc)(int counter);
 
-   /* Normal iret.  Jump to this with the standard iret stack
-  frame set up. */
-   void (*iret)(void);
-
void (*start_context_switch)(struct task_struct *prev);
void (*end_context_switch)(struct task_struct *next);
 #endif
@@ -294,6 +290,7 @@ struct paravirt_patch_template {
 
 extern struct pv_info pv_info;
 extern struct paravirt_patch_template pv_ops;
+extern void (*paravirt_iret)(void);
 
 #define PARAVIRT_PATCH(x)  \
(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 7365080..ecd3fd6 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -61,11 +61,6 @@ static void __used common(void)
OFFSET(IA32_RT_SIGFRAME_sigcontext, rt_sigframe_ia32, uc.uc_mcontext);
 #endif
 
-#ifdef CONFIG_PARAVIRT_XXL
-   BLANK();
-   OFFSET(PV_CPU_iret, paravirt_patch_template, cpu.iret);
-#endif
-
 #ifdef CONFIG_XEN
BLANK();
OFFSET(XEN_vcpu_info_mask, vcpu_info, evtchn_upcall_mask);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index a688edf..9b0f568 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -86,25 +86,6 @@ u64 notrace _paravirt_ident_64(u64 x)
 {
return x;
 }
-
-static unsigned paravirt_patch_jmp(void *insn_buff, const void *target,
-  unsigned long addr, unsigned len)
-{
-   struct branch *b = insn_buff;
-   unsigned long delta = (unsigned long)target - (addr+5);
-
-   if (len < 5) {
-#ifdef CONFIG_RETPOLINE
-   WARN_ONCE(1, "Failing to patch indirect JMP in %ps\n", (void 
*)addr);
-#endif
-   return len; /* call too long for patch site */
-   }
-
-   b->opcode = 0xe9;   /* jmp */
-   b->delta = delta;
-
-   return 5;
-}
 #endif
 
 DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
@@ -136,9 +117,6 @@ unsigned paravirt_patch_default(u8 type, void *insn_buff,
else if (opfunc == _paravirt_ident_64)
ret = paravirt_patch_ident_64(insn_buff, len);
 
-   else if (type == PARAVIRT_PATCH(cpu.iret))
-   /* If operation requires a jmp, then jmp */
-   ret = paravirt_patch_jmp(insn_buff, opfunc, addr, len);
 #endif
else
/* Otherwise call the function. */
@@ -313,8 +291,6 @@ struct paravirt_patch_template pv_ops = {
 
.cpu.load_sp0   = native_load_sp0,
 
-   .cpu.iret   = native_iret,
-
 #ifdef CONFIG_X86_IOPL_IOPERM
.cpu.invalidate_io_bitmap   = native_tss_invalidate_io_bitmap,
.cpu.update_io_bitmap   = native_tss_update_io_bitmap,
@@ -419,6 +395,8 @@ st

[tip: x86/alternatives] x86/paravirt: Simplify paravirt macros

2021-03-12 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID: 0b8d366a942fd48a83dfa728e9f8a8d8b20e735f
Gitweb:        https://git.kernel.org/tip/0b8d366a942fd48a83dfa728e9f8a8d8b20e735f
Author:        Juergen Gross
AuthorDate:    Thu, 11 Mar 2021 15:23:15 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 11 Mar 2021 19:52:52 +01:00

x86/paravirt: Simplify paravirt macros

The central pvops call macros PVOP_CALL() and PVOP_VCALL() are
looking very similar now.

The main differences are using PVOP_VCALL_ARGS or PVOP_CALL_ARGS, which
are identical, and the return value handling.

So drop PVOP_VCALL_ARGS and instead of PVOP_VCALL() just use
(void)PVOP_CALL(long, ...).

Note that it isn't easily possible to just redefine PVOP_VCALL()
to use PVOP_CALL() instead, as this would require further hiding of
commas in macro parameters.

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20210311142319.4723-11-jgr...@suse.com
---
 arch/x86/include/asm/paravirt_types.h | 41 +++---
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 42f9eef..45bd216 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -408,11 +408,9 @@ int paravirt_disable_iospace(void);
  * makes sure the incoming and outgoing types are always correct.
  */
 #ifdef CONFIG_X86_32
-#define PVOP_VCALL_ARGS
\
+#define PVOP_CALL_ARGS \
unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;
 
-#define PVOP_CALL_ARGS PVOP_VCALL_ARGS
-
 #define PVOP_CALL_ARG1(x)  "a" ((unsigned long)(x))
 #define PVOP_CALL_ARG2(x)  "d" ((unsigned long)(x))
 #define PVOP_CALL_ARG3(x)  "c" ((unsigned long)(x))
@@ -428,12 +426,10 @@ int paravirt_disable_iospace(void);
 #define VEXTRA_CLOBBERS
 #else  /* CONFIG_X86_64 */
 /* [re]ax isn't an arg, but the return val */
-#define PVOP_VCALL_ARGS\
+#define PVOP_CALL_ARGS \
unsigned long __edi = __edi, __esi = __esi, \
__edx = __edx, __ecx = __ecx, __eax = __eax;
 
-#define PVOP_CALL_ARGS PVOP_VCALL_ARGS
-
 #define PVOP_CALL_ARG1(x)  "D" ((unsigned long)(x))
 #define PVOP_CALL_ARG2(x)  "S" ((unsigned long)(x))
 #define PVOP_CALL_ARG3(x)  "d" ((unsigned long)(x))
@@ -458,59 +454,46 @@ int paravirt_disable_iospace(void);
 #define PVOP_TEST_NULL(op) ((void)pv_ops.op)
 #endif
 
-#define PVOP_RETMASK(rettype)  \
+#define PVOP_RETVAL(rettype)   \
({  unsigned long __mask = ~0UL;\
+   BUILD_BUG_ON(sizeof(rettype) > sizeof(unsigned long));  \
switch (sizeof(rettype)) {  \
case 1: __mask =   0xffUL; break;   \
case 2: __mask = 0xUL; break;   \
case 4: __mask = 0xUL; break;   \
default: break; \
}   \
-   __mask; \
+   __mask & __eax; \
})
 
 
-#define PVOP_CALL(rettype, op, clbr, call_clbr, extra_clbr, ...)   \
+#define PVOP_CALL(ret, op, clbr, call_clbr, extra_clbr, ...)   \
({  \
PVOP_CALL_ARGS; \
PVOP_TEST_NULL(op); \
-   BUILD_BUG_ON(sizeof(rettype) > sizeof(unsigned long));  \
asm volatile(paravirt_alt(PARAVIRT_CALL)\
 : call_clbr, ASM_CALL_CONSTRAINT   \
 : paravirt_type(op),   \
   paravirt_clobber(clbr),  \
   ##__VA_ARGS__\
 : "memory", "cc" extra_clbr);  \
-   (rettype)(__eax & PVOP_RETMASK(rettype));   \
+   ret;\
})
 
 #define __PVOP_CALL(rettype, op, ...)  \
-   PVOP_CALL(rettype, op, CLBR_ANY, PVOP_CALL_CLOBBERS,\
- EXTRA_CLOBBERS, ##__VA_ARGS__)
+   PVOP_CA
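
What PVOP_RETVAL() does with the raw register value can be modelled in plain
C: the result is truncated to the declared return width, so narrow return
types never see stale upper bits of %eax/%rax. A standalone sketch of that
masking (assumes a 64-bit unsigned long and GNU statement expressions; not
kernel code):

#include <assert.h>
#include <stdint.h>

#define RETVAL_MASKED(rettype, raw)				\
({								\
	unsigned long __mask = ~0UL;				\
	switch (sizeof(rettype)) {				\
	case 1: __mask =       0xffUL; break;			\
	case 2: __mask =     0xffffUL; break;			\
	case 4: __mask = 0xffffffffUL; break;			\
	default: break;						\
	}							\
	(rettype)(__mask & (raw));				\
})

int main(void)
{
	unsigned long raw = 0x11223344aabbccddUL;	/* pretend %rax */

	assert(RETVAL_MASKED(uint8_t,  raw) == 0xdd);
	assert(RETVAL_MASKED(uint16_t, raw) == 0xccdd);
	assert(RETVAL_MASKED(uint32_t, raw) == 0xaabbccdd);
	assert(RETVAL_MASKED(uint64_t, raw) == raw);
	return 0;
}
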

[tip: x86/alternatives] x86/paravirt: Switch functions with custom code to ALTERNATIVE

2021-03-12 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID: fafe5e74229fd3f425e3cbfc68b90e615aa6d62f
Gitweb:        https://git.kernel.org/tip/fafe5e74229fd3f425e3cbfc68b90e615aa6d62f
Author:        Juergen Gross
AuthorDate:    Thu, 11 Mar 2021 15:23:18 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 11 Mar 2021 20:07:01 +01:00

x86/paravirt: Switch functions with custom code to ALTERNATIVE

Instead of using paravirt patching for custom code sequences use
ALTERNATIVE for the functions with custom code replacements.

Instead of patching an ud2 instruction for unpopulated vector entries
into the caller site, use a simple function just calling BUG() as a
replacement.

Simplify the register defines for assembler paravirt calling, as there
isn't much usage left.

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20210311142319.4723-14-jgr...@suse.com
---
 arch/x86/entry/entry_64.S |   2 +-
 arch/x86/include/asm/irqflags.h   |   2 +-
 arch/x86/include/asm/paravirt.h   | 101 -
 arch/x86/include/asm/paravirt_types.h |   6 +-
 arch/x86/kernel/paravirt.c|  16 +---
 arch/x86/kernel/paravirt_patch.c  |  88 +--
 6 files changed, 58 insertions(+), 157 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 400908d..12e2e3c 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -305,7 +305,7 @@ SYM_CODE_END(ret_from_fork)
 .macro DEBUG_ENTRY_ASSERT_IRQS_OFF
 #ifdef CONFIG_DEBUG_ENTRY
pushq %rax
-   SAVE_FLAGS(CLBR_RAX)
+   SAVE_FLAGS
testl $X86_EFLAGS_IF, %eax
jz .Lokay_\@
ud2
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index a0efbcd..c5ce984 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -111,7 +111,7 @@ static __always_inline unsigned long 
arch_local_irq_save(void)
 
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_DEBUG_ENTRY
-#define SAVE_FLAGS(x)  pushfq; popq %rax
+#define SAVE_FLAGS pushfq; popq %rax
 #endif
 
 #define INTERRUPT_RETURN   jmp native_iret
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 913acf7..43992e5 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -135,7 +135,9 @@ static inline void write_cr0(unsigned long x)
 
 static inline unsigned long read_cr2(void)
 {
-   return PVOP_CALLEE0(unsigned long, mmu.read_cr2);
+   return PVOP_ALT_CALLEE0(unsigned long, mmu.read_cr2,
+   "mov %%cr2, %%rax;",
+   ALT_NOT(X86_FEATURE_XENPV));
 }
 
 static inline void write_cr2(unsigned long x)
@@ -145,12 +147,14 @@ static inline void write_cr2(unsigned long x)
 
 static inline unsigned long __read_cr3(void)
 {
-   return PVOP_CALL0(unsigned long, mmu.read_cr3);
+   return PVOP_ALT_CALL0(unsigned long, mmu.read_cr3,
+ "mov %%cr3, %%rax;", ALT_NOT(X86_FEATURE_XENPV));
 }
 
 static inline void write_cr3(unsigned long x)
 {
-   PVOP_VCALL1(mmu.write_cr3, x);
+   PVOP_ALT_VCALL1(mmu.write_cr3, x,
+   "mov %%rdi, %%cr3", ALT_NOT(X86_FEATURE_XENPV));
 }
 
 static inline void __write_cr4(unsigned long x)
@@ -170,7 +174,7 @@ static inline void halt(void)
 
 static inline void wbinvd(void)
 {
-   PVOP_VCALL0(cpu.wbinvd);
+   PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT(X86_FEATURE_XENPV));
 }
 
 static inline u64 paravirt_read_msr(unsigned msr)
@@ -384,22 +388,28 @@ static inline void paravirt_release_p4d(unsigned long pfn)
 
 static inline pte_t __pte(pteval_t val)
 {
-   return (pte_t) { PVOP_CALLEE1(pteval_t, mmu.make_pte, val) };
+   return (pte_t) { PVOP_ALT_CALLEE1(pteval_t, mmu.make_pte, val,
+ "mov %%rdi, %%rax",
+ ALT_NOT(X86_FEATURE_XENPV)) };
 }
 
 static inline pteval_t pte_val(pte_t pte)
 {
-   return PVOP_CALLEE1(pteval_t, mmu.pte_val, pte.pte);
+   return PVOP_ALT_CALLEE1(pteval_t, mmu.pte_val, pte.pte,
+   "mov %%rdi, %%rax", ALT_NOT(X86_FEATURE_XENPV));
 }
 
 static inline pgd_t __pgd(pgdval_t val)
 {
-   return (pgd_t) { PVOP_CALLEE1(pgdval_t, mmu.make_pgd, val) };
+   return (pgd_t) { PVOP_ALT_CALLEE1(pgdval_t, mmu.make_pgd, val,
+ "mov %%rdi, %%rax",
+ ALT_NOT(X86_FEATURE_XENPV)) };
 }
 
 static inline pgdval_t pgd_val(pgd_t pgd)
 {
-   return PVOP_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd);
+   return PVOP_ALT_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd,
+   "mov %%rdi, %%rax", ALT_NOT(X86_FEATURE_XENPV));
 }
 
 #define  __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
@@ -432,12 +442,15 @@ stati

[tip: x86/alternatives] x86/paravirt: Have only one paravirt patch function

2021-03-12 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID: 054ac8ad5ebe4a69e1f0e842483821ddbe560121
Gitweb:        https://git.kernel.org/tip/054ac8ad5ebe4a69e1f0e842483821ddbe560121
Author:        Juergen Gross
AuthorDate:    Thu, 11 Mar 2021 15:23:19 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 11 Mar 2021 20:11:09 +01:00

x86/paravirt: Have only one paravirt patch function

There is no need any longer to have different paravirt patch functions
for native and Xen. Eliminate native_patch() and rename
paravirt_patch_default() to paravirt_patch().

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20210311142319.4723-15-jgr...@suse.com
---
 arch/x86/include/asm/paravirt_types.h | 19 +--
 arch/x86/kernel/Makefile  |  3 +--
 arch/x86/kernel/alternative.c |  2 +-
 arch/x86/kernel/paravirt.c| 20 ++--
 arch/x86/kernel/paravirt_patch.c  | 11 ---
 arch/x86/xen/enlighten_pv.c   |  1 -
 6 files changed, 5 insertions(+), 51 deletions(-)
 delete mode 100644 arch/x86/kernel/paravirt_patch.c

diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 588ff14..9d1ddb7 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -68,19 +68,6 @@ struct pv_info {
const char *name;
 };
 
-struct pv_init_ops {
-   /*
-* Patch may replace one of the defined code sequences with
-* arbitrary code, subject to the same register constraints.
-* This generally means the code is not free to clobber any
-* registers other than EAX.  The patch function should return
-* the number of bytes of code generated, as we nop pad the
-* rest in generic code.
-*/
-   unsigned (*patch)(u8 type, void *insn_buff,
- unsigned long addr, unsigned len);
-} __no_randomize_layout;
-
 #ifdef CONFIG_PARAVIRT_XXL
 struct pv_lazy_ops {
/* Set deferred update mode, used for batching operations. */
@@ -276,7 +263,6 @@ struct pv_lock_ops {
  * number for each function using the offset which we use to indicate
  * what to patch. */
 struct paravirt_patch_template {
-   struct pv_init_ops  init;
struct pv_cpu_ops   cpu;
struct pv_irq_ops   irq;
struct pv_mmu_ops   mmu;
@@ -317,10 +303,7 @@ extern void (*paravirt_iret)(void);
 /* Simple instruction patching code. */
 #define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
 
-unsigned paravirt_patch_default(u8 type, void *insn_buff, unsigned long addr, 
unsigned len);
-unsigned paravirt_patch_insns(void *insn_buff, unsigned len, const char 
*start, const char *end);
-
-unsigned native_patch(u8 type, void *insn_buff, unsigned long addr, unsigned 
len);
+unsigned int paravirt_patch(u8 type, void *insn_buff, unsigned long addr, 
unsigned int len);
 
 int paravirt_disable_iospace(void);
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 2ddf083..0704c2a 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -35,7 +35,6 @@ KASAN_SANITIZE_sev-es.o   
:= n
 KCSAN_SANITIZE := n
 
 OBJECT_FILES_NON_STANDARD_test_nx.o:= y
-OBJECT_FILES_NON_STANDARD_paravirt_patch.o := y
 
 ifdef CONFIG_FRAME_POINTER
 OBJECT_FILES_NON_STANDARD_ftrace_$(BITS).o := y
@@ -121,7 +120,7 @@ obj-$(CONFIG_AMD_NB)+= amd_nb.o
 obj-$(CONFIG_DEBUG_NMI_SELFTEST) += nmi_selftest.o
 
 obj-$(CONFIG_KVM_GUEST)+= kvm.o kvmclock.o
-obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)   += pvclock.o
 obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 76ad4ce..f810e6f 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -616,7 +616,7 @@ void __init_or_module apply_paravirt(struct 
paravirt_patch_site *start,
BUG_ON(p->len > MAX_PATCH_LEN);
/* prep the buffer with the original instructions */
memcpy(insn_buff, p->instr, p->len);
-   used = pv_ops.init.patch(p->type, insn_buff, (unsigned 
long)p->instr, p->len);
+   used = paravirt_patch(p->type, insn_buff, (unsigned 
long)p->instr, p->len);
 
BUG_ON(used > p->len);
 
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 855ae08..d073026 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -99,8 +99,8 @@ void __init native_pv_lock_init(void)
static_branch_disable(&virt_spin_lock_key);
 }
 
-unsigned paravirt_patch_default(u8 ty

[tip: x86/alternatives] x86/alternative: Drop unused feature parameter from ALTINSTR_REPLACEMENT()

2021-03-09 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID: db16e07269c2b4346e4332e43f04e447ef14fd2f
Gitweb:        https://git.kernel.org/tip/db16e07269c2b4346e4332e43f04e447ef14fd2f
Author:        Juergen Gross
AuthorDate:    Tue, 09 Mar 2021 14:48:04 +01:00
Committer: Borislav Petkov 
CommitterDate: Tue, 09 Mar 2021 20:08:28 +01:00

x86/alternative: Drop unused feature parameter from ALTINSTR_REPLACEMENT()

The macro ALTINSTR_REPLACEMENT() doesn't make use of the feature
parameter, so drop it.

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20210309134813.23912-4-jgr...@suse.com
---
 arch/x86/include/asm/alternative.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h 
b/arch/x86/include/asm/alternative.h
index 13adca3..5753fb2 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -150,7 +150,7 @@ static inline int alternatives_text_reserved(void *start, 
void *end)
" .byte " alt_rlen(num) "\n"/* replacement len */ \
" .byte " alt_pad_len "\n"  /* pad len */
 
-#define ALTINSTR_REPLACEMENT(newinstr, feature, num)   /* replacement */   
\
+#define ALTINSTR_REPLACEMENT(newinstr, num)/* replacement */   
\
"# ALT: replacement " #num "\n" 
\
b_replacement(num)":\n\t" newinstr "\n" e_replacement(num) ":\n"
 
@@ -161,7 +161,7 @@ static inline int alternatives_text_reserved(void *start, 
void *end)
ALTINSTR_ENTRY(feature, 1)  \
".popsection\n" \
".pushsection .altinstr_replacement, \"ax\"\n"  \
-   ALTINSTR_REPLACEMENT(newinstr, feature, 1)  \
+   ALTINSTR_REPLACEMENT(newinstr, 1)   \
".popsection\n"
 
 #define ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2)\
@@ -171,8 +171,8 @@ static inline int alternatives_text_reserved(void *start, 
void *end)
ALTINSTR_ENTRY(feature2, 2) \
".popsection\n" \
".pushsection .altinstr_replacement, \"ax\"\n"  \
-   ALTINSTR_REPLACEMENT(newinstr1, feature1, 1)\
-   ALTINSTR_REPLACEMENT(newinstr2, feature2, 2)\
+   ALTINSTR_REPLACEMENT(newinstr1, 1)  \
+   ALTINSTR_REPLACEMENT(newinstr2, 2)  \
".popsection\n"
 
 #define ALTERNATIVE_3(oldinsn, newinsn1, feat1, newinsn2, feat2, newinsn3, 
feat3) \
@@ -183,9 +183,9 @@ static inline int alternatives_text_reserved(void *start, 
void *end)
ALTINSTR_ENTRY(feat3, 3)
\
".popsection\n" 
\
".pushsection .altinstr_replacement, \"ax\"\n"  
\
-   ALTINSTR_REPLACEMENT(newinsn1, feat1, 1)
\
-   ALTINSTR_REPLACEMENT(newinsn2, feat2, 2)
\
-   ALTINSTR_REPLACEMENT(newinsn3, feat3, 3)
\
+   ALTINSTR_REPLACEMENT(newinsn1, 1)   
\
+   ALTINSTR_REPLACEMENT(newinsn2, 2)   
\
+   ALTINSTR_REPLACEMENT(newinsn3, 3)   
\
".popsection\n"
 
 /*
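
As a quick illustration (not part of the patch): callers of ALTERNATIVE()
are unaffected by this cleanup, since the feature flag is only consumed by
ALTINSTR_ENTRY() when emitting the .altinstructions record. A typical user,
modelled loosely on the 32-bit mb() definition, keeps passing the feature
exactly as before:

	static inline void mb_sketch(void)
	{
		asm volatile(ALTERNATIVE("lock; addl $0,-4(%%esp)", "mfence",
					 X86_FEATURE_XMM2)
			     ::: "memory", "cc");
	}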


[tip: locking/core] locking/csd_lock: Prepare more CSD lock debugging

2021-03-06 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the locking/core branch of tip:

Commit-ID: de7b09ef658d637eed0584eaba30884e409aef31
Gitweb:
https://git.kernel.org/tip/de7b09ef658d637eed0584eaba30884e409aef31
Author:Juergen Gross 
AuthorDate:Mon, 01 Mar 2021 11:13:35 +01:00
Committer: Ingo Molnar 
CommitterDate: Sat, 06 Mar 2021 12:49:48 +01:00

locking/csd_lock: Prepare more CSD lock debugging

In order to be able to easily add more CSD lock debugging data to
struct call_function_data->csd, move the call_single_data_t element
into a sub-structure.

Signed-off-by: Juergen Gross 
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20210301101336.7797-3-jgr...@suse.com
---
 kernel/smp.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index d5f0b21..6d7e6db 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -31,8 +31,12 @@
 
 #define CSD_TYPE(_csd) ((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)
 
+struct cfd_percpu {
+   call_single_data_t  csd;
+};
+
 struct call_function_data {
-   call_single_data_t  __percpu *csd;
+   struct cfd_percpu   __percpu *pcpu;
cpumask_var_t   cpumask;
cpumask_var_t   cpumask_ipi;
 };
@@ -55,8 +59,8 @@ int smpcfd_prepare_cpu(unsigned int cpu)
free_cpumask_var(cfd->cpumask);
return -ENOMEM;
}
-   cfd->csd = alloc_percpu(call_single_data_t);
-   if (!cfd->csd) {
+   cfd->pcpu = alloc_percpu(struct cfd_percpu);
+   if (!cfd->pcpu) {
free_cpumask_var(cfd->cpumask);
free_cpumask_var(cfd->cpumask_ipi);
return -ENOMEM;
@@ -71,7 +75,7 @@ int smpcfd_dead_cpu(unsigned int cpu)
 
free_cpumask_var(cfd->cpumask);
free_cpumask_var(cfd->cpumask_ipi);
-   free_percpu(cfd->csd);
+   free_percpu(cfd->pcpu);
return 0;
 }
 
@@ -694,7 +698,7 @@ static void smp_call_function_many_cond(const struct 
cpumask *mask,
 
cpumask_clear(cfd->cpumask_ipi);
for_each_cpu(cpu, cfd->cpumask) {
-   call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu);
+   call_single_data_t *csd = &per_cpu_ptr(cfd->pcpu, cpu)->csd;
 
if (cond_func && !cond_func(cpu, info))
continue;
@@ -719,7 +723,7 @@ static void smp_call_function_many_cond(const struct 
cpumask *mask,
for_each_cpu(cpu, cfd->cpumask) {
call_single_data_t *csd;
 
-   csd = per_cpu_ptr(cfd->csd, cpu);
+   csd = &per_cpu_ptr(cfd->pcpu, cpu)->csd;
csd_lock_wait(csd);
}
}
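
To illustrate where this is heading (the fields below are only a sketch and
are not taken verbatim from the follow-up patches): once the csd sits in its
own per-CPU wrapper, additional debug data can be placed next to it without
touching the allocation or iteration code again, e.g.:

	struct cfd_percpu {
		call_single_data_t	csd;
		/* e.g. per-CPU sequence counters for CSD lock debugging: */
		u64			seq_queue;
		u64			seq_ipi;
	};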


[tip: locking/core] locking/csd_lock: Add more data to CSD lock debugging

2021-03-06 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the locking/core branch of tip:

Commit-ID: a5aabace5fb8abf2adcfcf0fe54c089b20d71755
Gitweb:
https://git.kernel.org/tip/a5aabace5fb8abf2adcfcf0fe54c089b20d71755
Author:Juergen Gross 
AuthorDate:Mon, 01 Mar 2021 11:13:36 +01:00
Committer: Ingo Molnar 
CommitterDate: Sat, 06 Mar 2021 12:49:48 +01:00

locking/csd_lock: Add more data to CSD lock debugging

In order to help identify problems with IPI handling and remote
function execution, add some more data to the IPI debugging code.

There have been multiple reports of CPUs looping for long times (many
seconds) in smp_call_function_many() waiting for another CPU to execute
a function like TLB flushing. Most of these reports have been for
cases where the kernel was running as a guest on top of KVM or Xen
(there are rumours of that happening under VMware, too, and even on
bare metal).

The root cause hasn't been found yet, even after more than two years
of different developers chasing this bug.

Commit:

  35feb60474bf4f7 ("kernel/smp: Provide CSD lock timeout diagnostics")

tried to address this by adding some debug code and by issuing another
IPI when a hang was detected. This helped mitigate the problem
(the repeated IPI unlocks the hang), but the root cause is still unknown.

Current available data suggests that either an IPI wasn't sent when it
should have been, or that the IPI didn't result in the target CPU
executing the queued function (due to the IPI not reaching the CPU,
the IPI handler not being called, or the handler not seeing the queued
request).

Try to add more diagnostic data by introducing a global atomic counter
which is incremented when doing critical operations (before and
after queueing a new request, when sending an IPI, and when dequeueing
a request). The counter value is stored in percpu variables which can
be printed out when a hang is detected.

The data of the last event (consisting of sequence counter, source
CPU, target CPU, and event type) is stored in a global variable. When
a new event is to be traced, the data of the last event is stored in
the event-related percpu location and the global data is updated with
the new event's data. This allows tracking two events in one data
location: one by the value of the event data (the event before the
current one), and one by the location itself (the current event).

A typical printout with a detected hang will look like this:

csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 53 ns for 
CPU#06 scf_handler_1+0x0/0x50(0xa2a881bb1410).
csd: CSD lock (#1) handling prior 
scf_handler_1+0x0/0x50(0xa2a8813823c0) request.
csd: cnt(8cc): -> dequeue (src cpu 0 == empty)
csd: cnt(8cd): ->0006 idle
csd: cnt(0003668): 0001->0006 queue
csd: cnt(0003669): 0001->0006 ipi
csd: cnt(0003e0f): 0007->000a queue
csd: cnt(0003e10): 0001-> ping
csd: cnt(0003e71): 0003-> ping
csd: cnt(0003e72): ->0006 gotipi
csd: cnt(0003e73): ->0006 handle
csd: cnt(0003e74): ->0006 dequeue (src cpu 0 == empty)
csd: cnt(0003e7f): 0004->0006 ping
csd: cnt(0003e80): 0001-> pinged
csd: cnt(0003eb2): 0005->0001 noipi
csd: cnt(0003eb3): 0001->0006 queue
csd: cnt(0003eb4): 0001->0006 noipi
csd: cnt now: 0003f00

The idea is to print only relevant entries. Those are all events which
are associated with the hang (so sender side events for the source CPU
of the hanging request, and receiver side events for the target CPU),
and the related events just before those (for adding data needed to
identify a possible race). Printing all available data would be
possible, but this would produce huge amounts of output on larger
configurations.

Signed-off-by: Juergen Gross 
[ Minor readability edits. Breaks col80 but is far more readable. ]
Signed-off-by: Ingo Molnar 
Tested-by: Paul E. McKenney 
Link: https://lore.kernel.org/r/20210301101336.7797-4-jgr...@suse.com
---
 Documentation/admin-guide/kernel-parameters.txt |   4 +-
 kernel/smp.c| 226 ++-
 2 files changed, 226 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 98dbffa..1fe9d38 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -789,6 +789,10 @@
printed to the console in case a hanging CPU is
detected, and that CPU is pinged again in order to try
to resolve the hang situation.
+   0: disable csdlock debugging (default)
+   1: enable basic csdlock debugging (minor impact)
+   ext: enable extended csdlock debugging (more impact,
+but mor
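
A rough sketch of the counting scheme described above (field widths, names
and the event encoding here are illustrative only and are not meant to match
the actual kernel/smp.c implementation):

	/* one 64-bit word encodes sequence number, source, target and type */
	union csd_event {
		u64 val;
		struct {
			u64 cnt:28;	/* global sequence counter */
			u64 src:16;	/* source CPU */
			u64 dst:16;	/* target CPU */
			u64 type:4;	/* queue/ipi/ping/gotipi/... */
		} u;
	};

	static atomic_t csd_event_cnt;			/* global counter */
	static u64 csd_event_global;			/* most recent event */
	static DEFINE_PER_CPU(u64, csd_event_prev);	/* the event before it */

	static u64 csd_record_event(unsigned int src, unsigned int dst,
				    unsigned int type)
	{
		union csd_event ev;

		ev.u.cnt  = atomic_inc_return(&csd_event_cnt);
		ev.u.src  = src;
		ev.u.dst  = dst;
		ev.u.type = type;

		/*
		 * Store the previous event by value in the per-CPU slot; the
		 * slot itself then identifies the current event.
		 */
		this_cpu_write(csd_event_prev, READ_ONCE(csd_event_global));
		WRITE_ONCE(csd_event_global, ev.val);

		return ev.val;
	}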

[tip: locking/core] locking/csd_lock: Add boot parameter for controlling CSD lock debugging

2021-03-06 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the locking/core branch of tip:

Commit-ID: 8d0968cc6b8ffd8496c2ebffdfdc801f949a85e5
Gitweb:
https://git.kernel.org/tip/8d0968cc6b8ffd8496c2ebffdfdc801f949a85e5
Author:Juergen Gross 
AuthorDate:Mon, 01 Mar 2021 11:13:34 +01:00
Committer: Ingo Molnar 
CommitterDate: Sat, 06 Mar 2021 12:49:48 +01:00

locking/csd_lock: Add boot parameter for controlling CSD lock debugging

Currently CSD lock debugging can be switched on and off via a kernel
config option only. Unfortunately there is at least one problem with
CSD lock handling pending for about 2 years now, which has been seen
in different environments (mostly when running virtualized under KVM
or Xen, at least once on bare metal). Multiple attempts to catch this
issue have finally led to the introduction of CSD lock debug code, but
this code is not in use in most distros as it has some impact on
performance.

In order to be able to ship kernels with CONFIG_CSD_LOCK_WAIT_DEBUG
enabled even for production use, add a boot parameter for switching
the debug functionality on. This will reduce any performance impact
of the debug code to a bare minimum when it is not being used.

Signed-off-by: Juergen Gross 
[ Minor edits. ]
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20210301101336.7797-2-jgr...@suse.com
---
 Documentation/admin-guide/kernel-parameters.txt |  6 +++-
 kernel/smp.c| 38 ++--
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 0454572..98dbffa 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -784,6 +784,12 @@
cs89x0_media=   [HW,NET]
Format: { rj45 | aui | bnc }
 
+   csdlock_debug=  [KNL] Enable debug add-ons of cross-CPU function call
+   handling. When switched on, additional debug data is
+   printed to the console in case a hanging CPU is
+   detected, and that CPU is pinged again in order to try
+   to resolve the hang situation.
+
dasd=   [HW,NET]
See header of drivers/s390/block/dasd_devmap.c.
 
diff --git a/kernel/smp.c b/kernel/smp.c
index aeb0adf..d5f0b21 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "smpboot.h"
 #include "sched/smp.h"
@@ -102,6 +103,20 @@ void __init call_function_init(void)
 
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
 
+static DEFINE_STATIC_KEY_FALSE(csdlock_debug_enabled);
+
+static int __init csdlock_debug(char *str)
+{
+   unsigned int val = 0;
+
+   get_option(&str, &val);
+   if (val)
+   static_branch_enable(&csdlock_debug_enabled);
+
+   return 0;
+}
+early_param("csdlock_debug", csdlock_debug);
+
 static DEFINE_PER_CPU(call_single_data_t *, cur_csd);
 static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
 static DEFINE_PER_CPU(void *, cur_csd_info);
@@ -110,7 +125,7 @@ static DEFINE_PER_CPU(void *, cur_csd_info);
 static atomic_t csd_bug_count = ATOMIC_INIT(0);
 
 /* Record current CSD work for current CPU, NULL to erase. */
-static void csd_lock_record(call_single_data_t *csd)
+static void __csd_lock_record(call_single_data_t *csd)
 {
if (!csd) {
smp_mb(); /* NULL cur_csd after unlock. */
@@ -125,7 +140,13 @@ static void csd_lock_record(call_single_data_t *csd)
  /* Or before unlock, as the case may be. */
 }
 
-static __always_inline int csd_lock_wait_getcpu(call_single_data_t *csd)
+static __always_inline void csd_lock_record(call_single_data_t *csd)
+{
+   if (static_branch_unlikely(&csdlock_debug_enabled))
+   __csd_lock_record(csd);
+}
+
+static int csd_lock_wait_getcpu(call_single_data_t *csd)
 {
unsigned int csd_type;
 
@@ -140,7 +161,7 @@ static __always_inline int 
csd_lock_wait_getcpu(call_single_data_t *csd)
  * the CSD_TYPE_SYNC/ASYNC types provide the destination CPU,
  * so waiting on other types gets much less information.
  */
-static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 
ts0, u64 *ts1, int *bug_id)
+static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, 
int *bug_id)
 {
int cpu = -1;
int cpux;
@@ -204,7 +225,7 @@ static __always_inline bool 
csd_lock_wait_toolong(call_single_data_t *csd, u64 t
  * previous function call. For multi-cpu calls its even more interesting
  * as we'll have to ensure no other cpu is observing our csd.
  */
-static __always_inline void csd_lock_wait(call_single_data_t *csd)
+static void __csd_lock_wait(call_single_data_t *csd)
 {
int bug_id = 0;
u64 ts0, ts1;
@@ -218,6 +239,15 @@ static __always_inline void 
csd_lock_wait(call_single_data_t *csd)
smp_acquire__a
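
In practice this means a production kernel can be built with
CONFIG_CSD_LOCK_WAIT_DEBUG=y while the (static-branch gated) debug code is
only activated when explicitly requested on the kernel command line, for
example (boot loader syntax varies, this just shows the parameter):

	# append to the kernel command line to switch CSD lock debugging on
	csdlock_debug=1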

[tip: locking/core] locking/csd_lock: Add boot parameter for controlling CSD lock debugging

2021-03-01 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the locking/core branch of tip:

Commit-ID: 4b816578c16b92b68fb9842dcec0bc2fdc2b36d8
Gitweb:
https://git.kernel.org/tip/4b816578c16b92b68fb9842dcec0bc2fdc2b36d8
Author:Juergen Gross 
AuthorDate:Mon, 01 Mar 2021 11:13:34 +01:00
Committer: Ingo Molnar 
CommitterDate: Mon, 01 Mar 2021 14:27:58 +01:00

locking/csd_lock: Add boot parameter for controlling CSD lock debugging

Currently CSD lock debugging can be switched on and off via a kernel
config option only. Unfortunately there is at least one problem with
CSD lock handling pending for about 2 years now, which has been seen
in different environments (mostly when running virtualized under KVM
or Xen, at least once on bare metal). Multiple attempts to catch this
issue have finally led to the introduction of CSD lock debug code, but
this code is not in use in most distros as it has some impact on
performance.

In order to be able to ship kernels with CONFIG_CSD_LOCK_WAIT_DEBUG
enabled even for production use, add a boot parameter for switching
the debug functionality on. This will reduce any performance impact
of the debug code to a bare minimum when it is not being used.

Signed-off-by: Juergen Gross 
[ Minor edits. ]
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20210301101336.7797-2-jgr...@suse.com
---
 Documentation/admin-guide/kernel-parameters.txt |  6 +++-
 kernel/smp.c| 38 ++--
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 0454572..98dbffa 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -784,6 +784,12 @@
cs89x0_media=   [HW,NET]
Format: { rj45 | aui | bnc }
 
+   csdlock_debug=  [KNL] Enable debug add-ons of cross-CPU function call
+   handling. When switched on, additional debug data is
+   printed to the console in case a hanging CPU is
+   detected, and that CPU is pinged again in order to try
+   to resolve the hang situation.
+
dasd=   [HW,NET]
See header of drivers/s390/block/dasd_devmap.c.
 
diff --git a/kernel/smp.c b/kernel/smp.c
index aeb0adf..d5f0b21 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "smpboot.h"
 #include "sched/smp.h"
@@ -102,6 +103,20 @@ void __init call_function_init(void)
 
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
 
+static DEFINE_STATIC_KEY_FALSE(csdlock_debug_enabled);
+
+static int __init csdlock_debug(char *str)
+{
+   unsigned int val = 0;
+
+   get_option(&str, &val);
+   if (val)
+   static_branch_enable(&csdlock_debug_enabled);
+
+   return 0;
+}
+early_param("csdlock_debug", csdlock_debug);
+
 static DEFINE_PER_CPU(call_single_data_t *, cur_csd);
 static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
 static DEFINE_PER_CPU(void *, cur_csd_info);
@@ -110,7 +125,7 @@ static DEFINE_PER_CPU(void *, cur_csd_info);
 static atomic_t csd_bug_count = ATOMIC_INIT(0);
 
 /* Record current CSD work for current CPU, NULL to erase. */
-static void csd_lock_record(call_single_data_t *csd)
+static void __csd_lock_record(call_single_data_t *csd)
 {
if (!csd) {
smp_mb(); /* NULL cur_csd after unlock. */
@@ -125,7 +140,13 @@ static void csd_lock_record(call_single_data_t *csd)
  /* Or before unlock, as the case may be. */
 }
 
-static __always_inline int csd_lock_wait_getcpu(call_single_data_t *csd)
+static __always_inline void csd_lock_record(call_single_data_t *csd)
+{
+   if (static_branch_unlikely(&csdlock_debug_enabled))
+   __csd_lock_record(csd);
+}
+
+static int csd_lock_wait_getcpu(call_single_data_t *csd)
 {
unsigned int csd_type;
 
@@ -140,7 +161,7 @@ static __always_inline int 
csd_lock_wait_getcpu(call_single_data_t *csd)
  * the CSD_TYPE_SYNC/ASYNC types provide the destination CPU,
  * so waiting on other types gets much less information.
  */
-static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 
ts0, u64 *ts1, int *bug_id)
+static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, 
int *bug_id)
 {
int cpu = -1;
int cpux;
@@ -204,7 +225,7 @@ static __always_inline bool 
csd_lock_wait_toolong(call_single_data_t *csd, u64 t
  * previous function call. For multi-cpu calls its even more interesting
  * as we'll have to ensure no other cpu is observing our csd.
  */
-static __always_inline void csd_lock_wait(call_single_data_t *csd)
+static void __csd_lock_wait(call_single_data_t *csd)
 {
int bug_id = 0;
u64 ts0, ts1;
@@ -218,6 +239,15 @@ static __always_inline void 
csd_lock_wait(call_single_data_t *csd)
smp_acquire__a

[tip: locking/core] locking/csd_lock: Add more data to CSD lock debugging

2021-03-01 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the locking/core branch of tip:

Commit-ID: 6bf3195fdbab92b57f3167101a0b651b93dbeae7
Gitweb:
https://git.kernel.org/tip/6bf3195fdbab92b57f3167101a0b651b93dbeae7
Author:Juergen Gross 
AuthorDate:Mon, 01 Mar 2021 11:13:36 +01:00
Committer: Ingo Molnar 
CommitterDate: Mon, 01 Mar 2021 14:27:59 +01:00

locking/csd_lock: Add more data to CSD lock debugging

In order to help identify problems with IPI handling and remote
function execution, add some more data to the IPI debugging code.

There have been multiple reports of CPUs looping for long times (many
seconds) in smp_call_function_many() waiting for another CPU to execute
a function like TLB flushing. Most of these reports have been for
cases where the kernel was running as a guest on top of KVM or Xen
(there are rumours of that happening under VMware, too, and even on
bare metal).

The root cause hasn't been found yet, even after more than two years
of different developers chasing this bug.

Commit:

  35feb60474bf4f7 ("kernel/smp: Provide CSD lock timeout diagnostics")

tried to address this by adding some debug code and by issuing another
IPI when a hang was detected. This helped mitigate the problem
(the repeated IPI unlocks the hang), but the root cause is still unknown.

Current available data suggests that either an IPI wasn't sent when it
should have been, or that the IPI didn't result in the target CPU
executing the queued function (due to the IPI not reaching the CPU,
the IPI handler not being called, or the handler not seeing the queued
request).

Try to add more diagnostic data by introducing a global atomic counter
which is incremented when doing critical operations (before and
after queueing a new request, when sending an IPI, and when dequeueing
a request). The counter value is stored in percpu variables which can
be printed out when a hang is detected.

The data of the last event (consisting of sequence counter, source
CPU, target CPU, and event type) is stored in a global variable. When
a new event is to be traced, the data of the last event is stored in
the event-related percpu location and the global data is updated with
the new event's data. This allows tracking two events in one data
location: one by the value of the event data (the event before the
current one), and one by the location itself (the current event).

A typical printout with a detected hang will look like this:

csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 53 ns for 
CPU#06 scf_handler_1+0x0/0x50(0xa2a881bb1410).
csd: CSD lock (#1) handling prior 
scf_handler_1+0x0/0x50(0xa2a8813823c0) request.
csd: cnt(8cc): -> dequeue (src cpu 0 == empty)
csd: cnt(8cd): ->0006 idle
csd: cnt(0003668): 0001->0006 queue
csd: cnt(0003669): 0001->0006 ipi
csd: cnt(0003e0f): 0007->000a queue
csd: cnt(0003e10): 0001-> ping
csd: cnt(0003e71): 0003-> ping
csd: cnt(0003e72): ->0006 gotipi
csd: cnt(0003e73): ->0006 handle
csd: cnt(0003e74): ->0006 dequeue (src cpu 0 == empty)
csd: cnt(0003e7f): 0004->0006 ping
csd: cnt(0003e80): 0001-> pinged
csd: cnt(0003eb2): 0005->0001 noipi
csd: cnt(0003eb3): 0001->0006 queue
csd: cnt(0003eb4): 0001->0006 noipi
csd: cnt now: 0003f00

The idea is to print only relevant entries. Those are all events which
are associated with the hang (so sender side events for the source CPU
of the hanging request, and receiver side events for the target CPU),
and the related events just before those (for adding data needed to
identify a possible race). Printing all available data would be
possible, but this would produce huge amounts of output on larger
configurations.

Signed-off-by: Juergen Gross 
[ Minor readability edits. Breaks col80 but is far more readable. ]
Signed-off-by: Ingo Molnar 
Tested-by: Paul E. McKenney 
Link: https://lore.kernel.org/r/20210301101336.7797-4-jgr...@suse.com
---
 Documentation/admin-guide/kernel-parameters.txt |   4 +-
 kernel/smp.c| 226 ++-
 2 files changed, 226 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 98dbffa..1fe9d38 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -789,6 +789,10 @@
printed to the console in case a hanging CPU is
detected, and that CPU is pinged again in order to try
to resolve the hang situation.
+   0: disable csdlock debugging (default)
+   1: enable basic csdlock debugging (minor impact)
+   ext: enable extended csdlock debugging (more impact,
+but mor

[tip: locking/core] locking/csd_lock: Prepare more CSD lock debugging

2021-03-01 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the locking/core branch of tip:

Commit-ID: b3e3bc34b1e938c6447fa8b646010c4016be7fad
Gitweb:
https://git.kernel.org/tip/b3e3bc34b1e938c6447fa8b646010c4016be7fad
Author:Juergen Gross 
AuthorDate:Mon, 01 Mar 2021 11:13:35 +01:00
Committer: Ingo Molnar 
CommitterDate: Mon, 01 Mar 2021 14:27:58 +01:00

locking/csd_lock: Prepare more CSD lock debugging

In order to be able to easily add more CSD lock debugging data to
struct call_function_data->csd, move the call_single_data_t element
into a sub-structure.

Signed-off-by: Juergen Gross 
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20210301101336.7797-3-jgr...@suse.com
---
 kernel/smp.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index d5f0b21..6d7e6db 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -31,8 +31,12 @@
 
 #define CSD_TYPE(_csd) ((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)
 
+struct cfd_percpu {
+   call_single_data_t  csd;
+};
+
 struct call_function_data {
-   call_single_data_t  __percpu *csd;
+   struct cfd_percpu   __percpu *pcpu;
cpumask_var_t   cpumask;
cpumask_var_t   cpumask_ipi;
 };
@@ -55,8 +59,8 @@ int smpcfd_prepare_cpu(unsigned int cpu)
free_cpumask_var(cfd->cpumask);
return -ENOMEM;
}
-   cfd->csd = alloc_percpu(call_single_data_t);
-   if (!cfd->csd) {
+   cfd->pcpu = alloc_percpu(struct cfd_percpu);
+   if (!cfd->pcpu) {
free_cpumask_var(cfd->cpumask);
free_cpumask_var(cfd->cpumask_ipi);
return -ENOMEM;
@@ -71,7 +75,7 @@ int smpcfd_dead_cpu(unsigned int cpu)
 
free_cpumask_var(cfd->cpumask);
free_cpumask_var(cfd->cpumask_ipi);
-   free_percpu(cfd->csd);
+   free_percpu(cfd->pcpu);
return 0;
 }
 
@@ -694,7 +698,7 @@ static void smp_call_function_many_cond(const struct 
cpumask *mask,
 
cpumask_clear(cfd->cpumask_ipi);
for_each_cpu(cpu, cfd->cpumask) {
-   call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu);
+   call_single_data_t *csd = &per_cpu_ptr(cfd->pcpu, cpu)->csd;
 
if (cond_func && !cond_func(cpu, info))
continue;
@@ -719,7 +723,7 @@ static void smp_call_function_many_cond(const struct 
cpumask *mask,
for_each_cpu(cpu, cfd->cpumask) {
call_single_data_t *csd;
 
-   csd = per_cpu_ptr(cfd->csd, cpu);
+   csd = &per_cpu_ptr(cfd->pcpu, cpu)->csd;
csd_lock_wait(csd);
}
}


[tip: x86/paravirt] x86/xen: Drop USERGS_SYSRET64 paravirt call

2021-02-10 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/paravirt branch of tip:

Commit-ID: afd30525a659ac0ae0904f0cb4a2ca75522c3123
Gitweb:
https://git.kernel.org/tip/afd30525a659ac0ae0904f0cb4a2ca75522c3123
Author:Juergen Gross 
AuthorDate:Wed, 20 Jan 2021 14:55:45 +01:00
Committer: Borislav Petkov 
CommitterDate: Wed, 10 Feb 2021 12:32:07 +01:00

x86/xen: Drop USERGS_SYSRET64 paravirt call

USERGS_SYSRET64 is used to return from a syscall via SYSRET, but
a Xen PV guest will nevertheless use the IRET hypercall, as there
is no sysret PV hypercall defined.

So instead of testing all the prerequisites for doing a sysret and
then mangling the stack for Xen PV again for doing an iret, just use
the iret exit from the beginning.

This can easily be done via an ALTERNATIVE like it is done for the
sysenter compat case already.

It should be noted that this drops the optimization in Xen for not
restoring a few registers when returning to user mode, but it seems
as if the saved instructions in the kernel more than compensate for
this drop (a kernel build in a Xen PV guest was slightly faster with
this patch applied).

While at it remove the stale sysret32 remnants.

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/2021012013.32594-6-jgr...@suse.com
---
 arch/x86/entry/entry_64.S | 16 +++-
 arch/x86/include/asm/irqflags.h   |  6 --
 arch/x86/include/asm/paravirt.h   |  5 -
 arch/x86/include/asm/paravirt_types.h |  8 
 arch/x86/kernel/asm-offsets_64.c  |  2 --
 arch/x86/kernel/paravirt.c|  5 +
 arch/x86/kernel/paravirt_patch.c  |  4 
 arch/x86/xen/enlighten_pv.c   |  1 -
 arch/x86/xen/xen-asm.S| 20 
 arch/x86/xen/xen-ops.h|  2 --
 10 files changed, 8 insertions(+), 61 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a876204..ce0464d 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -46,14 +46,6 @@
 .code64
 .section .entry.text, "ax"
 
-#ifdef CONFIG_PARAVIRT_XXL
-SYM_CODE_START(native_usergs_sysret64)
-   UNWIND_HINT_EMPTY
-   swapgs
-   sysretq
-SYM_CODE_END(native_usergs_sysret64)
-#endif /* CONFIG_PARAVIRT_XXL */
-
 /*
  * 64-bit SYSCALL instruction entry. Up to 6 arguments in registers.
  *
@@ -123,7 +115,12 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, 
SYM_L_GLOBAL)
 * Try to use SYSRET instead of IRET if we're returning to
 * a completely clean 64-bit userspace context.  If we're not,
 * go to the slow exit path.
+* In the Xen PV case we must use iret anyway.
 */
+
+   ALTERNATIVE "", "jmp swapgs_restore_regs_and_return_to_usermode", \
+   X86_FEATURE_XENPV
+
movqRCX(%rsp), %rcx
movqRIP(%rsp), %r11
 
@@ -215,7 +212,8 @@ syscall_return_via_sysret:
 
popq%rdi
popq%rsp
-   USERGS_SYSRET64
+   swapgs
+   sysretq
 SYM_CODE_END(entry_SYSCALL_64)
 
 /*
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 8c86ede..e585a47 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -132,12 +132,6 @@ static __always_inline unsigned long 
arch_local_irq_save(void)
 #endif
 
 #define INTERRUPT_RETURN   jmp native_iret
-#define USERGS_SYSRET64\
-   swapgs; \
-   sysretq;
-#define USERGS_SYSRET32\
-   swapgs; \
-   sysretl
 
 #else
 #define INTERRUPT_RETURN   iret
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index f2ebe10..dd43b11 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -776,11 +776,6 @@ extern void default_banner(void);
 
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_PARAVIRT_XXL
-#define USERGS_SYSRET64
\
-   PARA_SITE(PARA_PATCH(PV_CPU_usergs_sysret64),   \
- ANNOTATE_RETPOLINE_SAFE;  \
- jmp PARA_INDIRECT(pv_ops+PV_CPU_usergs_sysret64);)
-
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(clobbers)\
PARA_SITE(PARA_PATCH(PV_IRQ_save_fl),   \
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 130f428..0169365 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -156,14 +156,6 @@ struct pv_cpu_ops {
 
u64 (*read_pmc)(int counter);
 
-   /*
-* Switch to usermode gs and return to 64-bit usermode using
-* sysret.  Only used in 64-bit kernels to return to 64-bit
-* processes.  Usermode register state, including %rsp, must
-

[tip: x86/paravirt] x86/pv: Switch SWAPGS to ALTERNATIVE

2021-02-10 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/paravirt branch of tip:

Commit-ID: 53c9d9240944088274aadbbbafc6138ca462db4f
Gitweb:
https://git.kernel.org/tip/53c9d9240944088274aadbbbafc6138ca462db4f
Author:Juergen Gross 
AuthorDate:Wed, 20 Jan 2021 14:55:44 +01:00
Committer: Borislav Petkov 
CommitterDate: Wed, 10 Feb 2021 12:25:49 +01:00

x86/pv: Switch SWAPGS to ALTERNATIVE

SWAPGS is used only for interrupts coming from user mode or for
returning to user mode. So there is no reason to use the PARAVIRT
framework, as it can easily be replaced by an ALTERNATIVE depending
on X86_FEATURE_XENPV.

There are several instances using the PV-aware SWAPGS macro in paths
which are never executed in a Xen PV guest. Replace those with the
plain swapgs instruction. For SWAPGS_UNSAFE_STACK the same applies.

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Reviewed-by: Borislav Petkov 
Reviewed-by: Thomas Gleixner 
Acked-by: Andy Lutomirski 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/2021012013.32594-5-jgr...@suse.com
---
 arch/x86/entry/entry_64.S | 10 +-
 arch/x86/include/asm/irqflags.h   | 20 
 arch/x86/include/asm/paravirt.h   | 20 
 arch/x86/include/asm/paravirt_types.h |  2 --
 arch/x86/kernel/asm-offsets_64.c  |  1 -
 arch/x86/kernel/paravirt.c|  1 -
 arch/x86/kernel/paravirt_patch.c  |  3 ---
 arch/x86/xen/enlighten_pv.c   |  3 ---
 8 files changed, 13 insertions(+), 47 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index cad0870..a876204 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -669,7 +669,7 @@ native_irq_return_ldt:
 */
 
pushq   %rdi/* Stash user RDI */
-   SWAPGS  /* to kernel GS */
+   swapgs  /* to kernel GS */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi   /* to kernel CR3 */
 
movqPER_CPU_VAR(espfix_waddr), %rdi
@@ -699,7 +699,7 @@ native_irq_return_ldt:
orq PER_CPU_VAR(espfix_stack), %rax
 
SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
-   SWAPGS  /* to user GS */
+   swapgs  /* to user GS */
popq%rdi/* Restore user RDI */
 
movq%rax, %rsp
@@ -943,7 +943,7 @@ SYM_CODE_START_LOCAL(paranoid_entry)
ret
 
 .Lparanoid_entry_swapgs:
-   SWAPGS
+   swapgs
 
/*
 * The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an
@@ -1001,7 +1001,7 @@ SYM_CODE_START_LOCAL(paranoid_exit)
jnz restore_regs_and_return_to_kernel
 
/* We are returning to a context with user GSBASE */
-   SWAPGS_UNSAFE_STACK
+   swapgs
jmp restore_regs_and_return_to_kernel
 SYM_CODE_END(paranoid_exit)
 
@@ -1426,7 +1426,7 @@ nmi_no_fsgsbase:
jnz nmi_restore
 
 nmi_swapgs:
-   SWAPGS_UNSAFE_STACK
+   swapgs
 
 nmi_restore:
POP_REGS
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 2dfc8d3..8c86ede 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -131,18 +131,6 @@ static __always_inline unsigned long 
arch_local_irq_save(void)
 #define SAVE_FLAGS(x)  pushfq; popq %rax
 #endif
 
-#define SWAPGS swapgs
-/*
- * Currently paravirt can't handle swapgs nicely when we
- * don't have a stack we can rely on (such as a user space
- * stack).  So we either find a way around these or just fault
- * and emulate if a guest tries to call swapgs directly.
- *
- * Either way, this is a good way to document that we don't
- * have a reliable stack. x86_64 only.
- */
-#define SWAPGS_UNSAFE_STACKswapgs
-
 #define INTERRUPT_RETURN   jmp native_iret
 #define USERGS_SYSRET64\
swapgs; \
@@ -170,6 +158,14 @@ static __always_inline int arch_irqs_disabled(void)
 
return arch_irqs_disabled_flags(flags);
 }
+#else
+#ifdef CONFIG_X86_64
+#ifdef CONFIG_XEN_PV
+#define SWAPGS ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV
+#else
+#define SWAPGS swapgs
+#endif
+#endif
 #endif /* !__ASSEMBLY__ */
 
 #endif
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index f8dce11..f2ebe10 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -776,26 +776,6 @@ extern void default_banner(void);
 
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_PARAVIRT_XXL
-/*
- * If swapgs is used while the userspace stack is still current,
- * there's no way to call a pvop.  The PV replacement *must* be
- * inlined, or the swapgs instruction must be trapped and emulated.
- */
-#define SWAPGS_UNSAFE_STACK\
-   PARA_SITE(PARA_PATCH(PV_CPU_sw

[tip: x86/paravirt] x86/xen: Use specific Xen pv interrupt entry for MCE

2021-02-10 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/paravirt branch of tip:

Commit-ID: c3d7fa6684b5b3a07a48fc379d27bfb8a96661d9
Gitweb:
https://git.kernel.org/tip/c3d7fa6684b5b3a07a48fc379d27bfb8a96661d9
Author:Juergen Gross 
AuthorDate:Wed, 20 Jan 2021 14:55:42 +01:00
Committer: Borislav Petkov 
CommitterDate: Wed, 10 Feb 2021 12:07:10 +01:00

x86/xen: Use specific Xen pv interrupt entry for MCE

Xen PV guests don't use IST. For machine check interrupts, switch to the
same model as debug interrupts.

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Reviewed-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/2021012013.32594-3-jgr...@suse.com
---
 arch/x86/include/asm/idtentry.h |  3 +++
 arch/x86/xen/enlighten_pv.c | 16 +++-
 arch/x86/xen/xen-asm.S  |  2 +-
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index f656aab..616909e 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -585,6 +585,9 @@ DECLARE_IDTENTRY_MCE(X86_TRAP_MC,   exc_machine_check);
 #else
 DECLARE_IDTENTRY_RAW(X86_TRAP_MC,  exc_machine_check);
 #endif
+#ifdef CONFIG_XEN_PV
+DECLARE_IDTENTRY_RAW(X86_TRAP_MC,  xenpv_exc_machine_check);
+#endif
 #endif
 
 /* NMI */
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 9a5a50c..9db1d31 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -590,6 +590,20 @@ DEFINE_IDTENTRY_RAW(exc_xen_unknown_trap)
BUG();
 }
 
+#ifdef CONFIG_X86_MCE
+DEFINE_IDTENTRY_RAW(xenpv_exc_machine_check)
+{
+   /*
+* There's no IST on Xen PV, but we still need to dispatch
+* to the correct handler.
+*/
+   if (user_mode(regs))
+   noist_exc_machine_check(regs);
+   else
+   exc_machine_check(regs);
+}
+#endif
+
 struct trap_array_entry {
void (*orig)(void);
void (*xen)(void);
@@ -610,7 +624,7 @@ static struct trap_array_entry trap_array[] = {
TRAP_ENTRY_REDIR(exc_debug, true  ),
TRAP_ENTRY(exc_double_fault,true  ),
 #ifdef CONFIG_X86_MCE
-   TRAP_ENTRY(exc_machine_check,   true  ),
+   TRAP_ENTRY_REDIR(exc_machine_check, true  ),
 #endif
TRAP_ENTRY_REDIR(exc_nmi,   true  ),
TRAP_ENTRY(exc_int3,false ),
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 53cf8aa..cd330ce 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -172,7 +172,7 @@ xen_pv_trap asm_exc_spurious_interrupt_bug
 xen_pv_trap asm_exc_coprocessor_error
 xen_pv_trap asm_exc_alignment_check
 #ifdef CONFIG_X86_MCE
-xen_pv_trap asm_exc_machine_check
+xen_pv_trap asm_xenpv_exc_machine_check
 #endif /* CONFIG_X86_MCE */
 xen_pv_trap asm_exc_simd_coprocessor_error
 #ifdef CONFIG_IA32_EMULATION


[tip: x86/paravirt] x86/xen: Use specific Xen pv interrupt entry for DF

2021-02-10 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/paravirt branch of tip:

Commit-ID: 5b4c6d65019bff65757f61adbbad5e45a333b800
Gitweb:
https://git.kernel.org/tip/5b4c6d65019bff65757f61adbbad5e45a333b800
Author:Juergen Gross 
AuthorDate:Wed, 20 Jan 2021 14:55:43 +01:00
Committer: Borislav Petkov 
CommitterDate: Wed, 10 Feb 2021 12:13:40 +01:00

x86/xen: Use specific Xen pv interrupt entry for DF

Xen PV guests don't use IST. For double fault interrupts, switch to
the same model as NMI.

Correct a typo in a comment while copying it.

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Reviewed-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/2021012013.32594-4-jgr...@suse.com
---
 arch/x86/include/asm/idtentry.h |  3 +++
 arch/x86/xen/enlighten_pv.c | 10 --
 arch/x86/xen/xen-asm.S  |  2 +-
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 616909e..41e2e2e 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -608,6 +608,9 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_DB,   xenpv_exc_debug);
 
 /* #DF */
 DECLARE_IDTENTRY_DF(X86_TRAP_DF,   exc_double_fault);
+#ifdef CONFIG_XEN_PV
+DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF,xenpv_exc_double_fault);
+#endif
 
 /* #VC */
 #ifdef CONFIG_AMD_MEM_ENCRYPT
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 9db1d31..1fec2ee 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -567,10 +567,16 @@ void noist_exc_debug(struct pt_regs *regs);
 
 DEFINE_IDTENTRY_RAW(xenpv_exc_nmi)
 {
-   /* On Xen PV, NMI doesn't use IST.  The C part is the sane as native. */
+   /* On Xen PV, NMI doesn't use IST.  The C part is the same as native. */
exc_nmi(regs);
 }
 
+DEFINE_IDTENTRY_RAW_ERRORCODE(xenpv_exc_double_fault)
+{
+   /* On Xen PV, DF doesn't use IST.  The C part is the same as native. */
+   exc_double_fault(regs, error_code);
+}
+
 DEFINE_IDTENTRY_RAW(xenpv_exc_debug)
 {
/*
@@ -622,7 +628,7 @@ struct trap_array_entry {
 
 static struct trap_array_entry trap_array[] = {
TRAP_ENTRY_REDIR(exc_debug, true  ),
-   TRAP_ENTRY(exc_double_fault,true  ),
+   TRAP_ENTRY_REDIR(exc_double_fault,  true  ),
 #ifdef CONFIG_X86_MCE
TRAP_ENTRY_REDIR(exc_machine_check, true  ),
 #endif
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index cd330ce..eac9dac 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -161,7 +161,7 @@ xen_pv_trap asm_exc_overflow
 xen_pv_trap asm_exc_bounds
 xen_pv_trap asm_exc_invalid_op
 xen_pv_trap asm_exc_device_not_available
-xen_pv_trap asm_exc_double_fault
+xen_pv_trap asm_xenpv_exc_double_fault
 xen_pv_trap asm_exc_coproc_segment_overrun
 xen_pv_trap asm_exc_invalid_tss
 xen_pv_trap asm_exc_segment_not_present


[tip: x86/paravirt] x86/pv: Rework arch_local_irq_restore() to not use popf

2021-02-10 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/paravirt branch of tip:

Commit-ID: ab234a260b1f625b26cbefa93ca365b0ae66df33
Gitweb:
https://git.kernel.org/tip/ab234a260b1f625b26cbefa93ca365b0ae66df33
Author:Juergen Gross 
AuthorDate:Wed, 20 Jan 2021 14:55:46 +01:00
Committer: Borislav Petkov 
CommitterDate: Wed, 10 Feb 2021 12:36:45 +01:00

x86/pv: Rework arch_local_irq_restore() to not use popf

POPF is a rather expensive operation, so don't use it for restoring
irq flags. Instead, test whether interrupts are enabled in the flags
parameter and enable interrupts via STI in that case.

As a result, the restore_fl paravirt op is no longer needed.

Suggested-by: Andy Lutomirski 
Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/2021012013.32594-7-jgr...@suse.com
---
 arch/x86/include/asm/irqflags.h   | 20 +--
 arch/x86/include/asm/paravirt.h   |  5 +-
 arch/x86/include/asm/paravirt_types.h |  7 ++-
 arch/x86/kernel/irqflags.S| 11 +--
 arch/x86/kernel/paravirt.c|  1 +-
 arch/x86/kernel/paravirt_patch.c  |  3 +---
 arch/x86/xen/enlighten_pv.c   |  2 +--
 arch/x86/xen/irq.c| 23 +-
 arch/x86/xen/xen-asm.S| 28 +--
 arch/x86/xen/xen-ops.h|  1 +-
 10 files changed, 8 insertions(+), 93 deletions(-)

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index e585a47..144d70e 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -35,15 +35,6 @@ extern __always_inline unsigned long native_save_fl(void)
return flags;
 }
 
-extern inline void native_restore_fl(unsigned long flags);
-extern inline void native_restore_fl(unsigned long flags)
-{
-   asm volatile("push %0 ; popf"
-: /* no output */
-:"g" (flags)
-:"memory", "cc");
-}
-
 static __always_inline void native_irq_disable(void)
 {
asm volatile("cli": : :"memory");
@@ -79,11 +70,6 @@ static __always_inline unsigned long 
arch_local_save_flags(void)
return native_save_fl();
 }
 
-static __always_inline void arch_local_irq_restore(unsigned long flags)
-{
-   native_restore_fl(flags);
-}
-
 static __always_inline void arch_local_irq_disable(void)
 {
native_irq_disable();
@@ -152,6 +138,12 @@ static __always_inline int arch_irqs_disabled(void)
 
return arch_irqs_disabled_flags(flags);
 }
+
+static __always_inline void arch_local_irq_restore(unsigned long flags)
+{
+   if (!arch_irqs_disabled_flags(flags))
+   arch_local_irq_enable();
+}
 #else
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_XEN_PV
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index dd43b11..4abf110 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -648,11 +648,6 @@ static inline notrace unsigned long 
arch_local_save_flags(void)
return PVOP_CALLEE0(unsigned long, irq.save_fl);
 }
 
-static inline notrace void arch_local_irq_restore(unsigned long f)
-{
-   PVOP_VCALLEE1(irq.restore_fl, f);
-}
-
 static inline notrace void arch_local_irq_disable(void)
 {
PVOP_VCALLEE0(irq.irq_disable);
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 0169365..de87087 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -168,16 +168,13 @@ struct pv_cpu_ops {
 struct pv_irq_ops {
 #ifdef CONFIG_PARAVIRT_XXL
/*
-* Get/set interrupt state.  save_fl and restore_fl are only
-* expected to use X86_EFLAGS_IF; all other bits
-* returned from save_fl are undefined, and may be ignored by
-* restore_fl.
+* Get/set interrupt state.  save_fl is expected to use X86_EFLAGS_IF;
+* all other bits returned from save_fl are undefined.
 *
 * NOTE: These functions callers expect the callee to preserve
 * more registers than the standard C calling convention.
 */
struct paravirt_callee_save save_fl;
-   struct paravirt_callee_save restore_fl;
struct paravirt_callee_save irq_disable;
struct paravirt_callee_save irq_enable;
 
diff --git a/arch/x86/kernel/irqflags.S b/arch/x86/kernel/irqflags.S
index 0db0375..8ef3506 100644
--- a/arch/x86/kernel/irqflags.S
+++ b/arch/x86/kernel/irqflags.S
@@ -13,14 +13,3 @@ SYM_FUNC_START(native_save_fl)
ret
 SYM_FUNC_END(native_save_fl)
 EXPORT_SYMBOL(native_save_fl)
-
-/*
- * void native_restore_fl(unsigned long flags)
- * %eax/%rdi: flags
- */
-SYM_FUNC_START(native_restore_fl)
-   push %_ASM_ARG1
-   popf
-   ret
-SYM_FUNC_END(native_restore_fl)
-EXPORT_SYMBOL(native_restore_fl)
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 18560b7..c60222a 100644
-
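
A short sketch of the calling pattern this relies on (illustrative only):
per the pv_irq_ops contract quoted above, only X86_EFLAGS_IF in the saved
flags is meaningful, so a conditional STI is sufficient on restore and the
comparatively expensive POPF can be dropped:

	unsigned long flags;

	local_irq_save(flags);		/* only the IF state in flags is defined */
	/* ... critical section ... */
	local_irq_restore(flags);	/* re-enables via STI iff IF was set; no POPF */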

[tip: x86/urgent] x86/alternative: Don't call text_poke() in lazy TLB mode

2020-10-22 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: abee7c494d8c41bb388839bccc47e06247f0d7de
Gitweb:
https://git.kernel.org/tip/abee7c494d8c41bb388839bccc47e06247f0d7de
Author:Juergen Gross 
AuthorDate:Fri, 09 Oct 2020 16:42:25 +02:00
Committer: Peter Zijlstra 
CommitterDate: Thu, 22 Oct 2020 12:37:23 +02:00

x86/alternative: Don't call text_poke() in lazy TLB mode

When running in lazy TLB mode the currently active page tables might
be the ones of a previous process, e.g. when running a kernel thread.

This can be problematic when kernel code is modified via text_poke()
in a kernel thread while, on another processor, exit_mmap() is running
for the process which was active on the first CPU before the kernel
thread took over.

As text_poke() uses a temporary address space and the former address
space (obtained via cpu_tlbstate.loaded_mm) is restored afterwards, a
race is possible when the CPU running exit_mmap() wants to make sure
there are no stale references to that address space left on any active
CPU (this is e.g. required when running as a Xen PV guest, where this
problem has been observed and analyzed).

In order to avoid that, leave lazy TLB mode before switching to the
temporary address space.

Fixes: cefa929c034eb5d ("x86/mm: Introduce temporary mm structs")
Signed-off-by: Juergen Gross 
Signed-off-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20201009144225.12019-1-jgr...@suse.com
---
 arch/x86/kernel/alternative.c |  9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index cdaab30..cd6be6f 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -807,6 +807,15 @@ static inline temp_mm_state_t use_temporary_mm(struct 
mm_struct *mm)
temp_mm_state_t temp_state;
 
lockdep_assert_irqs_disabled();
+
+   /*
+* Make sure not to be in TLB lazy mode, as otherwise we'll end up
+* with a stale address space WITHOUT being in lazy mode after
+* restoring the previous mm.
+*/
+   if (this_cpu_read(cpu_tlbstate.is_lazy))
+   leave_mm(smp_processor_id());
+
temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
switch_mm_irqs_off(NULL, mm, current);
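
The race described above, sketched as a two-CPU timeline (simplified; the
"no stale references" check stands in for the real Xen PV mm teardown
logic):

	/*
	 * CPU A: kernel thread, lazy TLB,      CPU B: process P exiting
	 *        loaded_mm == P->mm
	 *
	 * use_temporary_mm(poking_mm)
	 *   loaded_mm = poking_mm
	 *                                      exit_mmap(P->mm):
	 *                                        no CPU appears to be using
	 *                                        P->mm any more, tear it down
	 * unuse_temporary_mm(prev)
	 *   loaded_mm = P->mm   <-- stale reference, and no longer in lazy
	 *                           mode, so nothing will drop it later
	 */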
 


[tip: x86/paravirt] x86/paravirt: Use CONFIG_PARAVIRT_XXL instead of CONFIG_PARAVIRT

2020-08-15 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/paravirt branch of tip:

Commit-ID: ecac71816a1829c0e54c41c5f1845f75b55dc618
Gitweb:
https://git.kernel.org/tip/ecac71816a1829c0e54c41c5f1845f75b55dc618
Author:Juergen Gross 
AuthorDate:Sat, 15 Aug 2020 12:06:38 +02:00
Committer: Ingo Molnar 
CommitterDate: Sat, 15 Aug 2020 13:52:11 +02:00

x86/paravirt: Use CONFIG_PARAVIRT_XXL instead of CONFIG_PARAVIRT

There are some code parts using CONFIG_PARAVIRT for Xen pv-ops related
functionality instead of the more specific CONFIG_PARAVIRT_XXL.

Signed-off-by: Juergen Gross 
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20200815100641.26362-4-jgr...@suse.com
---
 arch/x86/entry/entry_64.S| 4 ++--
 arch/x86/include/asm/fixmap.h| 2 +-
 arch/x86/include/asm/required-features.h | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 70dea93..26fc9b4 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -46,13 +46,13 @@
 .code64
 .section .entry.text, "ax"
 
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_XXL
 SYM_CODE_START(native_usergs_sysret64)
UNWIND_HINT_EMPTY
swapgs
sysretq
 SYM_CODE_END(native_usergs_sysret64)
-#endif /* CONFIG_PARAVIRT */
+#endif /* CONFIG_PARAVIRT_XXL */
 
 /*
  * 64-bit SYSCALL instruction entry. Up to 6 arguments in registers.
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 0f0dd64..77217bd 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -99,7 +99,7 @@ enum fixed_addresses {
FIX_PCIE_MCFG,
 #endif
 #endif
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_XXL
FIX_PARAVIRT_BOOTMAP,
 #endif
 #ifdef CONFIG_X86_INTEL_MID
diff --git a/arch/x86/include/asm/required-features.h 
b/arch/x86/include/asm/required-features.h
index 6847d85..3ff0d48 100644
--- a/arch/x86/include/asm/required-features.h
+++ b/arch/x86/include/asm/required-features.h
@@ -54,7 +54,7 @@
 #endif
 
 #ifdef CONFIG_X86_64
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_XXL
 /* Paravirtualized systems may not have PSE or PGE available */
 #define NEED_PSE   0
 #define NEED_PGE   0


[tip: x86/paravirt] x86/paravirt: Remove set_pte_at() pv-op

2020-08-15 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/paravirt branch of tip:

Commit-ID: e1ac3e66d301e57472f31ebee81b916e9fa8b35b
Gitweb:
https://git.kernel.org/tip/e1ac3e66d301e57472f31ebee81b916e9fa8b35b
Author:Juergen Gross 
AuthorDate:Sat, 15 Aug 2020 12:06:40 +02:00
Committer: Ingo Molnar 
CommitterDate: Sat, 15 Aug 2020 13:52:12 +02:00

x86/paravirt: Remove set_pte_at() pv-op

On x86, set_pte_at() now always falls back to set_pte(). So instead
of having this fallback after the paravirt maze, just drop the
set_pte_at paravirt operation and let set_pte_at() use the set_pte()
function directly.

Signed-off-by: Juergen Gross 
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20200815100641.26362-6-jgr...@suse.com
---
 arch/x86/include/asm/paravirt.h   |  8 +---
 arch/x86/include/asm/paravirt_types.h |  2 --
 arch/x86/include/asm/pgtable.h|  7 +++
 arch/x86/kernel/paravirt.c|  1 -
 arch/x86/xen/mmu_pv.c |  8 
 include/trace/events/xen.h| 20 
 6 files changed, 4 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index e02c409..f0464b8 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -412,12 +412,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
PVOP_VCALL2(mmu.set_pte, ptep, pte.pte);
 }
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pte)
-{
-   PVOP_VCALL4(mmu.set_pte_at, mm, addr, ptep, pte.pte);
-}
-
 static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 {
PVOP_VCALL2(mmu.set_pmd, pmdp, native_pmd_val(pmd));
@@ -510,7 +504,7 @@ static inline void set_pte_atomic(pte_t *ptep, pte_t pte)
 static inline void pte_clear(struct mm_struct *mm, unsigned long addr,
 pte_t *ptep)
 {
-   set_pte_at(mm, addr, ptep, __pte(0));
+   set_pte(ptep, __pte(0));
 }
 
 static inline void pmd_clear(pmd_t *pmdp)
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index f27c3fe..0fad9f6 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -242,8 +242,6 @@ struct pv_mmu_ops {
 
/* Pagetable manipulation functions */
void (*set_pte)(pte_t *ptep, pte_t pteval);
-   void (*set_pte_at)(struct mm_struct *mm, unsigned long addr,
-  pte_t *ptep, pte_t pteval);
void (*set_pmd)(pmd_t *pmdp, pmd_t pmdval);
 
pte_t (*ptep_modify_prot_start)(struct vm_area_struct *vma, unsigned 
long addr,
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index b836138..5e0dcc2 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -63,7 +63,6 @@ extern pmdval_t early_pmd_flags;
 #include 
 #else  /* !CONFIG_PARAVIRT_XXL */
 #define set_pte(ptep, pte) native_set_pte(ptep, pte)
-#define set_pte_at(mm, addr, ptep, pte)native_set_pte_at(mm, addr, 
ptep, pte)
 
 #define set_pte_atomic(ptep, pte)  \
native_set_pte_atomic(ptep, pte)
@@ -1033,10 +1032,10 @@ static inline pud_t 
native_local_pudp_get_and_clear(pud_t *pudp)
return res;
 }
 
-static inline void native_set_pte_at(struct mm_struct *mm, unsigned long addr,
-pte_t *ptep , pte_t pte)
+static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte)
 {
-   native_set_pte(ptep, pte);
+   set_pte(ptep, pte);
 }
 
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index e56a144..6c3407b 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -360,7 +360,6 @@ struct paravirt_patch_template pv_ops = {
.mmu.release_p4d= paravirt_nop,
 
.mmu.set_pte= native_set_pte,
-   .mmu.set_pte_at = native_set_pte_at,
.mmu.set_pmd= native_set_pmd,
 
.mmu.ptep_modify_prot_start = __ptep_modify_prot_start,
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 3273c98..eda7814 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -285,13 +285,6 @@ static void xen_set_pte(pte_t *ptep, pte_t pteval)
__xen_set_pte(ptep, pteval);
 }
 
-static void xen_set_pte_at(struct mm_struct *mm, unsigned long addr,
-   pte_t *ptep, pte_t pteval)
-{
-   trace_xen_mmu_set_pte_at(mm, addr, ptep, pteval);
-   __xen_set_pte(ptep, pteval);
-}
-
 pte_t xen_ptep_modify_prot_start(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep)
 {
@@ -2105,7 +2098,6 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
.release_pmd = xen_relea

[tip: x86/paravirt] x86/entry/32: Simplify CONFIG_XEN_PV build dependency

2020-08-15 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/paravirt branch of tip:

Commit-ID: 76fdb041c1f02311e6e05211c895e932af08b041
Gitweb:
https://git.kernel.org/tip/76fdb041c1f02311e6e05211c895e932af08b041
Author:Juergen Gross 
AuthorDate:Sat, 15 Aug 2020 12:06:39 +02:00
Committer: Ingo Molnar 
CommitterDate: Sat, 15 Aug 2020 13:52:12 +02:00

x86/entry/32: Simplify CONFIG_XEN_PV build dependency

With 32-bit Xen PV support gone, the following commit is not needed
anymore:

  a4c0e91d1d65bc58 ("x86/entry/32: Fix XEN_PV build dependency")

Signed-off-by: Juergen Gross 
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20200815100641.26362-5-jgr...@suse.com
---
 arch/x86/include/asm/idtentry.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index a433661..337dcfd 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -547,7 +547,7 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC,   exc_machine_check);
 
 /* NMI */
 DECLARE_IDTENTRY_NMI(X86_TRAP_NMI, exc_nmi);
-#if defined(CONFIG_XEN_PV) && defined(CONFIG_X86_64)
+#ifdef CONFIG_XEN_PV
 DECLARE_IDTENTRY_RAW(X86_TRAP_NMI, xenpv_exc_nmi);
 #endif
 
@@ -557,7 +557,7 @@ DECLARE_IDTENTRY_DEBUG(X86_TRAP_DB, exc_debug);
 #else
 DECLARE_IDTENTRY_RAW(X86_TRAP_DB,  exc_debug);
 #endif
-#if defined(CONFIG_XEN_PV) && defined(CONFIG_X86_64)
+#ifdef CONFIG_XEN_PV
 DECLARE_IDTENTRY_RAW(X86_TRAP_DB,  xenpv_exc_debug);
 #endif
 


[tip: x86/paravirt] x86/paravirt: Remove 32-bit support from CONFIG_PARAVIRT_XXL

2020-08-15 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/paravirt branch of tip:

Commit-ID: 0cabf9914990dc59a7e1793ef2fb294d578dc210
Gitweb:
https://git.kernel.org/tip/0cabf9914990dc59a7e1793ef2fb294d578dc210
Author:Juergen Gross 
AuthorDate:Sat, 15 Aug 2020 12:06:36 +02:00
Committer: Ingo Molnar 
CommitterDate: Sat, 15 Aug 2020 13:52:11 +02:00

x86/paravirt: Remove 32-bit support from CONFIG_PARAVIRT_XXL

The last 32-bit user of stuff under CONFIG_PARAVIRT_XXL is gone.

Remove 32-bit specific parts.

Signed-off-by: Juergen Gross 
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20200815100641.26362-2-jgr...@suse.com
---
 arch/x86/entry/vdso/vdso32/vclock_gettime.c |   1 +-
 arch/x86/include/asm/paravirt.h | 120 +--
 arch/x86/include/asm/paravirt_types.h   |  21 +---
 arch/x86/include/asm/pgtable-3level_types.h |   5 +-
 arch/x86/include/asm/segment.h  |   4 +-
 arch/x86/kernel/cpu/common.c|   8 +-
 arch/x86/kernel/kprobes/core.c  |   1 +-
 arch/x86/kernel/kprobes/opt.c   |   1 +-
 arch/x86/kernel/paravirt.c  |  18 +---
 arch/x86/kernel/paravirt_patch.c|  17 +---
 arch/x86/xen/enlighten_pv.c |   6 +-
 11 files changed, 13 insertions(+), 189 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso32/vclock_gettime.c b/arch/x86/entry/vdso/vdso32/vclock_gettime.c
index 84a4a73..283ed9d 100644
--- a/arch/x86/entry/vdso/vdso32/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vdso32/vclock_gettime.c
@@ -14,6 +14,7 @@
 #undef CONFIG_ILLEGAL_POINTER_VALUE
 #undef CONFIG_SPARSEMEM_VMEMMAP
 #undef CONFIG_NR_CPUS
+#undef CONFIG_PARAVIRT_XXL
 
 #define CONFIG_X86_32 1
 #define CONFIG_PGTABLE_LEVELS 2
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 3d2afec..25c7a73 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -160,8 +160,6 @@ static inline void wbinvd(void)
PVOP_VCALL0(cpu.wbinvd);
 }
 
-#define get_kernel_rpl()  (pv_info.kernel_rpl)
-
 static inline u64 paravirt_read_msr(unsigned msr)
 {
return PVOP_CALL1(u64, cpu.read_msr, msr);
@@ -277,12 +275,10 @@ static inline void load_TLS(struct thread_struct *t, unsigned cpu)
PVOP_VCALL2(cpu.load_tls, t, cpu);
 }
 
-#ifdef CONFIG_X86_64
 static inline void load_gs_index(unsigned int gs)
 {
PVOP_VCALL1(cpu.load_gs_index, gs);
 }
-#endif
 
 static inline void write_ldt_entry(struct desc_struct *dt, int entry,
   const void *desc)
@@ -375,52 +371,22 @@ static inline void paravirt_release_p4d(unsigned long pfn)
 
 static inline pte_t __pte(pteval_t val)
 {
-   pteval_t ret;
-
-   if (sizeof(pteval_t) > sizeof(long))
-   ret = PVOP_CALLEE2(pteval_t, mmu.make_pte, val, (u64)val >> 32);
-   else
-   ret = PVOP_CALLEE1(pteval_t, mmu.make_pte, val);
-
-   return (pte_t) { .pte = ret };
+   return (pte_t) { PVOP_CALLEE1(pteval_t, mmu.make_pte, val) };
 }
 
 static inline pteval_t pte_val(pte_t pte)
 {
-   pteval_t ret;
-
-   if (sizeof(pteval_t) > sizeof(long))
-   ret = PVOP_CALLEE2(pteval_t, mmu.pte_val,
-  pte.pte, (u64)pte.pte >> 32);
-   else
-   ret = PVOP_CALLEE1(pteval_t, mmu.pte_val, pte.pte);
-
-   return ret;
+   return PVOP_CALLEE1(pteval_t, mmu.pte_val, pte.pte);
 }
 
 static inline pgd_t __pgd(pgdval_t val)
 {
-   pgdval_t ret;
-
-   if (sizeof(pgdval_t) > sizeof(long))
-   ret = PVOP_CALLEE2(pgdval_t, mmu.make_pgd, val, (u64)val >> 32);
-   else
-   ret = PVOP_CALLEE1(pgdval_t, mmu.make_pgd, val);
-
-   return (pgd_t) { ret };
+   return (pgd_t) { PVOP_CALLEE1(pgdval_t, mmu.make_pgd, val) };
 }
 
 static inline pgdval_t pgd_val(pgd_t pgd)
 {
-   pgdval_t ret;
-
-   if (sizeof(pgdval_t) > sizeof(long))
-   ret =  PVOP_CALLEE2(pgdval_t, mmu.pgd_val,
-   pgd.pgd, (u64)pgd.pgd >> 32);
-   else
-   ret =  PVOP_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd);
-
-   return ret;
+   return PVOP_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd);
 }
 
 #define  __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
@@ -438,78 +404,40 @@ static inline void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned
   pte_t *ptep, pte_t old_pte, pte_t pte)
 {
 
-   if (sizeof(pteval_t) > sizeof(long))
-   /* 5 arg words */
-   pv_ops.mmu.ptep_modify_prot_commit(vma, addr, ptep, pte);
-   else
-   PVOP_VCALL4(mmu.ptep_modify_prot_commit,
-   vma, addr, ptep, pte.pte);
+   PVOP_VCALL4(mmu.ptep_modify_prot_commit, vma, addr, ptep, pte.pte);
 }
 
 static inline void set_pte(pte_t *ptep, pte_t pte)
 {
-   if (sizeof(pteval_t) > sizeof(long))
-   PVOP_VCALL3(mmu.set_

[tip: x86/paravirt] x86/paravirt: Avoid needless paravirt step clearing page table entries

2020-08-15 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/paravirt branch of tip:

Commit-ID: 7c9f80cb76ec9f14c3b25509168b1a2f7942e418
Gitweb:
https://git.kernel.org/tip/7c9f80cb76ec9f14c3b25509168b1a2f7942e418
Author:Juergen Gross 
AuthorDate:Sat, 15 Aug 2020 12:06:41 +02:00
Committer: Ingo Molnar 
CommitterDate: Sat, 15 Aug 2020 13:52:12 +02:00

x86/paravirt: Avoid needless paravirt step clearing page table entries

pte_clear() et al are based on two paravirt steps today: one step to
create a page table entry with all zeroes, and one step to write this
entry value.

Drop the first step as it is completely useless.
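
For illustration only (a standalone sketch, not part of the patch): __pte(0)
reaches the all-zero entry through the mmu.make_pte paravirt call, while
native_make_pte(0) is a plain constructor, so only the following set_pte()
write remains a paravirt operation. The toy below uses hypothetical
pv_make_pte()/pv_set_pte() helpers to count the paravirt hops before and after
the change:

  /* Toy model with stand-in helpers, not kernel code. */
  #include <stdio.h>

  typedef struct { unsigned long pte; } pte_t;

  static int pv_calls;                                   /* counts paravirt hops */

  static pte_t pv_make_pte(unsigned long v) { pv_calls++; return (pte_t){ v }; }
  static void pv_set_pte(pte_t *p, pte_t v) { pv_calls++; *p = v; }

  /* A plain constructor: no paravirt indirection involved. */
  static pte_t native_make_pte(unsigned long v) { return (pte_t){ v }; }

  int main(void)
  {
          pte_t entry = { 0x63 };

          pv_set_pte(&entry, pv_make_pte(0));      /* old pte_clear(): 2 pv steps */
          printf("old: %d paravirt calls\n", pv_calls);

          pv_calls = 0;
          pv_set_pte(&entry, native_make_pte(0));  /* new pte_clear(): 1 pv step */
          printf("new: %d paravirt calls\n", pv_calls);
          return 0;
  }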

Signed-off-by: Juergen Gross 
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20200815100641.26362-7-jgr...@suse.com
---
 arch/x86/include/asm/paravirt.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index f0464b8..d25cc68 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -448,7 +448,7 @@ static inline pudval_t pud_val(pud_t pud)
 
 static inline void pud_clear(pud_t *pudp)
 {
-   set_pud(pudp, __pud(0));
+   set_pud(pudp, native_make_pud(0));
 }
 
 static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
@@ -485,15 +485,15 @@ static inline void __set_pgd(pgd_t *pgdp, pgd_t pgd)
 } while (0)
 
 #define pgd_clear(pgdp) do {   \
-   if (pgtable_l5_enabled())   \
-   set_pgd(pgdp, __pgd(0));\
+   if (pgtable_l5_enabled())   \
+   set_pgd(pgdp, native_make_pgd(0));  \
 } while (0)
 
 #endif  /* CONFIG_PGTABLE_LEVELS == 5 */
 
 static inline void p4d_clear(p4d_t *p4dp)
 {
-   set_p4d(p4dp, __p4d(0));
+   set_p4d(p4dp, native_make_p4d(0));
 }
 
 static inline void set_pte_atomic(pte_t *ptep, pte_t pte)
@@ -504,12 +504,12 @@ static inline void set_pte_atomic(pte_t *ptep, pte_t pte)
 static inline void pte_clear(struct mm_struct *mm, unsigned long addr,
 pte_t *ptep)
 {
-   set_pte(ptep, __pte(0));
+   set_pte(ptep, native_make_pte(0));
 }
 
 static inline void pmd_clear(pmd_t *pmdp)
 {
-   set_pmd(pmdp, __pmd(0));
+   set_pmd(pmdp, native_make_pmd(0));
 }
 
 #define  __HAVE_ARCH_START_CONTEXT_SWITCH


[tip: x86/paravirt] x86/paravirt: Clean up paravirt macros

2020-08-15 Thread tip-bot2 for Juergen Gross
The following commit has been merged into the x86/paravirt branch of tip:

Commit-ID: 94b827becc6a87c905ab30b398e12a266518acbb
Gitweb:
https://git.kernel.org/tip/94b827becc6a87c905ab30b398e12a266518acbb
Author:Juergen Gross 
AuthorDate:Sat, 15 Aug 2020 12:06:37 +02:00
Committer: Ingo Molnar 
CommitterDate: Sat, 15 Aug 2020 13:52:11 +02:00

x86/paravirt: Clean up paravirt macros

Some paravirt macros are no longer used, delete them.

Signed-off-by: Juergen Gross 
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20200815100641.26362-3-jgr...@suse.com
---
 arch/x86/include/asm/paravirt.h | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 25c7a73..e02c409 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -586,16 +586,9 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
 #endif /* SMP && PARAVIRT_SPINLOCKS */
 
 #ifdef CONFIG_X86_32
-#define PV_SAVE_REGS "pushl %ecx; pushl %edx;"
-#define PV_RESTORE_REGS "popl %edx; popl %ecx;"
-
 /* save and restore all caller-save registers, except return value */
 #define PV_SAVE_ALL_CALLER_REGS    "pushl %ecx;"
 #define PV_RESTORE_ALL_CALLER_REGS "popl  %ecx;"
-
-#define PV_FLAGS_ARG "0"
-#define PV_EXTRA_CLOBBERS
-#define PV_VEXTRA_CLOBBERS
 #else
 /* save and restore all caller-save registers, except return value */
 #define PV_SAVE_ALL_CALLER_REGS    \
@@ -616,14 +609,6 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
"pop %rsi;" \
"pop %rdx;" \
"pop %rcx;"
-
-/* We save some registers, but all of them, that's too much. We clobber all
- * caller saved registers but the argument parameter */
-#define PV_SAVE_REGS "pushq %%rdi;"
-#define PV_RESTORE_REGS "popq %%rdi;"
-#define PV_EXTRA_CLOBBERS EXTRA_CLOBBERS, "rcx" , "rdx", "rsi"
-#define PV_VEXTRA_CLOBBERS EXTRA_CLOBBERS, "rdi", "rcx" , "rdx", "rsi"
-#define PV_FLAGS_ARG "D"
 #endif
 
 /*