[tip: irq/core] irq: Simplify condition in irq_matrix_reserve()
The following commit has been merged into the irq/core branch of tip: Commit-ID: 2c6b02185cc608c19a22691fadc6ca2cd114c286 Gitweb: https://git.kernel.org/tip/2c6b02185cc608c19a22691fadc6ca2cd114c286 Author:Juergen Gross AuthorDate:Thu, 11 Feb 2021 08:09:53 +01:00 Committer: Thomas Gleixner CommitterDate: Wed, 17 Mar 2021 21:44:01 +01:00 irq: Simplify condition in irq_matrix_reserve() The if condition in irq_matrix_reserve() can be much simpler. While at it fix a typo in the comment. Signed-off-by: Juergen Gross Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/20210211070953.5914-1-jgr...@suse.com --- kernel/irq/matrix.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c index 7a9465f..6f8b1d1 100644 --- a/kernel/irq/matrix.c +++ b/kernel/irq/matrix.c @@ -337,15 +337,14 @@ void irq_matrix_assign(struct irq_matrix *m, unsigned int bit) * irq_matrix_reserve - Reserve interrupts * @m: Matrix pointer * - * This is merily a book keeping call. It increments the number of globally + * This is merely a book keeping call. It increments the number of globally * reserved interrupt bits w/o actually allocating them. This allows to * setup interrupt descriptors w/o assigning low level resources to it. * The actual allocation happens when the interrupt gets activated. */ void irq_matrix_reserve(struct irq_matrix *m) { - if (m->global_reserved <= m->global_available && - m->global_reserved + 1 > m->global_available) + if (m->global_reserved == m->global_available) pr_warn("Interrupt reservation exceeds available resources\n"); m->global_reserved++;
[tip: x86/alternatives] x86/alternative: Support not-feature
The following commit has been merged into the x86/alternatives branch of tip: Commit-ID: dda7bb76484978316bb412a353789ebc5901de36 Gitweb: https://git.kernel.org/tip/dda7bb76484978316bb412a353789ebc5901de36 Author:Juergen Gross AuthorDate:Thu, 11 Mar 2021 15:23:10 +01:00 Committer: Borislav Petkov CommitterDate: Thu, 11 Mar 2021 16:44:01 +01:00 x86/alternative: Support not-feature Add support for alternative patching for the case a feature is not present on the current CPU. For users of ALTERNATIVE() and friends, an inverted feature is specified by applying the ALT_NOT() macro to it, e.g.: ALTERNATIVE(old, new, ALT_NOT(feature)); Committer note: The decision to encode the NOT-bit in the feature bit itself is because a future change which would make objtool generate such alternative calls, would keep the code in objtool itself fairly simple. Also, this allows for the alternative macros to support the NOT feature without having to change them. Finally, the u16 cpuid member encoding the X86_FEATURE_ flags is not an ABI so if more bits are needed, cpuid itself can be enlarged or a flags field can be added to struct alt_instr after having considered the size growth in either cases. Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Link: https://lkml.kernel.org/r/20210311142319.4723-6-jgr...@suse.com --- arch/x86/include/asm/alternative.h | 3 +++ arch/x86/kernel/alternative.c | 20 +++- 2 files changed, 18 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h index 53f295f..649e56f 100644 --- a/arch/x86/include/asm/alternative.h +++ b/arch/x86/include/asm/alternative.h @@ -6,6 +6,9 @@ #include #include +#define ALTINSTR_FLAG_INV (1 << 15) +#define ALT_NOT(feat) ((feat) | ALTINSTR_FLAG_INV) + #ifndef __ASSEMBLY__ #include diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 8d778e4..133b549 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -388,21 +388,31 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start, */ for (a = start; a < end; a++) { int insn_buff_sz = 0; + /* Mask away "NOT" flag bit for feature to test. */ + u16 feature = a->cpuid & ~ALTINSTR_FLAG_INV; instr = (u8 *)&a->instr_offset + a->instr_offset; replacement = (u8 *)&a->repl_offset + a->repl_offset; BUG_ON(a->instrlen > sizeof(insn_buff)); - BUG_ON(a->cpuid >= (NCAPINTS + NBUGINTS) * 32); - if (!boot_cpu_has(a->cpuid)) { + BUG_ON(feature >= (NCAPINTS + NBUGINTS) * 32); + + /* +* Patch if either: +* - feature is present +* - feature not present but ALTINSTR_FLAG_INV is set to mean, +* patch if feature is *NOT* present. +*/ + if (!boot_cpu_has(feature) == !(a->cpuid & ALTINSTR_FLAG_INV)) { if (a->padlen > 1) optimize_nops(a, instr); continue; } - DPRINTK("feat: %d*32+%d, old: (%pS (%px) len: %d), repl: (%px, len: %d), pad: %d", - a->cpuid >> 5, - a->cpuid & 0x1f, + DPRINTK("feat: %s%d*32+%d, old: (%pS (%px) len: %d), repl: (%px, len: %d), pad: %d", + (a->cpuid & ALTINSTR_FLAG_INV) ? "!" : "", + feature >> 5, + feature & 0x1f, instr, instr, a->instrlen, replacement, a->replacementlen, a->padlen);
[tip: x86/alternatives] x86/alternative: Merge include files
The following commit has been merged into the x86/alternatives branch of tip: Commit-ID: 5e21a3ecad1500e35b46701e7f3f232e15d78e69 Gitweb: https://git.kernel.org/tip/5e21a3ecad1500e35b46701e7f3f232e15d78e69 Author:Juergen Gross AuthorDate:Thu, 11 Mar 2021 15:23:06 +01:00 Committer: Borislav Petkov CommitterDate: Thu, 11 Mar 2021 15:58:02 +01:00 x86/alternative: Merge include files Merge arch/x86/include/asm/alternative-asm.h into arch/x86/include/asm/alternative.h in order to make it easier to use common definitions later. Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Link: https://lkml.kernel.org/r/20210311142319.4723-2-jgr...@suse.com --- arch/x86/entry/entry_32.S| 2 +- arch/x86/entry/vdso/vdso32/system_call.S | 2 +- arch/x86/include/asm/alternative-asm.h | 114 +-- arch/x86/include/asm/alternative.h | 112 +- arch/x86/include/asm/nospec-branch.h | 1 +- arch/x86/include/asm/smap.h | 5 +- arch/x86/lib/atomic64_386_32.S | 2 +- arch/x86/lib/atomic64_cx8_32.S | 2 +- arch/x86/lib/copy_page_64.S | 2 +- arch/x86/lib/copy_user_64.S | 2 +- arch/x86/lib/memcpy_64.S | 2 +- arch/x86/lib/memmove_64.S| 2 +- arch/x86/lib/memset_64.S | 2 +- arch/x86/lib/retpoline.S | 2 +- 14 files changed, 120 insertions(+), 132 deletions(-) delete mode 100644 arch/x86/include/asm/alternative-asm.h diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index df8c017..4e079f2 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -40,7 +40,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/arch/x86/entry/vdso/vdso32/system_call.S b/arch/x86/entry/vdso/vdso32/system_call.S index de1fff7..d6a6080 100644 --- a/arch/x86/entry/vdso/vdso32/system_call.S +++ b/arch/x86/entry/vdso/vdso32/system_call.S @@ -6,7 +6,7 @@ #include #include #include -#include +#include .text .globl __kernel_vsyscall diff --git a/arch/x86/include/asm/alternative-asm.h b/arch/x86/include/asm/alternative-asm.h deleted file mode 100644 index 464034d..000 --- a/arch/x86/include/asm/alternative-asm.h +++ /dev/null @@ -1,114 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _ASM_X86_ALTERNATIVE_ASM_H -#define _ASM_X86_ALTERNATIVE_ASM_H - -#ifdef __ASSEMBLY__ - -#include - -#ifdef CONFIG_SMP - .macro LOCK_PREFIX -672: lock - .pushsection .smp_locks,"a" - .balign 4 - .long 672b - . - .popsection - .endm -#else - .macro LOCK_PREFIX - .endm -#endif - -/* - * objtool annotation to ignore the alternatives and only consider the original - * instruction(s). - */ -.macro ANNOTATE_IGNORE_ALTERNATIVE - .Lannotate_\@: - .pushsection .discard.ignore_alts - .long .Lannotate_\@ - . - .popsection -.endm - -/* - * Issue one struct alt_instr descriptor entry (need to put it into - * the section .altinstructions, see below). This entry contains - * enough information for the alternatives patching code to patch an - * instruction. See apply_alternatives(). - */ -.macro altinstruction_entry orig alt feature orig_len alt_len pad_len - .long \orig - . - .long \alt - . - .word \feature - .byte \orig_len - .byte \alt_len - .byte \pad_len -.endm - -/* - * Define an alternative between two instructions. If @feature is - * present, early code in apply_alternatives() replaces @oldinstr with - * @newinstr. ".skip" directive takes care of proper instruction padding - * in case @newinstr is longer than @oldinstr. - */ -.macro ALTERNATIVE oldinstr, newinstr, feature -140: - \oldinstr -141: - .skip -(((144f-143f)-(141b-140b)) > 0) * ((144f-143f)-(141b-140b)),0x90 -142: - - .pushsection .altinstructions,"a" - altinstruction_entry 140b,143f,\feature,142b-140b,144f-143f,142b-141b - .popsection - - .pushsection .altinstr_replacement,"ax" -143: - \newinstr -144: - .popsection -.endm - -#define old_len141b-140b -#define new_len1 144f-143f -#define new_len2 145f-144f - -/* - * gas compatible max based on the idea from: - * http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax - * - * The additional "-" is needed because gas uses a "true" value of -1. - */ -#define alt_max_short(a, b)((a) ^ (((a) ^ (b)) & -(-((a) < (b) - - -/* - * Same as ALTERNATIVE macro above but for two alternatives. If CPU - * has @feature1, it replaces @oldinstr with @newinstr1. If CPU has - * @feature2, it replaces @oldinstr with @feature2. - */ -.macro ALTERNATIVE_2 oldinstr, newinstr1, feature1, newinstr2, feature2 -140: - \oldinstr -141: - .skip -((alt_max_short(new_len1, new_len2) - (old_len)) > 0) * \
[tip: x86/alternatives] static_call: Add function to query current function
The following commit has been merged into the x86/alternatives branch of tip: Commit-ID: 6ea312d95e0226b306bb4b8ee3a0727d880378cb Gitweb: https://git.kernel.org/tip/6ea312d95e0226b306bb4b8ee3a0727d880378cb Author:Juergen Gross AuthorDate:Thu, 11 Mar 2021 15:23:08 +01:00 Committer: Borislav Petkov CommitterDate: Thu, 11 Mar 2021 16:12:33 +01:00 static_call: Add function to query current function Some users of paravirtualized functions need to query which function has been specified in a pv_ops vector element. In order to be able to switch such paravirtualized functions to static_calls instead, there needs to be a function to query the function which will be called via static_call(). Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20210311142319.4723-4-jgr...@suse.com --- include/linux/static_call.h | 8 1 file changed, 8 insertions(+) diff --git a/include/linux/static_call.h b/include/linux/static_call.h index 76b8812..e01b61a 100644 --- a/include/linux/static_call.h +++ b/include/linux/static_call.h @@ -20,6 +20,7 @@ * static_call(name)(args...); * static_call_cond(name)(args...); * static_call_update(name, func); + * static_call_query(name); * * Usage example: * @@ -91,6 +92,10 @@ * * which will include the required value tests to avoid NULL-pointer * dereferences. + * + * To query which function is currently set to be called, use: + * + * func = static_call_query(name); */ #include @@ -118,6 +123,8 @@ extern void arch_static_call_transform(void *site, void *tramp, void *func, bool STATIC_CALL_TRAMP_ADDR(name), func); \ }) +#define static_call_query(name) (READ_ONCE(STATIC_CALL_KEY(name).func)) + #ifdef CONFIG_HAVE_STATIC_CALL_INLINE extern int __init static_call_init(void); @@ -191,6 +198,7 @@ static inline int static_call_init(void) { return 0; } }; \ ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name) + #define static_call_cond(name) (void)__static_call(name) static inline
[tip: x86/alternatives] static_call: Move struct static_call_key definition to static_call_types.h
The following commit has been merged into the x86/alternatives branch of tip: Commit-ID: b046664872dd78a8bebe3d5f3bb9da9baa93f5ca Gitweb: https://git.kernel.org/tip/b046664872dd78a8bebe3d5f3bb9da9baa93f5ca Author:Juergen Gross AuthorDate:Thu, 11 Mar 2021 15:23:07 +01:00 Committer: Borislav Petkov CommitterDate: Thu, 11 Mar 2021 16:04:39 +01:00 static_call: Move struct static_call_key definition to static_call_types.h Having the definition of static_call() in static_call_types.h makes no sense as long struct static_call_key isn't defined there, as the generic implementation of static_call() is referencing this structure. So move the definition of struct static_call_key to static_call_types.h. Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20210311142319.4723-3-jgr...@suse.com --- include/linux/static_call.h | 18 -- include/linux/static_call_types.h | 18 ++ tools/include/linux/static_call_types.h | 18 ++ 3 files changed, 36 insertions(+), 18 deletions(-) diff --git a/include/linux/static_call.h b/include/linux/static_call.h index 85ecc78..76b8812 100644 --- a/include/linux/static_call.h +++ b/include/linux/static_call.h @@ -128,16 +128,6 @@ struct static_call_mod { struct static_call_site *sites; }; -struct static_call_key { - void *func; - union { - /* bit 0: 0 = mods, 1 = sites */ - unsigned long type; - struct static_call_mod *mods; - struct static_call_site *sites; - }; -}; - /* For finding the key associated with a trampoline */ struct static_call_tramp_key { s32 tramp; @@ -187,10 +177,6 @@ extern long __static_call_return0(void); static inline int static_call_init(void) { return 0; } -struct static_call_key { - void *func; -}; - #define __DEFINE_STATIC_CALL(name, _func, _func_init) \ DECLARE_STATIC_CALL(name, _func); \ struct static_call_key STATIC_CALL_KEY(name) = {\ @@ -243,10 +229,6 @@ static inline long __static_call_return0(void) static inline int static_call_init(void) { return 0; } -struct static_call_key { - void *func; -}; - static inline long __static_call_return0(void) { return 0; diff --git a/include/linux/static_call_types.h b/include/linux/static_call_types.h index ae5662d..5a00b8b 100644 --- a/include/linux/static_call_types.h +++ b/include/linux/static_call_types.h @@ -58,11 +58,25 @@ struct static_call_site { __raw_static_call(name);\ }) +struct static_call_key { + void *func; + union { + /* bit 0: 0 = mods, 1 = sites */ + unsigned long type; + struct static_call_mod *mods; + struct static_call_site *sites; + }; +}; + #else /* !CONFIG_HAVE_STATIC_CALL_INLINE */ #define __STATIC_CALL_ADDRESSABLE(name) #define __static_call(name)__raw_static_call(name) +struct static_call_key { + void *func; +}; + #endif /* CONFIG_HAVE_STATIC_CALL_INLINE */ #ifdef MODULE @@ -77,6 +91,10 @@ struct static_call_site { #else +struct static_call_key { + void *func; +}; + #define static_call(name) \ ((typeof(STATIC_CALL_TRAMP(name))*)(STATIC_CALL_KEY(name).func)) diff --git a/tools/include/linux/static_call_types.h b/tools/include/linux/static_call_types.h index ae5662d..5a00b8b 100644 --- a/tools/include/linux/static_call_types.h +++ b/tools/include/linux/static_call_types.h @@ -58,11 +58,25 @@ struct static_call_site { __raw_static_call(name);\ }) +struct static_call_key { + void *func; + union { + /* bit 0: 0 = mods, 1 = sites */ + unsigned long type; + struct static_call_mod *mods; + struct static_call_site *sites; + }; +}; + #else /* !CONFIG_HAVE_STATIC_CALL_INLINE */ #define __STATIC_CALL_ADDRESSABLE(name) #define __static_call(name)__raw_static_call(name) +struct static_call_key { + void *func; +}; + #endif /* CONFIG_HAVE_STATIC_CALL_INLINE */ #ifdef MODULE @@ -77,6 +91,10 @@ struct static_call_site { #else +struct static_call_key { + void *func; +}; + #define static_call(name) \ ((typeof(STATIC_CALL_TRAMP(name))*)(STATIC_CALL_KEY(name).func))
[tip: x86/alternatives] x86/alternative: Support ALTERNATIVE_TERNARY
The following commit has been merged into the x86/alternatives branch of tip: Commit-ID: e208b3c4a9748b2c17aa09ba663b5096ccf82dce Gitweb: https://git.kernel.org/tip/e208b3c4a9748b2c17aa09ba663b5096ccf82dce Author:Juergen Gross AuthorDate:Thu, 11 Mar 2021 15:23:11 +01:00 Committer: Borislav Petkov CommitterDate: Thu, 11 Mar 2021 16:57:31 +01:00 x86/alternative: Support ALTERNATIVE_TERNARY Add ALTERNATIVE_TERNARY support for replacing an initial instruction with either of two instructions depending on a feature: ALTERNATIVE_TERNARY "default_instr", FEATURE_NR, "feature_on_instr", "feature_off_instr" which will start with "default_instr" and at patch time will, depending on FEATURE_NR being set or not, patch that with either "feature_on_instr" or "feature_off_instr". [ bp: Add comment ontop. ] Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20210311142319.4723-7-jgr...@suse.com --- arch/x86/include/asm/alternative.h | 13 + 1 file changed, 13 insertions(+) diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h index 649e56f..17b3609 100644 --- a/arch/x86/include/asm/alternative.h +++ b/arch/x86/include/asm/alternative.h @@ -179,6 +179,11 @@ static inline int alternatives_text_reserved(void *start, void *end) ALTINSTR_REPLACEMENT(newinstr2, 2) \ ".popsection\n" +/* If @feature is set, patch in @newinstr_yes, otherwise @newinstr_no. */ +#define ALTERNATIVE_TERNARY(oldinstr, feature, newinstr_yes, newinstr_no) \ + ALTERNATIVE_2(oldinstr, newinstr_no, X86_FEATURE_ALWAYS,\ + newinstr_yes, feature) + #define ALTERNATIVE_3(oldinsn, newinsn1, feat1, newinsn2, feat2, newinsn3, feat3) \ OLDINSTR_3(oldinsn, 1, 2, 3) \ ".pushsection .altinstructions,\"a\"\n" \ @@ -210,6 +215,9 @@ static inline int alternatives_text_reserved(void *start, void *end) #define alternative_2(oldinstr, newinstr1, feature1, newinstr2, feature2) \ asm_inline volatile(ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2) ::: "memory") +#define alternative_ternary(oldinstr, feature, newinstr_yes, newinstr_no) \ + asm_inline volatile(ALTERNATIVE_TERNARY(oldinstr, feature, newinstr_yes, newinstr_no) ::: "memory") + /* * Alternative inline assembly with input. * @@ -380,6 +388,11 @@ static inline int alternatives_text_reserved(void *start, void *end) .popsection .endm +/* If @feature is set, patch in @newinstr_yes, otherwise @newinstr_no. */ +#define ALTERNATIVE_TERNARY(oldinstr, feature, newinstr_yes, newinstr_no) \ + ALTERNATIVE_2 oldinstr, newinstr_no, X86_FEATURE_ALWAYS,\ + newinstr_yes, feature + #endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_ALTERNATIVE_H */
[tip: x86/alternatives] x86/paravirt: Switch time pvops functions to use static_call()
The following commit has been merged into the x86/alternatives branch of tip: Commit-ID: a0e2bf7cb7006b5a58ee81f4da4fe575875f2781 Gitweb: https://git.kernel.org/tip/a0e2bf7cb7006b5a58ee81f4da4fe575875f2781 Author:Juergen Gross AuthorDate:Thu, 11 Mar 2021 15:23:09 +01:00 Committer: Borislav Petkov CommitterDate: Thu, 11 Mar 2021 16:17:52 +01:00 x86/paravirt: Switch time pvops functions to use static_call() The time pvops functions are the only ones left which might be used in 32-bit mode and which return a 64-bit value. Switch them to use the static_call() mechanism instead of pvops, as this allows quite some simplification of the pvops implementation. Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20210311142319.4723-5-jgr...@suse.com --- arch/arm/include/asm/paravirt.h | 14 +- arch/arm/kernel/paravirt.c| 9 +++-- arch/arm64/include/asm/paravirt.h | 14 +- arch/arm64/kernel/paravirt.c | 13 + arch/x86/Kconfig | 1 +- arch/x86/include/asm/mshyperv.h | 2 +- arch/x86/include/asm/paravirt.h | 15 --- arch/x86/include/asm/paravirt_types.h | 6 +-- arch/x86/kernel/cpu/vmware.c | 5 +++-- arch/x86/kernel/kvm.c | 2 +- arch/x86/kernel/kvmclock.c| 2 +- arch/x86/kernel/paravirt.c| 13 + arch/x86/kernel/tsc.c | 3 ++- arch/x86/xen/time.c | 26 +- drivers/xen/time.c| 3 ++- 15 files changed, 71 insertions(+), 57 deletions(-) diff --git a/arch/arm/include/asm/paravirt.h b/arch/arm/include/asm/paravirt.h index cdbf02d..95d5b0d 100644 --- a/arch/arm/include/asm/paravirt.h +++ b/arch/arm/include/asm/paravirt.h @@ -3,23 +3,19 @@ #define _ASM_ARM_PARAVIRT_H #ifdef CONFIG_PARAVIRT +#include + struct static_key; extern struct static_key paravirt_steal_enabled; extern struct static_key paravirt_steal_rq_enabled; -struct pv_time_ops { - unsigned long long (*steal_clock)(int cpu); -}; - -struct paravirt_patch_template { - struct pv_time_ops time; -}; +u64 dummy_steal_clock(int cpu); -extern struct paravirt_patch_template pv_ops; +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); static inline u64 paravirt_steal_clock(int cpu) { - return pv_ops.time.steal_clock(cpu); + return static_call(pv_steal_clock)(cpu); } #endif diff --git a/arch/arm/kernel/paravirt.c b/arch/arm/kernel/paravirt.c index 4cfed91..7dd9806 100644 --- a/arch/arm/kernel/paravirt.c +++ b/arch/arm/kernel/paravirt.c @@ -9,10 +9,15 @@ #include #include #include +#include #include struct static_key paravirt_steal_enabled; struct static_key paravirt_steal_rq_enabled; -struct paravirt_patch_template pv_ops; -EXPORT_SYMBOL_GPL(pv_ops); +static u64 native_steal_clock(int cpu) +{ + return 0; +} + +DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock); diff --git a/arch/arm64/include/asm/paravirt.h b/arch/arm64/include/asm/paravirt.h index cf3a0fd..9aa193e 100644 --- a/arch/arm64/include/asm/paravirt.h +++ b/arch/arm64/include/asm/paravirt.h @@ -3,23 +3,19 @@ #define _ASM_ARM64_PARAVIRT_H #ifdef CONFIG_PARAVIRT +#include + struct static_key; extern struct static_key paravirt_steal_enabled; extern struct static_key paravirt_steal_rq_enabled; -struct pv_time_ops { - unsigned long long (*steal_clock)(int cpu); -}; - -struct paravirt_patch_template { - struct pv_time_ops time; -}; +u64 dummy_steal_clock(int cpu); -extern struct paravirt_patch_template pv_ops; +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); static inline u64 paravirt_steal_clock(int cpu) { - return pv_ops.time.steal_clock(cpu); + return static_call(pv_steal_clock)(cpu); } int __init pv_time_init(void); diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c index c07d7a0..75fed44 100644 --- a/arch/arm64/kernel/paravirt.c +++ b/arch/arm64/kernel/paravirt.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -26,8 +27,12 @@ struct static_key paravirt_steal_enabled; struct static_key paravirt_steal_rq_enabled; -struct paravirt_patch_template pv_ops; -EXPORT_SYMBOL_GPL(pv_ops); +static u64 native_steal_clock(int cpu) +{ + return 0; +} + +DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock); struct pv_time_stolen_time_region { struct pvclock_vcpu_stolen_time *kaddr; @@ -45,7 +50,7 @@ static int __init parse_no_stealacc(char *arg) early_param("no-steal-acc", parse_no_stealacc); /* return stolen time in ns by asking the hypervisor */ -static u64 pv_steal_clock(int cpu) +static u64 para_steal_clock(int cpu) { struct pv_time_stolen_time_region *reg; @@ -150,7 +155,7 @@ int __init pv_time_init(void) if (ret)
[tip: x86/alternatives] x86/paravirt: Remove no longer needed 32-bit pvops cruft
The following commit has been merged into the x86/alternatives branch of tip: Commit-ID: 33634e42e38be61f320183dfc264b9caba292d4e Gitweb: https://git.kernel.org/tip/33634e42e38be61f320183dfc264b9caba292d4e Author:Juergen Gross AuthorDate:Thu, 11 Mar 2021 15:23:14 +01:00 Committer: Borislav Petkov CommitterDate: Thu, 11 Mar 2021 19:51:55 +01:00 x86/paravirt: Remove no longer needed 32-bit pvops cruft PVOP_VCALL4() is only used for Xen PV, while PVOP_CALL4() isn't used at all. Keep PVOP_CALL4() for 64 bits due to symmetry reasons. This allows to remove the 32-bit definitions of those macros leading to a substantial simplification of the paravirt macros, as those were the only ones needing non-empty "pre" and "post" parameters. PVOP_CALLEE2() and PVOP_VCALLEE2() are used nowhere, so remove them. Another no longer needed case is special handling of return types larger than unsigned long. Replace that with a BUILD_BUG_ON(). DISABLE_INTERRUPTS() is used in 32-bit code only, so it can just be replaced by cli. INTERRUPT_RETURN in 32-bit code can be replaced by iret. ENABLE_INTERRUPTS is used nowhere, so it can be removed. Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20210311142319.4723-10-jgr...@suse.com --- arch/x86/entry/entry_32.S | 4 +- arch/x86/include/asm/irqflags.h | 5 +- arch/x86/include/asm/paravirt.h | 35 + arch/x86/include/asm/paravirt_types.h | 112 +++-- arch/x86/kernel/asm-offsets.c | 2 +- 5 files changed, 35 insertions(+), 123 deletions(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index 4e079f2..96f0848 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -430,7 +430,7 @@ * will soon execute iret and the tracer was already set to * the irqstate after the IRET: */ - DISABLE_INTERRUPTS(CLBR_ANY) + cli lss (%esp), %esp/* switch to espfix segment */ .Lend_\@: #endif /* CONFIG_X86_ESPFIX32 */ @@ -1077,7 +1077,7 @@ restore_all_switch_stack: * when returning from IPI handler and when returning from * scheduler to user-space. */ - INTERRUPT_RETURN + iret .section .fixup, "ax" SYM_CODE_START(asm_iret_error) diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h index 144d70e..a0efbcd 100644 --- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -109,9 +109,6 @@ static __always_inline unsigned long arch_local_irq_save(void) } #else -#define ENABLE_INTERRUPTS(x) sti -#define DISABLE_INTERRUPTS(x) cli - #ifdef CONFIG_X86_64 #ifdef CONFIG_DEBUG_ENTRY #define SAVE_FLAGS(x) pushfq; popq %rax @@ -119,8 +116,6 @@ static __always_inline unsigned long arch_local_irq_save(void) #define INTERRUPT_RETURN jmp native_iret -#else -#define INTERRUPT_RETURN iret #endif #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index def450f..a780509 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -719,6 +719,7 @@ extern void default_banner(void); .if ((~(set)) & mask); pop %reg; .endif #ifdef CONFIG_X86_64 +#ifdef CONFIG_PARAVIRT_XXL #define PV_SAVE_REGS(set) \ COND_PUSH(set, CLBR_RAX, rax); \ @@ -744,46 +745,12 @@ extern void default_banner(void); #define PARA_PATCH(off)((off) / 8) #define PARA_SITE(ptype, ops) _PVSITE(ptype, ops, .quad, 8) #define PARA_INDIRECT(addr)*addr(%rip) -#else -#define PV_SAVE_REGS(set) \ - COND_PUSH(set, CLBR_EAX, eax); \ - COND_PUSH(set, CLBR_EDI, edi); \ - COND_PUSH(set, CLBR_ECX, ecx); \ - COND_PUSH(set, CLBR_EDX, edx) -#define PV_RESTORE_REGS(set) \ - COND_POP(set, CLBR_EDX, edx); \ - COND_POP(set, CLBR_ECX, ecx); \ - COND_POP(set, CLBR_EDI, edi); \ - COND_POP(set, CLBR_EAX, eax) - -#define PARA_PATCH(off)((off) / 4) -#define PARA_SITE(ptype, ops) _PVSITE(ptype, ops, .long, 4) -#define PARA_INDIRECT(addr)*%cs:addr -#endif -#ifdef CONFIG_PARAVIRT_XXL #define INTERRUPT_RETURN \ PARA_SITE(PARA_PATCH(PV_CPU_iret), \ ANNOTATE_RETPOLINE_SAFE; \ jmp PARA_INDIRECT(pv_ops+PV_CPU_iret);) -#define DISABLE_INTERRUPTS(clobbers) \ - PARA_SITE(PARA_PATCH(PV_IRQ_irq_disable), \ - PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE);\ - ANNOTATE_RETPOLINE_SAFE; \ -
[tip: x86/alternatives] x86/alternative: Use ALTERNATIVE_TERNARY() in _static_cpu_has()
The following commit has been merged into the x86/alternatives branch of tip: Commit-ID: 2fe2a2c7a97c9bc32acc79154b75e754280f7867 Gitweb: https://git.kernel.org/tip/2fe2a2c7a97c9bc32acc79154b75e754280f7867 Author:Juergen Gross AuthorDate:Thu, 11 Mar 2021 15:23:12 +01:00 Committer: Borislav Petkov CommitterDate: Thu, 11 Mar 2021 19:33:43 +01:00 x86/alternative: Use ALTERNATIVE_TERNARY() in _static_cpu_has() _static_cpu_has() contains a completely open coded version of ALTERNATIVE_TERNARY(). Replace that with the macro instead. Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Link: https://lkml.kernel.org/r/20210311142319.4723-8-jgr...@suse.com --- arch/x86/include/asm/cpufeature.h | 41 ++ 1 file changed, 9 insertions(+), 32 deletions(-) diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index 1728d4c..16a51e7 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -8,6 +8,7 @@ #include #include +#include enum cpuid_leafs { @@ -175,39 +176,15 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit); */ static __always_inline bool _static_cpu_has(u16 bit) { - asm_volatile_goto("1: jmp 6f\n" -"2:\n" -".skip -(((5f-4f) - (2b-1b)) > 0) * " -"((5f-4f) - (2b-1b)),0x90\n" -"3:\n" -".section .altinstructions,\"a\"\n" -" .long 1b - .\n" /* src offset */ -" .long 4f - .\n" /* repl offset */ -" .word %P[always]\n" /* always replace */ -" .byte 3b - 1b\n" /* src len */ -" .byte 5f - 4f\n" /* repl len */ -" .byte 3b - 2b\n" /* pad len */ -".previous\n" -".section .altinstr_replacement,\"ax\"\n" -"4: jmp %l[t_no]\n" -"5:\n" -".previous\n" -".section .altinstructions,\"a\"\n" -" .long 1b - .\n" /* src offset */ -" .long 0\n" /* no replacement */ -" .word %P[feature]\n" /* feature bit */ -" .byte 3b - 1b\n" /* src len */ -" .byte 0\n" /* repl len */ -" .byte 0\n" /* pad len */ -".previous\n" -".section .altinstr_aux,\"ax\"\n" -"6:\n" -" testb %[bitnum],%[cap_byte]\n" -" jnz %l[t_yes]\n" -" jmp %l[t_no]\n" -".previous\n" + asm_volatile_goto( + ALTERNATIVE_TERNARY("jmp 6f", %P[feature], "", "jmp %l[t_no]") + ".section .altinstr_aux,\"ax\"\n" + "6:\n" + " testb %[bitnum],%[cap_byte]\n" + " jnz %l[t_yes]\n" + " jmp %l[t_no]\n" + ".previous\n" : : [feature] "i" (bit), -[always] "i" (X86_FEATURE_ALWAYS), [bitnum] "i" (1 << (bit & 7)), [cap_byte] "m" (((const char *)boot_cpu_data.x86_capability)[bit >> 3]) : : t_yes, t_no);
[tip: x86/alternatives] x86/paravirt: Add new features for paravirt patching
The following commit has been merged into the x86/alternatives branch of tip: Commit-ID: 4e6292114c741221479046515b1aa8145cf1e3f6 Gitweb: https://git.kernel.org/tip/4e6292114c741221479046515b1aa8145cf1e3f6 Author:Juergen Gross AuthorDate:Thu, 11 Mar 2021 15:23:13 +01:00 Committer: Borislav Petkov CommitterDate: Thu, 11 Mar 2021 19:51:49 +01:00 x86/paravirt: Add new features for paravirt patching For being able to switch paravirt patching from special cased custom code sequences to ALTERNATIVE handling some X86_FEATURE_* are needed as new features. This enables to have the standard indirect pv call as the default code and to patch that with the non-Xen custom code sequence via ALTERNATIVE patching later. Make sure paravirt patching is performed before alternatives patching. Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20210311142319.4723-9-jgr...@suse.com --- arch/x86/include/asm/cpufeatures.h | 2 ++- arch/x86/include/asm/paravirt.h | 10 +- arch/x86/kernel/alternative.c| 30 +-- arch/x86/kernel/paravirt-spinlocks.c | 9 - 4 files changed, 49 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index cc96e26..b440c95 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -236,6 +236,8 @@ #define X86_FEATURE_EPT_AD ( 8*32+17) /* Intel Extended Page Table access-dirty bit */ #define X86_FEATURE_VMCALL ( 8*32+18) /* "" Hypervisor supports the VMCALL instruction */ #define X86_FEATURE_VMW_VMMCALL( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */ +#define X86_FEATURE_PVUNLOCK ( 8*32+20) /* "" PV unlock function */ +#define X86_FEATURE_VCPUPREEMPT( 8*32+21) /* "" PV vcpu_is_preempted function */ /* Intel-defined CPU features, CPUID level 0x0007:0 (EBX), word 9 */ #define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/ diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 6408fd0..def450f 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -45,6 +45,10 @@ static inline u64 paravirt_steal_clock(int cpu) return static_call(pv_steal_clock)(cpu); } +#ifdef CONFIG_PARAVIRT_SPINLOCKS +void __init paravirt_set_cap(void); +#endif + /* The paravirtualized I/O functions */ static inline void slow_down_io(void) { @@ -809,5 +813,11 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm) { } #endif + +#ifndef CONFIG_PARAVIRT_SPINLOCKS +static inline void paravirt_set_cap(void) +{ +} +#endif #endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_PARAVIRT_H */ diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 133b549..76ad4ce 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -28,6 +28,7 @@ #include #include #include +#include int __read_mostly alternatives_patched; @@ -733,6 +734,33 @@ void __init alternative_instructions(void) * patching. */ + /* +* Paravirt patching and alternative patching can be combined to +* replace a function call with a short direct code sequence (e.g. +* by setting a constant return value instead of doing that in an +* external function). +* In order to make this work the following sequence is required: +* 1. set (artificial) features depending on used paravirt +*functions which can later influence alternative patching +* 2. apply paravirt patching (generally replacing an indirect +*function call with a direct one) +* 3. apply alternative patching (e.g. replacing a direct function +*call with a custom code sequence) +* Doing paravirt patching after alternative patching would clobber +* the optimization of the custom code with a function call again. +*/ + paravirt_set_cap(); + + /* +* First patch paravirt functions, such that we overwrite the indirect +* call with the direct call. +*/ + apply_paravirt(__parainstructions, __parainstructions_end); + + /* +* Then patch alternatives, such that those paravirt calls that are in +* alternatives can be overwritten by their immediate fragments. +*/ apply_alternatives(__alt_instructions, __alt_instructions_end); #ifdef CONFIG_SMP @@ -751,8 +779,6 @@ void __init alternative_instructions(void) } #endif - apply_paravirt(__parainstructions, __parainstructions_end); - restart_nmi(); alternatives_patched = 1; } diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c index 4f75d0c..9e1ea99 100644 ---
[tip: x86/alternatives] x86/paravirt: Add new PVOP_ALT* macros to support pvops in ALTERNATIVEs
The following commit has been merged into the x86/alternatives branch of tip: Commit-ID: 00aa3193ab7a04b25bb8c68e377815696eb5bf56 Gitweb: https://git.kernel.org/tip/00aa3193ab7a04b25bb8c68e377815696eb5bf56 Author:Juergen Gross AuthorDate:Thu, 11 Mar 2021 15:23:17 +01:00 Committer: Borislav Petkov CommitterDate: Thu, 11 Mar 2021 20:05:44 +01:00 x86/paravirt: Add new PVOP_ALT* macros to support pvops in ALTERNATIVEs Instead of using paravirt patching for custom code sequences add support for using ALTERNATIVE handling combined with paravirt call patching. Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20210311142319.4723-13-jgr...@suse.com --- arch/x86/include/asm/paravirt_types.h | 49 +- 1 file changed, 48 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 0afdac8..0ed9762 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -477,44 +477,91 @@ int paravirt_disable_iospace(void); ret;\ }) +#define PVOP_ALT_CALL(ret, op, alt, cond, clbr, call_clbr, \ + extra_clbr, ...) \ + ({ \ + PVOP_CALL_ARGS; \ + PVOP_TEST_NULL(op); \ + asm volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), \ +alt, cond) \ +: call_clbr, ASM_CALL_CONSTRAINT \ +: paravirt_type(op), \ + paravirt_clobber(clbr), \ + ##__VA_ARGS__\ +: "memory", "cc" extra_clbr); \ + ret;\ + }) + #define __PVOP_CALL(rettype, op, ...) \ PVOP_CALL(PVOP_RETVAL(rettype), op, CLBR_ANY, \ PVOP_CALL_CLOBBERS, EXTRA_CLOBBERS, ##__VA_ARGS__) +#define __PVOP_ALT_CALL(rettype, op, alt, cond, ...) \ + PVOP_ALT_CALL(PVOP_RETVAL(rettype), op, alt, cond, CLBR_ANY,\ + PVOP_CALL_CLOBBERS, EXTRA_CLOBBERS, \ + ##__VA_ARGS__) + #define __PVOP_CALLEESAVE(rettype, op, ...)\ PVOP_CALL(PVOP_RETVAL(rettype), op.func, CLBR_RET_REG, \ PVOP_CALLEE_CLOBBERS, , ##__VA_ARGS__) +#define __PVOP_ALT_CALLEESAVE(rettype, op, alt, cond, ...) \ + PVOP_ALT_CALL(PVOP_RETVAL(rettype), op.func, alt, cond, \ + CLBR_RET_REG, PVOP_CALLEE_CLOBBERS, , ##__VA_ARGS__) + + #define __PVOP_VCALL(op, ...) \ (void)PVOP_CALL(, op, CLBR_ANY, PVOP_VCALL_CLOBBERS,\ VEXTRA_CLOBBERS, ##__VA_ARGS__) +#define __PVOP_ALT_VCALL(op, alt, cond, ...) \ + (void)PVOP_ALT_CALL(, op, alt, cond, CLBR_ANY, \ + PVOP_VCALL_CLOBBERS, VEXTRA_CLOBBERS, \ + ##__VA_ARGS__) + #define __PVOP_VCALLEESAVE(op, ...)\ (void)PVOP_CALL(, op.func, CLBR_RET_REG,\ - PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__) + PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__) +#define __PVOP_ALT_VCALLEESAVE(op, alt, cond, ...) \ + (void)PVOP_ALT_CALL(, op.func, alt, cond, CLBR_RET_REG, \ + PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__) #define PVOP_CALL0(rettype, op) \ __PVOP_CALL(rettype, op) #define PVOP_VCALL0(op) \ __PVOP_VCALL(op) +#define PVOP_ALT_CALL0(rettype, op, alt, cond) \ + __PVOP_ALT_CALL(rettype, op, alt, cond) +#define PVOP_ALT_VCALL0(op, alt, cond) \ + __PVOP_ALT_VCALL(op, alt, cond) #define PVOP_CALLEE0(rettype, op) \ __PVOP_CALLEESAVE(rettype, op) #define PVOP_VCALLEE0(op) \ __PVOP_VCALLEESAVE(op) +#define PVOP_ALT_CALLEE0(rettype, op, alt, cond) \ + __PVOP_ALT_CALLEESAVE(rettype, op, alt, cond) +#define PVOP_ALT_VCALLEE0(op, alt, cond)
[tip: x86/alternatives] x86/paravirt: Switch iret pvops to ALTERNATIVE
The following commit has been merged into the x86/alternatives branch of tip: Commit-ID: ae755b5a45482b5de4d96d6f35823076af77445e Gitweb: https://git.kernel.org/tip/ae755b5a45482b5de4d96d6f35823076af77445e Author:Juergen Gross AuthorDate:Thu, 11 Mar 2021 15:23:16 +01:00 Committer: Borislav Petkov CommitterDate: Thu, 11 Mar 2021 19:58:54 +01:00 x86/paravirt: Switch iret pvops to ALTERNATIVE The iret paravirt op is rather special as it is using a jmp instead of a call instruction. Switch it to ALTERNATIVE. Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20210311142319.4723-12-jgr...@suse.com --- arch/x86/include/asm/paravirt.h | 6 +++--- arch/x86/include/asm/paravirt_types.h | 5 + arch/x86/kernel/asm-offsets.c | 5 +- arch/x86/kernel/paravirt.c| 26 ++ arch/x86/xen/enlighten_pv.c | 3 +-- 5 files changed, 7 insertions(+), 38 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index a780509..913acf7 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -747,9 +747,9 @@ extern void default_banner(void); #define PARA_INDIRECT(addr)*addr(%rip) #define INTERRUPT_RETURN \ - PARA_SITE(PARA_PATCH(PV_CPU_iret), \ - ANNOTATE_RETPOLINE_SAFE; \ - jmp PARA_INDIRECT(pv_ops+PV_CPU_iret);) + ANNOTATE_RETPOLINE_SAFE;\ + ALTERNATIVE_TERNARY("jmp *paravirt_iret(%rip);",\ + X86_FEATURE_XENPV, "jmp xen_iret;", "jmp native_iret;") #ifdef CONFIG_DEBUG_ENTRY #define SAVE_FLAGS(clobbers)\ diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 45bd216..0afdac8 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -151,10 +151,6 @@ struct pv_cpu_ops { u64 (*read_pmc)(int counter); - /* Normal iret. Jump to this with the standard iret stack - frame set up. */ - void (*iret)(void); - void (*start_context_switch)(struct task_struct *prev); void (*end_context_switch)(struct task_struct *next); #endif @@ -294,6 +290,7 @@ struct paravirt_patch_template { extern struct pv_info pv_info; extern struct paravirt_patch_template pv_ops; +extern void (*paravirt_iret)(void); #define PARAVIRT_PATCH(x) \ (offsetof(struct paravirt_patch_template, x) / sizeof(void *)) diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 7365080..ecd3fd6 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -61,11 +61,6 @@ static void __used common(void) OFFSET(IA32_RT_SIGFRAME_sigcontext, rt_sigframe_ia32, uc.uc_mcontext); #endif -#ifdef CONFIG_PARAVIRT_XXL - BLANK(); - OFFSET(PV_CPU_iret, paravirt_patch_template, cpu.iret); -#endif - #ifdef CONFIG_XEN BLANK(); OFFSET(XEN_vcpu_info_mask, vcpu_info, evtchn_upcall_mask); diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index a688edf..9b0f568 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -86,25 +86,6 @@ u64 notrace _paravirt_ident_64(u64 x) { return x; } - -static unsigned paravirt_patch_jmp(void *insn_buff, const void *target, - unsigned long addr, unsigned len) -{ - struct branch *b = insn_buff; - unsigned long delta = (unsigned long)target - (addr+5); - - if (len < 5) { -#ifdef CONFIG_RETPOLINE - WARN_ONCE(1, "Failing to patch indirect JMP in %ps\n", (void *)addr); -#endif - return len; /* call too long for patch site */ - } - - b->opcode = 0xe9; /* jmp */ - b->delta = delta; - - return 5; -} #endif DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key); @@ -136,9 +117,6 @@ unsigned paravirt_patch_default(u8 type, void *insn_buff, else if (opfunc == _paravirt_ident_64) ret = paravirt_patch_ident_64(insn_buff, len); - else if (type == PARAVIRT_PATCH(cpu.iret)) - /* If operation requires a jmp, then jmp */ - ret = paravirt_patch_jmp(insn_buff, opfunc, addr, len); #endif else /* Otherwise call the function. */ @@ -313,8 +291,6 @@ struct paravirt_patch_template pv_ops = { .cpu.load_sp0 = native_load_sp0, - .cpu.iret = native_iret, - #ifdef CONFIG_X86_IOPL_IOPERM .cpu.invalidate_io_bitmap = native_tss_invalidate_io_bitmap, .cpu.update_io_bitmap = native_tss_update_io_bitmap, @@ -419,6 +395,8 @@ st
[tip: x86/alternatives] x86/paravirt: Simplify paravirt macros
The following commit has been merged into the x86/alternatives branch of tip: Commit-ID: 0b8d366a942fd48a83dfa728e9f8a8d8b20e735f Gitweb: https://git.kernel.org/tip/0b8d366a942fd48a83dfa728e9f8a8d8b20e735f Author:Juergen Gross AuthorDate:Thu, 11 Mar 2021 15:23:15 +01:00 Committer: Borislav Petkov CommitterDate: Thu, 11 Mar 2021 19:52:52 +01:00 x86/paravirt: Simplify paravirt macros The central pvops call macros PVOP_CALL() and PVOP_VCALL() are looking very similar now. The main differences are using PVOP_VCALL_ARGS or PVOP_CALL_ARGS, which are identical, and the return value handling. So drop PVOP_VCALL_ARGS and instead of PVOP_VCALL() just use (void)PVOP_CALL(long, ...). Note that it isn't easily possible to just redefine PVOP_VCALL() to use PVOP_CALL() instead, as this would require further hiding of commas in macro parameters. Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20210311142319.4723-11-jgr...@suse.com --- arch/x86/include/asm/paravirt_types.h | 41 +++--- 1 file changed, 12 insertions(+), 29 deletions(-) diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 42f9eef..45bd216 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -408,11 +408,9 @@ int paravirt_disable_iospace(void); * makes sure the incoming and outgoing types are always correct. */ #ifdef CONFIG_X86_32 -#define PVOP_VCALL_ARGS \ +#define PVOP_CALL_ARGS \ unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx; -#define PVOP_CALL_ARGS PVOP_VCALL_ARGS - #define PVOP_CALL_ARG1(x) "a" ((unsigned long)(x)) #define PVOP_CALL_ARG2(x) "d" ((unsigned long)(x)) #define PVOP_CALL_ARG3(x) "c" ((unsigned long)(x)) @@ -428,12 +426,10 @@ int paravirt_disable_iospace(void); #define VEXTRA_CLOBBERS #else /* CONFIG_X86_64 */ /* [re]ax isn't an arg, but the return val */ -#define PVOP_VCALL_ARGS\ +#define PVOP_CALL_ARGS \ unsigned long __edi = __edi, __esi = __esi, \ __edx = __edx, __ecx = __ecx, __eax = __eax; -#define PVOP_CALL_ARGS PVOP_VCALL_ARGS - #define PVOP_CALL_ARG1(x) "D" ((unsigned long)(x)) #define PVOP_CALL_ARG2(x) "S" ((unsigned long)(x)) #define PVOP_CALL_ARG3(x) "d" ((unsigned long)(x)) @@ -458,59 +454,46 @@ int paravirt_disable_iospace(void); #define PVOP_TEST_NULL(op) ((void)pv_ops.op) #endif -#define PVOP_RETMASK(rettype) \ +#define PVOP_RETVAL(rettype) \ ({ unsigned long __mask = ~0UL;\ + BUILD_BUG_ON(sizeof(rettype) > sizeof(unsigned long)); \ switch (sizeof(rettype)) { \ case 1: __mask = 0xffUL; break; \ case 2: __mask = 0xUL; break; \ case 4: __mask = 0xUL; break; \ default: break; \ } \ - __mask; \ + __mask & __eax; \ }) -#define PVOP_CALL(rettype, op, clbr, call_clbr, extra_clbr, ...) \ +#define PVOP_CALL(ret, op, clbr, call_clbr, extra_clbr, ...) \ ({ \ PVOP_CALL_ARGS; \ PVOP_TEST_NULL(op); \ - BUILD_BUG_ON(sizeof(rettype) > sizeof(unsigned long)); \ asm volatile(paravirt_alt(PARAVIRT_CALL)\ : call_clbr, ASM_CALL_CONSTRAINT \ : paravirt_type(op), \ paravirt_clobber(clbr), \ ##__VA_ARGS__\ : "memory", "cc" extra_clbr); \ - (rettype)(__eax & PVOP_RETMASK(rettype)); \ + ret;\ }) #define __PVOP_CALL(rettype, op, ...) \ - PVOP_CALL(rettype, op, CLBR_ANY, PVOP_CALL_CLOBBERS,\ - EXTRA_CLOBBERS, ##__VA_ARGS__) + PVOP_CA
[tip: x86/alternatives] x86/paravirt: Switch functions with custom code to ALTERNATIVE
The following commit has been merged into the x86/alternatives branch of tip: Commit-ID: fafe5e74229fd3f425e3cbfc68b90e615aa6d62f Gitweb: https://git.kernel.org/tip/fafe5e74229fd3f425e3cbfc68b90e615aa6d62f Author:Juergen Gross AuthorDate:Thu, 11 Mar 2021 15:23:18 +01:00 Committer: Borislav Petkov CommitterDate: Thu, 11 Mar 2021 20:07:01 +01:00 x86/paravirt: Switch functions with custom code to ALTERNATIVE Instead of using paravirt patching for custom code sequences use ALTERNATIVE for the functions with custom code replacements. Instead of patching an ud2 instruction for unpopulated vector entries into the caller site, use a simple function just calling BUG() as a replacement. Simplify the register defines for assembler paravirt calling, as there isn't much usage left. Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20210311142319.4723-14-jgr...@suse.com --- arch/x86/entry/entry_64.S | 2 +- arch/x86/include/asm/irqflags.h | 2 +- arch/x86/include/asm/paravirt.h | 101 - arch/x86/include/asm/paravirt_types.h | 6 +- arch/x86/kernel/paravirt.c| 16 +--- arch/x86/kernel/paravirt_patch.c | 88 +-- 6 files changed, 58 insertions(+), 157 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 400908d..12e2e3c 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -305,7 +305,7 @@ SYM_CODE_END(ret_from_fork) .macro DEBUG_ENTRY_ASSERT_IRQS_OFF #ifdef CONFIG_DEBUG_ENTRY pushq %rax - SAVE_FLAGS(CLBR_RAX) + SAVE_FLAGS testl $X86_EFLAGS_IF, %eax jz .Lokay_\@ ud2 diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h index a0efbcd..c5ce984 100644 --- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -111,7 +111,7 @@ static __always_inline unsigned long arch_local_irq_save(void) #ifdef CONFIG_X86_64 #ifdef CONFIG_DEBUG_ENTRY -#define SAVE_FLAGS(x) pushfq; popq %rax +#define SAVE_FLAGS pushfq; popq %rax #endif #define INTERRUPT_RETURN jmp native_iret diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 913acf7..43992e5 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -135,7 +135,9 @@ static inline void write_cr0(unsigned long x) static inline unsigned long read_cr2(void) { - return PVOP_CALLEE0(unsigned long, mmu.read_cr2); + return PVOP_ALT_CALLEE0(unsigned long, mmu.read_cr2, + "mov %%cr2, %%rax;", + ALT_NOT(X86_FEATURE_XENPV)); } static inline void write_cr2(unsigned long x) @@ -145,12 +147,14 @@ static inline void write_cr2(unsigned long x) static inline unsigned long __read_cr3(void) { - return PVOP_CALL0(unsigned long, mmu.read_cr3); + return PVOP_ALT_CALL0(unsigned long, mmu.read_cr3, + "mov %%cr3, %%rax;", ALT_NOT(X86_FEATURE_XENPV)); } static inline void write_cr3(unsigned long x) { - PVOP_VCALL1(mmu.write_cr3, x); + PVOP_ALT_VCALL1(mmu.write_cr3, x, + "mov %%rdi, %%cr3", ALT_NOT(X86_FEATURE_XENPV)); } static inline void __write_cr4(unsigned long x) @@ -170,7 +174,7 @@ static inline void halt(void) static inline void wbinvd(void) { - PVOP_VCALL0(cpu.wbinvd); + PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT(X86_FEATURE_XENPV)); } static inline u64 paravirt_read_msr(unsigned msr) @@ -384,22 +388,28 @@ static inline void paravirt_release_p4d(unsigned long pfn) static inline pte_t __pte(pteval_t val) { - return (pte_t) { PVOP_CALLEE1(pteval_t, mmu.make_pte, val) }; + return (pte_t) { PVOP_ALT_CALLEE1(pteval_t, mmu.make_pte, val, + "mov %%rdi, %%rax", + ALT_NOT(X86_FEATURE_XENPV)) }; } static inline pteval_t pte_val(pte_t pte) { - return PVOP_CALLEE1(pteval_t, mmu.pte_val, pte.pte); + return PVOP_ALT_CALLEE1(pteval_t, mmu.pte_val, pte.pte, + "mov %%rdi, %%rax", ALT_NOT(X86_FEATURE_XENPV)); } static inline pgd_t __pgd(pgdval_t val) { - return (pgd_t) { PVOP_CALLEE1(pgdval_t, mmu.make_pgd, val) }; + return (pgd_t) { PVOP_ALT_CALLEE1(pgdval_t, mmu.make_pgd, val, + "mov %%rdi, %%rax", + ALT_NOT(X86_FEATURE_XENPV)) }; } static inline pgdval_t pgd_val(pgd_t pgd) { - return PVOP_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd); + return PVOP_ALT_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd, + "mov %%rdi, %%rax", ALT_NOT(X86_FEATURE_XENPV)); } #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION @@ -432,12 +442,15 @@ stati
[tip: x86/alternatives] x86/paravirt: Have only one paravirt patch function
The following commit has been merged into the x86/alternatives branch of tip: Commit-ID: 054ac8ad5ebe4a69e1f0e842483821ddbe560121 Gitweb: https://git.kernel.org/tip/054ac8ad5ebe4a69e1f0e842483821ddbe560121 Author:Juergen Gross AuthorDate:Thu, 11 Mar 2021 15:23:19 +01:00 Committer: Borislav Petkov CommitterDate: Thu, 11 Mar 2021 20:11:09 +01:00 x86/paravirt: Have only one paravirt patch function There is no need any longer to have different paravirt patch functions for native and Xen. Eliminate native_patch() and rename paravirt_patch_default() to paravirt_patch(). Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20210311142319.4723-15-jgr...@suse.com --- arch/x86/include/asm/paravirt_types.h | 19 +-- arch/x86/kernel/Makefile | 3 +-- arch/x86/kernel/alternative.c | 2 +- arch/x86/kernel/paravirt.c| 20 ++-- arch/x86/kernel/paravirt_patch.c | 11 --- arch/x86/xen/enlighten_pv.c | 1 - 6 files changed, 5 insertions(+), 51 deletions(-) delete mode 100644 arch/x86/kernel/paravirt_patch.c diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 588ff14..9d1ddb7 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -68,19 +68,6 @@ struct pv_info { const char *name; }; -struct pv_init_ops { - /* -* Patch may replace one of the defined code sequences with -* arbitrary code, subject to the same register constraints. -* This generally means the code is not free to clobber any -* registers other than EAX. The patch function should return -* the number of bytes of code generated, as we nop pad the -* rest in generic code. -*/ - unsigned (*patch)(u8 type, void *insn_buff, - unsigned long addr, unsigned len); -} __no_randomize_layout; - #ifdef CONFIG_PARAVIRT_XXL struct pv_lazy_ops { /* Set deferred update mode, used for batching operations. */ @@ -276,7 +263,6 @@ struct pv_lock_ops { * number for each function using the offset which we use to indicate * what to patch. */ struct paravirt_patch_template { - struct pv_init_ops init; struct pv_cpu_ops cpu; struct pv_irq_ops irq; struct pv_mmu_ops mmu; @@ -317,10 +303,7 @@ extern void (*paravirt_iret)(void); /* Simple instruction patching code. */ #define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t" -unsigned paravirt_patch_default(u8 type, void *insn_buff, unsigned long addr, unsigned len); -unsigned paravirt_patch_insns(void *insn_buff, unsigned len, const char *start, const char *end); - -unsigned native_patch(u8 type, void *insn_buff, unsigned long addr, unsigned len); +unsigned int paravirt_patch(u8 type, void *insn_buff, unsigned long addr, unsigned int len); int paravirt_disable_iospace(void); diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 2ddf083..0704c2a 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -35,7 +35,6 @@ KASAN_SANITIZE_sev-es.o := n KCSAN_SANITIZE := n OBJECT_FILES_NON_STANDARD_test_nx.o:= y -OBJECT_FILES_NON_STANDARD_paravirt_patch.o := y ifdef CONFIG_FRAME_POINTER OBJECT_FILES_NON_STANDARD_ftrace_$(BITS).o := y @@ -121,7 +120,7 @@ obj-$(CONFIG_AMD_NB)+= amd_nb.o obj-$(CONFIG_DEBUG_NMI_SELFTEST) += nmi_selftest.o obj-$(CONFIG_KVM_GUEST)+= kvm.o kvmclock.o -obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch.o +obj-$(CONFIG_PARAVIRT) += paravirt.o obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o obj-$(CONFIG_PARAVIRT_CLOCK) += pvclock.o obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 76ad4ce..f810e6f 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -616,7 +616,7 @@ void __init_or_module apply_paravirt(struct paravirt_patch_site *start, BUG_ON(p->len > MAX_PATCH_LEN); /* prep the buffer with the original instructions */ memcpy(insn_buff, p->instr, p->len); - used = pv_ops.init.patch(p->type, insn_buff, (unsigned long)p->instr, p->len); + used = paravirt_patch(p->type, insn_buff, (unsigned long)p->instr, p->len); BUG_ON(used > p->len); diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index 855ae08..d073026 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -99,8 +99,8 @@ void __init native_pv_lock_init(void) static_branch_disable(&virt_spin_lock_key); } -unsigned paravirt_patch_default(u8 ty
[tip: x86/alternatives] x86/alternative: Drop unused feature parameter from ALTINSTR_REPLACEMENT()
The following commit has been merged into the x86/alternatives branch of tip: Commit-ID: db16e07269c2b4346e4332e43f04e447ef14fd2f Gitweb: https://git.kernel.org/tip/db16e07269c2b4346e4332e43f04e447ef14fd2f Author:Juergen Gross AuthorDate:Tue, 09 Mar 2021 14:48:04 +01:00 Committer: Borislav Petkov CommitterDate: Tue, 09 Mar 2021 20:08:28 +01:00 x86/alternative: Drop unused feature parameter from ALTINSTR_REPLACEMENT() The macro ALTINSTR_REPLACEMENT() doesn't make use of the feature parameter, so drop it. Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20210309134813.23912-4-jgr...@suse.com --- arch/x86/include/asm/alternative.h | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h index 13adca3..5753fb2 100644 --- a/arch/x86/include/asm/alternative.h +++ b/arch/x86/include/asm/alternative.h @@ -150,7 +150,7 @@ static inline int alternatives_text_reserved(void *start, void *end) " .byte " alt_rlen(num) "\n"/* replacement len */ \ " .byte " alt_pad_len "\n" /* pad len */ -#define ALTINSTR_REPLACEMENT(newinstr, feature, num) /* replacement */ \ +#define ALTINSTR_REPLACEMENT(newinstr, num)/* replacement */ \ "# ALT: replacement " #num "\n" \ b_replacement(num)":\n\t" newinstr "\n" e_replacement(num) ":\n" @@ -161,7 +161,7 @@ static inline int alternatives_text_reserved(void *start, void *end) ALTINSTR_ENTRY(feature, 1) \ ".popsection\n" \ ".pushsection .altinstr_replacement, \"ax\"\n" \ - ALTINSTR_REPLACEMENT(newinstr, feature, 1) \ + ALTINSTR_REPLACEMENT(newinstr, 1) \ ".popsection\n" #define ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2)\ @@ -171,8 +171,8 @@ static inline int alternatives_text_reserved(void *start, void *end) ALTINSTR_ENTRY(feature2, 2) \ ".popsection\n" \ ".pushsection .altinstr_replacement, \"ax\"\n" \ - ALTINSTR_REPLACEMENT(newinstr1, feature1, 1)\ - ALTINSTR_REPLACEMENT(newinstr2, feature2, 2)\ + ALTINSTR_REPLACEMENT(newinstr1, 1) \ + ALTINSTR_REPLACEMENT(newinstr2, 2) \ ".popsection\n" #define ALTERNATIVE_3(oldinsn, newinsn1, feat1, newinsn2, feat2, newinsn3, feat3) \ @@ -183,9 +183,9 @@ static inline int alternatives_text_reserved(void *start, void *end) ALTINSTR_ENTRY(feat3, 3) \ ".popsection\n" \ ".pushsection .altinstr_replacement, \"ax\"\n" \ - ALTINSTR_REPLACEMENT(newinsn1, feat1, 1) \ - ALTINSTR_REPLACEMENT(newinsn2, feat2, 2) \ - ALTINSTR_REPLACEMENT(newinsn3, feat3, 3) \ + ALTINSTR_REPLACEMENT(newinsn1, 1) \ + ALTINSTR_REPLACEMENT(newinsn2, 2) \ + ALTINSTR_REPLACEMENT(newinsn3, 3) \ ".popsection\n" /*
[tip: locking/core] locking/csd_lock: Prepare more CSD lock debugging
The following commit has been merged into the locking/core branch of tip: Commit-ID: de7b09ef658d637eed0584eaba30884e409aef31 Gitweb: https://git.kernel.org/tip/de7b09ef658d637eed0584eaba30884e409aef31 Author:Juergen Gross AuthorDate:Mon, 01 Mar 2021 11:13:35 +01:00 Committer: Ingo Molnar CommitterDate: Sat, 06 Mar 2021 12:49:48 +01:00 locking/csd_lock: Prepare more CSD lock debugging In order to be able to easily add more CSD lock debugging data to struct call_function_data->csd move the call_single_data_t element into a sub-structure. Signed-off-by: Juergen Gross Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20210301101336.7797-3-jgr...@suse.com --- kernel/smp.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/kernel/smp.c b/kernel/smp.c index d5f0b21..6d7e6db 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -31,8 +31,12 @@ #define CSD_TYPE(_csd) ((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK) +struct cfd_percpu { + call_single_data_t csd; +}; + struct call_function_data { - call_single_data_t __percpu *csd; + struct cfd_percpu __percpu *pcpu; cpumask_var_t cpumask; cpumask_var_t cpumask_ipi; }; @@ -55,8 +59,8 @@ int smpcfd_prepare_cpu(unsigned int cpu) free_cpumask_var(cfd->cpumask); return -ENOMEM; } - cfd->csd = alloc_percpu(call_single_data_t); - if (!cfd->csd) { + cfd->pcpu = alloc_percpu(struct cfd_percpu); + if (!cfd->pcpu) { free_cpumask_var(cfd->cpumask); free_cpumask_var(cfd->cpumask_ipi); return -ENOMEM; @@ -71,7 +75,7 @@ int smpcfd_dead_cpu(unsigned int cpu) free_cpumask_var(cfd->cpumask); free_cpumask_var(cfd->cpumask_ipi); - free_percpu(cfd->csd); + free_percpu(cfd->pcpu); return 0; } @@ -694,7 +698,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask, cpumask_clear(cfd->cpumask_ipi); for_each_cpu(cpu, cfd->cpumask) { - call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu); + call_single_data_t *csd = &per_cpu_ptr(cfd->pcpu, cpu)->csd; if (cond_func && !cond_func(cpu, info)) continue; @@ -719,7 +723,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask, for_each_cpu(cpu, cfd->cpumask) { call_single_data_t *csd; - csd = per_cpu_ptr(cfd->csd, cpu); + csd = &per_cpu_ptr(cfd->pcpu, cpu)->csd; csd_lock_wait(csd); } }
[tip: locking/core] locking/csd_lock: Add more data to CSD lock debugging
The following commit has been merged into the locking/core branch of tip: Commit-ID: a5aabace5fb8abf2adcfcf0fe54c089b20d71755 Gitweb: https://git.kernel.org/tip/a5aabace5fb8abf2adcfcf0fe54c089b20d71755 Author:Juergen Gross AuthorDate:Mon, 01 Mar 2021 11:13:36 +01:00 Committer: Ingo Molnar CommitterDate: Sat, 06 Mar 2021 12:49:48 +01:00 locking/csd_lock: Add more data to CSD lock debugging In order to help identifying problems with IPI handling and remote function execution add some more data to IPI debugging code. There have been multiple reports of CPUs looping long times (many seconds) in smp_call_function_many() waiting for another CPU executing a function like tlb flushing. Most of these reports have been for cases where the kernel was running as a guest on top of KVM or Xen (there are rumours of that happening under VMWare, too, and even on bare metal). Finding the root cause hasn't been successful yet, even after more than 2 years of chasing this bug by different developers. Commit: 35feb60474bf4f7 ("kernel/smp: Provide CSD lock timeout diagnostics") tried to address this by adding some debug code and by issuing another IPI when a hang was detected. This helped mitigating the problem (the repeated IPI unlocks the hang), but the root cause is still unknown. Current available data suggests that either an IPI wasn't sent when it should have been, or that the IPI didn't result in the target CPU executing the queued function (due to the IPI not reaching the CPU, the IPI handler not being called, or the handler not seeing the queued request). Try to add more diagnostic data by introducing a global atomic counter which is being incremented when doing critical operations (before and after queueing a new request, when sending an IPI, and when dequeueing a request). The counter value is stored in percpu variables which can be printed out when a hang is detected. The data of the last event (consisting of sequence counter, source CPU, target CPU, and event type) is stored in a global variable. When a new event is to be traced, the data of the last event is stored in the event related percpu location and the global data is updated with the new event's data. This allows to track two events in one data location: one by the value of the event data (the event before the current one), and one by the location itself (the current event). A typical printout with a detected hang will look like this: csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 53 ns for CPU#06 scf_handler_1+0x0/0x50(0xa2a881bb1410). csd: CSD lock (#1) handling prior scf_handler_1+0x0/0x50(0xa2a8813823c0) request. csd: cnt(8cc): -> dequeue (src cpu 0 == empty) csd: cnt(8cd): ->0006 idle csd: cnt(0003668): 0001->0006 queue csd: cnt(0003669): 0001->0006 ipi csd: cnt(0003e0f): 0007->000a queue csd: cnt(0003e10): 0001-> ping csd: cnt(0003e71): 0003-> ping csd: cnt(0003e72): ->0006 gotipi csd: cnt(0003e73): ->0006 handle csd: cnt(0003e74): ->0006 dequeue (src cpu 0 == empty) csd: cnt(0003e7f): 0004->0006 ping csd: cnt(0003e80): 0001-> pinged csd: cnt(0003eb2): 0005->0001 noipi csd: cnt(0003eb3): 0001->0006 queue csd: cnt(0003eb4): 0001->0006 noipi csd: cnt now: 0003f00 The idea is to print only relevant entries. Those are all events which are associated with the hang (so sender side events for the source CPU of the hanging request, and receiver side events for the target CPU), and the related events just before those (for adding data needed to identify a possible race). Printing all available data would be possible, but this would add large amounts of data printed on larger configurations. Signed-off-by: Juergen Gross [ Minor readability edits. Breaks col80 but is far more readable. ] Signed-off-by: Ingo Molnar Tested-by: Paul E. McKenney Link: https://lore.kernel.org/r/20210301101336.7797-4-jgr...@suse.com --- Documentation/admin-guide/kernel-parameters.txt | 4 +- kernel/smp.c| 226 ++- 2 files changed, 226 insertions(+), 4 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 98dbffa..1fe9d38 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -789,6 +789,10 @@ printed to the console in case a hanging CPU is detected, and that CPU is pinged again in order to try to resolve the hang situation. + 0: disable csdlock debugging (default) + 1: enable basic csdlock debugging (minor impact) + ext: enable extended csdlock debugging (more impact, +but mor
[tip: locking/core] locking/csd_lock: Add boot parameter for controlling CSD lock debugging
The following commit has been merged into the locking/core branch of tip: Commit-ID: 8d0968cc6b8ffd8496c2ebffdfdc801f949a85e5 Gitweb: https://git.kernel.org/tip/8d0968cc6b8ffd8496c2ebffdfdc801f949a85e5 Author:Juergen Gross AuthorDate:Mon, 01 Mar 2021 11:13:34 +01:00 Committer: Ingo Molnar CommitterDate: Sat, 06 Mar 2021 12:49:48 +01:00 locking/csd_lock: Add boot parameter for controlling CSD lock debugging Currently CSD lock debugging can be switched on and off via a kernel config option only. Unfortunately there is at least one problem with CSD lock handling pending for about 2 years now, which has been seen in different environments (mostly when running virtualized under KVM or Xen, at least once on bare metal). Multiple attempts to catch this issue have finally led to introduction of CSD lock debug code, but this code is not in use in most distros as it has some impact on performance. In order to be able to ship kernels with CONFIG_CSD_LOCK_WAIT_DEBUG enabled even for production use, add a boot parameter for switching the debug functionality on. This will reduce any performance impact of the debug coding to a bare minimum when not being used. Signed-off-by: Juergen Gross [ Minor edits. ] Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20210301101336.7797-2-jgr...@suse.com --- Documentation/admin-guide/kernel-parameters.txt | 6 +++- kernel/smp.c| 38 ++-- 2 files changed, 40 insertions(+), 4 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 0454572..98dbffa 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -784,6 +784,12 @@ cs89x0_media= [HW,NET] Format: { rj45 | aui | bnc } + csdlock_debug= [KNL] Enable debug add-ons of cross-CPU function call + handling. When switched on, additional debug data is + printed to the console in case a hanging CPU is + detected, and that CPU is pinged again in order to try + to resolve the hang situation. + dasd= [HW,NET] See header of drivers/s390/block/dasd_devmap.c. diff --git a/kernel/smp.c b/kernel/smp.c index aeb0adf..d5f0b21 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -24,6 +24,7 @@ #include #include #include +#include #include "smpboot.h" #include "sched/smp.h" @@ -102,6 +103,20 @@ void __init call_function_init(void) #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG +static DEFINE_STATIC_KEY_FALSE(csdlock_debug_enabled); + +static int __init csdlock_debug(char *str) +{ + unsigned int val = 0; + + get_option(&str, &val); + if (val) + static_branch_enable(&csdlock_debug_enabled); + + return 0; +} +early_param("csdlock_debug", csdlock_debug); + static DEFINE_PER_CPU(call_single_data_t *, cur_csd); static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func); static DEFINE_PER_CPU(void *, cur_csd_info); @@ -110,7 +125,7 @@ static DEFINE_PER_CPU(void *, cur_csd_info); static atomic_t csd_bug_count = ATOMIC_INIT(0); /* Record current CSD work for current CPU, NULL to erase. */ -static void csd_lock_record(call_single_data_t *csd) +static void __csd_lock_record(call_single_data_t *csd) { if (!csd) { smp_mb(); /* NULL cur_csd after unlock. */ @@ -125,7 +140,13 @@ static void csd_lock_record(call_single_data_t *csd) /* Or before unlock, as the case may be. */ } -static __always_inline int csd_lock_wait_getcpu(call_single_data_t *csd) +static __always_inline void csd_lock_record(call_single_data_t *csd) +{ + if (static_branch_unlikely(&csdlock_debug_enabled)) + __csd_lock_record(csd); +} + +static int csd_lock_wait_getcpu(call_single_data_t *csd) { unsigned int csd_type; @@ -140,7 +161,7 @@ static __always_inline int csd_lock_wait_getcpu(call_single_data_t *csd) * the CSD_TYPE_SYNC/ASYNC types provide the destination CPU, * so waiting on other types gets much less information. */ -static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id) +static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id) { int cpu = -1; int cpux; @@ -204,7 +225,7 @@ static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 t * previous function call. For multi-cpu calls its even more interesting * as we'll have to ensure no other cpu is observing our csd. */ -static __always_inline void csd_lock_wait(call_single_data_t *csd) +static void __csd_lock_wait(call_single_data_t *csd) { int bug_id = 0; u64 ts0, ts1; @@ -218,6 +239,15 @@ static __always_inline void csd_lock_wait(call_single_data_t *csd) smp_acquire__a
[tip: locking/core] locking/csd_lock: Add boot parameter for controlling CSD lock debugging
The following commit has been merged into the locking/core branch of tip: Commit-ID: 4b816578c16b92b68fb9842dcec0bc2fdc2b36d8 Gitweb: https://git.kernel.org/tip/4b816578c16b92b68fb9842dcec0bc2fdc2b36d8 Author:Juergen Gross AuthorDate:Mon, 01 Mar 2021 11:13:34 +01:00 Committer: Ingo Molnar CommitterDate: Mon, 01 Mar 2021 14:27:58 +01:00 locking/csd_lock: Add boot parameter for controlling CSD lock debugging Currently CSD lock debugging can be switched on and off via a kernel config option only. Unfortunately there is at least one problem with CSD lock handling pending for about 2 years now, which has been seen in different environments (mostly when running virtualized under KVM or Xen, at least once on bare metal). Multiple attempts to catch this issue have finally led to introduction of CSD lock debug code, but this code is not in use in most distros as it has some impact on performance. In order to be able to ship kernels with CONFIG_CSD_LOCK_WAIT_DEBUG enabled even for production use, add a boot parameter for switching the debug functionality on. This will reduce any performance impact of the debug coding to a bare minimum when not being used. Signed-off-by: Juergen Gross [ Minor edits. ] Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20210301101336.7797-2-jgr...@suse.com --- Documentation/admin-guide/kernel-parameters.txt | 6 +++- kernel/smp.c| 38 ++-- 2 files changed, 40 insertions(+), 4 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 0454572..98dbffa 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -784,6 +784,12 @@ cs89x0_media= [HW,NET] Format: { rj45 | aui | bnc } + csdlock_debug= [KNL] Enable debug add-ons of cross-CPU function call + handling. When switched on, additional debug data is + printed to the console in case a hanging CPU is + detected, and that CPU is pinged again in order to try + to resolve the hang situation. + dasd= [HW,NET] See header of drivers/s390/block/dasd_devmap.c. diff --git a/kernel/smp.c b/kernel/smp.c index aeb0adf..d5f0b21 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -24,6 +24,7 @@ #include #include #include +#include #include "smpboot.h" #include "sched/smp.h" @@ -102,6 +103,20 @@ void __init call_function_init(void) #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG +static DEFINE_STATIC_KEY_FALSE(csdlock_debug_enabled); + +static int __init csdlock_debug(char *str) +{ + unsigned int val = 0; + + get_option(&str, &val); + if (val) + static_branch_enable(&csdlock_debug_enabled); + + return 0; +} +early_param("csdlock_debug", csdlock_debug); + static DEFINE_PER_CPU(call_single_data_t *, cur_csd); static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func); static DEFINE_PER_CPU(void *, cur_csd_info); @@ -110,7 +125,7 @@ static DEFINE_PER_CPU(void *, cur_csd_info); static atomic_t csd_bug_count = ATOMIC_INIT(0); /* Record current CSD work for current CPU, NULL to erase. */ -static void csd_lock_record(call_single_data_t *csd) +static void __csd_lock_record(call_single_data_t *csd) { if (!csd) { smp_mb(); /* NULL cur_csd after unlock. */ @@ -125,7 +140,13 @@ static void csd_lock_record(call_single_data_t *csd) /* Or before unlock, as the case may be. */ } -static __always_inline int csd_lock_wait_getcpu(call_single_data_t *csd) +static __always_inline void csd_lock_record(call_single_data_t *csd) +{ + if (static_branch_unlikely(&csdlock_debug_enabled)) + __csd_lock_record(csd); +} + +static int csd_lock_wait_getcpu(call_single_data_t *csd) { unsigned int csd_type; @@ -140,7 +161,7 @@ static __always_inline int csd_lock_wait_getcpu(call_single_data_t *csd) * the CSD_TYPE_SYNC/ASYNC types provide the destination CPU, * so waiting on other types gets much less information. */ -static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id) +static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id) { int cpu = -1; int cpux; @@ -204,7 +225,7 @@ static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 t * previous function call. For multi-cpu calls its even more interesting * as we'll have to ensure no other cpu is observing our csd. */ -static __always_inline void csd_lock_wait(call_single_data_t *csd) +static void __csd_lock_wait(call_single_data_t *csd) { int bug_id = 0; u64 ts0, ts1; @@ -218,6 +239,15 @@ static __always_inline void csd_lock_wait(call_single_data_t *csd) smp_acquire__a
[tip: locking/core] locking/csd_lock: Add more data to CSD lock debugging
The following commit has been merged into the locking/core branch of tip: Commit-ID: 6bf3195fdbab92b57f3167101a0b651b93dbeae7 Gitweb: https://git.kernel.org/tip/6bf3195fdbab92b57f3167101a0b651b93dbeae7 Author:Juergen Gross AuthorDate:Mon, 01 Mar 2021 11:13:36 +01:00 Committer: Ingo Molnar CommitterDate: Mon, 01 Mar 2021 14:27:59 +01:00 locking/csd_lock: Add more data to CSD lock debugging In order to help identifying problems with IPI handling and remote function execution add some more data to IPI debugging code. There have been multiple reports of CPUs looping long times (many seconds) in smp_call_function_many() waiting for another CPU executing a function like tlb flushing. Most of these reports have been for cases where the kernel was running as a guest on top of KVM or Xen (there are rumours of that happening under VMWare, too, and even on bare metal). Finding the root cause hasn't been successful yet, even after more than 2 years of chasing this bug by different developers. Commit: 35feb60474bf4f7 ("kernel/smp: Provide CSD lock timeout diagnostics") tried to address this by adding some debug code and by issuing another IPI when a hang was detected. This helped mitigating the problem (the repeated IPI unlocks the hang), but the root cause is still unknown. Current available data suggests that either an IPI wasn't sent when it should have been, or that the IPI didn't result in the target CPU executing the queued function (due to the IPI not reaching the CPU, the IPI handler not being called, or the handler not seeing the queued request). Try to add more diagnostic data by introducing a global atomic counter which is being incremented when doing critical operations (before and after queueing a new request, when sending an IPI, and when dequeueing a request). The counter value is stored in percpu variables which can be printed out when a hang is detected. The data of the last event (consisting of sequence counter, source CPU, target CPU, and event type) is stored in a global variable. When a new event is to be traced, the data of the last event is stored in the event related percpu location and the global data is updated with the new event's data. This allows to track two events in one data location: one by the value of the event data (the event before the current one), and one by the location itself (the current event). A typical printout with a detected hang will look like this: csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 53 ns for CPU#06 scf_handler_1+0x0/0x50(0xa2a881bb1410). csd: CSD lock (#1) handling prior scf_handler_1+0x0/0x50(0xa2a8813823c0) request. csd: cnt(8cc): -> dequeue (src cpu 0 == empty) csd: cnt(8cd): ->0006 idle csd: cnt(0003668): 0001->0006 queue csd: cnt(0003669): 0001->0006 ipi csd: cnt(0003e0f): 0007->000a queue csd: cnt(0003e10): 0001-> ping csd: cnt(0003e71): 0003-> ping csd: cnt(0003e72): ->0006 gotipi csd: cnt(0003e73): ->0006 handle csd: cnt(0003e74): ->0006 dequeue (src cpu 0 == empty) csd: cnt(0003e7f): 0004->0006 ping csd: cnt(0003e80): 0001-> pinged csd: cnt(0003eb2): 0005->0001 noipi csd: cnt(0003eb3): 0001->0006 queue csd: cnt(0003eb4): 0001->0006 noipi csd: cnt now: 0003f00 The idea is to print only relevant entries. Those are all events which are associated with the hang (so sender side events for the source CPU of the hanging request, and receiver side events for the target CPU), and the related events just before those (for adding data needed to identify a possible race). Printing all available data would be possible, but this would add large amounts of data printed on larger configurations. Signed-off-by: Juergen Gross [ Minor readability edits. Breaks col80 but is far more readable. ] Signed-off-by: Ingo Molnar Tested-by: Paul E. McKenney Link: https://lore.kernel.org/r/20210301101336.7797-4-jgr...@suse.com --- Documentation/admin-guide/kernel-parameters.txt | 4 +- kernel/smp.c| 226 ++- 2 files changed, 226 insertions(+), 4 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 98dbffa..1fe9d38 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -789,6 +789,10 @@ printed to the console in case a hanging CPU is detected, and that CPU is pinged again in order to try to resolve the hang situation. + 0: disable csdlock debugging (default) + 1: enable basic csdlock debugging (minor impact) + ext: enable extended csdlock debugging (more impact, +but mor
[tip: locking/core] locking/csd_lock: Prepare more CSD lock debugging
The following commit has been merged into the locking/core branch of tip: Commit-ID: b3e3bc34b1e938c6447fa8b646010c4016be7fad Gitweb: https://git.kernel.org/tip/b3e3bc34b1e938c6447fa8b646010c4016be7fad Author:Juergen Gross AuthorDate:Mon, 01 Mar 2021 11:13:35 +01:00 Committer: Ingo Molnar CommitterDate: Mon, 01 Mar 2021 14:27:58 +01:00 locking/csd_lock: Prepare more CSD lock debugging In order to be able to easily add more CSD lock debugging data to struct call_function_data->csd move the call_single_data_t element into a sub-structure. Signed-off-by: Juergen Gross Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20210301101336.7797-3-jgr...@suse.com --- kernel/smp.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/kernel/smp.c b/kernel/smp.c index d5f0b21..6d7e6db 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -31,8 +31,12 @@ #define CSD_TYPE(_csd) ((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK) +struct cfd_percpu { + call_single_data_t csd; +}; + struct call_function_data { - call_single_data_t __percpu *csd; + struct cfd_percpu __percpu *pcpu; cpumask_var_t cpumask; cpumask_var_t cpumask_ipi; }; @@ -55,8 +59,8 @@ int smpcfd_prepare_cpu(unsigned int cpu) free_cpumask_var(cfd->cpumask); return -ENOMEM; } - cfd->csd = alloc_percpu(call_single_data_t); - if (!cfd->csd) { + cfd->pcpu = alloc_percpu(struct cfd_percpu); + if (!cfd->pcpu) { free_cpumask_var(cfd->cpumask); free_cpumask_var(cfd->cpumask_ipi); return -ENOMEM; @@ -71,7 +75,7 @@ int smpcfd_dead_cpu(unsigned int cpu) free_cpumask_var(cfd->cpumask); free_cpumask_var(cfd->cpumask_ipi); - free_percpu(cfd->csd); + free_percpu(cfd->pcpu); return 0; } @@ -694,7 +698,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask, cpumask_clear(cfd->cpumask_ipi); for_each_cpu(cpu, cfd->cpumask) { - call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu); + call_single_data_t *csd = &per_cpu_ptr(cfd->pcpu, cpu)->csd; if (cond_func && !cond_func(cpu, info)) continue; @@ -719,7 +723,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask, for_each_cpu(cpu, cfd->cpumask) { call_single_data_t *csd; - csd = per_cpu_ptr(cfd->csd, cpu); + csd = &per_cpu_ptr(cfd->pcpu, cpu)->csd; csd_lock_wait(csd); } }
[tip: x86/paravirt] x86/xen: Drop USERGS_SYSRET64 paravirt call
The following commit has been merged into the x86/paravirt branch of tip: Commit-ID: afd30525a659ac0ae0904f0cb4a2ca75522c3123 Gitweb: https://git.kernel.org/tip/afd30525a659ac0ae0904f0cb4a2ca75522c3123 Author:Juergen Gross AuthorDate:Wed, 20 Jan 2021 14:55:45 +01:00 Committer: Borislav Petkov CommitterDate: Wed, 10 Feb 2021 12:32:07 +01:00 x86/xen: Drop USERGS_SYSRET64 paravirt call USERGS_SYSRET64 is used to return from a syscall via SYSRET, but a Xen PV guest will nevertheless use the IRET hypercall, as there is no sysret PV hypercall defined. So instead of testing all the prerequisites for doing a sysret and then mangling the stack for Xen PV again for doing an iret just use the iret exit from the beginning. This can easily be done via an ALTERNATIVE like it is done for the sysenter compat case already. It should be noted that this drops the optimization in Xen for not restoring a few registers when returning to user mode, but it seems as if the saved instructions in the kernel more than compensate for this drop (a kernel build in a Xen PV guest was slightly faster with this patch applied). While at it remove the stale sysret32 remnants. Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Link: https://lkml.kernel.org/r/2021012013.32594-6-jgr...@suse.com --- arch/x86/entry/entry_64.S | 16 +++- arch/x86/include/asm/irqflags.h | 6 -- arch/x86/include/asm/paravirt.h | 5 - arch/x86/include/asm/paravirt_types.h | 8 arch/x86/kernel/asm-offsets_64.c | 2 -- arch/x86/kernel/paravirt.c| 5 + arch/x86/kernel/paravirt_patch.c | 4 arch/x86/xen/enlighten_pv.c | 1 - arch/x86/xen/xen-asm.S| 20 arch/x86/xen/xen-ops.h| 2 -- 10 files changed, 8 insertions(+), 61 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index a876204..ce0464d 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -46,14 +46,6 @@ .code64 .section .entry.text, "ax" -#ifdef CONFIG_PARAVIRT_XXL -SYM_CODE_START(native_usergs_sysret64) - UNWIND_HINT_EMPTY - swapgs - sysretq -SYM_CODE_END(native_usergs_sysret64) -#endif /* CONFIG_PARAVIRT_XXL */ - /* * 64-bit SYSCALL instruction entry. Up to 6 arguments in registers. * @@ -123,7 +115,12 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL) * Try to use SYSRET instead of IRET if we're returning to * a completely clean 64-bit userspace context. If we're not, * go to the slow exit path. +* In the Xen PV case we must use iret anyway. */ + + ALTERNATIVE "", "jmpswapgs_restore_regs_and_return_to_usermode", \ + X86_FEATURE_XENPV + movqRCX(%rsp), %rcx movqRIP(%rsp), %r11 @@ -215,7 +212,8 @@ syscall_return_via_sysret: popq%rdi popq%rsp - USERGS_SYSRET64 + swapgs + sysretq SYM_CODE_END(entry_SYSCALL_64) /* diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h index 8c86ede..e585a47 100644 --- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -132,12 +132,6 @@ static __always_inline unsigned long arch_local_irq_save(void) #endif #define INTERRUPT_RETURN jmp native_iret -#define USERGS_SYSRET64\ - swapgs; \ - sysretq; -#define USERGS_SYSRET32\ - swapgs; \ - sysretl #else #define INTERRUPT_RETURN iret diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index f2ebe10..dd43b11 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -776,11 +776,6 @@ extern void default_banner(void); #ifdef CONFIG_X86_64 #ifdef CONFIG_PARAVIRT_XXL -#define USERGS_SYSRET64 \ - PARA_SITE(PARA_PATCH(PV_CPU_usergs_sysret64), \ - ANNOTATE_RETPOLINE_SAFE; \ - jmp PARA_INDIRECT(pv_ops+PV_CPU_usergs_sysret64);) - #ifdef CONFIG_DEBUG_ENTRY #define SAVE_FLAGS(clobbers)\ PARA_SITE(PARA_PATCH(PV_IRQ_save_fl), \ diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 130f428..0169365 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -156,14 +156,6 @@ struct pv_cpu_ops { u64 (*read_pmc)(int counter); - /* -* Switch to usermode gs and return to 64-bit usermode using -* sysret. Only used in 64-bit kernels to return to 64-bit -* processes. Usermode register state, including %rsp, must -
[tip: x86/paravirt] x86/pv: Switch SWAPGS to ALTERNATIVE
The following commit has been merged into the x86/paravirt branch of tip: Commit-ID: 53c9d9240944088274aadbbbafc6138ca462db4f Gitweb: https://git.kernel.org/tip/53c9d9240944088274aadbbbafc6138ca462db4f Author:Juergen Gross AuthorDate:Wed, 20 Jan 2021 14:55:44 +01:00 Committer: Borislav Petkov CommitterDate: Wed, 10 Feb 2021 12:25:49 +01:00 x86/pv: Switch SWAPGS to ALTERNATIVE SWAPGS is used only for interrupts coming from user mode or for returning to user mode. So there is no reason to use the PARAVIRT framework, as it can easily be replaced by an ALTERNATIVE depending on X86_FEATURE_XENPV. There are several instances using the PV-aware SWAPGS macro in paths which are never executed in a Xen PV guest. Replace those with the plain swapgs instruction. For SWAPGS_UNSAFE_STACK the same applies. Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Reviewed-by: Borislav Petkov Reviewed-by: Thomas Gleixner Acked-by: Andy Lutomirski Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/2021012013.32594-5-jgr...@suse.com --- arch/x86/entry/entry_64.S | 10 +- arch/x86/include/asm/irqflags.h | 20 arch/x86/include/asm/paravirt.h | 20 arch/x86/include/asm/paravirt_types.h | 2 -- arch/x86/kernel/asm-offsets_64.c | 1 - arch/x86/kernel/paravirt.c| 1 - arch/x86/kernel/paravirt_patch.c | 3 --- arch/x86/xen/enlighten_pv.c | 3 --- 8 files changed, 13 insertions(+), 47 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index cad0870..a876204 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -669,7 +669,7 @@ native_irq_return_ldt: */ pushq %rdi/* Stash user RDI */ - SWAPGS /* to kernel GS */ + swapgs /* to kernel GS */ SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi /* to kernel CR3 */ movqPER_CPU_VAR(espfix_waddr), %rdi @@ -699,7 +699,7 @@ native_irq_return_ldt: orq PER_CPU_VAR(espfix_stack), %rax SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi - SWAPGS /* to user GS */ + swapgs /* to user GS */ popq%rdi/* Restore user RDI */ movq%rax, %rsp @@ -943,7 +943,7 @@ SYM_CODE_START_LOCAL(paranoid_entry) ret .Lparanoid_entry_swapgs: - SWAPGS + swapgs /* * The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an @@ -1001,7 +1001,7 @@ SYM_CODE_START_LOCAL(paranoid_exit) jnz restore_regs_and_return_to_kernel /* We are returning to a context with user GSBASE */ - SWAPGS_UNSAFE_STACK + swapgs jmp restore_regs_and_return_to_kernel SYM_CODE_END(paranoid_exit) @@ -1426,7 +1426,7 @@ nmi_no_fsgsbase: jnz nmi_restore nmi_swapgs: - SWAPGS_UNSAFE_STACK + swapgs nmi_restore: POP_REGS diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h index 2dfc8d3..8c86ede 100644 --- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -131,18 +131,6 @@ static __always_inline unsigned long arch_local_irq_save(void) #define SAVE_FLAGS(x) pushfq; popq %rax #endif -#define SWAPGS swapgs -/* - * Currently paravirt can't handle swapgs nicely when we - * don't have a stack we can rely on (such as a user space - * stack). So we either find a way around these or just fault - * and emulate if a guest tries to call swapgs directly. - * - * Either way, this is a good way to document that we don't - * have a reliable stack. x86_64 only. - */ -#define SWAPGS_UNSAFE_STACKswapgs - #define INTERRUPT_RETURN jmp native_iret #define USERGS_SYSRET64\ swapgs; \ @@ -170,6 +158,14 @@ static __always_inline int arch_irqs_disabled(void) return arch_irqs_disabled_flags(flags); } +#else +#ifdef CONFIG_X86_64 +#ifdef CONFIG_XEN_PV +#define SWAPGS ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV +#else +#define SWAPGS swapgs +#endif +#endif #endif /* !__ASSEMBLY__ */ #endif diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index f8dce11..f2ebe10 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -776,26 +776,6 @@ extern void default_banner(void); #ifdef CONFIG_X86_64 #ifdef CONFIG_PARAVIRT_XXL -/* - * If swapgs is used while the userspace stack is still current, - * there's no way to call a pvop. The PV replacement *must* be - * inlined, or the swapgs instruction must be trapped and emulated. - */ -#define SWAPGS_UNSAFE_STACK\ - PARA_SITE(PARA_PATCH(PV_CPU_sw
[tip: x86/paravirt] x86/xen: Use specific Xen pv interrupt entry for MCE
The following commit has been merged into the x86/paravirt branch of tip: Commit-ID: c3d7fa6684b5b3a07a48fc379d27bfb8a96661d9 Gitweb: https://git.kernel.org/tip/c3d7fa6684b5b3a07a48fc379d27bfb8a96661d9 Author:Juergen Gross AuthorDate:Wed, 20 Jan 2021 14:55:42 +01:00 Committer: Borislav Petkov CommitterDate: Wed, 10 Feb 2021 12:07:10 +01:00 x86/xen: Use specific Xen pv interrupt entry for MCE Xen PV guests don't use IST. For machine check interrupts, switch to the same model as debug interrupts. Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Reviewed-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/2021012013.32594-3-jgr...@suse.com --- arch/x86/include/asm/idtentry.h | 3 +++ arch/x86/xen/enlighten_pv.c | 16 +++- arch/x86/xen/xen-asm.S | 2 +- 3 files changed, 19 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h index f656aab..616909e 100644 --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -585,6 +585,9 @@ DECLARE_IDTENTRY_MCE(X86_TRAP_MC, exc_machine_check); #else DECLARE_IDTENTRY_RAW(X86_TRAP_MC, exc_machine_check); #endif +#ifdef CONFIG_XEN_PV +DECLARE_IDTENTRY_RAW(X86_TRAP_MC, xenpv_exc_machine_check); +#endif #endif /* NMI */ diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 9a5a50c..9db1d31 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -590,6 +590,20 @@ DEFINE_IDTENTRY_RAW(exc_xen_unknown_trap) BUG(); } +#ifdef CONFIG_X86_MCE +DEFINE_IDTENTRY_RAW(xenpv_exc_machine_check) +{ + /* +* There's no IST on Xen PV, but we still need to dispatch +* to the correct handler. +*/ + if (user_mode(regs)) + noist_exc_machine_check(regs); + else + exc_machine_check(regs); +} +#endif + struct trap_array_entry { void (*orig)(void); void (*xen)(void); @@ -610,7 +624,7 @@ static struct trap_array_entry trap_array[] = { TRAP_ENTRY_REDIR(exc_debug, true ), TRAP_ENTRY(exc_double_fault,true ), #ifdef CONFIG_X86_MCE - TRAP_ENTRY(exc_machine_check, true ), + TRAP_ENTRY_REDIR(exc_machine_check, true ), #endif TRAP_ENTRY_REDIR(exc_nmi, true ), TRAP_ENTRY(exc_int3,false ), diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S index 53cf8aa..cd330ce 100644 --- a/arch/x86/xen/xen-asm.S +++ b/arch/x86/xen/xen-asm.S @@ -172,7 +172,7 @@ xen_pv_trap asm_exc_spurious_interrupt_bug xen_pv_trap asm_exc_coprocessor_error xen_pv_trap asm_exc_alignment_check #ifdef CONFIG_X86_MCE -xen_pv_trap asm_exc_machine_check +xen_pv_trap asm_xenpv_exc_machine_check #endif /* CONFIG_X86_MCE */ xen_pv_trap asm_exc_simd_coprocessor_error #ifdef CONFIG_IA32_EMULATION
[tip: x86/paravirt] x86/xen: Use specific Xen pv interrupt entry for DF
The following commit has been merged into the x86/paravirt branch of tip: Commit-ID: 5b4c6d65019bff65757f61adbbad5e45a333b800 Gitweb: https://git.kernel.org/tip/5b4c6d65019bff65757f61adbbad5e45a333b800 Author:Juergen Gross AuthorDate:Wed, 20 Jan 2021 14:55:43 +01:00 Committer: Borislav Petkov CommitterDate: Wed, 10 Feb 2021 12:13:40 +01:00 x86/xen: Use specific Xen pv interrupt entry for DF Xen PV guests don't use IST. For double fault interrupts, switch to the same model as NMI. Correct a typo in a comment while copying it. Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Reviewed-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/2021012013.32594-4-jgr...@suse.com --- arch/x86/include/asm/idtentry.h | 3 +++ arch/x86/xen/enlighten_pv.c | 10 -- arch/x86/xen/xen-asm.S | 2 +- 3 files changed, 12 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h index 616909e..41e2e2e 100644 --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -608,6 +608,9 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_DB, xenpv_exc_debug); /* #DF */ DECLARE_IDTENTRY_DF(X86_TRAP_DF, exc_double_fault); +#ifdef CONFIG_XEN_PV +DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF,xenpv_exc_double_fault); +#endif /* #VC */ #ifdef CONFIG_AMD_MEM_ENCRYPT diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 9db1d31..1fec2ee 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -567,10 +567,16 @@ void noist_exc_debug(struct pt_regs *regs); DEFINE_IDTENTRY_RAW(xenpv_exc_nmi) { - /* On Xen PV, NMI doesn't use IST. The C part is the sane as native. */ + /* On Xen PV, NMI doesn't use IST. The C part is the same as native. */ exc_nmi(regs); } +DEFINE_IDTENTRY_RAW_ERRORCODE(xenpv_exc_double_fault) +{ + /* On Xen PV, DF doesn't use IST. The C part is the same as native. */ + exc_double_fault(regs, error_code); +} + DEFINE_IDTENTRY_RAW(xenpv_exc_debug) { /* @@ -622,7 +628,7 @@ struct trap_array_entry { static struct trap_array_entry trap_array[] = { TRAP_ENTRY_REDIR(exc_debug, true ), - TRAP_ENTRY(exc_double_fault,true ), + TRAP_ENTRY_REDIR(exc_double_fault, true ), #ifdef CONFIG_X86_MCE TRAP_ENTRY_REDIR(exc_machine_check, true ), #endif diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S index cd330ce..eac9dac 100644 --- a/arch/x86/xen/xen-asm.S +++ b/arch/x86/xen/xen-asm.S @@ -161,7 +161,7 @@ xen_pv_trap asm_exc_overflow xen_pv_trap asm_exc_bounds xen_pv_trap asm_exc_invalid_op xen_pv_trap asm_exc_device_not_available -xen_pv_trap asm_exc_double_fault +xen_pv_trap asm_xenpv_exc_double_fault xen_pv_trap asm_exc_coproc_segment_overrun xen_pv_trap asm_exc_invalid_tss xen_pv_trap asm_exc_segment_not_present
[tip: x86/paravirt] x86/pv: Rework arch_local_irq_restore() to not use popf
The following commit has been merged into the x86/paravirt branch of tip: Commit-ID: ab234a260b1f625b26cbefa93ca365b0ae66df33 Gitweb: https://git.kernel.org/tip/ab234a260b1f625b26cbefa93ca365b0ae66df33 Author:Juergen Gross AuthorDate:Wed, 20 Jan 2021 14:55:46 +01:00 Committer: Borislav Petkov CommitterDate: Wed, 10 Feb 2021 12:36:45 +01:00 x86/pv: Rework arch_local_irq_restore() to not use popf POPF is a rather expensive operation, so don't use it for restoring irq flags. Instead, test whether interrupts are enabled in the flags parameter and enable interrupts via STI in that case. This results in the restore_fl paravirt op to be no longer needed. Suggested-by: Andy Lutomirski Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Link: https://lkml.kernel.org/r/2021012013.32594-7-jgr...@suse.com --- arch/x86/include/asm/irqflags.h | 20 +-- arch/x86/include/asm/paravirt.h | 5 +- arch/x86/include/asm/paravirt_types.h | 7 ++- arch/x86/kernel/irqflags.S| 11 +-- arch/x86/kernel/paravirt.c| 1 +- arch/x86/kernel/paravirt_patch.c | 3 +--- arch/x86/xen/enlighten_pv.c | 2 +-- arch/x86/xen/irq.c| 23 +- arch/x86/xen/xen-asm.S| 28 +-- arch/x86/xen/xen-ops.h| 1 +- 10 files changed, 8 insertions(+), 93 deletions(-) diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h index e585a47..144d70e 100644 --- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -35,15 +35,6 @@ extern __always_inline unsigned long native_save_fl(void) return flags; } -extern inline void native_restore_fl(unsigned long flags); -extern inline void native_restore_fl(unsigned long flags) -{ - asm volatile("push %0 ; popf" -: /* no output */ -:"g" (flags) -:"memory", "cc"); -} - static __always_inline void native_irq_disable(void) { asm volatile("cli": : :"memory"); @@ -79,11 +70,6 @@ static __always_inline unsigned long arch_local_save_flags(void) return native_save_fl(); } -static __always_inline void arch_local_irq_restore(unsigned long flags) -{ - native_restore_fl(flags); -} - static __always_inline void arch_local_irq_disable(void) { native_irq_disable(); @@ -152,6 +138,12 @@ static __always_inline int arch_irqs_disabled(void) return arch_irqs_disabled_flags(flags); } + +static __always_inline void arch_local_irq_restore(unsigned long flags) +{ + if (!arch_irqs_disabled_flags(flags)) + arch_local_irq_enable(); +} #else #ifdef CONFIG_X86_64 #ifdef CONFIG_XEN_PV diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index dd43b11..4abf110 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -648,11 +648,6 @@ static inline notrace unsigned long arch_local_save_flags(void) return PVOP_CALLEE0(unsigned long, irq.save_fl); } -static inline notrace void arch_local_irq_restore(unsigned long f) -{ - PVOP_VCALLEE1(irq.restore_fl, f); -} - static inline notrace void arch_local_irq_disable(void) { PVOP_VCALLEE0(irq.irq_disable); diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 0169365..de87087 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -168,16 +168,13 @@ struct pv_cpu_ops { struct pv_irq_ops { #ifdef CONFIG_PARAVIRT_XXL /* -* Get/set interrupt state. save_fl and restore_fl are only -* expected to use X86_EFLAGS_IF; all other bits -* returned from save_fl are undefined, and may be ignored by -* restore_fl. +* Get/set interrupt state. save_fl is expected to use X86_EFLAGS_IF; +* all other bits returned from save_fl are undefined. * * NOTE: These functions callers expect the callee to preserve * more registers than the standard C calling convention. */ struct paravirt_callee_save save_fl; - struct paravirt_callee_save restore_fl; struct paravirt_callee_save irq_disable; struct paravirt_callee_save irq_enable; diff --git a/arch/x86/kernel/irqflags.S b/arch/x86/kernel/irqflags.S index 0db0375..8ef3506 100644 --- a/arch/x86/kernel/irqflags.S +++ b/arch/x86/kernel/irqflags.S @@ -13,14 +13,3 @@ SYM_FUNC_START(native_save_fl) ret SYM_FUNC_END(native_save_fl) EXPORT_SYMBOL(native_save_fl) - -/* - * void native_restore_fl(unsigned long flags) - * %eax/%rdi: flags - */ -SYM_FUNC_START(native_restore_fl) - push %_ASM_ARG1 - popf - ret -SYM_FUNC_END(native_restore_fl) -EXPORT_SYMBOL(native_restore_fl) diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index 18560b7..c60222a 100644 -
[tip: x86/urgent] x86/alternative: Don't call text_poke() in lazy TLB mode
The following commit has been merged into the x86/urgent branch of tip: Commit-ID: abee7c494d8c41bb388839bccc47e06247f0d7de Gitweb: https://git.kernel.org/tip/abee7c494d8c41bb388839bccc47e06247f0d7de Author:Juergen Gross AuthorDate:Fri, 09 Oct 2020 16:42:25 +02:00 Committer: Peter Zijlstra CommitterDate: Thu, 22 Oct 2020 12:37:23 +02:00 x86/alternative: Don't call text_poke() in lazy TLB mode When running in lazy TLB mode the currently active page tables might be the ones of a previous process, e.g. when running a kernel thread. This can be problematic in case kernel code is being modified via text_poke() in a kernel thread, and on another processor exit_mmap() is active for the process which was running on the first cpu before the kernel thread. As text_poke() is using a temporary address space and the former address space (obtained via cpu_tlbstate.loaded_mm) is restored afterwards, there is a race possible in case the cpu on which exit_mmap() is running wants to make sure there are no stale references to that address space on any cpu active (this e.g. is required when running as a Xen PV guest, where this problem has been observed and analyzed). In order to avoid that, drop off TLB lazy mode before switching to the temporary address space. Fixes: cefa929c034eb5d ("x86/mm: Introduce temporary mm structs") Signed-off-by: Juergen Gross Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20201009144225.12019-1-jgr...@suse.com --- arch/x86/kernel/alternative.c | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index cdaab30..cd6be6f 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -807,6 +807,15 @@ static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm) temp_mm_state_t temp_state; lockdep_assert_irqs_disabled(); + + /* +* Make sure not to be in TLB lazy mode, as otherwise we'll end up +* with a stale address space WITHOUT being in lazy mode after +* restoring the previous mm. +*/ + if (this_cpu_read(cpu_tlbstate.is_lazy)) + leave_mm(smp_processor_id()); + temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm); switch_mm_irqs_off(NULL, mm, current);
[tip: x86/paravirt] x86/paravirt: Use CONFIG_PARAVIRT_XXL instead of CONFIG_PARAVIRT
The following commit has been merged into the x86/paravirt branch of tip: Commit-ID: ecac71816a1829c0e54c41c5f1845f75b55dc618 Gitweb: https://git.kernel.org/tip/ecac71816a1829c0e54c41c5f1845f75b55dc618 Author:Juergen Gross AuthorDate:Sat, 15 Aug 2020 12:06:38 +02:00 Committer: Ingo Molnar CommitterDate: Sat, 15 Aug 2020 13:52:11 +02:00 x86/paravirt: Use CONFIG_PARAVIRT_XXL instead of CONFIG_PARAVIRT There are some code parts using CONFIG_PARAVIRT for Xen pvops related issues instead of the more stringent CONFIG_PARAVIRT_XXL. Signed-off-by: Juergen Gross Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20200815100641.26362-4-jgr...@suse.com --- arch/x86/entry/entry_64.S| 4 ++-- arch/x86/include/asm/fixmap.h| 2 +- arch/x86/include/asm/required-features.h | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 70dea93..26fc9b4 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -46,13 +46,13 @@ .code64 .section .entry.text, "ax" -#ifdef CONFIG_PARAVIRT +#ifdef CONFIG_PARAVIRT_XXL SYM_CODE_START(native_usergs_sysret64) UNWIND_HINT_EMPTY swapgs sysretq SYM_CODE_END(native_usergs_sysret64) -#endif /* CONFIG_PARAVIRT */ +#endif /* CONFIG_PARAVIRT_XXL */ /* * 64-bit SYSCALL instruction entry. Up to 6 arguments in registers. diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index 0f0dd64..77217bd 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -99,7 +99,7 @@ enum fixed_addresses { FIX_PCIE_MCFG, #endif #endif -#ifdef CONFIG_PARAVIRT +#ifdef CONFIG_PARAVIRT_XXL FIX_PARAVIRT_BOOTMAP, #endif #ifdef CONFIG_X86_INTEL_MID diff --git a/arch/x86/include/asm/required-features.h b/arch/x86/include/asm/required-features.h index 6847d85..3ff0d48 100644 --- a/arch/x86/include/asm/required-features.h +++ b/arch/x86/include/asm/required-features.h @@ -54,7 +54,7 @@ #endif #ifdef CONFIG_X86_64 -#ifdef CONFIG_PARAVIRT +#ifdef CONFIG_PARAVIRT_XXL /* Paravirtualized systems may not have PSE or PGE available */ #define NEED_PSE 0 #define NEED_PGE 0
[tip: x86/paravirt] x86/paravirt: Remove set_pte_at() pv-op
The following commit has been merged into the x86/paravirt branch of tip: Commit-ID: e1ac3e66d301e57472f31ebee81b916e9fa8b35b Gitweb: https://git.kernel.org/tip/e1ac3e66d301e57472f31ebee81b916e9fa8b35b Author:Juergen Gross AuthorDate:Sat, 15 Aug 2020 12:06:40 +02:00 Committer: Ingo Molnar CommitterDate: Sat, 15 Aug 2020 13:52:12 +02:00 x86/paravirt: Remove set_pte_at() pv-op On x86 set_pte_at() is now always falling back to set_pte(). So instead of having this fallback after the paravirt maze just drop the set_pte_at paravirt operation and let set_pte_at() use the set_pte() function directly. Signed-off-by: Juergen Gross Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20200815100641.26362-6-jgr...@suse.com --- arch/x86/include/asm/paravirt.h | 8 +--- arch/x86/include/asm/paravirt_types.h | 2 -- arch/x86/include/asm/pgtable.h| 7 +++ arch/x86/kernel/paravirt.c| 1 - arch/x86/xen/mmu_pv.c | 8 include/trace/events/xen.h| 20 6 files changed, 4 insertions(+), 42 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index e02c409..f0464b8 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -412,12 +412,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte) PVOP_VCALL2(mmu.set_pte, ptep, pte.pte); } -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t pte) -{ - PVOP_VCALL4(mmu.set_pte_at, mm, addr, ptep, pte.pte); -} - static inline void set_pmd(pmd_t *pmdp, pmd_t pmd) { PVOP_VCALL2(mmu.set_pmd, pmdp, native_pmd_val(pmd)); @@ -510,7 +504,7 @@ static inline void set_pte_atomic(pte_t *ptep, pte_t pte) static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - set_pte_at(mm, addr, ptep, __pte(0)); + set_pte(ptep, __pte(0)); } static inline void pmd_clear(pmd_t *pmdp) diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index f27c3fe..0fad9f6 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -242,8 +242,6 @@ struct pv_mmu_ops { /* Pagetable manipulation functions */ void (*set_pte)(pte_t *ptep, pte_t pteval); - void (*set_pte_at)(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t pteval); void (*set_pmd)(pmd_t *pmdp, pmd_t pmdval); pte_t (*ptep_modify_prot_start)(struct vm_area_struct *vma, unsigned long addr, diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index b836138..5e0dcc2 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -63,7 +63,6 @@ extern pmdval_t early_pmd_flags; #include #else /* !CONFIG_PARAVIRT_XXL */ #define set_pte(ptep, pte) native_set_pte(ptep, pte) -#define set_pte_at(mm, addr, ptep, pte)native_set_pte_at(mm, addr, ptep, pte) #define set_pte_atomic(ptep, pte) \ native_set_pte_atomic(ptep, pte) @@ -1033,10 +1032,10 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp) return res; } -static inline void native_set_pte_at(struct mm_struct *mm, unsigned long addr, -pte_t *ptep , pte_t pte) +static inline void set_pte_at(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) { - native_set_pte(ptep, pte); + set_pte(ptep, pte); } static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr, diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index e56a144..6c3407b 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -360,7 +360,6 @@ struct paravirt_patch_template pv_ops = { .mmu.release_p4d= paravirt_nop, .mmu.set_pte= native_set_pte, - .mmu.set_pte_at = native_set_pte_at, .mmu.set_pmd= native_set_pmd, .mmu.ptep_modify_prot_start = __ptep_modify_prot_start, diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c index 3273c98..eda7814 100644 --- a/arch/x86/xen/mmu_pv.c +++ b/arch/x86/xen/mmu_pv.c @@ -285,13 +285,6 @@ static void xen_set_pte(pte_t *ptep, pte_t pteval) __xen_set_pte(ptep, pteval); } -static void xen_set_pte_at(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t pteval) -{ - trace_xen_mmu_set_pte_at(mm, addr, ptep, pteval); - __xen_set_pte(ptep, pteval); -} - pte_t xen_ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { @@ -2105,7 +2098,6 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = { .release_pmd = xen_relea
[tip: x86/paravirt] x86/entry/32: Simplify CONFIG_XEN_PV build dependency
The following commit has been merged into the x86/paravirt branch of tip: Commit-ID: 76fdb041c1f02311e6e05211c895e932af08b041 Gitweb: https://git.kernel.org/tip/76fdb041c1f02311e6e05211c895e932af08b041 Author:Juergen Gross AuthorDate:Sat, 15 Aug 2020 12:06:39 +02:00 Committer: Ingo Molnar CommitterDate: Sat, 15 Aug 2020 13:52:12 +02:00 x86/entry/32: Simplify CONFIG_XEN_PV build dependency With 32-bit Xen PV support gone, the following commit is not needed anymore: a4c0e91d1d65bc58 ("x86/entry/32: Fix XEN_PV build dependency") Signed-off-by: Juergen Gross Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20200815100641.26362-5-jgr...@suse.com --- arch/x86/include/asm/idtentry.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h index a433661..337dcfd 100644 --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -547,7 +547,7 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC, exc_machine_check); /* NMI */ DECLARE_IDTENTRY_NMI(X86_TRAP_NMI, exc_nmi); -#if defined(CONFIG_XEN_PV) && defined(CONFIG_X86_64) +#ifdef CONFIG_XEN_PV DECLARE_IDTENTRY_RAW(X86_TRAP_NMI, xenpv_exc_nmi); #endif @@ -557,7 +557,7 @@ DECLARE_IDTENTRY_DEBUG(X86_TRAP_DB, exc_debug); #else DECLARE_IDTENTRY_RAW(X86_TRAP_DB, exc_debug); #endif -#if defined(CONFIG_XEN_PV) && defined(CONFIG_X86_64) +#ifdef CONFIG_XEN_PV DECLARE_IDTENTRY_RAW(X86_TRAP_DB, xenpv_exc_debug); #endif
[tip: x86/paravirt] x86/paravirt: Remove 32-bit support from CONFIG_PARAVIRT_XXL
The following commit has been merged into the x86/paravirt branch of tip: Commit-ID: 0cabf9914990dc59a7e1793ef2fb294d578dc210 Gitweb: https://git.kernel.org/tip/0cabf9914990dc59a7e1793ef2fb294d578dc210 Author:Juergen Gross AuthorDate:Sat, 15 Aug 2020 12:06:36 +02:00 Committer: Ingo Molnar CommitterDate: Sat, 15 Aug 2020 13:52:11 +02:00 x86/paravirt: Remove 32-bit support from CONFIG_PARAVIRT_XXL The last 32-bit user of stuff under CONFIG_PARAVIRT_XXL is gone. Remove 32-bit specific parts. Signed-off-by: Juergen Gross Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20200815100641.26362-2-jgr...@suse.com --- arch/x86/entry/vdso/vdso32/vclock_gettime.c | 1 +- arch/x86/include/asm/paravirt.h | 120 +-- arch/x86/include/asm/paravirt_types.h | 21 +--- arch/x86/include/asm/pgtable-3level_types.h | 5 +- arch/x86/include/asm/segment.h | 4 +- arch/x86/kernel/cpu/common.c| 8 +- arch/x86/kernel/kprobes/core.c | 1 +- arch/x86/kernel/kprobes/opt.c | 1 +- arch/x86/kernel/paravirt.c | 18 +--- arch/x86/kernel/paravirt_patch.c| 17 +--- arch/x86/xen/enlighten_pv.c | 6 +- 11 files changed, 13 insertions(+), 189 deletions(-) diff --git a/arch/x86/entry/vdso/vdso32/vclock_gettime.c b/arch/x86/entry/vdso/vdso32/vclock_gettime.c index 84a4a73..283ed9d 100644 --- a/arch/x86/entry/vdso/vdso32/vclock_gettime.c +++ b/arch/x86/entry/vdso/vdso32/vclock_gettime.c @@ -14,6 +14,7 @@ #undef CONFIG_ILLEGAL_POINTER_VALUE #undef CONFIG_SPARSEMEM_VMEMMAP #undef CONFIG_NR_CPUS +#undef CONFIG_PARAVIRT_XXL #define CONFIG_X86_32 1 #define CONFIG_PGTABLE_LEVELS 2 diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 3d2afec..25c7a73 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -160,8 +160,6 @@ static inline void wbinvd(void) PVOP_VCALL0(cpu.wbinvd); } -#define get_kernel_rpl() (pv_info.kernel_rpl) - static inline u64 paravirt_read_msr(unsigned msr) { return PVOP_CALL1(u64, cpu.read_msr, msr); @@ -277,12 +275,10 @@ static inline void load_TLS(struct thread_struct *t, unsigned cpu) PVOP_VCALL2(cpu.load_tls, t, cpu); } -#ifdef CONFIG_X86_64 static inline void load_gs_index(unsigned int gs) { PVOP_VCALL1(cpu.load_gs_index, gs); } -#endif static inline void write_ldt_entry(struct desc_struct *dt, int entry, const void *desc) @@ -375,52 +371,22 @@ static inline void paravirt_release_p4d(unsigned long pfn) static inline pte_t __pte(pteval_t val) { - pteval_t ret; - - if (sizeof(pteval_t) > sizeof(long)) - ret = PVOP_CALLEE2(pteval_t, mmu.make_pte, val, (u64)val >> 32); - else - ret = PVOP_CALLEE1(pteval_t, mmu.make_pte, val); - - return (pte_t) { .pte = ret }; + return (pte_t) { PVOP_CALLEE1(pteval_t, mmu.make_pte, val) }; } static inline pteval_t pte_val(pte_t pte) { - pteval_t ret; - - if (sizeof(pteval_t) > sizeof(long)) - ret = PVOP_CALLEE2(pteval_t, mmu.pte_val, - pte.pte, (u64)pte.pte >> 32); - else - ret = PVOP_CALLEE1(pteval_t, mmu.pte_val, pte.pte); - - return ret; + return PVOP_CALLEE1(pteval_t, mmu.pte_val, pte.pte); } static inline pgd_t __pgd(pgdval_t val) { - pgdval_t ret; - - if (sizeof(pgdval_t) > sizeof(long)) - ret = PVOP_CALLEE2(pgdval_t, mmu.make_pgd, val, (u64)val >> 32); - else - ret = PVOP_CALLEE1(pgdval_t, mmu.make_pgd, val); - - return (pgd_t) { ret }; + return (pgd_t) { PVOP_CALLEE1(pgdval_t, mmu.make_pgd, val) }; } static inline pgdval_t pgd_val(pgd_t pgd) { - pgdval_t ret; - - if (sizeof(pgdval_t) > sizeof(long)) - ret = PVOP_CALLEE2(pgdval_t, mmu.pgd_val, - pgd.pgd, (u64)pgd.pgd >> 32); - else - ret = PVOP_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd); - - return ret; + return PVOP_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd); } #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION @@ -438,78 +404,40 @@ static inline void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned pte_t *ptep, pte_t old_pte, pte_t pte) { - if (sizeof(pteval_t) > sizeof(long)) - /* 5 arg words */ - pv_ops.mmu.ptep_modify_prot_commit(vma, addr, ptep, pte); - else - PVOP_VCALL4(mmu.ptep_modify_prot_commit, - vma, addr, ptep, pte.pte); + PVOP_VCALL4(mmu.ptep_modify_prot_commit, vma, addr, ptep, pte.pte); } static inline void set_pte(pte_t *ptep, pte_t pte) { - if (sizeof(pteval_t) > sizeof(long)) - PVOP_VCALL3(mmu.set_
[tip: x86/paravirt] x86/paravirt: Avoid needless paravirt step clearing page table entries
The following commit has been merged into the x86/paravirt branch of tip: Commit-ID: 7c9f80cb76ec9f14c3b25509168b1a2f7942e418 Gitweb: https://git.kernel.org/tip/7c9f80cb76ec9f14c3b25509168b1a2f7942e418 Author:Juergen Gross AuthorDate:Sat, 15 Aug 2020 12:06:41 +02:00 Committer: Ingo Molnar CommitterDate: Sat, 15 Aug 2020 13:52:12 +02:00 x86/paravirt: Avoid needless paravirt step clearing page table entries pte_clear() et al are based on two paravirt steps today: one step to create a page table entry with all zeroes, and one step to write this entry value. Drop the first step as it is completely useless. Signed-off-by: Juergen Gross Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20200815100641.26362-7-jgr...@suse.com --- arch/x86/include/asm/paravirt.h | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index f0464b8..d25cc68 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -448,7 +448,7 @@ static inline pudval_t pud_val(pud_t pud) static inline void pud_clear(pud_t *pudp) { - set_pud(pudp, __pud(0)); + set_pud(pudp, native_make_pud(0)); } static inline void set_p4d(p4d_t *p4dp, p4d_t p4d) @@ -485,15 +485,15 @@ static inline void __set_pgd(pgd_t *pgdp, pgd_t pgd) } while (0) #define pgd_clear(pgdp) do { \ - if (pgtable_l5_enabled()) \ - set_pgd(pgdp, __pgd(0));\ + if (pgtable_l5_enabled()) \ + set_pgd(pgdp, native_make_pgd(0)); \ } while (0) #endif /* CONFIG_PGTABLE_LEVELS == 5 */ static inline void p4d_clear(p4d_t *p4dp) { - set_p4d(p4dp, __p4d(0)); + set_p4d(p4dp, native_make_p4d(0)); } static inline void set_pte_atomic(pte_t *ptep, pte_t pte) @@ -504,12 +504,12 @@ static inline void set_pte_atomic(pte_t *ptep, pte_t pte) static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - set_pte(ptep, __pte(0)); + set_pte(ptep, native_make_pte(0)); } static inline void pmd_clear(pmd_t *pmdp) { - set_pmd(pmdp, __pmd(0)); + set_pmd(pmdp, native_make_pmd(0)); } #define __HAVE_ARCH_START_CONTEXT_SWITCH
[tip: x86/paravirt] x86/paravirt: Clean up paravirt macros
The following commit has been merged into the x86/paravirt branch of tip: Commit-ID: 94b827becc6a87c905ab30b398e12a266518acbb Gitweb: https://git.kernel.org/tip/94b827becc6a87c905ab30b398e12a266518acbb Author:Juergen Gross AuthorDate:Sat, 15 Aug 2020 12:06:37 +02:00 Committer: Ingo Molnar CommitterDate: Sat, 15 Aug 2020 13:52:11 +02:00 x86/paravirt: Clean up paravirt macros Some paravirt macros are no longer used, delete them. Signed-off-by: Juergen Gross Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20200815100641.26362-3-jgr...@suse.com --- arch/x86/include/asm/paravirt.h | 15 --- 1 file changed, 15 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 25c7a73..e02c409 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -586,16 +586,9 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu); #endif /* SMP && PARAVIRT_SPINLOCKS */ #ifdef CONFIG_X86_32 -#define PV_SAVE_REGS "pushl %ecx; pushl %edx;" -#define PV_RESTORE_REGS "popl %edx; popl %ecx;" - /* save and restore all caller-save registers, except return value */ #define PV_SAVE_ALL_CALLER_REGS"pushl %ecx;" #define PV_RESTORE_ALL_CALLER_REGS "popl %ecx;" - -#define PV_FLAGS_ARG "0" -#define PV_EXTRA_CLOBBERS -#define PV_VEXTRA_CLOBBERS #else /* save and restore all caller-save registers, except return value */ #define PV_SAVE_ALL_CALLER_REGS \ @@ -616,14 +609,6 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu); "pop %rsi;" \ "pop %rdx;" \ "pop %rcx;" - -/* We save some registers, but all of them, that's too much. We clobber all - * caller saved registers but the argument parameter */ -#define PV_SAVE_REGS "pushq %%rdi;" -#define PV_RESTORE_REGS "popq %%rdi;" -#define PV_EXTRA_CLOBBERS EXTRA_CLOBBERS, "rcx" , "rdx", "rsi" -#define PV_VEXTRA_CLOBBERS EXTRA_CLOBBERS, "rdi", "rcx" , "rdx", "rsi" -#define PV_FLAGS_ARG "D" #endif /*