[tip: perf/core] x86/cpufeatures: Enumerate Intel Hybrid Technology feature bit
The following commit has been merged into the perf/core branch of tip:

Commit-ID:     a161545ab53b174c016b0eb63c289525d2f6
Gitweb:        https://git.kernel.org/tip/a161545ab53b174c016b0eb63c289525d2f6
Author:        Ricardo Neri
AuthorDate:    Mon, 12 Apr 2021 07:30:41 -07:00
Committer:     Peter Zijlstra
CommitterDate: Mon, 19 Apr 2021 20:03:23 +02:00

x86/cpufeatures: Enumerate Intel Hybrid Technology feature bit

Add feature enumeration to identify a processor with Intel Hybrid
Technology: one in which CPUs of more than one type are in the same
package. On a hybrid processor, all CPUs support the same homogeneous
(i.e., symmetric) instruction set. All CPUs enumerate the same features
in CPUID. Thus, software (user space and kernel) can run and migrate to
any CPU in the system as well as utilize any of the enumerated features
without any change or special provisions. The main differences among
CPUs in a hybrid processor are power and performance properties.

Signed-off-by: Ricardo Neri
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Tony Luck
Reviewed-by: Len Brown
Acked-by: Borislav Petkov
Link: https://lkml.kernel.org/r/1618237865-33448-2-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index cc96e26..1ba4a6e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -374,6 +374,7 @@
 #define X86_FEATURE_MD_CLEAR		(18*32+10) /* VERW clears CPU buffers */
 #define X86_FEATURE_TSX_FORCE_ABORT	(18*32+13) /* "" TSX_FORCE_ABORT */
 #define X86_FEATURE_SERIALIZE		(18*32+14) /* SERIALIZE instruction */
+#define X86_FEATURE_HYBRID_CPU		(18*32+15) /* "" This part has CPUs of more than one type */
 #define X86_FEATURE_TSXLDTRK		(18*32+16) /* TSX Suspend Load Address Tracking */
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_ARCH_LBR		(18*32+19) /* Intel ARCH LBR */
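For readers who want to see the same bit from user space, here is a minimal sketch. It assumes that word 18 of the kernel's feature array mirrors CPUID.(EAX=7,ECX=0):EDX, so that (18*32+15) corresponds to EDX bit 15. The names edx_has_hybrid() and cpu_is_hybrid() are made up for this illustration and are not part of the patch; __get_cpuid_count() comes from the GCC/Clang <cpuid.h> header.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/Clang CPUID helpers */
#endif

/* Assumption: feature word 18 mirrors CPUID.(EAX=7,ECX=0):EDX. */
#define HYBRID_EDX_BIT 15u

/* Hypothetical helper: test the hybrid bit in a raw EDX value. */
int edx_has_hybrid(uint32_t edx)
{
	return (int)((edx >> HYBRID_EDX_BIT) & 1u);
}

/* Hypothetical helper: query the running CPU; 0 on non-x86 builds. */
int cpu_is_hybrid(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid_count() returns 0 when leaf 7 is not supported. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return edx_has_hybrid(edx);
#else
	return 0;
#endif
}
```

Keeping the bit test separate from the CPUID call makes the decoding checkable on any machine, hybrid or not.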
[tip: perf/core] x86/cpu: Add helper function to get the type of the current hybrid CPU
The following commit has been merged into the perf/core branch of tip:

Commit-ID:     250b3c0d79d1f4a55e54d8a9ef48058660483fef
Gitweb:        https://git.kernel.org/tip/250b3c0d79d1f4a55e54d8a9ef48058660483fef
Author:        Ricardo Neri
AuthorDate:    Mon, 12 Apr 2021 07:30:42 -07:00
Committer:     Peter Zijlstra
CommitterDate: Mon, 19 Apr 2021 20:03:23 +02:00

x86/cpu: Add helper function to get the type of the current hybrid CPU

On processors with Intel Hybrid Technology (i.e., those having more than
one type of CPU in the same package), all CPUs support the same
instruction set and enumerate the same features on CPUID. Thus, all
software can run on any CPU without restrictions. However, there may be
model-specific differences among types of CPUs. For instance, each type
of CPU may support a different number of performance counters. Also,
machine check error banks may be wired differently. Even though most
software will not care about these differences, kernel subsystems
dealing with these differences must know.

Add and expose a new helper function get_this_hybrid_cpu_type() to query
the type of the current hybrid CPU.

The function will be used later in the perf subsystem.

The Intel Software Developer's Manual defines the CPU type as an 8-bit
identifier.

Signed-off-by: Ricardo Neri
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Tony Luck
Reviewed-by: Len Brown
Acked-by: Borislav Petkov
Link: https://lkml.kernel.org/r/1618237865-33448-3-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/include/asm/cpu.h  | 6 ++
 arch/x86/kernel/cpu/intel.c | 16
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index da78ccb..610905d 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -45,6 +45,7 @@ extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
 extern bool handle_guest_split_lock(unsigned long ip);
+u8 get_this_hybrid_cpu_type(void);
 #else
 static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
 static inline void switch_to_sld(unsigned long tifn) {}
@@ -57,6 +58,11 @@ static inline bool handle_guest_split_lock(unsigned long ip)
 {
 	return false;
 }
+
+static inline u8 get_this_hybrid_cpu_type(void)
+{
+	return 0;
+}
 #endif
 #ifdef CONFIG_IA32_FEAT_CTL
 void init_ia32_feat_ctl(struct cpuinfo_x86 *c);
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 0e422a5..26fb626 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1195,3 +1195,19 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
 	cpu_model_supports_sld = true;
 	split_lock_setup();
 }
+
+#define X86_HYBRID_CPU_TYPE_ID_SHIFT	24
+
+/**
+ * get_this_hybrid_cpu_type() - Get the type of this hybrid CPU
+ *
+ * Returns the CPU type [31:24] (i.e., Atom or Core) of a CPU in
+ * a hybrid processor. If the processor is not hybrid, returns 0.
+ */
+u8 get_this_hybrid_cpu_type(void)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_HYBRID_CPU))
+		return 0;
+
+	return cpuid_eax(0x001a) >> X86_HYBRID_CPU_TYPE_ID_SHIFT;
+}
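The shift in get_this_hybrid_cpu_type() can be illustrated outside the kernel with a small sketch that decodes a raw CPUID leaf 0x1A EAX value. The helper names and the two type constants below are assumptions made for this example: the patch itself only mentions "Atom or Core", and the values 0x20 and 0x40 are the ones I understand the Intel SDM to list, so they should be checked against the manual before being relied on.

```c
#include <stdint.h>

/* Same shift as X86_HYBRID_CPU_TYPE_ID_SHIFT in the patch. */
#define HYBRID_CPU_TYPE_SHIFT 24

/* Assumed core-type values (not taken from the patch itself). */
#define CPU_TYPE_ATOM 0x20
#define CPU_TYPE_CORE 0x40

/* Hypothetical helper: extract bits [31:24] of CPUID.1AH:EAX. */
uint8_t cpu_type_from_eax(uint32_t eax)
{
	return (uint8_t)(eax >> HYBRID_CPU_TYPE_SHIFT);
}

/* Hypothetical helper: map the 8-bit identifier to a printable name. */
const char *cpu_type_name(uint8_t type)
{
	switch (type) {
	case 0:
		return "not hybrid";
	case CPU_TYPE_ATOM:
		return "Atom";
	case CPU_TYPE_CORE:
		return "Core";
	default:
		return "unknown";
	}
}
```

Because the input is 32 bits wide, shifting right by 24 leaves exactly the 8-bit identifier, which is why the kernel helper can return a u8 without masking.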
[tip: x86/cpu] x86/cpu: Use SERIALIZE in sync_core() when available
The following commit has been merged into the x86/cpu branch of tip:

Commit-ID:     bf9c912f9a649776c2d741310486a6984edaac72
Gitweb:        https://git.kernel.org/tip/bf9c912f9a649776c2d741310486a6984edaac72
Author:        Ricardo Neri
AuthorDate:    Thu, 06 Aug 2020 20:28:33 -07:00
Committer:     Borislav Petkov
CommitterDate: Mon, 17 Aug 2020 17:23:04 +02:00

x86/cpu: Use SERIALIZE in sync_core() when available

The SERIALIZE instruction gives software a way to force the processor to
complete all modifications to flags, registers and memory from previous
instructions and drain all buffered writes to memory before the next
instruction is fetched and executed. Thus, it serves the purpose of
sync_core(). Use it when available.

Suggested-by: Andy Lutomirski
Signed-off-by: Ricardo Neri
Signed-off-by: Borislav Petkov
Reviewed-by: Tony Luck
Link: https://lkml.kernel.org/r/20200807032833.17484-1-ricardo.neri-calde...@linux.intel.com
---
 arch/x86/include/asm/special_insns.h | 6 ++
 arch/x86/include/asm/sync_core.h     | 26 ++
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 59a3e13..5999b0b 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -234,6 +234,12 @@ static inline void clwb(volatile void *__p)
 
 #define nop() asm volatile ("nop")
 
+static inline void serialize(void)
+{
+	/* Instruction opcode for SERIALIZE; supported in binutils >= 2.35. */
+	asm volatile(".byte 0xf, 0x1, 0xe8" ::: "memory");
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_SPECIAL_INSNS_H */
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index fdb5b35..4631c0f 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -5,6 +5,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_X86_32
 static inline void iret_to_self(void)
@@ -54,14 +55,23 @@ static inline void iret_to_self(void)
 static inline void sync_core(void)
 {
 	/*
-	 * There are quite a few ways to do this. IRET-to-self is nice
-	 * because it works on every CPU, at any CPL (so it's compatible
-	 * with paravirtualization), and it never exits to a hypervisor.
-	 * The only down sides are that it's a bit slow (it seems to be
-	 * a bit more than 2x slower than the fastest options) and that
-	 * it unmasks NMIs. The "push %cs" is needed because, in
-	 * paravirtual environments, __KERNEL_CS may not be a valid CS
-	 * value when we do IRET directly.
+	 * The SERIALIZE instruction is the most straightforward way to
+	 * do this but it not universally available.
+	 */
+	if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
+		serialize();
+		return;
+	}
+
+	/*
+	 * For all other processors, there are quite a few ways to do this.
+	 * IRET-to-self is nice because it works on every CPU, at any CPL
+	 * (so it's compatible with paravirtualization), and it never exits
+	 * to a hypervisor. The only down sides are that it's a bit slow
+	 * (it seems to be a bit more than 2x slower than the fastest
+	 * options) and that it unmasks NMIs. The "push %cs" is needed
+	 * because, in paravirtual environments, __KERNEL_CS may not be a
+	 * valid CS value when we do IRET directly.
 	 *
 	 * In case NMI unmasking or performance ever becomes a problem,
 	 * the next best option appears to be MOV-to-CR2 and an
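The "use SERIALIZE when enumerated, otherwise fall back" shape of the patch can be sketched in user space as well. This is only an illustration under stated assumptions: serialize_execution() and edx_has_serialize() are hypothetical names, the feature bit is assumed to be CPUID.(EAX=7,ECX=0):EDX bit 14, and the fallback uses CPUID rather than the kernel's IRET-to-self, since the kernel helper is not available outside the kernel. The probe result is cached in a plain static, which is not a substitute for the kernel's static_cpu_has() patching.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/Clang CPUID helpers */
#endif

/* Assumption: SERIALIZE is enumerated in CPUID.(EAX=7,ECX=0):EDX bit 14. */
#define SERIALIZE_EDX_BIT 14u

/* Hypothetical helper: test the SERIALIZE bit in a raw EDX value. */
int edx_has_serialize(uint32_t edx)
{
	return (int)((edx >> SERIALIZE_EDX_BIT) & 1u);
}

/* Hypothetical user-space counterpart of sync_core(); a no-op off x86. */
void serialize_execution(void)
{
#if defined(__x86_64__) || defined(__i386__)
	static int have_serialize = -1;	/* -1: not probed yet */
	unsigned int eax, ebx, ecx, edx;

	if (have_serialize < 0)
		have_serialize =
			__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) &&
			edx_has_serialize(edx);

	if (have_serialize) {
		/* Raw opcode bytes, as in the patch, for older assemblers. */
		__asm__ volatile(".byte 0x0f, 0x01, 0xe8" ::: "memory");
		return;
	}

	/* Fallback: CPUID is a serializing instruction usable in user mode. */
	__cpuid(0, eax, ebx, ecx, edx);
	(void)eax; (void)ebx; (void)ecx; (void)edx;
#endif
}
```

The "memory" clobber mirrors the patch: it keeps the compiler from reordering memory accesses across the instruction.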
[tip: x86/cpu] x86/cpufeatures: Add enumeration for SERIALIZE instruction
The following commit has been merged into the x86/cpu branch of tip:

Commit-ID:     85b23fbc7d88f8c6e3951721802d7845bc39663d
Gitweb:        https://git.kernel.org/tip/85b23fbc7d88f8c6e3951721802d7845bc39663d
Author:        Ricardo Neri
AuthorDate:    Sun, 26 Jul 2020 21:31:29 -07:00
Committer:     Ingo Molnar
CommitterDate: Mon, 27 Jul 2020 12:42:06 +02:00

x86/cpufeatures: Add enumeration for SERIALIZE instruction

The Intel architecture defines a set of Serializing Instructions (a
detailed definition can be found in Vol.3 Section 8.3 of the Intel "main"
manual, SDM). However, these instructions do more than what is required,
have side effects and/or may be rather invasive. Furthermore, some of
these instructions are only available in kernel mode or may cause VMExits.
Thus, software using these instructions only to serialize execution (as
defined in the manual) must handle the undesired side effects.

As indicated in the name, SERIALIZE is a new Intel architecture
Serializing Instruction. Crucially, it does not have any of the mentioned
side effects. Also, it does not cause VMExit and can be used in user mode.

This new instruction is currently documented in the latest "extensions"
manual (ISE). It will appear in the "main" manual in the future.

Signed-off-by: Ricardo Neri
Signed-off-by: Ingo Molnar
Reviewed-by: Tony Luck
Acked-by: Dave Hansen
Link: https://lore.kernel.org/r/20200727043132.15082-2-ricardo.neri-calde...@linux.intel.com
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 02dabc9..adf45cf 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -365,6 +365,7 @@
 #define X86_FEATURE_SRBDS_CTRL		(18*32+ 9) /* "" SRBDS mitigation MSR available */
 #define X86_FEATURE_MD_CLEAR		(18*32+10) /* VERW clears CPU buffers */
 #define X86_FEATURE_TSX_FORCE_ABORT	(18*32+13) /* "" TSX_FORCE_ABORT */
+#define X86_FEATURE_SERIALIZE		(18*32+14) /* SERIALIZE instruction */
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
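The (18*32+14) encoding used in cpufeatures.h packs a word index and a bit position into one number. A small sketch of that arithmetic follows; the macro and function names are invented for this example and are not the kernel's own helpers, and the capability array is a stand-in for whatever storage holds the feature words.

```c
#include <stdint.h>

#define FEATURE_BITS_PER_WORD 32u

/* Illustrative macro: build a feature number as word*32 + bit. */
#define MAKE_FEATURE(word, bit) ((word) * FEATURE_BITS_PER_WORD + (bit))

/* Hypothetical helper: which 32-bit word holds this feature. */
unsigned int feature_word(unsigned int feature)
{
	return feature / FEATURE_BITS_PER_WORD;
}

/* Hypothetical helper: which bit inside that word. */
unsigned int feature_bit(unsigned int feature)
{
	return feature % FEATURE_BITS_PER_WORD;
}

/* Hypothetical helper: test a feature in an array of capability words. */
int feature_is_set(const uint32_t *words, unsigned int feature)
{
	return (int)((words[feature_word(feature)] >> feature_bit(feature)) & 1u);
}
```

With this layout, MAKE_FEATURE(18, 14) decodes back to word 18 and bit 14, and a gap in the bit numbers (for example between 14 and 18) simply means those bits are not given a name in that header.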
[tip: x86/cpu] x86/cpu: Relocate sync_core() to sync_core.h
The following commit has been merged into the x86/cpu branch of tip:

Commit-ID:     9998a9832c4027e907353e5e05fde730cf624b77
Gitweb:        https://git.kernel.org/tip/9998a9832c4027e907353e5e05fde730cf624b77
Author:        Ricardo Neri
AuthorDate:    Sun, 26 Jul 2020 21:31:30 -07:00
Committer:     Ingo Molnar
CommitterDate: Mon, 27 Jul 2020 12:42:06 +02:00

x86/cpu: Relocate sync_core() to sync_core.h

Having sync_core() in processor.h is problematic since it is not possible
to check for hardware capabilities via the *cpu_has() family of macros.
The latter needs the definitions in processor.h.

It also looks more intuitive to relocate the function to sync_core.h.

This changeset does not make changes in functionality.

Signed-off-by: Ricardo Neri
Signed-off-by: Ingo Molnar
Reviewed-by: Tony Luck
Link: https://lore.kernel.org/r/20200727043132.15082-3-ricardo.neri-calde...@linux.intel.com
---
 arch/x86/include/asm/processor.h    | 64 +
 arch/x86/include/asm/sync_core.h    | 64 -
 arch/x86/kernel/alternative.c       | 1 +-
 arch/x86/kernel/cpu/mce/core.c      | 1 +-
 drivers/misc/sgi-gru/grufault.c     | 1 +-
 drivers/misc/sgi-gru/gruhandles.c   | 1 +-
 drivers/misc/sgi-gru/grukservices.c | 1 +-
 7 files changed, 69 insertions(+), 64 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 03b7c4c..68ba42f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -678,70 +678,6 @@ static inline unsigned int cpuid_edx(unsigned int op)
 	return edx;
 }
 
-/*
- * This function forces the icache and prefetched instruction stream to
- * catch up with reality in two very specific cases:
- *
- * a) Text was modified using one virtual address and is about to be executed
- *    from the same physical page at a different virtual address.
- *
- * b) Text was modified on a different CPU, may subsequently be
- *    executed on this CPU, and you want to make sure the new version
- *    gets executed. This generally means you're calling this in a IPI.
- *
- * If you're calling this for a different reason, you're probably doing
- * it wrong.
- */
-static inline void sync_core(void)
-{
-	/*
-	 * There are quite a few ways to do this. IRET-to-self is nice
-	 * because it works on every CPU, at any CPL (so it's compatible
-	 * with paravirtualization), and it never exits to a hypervisor.
-	 * The only down sides are that it's a bit slow (it seems to be
-	 * a bit more than 2x slower than the fastest options) and that
-	 * it unmasks NMIs. The "push %cs" is needed because, in
-	 * paravirtual environments, __KERNEL_CS may not be a valid CS
-	 * value when we do IRET directly.
-	 *
-	 * In case NMI unmasking or performance ever becomes a problem,
-	 * the next best option appears to be MOV-to-CR2 and an
-	 * unconditional jump. That sequence also works on all CPUs,
-	 * but it will fault at CPL3 (i.e. Xen PV).
-	 *
-	 * CPUID is the conventional way, but it's nasty: it doesn't
-	 * exist on some 486-like CPUs, and it usually exits to a
-	 * hypervisor.
-	 *
-	 * Like all of Linux's memory ordering operations, this is a
-	 * compiler barrier as well.
-	 */
-#ifdef CONFIG_X86_32
-	asm volatile (
-		"pushfl\n\t"
-		"pushl %%cs\n\t"
-		"pushl $1f\n\t"
-		"iret\n\t"
-		"1:"
-		: ASM_CALL_CONSTRAINT : : "memory");
-#else
-	unsigned int tmp;
-
-	asm volatile (
-		"mov %%ss, %0\n\t"
-		"pushq %q0\n\t"
-		"pushq %%rsp\n\t"
-		"addq $8, (%%rsp)\n\t"
-		"pushfq\n\t"
-		"mov %%cs, %0\n\t"
-		"pushq %q0\n\t"
-		"pushq $1f\n\t"
-		"iretq\n\t"
-		"1:"
-		: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
-#endif
-}
-
 extern void select_idle_routine(const struct cpuinfo_x86 *c);
 extern void amd_e400_c1e_apic_setup(void);
 
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index c67caaf..9c5573f 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -7,6 +7,70 @@
 #include
 
 /*
+ * This function forces the icache and prefetched instruction stream to
+ * catch up with reality in two very specific cases:
+ *
+ * a) Text was modified using one virtual address and is about to be executed
+ *    from the same physical page at a different virtual address.
+ *
+ * b) Text was modified on a different CPU, may subsequently be
+ *    executed on this CPU, and you want to make sure the new version
+ *    gets executed. This generally means you're calling this in a IPI.
+ *
+ * If you're calling this for a different reason, you're probably doing
+ * it wrong.
+
[tip: x86/cpu] x86/cpu: Refactor sync_core() for readability
The following commit has been merged into the x86/cpu branch of tip:

Commit-ID:     f69ca629d89d65737537e05308ac531f7bb07d5c
Gitweb:        https://git.kernel.org/tip/f69ca629d89d65737537e05308ac531f7bb07d5c
Author:        Ricardo Neri
AuthorDate:    Sun, 26 Jul 2020 21:31:31 -07:00
Committer:     Ingo Molnar
CommitterDate: Mon, 27 Jul 2020 12:42:06 +02:00

x86/cpu: Refactor sync_core() for readability

Instead of having #ifdef/#endif blocks inside sync_core() for X86_64 and
X86_32, implement the new function iret_to_self() with two versions.

In this manner, avoid having to use even more #ifdef/#endif blocks
when adding support for SERIALIZE in sync_core().

Co-developed-by: Tony Luck
Signed-off-by: Tony Luck
Signed-off-by: Ricardo Neri
Signed-off-by: Ingo Molnar
Link: https://lore.kernel.org/r/20200727043132.15082-4-ricardo.neri-calde...@linux.intel.com
---
 arch/x86/include/asm/special_insns.h | 1 +-
 arch/x86/include/asm/sync_core.h     | 56 +++
 2 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index eb8e781..59a3e13 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -234,7 +234,6 @@ static inline void clwb(volatile void *__p)
 
 #define nop() asm volatile ("nop")
 
-
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_SPECIAL_INSNS_H */
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index 9c5573f..fdb5b35 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -6,6 +6,37 @@
 #include
 #include
 
+#ifdef CONFIG_X86_32
+static inline void iret_to_self(void)
+{
+	asm volatile (
+		"pushfl\n\t"
+		"pushl %%cs\n\t"
+		"pushl $1f\n\t"
+		"iret\n\t"
+		"1:"
+		: ASM_CALL_CONSTRAINT : : "memory");
+}
+#else
+static inline void iret_to_self(void)
+{
+	unsigned int tmp;
+
+	asm volatile (
+		"mov %%ss, %0\n\t"
+		"pushq %q0\n\t"
+		"pushq %%rsp\n\t"
+		"addq $8, (%%rsp)\n\t"
+		"pushfq\n\t"
+		"mov %%cs, %0\n\t"
+		"pushq %q0\n\t"
+		"pushq $1f\n\t"
+		"iretq\n\t"
+		"1:"
+		: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
+}
+#endif /* CONFIG_X86_32 */
+
 /*
  * This function forces the icache and prefetched instruction stream to
  * catch up with reality in two very specific cases:
@@ -44,30 +75,7 @@ static inline void sync_core(void)
 	 * Like all of Linux's memory ordering operations, this is a
 	 * compiler barrier as well.
 	 */
-#ifdef CONFIG_X86_32
-	asm volatile (
-		"pushfl\n\t"
-		"pushl %%cs\n\t"
-		"pushl $1f\n\t"
-		"iret\n\t"
-		"1:"
-		: ASM_CALL_CONSTRAINT : : "memory");
-#else
-	unsigned int tmp;
-
-	asm volatile (
-		"mov %%ss, %0\n\t"
-		"pushq %q0\n\t"
-		"pushq %%rsp\n\t"
-		"addq $8, (%%rsp)\n\t"
-		"pushfq\n\t"
-		"mov %%cs, %0\n\t"
-		"pushq %q0\n\t"
-		"pushq $1f\n\t"
-		"iretq\n\t"
-		"1:"
-		: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
-#endif
+	iret_to_self();
 }
 
 /*