[tip: perf/core] x86/cpufeatures: Enumerate Intel Hybrid Technology feature bit

2021-04-20 Thread tip-bot2 for Ricardo Neri
The following commit has been merged into the perf/core branch of tip:

Commit-ID: a161545ab53b174c016b0eb63c289525d2f6
Gitweb:
https://git.kernel.org/tip/a161545ab53b174c016b0eb63c289525d2f6
Author: Ricardo Neri
AuthorDate: Mon, 12 Apr 2021 07:30:41 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:23 +02:00

x86/cpufeatures: Enumerate Intel Hybrid Technology feature bit

Add feature enumeration to identify a processor with Intel Hybrid
Technology: one in which CPUs of more than one type are in the same
package.
On a hybrid processor, all CPUs support the same homogeneous (i.e.,
symmetric) instruction set. All CPUs enumerate the same features in CPUID.
Thus, software (user space and kernel) can run and migrate to any CPU in
the system as well as utilize any of the enumerated features without any
change or special provisions. The main difference among CPUs in a hybrid
processor lies in their power and performance properties.
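
As an illustration only (none of this is in the patch): the bit added
below corresponds to CPUID.(EAX=07H,ECX=0):EDX[15], so a user-space
program can read the same enumeration with the compiler's cpuid.h
helpers. A minimal sketch:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 0x7, sub-leaf 0: EDX bit 15 enumerates Hybrid */
        if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
                return 1;

        printf("hybrid: %s\n", (edx & (1u << 15)) ? "yes" : "no");
        return 0;
}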

Signed-off-by: Ricardo Neri 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Tony Luck 
Reviewed-by: Len Brown 
Acked-by: Borislav Petkov 
Link: 
https://lkml.kernel.org/r/1618237865-33448-2-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index cc96e26..1ba4a6e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -374,6 +374,7 @@
 #define X86_FEATURE_MD_CLEAR   (18*32+10) /* VERW clears CPU buffers */
 #define X86_FEATURE_TSX_FORCE_ABORT(18*32+13) /* "" TSX_FORCE_ABORT */
 #define X86_FEATURE_SERIALIZE  (18*32+14) /* SERIALIZE instruction */
+#define X86_FEATURE_HYBRID_CPU (18*32+15) /* "" This part has CPUs of more than one type */
 #define X86_FEATURE_TSXLDTRK   (18*32+16) /* TSX Suspend Load Address Tracking */
 #define X86_FEATURE_PCONFIG(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_ARCH_LBR   (18*32+19) /* Intel ARCH LBR */


[tip: perf/core] x86/cpu: Add helper function to get the type of the current hybrid CPU

2021-04-20 Thread tip-bot2 for Ricardo Neri
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 250b3c0d79d1f4a55e54d8a9ef48058660483fef
Gitweb:
https://git.kernel.org/tip/250b3c0d79d1f4a55e54d8a9ef48058660483fef
Author: Ricardo Neri
AuthorDate: Mon, 12 Apr 2021 07:30:42 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:23 +02:00

x86/cpu: Add helper function to get the type of the current hybrid CPU

On processors with Intel Hybrid Technology (i.e., one having more than
one type of CPU in the same package), all CPUs support the same
instruction set and enumerate the same features on CPUID. Thus, all
software can run on any CPU without restrictions. However, there may be
model-specific differences among types of CPUs. For instance, each type
of CPU may support a different number of performance counters. Also,
machine check error banks may be wired differently. Even though most
software will not care about these differences, kernel subsystems
dealing with these differences must know.

Add and expose a new helper function get_this_hybrid_cpu_type() to query
the type of the current hybrid CPU. The function will be used later in
the perf subsystem.

The Intel Software Developer's Manual defines the CPU type as an 8-bit
identifier.
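
For illustration (not part of the patch), the same field can be read
from user space. The sketch below assumes the type identifiers
documented in the SDM for leaf 0x1A: 0x20 for Atom and 0x40 for Core.

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 0x1A: EAX[31:24] is the hybrid CPU type */
        if (!__get_cpuid_count(0x1a, 0, &eax, &ebx, &ecx, &edx))
                return 1;

        switch (eax >> 24) {
        case 0x20: puts("Intel Atom"); break;
        case 0x40: puts("Intel Core"); break;
        default:   puts("not hybrid or unknown type"); break;
        }
        return 0;
}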

Signed-off-by: Ricardo Neri 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Tony Luck 
Reviewed-by: Len Brown 
Acked-by: Borislav Petkov 
Link: 
https://lkml.kernel.org/r/1618237865-33448-3-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/include/asm/cpu.h  |  6 ++
 arch/x86/kernel/cpu/intel.c | 16 
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index da78ccb..610905d 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -45,6 +45,7 @@ extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
 extern bool handle_guest_split_lock(unsigned long ip);
+u8 get_this_hybrid_cpu_type(void);
 #else
 static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
 static inline void switch_to_sld(unsigned long tifn) {}
@@ -57,6 +58,11 @@ static inline bool handle_guest_split_lock(unsigned long ip)
 {
return false;
 }
+
+static inline u8 get_this_hybrid_cpu_type(void)
+{
+   return 0;
+}
 #endif
 #ifdef CONFIG_IA32_FEAT_CTL
 void init_ia32_feat_ctl(struct cpuinfo_x86 *c);
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 0e422a5..26fb626 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1195,3 +1195,19 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
cpu_model_supports_sld = true;
split_lock_setup();
 }
+
+#define X86_HYBRID_CPU_TYPE_ID_SHIFT   24
+
+/**
+ * get_this_hybrid_cpu_type() - Get the type of this hybrid CPU
+ *
+ * Returns the CPU type [31:24] (i.e., Atom or Core) of a CPU in
+ * a hybrid processor. If the processor is not hybrid, returns 0.
+ */
+u8 get_this_hybrid_cpu_type(void)
+{
+   if (!cpu_feature_enabled(X86_FEATURE_HYBRID_CPU))
+   return 0;
+
+   return cpuid_eax(0x001a) >> X86_HYBRID_CPU_TYPE_ID_SHIFT;
+}
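
A hypothetical consumer might look like the sketch below; the perf
changes that actually use this helper come later in the series, and the
ATOM/CORE constants here are the leaf-0x1A values from the SDM, not
names defined by this patch.

#include <asm/cpu.h>    /* get_this_hybrid_cpu_type() */

/* Hypothetical sketch only; not part of this patch. */
#define EXAMPLE_CPU_TYPE_ATOM  0x20    /* per SDM, CPUID leaf 0x1A */
#define EXAMPLE_CPU_TYPE_CORE  0x40

static void example_setup_this_cpu(void)
{
        switch (get_this_hybrid_cpu_type()) {
        case EXAMPLE_CPU_TYPE_ATOM:
                /* e.g., pick small-core PMU parameters */
                break;
        case EXAMPLE_CPU_TYPE_CORE:
                /* e.g., pick big-core PMU parameters */
                break;
        default:
                /* non-hybrid parts return 0 */
                break;
        }
}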


[tip: x86/cpu] x86/cpu: Use SERIALIZE in sync_core() when available

2020-08-17 Thread tip-bot2 for Ricardo Neri
The following commit has been merged into the x86/cpu branch of tip:

Commit-ID: bf9c912f9a649776c2d741310486a6984edaac72
Gitweb:
https://git.kernel.org/tip/bf9c912f9a649776c2d741310486a6984edaac72
Author: Ricardo Neri
AuthorDate: Thu, 06 Aug 2020 20:28:33 -07:00
Committer: Borislav Petkov 
CommitterDate: Mon, 17 Aug 2020 17:23:04 +02:00

x86/cpu: Use SERIALIZE in sync_core() when available

The SERIALIZE instruction gives software a way to force the processor to
complete all modifications to flags, registers and memory from previous
instructions and drain all buffered writes to memory before the next
instruction is fetched and executed. Thus, it serves the purpose of
sync_core(). Use it when available.
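
The same pattern can be sketched in user space, where SERIALIZE is also
usable (it is enumerated by CPUID.(EAX=07H,ECX=0):EDX[14] and allowed
at any privilege level); the fallback here is the CPUID instruction,
which is itself architecturally serializing. Illustration only, not
part of the patch:

#include <cpuid.h>
#include <stdbool.h>

static bool have_serialize(void)
{
        unsigned int eax, ebx, ecx, edx;

        return __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) &&
               (edx & (1u << 14));
}

static void serialize_execution(void)
{
        if (have_serialize()) {
                /* SERIALIZE opcode; binutils >= 2.35 has the mnemonic */
                asm volatile(".byte 0xf, 0x1, 0xe8" ::: "memory");
        } else {
                unsigned int eax, ebx, ecx, edx;

                /* CPUID is serializing (and clobbers these registers) */
                __cpuid(0, eax, ebx, ecx, edx);
        }
}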

Suggested-by: Andy Lutomirski 
Signed-off-by: Ricardo Neri 
Signed-off-by: Borislav Petkov 
Reviewed-by: Tony Luck 
Link: 
https://lkml.kernel.org/r/20200807032833.17484-1-ricardo.neri-calde...@linux.intel.com
---
 arch/x86/include/asm/special_insns.h |  6 ++
 arch/x86/include/asm/sync_core.h | 26 ++
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 59a3e13..5999b0b 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -234,6 +234,12 @@ static inline void clwb(volatile void *__p)
 
 #define nop() asm volatile ("nop")
 
+static inline void serialize(void)
+{
+   /* Instruction opcode for SERIALIZE; supported in binutils >= 2.35. */
+   asm volatile(".byte 0xf, 0x1, 0xe8" ::: "memory");
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_SPECIAL_INSNS_H */
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index fdb5b35..4631c0f 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -5,6 +5,7 @@
 #include <linux/preempt.h>
 #include <asm/processor.h>
 #include <asm/cpufeature.h>
+#include <asm/special_insns.h>
 
 #ifdef CONFIG_X86_32
 static inline void iret_to_self(void)
@@ -54,14 +55,23 @@ static inline void iret_to_self(void)
 static inline void sync_core(void)
 {
/*
-* There are quite a few ways to do this.  IRET-to-self is nice
-* because it works on every CPU, at any CPL (so it's compatible
-* with paravirtualization), and it never exits to a hypervisor.
-* The only down sides are that it's a bit slow (it seems to be
-* a bit more than 2x slower than the fastest options) and that
-* it unmasks NMIs.  The "push %cs" is needed because, in
-* paravirtual environments, __KERNEL_CS may not be a valid CS
-* value when we do IRET directly.
+* The SERIALIZE instruction is the most straightforward way to
+* do this but it is not universally available.
+*/
+   if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
+   serialize();
+   return;
+   }
+
+   /*
+* For all other processors, there are quite a few ways to do this.
+* IRET-to-self is nice because it works on every CPU, at any CPL
+* (so it's compatible with paravirtualization), and it never exits
+* to a hypervisor. The only down sides are that it's a bit slow
+* (it seems to be a bit more than 2x slower than the fastest
+* options) and that it unmasks NMIs.  The "push %cs" is needed
+* because, in paravirtual environments, __KERNEL_CS may not be a
+* valid CS value when we do IRET directly.
 *
 * In case NMI unmasking or performance ever becomes a problem,
 * the next best option appears to be MOV-to-CR2 and an


[tip: x86/cpu] x86/cpufeatures: Add enumeration for SERIALIZE instruction

2020-07-27 Thread tip-bot2 for Ricardo Neri
The following commit has been merged into the x86/cpu branch of tip:

Commit-ID: 85b23fbc7d88f8c6e3951721802d7845bc39663d
Gitweb:
https://git.kernel.org/tip/85b23fbc7d88f8c6e3951721802d7845bc39663d
Author: Ricardo Neri
AuthorDate: Sun, 26 Jul 2020 21:31:29 -07:00
Committer: Ingo Molnar 
CommitterDate: Mon, 27 Jul 2020 12:42:06 +02:00

x86/cpufeatures: Add enumeration for SERIALIZE instruction

The Intel architecture defines a set of Serializing Instructions (a
detailed definition can be found in Vol.3 Section 8.3 of the Intel "main"
manual, SDM). However, these instructions do more than what is required,
have side effects and/or may be rather invasive. Furthermore, some of
these instructions are only available in kernel mode or may cause VMExits.
Thus, software using these instructions only to serialize execution (as
defined in the manual) must handle the undesired side effects.

As indicated in the name, SERIALIZE is a new Intel architecture
Serializing Instruction. Crucially, it does not have any of the mentioned
side effects. Also, it does not cause VMExit and can be used in user mode.

This new instruction is currently documented in the latest "extensions"
manual (ISE). It will appear in the "main" manual in the future.
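
Once the bit below is defined, in-kernel users test it with the usual
cpufeature accessors; a minimal sketch (the subsequent sync_core()
patch uses static_cpu_has() in exactly this way):

#include <asm/cpufeature.h>

/* Sketch: guard a SERIALIZE-based fast path on the new feature bit. */
static bool can_use_serialize(void)
{
        return static_cpu_has(X86_FEATURE_SERIALIZE);
}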

Signed-off-by: Ricardo Neri 
Signed-off-by: Ingo Molnar 
Reviewed-by: Tony Luck 
Acked-by: Dave Hansen 
Link: 
https://lore.kernel.org/r/20200727043132.15082-2-ricardo.neri-calde...@linux.intel.com
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 02dabc9..adf45cf 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -365,6 +365,7 @@
 #define X86_FEATURE_SRBDS_CTRL (18*32+ 9) /* "" SRBDS mitigation MSR available */
 #define X86_FEATURE_MD_CLEAR   (18*32+10) /* VERW clears CPU buffers */
 #define X86_FEATURE_TSX_FORCE_ABORT(18*32+13) /* "" TSX_FORCE_ABORT */
+#define X86_FEATURE_SERIALIZE  (18*32+14) /* SERIALIZE instruction */
 #define X86_FEATURE_PCONFIG(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_SPEC_CTRL  (18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP(18*32+27) /* "" Single Thread Indirect Branch Predictors */


[tip: x86/cpu] x86/cpu: Relocate sync_core() to sync_core.h

2020-07-27 Thread tip-bot2 for Ricardo Neri
The following commit has been merged into the x86/cpu branch of tip:

Commit-ID: 9998a9832c4027e907353e5e05fde730cf624b77
Gitweb:
https://git.kernel.org/tip/9998a9832c4027e907353e5e05fde730cf624b77
Author: Ricardo Neri
AuthorDate: Sun, 26 Jul 2020 21:31:30 -07:00
Committer: Ingo Molnar 
CommitterDate: Mon, 27 Jul 2020 12:42:06 +02:00

x86/cpu: Relocate sync_core() to sync_core.h

Having sync_core() in processor.h is problematic since it is not possible
to check for hardware capabilities via the *cpu_has() family of macros.
The latter needs the definitions in processor.h.

It also looks more intuitive to relocate the function to sync_core.h.

This changeset makes no functional changes.

Signed-off-by: Ricardo Neri 
Signed-off-by: Ingo Molnar 
Reviewed-by: Tony Luck 
Link: 
https://lore.kernel.org/r/20200727043132.15082-3-ricardo.neri-calde...@linux.intel.com
---
 arch/x86/include/asm/processor.h| 64 +
 arch/x86/include/asm/sync_core.h| 64 -
 arch/x86/kernel/alternative.c   |  1 +-
 arch/x86/kernel/cpu/mce/core.c  |  1 +-
 drivers/misc/sgi-gru/grufault.c |  1 +-
 drivers/misc/sgi-gru/gruhandles.c   |  1 +-
 drivers/misc/sgi-gru/grukservices.c |  1 +-
 7 files changed, 69 insertions(+), 64 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 03b7c4c..68ba42f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -678,70 +678,6 @@ static inline unsigned int cpuid_edx(unsigned int op)
return edx;
 }
 
-/*
- * This function forces the icache and prefetched instruction stream to
- * catch up with reality in two very specific cases:
- *
- *  a) Text was modified using one virtual address and is about to be executed
- * from the same physical page at a different virtual address.
- *
- *  b) Text was modified on a different CPU, may subsequently be
- * executed on this CPU, and you want to make sure the new version
- * gets executed.  This generally means you're calling this in a IPI.
- *
- * If you're calling this for a different reason, you're probably doing
- * it wrong.
- */
-static inline void sync_core(void)
-{
-   /*
-* There are quite a few ways to do this.  IRET-to-self is nice
-* because it works on every CPU, at any CPL (so it's compatible
-* with paravirtualization), and it never exits to a hypervisor.
-* The only down sides are that it's a bit slow (it seems to be
-* a bit more than 2x slower than the fastest options) and that
-* it unmasks NMIs.  The "push %cs" is needed because, in
-* paravirtual environments, __KERNEL_CS may not be a valid CS
-* value when we do IRET directly.
-*
-* In case NMI unmasking or performance ever becomes a problem,
-* the next best option appears to be MOV-to-CR2 and an
-* unconditional jump.  That sequence also works on all CPUs,
-* but it will fault at CPL3 (i.e. Xen PV).
-*
-* CPUID is the conventional way, but it's nasty: it doesn't
-* exist on some 486-like CPUs, and it usually exits to a
-* hypervisor.
-*
-* Like all of Linux's memory ordering operations, this is a
-* compiler barrier as well.
-*/
-#ifdef CONFIG_X86_32
-   asm volatile (
-   "pushfl\n\t"
-   "pushl %%cs\n\t"
-   "pushl $1f\n\t"
-   "iret\n\t"
-   "1:"
-   : ASM_CALL_CONSTRAINT : : "memory");
-#else
-   unsigned int tmp;
-
-   asm volatile (
-   "mov %%ss, %0\n\t"
-   "pushq %q0\n\t"
-   "pushq %%rsp\n\t"
-   "addq $8, (%%rsp)\n\t"
-   "pushfq\n\t"
-   "mov %%cs, %0\n\t"
-   "pushq %q0\n\t"
-   "pushq $1f\n\t"
-   "iretq\n\t"
-   "1:"
-   : "=" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
-#endif
-}
-
 extern void select_idle_routine(const struct cpuinfo_x86 *c);
 extern void amd_e400_c1e_apic_setup(void);
 
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index c67caaf..9c5573f 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -7,6 +7,70 @@
 #include <asm/cpufeature.h>
 
 /*
+ * This function forces the icache and prefetched instruction stream to
+ * catch up with reality in two very specific cases:
+ *
+ *  a) Text was modified using one virtual address and is about to be executed
+ * from the same physical page at a different virtual address.
+ *
+ *  b) Text was modified on a different CPU, may subsequently be
+ * executed on this CPU, and you want to make sure the new version
+ * gets executed.  This generally means you're calling this in a IPI.
+ *
+ * If you're calling this for a different reason, you're probably doing
+ * it wrong.
+ */

[tip: x86/cpu] x86/cpu: Refactor sync_core() for readability

2020-07-27 Thread tip-bot2 for Ricardo Neri
The following commit has been merged into the x86/cpu branch of tip:

Commit-ID: f69ca629d89d65737537e05308ac531f7bb07d5c
Gitweb:
https://git.kernel.org/tip/f69ca629d89d65737537e05308ac531f7bb07d5c
Author:Ricardo Neri 
AuthorDate:Sun, 26 Jul 2020 21:31:31 -07:00
Committer: Ingo Molnar 
CommitterDate: Mon, 27 Jul 2020 12:42:06 +02:00

x86/cpu: Refactor sync_core() for readability

Instead of having #ifdef/#endif blocks inside sync_core() for X86_64 and
X86_32, implement the new function iret_to_self() with two versions.

In this manner, avoid having to use even more #ifdef/#endif blocks
when adding support for SERIALIZE in sync_core().

Co-developed-by: Tony Luck 
Signed-off-by: Tony Luck 
Signed-off-by: Ricardo Neri 
Signed-off-by: Ingo Molnar 
Link: 
https://lore.kernel.org/r/20200727043132.15082-4-ricardo.neri-calde...@linux.intel.com
---
 arch/x86/include/asm/special_insns.h |  1 +-
 arch/x86/include/asm/sync_core.h | 56 +++
 2 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index eb8e781..59a3e13 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -234,7 +234,6 @@ static inline void clwb(volatile void *__p)
 
 #define nop() asm volatile ("nop")
 
-
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_SPECIAL_INSNS_H */
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index 9c5573f..fdb5b35 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -6,6 +6,37 @@
 #include <asm/processor.h>
 #include <asm/cpufeature.h>
 
+#ifdef CONFIG_X86_32
+static inline void iret_to_self(void)
+{
+   asm volatile (
+   "pushfl\n\t"
+   "pushl %%cs\n\t"
+   "pushl $1f\n\t"
+   "iret\n\t"
+   "1:"
+   : ASM_CALL_CONSTRAINT : : "memory");
+}
+#else
+static inline void iret_to_self(void)
+{
+   unsigned int tmp;
+
+   asm volatile (
+   "mov %%ss, %0\n\t"
+   "pushq %q0\n\t"
+   "pushq %%rsp\n\t"
+   "addq $8, (%%rsp)\n\t"
+   "pushfq\n\t"
+   "mov %%cs, %0\n\t"
+   "pushq %q0\n\t"
+   "pushq $1f\n\t"
+   "iretq\n\t"
+   "1:"
+   : "=" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
+}
+#endif /* CONFIG_X86_32 */
+
 /*
  * This function forces the icache and prefetched instruction stream to
  * catch up with reality in two very specific cases:
@@ -44,30 +75,7 @@ static inline void sync_core(void)
 * Like all of Linux's memory ordering operations, this is a
 * compiler barrier as well.
 */
-#ifdef CONFIG_X86_32
-   asm volatile (
-   "pushfl\n\t"
-   "pushl %%cs\n\t"
-   "pushl $1f\n\t"
-   "iret\n\t"
-   "1:"
-   : ASM_CALL_CONSTRAINT : : "memory");
-#else
-   unsigned int tmp;
-
-   asm volatile (
-   "mov %%ss, %0\n\t"
-   "pushq %q0\n\t"
-   "pushq %%rsp\n\t"
-   "addq $8, (%%rsp)\n\t"
-   "pushfq\n\t"
-   "mov %%cs, %0\n\t"
-   "pushq %q0\n\t"
-   "pushq $1f\n\t"
-   "iretq\n\t"
-   "1:"
-   : "=" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
-#endif
+   iret_to_self();
 }
 
 /*