
[tip:x86/urgent] x86/umwait: Fix error handling in umwait_init()

2019-08-12 Thread tip-bot for Fenghua Yu

Commit-ID:  e7409258845a0f64967f8377e99294d438137537
Gitweb: https://git.kernel.org/tip/e7409258845a0f64967f8377e99294d438137537
Author: Fenghua Yu 
AuthorDate: Fri, 9 Aug 2019 18:40:37 -0700
Committer:  Thomas Gleixner 
CommitDate: Mon, 12 Aug 2019 14:51:13 +0200

x86/umwait: Fix error handling in umwait_init()

Currently, failure of cpuhp_setup_state() is ignored, and the syscore ops
and the control interfaces are still added after the failure. This error
handling causes several issues:

1. The CPUs may have different values in the IA32_UMWAIT_CONTROL
   MSR because there is no way to roll back the control MSR on
   the CPUs which already set the MSR before the failure.

2. If the sysfs interface is added successfully, there will be a mismatch
   between the global control value and the control MSR:
   - The interface shows the default global control value, but the
     control MSR is not set to that value because the CPU online
     function, which is supposed to set the MSR, is not installed.
   - If the sysadmin changes the global control value through the
     interface, the control MSR on all currently online CPUs is set to
     the new value. However, the control MSR on CPUs that come online
     after the change is not set to the new value because the CPU
     online function is missing.

3. On resume from suspend/hibernation, the boot CPU restores the control
   MSR to the global control value through the syscore ops, but the
   control MSR on the APs is not set due to the lack of the CPU online
   function.

To solve these issues and enforce consistent behavior when the CPU
hotplug setup fails, make the following changes:

1. Cache the original control MSR value, which is configured by
   hardware or BIOS before kernel boot. This value is likely to be 0,
   but it could be a different number as well. Cache the control MSR
   only once, before the MSR is changed.
2. Add a CPU offline function so that the MSR is restored to the
   original control value on all CPUs on failure.
3. On failure, exit from umwait_init() so that the syscore ops and
   the control interfaces are not added.
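The three changes above can be modelled in user space to show why the offline callback makes a failed setup leave every CPU at its firmware value. This is a hypothetical simulation: `sim_msr[]`, `sim_setup_state()` and the injected failure stand in for the per-CPU MSR and cpuhp_setup_state(), and the sketch assumes, as the commit does, that a failed setup runs the teardown callback on CPUs whose startup callback already ran.

```c
#include <stdint.h>

#define NR_CPUS_SIM 4

/* Hypothetical per-CPU copies of IA32_UMWAIT_CONTROL. */
static uint32_t sim_msr[NR_CPUS_SIM];
/* Firmware value, cached once before any write (orig_umwait_control_cached). */
static uint32_t orig_ctrl;
/* Desired system-wide value (umwait_control_cached). */
static const uint32_t global_ctrl = 100000;

static int sim_cpu_online(int cpu)
{
	sim_msr[cpu] = global_ctrl;
	return 0;
}

static int sim_cpu_offline(int cpu)
{
	sim_msr[cpu] = orig_ctrl;	/* restore the firmware value */
	return 0;
}

/*
 * Stand-in for cpuhp_setup_state(): run the online callback CPU by CPU.
 * If CPU fail_cpu fails (pass -1 for no failure), run the offline
 * callback on every CPU that already succeeded, in reverse order.
 */
static int sim_setup_state(int fail_cpu)
{
	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++) {
		if (cpu == fail_cpu) {
			for (int done = cpu - 1; done >= 0; done--)
				sim_cpu_offline(done);
			return -1;
		}
		sim_cpu_online(cpu);
	}
	return 0;
}

/* Model of the fixed umwait_init(): cache first, bail out on failure. */
static int sim_umwait_init(uint32_t firmware_value, int fail_cpu)
{
	int ret;

	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++)
		sim_msr[cpu] = firmware_value;

	orig_ctrl = sim_msr[0];		/* like rdmsrl() on the boot CPU */

	ret = sim_setup_state(fail_cpu);
	if (ret < 0)
		return ret;		/* syscore ops and sysfs not registered */

	return 0;
}
```

Injecting a failure on CPU 2 leaves all four simulated MSRs at the firmware value, which is the consistent state the patch aims for.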

Reported-by: Valdis Kletnieks 
Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Signed-off-by: Thomas Gleixner 
Link: https://lkml.kernel.org/r/1565401237-60936-1-git-send-email-fenghua...@intel.com

---
 arch/x86/kernel/cpu/umwait.c | 39 ++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index 6a204e7336c1..32b4dc9030aa 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -17,6 +17,12 @@
  */
 static u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLE);
 
+/*
+ * Cache the original IA32_UMWAIT_CONTROL MSR value which is configured by
+ * hardware or BIOS before kernel boot.
+ */
+static u32 orig_umwait_control_cached __ro_after_init;
+
 /*
  * Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR in
  * the sysfs write functions.
@@ -52,6 +58,23 @@ static int umwait_cpu_online(unsigned int cpu)
 	return 0;
 }
 
+/*
+ * The CPU hotplug callback sets the control MSR to the original control
+ * value.
+ */
+static int umwait_cpu_offline(unsigned int cpu)
+{
+	/*
+	 * This code is protected by the CPU hotplug already and
+	 * orig_umwait_control_cached is never changed after it caches
+	 * the original control MSR value in umwait_init(). So there
+	 * is no race condition here.
+	 */
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, orig_umwait_control_cached, 0);
+
+	return 0;
+}
+
 /*
  * On resume, restore IA32_UMWAIT_CONTROL MSR on the boot processor which
  * is the only active CPU at this time. The MSR is set up on the APs via the
@@ -185,8 +208,22 @@ static int __init umwait_init(void)
 	if (!boot_cpu_has(X86_FEATURE_WAITPKG))
 		return -ENODEV;
 
+	/*
+	 * Cache the original control MSR value before the control MSR is
+	 * changed. This is the only place where orig_umwait_control_cached
+	 * is modified.
+	 */
+	rdmsrl(MSR_IA32_UMWAIT_CONTROL, orig_umwait_control_cached);
+
 	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait:online",
-				umwait_cpu_online, NULL);
+				umwait_cpu_online, umwait_cpu_offline);
+	if (ret < 0) {
+		/*
+		 * On failure, the control MSR on all CPUs has the
+		 * original control value.
+		 */
+		return ret;
+	}
 
 	register_syscore_ops(&umwait_syscore_ops);

[tip:x86/cpu] x86/umwait: Add sysfs interface to control umwait maximum time

2019-06-23 Thread tip-bot for Fenghua Yu

Commit-ID:  bd9a0c97e53c3d7a56b2751179903ddc5da42683
Gitweb: https://git.kernel.org/tip/bd9a0c97e53c3d7a56b2751179903ddc5da42683
Author: Fenghua Yu 
AuthorDate: Wed, 19 Jun 2019 18:33:57 -0700
Committer:  Thomas Gleixner 
CommitDate: Mon, 24 Jun 2019 01:44:20 +0200

x86/umwait: Add sysfs interface to control umwait maximum time

IA32_UMWAIT_CONTROL[31:2] determines the maximum time in TSC-quanta
that the processor can stay in C0.1 or C0.2. A zero value means no
maximum time.
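To relate a max_time value to wall-clock time, the conversion can be sketched as below. This is a hypothetical helper: `tsc_khz` is an assumed input, because the actual TSC frequency is platform specific.

```c
#include <stdint.h>

/*
 * Convert a umwait max_time (in TSC quanta) to nanoseconds for a TSC
 * running at tsc_khz kHz (hypothetical helper, tsc_khz is an assumption).
 * seconds = quanta / (tsc_khz * 1000), so ns = quanta * 1e6 / tsc_khz.
 */
static uint64_t tsc_quanta_to_ns(uint32_t quanta, uint64_t tsc_khz)
{
	return (uint64_t)quanta * 1000000ULL / tsc_khz;
}
```

For example, 100000 quanta on an assumed 2 GHz TSC (tsc_khz = 2000000) is 50000 ns, i.e. 50 microseconds.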

Each instruction sets its own deadline in the instruction's implicit
input EDX:EAX value. The instruction wakes up if the time-stamp counter
reaches or exceeds the specified deadline, or the umwait maximum time
expires, or a store happens in the monitored address range in umwait.
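The timing part of this wake-up rule can be modelled as taking the earlier of the instruction's own deadline and the start time plus the system-wide limit. This is a simplified, hypothetical model: `effective_wakeup_tsc()` is not a kernel function, it assumes the limit is counted from the start of the wait, and it ignores early wake-ups from stores, interrupts and TSC wraparound.

```c
#include <stdint.h>

/*
 * Hypothetical model: TSC value at which a wait that starts at `now` with
 * explicit EDX:EAX `deadline` wakes, given the system-wide `max_time`
 * from IA32_UMWAIT_CONTROL[31:2]. max_time == 0 means no limit.
 */
static uint64_t effective_wakeup_tsc(uint64_t now, uint64_t deadline,
				     uint32_t max_time)
{
	uint64_t limit;

	if (max_time == 0)
		return deadline;	/* only the instruction's deadline */

	limit = now + max_time;
	return deadline < limit ? deadline : limit;
}
```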

The administrator can write an unsigned 32-bit number to
/sys/devices/system/cpu/umwait_control/max_time to change the default
value. Note that a value of zero means there is no limit. The lower two
bits of the value must be zero.
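A user-space approximation of the checks max_time_store() applies in this patch: parse an unsigned 32-bit number with automatic base detection (as kstrtou32(buf, 0, ...) does) and reject values whose bits [1:0] are set. `parse_max_time()` and `UMWAIT_TIME_MASK` are illustrative names, and the parsing rules only approximate kstrtou32().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirrors MSR_IA32_UMWAIT_CONTROL_TIME_MASK from msr-index.h. */
#define UMWAIT_TIME_MASK (~0x03U)

/* Returns 0 and stores the value in *out, or a negative errno. */
static int parse_max_time(const char *buf, uint32_t *out)
{
	char *end;
	unsigned long long v;

	if (buf[0] < '0' || buf[0] > '9')	/* no sign or whitespace */
		return -EINVAL;

	errno = 0;
	v = strtoull(buf, &end, 0);		/* base 0: 10, 0x.. or 0.. */
	if (errno == ERANGE || v > UINT32_MAX)
		return -ERANGE;
	if (*end == '\n')			/* sysfs writes often end in '\n' */
		end++;
	if (*end != '\0')
		return -EINVAL;

	if ((uint32_t)v & ~UMWAIT_TIME_MASK)	/* bits[1:0] must be zero */
		return -EINVAL;

	*out = (uint32_t)v;
	return 0;
}
```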

[ tglx: Simplify the write function. Massage changelog ]

Signed-off-by: Fenghua Yu 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Ashok Raj 
Reviewed-by: Tony Luck 
Cc: "Borislav Petkov" 
Cc: "H Peter Anvin" 
Cc: "Andy Lutomirski" 
Cc: "Peter Zijlstra" 
Cc: "Ravi V Shankar" 
Link: https://lkml.kernel.org/r/1560994438-235698-5-git-send-email-fenghua...@intel.com

---
 arch/x86/kernel/cpu/umwait.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index 56149d630e35..6a204e7336c1 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -131,8 +131,44 @@ static ssize_t enable_c02_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(enable_c02);
 
+static ssize_t
+max_time_show(struct device *kobj, struct device_attribute *attr, char *buf)
+{
+	u32 ctrl = READ_ONCE(umwait_control_cached);
+
+	return sprintf(buf, "%u\n", umwait_ctrl_max_time(ctrl));
+}
+
+static ssize_t max_time_store(struct device *kobj,
+			      struct device_attribute *attr,
+			      const char *buf, size_t count)
+{
+	u32 max_time, ctrl;
+	int ret;
+
+	ret = kstrtou32(buf, 0, &max_time);
+	if (ret)
+		return ret;
+
+	/* bits[1:0] must be zero */
+	if (max_time & ~MSR_IA32_UMWAIT_CONTROL_TIME_MASK)
+		return -EINVAL;
+
+	mutex_lock(&umwait_lock);
+
+	ctrl = READ_ONCE(umwait_control_cached);
+	if (max_time != umwait_ctrl_max_time(ctrl))
+		umwait_update_control(max_time, umwait_ctrl_c02_enabled(ctrl));
+
+	mutex_unlock(&umwait_lock);
+
+	return count;
+}
+static DEVICE_ATTR_RW(max_time);
+
 static struct attribute *umwait_attrs[] = {
 	&dev_attr_enable_c02.attr,
+	&dev_attr_max_time.attr,
 	NULL
 };

[tip:x86/cpu] x86/cpufeatures: Enumerate user wait instructions

2019-06-23 Thread tip-bot for Fenghua Yu

Commit-ID:  6dbbf5ec9e1e9f607a4c51266d0f9a63ba754b63
Gitweb: https://git.kernel.org/tip/6dbbf5ec9e1e9f607a4c51266d0f9a63ba754b63
Author: Fenghua Yu 
AuthorDate: Wed, 19 Jun 2019 18:33:54 -0700
Committer:  Thomas Gleixner 
CommitDate: Mon, 24 Jun 2019 01:44:19 +0200

x86/cpufeatures: Enumerate user wait instructions

umonitor, umwait, and tpause are a set of user wait instructions.

umonitor arms address monitoring hardware using an address. The
address range is determined by using CPUID.0x5. A store to
an address within the specified address range triggers the
monitoring hardware to wake up the processor waiting in umwait.

umwait instructs the processor to enter an implementation-dependent
optimized state while monitoring a range of addresses. The optimized
state may be either a light-weight power/performance optimized state
(C0.1 state) or an improved power/performance optimized state
(C0.2 state).

tpause instructs the processor to enter an implementation-dependent
optimized state (C0.1 or C0.2) and wake up when the time-stamp counter
reaches the specified timeout.

The three instructions may be executed at any privilege level.

The instructions provide a power-saving method while waiting in
user space. Additionally, they can allow a sibling hyperthread to
make faster progress while this thread is waiting. One example of
application usage of umwait is waiting for input data from another
application, such as a user-level multi-threaded packet processing
engine.

Availability of the user wait instructions is indicated by the presence
of the CPUID feature flag WAITPKG (CPUID.0x07.0x0:ECX[5]).
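In user space, the same bit can be tested directly. The sketch below assumes GCC or Clang, whose `<cpuid.h>` provides `__get_cpuid_count()`; the kernel itself uses boot_cpu_has(X86_FEATURE_WAITPKG) instead. The helper names are hypothetical, and non-x86 builds simply report the feature as absent.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

#define WAITPKG_ECX_BIT 5	/* CPUID.(EAX=07H,ECX=0):ECX[5] */

/* Pure helper: test the WAITPKG bit in a CPUID.7.0 ECX value. */
static bool ecx_has_waitpkg(uint32_t ecx)
{
	return (ecx >> WAITPKG_ECX_BIT) & 1;
}

/* Query the running CPU; always false on non-x86 builds. */
static bool cpu_has_waitpkg(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* Returns 0 if leaf 7 is above the CPU's maximum basic leaf. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return ecx_has_waitpkg(ecx);
#else
	return false;
#endif
}
```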

Detailed information on the instructions and CPUID feature WAITPKG flag
can be found in the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference and Intel 64 and IA-32
Architectures Software Developer's Manual.

Signed-off-by: Fenghua Yu 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Ashok Raj 
Reviewed-by: Andy Lutomirski 
Cc: "Borislav Petkov" 
Cc: "H Peter Anvin" 
Cc: "Peter Zijlstra" 
Cc: "Tony Luck" 
Cc: "Ravi V Shankar" 
Link: https://lkml.kernel.org/r/1560994438-235698-2-git-send-email-fenghua...@intel.com

---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 8ecd9fac97c3..998c2cc08363 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -330,6 +330,7 @@
 #define X86_FEATURE_UMIP		(16*32+ 2) /* User Mode Instruction Protection */
 #define X86_FEATURE_PKU			(16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
+#define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */

[tip:x86/cpu] x86/umwait: Add sysfs interface to control umwait C0.2 state

2019-06-23 Thread tip-bot for Fenghua Yu

Commit-ID:  ff4b353f2ef9dc8e396d7cb9572801e34a8c7374
Gitweb: https://git.kernel.org/tip/ff4b353f2ef9dc8e396d7cb9572801e34a8c7374
Author: Fenghua Yu 
AuthorDate: Wed, 19 Jun 2019 18:33:56 -0700
Committer:  Thomas Gleixner 
CommitDate: Mon, 24 Jun 2019 01:44:20 +0200

x86/umwait: Add sysfs interface to control umwait C0.2 state

The C0.2 state in the umwait and tpause instructions can be enabled or
disabled on a processor through the IA32_UMWAIT_CONTROL MSR.

By default, C0.2 is enabled and the user wait instructions result in
lower power consumption with slower wakeup time.

But real-time systems may require faster wakeup time even though the
power savings could be smaller. There, the administrator needs to
disable C0.2 so that all umwait invocations from user applications
use C0.1.

Create a sysfs interface which allows the administrator to control the
C0.2 state at run time.

Andy Lutomirski suggested turning off local irqs before writing the MSR
to ensure the cached control value is not changed by a concurrent sysfs
write from a different CPU via IPI.
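How the control word combines the two settings can be sketched in user space. The helpers below are hypothetical mirrors of umwait_update_control(), umwait_ctrl_c02_enabled() and umwait_ctrl_max_time() from this patch, with the msr-index.h constants copied under local names: disabling C0.2 only sets bit 0 and leaves the max time in bits [31:2] untouched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the msr-index.h definitions (illustrative names). */
#define CTRL_C02_DISABLE	(1U << 0)	/* ..._CONTROL_C02_DISABLE */
#define CTRL_TIME_MASK		(~0x03U)	/* ..._CONTROL_TIME_MASK */

/* Build a control value the way umwait_update_control() does. */
static uint32_t ctrl_make(uint32_t max_time, bool c02_enable)
{
	uint32_t ctrl = max_time & CTRL_TIME_MASK;

	if (!c02_enable)
		ctrl |= CTRL_C02_DISABLE;
	return ctrl;
}

/* Bit 0 set means C0.2 is disabled. */
static bool ctrl_c02_enabled(uint32_t ctrl)
{
	return !(ctrl & CTRL_C02_DISABLE);
}

static uint32_t ctrl_max_time(uint32_t ctrl)
{
	return ctrl & CTRL_TIME_MASK;
}
```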

[ tglx: Simplified the update logic in the write function and got rid of
all the convoluted type casts. Added a shared update function and
made the namespace consistent. Moved the sysfs create invocation.
Massaged changelog ]

Signed-off-by: Fenghua Yu 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Ashok Raj 
Reviewed-by: Tony Luck 
Cc: "Borislav Petkov" 
Cc: "H Peter Anvin" 
Cc: "Andy Lutomirski" 
Cc: "Peter Zijlstra" 
Cc: "Ravi V Shankar" 
Link: https://lkml.kernel.org/r/1560994438-235698-4-git-send-email-fenghua...@intel.com

---
 arch/x86/kernel/cpu/umwait.c | 118 ---
 1 file changed, 110 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index 0a113c731df3..56149d630e35 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -7,8 +7,8 @@
 
 #define UMWAIT_C02_ENABLE  0
 
-#define UMWAIT_CTRL_VAL(maxtime, c02_disable)				\
-	(((maxtime) & MSR_IA32_UMWAIT_CONTROL_TIME_MASK) |		\
+#define UMWAIT_CTRL_VAL(max_time, c02_disable)				\
+	(((max_time) & MSR_IA32_UMWAIT_CONTROL_TIME_MASK) |		\
 	((c02_disable) & MSR_IA32_UMWAIT_CONTROL_C02_DISABLE))
 
 /*
@@ -17,10 +17,38 @@
  */
 static u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLE);
 
-/* Set IA32_UMWAIT_CONTROL MSR on this CPU to the current global setting. */
+/*
+ * Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR in
+ * the sysfs write functions.
+ */
+static DEFINE_MUTEX(umwait_lock);
+
+static void umwait_update_control_msr(void * unused)
+{
+	lockdep_assert_irqs_disabled();
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
+}
+
+/*
+ * The CPU hotplug callback sets the control MSR to the global control
+ * value.
+ *
+ * Disable interrupts so the read of umwait_control_cached and the WRMSR
+ * are protected against a concurrent sysfs write. Otherwise the sysfs
+ * write could update the cached value after it had been read on this CPU
+ * and issue the IPI before the old value had been written. The IPI would
+ * interrupt, write the new value and after return from IPI the previous
+ * value would be written by this CPU.
+ *
+ * With interrupts disabled the upcoming CPU either sees the new control
+ * value or the IPI is updating this CPU to the new control value after
+ * interrupts have been reenabled.
+ */
 static int umwait_cpu_online(unsigned int cpu)
 {
-	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+	local_irq_disable();
+	umwait_update_control_msr(NULL);
+	local_irq_enable();
 	return 0;
 }
 
@@ -36,15 +64,86 @@ static int umwait_cpu_online(unsigned int cpu)
  */
 static void umwait_syscore_resume(void)
 {
-	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+	umwait_update_control_msr(NULL);
 }
 
 static struct syscore_ops umwait_syscore_ops = {
 	.resume	= umwait_syscore_resume,
 };
 
+/* sysfs interface */
+
+/*
+ * When bit 0 in IA32_UMWAIT_CONTROL MSR is 1, C0.2 is disabled.
+ * Otherwise, C0.2 is enabled.
+ */
+static inline bool umwait_ctrl_c02_enabled(u32 ctrl)
+{
+	return !(ctrl & MSR_IA32_UMWAIT_CONTROL_C02_DISABLE);
+}
+
+static inline u32 umwait_ctrl_max_time(u32 ctrl)
+{
+	return ctrl & MSR_IA32_UMWAIT_CONTROL_TIME_MASK;
+}
+
+static inline void umwait_update_control(u32 maxtime, bool c02_enable)
+{
+	u32 ctrl = maxtime & MSR_IA32_UMWAIT_CONTROL_TIME_MASK;
+
+	if (!c02_enable)
+		ctrl |= MSR_IA32_UMWAIT_CONTROL_C02_DISABLE;
+
+	WRITE_ONCE(umwait_control_cached, ctrl);
+	/* Propagate to all CPUs */
+	on_each_cpu(umwait_update_control_msr, NULL, 1);
+}
+
+static ssize_t
+enable_c02_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	u32 ctrl =

[tip:x86/cpu] Documentation/ABI: Document umwait control sysfs interfaces

2019-06-23 Thread tip-bot for Fenghua Yu

Commit-ID:  203dffacf592317e54480704f569a09f8b7ca380
Gitweb: https://git.kernel.org/tip/203dffacf592317e54480704f569a09f8b7ca380
Author: Fenghua Yu 
AuthorDate: Wed, 19 Jun 2019 18:33:58 -0700
Committer:  Thomas Gleixner 
CommitDate: Mon, 24 Jun 2019 01:44:35 +0200

Documentation/ABI: Document umwait control sysfs interfaces

Since two new sysfs interface files are created for umwait control, add
an ABI document entry for the files:

   /sys/devices/system/cpu/umwait_control/enable_c02
   /sys/devices/system/cpu/umwait_control/max_time

[ tglx: Made the write value instructions readable ]

Signed-off-by: Fenghua Yu 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Ashok Raj 
Cc: "Borislav Petkov" 
Cc: "H Peter Anvin" 
Cc: "Andy Lutomirski" 
Cc: "Peter Zijlstra" 
Cc: "Tony Luck" 
Cc: "Ravi V Shankar" 
Link: https://lkml.kernel.org/r/1560994438-235698-6-git-send-email-fenghua...@intel.com
---
 Documentation/ABI/testing/sysfs-devices-system-cpu | 23 ++
 1 file changed, 23 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 1528239f69b2..923fe2001472 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -538,3 +538,26 @@ Description:	Intel Energy and Performance Bias Hint (EPB)
 
 		This attribute is present for all online CPUs supporting the
 		Intel EPB feature.
+
+What:		/sys/devices/system/cpu/umwait_control
+		/sys/devices/system/cpu/umwait_control/enable_c02
+		/sys/devices/system/cpu/umwait_control/max_time
+Date:		May 2019
+Contact:	Linux kernel mailing list
+Description:	Umwait control
+
+		enable_c02: Read/write interface to control umwait C0.2 state
+			Read returns C0.2 state status:
+				0: C0.2 is disabled
+				1: C0.2 is enabled
+
+			Write 'y' or '1'  or 'on' to enable C0.2 state.
+			Write 'n' or '0'  or 'off' to disable C0.2 state.
+
+			The interface is case insensitive.
+
+		max_time: Read/write interface to control umwait maximum time
+			  in TSC-quanta that the CPU can reside in either C0.1
+			  or C0.2 state. The time is an unsigned 32-bit number.
+			  Note that a value of zero means there is no limit.
+			  Low order two bits must be zero.
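The accepted enable_c02 spellings match the kernel's kstrtobool() rules, assuming the store handler parses its input with kstrtobool() (that part of the patch is not shown here). A hypothetical user-space approximation of those rules: only the first one or two characters are inspected, case-insensitively, so "yes", "Y\n" and "OFF" are all accepted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical approximation of kstrtobool(); returns 0 or -EINVAL. */
static int parse_c02_enable(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		/* "on" or "off": decided by the second character */
		switch (s[1]) {
		case 'n': case 'N':
			*res = true;
			return 0;
		case 'f': case 'F':
			*res = false;
			return 0;
		}
		break;
	}
	return -EINVAL;
}
```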

[tip:x86/cpu] x86/umwait: Initialize umwait control values

2019-06-23 Thread tip-bot for Fenghua Yu

Commit-ID:  bd688c69b7e6693de3bd78f38fd63f7850c2711e
Gitweb: https://git.kernel.org/tip/bd688c69b7e6693de3bd78f38fd63f7850c2711e
Author: Fenghua Yu 
AuthorDate: Wed, 19 Jun 2019 18:33:55 -0700
Committer:  Thomas Gleixner 
CommitDate: Mon, 24 Jun 2019 01:44:19 +0200

x86/umwait: Initialize umwait control values

umwait or tpause allows the processor to enter a light-weight
power/performance optimized state (C0.1 state) or an improved
power/performance optimized state (C0.2 state) for a period specified by
the instruction, until the system-wide time limit expires, or until a
store to the monitored address range in umwait occurs.

The IA32_UMWAIT_CONTROL MSR allows the OS to enable/disable C0.2 on
the processor and to set the maximum time the processor can reside in C0.1
or C0.2.

By default C0.2 is enabled so the user wait instructions can enter the
C0.2 state to save more power with slower wakeup time.

Andy Lutomirski proposed setting the maximum umwait time to 100000 cycles
by default. A quote from Andy:

  "What I want to avoid is the case where it works dramatically differently
   on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may
   behave a bit differently if the max timeout is hit, and I'd like that
   path to get exercised widely by making it happen even on default
   configs."

A sysfs interface to adjust the time and the C0.2 enablement is provided in
a follow-up change.

[ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to
MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as
mask throughout the code.
Massaged comments and changelog ]

Signed-off-by: Fenghua Yu 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Ashok Raj 
Reviewed-by: Andy Lutomirski 
Cc: "Borislav Petkov" 
Cc: "H Peter Anvin" 
Cc: "Peter Zijlstra" 
Cc: "Tony Luck" 
Cc: "Ravi V Shankar" 
Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua...@intel.com

---
 arch/x86/include/asm/msr-index.h |  9 ++
 arch/x86/kernel/cpu/Makefile |  1 +
 arch/x86/kernel/cpu/umwait.c | 62 
 3 files changed, 72 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 979ef971cc78..6b4fc2788078 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -61,6 +61,15 @@
 #define MSR_PLATFORM_INFO_CPUID_FAULT_BIT	31
 #define MSR_PLATFORM_INFO_CPUID_FAULT		BIT_ULL(MSR_PLATFORM_INFO_CPUID_FAULT_BIT)
 
+#define MSR_IA32_UMWAIT_CONTROL			0xe1
+#define MSR_IA32_UMWAIT_CONTROL_C02_DISABLE	BIT(0)
+#define MSR_IA32_UMWAIT_CONTROL_RESERVED	BIT(1)
+/*
+ * The time field is bit[31:2], but representing a 32bit value with
+ * bit[1:0] zero.
+ */
+#define MSR_IA32_UMWAIT_CONTROL_TIME_MASK	(~0x03U)
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x00e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index a7d9a4cb3ab6..4b4eb06e117c 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -24,6 +24,7 @@ obj-y += match.o
 obj-y  += bugs.o
 obj-y  += aperfmperf.o
 obj-y  += cpuid-deps.o
+obj-y  += umwait.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
 obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
new file mode 100644
index 000000000000..0a113c731df3
--- /dev/null
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+
+#include 
+
+#define UMWAIT_C02_ENABLE	0
+
+#define UMWAIT_CTRL_VAL(maxtime, c02_disable)				\
+	(((maxtime) & MSR_IA32_UMWAIT_CONTROL_TIME_MASK) |		\
+	((c02_disable) & MSR_IA32_UMWAIT_CONTROL_C02_DISABLE))
+
+/*
+ * Cache IA32_UMWAIT_CONTROL MSR. This is a systemwide control. By default,
+ * umwait max time is 100000 in TSC-quanta and C0.2 is enabled
+ */
+static u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLE);
+
+/* Set IA32_UMWAIT_CONTROL MSR on this CPU to the current global setting. */
+static int umwait_cpu_online(unsigned int cpu)
+{
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+	return 0;
+}
+
+/*
+ * On resume, restore IA32_UMWAIT_CONTROL MSR on the boot processor which
+ * is the only active CPU at this time. The MSR is set up on the APs via the
+ * CPU hotplug callback.
+ *
+ * This function is invoked on resume from suspend and hibernation. On
+ * resume from suspend the restore should be not required, but we neither
+ * trust the firmware nor does it matter if the same value is written
+ * again.
+ */
+static void umwait_syscore_resume(void)
+{
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+}
+
+static struct syscore_ops

[tip:x86/cpu] x86/cpufeatures: Enumerate the new AVX512 BFLOAT16 instructions

2019-06-20 Thread tip-bot for Fenghua Yu

Commit-ID:  b302e4b176d00e1cbc80148c5d0aee36751f7480
Gitweb: https://git.kernel.org/tip/b302e4b176d00e1cbc80148c5d0aee36751f7480
Author: Fenghua Yu 
AuthorDate: Mon, 17 Jun 2019 11:00:16 -0700
Committer:  Borislav Petkov 
CommitDate: Thu, 20 Jun 2019 12:38:49 +0200

x86/cpufeatures: Enumerate the new AVX512 BFLOAT16 instructions

AVX512 BFLOAT16 instructions support the 16-bit BFLOAT16 floating-point
format (BF16) for deep learning optimization.

BF16 is a short version of the 32-bit single-precision floating-point
format (FP32) and has several advantages over the 16-bit half-precision
floating-point format (FP16). BF16 keeps FP32 accumulation after
multiplication without loss of precision, offers more than enough
range for deep learning training tasks, and doesn't need to handle
hardware exceptions.
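Because BF16 keeps the FP32 sign bit and 8-bit exponent and only shortens the mantissa to 7 bits, a BF16 value is simply the upper 16 bits of an FP32 value. The sketch below shows a software conversion with round-to-nearest-even; it is an illustration of the format, not the behavior of any particular AVX512_BF16 instruction, and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* FP32 -> BF16 with round-to-nearest-even (software illustration). */
static uint16_t fp32_to_bf16(float f)
{
	uint32_t bits;

	memcpy(&bits, &f, sizeof(bits));	/* type-pun without UB */
	if ((bits & 0x7FFFFFFFu) > 0x7F800000u)	/* NaN: keep it a quiet NaN */
		return (uint16_t)((bits >> 16) | 0x0040);

	bits += 0x7FFFu + ((bits >> 16) & 1);	/* round half to even */
	return (uint16_t)(bits >> 16);
}

/* BF16 -> FP32 is exact: the dropped mantissa bits become zero. */
static float bf16_to_fp32(uint16_t h)
{
	uint32_t bits = (uint32_t)h << 16;
	float f;

	memcpy(&f, &bits, sizeof(f));
	return f;
}
```

For instance, 1.0f (0x3F800000) becomes 0x3F80 and converts back to exactly 1.0f.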

AVX512 BFLOAT16 instructions are enumerated in CPUID.7.1:EAX[bit 5]
AVX512_BF16.

CPUID.7.1:EAX contains only feature bits. Reuse the currently empty
word 12 as a pure features word to hold the feature bits including
AVX512_BF16.
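Storing CPUID.7.1:EAX verbatim in word 12 means that EAX bit 5 becomes feature number (12*32+ 5). The sketch below shows that word/bit encoding and a cpu_has()-style lookup in a capability array. It is a hypothetical user-space mirror: `FEATURE()`, `SIM_NCAPINTS` and `sim_cpu_has()` are illustrative names, and the array size is not the kernel's NCAPINTS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature numbers are encoded as word*32 + bit, e.g. (12*32+ 5). */
#define FEATURE(word, bit)	((word) * 32 + (bit))
#define SIM_AVX512_BF16		FEATURE(12, 5)	/* CPUID.7.1:EAX[5] */
#define SIM_NCAPINTS		20		/* hypothetical array size */

static unsigned int feature_word(unsigned int f) { return f / 32; }
static unsigned int feature_bit(unsigned int f)  { return f % 32; }

/* Test feature f in a capability array indexed by feature word. */
static bool sim_cpu_has(const uint32_t caps[SIM_NCAPINTS], unsigned int f)
{
	return (caps[feature_word(f)] >> feature_bit(f)) & 1;
}
```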

Detailed information of the CPUID bit and AVX512 BFLOAT16 instructions
can be found in the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference.

 [ bp: Check CPUID(7) subleaf validity before accessing subleaf 1. ]
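The sub-leaf check exists because CPUID.7.0:EAX reports the highest valid sub-leaf of leaf 7, so sub-leaf 1 should only be read when that value is at least 1. A user-space sketch of the same logic, assuming GCC or Clang `<cpuid.h>` (`__get_cpuid_count()` and the `__cpuid_count()` macro); the function name is hypothetical. A real user-space AVX-512 check would also need to confirm OS support for the AVX-512 register state via XGETBV, which is omitted here.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

static bool cpu_has_avx512_bf16(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;		/* leaf 7 not supported */
	if (eax < 1)
		return false;		/* sub-leaf 1 not valid */

	__cpuid_count(7, 1, eax, ebx, ecx, edx);
	return (eax >> 5) & 1;		/* CPUID.7.1:EAX[5] = AVX512_BF16 */
#else
	return false;
#endif
}
```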

Signed-off-by: Fenghua Yu 
Signed-off-by: Borislav Petkov 
Cc: "Chang S. Bae" 
Cc: Frederic Weisbecker 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Jann Horn 
Cc: Masahiro Yamada 
Cc: Michael Ellerman 
Cc: Nadav Amit 
Cc: Paolo Bonzini 
Cc: Pavel Tatashin 
Cc: Peter Feiner 
Cc: Radim Krcmar 
Cc: "Rafael J. Wysocki" 
Cc: "Ravi V Shankar" 
Cc: Robert Hoo 
Cc: "Sean J Christopherson" 
Cc: Thomas Gleixner 
Cc: Thomas Lendacky 
Cc: x86 
Link: https://lkml.kernel.org/r/1560794416-217638-3-git-send-email-fenghua...@intel.com
---
 arch/x86/include/asm/cpufeature.h  | 2 +-
 arch/x86/include/asm/cpufeatures.h | 3 +++
 arch/x86/kernel/cpu/common.c   | 6 ++
 arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 403f70c2e431..58acda503817 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -23,7 +23,7 @@ enum cpuid_leafs
 	CPUID_7_0_EBX,
 	CPUID_D_1_EAX,
 	CPUID_LNX_4,
-	CPUID_DUMMY,
+	CPUID_7_1_EAX,
 	CPUID_8000_0008_EBX,
 	CPUID_6_EAX,
 	CPUID_8000_000A_EDX,
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index be858b86023a..8ecd9fac97c3 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -282,6 +282,9 @@
 #define X86_FEATURE_CQM_MBM_TOTAL	(11*32+ 2) /* LLC Total MBM monitoring */
 #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 
+/* Intel-defined CPU features, CPUID level 0x0007:1 (EAX), word 12 */
+#define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
+
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
 #define X86_FEATURE_IRPERF		(13*32+ 1) /* Instructions Retired Count */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index efb114298cfb..dad20bc891d5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -847,6 +847,12 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 		c->x86_capability[CPUID_7_0_EBX] = ebx;
 		c->x86_capability[CPUID_7_ECX] = ecx;
 		c->x86_capability[CPUID_7_EDX] = edx;
+
+		/* Check valid sub-leaf index before accessing it */
+		if (eax >= 1) {
+			cpuid_count(0x0007, 1, &eax, &ebx, &ecx, &edx);
+			c->x86_capability[CPUID_7_1_EAX] = eax;
+		}
 	}
 
 	/* Extended state features: level 0x000d */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index fa07a224e7b9..a444028d8145 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -62,6 +62,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_CQM_OCCUP_LLC,	X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_TOTAL,	X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_LOCAL,	X86_FEATURE_CQM_LLC   },
+	{ X86_FEATURE_AVX512_BF16,	X86_FEATURE_AVX512VL  },
 	{}
 };

[tip:x86/cpu] x86/cpufeatures: Combine word 11 and 12 into a new scattered features word

2019-06-20 Thread tip-bot for Fenghua Yu

Commit-ID:  acec0ce081de0c36459eea91647faf99296445a3
Gitweb: https://git.kernel.org/tip/acec0ce081de0c36459eea91647faf99296445a3
Author: Fenghua Yu 
AuthorDate: Wed, 19 Jun 2019 18:51:09 +0200
Committer:  Borislav Petkov 
CommitDate: Thu, 20 Jun 2019 12:38:44 +0200

x86/cpufeatures: Combine word 11 and 12 into a new scattered features word

It's a waste for the four X86_FEATURE_CQM_* feature bits to occupy two
whole feature words. To better utilize feature words, redefine word 11
to host scattered features and move the four X86_FEATURE_CQM_* features
into the Linux-defined word 11. More scattered features can be added to
word 11 in the future.

Rename leaf 11 in cpuid_leafs to CPUID_LNX_4 to reflect it's a
Linux-defined leaf.

Rename leaf 12 to CPUID_DUMMY, which will be replaced by a meaningful
name in the next patch when CPUID.7.1:EAX occupies word 12.

Maximum number of RMID and cache occupancy scale are retrieved from
CPUID.0xf.1 after scattered CQM features are enumerated. Carve out the
code into a separate function.

KVM doesn't support resctrl now. So it's safe to move the
X86_FEATURE_CQM_* features to scattered features word 11 for KVM.

Signed-off-by: Fenghua Yu 
Signed-off-by: Borislav Petkov 
Cc: Aaron Lewis 
Cc: Andy Lutomirski 
Cc: Babu Moger 
Cc: "Chang S. Bae" 
Cc: "Sean J Christopherson" 
Cc: Frederic Weisbecker 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Jann Horn 
Cc: Juergen Gross 
Cc: Konrad Rzeszutek Wilk 
Cc: kvm ML 
Cc: Masahiro Yamada 
Cc: Masami Hiramatsu 
Cc: Nadav Amit 
Cc: Paolo Bonzini 
Cc: Pavel Tatashin 
Cc: Peter Feiner 
Cc: "Peter Zijlstra (Intel)" 
Cc: "Radim Krčmář" 
Cc: "Rafael J. Wysocki" 
Cc: Ravi V Shankar 
Cc: Sherry Hurwitz 
Cc: Thomas Gleixner 
Cc: Thomas Lendacky 
Cc: x86 
Link: https://lkml.kernel.org/r/1560794416-217638-2-git-send-email-fenghua...@intel.com
---
 arch/x86/include/asm/cpufeature.h  |  4 ++--
 arch/x86/include/asm/cpufeatures.h | 17 ++---
 arch/x86/kernel/cpu/common.c   | 38 +++---
 arch/x86/kernel/cpu/cpuid-deps.c   |  3 +++
 arch/x86/kernel/cpu/scattered.c|  4 
 arch/x86/kvm/cpuid.h   |  2 --
 6 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 1d337c51f7e6..403f70c2e431 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -22,8 +22,8 @@ enum cpuid_leafs
 	CPUID_LNX_3,
 	CPUID_7_0_EBX,
 	CPUID_D_1_EAX,
-	CPUID_F_0_EDX,
-	CPUID_F_1_EDX,
+	CPUID_LNX_4,
+	CPUID_DUMMY,
 	CPUID_8000_0008_EBX,
 	CPUID_6_EAX,
 	CPUID_8000_000A_EDX,
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 1017b9c7dfe0..be858b86023a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -271,13 +271,16 @@
 #define X86_FEATURE_XGETBV1		(10*32+ 2) /* XGETBV with ECX = 1 instruction */
 #define X86_FEATURE_XSAVES		(10*32+ 3) /* XSAVES/XRSTORS instructions */
 
-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x000F:0 (EDX), word 11 */
-#define X86_FEATURE_CQM_LLC		(11*32+ 1) /* LLC QoS if 1 */
-
-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x000F:1 (EDX), word 12 */
-#define X86_FEATURE_CQM_OCCUP_LLC	(12*32+ 0) /* LLC occupancy monitoring */
-#define X86_FEATURE_CQM_MBM_TOTAL	(12*32+ 1) /* LLC Total MBM monitoring */
-#define X86_FEATURE_CQM_MBM_LOCAL	(12*32+ 2) /* LLC Local MBM monitoring */
+/*
+ * Extended auxiliary flags: Linux defined - for features scattered in various
+ * CPUID levels like 0xf, etc.
+ *
+ * Reuse free bits when adding new feature flags!
+ */
+#define X86_FEATURE_CQM_LLC		(11*32+ 0) /* LLC QoS if 1 */
+#define X86_FEATURE_CQM_OCCUP_LLC	(11*32+ 1) /* LLC occupancy monitoring */
+#define X86_FEATURE_CQM_MBM_TOTAL	(11*32+ 2) /* LLC Total MBM monitoring */
+#define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index fe6ed9696467..efb114298cfb 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -803,33 +803,25 @@ static void init_speculation_control(struct cpuinfo_x86 *c)
 
 static void init_cqm(struct cpuinfo_x86 *c)
 {
-	u32 eax, ebx, ecx, edx;
-
-	/* Additional Intel-defined flags: level 0x000F */
-	if (c->cpuid_level >= 0x000F) {
+	if (!cpu_has(c, X86_FEATURE_CQM_LLC)) {
+		c->x86_cache_max_rmid  = -1;
+		c->x86_cache_occ_scale = -1;
+		return;
+	}
 
-		/* QoS sub-leaf, EAX=0Fh, ECX=0 */
-		cpuid_count(0x000F, 0, &eax, &ebx, &ecx, &edx);
-		c->x86_c

[tip:x86/urgent] x86/cpufeatures: Enumerate MOVDIR64B instruction

2018-10-24 Thread tip-bot for Fenghua Yu

Commit-ID:  ace6485a03266cc3c198ce8e927a1ce0ce139699
Gitweb: https://git.kernel.org/tip/ace6485a03266cc3c198ce8e927a1ce0ce139699
Author: Fenghua Yu 
AuthorDate: Wed, 24 Oct 2018 14:57:17 -0700
Committer:  Ingo Molnar 
CommitDate: Thu, 25 Oct 2018 07:42:48 +0200

x86/cpufeatures: Enumerate MOVDIR64B instruction

MOVDIR64B moves 64 bytes as a direct store with 64-byte write atomicity.
Direct store is implemented by using write combining (WC) for writing
data directly into memory without caching the data.

For low-latency offload (e.g. Non-Volatile Memory), MOVDIR64B writes
work descriptors (and in some cases data) to device-hosted work queues
atomically and without cache pollution.

Availability of the MOVDIR64B instruction is indicated by the
presence of the CPUID feature flag MOVDIR64B (CPUID.0x07.0x0:ECX[bit 28]).

Please check the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference for more details on the
MOVDIR64B CPUID feature flag.
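As a minimal sketch (user-space illustration, not kernel code), the enumeration rule above reduces to testing bit 28 of the ECX value returned by CPUID leaf 0x07, subleaf 0x0. The helper works on a raw ECX value so it stays portable; on GCC or Clang for x86 that value can be obtained with `__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; bit position from CPUID.0x07.0x0:ECX[bit 28]. */
#define LEAF7_ECX_MOVDIR64B_BIT 28u

/* Return true if the MOVDIR64B bit is set in a raw CPUID.0x07.0x0 ECX value. */
static bool leaf7_ecx_has_movdir64b(uint32_t ecx)
{
	return (ecx >> LEAF7_ECX_MOVDIR64B_BIT) & 1u;
}
```

Keeping the bit test separate from the CPUID call makes it easy to check against known register values.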

Signed-off-by: Fenghua Yu 
Cc: Andy Lutomirski 
Cc: Ashok Raj 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Ravi V Shankar 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1540418237-125817-3-git-send-email-fenghua...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 90934ee7b79a..28c4a502b419 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -332,6 +332,7 @@
 #define X86_FEATURE_RDPID  (16*32+22) /* RDPID instruction */
 #define X86_FEATURE_CLDEMOTE   (16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI    (16*32+27) /* MOVDIRI instruction */
+#define X86_FEATURE_MOVDIR64B  (16*32+28) /* MOVDIR64B instruction */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */

[tip:x86/urgent] x86/cpufeatures: Enumerate MOVDIRI instruction

2018-10-24 Thread tip-bot for Fenghua Yu

Commit-ID:  33823f4d63f7a010653d219800539409a78ef4be
Gitweb: https://git.kernel.org/tip/33823f4d63f7a010653d219800539409a78ef4be
Author: Fenghua Yu 
AuthorDate: Wed, 24 Oct 2018 14:57:16 -0700
Committer:  Ingo Molnar 
CommitDate: Thu, 25 Oct 2018 07:42:48 +0200

x86/cpufeatures: Enumerate MOVDIRI instruction

MOVDIRI moves a doubleword or quadword from a register to memory as a
direct store, which is implemented using write combining (WC) to write
data directly into memory without caching it.

Programmable agents can handle streaming offload (e.g. high-speed packet
processing in networking). The hardware implements a doorbell (tail
pointer) register that software updates when it adds new work elements
to the streaming offload work queue.

MOVDIRI can be used for the doorbell write, which is a 4-byte or 8-byte
uncacheable write to MMIO. MOVDIRI has lower overhead than other ways
to write the doorbell.
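The doorbell pattern described above can be sketched in user space under loud assumptions: the ring layout, descriptor type and function name are hypothetical, and a plain `volatile` 4-byte store stands in for the doorbell write that a real driver would issue with MOVDIRI to the device's mapped MMIO register.

```c
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical queue depth */

/* Hypothetical work descriptor placed in the queue by software. */
struct work_desc {
	uint64_t addr;
	uint32_t len;
};

struct work_queue {
	struct work_desc ring[RING_SIZE];
	uint32_t tail;			/* next free slot, owned by software */
	volatile uint32_t *doorbell;	/* stand-in for the device tail-pointer register */
};

/*
 * Add one work element, then publish the new tail through the doorbell.
 * The doorbell store here is an ordinary volatile write; this is the spot
 * where a driver would use MOVDIRI. Flow control (waiting for the device
 * to consume old entries) and store ordering are not handled.
 */
static void wq_submit(struct work_queue *wq, uint64_t addr, uint32_t len)
{
	uint32_t slot = wq->tail % RING_SIZE;

	wq->ring[slot].addr = addr;
	wq->ring[slot].len = len;
	wq->tail++;
	*wq->doorbell = wq->tail;	/* doorbell write: new tail pointer */
}
```

Because the doorbell is a single 4-byte tail value, one small uncacheable store is enough to tell the device how far the queue has advanced.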

Availability of the MOVDIRI instruction is indicated by the presence of
the CPUID feature flag MOVDIRI (CPUID.0x07.0x0:ECX[bit 27]).

Please check the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference for more details on the
MOVDIRI CPUID feature flag.

Signed-off-by: Fenghua Yu 
Cc: Andy Lutomirski 
Cc: Ashok Raj 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Ravi V Shankar 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1540418237-125817-2-git-send-email-fenghua...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 89a048c2faec..90934ee7b79a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -331,6 +331,7 @@
 #define X86_FEATURE_LA57   (16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID  (16*32+22) /* RDPID instruction */
 #define X86_FEATURE_CLDEMOTE   (16*32+25) /* CLDEMOTE instruction */
+#define X86_FEATURE_MOVDIRI    (16*32+27) /* MOVDIRI instruction */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */

[tip:x86/urgent] x86/intel_rdt: Add Reinette as co-maintainer for RDT

2018-09-20 Thread tip-bot for Fenghua Yu

Commit-ID:  a8b3bb338e4ee4cc84a2b9a6fdf27049b84baa59
Gitweb: https://git.kernel.org/tip/a8b3bb338e4ee4cc84a2b9a6fdf27049b84baa59
Author: Fenghua Yu 
AuthorDate: Thu, 20 Sep 2018 12:37:08 -0700
Committer:  Thomas Gleixner 
CommitDate: Thu, 20 Sep 2018 21:44:35 +0200

x86/intel_rdt: Add Reinette as co-maintainer for RDT

Reinette Chatre is doing a great job enabling pseudo-locking and other
features in RDT. Add her as co-maintainer for RDT.

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Signed-off-by: Thomas Gleixner 
Acked-by: Ingo Molnar 
Acked-by: Reinette Chatre 
Cc: "H Peter Anvin" 
Cc: "Tony Luck" 
Link: https://lkml.kernel.org/r/1537472228-221799-1-git-send-email-fenghua...@intel.com

---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 091e66b60cd2..140ea6ee3ac8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12260,6 +12260,7 @@ F:  Documentation/networking/rds.txt
 
 RDT - RESOURCE ALLOCATION
 M: Fenghua Yu 
+M: Reinette Chatre 
 L: linux-kernel@vger.kernel.org
 S: Supported
 F: arch/x86/kernel/cpu/intel_rdt*

[tip:x86/urgent] x86/cpufeatures: Enumerate cldemote instruction

2018-04-25 Thread tip-bot for Fenghua Yu

Commit-ID:  9124130573950dcfc06b6a59306edfda2fc33ec7
Gitweb: https://git.kernel.org/tip/9124130573950dcfc06b6a59306edfda2fc33ec7
Author: Fenghua Yu 
AuthorDate: Mon, 23 Apr 2018 11:29:22 -0700
Committer:  Ingo Molnar 
CommitDate: Thu, 26 Apr 2018 07:31:12 +0200

x86/cpufeatures: Enumerate cldemote instruction

cldemote is a new instruction in future x86 processors. It hints
to hardware that a specified cache line should be moved ("demoted")
from the cache(s) closest to the processor core to a level more
distant from the processor core. This can make the cache line
available to other cores faster than snooping would.

Availability of the cldemote instruction is indicated by the presence
of the CPUID feature flag CLDEMOTE (CPUID.(EAX=0x7, ECX=0):ECX[bit 25]).

More details on the cldemote instruction can be found in the latest
Intel Architecture Instruction Set Extensions and Future Features
Programming Reference.

Signed-off-by: Fenghua Yu 
Signed-off-by: Thomas Gleixner 
Cc: "Ravi V Shankar" 
Cc: "H. Peter Anvin" 
Cc: "Ashok Raj" 
Link: https://lkml.kernel.org/r/1524508162-192587-1-git-send-email-fenghua...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d554c11e01ff..578793e97431 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -320,6 +320,7 @@
 #define X86_FEATURE_AVX512_VPOPCNTDQ   (16*32+14) /* POPCNT for vectors of DW/QW */
 #define X86_FEATURE_LA57   (16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID  (16*32+22) /* RDPID instruction */
+#define X86_FEATURE_CLDEMOTE   (16*32+25) /* CLDEMOTE instruction */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */
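Each `cpufeatures.h` entry packs a capability word and a bit as `word*32 + bit`, so `X86_FEATURE_CLDEMOTE` at `(16*32+25)` is bit 25 of word 16, the word that holds CPUID.(EAX=0x7, ECX=0):ECX, which is why the bit number matches the CPUID bit. The sketch below, with hypothetical helper names, unpacks that encoding and tests it against an array of 32-bit capability words. It only illustrates the arithmetic and is not the kernel's `cpu_has()` implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same encoding as cpufeatures.h: feature = word * 32 + bit. */
#define FEATURE_CLDEMOTE (16*32+25)

static inline unsigned int feature_word(unsigned int feature) { return feature / 32; }
static inline unsigned int feature_bit(unsigned int feature)  { return feature % 32; }

/* Hypothetical helper: test a feature in an array of 32-bit capability words. */
static bool caps_test(const uint32_t *caps, unsigned int feature)
{
	return (caps[feature_word(feature)] >> feature_bit(feature)) & 1u;
}
```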

[tip:x86/urgent] x86/cpufeatures: Enumerate cldemote instruction

2018-04-25 Thread tip-bot for Fenghua Yu

Commit-ID:  ec8c7206b71d46ee50a23697933dfafec8d5c426
Gitweb: https://git.kernel.org/tip/ec8c7206b71d46ee50a23697933dfafec8d5c426
Author: Fenghua Yu 
AuthorDate: Mon, 23 Apr 2018 11:29:22 -0700
Committer:  Thomas Gleixner 
CommitDate: Wed, 25 Apr 2018 10:56:24 +0200

x86/cpufeatures: Enumerate cldemote instruction

cldemote is a new instruction in future x86 processors. It hints
to hardware that a specified cache line should be moved ("demoted")
from the cache(s) closest to the processor core to a level more
distant from the processor core. This can make the cache line
available to other cores faster than snooping would.

Availability of the cldemote instruction is indicated by the presence
of the CPUID feature flag CLDEMOTE (CPUID.(EAX=0x7, ECX=0):ECX[bit 25]).

More details on the cldemote instruction can be found in the latest
Intel Architecture Instruction Set Extensions and Future Features
Programming Reference.

Signed-off-by: Fenghua Yu 
Signed-off-by: Thomas Gleixner 
Cc: "Ravi V Shankar" 
Cc: "H. Peter Anvin" 
Cc: "Ashok Raj" 
Link: https://lkml.kernel.org/r/1524508162-192587-1-git-send-email-fenghua...@intel.com

---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d554c11e01ff..578793e97431 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -320,6 +320,7 @@
 #define X86_FEATURE_AVX512_VPOPCNTDQ   (16*32+14) /* POPCNT for vectors of DW/QW */
 #define X86_FEATURE_LA57   (16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID  (16*32+22) /* RDPID instruction */
+#define X86_FEATURE_CLDEMOTE   (16*32+25) /* CLDEMOTE instruction */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */

[tip:x86/cache] x86/intel_rdt: Enumerate L2 Code and Data Prioritization (CDP) feature

2018-01-18 Thread tip-bot for Fenghua Yu

Commit-ID:  a511e7935378ef1f321456a90beae2a2632d3d83
Gitweb: https://git.kernel.org/tip/a511e7935378ef1f321456a90beae2a2632d3d83
Author: Fenghua Yu 
AuthorDate: Wed, 20 Dec 2017 14:57:21 -0800
Committer:  Thomas Gleixner 
CommitDate: Thu, 18 Jan 2018 09:33:30 +0100

x86/intel_rdt: Enumerate L2 Code and Data Prioritization (CDP) feature

L2 Code and Data Prioritization (CDP) is enumerated in
CPUID(EAX=0x10, ECX=0x2):ECX[bit 2].

Signed-off-by: Fenghua Yu 
Signed-off-by: Thomas Gleixner 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: Vikas" 
Cc: Sai Praneeth" 
Cc: Reinette" 
Link: https://lkml.kernel.org/r/1513810644-78015-4-git-send-email-fenghua...@intel.com


---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/scattered.c| 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 25b9375..67bbfaa 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -206,6 +206,7 @@
 #define X86_FEATURE_RETPOLINE  ( 7*32+12) /* Generic Retpoline mitigation for Spectre variant 2 */
 #define X86_FEATURE_RETPOLINE_AMD  ( 7*32+13) /* AMD Retpoline mitigation for Spectre variant 2 */
 #define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */
+#define X86_FEATURE_CDP_L2 ( 7*32+15) /* Code and Data Prioritization L2 */
 #define X86_FEATURE_AVX512_4VNNIW  ( 7*32+16) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS  ( 7*32+17) /* AVX-512 Multiply Accumulation Single precision */
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index d0e6976..df4d8f7 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -26,6 +26,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_CAT_L3,   CPUID_EBX,  1, 0x0010, 0 },
{ X86_FEATURE_CAT_L2,   CPUID_EBX,  2, 0x0010, 0 },
{ X86_FEATURE_CDP_L3,   CPUID_ECX,  2, 0x0010, 1 },
+   { X86_FEATURE_CDP_L2,   CPUID_ECX,  2, 0x0010, 2 },
{ X86_FEATURE_MBA,  CPUID_EBX,  3, 0x0010, 0 },
{ X86_FEATURE_HW_PSTATE,    CPUID_EDX,  7, 0x80000007, 0 },
{ X86_FEATURE_CPB,  CPUID_EDX,  9, 0x80000007, 0 },
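The `cpuid_bits[]` table above handles feature bits that are scattered across CPUID leaves: each entry names a feature, a register, a bit, a leaf and a subleaf, so the new `X86_FEATURE_CDP_L2` entry reads ECX bit 2 of leaf 0x10, subleaf 2. The following sketch mimics that table-driven lookup in user space. The struct, enum and `fake_cpuid()` (which returns canned register values instead of executing CPUID) are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

enum cpuid_reg { REG_EAX, REG_EBX, REG_ECX, REG_EDX };

/* Hypothetical mirror of a scattered-bit table entry. */
struct scattered_bit {
	const char	*name;
	enum cpuid_reg	reg;
	uint8_t		bit;
	uint32_t	level;
	uint32_t	sub_leaf;
};

static const struct scattered_bit rdt_bits[] = {
	{ "cat_l3", REG_EBX, 1, 0x10, 0 },
	{ "cat_l2", REG_EBX, 2, 0x10, 0 },
	{ "cdp_l3", REG_ECX, 2, 0x10, 1 },
	{ "cdp_l2", REG_ECX, 2, 0x10, 2 },
};

/* Canned CPUID results standing in for the real instruction (assumption). */
static void fake_cpuid(uint32_t level, uint32_t sub_leaf, uint32_t regs[4])
{
	regs[REG_EAX] = regs[REG_EBX] = regs[REG_ECX] = regs[REG_EDX] = 0;
	if (level != 0x10)
		return;
	if (sub_leaf == 0)
		regs[REG_EBX] = (1u << 1) | (1u << 2);	/* L3 and L2 CAT present */
	else if (sub_leaf == 2)
		regs[REG_ECX] = 1u << 2;		/* L2 CDP present; L3 CDP absent */
}

/* Return true if the entry's bit is set in the (fake) CPUID output. */
static bool scattered_has(const struct scattered_bit *e)
{
	uint32_t regs[4];

	fake_cpuid(e->level, e->sub_leaf, regs);
	return (regs[e->reg] >> e->bit) & 1u;
}
```

A table keeps each new enumeration, such as L2 CDP, to a one-line addition rather than new code.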

[tip:x86/cache] x86/intel_rdt: Add L2CDP support in documentation

2018-01-18 Thread tip-bot for Fenghua Yu

Commit-ID:  aa55d5a4bd919f26fce519c470d11a58541c6aa7
Gitweb: https://git.kernel.org/tip/aa55d5a4bd919f26fce519c470d11a58541c6aa7
Author: Fenghua Yu 
AuthorDate: Wed, 20 Dec 2017 14:57:20 -0800
Committer:  Thomas Gleixner 
CommitDate: Thu, 18 Jan 2018 09:33:30 +0100

x86/intel_rdt: Add L2CDP support in documentation

L2 and L3 Code and Data Prioritization (CDP) can be enabled separately.
The existing mount parameter "cdp" is only for enabling L3 CDP and will be
kept for backwards compatibility.

Add a new mount parameter 'cdpl2' for L2 CDP.

[ tglx: Made changelog readable ]

Signed-off-by: Fenghua Yu 
Signed-off-by: Thomas Gleixner 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: Vikas" 
Cc: Sai Praneeth" 
Cc: Reinette" 
Link: https://lkml.kernel.org/r/1513810644-78015-3-git-send-email-fenghua...@intel.com


---
 Documentation/x86/intel_rdt_ui.txt | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
index 1ad77b1..756fd76 100644
--- a/Documentation/x86/intel_rdt_ui.txt
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -10,18 +10,21 @@ This feature is enabled by the CONFIG_INTEL_RDT Kconfig and the
 X86 /proc/cpuinfo flag bits:
 RDT (Resource Director Technology) Allocation - "rdt_a"
 CAT (Cache Allocation Technology) - "cat_l3", "cat_l2"
-CDP (Code and Data Prioritization ) - "cdp_l3"
+CDP (Code and Data Prioritization ) - "cdp_l3", "cdp_l2"
 CQM (Cache QoS Monitoring) - "cqm_llc", "cqm_occup_llc"
 MBM (Memory Bandwidth Monitoring) - "cqm_mbm_total", "cqm_mbm_local"
 MBA (Memory Bandwidth Allocation) - "mba"
 
 To use the feature mount the file system:
 
- # mount -t resctrl resctrl [-o cdp] /sys/fs/resctrl
+ # mount -t resctrl resctrl [-o cdp[,cdpl2]] /sys/fs/resctrl
 
 mount options are:
 
 "cdp": Enable code/data prioritization in L3 cache allocations.
+"cdpl2": Enable code/data prioritization in L2 cache allocations.
+
+L2 and L3 CDP are controlled separately.
 
 RDT features are orthogonal. A particular system may support only
 monitoring, only control, or both monitoring and control.

[tip:x86/cache] x86/intel_rdt: Enable L2 CDP in MSR IA32_L2_QOS_CFG

2018-01-18 Thread tip-bot for Fenghua Yu

Commit-ID:  99adde9b370de8e07ef76630c6f60dbf586cdf0e
Gitweb: https://git.kernel.org/tip/99adde9b370de8e07ef76630c6f60dbf586cdf0e
Author: Fenghua Yu 
AuthorDate: Wed, 20 Dec 2017 14:57:23 -0800
Committer:  Thomas Gleixner 
CommitDate: Thu, 18 Jan 2018 09:33:31 +0100

x86/intel_rdt: Enable L2 CDP in MSR IA32_L2_QOS_CFG

Bit 0 in MSR IA32_L2_QOS_CFG (0xc82) is the L2 CDP enable bit. By default,
the bit is zero, i.e. L2 CAT is enabled and L2 CDP is disabled. When
the resctrl mount parameter "cdpl2" is given, the bit is set to 1 and L2
CDP is enabled.

In L2 CDP mode, the L2 CAT mask MSRs are re-mapped into interleaved pairs
of mask MSRs: data masks at even mask MSR indices and code masks at odd
ones, with each CLOSID selecting one data/code pair.
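The interleaving can be made concrete with a small calculation. The sketch assumes, as the `cbm_idx_mult`/`cbm_idx_offset` fields elsewhere in this series suggest, that the mask MSR for a CLOSID is `IA32_L2_CBM_BASE + closid * 2 + offset`, with offset 0 for data and 1 for code. The enum and function names are hypothetical.

```c
#include <stdint.h>

#define IA32_L2_QOS_CFG		0xc82	/* MSR that receives l2_qos_cfg_value() */
#define IA32_L2_CBM_BASE	0xd10
#define L2_QOS_CDP_ENABLE	0x01ULL

/* Offset within each interleaved pair when L2 CDP is enabled. */
enum cdp_type { CDP_DATA = 0, CDP_CODE = 1 };

/* Mask MSR: base + closid * cbm_idx_mult + cbm_idx_offset, with mult = 2. */
static uint32_t l2_cdp_mask_msr(uint32_t closid, enum cdp_type type)
{
	return IA32_L2_CBM_BASE + closid * 2 + (uint32_t)type;
}

/* Value for IA32_L2_QOS_CFG: setting bit 0 enables L2 CDP. */
static uint64_t l2_qos_cfg_value(int enable)
{
	return enable ? L2_QOS_CDP_ENABLE : 0ULL;
}
```

For example, CLOSID 3 uses mask MSR 0xd16 for data and 0xd17 for code, which is also why only half as many CLOSIDs are available in CDP mode.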

Signed-off-by: Fenghua Yu 
Signed-off-by: Thomas Gleixner 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: Vikas" 
Cc: Sai Praneeth" 
Cc: Reinette" 
Link: https://lkml.kernel.org/r/1513810644-78015-6-git-send-email-fenghua...@intel.com


---
 arch/x86/kernel/cpu/intel_rdt.h  |   3 +
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 117 ---
 2 files changed, 94 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 19ffc5a..3fd7a70 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -7,12 +7,15 @@
 #include 
 
 #define IA32_L3_QOS_CFG        0xc81
+#define IA32_L2_QOS_CFG        0xc82
 #define IA32_L3_CBM_BASE       0xc90
 #define IA32_L2_CBM_BASE       0xd10
 #define IA32_MBA_THRTL_BASE    0xd50
 
 #define L3_QOS_CDP_ENABLE  0x01ULL
 
+#define L2_QOS_CDP_ENABLE  0x01ULL
+
 /*
  * Event IDs are used to program IA32_QM_EVTSEL before reading event
  * counter from IA32_QM_CTR
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 64c5ff9..bdab7d2 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -990,6 +990,7 @@ out_destroy:
kernfs_remove(kn);
return ret;
 }
+
 static void l3_qos_cfg_update(void *arg)
 {
bool *enable = arg;
@@ -997,8 +998,17 @@ static void l3_qos_cfg_update(void *arg)
wrmsrl(IA32_L3_QOS_CFG, *enable ? L3_QOS_CDP_ENABLE : 0ULL);
 }
 
-static int set_l3_qos_cfg(struct rdt_resource *r, bool enable)
+static void l2_qos_cfg_update(void *arg)
 {
+   bool *enable = arg;
+
+   wrmsrl(IA32_L2_QOS_CFG, *enable ? L2_QOS_CDP_ENABLE : 0ULL);
+}
+
+static int set_cache_qos_cfg(int level, bool enable)
+{
+   void (*update)(void *arg);
+   struct rdt_resource *r_l;
cpumask_var_t cpu_mask;
struct rdt_domain *d;
int cpu;
@@ -1006,16 +1016,24 @@ static int set_l3_qos_cfg(struct rdt_resource *r, bool enable)
if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
return -ENOMEM;
 
-   list_for_each_entry(d, &r->domains, list) {
+   if (level == RDT_RESOURCE_L3)
+   update = l3_qos_cfg_update;
+   else if (level == RDT_RESOURCE_L2)
+   update = l2_qos_cfg_update;
+   else
+   return -EINVAL;
+
+   r_l = &rdt_resources_all[level];
+   list_for_each_entry(d, &r_l->domains, list) {
/* Pick one CPU from each domain instance to update MSR */
cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
}
cpu = get_cpu();
/* Update QOS_CFG MSR on this cpu if it's in cpu_mask. */
if (cpumask_test_cpu(cpu, cpu_mask))
-   l3_qos_cfg_update(&enable);
+   update(&enable);
/* Update QOS_CFG MSR on all other cpus in cpu_mask. */
-   smp_call_function_many(cpu_mask, l3_qos_cfg_update, &enable, 1);
+   smp_call_function_many(cpu_mask, update, &enable, 1);
put_cpu();
 
free_cpumask_var(cpu_mask);
@@ -1023,52 +1041,99 @@ static int set_l3_qos_cfg(struct rdt_resource *r, bool enable)
return 0;
 }
 
-static int cdp_enable(void)
+static int cdp_enable(int level, int data_type, int code_type)
 {
-   struct rdt_resource *r_l3data = &rdt_resources_all[RDT_RESOURCE_L3DATA];
-   struct rdt_resource *r_l3code = &rdt_resources_all[RDT_RESOURCE_L3CODE];
-   struct rdt_resource *r_l3 = &rdt_resources_all[RDT_RESOURCE_L3];
+   struct rdt_resource *r_ldata = &rdt_resources_all[data_type];
+   struct rdt_resource *r_lcode = &rdt_resources_all[code_type];
+   struct rdt_resource *r_l = &rdt_resources_all[level];
int ret;
 
-   if (!r_l3->alloc_capable || !r_l3data->alloc_capable ||
-   !r_l3code->alloc_capable)
+   if (!r_l->alloc_capable || !r_ldata->alloc_capable ||
+   !r_lcode->alloc_capable)
return -EINVAL;
 
-   ret = set_l3_qos_cfg(r_l3, true);
+   ret = set_cache_qos_cfg(level, true);
if (!ret) {
-   r_l3->alloc_enabled = false;
-   r_l3data->alloc_enabled = true;
-   r_l3code->alloc_enabled =

[tip:x86/cache] x86/intel_rdt: Add two new resources for L2 Code and Data Prioritization (CDP)

2018-01-18 Thread tip-bot for Fenghua Yu

Commit-ID:  def10853930a82456ab862a3a8292a3a16c386e7
Gitweb: https://git.kernel.org/tip/def10853930a82456ab862a3a8292a3a16c386e7
Author: Fenghua Yu 
AuthorDate: Wed, 20 Dec 2017 14:57:22 -0800
Committer:  Thomas Gleixner 
CommitDate: Thu, 18 Jan 2018 09:33:31 +0100

x86/intel_rdt: Add two new resources for L2 Code and Data Prioritization (CDP)

L2 data and L2 code are added as new resources in rdt_resources_all[],
and their configuration data is filled in.

When L2 CDP is enabled, the schemata will have the two resources in
this format:
L2DATA:l2id0=<cbm>;l2id1=<cbm>;...
L2CODE:l2id0=<cbm>;l2id1=<cbm>;...

<cbm> represents the CBM (Cache Bit Mask) values in the schemata, as for
all other resources (L2 CAT/L3 CAT/L3 CDP).
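Assuming the per-domain format string `"%d=%0*x"` and the `data_width = (cbm_len + 3) / 4` rule from the diff below, each domain entry is the cache id followed by the CBM printed as zero-padded hex, and entries are separated by `;` as in the format shown above. The sketch builds one schemata line in user space; the function name and buffer handling are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "NAME:id0=cbm0;id1=cbm1..." using the per-domain format "%d=%0*x".
 * data_width is the number of hex digits needed for cbm_len bits.
 * Returns 0 on success, -1 if buf is too small.
 */
static int format_schemata(char *buf, size_t size, const char *name,
			   const int *ids, const unsigned int *cbms, int n,
			   unsigned int cbm_len)
{
	int data_width = (int)((cbm_len + 3) / 4);
	size_t used;
	int len;

	len = snprintf(buf, size, "%s:", name);
	if (len < 0 || (size_t)len >= size)
		return -1;
	used = (size_t)len;

	for (int i = 0; i < n; i++) {
		len = snprintf(buf + used, size - used, "%s%d=%0*x",
			       i ? ";" : "", ids[i], data_width, cbms[i]);
		if (len < 0 || (size_t)len >= size - used)
			return -1;
		used += (size_t)len;
	}
	return 0;
}
```

With an 8-bit CBM, domains 0 and 1 holding 0xff and 0x0f format as `L2DATA:0=ff;1=0f`.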

Signed-off-by: Fenghua Yu 
Signed-off-by: Thomas Gleixner 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: Vikas" 
Cc: Sai Praneeth" 
Cc: Reinette" 
Link: https://lkml.kernel.org/r/1513810644-78015-5-git-send-email-fenghua...@intel.com


---
 arch/x86/kernel/cpu/intel_rdt.c | 66 ++---
 arch/x86/kernel/cpu/intel_rdt.h |  2 ++
 2 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 9944237..5202da0 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -135,6 +135,40 @@ struct rdt_resource rdt_resources_all[] = {
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
},
+   [RDT_RESOURCE_L2DATA] =
+   {
+   .rid= RDT_RESOURCE_L2DATA,
+   .name   = "L2DATA",
+   .domains= domain_init(RDT_RESOURCE_L2DATA),
+   .msr_base   = IA32_L2_CBM_BASE,
+   .msr_update = cat_wrmsr,
+   .cache_level= 2,
+   .cache = {
+   .min_cbm_bits   = 1,
+   .cbm_idx_mult   = 2,
+   .cbm_idx_offset = 0,
+   },
+   .parse_ctrlval  = parse_cbm,
+   .format_str = "%d=%0*x",
+   .fflags = RFTYPE_RES_CACHE,
+   },
+   [RDT_RESOURCE_L2CODE] =
+   {
+   .rid= RDT_RESOURCE_L2CODE,
+   .name   = "L2CODE",
+   .domains= domain_init(RDT_RESOURCE_L2CODE),
+   .msr_base   = IA32_L2_CBM_BASE,
+   .msr_update = cat_wrmsr,
+   .cache_level= 2,
+   .cache = {
+   .min_cbm_bits   = 1,
+   .cbm_idx_mult   = 2,
+   .cbm_idx_offset = 1,
+   },
+   .parse_ctrlval  = parse_cbm,
+   .format_str = "%d=%0*x",
+   .fflags = RFTYPE_RES_CACHE,
+   },
[RDT_RESOURCE_MBA] =
{
.rid= RDT_RESOURCE_MBA,
@@ -259,15 +293,15 @@ static void rdt_get_cache_alloc_cfg(int idx, struct rdt_resource *r)
r->alloc_enabled = true;
 }
 
-static void rdt_get_cdp_l3_config(int type)
+static void rdt_get_cdp_config(int level, int type)
 {
-   struct rdt_resource *r_l3 = &rdt_resources_all[RDT_RESOURCE_L3];
+   struct rdt_resource *r_l = &rdt_resources_all[level];
struct rdt_resource *r = &rdt_resources_all[type];
 
-   r->num_closid = r_l3->num_closid / 2;
-   r->cache.cbm_len = r_l3->cache.cbm_len;
-   r->default_ctrl = r_l3->default_ctrl;
-   r->cache.shareable_bits = r_l3->cache.shareable_bits;
+   r->num_closid = r_l->num_closid / 2;
+   r->cache.cbm_len = r_l->cache.cbm_len;
+   r->default_ctrl = r_l->default_ctrl;
+   r->cache.shareable_bits = r_l->cache.shareable_bits;
r->data_width = (r->cache.cbm_len + 3) / 4;
r->alloc_capable = true;
/*
@@ -277,6 +311,18 @@ static void rdt_get_cdp_l3_config(int type)
r->alloc_enabled = false;
 }
 
+static void rdt_get_cdp_l3_config(void)
+{
+   rdt_get_cdp_config(RDT_RESOURCE_L3, RDT_RESOURCE_L3DATA);
+   rdt_get_cdp_config(RDT_RESOURCE_L3, RDT_RESOURCE_L3CODE);
+}
+
+static void rdt_get_cdp_l2_config(void)
+{
+   rdt_get_cdp_config(RDT_RESOURCE_L2, RDT_RESOURCE_L2DATA);
+   rdt_get_cdp_config(RDT_RESOURCE_L2, RDT_RESOURCE_L2CODE);
+}
+
 static int get_cache_id(int cpu, int level)
 {
struct cpu_cacheinfo *ci = get_cpu_cacheinfo(cpu);
@@ -729,15 +775,15 @@ static __init bool get_rdt_alloc_resources(void)
 
if (rdt_cpu_has(X86_FEATURE_CAT_L3)) {
rdt_get_cache_alloc_cfg(1, &rdt_resources_all[RDT_RESOURCE_L3]);
-   if (rdt_cpu_has(X86_FEATURE_CDP_L3)) {
-   rdt_get_cdp_l3_config(RDT_RESOURCE_L3DATA);
-   rdt_get_cdp_l3_config(RDT_RESOURCE_L3CODE);
-

[tip:x86/cache] x86/intel_rdt: Add command line parameter to control L2_CDP

2018-01-18 Thread tip-bot for Fenghua Yu

Commit-ID:  31516de306c0c9235156cdc7acb976ea21f1f646
Gitweb: https://git.kernel.org/tip/31516de306c0c9235156cdc7acb976ea21f1f646
Author: Fenghua Yu 
AuthorDate: Wed, 20 Dec 2017 14:57:24 -0800
Committer:  Thomas Gleixner 
CommitDate: Thu, 18 Jan 2018 09:33:32 +0100

x86/intel_rdt: Add command line parameter to control L2_CDP

L2 CDP can be controlled by the kernel parameter "rdt=".
With "rdt=l2cdp", L2 CDP is turned on.
With "rdt=!l2cdp", L2 CDP is turned off.
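As a rough illustration of how such a list can be interpreted, the sketch below walks a comma-separated option string and marks each known name as forced on, or as forced off when it is prefixed with `!`. The option table and function names are hypothetical and this is not the kernel's actual parser; unknown names are simply ignored here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical option table mirroring the names accepted by rdt=. */
struct rdt_opt {
	const char *name;
	bool force_on;
	bool force_off;
};

static struct rdt_opt opts[] = {
	{ "cmt" }, { "mbmtotal" }, { "mbmlocal" }, { "l3cat" },
	{ "l3cdp" }, { "l2cat" }, { "l2cdp" }, { "mba" },
};
#define NUM_OPTS (sizeof(opts) / sizeof(opts[0]))

/* Parse e.g. "l2cdp,!mba"; a leading '!' turns the feature off. */
static void parse_rdt_options(const char *str)
{
	while (*str) {
		const char *end = strchr(str, ',');
		size_t len = end ? (size_t)(end - str) : strlen(str);
		bool off = (len > 0 && str[0] == '!');
		const char *name = off ? str + 1 : str;
		size_t name_len = off ? len - 1 : len;

		for (size_t i = 0; i < NUM_OPTS; i++) {
			if (strlen(opts[i].name) == name_len &&
			    strncmp(name, opts[i].name, name_len) == 0) {
				if (off)
					opts[i].force_off = true;
				else
					opts[i].force_on = true;
				break;
			}
		}
		if (!end)
			break;
		str = end + 1;
	}
}

/* Look up an option by name; returns NULL if it is not in the table. */
static const struct rdt_opt *find_opt(const char *name)
{
	for (size_t i = 0; i < NUM_OPTS; i++)
		if (strcmp(opts[i].name, name) == 0)
			return &opts[i];
	return NULL;
}
```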

Signed-off-by: Fenghua Yu 
Signed-off-by: Thomas Gleixner 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: Vikas" 
Cc: Sai Praneeth" 
Cc: Reinette" 
Link: https://lkml.kernel.org/r/1513810644-78015-7-git-send-email-fenghua...@intel.com


---
 Documentation/admin-guide/kernel-parameters.txt | 3 ++-
 arch/x86/kernel/cpu/intel_rdt.c | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 46b26bf..fde058c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3682,7 +3682,8 @@
 
rdt=[HW,X86,RDT]
Turn on/off individual RDT features. List is:
-   cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, mba.
+   cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
+   mba.
E.g. to turn on cmt and turn off mba use:
rdt=cmt,!mba
 
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 5202da0..410629f 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -691,6 +691,7 @@ enum {
RDT_FLAG_L3_CAT,
RDT_FLAG_L3_CDP,
RDT_FLAG_L2_CAT,
+   RDT_FLAG_L2_CDP,
RDT_FLAG_MBA,
 };
 
@@ -713,6 +714,7 @@ static struct rdt_options rdt_options[]  __initdata = {
RDT_OPT(RDT_FLAG_L3_CAT,"l3cat",X86_FEATURE_CAT_L3),
RDT_OPT(RDT_FLAG_L3_CDP,"l3cdp",X86_FEATURE_CDP_L3),
RDT_OPT(RDT_FLAG_L2_CAT,"l2cat",X86_FEATURE_CAT_L2),
+   RDT_OPT(RDT_FLAG_L2_CDP,"l2cdp",X86_FEATURE_CDP_L2),
RDT_OPT(RDT_FLAG_MBA,   "mba",  X86_FEATURE_MBA),
 };
 #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)

[tip:x86/cache] x86/intel_rdt: Update documentation

2018-01-18 Thread tip-bot for Fenghua Yu

Commit-ID:  0ff8e080b18d1d2dbe5c866d5f31c27ab806a785
Gitweb: https://git.kernel.org/tip/0ff8e080b18d1d2dbe5c866d5f31c27ab806a785
Author: Fenghua Yu 
AuthorDate: Wed, 20 Dec 2017 14:57:19 -0800
Committer:  Thomas Gleixner 
CommitDate: Thu, 18 Jan 2018 09:33:30 +0100

x86/intel_rdt: Update documentation

With more RDT flag bits in /proc/cpuinfo, grouping the bits by feature
makes them easier to read.

Some previously missing bits are added as well.

Signed-off-by: Fenghua Yu 
Signed-off-by: Thomas Gleixner 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: Vikas" 
Cc: Sai Praneeth" 
Cc: Reinette" 
Link: https://lkml.kernel.org/r/1513810644-78015-2-git-send-email-fenghua...@intel.com


---
 Documentation/x86/intel_rdt_ui.txt | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
index 6851854..1ad77b1 100644
--- a/Documentation/x86/intel_rdt_ui.txt
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -7,7 +7,13 @@ Tony Luck 
 Vikas Shivappa 
 
 This feature is enabled by the CONFIG_INTEL_RDT Kconfig and the
-X86 /proc/cpuinfo flag bits "rdt", "cqm", "cat_l3" and "cdp_l3".
+X86 /proc/cpuinfo flag bits:
+RDT (Resource Director Technology) Allocation - "rdt_a"
+CAT (Cache Allocation Technology) - "cat_l3", "cat_l2"
+CDP (Code and Data Prioritization ) - "cdp_l3"
+CQM (Cache QoS Monitoring) - "cqm_llc", "cqm_occup_llc"
+MBM (Memory Bandwidth Monitoring) - "cqm_mbm_total", "cqm_mbm_local"
+MBA (Memory Bandwidth Allocation) - "mba"
 
 To use the feature mount the file system:

[tip:x86/cache] x86/intel_rdt: Show bitmask of shareable resource with other executing units

2017-08-01 Thread tip-bot for Fenghua Yu

Commit-ID:  0dd2d7494cd818d06a2ae1cd840cd62124a2d25e
Gitweb: http://git.kernel.org/tip/0dd2d7494cd818d06a2ae1cd840cd62124a2d25e
Author: Fenghua Yu 
AuthorDate: Tue, 25 Jul 2017 15:39:04 -0700
Committer:  Thomas Gleixner 
CommitDate: Tue, 1 Aug 2017 22:41:30 +0200

x86/intel_rdt: Show bitmask of shareable resource with other executing units

CPUID.(EAX=0x10, ECX=res#):EBX[31:0] reports a bit mask for a resource.
Each set bit within the length of the CBM indicates that the corresponding
unit of the resource allocation may be used by other entities in the
platform (e.g. an integrated graphics engine, or hardware units outside
the processor core that have direct access to the resource). Each
cleared bit within the length of the CBM indicates that the corresponding
allocation unit can be configured to implement a priority-based
allocation scheme without interference from other hardware agents in
the system. Bits outside the length of the CBM are reserved.

More details on the bit mask are described in the x86 Software
Developer's Manual.

The bitmask is shown in the "info" directory for each resource. It is
up to the user to decide how to use the bitmask within a CBM in a
partition to share or isolate a resource with other executing units.
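Following the `rdt_get_cache_alloc_cfg()` change below, the reported value is CPUID EBX masked to the CBM length. A user who wants an exclusive partition can then check a candidate CBM against it. The helper names and example values are hypothetical and only illustrate the bit arithmetic; as the documentation hunk notes, some devices have their own cache settings that can override these bits.

```c
#include <stdint.h>

/* All-ones mask covering cbm_len bits, like default_ctrl (assumes cbm_len < 32). */
static uint32_t cbm_full_mask(unsigned int cbm_len)
{
	return (1u << cbm_len) - 1;
}

/* Shareable bits: CPUID.(EAX=0x10, ECX=res#):EBX limited to the CBM length. */
static uint32_t shareable_bits(uint32_t ebx, unsigned int cbm_len)
{
	return ebx & cbm_full_mask(cbm_len);
}

/* Bits of a candidate CBM that other agents may also use (0 means no overlap). */
static uint32_t cbm_shared_overlap(uint32_t cbm, uint32_t shareable)
{
	return cbm & shareable;
}
```

With an 11-bit CBM and shareable bits 0x600, a CBM of 0x00f avoids the shared ways, while 0x700 overlaps them in 0x600.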

Suggested-by: Reinette Chatre 
Signed-off-by: Fenghua Yu 
Signed-off-by: Tony Luck 
Signed-off-by: Thomas Gleixner 
Cc: ravi.v.shan...@intel.com
Cc: pet...@infradead.org
Cc: eran...@google.com
Cc: a...@linux.intel.com
Cc: davi...@google.com
Cc: vikas.shiva...@linux.intel.com
Link: http://lkml.kernel.org/r/20170725223904.12996-1-tony.l...@intel.com

---
 Documentation/x86/intel_rdt_ui.txt   |  7 +++
 arch/x86/kernel/cpu/intel_rdt.c  |  2 ++
 arch/x86/kernel/cpu/intel_rdt.h  |  3 +++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 16 
 4 files changed, 28 insertions(+)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
index 76f21e2..4d8848e 100644
--- a/Documentation/x86/intel_rdt_ui.txt
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -48,6 +48,13 @@ related to allocation:
 "min_cbm_bits":The minimum number of consecutive bits which
must be set when writing a mask.
 
+"shareable_bits":  Bitmask of shareable resource with other executing
+   entities (e.g. I/O). User can use this when
+   setting up exclusive cache partitions. Note that
+   some platforms support devices that have their
+   own settings for cache use which can over-ride
+   these bits.
+
 Memory bandwitdh(MB) subdirectory contains the following files
 with respect to allocation:
 
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index de26aa7..da4f389 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -193,6 +193,7 @@ static inline bool cache_alloc_hsw_probe(void)
r->num_closid = 4;
r->default_ctrl = max_cbm;
r->cache.cbm_len = 20;
+   r->cache.shareable_bits = 0xc;
r->cache.min_cbm_bits = 2;
r->alloc_capable = true;
r->alloc_enabled = true;
@@ -260,6 +261,7 @@ static void rdt_get_cache_alloc_cfg(int idx, struct rdt_resource *r)
r->num_closid = edx.split.cos_max + 1;
r->cache.cbm_len = eax.split.cbm_len + 1;
r->default_ctrl = BIT_MASK(eax.split.cbm_len + 1) - 1;
+   r->cache.shareable_bits = ebx & r->default_ctrl;
r->data_width = (r->cache.cbm_len + 3) / 4;
r->alloc_capable = true;
r->alloc_enabled = true;
diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 94e488a..4040bf1 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -227,12 +227,15 @@ struct msr_param {
  * @cbm_idx_offset:Offset of CBM index. CBM index is computed by:
  * closid * cbm_idx_multi + cbm_idx_offset
  * in a cache bit mask
+ * @shareable_bits:Bitmask of shareable resource with other
+ * executing entities
  */
 struct rdt_cache {
    unsigned int    cbm_len;
    unsigned int    min_cbm_bits;
    unsigned int    cbm_idx_mult;
    unsigned int    cbm_idx_offset;
+   unsigned int    shareable_bits;
 };
 
 /**
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index c24dd06..2621ae3 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -596,6 +596,15 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
return 0;
 }
 
+static int rdt_shareable_bits_show(struct kernfs_open_file *of,
+  struct seq_file *seq, void *v)
+{
+   struct rdt_resource *r = of->kn->parent->priv;
+
+   seq_printf(seq, "%x\n", r->cache.shareable_bits);
+

[tip:x86/cache] x86/intel_rdt: Call intel_rdt_sched_in() with preemption disabled

2016-12-01 Thread tip-bot for Fenghua Yu

Commit-ID:  74fcdae1a7fdf30de5413ccc1eca271415d01124
Gitweb: http://git.kernel.org/tip/74fcdae1a7fdf30de5413ccc1eca271415d01124
Author: Fenghua Yu 
AuthorDate: Thu, 1 Dec 2016 12:55:14 -0800
Committer:  Thomas Gleixner 
CommitDate: Fri, 2 Dec 2016 01:13:02 +0100

x86/intel_rdt: Call intel_rdt_sched_in() with preemption disabled

intel_rdt_sched_in() must be called with preemption disabled because the
function accesses percpu variables (pqr_state and closid).

When a task moves itself via move_myself(), preemption is enabled, which
violates the calling convention and can result in incorrect closid
selection if the task gets preempted or migrated.

Add the required protection and a comment about the calling convention.

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "Marcelo Tosatti" 
Cc: "Sai Prakhya" 
Cc: "Vikas Shivappa" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1480625714-54246-1-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/include/asm/intel_rdt.h | 2 ++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 6e90e87..95ce5c8 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -192,6 +192,8 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
  *   resctrl file system.
  * - Caches the per cpu CLOSid values and does the MSR write only
  *   when a task with a different CLOSid is scheduled in.
+ *
+ * Must be called with preemption disabled.
  */
 static inline void intel_rdt_sched_in(void)
 {
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index fb8e03e..1afd3f3 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -326,8 +326,10 @@ static void move_myself(struct callback_head *head)
kfree(rdtgrp);
}
 
+   preempt_disable();
/* update PQR_ASSOC MSR to make resource group go into effect */
intel_rdt_sched_in();
+   preempt_enable();
 
kfree(callback);
 }

[tip:x86/cache] x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount

2016-11-28 Thread tip-bot for Fenghua Yu

Commit-ID:  0efc89be9471b152599d2db7eb47de8e0d71c59f
Gitweb: http://git.kernel.org/tip/0efc89be9471b152599d2db7eb47de8e0d71c59f
Author: Fenghua Yu 
AuthorDate: Fri, 18 Nov 2016 15:18:04 -0800
Committer:  Thomas Gleixner 
CommitDate: Mon, 28 Nov 2016 11:07:50 +0100

x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount

When a subdirectory/rdtgroup is removed by rmdir or umount, the closid of
each task in that subdirectory is set to the default rdtgroup's closid,
which is 0. If such a task is running on a CPU, the PQR_ASSOC MSR is only
updated when the task goes through a context switch. Until that context
switch, the task runs with the wrong closid.

Make the change effective immediately by invoking an SMP function call on
all CPUs that are running a moved task. If one of the affected tasks was
moved or scheduled out before the function call executes on its CPU, the
only damage is an extra interruption of that CPU.
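The targeted update can be modelled in user space. The sketch below is only an illustration under simplifying assumptions: `struct sim_task` and `move_group_tasks()` are hypothetical names, not the kernel's data structures; CPUs are represented by a 64-bit mask; and a negative source closid stands in for the teardown case where all tasks are moved. Only CPUs currently running a moved task end up in the returned mask, which is the set that would need the SMP function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified task: only the fields this sketch needs. */
struct sim_task {
	int closid;   /* resource group CLOSID, 0 = default group */
	bool on_cpu;  /* is the task currently running? */
	int cpu;      /* CPU it runs on (meaningful only if on_cpu) */
};

/*
 * Retag every task whose closid equals @from_closid with @to_closid.
 * A negative @from_closid moves all tasks (teardown case).
 * Returns a mask of the CPUs on which a moved task is running right now.
 */
static uint64_t move_group_tasks(struct sim_task *tasks, size_t n,
				 int from_closid, int to_closid)
{
	uint64_t mask = 0;

	for (size_t i = 0; i < n; i++) {
		struct sim_task *t = &tasks[i];

		if (from_closid >= 0 && t->closid != from_closid)
			continue;
		t->closid = to_closid;
		if (t->on_cpu) {
			assert(t->cpu >= 0 && t->cpu < 64);
			mask |= UINT64_C(1) << t->cpu;
		}
	}
	return mask;
}
```

Tasks that are not running are simply retagged; in this model they would pick up the new closid at their next context switch, so no CPU needs to be interrupted for them.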

[ tglx: Reworked it to avoid blindly interrupting all CPUs and extra loops ]

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "Sai Prakhya" 
Cc: "Vikas Shivappa" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1479511084-59727-2-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 113 +++
 1 file changed, 83 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index eccea8a..fb8e03e 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -194,12 +194,13 @@ static int rdtgroup_cpus_show(struct kernfs_open_file *of,
 /*
  * This is safe against intel_rdt_sched_in() called from __switch_to()
  * because __switch_to() is executed with interrupts disabled. A local call
- * from rdt_update_percpu_closid() is proteced against __switch_to() because
+ * from rdt_update_closid() is proteced against __switch_to() because
  * preemption is disabled.
  */
-static void rdt_update_cpu_closid(void *v)
+static void rdt_update_cpu_closid(void *closid)
 {
-   this_cpu_write(cpu_closid, *(int *)v);
+   if (closid)
+   this_cpu_write(cpu_closid, *(int *)closid);
/*
 * We cannot unconditionally write the MSR because the current
 * executing task might have its own closid selected. Just reuse
@@ -208,14 +209,23 @@ static void rdt_update_cpu_closid(void *v)
intel_rdt_sched_in();
 }
 
-/* Update the per cpu closid and eventually the PGR_ASSOC MSR */
-static void rdt_update_percpu_closid(const struct cpumask *cpu_mask, int closid)
+/*
+ * Update the PGR_ASSOC MSR on all cpus in @cpu_mask,
+ *
+ * Per task closids must have been set up before calling this function.
+ *
+ * The per cpu closids are updated with the smp function call, when @closid
+ * is not NULL. If @closid is NULL then all affected percpu closids must
+ * have been set up before calling this function.
+ */
+static void
+rdt_update_closid(const struct cpumask *cpu_mask, int *closid)
 {
int cpu = get_cpu();
 
if (cpumask_test_cpu(cpu, cpu_mask))
-   rdt_update_cpu_closid(&closid);
-   smp_call_function_many(cpu_mask, rdt_update_cpu_closid, &closid, 1);
+   rdt_update_cpu_closid(closid);
+   smp_call_function_many(cpu_mask, rdt_update_cpu_closid, closid, 1);
put_cpu();
 }
 
@@ -264,7 +274,7 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
/* Give any dropped cpus to rdtgroup_default */
cpumask_or(&rdtgroup_default.cpu_mask,
   &rdtgroup_default.cpu_mask, tmpmask);
-   rdt_update_percpu_closid(tmpmask, rdtgroup_default.closid);
+   rdt_update_closid(tmpmask, &rdtgroup_default.closid);
}
 
/*
@@ -278,7 +288,7 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
continue;
cpumask_andnot(&r->cpu_mask, &r->cpu_mask, tmpmask);
}
-   rdt_update_percpu_closid(tmpmask, rdtgrp->closid);
+   rdt_update_closid(tmpmask, &rdtgrp->closid);
}
 
/* Done pushing/pulling - update this group with new mask */
@@ -807,18 +817,49 @@ static int reset_all_cbms(struct rdt_resource *r)
 }
 
 /*
- * Forcibly remove all of subdirectories under root.
+ * Move tasks from one to the other group. If @from is NULL, then all tasks
+ * in the systems are moved unconditionally (used for teardown).
+ *
+ * If @mask is not NULL the cpus on which moved tasks are running are set
+ * in that mask so the update smp function call is restricted to affected
+ * cpus.
  */
-static void rmdir_all_sub(void)
+static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to,
+struct cpumask *mask)
 {
-   struct rdtgroup *rdtgrp, *tmp;
struct task_struct *p,

[tip:x86/cache] x86/intel_rdt: Fix setting of closid when adding CPUs to a group

2016-11-28 Thread tip-bot for Fenghua Yu

Commit-ID:  2659f46da8307871989f475accdcdfc4807e9e6c
Gitweb: http://git.kernel.org/tip/2659f46da8307871989f475accdcdfc4807e9e6c
Author: Fenghua Yu 
AuthorDate: Fri, 18 Nov 2016 15:18:03 -0800
Committer:  Thomas Gleixner 
CommitDate: Mon, 28 Nov 2016 11:07:50 +0100

x86/intel_rdt: Fix setting of closid when adding CPUs to a group

There was a cut & paste error in the code that updates the per-cpu
closid when the bitmask of CPUs of an rdt group is changed.

The update erroneously assigns the closid of the default group to the CPUs
which are moved to a group, instead of the closid of their new group. Use
the proper closid.
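A simplified model of the corrected assignment, assuming a fixed eight-CPU system and a plain array in place of the per-cpu `cpu_closid` variable (`sim_cpu_closid`, `sim_set_closid()` and `sim_cpus_write()` are hypothetical). Dropped CPUs return to the default group's closid 0, while newly added CPUs get the closid of the group they were added to. The sketch ignores the real code's step of pulling the added CPUs out of other groups.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_NR_CPUS 8

/* Hypothetical stand-in for the per-cpu cpu_closid variable. */
static int sim_cpu_closid[SIM_NR_CPUS];

static void sim_set_closid(uint64_t mask, int closid)
{
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			sim_cpu_closid[cpu] = closid;
}

/*
 * Apply a new CPU mask to a (non-default) group. Dropped CPUs go back to
 * the default closid 0; added CPUs get the group's own closid. The bug
 * fixed above passed the default closid for the added CPUs as well.
 */
static void sim_cpus_write(uint64_t *group_mask, int group_closid,
			   uint64_t new_mask)
{
	uint64_t dropped = *group_mask & ~new_mask;
	uint64_t added = new_mask & ~*group_mask;

	assert(group_closid != 0);
	sim_set_closid(dropped, 0);
	sim_set_closid(added, group_closid);
	*group_mask = new_mask;
}
```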

Fixes: f410770293a1 ("x86/intel_rdt: Update percpu closid immeditately on CPUs affected by change")
Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "Sai Prakhya" 
Cc: "Vikas Shivappa" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1479511084-59727-1-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 98edba4..eccea8a 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -278,7 +278,7 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
continue;
cpumask_andnot(&r->cpu_mask, &r->cpu_mask, tmpmask);
}
-   rdt_update_percpu_closid(tmpmask, rdtgroup_default.closid);
+   rdt_update_percpu_closid(tmpmask, rdtgrp->closid);
}
 
/* Done pushing/pulling - update this group with new mask */

[tip:x86/cache] x86/intel_rdt: Update percpu closid immeditately on CPUs affected by changee

2016-11-15 Thread tip-bot for Fenghua Yu

Commit-ID:  f410770293a1fbc08906474c24104a7a11943eb6
Gitweb: http://git.kernel.org/tip/f410770293a1fbc08906474c24104a7a11943eb6
Author: Fenghua Yu 
AuthorDate: Fri, 11 Nov 2016 17:02:38 -0800
Committer:  Thomas Gleixner 
CommitDate: Tue, 15 Nov 2016 18:35:50 +0100

x86/intel_rdt: Update percpu closid immeditately on CPUs affected by changee

If CPUs are moved to or removed from an rdtgroup, the percpu closid storage
is updated. If tasks running on an affected CPU use the percpu closid, then
the PQR_ASSOC MSR is only updated when the task goes through a context
switch. Until that context switch, the CPUs operate on the wrong closid.
This state is potentially unbounded.

Make the change effective immediately by invoking an SMP function call on
the affected CPUs, which stores the new closid in the percpu storage and
calls intel_rdt_sched_in(), which updates the MSR if the current task uses
the percpu closid.
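As an illustration only, the per-CPU effect of the function call can be modelled with a plain array of CPU states (`struct sim_cpu` and the `sim_*` helpers are hypothetical, and the MSR write is replaced by an assignment). The percpu closid is always updated, but the simulated PQR_ASSOC value only changes on CPUs whose current task has no closid of its own.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_NR_CPUS 4

/* Hypothetical per-CPU state; names are illustrative, not the kernel's. */
struct sim_cpu {
	int cpu_closid;   /* closid assigned to this CPU (percpu storage) */
	int curr_closid;  /* closid of the running task, 0 = none */
	int msr_closid;   /* closid currently programmed in PQR_ASSOC */
};

static struct sim_cpu sim_cpus[SIM_NR_CPUS];

/* Context switch rule: the task's own closid wins over the CPU's. */
static void sim_sched_in(struct sim_cpu *c)
{
	int closid = c->curr_closid ? c->curr_closid : c->cpu_closid;

	if (closid != c->msr_closid)
		c->msr_closid = closid;   /* stands in for wrmsr() */
}

/* What the SMP function call would do on each CPU in @mask. */
static void sim_update_percpu_closid(uint32_t mask, int closid)
{
	assert(closid >= 0);
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		if (!(mask & (UINT32_C(1) << cpu)))
			continue;
		sim_cpus[cpu].cpu_closid = closid;
		sim_sched_in(&sim_cpus[cpu]);
	}
}
```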

[ tglx: Made it work and massaged changelog once more ]

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "Sai Prakhya" 
Cc: "Vikas Shivappa" 
Cc: "Ingo Molnar" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1478912558-55514-3-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 72 
 1 file changed, 36 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index d6bad09..98edba4 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -191,12 +191,40 @@ static int rdtgroup_cpus_show(struct kernfs_open_file *of,
return ret;
 }
 
+/*
+ * This is safe against intel_rdt_sched_in() called from __switch_to()
+ * because __switch_to() is executed with interrupts disabled. A local call
+ * from rdt_update_percpu_closid() is proteced against __switch_to() because
+ * preemption is disabled.
+ */
+static void rdt_update_cpu_closid(void *v)
+{
+   this_cpu_write(cpu_closid, *(int *)v);
+   /*
+* We cannot unconditionally write the MSR because the current
+* executing task might have its own closid selected. Just reuse
+* the context switch code.
+*/
+   intel_rdt_sched_in();
+}
+
+/* Update the per cpu closid and eventually the PGR_ASSOC MSR */
+static void rdt_update_percpu_closid(const struct cpumask *cpu_mask, int closid)
+{
+   int cpu = get_cpu();
+
+   if (cpumask_test_cpu(cpu, cpu_mask))
+   rdt_update_cpu_closid(&closid);
+   smp_call_function_many(cpu_mask, rdt_update_cpu_closid, &closid, 1);
+   put_cpu();
+}
+
 static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
   char *buf, size_t nbytes, loff_t off)
 {
cpumask_var_t tmpmask, newmask;
struct rdtgroup *rdtgrp, *r;
-   int ret, cpu;
+   int ret;
 
if (!buf)
return -EINVAL;
@@ -236,8 +264,7 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
/* Give any dropped cpus to rdtgroup_default */
cpumask_or(&rdtgroup_default.cpu_mask,
   &rdtgroup_default.cpu_mask, tmpmask);
-   for_each_cpu(cpu, tmpmask)
-   per_cpu(cpu_closid, cpu) = 0;
+   rdt_update_percpu_closid(tmpmask, rdtgroup_default.closid);
}
 
/*
@@ -251,8 +278,7 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
continue;
cpumask_andnot(&r->cpu_mask, &r->cpu_mask, tmpmask);
}
-   for_each_cpu(cpu, tmpmask)
-   per_cpu(cpu_closid, cpu) = rdtgrp->closid;
+   rdt_update_percpu_closid(tmpmask, rdtgroup_default.closid);
}
 
/* Done pushing/pulling - update this group with new mask */
@@ -781,25 +807,12 @@ static int reset_all_cbms(struct rdt_resource *r)
 }
 
 /*
- * MSR_IA32_PQR_ASSOC is scoped per logical CPU, so all updates
- * are always in thread context.
- */
-static void rdt_reset_pqr_assoc_closid(void *v)
-{
-   struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
-
-   state->closid = 0;
-   wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, 0);
-}
-
-/*
  * Forcibly remove all of subdirectories under root.
  */
 static void rmdir_all_sub(void)
 {
struct rdtgroup *rdtgrp, *tmp;
struct task_struct *p, *t;
-   int cpu;
 
/* move all tasks to default resource group */
read_lock(&tasklist_lock);
@@ -807,14 +820,6 @@ static void rmdir_all_sub(void)
t->closid = 0;
read_unlock(&tasklist_lock);
 
-   get_cpu();
-   /* Reset PQR_ASSOC MSR on this cpu. */
-   rdt_reset_pqr_assoc_closid(NULL);
-   /* Reset PQR_ASSOC MSR on the rest of cpus. */
-   smp_call_function_many(cpu_online_mask, rdt_reset_pqr_assoc_closid,
-

[tip:x86/cache] x86/intel_rdt: Protect info directory from removal

2016-11-15 Thread tip-bot for Fenghua Yu

Commit-ID:  f57b308728902d9ffade53466e9201e999a870e4
Gitweb: http://git.kernel.org/tip/f57b308728902d9ffade53466e9201e999a870e4
Author: Fenghua Yu 
AuthorDate: Fri, 11 Nov 2016 17:02:36 -0800
Committer:  Thomas Gleixner 
CommitDate: Tue, 15 Nov 2016 18:35:49 +0100

x86/intel_rdt: Protect info directory from removal

The info directory and the per-resource subdirectories of the info
directory have no reference to a struct rdtgroup in kn->priv. An attempt to
remove one of those directories results in a NULL pointer dereference.

Protect the directories from removal and return -EPERM instead of -ENOENT.
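A user-space sketch of the lookup rule, assuming a minimal node structure (`struct sim_node`, `sim_to_rdtgroup()` and `sim_rmdir()` are hypothetical stand-ins for the kernfs node, kernfs_to_rdtgroup() and rdtgroup_rmdir()). Directories at or directly under "info" yield NULL, and removal is refused with -EPERM instead of dereferencing a missing rdtgroup.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernfs node. */
struct sim_node {
	int is_dir;
	struct sim_node *parent;
	void *priv;   /* rdtgroup pointer for resource directories */
};

static struct sim_node *sim_kn_info;   /* the "info" directory */

static void *sim_to_rdtgroup(struct sim_node *kn)
{
	assert(kn != NULL);
	if (kn->is_dir) {
		/* "info" and its subdirectories carry no rdtgroup. */
		if (kn == sim_kn_info || kn->parent == sim_kn_info)
			return NULL;
		return kn->priv;
	}
	/* Files inherit the rdtgroup of their directory. */
	return kn->parent->priv;
}

static int sim_rmdir(struct sim_node *kn)
{
	if (!sim_to_rdtgroup(kn))
		return -EPERM;
	return 0;   /* actual removal is elided in this sketch */
}
```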

[ tglx: Massaged changelog ]

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "Sai Prakhya" 
Cc: "Vikas Shivappa" 
Cc: "Ingo Molnar" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1478912558-55514-1-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 4795880..cff286e 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -644,16 +644,29 @@ static int parse_rdtgroupfs_options(char *data)
  */
 static struct rdtgroup *kernfs_to_rdtgroup(struct kernfs_node *kn)
 {
-   if (kernfs_type(kn) == KERNFS_DIR)
-   return kn->priv;
-   else
+   if (kernfs_type(kn) == KERNFS_DIR) {
+   /*
+* All the resource directories use "kn->priv"
+* to point to the "struct rdtgroup" for the
+* resource. "info" and its subdirectories don't
+* have rdtgroup structures, so return NULL here.
+*/
+   if (kn == kn_info || kn->parent == kn_info)
+   return NULL;
+   else
+   return kn->priv;
+   } else {
return kn->parent->priv;
+   }
 }
 
 struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn)
 {
struct rdtgroup *rdtgrp = kernfs_to_rdtgroup(kn);
 
+   if (!rdtgrp)
+   return NULL;
+
atomic_inc(&rdtgrp->waitcount);
kernfs_break_active_protection(kn);
 
@@ -670,6 +683,9 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn)
 {
struct rdtgroup *rdtgrp = kernfs_to_rdtgroup(kn);
 
+   if (!rdtgrp)
+   return;
+
mutex_unlock(&rdtgroup_mutex);
 
if (atomic_dec_and_test(&rdtgrp->waitcount) &&
@@ -918,7 +934,7 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
rdtgrp = rdtgroup_kn_lock_live(kn);
if (!rdtgrp) {
rdtgroup_kn_unlock(kn);
-   return -ENOENT;
+   return -EPERM;
}
 
/* Give any tasks back to the default group */

[tip:x86/cache] x86/intel_rdt: Reset per cpu closids on unmount

2016-11-15 Thread tip-bot for Fenghua Yu

Commit-ID:  c7cc0cc10cdecc275211c8749defba6c41aaf5de
Gitweb: http://git.kernel.org/tip/c7cc0cc10cdecc275211c8749defba6c41aaf5de
Author: Fenghua Yu 
AuthorDate: Fri, 11 Nov 2016 17:02:37 -0800
Committer:  Thomas Gleixner 
CommitDate: Tue, 15 Nov 2016 18:35:50 +0100

x86/intel_rdt: Reset per cpu closids on unmount

All CPUs in an rdtgroup are given back to the default rdtgroup before the
rdtgroup is removed during umount. After umount, the default rdtgroup
contains all online CPUs, but the per cpu closids are not cleared. As a
result, the stale closid values will be used immediately after the next
mount.

Move all cpus to the default group and update the percpu closid storage.
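The teardown order can be sketched as follows, assuming at most eight CPUs and hypothetical `struct sim_group` / `sim_unmount()` names. The default mask is built by OR-ing the groups' masks, as the patch does, rather than by copying an online mask, and the per-CPU closids of all CPUs in the default group are then reset to 0.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_NR_CPUS 8

/* Hypothetical, simplified resource group: a CPU mask and a closid. */
struct sim_group {
	uint32_t cpu_mask;
	int closid;
};

static int sim_cpu_closid[SIM_NR_CPUS];

/*
 * Give every group's CPUs back to the default group, then reset the
 * per-CPU closid of every CPU in the default group to 0.
 */
static void sim_unmount(struct sim_group *def, struct sim_group *groups,
			int ngroups)
{
	assert(ngroups >= 0);
	for (int i = 0; i < ngroups; i++) {
		def->cpu_mask |= groups[i].cpu_mask;
		groups[i].cpu_mask = 0;   /* the group is going away */
	}
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		if (def->cpu_mask & (UINT32_C(1) << cpu))
			sim_cpu_closid[cpu] = 0;
}
```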

[ tglx: Massaged changelog ]

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "Sai Prakhya" 
Cc: "Vikas Shivappa" 
Cc: "Ingo Molnar" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1478912558-55514-2-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 2f54931..d6bad09 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -799,6 +799,7 @@ static void rmdir_all_sub(void)
 {
struct rdtgroup *rdtgrp, *tmp;
struct task_struct *p, *t;
+   int cpu;
 
/* move all tasks to default resource group */
read_lock(&tasklist_lock);
@@ -813,14 +814,29 @@ static void rmdir_all_sub(void)
smp_call_function_many(cpu_online_mask, rdt_reset_pqr_assoc_closid,
   NULL, 1);
put_cpu();
+
list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) {
/* Remove each rdtgroup other than root */
if (rdtgrp == &rdtgroup_default)
continue;
+
+   /*
+* Give any CPUs back to the default group. We cannot copy
+* cpu_online_mask because a CPU might have executed the
+* offline callback already, but is still marked online.
+*/
+   cpumask_or(&rdtgroup_default.cpu_mask,
+  &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
+
kernfs_remove(rdtgrp->kn);
list_del(&rdtgrp->rdtgroup_list);
kfree(rdtgrp);
}
+
+   /* Reset all per cpu closids to the default value */
+   for_each_cpu(cpu, &rdtgroup_default.cpu_mask)
+   per_cpu(cpu_closid, cpu) = 0;
+
kernfs_remove(kn_info);
 }

[tip:x86/cache] x86/intel_rdt: Add scheduler hook

2016-10-30 Thread tip-bot for Fenghua Yu

Commit-ID:  4f341a5e48443fcc2e2d935ca990e462c02bb1a6
Gitweb: http://git.kernel.org/tip/4f341a5e48443fcc2e2d935ca990e462c02bb1a6
Author: Fenghua Yu 
AuthorDate: Fri, 28 Oct 2016 15:04:48 -0700
Committer:  Thomas Gleixner 
CommitDate: Sun, 30 Oct 2016 19:10:16 -0600

x86/intel_rdt: Add scheduler hook

Hook the x86 scheduler code to update closid based on whether the current
task is assigned to a specific closid or running on a CPU assigned to a
specific closid.
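The caching rule described in the comment for intel_rdt_sched_in() can be illustrated with a small model. This is a sketch, not the kernel code: `struct sim_pqr_state`, `sim_sched_in()` and the write counter are hypothetical, and the static key check is omitted. The task's closid takes precedence, 0 means "use the CPU's closid", and the simulated MSR write happens only when the selected closid differs from the cached one.

```c
#include <assert.h>

/* Hypothetical model of the cached PQR_ASSOC contents of one CPU. */
struct sim_pqr_state {
	int rmid;
	int closid;
};

static int sim_msr_writes;   /* counts simulated wrmsr() calls */

static void sim_sched_in(struct sim_pqr_state *state, int task_closid,
			 int cpu_closid)
{
	/* A task closid of 0 means: fall back to the CPU's closid. */
	int closid = task_closid ? task_closid : cpu_closid;

	assert(closid >= 0);
	if (closid != state->closid) {
		state->closid = closid;
		sim_msr_writes++;   /* wrmsr(MSR_IA32_PQR_ASSOC, ...) */
	}
}
```

Skipping redundant writes matters because this runs on every context switch, and an MSR write is far more expensive than comparing two cached integers.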

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "Shaohua Li" 
Cc: "Sai Prakhya" 
Cc: "Peter Zijlstra" 
Cc: "Stephane Eranian" 
Cc: "Dave Hansen" 
Cc: "David Carrillo-Cisneros" 
Cc: "Nilay Vaish" 
Cc: "Vikas Shivappa" 
Cc: "Ingo Molnar" 
Cc: "Borislav Petkov" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1477692289-37412-10-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/include/asm/intel_rdt.h | 42 
 arch/x86/kernel/cpu/intel_rdt.c  |  1 -
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c |  3 +++
 arch/x86/kernel/process_32.c |  4 +++
 arch/x86/kernel/process_64.c |  4 +++
 5 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 2e5eab0..5bc72a4 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -1,8 +1,12 @@
 #ifndef _ASM_X86_INTEL_RDT_H
 #define _ASM_X86_INTEL_RDT_H
 
+#ifdef CONFIG_INTEL_RDT_A
+
 #include 
 
+#include 
+
#define IA32_L3_QOS_CFG	0xc81
 #define IA32_L3_CBM_BASE   0xc90
 #define IA32_L2_CBM_BASE   0xd10
@@ -176,4 +180,42 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off);
 int rdtgroup_schemata_show(struct kernfs_open_file *of,
   struct seq_file *s, void *v);
+
+/*
+ * intel_rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
+ *
+ * Following considerations are made so that this has minimal impact
+ * on scheduler hot path:
+ * - This will stay as no-op unless we are running on an Intel SKU
+ *   which supports resource control and we enable by mounting the
+ *   resctrl file system.
+ * - Caches the per cpu CLOSid values and does the MSR write only
+ *   when a task with a different CLOSid is scheduled in.
+ */
+static inline void intel_rdt_sched_in(void)
+{
+   if (static_branch_likely(&rdt_enable_key)) {
+   struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
+   int closid;
+
+   /*
+* If this task has a closid assigned, use it.
+* Else use the closid assigned to this cpu.
+*/
+   closid = current->closid;
+   if (closid == 0)
+   closid = this_cpu_read(cpu_closid);
+
+   if (closid != state->closid) {
+   state->closid = closid;
+   wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, closid);
+   }
+   }
+}
+
+#else
+
+static inline void intel_rdt_sched_in(void) {}
+
+#endif /* CONFIG_INTEL_RDT_A */
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 40094ae..5a533fe 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -29,7 +29,6 @@
 #include 
 #include 
 
-#include 
 #include 
 #include 
 
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 5c4bab9..a90ad22 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -292,6 +292,9 @@ static void move_myself(struct callback_head *head)
kfree(rdtgrp);
}
 
+   /* update PQR_ASSOC MSR to make resource group go into effect */
+   intel_rdt_sched_in();
+
kfree(callback);
 }
 
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index bd7be8e..efe7f9f 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -54,6 +54,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void __show_regs(struct pt_regs *regs, int all)
 {
@@ -299,5 +300,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
this_cpu_write(current_task, next_p);
 
+   /* Load the Intel cache allocation PQR MSR. */
+   intel_rdt_sched_in();
+
return prev_p;
 }
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index b3760b3..acd7d6f 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 
 __visible DEFINE_PER_CPU(unsigned long, rsp_scratch);
 
@@ -473,6 +474,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
loadsegment(ss, __KERNEL_DS);
}
 
+   /* Load the Intel

[tip:x86/cache] MAINTAINERS: Add maintainer for Intel RDT resource allocation

2016-10-30 Thread tip-bot for Fenghua Yu

Commit-ID:  48553d103d0b63991a08980889c6a35b3e05b5e3
Gitweb: http://git.kernel.org/tip/48553d103d0b63991a08980889c6a35b3e05b5e3
Author: Fenghua Yu 
AuthorDate: Fri, 28 Oct 2016 15:04:49 -0700
Committer:  Thomas Gleixner 
CommitDate: Sun, 30 Oct 2016 19:10:17 -0600

MAINTAINERS: Add maintainer for Intel RDT resource allocation

We create five new files for Intel RDT resource allocation:
arch/x86/kernel/cpu/intel_rdt.c
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
arch/x86/kernel/cpu/intel_rdt_schemata.c
arch/x86/include/asm/intel_rdt.h
Documentation/x86/intel_rdt_ui.txt

Fenghua Yu will maintain this code.

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "Shaohua Li" 
Cc: "Sai Prakhya" 
Cc: "Peter Zijlstra" 
Cc: "Stephane Eranian" 
Cc: "Dave Hansen" 
Cc: "David Carrillo-Cisneros" 
Cc: "Nilay Vaish" 
Cc: "Vikas Shivappa" 
Cc: "Ingo Molnar" 
Cc: "Borislav Petkov" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1477692289-37412-11-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c447953..4e6a044 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10107,6 +10107,14 @@ L: linux-r...@vger.kernel.org
 S: Supported
 F: drivers/infiniband/sw/rdmavt
 
+RDT - RESOURCE ALLOCATION
+M: Fenghua Yu 
+L: linux-kernel@vger.kernel.org
+S: Supported
+F: arch/x86/kernel/cpu/intel_rdt*
+F: arch/x86/include/asm/intel_rdt*
+F: Documentation/x86/intel_rdt*
+
 READ-COPY UPDATE (RCU)
 M: "Paul E. McKenney" 
 M: Josh Triplett

[tip:x86/cache] x86/intel_rdt: Add tasks files

2016-10-30 Thread tip-bot for Fenghua Yu

Commit-ID:  e02737d5b82640497637d18428e2793bb7f02881
Gitweb: http://git.kernel.org/tip/e02737d5b82640497637d18428e2793bb7f02881
Author: Fenghua Yu 
AuthorDate: Fri, 28 Oct 2016 15:04:46 -0700
Committer:  Thomas Gleixner 
CommitDate: Sun, 30 Oct 2016 19:10:15 -0600

x86/intel_rdt: Add tasks files

The root directory and all subdirectories are automatically populated with
a read/write (mode 0644) file named "tasks". When read, it shows all the
task IDs assigned to the resource group. Tasks can be added (one at a time)
to a group by writing the task ID to the file.  E.g.

Membership in a resource group is indicated by a new field in the
task_struct, "int closid", which holds the CLOSID for each task. The default
resource group uses CLOSID=0, which means that all tasks existing when the
resctrl file system is mounted belong to the default group.

If a group is removed, tasks which are members of that group are moved to
the default group.
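A minimal model of the "tasks" file semantics, assuming a small table indexed by task ID (`sim_tasks_write()` and `sim_remove_group()` are hypothetical; the real code additionally checks permissions and defers the update through task_work). Writing a task ID sets that task's closid to the group's, an unknown ID yields -ESRCH, and removing a group returns its members to CLOSID 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical task table: index = task ID. */
#define SIM_MAX_TASKS 16
static int sim_task_closid[SIM_MAX_TASKS];
static int sim_task_exists[SIM_MAX_TASKS];

/* Model of writing one task ID into a group's "tasks" file. */
static int sim_tasks_write(const char *buf, int group_closid)
{
	char *end;
	long pid = strtol(buf, &end, 10);

	assert(group_closid >= 0);
	if (end == buf || (*end != '\0' && *end != '\n') || pid < 0)
		return -EINVAL;
	if (pid >= SIM_MAX_TASKS || !sim_task_exists[pid])
		return -ESRCH;
	sim_task_closid[pid] = group_closid;
	return 0;
}

/* Model of rmdir: members of the removed group go back to CLOSID 0. */
static void sim_remove_group(int closid)
{
	for (int pid = 0; pid < SIM_MAX_TASKS; pid++)
		if (sim_task_closid[pid] == closid)
			sim_task_closid[pid] = 0;
}
```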

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "Shaohua Li" 
Cc: "Sai Prakhya" 
Cc: "Peter Zijlstra" 
Cc: "Stephane Eranian" 
Cc: "Dave Hansen" 
Cc: "David Carrillo-Cisneros" 
Cc: "Nilay Vaish" 
Cc: "Vikas Shivappa" 
Cc: "Ingo Molnar" 
Cc: "Borislav Petkov" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1477692289-37412-8-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 180 +++
 include/linux/sched.h|   3 +
 2 files changed, 183 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index e05a186..5cc0865 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -267,6 +268,162 @@ unlock:
return ret ?: nbytes;
 }
 
+struct task_move_callback {
+   struct callback_head	work;
+   struct rdtgroup *rdtgrp;
+};
+
+static void move_myself(struct callback_head *head)
+{
+   struct task_move_callback *callback;
+   struct rdtgroup *rdtgrp;
+
+   callback = container_of(head, struct task_move_callback, work);
+   rdtgrp = callback->rdtgrp;
+
+   /*
+* If resource group was deleted before this task work callback
+* was invoked, then assign the task to root group and free the
+* resource group.
+*/
+   if (atomic_dec_and_test(&rdtgrp->waitcount) &&
+   (rdtgrp->flags & RDT_DELETED)) {
+   current->closid = 0;
+   kfree(rdtgrp);
+   }
+
+   kfree(callback);
+}
+
+static int __rdtgroup_move_task(struct task_struct *tsk,
+   struct rdtgroup *rdtgrp)
+{
+   struct task_move_callback *callback;
+   int ret;
+
+   callback = kzalloc(sizeof(*callback), GFP_KERNEL);
+   if (!callback)
+   return -ENOMEM;
+   callback->work.func = move_myself;
+   callback->rdtgrp = rdtgrp;
+
+   /*
+* Take a refcount, so rdtgrp cannot be freed before the
+* callback has been invoked.
+*/
+   atomic_inc(&rdtgrp->waitcount);
+   ret = task_work_add(tsk, &callback->work, true);
+   if (ret) {
+   /*
+* Task is exiting. Drop the refcount and free the callback.
+* No need to check the refcount as the group cannot be
+* deleted before the write function unlocks rdtgroup_mutex.
+*/
+   atomic_dec(&rdtgrp->waitcount);
+   kfree(callback);
+   } else {
+   tsk->closid = rdtgrp->closid;
+   }
+   return ret;
+}
+
+static int rdtgroup_task_write_permission(struct task_struct *task,
+ struct kernfs_open_file *of)
+{
+   const struct cred *tcred = get_task_cred(task);
+   const struct cred *cred = current_cred();
+   int ret = 0;
+
+   /*
+* Even if we're attaching all tasks in the thread group, we only
+* need to check permissions on one of them.
+*/
+   if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
+   !uid_eq(cred->euid, tcred->uid) &&
+   !uid_eq(cred->euid, tcred->suid))
+   ret = -EPERM;
+
+   put_cred(tcred);
+   return ret;
+}
+
+static int rdtgroup_move_task(pid_t pid, struct rdtgroup *rdtgrp,
+ struct kernfs_open_file *of)
+{
+   struct task_struct *tsk;
+   int ret;
+
+   rcu_read_lock();
+   if (pid) {
+   tsk = find_task_by_vpid(pid);
+   if (!tsk) {
+   rcu_read_unlock();
+   return -ESRCH;
+   }
+   } else {
+   tsk = current;
+   }
+
+   get_task_struct(tsk);
+   rcu_read_unlock();
+
+   ret = rdtgroup_task_write_permission(tsk, of);
+   if (!ret)
+   ret =

[tip:x86/cache] x86/intel_rdt: Add basic resctrl filesystem support

2016-10-30 Thread tip-bot for Fenghua Yu

Commit-ID:  5ff193fbde20df5d80fec367cea3e7856c057320
Gitweb: http://git.kernel.org/tip/5ff193fbde20df5d80fec367cea3e7856c057320
Author: Fenghua Yu 
AuthorDate: Fri, 28 Oct 2016 15:04:42 -0700
Committer:  Thomas Gleixner 
CommitDate: Sun, 30 Oct 2016 19:10:14 -0600

x86/intel_rdt: Add basic resctrl filesystem support

Use kernfs as basis for our user interface filesystem. This patch
supports mount/umount, and one mount parameter "cdp" to enable code/data
prioritization (though all we do at this point is ensure that the system
can support CDP).  The file system is not populated yet in this patch.
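Parsing a mount option string with a single known flag might look like the sketch below. This is an assumption-laden illustration: `sim_parse_options()` and its `cdp_capable` parameter are hypothetical, and it uses ISO C strtok(), which modifies its input and is not reentrant, purely to keep the example portable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/*
 * Parse a comma-separated option string. Only "cdp" is accepted, and only
 * if the (hypothetical) capability flag says CDP is supported.
 * @opts is modified in place; NULL means "no options".
 */
static int sim_parse_options(char *opts, bool cdp_capable, bool *cdp)
{
	*cdp = false;
	if (!opts)
		return 0;
	for (char *tok = strtok(opts, ","); tok; tok = strtok(NULL, ",")) {
		if (strcmp(tok, "cdp") != 0)
			return -EINVAL;   /* unknown option */
		if (!cdp_capable)
			return -EINVAL;   /* requested but not supported */
		*cdp = true;
	}
	return 0;
}
```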

[ tglx: Fixed up a few nits and added cdp handling in case of error ]

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "Shaohua Li" 
Cc: "Sai Prakhya" 
Cc: "Peter Zijlstra" 
Cc: "Stephane Eranian" 
Cc: "Dave Hansen" 
Cc: "David Carrillo-Cisneros" 
Cc: "Nilay Vaish" 
Cc: "Vikas Shivappa" 
Cc: "Ingo Molnar" 
Cc: "Borislav Petkov" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1477692289-37412-4-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/include/asm/intel_rdt.h |  26 +++
 arch/x86/kernel/cpu/Makefile |   2 +-
 arch/x86/kernel/cpu/intel_rdt.c  |   8 +-
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 271 +++
 include/uapi/linux/magic.h   |   1 +
 5 files changed, 306 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index c0d0a6e..09d00e6 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -1,9 +1,31 @@
 #ifndef _ASM_X86_INTEL_RDT_H
 #define _ASM_X86_INTEL_RDT_H
 
+#include 
+
+#define IA32_L3_QOS_CFG	0xc81
 #define IA32_L3_CBM_BASE   0xc90
 #define IA32_L2_CBM_BASE   0xd10
 
+#define L3_QOS_CDP_ENABLE  0x01ULL
+
+/**
+ * struct rdtgroup - store rdtgroup's data in resctrl file system.
+ * @kn:kernfs node
+ * @rdtgroup_list: linked list for all rdtgroups
+ * @closid:closid for this rdtgroup
+ */
+struct rdtgroup {
+   struct kernfs_node  *kn;
+   struct list_head	rdtgroup_list;
+   int closid;
+};
+
+/* List of all resource groups */
+extern struct list_head rdt_all_groups;
+
+int __init rdtgroup_init(void);
+
 /**
  * struct rdt_resource - attributes of an RDT resource
  * @enabled:   Is this feature enabled on this machine
@@ -68,6 +90,10 @@ struct msr_param {
 extern struct mutex rdtgroup_mutex;
 
 extern struct rdt_resource rdt_resources_all[];
+extern struct rdtgroup rdtgroup_default;
+DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
+
+int __init rdtgroup_init(void);
 
 enum {
RDT_RESOURCE_L3,
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index cf4bfd0..b4334e8 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -34,7 +34,7 @@ obj-$(CONFIG_CPU_SUP_CENTAUR) += centaur.o
 obj-$(CONFIG_CPU_SUP_TRANSMETA_32) += transmeta.o
 obj-$(CONFIG_CPU_SUP_UMC_32)   += umc.o
 
-obj-$(CONFIG_INTEL_RDT_A)  += intel_rdt.o
+obj-$(CONFIG_INTEL_RDT_A)  += intel_rdt.o intel_rdt_rdtgroup.o
 
 obj-$(CONFIG_X86_MCE)  += mcheck/
 obj-$(CONFIG_MTRR) += mtrr/
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 3d4b397..9d95414 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -361,7 +361,7 @@ static int intel_rdt_offline_cpu(unsigned int cpu)
 static int __init intel_rdt_late_init(void)
 {
struct rdt_resource *r;
-   int state;
+   int state, ret;
 
if (!get_rdt_resources())
return -ENODEV;
@@ -372,6 +372,12 @@ static int __init intel_rdt_late_init(void)
if (state < 0)
return state;
 
+   ret = rdtgroup_init();
+   if (ret) {
+   cpuhp_remove_state(state);
+   return ret;
+   }
+
for_each_capable_rdt_resource(r)
pr_info("Intel RDT %s allocation detected\n", r->name);
 
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
new file mode 100644
index 000..106e4ce
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -0,0 +1,271 @@
+/*
+ * User interface for Resource Allocation in Resource Director Technology (RDT)
+ *
+ * Copyright (C) 2016 Intel Corporation
+ *
+ * Author: Fenghua Yu 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General

[tip:x86/cache] x86/intel_rdt: Add mkdir to resctrl file system

2016-10-30 Thread tip-bot for Fenghua Yu

Commit-ID:  60cf5e101fd4441ab112a81e88726efb6fd7542c
Gitweb: http://git.kernel.org/tip/60cf5e101fd4441ab112a81e88726efb6fd7542c
Author: Fenghua Yu 
AuthorDate: Fri, 28 Oct 2016 15:04:44 -0700
Committer:  Thomas Gleixner 
CommitDate: Sun, 30 Oct 2016 19:10:14 -0600

x86/intel_rdt: Add mkdir to resctrl file system

Resource control groups are represented as directories in the resctrl
file system. The root directory describes the default resources available
to tasks that have not been assigned specific resources. Other directories
can be created at the root level to make new resource groups. It is not
permitted to make directories within other directories.

Hardware uses a CLOSID (Class of service ID) to determine which resource
limits are currently in effect. The exact number available is enumerated
by CPUID leaf 0x10, but on current implementations it is a small number.
We implement a simple bitmask allocator for CLOSIDs.

Each resource control group uses one CLOSID, which limits the total number
of directories that can be created.

Resource groups can be removed using rmdir.
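A minimal user-space sketch of the CLOSID bitmask allocator described above. It assumes the shared pool size is the minimum CLOSID count over all enabled resources, capped at 32 so the free map fits in one word, and that CLOSID 0 stays reserved for the default group. The function names mirror the patch below, but `first_set_bit()` (a portable stand-in for `ffs()`) and the `num_closid[]` input array are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Bitmap of free CLOSIDs: bit n set means CLOSID n is available. */
static unsigned int closid_free_map;

/* Hypothetical portable replacement for ffs(): index of the lowest
 * set bit, or -1 if the map is empty. */
static int first_set_bit(unsigned int map)
{
	for (int i = 0; i < 32; i++)
		if (map & (1u << i))
			return i;
	return -1;
}

/* num_closid[] holds the (hypothetical) CLOSID count of each enabled
 * resource; the shared pool is the smallest of them, capped at 32. */
static void closid_init(const int *num_closid, int nres)
{
	int min_closid = 32;

	for (int i = 0; i < nres; i++)
		if (num_closid[i] < min_closid)
			min_closid = num_closid[i];

	/* Avoid 1u << 32, which is undefined behaviour in C. */
	closid_free_map = (min_closid >= 32) ? ~0u : (1u << min_closid) - 1;

	/* CLOSID 0 is always reserved for the default group. */
	closid_free_map &= ~1u;
}

/* Take the lowest free CLOSID, or return -ENOSPC if none is left. */
static int closid_alloc(void)
{
	int closid = first_set_bit(closid_free_map);

	if (closid < 0)
		return -ENOSPC;
	closid_free_map &= ~(1u << closid);
	return closid;
}

/* Return a CLOSID to the pool when its group is removed (rmdir). */
static void closid_free(int closid)
{
	closid_free_map |= 1u << closid;
}
```

With 4 usable CLOSIDs only three directories (CLOSIDs 1 to 3) can be created; because allocation always takes the lowest free bit, a CLOSID released by rmdir is handed out again before any higher one.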

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "Shaohua Li" 
Cc: "Sai Prakhya" 
Cc: "Peter Zijlstra" 
Cc: "Stephane Eranian" 
Cc: "Dave Hansen" 
Cc: "David Carrillo-Cisneros" 
Cc: "Nilay Vaish" 
Cc: "Vikas Shivappa" 
Cc: "Ingo Molnar" 
Cc: "Borislav Petkov" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1477692289-37412-6-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/include/asm/intel_rdt.h |   9 ++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 231 +++
 2 files changed, 240 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 5b7b3f6..8032ace 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -14,13 +14,20 @@
  * @kn:            kernfs node
  * @rdtgroup_list: linked list for all rdtgroups
  * @closid:        closid for this rdtgroup
+ * @flags: status bits
+ * @waitcount: how many cpus expect to find this
  */
 struct rdtgroup {
struct kernfs_node  *kn;
struct list_head    rdtgroup_list;
int closid;
+   int flags;
+   atomic_t    waitcount;
 };
 
+/* rdtgroup.flags */
+#define RDT_DELETED 1
+
 /* List of all resource groups */
 extern struct list_head rdt_all_groups;
 
@@ -156,4 +163,6 @@ union cpuid_0x10_1_edx {
 };
 
 void rdt_cbm_update(void *arg);
+struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn);
+void rdtgroup_kn_unlock(struct kernfs_node *kn);
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index fbb42e7..85d31ea 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -26,10 +26,12 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
 #include 
+#include 
 
 DEFINE_STATIC_KEY_FALSE(rdt_enable_key);
 struct kernfs_root *rdt_root;
@@ -39,6 +41,55 @@ LIST_HEAD(rdt_all_groups);
 /* Kernel fs node for "info" directory under root */
 static struct kernfs_node *kn_info;
 
+/*
+ * Trivial allocator for CLOSIDs. Since h/w only supports a small number,
+ * we can keep a bitmap of free CLOSIDs in a single integer.
+ *
+ * Using a global CLOSID across all resources has some advantages and
+ * some drawbacks:
+ * + We can simply set "current->closid" to assign a task to a resource
+ *   group.
+ * + Context switch code can avoid extra memory references deciding which
+ *   CLOSID to load into the PQR_ASSOC MSR
+ * - We give up some options in configuring resource groups across multi-socket
+ *   systems.
+ * - Our choices on how to configure each resource become progressively more
+ *   limited as the number of resources grows.
+ */
+static int closid_free_map;
+
+static void closid_init(void)
+{
+   struct rdt_resource *r;
+   int rdt_min_closid = 32;
+
+   /* Compute rdt_min_closid across all resources */
+   for_each_enabled_rdt_resource(r)
+   rdt_min_closid = min(rdt_min_closid, r->num_closid);
+
+   closid_free_map = BIT_MASK(rdt_min_closid) - 1;
+
+   /* CLOSID 0 is always reserved for the default group */
+   closid_free_map &= ~1;
+}
+
+int closid_alloc(void)
+{
+   int closid = ffs(closid_free_map);
+
+   if (closid == 0)
+   return -ENOSPC;
+   closid--;
+   closid_free_map &= ~(1 << closid);
+
+   return closid;
+}
+
+static void closid_free(int closid)
+{
+   closid_free_map |= 1 << closid;
+}
+
 /* set uid and gid of rdtgroup dirs and files to that of the creator */
 static int rdtgroup_kn_set_ugid(struct kernfs_node *kn)
 {
@@ -287,6 +338,54 @@ static int parse_rdtgroupfs_options(char *data)
return ret;
 }
 
+

[tip:x86/cache] x86/intel_rdt: Add "info" files to resctrl file system

2016-10-30 Thread tip-bot for Fenghua Yu

Commit-ID:  4e978d06dedb8207b298a5a8a49fce4b2ab80d12
Gitweb: http://git.kernel.org/tip/4e978d06dedb8207b298a5a8a49fce4b2ab80d12
Author: Fenghua Yu 
AuthorDate: Fri, 28 Oct 2016 15:04:43 -0700
Committer:  Thomas Gleixner 
CommitDate: Sun, 30 Oct 2016 19:10:14 -0600

x86/intel_rdt: Add "info" files to resctrl file system

For the convenience of applications we make the decoded values of some
of the CPUID values available in read-only (0444) files.
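The table-driven pattern behind struct rftype (a file name, a mode and a show callback per file) can be sketched in user space as below. This is an illustration only: `struct info_file`, `read_info_file()` and the `num_closids` file name are hypothetical, `cbm_mask` is the file name used in the resctrl documentation, and a plain character buffer stands in for the kernfs seq_file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical decoded values that the info files expose. */
struct info_vals {
	int num_closids;
	unsigned int cbm_mask;
};

/* One entry per read-only file, loosely modeled on struct rftype:
 * a name, an access mode and a show() callback. */
struct info_file {
	const char *name;
	unsigned int mode;	/* 0444: read-only for everyone */
	int (*show)(const struct info_vals *v, char *buf, size_t len);
};

static int show_num_closids(const struct info_vals *v, char *buf, size_t len)
{
	return snprintf(buf, len, "%d\n", v->num_closids);
}

static int show_cbm_mask(const struct info_vals *v, char *buf, size_t len)
{
	return snprintf(buf, len, "%x\n", v->cbm_mask);
}

static const struct info_file info_files[] = {
	{ "num_closids", 0444, show_num_closids },
	{ "cbm_mask",    0444, show_cbm_mask },
};

/* Find a file by name and render it into buf. As in a generic
 * seq_show dispatcher, a missing callback yields empty content.
 * Returns the formatted length, or -1 for an unknown name. */
static int read_info_file(const struct info_vals *v, const char *name,
			  char *buf, size_t len)
{
	for (size_t i = 0; i < sizeof(info_files) / sizeof(info_files[0]); i++) {
		if (strcmp(info_files[i].name, name) != 0)
			continue;
		if (!info_files[i].show) {
			if (len)
				buf[0] = '\0';
			return 0;
		}
		return info_files[i].show(v, buf, len);
	}
	return -1;
}
```

Keeping the per-file behaviour in a table means adding another info file is one new entry plus one show function, with no change to the dispatcher.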

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "Shaohua Li" 
Cc: "Sai Prakhya" 
Cc: "Peter Zijlstra" 
Cc: "Stephane Eranian" 
Cc: "Dave Hansen" 
Cc: "David Carrillo-Cisneros" 
Cc: "Nilay Vaish" 
Cc: "Vikas Shivappa" 
Cc: "Ingo Molnar" 
Cc: "Borislav Petkov" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1477692289-37412-5-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/include/asm/intel_rdt.h |  24 
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 185 +++
 2 files changed, 209 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 09d00e6..5b7b3f6 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -27,6 +27,30 @@ extern struct list_head rdt_all_groups;
 int __init rdtgroup_init(void);
 
 /**
+ * struct rftype - describe each file in the resctrl file system
+ * @name: file name
+ * @mode: access mode
+ * @kf_ops: operations
+ * @seq_show: show content of the file
+ * @write: write to the file
+ */
+struct rftype {
+   char        *name;
+   umode_t mode;
+   struct kernfs_ops   *kf_ops;
+
+   int (*seq_show)(struct kernfs_open_file *of,
+   struct seq_file *sf, void *v);
+   /*
+    * write() is the generic write callback which maps directly to
+    * kernfs write operation and overrides all other operations.
+    * Maximum write size is determined by ->max_write_len.
+    */
+   ssize_t (*write)(struct kernfs_open_file *of,
+char *buf, size_t nbytes, loff_t off);
+};
+
+/**
  * struct rdt_resource - attributes of an RDT resource
  * @enabled:   Is this feature enabled on this machine
  * @capable:   Is this feature available on this machine
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 106e4ce..fbb42e7 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -23,6 +23,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 #include 
@@ -34,6 +36,176 @@ struct kernfs_root *rdt_root;
 struct rdtgroup rdtgroup_default;
 LIST_HEAD(rdt_all_groups);
 
+/* Kernel fs node for "info" directory under root */
+static struct kernfs_node *kn_info;
+
+/* set uid and gid of rdtgroup dirs and files to that of the creator */
+static int rdtgroup_kn_set_ugid(struct kernfs_node *kn)
+{
+   struct iattr iattr = { .ia_valid = ATTR_UID | ATTR_GID,
+   .ia_uid = current_fsuid(),
+   .ia_gid = current_fsgid(), };
+
+   if (uid_eq(iattr.ia_uid, GLOBAL_ROOT_UID) &&
+   gid_eq(iattr.ia_gid, GLOBAL_ROOT_GID))
+   return 0;
+
+   return kernfs_setattr(kn, &iattr);
+}
+
+static int rdtgroup_add_file(struct kernfs_node *parent_kn, struct rftype *rft)
+{
+   struct kernfs_node *kn;
+   int ret;
+
+   kn = __kernfs_create_file(parent_kn, rft->name, rft->mode,
+ 0, rft->kf_ops, rft, NULL, NULL);
+   if (IS_ERR(kn))
+   return PTR_ERR(kn);
+
+   ret = rdtgroup_kn_set_ugid(kn);
+   if (ret) {
+   kernfs_remove(kn);
+   return ret;
+   }
+
+   return 0;
+}
+
+static int rdtgroup_add_files(struct kernfs_node *kn, struct rftype *rfts,
+ int len)
+{
+   struct rftype *rft;
+   int ret;
+
+   lockdep_assert_held(&rdtgroup_mutex);
+
+   for (rft = rfts; rft < rfts + len; rft++) {
+   ret = rdtgroup_add_file(kn, rft);
+   if (ret)
+   goto error;
+   }
+
+   return 0;
+error:
+   pr_warn("Failed to add %s, err=%d\n", rft->name, ret);
+   while (--rft >= rfts)
+   kernfs_remove_by_name(kn, rft->name);
+   return ret;
+}
+
+static int rdtgroup_seqfile_show(struct seq_file *m, void *arg)
+{
+   struct kernfs_open_file *of = m->private;
+   struct rftype *rft = of->kn->priv;
+
+   if (rft->seq_show)
+   return rft->seq_show(of, m, arg);
+   return 0;
+}
+
+static ssize_t rdtgroup_file_write(struct kernfs_open_file *of, char *buf,
+  size_t nbytes, loff_t off)
+{
+   struct rftype *rft = of->kn->priv;
+
+   if (rft->write)
+   return rft->write(of, buf, nbytes, off);
+
+

[tip:x86/cache] Documentation, x86: Documentation for Intel resource allocation user interface

2016-10-30 Thread tip-bot for Fenghua Yu

Commit-ID:  f20e57892806ad244eaec7a7ae365e78fee53377
Gitweb: http://git.kernel.org/tip/f20e57892806ad244eaec7a7ae365e78fee53377
Author: Fenghua Yu 
AuthorDate: Fri, 28 Oct 2016 15:04:40 -0700
Committer:  Thomas Gleixner 
CommitDate: Sun, 30 Oct 2016 19:10:13 -0600

Documentation, x86: Documentation for Intel resource allocation user interface

This documentation describes the user interface for allocating resources
in Intel RDT.

Please note that the documentation covers the generic user interface. The
current patch set only implements CAT L3; CAT L2 code will be sent later.

[ tglx: Added cpu example ]

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "Shaohua Li" 
Cc: "Sai Prakhya" 
Cc: "Peter Zijlstra" 
Cc: "Stephane Eranian" 
Cc: "Dave Hansen" 
Cc: "David Carrillo-Cisneros" 
Cc: "Nilay Vaish" 
Cc: "Vikas Shivappa" 
Cc: "Ingo Molnar" 
Cc: "Borislav Petkov" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1477692289-37412-2-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 Documentation/x86/intel_rdt_ui.txt | 195 +
 1 file changed, 195 insertions(+)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
new file mode 100644
index 000..3b0ebd4
--- /dev/null
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -0,0 +1,195 @@
+User Interface for Resource Allocation in Intel Resource Director Technology
+
+Copyright (C) 2016 Intel Corporation
+
+Fenghua Yu 
+Tony Luck 
+
+This feature is enabled by the CONFIG_INTEL_RDT_A Kconfig and the
+X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3".
+
+To use the feature mount the file system:
+
+ # mount -t resctrl resctrl [-o cdp] /sys/fs/resctrl
+
+mount options are:
+
+"cdp": Enable code/data prioritization in L3 cache allocations.
+
+
+Resource groups
+---
+Resource groups are represented as directories in the resctrl file
+system. The default group is the root directory. Other groups may be
+created as desired by the system administrator using the "mkdir(1)"
+command, and removed using "rmdir(1)".
+
+There are three files associated with each group:
+
+"tasks": A list of tasks that belong to this group. Tasks can be
+   added to a group by writing the task ID to the "tasks" file
+   (which will automatically remove them from the previous
+   group to which they belonged). New tasks created by fork(2)
+   and clone(2) are added to the same group as their parent.
+   If a pid is not in any sub group, it is in the root group
+   (i.e. the default group).
+
+"cpus": A bitmask of logical CPUs assigned to this group. Writing
+   a new mask can add/remove CPUs from this group. Added CPUs
+   are removed from their previous group. Removed ones are
+   given to the default (root) group. You cannot remove CPUs
+   from the default group.
+
+"schemata": A list of all the resources available to this group.
+   Each resource has its own line and format - see below for
+   details.
+
+When a task is running the following rules define which resources
+are available to it:
+
+1) If the task is a member of a non-default group, then the schemata
+for that group is used.
+
+2) Else if the task belongs to the default group, but is running on a
+CPU that is assigned to some specific group, then the schemata for
+the CPU's group is used.
+
+3) Otherwise the schemata for the default group is used.
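The three rules above amount to a small precedence check. The sketch below assumes a hypothetical model in which every task and every CPU carries a group id, with `DEFAULT_GROUP` (0) meaning "not assigned to a specific group"; it illustrates the documented rules and is not the kernel's implementation.

```c
#include <assert.h>

/* Hypothetical id meaning "the default (root) group". */
#define DEFAULT_GROUP 0

/* Return the group whose schemata applies to a running task, given the
 * task's group and the group its current CPU is assigned to. */
static int effective_group(int task_group, int cpu_group)
{
	/* 1) A task in a non-default group uses that group's schemata. */
	if (task_group != DEFAULT_GROUP)
		return task_group;

	/* 2) A default-group task on a CPU assigned to some group uses
	 *    the CPU's group. */
	if (cpu_group != DEFAULT_GROUP)
		return cpu_group;

	/* 3) Otherwise the default group's schemata is used. */
	return DEFAULT_GROUP;
}
```

Note that a task's own group assignment always wins over the CPU's, so assigning CPUs to a group only affects tasks that were left in the default group.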
+
+
+Schemata files - general concepts
+-
+Each line in the file describes one resource. The line starts with
+the name of the resource, followed by specific values to be applied
+in each of the instances of that resource on the system.
+
+Cache IDs
+-
+On current generation systems there is one L3 cache per socket and L2
+caches are generally just shared by the hyperthreads on a core, but this
+isn't an architectural requirement. We could have multiple separate L3
+caches on a socket, or multiple cores could share an L2 cache. So instead
+of using "socket" or "core" to define the set of logical cpus sharing
+a resource we use a "Cache ID". At a given cache level this will be a
+unique number across the whole system (but it isn't guaranteed to be a
+contiguous sequence, there may be gaps).  To find the ID for each logical
+CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id
+
+Cache Bit Masks (CBM)
+-
+For cache resources we describe the portion of the cache that is available
+for allocation using a bitmask. The maximum value of the mask is defined
+by each cpu model (and may be different for different cache levels). It
+is found using CPUID, but is also provided in the "info" directory of
+the resctrl file system in "info/{resource}/cbm_mask". X86 hardware
+requires that these masks have all the '1' bits in a contiguous block. So
+0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
+and 0xA are not.  On a system with a 20-bit m

[tip:x86/cache] x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID

2016-10-26 Thread tip-bot for Fenghua Yu

Commit-ID:  c1c7c3f9d6bb6999a45f66ea4c6bfbcab87ff34b
Gitweb: http://git.kernel.org/tip/c1c7c3f9d6bb6999a45f66ea4c6bfbcab87ff34b
Author: Fenghua Yu 
AuthorDate: Sat, 22 Oct 2016 06:19:55 -0700
Committer:  Thomas Gleixner 
CommitDate: Wed, 26 Oct 2016 23:12:39 +0200

x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID

Define struct rdt_resource to hold all the parameterized values for an RDT
resource and fill in the CPUID enumerated values from leaf 0x10 if
available. Hard code them for the MSR detected Haswells.
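A sketch of how the leaf 0x10 values might be decoded and then used to check a cache bit mask. It assumes the SDM encoding in which CPUID.(EAX=10H, ECX=1):EAX[4:0] holds the CBM length minus one and EDX[15:0] the highest CLOSID number; `struct rdt_params` is a hypothetical subset of struct rdt_resource, and the contiguity rule is the one stated in the resctrl user interface documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of struct rdt_resource. */
struct rdt_params {
	int num_closid;
	int cbm_len;
	uint32_t max_cbm;
	int min_cbm_bits;
};

/* Decode CPUID.(EAX=10H, ECX=1) from raw EAX/EDX. Assumes the SDM
 * encoding: EAX[4:0] = CBM length - 1, EDX[15:0] = highest CLOSID. */
static struct rdt_params decode_leaf10_sub1(uint32_t eax, uint32_t edx)
{
	struct rdt_params p;

	p.cbm_len = (int)(eax & 0x1f) + 1;
	p.num_closid = (int)(edx & 0xffff) + 1;
	/* cbm_len can reach 32, and shifting by 32 would be undefined. */
	p.max_cbm = (p.cbm_len >= 32) ? UINT32_MAX
				      : (UINT32_C(1) << p.cbm_len) - 1;
	p.min_cbm_bits = 1;	/* placeholder; Haswell server parts need 2 */
	return p;
}

/* Count set bits (Kernighan's method). */
static int popcount32(uint32_t x)
{
	int n = 0;

	for (; x; x &= x - 1)
		n++;
	return n;
}

/* A CBM is accepted if it is non-zero, fits inside max_cbm, has all of
 * its 1 bits in one contiguous block, and sets at least min_cbm_bits. */
static bool cbm_valid(uint32_t cbm, const struct rdt_params *p)
{
	uint32_t lowest;

	if (cbm == 0 || (cbm & ~p->max_cbm))
		return false;

	/* Adding the lowest set bit carries through a contiguous run and
	 * clears it; any bit still shared with cbm means there was a gap. */
	lowest = cbm & (~cbm + 1);
	if ((uint32_t)(cbm + lowest) & cbm)
		return false;

	return popcount32(cbm) >= p->min_cbm_bits;
}
```

For example, 0x6 + 0x2 = 0x8 shares no bits with 0x6, so the mask is contiguous, while 0x5 + 0x1 = 0x6 still shares bit 2 with 0x5, exposing the gap.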

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "David Carrillo-Cisneros" 
Cc: "Sai Prakhya" 
Cc: "Peter Zijlstra" 
Cc: "Stephane Eranian" 
Cc: "Dave Hansen" 
Cc: "Shaohua Li" 
Cc: "Nilay Vaish" 
Cc: "Vikas Shivappa" 
Cc: "Ingo Molnar" 
Cc: "Borislav Petkov" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1477142405-32078-9-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/include/asm/intel_rdt.h |  68 
 arch/x86/kernel/cpu/intel_rdt.c  | 111 ---
 2 files changed, 172 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 3aca86d..9780409 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -2,5 +2,73 @@
 #define _ASM_X86_INTEL_RDT_H
 
 #define IA32_L3_CBM_BASE   0xc90
+#define IA32_L2_CBM_BASE   0xd10
 
+/**
+ * struct rdt_resource - attributes of an RDT resource
+ * @enabled:   Is this feature enabled on this machine
+ * @capable:   Is this feature available on this machine
+ * @name:  Name to use in "schemata" file
+ * @num_closid:     Number of CLOSIDs available
+ * @max_cbm:   Largest Cache Bit Mask allowed
+ * @min_cbm_bits:  Minimum number of consecutive bits to be set
+ * in a cache bit mask
+ * @domains:   All domains for this resource
+ * @num_domains:   Number of domains active
+ * @msr_base:  Base MSR address for CBMs
+ * @tmp_cbms:  Scratch space when updating schemata
+ * @cache_level:   Which cache level defines scope of this domain
+ * @cbm_idx_multi: Multiplier of CBM index
+ * @cbm_idx_offset: Offset of CBM index. CBM index is computed by:
+ * closid * cbm_idx_multi + cbm_idx_offset
+ */
+struct rdt_resource {
+   bool                enabled;
+   bool                capable;
+   char                *name;
+   int                 num_closid;
+   int                 cbm_len;
+   int                 min_cbm_bits;
+   u32                 max_cbm;
+   struct list_head    domains;
+   int                 num_domains;
+   int                 msr_base;
+   u32                 *tmp_cbms;
+   int                 cache_level;
+   int                 cbm_idx_multi;
+   int                 cbm_idx_offset;
+};
+
+extern struct rdt_resource rdt_resources_all[];
+
+enum {
+   RDT_RESOURCE_L3,
+   RDT_RESOURCE_L3DATA,
+   RDT_RESOURCE_L3CODE,
+   RDT_RESOURCE_L2,
+
+   /* Must be the last */
+   RDT_NUM_RESOURCES,
+};
+
+#define for_each_capable_rdt_resource(r) \
+   for (r = rdt_resources_all; r < rdt_resources_all + RDT_NUM_RESOURCES;\
+r++) \
+   if (r->capable)
+
+/* CPUID.(EAX=10H, ECX=ResID=1).EAX */
+union cpuid_0x10_1_eax {
+   struct {
+   unsigned int cbm_len:5;
+   } split;
+   unsigned int full;
+};
+
+/* CPUID.(EAX=10H, ECX=ResID=1).EDX */
+union cpuid_0x10_1_edx {
+   struct {
+   unsigned int cos_max:16;
+   } split;
+   unsigned int full;
+};
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index f8e35cf..157dc8d0 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -31,6 +31,47 @@
 #include 
 #include 
 
+#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].domains)
+
+struct rdt_resource rdt_resources_all[] = {
+   {
+   .name           = "L3",
+   .domains        = domain_init(RDT_RESOURCE_L3),
+   .msr_base       = IA32_L3_CBM_BASE,
+   .min_cbm_bits   = 1,
+   .cache_level    = 3,
+   .cbm_idx_multi  = 1,
+   .cbm_idx_offset = 0
+   },
+   {
+   .name           = "L3DATA",
+   .domains        = domain_init(RDT_RESOURCE_L3DATA),
+   .msr_base       = IA32_L3_CBM_BASE,
+   .min_cbm_bits   = 1,
+   .cache_level    = 3,
+   .cbm_

[tip:x86/cache] x86/cqm: Share PQR_ASSOC related data between CQM and CAT

2016-10-26 Thread tip-bot for Fenghua Yu

Commit-ID:  6b281569df649ed76145c527028fbbe8a32493aa
Gitweb: http://git.kernel.org/tip/6b281569df649ed76145c527028fbbe8a32493aa
Author: Fenghua Yu 
AuthorDate: Sat, 22 Oct 2016 06:19:56 -0700
Committer:  Thomas Gleixner 
CommitDate: Wed, 26 Oct 2016 23:12:39 +0200

x86/cqm: Share PQR_ASSOC related data between CQM and CAT

PQR_ASSOC MSR contains the RMID used for performance monitoring of cache
occupancy and memory bandwidth. The upper 32 bits of this MSR contain the
CLOSID for cache allocation, so the information needs to be shared between
the two facilities.

Move the rdt data structure declaration into the shared header file and
make the per cpu data structure containing the MSR values global.
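Why both halves must be cached together can be shown with a small model. This sketch assumes the layout given in the intel_pqr_state comment below (CLOSID in the upper 32 bits, RMID in the lower 10 bits) and simulates the MSR with a plain field instead of calling wrmsr(); `struct pqr_cache`, `pqr_value()` and `pqr_update()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define RMID_MASK 0x3ffu	/* lower 10 bits hold the RMID */

/* Per-CPU cache, modeled on struct intel_pqr_state, plus a simulated
 * MSR and a counter of (simulated) MSR writes. */
struct pqr_cache {
	uint32_t rmid;
	uint32_t closid;
	uint64_t msr;	/* stand-in for MSR_IA32_PQR_ASSOC */
	int writes;
};

/* Compose the full 64-bit register value from both halves. */
static uint64_t pqr_value(uint32_t rmid, uint32_t closid)
{
	return ((uint64_t)closid << 32) | (rmid & RMID_MASK);
}

/* Every write must carry both fields, so CQM changing the RMID has to
 * know the current CLOSID and vice versa. Skip the write if neither
 * field changed. */
static void pqr_update(struct pqr_cache *c, uint32_t rmid, uint32_t closid)
{
	if (c->rmid == rmid && c->closid == closid)
		return;
	c->rmid = rmid;
	c->closid = closid;
	c->msr = pqr_value(rmid, closid);	/* kernel code would use wrmsr() */
	c->writes++;
}
```

Because the cached copy is consulted first, repeated switches to the same RMID/CLOSID pair cost no MSR write at all.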

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "David Carrillo-Cisneros" 
Cc: "Sai Prakhya" 
Cc: "Peter Zijlstra" 
Cc: "Stephane Eranian" 
Cc: "Dave Hansen" 
Cc: "Shaohua Li" 
Cc: "Nilay Vaish" 
Cc: "Vikas Shivappa" 
Cc: "Ingo Molnar" 
Cc: "Borislav Petkov" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1477142405-32078-10-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/events/intel/cqm.c | 21 +
 arch/x86/include/asm/intel_rdt_common.h | 21 +
 2 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index df86874..0c45cc8 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -24,32 +24,13 @@ static unsigned int cqm_l3_scale; /* supposedly cacheline size */
 static bool cqm_enabled, mbm_enabled;
 unsigned int mbm_socket_max;
 
-/**
- * struct intel_pqr_state - State cache for the PQR MSR
- * @rmid:  The cached Resource Monitoring ID
- * @closid:The cached Class Of Service ID
- * @rmid_usecnt:   The usage counter for rmid
- *
- * The upper 32 bits of MSR_IA32_PQR_ASSOC contain closid and the
- * lower 10 bits rmid. The update to MSR_IA32_PQR_ASSOC always
- * contains both parts, so we need to cache them.
- *
- * The cache also helps to avoid pointless updates if the value does
- * not change.
- */
-struct intel_pqr_state {
-   u32 rmid;
-   u32 closid;
-   int rmid_usecnt;
-};
-
 /*
  * The cached intel_pqr_state is strictly per CPU and can never be
  * updated from a remote CPU. Both functions which modify the state
  * (intel_cqm_event_start and intel_cqm_event_stop) are called with
  * interrupts disabled, which is sufficient for the protection.
  */
-static DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
+DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
 static struct hrtimer *mbm_timers;
 /**
  * struct sample - mbm event's (local or total) data
diff --git a/arch/x86/include/asm/intel_rdt_common.h b/arch/x86/include/asm/intel_rdt_common.h
index e6e15cf..b31081b 100644
--- a/arch/x86/include/asm/intel_rdt_common.h
+++ b/arch/x86/include/asm/intel_rdt_common.h
@@ -3,4 +3,25 @@
 
 #define MSR_IA32_PQR_ASSOC 0x0c8f
 
+/**
+ * struct intel_pqr_state - State cache for the PQR MSR
+ * @rmid:  The cached Resource Monitoring ID
+ * @closid:The cached Class Of Service ID
+ * @rmid_usecnt:   The usage counter for rmid
+ *
+ * The upper 32 bits of MSR_IA32_PQR_ASSOC contain closid and the
+ * lower 10 bits rmid. The update to MSR_IA32_PQR_ASSOC always
+ * contains both parts, so we need to cache them.
+ *
+ * The cache also helps to avoid pointless updates if the value does
+ * not change.
+ */
+struct intel_pqr_state {
+   u32 rmid;
+   u32 closid;
+   int rmid_usecnt;
+};
+
+DECLARE_PER_CPU(struct intel_pqr_state, pqr_state);
+
 #endif /* _ASM_X86_INTEL_RDT_COMMON_H */

[tip:x86/cache] x86/intel_rdt: Add CONFIG, Makefile, and basic initialization

2016-10-26 Thread tip-bot for Fenghua Yu

Commit-ID:  78e99b4a2b9afb1c304259fcd4a1c71ca97e3acd
Gitweb: http://git.kernel.org/tip/78e99b4a2b9afb1c304259fcd4a1c71ca97e3acd
Author: Fenghua Yu 
AuthorDate: Sat, 22 Oct 2016 06:19:53 -0700
Committer:  Thomas Gleixner 
CommitDate: Wed, 26 Oct 2016 23:12:38 +0200

x86/intel_rdt: Add CONFIG, Makefile, and basic initialization

Introduce CONFIG_INTEL_RDT_A (default: no, dependent on CPU_SUP_INTEL) to
control inclusion of Resource Director Technology in the build.

A simple init() routine checks which features are present and, if they
are, prints a one-line pr_info() summary for each feature for now.

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "David Carrillo-Cisneros" 
Cc: "Sai Prakhya" 
Cc: "Peter Zijlstra" 
Cc: "Stephane Eranian" 
Cc: "Dave Hansen" 
Cc: "Shaohua Li" 
Cc: "Nilay Vaish" 
Cc: "Vikas Shivappa" 
Cc: "Ingo Molnar" 
Cc: "Borislav Petkov" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1477142405-32078-7-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/Kconfig| 12 +
 arch/x86/kernel/cpu/Makefile|  2 ++
 arch/x86/kernel/cpu/intel_rdt.c | 54 +
 3 files changed, 68 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bada636..770fb5f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -407,6 +407,18 @@ config GOLDFISH
def_bool y
depends on X86_GOLDFISH
 
+config INTEL_RDT_A
+   bool "Intel Resource Director Technology Allocation support"
+   default n
+   depends on X86 && CPU_SUP_INTEL
+   help
+ Select to enable resource allocation which is a sub-feature of
+ Intel Resource Director Technology (RDT). More information about
+ RDT can be found in the Intel x86 Architecture Software
+ Developer Manual.
+
+ Say N if unsure.
+
 if X86_32
 config X86_EXTENDED_PLATFORM
bool "Support for extended (non-PC) x86 platforms"
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 4a8697f..cf4bfd0 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -34,6 +34,8 @@ obj-$(CONFIG_CPU_SUP_CENTAUR) += centaur.o
 obj-$(CONFIG_CPU_SUP_TRANSMETA_32) += transmeta.o
 obj-$(CONFIG_CPU_SUP_UMC_32)   += umc.o
 
+obj-$(CONFIG_INTEL_RDT_A)  += intel_rdt.o
+
 obj-$(CONFIG_X86_MCE)  += mcheck/
 obj-$(CONFIG_MTRR) += mtrr/
obj-$(CONFIG_MICROCODE)    += microcode/
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
new file mode 100644
index 000..7d7aebe
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -0,0 +1,54 @@
+/*
+ * Resource Director Technology (RDT)
+ * - Cache Allocation code.
+ *
+ * Copyright (C) 2016 Intel Corporation
+ *
+ * Authors:
+ *Fenghua Yu 
+ *Tony Luck 
+ *Vikas Shivappa 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * More information about RDT can be found in the Intel (R) x86 Architecture
+ * Software Developer Manual June 2016, volume 3, section 17.17.
+ */
+
+#define pr_fmt(fmt)    KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+
+static inline bool get_rdt_resources(void)
+{
+   bool ret = false;
+
+   if (!boot_cpu_has(X86_FEATURE_RDT_A))
+   return false;
+   if (boot_cpu_has(X86_FEATURE_CAT_L3))
+   ret = true;
+
+   return ret;
+}
+
+static int __init intel_rdt_late_init(void)
+{
+   if (!get_rdt_resources())
+   return -ENODEV;
+
+   pr_info("Intel RDT cache allocation detected\n");
+   if (boot_cpu_has(X86_FEATURE_CDP_L3))
+   pr_info("Intel RDT code data prioritization detected\n");
+
+   return 0;
+}
+
+late_initcall(intel_rdt_late_init);

[tip:x86/cache] x86/intel_rdt: Add Haswell feature discovery

2016-10-26 Thread tip-bot for Fenghua Yu

Commit-ID:  113c60970cf41723891e3a1b303517eaf8510bb5
Gitweb: http://git.kernel.org/tip/113c60970cf41723891e3a1b303517eaf8510bb5
Author: Fenghua Yu 
AuthorDate: Sat, 22 Oct 2016 06:19:54 -0700
Committer:  Thomas Gleixner 
CommitDate: Wed, 26 Oct 2016 23:12:38 +0200

x86/intel_rdt: Add Haswell feature discovery

Some Haswell generation CPUs support RDT, but they don't enumerate this via
CPUID.  Use rdmsr_safe() and wrmsr_safe() to probe the MSRs on CPU model 63
(INTEL_FAM6_HASWELL_X).

Move the relevant defines into a common header file which is shared between
RDT/CQM and RDT/Allocation to avoid duplication.
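The probe logic (write the full mask, then check whether every bit stuck) can be exercised without hardware by simulating the register. In this sketch `struct fake_msr` and `fake_wrmsr_safe()` are hypothetical stand-ins for the first L3 CBM MSR and for wrmsr_safe(), which reports an error instead of taking the #GP fault; the 20-bit mask matches the value hard-coded in cache_alloc_hsw_probe() below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fake MSR: it keeps only the bits the "hardware"
 * implements, or rejects the write entirely (the #GP case). */
struct fake_msr {
	bool present;		/* false: the write would #GP */
	uint32_t impl_bits;	/* bits that stick when written */
	uint32_t val;
};

/* Stand-in for wrmsr_safe(): returns nonzero instead of faulting. */
static int fake_wrmsr_safe(struct fake_msr *m, uint32_t lo)
{
	if (!m->present)
		return -1;
	m->val = lo & m->impl_bits;
	return 0;
}

/* Write the full 20-bit mask and succeed only if all bits read back,
 * mirroring the logic of cache_alloc_hsw_probe(). */
static bool hsw_probe(struct fake_msr *cbm0)
{
	uint32_t max_cbm = (1u << 20) - 1;

	if (fake_wrmsr_safe(cbm0, max_cbm))
		return false;
	return cbm0->val == max_cbm;
}
```

Both failure modes matter: a SKU without CAT may fault on the write, and a register that silently drops bits must not be mistaken for a 20-bit CBM.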

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "David Carrillo-Cisneros" 
Cc: "Sai Prakhya" 
Cc: "Peter Zijlstra" 
Cc: "Stephane Eranian" 
Cc: "Dave Hansen" 
Cc: "Shaohua Li" 
Cc: "Nilay Vaish" 
Cc: "Vikas Shivappa" 
Cc: "Ingo Molnar" 
Cc: "Borislav Petkov" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1477142405-32078-8-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/events/intel/cqm.c |  2 +-
 arch/x86/include/asm/intel_rdt.h|  6 
 arch/x86/include/asm/intel_rdt_common.h |  6 
 arch/x86/kernel/cpu/intel_rdt.c | 49 ++---
 4 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index 8f82b02..df86874 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -7,9 +7,9 @@
 #include 
 #include 
 #include 
+#include 
 #include "../perf_event.h"
 
-#define MSR_IA32_PQR_ASSOC 0x0c8f
#define MSR_IA32_QM_CTR    0x0c8e
 #define MSR_IA32_QM_EVTSEL 0x0c8d
 
diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
new file mode 100644
index 000..3aca86d
--- /dev/null
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -0,0 +1,6 @@
+#ifndef _ASM_X86_INTEL_RDT_H
+#define _ASM_X86_INTEL_RDT_H
+
+#define IA32_L3_CBM_BASE   0xc90
+
+#endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/include/asm/intel_rdt_common.h b/arch/x86/include/asm/intel_rdt_common.h
new file mode 100644
index 000..e6e15cf
--- /dev/null
+++ b/arch/x86/include/asm/intel_rdt_common.h
@@ -0,0 +1,6 @@
+#ifndef _ASM_X86_INTEL_RDT_COMMON_H
+#define _ASM_X86_INTEL_RDT_COMMON_H
+
+#define MSR_IA32_PQR_ASSOC 0x0c8f
+
+#endif /* _ASM_X86_INTEL_RDT_COMMON_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 7d7aebe..f8e35cf 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -27,16 +27,57 @@
 #include 
 #include 
 
+#include 
+#include 
+#include 
+
+/*
+ * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
+ * as they do not have CPUID enumeration support for Cache allocation.
+ * The check for Vendor/Family/Model is not enough to guarantee that
+ * the MSRs won't #GP fault because only the following SKUs support
+ * CAT:
+ * Intel(R) Xeon(R)  CPU E5-2658  v3  @  2.20GHz
+ * Intel(R) Xeon(R)  CPU E5-2648L v3  @  1.80GHz
+ * Intel(R) Xeon(R)  CPU E5-2628L v3  @  2.00GHz
+ * Intel(R) Xeon(R)  CPU E5-2618L v3  @  2.30GHz
+ * Intel(R) Xeon(R)  CPU E5-2608L v3  @  2.00GHz
+ * Intel(R) Xeon(R)  CPU E5-2658A v3  @  2.20GHz
+ *
+ * Probe by trying to write the first of the L3 cache mask registers
+ * and checking that the bits stick. Max CLOSids is always 4 and max cbm length
+ * is always 20 on hsw server parts. The minimum cache bitmask length
+ * allowed for HSW server is always 2 bits. Hardcode all of them.
+ */
+static inline bool cache_alloc_hsw_probe(void)
+{
+   if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
+   boot_cpu_data.x86 == 6 &&
+   boot_cpu_data.x86_model == INTEL_FAM6_HASWELL_X) {
+   u32 l, h, max_cbm = BIT_MASK(20) - 1;
+
+   if (wrmsr_safe(IA32_L3_CBM_BASE, max_cbm, 0))
+   return false;
+   rdmsr(IA32_L3_CBM_BASE, l, h);
+
+   /* If all the bits were set in MSR, return success */
+   return l == max_cbm;
+   }
+
+   return false;
+}
+
 static inline bool get_rdt_resources(void)
 {
-   bool ret = false;
+   if (cache_alloc_hsw_probe())
+   return true;
 
if (!boot_cpu_has(X86_FEATURE_RDT_A))
return false;
-   if (boot_cpu_has(X86_FEATURE_CAT_L3))
-   ret = true;
+   if (!boot_cpu_has(X86_FEATURE_CAT_L3))
+   return false;
 
-   return ret;
+   return true;
 }
 
 static int __init intel_rdt_late_init(void)

[tip:x86/cache] x86/cpufeature: Add RDT CPUID feature bits

2016-10-26 Thread tip-bot for Fenghua Yu

Commit-ID:  4ab1586488cb56ed8728e54c4157cc38646874d9
Gitweb: http://git.kernel.org/tip/4ab1586488cb56ed8728e54c4157cc38646874d9
Author: Fenghua Yu 
AuthorDate: Sat, 22 Oct 2016 06:19:51 -0700
Committer:  Thomas Gleixner 
CommitDate: Wed, 26 Oct 2016 23:12:38 +0200

x86/cpufeature: Add RDT CPUID feature bits

Check CPUID leaves for all the Resource Director Technology (RDT)
Cache Allocation Technology (CAT) bits.

Presence of allocation features:
  CPUID.(EAX=7H, ECX=0):EBX[bit 15] X86_FEATURE_RDT_A

L2 and L3 caches are each separately enabled:
  CPUID.(EAX=10H, ECX=0):EBX[bit 1] X86_FEATURE_CAT_L3
  CPUID.(EAX=10H, ECX=0):EBX[bit 2] X86_FEATURE_CAT_L2

L3 cache may support independent control of allocation for
code and data (CDP = Code/Data Prioritization):
  CPUID.(EAX=10H, ECX=1):ECX[bit 2] X86_FEATURE_CDP_L3
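As a rough sketch, the bits listed above can be decoded from raw register values as shown below. The caller is assumed to have executed CPUID already. struct rdt_cpuid_regs, struct rdt_caps, cpuid_bit() and decode_rdt() are hypothetical names, not kernel API. Each bit is decoded independently, as the scattered.c table does. get_rdt_resources() additionally checks X86_FEATURE_RDT_A before X86_FEATURE_CAT_L3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Raw CPUID outputs; the caller is assumed to have executed CPUID already. */
struct rdt_cpuid_regs {
	uint32_t leaf7_ebx;	/* CPUID.(EAX=7H, ECX=0):EBX */
	uint32_t leaf10_ebx;	/* CPUID.(EAX=10H, ECX=0):EBX */
	uint32_t leaf10_1_ecx;	/* CPUID.(EAX=10H, ECX=1):ECX */
};

struct rdt_caps {
	bool rdt_a, cat_l3, cat_l2, cdp_l3;
};

static bool cpuid_bit(uint32_t reg, unsigned int n)
{
	assert(n < 32);
	return (reg >> n) & 1u;
}

/* Decode each bit independently, as the scattered.c table does. */
static struct rdt_caps decode_rdt(const struct rdt_cpuid_regs *r)
{
	struct rdt_caps c;

	c.rdt_a  = cpuid_bit(r->leaf7_ebx, 15);
	c.cat_l3 = cpuid_bit(r->leaf10_ebx, 1);
	c.cat_l2 = cpuid_bit(r->leaf10_ebx, 2);
	c.cdp_l3 = cpuid_bit(r->leaf10_1_ecx, 2);
	return c;
}
```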

[ tglx: Fixed up Borislav's comments and moved the feature bits into a gap ]

Signed-off-by: Fenghua Yu 
Acked-by: "Borislav Petkov" 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "David Carrillo-Cisneros" 
Cc: "Sai Prakhya" 
Cc: "Peter Zijlstra" 
Cc: "Stephane Eranian" 
Cc: "Dave Hansen" 
Cc: "Shaohua Li" 
Cc: "Nilay Vaish" 
Cc: "Vikas Shivappa" 
Cc: "Ingo Molnar" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1477142405-32078-5-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/include/asm/cpufeatures.h | 4 
 arch/x86/kernel/cpu/scattered.c| 3 +++
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index a396292..90b8c0b 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -189,6 +189,9 @@
 
 #define X86_FEATURE_CPB ( 7*32+ 2) /* AMD Core Performance Boost */
 #define X86_FEATURE_EPB ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */
+#define X86_FEATURE_CAT_L3 ( 7*32+ 4) /* Cache Allocation Technology L3 */
+#define X86_FEATURE_CAT_L2 ( 7*32+ 5) /* Cache Allocation Technology L2 */
+#define X86_FEATURE_CDP_L3 ( 7*32+ 6) /* Code and Data Prioritization L3 */
 
 #define X86_FEATURE_HW_PSTATE  ( 7*32+ 8) /* AMD HW-PState */
 #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
@@ -221,6 +224,7 @@
 #define X86_FEATURE_RTM ( 9*32+11) /* Restricted Transactional Memory */
 #define X86_FEATURE_CQM ( 9*32+12) /* Cache QoS Monitoring */
 #define X86_FEATURE_MPX ( 9*32+14) /* Memory Protection Extension */
+#define X86_FEATURE_RDT_A  ( 9*32+15) /* Resource Director Technology Allocation */
 #define X86_FEATURE_AVX512F ( 9*32+16) /* AVX-512 Foundation */
 #define X86_FEATURE_AVX512DQ   ( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */
 #define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 1db8dc4..49fb680 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -36,6 +36,9 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
{ X86_FEATURE_AVX512_4FMAPS, CR_EDX, 3, 0x00000007, 0 },
{ X86_FEATURE_APERFMPERF,    CR_ECX, 0, 0x00000006, 0 },
{ X86_FEATURE_EPB,           CR_ECX, 3, 0x00000006, 0 },
+   { X86_FEATURE_CAT_L3,        CR_EBX, 1, 0x00000010, 0 },
+   { X86_FEATURE_CAT_L2,        CR_EBX, 2, 0x00000010, 0 },
+   { X86_FEATURE_CDP_L3,        CR_ECX, 2, 0x00000010, 1 },
{ X86_FEATURE_HW_PSTATE,     CR_EDX, 7, 0x80000007, 0 },
{ X86_FEATURE_CPB,           CR_EDX, 9, 0x80000007, 0 },
{ X86_FEATURE_PROC_FEEDBACK, CR_EDX, 11, 0x80000007, 0 },
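Each X86_FEATURE_* number encodes a capability word and a bit as word*32 + bit, so ( 7*32+ 4) means bit 4 of word 7. Below is a minimal sketch of that encoding, assuming a plain array of 32-bit words. NCAPINTS=10 and the caps_set()/caps_has() helpers are hypothetical simplifications, not the kernel's set_cpu_cap()/cpu_has() implementation. In this version of cpufeatures.h, word 9 mirrors CPUID.(EAX=7, ECX=0):EBX, which is why RDT_A sits at bit 15 there. Word 7 is a Linux-defined word filled in from the scattered table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCAPINTS 10	/* assumed number of 32-bit capability words */

/* Feature numbers encode (word * 32 + bit), as in cpufeatures.h. */
#define FEATURE(word, bit)	((word) * 32 + (bit))
#define FEAT_CAT_L3		FEATURE(7, 4)
#define FEAT_CAT_L2		FEATURE(7, 5)
#define FEAT_CDP_L3		FEATURE(7, 6)
#define FEAT_RDT_A		FEATURE(9, 15)

struct caps {
	uint32_t words[NCAPINTS];
};

static void caps_set(struct caps *c, unsigned int feature)
{
	assert(feature < NCAPINTS * 32);
	c->words[feature / 32] |= 1u << (feature % 32);	/* word index, then bit */
}

static bool caps_has(const struct caps *c, unsigned int feature)
{
	assert(feature < NCAPINTS * 32);
	return (c->words[feature / 32] >> (feature % 32)) & 1u;
}
```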

[tip:x86/cache] x86/intel_cacheinfo: Enable cache id in cache info

2016-10-26 Thread tip-bot for Fenghua Yu

Commit-ID:  d57e3ab7e34c51a8badeea1b500bfb738d0af66e
Gitweb: http://git.kernel.org/tip/d57e3ab7e34c51a8badeea1b500bfb738d0af66e
Author: Fenghua Yu 
AuthorDate: Sat, 22 Oct 2016 06:19:50 -0700
Committer:  Thomas Gleixner 
CommitDate: Wed, 26 Oct 2016 23:12:37 +0200

x86/intel_cacheinfo: Enable cache id in cache info

Cache id is retrieved from APIC ID and CPUID leaf 4 on x86.

For more details please see the section on "Cache ID Extraction
Parameters" in "Intel 64 Architecture Processor Topology Enumeration".

See also the documentation of the CPUID instruction in the "Intel 64 and
IA-32 Architectures Software Developer's Manual".

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "David Carrillo-Cisneros" 
Cc: "Sai Prakhya" 
Cc: "Peter Zijlstra" 
Cc: "Stephane Eranian" 
Cc: "Dave Hansen" 
Cc: "Shaohua Li" 
Cc: "Nilay Vaish" 
Cc: "Vikas Shivappa" 
Cc: "Ingo Molnar" 
Cc: "Borislav Petkov" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1477142405-32078-4-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/kernel/cpu/intel_cacheinfo.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
index de6626c..8dc5720 100644
--- a/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -153,6 +153,7 @@ struct _cpuid4_info_regs {
union _cpuid4_leaf_eax eax;
union _cpuid4_leaf_ebx ebx;
union _cpuid4_leaf_ecx ecx;
+   unsigned int id;
unsigned long size;
struct amd_northbridge *nb;
 };
@@ -894,6 +895,8 @@ static void __cache_cpumap_setup(unsigned int cpu, int index,
 static void ci_leaf_init(struct cacheinfo *this_leaf,
 struct _cpuid4_info_regs *base)
 {
+   this_leaf->id = base->id;
+   this_leaf->attributes = CACHE_ID;
this_leaf->level = base->eax.split.level;
this_leaf->type = cache_type_map[base->eax.split.type];
this_leaf->coherency_line_size =
@@ -920,6 +923,22 @@ static int __init_cache_level(unsigned int cpu)
return 0;
 }
 
+/*
+ * The max shared threads number comes from CPUID.4:EAX[25-14] with input
+ * ECX as cache index. Then right shift apicid by the number's order to get
+ * cache id for this cache node.
+ */
+static void get_cache_id(int cpu, struct _cpuid4_info_regs *id4_regs)
+{
+   struct cpuinfo_x86 *c = &cpu_data(cpu);
+   unsigned long num_threads_sharing;
+   int index_msb;
+
+   num_threads_sharing = 1 + id4_regs->eax.split.num_threads_sharing;
+   index_msb = get_count_order(num_threads_sharing);
+   id4_regs->id = c->apicid >> index_msb;
+}
+
 static int __populate_cache_leaves(unsigned int cpu)
 {
unsigned int idx, ret;
@@ -931,6 +950,7 @@ static int __populate_cache_leaves(unsigned int cpu)
ret = cpuid4_cache_lookup_regs(idx, &id4_regs);
if (ret)
return ret;
+   get_cache_id(cpu, &id4_regs);
ci_leaf_init(this_leaf++, &id4_regs);
__cache_cpumap_setup(cpu, idx, &id4_regs);
}
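The arithmetic in get_cache_id() can be checked in isolation. This is a user-space sketch that assumes the caller supplies the APIC ID and the raw CPUID.4:EAX value. count_order() is a hypothetical stand-in for the kernel's get_count_order() and is only valid here for counts of at least 1.

```c
#include <assert.h>
#include <stdint.h>

/* ceil(log2(count)) for count >= 1, like the kernel's get_count_order(). */
static int count_order(unsigned int count)
{
	int order = 0;

	assert(count >= 1);
	while ((1u << order) < count)
		order++;
	return order;
}

/*
 * EAX[25:14] holds (max threads sharing this cache - 1). Shifting the APIC
 * ID right by the order of that count yields the same id for every thread
 * that shares the cache.
 */
static unsigned int cache_id_from_apicid(unsigned int apicid, uint32_t leaf4_eax)
{
	unsigned int num_threads_sharing = 1 + ((leaf4_eax >> 14) & 0xfff);

	return apicid >> count_order(num_threads_sharing);
}
```

With 16 threads sharing the cache (EAX[25:14] = 15), APIC IDs 0x20 through 0x2f all map to cache id 2.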

[tip:x86/cache] cacheinfo: Introduce cache id

2016-10-26 Thread tip-bot for Fenghua Yu

Commit-ID:  e9a2ea5a1ba09c35258f3663842fb8d8cf2e00c2
Gitweb: http://git.kernel.org/tip/e9a2ea5a1ba09c35258f3663842fb8d8cf2e00c2
Author: Fenghua Yu 
AuthorDate: Sat, 22 Oct 2016 06:19:49 -0700
Committer:  Thomas Gleixner 
CommitDate: Wed, 26 Oct 2016 23:12:37 +0200

cacheinfo: Introduce cache id

Cache management software needs an id for each instance of a cache of
a particular type.

The current cacheinfo structure does not provide any information about
the underlying hardware so there is no way to expose it.

Hardware with cache management features provides means (cpuid, enumeration
etc.) to retrieve the hardware id of a particular cache instance. Cache
instances which share hardware have the same hardware id.

Add an 'id' field to struct cacheinfo to store this information. Expose
this information under the /sys/devices/system/cpu/cpu*/cache/index*/
directory as well.

Signed-off-by: Fenghua Yu 
Cc: "Ravi V Shankar" 
Cc: "Tony Luck" 
Cc: "David Carrillo-Cisneros" 
Cc: "Sai Prakhya" 
Cc: "Peter Zijlstra" 
Cc: "Stephane Eranian" 
Cc: "Dave Hansen" 
Cc: "Shaohua Li" 
Cc: "Nilay Vaish" 
Cc: "Vikas Shivappa" 
Cc: "Ingo Molnar" 
Cc: "Borislav Petkov" 
Cc: "H. Peter Anvin" 
Link: http://lkml.kernel.org/r/1477142405-32078-3-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner 

---
 drivers/base/cacheinfo.c  | 5 +
 include/linux/cacheinfo.h | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index e9fd32e..00a9688 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -233,6 +233,7 @@ static ssize_t file_name##_show(struct device *dev, \
return sprintf(buf, "%u\n", this_leaf->object); \
 }
 
+show_one(id, id);
 show_one(level, level);
 show_one(coherency_line_size, coherency_line_size);
 show_one(number_of_sets, number_of_sets);
@@ -314,6 +315,7 @@ static ssize_t write_policy_show(struct device *dev,
return n;
 }
 
+static DEVICE_ATTR_RO(id);
 static DEVICE_ATTR_RO(level);
 static DEVICE_ATTR_RO(type);
 static DEVICE_ATTR_RO(coherency_line_size);
@@ -327,6 +329,7 @@ static DEVICE_ATTR_RO(shared_cpu_list);
 static DEVICE_ATTR_RO(physical_line_partition);
 
 static struct attribute *cache_default_attrs[] = {
+   &dev_attr_id.attr,
&dev_attr_type.attr,
&dev_attr_level.attr,
&dev_attr_shared_cpu_map.attr,
@@ -350,6 +353,8 @@ cache_default_attrs_is_visible(struct kobject *kobj,
const struct cpumask *mask = &this_leaf->shared_cpu_map;
umode_t mode = attr->mode;
 
+   if ((attr == &dev_attr_id.attr) && (this_leaf->attributes & CACHE_ID))
+   return mode;
if ((attr == &dev_attr_type.attr) && this_leaf->type)
return mode;
if ((attr == &dev_attr_level.attr) && this_leaf->level)
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 2189935..0bcbb67 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -18,6 +18,7 @@ enum cache_type {
 
 /**
  * struct cacheinfo - represent a cache leaf node
+ * @id: This cache's id. It is unique among caches with the same (type, level).
  * @type: type of the cache - data, inst or unified
  * @level: represents the hierarchy in the multi-level cache
  * @coherency_line_size: size of each cache line usually representing
@@ -44,6 +45,7 @@ enum cache_type {
  * keeping, the remaining members form the core properties of the cache
  */
 struct cacheinfo {
+   unsigned int id;
enum cache_type type;
unsigned int level;
unsigned int coherency_line_size;
@@ -61,6 +63,7 @@ struct cacheinfo {
 #define CACHE_WRITE_ALLOCATE   BIT(3)
 #define CACHE_ALLOCATE_POLICY_MASK \
(CACHE_READ_ALLOCATE | CACHE_WRITE_ALLOCATE)
+#define CACHE_ID   BIT(4)
 
struct device_node *of_node;
bool disable_sysfs;
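From user space, the new attribute prints a single "%u\n" value (see show_one()). It is only visible when the architecture sets CACHE_ID in this_leaf->attributes. Below is a minimal reader sketch under those assumptions. read_cache_id() is a hypothetical helper, and returning -1 for a missing file is a design choice of this sketch rather than kernel-defined behaviour.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a cache id from a file such as
 * /sys/devices/system/cpu/cpu0/cache/index3/id.
 * Returns the id, or -1 if the file is absent (attribute hidden because
 * CACHE_ID is not set) or cannot be parsed.
 */
static long read_cache_id(const char *path)
{
	FILE *f;
	unsigned int id;
	long ret = -1;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%u", &id) == 1)	/* matches the "%u\n" written by show_one() */
		ret = (long)id;
	fclose(f);
	return ret;
}
```

On x86 systems with this series, index3 is typically the L3 cache, so reading .../cpu0/cache/index3/id would give the L3 id of CPU 0. CPUs reporting the same id share that cache instance.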

[tip:x86/fpu] x86/fpu/xstate: Keep init_fpstate.xsave.header.xfeatures as zero for init optimization

2016-06-18 Thread tip-bot for Fenghua Yu

Commit-ID:  7d9370607d28afd454775c623d5447603473a3c3
Gitweb: http://git.kernel.org/tip/7d9370607d28afd454775c623d5447603473a3c3
Author: Fenghua Yu 
AuthorDate: Fri, 20 May 2016 10:47:07 -0700
Committer:  Ingo Molnar 
CommitDate: Sat, 18 Jun 2016 10:10:19 +0200

x86/fpu/xstate: Keep init_fpstate.xsave.header.xfeatures as zero for init optimization

Keep init_fpstate.xsave.header.xfeatures as zero for init optimization.
This is important for the init optimization implemented in the processor:
if the bit corresponding to an xstate component in xstate_bv is 0, the
component is in its init state and is not read from memory into the
processor during XRSTOR/XRSTORS. This has a large impact on context switch
performance.
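A toy model of the behaviour described above may help. It is not how the processor is implemented. It only shows why leaving header.xfeatures at zero means XRSTOR never reads those components from memory, whereas the removed line that set it to xfeatures_mask forced every component to be loaded. toy_xsave, toy_cpu and toy_xrstor() are hypothetical. Each component is reduced to one 64-bit value, and its init value is assumed to be 0, although real init values are component-specific.

```c
#include <assert.h>
#include <stdint.h>

#define NCOMP 8	/* number of modelled state components (assumption) */

/* Toy save area: each state component is reduced to one 64-bit value. */
struct toy_xsave {
	uint64_t xstate_bv;	/* header.xfeatures */
	uint64_t comp[NCOMP];
};

struct toy_cpu {
	uint64_t regs[NCOMP];
	unsigned int mem_reads;	/* components actually loaded from memory */
};

/*
 * For each requested component (rfbm bit set): a set xstate_bv bit loads
 * the component from the save area; a clear bit puts it into its init
 * state (modelled as 0) without touching memory.
 */
static void toy_xrstor(struct toy_cpu *cpu, const struct toy_xsave *buf,
		       uint64_t rfbm)
{
	assert(cpu && buf);
	for (int i = 0; i < NCOMP; i++) {
		uint64_t bit = 1ull << i;

		if (!(rfbm & bit))
			continue;
		if (buf->xstate_bv & bit) {
			cpu->regs[i] = buf->comp[i];
			cpu->mem_reads++;
		} else {
			cpu->regs[i] = 0;
		}
	}
}
```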

Signed-off-by: Fenghua Yu 
Signed-off-by: Yu-cheng Yu 
Reviewed-by: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Hansen 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Oleg Nesterov 
Cc: Peter Zijlstra 
Cc: Quentin Casasnovas 
Cc: Ravi V. Shankar 
Cc: Sai Praneeth Prakhya 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/2fb4ec7f18b76e8cda057a8c0038def74a9b8044.1463760376.git.yu-cheng...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/fpu/xstate.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 46abfaf..dbfef1b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -329,13 +329,11 @@ static void __init setup_init_fpu_buf(void)
setup_xstate_features();
print_xstate_features();
 
-   if (boot_cpu_has(X86_FEATURE_XSAVES)) {
+   if (boot_cpu_has(X86_FEATURE_XSAVES))
init_fpstate.xsave.header.xcomp_bv = (u64)1 << 63 | xfeatures_mask;
-   init_fpstate.xsave.header.xfeatures = xfeatures_mask;
-   }
 
/*
-* Init all the features state with header_bv being 0x0
+* Init all the features state with header.xfeatures being 0x0
 */
copy_kernel_to_xregs_booting(&init_fpstate.xsave);

[tip:x86/fpu] x86/fpu/xstate: Define and use 'fpu_user_xstate_size'

2016-06-18 Thread tip-bot for Fenghua Yu

Commit-ID:  a1141e0b5ca6ee3e5e35d5f1a310a5ecb9c96ce5
Gitweb: http://git.kernel.org/tip/a1141e0b5ca6ee3e5e35d5f1a310a5ecb9c96ce5
Author: Fenghua Yu 
AuthorDate: Fri, 20 May 2016 10:47:05 -0700
Committer:  Ingo Molnar 
CommitDate: Sat, 18 Jun 2016 10:10:18 +0200

x86/fpu/xstate: Define and use 'fpu_user_xstate_size'

The kernel xstate area can be in standard or compacted format;
it is always in standard format for user mode. When XSAVES is
enabled, the kernel uses the compacted format and it is necessary
to use a separate fpu_user_xstate_size for signal/ptrace frames.
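To see why the user-visible size can differ from the kernel's once XSAVES and the compacted format are in use, here is a simplified size calculation. The per-component offsets and sizes in layout[] are typical values chosen for illustration and are assumptions, as are the helper names. The 64-byte alignment that some components may require in compacted format is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define LEGACY_AND_HEADER 576	/* 512-byte legacy area + 64-byte XSAVE header */

/* Per-component layout as CPUID.(EAX=0DH, ECX=i) might report it (assumed). */
struct xcomp {
	unsigned int offset;	/* standard-format offset */
	unsigned int size;	/* component size in bytes */
};

/* Typical values for components 2..7; illustrative only. */
static const struct xcomp layout[8] = {
	[2] = {  576,  256 },	/* AVX YMM_Hi128 */
	[3] = {  960,   64 },	/* MPX BNDREGS */
	[4] = { 1024,   64 },	/* MPX BNDCSR */
	[5] = { 1088,   64 },	/* AVX-512 opmask */
	[6] = { 1152,  512 },	/* AVX-512 ZMM_Hi256 */
	[7] = { 1664, 1024 },	/* AVX-512 Hi16_ZMM */
};

/* Standard format: components sit at fixed offsets, so gaps remain. */
static unsigned int standard_size(uint64_t mask)
{
	unsigned int size = LEGACY_AND_HEADER;

	for (int i = 2; i < 8; i++) {
		unsigned int end = layout[i].offset + layout[i].size;

		if ((mask & (1ull << i)) && end > size)
			size = end;
	}
	return size;
}

/* Compacted format: enabled components are packed back to back. */
static unsigned int compacted_size(uint64_t mask)
{
	unsigned int size = LEGACY_AND_HEADER;

	for (int i = 2; i < 8; i++) {
		if (mask & (1ull << i))
			size += layout[i].size;
	}
	return size;
}
```

With MPX disabled but AVX-512 enabled (mask 0xE7), the standard format still reserves the MPX gap: 2688 bytes versus 2432 in compacted form. A signal frame sized from the kernel's compacted size would therefore be too small for the standard-format layout user space expects.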

Signed-off-by: Fenghua Yu 
[ Rebased the patch and cleaned up the naming. ]
Signed-off-by: Yu-cheng Yu 
Reviewed-by: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Hansen 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Oleg Nesterov 
Cc: Peter Zijlstra 
Cc: Quentin Casasnovas 
Cc: Ravi V. Shankar 
Cc: Sai Praneeth Prakhya 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/8756ec34dabddfc727cda5743195eb81e8caf91c.1463760376.git.yu-cheng...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/fpu/xstate.h |  1 -
 arch/x86/include/asm/processor.h  |  1 +
 arch/x86/kernel/fpu/init.c|  5 ++-
 arch/x86/kernel/fpu/signal.c  | 27 ++
 arch/x86/kernel/fpu/xstate.c  | 76 ---
 5 files changed, 73 insertions(+), 37 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 38951b0..16df2c4 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -39,7 +39,6 @@
 #define REX_PREFIX
 #endif
 
-extern unsigned int xstate_size;
 extern u64 xfeatures_mask;
 extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 62c6cc3..0a16a16 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -368,6 +368,7 @@ DECLARE_PER_CPU(struct irq_stack *, softirq_stack);
 #endif /* X86_64 */
 
 extern unsigned int xstate_size;
+extern unsigned int fpu_user_xstate_size;
 
 struct perf_event;
 
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index aacfd7a..5b1928c 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -195,7 +195,7 @@ static void __init fpu__init_task_struct_size(void)
 }
 
 /*
- * Set up the xstate_size based on the legacy FPU context size.
+ * Set up the user and kernel xstate_size based on the legacy FPU context size.
  *
  * We set this up first, and later it will be overwritten by
  * fpu__init_system_xstate() if the CPU knows about xstates.
@@ -226,6 +226,9 @@ static void __init fpu__init_system_xstate_size_legacy(void)
else
xstate_size = sizeof(struct fregs_state);
}
+
+   fpu_user_xstate_size = xstate_size;
+
/*
 * Quirk: we don't yet handle the XSAVES* instructions
 * correctly, as we don't correctly convert between
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index c6f2a3c..0d29d4d 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -32,7 +32,7 @@ static inline int check_for_xstate(struct fxregs_state __user *buf,
/* Check for the first magic field and other error scenarios. */
if (fx_sw->magic1 != FP_XSTATE_MAGIC1 ||
fx_sw->xstate_size < min_xstate_size ||
-   fx_sw->xstate_size > xstate_size ||
+   fx_sw->xstate_size > fpu_user_xstate_size ||
fx_sw->xstate_size > fx_sw->extended_size)
return -1;
 
@@ -89,7 +89,8 @@ static inline int save_xstate_epilog(void __user *buf, int ia32_frame)
if (!use_xsave())
return err;
 
-   err |= __put_user(FP_XSTATE_MAGIC2, (__u32 *)(buf + xstate_size));
+   err |= __put_user(FP_XSTATE_MAGIC2,
+ (__u32 *)(buf + fpu_user_xstate_size));
 
/*
 * Read the xfeatures which we copied (directly from the cpu or
@@ -126,7 +127,7 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
else
err = copy_fregs_to_user((struct fregs_state __user *) buf);
 
-   if (unlikely(err) && __clear_user(buf, xstate_size))
+   if (unlikely(err) && __clear_user(buf, fpu_user_xstate_size))
err = -EFAULT;
return err;
 }
@@ -176,8 +177,19 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
if (ia32_fxstate)
copy_fxregs_to_kernel(&tsk->thread.fpu);
} else {
+   /*
+* It is a *bug* if kernel uses compacted-format for xsave
+* area and we copy it out directly to a signal frame. It
+* should have been handled above by saving the registers
+* directly.
+*/
+
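The checks in check_for_xstate() and the MAGIC2 placement in save_xstate_epilog() can be modelled in user space as shown below. struct sw_bytes is a simplified, hypothetical stand-in for the real software-reserved bytes structure, and check_sw_bytes()/put_magic2() are hypothetical names. The magic constants are the values defined in the x86 sigcontext UAPI header. The minimum size of 576 (legacy area plus XSAVE header) used in the test is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FP_XSTATE_MAGIC1 0x46505853U
#define FP_XSTATE_MAGIC2 0x46505845U

/* Simplified stand-in: only the fields that the size checks look at. */
struct sw_bytes {
	uint32_t magic1;
	uint32_t extended_size;
	uint32_t xstate_size;
};

/*
 * Mirrors the checks in check_for_xstate(): the frame's xstate_size must be
 * at least the minimum, at most the user (standard-format) size, and must
 * fit inside extended_size. Returns 0 if acceptable, -1 otherwise.
 */
static int check_sw_bytes(const struct sw_bytes *sw, uint32_t min_xstate_size,
			  uint32_t user_xstate_size)
{
	assert(sw != NULL);
	if (sw->magic1 != FP_XSTATE_MAGIC1 ||
	    sw->xstate_size < min_xstate_size ||
	    sw->xstate_size > user_xstate_size ||
	    sw->xstate_size > sw->extended_size)
		return -1;
	return 0;
}

/* Like save_xstate_epilog(): put MAGIC2 right after the user-format area. */
static void put_magic2(unsigned char *buf, uint32_t user_xstate_size)
{
	uint32_t m = FP_XSTATE_MAGIC2;

	memcpy(buf + user_xstate_size, &m, sizeof(m));
}
```

The point of the patch is visible in the upper bound: it is the user (standard-format) size, not the kernel's possibly compacted size.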

[tip:x86/fpu] x86/fpu/xstate: Rename 'xstate_size' to 'fpu_kernel_xstate_size', to distinguish it from 'fpu_user_xstate_size'

2016-06-18 Thread tip-bot for Fenghua Yu

Commit-ID:  bf15a8cf8d14879b785c548728415d36ccb6a33b
Gitweb: http://git.kernel.org/tip/bf15a8cf8d14879b785c548728415d36ccb6a33b
Author: Fenghua Yu 
AuthorDate: Fri, 20 May 2016 10:47:06 -0700
Committer:  Ingo Molnar 
CommitDate: Sat, 18 Jun 2016 10:10:18 +0200

x86/fpu/xstate: Rename 'xstate_size' to 'fpu_kernel_xstate_size', to distinguish it from 'fpu_user_xstate_size'

User space uses the standard-format xsave area, so the fpstate in a signal
frame should have the standard-format size.

To explicitly distinguish between xstate size in kernel space and the
one in user space, we rename 'xstate_size' to 'fpu_kernel_xstate_size'.

Cleanup only, no change in functionality.

Signed-off-by: Fenghua Yu 
[ Rebased the patch and cleaned up the naming. ]
Signed-off-by: Yu-cheng Yu 
Reviewed-by: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Hansen 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Oleg Nesterov 
Cc: Peter Zijlstra 
Cc: Quentin Casasnovas 
Cc: Ravi V. Shankar 
Cc: Sai Praneeth Prakhya 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/2ecbae347a5152d94be52adf7d0f3b7305d90d99.1463760376.git.yu-cheng...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/processor.h |  2 +-
 arch/x86/kernel/fpu/core.c   |  7 ---
 arch/x86/kernel/fpu/init.c   | 20 +++-
 arch/x86/kernel/fpu/signal.c |  2 +-
 arch/x86/kernel/fpu/xstate.c |  8 
 5 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 0a16a16..965c5d2 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -367,7 +367,7 @@ DECLARE_PER_CPU(struct irq_stack *, hardirq_stack);
 DECLARE_PER_CPU(struct irq_stack *, softirq_stack);
 #endif /* X86_64 */
 
-extern unsigned int xstate_size;
+extern unsigned int fpu_kernel_xstate_size;
 extern unsigned int fpu_user_xstate_size;
 
 struct perf_event;
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 7d56474..c759bd0 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -227,7 +227,7 @@ void fpstate_init(union fpregs_state *state)
return;
}
 
-   memset(state, 0, xstate_size);
+   memset(state, 0, fpu_kernel_xstate_size);
 
if (static_cpu_has(X86_FEATURE_FXSR))
fpstate_init_fxstate(&state->fxsave);
@@ -252,7 +252,7 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 * leak into the child task:
 */
if (use_eager_fpu())
-   memset(&dst_fpu->state.xsave, 0, xstate_size);
+   memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_size);
 
/*
 * Save current FPU registers directly into the child
@@ -271,7 +271,8 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 */
preempt_disable();
if (!copy_fpregs_to_fpstate(dst_fpu)) {
-   memcpy(&src_fpu->state, &dst_fpu->state, xstate_size);
+   memcpy(&src_fpu->state, &dst_fpu->state,
+  fpu_kernel_xstate_size);
 
if (use_eager_fpu())
copy_kernel_to_fpregs(&src_fpu->state);
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 5b1928c..60f3839 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -145,8 +145,8 @@ static void __init fpu__init_system_generic(void)
  * This is inherent to the XSAVE architecture which puts all state
  * components into a single, continuous memory block:
  */
-unsigned int xstate_size;
-EXPORT_SYMBOL_GPL(xstate_size);
+unsigned int fpu_kernel_xstate_size;
+EXPORT_SYMBOL_GPL(fpu_kernel_xstate_size);
 
 /* Get alignment of the TYPE. */
 #define TYPE_ALIGN(TYPE) offsetof(struct { char x; TYPE test; }, test)
@@ -178,7 +178,7 @@ static void __init fpu__init_task_struct_size(void)
 * Add back the dynamically-calculated register state
 * size.
 */
-   task_size += xstate_size;
+   task_size += fpu_kernel_xstate_size;
 
/*
 * We dynamically size 'struct fpu', so we require that
@@ -195,7 +195,7 @@ static void __init fpu__init_task_struct_size(void)
 }
 
 /*
- * Set up the user and kernel xstate_size based on the legacy FPU context size.
+ * Set up the user and kernel xstate sizes based on the legacy FPU context size.
  *
  * We set this up first, and later it will be overwritten by
  * fpu__init_system_xstate() if the CPU knows about xstates.
@@ -208,7 +208,7 @@ static void __init fpu__init_system_xstate_size_legacy(void)
on_boot_cpu = 0;
 
/*
-* Note that xstate_size might be overwriten later during
+* Note that xstate sizes might be overwritten later during
 * fpu__init_system_xstate().
 */
 
@@ -219,15 +219,17 @@ static void __init fpu__init_system_xstate_size_legacy(void)
 */
setup_clear_cpu_cap

[tip:x86/fpu] x86/fpu/xstate: Rename 'xstate_size' to 'fpu_kernel_xstate_size', to distinguish it from 'fpu_user_xstate_size'

2016-06-17 Thread tip-bot for Fenghua Yu

Commit-ID:  63a5db07a03947218e5f4fb0776df6b6ca328287
Gitweb: http://git.kernel.org/tip/63a5db07a03947218e5f4fb0776df6b6ca328287
Author: Fenghua Yu 
AuthorDate: Fri, 20 May 2016 10:47:06 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 17 Jun 2016 10:10:22 +0200

x86/fpu/xstate: Rename 'xstate_size' to 'fpu_kernel_xstate_size', to distinguish it from 'fpu_user_xstate_size'

User space uses the standard-format xsave area, so the fpstate in a signal
frame should have the standard-format size.

To explicitly distinguish between xstate size in kernel space and the
one in user space, we rename 'xstate_size' to 'fpu_kernel_xstate_size'.

Cleanup only, no change in functionality.

Signed-off-by: Fenghua Yu 
[ Rebased the patch and cleaned up the naming. ]
Signed-off-by: Yu-cheng Yu 
Reviewed-by: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Hansen 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Oleg Nesterov 
Cc: Peter Zijlstra 
Cc: Quentin Casasnovas 
Cc: Ravi V. Shankar 
Cc: Sai Praneeth Prakhya 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/2ecbae347a5152d94be52adf7d0f3b7305d90d99.1463760376.git.yu-cheng...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/processor.h |  2 +-
 arch/x86/kernel/fpu/core.c   |  7 ---
 arch/x86/kernel/fpu/init.c   | 20 +++-
 arch/x86/kernel/fpu/signal.c |  2 +-
 arch/x86/kernel/fpu/xstate.c |  8 
 5 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 0a16a16..965c5d2 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -367,7 +367,7 @@ DECLARE_PER_CPU(struct irq_stack *, hardirq_stack);
 DECLARE_PER_CPU(struct irq_stack *, softirq_stack);
 #endif /* X86_64 */
 
-extern unsigned int xstate_size;
+extern unsigned int fpu_kernel_xstate_size;
 extern unsigned int fpu_user_xstate_size;
 
 struct perf_event;
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 7d56474..c759bd0 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -227,7 +227,7 @@ void fpstate_init(union fpregs_state *state)
return;
}
 
-   memset(state, 0, xstate_size);
+   memset(state, 0, fpu_kernel_xstate_size);
 
if (static_cpu_has(X86_FEATURE_FXSR))
fpstate_init_fxstate(&state->fxsave);
@@ -252,7 +252,7 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 * leak into the child task:
 */
if (use_eager_fpu())
-   memset(&dst_fpu->state.xsave, 0, xstate_size);
+   memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_size);
 
/*
 * Save current FPU registers directly into the child
@@ -271,7 +271,8 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 */
preempt_disable();
if (!copy_fpregs_to_fpstate(dst_fpu)) {
-   memcpy(&src_fpu->state, &dst_fpu->state, xstate_size);
+   memcpy(&src_fpu->state, &dst_fpu->state,
+  fpu_kernel_xstate_size);
 
if (use_eager_fpu())
copy_kernel_to_fpregs(&src_fpu->state);
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 5b1928c..60f3839 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -145,8 +145,8 @@ static void __init fpu__init_system_generic(void)
  * This is inherent to the XSAVE architecture which puts all state
  * components into a single, continuous memory block:
  */
-unsigned int xstate_size;
-EXPORT_SYMBOL_GPL(xstate_size);
+unsigned int fpu_kernel_xstate_size;
+EXPORT_SYMBOL_GPL(fpu_kernel_xstate_size);
 
 /* Get alignment of the TYPE. */
 #define TYPE_ALIGN(TYPE) offsetof(struct { char x; TYPE test; }, test)
@@ -178,7 +178,7 @@ static void __init fpu__init_task_struct_size(void)
 * Add back the dynamically-calculated register state
 * size.
 */
-   task_size += xstate_size;
+   task_size += fpu_kernel_xstate_size;
 
/*
 * We dynamically size 'struct fpu', so we require that
@@ -195,7 +195,7 @@ static void __init fpu__init_task_struct_size(void)
 }
 
 /*
- * Set up the user and kernel xstate_size based on the legacy FPU context size.
+ * Set up the user and kernel xstate sizes based on the legacy FPU context size.
  *
  * We set this up first, and later it will be overwritten by
  * fpu__init_system_xstate() if the CPU knows about xstates.
@@ -208,7 +208,7 @@ static void __init fpu__init_system_xstate_size_legacy(void)
on_boot_cpu = 0;
 
/*
-* Note that xstate_size might be overwriten later during
+* Note that xstate sizes might be overwritten later during
 * fpu__init_system_xstate().
 */
 
@@ -219,15 +219,17 @@ static void __init fpu__init_system_xstate_size_legacy(void)
 */
setup_clear_cpu_cap

[tip:x86/fpu] x86/fpu/xstate: Keep init_fpstate.xsave.header.xfeatures as zero for init optimization

2016-06-17 Thread tip-bot for Fenghua Yu

Commit-ID:  2729818f35c9b1a1614624e2edcd3e80c59c8689
Gitweb: http://git.kernel.org/tip/2729818f35c9b1a1614624e2edcd3e80c59c8689
Author: Fenghua Yu 
AuthorDate: Fri, 20 May 2016 10:47:07 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 17 Jun 2016 10:10:23 +0200

x86/fpu/xstate: Keep init_fpstate.xsave.header.xfeatures as zero for init optimization

Keep init_fpstate.xsave.header.xfeatures as zero for init optimization.
This is important for the init optimization implemented in the processor:
if the bit corresponding to an xstate component in xstate_bv is 0, the
component is in its init state and is not read from memory into the
processor during XRSTOR/XRSTORS. This has a large impact on context switch
performance.

Signed-off-by: Fenghua Yu 
Signed-off-by: Yu-cheng Yu 
Reviewed-by: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Hansen 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Oleg Nesterov 
Cc: Peter Zijlstra 
Cc: Quentin Casasnovas 
Cc: Ravi V. Shankar 
Cc: Sai Praneeth Prakhya 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/2fb4ec7f18b76e8cda057a8c0038def74a9b8044.1463760376.git.yu-cheng...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/fpu/xstate.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 46abfaf..dbfef1b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -329,13 +329,11 @@ static void __init setup_init_fpu_buf(void)
setup_xstate_features();
print_xstate_features();
 
-   if (boot_cpu_has(X86_FEATURE_XSAVES)) {
+   if (boot_cpu_has(X86_FEATURE_XSAVES))
init_fpstate.xsave.header.xcomp_bv = (u64)1 << 63 | xfeatures_mask;
-   init_fpstate.xsave.header.xfeatures = xfeatures_mask;
-   }
 
/*
-* Init all the features state with header_bv being 0x0
+* Init all the features state with header.xfeatures being 0x0
 */
copy_kernel_to_xregs_booting(&init_fpstate.xsave);

[tip:x86/fpu] x86/fpu/xstate: Define and use 'fpu_user_xstate_size'

2016-06-17 Thread tip-bot for Fenghua Yu

Commit-ID:  4543ea7e282d313b48cd34bbb9dc89c1dbdd13a7
Gitweb: http://git.kernel.org/tip/4543ea7e282d313b48cd34bbb9dc89c1dbdd13a7
Author: Fenghua Yu 
AuthorDate: Fri, 20 May 2016 10:47:05 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 17 Jun 2016 10:10:22 +0200

x86/fpu/xstate: Define and use 'fpu_user_xstate_size'

The kernel xstate area can be in standard or compacted format;
it is always in standard format for user mode. When XSAVES is
enabled, the kernel uses the compacted format and it is necessary
to use a separate fpu_user_xstate_size for signal/ptrace frames.

Signed-off-by: Fenghua Yu 
[ Rebased the patch and cleaned up the naming. ]
Signed-off-by: Yu-cheng Yu 
Reviewed-by: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Hansen 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Oleg Nesterov 
Cc: Peter Zijlstra 
Cc: Quentin Casasnovas 
Cc: Ravi V. Shankar 
Cc: Sai Praneeth Prakhya 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/8756ec34dabddfc727cda5743195eb81e8caf91c.1463760376.git.yu-cheng...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/fpu/xstate.h |  1 -
 arch/x86/include/asm/processor.h  |  1 +
 arch/x86/kernel/fpu/init.c|  5 ++-
 arch/x86/kernel/fpu/signal.c  | 27 ++
 arch/x86/kernel/fpu/xstate.c  | 76 ---
 5 files changed, 73 insertions(+), 37 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 38951b0..16df2c4 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -39,7 +39,6 @@
 #define REX_PREFIX
 #endif
 
-extern unsigned int xstate_size;
 extern u64 xfeatures_mask;
 extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 62c6cc3..0a16a16 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -368,6 +368,7 @@ DECLARE_PER_CPU(struct irq_stack *, softirq_stack);
 #endif /* X86_64 */
 
 extern unsigned int xstate_size;
+extern unsigned int fpu_user_xstate_size;
 
 struct perf_event;
 
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index aacfd7a..5b1928c 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -195,7 +195,7 @@ static void __init fpu__init_task_struct_size(void)
 }
 
 /*
- * Set up the xstate_size based on the legacy FPU context size.
+ * Set up the user and kernel xstate_size based on the legacy FPU context size.
  *
  * We set this up first, and later it will be overwritten by
  * fpu__init_system_xstate() if the CPU knows about xstates.
@@ -226,6 +226,9 @@ static void __init fpu__init_system_xstate_size_legacy(void)
else
xstate_size = sizeof(struct fregs_state);
}
+
+   fpu_user_xstate_size = xstate_size;
+
/*
 * Quirk: we don't yet handle the XSAVES* instructions
 * correctly, as we don't correctly convert between
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index c6f2a3c..0d29d4d 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -32,7 +32,7 @@ static inline int check_for_xstate(struct fxregs_state __user *buf,
/* Check for the first magic field and other error scenarios. */
if (fx_sw->magic1 != FP_XSTATE_MAGIC1 ||
fx_sw->xstate_size < min_xstate_size ||
-   fx_sw->xstate_size > xstate_size ||
+   fx_sw->xstate_size > fpu_user_xstate_size ||
fx_sw->xstate_size > fx_sw->extended_size)
return -1;
 
@@ -89,7 +89,8 @@ static inline int save_xstate_epilog(void __user *buf, int ia32_frame)
if (!use_xsave())
return err;
 
-   err |= __put_user(FP_XSTATE_MAGIC2, (__u32 *)(buf + xstate_size));
+   err |= __put_user(FP_XSTATE_MAGIC2,
+ (__u32 *)(buf + fpu_user_xstate_size));
 
/*
 * Read the xfeatures which we copied (directly from the cpu or
@@ -126,7 +127,7 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
else
err = copy_fregs_to_user((struct fregs_state __user *) buf);
 
-   if (unlikely(err) && __clear_user(buf, xstate_size))
+   if (unlikely(err) && __clear_user(buf, fpu_user_xstate_size))
err = -EFAULT;
return err;
 }
@@ -176,8 +177,19 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
if (ia32_fxstate)
copy_fxregs_to_kernel(&tsk->thread.fpu);
} else {
+   /*
+* It is a *bug* if kernel uses compacted-format for xsave
+* area and we copy it out directly to a signal frame. It
+* should have been handled above by saving the registers
+* directly.
+*/
+

[tip:x86/asm] x86/cpufeature: Enable new AVX-512 features

2016-03-12 Thread tip-bot for Fenghua Yu

Commit-ID:  d05004944206cbbf1c453e179768163731c7c6f1
Gitweb: http://git.kernel.org/tip/d05004944206cbbf1c453e179768163731c7c6f1
Author: Fenghua Yu 
AuthorDate: Thu, 10 Mar 2016 19:38:18 -0800
Committer:  Ingo Molnar 
CommitDate: Sat, 12 Mar 2016 17:30:53 +0100

x86/cpufeature: Enable new AVX-512 features

A few new AVX-512 instruction groups/features are added to cpufeatures.h
for enumeration: AVX512DQ, AVX512BW, and AVX512VL.

Clear the flags in fpu__xstate_clear_all_cpu_caps().

The specification for latest AVX-512 including the features can be found at:

  https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf

Note, I didn't enable the flags in KVM. Hopefully the KVM guys can pick up
the flags and enable them in KVM.
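The feature numbers in the diff below pack a capability word and a bit index into one integer: ( 9*32+17) means bit 17 of word 9. The following is a minimal userspace sketch of how such a number indexes a capability array and how a flag is cleared. The names (FEATURE, caps_has, caps_clear, NCAPWORDS) are hypothetical; this is not the kernel's cpu_has() or setup_clear_cpu_cap() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the "( 9*32+17)" encoding style. */
#define FEATURE(word, bit) ((word) * 32u + (bit))

#define FEAT_AVX512F  FEATURE(9, 16)
#define FEAT_AVX512DQ FEATURE(9, 17)
#define FEAT_AVX512BW FEATURE(9, 30)
#define FEAT_AVX512VL FEATURE(9, 31)

#define NCAPWORDS 15 /* assumed number of 32-bit capability words */

struct caps {
	uint32_t word[NCAPWORDS];
};

/* Test a feature bit: word = f / 32, bit = f % 32. */
static bool caps_has(const struct caps *c, unsigned int f)
{
	assert(f / 32 < NCAPWORDS);
	return (c->word[f / 32] >> (f % 32)) & 1u;
}

/* Clear a feature bit, as done when xstate features are disabled. */
static void caps_clear(struct caps *c, unsigned int f)
{
	assert(f / 32 < NCAPWORDS);
	c->word[f / 32] &= ~(UINT32_C(1) << (f % 32));
}
```

Clearing AVX512DQ this way leaves AVX512F, AVX512BW and AVX512VL in the same word untouched.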

Signed-off-by: Fenghua Yu 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Hansen 
Cc: Dave Hansen 
Cc: Denys Vlasenko 
Cc: Gleb Natapov 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Oleg Nesterov 
Cc: Paolo Bonzini 
Cc: Peter Zijlstra 
Cc: Quentin Casasnovas 
Cc: Ravi V Shankar 
Cc: Thomas Gleixner 
Cc: k...@vger.kernel.org
Link: http://lkml.kernel.org/r/1457667498-37357-1-git-send-email-fenghua...@intel.com
[ Added more detailed feature descriptions. ]
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/cpufeatures.h | 3 +++
 arch/x86/kernel/fpu/xstate.c   | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d11a3aa..9e0567f 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -220,6 +220,7 @@
 #define X86_FEATURE_CQM        ( 9*32+12) /* Cache QoS Monitoring */
 #define X86_FEATURE_MPX        ( 9*32+14) /* Memory Protection Extension */
 #define X86_FEATURE_AVX512F    ( 9*32+16) /* AVX-512 Foundation */
+#define X86_FEATURE_AVX512DQ   ( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */
 #define X86_FEATURE_RDSEED     ( 9*32+18) /* The RDSEED instruction */
 #define X86_FEATURE_ADX        ( 9*32+19) /* The ADCX and ADOX instructions */
 #define X86_FEATURE_SMAP       ( 9*32+20) /* Supervisor Mode Access Prevention */
@@ -230,6 +231,8 @@
 #define X86_FEATURE_AVX512ER   ( 9*32+27) /* AVX-512 Exponential and Reciprocal */
 #define X86_FEATURE_AVX512CD   ( 9*32+28) /* AVX-512 Conflict Detection */
 #define X86_FEATURE_SHA_NI     ( 9*32+29) /* SHA1/SHA256 Instruction Extensions */
+#define X86_FEATURE_AVX512BW   ( 9*32+30) /* AVX-512 BW (Byte/Word granular) Instructions */
+#define X86_FEATURE_AVX512VL   ( 9*32+31) /* AVX-512 VL (128/256 Vector Length) Extensions */
 
 /* Extended state features, CPUID level 0x000d:1 (eax), word 10 */
 #define X86_FEATURE_XSAVEOPT   (10*32+ 0) /* XSAVEOPT */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index d425cda5..6e8354f 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -51,6 +51,9 @@ void fpu__xstate_clear_all_cpu_caps(void)
setup_clear_cpu_cap(X86_FEATURE_AVX512PF);
setup_clear_cpu_cap(X86_FEATURE_AVX512ER);
setup_clear_cpu_cap(X86_FEATURE_AVX512CD);
+   setup_clear_cpu_cap(X86_FEATURE_AVX512DQ);
+   setup_clear_cpu_cap(X86_FEATURE_AVX512BW);
+   setup_clear_cpu_cap(X86_FEATURE_AVX512VL);
setup_clear_cpu_cap(X86_FEATURE_MPX);
setup_clear_cpu_cap(X86_FEATURE_XGETBV1);
 }

[tip:x86/cache] x86/intel_rapl: Modify hot cpu notification handling

2015-12-18 Thread tip-bot for Fenghua Yu

Commit-ID:  2a7a6718afed6b61628ca1845dc49827759bed7d
Gitweb: http://git.kernel.org/tip/2a7a6718afed6b61628ca1845dc49827759bed7d
Author: Fenghua Yu 
AuthorDate: Thu, 17 Dec 2015 14:46:07 -0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 18 Dec 2015 13:17:55 -0800

x86/intel_rapl: Modify hot cpu notification handling

From: Vikas Shivappa 

 - In rapl_cpu_init, use the existing package<->core map instead of
 looping through all cpus in rapl_cpumask.

 - In rapl_cpu_exit, use the same mapping instead of looping over all online
 cpus. On large systems with many cpus, that loop can be expensive, and its
 cost grows linearly with the number of cpus.
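The mask-based lookup described above can be sketched in userspace with one 64-bit word standing in for a cpumask. This sketch assumes at most 64 CPUs, and the names (pick_package_target, designate_if_uncovered, mask64) are hypothetical. The kernel uses cpumask_and(), cpumask_clear_cpu() and cpumask_any(). Here, "any" is modelled as the lowest set bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: at most 64 CPUs, one bit per CPU. */
typedef uint64_t mask64;

#define CPU_BIT(cpu) ((mask64)1 << (cpu))

/* Exit path: pick another online CPU in the same package, or -1. */
static int pick_package_target(int cpu, mask64 package_cpus, mask64 online)
{
	assert(cpu >= 0 && cpu < 64);
	mask64 tmp = package_cpus & online; /* like cpumask_and() */
	tmp &= ~CPU_BIT(cpu);               /* like cpumask_clear_cpu() */
	for (int i = 0; i < 64; i++)        /* stands in for cpumask_any() */
		if (tmp & CPU_BIT(i))
			return i;
	return -1;                          /* no other online CPU in package */
}

/* Init path: designate 'cpu' only if its package has no designated CPU. */
static mask64 designate_if_uncovered(int cpu, mask64 designated,
				     mask64 package_cpus)
{
	assert(cpu >= 0 && cpu < 64);
	if ((designated & package_cpus) == 0)
		designated |= CPU_BIT(cpu);
	return designated;
}
```

Both paths cost a fixed number of mask operations, with no per-CPU topology lookups, which is the point of the change.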

Signed-off-by: Vikas Shivappa 
Link: http://lkml.kernel.org/r/1450392376-6397-3-git-send-email-fenghua...@intel.com
Signed-off-by: Fenghua Yu 
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c | 35 ++---
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_rapl.c b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
index ed446bd..0e0fe70 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -130,6 +130,12 @@ static struct pmu rapl_pmu_class;
 static cpumask_t rapl_cpu_mask;
 static int rapl_cntr_mask;
 
+/*
+ * Temporary cpumask used during hot cpu notification handling. The usage
+ * is serialized by hot cpu locks.
+ */
+static cpumask_t tmp_cpumask;
+
 static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu);
 static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu_to_free);
 
@@ -533,18 +539,16 @@ static struct pmu rapl_pmu_class = {
 static void rapl_cpu_exit(int cpu)
 {
struct rapl_pmu *pmu = per_cpu(rapl_pmu, cpu);
-   int i, phys_id = topology_physical_package_id(cpu);
int target = -1;
+   int i;
 
/* find a new cpu on same package */
-   for_each_online_cpu(i) {
-   if (i == cpu)
-   continue;
-   if (phys_id == topology_physical_package_id(i)) {
-   target = i;
-   break;
-   }
-   }
+   cpumask_and(&tmp_cpumask, topology_core_cpumask(cpu), cpu_online_mask);
+   cpumask_clear_cpu(cpu, &tmp_cpumask);
+   i = cpumask_any(&tmp_cpumask);
+   if (i < nr_cpu_ids)
+   target = i;
+
/*
 * clear cpu from cpumask
 * if was set in cpumask and still some cpu on package,
@@ -566,15 +570,10 @@ static void rapl_cpu_exit(int cpu)
 
 static void rapl_cpu_init(int cpu)
 {
-   int i, phys_id = topology_physical_package_id(cpu);
-
-   /* check if phys_is is already covered */
-   for_each_cpu(i, &rapl_cpu_mask) {
-   if (phys_id == topology_physical_package_id(i))
-   return;
-   }
-   /* was not found, so add it */
-   cpumask_set_cpu(cpu, &rapl_cpu_mask);
+   /* Check if the cpu's package is already covered. If not, add it. */
+   cpumask_and(&tmp_cpumask, &rapl_cpu_mask, topology_core_cpumask(cpu));
+   if (cpumask_empty(&tmp_cpumask))
+   cpumask_set_cpu(cpu, &rapl_cpu_mask);
 }
 
 static __init void rapl_hsw_server_quirk(void)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[tip:x86/cache] x86,cgroup/intel_rdt : Add a cgroup interface to manage Intel cache allocation

2015-12-18 Thread tip-bot for Fenghua Yu

Commit-ID:  5ad9144cdb9a591caa8f9b33b618f137e1fbea93
Gitweb: http://git.kernel.org/tip/5ad9144cdb9a591caa8f9b33b618f137e1fbea93
Author: Fenghua Yu 
AuthorDate: Thu, 17 Dec 2015 14:46:16 -0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 18 Dec 2015 13:17:57 -0800

x86,cgroup/intel_rdt : Add a cgroup interface to manage Intel cache allocation

From: Vikas Shivappa 

Add a new cgroup 'intel_rdt' to manage cache allocation. Each cgroup
directory is associated with a class of service ID (closid). To map a
task to its closid during scheduling, this patch removes the closid field
from task_struct and uses the already existing 'cgroups' field in
task_struct.

The cgroup has a file 'l3_cbm' which represents the L3 cache capacity
bitmask (CBM). The CBM is currently global for the whole system. The
capacity bitmask must have only contiguous bits set, and the number of
set bits cannot exceed the maximum CBM length. The tasks belonging to a
cgroup get to fill the portion of the L3 cache represented by the
cgroup's capacity bitmask. For example, if the CBM is at most 10 bits
long and the cache size is 10MB, each bit represents 1MB of cache
capacity.
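The contiguity rule above can be checked with a few bit operations. The following is a userspace sketch, not the kernel's cbm_validate(). The function name is hypothetical. The min_bits parameter models a per-platform minimum, such as the two-bit minimum hardcoded for Haswell server in the enumeration patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if cbm is a non-empty run of contiguous set bits that fits
 * in max_len bits and has at least min_bits bits set.
 */
static bool cbm_is_valid(uint64_t cbm, unsigned int max_len,
			 unsigned int min_bits)
{
	assert(max_len >= 1 && max_len <= 32);
	uint64_t max_cbm = (UINT64_C(1) << max_len) - 1;

	if (cbm == 0 || (cbm & ~max_cbm))
		return false;

	while (!(cbm & 1))      /* drop trailing zeros */
		cbm >>= 1;
	if (cbm & (cbm + 1))    /* a contiguous run is now 2^k - 1 */
		return false;

	unsigned int bits = 0;
	while (cbm) {           /* count the run length */
		bits++;
		cbm >>= 1;
	}
	return bits >= min_bits;
}
```

For instance, 0xf0 passes for a 20-bit CBM, while 0x5 (bits 0 and 2) fails because the set bits are not contiguous.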

The root cgroup always has all the bits set in its l3_cbm. Users can create
more cgroups with the mkdir syscall. By default, child cgroups inherit
the capacity bitmask (CBM) from their parent. Users can change the CBM of
each cgroup, specified in hex. Each unique bitmask is associated with a
class of service ID, and -ENOSPC is returned once we run out of
closids.
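The closid sharing described above can be sketched as a small refcounted table. The struct is modelled on clos_cbm_table, but the sketch makes several assumptions of its own. The table size and the names closid_get and closid_put are hypothetical. The search order is illustrative: a cgroup whose mask matches an in-use entry shares that closid, and otherwise a free entry is taken.

```c
#include <assert.h>
#include <errno.h>

#define MAX_CLOSID 4 /* hypothetical; hardware reports the real limit */

struct clos_entry {
	unsigned long l3_cbm;
	unsigned int refcnt;   /* number of cgroups using this closid */
};

static struct clos_entry clos_table[MAX_CLOSID];

/* Get a closid for cbm: reuse one with the same mask, else take a free one. */
static int closid_get(unsigned long cbm)
{
	int free_id = -1;

	for (int i = 0; i < MAX_CLOSID; i++) {
		if (clos_table[i].refcnt && clos_table[i].l3_cbm == cbm) {
			clos_table[i].refcnt++;
			return i;
		}
		if (!clos_table[i].refcnt && free_id < 0)
			free_id = i;
	}
	if (free_id < 0)
		return -ENOSPC;        /* every closid holds a different mask */

	clos_table[free_id].l3_cbm = cbm;
	clos_table[free_id].refcnt = 1;
	return free_id;
}

/* Drop a reference; the closid becomes free when no cgroup uses it. */
static void closid_put(int closid)
{
	assert(closid >= 0 && closid < MAX_CLOSID);
	assert(clos_table[closid].refcnt > 0);
	clos_table[closid].refcnt--;
}
```

Sharing closids between cgroups with identical masks is what lets a small hardware limit, such as 4 closids, serve more cgroups than that.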

Signed-off-by: Vikas Shivappa 
Link: http://lkml.kernel.org/r/1450392376-6397-12-git-send-email-fenghua...@intel.com
Signed-off-by: Fenghua Yu 
---
 arch/x86/include/asm/intel_rdt.h |  37 +++-
 arch/x86/kernel/cpu/intel_rdt.c  | 199 +--
 include/linux/cgroup_subsys.h|   4 +
 include/linux/sched.h|   3 -
 init/Kconfig |   4 +-
 5 files changed, 234 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index afb6da3..fbe1e00 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -3,6 +3,7 @@
 
 #ifdef CONFIG_INTEL_RDT
 
+#include 
 #include 
 
 #define MAX_CBM_LENGTH 32
@@ -12,20 +13,54 @@
 extern struct static_key rdt_enable_key;
 void __intel_rdt_sched_in(void *dummy);
 
+struct intel_rdt {
+   struct cgroup_subsys_state css;
+   u32 closid;
+};
+
 struct clos_cbm_table {
unsigned long l3_cbm;
unsigned int clos_refcnt;
 };
 
 /*
+ * Return rdt group corresponding to this container.
+ */
+static inline struct intel_rdt *css_rdt(struct cgroup_subsys_state *css)
+{
+   return css ? container_of(css, struct intel_rdt, css) : NULL;
+}
+
+static inline struct intel_rdt *parent_rdt(struct intel_rdt *ir)
+{
+   return css_rdt(ir->css.parent);
+}
+
+/*
+ * Return rdt group to which this task belongs.
+ */
+static inline struct intel_rdt *task_rdt(struct task_struct *task)
+{
+   return css_rdt(task_css(task, intel_rdt_cgrp_id));
+}
+
+/*
  * intel_rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
  *
  * Following considerations are made so that this has minimal impact
  * on scheduler hot path:
  * - This will stay as no-op unless we are running on an Intel SKU
  * which supports L3 cache allocation.
+ * - When support is present and enabled, does not do any
+ * IA32_PQR_MSR writes until the user starts really using the feature
+ * i.e. creates an rdt cgroup directory and assigns a cache_mask that's
+ * different from the root cgroup's cache_mask.
  * - Caches the per cpu CLOSid values and does the MSR write only
- * when a task with a different CLOSid is scheduled in.
+ * when a task with a different CLOSid is scheduled in. That
+ * means the task belongs to a different cgroup.
+ * - Closids are allocated so that different cgroup directories
+ * with the same cache_mask get the same CLOSid. This minimizes CLOSids
+ * used and reduces MSR write frequency.
  */
 static inline void intel_rdt_sched_in(void)
 {
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index ecaf8e6..acbede2 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -53,11 +53,17 @@ static cpumask_t tmp_cpumask;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct static_key __read_mostly rdt_enable_key = STATIC_KEY_INIT_FALSE;
 
+static struct intel_rdt rdt_root_group;
+#define rdt_for_each_child(pos_css, parent_ir) \
+   css_for_each_child((pos_css), &(parent_ir)->css)
+
 struct rdt_remote_data {
int msr;
u64 val;
 };
 
+static DEFINE_SPINLOCK(closid_lock);
+
 /*
  * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
  * as it does not have CPUID enumeration support for Cache allocation.
@@ -108,17 +114,18 @@ static inline bool cache_alloc_supported(struct cpuinfo_x86 *c)
return false;
 }
 
-
 void __intel_rdt_sched_in(void *dummy)

[tip:x86/cache] x86,cgroup/intel_rdt : Add intel_rdt cgroup documentation

2015-12-18 Thread tip-bot for Fenghua Yu

Commit-ID:  f5faa67fb17b931e2b0223dc8a4d29e64c9bfa9d
Gitweb: http://git.kernel.org/tip/f5faa67fb17b931e2b0223dc8a4d29e64c9bfa9d
Author: Fenghua Yu 
AuthorDate: Thu, 17 Dec 2015 14:46:15 -0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 18 Dec 2015 13:17:57 -0800

x86,cgroup/intel_rdt : Add intel_rdt cgroup documentation

From: Vikas Shivappa 

Add documentation on using the cache allocation cgroup interface with
examples.

Signed-off-by: Vikas Shivappa 
Link: http://lkml.kernel.org/r/1450392376-6397-11-git-send-email-fenghua...@intel.com
Signed-off-by: Fenghua Yu 
---
 Documentation/cgroups/rdt.txt | 133 ++
 1 file changed, 133 insertions(+)

diff --git a/Documentation/cgroups/rdt.txt b/Documentation/cgroups/rdt.txt
new file mode 100644
index 0000000..9fa6c6a
--- /dev/null
+++ b/Documentation/cgroups/rdt.txt
@@ -0,0 +1,133 @@
+RDT
+---
+
+Copyright (C) 2014 Intel Corporation
+Written by vikas.shiva...@linux.intel.com
+
+CONTENTS:
+=========
+
+1. Cache Allocation Technology
+  1.1 Why is Cache allocation needed?
+2. Usage Examples and Syntax
+
+1. Cache Allocation Technology
+==============================
+
+1.1 Why is Cache allocation needed
+----------------------------------
+
+In today's processors, the number of cores keeps increasing, especially
+in large-scale usage models such as web servers and datacenters where
+VMs are used. More cores increase the number of threads or workloads
+that can run simultaneously. When multi-threaded applications, VMs and
+other workloads run concurrently, they compete for shared resources,
+including the L3 cache.
+
+The architecture also allows these subsets to be changed dynamically at
+runtime to further optimize the performance of the higher-priority
+application with minimal degradation to the low-priority one.
+Additionally, resources can be rebalanced for overall system throughput.
+This technique may be useful in managing large computer systems with
+large L3 caches.
+
+Cloud/Container use case:
+The key use case scenarios are in large server clusters in a typical
+cloud or container context. A central 'managing agent' would control
+resource allocations to a set of VMs or containers. In today's resource
+management, cgroups are already widely used, and a significant amount of
+user-space plumbing already exists to allocate and configure resources
+dynamically and statically. An important example is Docker using
+systemd, which in turn uses cgroups at its core to manage resources.
+This makes the cgroup interface an easily adaptable interface for cache
+allocation.
+
+Noisy neighbour use case:
+A more specific use case is a streaming application that constantly
+copies data and accesses a linear space larger than the L3 cache,
+thereby evicting a large amount of cache that could otherwise have been
+used by a high-priority computing application. Using the cache
+allocation feature, 'noisy neighbours' such as the streaming
+application can be confined to a smaller portion of the cache, and the
+high-priority application can be given a larger amount of cache space.
+A managing agent can monitor the cache allocation using cache
+monitoring through libperf and make resource adjustments either
+statically or dynamically.
+This interface hence helps in maintaining a resource policy that meets
+quality-of-service requirements such as the number of requests handled
+and response time.
+
+More information can be found in the Intel SDM June 2015, Volume 3,
+section 17.16. More information on kernel implementation details can be
+found in Documentation/x86/intel_rdt.txt.
+
+2. Usage examples and syntax
+============================
+
+The following example shows how a system administrator (root user) can
+configure L3 cache allocation for threads.
+
+To enable cache allocation at compile time, set
+CONFIG_INTEL_RDT=y.
+
+To check whether cache allocation is enabled on your system:
+  $ dmesg | grep -i intel_rdt
+  intel_rdt: Intel Cache Allocation enabled
+
+  $ cat /proc/cpuinfo
+The output should contain 'rdt' (if RDT is enabled) and 'cat_l3' (if L3
+cache allocation is enabled).
+
+Example 1: The following mounts the cache allocation cgroup subsystem
+and creates two directories.
+
+  $ cd /sys/fs/cgroup
+  $ mkdir rdt
+  $ mount -t cgroup -ointel_rdt intel_rdt /sys/fs/cgroup/rdt
+  $ cd rdt
+  $ mkdir group1
+  $ mkdir group2
+
+The following are some of the files in the directory:
+
+  $ ls
+  intel_rdt.l3_cbm
+  tasks
+
+Say the cache is 4MB (looked up from /proc/cpuinfo) and the maximum CBM
+length is 16 bits (indicated by the root node's CBM), so each bit
+represents 256KB. The following assigns 1MB of cache to each of group1
+and group2, exclusive between them.
+
+  $ cd group1
+  $ /bin/echo 0xf > intel_rdt.l3_cbm
+
+  $ cd ../group2
+  $ /bin/echo 0xf0 > intel_rdt.l3_cbm
+
+Assign tasks to group2:
+
+  $ /bin/echo PID1 > tasks
+  $ /bin/echo PID2 > tasks
+
+Now threads PID1 and PID2 get to fill the 1MB of cache that was
+allocated
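The arithmetic in the usage example can be checked with a small userspace sketch. It assumes, as the documentation does, that each of the cbm_len bits is an equal share of the cache. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cache bytes covered by cbm, assuming each of the cbm_len bits is an
 * equal share of cache_bytes. */
static uint64_t cbm_to_bytes(uint32_t cbm, uint64_t cache_bytes,
			     unsigned int cbm_len)
{
	assert(cbm_len >= 1 && cbm_len <= 32);
	unsigned int bits = 0;

	for (uint32_t m = cbm; m; m &= m - 1)  /* clear lowest set bit */
		bits++;
	return cache_bytes / cbm_len * bits;
}

/* Two groups' allocations are exclusive when their masks share no bits. */
static bool cbms_exclusive(uint32_t a, uint32_t b)
{
	return (a & b) == 0;
}
```

With a 4MB cache and a 16-bit CBM, 0xf and 0xf0 each cover four bits of 256KB, or 1MB, and the two masks do not overlap.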

[tip:x86/cache] x86/intel_rdt: Intel haswell Cache Allocation enumeration

2015-12-18 Thread tip-bot for Fenghua Yu

Commit-ID:  8741b655628d89380bfbe0ded7a83c0bc2293a72
Gitweb: http://git.kernel.org/tip/8741b655628d89380bfbe0ded7a83c0bc2293a72
Author: Fenghua Yu 
AuthorDate: Thu, 17 Dec 2015 14:46:14 -0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 18 Dec 2015 13:17:56 -0800

x86/intel_rdt: Intel haswell Cache Allocation enumeration

From: Vikas Shivappa 

This patch is specific to Intel Haswell (HSW) server SKUs. Cache
Allocation on HSW server needs to be enumerated separately because HSW
does not support CPUID enumeration for Cache Allocation. This patch
probes for it by writing a CLOSid (Class of service id) into the high 32
bits of IA32_PQR_MSR and checking whether the bits stick. The probe is
only done after confirming that the CPU is an HSW server. Other
hardcoded values are:

 - L3 cache bit mask must be at least two bits.
 - Maximum CLOSids supported is always 4.
 - Maximum supported cache bit mask length is always 20.
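The write-and-read-back probe described above can be sketched in userspace against a simulated register. The sketch makes two assumptions. First, struct fake_msr, msr_read and msr_write are hypothetical stand-ins for rdmsr_safe/wrmsr_safe on MSR_IA32_PQR_ASSOC. Second, a register without cache allocation is modelled as one whose high 32 bits ignore writes. Unlike the kernel hunk below, this sketch restores the original value unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for IA32_PQR_ASSOC: only 'writable' bits change. */
struct fake_msr {
	uint64_t value;
	uint64_t writable;
};

static uint64_t msr_read(const struct fake_msr *m)
{
	return m->value;
}

static void msr_write(struct fake_msr *m, uint64_t v)
{
	m->value = (m->value & ~m->writable) | (v & m->writable);
}

/* Flip bit 0 of the high half (the CLOSid), read back, then restore. */
static bool probe_closid_sticks(struct fake_msr *m)
{
	uint64_t old = msr_read(m);
	uint32_t lo = (uint32_t)old;
	uint32_t h_tmp = (uint32_t)(old >> 32) ^ 0x1u;

	msr_write(m, ((uint64_t)h_tmp << 32) | lo);
	bool sticks = (uint32_t)(msr_read(m) >> 32) == h_tmp;

	msr_write(m, old);  /* put the original value back */
	return sticks;
}
```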

Signed-off-by: Vikas Shivappa 
Link: http://lkml.kernel.org/r/1450392376-6397-10-git-send-email-fenghua...@intel.com
Signed-off-by: Fenghua Yu 
---
 arch/x86/kernel/cpu/intel_rdt.c | 59 +++--
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 31f8588..ecaf8e6 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -38,6 +38,10 @@ static struct clos_cbm_table *cctable;
  */
 unsigned long *closmap;
 /*
+ * Minimum bits required in Cache bitmask.
+ */
+static unsigned int min_bitmask_len = 1;
+/*
  * Mask of CPUs for writing CBM values. We only need one CPU per-socket.
  */
 static cpumask_t rdt_cpumask;
@@ -54,6 +58,57 @@ struct rdt_remote_data {
u64 val;
 };
 
+/*
+ * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
+ * as it does not have CPUID enumeration support for Cache allocation.
+ *
+ * Probes by writing to the high 32 bits(CLOSid) of the IA32_PQR_MSR and
+ * testing if the bits stick. Max CLOSids is always 4 and max cbm length
+ * is always 20 on hsw server parts. The minimum cache bitmask length
+ * allowed for HSW server is always 2 bits. Hardcode all of them.
+ */
+static inline bool cache_alloc_hsw_probe(void)
+{
+   u32 l, h_old, h_new, h_tmp;
+
+   if (rdmsr_safe(MSR_IA32_PQR_ASSOC, &l, &h_old))
+   return false;
+
+   /*
+* Default value is always 0 if feature is present.
+*/
+   h_tmp = h_old ^ 0x1U;
+   if (wrmsr_safe(MSR_IA32_PQR_ASSOC, l, h_tmp) ||
+   rdmsr_safe(MSR_IA32_PQR_ASSOC, &l, &h_new))
+   return false;
+
+   if (h_tmp != h_new)
+   return false;
+
+   wrmsr_safe(MSR_IA32_PQR_ASSOC, l, h_old);
+
+   boot_cpu_data.x86_cache_max_closid = 4;
+   boot_cpu_data.x86_cache_max_cbm_len = 20;
+   min_bitmask_len = 2;
+
+   return true;
+}
+
+static inline bool cache_alloc_supported(struct cpuinfo_x86 *c)
+{
+   if (cpu_has(c, X86_FEATURE_CAT_L3))
+   return true;
+
+   /*
+* Probe for Haswell server CPUs.
+*/
+   if (c->x86 == 0x6 && c->x86_model == 0x3f)
+   return cache_alloc_hsw_probe();
+
+   return false;
+}
+
+
 void __intel_rdt_sched_in(void *dummy)
 {
struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
@@ -126,7 +181,7 @@ static bool cbm_validate(unsigned long var)
unsigned long first_bit, zero_bit;
u64 max_cbm;
 
-   if (bitmap_weight(&var, max_cbm_len) < 1)
+   if (bitmap_weight(&var, max_cbm_len) < min_bitmask_len)
return false;
 
max_cbm = (1ULL << max_cbm_len) - 1;
@@ -310,7 +365,7 @@ static int __init intel_rdt_late_init(void)
u32 maxid, max_cbm_len;
int err = 0, size, i;
 
-   if (!cpu_has(c, X86_FEATURE_CAT_L3))
+   if (!cache_alloc_supported(c))
return -ENODEV;
 
maxid = c->x86_cache_max_closid;

[tip:x86/cache] x86/intel_rdt: Hot cpu support for Cache Allocation

2015-12-18 Thread tip-bot for Fenghua Yu

Commit-ID:  cf0978cd31053d58c99ab74e613147f86ecd1724
Gitweb: http://git.kernel.org/tip/cf0978cd31053d58c99ab74e613147f86ecd1724
Author: Fenghua Yu 
AuthorDate: Thu, 17 Dec 2015 14:46:13 -0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 18 Dec 2015 13:17:56 -0800

x86/intel_rdt: Hot cpu support for Cache Allocation

From: Vikas Shivappa 

This patch adds CPU hotplug support for Intel Cache Allocation. Support
includes updating the cache bitmask MSRs IA32_L3_QOS_n when a new CPU
package comes online or goes offline. There is one IA32_L3_QOS_n MSR per
Class of service on each CPU package. The new package's MSRs are
synchronized with the values of the existing MSRs. Also, the software
cache for the IA32_PQR_ASSOC MSRs is reset during hot cpu notifications.
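The synchronization step can be sketched in userspace. It follows the pattern of cbm_update_msrs() in the diff below: only CLOSids with a nonzero refcount are copied into the new package. The sketch models the package's CBM MSRs as a plain array. The names and the table size are hypothetical.

```c
#include <assert.h>

#define MAX_CLOSID 4 /* hypothetical */

struct clos_cbm_entry {
	unsigned long l3_cbm;
	unsigned int clos_refcnt;
};

/* Hypothetical per-package copy of the CBM MSRs, one per CLOSid. */
struct package_cbms {
	unsigned long msr[MAX_CLOSID];
};

/* Bring a newly onlined package in line with the in-use global CBMs. */
static void sync_new_package(struct package_cbms *pkg,
			     const struct clos_cbm_entry *table, int maxid)
{
	assert(maxid >= 0 && maxid <= MAX_CLOSID);
	for (int i = 0; i < maxid; i++)
		if (table[i].clos_refcnt)      /* skip unused CLOSids */
			pkg->msr[i] = table[i].l3_cbm;
}
```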

Signed-off-by: Vikas Shivappa 
Link: http://lkml.kernel.org/r/1450392376-6397-9-git-send-email-fenghua...@intel.com
Signed-off-by: Fenghua Yu 
---
 arch/x86/kernel/cpu/intel_rdt.c | 76 +
 1 file changed, 76 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 8379df8..31f8588 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -24,6 +24,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -234,6 +235,75 @@ static inline bool rdt_cpumask_update(int cpu)
return false;
 }
 
+/*
+ * cbm_update_msrs() - Updates all the existing IA32_L3_MASK_n MSRs
+ * which are one per CLOSid on the current package.
+ */
+static void cbm_update_msrs(void *dummy)
+{
+   int maxid = boot_cpu_data.x86_cache_max_closid;
+   struct rdt_remote_data info;
+   unsigned int i;
+
+   for (i = 0; i < maxid; i++) {
+   if (cctable[i].clos_refcnt) {
+   info.msr = CBM_FROM_INDEX(i);
+   info.val = cctable[i].l3_cbm;
+   msr_cpu_update(&info);
+   }
+   }
+}
+
+static inline void intel_rdt_cpu_start(int cpu)
+{
+   struct intel_pqr_state *state = &per_cpu(pqr_state, cpu);
+
+   state->closid = 0;
+   mutex_lock(&rdt_group_mutex);
+   if (rdt_cpumask_update(cpu))
+   smp_call_function_single(cpu, cbm_update_msrs, NULL, 1);
+   mutex_unlock(&rdt_group_mutex);
+}
+
+static void intel_rdt_cpu_exit(unsigned int cpu)
+{
+   int i;
+
+   mutex_lock(&rdt_group_mutex);
+   if (!cpumask_test_and_clear_cpu(cpu, &rdt_cpumask)) {
+   mutex_unlock(&rdt_group_mutex);
+   return;
+   }
+
+   cpumask_and(&tmp_cpumask, topology_core_cpumask(cpu), cpu_online_mask);
+   cpumask_clear_cpu(cpu, &tmp_cpumask);
+   i = cpumask_any(&tmp_cpumask);
+
+   if (i < nr_cpu_ids)
+   cpumask_set_cpu(i, &rdt_cpumask);
+   mutex_unlock(&rdt_group_mutex);
+}
+
+static int intel_rdt_cpu_notifier(struct notifier_block *nb,
+ unsigned long action, void *hcpu)
+{
+   unsigned int cpu  = (unsigned long)hcpu;
+
+   switch (action) {
+   case CPU_DOWN_FAILED:
+   case CPU_ONLINE:
+   intel_rdt_cpu_start(cpu);
+   break;
+   case CPU_DOWN_PREPARE:
+   intel_rdt_cpu_exit(cpu);
+   break;
+   default:
+   break;
+   }
+
+   return NOTIFY_OK;
+}
+
 static int __init intel_rdt_late_init(void)
 {
struct cpuinfo_x86 *c = &boot_cpu_data;
@@ -261,9 +331,15 @@ static int __init intel_rdt_late_init(void)
goto out_err;
}
 
+   cpu_notifier_register_begin();
+
for_each_online_cpu(i)
rdt_cpumask_update(i);
 
+   __hotcpu_notifier(intel_rdt_cpu_notifier, 0);
+
+   cpu_notifier_register_done();
+
static_key_slow_inc(&rdt_enable_key);
pr_info("Intel cache allocation enabled\n");
 out_err:

[tip:x86/cache] x86/intel_rdt: Implement scheduling support for Intel RDT

2015-12-18 Thread tip-bot for Fenghua Yu

Commit-ID:  f17254c756e640c8299212b6822faf142a89b813
Gitweb: http://git.kernel.org/tip/f17254c756e640c8299212b6822faf142a89b813
Author: Fenghua Yu 
AuthorDate: Thu, 17 Dec 2015 14:46:12 -0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 18 Dec 2015 13:17:56 -0800

x86/intel_rdt: Implement scheduling support for Intel RDT

From: Vikas Shivappa 

Add support for IA32_PQR_ASSOC MSR writes during task scheduling. For
Cache Allocation, the MSR write lets the task fill the cache 'subset'
represented by the task's capacity bit mask.

The high 32 bits of the per-processor MSR IA32_PQR_ASSOC represent the
CLOSid. During a context switch, the kernel writes the CLOSid of the
incoming task to the CPU's IA32_PQR_ASSOC MSR.

This patch also implements a common software cache for IA32_PQR_MSR
(RMID 0:9, CLOSid 32:63) to be used by both Cache Monitoring (CMT) and
Cache Allocation. CMT updates the RMID, whereas cache_alloc updates the
CLOSid in the software cache. During scheduling, IA32_PQR_MSR is updated
only when the new RMID/CLOSid value differs from the cached values.
Since the measured rdmsr latency for IA32_PQR_MSR is very high (~250
cycles), this software cache is necessary to avoid reading the MSR to
compare against the current CLOSid value.
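The software cache can be sketched in userspace, assuming a plain variable stands in for the MSR and a counter records how many writes would have happened. The struct is modelled on intel_pqr_state, and the bit layout (CLOSid in bits 63:32, RMID in bits 9:0) follows the description above. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU software copy of IA32_PQR_ASSOC. */
struct pqr_cache {
	uint32_t rmid;
	uint32_t closid;
	unsigned int writes;   /* counts simulated MSR writes */
};

/* Compose the MSR value: CLOSid in bits 63:32, RMID in bits 9:0. */
static uint64_t pqr_value(uint32_t rmid, uint32_t closid)
{
	return ((uint64_t)closid << 32) | (rmid & 0x3ffu);
}

/* Context-switch hook: write only when the incoming CLOSid differs. */
static void pqr_sched_in(struct pqr_cache *state, uint32_t task_closid,
			 uint64_t *msr)
{
	if (state->closid == task_closid)
		return;                       /* cache hit: no rdmsr, no wrmsr */
	state->closid = task_closid;
	*msr = pqr_value(state->rmid, state->closid); /* stands in for wrmsr */
	state->writes++;
}
```

Because the cached RMID is written together with the new CLOSid, a CLOSid change does not disturb the monitoring ID.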

The following considerations are made for the PQR MSR write so that it
minimally impacts the scheduler hot path:
 - This path does not exist on any non-Intel platforms.
 - On Intel platforms, this path does not exist by default unless
 INTEL_RDT is enabled.
 - It remains a no-op when INTEL_RDT is enabled but the Intel SKU does
 not support the feature.
 - When the feature is available and enabled, no MSR write is done until
 the user manually starts using one of the capacity bit masks.
 - An MSR write is only done when a task with a different CLOSid is
 scheduled on the CPU. Typically, if task groups are bound to a set of
 CPUs, the number of MSR writes is greatly reduced.
 - A per-CPU cache of CLOSids is maintained for this check so that we
 don't have to do an rdmsr, which costs a lot of cycles.

Signed-off-by: Vikas Shivappa 
Link: http://lkml.kernel.org/r/1450392376-6397-8-git-send-email-fenghua...@intel.com
Signed-off-by: Fenghua Yu 
---
 arch/x86/include/asm/intel_rdt.h   | 28 
 arch/x86/include/asm/pqr_common.h  | 27 +++
 arch/x86/kernel/cpu/intel_rdt.c| 25 +
 arch/x86/kernel/cpu/perf_event_intel_cqm.c | 26 +++---
 arch/x86/kernel/process_64.c   |  6 ++
 5 files changed, 89 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 4f45dc8..afb6da3 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -3,14 +3,42 @@
 
 #ifdef CONFIG_INTEL_RDT
 
+#include 
+
 #define MAX_CBM_LENGTH 32
 #define IA32_L3_CBM_BASE   0xc90
 #define CBM_FROM_INDEX(x)  (IA32_L3_CBM_BASE + x)
 
+extern struct static_key rdt_enable_key;
+void __intel_rdt_sched_in(void *dummy);
+
 struct clos_cbm_table {
unsigned long l3_cbm;
unsigned int clos_refcnt;
 };
 
+/*
+ * intel_rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
+ *
+ * Following considerations are made so that this has minimal impact
+ * on scheduler hot path:
+ * - This will stay as no-op unless we are running on an Intel SKU
+ * which supports L3 cache allocation.
+ * - Caches the per cpu CLOSid values and does the MSR write only
+ * when a task with a different CLOSid is scheduled in.
+ */
+static inline void intel_rdt_sched_in(void)
+{
+   /*
+* Call the schedule in code only when RDT is enabled.
+*/
+   if (static_key_false(&rdt_enable_key))
+   __intel_rdt_sched_in(NULL);
+}
+
+#else
+
+static inline void intel_rdt_sched_in(void) {}
+
 #endif
 #endif
diff --git a/arch/x86/include/asm/pqr_common.h b/arch/x86/include/asm/pqr_common.h
new file mode 100644
index 000..11e985c
--- /dev/null
+++ b/arch/x86/include/asm/pqr_common.h
@@ -0,0 +1,27 @@
+#ifndef _X86_RDT_H_
+#define _X86_RDT_H_
+
+#define MSR_IA32_PQR_ASSOC 0x0c8f
+
+/**
+ * struct intel_pqr_state - State cache for the PQR MSR
+ * @rmid:  The cached Resource Monitoring ID
+ * @closid:The cached Class Of Service ID
+ * @rmid_usecnt:   The usage counter for rmid
+ *
+ * The upper 32 bits of MSR_IA32_PQR_ASSOC contain closid and the
+ * lower 10 bits rmid. The update to MSR_IA32_PQR_ASSOC always
+ * contains both parts, so we need to cache them.
+ *
+ * The cache also helps to avoid pointless updates if the value does
+ * not change.
+ */
+struct intel_pqr_state {
+   u32 rmid;
+   u32 closid;
+   int rmid_usecnt;
+};
+
+DECLARE_PER_CPU(struct intel_pqr_state, pqr_state);
+
+#endif
diff --git a/arch/x86/kernel/cpu/i

[tip:x86/cache] x86/intel_rdt: Add support for Cache Allocation detection

2015-12-18 Thread tip-bot for Fenghua Yu

Commit-ID:  257372262056d9e963990a1ad6a917ca0b57d80e
Gitweb: http://git.kernel.org/tip/257372262056d9e963990a1ad6a917ca0b57d80e
Author: Fenghua Yu 
AuthorDate: Thu, 17 Dec 2015 14:46:09 -0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 18 Dec 2015 13:17:55 -0800

x86/intel_rdt: Add support for Cache Allocation detection

From: Vikas Shivappa 

This patch adds CPUID enumeration routines for Cache Allocation and new
fields in the cpuinfo_x86 structure to track the resources.

Cache Allocation provides a way for software (OS/VMM) to restrict cache
allocation to a defined 'subset' of the cache, which may overlap with
other 'subsets'. This feature takes effect when allocating a line in the
cache, i.e. when pulling new data into the cache. The hardware is
programmed via MSRs (model-specific registers).
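The enumeration can be sketched as a pure decoder over raw CPUID register values, so it runs without touching the hardware. It uses the same arithmetic as this patch's get_cpu_cap() hunk, with no extra field masking. The L3 flag is bit 1 of EBX from CPUID(0x10, 0), which is X86_FEATURE_CAT_L3 (14*32 + 1) once that EBX is stored as capability word 14. If the flag is set, EDX + 1 and EAX + 1 from CPUID(0x10, 1) give the maximum CLOSid count and CBM length. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct cat_info {
	bool l3_cat;
	uint16_t max_closid;
	uint16_t max_cbm_len;
};

/*
 * ebx0: EBX of CPUID(EAX=0x10, ECX=0); bit 1 = L3 Cache Allocation.
 * eax1, edx1: EAX/EDX of CPUID(EAX=0x10, ECX=1).
 */
static struct cat_info decode_cat(uint32_t ebx0, uint32_t eax1, uint32_t edx1)
{
	struct cat_info info = { .l3_cat = (ebx0 >> 1) & 1u };

	if (info.l3_cat) {
		/* Same "+ 1" arithmetic as get_cpu_cap() in this patch. */
		info.max_closid = (uint16_t)(edx1 + 1);
		info.max_cbm_len = (uint16_t)(eax1 + 1);
	}
	return info;
}
```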

Signed-off-by: Vikas Shivappa 
Link: http://lkml.kernel.org/r/1450392376-6397-5-git-send-email-fenghua...@intel.com
Signed-off-by: Fenghua Yu 
---
 arch/x86/include/asm/cpufeature.h |  6 +-
 arch/x86/include/asm/processor.h  |  3 +++
 arch/x86/kernel/cpu/Makefile  |  1 +
 arch/x86/kernel/cpu/common.c  | 15 +++
 arch/x86/kernel/cpu/intel_rdt.c   | 40 +++
 init/Kconfig  | 12 
 6 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index e4f8010..671abaa 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -12,7 +12,7 @@
 #include 
 #endif
 
-#define NCAPINTS   14  /* N 32-bit words worth of info */
+#define NCAPINTS   15  /* N 32-bit words worth of info */
 #define NBUGINTS   1   /* N 32-bit bug flags */
 
 /*
@@ -231,6 +231,7 @@
 #define X86_FEATURE_RTM        ( 9*32+11) /* Restricted Transactional Memory */
 #define X86_FEATURE_CQM        ( 9*32+12) /* Cache QoS Monitoring */
 #define X86_FEATURE_MPX        ( 9*32+14) /* Memory Protection Extension */
+#define X86_FEATURE_RDT        ( 9*32+15) /* Resource Allocation */
 #define X86_FEATURE_AVX512F    ( 9*32+16) /* AVX-512 Foundation */
 #define X86_FEATURE_RDSEED     ( 9*32+18) /* The RDSEED instruction */
 #define X86_FEATURE_ADX        ( 9*32+19) /* The ADCX and ADOX instructions */
@@ -258,6 +259,9 @@
 /* AMD-defined CPU features, CPUID level 0x80000008 (ebx), word 13 */
 #define X86_FEATURE_CLZERO (13*32+0) /* CLZERO instruction */
 
+/* Intel-defined CPU features, CPUID level 0x00000010:0 (ebx), word 14 */
+#define X86_FEATURE_CAT_L3 (14*32 + 1) /* Cache Allocation L3 */
+
 /*
  * BUG word(s)
  */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 6752225..c0aa1eb 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -120,6 +120,9 @@ struct cpuinfo_x86 {
int x86_cache_occ_scale;/* scale to bytes */
int x86_power;
unsigned long   loops_per_jiffy;
+   /* Cache Allocation values: */
+   u16 x86_cache_max_cbm_len;
+   u16 x86_cache_max_closid;
/* cpuid returned max cores value: */
u16  x86_max_cores;
u16 apicid;
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 5803130..b3292a4 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -51,6 +51,7 @@ obj-$(CONFIG_CPU_SUP_INTEL)   += perf_event_msr.o
 obj-$(CONFIG_CPU_SUP_AMD)  += perf_event_msr.o
 endif
 
+obj-$(CONFIG_INTEL_RDT)+= intel_rdt.o
 
 obj-$(CONFIG_X86_MCE)  += mcheck/
 obj-$(CONFIG_MTRR) += mtrr/
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c2b7522..e64dc78 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -653,6 +653,21 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
}
}
 
+   /* Additional Intel-defined flags: level 0x0010 */
+   if (c->cpuid_level >= 0x0010) {
+   u32 eax, ebx, ecx, edx;
+
+   cpuid_count(0x0010, 0, &eax, &ebx, &ecx, &edx);
+   c->x86_capability[14] = ebx;
+
+   if (cpu_has(c, X86_FEATURE_CAT_L3)) {
+
+   cpuid_count(0x0010, 1, &eax, &ebx, &ecx, &edx);
+   c->x86_cache_max_closid = edx + 1;
+   c->x86_cache_max_cbm_len = eax + 1;
+   }
+   }
+
/* AMD-defined flags: level 0x80000001 */
xlvl = cpuid_eax(0x80000000);
c->extended_cpuid_level = xlvl;
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
new file mode 100644
index 0000000..f49e970
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -0,0 +1,40 @@
+/*
+

[tip:x86/cache] x86/intel_rdt: Add Class of service management

2015-12-18 Thread tip-bot for Fenghua Yu

Commit-ID:  d4223b381c10bff94dc7491806b6108429831fc6
Gitweb: http://git.kernel.org/tip/d4223b381c10bff94dc7491806b6108429831fc6
Author: Fenghua Yu 
AuthorDate: Thu, 17 Dec 2015 14:46:10 -0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 18 Dec 2015 13:17:56 -0800

x86/intel_rdt: Add Class of service management

From: Vikas Shivappa 

This patch adds data structures and APIs to support Class of Service
(CLOS) management (closid). A new clos_cbm table keeps a 1:1 mapping
between each closid and its capacity bitmask (CBM), together with a
usage count for the closid. Each task is associated with one closid at
a time, and this patch adds a new closid field to task_struct to track
it.
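The allocate/get/put scheme described above can be sketched in user space as a bitmap of in-use closids plus a per-closid reference count. This is a simplified model, assuming a fixed limit of 16 closids and a single thread (the patch serializes these operations with rdt_group_mutex and reads the limit from CPUID). All `sketch_*` names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_CLOSID 16   /* hypothetical fixed limit */

struct sketch_clos_entry {
	unsigned long l3_cbm;      /* capacity bitmask for this closid */
	unsigned int refcnt;       /* number of users of this closid */
};

static struct sketch_clos_entry sketch_table[SKETCH_MAX_CLOSID];
static uint32_t sketch_closmap;    /* bit n set => closid n is allocated */

/* Allocate the lowest free closid and take the first reference. */
int sketch_closid_alloc(uint32_t *closid)
{
	for (uint32_t id = 0; id < SKETCH_MAX_CLOSID; id++) {
		if (!(sketch_closmap & (1u << id))) {
			sketch_closmap |= 1u << id;
			sketch_table[id].refcnt = 1;
			*closid = id;
			return 0;
		}
	}
	return -ENOSPC;
}

void sketch_closid_get(uint32_t closid)
{
	sketch_table[closid].refcnt++;
}

/* Drop a reference; on the last put, free the closid and clear its CBM. */
void sketch_closid_put(uint32_t closid)
{
	if (sketch_table[closid].refcnt == 0)
		return;            /* unbalanced put: ignored (the patch WARNs) */
	if (--sketch_table[closid].refcnt == 0) {
		sketch_closmap &= ~(1u << closid);
		sketch_table[closid].l3_cbm = 0;
	}
}
```

Freeing only on the last put lets several tasks share one closid without any of them having to know about the others.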

Signed-off-by: Vikas Shivappa 
Link: http://lkml.kernel.org/r/1450392376-6397-6-git-send-email-fenghua...@intel.com
Signed-off-by: Fenghua Yu 
---
 arch/x86/include/asm/intel_rdt.h | 12 ++
 arch/x86/kernel/cpu/intel_rdt.c  | 82 +++-
 include/linux/sched.h|  3 ++
 3 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
new file mode 100644
index 0000000..88b7643
--- /dev/null
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -0,0 +1,12 @@
+#ifndef _RDT_H_
+#define _RDT_H_
+
+#ifdef CONFIG_INTEL_RDT
+
+struct clos_cbm_table {
+   unsigned long l3_cbm;
+   unsigned int clos_refcnt;
+};
+
+#endif
+#endif
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index f49e970..d79213a 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -24,17 +24,95 @@
 
 #include 
 #include 
+#include 
+
+/*
+ * cctable maintains 1:1 mapping between CLOSid and cache bitmask.
+ */
+static struct clos_cbm_table *cctable;
+/*
+ * closid availability bit map.
+ */
+unsigned long *closmap;
+static DEFINE_MUTEX(rdt_group_mutex);
+
+static inline void closid_get(u32 closid)
+{
+   struct clos_cbm_table *cct = &cctable[closid];
+
+   lockdep_assert_held(&rdt_group_mutex);
+
+   cct->clos_refcnt++;
+}
+
+static int closid_alloc(u32 *closid)
+{
+   u32 maxid;
+   u32 id;
+
+   lockdep_assert_held(&rdt_group_mutex);
+
+   maxid = boot_cpu_data.x86_cache_max_closid;
+   id = find_first_zero_bit(closmap, maxid);
+   if (id == maxid)
+   return -ENOSPC;
+
+   set_bit(id, closmap);
+   closid_get(id);
+   *closid = id;
+
+   return 0;
+}
+
+static inline void closid_free(u32 closid)
+{
+   clear_bit(closid, closmap);
+   cctable[closid].l3_cbm = 0;
+}
+
+static void closid_put(u32 closid)
+{
+   struct clos_cbm_table *cct = &cctable[closid];
+
+   lockdep_assert_held(&rdt_group_mutex);
+   if (WARN_ON(!cct->clos_refcnt))
+   return;
+
+   if (!--cct->clos_refcnt)
+   closid_free(closid);
+}
 
 static int __init intel_rdt_late_init(void)
 {
struct cpuinfo_x86 *c = &boot_cpu_data;
+   u32 maxid, max_cbm_len;
+   int err = 0, size;
 
if (!cpu_has(c, X86_FEATURE_CAT_L3))
return -ENODEV;
 
-   pr_info("Intel cache allocation detected\n");
+   maxid = c->x86_cache_max_closid;
+   max_cbm_len = c->x86_cache_max_cbm_len;
 
-   return 0;
+   size = maxid * sizeof(struct clos_cbm_table);
+   cctable = kzalloc(size, GFP_KERNEL);
+   if (!cctable) {
+   err = -ENOMEM;
+   goto out_err;
+   }
+
+   size = BITS_TO_LONGS(maxid) * sizeof(long);
+   closmap = kzalloc(size, GFP_KERNEL);
+   if (!closmap) {
+   kfree(cctable);
+   err = -ENOMEM;
+   goto out_err;
+   }
+
+   pr_info("Intel cache allocation enabled\n");
+out_err:
+
+   return err;
 }
 
 late_initcall(intel_rdt_late_init);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index edad7a4..0a6db46 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1668,6 +1668,9 @@ struct task_struct {
/* cg_list protected by css_set_lock and tsk->alloc_lock */
struct list_head cg_list;
 #endif
+#ifdef CONFIG_INTEL_RDT
+   u32 closid;
+#endif
 #ifdef CONFIG_FUTEX
struct robust_list_head __user *robust_list;
 #ifdef CONFIG_COMPAT
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[tip:x86/cache] x86/intel_rdt: Add L3 cache capacity bitmask management

2015-12-18 Thread tip-bot for Fenghua Yu

Commit-ID:  a424209c74c3c30fb1677075afa5d9277e01c46b
Gitweb: http://git.kernel.org/tip/a424209c74c3c30fb1677075afa5d9277e01c46b
Author: Fenghua Yu 
AuthorDate: Thu, 17 Dec 2015 14:46:11 -0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 18 Dec 2015 13:17:56 -0800

x86/intel_rdt: Add L3 cache capacity bitmask management

From: Vikas Shivappa 

This patch adds APIs to manage the L3 cache capacity bitmask (CBM). A
CBM must have only contiguous bits set. The current implementation keeps
one global CBM for each class of service ID (closid). Some of the new
APIs update the CBM by writing the IA32_L3_MASK_n MSR on all packages;
the others read and write entries in the clos_cbm_table.
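The contiguity rule above can be checked without scanning bits one by one. This sketch deliberately uses a different technique from the patch's find_first_bit()/find_next_zero_bit() walk in cbm_validate(). Adding the lowest set bit to a single contiguous run carries past the run's top bit, so `(cbm + lowest) & cbm` is zero exactly when the set bits form one run. `cbm_is_valid()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if cbm is non-empty, fits in max_cbm_len bits, and its set bits
 * form a single contiguous run.
 */
bool cbm_is_valid(uint64_t cbm, unsigned int max_cbm_len)
{
	uint64_t max_cbm, lowest;

	if (cbm == 0)
		return false;
	max_cbm = (max_cbm_len >= 64) ? UINT64_MAX : ((1ULL << max_cbm_len) - 1);
	if (cbm & ~max_cbm)
		return false;               /* bits beyond the hardware CBM length */
	lowest = cbm & (~cbm + 1);          /* isolate the lowest set bit */
	return ((cbm + lowest) & cbm) == 0; /* carry clears a single run entirely */
}
```

For example, 0xF0 passes with a 20-bit CBM, while 0x505 fails because its set bits form three separate runs.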

Signed-off-by: Vikas Shivappa 
Link: http://lkml.kernel.org/r/1450392376-6397-7-git-send-email-fenghua...@intel.com
Signed-off-by: Fenghua Yu 
---
 arch/x86/include/asm/intel_rdt.h |   4 ++
 arch/x86/kernel/cpu/intel_rdt.c  | 133 ++-
 2 files changed, 136 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 88b7643..4f45dc8 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -3,6 +3,10 @@
 
 #ifdef CONFIG_INTEL_RDT
 
+#define MAX_CBM_LENGTH 32
+#define IA32_L3_CBM_BASE   0xc90
+#define CBM_FROM_INDEX(x)  (IA32_L3_CBM_BASE + x)
+
 struct clos_cbm_table {
unsigned long l3_cbm;
unsigned int clos_refcnt;
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index d79213a..6ad5b48 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -34,8 +34,22 @@ static struct clos_cbm_table *cctable;
  * closid availability bit map.
  */
 unsigned long *closmap;
+/*
+ * Mask of CPUs for writing CBM values. We only need one CPU per-socket.
+ */
+static cpumask_t rdt_cpumask;
+/*
+ * Temporary cpumask used during hot cpu notification handling. The usage
+ * is serialized by hot cpu locks.
+ */
+static cpumask_t tmp_cpumask;
 static DEFINE_MUTEX(rdt_group_mutex);
 
+struct rdt_remote_data {
+   int msr;
+   u64 val;
+};
+
 static inline void closid_get(u32 closid)
 {
struct clos_cbm_table *cct = &cctable[closid];
@@ -82,11 +96,126 @@ static void closid_put(u32 closid)
closid_free(closid);
 }
 
+static bool cbm_validate(unsigned long var)
+{
+   u32 max_cbm_len = boot_cpu_data.x86_cache_max_cbm_len;
+   unsigned long first_bit, zero_bit;
+   u64 max_cbm;
+
+   if (bitmap_weight(&var, max_cbm_len) < 1)
+   return false;
+
+   max_cbm = (1ULL << max_cbm_len) - 1;
+   if (var & ~max_cbm)
+   return false;
+
+   first_bit = find_first_bit(&var, max_cbm_len);
+   zero_bit = find_next_zero_bit(&var, max_cbm_len, first_bit);
+
+   if (find_next_bit(&var, max_cbm_len, zero_bit) < max_cbm_len)
+   return false;
+
+   return true;
+}
+
+static int clos_cbm_table_read(u32 closid, unsigned long *l3_cbm)
+{
+   u32 maxid = boot_cpu_data.x86_cache_max_closid;
+
+   lockdep_assert_held(&rdt_group_mutex);
+
+   if (closid >= maxid)
+   return -EINVAL;
+
+   *l3_cbm = cctable[closid].l3_cbm;
+
+   return 0;
+}
+
+/*
+ * clos_cbm_table_update() - Update a clos cbm table entry.
+ * @closid: the closid whose cbm needs to be updated
+ * @cbm: the new cbm value that has to be updated
+ *
+ * This assumes the cbm is validated as per the interface requirements
+ * and the cache allocation requirements(through the cbm_validate).
+ */
+static int clos_cbm_table_update(u32 closid, unsigned long cbm)
+{
+   u32 maxid = boot_cpu_data.x86_cache_max_closid;
+
+   lockdep_assert_held(&rdt_group_mutex);
+
+   if (closid >= maxid)
+   return -EINVAL;
+
+   cctable[closid].l3_cbm = cbm;
+
+   return 0;
+}
+
+static bool cbm_search(unsigned long cbm, u32 *closid)
+{
+   u32 maxid = boot_cpu_data.x86_cache_max_closid;
+   u32 i;
+
+   for (i = 0; i < maxid; i++) {
+   if (cctable[i].clos_refcnt &&
+   bitmap_equal(&cbm, &cctable[i].l3_cbm, MAX_CBM_LENGTH)) {
+   *closid = i;
+   return true;
+   }
+   }
+
+   return false;
+}
+
+static void closcbm_map_dump(void)
+{
+   u32 i;
+
+   pr_debug("CBMMAP\n");
+   for (i = 0; i < boot_cpu_data.x86_cache_max_closid; i++) {
+   pr_debug("l3_cbm: 0x%x,clos_refcnt: %u\n",
+(unsigned int)cctable[i].l3_cbm, cctable[i].clos_refcnt);
+   }
+}
+
+static void msr_cpu_update(void *arg)
+{
+   struct rdt_remote_data *info = arg;
+
+   wrmsrl(info->msr, info->val);
+}
+
+/*
+ * msr_update_all() - Update the msr for all packages.
+ */
+static inline void msr_update_all(int msr, u64 val)
+{
+   struct rdt_remote_data info;
+
+   info.msr = msr;
+   info.v

[tip:x86/cache] x86/intel_rdt: Cache Allocation documentation

2015-12-18 Thread tip-bot for Fenghua Yu

Commit-ID:  133b3d646e2cc7b49c71dc0fdff76a690611a5d0
Gitweb: http://git.kernel.org/tip/133b3d646e2cc7b49c71dc0fdff76a690611a5d0
Author: Fenghua Yu 
AuthorDate: Thu, 17 Dec 2015 14:46:08 -0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 18 Dec 2015 13:17:55 -0800

x86/intel_rdt: Cache Allocation documentation

From: Vikas Shivappa 

This patch adds a description of Cache Allocation Technology and an
overview of the kernel framework implementation. The framework has APIs
to manage classes of service, capacity bitmasks (CBMs), scheduling
support, and other architecture-specific details. Later patches use
these APIs to build the cgroup interface.

Cache allocation is a sub-feature of Resource Director Technology (RDT),
or Platform Shared Resource control, which provides support for
controlling platform shared resources such as the L3 cache.

Cache Allocation Technology provides a way for the software (OS/VMM) to
restrict cache allocation to a defined 'subset' of the cache, which may
overlap with other 'subsets'. The restriction applies when a line is
allocated in the cache, i.e. when new data is pulled into the cache.
Tasks are grouped into classes of service (CLOS). The OS uses MSR writes
to indicate the CLOSid of a thread when it is scheduled in, and to
indicate the cache capacity associated with each CLOSid. Currently,
cache allocation is supported only for the L3 cache.
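The scheduling step above can be sketched as computing the value written on sched-in. This sketch assumes the IA32_PQR_ASSOC MSR (0xc8f, adjacent to the IA32_L3_CBM_BASE value 0xc90 in intel_rdt.h) carries the CLOSid in bits 63:32 and the RMID in the low 32 bits. It also assumes a per-CPU cached copy of the current CLOSid lets the context-switch path skip redundant MSR writes; that caching is a design choice of the sketch. All `sketch_*` names are hypothetical, and the code computes the value instead of executing wrmsr.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MSR_IA32_PQR_ASSOC 0xc8f   /* assumed MSR address */

/* CLOSid in bits 63:32, RMID in the low 32 bits (assumed layout). */
uint64_t pqr_assoc_value(uint32_t rmid, uint32_t closid)
{
	return ((uint64_t)closid << 32) | rmid;
}

/* Hypothetical per-CPU state: what is currently programmed. */
struct sketch_cpu_state {
	uint32_t cur_closid;
	uint32_t rmid;
};

/*
 * Called when next_closid's task is scheduled in. Returns true and fills
 * *val only if the CLOSid changes, so the caller can skip a redundant
 * MSR write when consecutive tasks share a class of service.
 */
bool sketch_sched_in(struct sketch_cpu_state *st, uint32_t next_closid,
		     uint64_t *val)
{
	if (st->cur_closid == next_closid)
		return false;
	st->cur_closid = next_closid;
	*val = pqr_assoc_value(st->rmid, next_closid);
	return true;
}
```

MSR writes are comparatively expensive, so skipping unchanged values keeps the common case (tasks in the same class) cheap.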

More information can be found in the Intel SDM June 2015, Volume 3,
section 17.16.
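The cache capacity associated with a CLOSid is expressed as a CBM, a contiguous run of bits in which each bit stands for a portion of the L3 cache. As a rough estimate, assume for illustration that every bit covers an equal share (cache size divided by CBM length). The share granted by a CBM can then be computed as below. `cbm_to_bytes()` and the 20 MB / 20-bit figures in the note are hypothetical.

```c
#include <stdint.h>

/* Portable population count: clear the lowest set bit until none remain. */
static unsigned int count_bits(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;
		n++;
	}
	return n;
}

/* Estimated bytes of L3 granted by cbm, assuming equal-sized bit shares. */
uint64_t cbm_to_bytes(uint32_t cbm, unsigned int cbm_len, uint64_t cache_bytes)
{
	if (cbm_len == 0)
		return 0;
	return cache_bytes / cbm_len * count_bits(cbm);
}
```

On a hypothetical 20 MB cache with a 20-bit CBM, a CBM of 0xF would correspond to about 4 MB.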

Signed-off-by: Vikas Shivappa 
Link: http://lkml.kernel.org/r/1450392376-6397-4-git-send-email-fenghua...@intel.com
Signed-off-by: Fenghua Yu 
---
 Documentation/x86/intel_rdt.txt | 109 
 1 file changed, 109 insertions(+)

diff --git a/Documentation/x86/intel_rdt.txt b/Documentation/x86/intel_rdt.txt
new file mode 100644
index 0000000..05ec819
--- /dev/null
+++ b/Documentation/x86/intel_rdt.txt
@@ -0,0 +1,109 @@
+Intel RDT
+---------
+
+Copyright (C) 2014 Intel Corporation
+Written by vikas.shiva...@linux.intel.com
+
+CONTENTS:
+=========
+
+1. Cache Allocation Technology
+  1.1 What is RDT and Cache allocation
+  1.2 Why is Cache allocation needed
+  1.3 Cache allocation implementation overview
+  1.4 Assignment of CBM and CLOS
+  1.5 Scheduling and Context Switch
+
+1. Cache Allocation Technology
+==============================
+
+1.1 What is RDT and Cache allocation
+------------------------------------
+
+Cache allocation is a sub-feature of Resource Director Technology (RDT)
+Allocation, or Platform Shared Resource control, which provides support
+for controlling platform shared resources such as the L3 cache. Currently
+the L3 cache is the only resource supported in RDT. More information can
+be found in the Intel SDM June 2015, Volume 3, section 17.16.
+
+Cache Allocation Technology provides a way for the software (OS/VMM) to
+restrict cache allocation to a defined 'subset' of the cache, which may
+overlap with other 'subsets'. The restriction applies when a line is
+allocated in the cache, i.e. when new data is pulled into the cache. The
+hardware is programmed through MSRs.
+
+The different cache subsets are identified by CLOS identifier (class of
+service) and each CLOS has a CBM (cache bit mask). The CBM is a
+contiguous set of bits which defines the amount of cache resource that
+is available for each 'subset'.
+
+1.2 Why is Cache allocation needed
+----------------------------------
+
+In today's processors the number of cores keeps increasing, especially
+in large-scale usage models such as web servers and data centers where
+VMs are used. More cores increase the number of threads or workloads
+that can run simultaneously. When multi-threaded applications, VMs, and
+other workloads run concurrently, they compete for shared resources,
+including the L3 cache.
+
+The architecture also allows these subsets to be changed dynamically at
+runtime to further optimize the performance of a higher-priority
+application with minimal degradation to lower-priority applications.
+Additionally, resources can be rebalanced to benefit system throughput.
+
+This technique may be useful in managing large server systems with a
+large L3 cache, in the cloud and container context. Examples include
+large servers running instances of web servers or database servers. In
+such complex systems, these subsets can be used for more careful
+placement of the available cache resources through a centralized,
+root-accessible interface.
+
+A specific use case is the noisy neighbour problem, where an application
+that constantly copies data, such as a streaming application, uses a
+large amount of cache that could otherwise have been used by a
+high-priority computing application. Using cache allocation, the
+streaming application can be confined to a smaller portion of the cache
+and the high-priority application given a larger amount of cache space.
+
+1.3 Cache allocation implementation overview
+--------------------------------------------
+
+Kernel has a new field i

[tip:x86/cache] x86/intel_cqm: Modify hot cpu notification handling

2015-12-18 Thread tip-bot for Fenghua Yu

Commit-ID:  8a91dc4e92327b61fbe5941d25e74660e2a44579
Gitweb: http://git.kernel.org/tip/8a91dc4e92327b61fbe5941d25e74660e2a44579
Author: Fenghua Yu 
AuthorDate: Thu, 17 Dec 2015 14:46:06 -0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 18 Dec 2015 13:17:55 -0800

x86/intel_cqm: Modify hot cpu notification handling

From: Vikas Shivappa 

 - In cqm_pick_event_reader, use the existing package<->core map instead
   of looping through all CPUs in cqm_cpumask.

 - In intel_cqm_cpu_exit, use the same map instead of looping through
   all online CPUs. On large systems with many CPUs the loop is
   expensive, and its cost grows linearly with the number of CPUs.
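The two changes above replace per-CPU loops with mask operations. The sketch below models that logic with 64-bit masks, where bit n stands for CPU n and `pkg_mask` plays the role of topology_core_cpumask(cpu). It assumes at most 64 CPUs. When handing off the reader role it picks the lowest-numbered candidate, whereas the patch uses cpumask_any(), which promises only some CPU from the set. Function names are hypothetical.

```c
#include <stdint.h>

/* CPU online: make cpu the package's reader unless one already exists. */
uint64_t sketch_pick_event_reader(uint64_t readers, uint64_t pkg_mask,
				  unsigned int cpu)
{
	if ((readers & pkg_mask) == 0)
		readers |= 1ULL << cpu;
	return readers;
}

/* CPU offline: if cpu was a reader, hand the role to another online CPU
 * in the same package, if there is one. */
uint64_t sketch_cpu_exit(uint64_t readers, uint64_t pkg_mask,
			 uint64_t online, unsigned int cpu)
{
	uint64_t candidates;
	unsigned int i;

	if (!(readers & (1ULL << cpu)))
		return readers;             /* not a reader: nothing to do */
	readers &= ~(1ULL << cpu);
	candidates = pkg_mask & online & ~(1ULL << cpu);
	for (i = 0; i < 64; i++) {
		if (candidates & (1ULL << i)) {
			readers |= 1ULL << i;
			break;
		}
	}
	return readers;
}
```

The online path now does a single AND per event instead of a loop over every reader CPU.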

Signed-off-by: Vikas Shivappa 
Link: http://lkml.kernel.org/r/1450392376-6397-2-git-send-email-fenghua...@intel.com
Signed-off-by: Fenghua Yu 
---
 arch/x86/kernel/cpu/perf_event_intel_cqm.c | 34 +++---
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_cqm.c b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
index a316ca9..dd82bc7 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -62,6 +62,12 @@ static LIST_HEAD(cache_groups);
  */
 static cpumask_t cqm_cpumask;
 
+/*
+ * Temporary cpumask used during hot cpu notification handling. The usage
+ * is serialized by hot cpu locks.
+ */
+static cpumask_t tmp_cpumask;
+
 #define RMID_VAL_ERROR (1ULL << 63)
 #define RMID_VAL_UNAVAIL   (1ULL << 62)
 
@@ -1244,15 +1250,13 @@ static struct pmu intel_cqm_pmu = {
 
 static inline void cqm_pick_event_reader(int cpu)
 {
-   int phys_id = topology_physical_package_id(cpu);
-   int i;
+   cpumask_and(&tmp_cpumask, &cqm_cpumask, topology_core_cpumask(cpu));
 
-   for_each_cpu(i, &cqm_cpumask) {
-   if (phys_id == topology_physical_package_id(i))
-   return; /* already got reader for this socket */
-   }
-
-   cpumask_set_cpu(cpu, &cqm_cpumask);
+   /*
+* Pick a reader if there isn't one already.
+*/
+   if (cpumask_empty(&tmp_cpumask))
+   cpumask_set_cpu(cpu, &cqm_cpumask);
 }
 
 static void intel_cqm_cpu_starting(unsigned int cpu)
@@ -1270,7 +1274,6 @@ static void intel_cqm_cpu_starting(unsigned int cpu)
 
 static void intel_cqm_cpu_exit(unsigned int cpu)
 {
-   int phys_id = topology_physical_package_id(cpu);
int i;
 
/*
@@ -1279,15 +1282,12 @@ static void intel_cqm_cpu_exit(unsigned int cpu)
if (!cpumask_test_and_clear_cpu(cpu, &cqm_cpumask))
return;
 
-   for_each_online_cpu(i) {
-   if (i == cpu)
-   continue;
+   cpumask_and(&tmp_cpumask, topology_core_cpumask(cpu), cpu_online_mask);
+   cpumask_clear_cpu(cpu, &tmp_cpumask);
+   i = cpumask_any(&tmp_cpumask);
 
-   if (phys_id == topology_physical_package_id(i)) {
-   cpumask_set_cpu(i, &cqm_cpumask);
-   break;
-   }
-   }
+   if (i < nr_cpu_ids)
+   cpumask_set_cpu(i, &cqm_cpumask);
 }
 
 static int intel_cqm_cpu_notifier(struct notifier_block *nb,
--

[tip:x86/mm] x86/mm: Reduce PAE-mode per task pgd allocation overhead from 4K to 32 bytes

2015-02-18 Thread tip-bot for Fenghua Yu

Commit-ID:  1db491f77b6ed0f32f1d4a3ac40a5be9524f1914
Gitweb: http://git.kernel.org/tip/1db491f77b6ed0f32f1d4a3ac40a5be9524f1914
Author: Fenghua Yu 
AuthorDate: Thu, 15 Jan 2015 20:30:01 -0800
Committer:  Ingo Molnar 
CommitDate: Thu, 19 Feb 2015 01:28:38 +0100

x86/mm: Reduce PAE-mode per task pgd allocation overhead from 4K to 32 bytes

With more embedded systems emerging that use Quark, among other
things, the 32-bit kernel matters again. A 32-bit kernel that uses
PAE paging currently wastes at least 4K of memory per process on
Linux, because an entire page has to be reserved to hold a single
32-byte PGD structure. It would be very good to eliminate that
waste.

PAE paging is used to access more than 4GB of memory on x86-32, and
it is required for NX.

In this patch, we still allocate one page for the pgd for a Xen
domain and for the 64-bit kernel, because a one-page pgd is assumed
in those cases. But we can save memory by allocating only a 32-byte
pgd for a 32-bit PAE kernel that is not running as a Xen domain.
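The 32-byte figure follows from the PAE layout: the top-level table has 4 entries of 64 bits each, which is what PGD_SIZE (PTRS_PER_PGD * sizeof(pgd_t)) evaluates to in the patch. A compile-time sketch of the arithmetic, using hypothetical stand-ins for the kernel's PTRS_PER_PGD, pgd_t, and PAGE_SIZE and assuming 4 KB pages:

```c
#include <stdint.h>

typedef uint64_t sketch_pgd_t;     /* stand-in: a PAE pgd entry is 64 bits */
#define SKETCH_PTRS_PER_PGD 4      /* stand-in: PAE top level has 4 entries */
#define SKETCH_PAGE_SIZE    4096   /* assumed 4 KB page */

enum {
	SKETCH_PGD_SIZE = SKETCH_PTRS_PER_PGD * sizeof(sketch_pgd_t),
	SKETCH_SAVED_PER_PROCESS = SKETCH_PAGE_SIZE - SKETCH_PGD_SIZE,
};

_Static_assert(SKETCH_PGD_SIZE == 32, "PAE pgd is 32 bytes");
```

Ignoring slab overhead, each process saves 4064 bytes, so 1,000 processes save 4,064,000 bytes (about 3.9 MiB).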

Signed-off-by: Fenghua Yu 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Christoph Lameter 
Cc: Dave Hansen 
Cc: Glenn Williamson 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1421382601-46912-1-git-send-email-fenghua...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/mm/pgtable.c | 81 +--
 1 file changed, 78 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 6fb6927..d223e1f 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -271,12 +271,87 @@ static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[])
}
 }
 
+/*
+ * Xen paravirt assumes pgd table should be in one page. 64 bit kernel also
+ * assumes that pgd should be in one page.
+ *
+ * But kernel with PAE paging that is not running as a Xen domain
+ * only needs to allocate 32 bytes for pgd instead of one page.
+ */
+#ifdef CONFIG_X86_PAE
+
+#include 
+
+#define PGD_SIZE   (PTRS_PER_PGD * sizeof(pgd_t))
+#define PGD_ALIGN  32
+
+static struct kmem_cache *pgd_cache;
+
+static int __init pgd_cache_init(void)
+{
+   /*
+* When PAE kernel is running as a Xen domain, it does not use
+* shared kernel pmd. And this requires a whole page for pgd.
+*/
+   if (!SHARED_KERNEL_PMD)
+   return 0;
+
+   /*
+* when PAE kernel is not running as a Xen domain, it uses
+* shared kernel pmd. Shared kernel pmd does not require a whole
+* page for pgd. We are able to just allocate a 32-byte for pgd.
+* During boot time, we create a 32-byte slab for pgd table allocation.
+*/
+   pgd_cache = kmem_cache_create("pgd_cache", PGD_SIZE, PGD_ALIGN,
+ SLAB_PANIC, NULL);
+   if (!pgd_cache)
+   return -ENOMEM;
+
+   return 0;
+}
+core_initcall(pgd_cache_init);
+
+static inline pgd_t *_pgd_alloc(void)
+{
+   /*
+* If no SHARED_KERNEL_PMD, PAE kernel is running as a Xen domain.
+* We allocate one page for pgd.
+*/
+   if (!SHARED_KERNEL_PMD)
+   return (pgd_t *)__get_free_page(PGALLOC_GFP);
+
+   /*
+* Now PAE kernel is not running as a Xen domain. We can allocate
+* a 32-byte slab for pgd to save memory space.
+*/
+   return kmem_cache_alloc(pgd_cache, PGALLOC_GFP);
+}
+
+static inline void _pgd_free(pgd_t *pgd)
+{
+   if (!SHARED_KERNEL_PMD)
+   free_page((unsigned long)pgd);
+   else
+   kmem_cache_free(pgd_cache, pgd);
+}
+#else
+static inline pgd_t *_pgd_alloc(void)
+{
+   return (pgd_t *)__get_free_page(PGALLOC_GFP);
+}
+
+static inline void _pgd_free(pgd_t *pgd)
+{
+   free_page((unsigned long)pgd);
+}
+#endif /* CONFIG_X86_PAE */
+
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
pgd_t *pgd;
pmd_t *pmds[PREALLOCATED_PMDS];
 
-   pgd = (pgd_t *)__get_free_page(PGALLOC_GFP);
+   pgd = _pgd_alloc();
 
if (pgd == NULL)
goto out;
@@ -306,7 +381,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 out_free_pmds:
free_pmds(pmds);
 out_free_pgd:
-   free_page((unsigned long)pgd);
+   _pgd_free(pgd);
 out:
return NULL;
 }
@@ -316,7 +391,7 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd)
pgd_mop_up_pmds(mm, pgd);
pgd_dtor(pgd);
paravirt_pgd_free(mm, pgd);
-   free_page((unsigned long)pgd);
+   _pgd_free(pgd);
 }
 
 /*
--

[tip:x86/xsave] x86/xsaves: Clean up code in xstate offsets computation in xsave area

2014-05-30 Thread tip-bot for Fenghua Yu

Commit-ID:  8ff925e10f2c72680918b95173ef4f8bb982d59e
Gitweb: http://git.kernel.org/tip/8ff925e10f2c72680918b95173ef4f8bb982d59e
Author: Fenghua Yu 
AuthorDate: Fri, 30 May 2014 14:59:24 -0700
Committer:  H. Peter Anvin 
CommitDate: Fri, 30 May 2014 17:12:41 -0700

x86/xsaves: Clean up code in xstate offsets computation in xsave area

This patch cleans up the code that computes xstate offsets in the xsave
area:

1. It changes xstate_comp_offsets to a static array. This avoids a possible
   NULL pointer dereference caused by a kmalloc() failure at boot time.
2. It changes the global variable xstate_comp_sizes to a local variable
   because it is used only in setup_xstate_comp().
3. It adds the missing offsets for the FP and SSE states in the xsave area.
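Item 3 relies on the legacy FP and SSE components sitting at fixed offsets in the first 512 bytes, the FXSAVE region. The sketch below uses a hypothetical mirror of that layout, modeled on the kernel's i387_fxsave_struct, to show where offsetof(..., xmm_space) lands. The struct and enum names are assumptions. The anonymous struct/union members require C11.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 512-byte legacy FXSAVE layout. */
struct sketch_fxsave {
	uint16_t cwd, swd, twd, fop;
	union {
		struct { uint64_t rip, rdp; };
		struct { uint32_t fip, fcs, foo, fos; };
	};
	uint32_t mxcsr;
	uint32_t mxcsr_mask;
	uint32_t st_space[32];   /* 8 x87 registers, 16 bytes each */
	uint32_t xmm_space[64];  /* 16 XMM registers, 16 bytes each */
	uint32_t padding[24];
};

_Static_assert(sizeof(struct sketch_fxsave) == 512, "FXSAVE area is 512 bytes");

/* Fixed in both standard and compacted form. */
enum {
	SKETCH_FP_OFFSET = offsetof(struct sketch_fxsave, cwd),
	SKETCH_SSE_OFFSET = offsetof(struct sketch_fxsave, xmm_space),
};
```

With this layout the SSE state starts 160 bytes in: 8 bytes of control words, 16 of instruction/data pointers, 8 of MXCSR, then 128 bytes of x87 registers.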

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1401387164-43416-17-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/kernel/xsave.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index a6cb823..940b142 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -26,7 +26,7 @@ struct xsave_struct *init_xstate_buf;
 
 static struct _fpx_sw_bytes fx_sw_reserved, fx_sw_reserved_ia32;
 static unsigned int *xstate_offsets, *xstate_sizes;
-static unsigned int *xstate_comp_offsets, *xstate_comp_sizes;
+static unsigned int xstate_comp_offsets[sizeof(pcntxt_mask)*8];
 static unsigned int xstate_features;
 
 /*
@@ -491,11 +491,16 @@ static void __init setup_xstate_features(void)
  */
 void setup_xstate_comp(void)
 {
+   unsigned int xstate_comp_sizes[sizeof(pcntxt_mask)*8];
int i;
 
-   xstate_comp_offsets = kmalloc(xstate_features * sizeof(int),
- GFP_KERNEL);
-   xstate_comp_sizes = kmalloc(xstate_features * sizeof(int), GFP_KERNEL);
+   /*
+* The FP xstates and SSE xstates are legacy states. They are always
+* in the fixed offsets in the xsave area in either compacted form
+* or standard form.
+*/
+   xstate_comp_offsets[0] = 0;
+   xstate_comp_offsets[1] = offsetof(struct i387_fxsave_struct, xmm_space);
 
if (!cpu_has_xsaves) {
for (i = 2; i < xstate_features; i++) {
--

[tip:x86/xsave] x86/cpufeature.h: Reformat x86 feature macros

2014-05-30 Thread tip-bot for Fenghua Yu

Commit-ID:  446fd806f5408b623fa51f3aa084e56844563779
Gitweb: http://git.kernel.org/tip/446fd806f5408b623fa51f3aa084e56844563779
Author: Fenghua Yu 
AuthorDate: Thu, 29 May 2014 11:12:29 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 29 May 2014 12:37:10 -0700

x86/cpufeature.h: Reformat x86 feature macros

In each x86 feature macro definition whose word number is currently a
single digit, add one space in front of the word number.

The purpose of the reformatting is to align one-digit and two-digit word
numbers.
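The word number being realigned is the first half of each macro's `word*32 + bit` encoding. The sketch below shows how such a value indexes a capability array. It uses a plain uint32_t array in place of the kernel's x86_capability[] and a hypothetical `sketch_cpu_has()` in place of cpu_has().

```c
#include <stdbool.h>
#include <stdint.h>

/* An X86_FEATURE_* value such as ( 9*32+15) encodes word 9, bit 15. */
unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

unsigned int feature_bit(unsigned int feature)
{
	return feature % 32;
}

/* Test one feature bit in an array of 32-bit capability words. */
bool sketch_cpu_has(const uint32_t caps[], unsigned int feature)
{
	return (caps[feature_word(feature)] >> feature_bit(feature)) & 1u;
}
```

Because the word number only selects the array index, padding single-digit word numbers with a space changes nothing but alignment.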

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1401387164-43416-2-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/cpufeature.h | 362 +++---
 1 file changed, 181 insertions(+), 181 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index e265ff9..2837b92 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -18,213 +18,213 @@
  */
 
 /* Intel-defined CPU features, CPUID level 0x0001 (edx), word 0 */
-#define X86_FEATURE_FPU       (0*32+ 0) /* Onboard FPU */
-#define X86_FEATURE_VME       (0*32+ 1) /* Virtual Mode Extensions */
-#define X86_FEATURE_DE        (0*32+ 2) /* Debugging Extensions */
-#define X86_FEATURE_PSE       (0*32+ 3) /* Page Size Extensions */
-#define X86_FEATURE_TSC       (0*32+ 4) /* Time Stamp Counter */
-#define X86_FEATURE_MSR       (0*32+ 5) /* Model-Specific Registers */
-#define X86_FEATURE_PAE       (0*32+ 6) /* Physical Address Extensions */
-#define X86_FEATURE_MCE       (0*32+ 7) /* Machine Check Exception */
-#define X86_FEATURE_CX8       (0*32+ 8) /* CMPXCHG8 instruction */
-#define X86_FEATURE_APIC      (0*32+ 9) /* Onboard APIC */
-#define X86_FEATURE_SEP       (0*32+11) /* SYSENTER/SYSEXIT */
-#define X86_FEATURE_MTRR      (0*32+12) /* Memory Type Range Registers */
-#define X86_FEATURE_PGE       (0*32+13) /* Page Global Enable */
-#define X86_FEATURE_MCA       (0*32+14) /* Machine Check Architecture */
-#define X86_FEATURE_CMOV      (0*32+15) /* CMOV instructions */
+#define X86_FEATURE_FPU       ( 0*32+ 0) /* Onboard FPU */
+#define X86_FEATURE_VME       ( 0*32+ 1) /* Virtual Mode Extensions */
+#define X86_FEATURE_DE        ( 0*32+ 2) /* Debugging Extensions */
+#define X86_FEATURE_PSE       ( 0*32+ 3) /* Page Size Extensions */
+#define X86_FEATURE_TSC       ( 0*32+ 4) /* Time Stamp Counter */
+#define X86_FEATURE_MSR       ( 0*32+ 5) /* Model-Specific Registers */
+#define X86_FEATURE_PAE       ( 0*32+ 6) /* Physical Address Extensions */
+#define X86_FEATURE_MCE       ( 0*32+ 7) /* Machine Check Exception */
+#define X86_FEATURE_CX8       ( 0*32+ 8) /* CMPXCHG8 instruction */
+#define X86_FEATURE_APIC      ( 0*32+ 9) /* Onboard APIC */
+#define X86_FEATURE_SEP       ( 0*32+11) /* SYSENTER/SYSEXIT */
+#define X86_FEATURE_MTRR      ( 0*32+12) /* Memory Type Range Registers */
+#define X86_FEATURE_PGE       ( 0*32+13) /* Page Global Enable */
+#define X86_FEATURE_MCA       ( 0*32+14) /* Machine Check Architecture */
+#define X86_FEATURE_CMOV      ( 0*32+15) /* CMOV instructions */
                                            /* (plus FCMOVcc, FCOMI with FPU) */
-#define X86_FEATURE_PAT       (0*32+16) /* Page Attribute Table */
-#define X86_FEATURE_PSE36     (0*32+17) /* 36-bit PSEs */
-#define X86_FEATURE_PN        (0*32+18) /* Processor serial number */
-#define X86_FEATURE_CLFLUSH   (0*32+19) /* CLFLUSH instruction */
-#define X86_FEATURE_DS        (0*32+21) /* "dts" Debug Store */
-#define X86_FEATURE_ACPI      (0*32+22) /* ACPI via MSR */
-#define X86_FEATURE_MMX       (0*32+23) /* Multimedia Extensions */
-#define X86_FEATURE_FXSR      (0*32+24) /* FXSAVE/FXRSTOR, CR4.OSFXSR */
-#define X86_FEATURE_XMM       (0*32+25) /* "sse" */
-#define X86_FEATURE_XMM2      (0*32+26) /* "sse2" */
-#define X86_FEATURE_SELFSNOOP (0*32+27) /* "ss" CPU self snoop */
-#define X86_FEATURE_HT        (0*32+28) /* Hyper-Threading */
-#define X86_FEATURE_ACC       (0*32+29) /* "tm" Automatic clock control */
-#define X86_FEATURE_IA64      (0*32+30) /* IA-64 processor */
-#define X86_FEATURE_PBE       (0*32+31) /* Pending Break Enable */
+#define X86_FEATURE_PAT       ( 0*32+16) /* Page Attribute Table */
+#define X86_FEATURE_PSE36     ( 0*32+17) /* 36-bit PSEs */
+#define X86_FEATURE_PN        ( 0*32+18) /* Processor serial number */
+#define X86_FEATURE_CLFLUSH   ( 0*32+19) /* CLFLUSH instruction */
+#define X86_FEATURE_DS        ( 0*32+21) /* "dts" Debug Store */
+#define X86_FEATURE_ACPI      ( 0*32+22) /* ACPI via MSR */
+#define X86_FEATURE_MMX       ( 0*32+23) /* Multimedia Extensions */
+#define X86_FEATURE

[tip:x86/xsave] Define kernel API to get address of each state in xsave area

2014-05-30 Thread tip-bot for Fenghua Yu

Commit-ID:  7496d6458fe3219d63848ce4a9afbd86245cab22
Gitweb: http://git.kernel.org/tip/7496d6458fe3219d63848ce4a9afbd86245cab22
Author: Fenghua Yu 
AuthorDate: Thu, 29 May 2014 11:12:44 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 29 May 2014 14:33:09 -0700

Define kernel API to get address of each state in xsave area

In standard form, each state is saved in the xsave area at a fixed offset.
But in compacted form, the offset of each saved state can only be calculated
at run time, because some xstates may not be enabled and saved.

We define a kernel API, get_xsave_addr(), that returns the address of a
given state saved in an xsave area.

It can be called in the kernel to get the address of each xstate in an
xsave area in either standard or compacted format.

It is useful when the kernel wants to access each state in the xsave area
directly.
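The compacted-form calculation described above can be modeled as follows. Component 2 starts right after the 512-byte legacy area and the 64-byte xsave header (FXSAVE_SIZE + XSAVE_HDR_SIZE in the patch). Each later component starts where the previous one ends, and a component that is not enabled occupies no space. This sketch mirrors the loop in setup_xstate_comp() below. It does not model any alignment requirements, and the per-component sizes in the test are hypothetical. `sketch_*` names are illustrative only.

```c
#include <stdint.h>

#define SKETCH_FXSAVE_SIZE    512  /* legacy FP/SSE region */
#define SKETCH_XSAVE_HDR_SIZE  64  /* xsave header */

/* Fill offsets[2..nr_features-1] for the compacted format. */
void sketch_setup_comp_offsets(uint64_t enabled_mask, const unsigned int sizes[],
			       unsigned int nr_features, unsigned int offsets[])
{
	unsigned int i, prev_size = 0;

	for (i = 2; i < nr_features && i < 64; i++) {
		if (i == 2)
			offsets[i] = SKETCH_FXSAVE_SIZE + SKETCH_XSAVE_HDR_SIZE;
		else
			offsets[i] = offsets[i - 1] + prev_size;
		/* A disabled component contributes no space. */
		prev_size = ((enabled_mask >> i) & 1) ? sizes[i] : 0;
	}
}

/* Address of component xstate inside an xsave buffer. */
void *sketch_get_xsave_addr(void *xsave, const unsigned int offsets[],
			    unsigned int xstate)
{
	return (char *)xsave + offsets[xstate];
}
```

Disabling a middle component shifts every later component down, which is why the offsets cannot be fixed at compile time in compacted form.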

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1401387164-43416-17-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/xsave.h |  3 +++
 arch/x86/kernel/process.c|  1 +
 arch/x86/kernel/xsave.c  | 64 
 3 files changed, 68 insertions(+)

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index aa3ff0c..1ba577c 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -255,4 +255,7 @@ static inline int xrestore_user(struct xsave_struct __user *buf, u64 mask)
return err;
 }
 
+void *get_xsave_addr(struct xsave_struct *xsave, int xstate);
+void setup_xstate_comp(void);
+
 #endif
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 4505e2a..f804dc9 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -93,6 +93,7 @@ void arch_task_cache_init(void)
kmem_cache_create("task_xstate", xstate_size,
  __alignof__(union thread_xstate),
  SLAB_PANIC | SLAB_NOTRACK, NULL);
+   setup_xstate_comp();
 }
 
 /*
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index f930f8a..a6cb823 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -482,6 +482,47 @@ static void __init setup_xstate_features(void)
 }
 
 /*
+ * This function sets up offsets and sizes of all extended states in
+ * xsave area. This supports both standard format and compacted format
+ * of the xsave area.
+ *
+ * Input: void
+ * Output: void
+ */
+void setup_xstate_comp(void)
+{
+   int i;
+
+   xstate_comp_offsets = kmalloc(xstate_features * sizeof(int),
+ GFP_KERNEL);
+   xstate_comp_sizes = kmalloc(xstate_features * sizeof(int), GFP_KERNEL);
+
+   if (!cpu_has_xsaves) {
+   for (i = 2; i < xstate_features; i++) {
+   if (test_bit(i, (unsigned long *)&pcntxt_mask)) {
+   xstate_comp_offsets[i] = xstate_offsets[i];
+   xstate_comp_sizes[i] = xstate_sizes[i];
+   }
+   }
+   return;
+   }
+
+   xstate_comp_offsets[2] = FXSAVE_SIZE + XSAVE_HDR_SIZE;
+
+   for (i = 2; i < xstate_features; i++) {
+   if (test_bit(i, (unsigned long *)&pcntxt_mask))
+   xstate_comp_sizes[i] = xstate_sizes[i];
+   else
+   xstate_comp_sizes[i] = 0;
+
+   if (i > 2)
+   xstate_comp_offsets[i] = xstate_comp_offsets[i-1]
+   + xstate_comp_sizes[i-1];
+
+   }
+}
+
+/*
  * setup the xstate image representing the init state
  */
 static void __init setup_init_fpu_buf(void)
@@ -668,3 +709,26 @@ void eager_fpu_init(void)
else
fxrstor_checking(&init_xstate_buf->i387);
 }
+
+/*
+ * Given the xsave area and a state inside, this function returns the
+ * address of the state.
+ *
+ * This is the API that is called to get xstate address in either
+ * standard format or compacted format of xsave area.
+ *
+ * Inputs:
+ * xsave: base address of the xsave area;
+ * xstate: state which is defined in xsave.h (e.g. XSTATE_FP, XSTATE_SSE,
+ * etc.)
+ * Output:
+ * address of the state in the xsave area.
+ */
+void *get_xsave_addr(struct xsave_struct *xsave, int xstate)
+{
+   int feature = fls64(xstate) - 1;
+   if (!test_bit(feature, (unsigned long *)&pcntxt_mask))
+   return NULL;
+
+   return (void *)xsave + xstate_comp_offsets[feature];
+}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[tip:x86/xsave] x86/xsaves: Enable xsaves/xrstors

2014-05-30 Thread tip-bot for Fenghua Yu

Commit-ID:  7e7ce87f6ad4e1730364e5e76628b43c5759b700
Gitweb: http://git.kernel.org/tip/7e7ce87f6ad4e1730364e5e76628b43c5759b700
Author: Fenghua Yu 
AuthorDate: Thu, 29 May 2014 11:12:43 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 29 May 2014 14:33:07 -0700

x86/xsaves: Enable xsaves/xrstors

If xsaves/xrstors is enabled, the compacted format of the xsave area is used,
which may reduce the memory needed for each process's context. In addition,
the modified optimization implemented in xsaves/xrstors improves the
performance of saving xstate.
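The memory saving comes from the two ways init_xstate_size() in this patch sizes the area. A small sketch of both:

- Compacted size: the legacy region plus header, plus the sizes of enabled components.
- Standard size: assumed to be the end of the highest enabled component, which is what CPUID(0xD, 0).EBX reports for XCR0.

The component tables are hypothetical example values, and xstate_area_size() is an illustrative name, not a kernel function.

```c
#include <stdint.h>

#define FXSAVE_SIZE    512u /* legacy x87/SSE region */
#define XSAVE_HDR_SIZE  64u /* xsave header */
#define NR_FEATURES      8  /* hypothetical: features 0..7 */

/* Hypothetical CPUID(0xD, i) values: EAX = size, EBX = standard offset. */
static const unsigned int xsizes[NR_FEATURES] = {
	0, 0, 256, 64, 64, 64, 512, 1024 };
static const unsigned int xoffsets[NR_FEATURES] = {
	0, 0, 576, 960, 1024, 1088, 1152, 1664 };

/* Size of an xsave area holding the features in mask. */
static unsigned int xstate_area_size(uint64_t mask, int compacted)
{
	unsigned int size = FXSAVE_SIZE + XSAVE_HDR_SIZE;

	for (int i = 2; i < NR_FEATURES; i++) {
		unsigned int end;

		if (!((mask >> i) & 1))
			continue;
		if (compacted) {
			/* Compacted: components are packed, so sizes just add up. */
			size += xsizes[i];
			continue;
		}
		/* Standard: holes stay, so the area ends after the last component. */
		end = xoffsets[i] + xsizes[i];
		if (end > size)
			size = end;
	}
	return size;
}
```

For a mask of 0xA7, the compacted area needs 1920 bytes versus 2688 for the standard form. With only x87, SSE and AVX enabled (0x7), both come to 832 bytes. The saving therefore depends on holes in the enabled mask.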

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1401387164-43416-16-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/kernel/xsave.c | 39 +--
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 8fa7c7d..f930f8a 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -24,7 +25,9 @@ u64 pcntxt_mask;
 struct xsave_struct *init_xstate_buf;
 
 static struct _fpx_sw_bytes fx_sw_reserved, fx_sw_reserved_ia32;
-static unsigned int *xstate_offsets, *xstate_sizes, xstate_features;
+static unsigned int *xstate_offsets, *xstate_sizes;
+static unsigned int *xstate_comp_offsets, *xstate_comp_sizes;
+static unsigned int xstate_features;
 
 /*
  * If a processor implementation discern that a processor state component is
@@ -283,7 +286,7 @@ sanitize_restored_xstate(struct task_struct *tsk,
 
if (use_xsave()) {
/* These bits must be zero. */
-   xsave_hdr->reserved1[0] = xsave_hdr->reserved1[1] = 0;
+   memset(xsave_hdr->reserved, 0, 48);
 
/*
 * Init the state that is not present in the memory
@@ -526,6 +529,30 @@ static int __init eager_fpu_setup(char *s)
 }
 __setup("eagerfpu=", eager_fpu_setup);
 
+
+/*
+ * Calculate total size of enabled xstates in XCR0/pcntxt_mask.
+ */
+static void __init init_xstate_size(void)
+{
+   unsigned int eax, ebx, ecx, edx;
+   int i;
+
+   if (!cpu_has_xsaves) {
+   cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
+   xstate_size = ebx;
+   return;
+   }
+
+   xstate_size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
+   for (i = 2; i < 64; i++) {
+   if (test_bit(i, (unsigned long *)&pcntxt_mask)) {
+   cpuid_count(XSTATE_CPUID, i, &eax, &ebx, &ecx, &edx);
+   xstate_size += eax;
+   }
+   }
+}
+
 /*
  * Enable and initialize the xsave feature.
  */
@@ -557,8 +584,7 @@ static void __init xstate_enable_boot_cpu(void)
/*
 * Recompute the context size for enabled features
 */
-   cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
-   xstate_size = ebx;
+   init_xstate_size();
 
update_regset_xstate_info(xstate_size, pcntxt_mask);
prepare_fx_sw_frame();
@@ -578,8 +604,9 @@ static void __init xstate_enable_boot_cpu(void)
}
}
 
-   pr_info("enabled xstate_bv 0x%llx, cntxt size 0x%x\n",
-   pcntxt_mask, xstate_size);
+   pr_info("enabled xstate_bv 0x%llx, cntxt size 0x%x using %s\n",
+   pcntxt_mask, xstate_size,
+   cpu_has_xsaves ? "compacted form" : "standard form");
 }
 
 /*

[tip:x86/xsave] x86/xsaves: Clear reserved bits in xsave header

2014-05-30 Thread tip-bot for Fenghua Yu

Commit-ID:  21e726c4a3625a1038e97795b7aad97109ba7e19
Gitweb: http://git.kernel.org/tip/21e726c4a3625a1038e97795b7aad97109ba7e19
Author: Fenghua Yu 
AuthorDate: Thu, 29 May 2014 11:12:39 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 29 May 2014 14:33:00 -0700

x86/xsaves: Clear reserved bits in xsave header

The reserved bits (128-511) in the xsave header must be zero according to
the x86 SDM. Clear them in this patch.
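Bits 128 to 511 are bytes 63:16 of the 64-byte header, which is where the 48 in the memset() below comes from.

The sketch below reproduces that layout using the field names from the xsave_hdr_struct change. Compile-time checks tie the cleared length to the struct. Two things are assumptions:
- Using sizeof() instead of the literal 48 is a suggestion, not what the patch does.
- The names xsave_hdr and clear_reserved() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as xsave_hdr_struct after the xcomp_bv change. */
struct xsave_hdr {
	uint64_t xstate_bv;   /* bytes 7:0 */
	uint64_t xcomp_bv;    /* bytes 15:8 */
	uint64_t reserved[6]; /* bytes 63:16, i.e. bits 128..511 */
};

_Static_assert(sizeof(struct xsave_hdr) == 64, "header is 64 bytes");
_Static_assert(offsetof(struct xsave_hdr, reserved) == 16,
	       "reserved area starts at byte 16");
_Static_assert(sizeof(((struct xsave_hdr *)0)->reserved) == 48,
	       "reserved area is 48 bytes");

/* Zero only the reserved bytes; xstate_bv and xcomp_bv are untouched. */
static void clear_reserved(struct xsave_hdr *hdr)
{
	memset(hdr->reserved, 0, sizeof(hdr->reserved));
}
```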

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1401387164-43416-12-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/kernel/i387.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index d5dd808..a9a4229 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -375,7 +375,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
/*
 * These bits must be zero.
 */
-   xsave_hdr->reserved1[0] = xsave_hdr->reserved1[1] = 0;
+   memset(xsave_hdr->reserved, 0, 48);
 
return ret;
 }

[tip:x86/xsave] x86/xsaves: Call booting time xsaves and xrstors in setup_init_fpu_buf

2014-05-30 Thread tip-bot for Fenghua Yu

Commit-ID:  47c2f292cc8669f70644a949cadd5fa5ee0e0e07
Gitweb: http://git.kernel.org/tip/47c2f292cc8669f70644a949cadd5fa5ee0e0e07
Author: Fenghua Yu 
AuthorDate: Thu, 29 May 2014 11:12:42 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 29 May 2014 14:33:06 -0700

x86/xsaves: Call booting time xsaves and xrstors in setup_init_fpu_buf

setup_init_fpu_buf() calls the boot-time versions of xsaves and xrstors to
save and restore xstate in the xsave area.

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1401387164-43416-15-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/kernel/xsave.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index a4b451c..8fa7c7d 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -496,15 +496,21 @@ static void __init setup_init_fpu_buf(void)
 
setup_xstate_features();
 
+   if (cpu_has_xsaves) {
+   init_xstate_buf->xsave_hdr.xcomp_bv =
+   (u64)1 << 63 | pcntxt_mask;
+   init_xstate_buf->xsave_hdr.xstate_bv = pcntxt_mask;
+   }
+
/*
 * Init all the features state with header_bv being 0x0
 */
-   xrstor_state(init_xstate_buf, -1);
+   xrstor_state_booting(init_xstate_buf, -1);
/*
 * Dump the init state again. This is to identify the init state
 * of any feature which is not represented by all zero's.
 */
-   xsave_state(init_xstate_buf, -1);
+   xsave_state_booting(init_xstate_buf, -1);
 }
 
 static enum { AUTO, ENABLE, DISABLE } eagerfpu = AUTO;

[tip:x86/xsave] x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting time

2014-05-30 Thread tip-bot for Fenghua Yu

Commit-ID:  f41d830fa890044cb60f6bb39fc8f6493ffebb47
Gitweb: http://git.kernel.org/tip/f41d830fa890044cb60f6bb39fc8f6493ffebb47
Author: Fenghua Yu 
AuthorDate: Thu, 29 May 2014 11:12:41 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 29 May 2014 14:33:04 -0700

x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting time

__save_fpu() can be called during early boot, when CPU capabilities are not
yet set up and alternatives cannot be used. Therefore, during boot it calls
xsave_state_booting() to save xstate to the task's xsave area.

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1401387164-43416-14-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/fpu-internal.h | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index cea1c76..6099c0e 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -508,9 +508,12 @@ static inline void user_fpu_begin(void)
 
 static inline void __save_fpu(struct task_struct *tsk)
 {
-   if (use_xsave())
-   xsave_state(&tsk->thread.fpu.state->xsave, -1);
-   else
+   if (use_xsave()) {
+   if (unlikely(system_state == SYSTEM_BOOTING))
+   xsave_state_booting(&tsk->thread.fpu.state->xsave, -1);
+   else
+   xsave_state(&tsk->thread.fpu.state->xsave, -1);
+   } else
fpu_fxsave(&tsk->thread.fpu);
 }
 

[tip:x86/xsave] x86/xsaves: Add xsaves and xrstors support for booting time

2014-05-30 Thread tip-bot for Fenghua Yu

Commit-ID:  adb9d526e98268b647a74726346e1c40e6a37d2e
Gitweb: http://git.kernel.org/tip/adb9d526e98268b647a74726346e1c40e6a37d2e
Author: Fenghua Yu 
AuthorDate: Thu, 29 May 2014 11:12:40 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 29 May 2014 14:33:02 -0700

x86/xsaves: Add xsaves and xrstors support for booting time

Since boot_cpu_data and CPU capabilities are not set up yet during early
boot, alternatives cannot be used in some functions that access the xsave
area. Therefore, we define two new functions, xrstor_state_booting() and
xsave_state_booting(), to access the xsave area during early boot only.

xrstor_state_booting() restores xstate from the xsave area during early boot.
xsave_state_booting() saves xstate to the xsave area during early boot.

The two functions are similar to xrstor_state() and xsave_state()
respectively, but they don't use alternatives, because alternatives are not
yet enabled when they are called this early in boot.

xrstor_state_booting() is called only by functions defined as __init, so it
is defined as __init and is freed after boot. It adds no memory cost at
run time.

Because xsave_state_booting() can be called by the run-time function
__save_fpu(), it is not defined as __init. It therefore stays in memory at
run time even though it is no longer called after boot. This is not ideal,
but the function is small and so is the memory cost. Doing it this way
avoids changing a lot of code just to drop this small function and save a
little run-time memory.

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1401387164-43416-13-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/xsave.h | 60 
 1 file changed, 60 insertions(+)

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 0d15231..aa3ff0c 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -66,6 +66,66 @@ extern int init_fpu(struct task_struct *child);
: [err] "=r" (err)
 
 /*
+ * This function is called only during boot time when x86 caps are not set
+ * up and alternative can not be used yet.
+ */
+static int xsave_state_booting(struct xsave_struct *fx, u64 mask)
+{
+   u32 lmask = mask;
+   u32 hmask = mask >> 32;
+   int err = 0;
+
+   WARN_ON(system_state != SYSTEM_BOOTING);
+
+   if (boot_cpu_has(X86_FEATURE_XSAVES))
+   asm volatile("1:"XSAVES"\n\t"
+   "2:\n\t"
+   : : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
+   :   "memory");
+   else
+   asm volatile("1:"XSAVE"\n\t"
+   "2:\n\t"
+   : : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
+   :   "memory");
+
+   asm volatile(xstate_fault
+: "0" (0)
+: "memory");
+
+   return err;
+}
+
+/*
+ * This function is called only during boot time when x86 caps are not set
+ * up and alternative can not be used yet.
+ */
+static inline int xrstor_state_booting(struct xsave_struct *fx, u64 mask)
+{
+   u32 lmask = mask;
+   u32 hmask = mask >> 32;
+   int err = 0;
+
+   WARN_ON(system_state != SYSTEM_BOOTING);
+
+   if (boot_cpu_has(X86_FEATURE_XSAVES))
+   asm volatile("1:"XRSTORS"\n\t"
+   "2:\n\t"
+   : : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
+   :   "memory");
+   else
+   asm volatile("1:"XRSTOR"\n\t"
+   "2:\n\t"
+   : : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
+   :   "memory");
+
+   asm volatile(xstate_fault
+: "0" (0)
+: "memory");
+
+   return err;
+}
+
+/*
  * Save processor xstate to xsave area.
  */
 static inline int xsave_state(struct xsave_struct *fx, u64 mask)

[tip:x86/xsave] x86/xsaves: Use xsave/xrstor for saving and restoring user space context

2014-05-30 Thread tip-bot for Fenghua Yu

Commit-ID:  facbf4d91ae64f84ef93a00e4037135cd9f4b2ab
Gitweb: http://git.kernel.org/tip/facbf4d91ae64f84ef93a00e4037135cd9f4b2ab
Author: Fenghua Yu 
AuthorDate: Thu, 29 May 2014 11:12:38 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 29 May 2014 14:32:57 -0700

x86/xsaves: Use xsave/xrstor for saving and restoring user space context

We use legacy xsave/xrstor to save and restore the standard form of the xsave
area in user space context. Neither xsaveopt nor xsaves is used here, for two
reasons.

First, we don't want to use the modified optimization implemented in xsaveopt
and xsaves, because xrstor/xrstors might track the wrong user space
application.

Second, we don't use the compacted format of the xsave area, for backward
compatibility: legacy user space applications don't understand the compacted
format of the xsave area.

Using the standard form of the xsave area may allocate more memory for
user context than the compacted form, but it preserves compatibility with
legacy applications. Furthermore, even with holes, the relevant cache
lines don't get touched, so the performance impact is limited.

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1401387164-43416-11-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/xsave.h | 33 ++---
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 8b75824..0d15231 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -145,6 +145,16 @@ static inline int fpu_xrstor_checking(struct xsave_struct *fx)
return xrstor_state(fx, -1);
 }
 
+/*
+ * Save xstate to user space xsave area.
+ *
+ * We don't use modified optimization because xrstor/xrstors might track
+ * a different application.
+ *
+ * We don't use the compacted format of the xsave area, for
+ * backward compatibility with old applications that don't understand
+ * the compacted format.
+ */
 static inline int xsave_user(struct xsave_struct __user *buf)
 {
int err;
@@ -158,35 +168,28 @@ static inline int xsave_user(struct xsave_struct __user *buf)
return -EFAULT;
 
__asm__ __volatile__(ASM_STAC "\n"
-"1: .byte " REX_PREFIX "0x0f,0xae,0x27\n"
+"1:"XSAVE"\n"
 "2: " ASM_CLAC "\n"
-".section .fixup,\"ax\"\n"
-"3:  movl $-1,%[err]\n"
-"jmp  2b\n"
-".previous\n"
-_ASM_EXTABLE(1b,3b)
-: [err] "=r" (err)
+xstate_fault
 : "D" (buf), "a" (-1), "d" (-1), "0" (0)
 : "memory");
return err;
 }
 
+/*
+ * Restore xstate from user space xsave area.
+ */
 static inline int xrestore_user(struct xsave_struct __user *buf, u64 mask)
 {
-   int err;
+   int err = 0;
struct xsave_struct *xstate = ((__force struct xsave_struct *)buf);
u32 lmask = mask;
u32 hmask = mask >> 32;
 
__asm__ __volatile__(ASM_STAC "\n"
-"1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n"
+"1:"XRSTOR"\n"
 "2: " ASM_CLAC "\n"
-".section .fixup,\"ax\"\n"
-"3:  movl $-1,%[err]\n"
-"jmp  2b\n"
-".previous\n"
-_ASM_EXTABLE(1b,3b)
-: [err] "=r" (err)
+xstate_fault
 : "D" (xstate), "a" (lmask), "d" (hmask), "0" (0)
 : "memory");   /* memory required? */
return err;

[tip:x86/xsave] x86/xsaves: Use xsaves/xrstors to save and restore xsave area

2014-05-30 Thread tip-bot for Fenghua Yu

Commit-ID:  f31a9f7c71691569359fa7fb8b0acaa44bce0324
Gitweb: http://git.kernel.org/tip/f31a9f7c71691569359fa7fb8b0acaa44bce0324
Author: Fenghua Yu 
AuthorDate: Thu, 29 May 2014 11:12:36 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 29 May 2014 14:31:21 -0700

x86/xsaves: Use xsaves/xrstors to save and restore xsave area

If xsaves is enabled, use the xsaves/xrstors instructions to save and restore
xstate. xsaves and xrstors support the compacted format, init optimization,
modified optimization, and supervisor states.
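The preference order in xsave_state() and xrstor_state() can be modelled as an ordinary run-time decision. This is only a sketch. The kernel does not branch on every call: it patches the instruction once at boot through alternative_input_2()/alternative_input(), and the second alternative wins when both features are present. The enum and function names below are hypothetical.

```c
#include <stdbool.h>

enum save_insn { INSN_XSAVE, INSN_XSAVEOPT, INSN_XSAVES };
enum restore_insn { INSN_XRSTOR, INSN_XRSTORS };

/* xsaves > xsaveopt > xsave, matching the comment in xsave_state(). */
static enum save_insn pick_save_insn(bool has_xsaveopt, bool has_xsaves)
{
	if (has_xsaves)
		return INSN_XSAVES;
	if (has_xsaveopt)
		return INSN_XSAVEOPT;
	return INSN_XSAVE;
}

/* Per the comment in xrstor_state(), xrstors is used whenever xsaves
 * is, so the compacted area written by xsaves is read back by xrstors. */
static enum restore_insn pick_restore_insn(bool has_xsaves)
{
	return has_xsaves ? INSN_XRSTORS : INSN_XRSTOR;
}
```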

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1401387164-43416-9-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/xsave.h | 84 +---
 1 file changed, 64 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 76c2459..f9177a2 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -65,6 +65,70 @@ extern int init_fpu(struct task_struct *child);
_ASM_EXTABLE(1b, 3b)\
: [err] "=r" (err)
 
+/*
+ * Save processor xstate to xsave area.
+ */
+static inline int xsave_state(struct xsave_struct *fx, u64 mask)
+{
+   u32 lmask = mask;
+   u32 hmask = mask >> 32;
+   int err = 0;
+
+   /*
+* If xsaves is enabled, xsaves replaces xsaveopt because
+* it supports the compacted format and supervisor states in addition to
+* the modified optimization in xsaveopt.
+*
+* Otherwise, if xsaveopt is enabled, xsaveopt replaces xsave
+* because xsaveopt supports the modified optimization, which is not
+* supported by xsave.
+*
+* If neither xsaves nor xsaveopt is enabled, use xsave.
+*/
+   alternative_input_2(
+   "1:"XSAVE,
+   "1:"XSAVEOPT,
+   X86_FEATURE_XSAVEOPT,
+   "1:"XSAVES,
+   X86_FEATURE_XSAVES,
+   [fx] "D" (fx), "a" (lmask), "d" (hmask) :
+   "memory");
+   asm volatile("2:\n\t"
+xstate_fault
+: "0" (0)
+: "memory");
+
+   return err;
+}
+
+/*
+ * Restore processor xstate from xsave area.
+ */
+static inline int xrstor_state(struct xsave_struct *fx, u64 mask)
+{
+   int err = 0;
+   u32 lmask = mask;
+   u32 hmask = mask >> 32;
+
+   /*
+* Use xrstors to restore context if it is enabled. xrstors supports
+* compacted format of xsave area which is not supported by xrstor.
+*/
+   alternative_input(
+   "1: " XRSTOR,
+   "1: " XRSTORS,
+   X86_FEATURE_XSAVES,
+   "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
+   : "memory");
+
+   asm volatile("2:\n"
+xstate_fault
+: "0" (0)
+: "memory");
+
+   return err;
+}
+
 static inline int fpu_xrstor_checking(struct xsave_struct *fx)
 {
int err;
@@ -130,26 +194,6 @@ static inline int xrestore_user(struct xsave_struct __user *buf, u64 mask)
return err;
 }
 
-static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
-{
-   u32 lmask = mask;
-   u32 hmask = mask >> 32;
-
-   asm volatile(".byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
-: : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
-:   "memory");
-}
-
-static inline void xsave_state(struct xsave_struct *fx, u64 mask)
-{
-   u32 lmask = mask;
-   u32 hmask = mask >> 32;
-
-   asm volatile(".byte " REX_PREFIX "0x0f,0xae,0x27\n\t"
-: : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
-:   "memory");
-}
-
 static inline void fpu_xsave(struct fpu *fpu)
 {
/* This, however, we can work around by forcing the compiler to select

[tip:x86/xsave] x86/xsaves: Use xsaves/xrstors for context switch

2014-05-30 Thread tip-bot for Fenghua Yu

Commit-ID:  f9de314b340f4816671f037e79ed01f685ac9787
Gitweb: http://git.kernel.org/tip/f9de314b340f4816671f037e79ed01f685ac9787
Author: Fenghua Yu 
AuthorDate: Thu, 29 May 2014 11:12:37 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 29 May 2014 14:31:25 -0700

x86/xsaves: Use xsaves/xrstors for context switch

If xsaves is enabled, use xsaves/xrstors for context switch. This gives the
compacted format of the xsave area, which occupies less memory, and the
modified optimization, which improves saving performance.

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1401387164-43416-10-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/xsave.h | 37 -
 1 file changed, 12 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index f9177a2..8b75824 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -129,22 +129,20 @@ static inline int xrstor_state(struct xsave_struct *fx, u64 mask)
return err;
 }
 
-static inline int fpu_xrstor_checking(struct xsave_struct *fx)
+/*
+ * Save xstate context for old process during context switch.
+ */
+static inline void fpu_xsave(struct fpu *fpu)
 {
-   int err;
-
-   asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
-"2:\n"
-".section .fixup,\"ax\"\n"
-"3:  movl $-1,%[err]\n"
-"jmp  2b\n"
-".previous\n"
-_ASM_EXTABLE(1b, 3b)
-: [err] "=r" (err)
-: "D" (fx), "m" (*fx), "a" (-1), "d" (-1), "0" (0)
-: "memory");
+   xsave_state(&fpu->state->xsave, -1);
+}
 
-   return err;
+/*
+ * Restore xstate context for new process during context switch.
+ */
+static inline int fpu_xrstor_checking(struct xsave_struct *fx)
+{
+   return xrstor_state(fx, -1);
 }
 
 static inline int xsave_user(struct xsave_struct __user *buf)
@@ -194,15 +192,4 @@ static inline int xrestore_user(struct xsave_struct __user *buf, u64 mask)
return err;
 }
 
-static inline void fpu_xsave(struct fpu *fpu)
-{
-   /* This, however, we can work around by forcing the compiler to select
-  an addressing mode that doesn't require extended registers. */
-   alternative_input(
-   ".byte " REX_PREFIX "0x0f,0xae,0x27",
-   ".byte " REX_PREFIX "0x0f,0xae,0x37",
-   X86_FEATURE_XSAVEOPT,
-   [fx] "D" (&fpu->state->xsave), "a" (-1), "d" (-1) :
-   "memory");
-}
 #endif

[tip:x86/xsave] x86/xsaves: Define a macro for handling xsave/xrstor instruction fault

2014-05-30 Thread tip-bot for Fenghua Yu

Commit-ID:  b84e70552e5aad71a1c14536e6ffcfe7934b73e4
Gitweb: http://git.kernel.org/tip/b84e70552e5aad71a1c14536e6ffcfe7934b73e4
Author: Fenghua Yu 
AuthorDate: Thu, 29 May 2014 11:12:35 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 29 May 2014 14:31:18 -0700

x86/xsaves: Define a macro for handling xsave/xrstor instruction fault

Define a macro to handle faults generated by the xsave, xsaveopt, xsaves,
xrstor, and xrstors instructions. It is used in functions such as xsave_state().

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1401387164-43416-8-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/xsave.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 71bdde4..76c2459 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -58,6 +58,13 @@ extern int init_fpu(struct task_struct *child);
 #define XRSTOR ".byte " REX_PREFIX "0x0f,0xae,0x2f"
 #define XRSTORS".byte " REX_PREFIX "0x0f,0xc7,0x1f"
 
+#define xstate_fault   ".section .fixup,\"ax\"\n"  \
+   "3:  movl $-1,%[err]\n" \
+   "jmp  2b\n" \
+   ".previous\n"   \
+   _ASM_EXTABLE(1b, 3b)\
+   : [err] "=r" (err)
+
 static inline int fpu_xrstor_checking(struct xsave_struct *fx)
 {
int err;

[tip:x86/xsave] x86/xsaves: Change compacted format xsave area header

2014-05-30 Thread tip-bot for Fenghua Yu

Commit-ID:  0b29643a58439dc9a8b0c0cacad0e7cb608c8199
Gitweb: http://git.kernel.org/tip/0b29643a58439dc9a8b0c0cacad0e7cb608c8199
Author: Fenghua Yu 
AuthorDate: Thu, 29 May 2014 11:12:33 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 29 May 2014 14:31:10 -0700

x86/xsaves: Change compacted format xsave area header

The XSAVE area header is changed to support both the compacted format and
the standard format of the xsave area.

The XSAVE header of an xsave area comprises the 64 bytes starting at offset
512 from the area base address:

- Bytes 7:0 of the xsave header are a state-component bitmap called
  xstate_bv. It identifies the state components in the xsave area.

- Bytes 15:8 of the xsave header are a state-component bitmap called
  xcomp_bv. It is used as follows:
  - xcomp_bv[63] indicates the format of the extended region of
the xsave area. If it is clear, the standard format is used.
If it is set, the compacted format is used.
  - xcomp_bv[62:0] indicate which features (starting at feature 2)
have space allocated for them in the compacted format.

- Bytes 63:16 of the xsave header are reserved.
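The xcomp_bv rules above can be expressed as a few bit tests. make_xcomp_bv() builds the same value that setup_init_fpu_buf() later stores when xsaves is available: (u64)1 << 63 | pcntxt_mask. The helper names are hypothetical and the sketch is user-space only.

```c
#include <stdbool.h>
#include <stdint.h>

#define XCOMP_BV_COMPACTED ((uint64_t)1 << 63) /* xcomp_bv[63] */

/* xcomp_bv[63] selects the format of the extended region. */
static bool xsave_is_compacted(uint64_t xcomp_bv)
{
	return (xcomp_bv & XCOMP_BV_COMPACTED) != 0;
}

/* In the compacted format, feature i (2 <= i <= 62) has space
 * allocated iff xcomp_bv[i] is set. */
static bool compacted_has_space(uint64_t xcomp_bv, int i)
{
	return xsave_is_compacted(xcomp_bv) && i >= 2 && i <= 62 &&
	       ((xcomp_bv >> i) & 1);
}

/* Format bit plus the enabled feature mask. */
static uint64_t make_xcomp_bv(uint64_t feature_mask)
{
	return XCOMP_BV_COMPACTED | feature_mask;
}
```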

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1401387164-43416-6-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/processor.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..2c8d3b8 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -386,8 +386,8 @@ struct bndcsr_struct {
 
 struct xsave_hdr_struct {
u64 xstate_bv;
-   u64 reserved1[2];
-   u64 reserved2[5];
+   u64 xcomp_bv;
+   u64 reserved[6];
 } __attribute__((packed));
 
 struct xsave_struct {

[tip:x86/xsave] x86/xsaves: Define macros for xsave instructions

2014-05-30 Thread tip-bot for Fenghua Yu

Commit-ID:  200b08a970b2ae764b670a326088ab8bc0a989cc
Gitweb: http://git.kernel.org/tip/200b08a970b2ae764b670a326088ab8bc0a989cc
Author: Fenghua Yu 
AuthorDate: Thu, 29 May 2014 11:12:34 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 29 May 2014 14:31:16 -0700

x86/xsaves: Define macros for xsave instructions

Define macros for xsave, xsaveopt, xsaves, xrstor, and xrstors inline
instructions. The instructions will be used for saving and restoring xstate.
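The byte strings in these macros are hand-encoded instructions:

- 0x0f,0xae and 0x0f,0xc7 are two-byte opcodes whose ModRM reg field selects the operation.
- A ModRM byte of the form 00 rrr 111 addresses memory through %rdi (%edi on 32-bit). This is why callers pass the buffer with the "D" constraint.
- REX_PREFIX is assumed here to be the REX.W byte 0x48 on x86-64, which selects the 64-bit forms, and empty on 32-bit.

The decoder below is a hypothetical helper that checks each macro against the SDM opcode assignments: 0F AE /4, /5, /6 and 0F C7 /5, /3.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h> /* strcmp() in the usage checks */

struct modrm {
	unsigned int mod; /* bits 7:6, 00 = memory operand, no displacement */
	unsigned int reg; /* bits 5:3, opcode extension for these instructions */
	unsigned int rm;  /* bits 2:0, 111 = [rdi]/[edi] when mod == 00 */
};

static struct modrm decode_modrm(uint8_t b)
{
	struct modrm m = { b >> 6, (b >> 3) & 7, b & 7 };
	return m;
}

/* Name the XSAVE-family instruction for "0x0f, op2, modrm", or NULL. */
static const char *xsave_family_name(uint8_t op2, uint8_t modrm)
{
	struct modrm m = decode_modrm(modrm);

	/* Register forms (mod == 11) are different instructions, e.g. lfence. */
	if (m.mod == 3)
		return NULL;
	if (op2 == 0xae) {
		switch (m.reg) {
		case 4: return "xsave";
		case 5: return "xrstor";
		case 6: return "xsaveopt";
		}
	} else if (op2 == 0xc7) {
		switch (m.reg) {
		case 3: return "xrstors";
		case 5: return "xsaves";
		}
	}
	return NULL;
}
```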

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1401387164-43416-7-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/xsave.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index d949ef2..71bdde4 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -52,6 +52,12 @@ extern void xsave_init(void);
 extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
 extern int init_fpu(struct task_struct *child);
 
+#define XSAVE  ".byte " REX_PREFIX "0x0f,0xae,0x27"
+#define XSAVEOPT   ".byte " REX_PREFIX "0x0f,0xae,0x37"
+#define XSAVES ".byte " REX_PREFIX "0x0f,0xc7,0x2f"
+#define XRSTOR ".byte " REX_PREFIX "0x0f,0xae,0x2f"
+#define XRSTORS".byte " REX_PREFIX "0x0f,0xc7,0x1f"
+
 static inline int fpu_xrstor_checking(struct xsave_struct *fx)
 {
int err;

[tip:x86/xsave] x86/xsaves: Detect xsaves/xrstors feature

2014-05-30 Thread tip-bot for Fenghua Yu

Commit-ID:  6229ad278ca74acdbc8bd3a3d469322a3de91039
Gitweb: http://git.kernel.org/tip/6229ad278ca74acdbc8bd3a3d469322a3de91039
Author: Fenghua Yu 
AuthorDate: Thu, 29 May 2014 11:12:30 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 29 May 2014 14:24:28 -0700

x86/xsaves: Detect xsaves/xrstors feature

Detect the xsaveopt, xsavec, xgetbv, and xsaves features in the processor
extended state enumeration sub-leaf (eax=0x0d, ecx=1):
Bit 00: XSAVEOPT is available
Bit 01: Supports XSAVEC and the compacted form of XRSTOR if set
Bit 02: Supports XGETBV with ECX = 1 if set
Bit 03: Supports XSAVES/XRSTORS and IA32_XSS if set

The above features are defined in the new word 10 of the CPU feature flags.

The IA32_XSS MSR (index DA0H) contains a state-component bitmap that specifies
the state components that software has enabled xsaves and xrstors to manage.
If the bit corresponding to a state component is clear in XCR0 | IA32_XSS,
xsaves and xrstors will not operate on that state component, regardless of
the value of the instruction mask.
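The following sketch shows how the four bits land in the new capability word, and how the IA32_XSS rule above combines with the instruction mask. The capability array and helper names are hypothetical simplifications. The assumption that word 10 holds CPUID(0x0d, 1).EAX unchanged is inferred from the X86_FEATURE_* numbering in the cpufeature.h hunk; the common.c change itself is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

#define NCAPINTS 11 /* word 10 is the new extended-state word */

#define X86_FEATURE_XSAVEOPT (10*32 + 0)
#define X86_FEATURE_XSAVEC   (10*32 + 1)
#define X86_FEATURE_XGETBV1  (10*32 + 2)
#define X86_FEATURE_XSAVES   (10*32 + 3)

/* Simplified stand-in for the per-CPU capability words. */
static uint32_t caps[NCAPINTS];

/* Assumed: word 10 is CPUID(EAX=0x0d, ECX=1).EAX, bit for bit. */
static void set_xstate_caps(uint32_t cpuid_0d_1_eax)
{
	caps[10] = cpuid_0d_1_eax;
}

static bool has_cap(int feature)
{
	return (caps[feature / 32] >> (feature % 32)) & 1;
}

/* Components xsaves/xrstors actually operate on: a bit must be set in
 * XCR0 | IA32_XSS, whatever the instruction mask (EDX:EAX) requests. */
static uint64_t xsaves_effective_mask(uint64_t xcr0, uint64_t ia32_xss,
				      uint64_t insn_mask)
{
	return (xcr0 | ia32_xss) & insn_mask;
}
```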

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1401387164-43416-3-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/cpufeature.h | 10 --
 arch/x86/include/uapi/asm/msr-index.h |  2 ++
 arch/x86/kernel/cpu/common.c  |  9 +
 arch/x86/kernel/cpu/scattered.c   |  1 -
 4 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 2837b92..b82f951 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -8,7 +8,7 @@
 #include 
 #endif
 
-#define NCAPINTS   10  /* N 32-bit words worth of info */
+#define NCAPINTS   11  /* N 32-bit words worth of info */
 #define NBUGINTS   1   /* N 32-bit bug flags */
 
 /*
@@ -180,7 +180,6 @@
 #define X86_FEATURE_ARAT   ( 7*32+ 1) /* Always Running APIC Timer */
 #define X86_FEATURE_CPB    ( 7*32+ 2) /* AMD Core Performance Boost */
 #define X86_FEATURE_EPB    ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */
-#define X86_FEATURE_XSAVEOPT   ( 7*32+ 4) /* Optimized Xsave */
 #define X86_FEATURE_PLN    ( 7*32+ 5) /* Intel Power Limit Notification */
 #define X86_FEATURE_PTS    ( 7*32+ 6) /* Intel Package Thermal Status */
 #define X86_FEATURE_DTHERM ( 7*32+ 7) /* Digital Thermal Sensor */
@@ -226,6 +225,12 @@
 #define X86_FEATURE_AVX512ER   ( 9*32+27) /* AVX-512 Exponential and Reciprocal */
 #define X86_FEATURE_AVX512CD   ( 9*32+28) /* AVX-512 Conflict Detection */
 
+/* Extended state features, CPUID level 0x000d:1 (eax), word 10 */
+#define X86_FEATURE_XSAVEOPT   (10*32+ 0) /* XSAVEOPT */
+#define X86_FEATURE_XSAVEC (10*32+ 1) /* XSAVEC */
+#define X86_FEATURE_XGETBV1    (10*32+ 2) /* XGETBV with ECX = 1 */
+#define X86_FEATURE_XSAVES (10*32+ 3) /* XSAVES/XRSTORS */
+
 /*
  * BUG word(s)
  */
@@ -328,6 +333,7 @@ extern const char * const x86_power_flags[32];
 #define cpu_has_x2apic boot_cpu_has(X86_FEATURE_X2APIC)
 #define cpu_has_xsave  boot_cpu_has(X86_FEATURE_XSAVE)
 #define cpu_has_xsaveopt   boot_cpu_has(X86_FEATURE_XSAVEOPT)
+#define cpu_has_xsaves boot_cpu_has(X86_FEATURE_XSAVES)
 #define cpu_has_osxsave    boot_cpu_has(X86_FEATURE_OSXSAVE)
 #define cpu_has_hypervisor boot_cpu_has(X86_FEATURE_HYPERVISOR)
 #define cpu_has_pclmulqdq  boot_cpu_has(X86_FEATURE_PCLMULQDQ)
diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index fcf2b3a..5cd1569 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -297,6 +297,8 @@
 #define MSR_IA32_TSC_ADJUST 0x003b
 #define MSR_IA32_BNDCFGS   0x0d90
 
+#define MSR_IA32_XSS   0x0da0
+
 #define FEATURE_CONTROL_LOCKED (1<<0)
 #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX   (1<<1)
 #define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX  (1<<2)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a135239..e7c4b97 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -632,6 +632,15 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
c->x86_capability[9] = ebx;
}
 
+   /* Extended state features: level 0x000d */
+   if (c->cpuid_level >= 0x000d) {
+   u32 eax, ebx, ecx, edx;
+
+   cpuid_count(0x000d, 1, &eax, &ebx, &ecx, &edx);
+
+   c->x86_capability[10] = eax;
+   }
+
/* AMD-defined flags: level 0x8001 */
xlvl = cpuid_eax(0x8000);
c->extended_cpuid_level = xlvl;
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index b6f794a..4a8013d 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -38,7 +38,6 @@ void ini

[tip:x86/xsave] x86/xsaves: Add a kernel parameter noxsaves to disable xsaves/xrstors

2014-05-30 Thread tip-bot for Fenghua Yu

Commit-ID:  b6f42a4a3c886bd18baf319d433a841ac9942c02
Gitweb: http://git.kernel.org/tip/b6f42a4a3c886bd18baf319d433a841ac9942c02
Author: Fenghua Yu 
AuthorDate: Thu, 29 May 2014 11:12:31 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 29 May 2014 14:24:52 -0700

x86/xsaves: Add a kernel parameter noxsaves to disable xsaves/xrstors

This patch adds a kernel parameter, noxsaves, to disable the xsaves/xrstors
feature. The kernel then falls back to xsaveopt and xrstor to save and restore
xstates. With this parameter the xsave area occupies more memory, because the
standard form of the xsave area used by xsaveopt/xrstor occupies more memory
than the compacted form used by xsaves/xrstors.

This patch also documents the kernel parameter noxsaveopt. The code that
supports noxsaveopt was already in the kernel; only its description in the
documentation was missing.
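The following is a hypothetical model of the fallback order these parameters produce. It assumes that each parameter only clears feature bits, as the common.c hunk below does, and that the most capable remaining instruction is chosen. The enum and function names are illustrative only. The real kernel selects instructions through feature checks and alternatives, and noxsave also clears AVX and AVX2, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; names are illustrative, not kernel identifiers. */
enum save_insn {
	SAVE_LEGACY,	/* legacy FP/SSE state only (no xsave) */
	SAVE_XSAVE,	/* standard form, no modified optimization */
	SAVE_XSAVEOPT,	/* standard form, modified optimization */
	SAVE_XSAVES,	/* compacted form */
};

struct xfeat {
	bool xsave, xsaveopt, xsaves;
};

/* Each parameter clears feature bits, as the __setup() handlers do. */
static void param_noxsave(struct xfeat *f)
{
	/* The real handler also clears AVX and AVX2; not modeled here. */
	f->xsave = f->xsaveopt = f->xsaves = false;
}

static void param_noxsaveopt(struct xfeat *f) { f->xsaveopt = false; }
static void param_noxsaves(struct xfeat *f) { f->xsaves = false; }

/* Pick the most capable save instruction that is still enabled. */
static enum save_insn pick_save_insn(const struct xfeat *f)
{
	if (!f->xsave)
		return SAVE_LEGACY;
	if (f->xsaves)
		return SAVE_XSAVES;
	if (f->xsaveopt)
		return SAVE_XSAVEOPT;
	return SAVE_XSAVE;
}
```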

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1401387164-43416-4-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 Documentation/kernel-parameters.txt | 15 +++
 arch/x86/kernel/cpu/common.c|  8 
 2 files changed, 23 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 30a8ad0d..0ebd952 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2124,6 +2124,21 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
and restore using xsave. The kernel will fallback to
enabling legacy floating-point and sse state.
 
+   noxsaveopt  [X86] Disables xsaveopt used in saving x86 extended
+   register states. The kernel will fall back to use
+   xsave to save the states. By using this parameter,
+   performance of saving the states is degraded because
+   xsave doesn't support modified optimization while
+   xsaveopt supports it on xsaveopt enabled systems.
+
+   noxsaves    [X86] Disables xsaves and xrstors used in saving and
+   restoring x86 extended register state in compacted
+   form of xsave area. The kernel will fall back to use
+   xsaveopt and xrstor to save and restore the states
+   in standard form of xsave area. By using this
+   parameter, xsave area per process might occupy more
+   memory on xsaves enabled systems.
+
eagerfpu=   [X86]
on  enable eager fpu restore
off disable eager fpu restore
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index e7c4b97..cdc9585 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -146,6 +146,7 @@ static int __init x86_xsave_setup(char *s)
 {
setup_clear_cpu_cap(X86_FEATURE_XSAVE);
setup_clear_cpu_cap(X86_FEATURE_XSAVEOPT);
+   setup_clear_cpu_cap(X86_FEATURE_XSAVES);
setup_clear_cpu_cap(X86_FEATURE_AVX);
setup_clear_cpu_cap(X86_FEATURE_AVX2);
return 1;
@@ -159,6 +160,13 @@ static int __init x86_xsaveopt_setup(char *s)
 }
 __setup("noxsaveopt", x86_xsaveopt_setup);
 
+static int __init x86_xsaves_setup(char *s)
+{
+   setup_clear_cpu_cap(X86_FEATURE_XSAVES);
+   return 1;
+}
+__setup("noxsaves", x86_xsaves_setup);
+
 #ifdef CONFIG_X86_32
 static int cachesize_override = -1;
 static int disable_x86_serial_nr = 1;

[tip:x86/xsave] x86/alternative: Add alternative_input_2 to support alternative with two features and input

2014-05-30 Thread tip-bot for Fenghua Yu

Commit-ID:  5b3e83f46a2a7e8625258dbf84a26e7f4032bfa8
Gitweb: http://git.kernel.org/tip/5b3e83f46a2a7e8625258dbf84a26e7f4032bfa8
Author: Fenghua Yu 
AuthorDate: Thu, 29 May 2014 11:12:32 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 29 May 2014 14:24:53 -0700

x86/alternative: Add alternative_input_2 to support alternative with two features and input

alternative_input_2() replaces an old instruction with one of two new
instructions, chosen according to two CPU features, and passes input operands.

In alternative_input_2(oldinstr, newinstr1, feature1, newinstr2, feature2,
input...), feature2 has higher priority than feature1 when replacing oldinstr.

If the CPU has feature2, newinstr2 replaces oldinstr and newinstr2 is
executed at run time.

If the CPU does not have feature2 but has feature1, newinstr1 replaces
oldinstr and newinstr1 is executed at run time.

If the CPU has neither feature2 nor feature1, oldinstr is executed at run
time.
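The priority rule above fits in a small decision function. This is a hypothetical model of which variant ends up executing. It does not model how ALTERNATIVE_2 patches the instruction bytes at boot. The enum names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three possible outcomes. */
enum variant { USE_OLDINSTR, USE_NEWINSTR1, USE_NEWINSTR2 };

/* feature2 is checked first, so it wins when both features are present. */
static enum variant alt2_select(bool has_feature1, bool has_feature2)
{
	if (has_feature2)
		return USE_NEWINSTR2;
	if (has_feature1)
		return USE_NEWINSTR1;
	return USE_OLDINSTR;
}
```

As an illustration, take feature1 = X86_FEATURE_XSAVEOPT and feature2 = X86_FEATURE_XSAVES (an assumed pairing). A CPU with both features would run newinstr2.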

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1401387164-43416-5-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/alternative.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 0a3f9c9..473bdbe 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -161,6 +161,20 @@ static inline int alternatives_text_reserved(void *start, void *end)
asm volatile (ALTERNATIVE(oldinstr, newinstr, feature)  \
: : "i" (0), ## input)
 
+/*
+ * This is similar to alternative_input. But it has two features and
+ * respective instructions.
+ *
+ * If CPU has feature2, newinstr2 is used.
+ * Otherwise, if CPU has feature1, newinstr1 is used.
+ * Otherwise, oldinstr is used.
+ */
+#define alternative_input_2(oldinstr, newinstr1, feature1, newinstr2,   \
+  feature2, input...)   \
+   asm volatile(ALTERNATIVE_2(oldinstr, newinstr1, feature1,\
+   newinstr2, feature2) \
+   : : "i" (0), ## input)
+
 /* Like alternative_input, but with a single output argument */
 #define alternative_io(oldinstr, newinstr, feature, output, input...)  \
asm volatile (ALTERNATIVE(oldinstr, newinstr, feature)  \

[tip:x86/cpufeature] x86, AVX-512: Enable AVX-512 States Context Switch

2014-02-20 Thread tip-bot for Fenghua Yu

Commit-ID:  c2bc11f10a39527cd1bb252097b5525664560956
Gitweb: http://git.kernel.org/tip/c2bc11f10a39527cd1bb252097b5525664560956
Author: Fenghua Yu 
AuthorDate: Thu, 20 Feb 2014 13:24:51 -0800
Committer:  H. Peter Anvin 
CommitDate: Thu, 20 Feb 2014 13:56:55 -0800

x86, AVX-512: Enable AVX-512 States Context Switch

This patch enables Opmask, ZMM_Hi256, and Hi16_ZMM AVX-512 states for
xstate context switch.
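This is an arithmetic check of the masks in the xsave.h hunk below. The three new AVX-512 components occupy bits 5-7, so XSTATE_LAZY grows from 0x07 to 0xe7 and stays disjoint from XSTATE_EAGER (0x18). The values are copied from the patch. The lazy_part() helper is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the xsave.h hunk. */
#define XSTATE_FP		0x1
#define XSTATE_SSE		0x2
#define XSTATE_YMM		0x4
#define XSTATE_BNDREGS		0x8
#define XSTATE_BNDCSR		0x10
#define XSTATE_OPMASK		0x20
#define XSTATE_ZMM_Hi256	0x40
#define XSTATE_Hi16_ZMM		0x80

#define XSTATE_LAZY	(XSTATE_FP | XSTATE_SSE | XSTATE_YMM \
			 | XSTATE_OPMASK | XSTATE_ZMM_Hi256 | XSTATE_Hi16_ZMM)
#define XSTATE_EAGER	(XSTATE_BNDREGS | XSTATE_BNDCSR)

/* Compile-time checks: expected value, and no component in both sets. */
_Static_assert(XSTATE_LAZY == 0xe7, "unexpected lazy mask");
_Static_assert((XSTATE_LAZY & XSTATE_EAGER) == 0, "lazy and eager overlap");

/* Hypothetical helper: the lazily switched part of an enabled-feature mask. */
static uint64_t lazy_part(uint64_t xfeatures)
{
	return xfeatures & XSTATE_LAZY;
}
```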

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1392931491-33237-2-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
Cc:  # hw enabling
---
 arch/x86/include/asm/xsave.h | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 5547389..6c1d741 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -6,11 +6,14 @@
 
 #define XSTATE_CPUID   0x000d
 
-#define XSTATE_FP  0x1
-#define XSTATE_SSE 0x2
-#define XSTATE_YMM 0x4
-#define XSTATE_BNDREGS 0x8
-#define XSTATE_BNDCSR  0x10
+#define XSTATE_FP  0x1
+#define XSTATE_SSE 0x2
+#define XSTATE_YMM 0x4
+#define XSTATE_BNDREGS 0x8
+#define XSTATE_BNDCSR  0x10
+#define XSTATE_OPMASK  0x20
+#define XSTATE_ZMM_Hi256   0x40
+#define XSTATE_Hi16_ZMM    0x80
 
 #define XSTATE_FPSSE   (XSTATE_FP | XSTATE_SSE)
 
@@ -23,7 +26,8 @@
 #define XSAVE_YMM_OFFSET(XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET)
 
 /* Supported features which support lazy state saving */
-#define XSTATE_LAZY    (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
+#define XSTATE_LAZY    (XSTATE_FP | XSTATE_SSE | XSTATE_YMM  \
+   | XSTATE_OPMASK | XSTATE_ZMM_Hi256 | XSTATE_Hi16_ZMM)
 
 /* Supported features which require eager state saving */
 #define XSTATE_EAGER   (XSTATE_BNDREGS | XSTATE_BNDCSR)

[tip:x86/cpufeature] x86, AVX-512: AVX-512 Feature Detection

2014-02-20 Thread tip-bot for Fenghua Yu

Commit-ID:  8e5780fdeef7dc490b3f0b3a62704593721fa4f3
Gitweb: http://git.kernel.org/tip/8e5780fdeef7dc490b3f0b3a62704593721fa4f3
Author: Fenghua Yu 
AuthorDate: Thu, 20 Feb 2014 13:24:50 -0800
Committer:  H. Peter Anvin 
CommitDate: Thu, 20 Feb 2014 13:56:55 -0800

x86, AVX-512: AVX-512 Feature Detection

AVX-512 is an extension of AVX2. Its spec can be found at:
http://download-software.intel.com/sites/default/files/managed/71/2e/319433-017.pdf

This patch detects AVX-512 features by CPUID.
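As an illustration, the sketch below decodes the four new bits from a raw EBX value of CPUID leaf 7, sub-leaf 0. The kernel stores that value as capability word 9, so X86_FEATURE_AVX512F = 9*32+16 corresponds to EBX bit 16. The macro and function names are hypothetical. OS enablement through XCR0 is not modeled. On x86 hosts, recent GCC and Clang provide __get_cpuid_count() in <cpuid.h>, which could supply the EBX value.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in CPUID.(EAX=7,ECX=0):EBX, matching word 9 in the patch. */
#define AVX512F_BIT	16	/* AVX-512 Foundation */
#define AVX512PF_BIT	26	/* AVX-512 Prefetch */
#define AVX512ER_BIT	27	/* AVX-512 Exponential and Reciprocal */
#define AVX512CD_BIT	28	/* AVX-512 Conflict Detection */

/* Return 1 if the given bit is set in the leaf-7 EBX value, else 0. */
static int leaf7_ebx_has(uint32_t ebx, unsigned int bit)
{
	assert(bit < 32);
	return (int)((ebx >> bit) & 1u);
}

/*
 * Hypothetical helper reflecting the usual rule that software checks
 * AVX512F before relying on any other AVX-512 subset.
 */
static int avx512_subset_usable(uint32_t ebx, unsigned int subset_bit)
{
	return leaf7_ebx_has(ebx, AVX512F_BIT) &&
	       leaf7_ebx_has(ebx, subset_bit);
}
```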

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1392931491-33237-1-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
Cc:  # hw enabling
---
 arch/x86/include/asm/cpufeature.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index e099f95..5f12968 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -217,9 +217,13 @@
 #define X86_FEATURE_INVPCID    (9*32+10) /* Invalidate Processor Context ID */
 #define X86_FEATURE_RTM    (9*32+11) /* Restricted Transactional Memory */
 #define X86_FEATURE_MPX    (9*32+14) /* Memory Protection Extension */
+#define X86_FEATURE_AVX512F    (9*32+16) /* AVX-512 Foundation */
 #define X86_FEATURE_RDSEED (9*32+18) /* The RDSEED instruction */
 #define X86_FEATURE_ADX    (9*32+19) /* The ADCX and ADOX instructions */
 #define X86_FEATURE_SMAP   (9*32+20) /* Supervisor Mode Access Prevention */
+#define X86_FEATURE_AVX512PF   (9*32+26) /* AVX-512 Prefetch */
+#define X86_FEATURE_AVX512ER   (9*32+27) /* AVX-512 Exponential and Reciprocal */
+#define X86_FEATURE_AVX512CD   (9*32+28) /* AVX-512 Conflict Detection */
 
 /*
  * BUG word(s)

[tip:x86/urgent] x86/apic, doc: Justification for disabling IO APIC before Local APIC

2013-12-04 Thread tip-bot for Fenghua Yu

Commit-ID:  2885432aaf15c1b7e65c787bfe7c5fec428296f0
Gitweb: http://git.kernel.org/tip/2885432aaf15c1b7e65c787bfe7c5fec428296f0
Author: Fenghua Yu 
AuthorDate: Wed, 4 Dec 2013 16:07:49 -0800
Committer:  H. Peter Anvin 
CommitDate: Wed, 4 Dec 2013 19:33:21 -0800

x86/apic, doc: Justification for disabling IO APIC before Local APIC

Since erratum AVR31 in "Intel Atom Processor C2000 Product Family
Specification Update" is now published, I added a justification
comment for disabling IO APIC before Local APIC, as changed in commit:

522e66464467 x86/apic: Disable I/O APIC before shutdown of the local APIC

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1386202069-51515-1-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/kernel/reboot.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index da3c599..c752cb4 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -558,6 +558,17 @@ void native_machine_shutdown(void)
 {
/* Stop the cpus and apics */
 #ifdef CONFIG_X86_IO_APIC
+   /*
+* Disabling IO APIC before local APIC is a workaround for
+* erratum AVR31 in "Intel Atom Processor C2000 Product Family
+* Specification Update". In this situation, interrupts that target
+* a Logical Processor whose Local APIC is either in the process of
+* being hardware disabled or software disabled are neither delivered
+* nor discarded. When this erratum occurs, the processor may hang.
+*
+* Even without the erratum, it still makes sense to quiet IO APIC
+* before disabling Local APIC.
+*/
disable_IO_APIC();
 #endif
 

[tip:x86/asm] x86-64, copy_user: Remove zero byte check before copy user buffer.

2013-11-16 Thread tip-bot for Fenghua Yu

Commit-ID:  f4cb1cc18f364d761d5614eb62936647f259
Gitweb: http://git.kernel.org/tip/f4cb1cc18f364d761d5614eb62936647f259
Author: Fenghua Yu 
AuthorDate: Sat, 16 Nov 2013 12:37:01 -0800
Committer:  H. Peter Anvin 
CommitDate: Sat, 16 Nov 2013 18:00:58 -0800

x86-64, copy_user: Remove zero byte check before copy user buffer.

The rep movsb instruction already handles a zero-byte copy: with a count of
zero it copies nothing. As pointed out by Linus, there is no need to check for
a zero size in the kernel. Removing this redundant check saves a few cycles in
the copy user functions.
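The following hypothetical C model of rep movsb semantics shows why the check was redundant. The instruction copies RCX bytes and decrements RCX at each step, so when RCX is zero nothing happens. The return value mimics the copy-user convention of returning the number of bytes left uncopied. Faults are not modeled, and the function name is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical C model of "rep movsb": copy count bytes from src to dst,
 * decrementing count as each byte moves. When count is 0 the loop body
 * never runs, so no separate zero-length test is needed.
 * Returns the number of bytes left uncopied (always 0, as faults are
 * not modeled), matching the "xorl %eax,%eax" return in the patch.
 */
static size_t rep_movsb_model(unsigned char *dst, const unsigned char *src,
			      size_t count)
{
	while (count != 0) {
		*dst++ = *src++;
		count--;
	}
	return count;
}
```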

Reported-by: Linus Torvalds 
Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1384634221-6006-1-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/lib/copy_user_64.S | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index a30ca15..ffe4eb9 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -236,8 +236,6 @@ ENDPROC(copy_user_generic_unrolled)
 ENTRY(copy_user_generic_string)
CFI_STARTPROC
ASM_STAC
-   andl %edx,%edx
-   jz 4f
cmpl $8,%edx
jb 2f   /* less than 8 bytes, go to byte copy loop */
ALIGN_DESTINATION
@@ -249,7 +247,7 @@ ENTRY(copy_user_generic_string)
 2: movl %edx,%ecx
 3: rep
movsb
-4: xorl %eax,%eax
+   xorl %eax,%eax
ASM_CLAC
ret
 
@@ -279,12 +277,10 @@ ENDPROC(copy_user_generic_string)
 ENTRY(copy_user_enhanced_fast_string)
CFI_STARTPROC
ASM_STAC
-   andl %edx,%edx
-   jz 2f
movl %edx,%ecx
 1: rep
movsb
-2: xorl %eax,%eax
+   xorl %eax,%eax
ASM_CLAC
ret
 

[tip:x86/apic] x86/apic: Disable I/O APIC before shutdown of the local APIC

2013-11-07 Thread tip-bot for Fenghua Yu

Commit-ID:  522e66464467543c0d88d023336eec4df03ad40b
Gitweb: http://git.kernel.org/tip/522e66464467543c0d88d023336eec4df03ad40b
Author: Fenghua Yu 
AuthorDate: Wed, 23 Oct 2013 18:30:12 -0700
Committer:  Ingo Molnar 
CommitDate: Thu, 7 Nov 2013 10:12:37 +0100

x86/apic: Disable I/O APIC before shutdown of the local APIC

In the reboot and crash paths, the I/O APIC is still active when we shut down
the local APIC. This may cause issues because external interrupts can still
come in and disturb the local APIC during the shutdown process.

To quiet external interrupts, disable the I/O APIC before shutting down the
local APIC.

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1382578212-4677-1-git-send-email-fenghua...@intel.com
Cc: 
[ I suppose the 'issue' is a hang during shutdown. It's a fine change nevertheless. ]
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/crash.c  | 2 +-
 arch/x86/kernel/reboot.c | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index e0e0841..18677a9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -127,12 +127,12 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
cpu_emergency_vmxoff();
cpu_emergency_svm_disable();
 
-   lapic_shutdown();
 #ifdef CONFIG_X86_IO_APIC
/* Prevent crash_kexec() from deadlocking on ioapic_lock. */
ioapic_zap_locks();
disable_IO_APIC();
 #endif
+   lapic_shutdown();
 #ifdef CONFIG_HPET_TIMER
hpet_disable();
 #endif
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 7e920bf..618ce26 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -550,6 +550,10 @@ static void native_machine_emergency_restart(void)
 void native_machine_shutdown(void)
 {
/* Stop the cpus and apics */
+#ifdef CONFIG_X86_IO_APIC
+   disable_IO_APIC();
+#endif
+
 #ifdef CONFIG_SMP
/*
 * Stop all of the others. Also disable the local irq to
@@ -562,10 +566,6 @@ void native_machine_shutdown(void)
 
lapic_shutdown();
 
-#ifdef CONFIG_X86_IO_APIC
-   disable_IO_APIC();
-#endif
-
 #ifdef CONFIG_HPET_TIMER
hpet_disable();
 #endif

[tip:x86/urgent] x86-32, microcode_intel_early: Fix crash with CONFIG_DEBUG_VIRTUAL

2013-03-19 Thread tip-bot for Fenghua Yu

Commit-ID:  c83a9d5e425d4678b05ca058fec6254f18601474
Gitweb: http://git.kernel.org/tip/c83a9d5e425d4678b05ca058fec6254f18601474
Author: Fenghua Yu 
AuthorDate: Tue, 19 Mar 2013 08:04:44 -0700
Committer:  H. Peter Anvin 
CommitDate: Tue, 19 Mar 2013 19:51:08 -0700

x86-32, microcode_intel_early: Fix crash with CONFIG_DEBUG_VIRTUAL

On 32-bit, __pa_symbol() with CONFIG_DEBUG_VIRTUAL accesses kernel data
(e.g. max_low_pfn) that not only has not been set up yet in such an early
boot phase but, since we are in linear mode, cannot even be detected
as uninitialized.

Thus, use __pa_nodebug() rather than __pa_symbol() to get a global
symbol's physical address.
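On 32-bit x86, the no-debug conversion amounts to subtracting PAGE_OFFSET with no sanity checks, which is why it can be used this early. The sketch below assumes the default 3G/1G split (PAGE_OFFSET = 0xC0000000). The constant and function names are hypothetical, and real configurations may use a different split.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default 3G/1G split, i.e. PAGE_OFFSET = 0xC0000000. */
#define PAGE_OFFSET_32	0xC0000000u

/*
 * Hypothetical model of the 32-bit no-debug virtual-to-physical conversion:
 * subtract the kernel's linear offset. Unlike the CONFIG_DEBUG_VIRTUAL
 * path, it reads no kernel variables (such as max_low_pfn), so it does
 * not depend on memory setup having run.
 */
static uint32_t pa_nodebug32(uint32_t vaddr)
{
	assert(vaddr >= PAGE_OFFSET_32);	/* expects a lowmem kernel address */
	return vaddr - PAGE_OFFSET_32;
}
```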

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1363705484-27645-1-git-send-email-fenghua...@intel.com
Reported-and-tested-by: Tetsuo Handa 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/kernel/microcode_intel_early.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/microcode_intel_early.c b/arch/x86/kernel/microcode_intel_early.c
index 7890bc8..5992ee8 100644
--- a/arch/x86/kernel/microcode_intel_early.c
+++ b/arch/x86/kernel/microcode_intel_early.c
@@ -90,13 +90,13 @@ microcode_phys(struct microcode_intel **mc_saved_tmp,
struct microcode_intel ***mc_saved;
 
mc_saved = (struct microcode_intel ***)
-  __pa_symbol(&mc_saved_data->mc_saved);
+  __pa_nodebug(&mc_saved_data->mc_saved);
for (i = 0; i < mc_saved_data->mc_saved_count; i++) {
struct microcode_intel *p;
 
p = *(struct microcode_intel **)
-   __pa(mc_saved_data->mc_saved + i);
-   mc_saved_tmp[i] = (struct microcode_intel *)__pa(p);
+   __pa_nodebug(mc_saved_data->mc_saved + i);
+   mc_saved_tmp[i] = (struct microcode_intel *)__pa_nodebug(p);
}
 }
 #endif
@@ -562,7 +562,7 @@ scan_microcode(unsigned long start, unsigned long end,
struct cpio_data cd;
long offset = 0;
 #ifdef CONFIG_X86_32
-   char *p = (char *)__pa_symbol(ucode_name);
+   char *p = (char *)__pa_nodebug(ucode_name);
 #else
char *p = ucode_name;
 #endif
@@ -630,8 +630,8 @@ static void __cpuinit print_ucode(struct ucode_cpu_info *uci)
if (mc_intel == NULL)
return;
 
-   delay_ucode_info_p = (int *)__pa_symbol(&delay_ucode_info);
-   current_mc_date_p = (int *)__pa_symbol(&current_mc_date);
+   delay_ucode_info_p = (int *)__pa_nodebug(&delay_ucode_info);
+   current_mc_date_p = (int *)__pa_nodebug(&current_mc_date);
 
*delay_ucode_info_p = 1;
*current_mc_date_p = mc_intel->hdr.date;
@@ -741,15 +741,15 @@ load_ucode_intel_bsp(void)
 #ifdef CONFIG_X86_32
struct boot_params *boot_params_p;
 
-   boot_params_p = (struct boot_params *)__pa_symbol(&boot_params);
+   boot_params_p = (struct boot_params *)__pa_nodebug(&boot_params);
ramdisk_image = boot_params_p->hdr.ramdisk_image;
ramdisk_size  = boot_params_p->hdr.ramdisk_size;
initrd_start_early = ramdisk_image;
initrd_end_early = initrd_start_early + ramdisk_size;
 
_load_ucode_intel_bsp(
-   (struct mc_saved_data *)__pa_symbol(&mc_saved_data),
-   (unsigned long *)__pa_symbol(&mc_saved_in_initrd),
+   (struct mc_saved_data *)__pa_nodebug(&mc_saved_data),
+   (unsigned long *)__pa_nodebug(&mc_saved_in_initrd),
initrd_start_early, initrd_end_early, &uci);
 #else
ramdisk_image = boot_params.hdr.ramdisk_image;
@@ -772,10 +772,10 @@ void __cpuinit load_ucode_intel_ap(void)
unsigned long *initrd_start_p;
 
mc_saved_in_initrd_p =
-   (unsigned long *)__pa_symbol(mc_saved_in_initrd);
-   mc_saved_data_p = (struct mc_saved_data *)__pa_symbol(&mc_saved_data);
-   initrd_start_p = (unsigned long *)__pa_symbol(&initrd_start);
-   initrd_start_addr = (unsigned long)__pa_symbol(*initrd_start_p);
+   (unsigned long *)__pa_nodebug(mc_saved_in_initrd);
+   mc_saved_data_p = (struct mc_saved_data *)__pa_nodebug(&mc_saved_data);
+   initrd_start_p = (unsigned long *)__pa_nodebug(&initrd_start);
+   initrd_start_addr = (unsigned long)__pa_nodebug(*initrd_start_p);
 #else
mc_saved_data_p = &mc_saved_data;
mc_saved_in_initrd_p = mc_saved_in_initrd;

[tip:x86/microcode] x86/Kconfig: Make early microcode loading a configuration feature

2013-01-31 Thread tip-bot for Fenghua Yu

Commit-ID:  da76f64e7eb28b718501d15c1b79af560b7ca4ea
Gitweb: http://git.kernel.org/tip/da76f64e7eb28b718501d15c1b79af560b7ca4ea
Author: Fenghua Yu 
AuthorDate: Thu, 20 Dec 2012 23:44:32 -0800
Committer:  H. Peter Anvin 
CommitDate: Thu, 31 Jan 2013 13:20:42 -0800

x86/Kconfig: Make early microcode loading a configuration feature

MICROCODE_INTEL_LIB, MICROCODE_INTEL_EARLY, and MICROCODE_EARLY are three new
configuration options to enable or disable the early microcode loading feature.
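The following is a hypothetical C model of how the three symbols in the Kconfig hunk below resolve. It treats every option as a plain bool. The def_bool y options follow their dependencies, while the prompted MICROCODE_INTEL_EARLY (default y) can only be enabled when MICROCODE_INTEL and BLK_DEV_INITRD are both set. The struct and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the resolved values (bool options only). */
struct ucode_cfg {
	bool intel_lib;		/* MICROCODE_INTEL_LIB */
	bool intel_early;	/* MICROCODE_INTEL_EARLY */
	bool early;		/* MICROCODE_EARLY */
};

static struct ucode_cfg resolve_ucode_cfg(bool microcode_intel,
					  bool blk_dev_initrd,
					  bool user_wants_early)
{
	struct ucode_cfg c;

	/* def_bool y with "depends on": the value follows the dependency. */
	c.intel_lib = microcode_intel;
	/* Prompted option (default y): only selectable when deps hold. */
	c.intel_early = microcode_intel && blk_dev_initrd && user_wants_early;
	/* def_bool y depending on MICROCODE_INTEL_EARLY. */
	c.early = c.intel_early;
	return c;
}
```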

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1356075872-3054-13-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/Kconfig | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 79795af..e243da7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1029,6 +1029,24 @@ config MICROCODE_OLD_INTERFACE
def_bool y
depends on MICROCODE
 
+config MICROCODE_INTEL_LIB
+   def_bool y
+   depends on MICROCODE_INTEL
+
+config MICROCODE_INTEL_EARLY
+   bool "Early load microcode"
+   depends on MICROCODE_INTEL && BLK_DEV_INITRD
+   default y
+   help
+ This option provides functionality to read additional microcode data
+ at the beginning of initrd image. The data tells kernel to load
+ microcode to CPU's as early as possible. No functional change if no
+ microcode data is glued to the initrd, therefore it's safe to say Y.
+
+config MICROCODE_EARLY
+   def_bool y
+   depends on MICROCODE_INTEL_EARLY
+
 config X86_MSR
tristate "/dev/cpu/*/msr - Model-specific register support"
---help---

[tip:x86/microcode] x86/mm/init.c: Copy ucode from initrd image to kernel memory

2013-01-31 Thread tip-bot for Fenghua Yu

Commit-ID:  cd745be89e1580e8a1b47454a39f97f9c5c4b1e0
Gitweb: http://git.kernel.org/tip/cd745be89e1580e8a1b47454a39f97f9c5c4b1e0
Author: Fenghua Yu 
AuthorDate: Thu, 20 Dec 2012 23:44:31 -0800
Committer:  H. Peter Anvin 
CommitDate: Thu, 31 Jan 2013 13:20:26 -0800

x86/mm/init.c: Copy ucode from initrd image to kernel memory

Before the initrd image is freed, copy valid ucode patches from the initrd
image to kernel memory. The saved ucode will be used to update APs on resume
or hotplug.
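Here is a minimal user-space sketch of the idea. It assumes each patch is known by its offset and size within the initrd, similar to the offsets kept in mc_saved_in_initrd by the early loader. Each patch is copied into its own allocation so that it survives after the initrd memory is released. The types and function name are hypothetical. This is not the kernel's save_microcode_in_initrd().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor: a patch located at an offset inside the initrd. */
struct patch_ref {
	size_t offset;	/* relative to the start of the initrd image */
	size_t size;
};

/*
 * Copy each referenced patch out of the initrd buffer into its own heap
 * allocation. Returns the number of patches saved; on allocation failure,
 * frees everything copied so far and returns 0.
 */
static size_t save_patches(const unsigned char *initrd, size_t initrd_len,
			   const struct patch_ref *refs, size_t n,
			   unsigned char **out)
{
	size_t i;

	for (i = 0; i < n; i++) {
		assert(refs[i].offset <= initrd_len &&
		       refs[i].size <= initrd_len - refs[i].offset);
		out[i] = malloc(refs[i].size);
		if (!out[i]) {
			while (i--)
				free(out[i]);
			return 0;
		}
		memcpy(out[i], initrd + refs[i].offset, refs[i].size);
	}
	return n;
}
```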

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1356075872-3054-12-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/mm/init.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index d418152..4903a03 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include/* for MAX_DMA_PFN */
+#include 
 
 #include "mm_internal.h"
 
@@ -534,6 +535,15 @@ void free_initmem(void)
 #ifdef CONFIG_BLK_DEV_INITRD
 void __init free_initrd_mem(unsigned long start, unsigned long end)
 {
+#ifdef CONFIG_MICROCODE_EARLY
+   /*
+* Remember, initrd memory may contain microcode or other useful things.
+* Before we lose initrd mem, we need to find a place to hold them
+* now that normal virtual memory is enabled.
+*/
+   save_microcode_in_initrd();
+#endif
+
/*
 * end could be not aligned, and We can not align that,
 * decompresser could be confused by aligned initrd_end

[tip:x86/microcode] x86/head_32.S: Early update ucode in 32-bit

2013-01-31 Thread tip-bot for Fenghua Yu

Commit-ID:  63b553c68db5a8d4febcd1010b194333d2b02e1c
Gitweb: http://git.kernel.org/tip/63b553c68db5a8d4febcd1010b194333d2b02e1c
Author: Fenghua Yu 
AuthorDate: Thu, 20 Dec 2012 23:44:29 -0800
Committer:  H. Peter Anvin 
CommitDate: Thu, 31 Jan 2013 13:19:20 -0800

x86/head_32.S: Early update ucode in 32-bit

This updates ucode on the BSP and APs in the 32-bit kernel. At this point,
there is no paging and there are no virtual addresses yet.

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1356075872-3054-10-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/kernel/head_32.S | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 8e7f655..2f70530 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -144,6 +144,11 @@ ENTRY(startup_32)
movl %eax, pa(olpc_ofw_pgd)
 #endif
 
+#ifdef CONFIG_MICROCODE_EARLY
+   /* Early load ucode on BSP. */
+   call load_ucode_bsp
+#endif
+
 /*
  * Initialize page tables.  This creates a PDE and a set of page
  * tables, which are located immediately beyond __brk_base.  The variable
@@ -299,6 +304,12 @@ ENTRY(startup_32_smp)
movl %eax,%ss
leal -__PAGE_OFFSET(%ecx),%esp
 
+#ifdef CONFIG_MICROCODE_EARLY
+   /* Early load ucode on AP. */
+   call load_ucode_ap
+#endif
+
+
 default_entry:
 /*
  * New page tables may be in 4Mbyte page mode and may

[tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

2013-01-31 Thread tip-bot for Fenghua Yu

Commit-ID:  ec400ddeff200b068ddc6c70f7321f49ecf32ed5
Gitweb: http://git.kernel.org/tip/ec400ddeff200b068ddc6c70f7321f49ecf32ed5
Author: Fenghua Yu 
AuthorDate: Thu, 20 Dec 2012 23:44:28 -0800
Committer:  H. Peter Anvin 
CommitDate: Thu, 31 Jan 2013 13:19:18 -0800

x86/microcode_intel_early.c: Early update ucode on Intel's CPU

Implementation of early ucode update on Intel CPUs.

load_ucode_intel_bsp() scans for ucode in the initrd image file, which consists
of cpio-format ucode followed by the ordinary initrd image. The binary ucode
file is stored as kernel/x86/microcode/GenuineIntel.bin in the cpio data. All
ucode patches with the same model as the BSP are saved in memory, and a
matching ucode patch is applied to the BSP.

load_ucode_intel_ap() reads the saved ucode patches and updates ucode on APs.
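The matching step can be sketched as follows, loosely following the loop in generic_load_microcode_early() in the patch below. A patch qualifies if its signature equals the CPU signature, its platform mask overlaps the CPU's platform flag, and its revision is newer than the best found so far. This is a simplified, hypothetical model. It ignores extended signature tables and any header validation done elsewhere, and the struct is not the kernel's microcode header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified patch header (the real one has more fields). */
struct ucode_hdr {
	unsigned int sig;	/* processor signature (CPUID 1 EAX) */
	unsigned int pf;	/* mask of supported platform IDs */
	int rev;		/* patch revision */
};

/*
 * Return the index of the newest patch that matches the CPU signature and
 * platform flag and is newer than cur_rev, or -1 if none qualifies.
 */
static int find_newest_match(const struct ucode_hdr *p, size_t n,
			     unsigned int csig, unsigned int cpf, int cur_rev)
{
	int best = -1;
	int best_rev = cur_rev;
	size_t i;

	for (i = 0; i < n; i++) {
		if (p[i].sig != csig || !(p[i].pf & cpf))
			continue;	/* wrong model or platform */
		if (p[i].rev <= best_rev)
			continue;	/* not newer than what we have */
		best_rev = p[i].rev;
		best = (int)i;
	}
	return best;
}
```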

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1356075872-3054-9-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/kernel/microcode_intel_early.c | 796 
 1 file changed, 796 insertions(+)

diff --git a/arch/x86/kernel/microcode_intel_early.c b/arch/x86/kernel/microcode_intel_early.c
new file mode 100644
index 000..7890bc8
--- /dev/null
+++ b/arch/x86/kernel/microcode_intel_early.c
@@ -0,0 +1,796 @@
+/*
+ * Intel CPU microcode early update for Linux
+ *
+ * Copyright (C) 2012 Fenghua Yu 
+ *H Peter Anvin" 
+ *
+ * This allows to early upgrade microcode on Intel processors
+ * belonging to IA-32 family - PentiumPro, Pentium II,
+ * Pentium III, Xeon, Pentium 4, etc.
+ *
+ * Reference: Section 9.11 of Volume 3, IA-32 Intel Architecture
+ * Software Developer's Manual.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+unsigned long mc_saved_in_initrd[MAX_UCODE_COUNT];
+struct mc_saved_data {
+   unsigned int mc_saved_count;
+   struct microcode_intel **mc_saved;
+} mc_saved_data;
+
+static enum ucode_state __cpuinit
+generic_load_microcode_early(struct microcode_intel **mc_saved_p,
+unsigned int mc_saved_count,
+struct ucode_cpu_info *uci)
+{
+   struct microcode_intel *ucode_ptr, *new_mc = NULL;
+   int new_rev = uci->cpu_sig.rev;
+   enum ucode_state state = UCODE_OK;
+   unsigned int mc_size;
+   struct microcode_header_intel *mc_header;
+   unsigned int csig = uci->cpu_sig.sig;
+   unsigned int cpf = uci->cpu_sig.pf;
+   int i;
+
+   for (i = 0; i < mc_saved_count; i++) {
+   ucode_ptr = mc_saved_p[i];
+
+   mc_header = (struct microcode_header_intel *)ucode_ptr;
+   mc_size = get_totalsize(mc_header);
+   if (get_matching_microcode(csig, cpf, ucode_ptr, new_rev)) {
+   new_rev = mc_header->rev;
+   new_mc  = ucode_ptr;
+   }
+   }
+
+   if (!new_mc) {
+   state = UCODE_NFOUND;
+   goto out;
+   }
+
+   uci->mc = (struct microcode_intel *)new_mc;
+out:
+   return state;
+}
+
+static void __cpuinit
+microcode_pointer(struct microcode_intel **mc_saved,
+ unsigned long *mc_saved_in_initrd,
+ unsigned long initrd_start, int mc_saved_count)
+{
+   int i;
+
+   for (i = 0; i < mc_saved_count; i++)
+   mc_saved[i] = (struct microcode_intel *)
+ (mc_saved_in_initrd[i] + initrd_start);
+}
+
+#ifdef CONFIG_X86_32
+static void __cpuinit
+microcode_phys(struct microcode_intel **mc_saved_tmp,
+  struct mc_saved_data *mc_saved_data)
+{
+   int i;
+   struct microcode_intel ***mc_saved;
+
+   mc_saved = (struct microcode_intel ***)
+  __pa_symbol(&mc_saved_data->mc_saved);
+   for (i = 0; i < mc_saved_data->mc_saved_count; i++) {
+   struct microcode_intel *p;
+
+   p = *(struct microcode_intel **)
+   __pa(mc_saved_data->mc_saved + i);
+   mc_saved_tmp[i] = (struct microcode_intel *)__pa(p);
+   }
+}
+#endif
+
+static enum ucode_state __cpuinit
+load_microcode(struct mc_saved_data *mc_saved_data,
+  unsigned long *mc_saved_in_initrd,
+  unsigned long initrd_start,
+  struct ucode_cpu_info *uci)
+{
+   struct microcode_intel *mc_saved_tmp[MAX_UCODE_COUNT];
+   unsigned int count = mc_saved_data->mc_saved_count;
+
+   if (!mc_saved_data->mc_saved) {
+   microcode_pointer(mc_saved_tmp, mc_saved_in_initrd,
+ initrd_start, count);
+
+   return generi

[tip:x86/microcode] x86/microcode_intel_lib.c: Early update ucode on Intel's CPU

2013-01-31 Thread tip-bot for Fenghua Yu

Commit-ID:  e666dfa273db1b12711eaec91facac5fec2ec851
Gitweb: http://git.kernel.org/tip/e666dfa273db1b12711eaec91facac5fec2ec851
Author: Fenghua Yu 
AuthorDate: Thu, 20 Dec 2012 23:44:26 -0800
Committer:  H. Peter Anvin 
CommitDate: Thu, 31 Jan 2013 13:19:14 -0800

x86/microcode_intel_lib.c: Early update ucode on Intel's CPU

Define interfaces microcode_sanity_check() and get_matching_microcode(). They
are called both during early boot and by the Intel microcode driver.

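microcode_sanity_check() accepts an update only if the 32-bit words of the header and data area sum to zero (the cksum field is chosen to make that true). A minimal sketch of that check, assuming a little-endian host as on x86; dword_sum_is_zero() is a hypothetical name, and it sums in uint32_t so that wraparound is well defined in standalone C:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return 1 if the 32-bit words in @buf sum to zero modulo 2^32.
 * @len is expected to be a multiple of 4; a trailing partial word
 * is ignored.
 */
static int dword_sum_is_zero(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 4 <= len; i += 4) {
		uint32_t w;

		memcpy(&w, p + i, sizeof(w));	/* avoids unaligned loads */
		sum += w;
	}
	return sum == 0;
}
```

The same routine covers the extended signature table check, which also requires its dwords to sum to zero.
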
Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1356075872-3054-7-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/kernel/microcode_intel_lib.c | 174 ++
 1 file changed, 174 insertions(+)

diff --git a/arch/x86/kernel/microcode_intel_lib.c b/arch/x86/kernel/microcode_intel_lib.c
new file mode 100644
index 000..ce69320
--- /dev/null
+++ b/arch/x86/kernel/microcode_intel_lib.c
@@ -0,0 +1,174 @@
+/*
+ * Intel CPU Microcode Update Driver for Linux
+ *
+ * Copyright (C) 2012 Fenghua Yu 
+ *                    H Peter Anvin
+ *
+ * This driver allows to upgrade microcode on Intel processors
+ * belonging to IA-32 family - PentiumPro, Pentium II,
+ * Pentium III, Xeon, Pentium 4, etc.
+ *
+ * Reference: Section 8.11 of Volume 3a, IA-32 Intel® Architecture
+ * Software Developer's Manual
+ * Order Number 253668 or free download from:
+ *
+ * http://developer.intel.com/Assets/PDF/manual/253668.pdf
+ *
+ * For more information, go to http://www.urbanmyth.org/microcode
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+static inline int
+update_match_cpu(unsigned int csig, unsigned int cpf,
+unsigned int sig, unsigned int pf)
+{
+   return (!sigmatch(sig, csig, pf, cpf)) ? 0 : 1;
+}
+
+int
+update_match_revision(struct microcode_header_intel *mc_header, int rev)
+{
+   return (mc_header->rev <= rev) ? 0 : 1;
+}
+
+int microcode_sanity_check(void *mc, int print_err)
+{
+   unsigned long total_size, data_size, ext_table_size;
+   struct microcode_header_intel *mc_header = mc;
+   struct extended_sigtable *ext_header = NULL;
+   int sum, orig_sum, ext_sigcount = 0, i;
+   struct extended_signature *ext_sig;
+
+   total_size = get_totalsize(mc_header);
+   data_size = get_datasize(mc_header);
+
+   if (data_size + MC_HEADER_SIZE > total_size) {
+   if (print_err)
+   pr_err("error! Bad data size in microcode data file\n");
+   return -EINVAL;
+   }
+
+   if (mc_header->ldrver != 1 || mc_header->hdrver != 1) {
+   if (print_err)
+   pr_err("error! Unknown microcode update format\n");
+   return -EINVAL;
+   }
+   ext_table_size = total_size - (MC_HEADER_SIZE + data_size);
+   if (ext_table_size) {
+   if ((ext_table_size < EXT_HEADER_SIZE)
+|| ((ext_table_size - EXT_HEADER_SIZE) % EXT_SIGNATURE_SIZE)) {
+   if (print_err)
+   pr_err("error! Small exttable size in microcode data file\n");
+   return -EINVAL;
+   }
+   ext_header = mc + MC_HEADER_SIZE + data_size;
+   if (ext_table_size != exttable_size(ext_header)) {
+   if (print_err)
+   pr_err("error! Bad exttable size in microcode data file\n");
+   return -EFAULT;
+   }
+   ext_sigcount = ext_header->count;
+   }
+
+   /* check extended table checksum */
+   if (ext_table_size) {
+   int ext_table_sum = 0;
+   int *ext_tablep = (int *)ext_header;
+
+   i = ext_table_size / DWSIZE;
+   while (i--)
+   ext_table_sum += ext_tablep[i];
+   if (ext_table_sum) {
+   if (print_err)
+   pr_warn("aborting, bad extended signature table checksum\n");
+   return -EINVAL;
+   }
+   }
+
+   /* calculate the checksum */
+   orig_sum = 0;
+   i = (MC_HEADER_SIZE + data_size) / DWSIZE;
+   while (i--)
+   orig_sum += ((int *)mc)[i];
+   if (orig_sum) {
+   if (print_err)
+   pr_err("aborting, bad checksum\n");
+   return -EINVAL;
+   }
+   if (!ext_table_size)
+   return 0;
+   /* check extended signature checksum */
+   for (i = 0; i < ext_sigcount; i++) {
+   ext_sig = (void *)ext_header + EXT_HEADER_SIZE +
+

[tip:x86/microcode] x86/microcode_core_early.c: Define interfaces for early loading ucode

2013-01-31 Thread tip-bot for Fenghua Yu

Commit-ID:  a8ebf6d1d6971b90a20f5bd0465e6d520377e33b
Gitweb: http://git.kernel.org/tip/a8ebf6d1d6971b90a20f5bd0465e6d520377e33b
Author: Fenghua Yu 
AuthorDate: Thu, 20 Dec 2012 23:44:25 -0800
Committer:  H. Peter Anvin 
CommitDate: Thu, 31 Jan 2013 13:19:12 -0800

x86/microcode_core_early.c: Define interfaces for early loading ucode

Define interfaces load_ucode_bsp() and load_ucode_ap() to load ucode on the BSP
and APs during early boot. These are generic interfaces; internally they call
vendor-specific implementations.

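The vendor test behind load_ucode_bsp() and load_ucode_ap() compares the CPUID leaf 0 registers against packed four-character constants, then dispatches to a vendor loader. The sketch below is a user-space model, not the kernel code: it takes the register values as parameters instead of executing CPUID, and load_intel(), load_amd() and load_ucode_generic() are hypothetical stand-ins (this series only adds an Intel loader).

```c
#include <stdint.h>

/* Pack four characters little-endian, the way CPUID returns them. */
#define QCHAR(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

enum vendor { VENDOR_UNKNOWN, VENDOR_INTEL, VENDOR_AMD };

/* Hypothetical vendor-specific loaders; they only count calls here. */
static int intel_calls, amd_calls;
static void load_intel(void) { intel_calls++; }
static void load_amd(void) { amd_calls++; }

/*
 * CPUID leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX
 * order, so "GenuineIntel" arrives as "Genu", "ineI", "ntel".
 */
static enum vendor vendor_from_regs(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	if (ebx == QCHAR('G', 'e', 'n', 'u') &&
	    edx == QCHAR('i', 'n', 'e', 'I') &&
	    ecx == QCHAR('n', 't', 'e', 'l'))
		return VENDOR_INTEL;
	if (ebx == QCHAR('A', 'u', 't', 'h') &&
	    edx == QCHAR('e', 'n', 't', 'i') &&
	    ecx == QCHAR('c', 'A', 'M', 'D'))
		return VENDOR_AMD;
	return VENDOR_UNKNOWN;
}

/* Generic entry point: pick the vendor-specific loader, if any. */
static void load_ucode_generic(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	switch (vendor_from_regs(ebx, ecx, edx)) {
	case VENDOR_INTEL:
		load_intel();
		break;
	case VENDOR_AMD:
		load_amd();
		break;
	default:
		break;	/* unknown vendor: no early loading */
	}
}
```
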
Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1356075872-3054-6-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/microcode.h   | 14 +++
 arch/x86/kernel/microcode_core_early.c | 76 ++
 2 files changed, 90 insertions(+)

diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
index 43d921b..6825e2e 100644
--- a/arch/x86/include/asm/microcode.h
+++ b/arch/x86/include/asm/microcode.h
@@ -57,4 +57,18 @@ static inline struct microcode_ops * __init init_amd_microcode(void)
 static inline void __exit exit_amd_microcode(void) {}
 #endif
 
+#ifdef CONFIG_MICROCODE_EARLY
+#define MAX_UCODE_COUNT 128
+extern void __init load_ucode_bsp(void);
+extern __init void load_ucode_ap(void);
+extern int __init save_microcode_in_initrd(void);
+#else
+static inline void __init load_ucode_bsp(void) {}
+static inline __init void load_ucode_ap(void) {}
+static inline int __init save_microcode_in_initrd(void)
+{
+   return 0;
+}
+#endif
+
 #endif /* _ASM_X86_MICROCODE_H */
diff --git a/arch/x86/kernel/microcode_core_early.c b/arch/x86/kernel/microcode_core_early.c
new file mode 100644
index 000..577db84
--- /dev/null
+++ b/arch/x86/kernel/microcode_core_early.c
@@ -0,0 +1,76 @@
+/*
+ * X86 CPU microcode early update for Linux
+ *
+ * Copyright (C) 2012 Fenghua Yu 
+ *                    H Peter Anvin
+ *
+ * This driver allows to early upgrade microcode on Intel processors
+ * belonging to IA-32 family - PentiumPro, Pentium II,
+ * Pentium III, Xeon, Pentium 4, etc.
+ *
+ * Reference: Section 9.11 of Volume 3, IA-32 Intel Architecture
+ * Software Developer's Manual.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include 
+#include 
+#include 
+
+#define QCHAR(a, b, c, d) ((a) + ((b) << 8) + ((c) << 16) + ((d) << 24))
+#define CPUID_INTEL1 QCHAR('G', 'e', 'n', 'u')
+#define CPUID_INTEL2 QCHAR('i', 'n', 'e', 'I')
+#define CPUID_INTEL3 QCHAR('n', 't', 'e', 'l')
+#define CPUID_AMD1 QCHAR('A', 'u', 't', 'h')
+#define CPUID_AMD2 QCHAR('e', 'n', 't', 'i')
+#define CPUID_AMD3 QCHAR('c', 'A', 'M', 'D')
+
+#define CPUID_IS(a, b, c, ebx, ecx, edx)   \
+   (!((ebx ^ (a))|(edx ^ (b))|(ecx ^ (c))))
+
+/*
+ * In early loading microcode phase on BSP, boot_cpu_data is not set up yet.
+ * x86_vendor() gets vendor id for BSP.
+ *
+ * In 32 bit AP case, accessing boot_cpu_data needs linear address. To simplify
+ * coding, we still use x86_vendor() to get vendor id for AP.
+ *
+ * x86_vendor() gets vendor information directly through cpuid.
+ */
+static int __cpuinit x86_vendor(void)
+{
+   u32 eax = 0x00000000;
+   u32 ebx, ecx = 0, edx;
+
+   if (!have_cpuid_p())
+   return X86_VENDOR_UNKNOWN;
+
+   native_cpuid(&eax, &ebx, &ecx, &edx);
+
+   if (CPUID_IS(CPUID_INTEL1, CPUID_INTEL2, CPUID_INTEL3, ebx, ecx, edx))
+   return X86_VENDOR_INTEL;
+
+   if (CPUID_IS(CPUID_AMD1, CPUID_AMD2, CPUID_AMD3, ebx, ecx, edx))
+   return X86_VENDOR_AMD;
+
+   return X86_VENDOR_UNKNOWN;
+}
+
+void __init load_ucode_bsp(void)
+{
+   int vendor = x86_vendor();
+
+   if (vendor == X86_VENDOR_INTEL)
+   load_ucode_intel_bsp();
+}
+
+void __cpuinit load_ucode_ap(void)
+{
+   int vendor = x86_vendor();
+
+   if (vendor == X86_VENDOR_INTEL)
+   load_ucode_intel_ap();
+}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[tip:x86/microcode] x86/common.c: Make have_cpuid_p() a global function

2013-01-31 Thread tip-bot for Fenghua Yu

Commit-ID:  d288e1cf8e62f3e4034f1f021f047009c4ac0b3c
Gitweb: http://git.kernel.org/tip/d288e1cf8e62f3e4034f1f021f047009c4ac0b3c
Author: Fenghua Yu 
AuthorDate: Thu, 20 Dec 2012 23:44:23 -0800
Committer:  H. Peter Anvin 
CommitDate: Thu, 31 Jan 2013 13:18:58 -0800

x86/common.c: Make have_cpuid_p() a global function

Remove the static qualifier from have_cpuid_p() to make it a global function.
The function will be called when loading microcode early.

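On 32-bit x86, have_cpuid_p() reports whether CPUID exists by checking whether the ID bit (bit 21) of EFLAGS can be toggled; on 64-bit it simply returns 1. A user-space sketch of the same idea follows. have_cpuid() and EFLAGS_ID are hypothetical names, the inline assembly uses GCC syntax and is only compiled for i386 targets, and returning 0 on non-x86 hosts is an assumption of this sketch.

```c
#include <stdint.h>

#define EFLAGS_ID 0x00200000u	/* EFLAGS bit 21: CPUID is available */

/* Return 1 if the CPU supports the CPUID instruction, 0 otherwise. */
static int have_cpuid(void)
{
#if defined(__i386__)
	uint32_t f1, f2;

	/* Try to flip EFLAGS.ID, then restore EFLAGS; if it flipped, CPUID exists. */
	__asm__ volatile("pushfl\n\t"
			 "pushfl\n\t"
			 "popl %0\n\t"
			 "movl %0, %1\n\t"
			 "xorl %2, %0\n\t"
			 "pushl %0\n\t"
			 "popfl\n\t"
			 "pushfl\n\t"
			 "popl %0\n\t"
			 "popfl\n\t"
			 : "=&r" (f1), "=&r" (f2)
			 : "ir" (EFLAGS_ID));
	return ((f1 ^ f2) & EFLAGS_ID) != 0;
#elif defined(__x86_64__)
	return 1;	/* every x86-64 CPU implements CPUID */
#else
	return 0;	/* not an x86 CPU */
#endif
}
```
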
Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1356075872-3054-4-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/processor.h | 8 
 arch/x86/kernel/cpu/common.c | 9 +++--
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index bdee8bd..3cdf4aa 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -190,6 +190,14 @@ extern void init_amd_cacheinfo(struct cpuinfo_x86 *c);
 extern void detect_extended_topology(struct cpuinfo_x86 *c);
 extern void detect_ht(struct cpuinfo_x86 *c);
 
+#ifdef CONFIG_X86_32
+extern int have_cpuid_p(void);
+#else
+static inline int have_cpuid_p(void)
+{
+   return 1;
+}
+#endif
 static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
unsigned int *ecx, unsigned int *edx)
 {
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9c3ab43..d7fd246 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -37,6 +37,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #ifdef CONFIG_X86_LOCAL_APIC
 #include 
@@ -213,7 +215,7 @@ static inline int flag_is_changeable_p(u32 flag)
 }
 
 /* Probe for the CPUID instruction */
-static int __cpuinit have_cpuid_p(void)
+int __cpuinit have_cpuid_p(void)
 {
return flag_is_changeable_p(X86_EFLAGS_ID);
 }
@@ -249,11 +251,6 @@ static inline int flag_is_changeable_p(u32 flag)
 {
return 1;
 }
-/* Probe for the CPUID instruction */
-static inline int have_cpuid_p(void)
-{
-   return 1;
-}
 static inline void squash_the_stupid_serial_number(struct cpuinfo_x86 *c)
 {
 }
--

[tip:x86/microcode] x86/microcode_intel.h: Define functions and macros for early loading ucode

2013-01-31 Thread tip-bot for Fenghua Yu

Commit-ID:  9cd4d78e21cfdc709b1af516214ec4f69ee0e6bd
Gitweb: http://git.kernel.org/tip/9cd4d78e21cfdc709b1af516214ec4f69ee0e6bd
Author: Fenghua Yu 
AuthorDate: Thu, 20 Dec 2012 23:44:22 -0800
Committer:  H. Peter Anvin 
CommitDate: Thu, 31 Jan 2013 13:18:50 -0800

x86/microcode_intel.h: Define functions and macros for early loading ucode

Define some functions and macros that will be used for early ucode loading. Some
of them are moved from the microcode_intel.c driver so that they can be called in
the early boot phase, before the module can be loaded.

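The header helpers moved into microcode_intel.h encode two rules: a zero datasize or totalsize field means the old fixed-size format (2000 data bytes plus the 48-byte header), and sigmatch() requires equal signatures plus overlapping platform-flag masks (or both masks zero). A sketch of those rules as plain functions; struct mc_hdr, mc_datasize(), mc_totalsize() and sig_matches() are hypothetical names mirroring the macros:

```c
#include <stdint.h>

/* Same field layout as struct microcode_header_intel (12 x 32 bits). */
struct mc_hdr {
	uint32_t hdrver, rev, date, sig, cksum, ldrver, pf;
	uint32_t datasize, totalsize, reserved[3];
};

#define DEFAULT_DATASIZE 2000u
#define HDR_SIZE ((uint32_t)sizeof(struct mc_hdr))

/* A zero size field selects the legacy fixed sizes. */
static uint32_t mc_datasize(const struct mc_hdr *h)
{
	return h->datasize ? h->datasize : DEFAULT_DATASIZE;
}

static uint32_t mc_totalsize(const struct mc_hdr *h)
{
	return h->totalsize ? h->totalsize : DEFAULT_DATASIZE + HDR_SIZE;
}

/* Equal signatures, and platform masks that share a bit or are both 0. */
static int sig_matches(uint32_t s1, uint32_t s2, uint32_t p1, uint32_t p2)
{
	return s1 == s2 && ((p1 & p2) || (p1 == 0 && p2 == 0));
}
```
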
Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1356075872-3054-3-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/microcode_intel.h |  85 ++
 arch/x86/kernel/Makefile   |   3 +
 arch/x86/kernel/microcode_core.c   |   7 +-
 arch/x86/kernel/microcode_intel.c  | 198 +
 4 files changed, 122 insertions(+), 171 deletions(-)

diff --git a/arch/x86/include/asm/microcode_intel.h b/arch/x86/include/asm/microcode_intel.h
new file mode 100644
index 000..5356f92
--- /dev/null
+++ b/arch/x86/include/asm/microcode_intel.h
@@ -0,0 +1,85 @@
+#ifndef _ASM_X86_MICROCODE_INTEL_H
+#define _ASM_X86_MICROCODE_INTEL_H
+
+#include 
+
+struct microcode_header_intel {
+   unsigned int    hdrver;
+   unsigned int    rev;
+   unsigned int    date;
+   unsigned int    sig;
+   unsigned int    cksum;
+   unsigned int    ldrver;
+   unsigned int    pf;
+   unsigned int    datasize;
+   unsigned int    totalsize;
+   unsigned int    reserved[3];
+};
+
+struct microcode_intel {
+   struct microcode_header_intel hdr;
+   unsigned int    bits[0];
+};
+
+/* microcode format is extended from prescott processors */
+struct extended_signature {
+   unsigned int    sig;
+   unsigned int    pf;
+   unsigned int    cksum;
+};
+
+struct extended_sigtable {
+   unsigned int    count;
+   unsigned int    cksum;
+   unsigned int    reserved[3];
+   struct extended_signature sigs[0];
+};
+
+#define DEFAULT_UCODE_DATASIZE (2000)
+#define MC_HEADER_SIZE (sizeof(struct microcode_header_intel))
+#define DEFAULT_UCODE_TOTALSIZE (DEFAULT_UCODE_DATASIZE + MC_HEADER_SIZE)
+#define EXT_HEADER_SIZE (sizeof(struct extended_sigtable))
+#define EXT_SIGNATURE_SIZE (sizeof(struct extended_signature))
+#define DWSIZE (sizeof(u32))
+
+#define get_totalsize(mc) \
+   (((struct microcode_intel *)mc)->hdr.totalsize ? \
+((struct microcode_intel *)mc)->hdr.totalsize : \
+DEFAULT_UCODE_TOTALSIZE)
+
+#define get_datasize(mc) \
+   (((struct microcode_intel *)mc)->hdr.datasize ? \
+((struct microcode_intel *)mc)->hdr.datasize : DEFAULT_UCODE_DATASIZE)
+
+#define sigmatch(s1, s2, p1, p2) \
+   (((s1) == (s2)) && (((p1) & (p2)) || (((p1) == 0) && ((p2) == 0))))
+
+#define exttable_size(et) ((et)->count * EXT_SIGNATURE_SIZE + EXT_HEADER_SIZE)
+
+extern int
+get_matching_microcode(unsigned int csig, int cpf, void *mc, int rev);
+extern int microcode_sanity_check(void *mc, int print_err);
+extern int get_matching_sig(unsigned int csig, int cpf, void *mc, int rev);
+extern int
+update_match_revision(struct microcode_header_intel *mc_header, int rev);
+
+#ifdef CONFIG_MICROCODE_INTEL_EARLY
+extern void __init load_ucode_intel_bsp(void);
+extern void __cpuinit load_ucode_intel_ap(void);
+extern void show_ucode_info_early(void);
+#else
+static inline __init void load_ucode_intel_bsp(void) {}
+static inline __cpuinit void load_ucode_intel_ap(void) {}
+static inline void show_ucode_info_early(void) {}
+#endif
+
+#if defined(CONFIG_MICROCODE_INTEL_EARLY) && defined(CONFIG_HOTPLUG_CPU)
+extern int save_mc_for_early(u8 *mc);
+#else
+static inline int save_mc_for_early(u8 *mc)
+{
+   return 0;
+}
+#endif
+
+#endif /* _ASM_X86_MICROCODE_INTEL_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 34e923a..052abee 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -88,6 +88,9 @@ obj-$(CONFIG_PARAVIRT_CLOCK)  += pvclock.o
 
 obj-$(CONFIG_PCSPKR_PLATFORM)  += pcspeaker.o
 
+obj-$(CONFIG_MICROCODE_EARLY)  += microcode_core_early.o
+obj-$(CONFIG_MICROCODE_INTEL_EARLY)  += microcode_intel_early.o
+obj-$(CONFIG_MICROCODE_INTEL_LIB)  += microcode_intel_lib.o
 microcode-y  := microcode_core.o
 microcode-$(CONFIG_MICROCODE_INTEL)  += microcode_intel.o
 microcode-$(CONFIG_MICROCODE_AMD)  += microcode_amd.o
diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c
index 3a04b22..22db92b 100644
--- a/arch/x86/kernel/microcode_core.c
+++ b/arch/x86/kernel/microcode_core.c
@@ -364,10 +364,7 @@ static struct attribute_group mc_attr_group = {
 
 stat

[tip:x86/microcode] x86, doc: Documentation for early microcode loading

2013-01-31 Thread tip-bot for Fenghua Yu

Commit-ID:  0d91ea86a895b911fd7d999acb3f600706d9c8cd
Gitweb: http://git.kernel.org/tip/0d91ea86a895b911fd7d999acb3f600706d9c8cd
Author: Fenghua Yu 
AuthorDate: Thu, 20 Dec 2012 23:44:21 -0800
Committer:  H. Peter Anvin 
CommitDate: Thu, 31 Jan 2013 13:18:47 -0800

x86, doc: Documentation for early microcode loading

Documentation for the early microcode loading method.

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1356075872-3054-2-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 Documentation/x86/early-microcode.txt | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/Documentation/x86/early-microcode.txt b/Documentation/x86/early-microcode.txt
new file mode 100644
index 000..4aaf0df
--- /dev/null
+++ b/Documentation/x86/early-microcode.txt
@@ -0,0 +1,43 @@
+Early load microcode
+
+By Fenghua Yu 
+
+Kernel can update microcode in early phase of boot time. Loading microcode early
+can fix CPU issues before they are observed during kernel boot time.
+
+Microcode is stored in an initrd file. The microcode is read from the initrd
+file and loaded to CPUs during boot time.
+
+The format of the combined initrd image is microcode in cpio format followed by
+the initrd image (maybe compressed). Kernel parses the combined initrd image
+during boot time. The microcode file in cpio name space is:
+kernel/x86/microcode/GenuineIntel.bin
+
+During BSP boot (before SMP starts), if the kernel finds the microcode file in
+the initrd file, it parses the microcode and saves matching microcode in memory.
+If matching microcode is found, it will be uploaded in BSP and later on in all
+APs.
+
+The cached microcode patch is applied when CPUs resume from a sleep state.
+
+There are two legacy user space interfaces to load microcode, either through
+/dev/cpu/microcode or through /sys/devices/system/cpu/microcode/reload file
+in sysfs.
+
+In addition to these two legacy methods, the early loading method described
+here is the third method with which microcode can be uploaded to a system's
+CPUs.
+
+The following example script shows how to generate a new combined initrd file in
+/boot/initrd-3.5.0.ucode.img with original microcode microcode.bin and
+original initrd image /boot/initrd-3.5.0.img.
+
+mkdir initrd
+cd initrd
+mkdir kernel
+mkdir kernel/x86
+mkdir kernel/x86/microcode
+cp ../microcode.bin kernel/x86/microcode/GenuineIntel.bin
+find .|cpio -oc >../ucode.cpio
+cd ..
+cat ucode.cpio /boot/initrd-3.5.0.img >/boot/initrd-3.5.0.ucode.img
--

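The example script in the documentation above relies on cpio to build the microcode archive. For illustration, the sketch below writes a single newc ("070701") entry in C, padding header+name and data to 4-byte boundaries as the format requires. cpio_newc_entry() is a hypothetical helper; unlike the script's output, an archive built with it holds no directory entries, and whether a given kernel accepts such an archive is not checked here.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Write a newc cpio entry for @name holding @size bytes of @data into
 * @out and return the number of bytes written (a multiple of 4).
 * @out must have room for 110 + strlen(name) + 1 + size + 6 bytes.
 */
static size_t cpio_newc_entry(unsigned char *out, const char *name,
			      const void *data, unsigned int size,
			      unsigned int mode)
{
	unsigned int namesize = (unsigned int)strlen(name) + 1; /* with NUL */
	char hdr[111];		/* 110 header chars + NUL from snprintf */
	size_t off;

	/* ino mode uid gid nlink mtime filesize devmajor devminor
	 * rdevmajor rdevminor namesize check */
	snprintf(hdr, sizeof(hdr),
		 "070701%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X",
		 0u, mode, 0u, 0u, 1u, 0u, size, 0u, 0u, 0u, 0u, namesize, 0u);
	memcpy(out, hdr, 110);
	off = 110;

	memcpy(out + off, name, namesize);
	off += namesize;
	while (off % 4)			/* pad header+name */
		out[off++] = 0;

	if (size)
		memcpy(out + off, data, size);
	off += size;
	while (off % 4)			/* pad file data */
		out[off++] = 0;
	return off;
}
```

A complete archive is one entry for kernel/x86/microcode/GenuineIntel.bin (mode 0100644) followed by an entry named TRAILER!!! with no data.
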
[tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

2012-11-30 Thread tip-bot for Fenghua Yu

Commit-ID:  474355fe313391de2429ae225e0fb02f67ec6c31
Gitweb: http://git.kernel.org/tip/474355fe313391de2429ae225e0fb02f67ec6c31
Author: Fenghua Yu 
AuthorDate: Thu, 29 Nov 2012 17:47:43 -0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 30 Nov 2012 15:18:16 -0800

x86/microcode_intel_early.c: Early update ucode on Intel's CPU

Implement early update of ucode on Intel CPUs.

load_ucode_intel_bsp() scans for ucode in the initrd image file, which consists of
cpio-format ucode followed by an ordinary initrd image. The binary ucode file is
stored as kernel/x86/microcode/GenuineIntel/microcode.bin in the cpio data. All
ucode patches with the same model as the BSP are saved in memory, and the BSP is
updated with a matching ucode patch.

load_ucode_intel_ap() reads the saved ucode patches and updates ucode on the AP.

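The "same model as the BSP" filter is what matching_model_microcode() in the diff below implements: it decodes family and model from the CPUID signature of both the CPU and the patch and compares them. A standalone sketch of that decoding; sig_family(), sig_model() and same_model() are hypothetical names, and like get_x86_model() in the patch it adds the extended model only for families 6 and 0xf:

```c
#include <stdint.h>

/* Family: bits 11:8, plus extended family (bits 27:20) when it is 0xf. */
static unsigned int sig_family(uint32_t sig)
{
	unsigned int fam = (sig >> 8) & 0xf;

	if (fam == 0xf)
		fam += (sig >> 20) & 0xff;
	return fam;
}

/* Model: bits 7:4, plus extended model (bits 19:16) for families 6/0xf. */
static unsigned int sig_model(uint32_t sig)
{
	unsigned int fam = sig_family(sig);
	unsigned int model = (sig >> 4) & 0xf;

	if (fam == 0x6 || fam == 0xf)
		model += ((sig >> 16) & 0xf) << 4;
	return model;
}

/* Keep a patch only if it targets the BSP's family and model. */
static int same_model(uint32_t bsp_sig, uint32_t patch_sig)
{
	return sig_family(bsp_sig) == sig_family(patch_sig) &&
	       sig_model(bsp_sig) == sig_model(patch_sig);
}
```

For example, signature 0x306a9 decodes to family 6, model 0x3a, so a patch for 0x306a8 (same model, different stepping) is kept while one for 0x206a7 (model 0x2a) is not. The stepping is deliberately ignored here; exact matching happens later.
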
Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1354240068-9821-6-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/kernel/microcode_intel_early.c | 438 
 1 file changed, 438 insertions(+)

diff --git a/arch/x86/kernel/microcode_intel_early.c b/arch/x86/kernel/microcode_intel_early.c
new file mode 100644
index 000..36b1df1
--- /dev/null
+++ b/arch/x86/kernel/microcode_intel_early.c
@@ -0,0 +1,438 @@
+/*
+ * Intel CPU Microcode Update Driver for Linux
+ *
+ * Copyright (C) 2012 Fenghua Yu 
+ *                    H Peter Anvin
+ *
+ * This driver allows to early upgrade microcode on Intel processors
+ * belonging to IA-32 family - PentiumPro, Pentium II,
+ * Pentium III, Xeon, Pentium 4, etc.
+ *
+ * Reference: Section 9.11 of Volume 3, IA-32 Intel Architecture
+ * Software Developer's Manual.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct microcode_intel __initdata *mc_saved_in_initrd[MAX_UCODE_COUNT];
+struct mc_saved_data mc_saved_data;
+
+enum ucode_state
+generic_load_microcode_early(int cpu, struct microcode_intel **mc_saved_p,
+unsigned int mc_saved_count,
+struct ucode_cpu_info *uci)
+{
+   struct microcode_intel *ucode_ptr, *new_mc = NULL;
+   int new_rev = uci->cpu_sig.rev;
+   enum ucode_state state = UCODE_OK;
+   unsigned int mc_size;
+   struct microcode_header_intel *mc_header;
+   unsigned int csig = uci->cpu_sig.sig;
+   unsigned int cpf = uci->cpu_sig.pf;
+   int i;
+
+   for (i = 0; i < mc_saved_count; i++) {
+   ucode_ptr = mc_saved_p[i];
+   mc_header = (struct microcode_header_intel *)ucode_ptr;
+   mc_size = get_totalsize(mc_header);
+   if (get_matching_microcode(csig, cpf, ucode_ptr, new_rev)) {
+   new_rev = mc_header->rev;
+   new_mc  = ucode_ptr;
+   }
+   }
+
+   if (!new_mc) {
+   state = UCODE_NFOUND;
+   goto out;
+   }
+
+   uci->mc = (struct microcode_intel *)new_mc;
+out:
+   return state;
+}
+EXPORT_SYMBOL_GPL(generic_load_microcode_early);
+
+static enum ucode_state __init
+load_microcode(struct mc_saved_data *mc_saved_data, int cpu)
+{
+   struct ucode_cpu_info *uci = mc_saved_data->ucode_cpu_info + cpu;
+
+   return generic_load_microcode_early(cpu, mc_saved_data->mc_saved,
+   mc_saved_data->mc_saved_count, uci);
+}
+
+static u8 get_x86_family(unsigned long sig)
+{
+   u8 x86;
+
+   x86 = (sig >> 8) & 0xf;
+
+   if (x86 == 0xf)
+   x86 += (sig >> 20) & 0xff;
+
+   return x86;
+}
+
+static u8 get_x86_model(unsigned long sig)
+{
+   u8 x86, x86_model;
+
+   x86 = get_x86_family(sig);
+   x86_model = (sig >> 4) & 0xf;
+
+   if (x86 == 0x6 || x86 == 0xf)
+   x86_model += ((sig >> 16) & 0xf) << 4;
+
+   return x86_model;
+}
+
+static enum ucode_state
+matching_model_microcode(struct microcode_header_intel *mc_header,
+   unsigned long sig)
+{
+   u8 x86, x86_model;
+   u8 x86_ucode, x86_model_ucode;
+
+   x86 = get_x86_family(sig);
+   x86_model = get_x86_model(sig);
+
+   x86_ucode = get_x86_family(mc_header->sig);
+   x86_model_ucode = get_x86_model(mc_header->sig);
+
+   if (x86 != x86_ucode || x86_model != x86_model_ucode)
+   return UCODE_ERROR;
+
+   return UCODE_OK;
+}
+
+static void
+save_microcode(struct mc_saved_data *mc_saved_data,
+  struct microcode_intel **mc_saved_src,
+  unsigned int mc_saved_count)
+{
+   int i;
+   struct microcode_intel **mc_saved_p;
+
+   if (!mc_saved_count)
+   return;
+
+

[tip:x86/microcode] x86/microcode_intel_lib.c: Early update ucode on Intel's CPU

2012-11-30 Thread tip-bot for Fenghua Yu

Commit-ID:  da7d824a00ec0f4d19e2b51653410bde0de40226
Gitweb: http://git.kernel.org/tip/da7d824a00ec0f4d19e2b51653410bde0de40226
Author: Fenghua Yu 
AuthorDate: Thu, 29 Nov 2012 17:47:42 -0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 30 Nov 2012 15:18:15 -0800

x86/microcode_intel_lib.c: Early update ucode on Intel's CPU

Define interfaces microcode_sanity_check() and get_matching_microcode(). They
are called both during early boot and by the Intel microcode driver.

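get_matching_microcode() accepts a patch only if its signature and platform flags match the CPU and its revision is newer than the one passed in, and the early loader keeps the newest such patch. A simplified sketch of that selection, ignoring the extended signature table; struct patch, sig_ok() and pick_patch() are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the header fields the selection needs. */
struct patch {
	uint32_t sig;	/* CPUID signature the patch targets */
	uint32_t pf;	/* platform-flag mask */
	int rev;	/* patch revision */
};

/* Same rule as sigmatch(): equal signature, overlapping (or zero) flags. */
static int sig_ok(uint32_t csig, uint32_t cpf, const struct patch *p)
{
	return p->sig == csig && ((p->pf & cpf) || (p->pf == 0 && cpf == 0));
}

/*
 * Return the matching patch with the highest revision strictly above
 * @cur_rev, or NULL if nothing is newer than what the CPU already runs.
 */
static const struct patch *pick_patch(const struct patch *v, size_t n,
				      uint32_t csig, uint32_t cpf, int cur_rev)
{
	const struct patch *best = NULL;
	int best_rev = cur_rev;
	size_t i;

	for (i = 0; i < n; i++) {
		if (sig_ok(csig, cpf, &v[i]) && v[i].rev > best_rev) {
			best_rev = v[i].rev;
			best = &v[i];
		}
	}
	return best;
}
```

Raising best_rev as the loop goes mirrors how the loader updates new_rev, so a later patch must beat both the CPU's current revision and every earlier candidate.
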
Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1354240068-9821-5-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/kernel/microcode_intel_lib.c | 174 ++
 1 file changed, 174 insertions(+)

diff --git a/arch/x86/kernel/microcode_intel_lib.c b/arch/x86/kernel/microcode_intel_lib.c
new file mode 100644
index 000..ce69320
--- /dev/null
+++ b/arch/x86/kernel/microcode_intel_lib.c
@@ -0,0 +1,174 @@
+/*
+ * Intel CPU Microcode Update Driver for Linux
+ *
+ * Copyright (C) 2012 Fenghua Yu 
+ *                    H Peter Anvin
+ *
+ * This driver allows to upgrade microcode on Intel processors
+ * belonging to IA-32 family - PentiumPro, Pentium II,
+ * Pentium III, Xeon, Pentium 4, etc.
+ *
+ * Reference: Section 8.11 of Volume 3a, IA-32 Intel® Architecture
+ * Software Developer's Manual
+ * Order Number 253668 or free download from:
+ *
+ * http://developer.intel.com/Assets/PDF/manual/253668.pdf
+ *
+ * For more information, go to http://www.urbanmyth.org/microcode
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+static inline int
+update_match_cpu(unsigned int csig, unsigned int cpf,
+unsigned int sig, unsigned int pf)
+{
+   return (!sigmatch(sig, csig, pf, cpf)) ? 0 : 1;
+}
+
+int
+update_match_revision(struct microcode_header_intel *mc_header, int rev)
+{
+   return (mc_header->rev <= rev) ? 0 : 1;
+}
+
+int microcode_sanity_check(void *mc, int print_err)
+{
+   unsigned long total_size, data_size, ext_table_size;
+   struct microcode_header_intel *mc_header = mc;
+   struct extended_sigtable *ext_header = NULL;
+   int sum, orig_sum, ext_sigcount = 0, i;
+   struct extended_signature *ext_sig;
+
+   total_size = get_totalsize(mc_header);
+   data_size = get_datasize(mc_header);
+
+   if (data_size + MC_HEADER_SIZE > total_size) {
+   if (print_err)
+   pr_err("error! Bad data size in microcode data file\n");
+   return -EINVAL;
+   }
+
+   if (mc_header->ldrver != 1 || mc_header->hdrver != 1) {
+   if (print_err)
+   pr_err("error! Unknown microcode update format\n");
+   return -EINVAL;
+   }
+   ext_table_size = total_size - (MC_HEADER_SIZE + data_size);
+   if (ext_table_size) {
+   if ((ext_table_size < EXT_HEADER_SIZE)
+|| ((ext_table_size - EXT_HEADER_SIZE) % EXT_SIGNATURE_SIZE)) {
+   if (print_err)
+   pr_err("error! Small exttable size in microcode data file\n");
+   return -EINVAL;
+   }
+   ext_header = mc + MC_HEADER_SIZE + data_size;
+   if (ext_table_size != exttable_size(ext_header)) {
+   if (print_err)
+   pr_err("error! Bad exttable size in microcode data file\n");
+   return -EFAULT;
+   }
+   ext_sigcount = ext_header->count;
+   }
+
+   /* check extended table checksum */
+   if (ext_table_size) {
+   int ext_table_sum = 0;
+   int *ext_tablep = (int *)ext_header;
+
+   i = ext_table_size / DWSIZE;
+   while (i--)
+   ext_table_sum += ext_tablep[i];
+   if (ext_table_sum) {
+   if (print_err)
+   pr_warn("aborting, bad extended signature table checksum\n");
+   return -EINVAL;
+   }
+   }
+
+   /* calculate the checksum */
+   orig_sum = 0;
+   i = (MC_HEADER_SIZE + data_size) / DWSIZE;
+   while (i--)
+   orig_sum += ((int *)mc)[i];
+   if (orig_sum) {
+   if (print_err)
+   pr_err("aborting, bad checksum\n");
+   return -EINVAL;
+   }
+   if (!ext_table_size)
+   return 0;
+   /* check extended signature checksum */
+   for (i = 0; i < ext_sigcount; i++) {
+   ext_sig = (void *)ext_header + EXT_HEADER_SIZE +
+

[tip:x86/microcode] x86/microcode_core_early.c: Define interfaces for early loading ucode

2012-11-30 Thread tip-bot for Fenghua Yu

Commit-ID:  d42bdf2139115faa4d5bdb0dc591d435a644fde4
Gitweb: http://git.kernel.org/tip/d42bdf2139115faa4d5bdb0dc591d435a644fde4
Author: Fenghua Yu 
AuthorDate: Thu, 29 Nov 2012 17:47:41 -0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 30 Nov 2012 15:18:15 -0800

x86/microcode_core_early.c: Define interfaces for early loading ucode

Define interfaces load_ucode_bsp() and load_ucode_ap() to load ucode on the BSP
and APs during early boot. These are generic interfaces; internally they call
vendor-specific implementations.

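This version of x86_vendor() rebuilds the 12-byte vendor string from the registers CPUID leaf 0 returns and looks it up in a table. A user-space sketch of that lookup, taking the register values as parameters instead of executing CPUID and assuming a little-endian host (as x86 is); vendor_lookup() and the VENDOR_* constants are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { VENDOR_UNKNOWN, VENDOR_INTEL, VENDOR_AMD };

/*
 * The vendor string is laid out EBX, EDX, ECX, so EDX lands at offset 4
 * and ECX at offset 8, matching the native_cpuid() call in the diff.
 */
static int vendor_lookup(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	static const struct {
		char id[13];
		int vendor;
	} table[] = {
		{ "GenuineIntel", VENDOR_INTEL },
		{ "AuthenticAMD", VENDOR_AMD },
	};
	char id[13] = { 0 };	/* 12 characters plus NUL terminator */
	size_t i;

	memcpy(&id[0], &ebx, 4);
	memcpy(&id[4], &edx, 4);
	memcpy(&id[8], &ecx, 4);

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (!strcmp(id, table[i].id))
			return table[i].vendor;
	return VENDOR_UNKNOWN;
}
```

Compared with packed-constant comparisons, the table form costs a string compare per entry but keeps each vendor name readable in one place.
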
Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1354240068-9821-4-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/microcode.h   | 23 +++
 arch/x86/kernel/microcode_core_early.c | 70 ++
 2 files changed, 93 insertions(+)

diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
index 43d921b..2e2ff3a 100644
--- a/arch/x86/include/asm/microcode.h
+++ b/arch/x86/include/asm/microcode.h
@@ -57,4 +57,27 @@ static inline struct microcode_ops * __init init_amd_microcode(void)
 static inline void __exit exit_amd_microcode(void) {}
 #endif
 
+struct mc_saved_data {
+   unsigned int mc_saved_count;
+   struct microcode_intel **mc_saved;
+   struct ucode_cpu_info *ucode_cpu_info;
+};
+#ifdef CONFIG_MICROCODE_EARLY
+#define MAX_UCODE_COUNT 128
+extern struct ucode_cpu_info ucode_cpu_info_early[NR_CPUS];
+extern struct microcode_intel __initdata *mc_saved_in_initrd[MAX_UCODE_COUNT];
+extern struct mc_saved_data mc_saved_data;
+extern void __init load_ucode_bsp(char *real_mode_data);
+extern __init void load_ucode_ap(void);
+extern void __init
+save_microcode_in_initrd(struct mc_saved_data *mc_saved_data,
+struct microcode_intel **mc_saved_in_initrd);
+#else
+static inline void __init load_ucode_bsp(char *real_mode_data) {}
+static inline __init void load_ucode_ap(void) {}
+static inline void __init
+save_microcode_in_initrd(struct mc_saved_data *mc_saved_data,
+struct microcode_intel **mc_saved_in_initrd) {}
+#endif
+
 #endif /* _ASM_X86_MICROCODE_H */
diff --git a/arch/x86/kernel/microcode_core_early.c b/arch/x86/kernel/microcode_core_early.c
new file mode 100644
index 000..1c6cc8f
--- /dev/null
+++ b/arch/x86/kernel/microcode_core_early.c
@@ -0,0 +1,70 @@
+/*
+ * X86 CPU microcode early update for Linux
+ *
+ * Copyright (C) 2012 Fenghua Yu 
+ *H Peter Anvin" 
+ *
+ * This driver allows to early upgrade microcode on Intel processors
+ * belonging to IA-32 family - PentiumPro, Pentium II,
+ * Pentium III, Xeon, Pentium 4, etc.
+ *
+ * Reference: Section 9.11 of Volume 3, IA-32 Intel Architecture
+ * Software Developer's Manual.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include 
+#include 
+#include 
+#include 
+
+struct ucode_cpu_info  ucode_cpu_info_early[NR_CPUS];
+EXPORT_SYMBOL_GPL(ucode_cpu_info_early);
+
+static inline int __init x86_vendor(void)
+{
+   unsigned int eax = 0x00000000;
+   char x86_vendor_id[16];
+   int i;
+   struct {
+   char x86_vendor_id[16];
+   __u8 x86_vendor;
+   } cpu_vendor_table[] = {
+   { "GenuineIntel", X86_VENDOR_INTEL },
+   { "AuthenticAMD", X86_VENDOR_AMD },
+   };
+
+   memset(x86_vendor_id, 0, ARRAY_SIZE(x86_vendor_id));
+   /* Get vendor name */
+   native_cpuid(&eax,
+   (unsigned int *)&x86_vendor_id[0],
+   (unsigned int *)&x86_vendor_id[8],
+   (unsigned int *)&x86_vendor_id[4]);
+
+   for (i = 0; i < ARRAY_SIZE(cpu_vendor_table); i++) {
+   if (!strcmp(x86_vendor_id, cpu_vendor_table[i].x86_vendor_id))
+   return cpu_vendor_table[i].x86_vendor;
+   }
+
+   return X86_VENDOR_UNKNOWN;
+}
+
+
+void __init load_ucode_bsp(char *real_mode_data)
+{
+   /*
+* boot_cpu_data is not setup yet in this early phase.
+* So we get vendor information directly through cpuid.
+*/
+   if (x86_vendor() == X86_VENDOR_INTEL)
+   load_ucode_intel_bsp(real_mode_data);
+}
+
+void __cpuinit load_ucode_ap(void)
+{
+   if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+   load_ucode_intel_ap();
+}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
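The vendor check in x86_vendor() relies on CPUID leaf 0 returning the 12-byte vendor string split across EBX, EDX and ECX, in that order, which is why the EDX output is stored at offset 4 and the ECX output at offset 8. The following is a minimal user-space sketch of the same lookup, not kernel code: it assumes the three register values have already been read (on x86 with GCC or Clang, `__get_cpuid(0, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>` can supply them), and the enum and function names are hypothetical.

```c
#include <string.h>

/* Hypothetical vendor codes for this sketch; not the kernel's X86_VENDOR_* values. */
enum vendor { VENDOR_UNKNOWN, VENDOR_INTEL, VENDOR_AMD };

/* Copy a 32-bit register into 4 bytes, least significant byte first,
 * which is how x86 stores the register in memory. */
static void put_le32(char *dst, unsigned int reg)
{
	for (int i = 0; i < 4; i++)
		dst[i] = (char)((reg >> (8 * i)) & 0xff);
}

/* Rebuild the CPUID leaf 0 vendor string (EBX, then EDX, then ECX)
 * and look it up in a small table, as x86_vendor() does. */
enum vendor vendor_from_cpuid0(unsigned int ebx, unsigned int ecx,
			       unsigned int edx)
{
	static const struct {
		const char *id;
		enum vendor v;
	} table[] = {
		{ "GenuineIntel", VENDOR_INTEL },
		{ "AuthenticAMD", VENDOR_AMD },
	};
	char id[13];

	put_le32(&id[0], ebx);
	put_le32(&id[4], edx);
	put_le32(&id[8], ecx);
	id[12] = '\0';

	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(id, table[i].id) == 0)
			return table[i].v;
	return VENDOR_UNKNOWN;
}
```

Decoding the bytes with shifts, rather than casting a `char` array to `unsigned int *` as the kernel code does, keeps the sketch independent of the host's byte order.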

[tip:x86/microcode] x86/microcode_intel.h: Define functions and macros for early loading ucode

2012-11-30 Thread tip-bot for Fenghua Yu

Commit-ID:  17f1087f1a80d2dfada790c31720eb6a57da2d1f
Gitweb: http://git.kernel.org/tip/17f1087f1a80d2dfada790c31720eb6a57da2d1f
Author: Fenghua Yu 
AuthorDate: Thu, 29 Nov 2012 17:47:40 -0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 30 Nov 2012 15:18:14 -0800

x86/microcode_intel.h: Define functions and macros for early loading ucode

Define some functions and macros that will be used for early microcode loading.
Some of them are moved from the microcode_intel.c driver so that they can be
called in the early boot phase, before the module can be loaded.

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1354240068-9821-3-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/microcode_intel.h | 106 +++
 arch/x86/kernel/Makefile   |   3 +
 arch/x86/kernel/microcode_core.c   |   7 +-
 arch/x86/kernel/microcode_intel.c  | 185 ++---
 4 files changed, 120 insertions(+), 181 deletions(-)

diff --git a/arch/x86/include/asm/microcode_intel.h b/arch/x86/include/asm/microcode_intel.h
new file mode 100644
index 000..0544bf4
--- /dev/null
+++ b/arch/x86/include/asm/microcode_intel.h
@@ -0,0 +1,106 @@
+#ifndef _ASM_X86_MICROCODE_INTEL_H
+#define _ASM_X86_MICROCODE_INTEL_H
+
+#include 
+
+struct microcode_header_intel {
+   unsigned int hdrver;
+   unsigned int rev;
+   unsigned int date;
+   unsigned int sig;
+   unsigned int cksum;
+   unsigned int ldrver;
+   unsigned int pf;
+   unsigned int datasize;
+   unsigned int totalsize;
+   unsigned int reserved[3];
+};
+
+struct microcode_intel {
+   struct microcode_header_intel hdr;
+   unsigned int bits[0];
+};
+
+/* microcode format is extended from prescott processors */
+struct extended_signature {
+   unsigned int sig;
+   unsigned int pf;
+   unsigned int cksum;
+};
+
+struct extended_sigtable {
+   unsigned int count;
+   unsigned int cksum;
+   unsigned int reserved[3];
+   struct extended_signature sigs[0];
+};
+
+#define DEFAULT_UCODE_DATASIZE (2000)
+#define MC_HEADER_SIZE (sizeof(struct microcode_header_intel))
+#define DEFAULT_UCODE_TOTALSIZE (DEFAULT_UCODE_DATASIZE + MC_HEADER_SIZE)
+#define EXT_HEADER_SIZE (sizeof(struct extended_sigtable))
+#define EXT_SIGNATURE_SIZE (sizeof(struct extended_signature))
+#define DWSIZE (sizeof(u32))
+
+#define get_totalsize(mc) \
+   (((struct microcode_intel *)mc)->hdr.totalsize ? \
+((struct microcode_intel *)mc)->hdr.totalsize : \
+DEFAULT_UCODE_TOTALSIZE)
+
+#define get_datasize(mc) \
+   (((struct microcode_intel *)mc)->hdr.datasize ? \
+((struct microcode_intel *)mc)->hdr.datasize : DEFAULT_UCODE_DATASIZE)
+
+#define sigmatch(s1, s2, p1, p2) \
+   (((s1) == (s2)) && (((p1) & (p2)) || (((p1) == 0) && ((p2) == 0))))
+
+#define exttable_size(et) ((et)->count * EXT_SIGNATURE_SIZE + EXT_HEADER_SIZE)
+
+extern int
+get_matching_microcode(unsigned int csig, int cpf, void *mc, int rev);
+extern int microcode_sanity_check(void *mc, int print_err);
+extern int get_matching_sig(unsigned int csig, int cpf, void *mc, int rev);
+extern int
+update_match_revision(struct microcode_header_intel *mc_header, int rev);
+
+#ifdef CONFIG_MICROCODE_INTEL_EARLY
+extern enum ucode_state
+get_matching_model_microcode(int cpu, void *data, size_t size,
+struct mc_saved_data *mc_saved_data,
+struct microcode_intel **mc_saved_in_initrd,
+enum system_states system_state);
+extern enum ucode_state
+generic_load_microcode_early(int cpu, struct microcode_intel **mc_saved_p,
+unsigned int mc_saved_count,
+struct ucode_cpu_info *uci);
+extern void __init
+load_ucode_intel_bsp(char *real_mode_data);
+extern void __init load_ucode_intel_ap(void);
+#else
+static inline enum ucode_state
+get_matching_model_microcode(int cpu, void *data, size_t size,
+struct mc_saved_data *mc_saved_data,
+struct microcode_intel **mc_saved_in_initrd,
+enum system_states system_state)
+{
+   return UCODE_ERROR;
+}
+static inline enum ucode_state
+generic_load_microcode_early(int cpu, struct microcode_intel **mc_saved_p,
+unsigned int mc_saved_count,
+struct ucode_cpu_info *uci)
+{
+   return UCODE_ERROR;
+}
+static inline __init void
+load_ucode_intel_bsp(char *real_mode_data)
+{
+}
+static inline __init void
+load_ucode_intel_ap(struct ucode_cpu_info *uci,
+   struct mc_saved_data *mc_saved_data)
+{
+}
+#endif
+
+#endif /* _ASM_X86_MICROCO
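The sigmatch(), get_totalsize() and exttable_size() macros above encode three small rules: an image matches when the CPU signatures are equal and the platform-flag masks share a bit (or are both zero); a header size field of 0 means the default (2000 data bytes plus the header); and an extended signature table is its fixed header plus one entry per signature. A hedged C sketch of the same rules as plain functions follows. The function names are hypothetical, and the byte sizes assume 4-byte `unsigned int` fields in the structures declared above (12 fields in the header, 5 in the extended table header, 3 per extended signature).

```c
#include <stdbool.h>
#include <stdint.h>

#define HDR_SIZE     48u   /* 12 x u32 in struct microcode_header_intel */
#define DEFAULT_DATA 2000u /* DEFAULT_UCODE_DATASIZE */
#define EXT_HDR_SIZE 20u   /* count, cksum, reserved[3] */
#define EXT_SIG_SIZE 12u   /* sig, pf, cksum */

/* Same test as sigmatch(s1, s2, p1, p2). */
bool sig_matches(uint32_t s1, uint32_t s2, uint32_t p1, uint32_t p2)
{
	return s1 == s2 && ((p1 & p2) || (p1 == 0 && p2 == 0));
}

/* Same fallback as get_totalsize(): 0 means "use the default". */
uint32_t total_size(uint32_t hdr_totalsize)
{
	return hdr_totalsize ? hdr_totalsize : DEFAULT_DATA + HDR_SIZE;
}

/* Same arithmetic as exttable_size(): table header plus one entry per signature. */
uint32_t ext_table_size(uint32_t count)
{
	return count * EXT_SIG_SIZE + EXT_HDR_SIZE;
}
```

With these sizes the default total size works out to 2048 bytes, and a table with two extended signatures occupies 2 * 12 + 20 = 44 bytes.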

[tip:x86/microcode] x86, doc: Early microcode loading

2012-11-30 Thread tip-bot for Fenghua Yu

Commit-ID:  31ae1d90c127310c67618b8bd79f01c394116187
Gitweb: http://git.kernel.org/tip/31ae1d90c127310c67618b8bd79f01c394116187
Author: Fenghua Yu 
AuthorDate: Fri, 30 Nov 2012 07:45:51 -0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 30 Nov 2012 15:18:14 -0800

x86, doc: Early microcode loading

Documentation for early microcode loading.

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1354290351-20988-1-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 Documentation/x86/early-microcode.txt | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/Documentation/x86/early-microcode.txt b/Documentation/x86/early-microcode.txt
new file mode 100644
index 000..4aaf0df
--- /dev/null
+++ b/Documentation/x86/early-microcode.txt
@@ -0,0 +1,43 @@
+Early load microcode
+
+By Fenghua Yu 
+
+Kernel can update microcode in early phase of boot time. Loading microcode early
+can fix CPU issues before they are observed during kernel boot time.
+
+Microcode is stored in an initrd file. The microcode is read from the initrd
+file and loaded to CPUs during boot time.
+
+The format of the combined initrd image is microcode in cpio format followed by
+the initrd image (maybe compressed). Kernel parses the combined initrd image
+during boot time. The microcode file in cpio name space is:
+kernel/x86/microcode/GenuineIntel.bin
+
+During BSP boot (before SMP starts), if the kernel finds the microcode file in
+the initrd file, it parses the microcode and saves matching microcode in memory.
+If matching microcode is found, it will be uploaded in BSP and later on in all
+APs.
+
+The cached microcode patch is applied when CPUs resume from a sleep state.
+
+There are two legacy user space interfaces to load microcode, either through
+/dev/cpu/microcode or through /sys/devices/system/cpu/microcode/reload file
+in sysfs.
+
+In addition to these two legacy methods, the early loading method described
+here is the third method with which microcode can be uploaded to a system's
+CPUs.
+
+The following example script shows how to generate a new combined initrd file in
+/boot/initrd-3.5.0.ucode.img with original microcode microcode.bin and
+original initrd image /boot/initrd-3.5.0.img.
+
+mkdir initrd
+cd initrd
+mkdir kernel
+mkdir kernel/x86
+mkdir kernel/x86/microcode
+cp ../microcode.bin kernel/x86/microcode/GenuineIntel.bin
+find .|cpio -oc >../ucode.cpio
+cd ..
+cat ucode.cpio /boot/initrd-3.5.0.img >/boot/initrd-3.5.0.ucode.img
--
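The combined image built by the script above starts with a cpio archive, inside which the kernel looks for kernel/x86/microcode/GenuineIntel.bin. As an illustration only, here is a minimal C sketch of walking an archive in the SVR4 "newc" format (magic 070701, as produced by GNU cpio with `-H newc`). It is not the kernel's parser; the function names are hypothetical, and names are compared exactly, so an entry stored with a leading "./" would not match.

```c
#include <stddef.h>
#include <string.h>

#define NEWC_HDR_LEN  110 /* "070701" + 13 fields of 8 hex digits */
#define NEWC_FILESIZE 54  /* offset of c_filesize (7th field) */
#define NEWC_NAMESIZE 94  /* offset of c_namesize (12th field) */

/* Parse one 8-digit hexadecimal header field; returns -1 on a bad digit. */
static int hex8(const unsigned char *p, unsigned long *out)
{
	unsigned long v = 0;

	for (int i = 0; i < 8; i++) {
		unsigned char c = p[i];

		v <<= 4;
		if (c >= '0' && c <= '9')
			v |= (unsigned long)(c - '0');
		else if (c >= 'a' && c <= 'f')
			v |= (unsigned long)(c - 'a' + 10);
		else if (c >= 'A' && c <= 'F')
			v |= (unsigned long)(c - 'A' + 10);
		else
			return -1;
	}
	*out = v;
	return 0;
}

/* Header+name and file data are each padded to a 4-byte boundary. */
static size_t align4(size_t x)
{
	return (x + 3) & ~(size_t)3;
}

/*
 * Return a pointer to the data of the entry called @name and store its
 * length in *@size, or NULL if the name is not found before the trailer
 * or the archive is malformed. @buf must be the start of the archive.
 */
const unsigned char *cpio_find(const unsigned char *buf, size_t len,
			       const char *name, size_t *size)
{
	size_t off = 0;

	while (off + NEWC_HDR_LEN <= len) {
		const unsigned char *h = buf + off;
		unsigned long fsize, nsize;
		size_t name_off = off + NEWC_HDR_LEN, data_off;
		const char *fname;

		if (memcmp(h, "070701", 6) != 0 ||
		    hex8(h + NEWC_FILESIZE, &fsize) ||
		    hex8(h + NEWC_NAMESIZE, &nsize))
			return NULL;
		/* c_namesize counts the terminating NUL. */
		if (nsize == 0 || nsize > len - name_off)
			return NULL;
		fname = (const char *)(buf + name_off);
		if (fname[nsize - 1] != '\0' || strcmp(fname, "TRAILER!!!") == 0)
			return NULL;
		data_off = align4(name_off + nsize);
		if (data_off > len || fsize > len - data_off)
			return NULL;
		if (strcmp(fname, name) == 0) {
			*size = fsize;
			return buf + data_off;
		}
		off = align4(data_off + fsize);
	}
	return NULL;
}
```

Because the microcode archive comes first in the combined image, a reader of this kind only needs to stop at the "TRAILER!!!" entry; whatever follows (the original, possibly compressed initrd) is never parsed.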

[tip:x86/bsp-hotplug] x86, topology: Debug CPU0 hotplug

2012-11-14 Thread tip-bot for Fenghua Yu

Commit-ID:  a71c8bc5dfefbbf80ef90739791554ef7ea4401b
Gitweb: http://git.kernel.org/tip/a71c8bc5dfefbbf80ef90739791554ef7ea4401b
Author: Fenghua Yu 
AuthorDate: Tue, 13 Nov 2012 11:32:51 -0800
Committer:  H. Peter Anvin 
CommitDate: Wed, 14 Nov 2012 15:28:11 -0800

x86, topology: Debug CPU0 hotplug

CONFIG_DEBUG_HOTPLUG_CPU0 is for debugging the CPU0 hotplug feature. The switch
offlines CPU0 as soon as possible and boots userspace with CPU0 offline. The
user can bring CPU0 back online after boot. The switch is off by default.

To debug CPU0 hotplug, you need to enable the CPU0 offline/online feature,
either by turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 at compile time or by
passing the cpu0_hotplug kernel parameter at boot.

Taking CPU0 down after all hotplug notifiers are installed and SMP is booted is
safe, and is still early in boot.

Note that in this CPU0 hotplug debug mode, some applications or drivers (e.g.
some versions of udevd) may bring CPU0 back online during boot.

In this debug mode, setup_local_APIC() may report a warning on max_loops <= 0
when CPU0 is brought back online after boot. This is because a pending
interrupt in the IRR cannot move to the ISR. The warning is not CPU0 specific
and can happen on other CPUs as well. It is harmless, except that the first
CPU0 online takes a bit longer, so this debug mode is also useful for exposing
the issue. I'll send a separate patch to fix this generic warning.

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1352835171-3958-15-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/Kconfig   | 15 ++
 arch/x86/include/asm/cpu.h |  3 +++
 arch/x86/kernel/topology.c | 51 ++
 arch/x86/power/cpu.c   | 38 ++
 4 files changed, 107 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 036e89a..b6cfa5f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1727,6 +1727,21 @@ config BOOTPARAM_HOTPLUG_CPU0
  You still can enable the CPU0 hotplug feature at boot by kernel
  parameter cpu0_hotplug.
 
+config DEBUG_HOTPLUG_CPU0
+   def_bool n
+   prompt "Debug CPU0 hotplug"
+   depends on HOTPLUG_CPU && EXPERIMENTAL
+   ---help---
+ Enabling this option offlines CPU0 (if CPU0 can be offlined) as
+ soon as possible and boots up userspace with CPU0 offlined. User
+ can online CPU0 back after boot time.
+
+ To debug CPU0 hotplug, you need to enable CPU0 offline/online
+ feature by either turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during
+ compilation or giving cpu0_hotplug kernel parameter at boot.
+
+ If unsure, say N.
+
 config COMPAT_VDSO
def_bool y
prompt "Compat VDSO support"
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index a119572..5f9a124 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -29,6 +29,9 @@ struct x86_cpu {
 extern int arch_register_cpu(int num);
 extern void arch_unregister_cpu(int);
 extern void __cpuinit start_cpu0(void);
+#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
+extern int _debug_hotplug_cpu(int cpu, int action);
+#endif
 #endif
 
 DECLARE_PER_CPU(int, cpu_state);
diff --git a/arch/x86/kernel/topology.c b/arch/x86/kernel/topology.c
index 0e7b4a7..6e60b5f 100644
--- a/arch/x86/kernel/topology.c
+++ b/arch/x86/kernel/topology.c
@@ -50,6 +50,57 @@ static int __init enable_cpu0_hotplug(char *str)
 __setup("cpu0_hotplug", enable_cpu0_hotplug);
 #endif
 
+#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
+/*
+ * This function offlines a CPU as early as possible and allows userspace to
+ * boot up without the CPU. The CPU can be onlined back by user after boot.
+ *
+ * This is only called for debugging CPU offline/online feature.
+ */
+int __ref _debug_hotplug_cpu(int cpu, int action)
+{
+   struct device *dev = get_cpu_device(cpu);
+   int ret;
+
+   if (!cpu_is_hotpluggable(cpu))
+   return -EINVAL;
+
+   cpu_hotplug_driver_lock();
+
+   switch (action) {
+   case 0:
+   ret = cpu_down(cpu);
+   if (!ret) {
+   pr_info("CPU %u is now offline\n", cpu);
+   kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
+   } else
+   pr_debug("Can't offline CPU%d.\n", cpu);
+   break;
+   case 1:
+   ret = cpu_up(cpu);
+   if (!ret)
+   kobject_uevent(&dev->kobj, KOBJ_ONLINE);
+   else
+   pr_debug("Can't online CPU%d.\n", cpu);
+   break;
+   default:
+   ret = -EINVAL;
+   }
+
+   cpu_hotplug_driver_unlock();
+
+   return ret;
+}
+
+static int __init debug_hotplug_cpu(void)
+{
+   _debug_hotplug_cpu(0, 0);
+   return 0;
+}
+
+late_initcall_sync(debug_hotplug_cpu);
+#endif /* CONFIG_DEBUG_HOTPLUG_CPU0 */
+
 int __ref arch_regis
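After boot, the user brings CPU0 back online through the standard sysfs CPU hotplug file /sys/devices/system/cpu/cpuN/online, writing 1 to online the CPU and 0 to offline it. This normally requires root, and the file may not exist for a CPU that is not hotpluggable (for CPU0, that means neither cpu0_hotplug nor CONFIG_BOOTPARAM_HOTPLUG_CPU0 was used). A minimal user-space C sketch; the helper names are hypothetical.

```c
#include <stdio.h>

/* Build the sysfs hotplug control path for @cpu into @buf.
 * Returns the path length, or -1 if it does not fit. */
int cpu_online_path(char *buf, size_t len, unsigned int cpu)
{
	int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/online", cpu);

	/* Treat truncation as an error so a partial path is never used. */
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write "1" (online) or "0" (offline) for @cpu; returns 0 on success. */
int set_cpu_online(unsigned int cpu, int online)
{
	char path[64];
	FILE *f;
	int ret = 0;

	if (cpu_online_path(path, sizeof(path), cpu) < 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(online ? "1" : "0", f) == EOF)
		ret = -1;
	/* stdio buffers the write, so a hotplug failure may only show up here. */
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}
```

For example, `set_cpu_online(0, 1)` after boot would bring CPU0 back online, which also exercises the setup_local_APIC() warning path described in the commit message.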

[tip:x86/bsp-hotplug] x86, hotplug: The first online processor saves the MTRR state

2012-11-14 Thread tip-bot for Fenghua Yu

Commit-ID:  30242aa6023b71325c6b8addac06faf544a85fd0
Gitweb: http://git.kernel.org/tip/30242aa6023b71325c6b8addac06faf544a85fd0
Author: Fenghua Yu 
AuthorDate: Tue, 13 Nov 2012 11:32:48 -0800
Committer:  H. Peter Anvin 
CommitDate: Wed, 14 Nov 2012 15:28:10 -0800

x86, hotplug: The first online processor saves the MTRR state

Ask the first online CPU to save the MTRR state instead of asking the BSP. The
BSP could be offline when mtrr_save_state() is called.

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1352835171-3958-12-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/kernel/cpu/mtrr/main.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 6b96110..e4c1a41 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -695,11 +695,16 @@ void mtrr_ap_init(void)
 }
 
 /**
- * Save current fixed-range MTRR state of the BSP
+ * Save current fixed-range MTRR state of the first cpu in cpu_online_mask.
  */
 void mtrr_save_state(void)
 {
-   smp_call_function_single(0, mtrr_save_fixed_ranges, NULL, 1);
+   int first_cpu;
+
+   get_online_cpus();
+   first_cpu = cpumask_first(cpu_online_mask);
+   smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
+   put_online_cpus();
 }
 
 void set_mtrr_aps_delayed_init(void)
--
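mtrr_save_state() now targets cpumask_first(cpu_online_mask) instead of hard-coding CPU 0, and get_online_cpus()/put_online_cpus() keep CPUs from going offline while the cross-call runs. The lookup itself is a find-first-set-bit over a bitmap that returns a value of at least the bitmap size when no bit is set. A hedged user-space analogue over a plain array of `unsigned long` (the function name is hypothetical; this is not the kernel cpumask API):

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Index of the lowest set bit among the first @nbits bits of @mask,
 * or @nbits if no bit is set (e.g. no CPU online). */
size_t first_set_bit(const unsigned long *mask, size_t nbits)
{
	for (size_t i = 0; i < nbits; i++)
		if (mask[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
			return i;
	return nbits;
}
```

With CPU0 offline and CPUs 1 and 2 online (mask word 0x6), the lookup returns 1, which is the CPU that would then save the fixed-range MTRRs.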

[tip:x86/bsp-hotplug] x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI

2012-11-14 Thread tip-bot for Fenghua Yu

Commit-ID:  e1c467e69040c3be68959332959c07fb3d818e87
Gitweb: http://git.kernel.org/tip/e1c467e69040c3be68959332959c07fb3d818e87
Author: Fenghua Yu 
AuthorDate: Wed, 14 Nov 2012 04:36:53 -0800
Committer:  H. Peter Anvin 
CommitDate: Wed, 14 Nov 2012 15:28:03 -0800

x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI

Instead of waiting for STARTUP after the INITs, the BSP executes the BIOS
boot-strap code, which is not the desired behavior when waking up the BSP. To
avoid the boot-strap code, wake up CPU0 by NMI instead.

This works only to wake up a soft-offlined CPU0. If CPU0 is hard-offlined (i.e.
physically hot removed and then hot added), an NMI won't wake it up. We'll
change this code in the future to wake up a hard-offlined CPU0 once a real
platform and request are available.

APs are still woken up as before by the INIT, SIPI, SIPI sequence.

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1352896613-25957-1-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/cpu.h |   1 +
 arch/x86/kernel/smpboot.c  | 111 ++---
 2 files changed, 105 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 4564c8e..a119572 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -28,6 +28,7 @@ struct x86_cpu {
 #ifdef CONFIG_HOTPLUG_CPU
 extern int arch_register_cpu(int num);
 extern void arch_unregister_cpu(int);
+extern void __cpuinit start_cpu0(void);
 #endif
 
 DECLARE_PER_CPU(int, cpu_state);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c297907..ef53e66 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -138,15 +138,17 @@ static void __cpuinit smp_callin(void)
 * we may get here before an INIT-deassert IPI reaches
 * our local APIC.  We have to wait for the IPI or we'll
 * lock up on an APIC access.
+*
+* Since CPU0 is not wakened up by INIT, it doesn't wait for the IPI.
 */
-   if (apic->wait_for_init_deassert)
+   cpuid = smp_processor_id();
+   if (apic->wait_for_init_deassert && cpuid != 0)
apic->wait_for_init_deassert(&init_deasserted);
 
/*
 * (This works even if the APIC is not enabled.)
 */
phys_id = read_apic_id();
-   cpuid = smp_processor_id();
if (cpumask_test_cpu(cpuid, cpu_callin_mask)) {
panic("%s: phys CPU#%d, CPU#%d already present??\n", __func__,
phys_id, cpuid);
@@ -228,6 +230,8 @@ static void __cpuinit smp_callin(void)
cpumask_set_cpu(cpuid, cpu_callin_mask);
 }
 
+static int cpu0_logical_apicid;
+static int enable_start_cpu0;
 /*
  * Activate a secondary processor.
  */
@@ -243,6 +247,8 @@ notrace static void __cpuinit start_secondary(void *unused)
preempt_disable();
smp_callin();
 
+   enable_start_cpu0 = 0;
+
 #ifdef CONFIG_X86_32
/* switch away from the initial page table */
load_cr3(swapper_pg_dir);
@@ -492,7 +498,7 @@ void __inquire_remote_apic(int apicid)
  * won't ... remember to clear down the APIC, etc later.
  */
 int __cpuinit
-wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
+wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip)
 {
unsigned long send_status, accept_status = 0;
int maxlvt;
@@ -500,7 +506,7 @@ wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
/* Target chip */
/* Boot on the stack */
/* Kick the second */
-   apic_icr_write(APIC_DM_NMI | apic->dest_logical, logical_apicid);
+   apic_icr_write(APIC_DM_NMI | apic->dest_logical, apicid);
 
pr_debug("Waiting for send to finish...\n");
send_status = safe_apic_wait_icr_idle();
@@ -660,6 +666,63 @@ static void __cpuinit announce_cpu(int cpu, int apicid)
node, cpu, apicid);
 }
 
+static int wakeup_cpu0_nmi(unsigned int cmd, struct pt_regs *regs)
+{
+   int cpu;
+
+   cpu = smp_processor_id();
+   if (cpu == 0 && !cpu_online(cpu) && enable_start_cpu0)
+   return NMI_HANDLED;
+
+   return NMI_DONE;
+}
+
+/*
+ * Wake up AP by INIT, INIT, STARTUP sequence.
+ *
+ * Instead of waiting for STARTUP after INITs, BSP will execute the BIOS
+ * boot-strap code which is not a desired behavior for waking up BSP. To
+ * avoid the boot-strap code, wake up CPU0 by NMI instead.
+ *
+ * This works to wake up soft offlined CPU0 only. If CPU0 is hard offlined
+ * (i.e. physically hot removed and then hot added), NMI won't wake it up.
+ * We'll change this code in the future to wake up hard offlined CPU0 if
+ * real platform and request are available.
+ */
+static int __cpuinit
+wakeup_cpu_via_init_nmi(int cpu, unsigned long start_ip, int apicid,
+  int *cpu0_nmi_registered)
+{
+   int id;
+   int boot_error;
+
+   /*
+* Wake

[tip:x86/bsp-hotplug] kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback

2012-11-14 Thread tip-bot for Fenghua Yu

Commit-ID:  6e32d479db6079dd5d4309aa66aecbcf2664a5fe
Gitweb: http://git.kernel.org/tip/6e32d479db6079dd5d4309aa66aecbcf2664a5fe
Author: Fenghua Yu 
AuthorDate: Tue, 13 Nov 2012 11:32:43 -0800
Committer:  H. Peter Anvin 
CommitDate: Wed, 14 Nov 2012 09:39:50 -0800

kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback

cpu_hotplug_pm_callback should have a higher priority than bsp_pm_callback,
which depends on cpu_hotplug_pm_callback to disable CPU hotplug and so avoid a
race during the BSP online check.

This is to highlight the priorities of the two callbacks in case people
overlook the order.

Ideally the priorities should be defined in a macro/enum instead of as fixed
values. Doing that would need a separate patchset touching several other
generic files, which is out of the scope of this patchset.

Signed-off-by: Fenghua Yu 
Link: http://lkml.kernel.org/r/1352835171-3958-7-git-send-email-fenghua...@intel.com
Reviewed-by: Srivatsa S. Bhat 
Acked-by: Rafael J. Wysocki 
Signed-off-by: H. Peter Anvin 
---
 kernel/cpu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 42bd331..a2491a2 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -601,6 +601,11 @@ cpu_hotplug_pm_callback(struct notifier_block *nb,
 
 static int __init cpu_hotplug_pm_sync_init(void)
 {
+   /*
+* cpu_hotplug_pm_callback has higher priority than x86
+* bsp_pm_callback which depends on cpu_hotplug_pm_callback
+* to disable cpu hotplug to avoid cpu hotplug race.
+*/
pm_notifier(cpu_hotplug_pm_callback, 0);
return 0;
 }
--
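The new comment relies on notifier-chain ordering: callbacks on a chain run in descending priority order, so cpu_hotplug_pm_callback (registered with priority 0) runs before a lower-priority callback such as bsp_pm_callback. A hedged sketch of a priority-ordered singly linked chain in the same spirit follows. As in the kernel's notifier_chain_register(), a new block is inserted after existing blocks of equal priority. The struct and function names are hypothetical, and the priority -100 used in the example is arbitrary.

```c
#include <stddef.h>

struct sketch_nb {
	int priority;
	int id;                 /* identifies the callback in this sketch */
	struct sketch_nb *next;
};

/* Insert @n before the first block with a strictly lower priority,
 * so equal priorities keep registration order. */
void chain_register(struct sketch_nb **head, struct sketch_nb *n)
{
	struct sketch_nb **pos = head;

	while (*pos && n->priority <= (*pos)->priority)
		pos = &(*pos)->next;
	n->next = *pos;
	*pos = n;
}

/* Record callback ids in call order; returns how many were written. */
size_t chain_call_order(const struct sketch_nb *head, int *ids, size_t max)
{
	size_t n = 0;

	for (; head && n < max; head = head->next)
		ids[n++] = head->id;
	return n;
}
```

Registering a priority -100 block first and two priority 0 blocks afterwards still yields a call order with both priority 0 blocks first, which is the property the comment asks people not to break.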
