[tip:x86/urgent] x86/umwait: Fix error handling in umwait_init()
Commit-ID:  e7409258845a0f64967f8377e99294d438137537
Gitweb:     https://git.kernel.org/tip/e7409258845a0f64967f8377e99294d438137537
Author:     Fenghua Yu
AuthorDate: Fri, 9 Aug 2019 18:40:37 -0700
Committer:  Thomas Gleixner
CommitDate: Mon, 12 Aug 2019 14:51:13 +0200

x86/umwait: Fix error handling in umwait_init()

Currently, failure of cpuhp_setup_state() is ignored and the syscore ops
and the control interfaces can still be added even after the failure. But
this error handling causes a few issues:

1. The CPUs may have different values in the IA32_UMWAIT_CONTROL MSR
   because there is no way to roll back the control MSR on the CPUs which
   already set the MSR before the failure.

2. If the sysfs interface is added successfully, there will be a mismatch
   between the global control value and the control MSR:
   - The interface shows the default global control value. But the
     control MSR is not set to the value because the CPU online function,
     which is supposed to set the MSR to the value, is not installed.
   - If the sysadmin changes the global control value through the
     interface, the control MSR on all current online CPUs is set to the
     new value. But the control MSR on newly onlined CPUs after the value
     change will not be set to the new value due to lack of the CPU
     online function.

3. On resume from suspend/hibernation, the boot CPU restores the control
   MSR to the global control value through the syscore ops. But the
   control MSR on all APs is not set due to lack of the CPU online
   function.

To solve the issues and enforce consistent behavior on the failure of the
CPU hotplug setup, make the following changes:

1. Cache the original control MSR value which is configured by hardware
   or BIOS before kernel boot. This value is likely to be 0. But it could
   be a different number as well. Cache the control MSR only once before
   the MSR is changed.

2. Add the CPU offline function so that the MSR is restored to the
   original control value on all CPUs on the failure.

3. On the failure, exit from umwait_init() so that the syscore ops and
   the control interfaces are not added.

Reported-by: Valdis Kletnieks
Suggested-by: Thomas Gleixner
Signed-off-by: Fenghua Yu
Signed-off-by: Thomas Gleixner
Link: https://lkml.kernel.org/r/1565401237-60936-1-git-send-email-fenghua...@intel.com
---
 arch/x86/kernel/cpu/umwait.c | 39 ++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index 6a204e7336c1..32b4dc9030aa 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -17,6 +17,12 @@
  */
 static u32 umwait_control_cached = UMWAIT_CTRL_VAL(10, UMWAIT_C02_ENABLE);
 
+/*
+ * Cache the original IA32_UMWAIT_CONTROL MSR value which is configured by
+ * hardware or BIOS before kernel boot.
+ */
+static u32 orig_umwait_control_cached __ro_after_init;
+
 /*
  * Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR in
  * the sysfs write functions.
@@ -52,6 +58,23 @@ static int umwait_cpu_online(unsigned int cpu)
 	return 0;
 }
 
+/*
+ * The CPU hotplug callback sets the control MSR to the original control
+ * value.
+ */
+static int umwait_cpu_offline(unsigned int cpu)
+{
+	/*
+	 * This code is protected by the CPU hotplug already and
+	 * orig_umwait_control_cached is never changed after it caches
+	 * the original control MSR value in umwait_init(). So there
+	 * is no race condition here.
+	 */
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, orig_umwait_control_cached, 0);
+
+	return 0;
+}
+
 /*
  * On resume, restore IA32_UMWAIT_CONTROL MSR on the boot processor which
  * is the only active CPU at this time. The MSR is set up on the APs via the
@@ -185,8 +208,22 @@ static int __init umwait_init(void)
 	if (!boot_cpu_has(X86_FEATURE_WAITPKG))
 		return -ENODEV;
 
+	/*
+	 * Cache the original control MSR value before the control MSR is
+	 * changed. This is the only place where orig_umwait_control_cached
+	 * is modified.
+	 */
+	rdmsrl(MSR_IA32_UMWAIT_CONTROL, orig_umwait_control_cached);
+
 	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait:online",
-				umwait_cpu_online, NULL);
+				umwait_cpu_online, umwait_cpu_offline);
+	if (ret < 0) {
+		/*
+		 * On failure, the control MSR on all CPUs has the
+		 * original control value.
+		 */
+		return ret;
+	}
 
 	register_syscore_ops(&umwait_syscore_ops);
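
The rollback behavior this fix relies on can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the msr[] array stands in for the per-CPU IA32_UMWAIT_CONTROL MSR, and boot(), cpu_online(), cpu_offline() and setup_state() are hypothetical names that loosely imitate what cpuhp_setup_state() does with an online and an offline callback. It is not the kernel implementation.

```c
#include <stdint.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the per-CPU IA32_UMWAIT_CONTROL MSR. */
static uint32_t msr[NR_CPUS];
/* Value left by "hardware or BIOS", cached once before any change. */
static uint32_t orig_control;
/* Value the kernel wants on every CPU (illustrative number only). */
static uint32_t control_cached = 8;

/* Model of boot: every CPU starts with the firmware value. */
static void boot(uint32_t firmware_value)
{
	orig_control = firmware_value;
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		msr[cpu] = firmware_value;
}

/* Online callback; fails on CPU 'fail_on' to simulate a setup error. */
static int cpu_online(unsigned int cpu, int fail_on)
{
	if ((int)cpu == fail_on)
		return -1;
	msr[cpu] = control_cached;
	return 0;
}

/* Offline callback: restore the original control value. */
static void cpu_offline(unsigned int cpu)
{
	msr[cpu] = orig_control;
}

/*
 * Simplified model of the setup: run the online callback on each CPU and,
 * on failure, run the offline callback on the CPUs already set up.
 */
static int setup_state(int fail_on)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
		int ret = cpu_online(cpu, fail_on);

		if (ret < 0) {
			while (cpu-- > 0)
				cpu_offline(cpu);
			return ret;
		}
	}
	return 0;
}
```

In this model, calling boot(4) and then setup_state(2) returns an error and leaves every msr[] entry at 4 again, which matches the "original control value on all CPUs" state described in the changelog.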
[tip:x86/cpu] x86/umwait: Add sysfs interface to control umwait maximum time
Commit-ID:  bd9a0c97e53c3d7a56b2751179903ddc5da42683
Gitweb:     https://git.kernel.org/tip/bd9a0c97e53c3d7a56b2751179903ddc5da42683
Author:     Fenghua Yu
AuthorDate: Wed, 19 Jun 2019 18:33:57 -0700
Committer:  Thomas Gleixner
CommitDate: Mon, 24 Jun 2019 01:44:20 +0200

x86/umwait: Add sysfs interface to control umwait maximum time

IA32_UMWAIT_CONTROL[31:2] determines the maximum time in TSC-quanta
that the processor can stay in C0.1 or C0.2. A zero value means no
maximum time.

Each instruction sets its own deadline in the instruction's implicit
input EDX:EAX value. The instruction wakes up if the time-stamp counter
reaches or exceeds the specified deadline, or the umwait maximum time
expires, or a store happens in the monitored address range in umwait.

The administrator can write an unsigned 32-bit number to
/sys/devices/system/cpu/umwait_control/max_time to change the default
value. Note that a value of zero means there is no limit. The lower two
bits of the value must be zero.

[ tglx: Simplify the write function. Massage changelog ]

Signed-off-by: Fenghua Yu
Signed-off-by: Thomas Gleixner
Reviewed-by: Ashok Raj
Reviewed-by: Tony Luck
Cc: "Borislav Petkov"
Cc: "H Peter Anvin"
Cc: "Andy Lutomirski"
Cc: "Peter Zijlstra"
Cc: "Ravi V Shankar"
Link: https://lkml.kernel.org/r/1560994438-235698-5-git-send-email-fenghua...@intel.com
---
 arch/x86/kernel/cpu/umwait.c | 36
 1 file changed, 36 insertions(+)

diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index 56149d630e35..6a204e7336c1 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -131,8 +131,44 @@ static ssize_t enable_c02_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(enable_c02);
 
+static ssize_t
+max_time_show(struct device *kobj, struct device_attribute *attr, char *buf)
+{
+	u32 ctrl = READ_ONCE(umwait_control_cached);
+
+	return sprintf(buf, "%u\n", umwait_ctrl_max_time(ctrl));
+}
+
+static ssize_t max_time_store(struct device *kobj,
+			      struct device_attribute *attr,
+			      const char *buf, size_t count)
+{
+	u32 max_time, ctrl;
+	int ret;
+
+	ret = kstrtou32(buf, 0, &max_time);
+	if (ret)
+		return ret;
+
+	/* bits[1:0] must be zero */
+	if (max_time & ~MSR_IA32_UMWAIT_CONTROL_TIME_MASK)
+		return -EINVAL;
+
+	mutex_lock(&umwait_lock);
+
+	ctrl = READ_ONCE(umwait_control_cached);
+	if (max_time != umwait_ctrl_max_time(ctrl))
+		umwait_update_control(max_time, umwait_ctrl_c02_enabled(ctrl));
+
+	mutex_unlock(&umwait_lock);
+
+	return count;
+}
+static DEVICE_ATTR_RW(max_time);
+
 static struct attribute *umwait_attrs[] = {
 	&dev_attr_enable_c02.attr,
+	&dev_attr_max_time.attr,
 	NULL
 };
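
The "lower two bits must be zero" rule and the layout of the control value can be checked in user space. The following is a minimal sketch, assuming the bit layout stated in this series (bit 0 disables C0.2, bits 31:2 hold the time). CTRL_C02_DISABLE, CTRL_TIME_MASK, check_max_time(), make_ctrl() and the two accessors are illustrative names, not kernel symbols.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define CTRL_C02_DISABLE 0x1u      /* bit 0: C0.2 disabled when set */
#define CTRL_TIME_MASK   (~0x03u)  /* bits 31:2: maximum time */

/* Same test as in max_time_store(): reject values with bits[1:0] set. */
static int check_max_time(uint32_t max_time)
{
	return (max_time & ~CTRL_TIME_MASK) ? -EINVAL : 0;
}

/* Combine a maximum time and the C0.2 setting into one control value. */
static uint32_t make_ctrl(uint32_t max_time, bool c02_enable)
{
	uint32_t ctrl = max_time & CTRL_TIME_MASK;

	if (!c02_enable)
		ctrl |= CTRL_C02_DISABLE;
	return ctrl;
}

/* Extract the two fields again. */
static uint32_t ctrl_max_time(uint32_t ctrl)
{
	return ctrl & CTRL_TIME_MASK;
}

static bool ctrl_c02_enabled(uint32_t ctrl)
{
	return !(ctrl & CTRL_C02_DISABLE);
}
```

With these helpers, check_max_time(100) succeeds while check_max_time(10) returns -EINVAL, because 10 has bit 1 set. A value written through the sysfs file therefore has to be a multiple of four.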
[tip:x86/cpu] x86/cpufeatures: Enumerate user wait instructions
Commit-ID:  6dbbf5ec9e1e9f607a4c51266d0f9a63ba754b63
Gitweb:     https://git.kernel.org/tip/6dbbf5ec9e1e9f607a4c51266d0f9a63ba754b63
Author:     Fenghua Yu
AuthorDate: Wed, 19 Jun 2019 18:33:54 -0700
Committer:  Thomas Gleixner
CommitDate: Mon, 24 Jun 2019 01:44:19 +0200

x86/cpufeatures: Enumerate user wait instructions

umonitor, umwait, and tpause are a set of user wait instructions.

umonitor arms address monitoring hardware using an address. The address
range is determined by using CPUID.0x5. A store to an address within the
specified address range triggers the monitoring hardware to wake up the
processor waiting in umwait.

umwait instructs the processor to enter an implementation-dependent
optimized state while monitoring a range of addresses. The optimized
state may be either a light-weight power/performance optimized state
(C0.1 state) or an improved power/performance optimized state (C0.2
state).

tpause instructs the processor to enter an implementation-dependent
optimized state (C0.1 or C0.2) and wake up when the time-stamp counter
reaches the specified timeout.

The three instructions may be executed at any privilege level.

The instructions provide a power saving method while waiting in user
space. Additionally, they can allow a sibling hyperthread to make faster
progress while this thread is waiting. One example of an application
usage of umwait is when waiting for input data from another application,
such as a user level multi-threaded packet processing engine.

Availability of the user wait instructions is indicated by the presence
of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5].

Detailed information on the instructions and the CPUID feature flag
WAITPKG can be found in the latest Intel Architecture Instruction Set
Extensions and Future Features Programming Reference and Intel 64 and
IA-32 Architectures Software Developer's Manual.

Signed-off-by: Fenghua Yu
Signed-off-by: Thomas Gleixner
Reviewed-by: Ashok Raj
Reviewed-by: Andy Lutomirski
Cc: "Borislav Petkov"
Cc: "H Peter Anvin"
Cc: "Peter Zijlstra"
Cc: "Tony Luck"
Cc: "Ravi V Shankar"
Link: https://lkml.kernel.org/r/1560994438-235698-2-git-send-email-fenghua...@intel.com
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 8ecd9fac97c3..998c2cc08363 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -330,6 +330,7 @@
 #define X86_FEATURE_UMIP		(16*32+ 2) /* User Mode Instruction Protection */
 #define X86_FEATURE_PKU			(16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
+#define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
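
The (16*32+ 5) encoding packs a capability word index and a bit position into one number; word 16 holds CPUID.7.0:ECX, so bit 5 there corresponds to the WAITPKG flag named in the changelog. A minimal sketch of decoding such a value follows. FEATURE(), feature_word(), feature_bit() and test_feature() are illustrative helpers, not the kernel's cpu_has() machinery.

```c
#include <stdint.h>

/* Same arithmetic as the (word*32 + bit) definitions in cpufeatures.h. */
#define FEATURE(word, bit)	((word) * 32 + (bit))
#define X86_FEATURE_WAITPKG	FEATURE(16, 5)

/* Index of the 32-bit capability word that holds the flag. */
static unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Bit position of the flag inside that word. */
static unsigned int feature_bit(unsigned int feature)
{
	return feature % 32;
}

/* Test a flag in an array of capability words (hypothetical layout). */
static int test_feature(const uint32_t *caps, unsigned int feature)
{
	return (caps[feature_word(feature)] >> feature_bit(feature)) & 1;
}
```

If caps[16] is filled with the ECX output of CPUID leaf 7, subleaf 0, then test_feature(caps, X86_FEATURE_WAITPKG) reports whether the user wait instructions are available.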
[tip:x86/cpu] x86/umwait: Add sysfs interface to control umwait C0.2 state
Commit-ID:  ff4b353f2ef9dc8e396d7cb9572801e34a8c7374
Gitweb:     https://git.kernel.org/tip/ff4b353f2ef9dc8e396d7cb9572801e34a8c7374
Author:     Fenghua Yu
AuthorDate: Wed, 19 Jun 2019 18:33:56 -0700
Committer:  Thomas Gleixner
CommitDate: Mon, 24 Jun 2019 01:44:20 +0200

x86/umwait: Add sysfs interface to control umwait C0.2 state

C0.2 state in umwait and tpause instructions can be enabled or disabled
on a processor through the IA32_UMWAIT_CONTROL MSR.

By default, C0.2 is enabled and the user wait instructions result in
lower power consumption with slower wakeup time.

But in real time systems which require faster wakeup time, although power
savings could be smaller, the administrator needs to disable C0.2 so that
all umwait invocations from user applications use C0.1.

Create a sysfs interface which allows the administrator to control C0.2
state during run time.

Andy Lutomirski suggested to turn off local irqs before writing the MSR
to ensure the cached control value is not changed by a concurrent sysfs
write from a different CPU via IPI.

[ tglx: Simplified the update logic in the write function and got rid of
  all the convoluted type casts. Added a shared update function and made
  the namespace consistent. Moved the sysfs create invocation.
  Massaged changelog ]

Signed-off-by: Fenghua Yu
Signed-off-by: Thomas Gleixner
Reviewed-by: Ashok Raj
Reviewed-by: Tony Luck
Cc: "Borislav Petkov"
Cc: "H Peter Anvin"
Cc: "Andy Lutomirski"
Cc: "Peter Zijlstra"
Cc: "Ravi V Shankar"
Link: https://lkml.kernel.org/r/1560994438-235698-4-git-send-email-fenghua...@intel.com
---
 arch/x86/kernel/cpu/umwait.c | 118 ---
 1 file changed, 110 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index 0a113c731df3..56149d630e35 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -7,8 +7,8 @@
 
 #define UMWAIT_C02_ENABLE	0
 
-#define UMWAIT_CTRL_VAL(maxtime, c02_disable)				\
-	(((maxtime) & MSR_IA32_UMWAIT_CONTROL_TIME_MASK) |		\
+#define UMWAIT_CTRL_VAL(max_time, c02_disable)				\
+	(((max_time) & MSR_IA32_UMWAIT_CONTROL_TIME_MASK) |		\
 	((c02_disable) & MSR_IA32_UMWAIT_CONTROL_C02_DISABLE))
 
 /*
@@ -17,10 +17,38 @@
  */
 static u32 umwait_control_cached = UMWAIT_CTRL_VAL(10, UMWAIT_C02_ENABLE);
 
-/* Set IA32_UMWAIT_CONTROL MSR on this CPU to the current global setting. */
+/*
+ * Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR in
+ * the sysfs write functions.
+ */
+static DEFINE_MUTEX(umwait_lock);
+
+static void umwait_update_control_msr(void * unused)
+{
+	lockdep_assert_irqs_disabled();
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
+}
+
+/*
+ * The CPU hotplug callback sets the control MSR to the global control
+ * value.
+ *
+ * Disable interrupts so the read of umwait_control_cached and the WRMSR
+ * are protected against a concurrent sysfs write. Otherwise the sysfs
+ * write could update the cached value after it had been read on this CPU
+ * and issue the IPI before the old value had been written. The IPI would
+ * interrupt, write the new value and after return from IPI the previous
+ * value would be written by this CPU.
+ *
+ * With interrupts disabled the upcoming CPU either sees the new control
+ * value or the IPI is updating this CPU to the new control value after
+ * interrupts have been reenabled.
+ */
 static int umwait_cpu_online(unsigned int cpu)
 {
-	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+	local_irq_disable();
+	umwait_update_control_msr(NULL);
+	local_irq_enable();
 	return 0;
 }
 
@@ -36,15 +64,86 @@ static int umwait_cpu_online(unsigned int cpu)
  */
 static void umwait_syscore_resume(void)
 {
-	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+	umwait_update_control_msr(NULL);
 }
 
 static struct syscore_ops umwait_syscore_ops = {
 	.resume	= umwait_syscore_resume,
 };
 
+/* sysfs interface */
+
+/*
+ * When bit 0 in IA32_UMWAIT_CONTROL MSR is 1, C0.2 is disabled.
+ * Otherwise, C0.2 is enabled.
+ */
+static inline bool umwait_ctrl_c02_enabled(u32 ctrl)
+{
+	return !(ctrl & MSR_IA32_UMWAIT_CONTROL_C02_DISABLE);
+}
+
+static inline u32 umwait_ctrl_max_time(u32 ctrl)
+{
+	return ctrl & MSR_IA32_UMWAIT_CONTROL_TIME_MASK;
+}
+
+static inline void umwait_update_control(u32 maxtime, bool c02_enable)
+{
+	u32 ctrl = maxtime & MSR_IA32_UMWAIT_CONTROL_TIME_MASK;
+
+	if (!c02_enable)
+		ctrl |= MSR_IA32_UMWAIT_CONTROL_C02_DISABLE;
+
+	WRITE_ONCE(umwait_control_cached, ctrl);
+	/* Propagate to all CPUs */
+	on_each_cpu(umwait_update_control_msr, NULL, 1);
+}
+
+static ssize_t
+enable_c02_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	u32 ctrl =
[tip:x86/cpu] Documentation/ABI: Document umwait control sysfs interfaces
Commit-ID:  203dffacf592317e54480704f569a09f8b7ca380
Gitweb:     https://git.kernel.org/tip/203dffacf592317e54480704f569a09f8b7ca380
Author:     Fenghua Yu
AuthorDate: Wed, 19 Jun 2019 18:33:58 -0700
Committer:  Thomas Gleixner
CommitDate: Mon, 24 Jun 2019 01:44:35 +0200

Documentation/ABI: Document umwait control sysfs interfaces

Since two new sysfs interface files are created for umwait control, add
an ABI document entry for the files:

   /sys/devices/system/cpu/umwait_control/enable_c02
   /sys/devices/system/cpu/umwait_control/max_time

[ tglx: Made the write value instructions readable ]

Signed-off-by: Fenghua Yu
Signed-off-by: Thomas Gleixner
Reviewed-by: Ashok Raj
Cc: "Borislav Petkov"
Cc: "H Peter Anvin"
Cc: "Andy Lutomirski"
Cc: "Peter Zijlstra"
Cc: "Tony Luck"
Cc: "Ravi V Shankar"
Link: https://lkml.kernel.org/r/1560994438-235698-6-git-send-email-fenghua...@intel.com
---
 Documentation/ABI/testing/sysfs-devices-system-cpu | 23 ++
 1 file changed, 23 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 1528239f69b2..923fe2001472 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -538,3 +538,26 @@ Description:	Intel Energy and Performance Bias Hint (EPB)
 
 		This attribute is present for all online CPUs supporting the
 		Intel EPB feature.
+
+What:		/sys/devices/system/cpu/umwait_control
+		/sys/devices/system/cpu/umwait_control/enable_c02
+		/sys/devices/system/cpu/umwait_control/max_time
+Date:		May 2019
+Contact:	Linux kernel mailing list
+Description:	Umwait control
+
+		enable_c02: Read/write interface to control umwait C0.2 state
+			Read returns C0.2 state status:
+				0: C0.2 is disabled
+				1: C0.2 is enabled
+
+			Write 'y' or '1' or 'on' to enable C0.2 state.
+			Write 'n' or '0' or 'off' to disable C0.2 state.
+
+			The interface is case insensitive.
+
+		max_time: Read/write interface to control umwait maximum time
+			  in TSC-quanta that the CPU can reside in either C0.1
+			  or C0.2 state. The time is an unsigned 32-bit number.
+			  Note that a value of zero means there is no limit.
+			  Low order two bits must be zero.
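
The write values documented for enable_c02 can be summarized as a small validation routine. This is a hedged user-space sketch that accepts exactly the strings listed in the ABI entry, case insensitively. parse_enable_c02() is a hypothetical helper, and the kernel's own boolean string parsing may accept additional spellings.

```c
#include <errno.h>
#include <stddef.h>
#include <strings.h>	/* strcasecmp() (POSIX) */

/*
 * Map a documented enable_c02 write value to 1 (enable C0.2) or
 * 0 (disable C0.2). Returns -EINVAL for anything else.
 */
static int parse_enable_c02(const char *s)
{
	static const char *const on[]  = { "y", "1", "on" };
	static const char *const off[] = { "n", "0", "off" };

	for (size_t i = 0; i < sizeof(on) / sizeof(on[0]); i++) {
		if (strcasecmp(s, on[i]) == 0)
			return 1;
		if (strcasecmp(s, off[i]) == 0)
			return 0;
	}
	return -EINVAL;
}
```

A tool that writes to /sys/devices/system/cpu/umwait_control/enable_c02 could run such a check on user input first, so that a typo is reported before the file is opened.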
[tip:x86/cpu] x86/umwait: Initialize umwait control values
Commit-ID:  bd688c69b7e6693de3bd78f38fd63f7850c2711e
Gitweb:     https://git.kernel.org/tip/bd688c69b7e6693de3bd78f38fd63f7850c2711e
Author:     Fenghua Yu
AuthorDate: Wed, 19 Jun 2019 18:33:55 -0700
Committer:  Thomas Gleixner
CommitDate: Mon, 24 Jun 2019 01:44:19 +0200

x86/umwait: Initialize umwait control values

umwait or tpause allows the processor to enter a light-weight
power/performance optimized state (C0.1 state) or an improved
power/performance optimized state (C0.2 state) for a period specified by
the instruction or until the system time limit or until a store to the
monitored address range in umwait.

IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on
the processor and to set the maximum time the processor can reside in
C0.1 or C0.2.

By default C0.2 is enabled so the user wait instructions can enter the
C0.2 state to save more power with slower wakeup time.

Andy Lutomirski proposed to set the maximum umwait time to 10 cycles by
default. A quote from Andy:

 "What I want to avoid is the case where it works dramatically
  differently on NO_HZ_FULL systems as compared to everything else.
  Also, UMWAIT may behave a bit differently if the max timeout is hit,
  and I'd like that path to get exercised widely by making it happen
  even on default configs."

A sysfs interface to adjust the time and the C0.2 enablement is provided
in a follow up change.

[ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to
  MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as
  mask throughout the code.
  Massaged comments and changelog ]

Signed-off-by: Fenghua Yu
Signed-off-by: Thomas Gleixner
Reviewed-by: Ashok Raj
Reviewed-by: Andy Lutomirski
Cc: "Borislav Petkov"
Cc: "H Peter Anvin"
Cc: "Peter Zijlstra"
Cc: "Tony Luck"
Cc: "Ravi V Shankar"
Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua...@intel.com
---
 arch/x86/include/asm/msr-index.h |  9 ++
 arch/x86/kernel/cpu/Makefile     |  1 +
 arch/x86/kernel/cpu/umwait.c     | 62
 3 files changed, 72 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 979ef971cc78..6b4fc2788078 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -61,6 +61,15 @@
 #define MSR_PLATFORM_INFO_CPUID_FAULT_BIT	31
 #define MSR_PLATFORM_INFO_CPUID_FAULT		BIT_ULL(MSR_PLATFORM_INFO_CPUID_FAULT_BIT)
 
+#define MSR_IA32_UMWAIT_CONTROL			0xe1
+#define MSR_IA32_UMWAIT_CONTROL_C02_DISABLE	BIT(0)
+#define MSR_IA32_UMWAIT_CONTROL_RESERVED	BIT(1)
+/*
+ * The time field is bit[31:2], but representing a 32bit value with
+ * bit[1:0] zero.
+ */
+#define MSR_IA32_UMWAIT_CONTROL_TIME_MASK	(~0x03U)
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x00e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index a7d9a4cb3ab6..4b4eb06e117c 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -24,6 +24,7 @@ obj-y			+= match.o
 obj-y			+= bugs.o
 obj-y			+= aperfmperf.o
 obj-y			+= cpuid-deps.o
+obj-y			+= umwait.o
 
 obj-$(CONFIG_PROC_FS)	+= proc.o
 obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
new file mode 100644
index ..0a113c731df3
--- /dev/null
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+
+#include
+
+#define UMWAIT_C02_ENABLE	0
+
+#define UMWAIT_CTRL_VAL(maxtime, c02_disable)				\
+	(((maxtime) & MSR_IA32_UMWAIT_CONTROL_TIME_MASK) |		\
+	((c02_disable) & MSR_IA32_UMWAIT_CONTROL_C02_DISABLE))
+
+/*
+ * Cache IA32_UMWAIT_CONTROL MSR. This is a systemwide control. By default,
+ * umwait max time is 10 in TSC-quanta and C0.2 is enabled
+ */
+static u32 umwait_control_cached = UMWAIT_CTRL_VAL(10, UMWAIT_C02_ENABLE);
+
+/* Set IA32_UMWAIT_CONTROL MSR on this CPU to the current global setting. */
+static int umwait_cpu_online(unsigned int cpu)
+{
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+	return 0;
+}
+
+/*
+ * On resume, restore IA32_UMWAIT_CONTROL MSR on the boot processor which
+ * is the only active CPU at this time. The MSR is set up on the APs via the
+ * CPU hotplug callback.
+ *
+ * This function is invoked on resume from suspend and hibernation. On
+ * resume from suspend the restore should be not required, but we neither
+ * trust the firmware nor does it matter if the same value is written
+ * again.
+ */
+static void umwait_syscore_resume(void)
+{
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+}
+
+static struct syscore_ops
[tip:x86/cpu] x86/cpufeatures: Enumerate the new AVX512 BFLOAT16 instructions
Commit-ID:  b302e4b176d00e1cbc80148c5d0aee36751f7480
Gitweb:     https://git.kernel.org/tip/b302e4b176d00e1cbc80148c5d0aee36751f7480
Author:     Fenghua Yu
AuthorDate: Mon, 17 Jun 2019 11:00:16 -0700
Committer:  Borislav Petkov
CommitDate: Thu, 20 Jun 2019 12:38:49 +0200

x86/cpufeatures: Enumerate the new AVX512 BFLOAT16 instructions

AVX512 BFLOAT16 instructions support the 16-bit BFLOAT16 floating-point
format (BF16) for deep learning optimization.

BF16 is a short version of the 32-bit single-precision floating-point
format (FP32) and has several advantages over the 16-bit half-precision
floating-point format (FP16). BF16 keeps FP32 accumulation after
multiplication without loss of precision, offers more than enough range
for deep learning training tasks, and doesn't need to handle hardware
exceptions.

AVX512 BFLOAT16 instructions are enumerated in CPUID.7.1:EAX[bit 5]
AVX512_BF16.

CPUID.7.1:EAX contains only feature bits. Reuse the currently empty word
12 as a pure features word to hold the feature bits including
AVX512_BF16.

Detailed information of the CPUID bit and AVX512 BFLOAT16 instructions
can be found in the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference.

[ bp: Check CPUID(7) subleaf validity before accessing subleaf 1. ]

Signed-off-by: Fenghua Yu
Signed-off-by: Borislav Petkov
Cc: "Chang S. Bae"
Cc: Frederic Weisbecker
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jann Horn
Cc: Masahiro Yamada
Cc: Michael Ellerman
Cc: Nadav Amit
Cc: Paolo Bonzini
Cc: Pavel Tatashin
Cc: Peter Feiner
Cc: Radim Krcmar
Cc: "Rafael J. Wysocki"
Cc: "Ravi V Shankar"
Cc: Robert Hoo
Cc: "Sean J Christopherson"
Cc: Thomas Gleixner
Cc: Thomas Lendacky
Cc: x86
Link: https://lkml.kernel.org/r/1560794416-217638-3-git-send-email-fenghua...@intel.com
---
 arch/x86/include/asm/cpufeature.h  | 2 +-
 arch/x86/include/asm/cpufeatures.h | 3 +++
 arch/x86/kernel/cpu/common.c       | 6 ++
 arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 403f70c2e431..58acda503817 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -23,7 +23,7 @@ enum cpuid_leafs
 	CPUID_7_0_EBX,
 	CPUID_D_1_EAX,
 	CPUID_LNX_4,
-	CPUID_DUMMY,
+	CPUID_7_1_EAX,
 	CPUID_8000_0008_EBX,
 	CPUID_6_EAX,
 	CPUID_8000_000A_EDX,
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index be858b86023a..8ecd9fac97c3 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -282,6 +282,9 @@
 #define X86_FEATURE_CQM_MBM_TOTAL	(11*32+ 2) /* LLC Total MBM monitoring */
 #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 
+/* Intel-defined CPU features, CPUID level 0x0007:1 (EAX), word 12 */
+#define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
+
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
 #define X86_FEATURE_IRPERF		(13*32+ 1) /* Instructions Retired Count */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index efb114298cfb..dad20bc891d5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -847,6 +847,12 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 		c->x86_capability[CPUID_7_0_EBX] = ebx;
 		c->x86_capability[CPUID_7_ECX] = ecx;
 		c->x86_capability[CPUID_7_EDX] = edx;
+
+		/* Check valid sub-leaf index before accessing it */
+		if (eax >= 1) {
+			cpuid_count(0x0007, 1, &eax, &ebx, &ecx, &edx);
+			c->x86_capability[CPUID_7_1_EAX] = eax;
+		}
 	}
 
 	/* Extended state features: level 0x000d */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index fa07a224e7b9..a444028d8145 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -62,6 +62,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_CQM_OCCUP_LLC,	X86_FEATURE_CQM_LLC },
 	{ X86_FEATURE_CQM_MBM_TOTAL,	X86_FEATURE_CQM_LLC },
 	{ X86_FEATURE_CQM_MBM_LOCAL,	X86_FEATURE_CQM_LLC },
+	{ X86_FEATURE_AVX512_BF16,	X86_FEATURE_AVX512VL },
 	{}
 };
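
The statement that BF16 is a short version of FP32 can be made concrete: a BF16 value has the same sign and exponent bits as FP32 and keeps only the top 7 mantissa bits, so it is the upper 16 bits of the FP32 encoding. The sketch below converts by plain truncation, which is a simplification chosen for illustration; real conversion instructions apply rounding. fp32_to_bf16_trunc() and bf16_to_fp32() are hypothetical helper names, and the code assumes float is IEEE 754 binary32.

```c
#include <stdint.h>
#include <string.h>

/* Keep the upper 16 bits of the FP32 encoding (truncation, no rounding). */
static uint16_t fp32_to_bf16_trunc(float f)
{
	uint32_t bits;

	memcpy(&bits, &f, sizeof(bits));
	return (uint16_t)(bits >> 16);
}

/* Widen BF16 back to FP32 by appending 16 zero mantissa bits. */
static float bf16_to_fp32(uint16_t h)
{
	uint32_t bits = (uint32_t)h << 16;
	float f;

	memcpy(&f, &bits, sizeof(f));
	return f;
}
```

Because the exponent field is unchanged, widening a BF16 value back to FP32 is exact, which is why the range matches FP32 while the precision is reduced.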
[tip:x86/cpu] x86/cpufeatures: Combine word 11 and 12 into a new scattered features word
Commit-ID:  acec0ce081de0c36459eea91647faf99296445a3
Gitweb:     https://git.kernel.org/tip/acec0ce081de0c36459eea91647faf99296445a3
Author:     Fenghua Yu
AuthorDate: Wed, 19 Jun 2019 18:51:09 +0200
Committer:  Borislav Petkov
CommitDate: Thu, 20 Jun 2019 12:38:44 +0200

x86/cpufeatures: Combine word 11 and 12 into a new scattered features word

It's a waste for the four X86_FEATURE_CQM_* feature bits to occupy two
whole feature bits words. To better utilize feature words, re-define
word 11 to host scattered features and move the four X86_FEATURE_CQM_*
features into Linux defined word 11. More scattered features can be
added in word 11 in the future.

Rename leaf 11 in cpuid_leafs to CPUID_LNX_4 to reflect it's a
Linux-defined leaf.

Rename leaf 12 as CPUID_DUMMY which will be replaced by a meaningful
name in the next patch when CPUID.7.1:EAX occupies word 12.

Maximum number of RMID and cache occupancy scale are retrieved from
CPUID.0xf.1 after scattered CQM features are enumerated. Carve out the
code into a separate function.

KVM doesn't support resctrl now. So it's safe to move the
X86_FEATURE_CQM_* features to scattered features word 11 for KVM.

Signed-off-by: Fenghua Yu
Signed-off-by: Borislav Petkov
Cc: Aaron Lewis
Cc: Andy Lutomirski
Cc: Babu Moger
Cc: "Chang S. Bae"
Cc: "Sean J Christopherson"
Cc: Frederic Weisbecker
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jann Horn
Cc: Juergen Gross
Cc: Konrad Rzeszutek Wilk
Cc: kvm ML
Cc: Masahiro Yamada
Cc: Masami Hiramatsu
Cc: Nadav Amit
Cc: Paolo Bonzini
Cc: Pavel Tatashin
Cc: Peter Feiner
Cc: "Peter Zijlstra (Intel)"
Cc: "Radim Krčmář"
Cc: "Rafael J. Wysocki"
Cc: Ravi V Shankar
Cc: Sherry Hurwitz
Cc: Thomas Gleixner
Cc: Thomas Lendacky
Cc: x86
Link: https://lkml.kernel.org/r/1560794416-217638-2-git-send-email-fenghua...@intel.com
---
 arch/x86/include/asm/cpufeature.h  |  4 ++--
 arch/x86/include/asm/cpufeatures.h | 17 ++---
 arch/x86/kernel/cpu/common.c       | 38 +++---
 arch/x86/kernel/cpu/cpuid-deps.c   |  3 +++
 arch/x86/kernel/cpu/scattered.c    |  4
 arch/x86/kvm/cpuid.h               |  2 --
 6 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 1d337c51f7e6..403f70c2e431 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -22,8 +22,8 @@ enum cpuid_leafs
 	CPUID_LNX_3,
 	CPUID_7_0_EBX,
 	CPUID_D_1_EAX,
-	CPUID_F_0_EDX,
-	CPUID_F_1_EDX,
+	CPUID_LNX_4,
+	CPUID_DUMMY,
 	CPUID_8000_0008_EBX,
 	CPUID_6_EAX,
 	CPUID_8000_000A_EDX,
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 1017b9c7dfe0..be858b86023a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -271,13 +271,16 @@
 #define X86_FEATURE_XGETBV1		(10*32+ 2) /* XGETBV with ECX = 1 instruction */
 #define X86_FEATURE_XSAVES		(10*32+ 3) /* XSAVES/XRSTORS instructions */
 
-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x000F:0 (EDX), word 11 */
-#define X86_FEATURE_CQM_LLC		(11*32+ 1) /* LLC QoS if 1 */
-
-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x000F:1 (EDX), word 12 */
-#define X86_FEATURE_CQM_OCCUP_LLC	(12*32+ 0) /* LLC occupancy monitoring */
-#define X86_FEATURE_CQM_MBM_TOTAL	(12*32+ 1) /* LLC Total MBM monitoring */
-#define X86_FEATURE_CQM_MBM_LOCAL	(12*32+ 2) /* LLC Local MBM monitoring */
+/*
+ * Extended auxiliary flags: Linux defined - for features scattered in various
+ * CPUID levels like 0xf, etc.
+ *
+ * Reuse free bits when adding new feature flags!
+ */
+#define X86_FEATURE_CQM_LLC		(11*32+ 0) /* LLC QoS if 1 */
+#define X86_FEATURE_CQM_OCCUP_LLC	(11*32+ 1) /* LLC occupancy monitoring */
+#define X86_FEATURE_CQM_MBM_TOTAL	(11*32+ 2) /* LLC Total MBM monitoring */
+#define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index fe6ed9696467..efb114298cfb 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -803,33 +803,25 @@ static void init_speculation_control(struct cpuinfo_x86 *c)
 
 static void init_cqm(struct cpuinfo_x86 *c)
 {
-	u32 eax, ebx, ecx, edx;
-
-	/* Additional Intel-defined flags: level 0x000F */
-	if (c->cpuid_level >= 0x000F) {
+	if (!cpu_has(c, X86_FEATURE_CQM_LLC)) {
+		c->x86_cache_max_rmid = -1;
+		c->x86_cache_occ_scale = -1;
+		return;
+	}
 
-		/* QoS sub-leaf, EAX=0Fh, ECX=0 */
-		cpuid_count(0x000F, 0, &eax, &ebx, &ecx, &edx);
-		c->x86_c
[tip:x86/urgent] x86/cpufeatures: Enumerate MOVDIR64B instruction
Commit-ID:  ace6485a03266cc3c198ce8e927a1ce0ce139699
Gitweb:     https://git.kernel.org/tip/ace6485a03266cc3c198ce8e927a1ce0ce139699
Author:     Fenghua Yu
AuthorDate: Wed, 24 Oct 2018 14:57:17 -0700
Committer:  Ingo Molnar
CommitDate: Thu, 25 Oct 2018 07:42:48 +0200

x86/cpufeatures: Enumerate MOVDIR64B instruction

MOVDIR64B moves 64 bytes as a direct store with 64-byte write atomicity.
Direct store is implemented by using write combining (WC) for writing
data directly into memory without caching the data.

In low latency offload (e.g. Non-Volatile Memory, etc), MOVDIR64B writes
work descriptors (and data in some cases) to device-hosted work-queues
atomically without cache pollution.

Availability of the MOVDIR64B instruction is indicated by the presence
of the CPUID feature flag MOVDIR64B (CPUID.0x07.0x0:ECX[bit 28]).

Please check the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference for more details on the CPUID
feature flag MOVDIR64B.

Signed-off-by: Fenghua Yu
Cc: Andy Lutomirski
Cc: Ashok Raj
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Ravi V Shankar
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1540418237-125817-3-git-send-email-fenghua...@intel.com
Signed-off-by: Ingo Molnar
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 90934ee7b79a..28c4a502b419 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -332,6 +332,7 @@
 #define X86_FEATURE_RDPID		(16*32+22) /* RDPID instruction */
 #define X86_FEATURE_CLDEMOTE		(16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI		(16*32+27) /* MOVDIRI instruction */
+#define X86_FEATURE_MOVDIR64B		(16*32+28) /* MOVDIR64B instruction */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV	(17*32+ 0) /* MCA overflow recovery support */
[tip:x86/urgent] x86/cpufeatures: Enumerate MOVDIRI instruction
Commit-ID: 33823f4d63f7a010653d219800539409a78ef4be Gitweb: https://git.kernel.org/tip/33823f4d63f7a010653d219800539409a78ef4be Author: Fenghua Yu AuthorDate: Wed, 24 Oct 2018 14:57:16 -0700 Committer: Ingo Molnar CommitDate: Thu, 25 Oct 2018 07:42:48 +0200 x86/cpufeatures: Enumerate MOVDIRI instruction MOVDIRI moves doubleword or quadword from register to memory through direct store which is implemented by using write combining (WC) for writing data directly into memory without caching the data. Programmable agents can handle streaming offload (e.g. high speed packet processing in network). Hardware implements a doorbell (tail pointer) register that is updated by software when adding new work-elements to the streaming offload work-queue. MOVDIRI can be used as the doorbell write which is a 4-byte or 8-byte uncacheable write to MMIO. MOVDIRI has lower overhead than other ways to write the doorbell. Availability of the MOVDIRI instruction is indicated by the presence of the CPUID feature flag MOVDIRI (CPUID.0x07.0x0:ECX[bit 27]). Please check the latest Intel Architecture Instruction Set Extensions and Future Features Programming Reference for more details on the CPUID feature MOVDIRI flag. Signed-off-by: Fenghua Yu Cc: Andy Lutomirski Cc: Ashok Raj Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. 
Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Ravi V Shankar Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1540418237-125817-2-git-send-email-fenghua...@intel.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/cpufeatures.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 89a048c2faec..90934ee7b79a 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -331,6 +331,7 @@ #define X86_FEATURE_LA57 (16*32+16) /* 5-level page tables */ #define X86_FEATURE_RDPID (16*32+22) /* RDPID instruction */ #define X86_FEATURE_CLDEMOTE (16*32+25) /* CLDEMOTE instruction */ +#define X86_FEATURE_MOVDIRI (16*32+27) /* MOVDIRI instruction */ /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */ #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */
[tip:x86/urgent] x86/intel_rdt: Add Reinette as co-maintainer for RDT
Commit-ID: a8b3bb338e4ee4cc84a2b9a6fdf27049b84baa59 Gitweb: https://git.kernel.org/tip/a8b3bb338e4ee4cc84a2b9a6fdf27049b84baa59 Author: Fenghua Yu AuthorDate: Thu, 20 Sep 2018 12:37:08 -0700 Committer: Thomas Gleixner CommitDate: Thu, 20 Sep 2018 21:44:35 +0200 x86/intel_rdt: Add Reinette as co-maintainer for RDT Reinette Chatre is doing a great job on enabling pseudo-locking and other features in RDT. Add her as co-maintainer for RDT. Suggested-by: Thomas Gleixner Signed-off-by: Fenghua Yu Signed-off-by: Thomas Gleixner Acked-by: Ingo Molnar Acked-by: Reinette Chatre Cc: "H Peter Anvin" Cc: "Tony Luck" Link: https://lkml.kernel.org/r/1537472228-221799-1-git-send-email-fenghua...@intel.com --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index 091e66b60cd2..140ea6ee3ac8 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -12260,6 +12260,7 @@ F: Documentation/networking/rds.txt RDT - RESOURCE ALLOCATION M: Fenghua Yu +M: Reinette Chatre L: linux-kernel@vger.kernel.org S: Supported F: arch/x86/kernel/cpu/intel_rdt*
[tip:x86/urgent] x86/cpufeatures: Enumerate cldemote instruction
Commit-ID: 9124130573950dcfc06b6a59306edfda2fc33ec7 Gitweb: https://git.kernel.org/tip/9124130573950dcfc06b6a59306edfda2fc33ec7 Author: Fenghua Yu AuthorDate: Mon, 23 Apr 2018 11:29:22 -0700 Committer: Ingo Molnar CommitDate: Thu, 26 Apr 2018 07:31:12 +0200 x86/cpufeatures: Enumerate cldemote instruction cldemote is a new instruction in future x86 processors. It hints to hardware that a specified cache line should be moved ("demoted") from the cache(s) closest to the processor core to a level more distant from the processor core. This instruction is faster than snooping to make the cache line available for other cores. cldemote instruction is indicated by the presence of the CPUID feature flag CLDEMOTE (CPUID.(EAX=0x7, ECX=0):ECX[bit25]). More details on cldemote instruction can be found in the latest Intel Architecture Instruction Set Extensions and Future Features Programming Reference. Signed-off-by: Fenghua Yu Signed-off-by: Thomas Gleixner Cc: "Ravi V Shankar" Cc: "H. Peter Anvin" Cc: "Ashok Raj" Link: https://lkml.kernel.org/r/1524508162-192587-1-git-send-email-fenghua...@intel.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/cpufeatures.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index d554c11e01ff..578793e97431 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -320,6 +320,7 @@ #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */ #define X86_FEATURE_LA57 (16*32+16) /* 5-level page tables */ #define X86_FEATURE_RDPID (16*32+22) /* RDPID instruction */ +#define X86_FEATURE_CLDEMOTE (16*32+25) /* CLDEMOTE instruction */ /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */ #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */
[tip:x86/urgent] x86/cpufeatures: Enumerate cldemote instruction
Commit-ID: ec8c7206b71d46ee50a23697933dfafec8d5c426 Gitweb: https://git.kernel.org/tip/ec8c7206b71d46ee50a23697933dfafec8d5c426 Author: Fenghua Yu AuthorDate: Mon, 23 Apr 2018 11:29:22 -0700 Committer: Thomas Gleixner CommitDate: Wed, 25 Apr 2018 10:56:24 +0200 x86/cpufeatures: Enumerate cldemote instruction cldemote is a new instruction in future x86 processors. It hints to hardware that a specified cache line should be moved ("demoted") from the cache(s) closest to the processor core to a level more distant from the processor core. This instruction is faster than snooping to make the cache line available for other cores. cldemote instruction is indicated by the presence of the CPUID feature flag CLDEMOTE (CPUID.(EAX=0x7, ECX=0):ECX[bit25]). More details on cldemote instruction can be found in the latest Intel Architecture Instruction Set Extensions and Future Features Programming Reference. Signed-off-by: Fenghua Yu Signed-off-by: Thomas Gleixner Cc: "Ravi V Shankar" Cc: "H. Peter Anvin" Cc: "Ashok Raj" Link: https://lkml.kernel.org/r/1524508162-192587-1-git-send-email-fenghua...@intel.com --- arch/x86/include/asm/cpufeatures.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index d554c11e01ff..578793e97431 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -320,6 +320,7 @@ #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */ #define X86_FEATURE_LA57 (16*32+16) /* 5-level page tables */ #define X86_FEATURE_RDPID (16*32+22) /* RDPID instruction */ +#define X86_FEATURE_CLDEMOTE (16*32+25) /* CLDEMOTE instruction */ /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */ #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */
[tip:x86/cache] x86/intel_rdt: Enumerate L2 Code and Data Prioritization (CDP) feature
Commit-ID: a511e7935378ef1f321456a90beae2a2632d3d83 Gitweb: https://git.kernel.org/tip/a511e7935378ef1f321456a90beae2a2632d3d83 Author: Fenghua Yu AuthorDate: Wed, 20 Dec 2017 14:57:21 -0800 Committer: Thomas Gleixner CommitDate: Thu, 18 Jan 2018 09:33:30 +0100 x86/intel_rdt: Enumerate L2 Code and Data Prioritization (CDP) feature L2 Code and Data Prioritization (CDP) is enumerated in CPUID(EAX=0x10, ECX=0x2):ECX.bit2 Signed-off-by: Fenghua Yu Signed-off-by: Thomas Gleixner Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: Vikas" Cc: Sai Praneeth" Cc: Reinette" Link: https://lkml.kernel.org/r/1513810644-78015-4-git-send-email-fenghua...@intel.com --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kernel/cpu/scattered.c| 1 + 2 files changed, 2 insertions(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 25b9375..67bbfaa 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -206,6 +206,7 @@ #define X86_FEATURE_RETPOLINE ( 7*32+12) /* Generic Retpoline mitigation for Spectre variant 2 */ #define X86_FEATURE_RETPOLINE_AMD ( 7*32+13) /* AMD Retpoline mitigation for Spectre variant 2 */ #define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */ +#define X86_FEATURE_CDP_L2 ( 7*32+15) /* Code and Data Prioritization L2 */ #define X86_FEATURE_AVX512_4VNNIW ( 7*32+16) /* AVX-512 Neural Network Instructions */ #define X86_FEATURE_AVX512_4FMAPS ( 7*32+17) /* AVX-512 Multiply Accumulation Single precision */ diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c index d0e6976..df4d8f7 100644 --- a/arch/x86/kernel/cpu/scattered.c +++ b/arch/x86/kernel/cpu/scattered.c @@ -26,6 +26,7 @@ static const struct cpuid_bit cpuid_bits[] = { { X86_FEATURE_CAT_L3, CPUID_EBX, 1, 0x00000010, 0 }, { X86_FEATURE_CAT_L2, CPUID_EBX, 2, 0x00000010, 0 }, { X86_FEATURE_CDP_L3, CPUID_ECX, 2, 0x00000010, 1 }, + { X86_FEATURE_CDP_L2, CPUID_ECX, 2, 0x00000010, 2 }, { X86_FEATURE_MBA, CPUID_EBX, 3, 
0x00000010, 0 }, { X86_FEATURE_HW_PSTATE,CPUID_EDX, 7, 0x80000007, 0 }, { X86_FEATURE_CPB, CPUID_EDX, 9, 0x80000007, 0 },
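The table rows in the scattered.c hunk above read as (feature, register, bit, CPUID leaf, sub-leaf). A rough user-space sketch of how such a table can drive feature detection is given below. The struct layout follows the column order seen in the diff; the DEMO_* feature IDs, the callback type and the fake CPUID reader are illustrative assumptions, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum cpuid_reg { CPUID_EAX, CPUID_EBX, CPUID_ECX, CPUID_EDX };

/* Illustrative IDs; the kernel uses X86_FEATURE_* values here. */
enum demo_feature { DEMO_CAT_L3, DEMO_CAT_L2, DEMO_CDP_L3, DEMO_CDP_L2, DEMO_MBA };

struct cpuid_bit {
	int feature;
	enum cpuid_reg reg;
	unsigned int bit;
	uint32_t level;
	uint32_t sub_leaf;
};

/* Rows mirror the RDT entries of the hunk above (CPUID leaf 0x10). */
static const struct cpuid_bit cpuid_bits[] = {
	{ DEMO_CAT_L3, CPUID_EBX, 1, 0x10, 0 },
	{ DEMO_CAT_L2, CPUID_EBX, 2, 0x10, 0 },
	{ DEMO_CDP_L3, CPUID_ECX, 2, 0x10, 1 },
	{ DEMO_CDP_L2, CPUID_ECX, 2, 0x10, 2 },
	{ DEMO_MBA,    CPUID_EBX, 3, 0x10, 0 },
};

/* Caller-supplied reader: fills regs[] (EAX..EDX) for a leaf/sub-leaf. */
typedef void (*cpuid_fn)(uint32_t leaf, uint32_t sub_leaf, uint32_t regs[4]);

static bool scattered_feature_present(int feature, cpuid_fn read_cpuid)
{
	assert(read_cpuid != NULL);

	for (size_t i = 0; i < sizeof(cpuid_bits) / sizeof(cpuid_bits[0]); i++) {
		const struct cpuid_bit *cb = &cpuid_bits[i];
		uint32_t regs[4] = { 0, 0, 0, 0 };

		if (cb->feature != feature)
			continue;
		read_cpuid(cb->level, cb->sub_leaf, regs);
		return (regs[cb->reg] >> cb->bit) & 1u;
	}
	return false;
}

/* Fake CPUID for demonstration: reports only L2 CDP. */
static void fake_cpuid(uint32_t leaf, uint32_t sub_leaf, uint32_t regs[4])
{
	regs[0] = regs[1] = regs[2] = regs[3] = 0;
	if (leaf == 0x10 && sub_leaf == 2)
		regs[CPUID_ECX] = 1u << 2;
}
```

Note that L3 CDP and L2 CDP share the same register and bit and differ only in the sub-leaf (1 versus 2), which is why the new row looks almost identical to the one above it.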
[tip:x86/cache] x86/intel_rdt: Add L2CDP support in documentation
Commit-ID: aa55d5a4bd919f26fce519c470d11a58541c6aa7 Gitweb: https://git.kernel.org/tip/aa55d5a4bd919f26fce519c470d11a58541c6aa7 Author: Fenghua Yu AuthorDate: Wed, 20 Dec 2017 14:57:20 -0800 Committer: Thomas Gleixner CommitDate: Thu, 18 Jan 2018 09:33:30 +0100 x86/intel_rdt: Add L2CDP support in documentation L2 and L3 Code and Data Prioritization (CDP) can be enabled separately. The existing mount parameter "cdp" is only for enabling L3 CDP and will be kept for backwards compatibility. Add a new mount parameter 'cdpl2' for L2 CDP. [ tglx: Made changelog readable ] Signed-off-by: Fenghua Yu Signed-off-by: Thomas Gleixner Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: Vikas" Cc: Sai Praneeth" Cc: Reinette" Link: https://lkml.kernel.org/r/1513810644-78015-3-git-send-email-fenghua...@intel.com --- Documentation/x86/intel_rdt_ui.txt | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt index 1ad77b1..756fd76 100644 --- a/Documentation/x86/intel_rdt_ui.txt +++ b/Documentation/x86/intel_rdt_ui.txt @@ -10,18 +10,21 @@ This feature is enabled by the CONFIG_INTEL_RDT Kconfig and the X86 /proc/cpuinfo flag bits: RDT (Resource Director Technology) Allocation - "rdt_a" CAT (Cache Allocation Technology) - "cat_l3", "cat_l2" -CDP (Code and Data Prioritization ) - "cdp_l3" +CDP (Code and Data Prioritization ) - "cdp_l3", "cdp_l2" CQM (Cache QoS Monitoring) - "cqm_llc", "cqm_occup_llc" MBM (Memory Bandwidth Monitoring) - "cqm_mbm_total", "cqm_mbm_local" MBA (Memory Bandwidth Allocation) - "mba" To use the feature mount the file system: - # mount -t resctrl resctrl [-o cdp] /sys/fs/resctrl + # mount -t resctrl resctrl [-o cdp[,cdpl2]] /sys/fs/resctrl mount options are: "cdp": Enable code/data prioritization in L3 cache allocations. +"cdpl2": Enable code/data prioritization in L2 cache allocations. + +L2 and L3 CDP are controlled seperately. RDT features are orthogonal. 
A particular system may support only monitoring, only control, or both monitoring and control.
[tip:x86/cache] x86/intel_rdt: Enable L2 CDP in MSR IA32_L2_QOS_CFG
Commit-ID: 99adde9b370de8e07ef76630c6f60dbf586cdf0e Gitweb: https://git.kernel.org/tip/99adde9b370de8e07ef76630c6f60dbf586cdf0e Author: Fenghua Yu AuthorDate: Wed, 20 Dec 2017 14:57:23 -0800 Committer: Thomas Gleixner CommitDate: Thu, 18 Jan 2018 09:33:31 +0100 x86/intel_rdt: Enable L2 CDP in MSR IA32_L2_QOS_CFG Bit 0 in MSR IA32_L2_QOS_CFG (0xc82) is L2 CDP enable bit. By default, the bit is zero, i.e. L2 CAT is enabled, and L2 CDP is disabled. When the resctrl mount parameter "cdpl2" is given, the bit is set to 1 and L2 CDP is enabled. In L2 CDP mode, the L2 CAT mask MSRs are re-mapped into interleaved pairs of mask MSRs for code (referenced by an odd CLOSID) and data (referenced by an even CLOSID). Signed-off-by: Fenghua Yu Signed-off-by: Thomas Gleixner Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: Vikas" Cc: Sai Praneeth" Cc: Reinette" Link: https://lkml.kernel.org/r/1513810644-78015-6-git-send-email-fenghua...@intel.com --- arch/x86/kernel/cpu/intel_rdt.h | 3 + arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 117 --- 2 files changed, 94 insertions(+), 26 deletions(-) diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h index 19ffc5a..3fd7a70 100644 --- a/arch/x86/kernel/cpu/intel_rdt.h +++ b/arch/x86/kernel/cpu/intel_rdt.h @@ -7,12 +7,15 @@ #include #define IA32_L3_QOS_CFG0xc81 +#define IA32_L2_QOS_CFG0xc82 #define IA32_L3_CBM_BASE 0xc90 #define IA32_L2_CBM_BASE 0xd10 #define IA32_MBA_THRTL_BASE0xd50 #define L3_QOS_CDP_ENABLE 0x01ULL +#define L2_QOS_CDP_ENABLE 0x01ULL + /* * Event IDs are used to program IA32_QM_EVTSEL before reading event * counter from IA32_QM_CTR diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c index 64c5ff9..bdab7d2 100644 --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c @@ -990,6 +990,7 @@ out_destroy: kernfs_remove(kn); return ret; } + static void l3_qos_cfg_update(void *arg) { bool *enable = arg; @@ -997,8 +998,17 @@ static void 
l3_qos_cfg_update(void *arg) wrmsrl(IA32_L3_QOS_CFG, *enable ? L3_QOS_CDP_ENABLE : 0ULL); } -static int set_l3_qos_cfg(struct rdt_resource *r, bool enable) +static void l2_qos_cfg_update(void *arg) { + bool *enable = arg; + + wrmsrl(IA32_L2_QOS_CFG, *enable ? L2_QOS_CDP_ENABLE : 0ULL); +} + +static int set_cache_qos_cfg(int level, bool enable) +{ + void (*update)(void *arg); + struct rdt_resource *r_l; cpumask_var_t cpu_mask; struct rdt_domain *d; int cpu; @@ -1006,16 +1016,24 @@ static int set_l3_qos_cfg(struct rdt_resource *r, bool enable) if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL)) return -ENOMEM; - list_for_each_entry(d, &r->domains, list) { + if (level == RDT_RESOURCE_L3) + update = l3_qos_cfg_update; + else if (level == RDT_RESOURCE_L2) + update = l2_qos_cfg_update; + else + return -EINVAL; + + r_l = &rdt_resources_all[level]; + list_for_each_entry(d, &r_l->domains, list) { /* Pick one CPU from each domain instance to update MSR */ cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask); } cpu = get_cpu(); /* Update QOS_CFG MSR on this cpu if it's in cpu_mask. */ if (cpumask_test_cpu(cpu, cpu_mask)) - l3_qos_cfg_update(&enable); + update(&enable); /* Update QOS_CFG MSR on all other cpus in cpu_mask. 
*/ - smp_call_function_many(cpu_mask, l3_qos_cfg_update, &enable, 1); + smp_call_function_many(cpu_mask, update, &enable, 1); put_cpu(); free_cpumask_var(cpu_mask); @@ -1023,52 +1041,99 @@ static int set_l3_qos_cfg(struct rdt_resource *r, bool enable) return 0; } -static int cdp_enable(void) +static int cdp_enable(int level, int data_type, int code_type) { - struct rdt_resource *r_l3data = &rdt_resources_all[RDT_RESOURCE_L3DATA]; - struct rdt_resource *r_l3code = &rdt_resources_all[RDT_RESOURCE_L3CODE]; - struct rdt_resource *r_l3 = &rdt_resources_all[RDT_RESOURCE_L3]; + struct rdt_resource *r_ldata = &rdt_resources_all[data_type]; + struct rdt_resource *r_lcode = &rdt_resources_all[code_type]; + struct rdt_resource *r_l = &rdt_resources_all[level]; int ret; - if (!r_l3->alloc_capable || !r_l3data->alloc_capable || - !r_l3code->alloc_capable) + if (!r_l->alloc_capable || !r_ldata->alloc_capable || + !r_lcode->alloc_capable) return -EINVAL; - ret = set_l3_qos_cfg(r_l3, true); + ret = set_cache_qos_cfg(level, true); if (!ret) { - r_l3->alloc_enabled = false; - r_l3data->alloc_enabled = true; - r_l3code->alloc_enabled =
[tip:x86/cache] x86/intel_rdt: Add two new resources for L2 Code and Data Prioritization (CDP)
Commit-ID: def10853930a82456ab862a3a8292a3a16c386e7 Gitweb: https://git.kernel.org/tip/def10853930a82456ab862a3a8292a3a16c386e7 Author: Fenghua Yu AuthorDate: Wed, 20 Dec 2017 14:57:22 -0800 Committer: Thomas Gleixner CommitDate: Thu, 18 Jan 2018 09:33:31 +0100 x86/intel_rdt: Add two new resources for L2 Code and Data Prioritization (CDP) L2 data and L2 code are added as new resources in rdt_resources_all[] and data in the resources are configured. When L2 CDP is enabled, the schemata will have the two resources in this format: L2DATA:l2id0=;l2id1=; L2CODE:l2id0=;l2id1=; represent CBM (Cache Bit Mask) values in the schemata, similar to all others (L2 CAT/L3 CAT/L3 CDP). Signed-off-by: Fenghua Yu Signed-off-by: Thomas Gleixner Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: Vikas" Cc: Sai Praneeth" Cc: Reinette" Link: https://lkml.kernel.org/r/1513810644-78015-5-git-send-email-fenghua...@intel.com --- arch/x86/kernel/cpu/intel_rdt.c | 66 ++--- arch/x86/kernel/cpu/intel_rdt.h | 2 ++ 2 files changed, 58 insertions(+), 10 deletions(-) diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c index 9944237..5202da0 100644 --- a/arch/x86/kernel/cpu/intel_rdt.c +++ b/arch/x86/kernel/cpu/intel_rdt.c @@ -135,6 +135,40 @@ struct rdt_resource rdt_resources_all[] = { .format_str = "%d=%0*x", .fflags = RFTYPE_RES_CACHE, }, + [RDT_RESOURCE_L2DATA] = + { + .rid= RDT_RESOURCE_L2DATA, + .name = "L2DATA", + .domains= domain_init(RDT_RESOURCE_L2DATA), + .msr_base = IA32_L2_CBM_BASE, + .msr_update = cat_wrmsr, + .cache_level= 2, + .cache = { + .min_cbm_bits = 1, + .cbm_idx_mult = 2, + .cbm_idx_offset = 0, + }, + .parse_ctrlval = parse_cbm, + .format_str = "%d=%0*x", + .fflags = RFTYPE_RES_CACHE, + }, + [RDT_RESOURCE_L2CODE] = + { + .rid= RDT_RESOURCE_L2CODE, + .name = "L2CODE", + .domains= domain_init(RDT_RESOURCE_L2CODE), + .msr_base = IA32_L2_CBM_BASE, + .msr_update = cat_wrmsr, + .cache_level= 2, + .cache = { + .min_cbm_bits = 1, + .cbm_idx_mult = 2, + .cbm_idx_offset 
= 1, + }, + .parse_ctrlval = parse_cbm, + .format_str = "%d=%0*x", + .fflags = RFTYPE_RES_CACHE, + }, [RDT_RESOURCE_MBA] = { .rid= RDT_RESOURCE_MBA, @@ -259,15 +293,15 @@ static void rdt_get_cache_alloc_cfg(int idx, struct rdt_resource *r) r->alloc_enabled = true; } -static void rdt_get_cdp_l3_config(int type) +static void rdt_get_cdp_config(int level, int type) { - struct rdt_resource *r_l3 = &rdt_resources_all[RDT_RESOURCE_L3]; + struct rdt_resource *r_l = &rdt_resources_all[level]; struct rdt_resource *r = &rdt_resources_all[type]; - r->num_closid = r_l3->num_closid / 2; - r->cache.cbm_len = r_l3->cache.cbm_len; - r->default_ctrl = r_l3->default_ctrl; - r->cache.shareable_bits = r_l3->cache.shareable_bits; + r->num_closid = r_l->num_closid / 2; + r->cache.cbm_len = r_l->cache.cbm_len; + r->default_ctrl = r_l->default_ctrl; + r->cache.shareable_bits = r_l->cache.shareable_bits; r->data_width = (r->cache.cbm_len + 3) / 4; r->alloc_capable = true; /* @@ -277,6 +311,18 @@ static void rdt_get_cdp_l3_config(int type) r->alloc_enabled = false; } +static void rdt_get_cdp_l3_config(void) +{ + rdt_get_cdp_config(RDT_RESOURCE_L3, RDT_RESOURCE_L3DATA); + rdt_get_cdp_config(RDT_RESOURCE_L3, RDT_RESOURCE_L3CODE); +} + +static void rdt_get_cdp_l2_config(void) +{ + rdt_get_cdp_config(RDT_RESOURCE_L2, RDT_RESOURCE_L2DATA); + rdt_get_cdp_config(RDT_RESOURCE_L2, RDT_RESOURCE_L2CODE); +} + static int get_cache_id(int cpu, int level) { struct cpu_cacheinfo *ci = get_cpu_cacheinfo(cpu); @@ -729,15 +775,15 @@ static __init bool get_rdt_alloc_resources(void) if (rdt_cpu_has(X86_FEATURE_CAT_L3)) { rdt_get_cache_alloc_cfg(1, &rdt_resources_all[RDT_RESOURCE_L3]); - if (rdt_cpu_has(X86_FEATURE_CDP_L3)) { - rdt_get_cdp_l3_config(RDT_RESOURCE_L3DATA); - rdt_get_cdp_l3_config(RDT_RESOURCE_L3CODE); -
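The cbm_idx_mult and cbm_idx_offset values added for L2DATA and L2CODE determine how CLOSIDs map onto the interleaved mask MSRs. A small sketch of that arithmetic follows, assuming the index formula closid * cbm_idx_mult + cbm_idx_offset documented in intel_rdt.h and assuming the mask MSR address is the resource's msr_base plus that index. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IA32_L2_CBM_BASE 0xd10

/* Hypothetical subset of the per-resource cache parameters. */
struct cdp_layout {
	unsigned int cbm_idx_mult;
	unsigned int cbm_idx_offset;
};

/* Values taken from the L2DATA and L2CODE entries in the diff above. */
static const struct cdp_layout l2data = { .cbm_idx_mult = 2, .cbm_idx_offset = 0 };
static const struct cdp_layout l2code = { .cbm_idx_mult = 2, .cbm_idx_offset = 1 };

/* CBM index: closid * cbm_idx_mult + cbm_idx_offset. */
static inline unsigned int cbm_idx(const struct cdp_layout *r, unsigned int closid)
{
	assert(r != NULL);
	return closid * r->cbm_idx_mult + r->cbm_idx_offset;
}

/* Assumption: the mask MSR lives at msr_base + CBM index. */
static inline uint32_t cbm_msr(uint32_t msr_base, const struct cdp_layout *r,
			       unsigned int closid)
{
	return msr_base + cbm_idx(r, closid);
}

/* With CDP enabled each CLOSID consumes two masks, so half as many remain. */
static inline unsigned int cdp_num_closid(unsigned int num_closid)
{
	return num_closid / 2;
}
```

Under these assumptions data masks land on even indices and code masks on odd indices, so CLOSID 3 would use indices 6 (data) and 7 (code).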
[tip:x86/cache] x86/intel_rdt: Add command line parameter to control L2_CDP
Commit-ID: 31516de306c0c9235156cdc7acb976ea21f1f646 Gitweb: https://git.kernel.org/tip/31516de306c0c9235156cdc7acb976ea21f1f646 Author: Fenghua Yu AuthorDate: Wed, 20 Dec 2017 14:57:24 -0800 Committer: Thomas Gleixner CommitDate: Thu, 18 Jan 2018 09:33:32 +0100 x86/intel_rdt: Add command line parameter to control L2_CDP L2 CDP can be controlled by kernel parameter "rdt=". If "rdt=l2cdp", L2 CDP is turned on. If "rdt=!l2cdp", L2 CDP is turned off. Signed-off-by: Fenghua Yu Signed-off-by: Thomas Gleixner Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: Vikas" Cc: Sai Praneeth" Cc: Reinette" Link: https://lkml.kernel.org/r/1513810644-78015-7-git-send-email-fenghua...@intel.com --- Documentation/admin-guide/kernel-parameters.txt | 3 ++- arch/x86/kernel/cpu/intel_rdt.c | 2 ++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 46b26bf..fde058c 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3682,7 +3682,8 @@ rdt=[HW,X86,RDT] Turn on/off individual RDT features. List is: - cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, mba. + cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp, + mba. E.g. 
to turn on cmt and turn off mba use: rdt=cmt,!mba diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c index 5202da0..410629f 100644 --- a/arch/x86/kernel/cpu/intel_rdt.c +++ b/arch/x86/kernel/cpu/intel_rdt.c @@ -691,6 +691,7 @@ enum { RDT_FLAG_L3_CAT, RDT_FLAG_L3_CDP, RDT_FLAG_L2_CAT, + RDT_FLAG_L2_CDP, RDT_FLAG_MBA, }; @@ -713,6 +714,7 @@ static struct rdt_options rdt_options[] __initdata = { RDT_OPT(RDT_FLAG_L3_CAT,"l3cat",X86_FEATURE_CAT_L3), RDT_OPT(RDT_FLAG_L3_CDP,"l3cdp",X86_FEATURE_CDP_L3), RDT_OPT(RDT_FLAG_L2_CAT,"l2cat",X86_FEATURE_CAT_L2), + RDT_OPT(RDT_FLAG_L2_CDP,"l2cdp",X86_FEATURE_CDP_L2), RDT_OPT(RDT_FLAG_MBA, "mba", X86_FEATURE_MBA), }; #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
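The "rdt=" syntax described above (a comma-separated list where a leading '!' turns a feature off) can be illustrated with a small user-space parser. This is only a sketch of the syntax under that reading of the documentation; the struct, the table and the function names are hypothetical and do not reproduce the kernel's option handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-option state for the demo. */
struct rdt_opt_demo {
	const char *name;
	bool force_on;
	bool force_off;
};

/* Option names as listed in kernel-parameters.txt after this patch. */
static struct rdt_opt_demo rdt_opts_demo[] = {
	{ .name = "cmt" },   { .name = "mbmtotal" }, { .name = "mbmlocal" },
	{ .name = "l3cat" }, { .name = "l3cdp" },    { .name = "l2cat" },
	{ .name = "l2cdp" }, { .name = "mba" },
};
#define NUM_RDT_OPTS_DEMO (sizeof(rdt_opts_demo) / sizeof(rdt_opts_demo[0]))

static struct rdt_opt_demo *find_opt(const char *name, size_t len)
{
	for (size_t i = 0; i < NUM_RDT_OPTS_DEMO; i++) {
		struct rdt_opt_demo *o = &rdt_opts_demo[i];

		if (strlen(o->name) == len && strncmp(o->name, name, len) == 0)
			return o;
	}
	return NULL;
}

/*
 * Parse the text after "rdt=", e.g. "cmt,!mba".
 * Returns the number of recognised tokens; unknown tokens are skipped.
 */
static int parse_rdt_param(const char *value)
{
	int matched = 0;
	const char *tok = value;

	assert(value != NULL);
	while (*tok) {
		size_t len = strcspn(tok, ",");
		bool off = len > 0 && tok[0] == '!';
		struct rdt_opt_demo *o = find_opt(off ? tok + 1 : tok,
						  off ? len - 1 : len);

		if (o) {
			if (off)
				o->force_off = true;
			else
				o->force_on = true;
			matched++;
		}
		tok += len;
		if (*tok == ',')
			tok++;
	}
	return matched;
}
```

For example, passing "l2cdp,!mba" would mark l2cdp as forced on and mba as forced off while leaving the other options untouched.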
[tip:x86/cache] x86/intel_rdt: Update documentation
Commit-ID: 0ff8e080b18d1d2dbe5c866d5f31c27ab806a785 Gitweb: https://git.kernel.org/tip/0ff8e080b18d1d2dbe5c866d5f31c27ab806a785 Author: Fenghua Yu AuthorDate: Wed, 20 Dec 2017 14:57:19 -0800 Committer: Thomas Gleixner CommitDate: Thu, 18 Jan 2018 09:33:30 +0100 x86/intel_rdt: Update documentation With more flag bits in /proc/cpuinfo for RDT, it's better to classify the bits for readability. Some previously missing bits are added as well. Signed-off-by: Fenghua Yu Signed-off-by: Thomas Gleixner Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: Vikas" Cc: Sai Praneeth" Cc: Reinette" Link: https://lkml.kernel.org/r/1513810644-78015-2-git-send-email-fenghua...@intel.com --- Documentation/x86/intel_rdt_ui.txt | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt index 6851854..1ad77b1 100644 --- a/Documentation/x86/intel_rdt_ui.txt +++ b/Documentation/x86/intel_rdt_ui.txt @@ -7,7 +7,13 @@ Tony Luck Vikas Shivappa This feature is enabled by the CONFIG_INTEL_RDT Kconfig and the -X86 /proc/cpuinfo flag bits "rdt", "cqm", "cat_l3" and "cdp_l3". +X86 /proc/cpuinfo flag bits: +RDT (Resource Director Technology) Allocation - "rdt_a" +CAT (Cache Allocation Technology) - "cat_l3", "cat_l2" +CDP (Code and Data Prioritization ) - "cdp_l3" +CQM (Cache QoS Monitoring) - "cqm_llc", "cqm_occup_llc" +MBM (Memory Bandwidth Monitoring) - "cqm_mbm_total", "cqm_mbm_local" +MBA (Memory Bandwidth Allocation) - "mba" To use the feature mount the file system:
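To check one of the /proc/cpuinfo flag names listed in this documentation update from a program, a whole-word match is needed, since several flags share prefixes (for example "cqm_llc" and "cqm_mbm_total"). A minimal sketch follows; it assumes the caller has already read the "flags" line into a string of whitespace-separated tokens, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if @flag appears as a whole whitespace-delimited token. */
static bool has_cpu_flag(const char *flags, const char *flag)
{
	size_t flag_len = strlen(flag);
	const char *p = flags;

	assert(flag_len > 0);
	while ((p = strstr(p, flag)) != NULL) {
		bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
		char end = p[flag_len];
		bool ends = end == '\0' || end == ' ' || end == '\t' || end == '\n';

		if (starts && ends)
			return true;
		p++;	/* keep searching after a partial match */
	}
	return false;
}
```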
[tip:x86/cache] x86/intel_rdt: Show bitmask of shareable resource with other executing units
Commit-ID: 0dd2d7494cd818d06a2ae1cd840cd62124a2d25e Gitweb: http://git.kernel.org/tip/0dd2d7494cd818d06a2ae1cd840cd62124a2d25e Author: Fenghua Yu AuthorDate: Tue, 25 Jul 2017 15:39:04 -0700 Committer: Thomas Gleixner CommitDate: Tue, 1 Aug 2017 22:41:30 +0200 x86/intel_rdt: Show bitmask of shareable resource with other executing units CPUID.(EAX=0x10, ECX=res#):EBX[31:0] reports a bit mask for a resource. Each set bit within the length of the CBM indicates the corresponding unit of the resource allocation may be used by other entities in the platform (e.g. an integrated graphics engine or hardware units outside the processor core that have direct access to the resource). Each cleared bit within the length of the CBM indicates the corresponding allocation unit can be configured to implement a priority-based allocation scheme without interference with other hardware agents in the system. Bits outside the length of the CBM are reserved. More details on the bit mask are described in the x86 Software Developer's Manual. The bitmask is shown in the "info" directory for each resource. It's up to the user to decide how to use the bitmask within a CBM in a partition to share or isolate a resource with other executing units. 
Suggested-by: Reinette Chatre Signed-off-by: Fenghua Yu Signed-off-by: Tony Luck Signed-off-by: Thomas Gleixner Cc: ravi.v.shan...@intel.com Cc: pet...@infradead.org Cc: eran...@google.com Cc: a...@linux.intel.com Cc: davi...@google.com Cc: vikas.shiva...@linux.intel.com Link: http://lkml.kernel.org/r/20170725223904.12996-1-tony.l...@intel.com --- Documentation/x86/intel_rdt_ui.txt | 7 +++ arch/x86/kernel/cpu/intel_rdt.c | 2 ++ arch/x86/kernel/cpu/intel_rdt.h | 3 +++ arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 16 4 files changed, 28 insertions(+) diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt index 76f21e2..4d8848e 100644 --- a/Documentation/x86/intel_rdt_ui.txt +++ b/Documentation/x86/intel_rdt_ui.txt @@ -48,6 +48,13 @@ related to allocation: "min_cbm_bits":The minimum number of consecutive bits which must be set when writing a mask. +"shareable_bits": Bitmask of shareable resource with other executing + entities (e.g. I/O). User can use this when + setting up exclusive cache partitions. Note that + some platforms support devices that have their + own settings for cache use which can over-ride + these bits. 
+ Memory bandwitdh(MB) subdirectory contains the following files with respect to allocation: diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c index de26aa7..da4f389 100644 --- a/arch/x86/kernel/cpu/intel_rdt.c +++ b/arch/x86/kernel/cpu/intel_rdt.c @@ -193,6 +193,7 @@ static inline bool cache_alloc_hsw_probe(void) r->num_closid = 4; r->default_ctrl = max_cbm; r->cache.cbm_len = 20; + r->cache.shareable_bits = 0xc; r->cache.min_cbm_bits = 2; r->alloc_capable = true; r->alloc_enabled = true; @@ -260,6 +261,7 @@ static void rdt_get_cache_alloc_cfg(int idx, struct rdt_resource *r) r->num_closid = edx.split.cos_max + 1; r->cache.cbm_len = eax.split.cbm_len + 1; r->default_ctrl = BIT_MASK(eax.split.cbm_len + 1) - 1; + r->cache.shareable_bits = ebx & r->default_ctrl; r->data_width = (r->cache.cbm_len + 3) / 4; r->alloc_capable = true; r->alloc_enabled = true; diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h index 94e488a..4040bf1 100644 --- a/arch/x86/kernel/cpu/intel_rdt.h +++ b/arch/x86/kernel/cpu/intel_rdt.h @@ -227,12 +227,15 @@ struct msr_param { * @cbm_idx_offset:Offset of CBM index. 
CBM index is computed by: * closid * cbm_idx_multi + cbm_idx_offset * in a cache bit mask + * @shareable_bits: Bitmask of shareable resource with other + * executing entities */ struct rdt_cache { unsigned int cbm_len; unsigned int min_cbm_bits; unsigned int cbm_idx_mult; unsigned int cbm_idx_offset; + unsigned int shareable_bits; }; /** diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c index c24dd06..2621ae3 100644 --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c @@ -596,6 +596,15 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of, return 0; } +static int rdt_shareable_bits_show(struct kernfs_open_file *of, + struct seq_file *seq, void *v) +{ + struct rdt_resource *r = of->kn->parent->priv; + + seq_printf(seq, "%x\n", r->cache.shareable_bits); +
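The masking done in rdt_get_cache_alloc_cfg() in this patch can be followed with a short stand-alone sketch: the CBM length comes from CPUID as "length minus one", the default control value is a mask of that many low bits, and the EBX bitmap is clipped to it because bits outside the CBM length are reserved. The struct and function below are hypothetical simplifications of that logic, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the cache allocation parameters. */
struct cache_cfg_demo {
	unsigned int cbm_len;      /* number of bits in a cache bit mask */
	uint32_t default_ctrl;     /* all-ones CBM of cbm_len bits */
	uint32_t shareable_bits;   /* EBX bitmap clipped to the CBM length */
	unsigned int data_width;   /* hex digits needed to print a CBM */
};

/*
 * @eax_cbm_len: CBM length field from CPUID leaf 0x10 (length minus one).
 * @ebx:         raw EBX value, the shareable-resource bitmap.
 */
static struct cache_cfg_demo cache_cfg_from_cpuid(uint32_t eax_cbm_len, uint32_t ebx)
{
	struct cache_cfg_demo c;

	c.cbm_len = eax_cbm_len + 1;
	assert(c.cbm_len >= 1 && c.cbm_len <= 32);
	/* Avoid shifting a 32-bit value by 32, which is undefined in C. */
	c.default_ctrl = (c.cbm_len == 32) ? UINT32_MAX
					   : (UINT32_C(1) << c.cbm_len) - 1;
	c.shareable_bits = ebx & c.default_ctrl;
	c.data_width = (c.cbm_len + 3) / 4;
	return c;
}
```

With a 20-bit CBM, as in the Haswell probe path of the same patch, an EBX value whose low bits are 0xc yields shareable_bits of 0xc regardless of what the reserved upper bits contain.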
[tip:x86/cache] x86/intel_rdt: Call intel_rdt_sched_in() with preemption disabled
Commit-ID: 74fcdae1a7fdf30de5413ccc1eca271415d01124 Gitweb: http://git.kernel.org/tip/74fcdae1a7fdf30de5413ccc1eca271415d01124 Author: Fenghua Yu AuthorDate: Thu, 1 Dec 2016 12:55:14 -0800 Committer: Thomas Gleixner CommitDate: Fri, 2 Dec 2016 01:13:02 +0100 x86/intel_rdt: Call intel_rdt_sched_in() with preemption disabled intel_rdt_sched_in() must be called with preemption disabled because the function accesses percpu variables (pqr_state and closid). If a task moves itself via move_myself() preemption is enabled, which violates the calling convention and can result in incorrect closid selection when the task gets preempted or migrated. Add the required protection and a comment about the calling convention. Signed-off-by: Fenghua Yu Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: "Marcelo Tosatti" Cc: "Sai Prakhya" Cc: "Vikas Shivappa" Cc: "H. Peter Anvin" Link: http://lkml.kernel.org/r/1480625714-54246-1-git-send-email-fenghua...@intel.com Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/intel_rdt.h | 2 ++ arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h index 6e90e87..95ce5c8 100644 --- a/arch/x86/include/asm/intel_rdt.h +++ b/arch/x86/include/asm/intel_rdt.h @@ -192,6 +192,8 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of, * resctrl file system. * - Caches the per cpu CLOSid values and does the MSR write only * when a task with a different CLOSid is scheduled in. + * + * Must be called with preemption disabled. 
*/ static inline void intel_rdt_sched_in(void) { diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c index fb8e03e..1afd3f3 100644 --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c @@ -326,8 +326,10 @@ static void move_myself(struct callback_head *head) kfree(rdtgrp); } + preempt_disable(); /* update PQR_ASSOC MSR to make resource group go into effect */ intel_rdt_sched_in(); + preempt_enable(); kfree(callback); }
[tip:x86/cache] x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount
Commit-ID: 0efc89be9471b152599d2db7eb47de8e0d71c59f Gitweb: http://git.kernel.org/tip/0efc89be9471b152599d2db7eb47de8e0d71c59f Author: Fenghua Yu AuthorDate: Fri, 18 Nov 2016 15:18:04 -0800 Committer: Thomas Gleixner CommitDate: Mon, 28 Nov 2016 11:07:50 +0100 x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount When removing a sub directory/rdtgroup by rmdir or umount, closid in a task in the sub directory is set to default rdtgroup's closid which is 0. If the task is running on a CPU, the PQR_ASSOC MSR is only updated when the task runs through a context switch. Up to the context switch, the task runs with the wrong closid. Make the change immediately effective by invoking a smp function call on all CPUs which are running moved task. If one of the affected tasks was moved or scheduled out before the function call is executed on the CPU the only damage is the extra interruption of the CPU. [ tglx: Reworked it to avoid blindly interrupting all CPUs and extra loops ] Signed-off-by: Fenghua Yu Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: "Sai Prakhya" Cc: "Vikas Shivappa" Cc: "H. Peter Anvin" Link: http://lkml.kernel.org/r/1479511084-59727-2-git-send-email-fenghua...@intel.com Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 113 +++ 1 file changed, 83 insertions(+), 30 deletions(-) diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c index eccea8a..fb8e03e 100644 --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c @@ -194,12 +194,13 @@ static int rdtgroup_cpus_show(struct kernfs_open_file *of, /* * This is safe against intel_rdt_sched_in() called from __switch_to() * because __switch_to() is executed with interrupts disabled. A local call - * from rdt_update_percpu_closid() is proteced against __switch_to() because + * from rdt_update_closid() is proteced against __switch_to() because * preemption is disabled. 
*/ -static void rdt_update_cpu_closid(void *v) +static void rdt_update_cpu_closid(void *closid) { - this_cpu_write(cpu_closid, *(int *)v); + if (closid) + this_cpu_write(cpu_closid, *(int *)closid); /* * We cannot unconditionally write the MSR because the current * executing task might have its own closid selected. Just reuse @@ -208,14 +209,23 @@ static void rdt_update_cpu_closid(void *v) intel_rdt_sched_in(); } -/* Update the per cpu closid and eventually the PGR_ASSOC MSR */ -static void rdt_update_percpu_closid(const struct cpumask *cpu_mask, int closid) +/* + * Update the PGR_ASSOC MSR on all cpus in @cpu_mask, + * + * Per task closids must have been set up before calling this function. + * + * The per cpu closids are updated with the smp function call, when @closid + * is not NULL. If @closid is NULL then all affected percpu closids must + * have been set up before calling this function. + */ +static void +rdt_update_closid(const struct cpumask *cpu_mask, int *closid) { int cpu = get_cpu(); if (cpumask_test_cpu(cpu, cpu_mask)) - rdt_update_cpu_closid(&closid); - smp_call_function_many(cpu_mask, rdt_update_cpu_closid, &closid, 1); + rdt_update_cpu_closid(closid); + smp_call_function_many(cpu_mask, rdt_update_cpu_closid, closid, 1); put_cpu(); } @@ -264,7 +274,7 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of, /* Give any dropped cpus to rdtgroup_default */ cpumask_or(&rdtgroup_default.cpu_mask, &rdtgroup_default.cpu_mask, tmpmask); - rdt_update_percpu_closid(tmpmask, rdtgroup_default.closid); + rdt_update_closid(tmpmask, &rdtgroup_default.closid); } /* @@ -278,7 +288,7 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of, continue; cpumask_andnot(&r->cpu_mask, &r->cpu_mask, tmpmask); } - rdt_update_percpu_closid(tmpmask, rdtgrp->closid); + rdt_update_closid(tmpmask, &rdtgrp->closid); } /* Done pushing/pulling - update this group with new mask */ @@ -807,18 +817,49 @@ static int reset_all_cbms(struct rdt_resource *r) } /* - * 
Forcibly remove all of subdirectories under root. + * Move tasks from one to the other group. If @from is NULL, then all tasks + * in the systems are moved unconditionally (used for teardown). + * + * If @mask is not NULL the cpus on which moved tasks are running are set + * in that mask so the update smp function call is restricted to affected + * cpus. */ -static void rmdir_all_sub(void) +static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to, +struct cpumask *mask) { - struct rdtgroup *rdtgrp, *tmp; struct task_struct *p,
[tip:x86/cache] x86/intel_rdt: Fix setting of closid when adding CPUs to a group
Commit-ID: 2659f46da8307871989f475accdcdfc4807e9e6c Gitweb: http://git.kernel.org/tip/2659f46da8307871989f475accdcdfc4807e9e6c Author: Fenghua Yu AuthorDate: Fri, 18 Nov 2016 15:18:03 -0800 Committer: Thomas Gleixner CommitDate: Mon, 28 Nov 2016 11:07:50 +0100 x86/intel_rdt: Fix setting of closid when adding CPUs to a group There was a cut & paste error when adding code to update the per-cpu closid when changing the bitmask of CPUs to an rdt group. The update erroneously assigns the closid of the default group to the CPUs which are moved to a group instead of assigning the closid of their new group. Use the proper closid. Fixes: f410770293a1 ("x86/intel_rdt: Update percpu closid immeditately on CPUs affected by change") Signed-off-by: Fenghua Yu Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: "Sai Prakhya" Cc: "Vikas Shivappa" Cc: "H. Peter Anvin" Link: http://lkml.kernel.org/r/1479511084-59727-1-git-send-email-fenghua...@intel.com Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c index 98edba4..eccea8a 100644 --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c @@ -278,7 +278,7 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of, continue; cpumask_andnot(&r->cpu_mask, &r->cpu_mask, tmpmask); } - rdt_update_percpu_closid(tmpmask, rdtgroup_default.closid); + rdt_update_percpu_closid(tmpmask, rdtgrp->closid); } /* Done pushing/pulling - update this group with new mask */
[tip:x86/cache] x86/intel_rdt: Update percpu closid immeditately on CPUs affected by change
Commit-ID: f410770293a1fbc08906474c24104a7a11943eb6 Gitweb: http://git.kernel.org/tip/f410770293a1fbc08906474c24104a7a11943eb6 Author: Fenghua Yu AuthorDate: Fri, 11 Nov 2016 17:02:38 -0800 Committer: Thomas Gleixner CommitDate: Tue, 15 Nov 2016 18:35:50 +0100 x86/intel_rdt: Update percpu closid immeditately on CPUs affected by change If CPUs are moved to or removed from a rdtgroup, the percpu closid storage is updated. If tasks running on an affected CPU use the percpu closid then the PQR_ASSOC MSR is only updated when the task runs through a context switch. Up to the context switch the CPUs operate on the wrong closid. This state is potentially unbounded. Make the change immediately effective by invoking a smp function call on the affected CPUs which stores the new closid in the percpu storage and calls the intel_rdt_sched_in() function which updates the MSR, if the current task uses the percpu closid. [ tglx: Made it work and massaged changelog once more ] Signed-off-by: Fenghua Yu Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: "Sai Prakhya" Cc: "Vikas Shivappa" Cc: "Ingo Molnar" Cc: "H. Peter Anvin" Link: http://lkml.kernel.org/r/1478912558-55514-3-git-send-email-fenghua...@intel.com Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 72 1 file changed, 36 insertions(+), 36 deletions(-) diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c index d6bad09..98edba4 100644 --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c @@ -191,12 +191,40 @@ static int rdtgroup_cpus_show(struct kernfs_open_file *of, return ret; } +/* + * This is safe against intel_rdt_sched_in() called from __switch_to() + * because __switch_to() is executed with interrupts disabled. A local call + * from rdt_update_percpu_closid() is proteced against __switch_to() because + * preemption is disabled. 
+ */ +static void rdt_update_cpu_closid(void *v) +{ + this_cpu_write(cpu_closid, *(int *)v); + /* +* We cannot unconditionally write the MSR because the current +* executing task might have its own closid selected. Just reuse +* the context switch code. +*/ + intel_rdt_sched_in(); +} + +/* Update the per cpu closid and eventually the PGR_ASSOC MSR */ +static void rdt_update_percpu_closid(const struct cpumask *cpu_mask, int closid) +{ + int cpu = get_cpu(); + + if (cpumask_test_cpu(cpu, cpu_mask)) + rdt_update_cpu_closid(&closid); + smp_call_function_many(cpu_mask, rdt_update_cpu_closid, &closid, 1); + put_cpu(); +} + static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { cpumask_var_t tmpmask, newmask; struct rdtgroup *rdtgrp, *r; - int ret, cpu; + int ret; if (!buf) return -EINVAL; @@ -236,8 +264,7 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of, /* Give any dropped cpus to rdtgroup_default */ cpumask_or(&rdtgroup_default.cpu_mask, &rdtgroup_default.cpu_mask, tmpmask); - for_each_cpu(cpu, tmpmask) - per_cpu(cpu_closid, cpu) = 0; + rdt_update_percpu_closid(tmpmask, rdtgroup_default.closid); } /* @@ -251,8 +278,7 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of, continue; cpumask_andnot(&r->cpu_mask, &r->cpu_mask, tmpmask); } - for_each_cpu(cpu, tmpmask) - per_cpu(cpu_closid, cpu) = rdtgrp->closid; + rdt_update_percpu_closid(tmpmask, rdtgroup_default.closid); } /* Done pushing/pulling - update this group with new mask */ @@ -781,25 +807,12 @@ static int reset_all_cbms(struct rdt_resource *r) } /* - * MSR_IA32_PQR_ASSOC is scoped per logical CPU, so all updates - * are always in thread context. - */ -static void rdt_reset_pqr_assoc_closid(void *v) -{ - struct intel_pqr_state *state = this_cpu_ptr(&pqr_state); - - state->closid = 0; - wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, 0); -} - -/* * Forcibly remove all of subdirectories under root. 
*/ static void rmdir_all_sub(void) { struct rdtgroup *rdtgrp, *tmp; struct task_struct *p, *t; - int cpu; /* move all tasks to default resource group */ read_lock(&tasklist_lock); @@ -807,14 +820,6 @@ static void rmdir_all_sub(void) t->closid = 0; read_unlock(&tasklist_lock); - get_cpu(); - /* Reset PQR_ASSOC MSR on this cpu. */ - rdt_reset_pqr_assoc_closid(NULL); - /* Reset PQR_ASSOC MSR on the rest of cpus. */ - smp_call_function_many(cpu_online_mask, rdt_reset_pqr_assoc_closid, -
[tip:x86/cache] x86/intel_rdt: Protect info directory from removal
Commit-ID: f57b308728902d9ffade53466e9201e999a870e4 Gitweb: http://git.kernel.org/tip/f57b308728902d9ffade53466e9201e999a870e4 Author: Fenghua Yu AuthorDate: Fri, 11 Nov 2016 17:02:36 -0800 Committer: Thomas Gleixner CommitDate: Tue, 15 Nov 2016 18:35:49 +0100 x86/intel_rdt: Protect info directory from removal The info directory and the per-resource subdirectories of the info directory have no reference to a struct rdtgroup in kn->priv. An attempt to remove one of those directories results in a NULL pointer dereference. Protect the directories from removal and return -EPERM instead of -ENOENT. [ tglx: Massaged changelog ] Signed-off-by: Fenghua Yu Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: "Sai Prakhya" Cc: "Vikas Shivappa" Cc: "Ingo Molnar" Cc: "H. Peter Anvin" Link: http://lkml.kernel.org/r/1478912558-55514-1-git-send-email-fenghua...@intel.com Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 24 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c index 4795880..cff286e 100644 --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c @@ -644,16 +644,29 @@ static int parse_rdtgroupfs_options(char *data) */ static struct rdtgroup *kernfs_to_rdtgroup(struct kernfs_node *kn) { - if (kernfs_type(kn) == KERNFS_DIR) - return kn->priv; - else + if (kernfs_type(kn) == KERNFS_DIR) { + /* +* All the resource directories use "kn->priv" +* to point to the "struct rdtgroup" for the +* resource. "info" and its subdirectories don't +* have rdtgroup structures, so return NULL here. 
+*/ + if (kn == kn_info || kn->parent == kn_info) + return NULL; + else + return kn->priv; + } else { return kn->parent->priv; + } } struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn) { struct rdtgroup *rdtgrp = kernfs_to_rdtgroup(kn); + if (!rdtgrp) + return NULL; + atomic_inc(&rdtgrp->waitcount); kernfs_break_active_protection(kn); @@ -670,6 +683,9 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn) { struct rdtgroup *rdtgrp = kernfs_to_rdtgroup(kn); + if (!rdtgrp) + return; + mutex_unlock(&rdtgroup_mutex); if (atomic_dec_and_test(&rdtgrp->waitcount) && @@ -918,7 +934,7 @@ static int rdtgroup_rmdir(struct kernfs_node *kn) rdtgrp = rdtgroup_kn_lock_live(kn); if (!rdtgrp) { rdtgroup_kn_unlock(kn); - return -ENOENT; + return -EPERM; } /* Give any tasks back to the default group */
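The lookup rule described in the "Protect info directory from removal" message can be tried outside the kernel. The following is only a minimal userspace sketch, not the kernel code: node_sketch, info_node_sketch, node_to_group_sketch() and rmdir_sketch() are hypothetical stand-ins for kernfs_node, kn_info, kernfs_to_rdtgroup() and rdtgroup_rmdir(), and locking and reference counting are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a kernfs node; not the kernel type. */
enum node_type_sketch { NODE_DIR_SKETCH, NODE_FILE_SKETCH };

struct node_sketch {
	enum node_type_sketch type;
	struct node_sketch *parent;
	void *priv;		/* the owning group, if there is one */
};

/* Plays the role of kn_info; must be set before any lookup is done. */
static struct node_sketch *info_node_sketch;

/*
 * Mirrors the rule in the patch: "info" and its direct subdirectories
 * have no group behind them, so the lookup yields NULL for them.
 */
static void *node_to_group_sketch(const struct node_sketch *kn)
{
	assert(kn != NULL);
	if (kn->type == NODE_DIR_SKETCH) {
		if (kn == info_node_sketch || kn->parent == info_node_sketch)
			return NULL;
		return kn->priv;
	}
	/* Files belong to the group of their parent directory. */
	assert(kn->parent != NULL);
	return kn->parent->priv;
}

/* Removal is refused with -EPERM when the directory has no group. */
static int rmdir_sketch(const struct node_sketch *kn)
{
	if (!node_to_group_sketch(kn))
		return -EPERM;
	return 0;
}
```

With a small tree (root, info, info/L3, one group directory), rmdir_sketch() refuses the two info directories and accepts the group directory, which is the behaviour the changelog asks for.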
[tip:x86/cache] x86/intel_rdt: Reset per cpu closids on unmount
Commit-ID: c7cc0cc10cdecc275211c8749defba6c41aaf5de Gitweb: http://git.kernel.org/tip/c7cc0cc10cdecc275211c8749defba6c41aaf5de Author: Fenghua Yu AuthorDate: Fri, 11 Nov 2016 17:02:37 -0800 Committer: Thomas Gleixner CommitDate: Tue, 15 Nov 2016 18:35:50 +0100 x86/intel_rdt: Reset per cpu closids on unmount All CPUs in a rdtgroup are given back to the default rdtgroup before the rdtgroup is removed during umount. After umount, the default rdtgroup contains all online CPUs, but the per cpu closids are not cleared. As a result the stale closid value will be used immediately after the next mount. Move all cpus to the default group and update the percpu closid storage. [ tglx: Massaged changelog ] Signed-off-by: Fenghua Yu Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: "Sai Prakhya" Cc: "Vikas Shivappa" Cc: "Ingo Molnar" Cc: "H. Peter Anvin" Link: http://lkml.kernel.org/r/1478912558-55514-2-git-send-email-fenghua...@intel.com Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 16 1 file changed, 16 insertions(+) diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c index 2f54931..d6bad09 100644 --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c @@ -799,6 +799,7 @@ static void rmdir_all_sub(void) { struct rdtgroup *rdtgrp, *tmp; struct task_struct *p, *t; + int cpu; /* move all tasks to default resource group */ read_lock(&tasklist_lock); @@ -813,14 +814,29 @@ static void rmdir_all_sub(void) smp_call_function_many(cpu_online_mask, rdt_reset_pqr_assoc_closid, NULL, 1); put_cpu(); + list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) { /* Remove each rdtgroup other than root */ if (rdtgrp == &rdtgroup_default) continue; + + /* +* Give any CPUs back to the default group. We cannot copy +* cpu_online_mask because a CPU might have executed the +* offline callback already, but is still marked online. 
+*/ + cpumask_or(&rdtgroup_default.cpu_mask, + &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask); + kernfs_remove(rdtgrp->kn); list_del(&rdtgrp->rdtgroup_list); kfree(rdtgrp); } + + /* Reset all per cpu closids to the default value */ + for_each_cpu(cpu, &rdtgroup_default.cpu_mask) + per_cpu(cpu_closid, cpu) = 0; + kernfs_remove(kn_info); }
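The unmount path in the "Reset per cpu closids on unmount" patch does two things: it ORs each group's CPU mask back into the default group, then clears the per-cpu closid of every CPU the default group now owns. A rough userspace model of just that bookkeeping is sketched below. Everything in it is an assumption made for illustration: group_sketch, cpu_closid_sketch, NR_CPUS_SKETCH and unmount_reset_sketch() are hypothetical names, a 32-bit word stands in for struct cpumask, and zeroing a group's mask stands in for removing and freeing the group.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 8	/* arbitrary size for the model */

/* Hypothetical stand-in for struct rdtgroup: one bit per CPU. */
struct group_sketch {
	uint32_t cpu_mask;
	int closid;
};

/* Plays the role of the per-cpu cpu_closid variable. */
static int cpu_closid_sketch[NR_CPUS_SKETCH];

/*
 * Give every CPU of every non-default group back to the default group,
 * then reset the per-cpu closid on all CPUs the default group owns.
 */
static void unmount_reset_sketch(struct group_sketch *def,
				 struct group_sketch *groups, int ngroups)
{
	int i, cpu;

	assert(def && ngroups >= 0);

	for (i = 0; i < ngroups; i++) {
		def->cpu_mask |= groups[i].cpu_mask;	/* cpumask_or() */
		groups[i].cpu_mask = 0;			/* group goes away */
	}

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		if (def->cpu_mask & (UINT32_C(1) << cpu))
			cpu_closid_sketch[cpu] = 0;
}
```

Without the second loop the model shows the problem the changelog describes: the default group would own all CPUs while cpu_closid_sketch[] still carried the closids of the removed groups.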
[tip:x86/cache] x86/intel_rdt: Add scheduler hook
Commit-ID: 4f341a5e48443fcc2e2d935ca990e462c02bb1a6 Gitweb: http://git.kernel.org/tip/4f341a5e48443fcc2e2d935ca990e462c02bb1a6 Author: Fenghua Yu AuthorDate: Fri, 28 Oct 2016 15:04:48 -0700 Committer: Thomas Gleixner CommitDate: Sun, 30 Oct 2016 19:10:16 -0600 x86/intel_rdt: Add scheduler hook Hook the x86 scheduler code to update closid based on whether the current task is assigned to a specific closid or running on a CPU assigned to a specific closid. Signed-off-by: Fenghua Yu Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: "Shaohua Li" Cc: "Sai Prakhya" Cc: "Peter Zijlstra" Cc: "Stephane Eranian" Cc: "Dave Hansen" Cc: "David Carrillo-Cisneros" Cc: "Nilay Vaish" Cc: "Vikas Shivappa" Cc: "Ingo Molnar" Cc: "Borislav Petkov" Cc: "H. Peter Anvin" Link: http://lkml.kernel.org/r/1477692289-37412-10-git-send-email-fenghua...@intel.com Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/intel_rdt.h | 42 arch/x86/kernel/cpu/intel_rdt.c | 1 - arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 3 +++ arch/x86/kernel/process_32.c | 4 +++ arch/x86/kernel/process_64.c | 4 +++ 5 files changed, 53 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h index 2e5eab0..5bc72a4 100644 --- a/arch/x86/include/asm/intel_rdt.h +++ b/arch/x86/include/asm/intel_rdt.h @@ -1,8 +1,12 @@ #ifndef _ASM_X86_INTEL_RDT_H #define _ASM_X86_INTEL_RDT_H +#ifdef CONFIG_INTEL_RDT_A + #include +#include + #define IA32_L3_QOS_CFG0xc81 #define IA32_L3_CBM_BASE 0xc90 #define IA32_L2_CBM_BASE 0xd10 @@ -176,4 +180,42 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off); int rdtgroup_schemata_show(struct kernfs_open_file *of, struct seq_file *s, void *v); + +/* + * intel_rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR + * + * Following considerations are made so that this has minimal impact + * on scheduler hot path: + * - This will stay as no-op unless we are running on an Intel SKU + * which supports 
resource control and we enable by mounting the + * resctrl file system. + * - Caches the per cpu CLOSid values and does the MSR write only + * when a task with a different CLOSid is scheduled in. + */ +static inline void intel_rdt_sched_in(void) +{ + if (static_branch_likely(&rdt_enable_key)) { + struct intel_pqr_state *state = this_cpu_ptr(&pqr_state); + int closid; + + /* +* If this task has a closid assigned, use it. +* Else use the closid assigned to this cpu. +*/ + closid = current->closid; + if (closid == 0) + closid = this_cpu_read(cpu_closid); + + if (closid != state->closid) { + state->closid = closid; + wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, closid); + } + } +} + +#else + +static inline void intel_rdt_sched_in(void) {} + +#endif /* CONFIG_INTEL_RDT_A */ #endif /* _ASM_X86_INTEL_RDT_H */ diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c index 40094ae..5a533fe 100644 --- a/arch/x86/kernel/cpu/intel_rdt.c +++ b/arch/x86/kernel/cpu/intel_rdt.c @@ -29,7 +29,6 @@ #include #include -#include #include #include diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c index 5c4bab9..a90ad22 100644 --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c @@ -292,6 +292,9 @@ static void move_myself(struct callback_head *head) kfree(rdtgrp); } + /* update PQR_ASSOC MSR to make resource group go into effect */ + intel_rdt_sched_in(); + kfree(callback); } diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c index bd7be8e..efe7f9f 100644 --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -54,6 +54,7 @@ #include #include #include +#include void __show_regs(struct pt_regs *regs, int all) { @@ -299,5 +300,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) this_cpu_write(current_task, next_p); + /* Load the Intel cache allocation PQR MSR. 
*/ + intel_rdt_sched_in(); + return prev_p; } diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index b3760b3..acd7d6f 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -50,6 +50,7 @@ #include #include #include +#include __visible DEFINE_PER_CPU(unsigned long, rsp_scratch); @@ -473,6 +474,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) loadsegment(ss, __KERNEL_DS); } + /* Load the Intel
[tip:x86/cache] MAINTAINERS: Add maintainer for Intel RDT resource allocation
Commit-ID: 48553d103d0b63991a08980889c6a35b3e05b5e3 Gitweb: http://git.kernel.org/tip/48553d103d0b63991a08980889c6a35b3e05b5e3 Author: Fenghua Yu AuthorDate: Fri, 28 Oct 2016 15:04:49 -0700 Committer: Thomas Gleixner CommitDate: Sun, 30 Oct 2016 19:10:17 -0600 MAINTAINERS: Add maintainer for Intel RDT resource allocation We create five new files for Intel RDT resource allocation: arch/x86/kernel/cpu/intel_rdt.c arch/x86/kernel/cpu/intel_rdt_rdtgroup.c arch/x86/kernel/cpu/intel_rdt_schemata.c arch/x86/include/asm/intel_rdt.h Documentation/x86/intel_rdt_ui.txt Fenghua Yu will maintain this code. Signed-off-by: Fenghua Yu Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: "Shaohua Li" Cc: "Sai Prakhya" Cc: "Peter Zijlstra" Cc: "Stephane Eranian" Cc: "Dave Hansen" Cc: "David Carrillo-Cisneros" Cc: "Nilay Vaish" Cc: "Vikas Shivappa" Cc: "Ingo Molnar" Cc: "Borislav Petkov" Cc: "H. Peter Anvin" Link: http://lkml.kernel.org/r/1477692289-37412-11-git-send-email-fenghua...@intel.com Signed-off-by: Thomas Gleixner --- MAINTAINERS | 8 1 file changed, 8 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index c447953..4e6a044 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -10107,6 +10107,14 @@ L: linux-r...@vger.kernel.org S: Supported F: drivers/infiniband/sw/rdmavt +RDT - RESOURCE ALLOCATION +M: Fenghua Yu +L: linux-kernel@vger.kernel.org +S: Supported +F: arch/x86/kernel/cpu/intel_rdt* +F: arch/x86/include/asm/intel_rdt* +F: Documentation/x86/intel_rdt* + READ-COPY UPDATE (RCU) M: "Paul E. McKenney" M: Josh Triplett
[tip:x86/cache] x86/intel_rdt: Add tasks files
Commit-ID: e02737d5b82640497637d18428e2793bb7f02881 Gitweb: http://git.kernel.org/tip/e02737d5b82640497637d18428e2793bb7f02881 Author: Fenghua Yu AuthorDate: Fri, 28 Oct 2016 15:04:46 -0700 Committer: Thomas Gleixner CommitDate: Sun, 30 Oct 2016 19:10:15 -0600 x86/intel_rdt: Add tasks files The root directory and all subdirectories are automatically populated with a read/write (mode 0644) file named "tasks". When read it will show all the task IDs assigned to the resource group. Tasks can be added (one at a time) to a group by writing the task ID to the file. E.g. Membership in a resource group is indicated by a new field in the task_struct "int closid" which holds the CLOSID for each task. The default resource group uses CLOSID=0 which means that all existing tasks when the resctrl file system is mounted belong to the default group. If a group is removed, tasks which are members of that group are moved to the default group. Signed-off-by: Fenghua Yu Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: "Shaohua Li" Cc: "Sai Prakhya" Cc: "Peter Zijlstra" Cc: "Stephane Eranian" Cc: "Dave Hansen" Cc: "David Carrillo-Cisneros" Cc: "Nilay Vaish" Cc: "Vikas Shivappa" Cc: "Ingo Molnar" Cc: "Borislav Petkov" Cc: "H. 
Peter Anvin" Link: http://lkml.kernel.org/r/1477692289-37412-8-git-send-email-fenghua...@intel.com Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 180 +++ include/linux/sched.h| 3 + 2 files changed, 183 insertions(+) diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c index e05a186..5cc0865 100644 --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c @@ -28,6 +28,7 @@ #include #include #include +#include #include @@ -267,6 +268,162 @@ unlock: return ret ?: nbytes; } +struct task_move_callback { + struct callback_headwork; + struct rdtgroup *rdtgrp; +}; + +static void move_myself(struct callback_head *head) +{ + struct task_move_callback *callback; + struct rdtgroup *rdtgrp; + + callback = container_of(head, struct task_move_callback, work); + rdtgrp = callback->rdtgrp; + + /* +* If resource group was deleted before this task work callback +* was invoked, then assign the task to root group and free the +* resource group. +*/ + if (atomic_dec_and_test(&rdtgrp->waitcount) && + (rdtgrp->flags & RDT_DELETED)) { + current->closid = 0; + kfree(rdtgrp); + } + + kfree(callback); +} + +static int __rdtgroup_move_task(struct task_struct *tsk, + struct rdtgroup *rdtgrp) +{ + struct task_move_callback *callback; + int ret; + + callback = kzalloc(sizeof(*callback), GFP_KERNEL); + if (!callback) + return -ENOMEM; + callback->work.func = move_myself; + callback->rdtgrp = rdtgrp; + + /* +* Take a refcount, so rdtgrp cannot be freed before the +* callback has been invoked. +*/ + atomic_inc(&rdtgrp->waitcount); + ret = task_work_add(tsk, &callback->work, true); + if (ret) { + /* +* Task is exiting. Drop the refcount and free the callback. +* No need to check the refcount as the group cannot be +* deleted before the write function unlocks rdtgroup_mutex. 
+*/ + atomic_dec(&rdtgrp->waitcount); + kfree(callback); + } else { + tsk->closid = rdtgrp->closid; + } + return ret; +} + +static int rdtgroup_task_write_permission(struct task_struct *task, + struct kernfs_open_file *of) +{ + const struct cred *tcred = get_task_cred(task); + const struct cred *cred = current_cred(); + int ret = 0; + + /* +* Even if we're attaching all tasks in the thread group, we only +* need to check permissions on one of them. +*/ + if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) && + !uid_eq(cred->euid, tcred->uid) && + !uid_eq(cred->euid, tcred->suid)) + ret = -EPERM; + + put_cred(tcred); + return ret; +} + +static int rdtgroup_move_task(pid_t pid, struct rdtgroup *rdtgrp, + struct kernfs_open_file *of) +{ + struct task_struct *tsk; + int ret; + + rcu_read_lock(); + if (pid) { + tsk = find_task_by_vpid(pid); + if (!tsk) { + rcu_read_unlock(); + return -ESRCH; + } + } else { + tsk = current; + } + + get_task_struct(tsk); + rcu_read_unlock(); + + ret = rdtgroup_task_write_permission(tsk, of); + if (!ret) + ret =
[tip:x86/cache] x86/intel_rdt: Add basic resctrl filesystem support
Commit-ID: 5ff193fbde20df5d80fec367cea3e7856c057320 Gitweb: http://git.kernel.org/tip/5ff193fbde20df5d80fec367cea3e7856c057320 Author: Fenghua Yu AuthorDate: Fri, 28 Oct 2016 15:04:42 -0700 Committer: Thomas Gleixner CommitDate: Sun, 30 Oct 2016 19:10:14 -0600 x86/intel_rdt: Add basic resctrl filesystem support Use kernfs as basis for our user interface filesystem. This patch supports mount/umount, and one mount parameter "cdp" to enable code/data prioritization (though all we do at this point is ensure that the system can support CDP). The file system is not populated yet in this patch. [ tglx: Fixed up a few nits and added cdp handling in case of error ] Signed-off-by: Fenghua Yu Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: "Shaohua Li" Cc: "Sai Prakhya" Cc: "Peter Zijlstra" Cc: "Stephane Eranian" Cc: "Dave Hansen" Cc: "David Carrillo-Cisneros" Cc: "Nilay Vaish" Cc: "Vikas Shivappa" Cc: "Ingo Molnar" Cc: "Borislav Petkov" Cc: "H. Peter Anvin" Link: http://lkml.kernel.org/r/1477692289-37412-4-git-send-email-fenghua...@intel.com Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/intel_rdt.h | 26 +++ arch/x86/kernel/cpu/Makefile | 2 +- arch/x86/kernel/cpu/intel_rdt.c | 8 +- arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 271 +++ include/uapi/linux/magic.h | 1 + 5 files changed, 306 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h index c0d0a6e..09d00e6 100644 --- a/arch/x86/include/asm/intel_rdt.h +++ b/arch/x86/include/asm/intel_rdt.h @@ -1,9 +1,31 @@ #ifndef _ASM_X86_INTEL_RDT_H #define _ASM_X86_INTEL_RDT_H +#include + +#define IA32_L3_QOS_CFG0xc81 #define IA32_L3_CBM_BASE 0xc90 #define IA32_L2_CBM_BASE 0xd10 +#define L3_QOS_CDP_ENABLE 0x01ULL + +/** + * struct rdtgroup - store rdtgroup's data in resctrl file system. 
+ * @kn:kernfs node + * @rdtgroup_list: linked list for all rdtgroups + * @closid:closid for this rdtgroup + */ +struct rdtgroup { + struct kernfs_node *kn; + struct list_headrdtgroup_list; + int closid; +}; + +/* List of all resource groups */ +extern struct list_head rdt_all_groups; + +int __init rdtgroup_init(void); + /** * struct rdt_resource - attributes of an RDT resource * @enabled: Is this feature enabled on this machine @@ -68,6 +90,10 @@ struct msr_param { extern struct mutex rdtgroup_mutex; extern struct rdt_resource rdt_resources_all[]; +extern struct rdtgroup rdtgroup_default; +DECLARE_STATIC_KEY_FALSE(rdt_enable_key); + +int __init rdtgroup_init(void); enum { RDT_RESOURCE_L3, diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile index cf4bfd0..b4334e8 100644 --- a/arch/x86/kernel/cpu/Makefile +++ b/arch/x86/kernel/cpu/Makefile @@ -34,7 +34,7 @@ obj-$(CONFIG_CPU_SUP_CENTAUR) += centaur.o obj-$(CONFIG_CPU_SUP_TRANSMETA_32) += transmeta.o obj-$(CONFIG_CPU_SUP_UMC_32) += umc.o -obj-$(CONFIG_INTEL_RDT_A) += intel_rdt.o +obj-$(CONFIG_INTEL_RDT_A) += intel_rdt.o intel_rdt_rdtgroup.o obj-$(CONFIG_X86_MCE) += mcheck/ obj-$(CONFIG_MTRR) += mtrr/ diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c index 3d4b397..9d95414 100644 --- a/arch/x86/kernel/cpu/intel_rdt.c +++ b/arch/x86/kernel/cpu/intel_rdt.c @@ -361,7 +361,7 @@ static int intel_rdt_offline_cpu(unsigned int cpu) static int __init intel_rdt_late_init(void) { struct rdt_resource *r; - int state; + int state, ret; if (!get_rdt_resources()) return -ENODEV; @@ -372,6 +372,12 @@ static int __init intel_rdt_late_init(void) if (state < 0) return state; + ret = rdtgroup_init(); + if (ret) { + cpuhp_remove_state(state); + return ret; + } + for_each_capable_rdt_resource(r) pr_info("Intel RDT %s allocation detected\n", r->name); diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c new file mode 100644 index 000..106e4ce --- 
/dev/null +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c @@ -0,0 +1,271 @@ +/* + * User interface for Resource Alloction in Resource Director Technology(RDT) + * + * Copyright (C) 2016 Intel Corporation + * + * Author: Fenghua Yu + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
[tip:x86/cache] x86/intel_rdt: Add mkdir to resctrl file system
Commit-ID: 60cf5e101fd4441ab112a81e88726efb6fd7542c Gitweb: http://git.kernel.org/tip/60cf5e101fd4441ab112a81e88726efb6fd7542c Author: Fenghua Yu AuthorDate: Fri, 28 Oct 2016 15:04:44 -0700 Committer: Thomas Gleixner CommitDate: Sun, 30 Oct 2016 19:10:14 -0600 x86/intel_rdt: Add mkdir to resctrl file system Resource control groups are represented as directories in the resctrl file system. The root directory describes the default resources available to tasks that have not been assigned specific resources. Other directories can be created at the root level to make new resource groups. It is not permitted to make directories within other directories. Hardware uses a CLOSID (Class of service ID) to determine which resource limits are currently in effect. The exact number available is enumerated by CPUID leaf 0x10, but on current implementations it is a small number. We implement a simple bitmask allocator for CLOSIDs. Each resource control group uses one CLOSID, which limits the total number of directories that can be created. Resource groups can be removed using rmdir. Signed-off-by: Fenghua Yu Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: "Shaohua Li" Cc: "Sai Prakhya" Cc: "Peter Zijlstra" Cc: "Stephane Eranian" Cc: "Dave Hansen" Cc: "David Carrillo-Cisneros" Cc: "Nilay Vaish" Cc: "Vikas Shivappa" Cc: "Ingo Molnar" Cc: "Borislav Petkov" Cc: "H. 
Peter Anvin" Link: http://lkml.kernel.org/r/1477692289-37412-6-git-send-email-fenghua...@intel.com Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/intel_rdt.h | 9 ++ arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 231 +++ 2 files changed, 240 insertions(+) diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h index 5b7b3f6..8032ace 100644 --- a/arch/x86/include/asm/intel_rdt.h +++ b/arch/x86/include/asm/intel_rdt.h @@ -14,13 +14,20 @@ * @kn:kernfs node * @rdtgroup_list: linked list for all rdtgroups * @closid:closid for this rdtgroup + * @flags: status bits + * @waitcount: how many cpus expect to find this */ struct rdtgroup { struct kernfs_node *kn; struct list_headrdtgroup_list; int closid; + int flags; + atomic_twaitcount; }; +/* rdtgroup.flags */ +#defineRDT_DELETED 1 + /* List of all resource groups */ extern struct list_head rdt_all_groups; @@ -156,4 +163,6 @@ union cpuid_0x10_1_edx { }; void rdt_cbm_update(void *arg); +struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn); +void rdtgroup_kn_unlock(struct kernfs_node *kn); #endif /* _ASM_X86_INTEL_RDT_H */ diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c index fbb42e7..85d31ea 100644 --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c @@ -26,10 +26,12 @@ #include #include #include +#include #include #include +#include DEFINE_STATIC_KEY_FALSE(rdt_enable_key); struct kernfs_root *rdt_root; @@ -39,6 +41,55 @@ LIST_HEAD(rdt_all_groups); /* Kernel fs node for "info" directory under root */ static struct kernfs_node *kn_info; +/* + * Trivial allocator for CLOSIDs. Since h/w only supports a small number, + * we can keep a bitmap of free CLOSIDs in a single integer. + * + * Using a global CLOSID across all resources has some advantages and + * some drawbacks: + * + We can simply set "current->closid" to assign a task to a resource + * group. 
+ * + Context switch code can avoid extra memory references deciding which + * CLOSID to load into the PQR_ASSOC MSR + * - We give up some options in configuring resource groups across multi-socket + * systems. + * - Our choices on how to configure each resource become progressively more + * limited as the number of resources grows. + */ +static int closid_free_map; + +static void closid_init(void) +{ + struct rdt_resource *r; + int rdt_min_closid = 32; + + /* Compute rdt_min_closid across all resources */ + for_each_enabled_rdt_resource(r) + rdt_min_closid = min(rdt_min_closid, r->num_closid); + + closid_free_map = BIT_MASK(rdt_min_closid) - 1; + + /* CLOSID 0 is always reserved for the default group */ + closid_free_map &= ~1; +} + +int closid_alloc(void) +{ + int closid = ffs(closid_free_map); + + if (closid == 0) + return -ENOSPC; + closid--; + closid_free_map &= ~(1 << closid); + + return closid; +} + +static void closid_free(int closid) +{ + closid_free_map |= 1 << closid; +} + /* set uid and gid of rdtgroup dirs and files to that of the creator */ static int rdtgroup_kn_set_ugid(struct kernfs_node *kn) { @@ -287,6 +338,54 @@ static int parse_rdtgroupfs_options(char *data) return ret; } +
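The "trivial allocator for CLOSIDs" in the mkdir patch keeps the free CLOSIDs as bits in a single integer. A userspace approximation is sketched below so the behaviour can be checked in isolation. It is not the kernel code: the _sketch names are hypothetical, the minimum num_closid is passed in rather than computed over the enabled resources, and there is no locking (the kernel version is presumably serialized by rdtgroup_mutex).

```c
#include <assert.h>
#include <errno.h>
#include <strings.h>	/* ffs() */

/* Bit n set means CLOSID n is free. */
static unsigned int closid_free_map_sketch;

/*
 * Hypothetical stand-in for closid_init(): the caller supplies the
 * smallest num_closid over all resources (1..32).
 */
static void closid_init_sketch(unsigned int min_closid)
{
	assert(min_closid >= 1 && min_closid <= 32);

	/* 64-bit arithmetic avoids an out-of-range shift for 32. */
	closid_free_map_sketch = (unsigned int)((1ULL << min_closid) - 1);

	/* CLOSID 0 is always reserved for the default group. */
	closid_free_map_sketch &= ~1u;
}

/* Returns the lowest free CLOSID, or -ENOSPC when none is left. */
static int closid_alloc_sketch(void)
{
	int bit = ffs((int)closid_free_map_sketch);

	if (bit == 0)
		return -ENOSPC;
	bit--;				/* ffs() counts bits from 1 */
	closid_free_map_sketch &= ~(1u << bit);

	return bit;
}

static void closid_free_sketch(int closid)
{
	assert(closid > 0 && closid < 32);
	closid_free_map_sketch |= 1u << closid;
}
```

With min_closid set to 4, three allocations return 1, 2 and 3 and a fourth fails with -ENOSPC, which corresponds to the changelog's remark that the number of CLOSIDs limits how many directories can be created.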
[tip:x86/cache] x86/intel_rdt: Add "info" files to resctrl file system
Commit-ID:  4e978d06dedb8207b298a5a8a49fce4b2ab80d12
Gitweb:     http://git.kernel.org/tip/4e978d06dedb8207b298a5a8a49fce4b2ab80d12
Author:     Fenghua Yu
AuthorDate: Fri, 28 Oct 2016 15:04:43 -0700
Committer:  Thomas Gleixner
CommitDate: Sun, 30 Oct 2016 19:10:14 -0600

x86/intel_rdt: Add "info" files to resctrl file system

For the convenience of applications we make the decoded values of some
of the CPUID values available in read-only (0444) files.

Signed-off-by: Fenghua Yu
Cc: "Ravi V Shankar"
Cc: "Tony Luck"
Cc: "Shaohua Li"
Cc: "Sai Prakhya"
Cc: "Peter Zijlstra"
Cc: "Stephane Eranian"
Cc: "Dave Hansen"
Cc: "David Carrillo-Cisneros"
Cc: "Nilay Vaish"
Cc: "Vikas Shivappa"
Cc: "Ingo Molnar"
Cc: "Borislav Petkov"
Cc: "H. Peter Anvin"
Link: http://lkml.kernel.org/r/1477692289-37412-5-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner
---
 arch/x86/include/asm/intel_rdt.h         |  24
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 185 +++
 2 files changed, 209 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 09d00e6..5b7b3f6 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -27,6 +27,30 @@ extern struct list_head rdt_all_groups;
 int __init rdtgroup_init(void);
 /**
+ * struct rftype - describe each file in the resctrl file system
+ * @name: file name
+ * @mode: access mode
+ * @kf_ops: operations
+ * @seq_show: show content of the file
+ * @write: write to the file
+ */
+struct rftype {
+	char			*name;
+	umode_t			mode;
+	struct kernfs_ops	*kf_ops;
+
+	int (*seq_show)(struct kernfs_open_file *of,
+			struct seq_file *sf, void *v);
+	/*
+	 * write() is the generic write callback which maps directly to
+	 * kernfs write operation and overrides all other operations.
+	 * Maximum write size is determined by ->max_write_len.
+	 */
+	ssize_t (*write)(struct kernfs_open_file *of,
+			 char *buf, size_t nbytes, loff_t off);
+};
+
+/**
  * struct rdt_resource - attributes of an RDT resource
  * @enabled: Is this feature enabled on this machine
  * @capable: Is this feature available on this machine
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 106e4ce..fbb42e7 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -23,6 +23,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
@@ -34,6 +36,176 @@ struct kernfs_root *rdt_root;
 struct rdtgroup rdtgroup_default;
 LIST_HEAD(rdt_all_groups);
+/* Kernel fs node for "info" directory under root */
+static struct kernfs_node *kn_info;
+
+/* set uid and gid of rdtgroup dirs and files to that of the creator */
+static int rdtgroup_kn_set_ugid(struct kernfs_node *kn)
+{
+	struct iattr iattr = { .ia_valid = ATTR_UID | ATTR_GID,
+				.ia_uid = current_fsuid(),
+				.ia_gid = current_fsgid(), };
+
+	if (uid_eq(iattr.ia_uid, GLOBAL_ROOT_UID) &&
+	    gid_eq(iattr.ia_gid, GLOBAL_ROOT_GID))
+		return 0;
+
+	return kernfs_setattr(kn, &iattr);
+}
+
+static int rdtgroup_add_file(struct kernfs_node *parent_kn, struct rftype *rft)
+{
+	struct kernfs_node *kn;
+	int ret;
+
+	kn = __kernfs_create_file(parent_kn, rft->name, rft->mode,
+				  0, rft->kf_ops, rft, NULL, NULL);
+	if (IS_ERR(kn))
+		return PTR_ERR(kn);
+
+	ret = rdtgroup_kn_set_ugid(kn);
+	if (ret) {
+		kernfs_remove(kn);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int rdtgroup_add_files(struct kernfs_node *kn, struct rftype *rfts,
+			      int len)
+{
+	struct rftype *rft;
+	int ret;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	for (rft = rfts; rft < rfts + len; rft++) {
+		ret = rdtgroup_add_file(kn, rft);
+		if (ret)
+			goto error;
+	}
+
+	return 0;
+error:
+	pr_warn("Failed to add %s, err=%d\n", rft->name, ret);
+	while (--rft >= rfts)
+		kernfs_remove_by_name(kn, rft->name);
+	return ret;
+}
+
+static int rdtgroup_seqfile_show(struct seq_file *m, void *arg)
+{
+	struct kernfs_open_file *of = m->private;
+	struct rftype *rft = of->kn->priv;
+
+	if (rft->seq_show)
+		return rft->seq_show(of, m, arg);
+	return 0;
+}
+
+static ssize_t rdtgroup_file_write(struct kernfs_open_file *of, char *buf,
+				   size_t nbytes, loff_t off)
+{
+	struct rftype *rft = of->kn->priv;
+
+	if (rft->write)
+		return rft->write(of, buf, nbytes, off);
+
+
[tip:x86/cache] Documentation, x86: Documentation for Intel resource allocation user interface
Commit-ID:  f20e57892806ad244eaec7a7ae365e78fee53377
Gitweb:     http://git.kernel.org/tip/f20e57892806ad244eaec7a7ae365e78fee53377
Author:     Fenghua Yu
AuthorDate: Fri, 28 Oct 2016 15:04:40 -0700
Committer:  Thomas Gleixner
CommitDate: Sun, 30 Oct 2016 19:10:13 -0600

Documentation, x86: Documentation for Intel resource allocation user interface

The documentation describes the user interface for allocating resources
in Intel RDT.

Please note that the documentation covers the generic user interface.
The current patch set only implements CAT L3. CAT L2 code will be sent
later.

[ tglx: Added cpu example ]

Signed-off-by: Fenghua Yu
Cc: "Ravi V Shankar"
Cc: "Tony Luck"
Cc: "Shaohua Li"
Cc: "Sai Prakhya"
Cc: "Peter Zijlstra"
Cc: "Stephane Eranian"
Cc: "Dave Hansen"
Cc: "David Carrillo-Cisneros"
Cc: "Nilay Vaish"
Cc: "Vikas Shivappa"
Cc: "Ingo Molnar"
Cc: "Borislav Petkov"
Cc: "H. Peter Anvin"
Link: http://lkml.kernel.org/r/1477692289-37412-2-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner
---
 Documentation/x86/intel_rdt_ui.txt | 195 +
 1 file changed, 195 insertions(+)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
new file mode 100644
index 0000000..3b0ebd4
--- /dev/null
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -0,0 +1,195 @@
+User Interface for Resource Allocation in Intel Resource Director Technology
+
+Copyright (C) 2016 Intel Corporation
+
+Fenghua Yu
+Tony Luck
+
+This feature is enabled by the CONFIG_INTEL_RDT_A Kconfig and the
+X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3".
+
+To use the feature mount the file system:
+
+ # mount -t resctrl resctrl [-o cdp] /sys/fs/resctrl
+
+mount options are:
+
+"cdp": Enable code/data prioritization in L3 cache allocations.
+
+
+Resource groups
+---------------
+Resource groups are represented as directories in the resctrl file
+system. The default group is the root directory. Other groups may be
+created as desired by the system administrator using the "mkdir(1)"
+command, and removed using "rmdir(1)".
+
+There are three files associated with each group:
+
+"tasks": A list of tasks that belongs to this group. Tasks can be
+	added to a group by writing the task ID to the "tasks" file
+	(which will automatically remove them from the previous
+	group to which they belonged). New tasks created by fork(2)
+	and clone(2) are added to the same group as their parent.
+	If a pid is not in any sub partition, it is in root partition
+	(i.e. default partition).
+
+"cpus": A bitmask of logical CPUs assigned to this group. Writing
+	a new mask can add/remove CPUs from this group. Added CPUs
+	are removed from their previous group. Removed ones are
+	given to the default (root) group. You cannot remove CPUs
+	from the default group.
+
+"schemata": A list of all the resources available to this group.
+	Each resource has its own line and format - see below for
+	details.
+
+When a task is running the following rules define which resources
+are available to it:
+
+1) If the task is a member of a non-default group, then the schemata
+for that group is used.
+
+2) Else if the task belongs to the default group, but is running on a
+CPU that is assigned to some specific group, then the schemata for
+the CPU's group is used.
+
+3) Otherwise the schemata for the default group is used.
+
+
+Schemata files - general concepts
+---------------------------------
+Each line in the file describes one resource. The line starts with
+the name of the resource, followed by specific values to be applied
+in each of the instances of that resource on the system.
+
+Cache IDs
+---------
+On current generation systems there is one L3 cache per socket and L2
+caches are generally just shared by the hyperthreads on a core, but this
+isn't an architectural requirement. We could have multiple separate L3
+caches on a socket, multiple cores could share an L2 cache. So instead
+of using "socket" or "core" to define the set of logical cpus sharing
+a resource we use a "Cache ID". At a given cache level this will be a
+unique number across the whole system (but it isn't guaranteed to be a
+contiguous sequence, there may be gaps). To find the ID for each logical
+CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id
+
+Cache Bit Masks (CBM)
+---------------------
+For cache resources we describe the portion of the cache that is available
+for allocation using a bitmask. The maximum value of the mask is defined
+by each cpu model (and may be different for different cache levels). It
+is found using CPUID, but is also provided in the "info" directory of
+the resctrl file system in "info/{resource}/cbm_mask". X86 hardware
+requires that these masks have all the '1' bits in a contiguous block. So
+0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
+and 0xA are not. On a system with a 20-bit m
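Note: the contiguity rule for cache bit masks quoted above can be checked with a few bit operations. The following is a sketch only; the function name cbm_is_valid() is made up for illustration, and rejecting an all-zero mask is an assumption (the excerpt only says that the '1' bits must form one contiguous block).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if cbm fits inside max_cbm and all of its
 * '1' bits form a single contiguous block. A zero mask is rejected
 * (assumption, see the note above).
 */
bool cbm_is_valid(uint32_t cbm, uint32_t max_cbm)
{
	uint32_t low;

	if (cbm == 0 || (cbm & ~max_cbm))
		return false;

	/* Isolate the lowest set bit. */
	low = cbm & (~cbm + 1);

	/*
	 * Adding the lowest set bit carries through a contiguous run of
	 * ones and clears it completely. Any bit of cbm that survives
	 * the addition belongs to a second, separate run.
	 */
	return ((cbm + low) & cbm) == 0;
}
```

For a 4-bit mask (max_cbm 0xF) this accepts 0x3, 0x6 and 0xC and rejects 0x5, 0x9 and 0xA, matching the examples in the documentation text.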
[tip:x86/cache] x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID
Commit-ID:  c1c7c3f9d6bb6999a45f66ea4c6bfbcab87ff34b
Gitweb:     http://git.kernel.org/tip/c1c7c3f9d6bb6999a45f66ea4c6bfbcab87ff34b
Author:     Fenghua Yu
AuthorDate: Sat, 22 Oct 2016 06:19:55 -0700
Committer:  Thomas Gleixner
CommitDate: Wed, 26 Oct 2016 23:12:39 +0200

x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID

Define struct rdt_resource to hold all the parameterized values for an RDT
resource and fill in the CPUID enumerated values from leaf 0x10 if
available. Hard code them for the MSR detected Haswells.

Signed-off-by: Fenghua Yu
Cc: "Ravi V Shankar"
Cc: "Tony Luck"
Cc: "David Carrillo-Cisneros"
Cc: "Sai Prakhya"
Cc: "Peter Zijlstra"
Cc: "Stephane Eranian"
Cc: "Dave Hansen"
Cc: "Shaohua Li"
Cc: "Nilay Vaish"
Cc: "Vikas Shivappa"
Cc: "Ingo Molnar"
Cc: "Borislav Petkov"
Cc: "H. Peter Anvin"
Link: http://lkml.kernel.org/r/1477142405-32078-9-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner
---
 arch/x86/include/asm/intel_rdt.h |  68
 arch/x86/kernel/cpu/intel_rdt.c  | 111 ---
 2 files changed, 172 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 3aca86d..9780409 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -2,5 +2,73 @@
 #define _ASM_X86_INTEL_RDT_H
 #define IA32_L3_CBM_BASE	0xc90
+#define IA32_L2_CBM_BASE	0xd10
+/**
+ * struct rdt_resource - attributes of an RDT resource
+ * @enabled:		Is this feature enabled on this machine
+ * @capable:		Is this feature available on this machine
+ * @name:		Name to use in "schemata" file
+ * @num_closid:		Number of CLOSIDs available
+ * @max_cbm:		Largest Cache Bit Mask allowed
+ * @min_cbm_bits:	Minimum number of consecutive bits to be set
+ *			in a cache bit mask
+ * @domains:		All domains for this resource
+ * @num_domains:	Number of domains active
+ * @msr_base:		Base MSR address for CBMs
+ * @tmp_cbms:		Scratch space when updating schemata
+ * @cache_level:	Which cache level defines scope of this domain
+ * @cbm_idx_multi:	Multiplier of CBM index
+ * @cbm_idx_offset:	Offset of CBM index. CBM index is computed by:
+ *			closid * cbm_idx_multi + cbm_idx_offset
+ */
+struct rdt_resource {
+	bool			enabled;
+	bool			capable;
+	char			*name;
+	int			num_closid;
+	int			cbm_len;
+	int			min_cbm_bits;
+	u32			max_cbm;
+	struct list_head	domains;
+	int			num_domains;
+	int			msr_base;
+	u32			*tmp_cbms;
+	int			cache_level;
+	int			cbm_idx_multi;
+	int			cbm_idx_offset;
+};
+
+extern struct rdt_resource rdt_resources_all[];
+
+enum {
+	RDT_RESOURCE_L3,
+	RDT_RESOURCE_L3DATA,
+	RDT_RESOURCE_L3CODE,
+	RDT_RESOURCE_L2,
+
+	/* Must be the last */
+	RDT_NUM_RESOURCES,
+};
+
+#define for_each_capable_rdt_resource(r)				      \
+	for (r = rdt_resources_all; r < rdt_resources_all + RDT_NUM_RESOURCES;\
+	     r++)							      \
+		if (r->capable)
+
+/* CPUID.(EAX=10H, ECX=ResID=1).EAX */
+union cpuid_0x10_1_eax {
+	struct {
+		unsigned int cbm_len:5;
+	} split;
+	unsigned int full;
+};
+
+/* CPUID.(EAX=10H, ECX=ResID=1).EDX */
+union cpuid_0x10_1_edx {
+	struct {
+		unsigned int cos_max:16;
+	} split;
+	unsigned int full;
+};
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index f8e35cf..157dc8d0 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -31,6 +31,47 @@
 #include
 #include
+#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].domains)
+
+struct rdt_resource rdt_resources_all[] = {
+	{
+		.name		= "L3",
+		.domains	= domain_init(RDT_RESOURCE_L3),
+		.msr_base	= IA32_L3_CBM_BASE,
+		.min_cbm_bits	= 1,
+		.cache_level	= 3,
+		.cbm_idx_multi	= 1,
+		.cbm_idx_offset	= 0
+	},
+	{
+		.name		= "L3DATA",
+		.domains	= domain_init(RDT_RESOURCE_L3DATA),
+		.msr_base	= IA32_L3_CBM_BASE,
+		.min_cbm_bits	= 1,
+		.cache_level	= 3,
+		.cbm_
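Note: the kernel-doc above gives the CBM index as closid * cbm_idx_multi + cbm_idx_offset. The sketch below works that formula through in plain C. Two things are assumptions, not taken from the excerpt: that the MSR address is msr_base plus the index, and the multiplier/offset pair (2, 1) used in the usage note, since the excerpt is cut off before the L3DATA and L3CODE values appear. struct cbm_layout is a made-up cut-down of struct rdt_resource.

```c
#include <assert.h>

/* Hypothetical cut-down of struct rdt_resource: only the indexing fields. */
struct cbm_layout {
	unsigned int	msr_base;
	int		cbm_idx_multi;
	int		cbm_idx_offset;
};

/* CBM index as described in the kernel-doc comment. */
unsigned int cbm_idx(const struct cbm_layout *r, int closid)
{
	assert(closid >= 0);
	return (unsigned int)(closid * r->cbm_idx_multi + r->cbm_idx_offset);
}

/* Assumption: the CBM MSR for a CLOSID sits at msr_base + index. */
unsigned int cbm_msr(const struct cbm_layout *r, int closid)
{
	return r->msr_base + cbm_idx(r, closid);
}
```

With the "L3" entry shown above (base 0xc90, multiplier 1, offset 0), CLOSID 3 maps to 0xc93. A layout that interleaves two masks per CLOSID, e.g. multiplier 2 and offset 1, would put CLOSID 3 at index 7.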
[tip:x86/cache] x86/cqm: Share PQR_ASSOC related data between CQM and CAT
Commit-ID:  6b281569df649ed76145c527028fbbe8a32493aa
Gitweb:     http://git.kernel.org/tip/6b281569df649ed76145c527028fbbe8a32493aa
Author:     Fenghua Yu
AuthorDate: Sat, 22 Oct 2016 06:19:56 -0700
Committer:  Thomas Gleixner
CommitDate: Wed, 26 Oct 2016 23:12:39 +0200

x86/cqm: Share PQR_ASSOC related data between CQM and CAT

PQR_ASSOC MSR contains the RMID used for performance monitoring of cache
occupancy and memory bandwidth. The upper 32 bits of this MSR contain the
CLOSID for cache allocation. So we need to share the information between
the two facilities.

Move the rdt data structure declaration into the shared header file and
make the per cpu data structure containing the MSR values global.

Signed-off-by: Fenghua Yu
Cc: "Ravi V Shankar"
Cc: "Tony Luck"
Cc: "David Carrillo-Cisneros"
Cc: "Sai Prakhya"
Cc: "Peter Zijlstra"
Cc: "Stephane Eranian"
Cc: "Dave Hansen"
Cc: "Shaohua Li"
Cc: "Nilay Vaish"
Cc: "Vikas Shivappa"
Cc: "Ingo Molnar"
Cc: "Borislav Petkov"
Cc: "H. Peter Anvin"
Link: http://lkml.kernel.org/r/1477142405-32078-10-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner
---
 arch/x86/events/intel/cqm.c             | 21 +
 arch/x86/include/asm/intel_rdt_common.h | 21 +
 2 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index df86874..0c45cc8 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -24,32 +24,13 @@ static unsigned int cqm_l3_scale; /* supposedly cacheline size */
 static bool cqm_enabled, mbm_enabled;
 unsigned int mbm_socket_max;
-/**
- * struct intel_pqr_state - State cache for the PQR MSR
- * @rmid:		The cached Resource Monitoring ID
- * @closid:		The cached Class Of Service ID
- * @rmid_usecnt:	The usage counter for rmid
- *
- * The upper 32 bits of MSR_IA32_PQR_ASSOC contain closid and the
- * lower 10 bits rmid. The update to MSR_IA32_PQR_ASSOC always
- * contains both parts, so we need to cache them.
- *
- * The cache also helps to avoid pointless updates if the value does
- * not change.
- */
-struct intel_pqr_state {
-	u32			rmid;
-	u32			closid;
-	int			rmid_usecnt;
-};
-
 /*
  * The cached intel_pqr_state is strictly per CPU and can never be
  * updated from a remote CPU. Both functions which modify the state
  * (intel_cqm_event_start and intel_cqm_event_stop) are called with
  * interrupts disabled, which is sufficient for the protection.
  */
-static DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
+DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
 static struct hrtimer *mbm_timers;
 /**
  * struct sample - mbm event's (local or total) data
diff --git a/arch/x86/include/asm/intel_rdt_common.h b/arch/x86/include/asm/intel_rdt_common.h
index e6e15cf..b31081b 100644
--- a/arch/x86/include/asm/intel_rdt_common.h
+++ b/arch/x86/include/asm/intel_rdt_common.h
@@ -3,4 +3,25 @@
 #define MSR_IA32_PQR_ASSOC	0x0c8f
+/**
+ * struct intel_pqr_state - State cache for the PQR MSR
+ * @rmid:		The cached Resource Monitoring ID
+ * @closid:		The cached Class Of Service ID
+ * @rmid_usecnt:	The usage counter for rmid
+ *
+ * The upper 32 bits of MSR_IA32_PQR_ASSOC contain closid and the
+ * lower 10 bits rmid. The update to MSR_IA32_PQR_ASSOC always
+ * contains both parts, so we need to cache them.
+ *
+ * The cache also helps to avoid pointless updates if the value does
+ * not change.
+ */
+struct intel_pqr_state {
+	u32			rmid;
+	u32			closid;
+	int			rmid_usecnt;
+};
+
+DECLARE_PER_CPU(struct intel_pqr_state, pqr_state);
+
 #endif /* _ASM_X86_INTEL_RDT_COMMON_H */
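Note: the layout described in the struct comment (closid in the upper 32 bits, rmid in the lower 10 bits) explains why both values have to be cached together: every write to the MSR carries both halves. The following is a userspace sketch of packing and unpacking that value. The helper names and PQR_RMID_MASK are made up for illustration, and no MSR is actually written.

```c
#include <stdint.h>

/* Lower 10 bits hold the RMID, per the struct comment above. */
#define PQR_RMID_MASK	0x3ffu

/* Hypothetical helper: build the 64-bit value a PQR_ASSOC write would carry. */
uint64_t pqr_assoc_value(uint32_t rmid, uint32_t closid)
{
	return ((uint64_t)closid << 32) | (rmid & PQR_RMID_MASK);
}

/* Hypothetical helpers: split a PQR_ASSOC value back into its two parts. */
uint32_t pqr_rmid(uint64_t val)
{
	return (uint32_t)(val & PQR_RMID_MASK);
}

uint32_t pqr_closid(uint64_t val)
{
	return (uint32_t)(val >> 32);
}
```

For rmid 5 and closid 2 the packed value is 0x200000005. Changing only the closid still requires knowing the current rmid, which is what the per-CPU pqr_state cache provides.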
[tip:x86/cache] x86/intel_rdt: Add CONFIG, Makefile, and basic initialization
Commit-ID:  78e99b4a2b9afb1c304259fcd4a1c71ca97e3acd
Gitweb:     http://git.kernel.org/tip/78e99b4a2b9afb1c304259fcd4a1c71ca97e3acd
Author:     Fenghua Yu
AuthorDate: Sat, 22 Oct 2016 06:19:53 -0700
Committer:  Thomas Gleixner
CommitDate: Wed, 26 Oct 2016 23:12:38 +0200

x86/intel_rdt: Add CONFIG, Makefile, and basic initialization

Introduce CONFIG_INTEL_RDT_A (default: no, dependent on CPU_SUP_INTEL) to
control inclusion of Resource Director Technology in the build.

The simple init() routine just checks which features are present. If any
are, it prints a one-line pr_info() summary for each feature for now.

Signed-off-by: Fenghua Yu
Cc: "Ravi V Shankar"
Cc: "Tony Luck"
Cc: "David Carrillo-Cisneros"
Cc: "Sai Prakhya"
Cc: "Peter Zijlstra"
Cc: "Stephane Eranian"
Cc: "Dave Hansen"
Cc: "Shaohua Li"
Cc: "Nilay Vaish"
Cc: "Vikas Shivappa"
Cc: "Ingo Molnar"
Cc: "Borislav Petkov"
Cc: "H. Peter Anvin"
Link: http://lkml.kernel.org/r/1477142405-32078-7-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner
---
 arch/x86/Kconfig                | 12 +
 arch/x86/kernel/cpu/Makefile    |  2 ++
 arch/x86/kernel/cpu/intel_rdt.c | 54 +
 3 files changed, 68 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bada636..770fb5f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -407,6 +407,18 @@ config GOLDFISH
 	def_bool y
 	depends on X86_GOLDFISH
+config INTEL_RDT_A
+	bool "Intel Resource Director Technology Allocation support"
+	default n
+	depends on X86 && CPU_SUP_INTEL
+	help
+	  Select to enable resource allocation which is a sub-feature of
+	  Intel Resource Director Technology(RDT). More information about
+	  RDT can be found in the Intel x86 Architecture Software
+	  Developer Manual.
+
+	  Say N if unsure.
+
 if X86_32
 config X86_EXTENDED_PLATFORM
 	bool "Support for extended (non-PC) x86 platforms"
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 4a8697f..cf4bfd0 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -34,6 +34,8 @@ obj-$(CONFIG_CPU_SUP_CENTAUR)		+= centaur.o
 obj-$(CONFIG_CPU_SUP_TRANSMETA_32)	+= transmeta.o
 obj-$(CONFIG_CPU_SUP_UMC_32)		+= umc.o
+obj-$(CONFIG_INTEL_RDT_A)		+= intel_rdt.o
+
 obj-$(CONFIG_X86_MCE)			+= mcheck/
 obj-$(CONFIG_MTRR)			+= mtrr/
 obj-$(CONFIG_MICROCODE)			+= microcode/
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
new file mode 100644
index 0000000..7d7aebe
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -0,0 +1,54 @@
+/*
+ * Resource Director Technology(RDT)
+ * - Cache Allocation code.
+ *
+ * Copyright (C) 2016 Intel Corporation
+ *
+ * Authors:
+ *    Fenghua Yu
+ *    Tony Luck
+ *    Vikas Shivappa
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * More information about RDT be found in the Intel (R) x86 Architecture
+ * Software Developer Manual June 2016, volume 3, section 17.17.
+ */
+
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+#include
+#include
+
+static inline bool get_rdt_resources(void)
+{
+	bool ret = false;
+
+	if (!boot_cpu_has(X86_FEATURE_RDT_A))
+		return false;
+	if (boot_cpu_has(X86_FEATURE_CAT_L3))
+		ret = true;
+
+	return ret;
+}
+
+static int __init intel_rdt_late_init(void)
+{
+	if (!get_rdt_resources())
+		return -ENODEV;
+
+	pr_info("Intel RDT cache allocation detected\n");
+	if (boot_cpu_has(X86_FEATURE_CDP_L3))
+		pr_info("Intel RDT code data prioritization detected\n");
+
+	return 0;
+}
+
+late_initcall(intel_rdt_late_init);
[tip:x86/cache] x86/intel_rdt: Add Haswell feature discovery
Commit-ID:  113c60970cf41723891e3a1b303517eaf8510bb5
Gitweb:     http://git.kernel.org/tip/113c60970cf41723891e3a1b303517eaf8510bb5
Author:     Fenghua Yu
AuthorDate: Sat, 22 Oct 2016 06:19:54 -0700
Committer:  Thomas Gleixner
CommitDate: Wed, 26 Oct 2016 23:12:38 +0200

x86/intel_rdt: Add Haswell feature discovery

Some Haswell generation CPUs support RDT, but they don't enumerate this via
CPUID. Use rdmsr_safe() and wrmsr_safe() to probe the MSRs on cpu model 63
(INTEL_FAM6_HASWELL_X)

Move the relevant defines into a common header file which is shared between
RDT/CQM and RDT/Allocation to avoid duplication.

Signed-off-by: Fenghua Yu
Cc: "Ravi V Shankar"
Cc: "Tony Luck"
Cc: "David Carrillo-Cisneros"
Cc: "Sai Prakhya"
Cc: "Peter Zijlstra"
Cc: "Stephane Eranian"
Cc: "Dave Hansen"
Cc: "Shaohua Li"
Cc: "Nilay Vaish"
Cc: "Vikas Shivappa"
Cc: "Ingo Molnar"
Cc: "Borislav Petkov"
Cc: "H. Peter Anvin"
Link: http://lkml.kernel.org/r/1477142405-32078-8-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner
---
 arch/x86/events/intel/cqm.c             |  2 +-
 arch/x86/include/asm/intel_rdt.h        |  6
 arch/x86/include/asm/intel_rdt_common.h |  6
 arch/x86/kernel/cpu/intel_rdt.c         | 49 ++---
 4 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index 8f82b02..df86874 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -7,9 +7,9 @@
 #include
 #include
 #include
+#include
 #include "../perf_event.h"
-#define MSR_IA32_PQR_ASSOC	0x0c8f
 #define MSR_IA32_QM_CTR		0x0c8e
 #define MSR_IA32_QM_EVTSEL	0x0c8d
diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
new file mode 100644
index 0000000..3aca86d
--- /dev/null
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -0,0 +1,6 @@
+#ifndef _ASM_X86_INTEL_RDT_H
+#define _ASM_X86_INTEL_RDT_H
+
+#define IA32_L3_CBM_BASE	0xc90
+
+#endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/include/asm/intel_rdt_common.h b/arch/x86/include/asm/intel_rdt_common.h
new file mode 100644
index 0000000..e6e15cf
--- /dev/null
+++ b/arch/x86/include/asm/intel_rdt_common.h
@@ -0,0 +1,6 @@
+#ifndef _ASM_X86_INTEL_RDT_COMMON_H
+#define _ASM_X86_INTEL_RDT_COMMON_H
+
+#define MSR_IA32_PQR_ASSOC	0x0c8f
+
+#endif /* _ASM_X86_INTEL_RDT_COMMON_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 7d7aebe..f8e35cf 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -27,16 +27,57 @@
 #include
 #include
+#include
+#include
+#include
+
+/*
+ * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
+ * as they do not have CPUID enumeration support for Cache allocation.
+ * The check for Vendor/Family/Model is not enough to guarantee that
+ * the MSRs won't #GP fault because only the following SKUs support
+ * CAT:
+ *	Intel(R) Xeon(R) CPU E5-2658 v3 @ 2.20GHz
+ *	Intel(R) Xeon(R) CPU E5-2648L v3 @ 1.80GHz
+ *	Intel(R) Xeon(R) CPU E5-2628L v3 @ 2.00GHz
+ *	Intel(R) Xeon(R) CPU E5-2618L v3 @ 2.30GHz
+ *	Intel(R) Xeon(R) CPU E5-2608L v3 @ 2.00GHz
+ *	Intel(R) Xeon(R) CPU E5-2658A v3 @ 2.20GHz
+ *
+ * Probe by trying to write the first of the L3 cach mask registers
+ * and checking that the bits stick. Max CLOSids is always 4 and max cbm length
+ * is always 20 on hsw server parts. The minimum cache bitmask length
+ * allowed for HSW server is always 2 bits. Hardcode all of them.
+ */
+static inline bool cache_alloc_hsw_probe(void)
+{
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
+	    boot_cpu_data.x86 == 6 &&
+	    boot_cpu_data.x86_model == INTEL_FAM6_HASWELL_X) {
+		u32 l, h, max_cbm = BIT_MASK(20) - 1;
+
+		if (wrmsr_safe(IA32_L3_CBM_BASE, max_cbm, 0))
+			return false;
+		rdmsr(IA32_L3_CBM_BASE, l, h);
+
+		/* If all the bits were set in MSR, return success */
+		return l == max_cbm;
+	}
+
+	return false;
+}
+
 static inline bool get_rdt_resources(void)
 {
-	bool ret = false;
+	if (cache_alloc_hsw_probe())
+		return true;
 	if (!boot_cpu_has(X86_FEATURE_RDT_A))
 		return false;
-	if (boot_cpu_has(X86_FEATURE_CAT_L3))
-		ret = true;
+	if (!boot_cpu_has(X86_FEATURE_CAT_L3))
+		return false;
-	return ret;
+	return true;
 }
 static int __init intel_rdt_late_init(void)
[tip:x86/cache] x86/cpufeature: Add RDT CPUID feature bits
Commit-ID:  4ab1586488cb56ed8728e54c4157cc38646874d9
Gitweb:     http://git.kernel.org/tip/4ab1586488cb56ed8728e54c4157cc38646874d9
Author:     Fenghua Yu
AuthorDate: Sat, 22 Oct 2016 06:19:51 -0700
Committer:  Thomas Gleixner
CommitDate: Wed, 26 Oct 2016 23:12:38 +0200

x86/cpufeature: Add RDT CPUID feature bits

Check CPUID leaves for all the Resource Director Technology (RDT)
Cache Allocation Technology (CAT) bits.

Presence of allocation features:
  CPUID.(EAX=7H, ECX=0):EBX[bit 15]	X86_FEATURE_RDT_A

L2 and L3 caches are each separately enabled:
  CPUID.(EAX=10H, ECX=0):EBX[bit 1]	X86_FEATURE_CAT_L3
  CPUID.(EAX=10H, ECX=0):EBX[bit 2]	X86_FEATURE_CAT_L2

L3 cache may support independent control of allocation for
code and data (CDP = Code/Data Prioritization):
  CPUID.(EAX=10H, ECX=1):ECX[bit 2]	X86_FEATURE_CDP_L3

[ tglx: Fixed up Borislav's comments and moved the feature bits into a gap ]

Signed-off-by: Fenghua Yu
Acked-by: "Borislav Petkov"
Cc: "Ravi V Shankar"
Cc: "Tony Luck"
Cc: "David Carrillo-Cisneros"
Cc: "Sai Prakhya"
Cc: "Peter Zijlstra"
Cc: "Stephane Eranian"
Cc: "Dave Hansen"
Cc: "Shaohua Li"
Cc: "Nilay Vaish"
Cc: "Vikas Shivappa"
Cc: "Ingo Molnar"
Cc: "H. Peter Anvin"
Link: http://lkml.kernel.org/r/1477142405-32078-5-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner
---
 arch/x86/include/asm/cpufeatures.h | 4
 arch/x86/kernel/cpu/scattered.c    | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index a396292..90b8c0b 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -189,6 +189,9 @@
 #define X86_FEATURE_CPB		( 7*32+ 2) /* AMD Core Performance Boost */
 #define X86_FEATURE_EPB		( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */
+#define X86_FEATURE_CAT_L3	( 7*32+ 4) /* Cache Allocation Technology L3 */
+#define X86_FEATURE_CAT_L2	( 7*32+ 5) /* Cache Allocation Technology L2 */
+#define X86_FEATURE_CDP_L3	( 7*32+ 6) /* Code and Data Prioritization L3 */
 #define X86_FEATURE_HW_PSTATE	( 7*32+ 8) /* AMD HW-PState */
 #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
@@ -221,6 +224,7 @@
 #define X86_FEATURE_RTM		( 9*32+11) /* Restricted Transactional Memory */
 #define X86_FEATURE_CQM		( 9*32+12) /* Cache QoS Monitoring */
 #define X86_FEATURE_MPX		( 9*32+14) /* Memory Protection Extension */
+#define X86_FEATURE_RDT_A	( 9*32+15) /* Resource Director Technology Allocation */
 #define X86_FEATURE_AVX512F	( 9*32+16) /* AVX-512 Foundation */
 #define X86_FEATURE_AVX512DQ	( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */
 #define X86_FEATURE_RDSEED	( 9*32+18) /* The RDSEED instruction */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 1db8dc4..49fb680 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -36,6 +36,9 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
 		{ X86_FEATURE_AVX512_4FMAPS,	CR_EDX, 3, 0x00000007, 0 },
 		{ X86_FEATURE_APERFMPERF,	CR_ECX, 0, 0x00000006, 0 },
 		{ X86_FEATURE_EPB,		CR_ECX, 3, 0x00000006, 0 },
+		{ X86_FEATURE_CAT_L3,		CR_EBX, 1, 0x00000010, 0 },
+		{ X86_FEATURE_CAT_L2,		CR_EBX, 2, 0x00000010, 0 },
+		{ X86_FEATURE_CDP_L3,		CR_ECX, 2, 0x00000010, 1 },
 		{ X86_FEATURE_HW_PSTATE,	CR_EDX, 7, 0x80000007, 0 },
 		{ X86_FEATURE_CPB,		CR_EDX, 9, 0x80000007, 0 },
 		{ X86_FEATURE_PROC_FEEDBACK,	CR_EDX,11, 0x80000007, 0 },
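Note: each row added to scattered.c names a register, a bit, a CPUID leaf and a sub-leaf. The sketch below mirrors that table shape in userspace and tests one bit of an already-captured register set. It does not execute CPUID; struct cpuid_bit, rdt_bits[] and cpuid_bit_set() are made-up names for illustration, and the rows are copied from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

enum cpuid_reg { CR_EAX, CR_EBX, CR_ECX, CR_EDX };

/* Hypothetical userspace mirror of one scattered-feature table row. */
struct cpuid_bit {
	const char	*name;
	enum cpuid_reg	reg;
	unsigned int	bit;
	uint32_t	leaf;
	uint32_t	subleaf;
};

/* Rows taken from the commit message: leaf 0x10, sub-leaf 0 or 1. */
const struct cpuid_bit rdt_bits[] = {
	{ "cat_l3", CR_EBX, 1, 0x10, 0 },
	{ "cat_l2", CR_EBX, 2, 0x10, 0 },
	{ "cdp_l3", CR_ECX, 2, 0x10, 1 },
};

/*
 * regs[] holds EAX, EBX, ECX, EDX as returned for (cb->leaf, cb->subleaf).
 * Fetching them is left to the caller.
 */
bool cpuid_bit_set(const struct cpuid_bit *cb, const uint32_t regs[4])
{
	return ((regs[cb->reg] >> cb->bit) & 1u) != 0;
}
```

With EBX = 0x2 for leaf 0x10, sub-leaf 0, the "cat_l3" row tests true and "cat_l2" tests false.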
[tip:x86/cache] x86/intel_cacheinfo: Enable cache id in cache info
Commit-ID:  d57e3ab7e34c51a8badeea1b500bfb738d0af66e
Gitweb:     http://git.kernel.org/tip/d57e3ab7e34c51a8badeea1b500bfb738d0af66e
Author:     Fenghua Yu
AuthorDate: Sat, 22 Oct 2016 06:19:50 -0700
Committer:  Thomas Gleixner
CommitDate: Wed, 26 Oct 2016 23:12:37 +0200

x86/intel_cacheinfo: Enable cache id in cache info

Cache id is retrieved from APIC ID and CPUID leaf 4 on x86.

For more details please see the section on "Cache ID Extraction
Parameters" in "Intel 64 Architecture Processor Topology Enumeration".

See also the documentation of the CPUID instruction in the "Intel 64 and
IA-32 Architectures Software Developer's Manual".

Signed-off-by: Fenghua Yu
Cc: "Ravi V Shankar"
Cc: "Tony Luck"
Cc: "David Carrillo-Cisneros"
Cc: "Sai Prakhya"
Cc: "Peter Zijlstra"
Cc: "Stephane Eranian"
Cc: "Dave Hansen"
Cc: "Shaohua Li"
Cc: "Nilay Vaish"
Cc: "Vikas Shivappa"
Cc: "Ingo Molnar"
Cc: "Borislav Petkov"
Cc: "H. Peter Anvin"
Link: http://lkml.kernel.org/r/1477142405-32078-4-git-send-email-fenghua...@intel.com
Signed-off-by: Thomas Gleixner
---
 arch/x86/kernel/cpu/intel_cacheinfo.c | 20
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
index de6626c..8dc5720 100644
--- a/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -153,6 +153,7 @@ struct _cpuid4_info_regs {
 	union _cpuid4_leaf_eax eax;
 	union _cpuid4_leaf_ebx ebx;
 	union _cpuid4_leaf_ecx ecx;
+	unsigned int id;
 	unsigned long size;
 	struct amd_northbridge *nb;
 };
@@ -894,6 +895,8 @@ static void __cache_cpumap_setup(unsigned int cpu, int index,
 static void ci_leaf_init(struct cacheinfo *this_leaf,
 			 struct _cpuid4_info_regs *base)
 {
+	this_leaf->id = base->id;
+	this_leaf->attributes = CACHE_ID;
 	this_leaf->level = base->eax.split.level;
 	this_leaf->type = cache_type_map[base->eax.split.type];
 	this_leaf->coherency_line_size =
@@ -920,6 +923,22 @@ static int __init_cache_level(unsigned int cpu)
 	return 0;
 }
+/*
+ * The max shared threads number comes from CPUID.4:EAX[25-14] with input
+ * ECX as cache index. Then right shift apicid by the number's order to get
+ * cache id for this cache node.
+ */
+static void get_cache_id(int cpu, struct _cpuid4_info_regs *id4_regs)
+{
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	unsigned long num_threads_sharing;
+	int index_msb;
+
+	num_threads_sharing = 1 + id4_regs->eax.split.num_threads_sharing;
+	index_msb = get_count_order(num_threads_sharing);
+	id4_regs->id = c->apicid >> index_msb;
+}
+
 static int __populate_cache_leaves(unsigned int cpu)
 {
 	unsigned int idx, ret;
@@ -931,6 +950,7 @@ static int __populate_cache_leaves(unsigned int cpu)
 		ret = cpuid4_cache_lookup_regs(idx, &id4_regs);
 		if (ret)
 			return ret;
+		get_cache_id(cpu, &id4_regs);
 		ci_leaf_init(this_leaf++, &id4_regs);
 		__cache_cpumap_setup(cpu, idx, &id4_regs);
 	}
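Note: the cache id computation in get_cache_id() boils down to a shift of the APIC ID. The following is a minimal userspace sketch of it. count_order() is a stand-in written for this example (it rounds up to the next power-of-two order, which is what the kernel's get_count_order() is understood to do), and cache_id() takes the raw 12-bit CPUID.4:EAX[25:14] field as a plain argument instead of reading CPUID.

```c
#include <assert.h>

/*
 * Stand-in for the kernel's get_count_order(): smallest order such that
 * (1 << order) >= count.
 */
int count_order(unsigned int count)
{
	int order = 0;

	assert(count > 0 && count <= 4096);
	while ((1u << order) < count)
		order++;
	return order;
}

/*
 * threads_field is the raw CPUID.4:EAX[25:14] value, i.e. the maximum
 * number of threads sharing the cache minus one.
 */
unsigned int cache_id(unsigned int apicid, unsigned int threads_field)
{
	unsigned int num_threads_sharing = 1 + (threads_field & 0xfff);

	return apicid >> count_order(num_threads_sharing);
}
```

For example, with a field value of 15 (sixteen threads share the cache) the low four APIC ID bits are dropped, so APIC IDs 0x20 through 0x2f all map to cache id 2.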
[tip:x86/cache] cacheinfo: Introduce cache id
Commit-ID: e9a2ea5a1ba09c35258f3663842fb8d8cf2e00c2 Gitweb: http://git.kernel.org/tip/e9a2ea5a1ba09c35258f3663842fb8d8cf2e00c2 Author: Fenghua Yu AuthorDate: Sat, 22 Oct 2016 06:19:49 -0700 Committer: Thomas Gleixner CommitDate: Wed, 26 Oct 2016 23:12:37 +0200 cacheinfo: Introduce cache id Cache management software needs an id for each instance of a cache of a particular type. The current cacheinfo structure does not provide any information about the underlying hardware so there is no way to expose it. Hardware with cache management features provides means (cpuid, enumeration etc.) to retrieve the hardware id of a particular cache instance. Cache instances which share hardware have the same hardware id. Add an 'id' field to struct cacheinfo to store this information. Expose this information under the /sys/devices/system/cpu/cpu*/cache/index*/ directory as well. Signed-off-by: Fenghua Yu Cc: "Ravi V Shankar" Cc: "Tony Luck" Cc: "David Carrillo-Cisneros" Cc: "Sai Prakhya" Cc: "Peter Zijlstra" Cc: "Stephane Eranian" Cc: "Dave Hansen" Cc: "Shaohua Li" Cc: "Nilay Vaish" Cc: "Vikas Shivappa" Cc: "Ingo Molnar" Cc: "Borislav Petkov" Cc: "H. 
Peter Anvin" Link: http://lkml.kernel.org/r/1477142405-32078-3-git-send-email-fenghua...@intel.com Signed-off-by: Thomas Gleixner --- drivers/base/cacheinfo.c | 5 + include/linux/cacheinfo.h | 3 +++ 2 files changed, 8 insertions(+) diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c index e9fd32e..00a9688 100644 --- a/drivers/base/cacheinfo.c +++ b/drivers/base/cacheinfo.c @@ -233,6 +233,7 @@ static ssize_t file_name##_show(struct device *dev, \ return sprintf(buf, "%u\n", this_leaf->object); \ } +show_one(id, id); show_one(level, level); show_one(coherency_line_size, coherency_line_size); show_one(number_of_sets, number_of_sets); @@ -314,6 +315,7 @@ static ssize_t write_policy_show(struct device *dev, return n; } +static DEVICE_ATTR_RO(id); static DEVICE_ATTR_RO(level); static DEVICE_ATTR_RO(type); static DEVICE_ATTR_RO(coherency_line_size); @@ -327,6 +329,7 @@ static DEVICE_ATTR_RO(shared_cpu_list); static DEVICE_ATTR_RO(physical_line_partition); static struct attribute *cache_default_attrs[] = { + &dev_attr_id.attr, &dev_attr_type.attr, &dev_attr_level.attr, &dev_attr_shared_cpu_map.attr, @@ -350,6 +353,8 @@ cache_default_attrs_is_visible(struct kobject *kobj, const struct cpumask *mask = &this_leaf->shared_cpu_map; umode_t mode = attr->mode; + if ((attr == &dev_attr_id.attr) && (this_leaf->attributes & CACHE_ID)) + return mode; if ((attr == &dev_attr_type.attr) && this_leaf->type) return mode; if ((attr == &dev_attr_level.attr) && this_leaf->level) diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h index 2189935..0bcbb67 100644 --- a/include/linux/cacheinfo.h +++ b/include/linux/cacheinfo.h @@ -18,6 +18,7 @@ enum cache_type { /** * struct cacheinfo - represent a cache leaf node + * @id: This cache's id. It is unique among caches with the same (type, level). 
* @type: type of the cache - data, inst or unified * @level: represents the hierarchy in the multi-level cache * @coherency_line_size: size of each cache line usually representing @@ -44,6 +45,7 @@ enum cache_type { * keeping, the remaining members form the core properties of the cache */ struct cacheinfo { + unsigned int id; enum cache_type type; unsigned int level; unsigned int coherency_line_size; @@ -61,6 +63,7 @@ struct cacheinfo { #define CACHE_WRITE_ALLOCATE BIT(3) #define CACHE_ALLOCATE_POLICY_MASK \ (CACHE_READ_ALLOCATE | CACHE_WRITE_ALLOCATE) +#define CACHE_ID BIT(4) struct device_node *of_node; bool disable_sysfs;
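A minimal userspace sketch of how the new attribute might be read, assuming the sysfs location named in the commit message. The helper names cache_id_path() and read_cache_id() are hypothetical. Because the patch only makes the file visible when CACHE_ID is set in the leaf's attributes, a missing file is treated as "no id available" rather than as a hard error.

```c
#include <assert.h>
#include <stdio.h>

/* Build the sysfs path of the 'id' attribute for one cache leaf. */
static int cache_id_path(char *buf, size_t len,
			 unsigned int cpu, unsigned int index)
{
	assert(buf != NULL);
	return snprintf(buf, len,
			"/sys/devices/system/cpu/cpu%u/cache/index%u/id",
			cpu, index);
}

/*
 * Returns 0 and stores the id on success, -1 if the attribute is
 * absent (for example CACHE_ID not set) or cannot be parsed.
 */
static int read_cache_id(unsigned int cpu, unsigned int index,
			 unsigned int *id)
{
	char path[128];
	FILE *f;
	int ok;

	if (cache_id_path(path, sizeof(path), cpu, index) >= (int)sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	/* The show routine in the patch prints the value as "%u\n". */
	ok = fscanf(f, "%u", id) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```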
[tip:x86/fpu] x86/fpu/xstate: Keep init_fpstate.xsave.header.xfeatures as zero for init optimization
Commit-ID: 7d9370607d28afd454775c623d5447603473a3c3 Gitweb: http://git.kernel.org/tip/7d9370607d28afd454775c623d5447603473a3c3 Author: Fenghua Yu AuthorDate: Fri, 20 May 2016 10:47:07 -0700 Committer: Ingo Molnar CommitDate: Sat, 18 Jun 2016 10:10:19 +0200 x86/fpu/xstate: Keep init_fpstate.xsave.header.xfeatures as zero for init optimization Keep init_fpstate.xsave.header.xfeatures as zero for init optimization. This is important for init optimization that is implemented in processor. If a bit corresponding to an xstate in xstate_bv is 0, it means the xstate is in init status and will not be read from memory to the processor during XRSTOR/XRSTORS instruction. This largely impacts context switch performance. Signed-off-by: Fenghua Yu Signed-off-by: Yu-cheng Yu Reviewed-by: Dave Hansen Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Peter Zijlstra Cc: Quentin Casasnovas Cc: Ravi V. Shankar Cc: Sai Praneeth Prakhya Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/2fb4ec7f18b76e8cda057a8c0038def74a9b8044.1463760376.git.yu-cheng...@intel.com Signed-off-by: Ingo Molnar --- arch/x86/kernel/fpu/xstate.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 46abfaf..dbfef1b 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -329,13 +329,11 @@ static void __init setup_init_fpu_buf(void) setup_xstate_features(); print_xstate_features(); - if (boot_cpu_has(X86_FEATURE_XSAVES)) { + if (boot_cpu_has(X86_FEATURE_XSAVES)) init_fpstate.xsave.header.xcomp_bv = (u64)1 << 63 | xfeatures_mask; - init_fpstate.xsave.header.xfeatures = xfeatures_mask; - } /* -* Init all the features state with header_bv being 0x0 +* Init all the features state with header.xfeatures being 0x0 */ copy_kernel_to_xregs_booting(&init_fpstate.xsave);
[tip:x86/fpu] x86/fpu/xstate: Define and use 'fpu_user_xstate_size'
Commit-ID: a1141e0b5ca6ee3e5e35d5f1a310a5ecb9c96ce5 Gitweb: http://git.kernel.org/tip/a1141e0b5ca6ee3e5e35d5f1a310a5ecb9c96ce5 Author: Fenghua Yu AuthorDate: Fri, 20 May 2016 10:47:05 -0700 Committer: Ingo Molnar CommitDate: Sat, 18 Jun 2016 10:10:18 +0200 x86/fpu/xstate: Define and use 'fpu_user_xstate_size' The kernel xstate area can be in standard or compacted format; it is always in standard format for user mode. When XSAVES is enabled, the kernel uses the compacted format and it is necessary to use a separate fpu_user_xstate_size for signal/ptrace frames. Signed-off-by: Fenghua Yu [ Rebased the patch and cleaned up the naming. ] Signed-off-by: Yu-cheng Yu Reviewed-by: Dave Hansen Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Peter Zijlstra Cc: Quentin Casasnovas Cc: Ravi V. Shankar Cc: Sai Praneeth Prakhya Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/8756ec34dabddfc727cda5743195eb81e8caf91c.1463760376.git.yu-cheng...@intel.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/fpu/xstate.h | 1 - arch/x86/include/asm/processor.h | 1 + arch/x86/kernel/fpu/init.c| 5 ++- arch/x86/kernel/fpu/signal.c | 27 ++ arch/x86/kernel/fpu/xstate.c | 76 --- 5 files changed, 73 insertions(+), 37 deletions(-) diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h index 38951b0..16df2c4 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -39,7 +39,6 @@ #define REX_PREFIX #endif -extern unsigned int xstate_size; extern u64 xfeatures_mask; extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS]; diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 62c6cc3..0a16a16 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -368,6 +368,7 @@ DECLARE_PER_CPU(struct irq_stack *, softirq_stack); #endif /* X86_64 */ extern unsigned int xstate_size; 
+extern unsigned int fpu_user_xstate_size; struct perf_event; diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c index aacfd7a..5b1928c 100644 --- a/arch/x86/kernel/fpu/init.c +++ b/arch/x86/kernel/fpu/init.c @@ -195,7 +195,7 @@ static void __init fpu__init_task_struct_size(void) } /* - * Set up the xstate_size based on the legacy FPU context size. + * Set up the user and kernel xstate_size based on the legacy FPU context size. * * We set this up first, and later it will be overwritten by * fpu__init_system_xstate() if the CPU knows about xstates. @@ -226,6 +226,9 @@ static void __init fpu__init_system_xstate_size_legacy(void) else xstate_size = sizeof(struct fregs_state); } + + fpu_user_xstate_size = xstate_size; + /* * Quirk: we don't yet handle the XSAVES* instructions * correctly, as we don't correctly convert between diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c index c6f2a3c..0d29d4d 100644 --- a/arch/x86/kernel/fpu/signal.c +++ b/arch/x86/kernel/fpu/signal.c @@ -32,7 +32,7 @@ static inline int check_for_xstate(struct fxregs_state __user *buf, /* Check for the first magic field and other error scenarios. 
*/ if (fx_sw->magic1 != FP_XSTATE_MAGIC1 || fx_sw->xstate_size < min_xstate_size || - fx_sw->xstate_size > xstate_size || + fx_sw->xstate_size > fpu_user_xstate_size || fx_sw->xstate_size > fx_sw->extended_size) return -1; @@ -89,7 +89,8 @@ static inline int save_xstate_epilog(void __user *buf, int ia32_frame) if (!use_xsave()) return err; - err |= __put_user(FP_XSTATE_MAGIC2, (__u32 *)(buf + xstate_size)); + err |= __put_user(FP_XSTATE_MAGIC2, + (__u32 *)(buf + fpu_user_xstate_size)); /* * Read the xfeatures which we copied (directly from the cpu or @@ -126,7 +127,7 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf) else err = copy_fregs_to_user((struct fregs_state __user *) buf); - if (unlikely(err) && __clear_user(buf, xstate_size)) + if (unlikely(err) && __clear_user(buf, fpu_user_xstate_size)) err = -EFAULT; return err; } @@ -176,8 +177,19 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size) if (ia32_fxstate) copy_fxregs_to_kernel(&tsk->thread.fpu); } else { + /* +* It is a *bug* if kernel uses compacted-format for xsave +* area and we copy it out directly to a signal frame. It +* should have been handled above by saving the registers +* directly. +*/ +
[tip:x86/fpu] x86/fpu/xstate: Rename 'xstate_size' to 'fpu_kernel_xstate_size', to distinguish it from 'fpu_user_xstate_size'
Commit-ID: bf15a8cf8d14879b785c548728415d36ccb6a33b Gitweb: http://git.kernel.org/tip/bf15a8cf8d14879b785c548728415d36ccb6a33b Author: Fenghua Yu AuthorDate: Fri, 20 May 2016 10:47:06 -0700 Committer: Ingo Molnar CommitDate: Sat, 18 Jun 2016 10:10:18 +0200 x86/fpu/xstate: Rename 'xstate_size' to 'fpu_kernel_xstate_size', to distinguish it from 'fpu_user_xstate_size' User space uses standard format xsave area. fpstate in signal frame should have standard format size. To explicitly distinguish between xstate size in kernel space and the one in user space, we rename 'xstate_size' to 'fpu_kernel_xstate_size'. Cleanup only, no change in functionality. Signed-off-by: Fenghua Yu [ Rebased the patch and cleaned up the naming. ] Signed-off-by: Yu-cheng Yu Reviewed-by: Dave Hansen Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Peter Zijlstra Cc: Quentin Casasnovas Cc: Ravi V. Shankar Cc: Sai Praneeth Prakhya Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/2ecbae347a5152d94be52adf7d0f3b7305d90d99.1463760376.git.yu-cheng...@intel.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/processor.h | 2 +- arch/x86/kernel/fpu/core.c | 7 --- arch/x86/kernel/fpu/init.c | 20 +++- arch/x86/kernel/fpu/signal.c | 2 +- arch/x86/kernel/fpu/xstate.c | 8 5 files changed, 21 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 0a16a16..965c5d2 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -367,7 +367,7 @@ DECLARE_PER_CPU(struct irq_stack *, hardirq_stack); DECLARE_PER_CPU(struct irq_stack *, softirq_stack); #endif /* X86_64 */ -extern unsigned int xstate_size; +extern unsigned int fpu_kernel_xstate_size; extern unsigned int fpu_user_xstate_size; struct perf_event; diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 7d56474..c759bd0 100644 --- 
a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -227,7 +227,7 @@ void fpstate_init(union fpregs_state *state) return; } - memset(state, 0, xstate_size); + memset(state, 0, fpu_kernel_xstate_size); if (static_cpu_has(X86_FEATURE_FXSR)) fpstate_init_fxstate(&state->fxsave); @@ -252,7 +252,7 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu) * leak into the child task: */ if (use_eager_fpu()) - memset(&dst_fpu->state.xsave, 0, xstate_size); + memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_size); /* * Save current FPU registers directly into the child @@ -271,7 +271,8 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu) */ preempt_disable(); if (!copy_fpregs_to_fpstate(dst_fpu)) { - memcpy(&src_fpu->state, &dst_fpu->state, xstate_size); + memcpy(&src_fpu->state, &dst_fpu->state, + fpu_kernel_xstate_size); if (use_eager_fpu()) copy_kernel_to_fpregs(&src_fpu->state); diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c index 5b1928c..60f3839 100644 --- a/arch/x86/kernel/fpu/init.c +++ b/arch/x86/kernel/fpu/init.c @@ -145,8 +145,8 @@ static void __init fpu__init_system_generic(void) * This is inherent to the XSAVE architecture which puts all state * components into a single, continuous memory block: */ -unsigned int xstate_size; -EXPORT_SYMBOL_GPL(xstate_size); +unsigned int fpu_kernel_xstate_size; +EXPORT_SYMBOL_GPL(fpu_kernel_xstate_size); /* Get alignment of the TYPE. */ #define TYPE_ALIGN(TYPE) offsetof(struct { char x; TYPE test; }, test) @@ -178,7 +178,7 @@ static void __init fpu__init_task_struct_size(void) * Add back the dynamically-calculated register state * size. */ - task_size += xstate_size; + task_size += fpu_kernel_xstate_size; /* * We dynamically size 'struct fpu', so we require that @@ -195,7 +195,7 @@ static void __init fpu__init_task_struct_size(void) } /* - * Set up the user and kernel xstate_size based on the legacy FPU context size. 
+ * Set up the user and kernel xstate sizes based on the legacy FPU context size. * * We set this up first, and later it will be overwritten by * fpu__init_system_xstate() if the CPU knows about xstates. @@ -208,7 +208,7 @@ static void __init fpu__init_system_xstate_size_legacy(void) on_boot_cpu = 0; /* -* Note that xstate_size might be overwriten later during +* Note that xstate sizes might be overwritten later during * fpu__init_system_xstate(). */ @@ -219,15 +219,17 @@ static void __init fpu__init_system_xstate_size_legacy(void) */ setup_clear_cpu_cap
[tip:x86/fpu] x86/fpu/xstate: Rename 'xstate_size' to 'fpu_kernel_xstate_size', to distinguish it from 'fpu_user_xstate_size'
Commit-ID: 63a5db07a03947218e5f4fb0776df6b6ca328287 Gitweb: http://git.kernel.org/tip/63a5db07a03947218e5f4fb0776df6b6ca328287 Author: Fenghua Yu AuthorDate: Fri, 20 May 2016 10:47:06 -0700 Committer: Ingo Molnar CommitDate: Fri, 17 Jun 2016 10:10:22 +0200 x86/fpu/xstate: Rename 'xstate_size' to 'fpu_kernel_xstate_size', to distinguish it from 'fpu_user_xstate_size' User space uses standard format xsave area. fpstate in signal frame should have standard format size. To explicitly distinguish between xstate size in kernel space and the one in user space, we rename 'xstate_size' to 'fpu_kernel_xstate_size'. Cleanup only, no change in functionality. Signed-off-by: Fenghua Yu [ Rebased the patch and cleaned up the naming. ] Signed-off-by: Yu-cheng Yu Reviewed-by: Dave Hansen Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Peter Zijlstra Cc: Quentin Casasnovas Cc: Ravi V. Shankar Cc: Sai Praneeth Prakhya Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/2ecbae347a5152d94be52adf7d0f3b7305d90d99.1463760376.git.yu-cheng...@intel.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/processor.h | 2 +- arch/x86/kernel/fpu/core.c | 7 --- arch/x86/kernel/fpu/init.c | 20 +++- arch/x86/kernel/fpu/signal.c | 2 +- arch/x86/kernel/fpu/xstate.c | 8 5 files changed, 21 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 0a16a16..965c5d2 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -367,7 +367,7 @@ DECLARE_PER_CPU(struct irq_stack *, hardirq_stack); DECLARE_PER_CPU(struct irq_stack *, softirq_stack); #endif /* X86_64 */ -extern unsigned int xstate_size; +extern unsigned int fpu_kernel_xstate_size; extern unsigned int fpu_user_xstate_size; struct perf_event; diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 7d56474..c759bd0 100644 --- 
a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -227,7 +227,7 @@ void fpstate_init(union fpregs_state *state) return; } - memset(state, 0, xstate_size); + memset(state, 0, fpu_kernel_xstate_size); if (static_cpu_has(X86_FEATURE_FXSR)) fpstate_init_fxstate(&state->fxsave); @@ -252,7 +252,7 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu) * leak into the child task: */ if (use_eager_fpu()) - memset(&dst_fpu->state.xsave, 0, xstate_size); + memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_size); /* * Save current FPU registers directly into the child @@ -271,7 +271,8 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu) */ preempt_disable(); if (!copy_fpregs_to_fpstate(dst_fpu)) { - memcpy(&src_fpu->state, &dst_fpu->state, xstate_size); + memcpy(&src_fpu->state, &dst_fpu->state, + fpu_kernel_xstate_size); if (use_eager_fpu()) copy_kernel_to_fpregs(&src_fpu->state); diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c index 5b1928c..60f3839 100644 --- a/arch/x86/kernel/fpu/init.c +++ b/arch/x86/kernel/fpu/init.c @@ -145,8 +145,8 @@ static void __init fpu__init_system_generic(void) * This is inherent to the XSAVE architecture which puts all state * components into a single, continuous memory block: */ -unsigned int xstate_size; -EXPORT_SYMBOL_GPL(xstate_size); +unsigned int fpu_kernel_xstate_size; +EXPORT_SYMBOL_GPL(fpu_kernel_xstate_size); /* Get alignment of the TYPE. */ #define TYPE_ALIGN(TYPE) offsetof(struct { char x; TYPE test; }, test) @@ -178,7 +178,7 @@ static void __init fpu__init_task_struct_size(void) * Add back the dynamically-calculated register state * size. */ - task_size += xstate_size; + task_size += fpu_kernel_xstate_size; /* * We dynamically size 'struct fpu', so we require that @@ -195,7 +195,7 @@ static void __init fpu__init_task_struct_size(void) } /* - * Set up the user and kernel xstate_size based on the legacy FPU context size. 
+ * Set up the user and kernel xstate sizes based on the legacy FPU context size. * * We set this up first, and later it will be overwritten by * fpu__init_system_xstate() if the CPU knows about xstates. @@ -208,7 +208,7 @@ static void __init fpu__init_system_xstate_size_legacy(void) on_boot_cpu = 0; /* -* Note that xstate_size might be overwriten later during +* Note that xstate sizes might be overwritten later during * fpu__init_system_xstate(). */ @@ -219,15 +219,17 @@ static void __init fpu__init_system_xstate_size_legacy(void) */ setup_clear_cpu_cap
[tip:x86/fpu] x86/fpu/xstate: Keep init_fpstate.xsave.header.xfeatures as zero for init optimization
Commit-ID: 2729818f35c9b1a1614624e2edcd3e80c59c8689 Gitweb: http://git.kernel.org/tip/2729818f35c9b1a1614624e2edcd3e80c59c8689 Author: Fenghua Yu AuthorDate: Fri, 20 May 2016 10:47:07 -0700 Committer: Ingo Molnar CommitDate: Fri, 17 Jun 2016 10:10:23 +0200 x86/fpu/xstate: Keep init_fpstate.xsave.header.xfeatures as zero for init optimization Keep init_fpstate.xsave.header.xfeatures as zero for init optimization. This is important for init optimization that is implemented in processor. If a bit corresponding to an xstate in xstate_bv is 0, it means the xstate is in init status and will not be read from memory to the processor during XRSTOR/XRSTORS instruction. This largely impacts context switch performance. Signed-off-by: Fenghua Yu Signed-off-by: Yu-cheng Yu Reviewed-by: Dave Hansen Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Peter Zijlstra Cc: Quentin Casasnovas Cc: Ravi V. Shankar Cc: Sai Praneeth Prakhya Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/2fb4ec7f18b76e8cda057a8c0038def74a9b8044.1463760376.git.yu-cheng...@intel.com Signed-off-by: Ingo Molnar --- arch/x86/kernel/fpu/xstate.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 46abfaf..dbfef1b 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -329,13 +329,11 @@ static void __init setup_init_fpu_buf(void) setup_xstate_features(); print_xstate_features(); - if (boot_cpu_has(X86_FEATURE_XSAVES)) { + if (boot_cpu_has(X86_FEATURE_XSAVES)) init_fpstate.xsave.header.xcomp_bv = (u64)1 << 63 | xfeatures_mask; - init_fpstate.xsave.header.xfeatures = xfeatures_mask; - } /* -* Init all the features state with header_bv being 0x0 +* Init all the features state with header.xfeatures being 0x0 */ copy_kernel_to_xregs_booting(&init_fpstate.xsave);
[tip:x86/fpu] x86/fpu/xstate: Define and use 'fpu_user_xstate_size'
Commit-ID: 4543ea7e282d313b48cd34bbb9dc89c1dbdd13a7 Gitweb: http://git.kernel.org/tip/4543ea7e282d313b48cd34bbb9dc89c1dbdd13a7 Author: Fenghua Yu AuthorDate: Fri, 20 May 2016 10:47:05 -0700 Committer: Ingo Molnar CommitDate: Fri, 17 Jun 2016 10:10:22 +0200 x86/fpu/xstate: Define and use 'fpu_user_xstate_size' The kernel xstate area can be in standard or compacted format; it is always in standard format for user mode. When XSAVES is enabled, the kernel uses the compacted format and it is necessary to use a separate fpu_user_xstate_size for signal/ptrace frames. Signed-off-by: Fenghua Yu [ Rebased the patch and cleaned up the naming. ] Signed-off-by: Yu-cheng Yu Reviewed-by: Dave Hansen Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Peter Zijlstra Cc: Quentin Casasnovas Cc: Ravi V. Shankar Cc: Sai Praneeth Prakhya Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/8756ec34dabddfc727cda5743195eb81e8caf91c.1463760376.git.yu-cheng...@intel.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/fpu/xstate.h | 1 - arch/x86/include/asm/processor.h | 1 + arch/x86/kernel/fpu/init.c| 5 ++- arch/x86/kernel/fpu/signal.c | 27 ++ arch/x86/kernel/fpu/xstate.c | 76 --- 5 files changed, 73 insertions(+), 37 deletions(-) diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h index 38951b0..16df2c4 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -39,7 +39,6 @@ #define REX_PREFIX #endif -extern unsigned int xstate_size; extern u64 xfeatures_mask; extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS]; diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 62c6cc3..0a16a16 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -368,6 +368,7 @@ DECLARE_PER_CPU(struct irq_stack *, softirq_stack); #endif /* X86_64 */ extern unsigned int xstate_size; 
+extern unsigned int fpu_user_xstate_size; struct perf_event; diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c index aacfd7a..5b1928c 100644 --- a/arch/x86/kernel/fpu/init.c +++ b/arch/x86/kernel/fpu/init.c @@ -195,7 +195,7 @@ static void __init fpu__init_task_struct_size(void) } /* - * Set up the xstate_size based on the legacy FPU context size. + * Set up the user and kernel xstate_size based on the legacy FPU context size. * * We set this up first, and later it will be overwritten by * fpu__init_system_xstate() if the CPU knows about xstates. @@ -226,6 +226,9 @@ static void __init fpu__init_system_xstate_size_legacy(void) else xstate_size = sizeof(struct fregs_state); } + + fpu_user_xstate_size = xstate_size; + /* * Quirk: we don't yet handle the XSAVES* instructions * correctly, as we don't correctly convert between diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c index c6f2a3c..0d29d4d 100644 --- a/arch/x86/kernel/fpu/signal.c +++ b/arch/x86/kernel/fpu/signal.c @@ -32,7 +32,7 @@ static inline int check_for_xstate(struct fxregs_state __user *buf, /* Check for the first magic field and other error scenarios. 
*/ if (fx_sw->magic1 != FP_XSTATE_MAGIC1 || fx_sw->xstate_size < min_xstate_size || - fx_sw->xstate_size > xstate_size || + fx_sw->xstate_size > fpu_user_xstate_size || fx_sw->xstate_size > fx_sw->extended_size) return -1; @@ -89,7 +89,8 @@ static inline int save_xstate_epilog(void __user *buf, int ia32_frame) if (!use_xsave()) return err; - err |= __put_user(FP_XSTATE_MAGIC2, (__u32 *)(buf + xstate_size)); + err |= __put_user(FP_XSTATE_MAGIC2, + (__u32 *)(buf + fpu_user_xstate_size)); /* * Read the xfeatures which we copied (directly from the cpu or @@ -126,7 +127,7 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf) else err = copy_fregs_to_user((struct fregs_state __user *) buf); - if (unlikely(err) && __clear_user(buf, xstate_size)) + if (unlikely(err) && __clear_user(buf, fpu_user_xstate_size)) err = -EFAULT; return err; } @@ -176,8 +177,19 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size) if (ia32_fxstate) copy_fxregs_to_kernel(&tsk->thread.fpu); } else { + /* +* It is a *bug* if kernel uses compacted-format for xsave +* area and we copy it out directly to a signal frame. It +* should have been handled above by saving the registers +* directly. +*/ +
[tip:x86/asm] x86/cpufeature: Enable new AVX-512 features
Commit-ID: d05004944206cbbf1c453e179768163731c7c6f1 Gitweb: http://git.kernel.org/tip/d05004944206cbbf1c453e179768163731c7c6f1 Author: Fenghua Yu AuthorDate: Thu, 10 Mar 2016 19:38:18 -0800 Committer: Ingo Molnar CommitDate: Sat, 12 Mar 2016 17:30:53 +0100 x86/cpufeature: Enable new AVX-512 features A few new AVX-512 instruction groups/features are added in cpufeatures.h for enumeration: AVX512DQ, AVX512BW, and AVX512VL. Clear the flags in fpu__xstate_clear_all_cpu_caps(). The specification for latest AVX-512 including the features can be found at: https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf Note, I didn't enable the flags in KVM. Hopefully the KVM guys can pick up the flags and enable them in KVM. Signed-off-by: Fenghua Yu Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: Denys Vlasenko Cc: Gleb Natapov Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Paolo Bonzini Cc: Peter Zijlstra Cc: Quentin Casasnovas Cc: Ravi V Shankar Cc: Thomas Gleixner Cc: k...@vger.kernel.org Link: http://lkml.kernel.org/r/1457667498-37357-1-git-send-email-fenghua...@intel.com [ Added more detailed feature descriptions. 
] Signed-off-by: Ingo Molnar --- arch/x86/include/asm/cpufeatures.h | 3 +++ arch/x86/kernel/fpu/xstate.c | 3 +++ 2 files changed, 6 insertions(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index d11a3aa..9e0567f 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -220,6 +220,7 @@ #define X86_FEATURE_CQM( 9*32+12) /* Cache QoS Monitoring */ #define X86_FEATURE_MPX( 9*32+14) /* Memory Protection Extension */ #define X86_FEATURE_AVX512F( 9*32+16) /* AVX-512 Foundation */ +#define X86_FEATURE_AVX512DQ ( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */ #define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */ #define X86_FEATURE_ADX( 9*32+19) /* The ADCX and ADOX instructions */ #define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */ @@ -230,6 +231,8 @@ #define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and Reciprocal */ #define X86_FEATURE_AVX512CD ( 9*32+28) /* AVX-512 Conflict Detection */ #define X86_FEATURE_SHA_NI ( 9*32+29) /* SHA1/SHA256 Instruction Extensions */ +#define X86_FEATURE_AVX512BW ( 9*32+30) /* AVX-512 BW (Byte/Word granular) Instructions */ +#define X86_FEATURE_AVX512VL ( 9*32+31) /* AVX-512 VL (128/256 Vector Length) Extensions */ /* Extended state features, CPUID level 0x000d:1 (eax), word 10 */ #define X86_FEATURE_XSAVEOPT (10*32+ 0) /* XSAVEOPT */ diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index d425cda5..6e8354f 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -51,6 +51,9 @@ void fpu__xstate_clear_all_cpu_caps(void) setup_clear_cpu_cap(X86_FEATURE_AVX512PF); setup_clear_cpu_cap(X86_FEATURE_AVX512ER); setup_clear_cpu_cap(X86_FEATURE_AVX512CD); + setup_clear_cpu_cap(X86_FEATURE_AVX512DQ); + setup_clear_cpu_cap(X86_FEATURE_AVX512BW); + setup_clear_cpu_cap(X86_FEATURE_AVX512VL); setup_clear_cpu_cap(X86_FEATURE_MPX); 
setup_clear_cpu_cap(X86_FEATURE_XGETBV1); }
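The feature numbers in the diff above follow a word*32+bit encoding; word 9 corresponds to CPUID.(EAX=7,ECX=0):EBX. A small sketch of decoding that encoding is given below. The FEATURE_* macros copy the values from the patch under local names, and feature_word(), feature_bit() and ebx7_has() are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same numeric encoding as the X86_FEATURE_* values added by the patch. */
#define FEATURE_AVX512DQ (9 * 32 + 17)
#define FEATURE_AVX512BW (9 * 32 + 30)
#define FEATURE_AVX512VL (9 * 32 + 31)

/* Index of the 32-bit capability word holding this feature. */
static unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Bit position of this feature inside its capability word. */
static unsigned int feature_bit(unsigned int feature)
{
	return feature % 32;
}

/* Test a word-9 feature against a raw CPUID.(EAX=7,ECX=0):EBX value. */
static bool ebx7_has(uint32_t ebx, unsigned int feature)
{
	assert(feature_word(feature) == 9);
	return (ebx >> feature_bit(feature)) & 1u;
}
```

On a real system the EBX value would come from executing CPUID with EAX=7 and ECX=0; it is passed in as a parameter here so the sketch stays portable.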
[tip:x86/cache] x86/intel_rapl: Modify hot cpu notification handling
Commit-ID: 2a7a6718afed6b61628ca1845dc49827759bed7d Gitweb: http://git.kernel.org/tip/2a7a6718afed6b61628ca1845dc49827759bed7d Author: Fenghua Yu AuthorDate: Thu, 17 Dec 2015 14:46:07 -0800 Committer: H. Peter Anvin CommitDate: Fri, 18 Dec 2015 13:17:55 -0800 x86/intel_rapl: Modify hot cpu notification handling From: Vikas Shivappa - In rapl_cpu_init, use the existing package<->core map instead of looping through all cpus in rapl_cpu_mask. - In rapl_cpu_exit, use the same mapping instead of looping over all online cpus. On systems with a large number of cpus the loop can be expensive, and its cost grows linearly with the number of cpus. Signed-off-by: Vikas Shivappa Link: http://lkml.kernel.org/r/1450392376-6397-3-git-send-email-fenghua...@intel.com Signed-off-by: Fenghua Yu --- arch/x86/kernel/cpu/perf_event_intel_rapl.c | 35 ++--- 1 file changed, 17 insertions(+), 18 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event_intel_rapl.c b/arch/x86/kernel/cpu/perf_event_intel_rapl.c index ed446bd..0e0fe70 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c +++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c @@ -130,6 +130,12 @@ static struct pmu rapl_pmu_class; static cpumask_t rapl_cpu_mask; static int rapl_cntr_mask; +/* + * Temporary cpumask used during hot cpu notificaiton handling. The usage + * is serialized by hot cpu locks. 
+ */ +static cpumask_t tmp_cpumask; + static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu); static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu_to_free); @@ -533,18 +539,16 @@ static struct pmu rapl_pmu_class = { static void rapl_cpu_exit(int cpu) { struct rapl_pmu *pmu = per_cpu(rapl_pmu, cpu); - int i, phys_id = topology_physical_package_id(cpu); int target = -1; + int i; /* find a new cpu on same package */ - for_each_online_cpu(i) { - if (i == cpu) - continue; - if (phys_id == topology_physical_package_id(i)) { - target = i; - break; - } - } + cpumask_and(&tmp_cpumask, topology_core_cpumask(cpu), cpu_online_mask); + cpumask_clear_cpu(cpu, &tmp_cpumask); + i = cpumask_any(&tmp_cpumask); + if (i < nr_cpu_ids) + target = i; + /* * clear cpu from cpumask * if was set in cpumask and still some cpu on package, @@ -566,15 +570,10 @@ static void rapl_cpu_exit(int cpu) static void rapl_cpu_init(int cpu) { - int i, phys_id = topology_physical_package_id(cpu); - - /* check if phys_is is already covered */ - for_each_cpu(i, &rapl_cpu_mask) { - if (phys_id == topology_physical_package_id(i)) - return; - } - /* was not found, so add it */ - cpumask_set_cpu(cpu, &rapl_cpu_mask); + /* check if cpu's package is already covered.If not, add it.*/ + cpumask_and(&tmp_cpumask, &rapl_cpu_mask, topology_core_cpumask(cpu)); + if (cpumask_empty(&tmp_cpumask)) + cpumask_set_cpu(cpu, &rapl_cpu_mask); } static __init void rapl_hsw_server_quirk(void)
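The mask-based logic in the patch above can be illustrated in userspace with plain 64-bit masks standing in for cpumask_t. This is a sketch under that assumption, limited to 64 CPUs; pick_target() and cover_package() are hypothetical names that mirror the new bodies of rapl_cpu_exit() and rapl_cpu_init(), with package_mask playing the role of topology_core_cpumask(cpu).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Like the new rapl_cpu_exit(): pick another online CPU in the same
 * package as 'cpu', or return -1 if none is left.
 */
static int pick_target(uint64_t package_mask, uint64_t online_mask,
		       unsigned int cpu)
{
	uint64_t candidates;
	int i;

	assert(cpu < 64);
	/* cpumask_and() followed by cpumask_clear_cpu(). */
	candidates = package_mask & online_mask & ~((uint64_t)1 << cpu);

	/* Stand-in for cpumask_any(): take the first set bit. */
	for (i = 0; i < 64; i++)
		if (candidates & ((uint64_t)1 << i))
			return i;
	return -1;
}

/*
 * Like the new rapl_cpu_init(): add 'cpu' to the mask only if no CPU
 * of its package is already present.
 */
static uint64_t cover_package(uint64_t rapl_mask, uint64_t package_mask,
			      unsigned int cpu)
{
	assert(cpu < 64);
	if ((rapl_mask & package_mask) == 0)
		rapl_mask |= (uint64_t)1 << cpu;
	return rapl_mask;
}
```

Both helpers do a constant number of mask operations per package instead of comparing package ids CPU by CPU, which is the saving the commit message describes.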
[tip:x86/cache] x86,cgroup/intel_rdt : Add a cgroup interface to manage Intel cache allocation
Commit-ID: 5ad9144cdb9a591caa8f9b33b618f137e1fbea93 Gitweb: http://git.kernel.org/tip/5ad9144cdb9a591caa8f9b33b618f137e1fbea93 Author: Fenghua Yu AuthorDate: Thu, 17 Dec 2015 14:46:16 -0800 Committer: H. Peter Anvin CommitDate: Fri, 18 Dec 2015 13:17:57 -0800 x86,cgroup/intel_rdt : Add a cgroup interface to manage Intel cache allocation From: Vikas Shivappa Add a new cgroup 'intel_rdt' to manage cache allocation. Each cgroup directory is associated with a class of service id (closid). To map a task to a closid during scheduling, this patch removes the closid field from task_struct and uses the already existing 'cgroups' field in task_struct. The cgroup has a file 'l3_cbm' which represents the L3 cache capacity bitmask (CBM). The CBM is currently global for the whole system. The capacity bitmask must have only contiguous bits set, and the number of bits set is limited by the maximum number of bits that can be set. The tasks belonging to a cgroup get to fill in the L3 cache represented by the capacity bitmask of the cgroup. For example, if the max bits in the CBM is 10 and the cache size is 10MB, each bit represents 1MB of cache capacity. The root cgroup always has all the bits set in the l3_cbm. The user can create more cgroups with the mkdir syscall. By default the child cgroups inherit the capacity bitmask (CBM) from the parent. The user can change the CBM, specified in hex, for each cgroup. Each unique bitmask is associated with a class of service ID, and -ENOSPC is returned once we run out of closids. 
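The bitmask constraints and the 10MB example from the message above can be sketched as follows. This is not the validation code from the patch: cbm_is_contiguous(), cbm_fits() and cbm_capacity_bytes() are hypothetical helpers, and the exact limit on the number of set bits is an assumption here (the sketch only checks that the mask fits within the maximum CBM length).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if cbm is non-zero and its set bits form one contiguous run. */
static bool cbm_is_contiguous(uint64_t cbm)
{
	uint64_t lowest;

	if (cbm == 0)
		return false;
	/* Adding the lowest set bit carries through a contiguous run. */
	lowest = cbm & (~cbm + 1);
	return ((cbm + lowest) & cbm) == 0;
}

/* True if no bit at or above position max_len is set. */
static bool cbm_fits(uint64_t cbm, unsigned int max_len)
{
	return max_len >= 64 || (cbm >> max_len) == 0;
}

/* Cache capacity covered by cbm when each bit is an equal share. */
static uint64_t cbm_capacity_bytes(uint64_t cbm, unsigned int cbm_len,
				   uint64_t cache_bytes)
{
	uint64_t bits = 0;

	assert(cbm_len > 0);
	for (; cbm; cbm >>= 1)
		bits += cbm & 1;
	return bits * (cache_bytes / cbm_len);
}
```

With a 10-bit CBM and a 10MB cache, a mask of 0x7 covers three 1MB shares, matching the per-bit arithmetic in the commit message.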
Signed-off-by: Vikas Shivappa Link: http://lkml.kernel.org/r/1450392376-6397-12-git-send-email-fenghua...@intel.com Signed-off-by: Fenghua Yu --- arch/x86/include/asm/intel_rdt.h | 37 +++- arch/x86/kernel/cpu/intel_rdt.c | 199 +-- include/linux/cgroup_subsys.h| 4 + include/linux/sched.h| 3 - init/Kconfig | 4 +- 5 files changed, 234 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h index afb6da3..fbe1e00 100644 --- a/arch/x86/include/asm/intel_rdt.h +++ b/arch/x86/include/asm/intel_rdt.h @@ -3,6 +3,7 @@ #ifdef CONFIG_INTEL_RDT +#include #include #define MAX_CBM_LENGTH 32 @@ -12,20 +13,54 @@ extern struct static_key rdt_enable_key; void __intel_rdt_sched_in(void *dummy); +struct intel_rdt { + struct cgroup_subsys_state css; + u32 closid; +}; + struct clos_cbm_table { unsigned long l3_cbm; unsigned int clos_refcnt; }; /* + * Return rdt group corresponding to this container. + */ +static inline struct intel_rdt *css_rdt(struct cgroup_subsys_state *css) +{ + return css ? container_of(css, struct intel_rdt, css) : NULL; +} + +static inline struct intel_rdt *parent_rdt(struct intel_rdt *ir) +{ + return css_rdt(ir->css.parent); +} + +/* + * Return rdt group to which this task belongs. + */ +static inline struct intel_rdt *task_rdt(struct task_struct *task) +{ + return css_rdt(task_css(task, intel_rdt_cgrp_id)); +} + +/* * intel_rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR * * Following considerations are made so that this has minimal impact * on scheduler hot path: * - This will stay as no-op unless we are running on an Intel SKU * which supports L3 cache allocation. + * - When support is present and enabled, does not do any + * IA32_PQR_MSR writes until the user starts really using the feature + * ie creates a rdt cgroup directory and assigns a cache_mask thats + * different from the root cgroup's cache_mask. 
* - Caches the per cpu CLOSid values and does the MSR write only - * when a task with a different CLOSid is scheduled in. + * when a task with a different CLOSid is scheduled in. That + * means the task belongs to a different cgroup. + * - Closids are allocated so that different cgroup directories + * with same cache_mask gets the same CLOSid. This minimizes CLOSids + * used and reduces MSR write frequency. */ static inline void intel_rdt_sched_in(void) { diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c index ecaf8e6..acbede2 100644 --- a/arch/x86/kernel/cpu/intel_rdt.c +++ b/arch/x86/kernel/cpu/intel_rdt.c @@ -53,11 +53,17 @@ static cpumask_t tmp_cpumask; static DEFINE_MUTEX(rdt_group_mutex); struct static_key __read_mostly rdt_enable_key = STATIC_KEY_INIT_FALSE; +static struct intel_rdt rdt_root_group; +#define rdt_for_each_child(pos_css, parent_ir) \ + css_for_each_child((pos_css), &(parent_ir)->css) + struct rdt_remote_data { int msr; u64 val; }; +static DEFINE_SPINLOCK(closid_lock); + /* * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs * as it does not have CPUID enumeration support for Cache allocation. @@ -108,17 +114,18 @@ static inline bool cache_alloc_supported(struct cpuinfo_x86 *c) return false; } - void __intel_rdt_sched_in(void *dummy)
[tip:x86/cache] x86,cgroup/intel_rdt : Add intel_rdt cgroup documentation
Commit-ID: f5faa67fb17b931e2b0223dc8a4d29e64c9bfa9d Gitweb: http://git.kernel.org/tip/f5faa67fb17b931e2b0223dc8a4d29e64c9bfa9d Author: Fenghua Yu AuthorDate: Thu, 17 Dec 2015 14:46:15 -0800 Committer: H. Peter Anvin CommitDate: Fri, 18 Dec 2015 13:17:57 -0800 x86,cgroup/intel_rdt : Add intel_rdt cgroup documentation From: Vikas Shivappa Add documentation on using the cache allocation cgroup interface with examples. Signed-off-by: Vikas Shivappa Link: http://lkml.kernel.org/r/1450392376-6397-11-git-send-email-fenghua...@intel.com Signed-off-by: Fenghua Yu --- Documentation/cgroups/rdt.txt | 133 ++ 1 file changed, 133 insertions(+) diff --git a/Documentation/cgroups/rdt.txt b/Documentation/cgroups/rdt.txt new file mode 100644 index 000..9fa6c6a --- /dev/null +++ b/Documentation/cgroups/rdt.txt @@ -0,0 +1,133 @@ +RDT +--- + +Copyright (C) 2014 Intel Corporation +Written by vikas.shiva...@linux.intel.com + +CONTENTS: += + +1. Cache Allocation Technology + 1.1 Why is Cache allocation needed? +2. Usage Examples and Syntax + +1. Cache Allocation Technology +=== + +1.1 Why is Cache allocation needed +-- + +In today's new processors the number of cores is continuously increasing +especially in large scale usage models where VMs are used like +webservers and datacenters. The number of cores increase the number of +threads or workloads that can simultaneously be run. When +multi-threaded-applications, VMs, workloads run concurrently they +compete for shared resources including L3 cache. + +The architecture also allows dynamically changing these subsets during +runtime to further optimize the performance of the higher priority +application with minimal degradation to the low priority app. +Additionally, resources can be rebalanced for system throughput benefit. +This technique may be useful in managing large computer systems which +large L3 cache. + +Cloud/Container use case: +The key use case scenarios are in large server clusters in a typical +cloud or container context. 
A central 'managing agent' would control +resource allocations to a set of VMs or containers. In today's resource +management, cgroups are widely used already and a significant amount of +plumbing in user space is already done to perform tasks like +allocating/configuring resources dynamically and statically. An +important example is dockers using systemd and systemd in turn using +cgroups in its core to manage resources. This makes cgroup interface an +easily adaptable interface for cache allocation. + +Noisy neighbour use case: +A more specific use case may be when a streaming app which is constantly +copying data and accessing linear space larger than L3 cache +and hence evicting a large amount of cache which could have +otherwise been used by a high priority computing application. Using the +cache allocation feature, the 'noisy neighbours' like the streaming +application can be confined to use a smaller cache and the high priority +application be awarded a larger amount of cache space. A managing agent +can monitor the cache allocation using cache monitoring through libperf +and be able to make resource adjustments either statically or +dynamically. +This interface hence helps in maintaining a resource policy to +provide the quality of service requirements like number of requests +handled, response time. + +More information can be found in the Intel SDM June 2015, Volume 3, +section 17.16. More information on kernel implementation details can be +found in Documentation/x86/intel_rdt.txt. + +2. Usage examples and syntax + + +Following is an example on how a system administrator/root user can +configure L3 cache allocation to threads. + +To enable the cache allocation during compile time set the +CONFIG_INTEL_RDT=y. 
+ +To check if Cache allocation was enabled on your system + $ dmesg | grep -i intel_rdt + intel_rdt: Intel Cache Allocation enabled + + $ cat /proc/cpuinfo +output would have 'rdt' (if rdt is enabled) and 'cat_l3' (if L3 +cache allocation is enabled). + +example1: Following would mount the cache allocation cgroup subsystem +and create 2 directories. + + $ cd /sys/fs/cgroup + $ mkdir rdt + $ mount -t cgroup -ointel_rdt intel_rdt /sys/fs/cgroup/rdt + $ cd rdt + $ mkdir group1 + $ mkdir group2 + +Following are some of the Files in the directory + + $ ls + intel_rdt.l3_cbm + tasks + +Say if the cache is 4MB (looked up from /proc/cpuinfo) and max cbm is 16 +bits (indicated by the root nodes cbm). This assigns 1MB of cache to +group1 and group2 which is exclusive between them. + + $ cd group1 + $ /bin/echo 0xf > intel_rdt.l3_cbm + + $ cd group2 + $ /bin/echo 0xf0 > intel_rdt.l3_cbm + +Assign tasks to the group2 + + $ /bin/echo PID1 > tasks + $ /bin/echo PID2 > tasks + +Now threads PID1 and PID2 get to fill the 1MB of cache that was +allocated
[tip:x86/cache] x86/intel_rdt: Intel haswell Cache Allocation enumeration
Commit-ID: 8741b655628d89380bfbe0ded7a83c0bc2293a72 Gitweb: http://git.kernel.org/tip/8741b655628d89380bfbe0ded7a83c0bc2293a72 Author: Fenghua Yu AuthorDate: Thu, 17 Dec 2015 14:46:14 -0800 Committer: H. Peter Anvin CommitDate: Fri, 18 Dec 2015 13:17:56 -0800 x86/intel_rdt: Intel haswell Cache Allocation enumeration From: Vikas Shivappa This patch is specific to Intel haswell (hsw) server SKUs. Cache Allocation on hsw server needs to be enumerated separately as HSW does not have support for CPUID enumeration for Cache Allocation. This patch does a probe by writing a CLOSid (Class of service id) into high 32 bits of IA32_PQR_MSR and see if the bits stick. The probe is only done after confirming that the CPU is HSW server. Other hardcoded values are: - L3 cache bit mask must be at least two bits. - Maximum CLOSids supported is always 4. - Maximum bits support in cache bit mask is always 20. Signed-off-by: Vikas Shivappa Link: http://lkml.kernel.org/r/1450392376-6397-10-git-send-email-fenghua...@intel.com Signed-off-by: Fenghua Yu --- arch/x86/kernel/cpu/intel_rdt.c | 59 +++-- 1 file changed, 57 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c index 31f8588..ecaf8e6 100644 --- a/arch/x86/kernel/cpu/intel_rdt.c +++ b/arch/x86/kernel/cpu/intel_rdt.c @@ -38,6 +38,10 @@ static struct clos_cbm_table *cctable; */ unsigned long *closmap; /* + * Minimum bits required in Cache bitmask. + */ +static unsigned int min_bitmask_len = 1; +/* * Mask of CPUs for writing CBM values. We only need one CPU per-socket. */ static cpumask_t rdt_cpumask; @@ -54,6 +58,57 @@ struct rdt_remote_data { u64 val; }; +/* + * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs + * as it does not have CPUID enumeration support for Cache allocation. + * + * Probes by writing to the high 32 bits(CLOSid) of the IA32_PQR_MSR and + * testing if the bits stick. 
Max CLOSids is always 4 and max cbm length + * is always 20 on hsw server parts. The minimum cache bitmask length + * allowed for HSW server is always 2 bits. Hardcode all of them. + */ +static inline bool cache_alloc_hsw_probe(void) +{ + u32 l, h_old, h_new, h_tmp; + + if (rdmsr_safe(MSR_IA32_PQR_ASSOC, &l, &h_old)) + return false; + + /* +* Default value is always 0 if feature is present. +*/ + h_tmp = h_old ^ 0x1U; + if (wrmsr_safe(MSR_IA32_PQR_ASSOC, l, h_tmp) || + rdmsr_safe(MSR_IA32_PQR_ASSOC, &l, &h_new)) + return false; + + if (h_tmp != h_new) + return false; + + wrmsr_safe(MSR_IA32_PQR_ASSOC, l, h_old); + + boot_cpu_data.x86_cache_max_closid = 4; + boot_cpu_data.x86_cache_max_cbm_len = 20; + min_bitmask_len = 2; + + return true; +} + +static inline bool cache_alloc_supported(struct cpuinfo_x86 *c) +{ + if (cpu_has(c, X86_FEATURE_CAT_L3)) + return true; + + /* +* Probe for Haswell server CPUs. +*/ + if (c->x86 == 0x6 && c->x86_model == 0x3f) + return cache_alloc_hsw_probe(); + + return false; +} + + void __intel_rdt_sched_in(void *dummy) { struct intel_pqr_state *state = this_cpu_ptr(&pqr_state); @@ -126,7 +181,7 @@ static bool cbm_validate(unsigned long var) unsigned long first_bit, zero_bit; u64 max_cbm; - if (bitmap_weight(&var, max_cbm_len) < 1) + if (bitmap_weight(&var, max_cbm_len) < min_bitmask_len) return false; max_cbm = (1ULL << max_cbm_len) - 1; @@ -310,7 +365,7 @@ static int __init intel_rdt_late_init(void) u32 maxid, max_cbm_len; int err = 0, size, i; - if (!cpu_has(c, X86_FEATURE_CAT_L3)) + if (!cache_alloc_supported(c)) return -ENODEV; maxid = c->x86_cache_max_closid;
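The "write a bit and see if it sticks" probe described in this commit can be tried out in user space with a fake register. The following is only a sketch under that assumption: struct msr_ops, struct fake_msr and probe_bits_stick() are hypothetical names standing in for rdmsr_safe()/wrmsr_safe() and the real MSR, and nothing here touches hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessor table standing in for rdmsr_safe()/wrmsr_safe(). */
struct msr_ops {
	int (*read)(void *ctx, uint32_t *lo, uint32_t *hi);
	int (*write)(void *ctx, uint32_t lo, uint32_t hi);
	void *ctx;
};

/* Fake register: the high word only keeps bits set in writable_hi. */
struct fake_msr {
	uint32_t lo, hi;
	uint32_t writable_hi;
};

static int fake_read(void *ctx, uint32_t *lo, uint32_t *hi)
{
	struct fake_msr *m = ctx;

	*lo = m->lo;
	*hi = m->hi;
	return 0;
}

static int fake_write(void *ctx, uint32_t lo, uint32_t hi)
{
	struct fake_msr *m = ctx;

	m->lo = lo;
	m->hi = hi & m->writable_hi;	/* unsupported bits are dropped */
	return 0;
}

/*
 * Flip the lowest bit of the high word, write it, read it back and
 * compare. On success the original value is restored.
 */
static bool probe_bits_stick(const struct msr_ops *ops)
{
	uint32_t lo, hi_old, hi_new, hi_tmp;

	if (ops->read(ops->ctx, &lo, &hi_old))
		return false;

	hi_tmp = hi_old ^ 0x1U;
	if (ops->write(ops->ctx, lo, hi_tmp) ||
	    ops->read(ops->ctx, &lo, &hi_new))
		return false;

	if (hi_tmp != hi_new)
		return false;

	ops->write(ops->ctx, lo, hi_old);
	return true;
}
```

A fake register with writable_hi set to all ones passes the probe and ends up with its old value restored; one with writable_hi of 0 fails it, which is the case the patch treats as "feature not present".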
[tip:x86/cache] x86/intel_rdt: Hot cpu support for Cache Allocation
Commit-ID: cf0978cd31053d58c99ab74e613147f86ecd1724 Gitweb: http://git.kernel.org/tip/cf0978cd31053d58c99ab74e613147f86ecd1724 Author: Fenghua Yu AuthorDate: Thu, 17 Dec 2015 14:46:13 -0800 Committer: H. Peter Anvin CommitDate: Fri, 18 Dec 2015 13:17:56 -0800 x86/intel_rdt: Hot cpu support for Cache Allocation From: Vikas Shivappa This patch adds hot plug cpu support for Intel Cache allocation. Support includes updating the cache bitmask MSRs IA32_L3_QOS_n when a new CPU package comes online or goes offline. The IA32_L3_QOS_n MSRs are one per Class of service on each CPU package. The new package's MSRs are synchronized with the values of existing MSRs. Also the software cache for IA32_PQR_ASSOC MSRs are reset during hot cpu notifications. Signed-off-by: Vikas Shivappa Link: http://lkml.kernel.org/r/1450392376-6397-9-git-send-email-fenghua...@intel.com Signed-off-by: Fenghua Yu --- arch/x86/kernel/cpu/intel_rdt.c | 76 + 1 file changed, 76 insertions(+) diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c index 8379df8..31f8588 100644 --- a/arch/x86/kernel/cpu/intel_rdt.c +++ b/arch/x86/kernel/cpu/intel_rdt.c @@ -24,6 +24,7 @@ #include #include +#include #include #include #include @@ -234,6 +235,75 @@ static inline bool rdt_cpumask_update(int cpu) return false; } +/* + * cbm_update_msrs() - Updates all the existing IA32_L3_MASK_n MSRs + * which are one per CLOSid on the current package. 
+ */ +static void cbm_update_msrs(void *dummy) +{ + int maxid = boot_cpu_data.x86_cache_max_closid; + struct rdt_remote_data info; + unsigned int i; + + for (i = 0; i < maxid; i++) { + if (cctable[i].clos_refcnt) { + info.msr = CBM_FROM_INDEX(i); + info.val = cctable[i].l3_cbm; + msr_cpu_update(&info); + } + } +} + +static inline void intel_rdt_cpu_start(int cpu) +{ + struct intel_pqr_state *state = &per_cpu(pqr_state, cpu); + + state->closid = 0; + mutex_lock(&rdt_group_mutex); + if (rdt_cpumask_update(cpu)) + smp_call_function_single(cpu, cbm_update_msrs, NULL, 1); + mutex_unlock(&rdt_group_mutex); +} + +static void intel_rdt_cpu_exit(unsigned int cpu) +{ + int i; + + mutex_lock(&rdt_group_mutex); + if (!cpumask_test_and_clear_cpu(cpu, &rdt_cpumask)) { + mutex_unlock(&rdt_group_mutex); + return; + } + + cpumask_and(&tmp_cpumask, topology_core_cpumask(cpu), cpu_online_mask); + cpumask_clear_cpu(cpu, &tmp_cpumask); + i = cpumask_any(&tmp_cpumask); + + if (i < nr_cpu_ids) + cpumask_set_cpu(i, &rdt_cpumask); + mutex_unlock(&rdt_group_mutex); +} + +static int intel_rdt_cpu_notifier(struct notifier_block *nb, + unsigned long action, void *hcpu) +{ + unsigned int cpu = (unsigned long)hcpu; + + switch (action) { + case CPU_DOWN_FAILED: + case CPU_ONLINE: + intel_rdt_cpu_start(cpu); + break; + case CPU_DOWN_PREPARE: + intel_rdt_cpu_exit(cpu); + break; + default: + break; + } + + return NOTIFY_OK; +} + static int __init intel_rdt_late_init(void) { struct cpuinfo_x86 *c = &boot_cpu_data; @@ -261,9 +331,15 @@ static int __init intel_rdt_late_init(void) goto out_err; } + cpu_notifier_register_begin(); + for_each_online_cpu(i) rdt_cpumask_update(i); + __hotcpu_notifier(intel_rdt_cpu_notifier, 0); + + cpu_notifier_register_done(); + static_key_slow_inc(&rdt_enable_key); pr_info("Intel cache allocation enabled\n"); out_err:
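The hand-over in intel_rdt_cpu_exit() (intersect the package's CPUs with the online CPUs, drop the outgoing CPU, take any survivor) can be sketched with plain 64-bit masks. This is a user-space approximation only: pick_package_successor() and NO_CPU are hypothetical names, the 64-CPU limit is an assumption of the sketch, and it always returns the lowest candidate, whereas cpumask_any() makes no such promise.

```c
#include <stdint.h>

#define NO_CPU (-1)

/*
 * Hypothetical helper: pick another online CPU from the same package as
 * @cpu, or NO_CPU if @cpu was the last one. Bit n of each mask stands
 * for CPU n; at most 64 CPUs are modelled.
 */
static int pick_package_successor(uint64_t package_mask,
				  uint64_t online_mask, unsigned int cpu)
{
	uint64_t candidates = package_mask & online_mask;
	unsigned int i;

	/* The outgoing CPU must not be chosen as its own successor. */
	if (cpu < 64)
		candidates &= ~(UINT64_C(1) << cpu);

	for (i = 0; i < 64; i++)
		if (candidates & (UINT64_C(1) << i))
			return (int)i;

	return NO_CPU;
}
```

For a package covering CPUs 0-3 with all of CPUs 0-7 online, taking CPU 0 down hands the role to CPU 1. If CPU 0 was the only online CPU of its package, the result is NO_CPU and, as in the patch, no replacement is set in the mask.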
[tip:x86/cache] x86/intel_rdt: Implement scheduling support for Intel RDT
Commit-ID: f17254c756e640c8299212b6822faf142a89b813 Gitweb: http://git.kernel.org/tip/f17254c756e640c8299212b6822faf142a89b813 Author: Fenghua Yu AuthorDate: Thu, 17 Dec 2015 14:46:12 -0800 Committer: H. Peter Anvin CommitDate: Fri, 18 Dec 2015 13:17:56 -0800 x86/intel_rdt: Implement scheduling support for Intel RDT From: Vikas Shivappa Adds support for IA32_PQR_ASSOC MSR writes during task scheduling. For Cache Allocation, MSR write would let the task fill in the cache 'subset' represented by the task's capacity bit mask. The high 32 bits in the per processor MSR IA32_PQR_ASSOC represents the CLOSid. During context switch kernel implements this by writing the CLOSid of the task belongs to the CPU's IA32_PQR_ASSOC MSR. This patch also implements a common software cache for IA32_PQR_MSR (RMID 0:9, CLOSId 32:63) to be used by both Cache monitoring (CMT) and Cache allocation. CMT updates the RMID where as cache_alloc updates the CLOSid in the software cache. During scheduling when the new RMID/CLOSid value is different from the cached values, IA32_PQR_MSR is updated. Since the measured rdmsr latency for IA32_PQR_MSR is very high (~250 cycles) this software cache is necessary to avoid reading the MSR to compare the current CLOSid value. The following considerations are done for the PQR MSR write so that it minimally impacts scheduler hot path: - This path does not exist on any non-intel platforms. - On Intel platforms, this would not exist by default unless INTEL_RDT is enabled. - remains a no-op when INTEL_RDT is enabled and intel SKU does not support the feature. - When feature is available and enabled, never does MSR write till the user manually starts using one of the capacity bit masks. - MSR write is only done when there is a task with different Closid is scheduled on the CPU. Typically if the task groups are bound to be scheduled on a set of CPUs, the number of MSR writes is greatly reduced. 
- A per CPU cache of CLOSids is maintained to do the check so that we don't have to do a rdmsr which actually costs a lot of cycles. Signed-off-by: Vikas Shivappa Link: http://lkml.kernel.org/r/1450392376-6397-8-git-send-email-fenghua...@intel.com Signed-off-by: Fenghua Yu --- arch/x86/include/asm/intel_rdt.h | 28 arch/x86/include/asm/pqr_common.h | 27 +++ arch/x86/kernel/cpu/intel_rdt.c| 25 + arch/x86/kernel/cpu/perf_event_intel_cqm.c | 26 +++--- arch/x86/kernel/process_64.c | 6 ++ 5 files changed, 89 insertions(+), 23 deletions(-) diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h index 4f45dc8..afb6da3 100644 --- a/arch/x86/include/asm/intel_rdt.h +++ b/arch/x86/include/asm/intel_rdt.h @@ -3,14 +3,42 @@ #ifdef CONFIG_INTEL_RDT +#include + #define MAX_CBM_LENGTH 32 #define IA32_L3_CBM_BASE 0xc90 #define CBM_FROM_INDEX(x) (IA32_L3_CBM_BASE + x) +extern struct static_key rdt_enable_key; +void __intel_rdt_sched_in(void *dummy); + struct clos_cbm_table { unsigned long l3_cbm; unsigned int clos_refcnt; }; +/* + * intel_rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR + * + * Following considerations are made so that this has minimal impact + * on scheduler hot path: + * - This will stay as no-op unless we are running on an Intel SKU + * which supports L3 cache allocation. + * - Caches the per cpu CLOSid values and does the MSR write only + * when a task with a different CLOSid is scheduled in. + */ +static inline void intel_rdt_sched_in(void) +{ + /* +* Call the schedule in code only when RDT is enabled. 
+*/ + if (static_key_false(&rdt_enable_key)) + __intel_rdt_sched_in(NULL); +} + +#else + +static inline void intel_rdt_sched_in(void) {} + #endif #endif diff --git a/arch/x86/include/asm/pqr_common.h b/arch/x86/include/asm/pqr_common.h new file mode 100644 index 000..11e985c --- /dev/null +++ b/arch/x86/include/asm/pqr_common.h @@ -0,0 +1,27 @@ +#ifndef _X86_RDT_H_ +#define _X86_RDT_H_ + +#define MSR_IA32_PQR_ASSOC 0x0c8f + +/** + * struct intel_pqr_state - State cache for the PQR MSR + * @rmid: The cached Resource Monitoring ID + * @closid:The cached Class Of Service ID + * @rmid_usecnt: The usage counter for rmid + * + * The upper 32 bits of MSR_IA32_PQR_ASSOC contain closid and the + * lower 10 bits rmid. The update to MSR_IA32_PQR_ASSOC always + * contains both parts, so we need to cache them. + * + * The cache also helps to avoid pointless updates if the value does + * not change. + */ +struct intel_pqr_state { + u32 rmid; + u32 closid; + int rmid_usecnt; +}; + +DECLARE_PER_CPU(struct intel_pqr_state, pqr_state); + +#endif diff --git a/arch/x86/kernel/cpu/i
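The per-CPU software cache described in the commit message comes down to "compare against the cached CLOSid, write only on change". The sketch below models that in user space. It is not the kernel code: struct pqr_cache, its msr and writes fields, and sched_in_closid() are hypothetical, and the 64-bit field merely stands in for the real register.

```c
#include <stdint.h>

/* Hypothetical per-CPU state; msr stands in for IA32_PQR_ASSOC. */
struct pqr_cache {
	uint32_t rmid;		/* cached RMID, low bits of the register */
	uint32_t closid;	/* cached CLOSid, high 32 bits */
	uint64_t msr;		/* simulated register contents */
	unsigned int writes;	/* number of simulated MSR writes */
};

/*
 * Called when a task with @closid is scheduled in. The register is only
 * written when the CLOSid differs from the cached one, and the write
 * always carries both the RMID and the CLOSid.
 */
static void sched_in_closid(struct pqr_cache *c, uint32_t closid)
{
	if (c->closid == closid)
		return;		/* same CLOSid: no read, no write */

	c->closid = closid;
	c->msr = ((uint64_t)closid << 32) | c->rmid;
	c->writes++;
}
```

Scheduling in two tasks with the same CLOSid back to back costs a single simulated write. The comparison never reads the register, which is the point of the cache given the rdmsr cost quoted above.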
[tip:x86/cache] x86/intel_rdt: Add support for Cache Allocation detection
Commit-ID: 257372262056d9e963990a1ad6a917ca0b57d80e Gitweb: http://git.kernel.org/tip/257372262056d9e963990a1ad6a917ca0b57d80e Author: Fenghua Yu AuthorDate: Thu, 17 Dec 2015 14:46:09 -0800 Committer: H. Peter Anvin CommitDate: Fri, 18 Dec 2015 13:17:55 -0800 x86/intel_rdt: Add support for Cache Allocation detection From: Vikas Shivappa This patch includes CPUID enumeration routines for Cache allocation and new values to track resources to the cpuinfo_x86 structure. Cache allocation provides a way for the Software (OS/VMM) to restrict cache allocation to a defined 'subset' of cache which may be overlapping with other 'subsets'. This feature is used when allocating a line in cache ie when pulling new data into the cache. The programming of the hardware is done via programming MSRs (model specific registers). Signed-off-by: Vikas Shivappa Link: http://lkml.kernel.org/r/1450392376-6397-5-git-send-email-fenghua...@intel.com Signed-off-by: Fenghua Yu --- arch/x86/include/asm/cpufeature.h | 6 +- arch/x86/include/asm/processor.h | 3 +++ arch/x86/kernel/cpu/Makefile | 1 + arch/x86/kernel/cpu/common.c | 15 +++ arch/x86/kernel/cpu/intel_rdt.c | 40 +++ init/Kconfig | 12 6 files changed, 76 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index e4f8010..671abaa 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -12,7 +12,7 @@ #include #endif -#define NCAPINTS 14 /* N 32-bit words worth of info */ +#define NCAPINTS 15 /* N 32-bit words worth of info */ #define NBUGINTS 1 /* N 32-bit bug flags */ /* @@ -231,6 +231,7 @@ #define X86_FEATURE_RTM( 9*32+11) /* Restricted Transactional Memory */ #define X86_FEATURE_CQM( 9*32+12) /* Cache QoS Monitoring */ #define X86_FEATURE_MPX( 9*32+14) /* Memory Protection Extension */ +#define X86_FEATURE_RDT( 9*32+15) /* Resource Allocation */ #define X86_FEATURE_AVX512F( 9*32+16) /* AVX-512 Foundation */ #define X86_FEATURE_RDSEED ( 9*32+18) /* 
The RDSEED instruction */ #define X86_FEATURE_ADX( 9*32+19) /* The ADCX and ADOX instructions */ @@ -258,6 +259,9 @@ /* AMD-defined CPU features, CPUID level 0x8008 (ebx), word 13 */ #define X86_FEATURE_CLZERO (13*32+0) /* CLZERO instruction */ +/* Intel-defined CPU features, CPUID level 0x0010:0 (ebx), word 13 */ +#define X86_FEATURE_CAT_L3 (14*32 + 1) /* Cache Allocation L3 */ + /* * BUG word(s) */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 6752225..c0aa1eb 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -120,6 +120,9 @@ struct cpuinfo_x86 { int x86_cache_occ_scale;/* scale to bytes */ int x86_power; unsigned long loops_per_jiffy; + /* Cache Allocation values: */ + u16 x86_cache_max_cbm_len; + u16 x86_cache_max_closid; /* cpuid returned max cores value: */ u16 x86_max_cores; u16 apicid; diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile index 5803130..b3292a4 100644 --- a/arch/x86/kernel/cpu/Makefile +++ b/arch/x86/kernel/cpu/Makefile @@ -51,6 +51,7 @@ obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_msr.o obj-$(CONFIG_CPU_SUP_AMD) += perf_event_msr.o endif +obj-$(CONFIG_INTEL_RDT)+= intel_rdt.o obj-$(CONFIG_X86_MCE) += mcheck/ obj-$(CONFIG_MTRR) += mtrr/ diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index c2b7522..e64dc78 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -653,6 +653,21 @@ void get_cpu_cap(struct cpuinfo_x86 *c) } } + /* Additional Intel-defined flags: level 0x0010 */ + if (c->cpuid_level >= 0x0010) { + u32 eax, ebx, ecx, edx; + + cpuid_count(0x0010, 0, &eax, &ebx, &ecx, &edx); + c->x86_capability[14] = ebx; + + if (cpu_has(c, X86_FEATURE_CAT_L3)) { + + cpuid_count(0x0010, 1, &eax, &ebx, &ecx, &edx); + c->x86_cache_max_closid = edx + 1; + c->x86_cache_max_cbm_len = eax + 1; + } + } + /* AMD-defined flags: level 0x8001 */ xlvl = cpuid_eax(0x8000); c->extended_cpuid_level = xlvl; diff --git 
a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c new file mode 100644 index 000..f49e970 --- /dev/null +++ b/arch/x86/kernel/cpu/intel_rdt.c @@ -0,0 +1,40 @@ +/* +
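For reference, the values read from CPUID leaf 0x10, subleaf 1 in get_cpu_cap() can be decoded as in the sketch below. It takes the register values as plain arguments so that it runs anywhere. The field masks (EAX[4:0] for the CBM length minus one, EDX[15:0] for the highest CLOSid) are my reading of the SDM and are an assumption here; the patch itself adds 1 to the whole registers. struct cat_l3_limits and cat_l3_decode() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical container for the two limits the patch stores. */
struct cat_l3_limits {
	unsigned int max_cbm_len;	/* number of bits in the CBM */
	unsigned int max_closid;	/* number of CLOSids */
};

/*
 * Decode the EAX/EDX outputs of CPUID.(EAX=0x10, ECX=1). Both fields
 * are reported "minus one", so 1 is added after masking.
 */
static struct cat_l3_limits cat_l3_decode(uint32_t eax, uint32_t edx)
{
	struct cat_l3_limits lim;

	lim.max_cbm_len = (eax & 0x1f) + 1;	/* assumed field: EAX[4:0] */
	lim.max_closid = (edx & 0xffff) + 1;	/* assumed field: EDX[15:0] */

	return lim;
}
```

Feeding in eax = 19 and edx = 3 gives a 20-bit CBM and 4 CLOSids.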
[tip:x86/cache] x86/intel_rdt: Add Class of service management
Commit-ID: d4223b381c10bff94dc7491806b6108429831fc6 Gitweb: http://git.kernel.org/tip/d4223b381c10bff94dc7491806b6108429831fc6 Author: Fenghua Yu AuthorDate: Thu, 17 Dec 2015 14:46:10 -0800 Committer: H. Peter Anvin CommitDate: Fri, 18 Dec 2015 13:17:56 -0800 x86/intel_rdt: Add Class of service management From: Vikas Shivappa Adds some data-structures and APIs to support Class of service management(closid). There is a new clos_cbm table which keeps a 1:1 mapping between closid and capacity bit mask (cbm) and a count of usage of closid. Each task would be associated with a Closid at a time and this patch adds a new field closid to task_struct to keep track of the same. Signed-off-by: Vikas Shivappa Link: http://lkml.kernel.org/r/1450392376-6397-6-git-send-email-fenghua...@intel.com Signed-off-by: Fenghua Yu --- arch/x86/include/asm/intel_rdt.h | 12 ++ arch/x86/kernel/cpu/intel_rdt.c | 82 +++- include/linux/sched.h| 3 ++ 3 files changed, 95 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h new file mode 100644 index 000..88b7643 --- /dev/null +++ b/arch/x86/include/asm/intel_rdt.h @@ -0,0 +1,12 @@ +#ifndef _RDT_H_ +#define _RDT_H_ + +#ifdef CONFIG_INTEL_RDT + +struct clos_cbm_table { + unsigned long l3_cbm; + unsigned int clos_refcnt; +}; + +#endif +#endif diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c index f49e970..d79213a 100644 --- a/arch/x86/kernel/cpu/intel_rdt.c +++ b/arch/x86/kernel/cpu/intel_rdt.c @@ -24,17 +24,95 @@ #include #include +#include + +/* + * cctable maintains 1:1 mapping between CLOSid and cache bitmask. + */ +static struct clos_cbm_table *cctable; +/* + * closid availability bit map. 
+ */ +unsigned long *closmap; +static DEFINE_MUTEX(rdt_group_mutex); + +static inline void closid_get(u32 closid) +{ + struct clos_cbm_table *cct = &cctable[closid]; + + lockdep_assert_held(&rdt_group_mutex); + + cct->clos_refcnt++; +} + +static int closid_alloc(u32 *closid) +{ + u32 maxid; + u32 id; + + lockdep_assert_held(&rdt_group_mutex); + + maxid = boot_cpu_data.x86_cache_max_closid; + id = find_first_zero_bit(closmap, maxid); + if (id == maxid) + return -ENOSPC; + + set_bit(id, closmap); + closid_get(id); + *closid = id; + + return 0; +} + +static inline void closid_free(u32 closid) +{ + clear_bit(closid, closmap); + cctable[closid].l3_cbm = 0; +} + +static void closid_put(u32 closid) +{ + struct clos_cbm_table *cct = &cctable[closid]; + + lockdep_assert_held(&rdt_group_mutex); + if (WARN_ON(!cct->clos_refcnt)) + return; + + if (!--cct->clos_refcnt) + closid_free(closid); +} static int __init intel_rdt_late_init(void) { struct cpuinfo_x86 *c = &boot_cpu_data; + u32 maxid, max_cbm_len; + int err = 0, size; if (!cpu_has(c, X86_FEATURE_CAT_L3)) return -ENODEV; - pr_info("Intel cache allocation detected\n"); + maxid = c->x86_cache_max_closid; + max_cbm_len = c->x86_cache_max_cbm_len; - return 0; + size = maxid * sizeof(struct clos_cbm_table); + cctable = kzalloc(size, GFP_KERNEL); + if (!cctable) { + err = -ENOMEM; + goto out_err; + } + + size = BITS_TO_LONGS(maxid) * sizeof(long); + closmap = kzalloc(size, GFP_KERNEL); + if (!closmap) { + kfree(cctable); + err = -ENOMEM; + goto out_err; + } + + pr_info("Intel cache allocation enabled\n"); +out_err: + + return err; } late_initcall(intel_rdt_late_init); diff --git a/include/linux/sched.h b/include/linux/sched.h index edad7a4..0a6db46 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1668,6 +1668,9 @@ struct task_struct { /* cg_list protected by css_set_lock and tsk->alloc_lock */ struct list_head cg_list; #endif +#ifdef CONFIG_INTEL_RDT + u32 closid; +#endif #ifdef CONFIG_FUTEX struct 
robust_list_head __user *robust_list; #ifdef CONFIG_COMPAT
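The allocate/put life cycle of a CLOSid in this patch (first free bit, reference count, release on the last put) can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel code: struct clos_pool, pool_alloc() and pool_put() are hypothetical names, the pool size of 4 is arbitrary, and locking is left out.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_CLOSID 4	/* arbitrary pool size for this sketch */

/* Hypothetical pool: an availability bitmap plus per-id use counts. */
struct clos_pool {
	uint32_t used;			/* bit n set: CLOSid n is taken */
	unsigned int refcnt[MAX_CLOSID];
};

/* Hand out the lowest free CLOSid, or -ENOSPC when all are in use. */
static int pool_alloc(struct clos_pool *p, uint32_t *closid)
{
	uint32_t id;

	for (id = 0; id < MAX_CLOSID; id++) {
		if (!(p->used & (UINT32_C(1) << id))) {
			p->used |= UINT32_C(1) << id;
			p->refcnt[id] = 1;
			*closid = id;
			return 0;
		}
	}

	return -ENOSPC;
}

/* Drop one reference; the id becomes free again on the last put. */
static void pool_put(struct clos_pool *p, uint32_t closid)
{
	if (closid >= MAX_CLOSID || !p->refcnt[closid])
		return;

	if (--p->refcnt[closid] == 0)
		p->used &= ~(UINT32_C(1) << closid);
}
```

Once all four ids are handed out, the next pool_alloc() returns -ENOSPC. After pool_put() on id 2, the following allocation gets id 2 back.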
[tip:x86/cache] x86/intel_rdt: Add L3 cache capacity bitmask management
Commit-ID: a424209c74c3c30fb1677075afa5d9277e01c46b Gitweb: http://git.kernel.org/tip/a424209c74c3c30fb1677075afa5d9277e01c46b Author: Fenghua Yu AuthorDate: Thu, 17 Dec 2015 14:46:11 -0800 Committer: H. Peter Anvin CommitDate: Fri, 18 Dec 2015 13:17:56 -0800 x86/intel_rdt: Add L3 cache capacity bitmask management From: Vikas Shivappa This patch adds different APIs to manage the L3 cache capacity bitmask. The capacity bit mask(CBM) needs to have only contiguous bits set. The current implementation has a global CBM for each class of service id. There are APIs added to update the CBM via MSR write to IA32_L3_MASK_n on all packages. Other APIs are to read and write entries to the clos_cbm_table. Signed-off-by: Vikas Shivappa Link: http://lkml.kernel.org/r/1450392376-6397-7-git-send-email-fenghua...@intel.com Signed-off-by: Fenghua Yu --- arch/x86/include/asm/intel_rdt.h | 4 ++ arch/x86/kernel/cpu/intel_rdt.c | 133 ++- 2 files changed, 136 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h index 88b7643..4f45dc8 100644 --- a/arch/x86/include/asm/intel_rdt.h +++ b/arch/x86/include/asm/intel_rdt.h @@ -3,6 +3,10 @@ #ifdef CONFIG_INTEL_RDT +#define MAX_CBM_LENGTH 32 +#define IA32_L3_CBM_BASE 0xc90 +#define CBM_FROM_INDEX(x) (IA32_L3_CBM_BASE + x) + struct clos_cbm_table { unsigned long l3_cbm; unsigned int clos_refcnt; diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c index d79213a..6ad5b48 100644 --- a/arch/x86/kernel/cpu/intel_rdt.c +++ b/arch/x86/kernel/cpu/intel_rdt.c @@ -34,8 +34,22 @@ static struct clos_cbm_table *cctable; * closid availability bit map. */ unsigned long *closmap; +/* + * Mask of CPUs for writing CBM values. We only need one CPU per-socket. + */ +static cpumask_t rdt_cpumask; +/* + * Temporary cpumask used during hot cpu notificaiton handling. The usage + * is serialized by hot cpu locks. 
+ */ +static cpumask_t tmp_cpumask; static DEFINE_MUTEX(rdt_group_mutex); +struct rdt_remote_data { + int msr; + u64 val; +}; + static inline void closid_get(u32 closid) { struct clos_cbm_table *cct = &cctable[closid]; @@ -82,11 +96,126 @@ static void closid_put(u32 closid) closid_free(closid); } +static bool cbm_validate(unsigned long var) +{ + u32 max_cbm_len = boot_cpu_data.x86_cache_max_cbm_len; + unsigned long first_bit, zero_bit; + u64 max_cbm; + + if (bitmap_weight(&var, max_cbm_len) < 1) + return false; + + max_cbm = (1ULL << max_cbm_len) - 1; + if (var & ~max_cbm) + return false; + + first_bit = find_first_bit(&var, max_cbm_len); + zero_bit = find_next_zero_bit(&var, max_cbm_len, first_bit); + + if (find_next_bit(&var, max_cbm_len, zero_bit) < max_cbm_len) + return false; + + return true; +} + +static int clos_cbm_table_read(u32 closid, unsigned long *l3_cbm) +{ + u32 maxid = boot_cpu_data.x86_cache_max_closid; + + lockdep_assert_held(&rdt_group_mutex); + + if (closid >= maxid) + return -EINVAL; + + *l3_cbm = cctable[closid].l3_cbm; + + return 0; +} + +/* + * clos_cbm_table_update() - Update a clos cbm table entry. + * @closid: the closid whose cbm needs to be updated + * @cbm: the new cbm value that has to be updated + * + * This assumes the cbm is validated as per the interface requirements + * and the cache allocation requirements(through the cbm_validate). 
+ */ +static int clos_cbm_table_update(u32 closid, unsigned long cbm) +{ + u32 maxid = boot_cpu_data.x86_cache_max_closid; + + lockdep_assert_held(&rdt_group_mutex); + + if (closid >= maxid) + return -EINVAL; + + cctable[closid].l3_cbm = cbm; + + return 0; +} + +static bool cbm_search(unsigned long cbm, u32 *closid) +{ + u32 maxid = boot_cpu_data.x86_cache_max_closid; + u32 i; + + for (i = 0; i < maxid; i++) { + if (cctable[i].clos_refcnt && + bitmap_equal(&cbm, &cctable[i].l3_cbm, MAX_CBM_LENGTH)) { + *closid = i; + return true; + } + } + + return false; +} + +static void closcbm_map_dump(void) +{ + u32 i; + + pr_debug("CBMMAP\n"); + for (i = 0; i < boot_cpu_data.x86_cache_max_closid; i++) { + pr_debug("l3_cbm: 0x%x,clos_refcnt: %u\n", +(unsigned int)cctable[i].l3_cbm, cctable[i].clos_refcnt); + } +} + +static void msr_cpu_update(void *arg) +{ + struct rdt_remote_data *info = arg; + + wrmsrl(info->msr, info->val); +} + +/* + * msr_update_all() - Update the msr for all packages. + */ +static inline void msr_update_all(int msr, u64 val) +{ + struct rdt_remote_data info; + + info.msr = msr; + info.v
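The rule that a CBM must consist of contiguous set bits within the supported width can also be checked without the kernel bitmap helpers. The sketch below uses the "strip trailing zeros, then test for 2^n - 1" trick instead of the find_first_bit()/find_next_zero_bit() sequence in cbm_validate(). It is a swapped-in technique, and cbm_is_valid() with its min_bits parameter is a hypothetical name for this note.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone check: @cbm must fit in @max_len bits, have
 * at least @min_bits bits set, and all set bits must be contiguous.
 */
static bool cbm_is_valid(uint32_t cbm, unsigned int max_len,
			 unsigned int min_bits)
{
	uint32_t max_cbm, run;
	unsigned int bits = 0;

	if (max_len == 0 || max_len > 32)
		return false;

	max_cbm = max_len == 32 ? UINT32_MAX : (UINT32_C(1) << max_len) - 1;
	if (cbm == 0 || (cbm & ~max_cbm))
		return false;

	/* Shift out trailing zeros so the run starts at bit 0. */
	run = cbm;
	while (!(run & 1))
		run >>= 1;

	/* A contiguous run starting at bit 0 has the form 2^n - 1. */
	if (run & (run + 1))
		return false;

	/* Count the bits in the run. */
	while (run) {
		bits++;
		run >>= 1;
	}

	return bits >= min_bits;
}
```

With a 20-bit limit, 0xf0 passes, 0x5 fails (a hole in the mask) and 0x100000 fails (bit 20 is out of range). Passing min_bits = 2 models a platform that needs at least two bits in the mask.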
[tip:x86/cache] x86/intel_rdt: Cache Allocation documentation
Commit-ID: 133b3d646e2cc7b49c71dc0fdff76a690611a5d0 Gitweb: http://git.kernel.org/tip/133b3d646e2cc7b49c71dc0fdff76a690611a5d0 Author: Fenghua Yu AuthorDate: Thu, 17 Dec 2015 14:46:08 -0800 Committer: H. Peter Anvin CommitDate: Fri, 18 Dec 2015 13:17:55 -0800 x86/intel_rdt: Cache Allocation documentation From: Vikas Shivappa Adds a description of Cache allocation technology, overview of kernel framework implementation. The framework has APIs to manage class of service, capacity bitmask(CBM), scheduling support and other architecture specific implementation. The APIs are used to build the cgroup interface in later patches. Cache allocation is a sub-feature of Resource Director Technology (RDT) or Platform Shared resource control which provides support to control Platform shared resources like L3 cache. Cache Allocation Technology provides a way for the Software (OS/VMM) to restrict cache allocation to a defined 'subset' of cache which may be overlapping with other 'subsets'. This feature is used when allocating a line in cache ie when pulling new data into the cache. The tasks are grouped into CLOS (class of service). OS uses MSR writes to indicate the CLOSid of the thread when scheduling in and to indicate the cache capacity associated with the CLOSid. Currently cache allocation is supported for L3 cache. More information can be found in the Intel SDM June 2015, Volume 3, section 17.16. Signed-off-by: Vikas Shivappa Link: http://lkml.kernel.org/r/1450392376-6397-4-git-send-email-fenghua...@intel.com Signed-off-by: Fenghua Yu --- Documentation/x86/intel_rdt.txt | 109 1 file changed, 109 insertions(+) diff --git a/Documentation/x86/intel_rdt.txt b/Documentation/x86/intel_rdt.txt new file mode 100644 index 000..05ec819 --- /dev/null +++ b/Documentation/x86/intel_rdt.txt @@ -0,0 +1,109 @@ +Intel RDT +- + +Copyright (C) 2014 Intel Corporation +Written by vikas.shiva...@linux.intel.com + +CONTENTS: += + +1. 
Cache Allocation Technology + 1.1 What is RDT and Cache allocation ? + 1.2 Why is Cache allocation needed ? + 1.3 Cache allocation implementation overview + 1.4 Assignment of CBM and CLOS + 1.5 Scheduling and Context Switch + +1. Cache Allocation Technology +=== + +1.1 What is RDT and Cache allocation + + +Cache allocation is a sub-feature of Resource Director Technology (RDT) +Allocation or Platform Shared resource control which provides support to +control Platform shared resources like L3 cache. Currently L3 Cache is +the only resource that is supported in RDT. More information can be +found in the Intel SDM June 2015, Volume 3, section 17.16. + +Cache Allocation Technology provides a way for the Software (OS/VMM) to +restrict cache allocation to a defined 'subset' of cache which may be +overlapping with other 'subsets'. This feature is used when allocating a +line in cache ie when pulling new data into the cache. The programming +of the h/w is done via programming MSRs. + +The different cache subsets are identified by CLOS identifier (class of +service) and each CLOS has a CBM (cache bit mask). The CBM is a +contiguous set of bits which defines the amount of cache resource that +is available for each 'subset'. + +1.2 Why is Cache allocation needed +-- + +In todays new processors the number of cores is continuously increasing +especially in large scale usage models where VMs are used like +webservers and datacenters. The number of cores increase the number of +threads or workloads that can simultaneously be run. When +multi-threaded-applications, VMs, workloads run concurrently they +compete for shared resources including L3 cache. + +The architecture also allows dynamically changing these subsets during +runtime to further optimize the performance of the higher priority +application with minimal degradation to the low priority app. +Additionally, resources can be rebalanced for system throughput benefit. 
+ +This technique may be useful in managing large computer server systems +with large L3 cache, in the cloud and container context. Examples may be +large servers running instances of webservers or database servers. In +such complex systems, these subsets can be used for more careful placing +of the available cache resources by a centralized root accessible +interface. + +A specific use case may be to solve the noisy neighbour issue when a app +which is constantly copying data like streaming app is using large +amount of cache which could have otherwise been used by a high priority +computing application. Using the cache allocation feature, the streaming +application can be confined to use a smaller cache and the high priority +application be awarded a larger amount of cache space. + +1.3 Cache allocation implementation Overview + + +Kernel has a new field i
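The documentation above describes a CBM as a contiguous set of bits. As a rough illustration only, the userspace sketch below checks that property for a 64-bit mask. It is not the kernel's cbm_validate(); the name cbm_is_contiguous and the max_len parameter are hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): returns true when cbm is
 * non-zero, fits in the low max_len bits (1 <= max_len <= 64), and its
 * set bits form a single contiguous run.
 */
static bool cbm_is_contiguous(uint64_t cbm, unsigned int max_len)
{
	uint64_t max_cbm;

	if (max_len == 0 || max_len > 64)
		return false;

	/* Avoid an undefined 64-bit shift when max_len is 64. */
	max_cbm = (max_len == 64) ? UINT64_MAX
				  : ((UINT64_C(1) << max_len) - 1);

	if (cbm == 0 || (cbm & ~max_cbm))
		return false;

	/* Drop trailing zeros so that the run of ones starts at bit 0. */
	while (!(cbm & 1))
		cbm >>= 1;

	/* A run of ones starting at bit 0 has the form 2^n - 1. */
	return (cbm & (cbm + 1)) == 0;
}
```

For example, 0xF0 would be accepted with max_len 20, while 0x5 (two separate runs) and 0 would be rejected.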
[tip:x86/cache] x86/intel_cqm: Modify hot cpu notification handling
Commit-ID: 8a91dc4e92327b61fbe5941d25e74660e2a44579 Gitweb: http://git.kernel.org/tip/8a91dc4e92327b61fbe5941d25e74660e2a44579 Author: Fenghua Yu AuthorDate: Thu, 17 Dec 2015 14:46:06 -0800 Committer: H. Peter Anvin CommitDate: Fri, 18 Dec 2015 13:17:55 -0800 x86/intel_cqm: Modify hot cpu notification handling From: Vikas Shivappa - In cqm_pick_event_reader, use the existing package<->core map instead of looping through all cpus in cqm_cpumask. - In intel_cqm_cpu_exit, use the same map instead of looping through all online cpus. In large systems with large number of cpus the time taken to loop may be expensive and also the time increases linearly. Signed-off-by: Vikas Shivappa Link: http://lkml.kernel.org/r/1450392376-6397-2-git-send-email-fenghua...@intel.com Signed-off-by: Fenghua Yu --- arch/x86/kernel/cpu/perf_event_intel_cqm.c | 34 +++--- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event_intel_cqm.c b/arch/x86/kernel/cpu/perf_event_intel_cqm.c index a316ca9..dd82bc7 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c +++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c @@ -62,6 +62,12 @@ static LIST_HEAD(cache_groups); */ static cpumask_t cqm_cpumask; +/* + * Temporary cpumask used during hot cpu notificaiton handling. The usage + * is serialized by hot cpu locks. + */ +static cpumask_t tmp_cpumask; + #define RMID_VAL_ERROR (1ULL << 63) #define RMID_VAL_UNAVAIL (1ULL << 62) @@ -1244,15 +1250,13 @@ static struct pmu intel_cqm_pmu = { static inline void cqm_pick_event_reader(int cpu) { - int phys_id = topology_physical_package_id(cpu); - int i; + cpumask_and(&tmp_cpumask, &cqm_cpumask, topology_core_cpumask(cpu)); - for_each_cpu(i, &cqm_cpumask) { - if (phys_id == topology_physical_package_id(i)) - return; /* already got reader for this socket */ - } - - cpumask_set_cpu(cpu, &cqm_cpumask); + /* +* Pick a reader if there isn't one already. 
+*/ + if (cpumask_empty(&tmp_cpumask)) + cpumask_set_cpu(cpu, &cqm_cpumask); } static void intel_cqm_cpu_starting(unsigned int cpu) @@ -1270,7 +1274,6 @@ static void intel_cqm_cpu_starting(unsigned int cpu) static void intel_cqm_cpu_exit(unsigned int cpu) { - int phys_id = topology_physical_package_id(cpu); int i; /* @@ -1279,15 +1282,12 @@ static void intel_cqm_cpu_exit(unsigned int cpu) if (!cpumask_test_and_clear_cpu(cpu, &cqm_cpumask)) return; - for_each_online_cpu(i) { - if (i == cpu) - continue; + cpumask_and(&tmp_cpumask, topology_core_cpumask(cpu), cpu_online_mask); + cpumask_clear_cpu(cpu, &tmp_cpumask); + i = cpumask_any(&tmp_cpumask); - if (phys_id == topology_physical_package_id(i)) { - cpumask_set_cpu(i, &cqm_cpumask); - break; - } - } + if (i < nr_cpu_ids) + cpumask_set_cpu(i, &cqm_cpumask); } static int intel_cqm_cpu_notifier(struct notifier_block *nb, -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
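The idea in the patch above, intersecting the reader mask with the package's CPU mask instead of looping over CPUs, can be sketched in userspace C roughly as follows. This is only an illustration under stated assumptions: mask64_t is a hypothetical stand-in for cpumask_t limited to 64 CPUs, the function names are invented here, and the replacement reader is the lowest-numbered candidate, whereas the kernel's cpumask_any() makes no such promise.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask on a system with at most 64 CPUs. */
typedef uint64_t mask64_t;

/* Make cpu the reader for its package unless the package already has one. */
static void pick_event_reader(mask64_t *readers, mask64_t package_cpus,
			      unsigned int cpu)
{
	assert(cpu < 64);

	if ((*readers & package_cpus) == 0)
		*readers |= UINT64_C(1) << cpu;
}

/*
 * On CPU exit: if cpu was a reader, hand the role to another online CPU
 * of the same package, if there is one.
 */
static void reader_cpu_exit(mask64_t *readers, mask64_t package_cpus,
			    mask64_t online, unsigned int cpu)
{
	mask64_t bit, candidates;

	assert(cpu < 64);
	bit = UINT64_C(1) << cpu;

	if (!(*readers & bit))
		return;
	*readers &= ~bit;

	candidates = package_cpus & online & ~bit;
	if (candidates)
		*readers |= candidates & (~candidates + 1); /* lowest set bit */
}
```

Both operations cost a fixed number of word operations per mask word rather than one topology lookup per CPU, which is the saving the commit message refers to.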
[tip:x86/mm] x86/mm: Reduce PAE-mode per task pgd allocation overhead from 4K to 32 bytes
Commit-ID: 1db491f77b6ed0f32f1d4a3ac40a5be9524f1914 Gitweb: http://git.kernel.org/tip/1db491f77b6ed0f32f1d4a3ac40a5be9524f1914 Author: Fenghua Yu AuthorDate: Thu, 15 Jan 2015 20:30:01 -0800 Committer: Ingo Molnar CommitDate: Thu, 19 Feb 2015 01:28:38 +0100 x86/mm: Reduce PAE-mode per task pgd allocation overhead from 4K to 32 bytes With more embedded systems emerging using Quark, among other things, 32-bit kernel matters again. 32-bit machine and kernel uses PAE paging, which currently wastes at least 4K of memory per process on Linux where we have to reserve an entire page to support a single 32-byte PGD structure. It would be a very good thing if we could eliminate that wastage. PAE paging is used to access more than 4GB memory on x86-32. And it is required for NX. In this patch, we still allocate one page for pgd for a Xen domain and 64-bit kernel because one page pgd is assumed in these cases. But we can save memory space by only allocating 32-byte pgd for 32-bit PAE kernel when it is not running as a Xen domain. Signed-off-by: Fenghua Yu Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Christoph Lameter Cc: Dave Hansen Cc: Glenn Williamson Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1421382601-46912-1-git-send-email-fenghua...@intel.com Signed-off-by: Ingo Molnar --- arch/x86/mm/pgtable.c | 81 +-- 1 file changed, 78 insertions(+), 3 deletions(-) diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 6fb6927..d223e1f 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -271,12 +271,87 @@ static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[]) } } +/* + * Xen paravirt assumes pgd table should be in one page. 64 bit kernel also + * assumes that pgd should be in one page. + * + * But kernel with PAE paging that is not running as a Xen domain + * only needs to allocate 32 bytes for pgd instead of one page. 
+ */ +#ifdef CONFIG_X86_PAE + +#include + +#define PGD_SIZE (PTRS_PER_PGD * sizeof(pgd_t)) +#define PGD_ALIGN 32 + +static struct kmem_cache *pgd_cache; + +static int __init pgd_cache_init(void) +{ + /* +* When PAE kernel is running as a Xen domain, it does not use +* shared kernel pmd. And this requires a whole page for pgd. +*/ + if (!SHARED_KERNEL_PMD) + return 0; + + /* +* when PAE kernel is not running as a Xen domain, it uses +* shared kernel pmd. Shared kernel pmd does not require a whole +* page for pgd. We are able to just allocate a 32-byte for pgd. +* During boot time, we create a 32-byte slab for pgd table allocation. +*/ + pgd_cache = kmem_cache_create("pgd_cache", PGD_SIZE, PGD_ALIGN, + SLAB_PANIC, NULL); + if (!pgd_cache) + return -ENOMEM; + + return 0; +} +core_initcall(pgd_cache_init); + +static inline pgd_t *_pgd_alloc(void) +{ + /* +* If no SHARED_KERNEL_PMD, PAE kernel is running as a Xen domain. +* We allocate one page for pgd. +*/ + if (!SHARED_KERNEL_PMD) + return (pgd_t *)__get_free_page(PGALLOC_GFP); + + /* +* Now PAE kernel is not running as a Xen domain. We can allocate +* a 32-byte slab for pgd to save memory space. 
+*/ + return kmem_cache_alloc(pgd_cache, PGALLOC_GFP); +} + +static inline void _pgd_free(pgd_t *pgd) +{ + if (!SHARED_KERNEL_PMD) + free_page((unsigned long)pgd); + else + kmem_cache_free(pgd_cache, pgd); +} +#else +static inline pgd_t *_pgd_alloc(void) +{ + return (pgd_t *)__get_free_page(PGALLOC_GFP); +} + +static inline void _pgd_free(pgd_t *pgd) +{ + free_page((unsigned long)pgd); +} +#endif /* CONFIG_X86_PAE */ + pgd_t *pgd_alloc(struct mm_struct *mm) { pgd_t *pgd; pmd_t *pmds[PREALLOCATED_PMDS]; - pgd = (pgd_t *)__get_free_page(PGALLOC_GFP); + pgd = _pgd_alloc(); if (pgd == NULL) goto out; @@ -306,7 +381,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm) out_free_pmds: free_pmds(pmds); out_free_pgd: - free_page((unsigned long)pgd); + _pgd_free(pgd); out: return NULL; } @@ -316,7 +391,7 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd) pgd_mop_up_pmds(mm, pgd); pgd_dtor(pgd); paravirt_pgd_free(mm, pgd); - free_page((unsigned long)pgd); + _pgd_free(pgd); } /*
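The "4K to 32 bytes" figure in the subject follows from the PAE layout: four top-level entries of eight bytes each. A small userspace sketch of that arithmetic is given below. The SKETCH_ names and the sketch_pgd_t type are hypothetical stand-ins for the kernel's PTRS_PER_PGD, PAGE_SIZE and pgd_t, and the saving shown ignores any slab allocator overhead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for a 32-bit PAE kernel: 4 top-level entries, 4K pages. */
#define SKETCH_PTRS_PER_PGD 4u
#define SKETCH_PAGE_SIZE    4096u

/* Hypothetical stand-in for pgd_t under PAE: one 64-bit entry. */
typedef struct { uint64_t pgd; } sketch_pgd_t;

/* Size in bytes of one PAE pgd table. */
static size_t sketch_pgd_size(void)
{
	return SKETCH_PTRS_PER_PGD * sizeof(sketch_pgd_t);
}

/* Bytes no longer reserved per pgd when it does not occupy a whole page. */
static size_t sketch_bytes_saved_per_pgd(void)
{
	return SKETCH_PAGE_SIZE - sketch_pgd_size();
}
```

Under these assumptions the table is 32 bytes, so up to 4064 bytes per process are no longer tied up in the pgd page.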
[tip:x86/xsave] x86/xsaves: Clean up code in xstate offsets computation in xsave area
Commit-ID: 8ff925e10f2c72680918b95173ef4f8bb982d59e Gitweb: http://git.kernel.org/tip/8ff925e10f2c72680918b95173ef4f8bb982d59e Author: Fenghua Yu AuthorDate: Fri, 30 May 2014 14:59:24 -0700 Committer: H. Peter Anvin CommitDate: Fri, 30 May 2014 17:12:41 -0700 x86/xsaves: Clean up code in xstate offsets computation in xsave area This patch cleans up some code in xstate offsets computation in xsave area: 1. It changes xstate_comp_offsets as an array. This avoids possible NULL pointer caused by possible kmalloc() failure during boot time. 2. It changes the global variable xstate_comp_sizes to a local variable because it is used only in setup_xstate_comp(). 3. It adds missing offsets for FP and SSE in xsave area. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1401387164-43416-17-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/kernel/xsave.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c index a6cb823..940b142 100644 --- a/arch/x86/kernel/xsave.c +++ b/arch/x86/kernel/xsave.c @@ -26,7 +26,7 @@ struct xsave_struct *init_xstate_buf; static struct _fpx_sw_bytes fx_sw_reserved, fx_sw_reserved_ia32; static unsigned int *xstate_offsets, *xstate_sizes; -static unsigned int *xstate_comp_offsets, *xstate_comp_sizes; +static unsigned int xstate_comp_offsets[sizeof(pcntxt_mask)*8]; static unsigned int xstate_features; /* @@ -491,11 +491,16 @@ static void __init setup_xstate_features(void) */ void setup_xstate_comp(void) { + unsigned int xstate_comp_sizes[sizeof(pcntxt_mask)*8]; int i; - xstate_comp_offsets = kmalloc(xstate_features * sizeof(int), - GFP_KERNEL); - xstate_comp_sizes = kmalloc(xstate_features * sizeof(int), GFP_KERNEL); + /* +* The FP xstates and SSE xstates are legacy states. They are always +* in the fixed offsets in the xsave area in either compacted form +* or standard form. 
+*/ + xstate_comp_offsets[0] = 0; + xstate_comp_offsets[1] = offsetof(struct i387_fxsave_struct, xmm_space); if (!cpu_has_xsaves) { for (i = 2; i < xstate_features; i++) {
[tip:x86/xsave] x86/cpufeature.h: Reformat x86 feature macros
Commit-ID: 446fd806f5408b623fa51f3aa084e56844563779 Gitweb: http://git.kernel.org/tip/446fd806f5408b623fa51f3aa084e56844563779 Author: Fenghua Yu AuthorDate: Thu, 29 May 2014 11:12:29 -0700 Committer: H. Peter Anvin CommitDate: Thu, 29 May 2014 12:37:10 -0700 x86/cpufeature.h: Reformat x86 feature macros In each X86 feature macro definition, add one space in front of the word number which is a one-digit number currently. The purpose of reformatting the macros is to align one-digit and two-digit word numbers. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1401387164-43416-2-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/cpufeature.h | 362 +++--- 1 file changed, 181 insertions(+), 181 deletions(-) diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index e265ff9..2837b92 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -18,213 +18,213 @@ */ /* Intel-defined CPU features, CPUID level 0x0001 (edx), word 0 */ -#define X86_FEATURE_FPU(0*32+ 0) /* Onboard FPU */ -#define X86_FEATURE_VME(0*32+ 1) /* Virtual Mode Extensions */ -#define X86_FEATURE_DE (0*32+ 2) /* Debugging Extensions */ -#define X86_FEATURE_PSE(0*32+ 3) /* Page Size Extensions */ -#define X86_FEATURE_TSC(0*32+ 4) /* Time Stamp Counter */ -#define X86_FEATURE_MSR(0*32+ 5) /* Model-Specific Registers */ -#define X86_FEATURE_PAE(0*32+ 6) /* Physical Address Extensions */ -#define X86_FEATURE_MCE(0*32+ 7) /* Machine Check Exception */ -#define X86_FEATURE_CX8(0*32+ 8) /* CMPXCHG8 instruction */ -#define X86_FEATURE_APIC (0*32+ 9) /* Onboard APIC */ -#define X86_FEATURE_SEP(0*32+11) /* SYSENTER/SYSEXIT */ -#define X86_FEATURE_MTRR (0*32+12) /* Memory Type Range Registers */ -#define X86_FEATURE_PGE(0*32+13) /* Page Global Enable */ -#define X86_FEATURE_MCA(0*32+14) /* Machine Check Architecture */ -#define X86_FEATURE_CMOV (0*32+15) /* CMOV instructions */ +#define X86_FEATURE_FPU( 0*32+ 
0) /* Onboard FPU */ +#define X86_FEATURE_VME( 0*32+ 1) /* Virtual Mode Extensions */ +#define X86_FEATURE_DE ( 0*32+ 2) /* Debugging Extensions */ +#define X86_FEATURE_PSE( 0*32+ 3) /* Page Size Extensions */ +#define X86_FEATURE_TSC( 0*32+ 4) /* Time Stamp Counter */ +#define X86_FEATURE_MSR( 0*32+ 5) /* Model-Specific Registers */ +#define X86_FEATURE_PAE( 0*32+ 6) /* Physical Address Extensions */ +#define X86_FEATURE_MCE( 0*32+ 7) /* Machine Check Exception */ +#define X86_FEATURE_CX8( 0*32+ 8) /* CMPXCHG8 instruction */ +#define X86_FEATURE_APIC ( 0*32+ 9) /* Onboard APIC */ +#define X86_FEATURE_SEP( 0*32+11) /* SYSENTER/SYSEXIT */ +#define X86_FEATURE_MTRR ( 0*32+12) /* Memory Type Range Registers */ +#define X86_FEATURE_PGE( 0*32+13) /* Page Global Enable */ +#define X86_FEATURE_MCA( 0*32+14) /* Machine Check Architecture */ +#define X86_FEATURE_CMOV ( 0*32+15) /* CMOV instructions */ /* (plus FCMOVcc, FCOMI with FPU) */ -#define X86_FEATURE_PAT(0*32+16) /* Page Attribute Table */ -#define X86_FEATURE_PSE36 (0*32+17) /* 36-bit PSEs */ -#define X86_FEATURE_PN (0*32+18) /* Processor serial number */ -#define X86_FEATURE_CLFLUSH(0*32+19) /* CLFLUSH instruction */ -#define X86_FEATURE_DS (0*32+21) /* "dts" Debug Store */ -#define X86_FEATURE_ACPI (0*32+22) /* ACPI via MSR */ -#define X86_FEATURE_MMX(0*32+23) /* Multimedia Extensions */ -#define X86_FEATURE_FXSR (0*32+24) /* FXSAVE/FXRSTOR, CR4.OSFXSR */ -#define X86_FEATURE_XMM(0*32+25) /* "sse" */ -#define X86_FEATURE_XMM2 (0*32+26) /* "sse2" */ -#define X86_FEATURE_SELFSNOOP (0*32+27) /* "ss" CPU self snoop */ -#define X86_FEATURE_HT (0*32+28) /* Hyper-Threading */ -#define X86_FEATURE_ACC(0*32+29) /* "tm" Automatic clock control */ -#define X86_FEATURE_IA64 (0*32+30) /* IA-64 processor */ -#define X86_FEATURE_PBE(0*32+31) /* Pending Break Enable */ +#define X86_FEATURE_PAT( 0*32+16) /* Page Attribute Table */ +#define X86_FEATURE_PSE36 ( 0*32+17) /* 36-bit PSEs */ +#define X86_FEATURE_PN ( 0*32+18) /* 
Processor serial number */ +#define X86_FEATURE_CLFLUSH( 0*32+19) /* CLFLUSH instruction */ +#define X86_FEATURE_DS ( 0*32+21) /* "dts" Debug Store */ +#define X86_FEATURE_ACPI ( 0*32+22) /* ACPI via MSR */ +#define X86_FEATURE_MMX( 0*32+23) /* Multimedia Extensions */ +#define X86_FEATURE
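Each feature macro above encodes a capability word and a bit within that word as word*32 + bit. The following userspace sketch shows how such a value could be decoded and tested against an array of 32-bit words. The SKETCH_FEATURE macro and the sketch_ helpers are hypothetical names for illustration and are not the kernel's cpu_has() machinery.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding that mirrors the (word*32 + bit) form above. */
#define SKETCH_FEATURE(word, bit) ((word) * 32u + (bit))

/* Index of the 32-bit capability word that holds this feature. */
static unsigned int sketch_feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Bit position of this feature inside its capability word. */
static unsigned int sketch_feature_bit(unsigned int feature)
{
	return feature % 32;
}

/* Returns 1 if the feature bit is set in caps[], 0 otherwise. */
static int sketch_has_feature(const uint32_t *caps, unsigned int feature)
{
	return (caps[sketch_feature_word(feature)] >>
		sketch_feature_bit(feature)) & 1u;
}
```

With this encoding, SKETCH_FEATURE(0, 6) is word 0, bit 6, which is the position of X86_FEATURE_PAE in the listing above.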
[tip:x86/xsave] Define kernel API to get address of each state in xsave area
Commit-ID: 7496d6458fe3219d63848ce4a9afbd86245cab22 Gitweb: http://git.kernel.org/tip/7496d6458fe3219d63848ce4a9afbd86245cab22 Author: Fenghua Yu AuthorDate: Thu, 29 May 2014 11:12:44 -0700 Committer: H. Peter Anvin CommitDate: Thu, 29 May 2014 14:33:09 -0700 Define kernel API to get address of each state in xsave area In standard form, each state is saved in the xsave area in fixed offset. But in compacted form, offset of each saved state only can be calculated during run time because some xstates may not be enabled and saved. We define kernel API get_xsave_addr() returns address of a given state saved in a xsave area. It can be called in kernel to get address of each xstate in xsave area in either standard format or compacted format. It's useful when kernel wants to directly access each state in xsave area. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1401387164-43416-17-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/xsave.h | 3 +++ arch/x86/kernel/process.c| 1 + arch/x86/kernel/xsave.c | 64 3 files changed, 68 insertions(+) diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h index aa3ff0c..1ba577c 100644 --- a/arch/x86/include/asm/xsave.h +++ b/arch/x86/include/asm/xsave.h @@ -255,4 +255,7 @@ static inline int xrestore_user(struct xsave_struct __user *buf, u64 mask) return err; } +void *get_xsave_addr(struct xsave_struct *xsave, int xstate); +void setup_xstate_comp(void); + #endif diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 4505e2a..f804dc9 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -93,6 +93,7 @@ void arch_task_cache_init(void) kmem_cache_create("task_xstate", xstate_size, __alignof__(union thread_xstate), SLAB_PANIC | SLAB_NOTRACK, NULL); + setup_xstate_comp(); } /* diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c index f930f8a..a6cb823 100644 --- a/arch/x86/kernel/xsave.c +++ b/arch/x86/kernel/xsave.c @@ -482,6 
+482,47 @@ static void __init setup_xstate_features(void) } /* + * This function sets up offsets and sizes of all extended states in + * xsave area. This supports both standard format and compacted format + * of the xsave aread. + * + * Input: void + * Output: void + */ +void setup_xstate_comp(void) +{ + int i; + + xstate_comp_offsets = kmalloc(xstate_features * sizeof(int), + GFP_KERNEL); + xstate_comp_sizes = kmalloc(xstate_features * sizeof(int), GFP_KERNEL); + + if (!cpu_has_xsaves) { + for (i = 2; i < xstate_features; i++) { + if (test_bit(i, (unsigned long *)&pcntxt_mask)) { + xstate_comp_offsets[i] = xstate_offsets[i]; + xstate_comp_sizes[i] = xstate_sizes[i]; + } + } + return; + } + + xstate_comp_offsets[2] = FXSAVE_SIZE + XSAVE_HDR_SIZE; + + for (i = 2; i < xstate_features; i++) { + if (test_bit(i, (unsigned long *)&pcntxt_mask)) + xstate_comp_sizes[i] = xstate_sizes[i]; + else + xstate_comp_sizes[i] = 0; + + if (i > 2) + xstate_comp_offsets[i] = xstate_comp_offsets[i-1] + + xstate_comp_sizes[i-1]; + + } +} + +/* * setup the xstate image representing the init state */ static void __init setup_init_fpu_buf(void) @@ -668,3 +709,26 @@ void eager_fpu_init(void) else fxrstor_checking(&init_xstate_buf->i387); } + +/* + * Given the xsave area and a state inside, this function returns the + * address of the state. + * + * This is the API that is called to get xstate address in either + * standard format or compacted format of xsave area. + * + * Inputs: + * xsave: base address of the xsave area; + * xstate: state which is defined in xsave.h (e.g. XSTATE_FP, XSTATE_SSE, + * etc.) + * Output: + * address of the state in the xsave area. 
+ */ +void *get_xsave_addr(struct xsave_struct *xsave, int xstate) +{ + int feature = fls64(xstate) - 1; + if (!test_bit(feature, (unsigned long *)&pcntxt_mask)) + return NULL; + + return (void *)xsave + xstate_comp_offsets[feature]; +}
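The compacted-format layout used by setup_xstate_comp() above can be sketched in userspace C as follows: enabled components are packed back to back after the legacy region and the header, and disabled components take no space. This is a simplified illustration, not the kernel code. The function name and SKETCH_ constants are hypothetical, the sizes in the usage note are made up, and any per-component alignment requirement is ignored, as it is in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FXSAVE_SIZE    512u /* legacy FP/SSE region */
#define SKETCH_XSAVE_HDR_SIZE 64u  /* xsave header */

/*
 * Fill offsets[2..nr-1] for the compacted format. Features 0 and 1 live
 * in the legacy region and are not touched here. Returns the total size
 * of the resulting area in bytes.
 */
static unsigned int compacted_offsets(uint64_t mask,
				      const unsigned int *sizes,
				      unsigned int *offsets, unsigned int nr)
{
	unsigned int next = SKETCH_FXSAVE_SIZE + SKETCH_XSAVE_HDR_SIZE;
	unsigned int i;

	for (i = 2; i < nr && i < 64; i++) {
		offsets[i] = next;
		/* Only enabled components consume space. */
		if (mask & (UINT64_C(1) << i))
			next += sizes[i];
	}

	return next;
}
```

With a mask of 0x17 (features 0, 1, 2 and 4) and made-up sizes of 256 bytes for feature 2 and 64 bytes for feature 4, feature 2 lands at offset 576 and feature 4 directly after it at 832, because feature 3 is disabled.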
[tip:x86/xsave] x86/xsaves: Enable xsaves/xrstors
Commit-ID: 7e7ce87f6ad4e1730364e5e76628b43c5759b700 Gitweb: http://git.kernel.org/tip/7e7ce87f6ad4e1730364e5e76628b43c5759b700 Author: Fenghua Yu AuthorDate: Thu, 29 May 2014 11:12:43 -0700 Committer: H. Peter Anvin CommitDate: Thu, 29 May 2014 14:33:07 -0700 x86/xsaves: Enable xsaves/xrstors If xsaves/xrstors is enabled, compacted format of xsave area will be used and less memory may be used for context per process. And modified optimization implemented in xsaves/xrstors improves performance of saving xstate. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1401387164-43416-16-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/kernel/xsave.c | 39 +-- 1 file changed, 33 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c index 8fa7c7d..f930f8a 100644 --- a/arch/x86/kernel/xsave.c +++ b/arch/x86/kernel/xsave.c @@ -8,6 +8,7 @@ #include #include +#include #include #include #include @@ -24,7 +25,9 @@ u64 pcntxt_mask; struct xsave_struct *init_xstate_buf; static struct _fpx_sw_bytes fx_sw_reserved, fx_sw_reserved_ia32; -static unsigned int *xstate_offsets, *xstate_sizes, xstate_features; +static unsigned int *xstate_offsets, *xstate_sizes; +static unsigned int *xstate_comp_offsets, *xstate_comp_sizes; +static unsigned int xstate_features; /* * If a processor implementation discern that a processor state component is @@ -283,7 +286,7 @@ sanitize_restored_xstate(struct task_struct *tsk, if (use_xsave()) { /* These bits must be zero. */ - xsave_hdr->reserved1[0] = xsave_hdr->reserved1[1] = 0; + memset(xsave_hdr->reserved, 0, 48); /* * Init the state that is not present in the memory @@ -526,6 +529,30 @@ static int __init eager_fpu_setup(char *s) } __setup("eagerfpu=", eager_fpu_setup); + +/* + * Calculate total size of enabled xstates in XCR0/pcntxt_mask. 
+ */ +static void __init init_xstate_size(void) +{ + unsigned int eax, ebx, ecx, edx; + int i; + + if (!cpu_has_xsaves) { + cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx); + xstate_size = ebx; + return; + } + + xstate_size = FXSAVE_SIZE + XSAVE_HDR_SIZE; + for (i = 2; i < 64; i++) { + if (test_bit(i, (unsigned long *)&pcntxt_mask)) { + cpuid_count(XSTATE_CPUID, i, &eax, &ebx, &ecx, &edx); + xstate_size += eax; + } + } +} + /* * Enable and initialize the xsave feature. */ @@ -557,8 +584,7 @@ static void __init xstate_enable_boot_cpu(void) /* * Recompute the context size for enabled features */ - cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx); - xstate_size = ebx; + init_xstate_size(); update_regset_xstate_info(xstate_size, pcntxt_mask); prepare_fx_sw_frame(); @@ -578,8 +604,9 @@ static void __init xstate_enable_boot_cpu(void) } } - pr_info("enabled xstate_bv 0x%llx, cntxt size 0x%x\n", - pcntxt_mask, xstate_size); + pr_info("enabled xstate_bv 0x%llx, cntxt size 0x%x using %s\n", + pcntxt_mask, xstate_size, + cpu_has_xsaves ? "compacted form" : "standard form"); } /*
[tip:x86/xsave] x86/xsaves: Clear reserved bits in xsave header
Commit-ID: 21e726c4a3625a1038e97795b7aad97109ba7e19 Gitweb: http://git.kernel.org/tip/21e726c4a3625a1038e97795b7aad97109ba7e19 Author: Fenghua Yu AuthorDate: Thu, 29 May 2014 11:12:39 -0700 Committer: H. Peter Anvin CommitDate: Thu, 29 May 2014 14:33:00 -0700 x86/xsaves: Clear reserved bits in xsave header The reserved bits (128~511) in the xsave header must be zero according to X86 SDM. Clear the bits in this patch. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1401387164-43416-12-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/kernel/i387.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c index d5dd808..a9a4229 100644 --- a/arch/x86/kernel/i387.c +++ b/arch/x86/kernel/i387.c @@ -375,7 +375,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset, /* * These bits must be zero. */ - xsave_hdr->reserved1[0] = xsave_hdr->reserved1[1] = 0; + memset(xsave_hdr->reserved, 0, 48); return ret; }
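The constant 48 in the memset() above corresponds to bits 128~511 of the 64-byte header: (512 - 128) / 8 = 48 bytes. The userspace sketch below makes that relationship explicit. The struct layout and names are hypothetical and only mirror the header as the commit message describes it; using sizeof instead of a literal is a design choice of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout mirroring a 64-byte (512-bit) xsave header. */
struct sketch_xsave_hdr {
	uint64_t xstate_bv;   /* bits 0-63 */
	uint64_t xcomp_bv;    /* bits 64-127 */
	uint64_t reserved[6]; /* bits 128-511: 48 bytes, must be zero */
};

/* Zero the reserved area without hard-coding its length. */
static void sketch_clear_reserved(struct sketch_xsave_hdr *hdr)
{
	memset(hdr->reserved, 0, sizeof(hdr->reserved));
}
```

Tying the length to the field with sizeof keeps the clear correct if the reserved area is ever resized in this hypothetical struct.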
[tip:x86/xsave] x86/xsaves: Call booting time xsaves and xrstors in setup_init_fpu_buf
Commit-ID: 47c2f292cc8669f70644a949cadd5fa5ee0e0e07 Gitweb: http://git.kernel.org/tip/47c2f292cc8669f70644a949cadd5fa5ee0e0e07 Author: Fenghua Yu AuthorDate: Thu, 29 May 2014 11:12:42 -0700 Committer: H. Peter Anvin CommitDate: Thu, 29 May 2014 14:33:06 -0700 x86/xsaves: Call booting time xsaves and xrstors in setup_init_fpu_buf setup_init_fpu_buf() calls booting time xsaves and xrstors to save and restore xstate in xsave area. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1401387164-43416-15-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/kernel/xsave.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c index a4b451c..8fa7c7d 100644 --- a/arch/x86/kernel/xsave.c +++ b/arch/x86/kernel/xsave.c @@ -496,15 +496,21 @@ static void __init setup_init_fpu_buf(void) setup_xstate_features(); + if (cpu_has_xsaves) { + init_xstate_buf->xsave_hdr.xcomp_bv = + (u64)1 << 63 | pcntxt_mask; + init_xstate_buf->xsave_hdr.xstate_bv = pcntxt_mask; + } + /* * Init all the features state with header_bv being 0x0 */ - xrstor_state(init_xstate_buf, -1); + xrstor_state_booting(init_xstate_buf, -1); /* * Dump the init state again. This is to identify the init state * of any feature which is not represented by all zero's. */ - xsave_state(init_xstate_buf, -1); + xsave_state_booting(init_xstate_buf, -1); } static enum { AUTO, ENABLE, DISABLE } eagerfpu = AUTO;
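The xcomp_bv value set in the hunk above combines bit 63, which marks the area as using the compacted format, with the mask of enabled features. In C the shift binds tighter than the bitwise OR, so (u64)1 << 63 | pcntxt_mask parses as ((u64)1 << 63) | pcntxt_mask. A minimal userspace sketch, with hypothetical names, is:

```c
#include <assert.h>
#include <stdint.h>

/* Bit 63 of xcomp_bv marks the xsave area as using the compacted format. */
#define SKETCH_XCOMP_BV_COMPACTED (UINT64_C(1) << 63)

/* Build an xcomp_bv value for the given mask of enabled features. */
static uint64_t sketch_make_xcomp_bv(uint64_t feature_mask)
{
	return SKETCH_XCOMP_BV_COMPACTED | feature_mask;
}
```

For a feature mask of 0x7 this would yield 0x8000000000000007.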
[tip:x86/xsave] x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting time
Commit-ID: f41d830fa890044cb60f6bb39fc8f6493ffebb47 Gitweb: http://git.kernel.org/tip/f41d830fa890044cb60f6bb39fc8f6493ffebb47 Author: Fenghua Yu AuthorDate: Thu, 29 May 2014 11:12:41 -0700 Committer: H. Peter Anvin CommitDate: Thu, 29 May 2014 14:33:04 -0700 x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting time __save_fpu() can be called during early booting time when cpu caps are not enabled and alternative can not be used yet. Therefore, it calls xsave_state_booting() during booting time to save xstate to task's xsave area. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1401387164-43416-14-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/fpu-internal.h | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h index cea1c76..6099c0e 100644 --- a/arch/x86/include/asm/fpu-internal.h +++ b/arch/x86/include/asm/fpu-internal.h @@ -508,9 +508,12 @@ static inline void user_fpu_begin(void) static inline void __save_fpu(struct task_struct *tsk) { - if (use_xsave()) - xsave_state(&tsk->thread.fpu.state->xsave, -1); - else + if (use_xsave()) { + if (unlikely(system_state == SYSTEM_BOOTING)) + xsave_state_booting(&tsk->thread.fpu.state->xsave, -1); + else + xsave_state(&tsk->thread.fpu.state->xsave, -1); + } else fpu_fxsave(&tsk->thread.fpu); }
[tip:x86/xsave] x86/xsaves: Add xsaves and xrstors support for booting time
Commit-ID: adb9d526e98268b647a74726346e1c40e6a37d2e Gitweb: http://git.kernel.org/tip/adb9d526e98268b647a74726346e1c40e6a37d2e Author: Fenghua Yu AuthorDate: Thu, 29 May 2014 11:12:40 -0700 Committer: H. Peter Anvin CommitDate: Thu, 29 May 2014 14:33:02 -0700 x86/xsaves: Add xsaves and xrstors support for booting time Since boot_cpu_data and cpu capabilities are not enabled yet during early booting time, alternative can not be used in some functions to access xsave area. Therefore, we define two new functions xrstor_state_booting() and xsave_state_booting() to access xsave area just during early booting time. xrstor_state_booting restores xstate from xsave area during early booting time. xsave_state_booting saves xstate to xsave area during early booting time. The two functions are similar to xrstor_state and xsave_state respectively. But the two functions don't use alternatives because alternatives are not enabled when they are called in such early booting time. xrstor_state_booting is called only by functions defined as __init. So it's defined as __init and will be removed from memory after booting time. There is no extra memory cost caused by this function during running time. But because xsave_state_booting can be called by run-time function __save_fpu(), it's not defined as __init and will stay in memory during running time although it will not be called anymore during running time. It is not ideal to have this function stay in memory during running time. But it's a pretty small function and the memory cost will be small. By doing in this way, we can avoid to change a lot of code to just remove this small function and save a bit memory for running time. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1401387164-43416-13-git-send-email-fenghua...@intel.com Signed-off-by: H. 
Peter Anvin --- arch/x86/include/asm/xsave.h | 60 1 file changed, 60 insertions(+) diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h index 0d15231..aa3ff0c 100644 --- a/arch/x86/include/asm/xsave.h +++ b/arch/x86/include/asm/xsave.h @@ -66,6 +66,66 @@ extern int init_fpu(struct task_struct *child); : [err] "=r" (err) /* + * This function is called only during boot time when x86 caps are not set + * up and alternative can not be used yet. + */ +static int xsave_state_booting(struct xsave_struct *fx, u64 mask) +{ + u32 lmask = mask; + u32 hmask = mask >> 32; + int err = 0; + + WARN_ON(system_state != SYSTEM_BOOTING); + + if (boot_cpu_has(X86_FEATURE_XSAVES)) + asm volatile("1:"XSAVES"\n\t" + "2:\n\t" + : : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask) + : "memory"); + else + asm volatile("1:"XSAVE"\n\t" + "2:\n\t" + : : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask) + : "memory"); + + asm volatile(xstate_fault +: "0" (0) +: "memory"); + + return err; +} + +/* + * This function is called only during boot time when x86 caps are not set + * up and alternative can not be used yet. + */ +static inline int xrstor_state_booting(struct xsave_struct *fx, u64 mask) +{ + u32 lmask = mask; + u32 hmask = mask >> 32; + int err = 0; + + WARN_ON(system_state != SYSTEM_BOOTING); + + if (boot_cpu_has(X86_FEATURE_XSAVES)) + asm volatile("1:"XRSTORS"\n\t" + "2:\n\t" + : : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask) + : "memory"); + else + asm volatile("1:"XRSTOR"\n\t" + "2:\n\t" + : : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask) + : "memory"); + + asm volatile(xstate_fault +: "0" (0) +: "memory"); + + return err; +} + +/* * Save processor xstate to xsave area. 
*/ static inline int xsave_state(struct xsave_struct *fx, u64 mask) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/xsave] x86/xsaves: Use xsave/xrstor for saving and restoring user space context
Commit-ID: facbf4d91ae64f84ef93a00e4037135cd9f4b2ab Gitweb: http://git.kernel.org/tip/facbf4d91ae64f84ef93a00e4037135cd9f4b2ab Author: Fenghua Yu AuthorDate: Thu, 29 May 2014 11:12:38 -0700 Committer: H. Peter Anvin CommitDate: Thu, 29 May 2014 14:32:57 -0700 x86/xsaves: Use xsave/xrstor for saving and restoring user space context We use legacy xsave/xrstor to save and restore standard form of xsave area in user space context. No xsaveopt or xsaves is used here for two reasons. First, we don't want to use modified optimization which is implemented in xsaveopt and xsaves because xrstor/xrstors might track a wrong user space application. Secondly, we don't use compacted format of xsave area for backward compatibility because legacy user space applications only don't understand the compacted format of the xsave area. Using standard form of the xsave area may allocate more memory for user context than compacted form, but preserves compatibility with legacy applications. Furthermore, even with holes, the relevant cache lines don't get touched and thus the performance impact is limited. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1401387164-43416-11-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/xsave.h | 33 ++--- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h index 8b75824..0d15231 100644 --- a/arch/x86/include/asm/xsave.h +++ b/arch/x86/include/asm/xsave.h @@ -145,6 +145,16 @@ static inline int fpu_xrstor_checking(struct xsave_struct *fx) return xrstor_state(fx, -1); } +/* + * Save xstate to user space xsave area. + * + * We don't use modified optimization because xrstor/xrstors might track + * a different application. + * + * We don't use compacted format xsave area for + * backward compatibility for old applications which don't understand + * compacted format of xsave area. 
+ */ static inline int xsave_user(struct xsave_struct __user *buf) { int err; @@ -158,35 +168,28 @@ static inline int xsave_user(struct xsave_struct __user *buf) return -EFAULT; __asm__ __volatile__(ASM_STAC "\n" -"1: .byte " REX_PREFIX "0x0f,0xae,0x27\n" +"1:"XSAVE"\n" "2: " ASM_CLAC "\n" -".section .fixup,\"ax\"\n" -"3: movl $-1,%[err]\n" -"jmp 2b\n" -".previous\n" -_ASM_EXTABLE(1b,3b) -: [err] "=r" (err) +xstate_fault : "D" (buf), "a" (-1), "d" (-1), "0" (0) : "memory"); return err; } +/* + * Restore xstate from user space xsave area. + */ static inline int xrestore_user(struct xsave_struct __user *buf, u64 mask) { - int err; + int err = 0; struct xsave_struct *xstate = ((__force struct xsave_struct *)buf); u32 lmask = mask; u32 hmask = mask >> 32; __asm__ __volatile__(ASM_STAC "\n" -"1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n" +"1:"XRSTOR"\n" "2: " ASM_CLAC "\n" -".section .fixup,\"ax\"\n" -"3: movl $-1,%[err]\n" -"jmp 2b\n" -".previous\n" -_ASM_EXTABLE(1b,3b) -: [err] "=r" (err) +xstate_fault : "D" (xstate), "a" (lmask), "d" (hmask), "0" (0) : "memory"); /* memory required? */ return err; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/xsave] x86/xsaves: Use xsaves/xrstors to save and restore xsave area
Commit-ID: f31a9f7c71691569359fa7fb8b0acaa44bce0324 Gitweb: http://git.kernel.org/tip/f31a9f7c71691569359fa7fb8b0acaa44bce0324 Author: Fenghua Yu AuthorDate: Thu, 29 May 2014 11:12:36 -0700 Committer: H. Peter Anvin CommitDate: Thu, 29 May 2014 14:31:21 -0700 x86/xsaves: Use xsaves/xrstors to save and restore xsave area If xsaves is enabled, use xsaves/xrstors instructions to save and restore xstate. xsaves and xrstors support compacted format, init optimization, modified optimization, and supervisor states. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1401387164-43416-9-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/xsave.h | 84 +--- 1 file changed, 64 insertions(+), 20 deletions(-) diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h index 76c2459..f9177a2 100644 --- a/arch/x86/include/asm/xsave.h +++ b/arch/x86/include/asm/xsave.h @@ -65,6 +65,70 @@ extern int init_fpu(struct task_struct *child); _ASM_EXTABLE(1b, 3b)\ : [err] "=r" (err) +/* + * Save processor xstate to xsave area. + */ +static inline int xsave_state(struct xsave_struct *fx, u64 mask) +{ + u32 lmask = mask; + u32 hmask = mask >> 32; + int err = 0; + + /* +* If xsaves is enabled, xsaves replaces xsaveopt because +* it supports compact format and supervisor states in addition to +* modified optimization in xsaveopt. +* +* Otherwise, if xsaveopt is enabled, xsaveopt replaces xsave +* because xsaveopt supports modified optimization which is not +* supported by xsave. +* +* If none of xsaves and xsaveopt is enabled, use xsave. +*/ + alternative_input_2( + "1:"XSAVE, + "1:"XSAVEOPT, + X86_FEATURE_XSAVEOPT, + "1:"XSAVES, + X86_FEATURE_XSAVES, + [fx] "D" (fx), "a" (lmask), "d" (hmask) : + "memory"); + asm volatile("2:\n\t" +xstate_fault +: "0" (0) +: "memory"); + + return err; +} + +/* + * Restore processor xstate from xsave area. 
+ */ +static inline int xrstor_state(struct xsave_struct *fx, u64 mask) +{ + int err = 0; + u32 lmask = mask; + u32 hmask = mask >> 32; + + /* +* Use xrstors to restore context if it is enabled. xrstors supports +* compacted format of xsave area which is not supported by xrstor. +*/ + alternative_input( + "1: " XRSTOR, + "1: " XRSTORS, + X86_FEATURE_XSAVES, + "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask) + : "memory"); + + asm volatile("2:\n" +xstate_fault +: "0" (0) +: "memory"); + + return err; +} + static inline int fpu_xrstor_checking(struct xsave_struct *fx) { int err; @@ -130,26 +194,6 @@ static inline int xrestore_user(struct xsave_struct __user *buf, u64 mask) return err; } -static inline void xrstor_state(struct xsave_struct *fx, u64 mask) -{ - u32 lmask = mask; - u32 hmask = mask >> 32; - - asm volatile(".byte " REX_PREFIX "0x0f,0xae,0x2f\n\t" -: : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask) -: "memory"); -} - -static inline void xsave_state(struct xsave_struct *fx, u64 mask) -{ - u32 lmask = mask; - u32 hmask = mask >> 32; - - asm volatile(".byte " REX_PREFIX "0x0f,0xae,0x27\n\t" -: : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask) -: "memory"); -} - static inline void fpu_xsave(struct fpu *fpu) { /* This, however, we can work around by forcing the compiler to select -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/xsave] x86/xsaves: Use xsaves/xrstors for context switch
Commit-ID: f9de314b340f4816671f037e79ed01f685ac9787 Gitweb: http://git.kernel.org/tip/f9de314b340f4816671f037e79ed01f685ac9787 Author: Fenghua Yu AuthorDate: Thu, 29 May 2014 11:12:37 -0700 Committer: H. Peter Anvin CommitDate: Thu, 29 May 2014 14:31:25 -0700 x86/xsaves: Use xsaves/xrstors for context switch If xsaves is enabled, use xsaves/xrstors for context switch to support compacted format xsave area to occupy less memory and modified optimization to improve saving performance. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1401387164-43416-10-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/xsave.h | 37 - 1 file changed, 12 insertions(+), 25 deletions(-) diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h index f9177a2..8b75824 100644 --- a/arch/x86/include/asm/xsave.h +++ b/arch/x86/include/asm/xsave.h @@ -129,22 +129,20 @@ static inline int xrstor_state(struct xsave_struct *fx, u64 mask) return err; } -static inline int fpu_xrstor_checking(struct xsave_struct *fx) +/* + * Save xstate context for old process during context switch. + */ +static inline void fpu_xsave(struct fpu *fpu) { - int err; - - asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n\t" -"2:\n" -".section .fixup,\"ax\"\n" -"3: movl $-1,%[err]\n" -"jmp 2b\n" -".previous\n" -_ASM_EXTABLE(1b, 3b) -: [err] "=r" (err) -: "D" (fx), "m" (*fx), "a" (-1), "d" (-1), "0" (0) -: "memory"); + xsave_state(&fpu->state->xsave, -1); +} - return err; +/* + * Restore xstate context for new process during context switch. 
+ */ +static inline int fpu_xrstor_checking(struct xsave_struct *fx) +{ + return xrstor_state(fx, -1); } static inline int xsave_user(struct xsave_struct __user *buf) @@ -194,15 +192,4 @@ static inline int xrestore_user(struct xsave_struct __user *buf, u64 mask) return err; } -static inline void fpu_xsave(struct fpu *fpu) -{ - /* This, however, we can work around by forcing the compiler to select - an addressing mode that doesn't require extended registers. */ - alternative_input( - ".byte " REX_PREFIX "0x0f,0xae,0x27", - ".byte " REX_PREFIX "0x0f,0xae,0x37", - X86_FEATURE_XSAVEOPT, - [fx] "D" (&fpu->state->xsave), "a" (-1), "d" (-1) : - "memory"); -} #endif -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/xsave] x86/xsaves: Define a macro for handling xsave/xrstor instruction fault
Commit-ID:  b84e70552e5aad71a1c14536e6ffcfe7934b73e4
Gitweb:     http://git.kernel.org/tip/b84e70552e5aad71a1c14536e6ffcfe7934b73e4
Author:     Fenghua Yu
AuthorDate: Thu, 29 May 2014 11:12:35 -0700
Committer:  H. Peter Anvin
CommitDate: Thu, 29 May 2014 14:31:18 -0700

x86/xsaves: Define a macro for handling xsave/xrstor instruction fault

Define a macro to handle faults generated by the xsave, xsaveopt, xsaves,
xrstor, and xrstors instructions. It is used in functions like
xsave_state() etc.

Signed-off-by: Fenghua Yu
Link: http://lkml.kernel.org/r/1401387164-43416-8-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin
---
 arch/x86/include/asm/xsave.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 71bdde4..76c2459 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -58,6 +58,13 @@ extern int init_fpu(struct task_struct *child);
 #define XRSTOR	".byte " REX_PREFIX "0x0f,0xae,0x2f"
 #define XRSTORS	".byte " REX_PREFIX "0x0f,0xc7,0x1f"
 
+#define xstate_fault	".section .fixup,\"ax\"\n"	\
+			"3: movl $-1,%[err]\n"	\
+			"jmp 2b\n"	\
+			".previous\n"	\
+			_ASM_EXTABLE(1b, 3b)	\
+			: [err] "=r" (err)
+
 static inline int fpu_xrstor_checking(struct xsave_struct *fx)
 {
 	int err;
[tip:x86/xsave] x86/xsaves: Change compacted format xsave area header
Commit-ID:  0b29643a58439dc9a8b0c0cacad0e7cb608c8199
Gitweb:     http://git.kernel.org/tip/0b29643a58439dc9a8b0c0cacad0e7cb608c8199
Author:     Fenghua Yu
AuthorDate: Thu, 29 May 2014 11:12:33 -0700
Committer:  H. Peter Anvin
CommitDate: Thu, 29 May 2014 14:31:10 -0700

x86/xsaves: Change compacted format xsave area header

The XSAVE area header is changed to support both compacted format and
standard format of xsave area.

The XSAVE header of an xsave area comprises the 64 bytes starting at offset
512 from the area base address:

- Bytes 7:0 of the xsave header is a state-component bitmap called
  xstate_bv. It identifies the state components in the xsave area.

- Bytes 15:8 of the xsave header is a state-component bitmap called
  xcomp_bv. It is used as follows:
  - xcomp_bv[63] indicates the format of the extended region of
    the xsave area. If it is clear, the standard format is used.
    If it is set, the compacted format is used.
  - xcomp_bv[62:0] indicate which features (starting at feature 2)
    have space allocated for them in the compacted format.

- Bytes 63:16 of the xsave header are reserved.

Signed-off-by: Fenghua Yu
Link: http://lkml.kernel.org/r/1401387164-43416-6-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin
---
 arch/x86/include/asm/processor.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..2c8d3b8 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -386,8 +386,8 @@ struct bndcsr_struct {
 
 struct xsave_hdr_struct {
 	u64 xstate_bv;
-	u64 reserved1[2];
-	u64 reserved2[5];
+	u64 xcomp_bv;
+	u64 reserved[6];
 } __attribute__((packed));
 
 struct xsave_struct {
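The header layout described in this commit message can be sketched in plain user-space C as below. The field names follow the patched struct xsave_hdr_struct; the struct name xsave_hdr, the macro XCOMP_BV_COMPACTED and both helper functions are hypothetical names introduced only for this illustration, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field names follow the patched struct xsave_hdr_struct. */
struct xsave_hdr {
	uint64_t xstate_bv;	/* bytes 7:0: components present in the area */
	uint64_t xcomp_bv;	/* bytes 15:8: format bit plus component bitmap */
	uint64_t reserved[6];	/* bytes 63:16: reserved */
};

/* The commit message states that the header occupies 64 bytes. */
static_assert(sizeof(struct xsave_hdr) == 64, "xsave header must be 64 bytes");

/* Hypothetical name: xcomp_bv[63] set means the compacted format is used. */
#define XCOMP_BV_COMPACTED (UINT64_C(1) << 63)

/* Returns true if the extended region uses the compacted format. */
bool xsave_hdr_is_compacted(const struct xsave_hdr *hdr)
{
	return (hdr->xcomp_bv & XCOMP_BV_COMPACTED) != 0;
}

/* Returns xcomp_bv[62:0], the components with space allocated for them. */
uint64_t xsave_hdr_compacted_components(const struct xsave_hdr *hdr)
{
	return hdr->xcomp_bv & ~XCOMP_BV_COMPACTED;
}
```

A header with xcomp_bv equal to zero therefore describes a standard-format area, which is why the old reserved words could be reused without breaking existing standard-format users.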
[tip:x86/xsave] x86/xsaves: Define macros for xsave instructions
Commit-ID:  200b08a970b2ae764b670a326088ab8bc0a989cc
Gitweb:     http://git.kernel.org/tip/200b08a970b2ae764b670a326088ab8bc0a989cc
Author:     Fenghua Yu
AuthorDate: Thu, 29 May 2014 11:12:34 -0700
Committer:  H. Peter Anvin
CommitDate: Thu, 29 May 2014 14:31:16 -0700

x86/xsaves: Define macros for xsave instructions

Define macros for xsave, xsaveopt, xsaves, xrstor, and xrstors inline
instructions. The instructions will be used for saving and restoring xstate.

Signed-off-by: Fenghua Yu
Link: http://lkml.kernel.org/r/1401387164-43416-7-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin
---
 arch/x86/include/asm/xsave.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index d949ef2..71bdde4 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -52,6 +52,12 @@ extern void xsave_init(void);
 extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
 extern int init_fpu(struct task_struct *child);
 
+#define XSAVE	".byte " REX_PREFIX "0x0f,0xae,0x27"
+#define XSAVEOPT	".byte " REX_PREFIX "0x0f,0xae,0x37"
+#define XSAVES	".byte " REX_PREFIX "0x0f,0xc7,0x2f"
+#define XRSTOR	".byte " REX_PREFIX "0x0f,0xae,0x2f"
+#define XRSTORS	".byte " REX_PREFIX "0x0f,0xc7,0x1f"
+
 static inline int fpu_xrstor_checking(struct xsave_struct *fx)
 {
 	int err;
[tip:x86/xsave] x86/xsaves: Detect xsaves/xrstors feature
Commit-ID: 6229ad278ca74acdbc8bd3a3d469322a3de91039 Gitweb: http://git.kernel.org/tip/6229ad278ca74acdbc8bd3a3d469322a3de91039 Author: Fenghua Yu AuthorDate: Thu, 29 May 2014 11:12:30 -0700 Committer: H. Peter Anvin CommitDate: Thu, 29 May 2014 14:24:28 -0700 x86/xsaves: Detect xsaves/xrstors feature Detect the xsaveopt, xsavec, xgetbv, and xsaves features in processor extended state enumeration sub-leaf (eax=0x0d, ecx=1): Bit 00: XSAVEOPT is available Bit 01: Supports XSAVEC and the compacted form of XRSTOR if set Bit 02: Supports XGETBV with ECX = 1 if set Bit 03: Supports XSAVES/XRSTORS and IA32_XSS if set The above features are defined in the new word 10 in cpu features. The IA32_XSS MSR (index DA0H) contains a state-component bitmap that specifies the state components that software has enabled xsaves and xrstors to manage. If the bit corresponding to a state component is clear in XCR0 | IA32_XSS, xsaves and xrstors will not operate on that state component, regardless of the value of the instruction mask. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1401387164-43416-3-git-send-email-fenghua...@intel.com Signed-off-by: H. 
Peter Anvin --- arch/x86/include/asm/cpufeature.h | 10 -- arch/x86/include/uapi/asm/msr-index.h | 2 ++ arch/x86/kernel/cpu/common.c | 9 + arch/x86/kernel/cpu/scattered.c | 1 - 4 files changed, 19 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index 2837b92..b82f951 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -8,7 +8,7 @@ #include #endif -#define NCAPINTS 10 /* N 32-bit words worth of info */ +#define NCAPINTS 11 /* N 32-bit words worth of info */ #define NBUGINTS 1 /* N 32-bit bug flags */ /* @@ -180,7 +180,6 @@ #define X86_FEATURE_ARAT ( 7*32+ 1) /* Always Running APIC Timer */ #define X86_FEATURE_CPB( 7*32+ 2) /* AMD Core Performance Boost */ #define X86_FEATURE_EPB( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */ -#define X86_FEATURE_XSAVEOPT ( 7*32+ 4) /* Optimized Xsave */ #define X86_FEATURE_PLN( 7*32+ 5) /* Intel Power Limit Notification */ #define X86_FEATURE_PTS( 7*32+ 6) /* Intel Package Thermal Status */ #define X86_FEATURE_DTHERM ( 7*32+ 7) /* Digital Thermal Sensor */ @@ -226,6 +225,12 @@ #define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and Reciprocal */ #define X86_FEATURE_AVX512CD ( 9*32+28) /* AVX-512 Conflict Detection */ +/* Extended state features, CPUID level 0x000d:1 (eax), word 10 */ +#define X86_FEATURE_XSAVEOPT (10*32+ 0) /* XSAVEOPT */ +#define X86_FEATURE_XSAVEC (10*32+ 1) /* XSAVEC */ +#define X86_FEATURE_XGETBV1(10*32+ 2) /* XGETBV with ECX = 1 */ +#define X86_FEATURE_XSAVES (10*32+ 3) /* XSAVES/XRSTORS */ + /* * BUG word(s) */ @@ -328,6 +333,7 @@ extern const char * const x86_power_flags[32]; #define cpu_has_x2apic boot_cpu_has(X86_FEATURE_X2APIC) #define cpu_has_xsave boot_cpu_has(X86_FEATURE_XSAVE) #define cpu_has_xsaveopt boot_cpu_has(X86_FEATURE_XSAVEOPT) +#define cpu_has_xsaves boot_cpu_has(X86_FEATURE_XSAVES) #define cpu_has_osxsaveboot_cpu_has(X86_FEATURE_OSXSAVE) #define cpu_has_hypervisor 
boot_cpu_has(X86_FEATURE_HYPERVISOR) #define cpu_has_pclmulqdq boot_cpu_has(X86_FEATURE_PCLMULQDQ) diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h index fcf2b3a..5cd1569 100644 --- a/arch/x86/include/uapi/asm/msr-index.h +++ b/arch/x86/include/uapi/asm/msr-index.h @@ -297,6 +297,8 @@ #define MSR_IA32_TSC_ADJUST 0x003b #define MSR_IA32_BNDCFGS 0x0d90 +#define MSR_IA32_XSS 0x0da0 + #define FEATURE_CONTROL_LOCKED (1<<0) #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX (1<<1) #define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX (1<<2) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index a135239..e7c4b97 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -632,6 +632,15 @@ void get_cpu_cap(struct cpuinfo_x86 *c) c->x86_capability[9] = ebx; } + /* Extended state features: level 0x000d */ + if (c->cpuid_level >= 0x000d) { + u32 eax, ebx, ecx, edx; + + cpuid_count(0x000d, 1, &eax, &ebx, &ecx, &edx); + + c->x86_capability[10] = eax; + } + /* AMD-defined flags: level 0x8001 */ xlvl = cpuid_eax(0x8000); c->extended_cpuid_level = xlvl; diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c index b6f794a..4a8013d 100644 --- a/arch/x86/kernel/cpu/scattered.c +++ b/arch/x86/kernel/cpu/scattered.c @@ -38,7 +38,6 @@ void ini
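The four feature bits listed in the commit message above (CPUID leaf 0x0d, sub-leaf 1, register eax) can be decoded as in the following sketch. The struct xsave_caps and the function decode_xsave_caps() are hypothetical names used only for illustration; the eax value is taken as a parameter so that the example does not depend on the CPU it runs on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four bits described in the commit message. */
struct xsave_caps {
	bool xsaveopt;	/* bit 0: XSAVEOPT is available */
	bool xsavec;	/* bit 1: XSAVEC and compacted form of XRSTOR */
	bool xgetbv1;	/* bit 2: XGETBV with ECX = 1 */
	bool xsaves;	/* bit 3: XSAVES/XRSTORS and IA32_XSS */
};

/* Decode eax as returned by CPUID with eax=0x0d, ecx=1. */
struct xsave_caps decode_xsave_caps(uint32_t eax)
{
	struct xsave_caps caps = {
		.xsaveopt = (eax >> 0) & 1,
		.xsavec   = (eax >> 1) & 1,
		.xgetbv1  = (eax >> 2) & 1,
		.xsaves   = (eax >> 3) & 1,
	};
	return caps;
}
```

In the patch itself the raw eax value is stored in capability word 10, so a flag such as X86_FEATURE_XSAVES (10*32+ 3) refers to bit 3 of that same word.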
[tip:x86/xsave] x86/xsaves: Add a kernel parameter noxsaves to disable xsaves/xrstors
Commit-ID: b6f42a4a3c886bd18baf319d433a841ac9942c02 Gitweb: http://git.kernel.org/tip/b6f42a4a3c886bd18baf319d433a841ac9942c02 Author: Fenghua Yu AuthorDate: Thu, 29 May 2014 11:12:31 -0700 Committer: H. Peter Anvin CommitDate: Thu, 29 May 2014 14:24:52 -0700 x86/xsaves: Add a kernel parameter noxsaves to disable xsaves/xrstors This patch adds a kernel parameter noxsaves to disable xsaves/xrstors feature. The kernel will fall back to use xsaveopt and xrstor to save and restore xstates. By using this parameter, xsave area occupies more memory because standard form of xsave area in xsaveopt/xrstor occupies more memory than compacted form of xsave area. This patch adds a description of the kernel parameter noxsaveopt in doc. The code to support the parameter noxsaveopt has been in the kernel before. This patch just adds the description of this parameter in the doc. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1401387164-43416-4-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- Documentation/kernel-parameters.txt | 15 +++ arch/x86/kernel/cpu/common.c| 8 2 files changed, 23 insertions(+) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 30a8ad0d..0ebd952 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2124,6 +2124,21 @@ bytes respectively. Such letter suffixes can also be entirely omitted. and restore using xsave. The kernel will fallback to enabling legacy floating-point and sse state. + noxsaveopt [X86] Disables xsaveopt used in saving x86 extended + register states. The kernel will fall back to use + xsave to save the states. By using this parameter, + performance of saving the states is degraded because + xsave doesn't support modified optimization while + xsaveopt supports it on xsaveopt enabled systems. + + noxsaves[X86] Disables xsaves and xrstors used in saving and + restoring x86 extended register state in compacted + form of xsave area. 
The kernel will fall back to use + xsaveopt and xrstor to save and restore the states + in standard form of xsave area. By using this + parameter, xsave area per process might occupy more + memory on xsaves enabled systems. + eagerfpu= [X86] on enable eager fpu restore off disable eager fpu restore diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index e7c4b97..cdc9585 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -146,6 +146,7 @@ static int __init x86_xsave_setup(char *s) { setup_clear_cpu_cap(X86_FEATURE_XSAVE); setup_clear_cpu_cap(X86_FEATURE_XSAVEOPT); + setup_clear_cpu_cap(X86_FEATURE_XSAVES); setup_clear_cpu_cap(X86_FEATURE_AVX); setup_clear_cpu_cap(X86_FEATURE_AVX2); return 1; @@ -159,6 +160,13 @@ static int __init x86_xsaveopt_setup(char *s) } __setup("noxsaveopt", x86_xsaveopt_setup); +static int __init x86_xsaves_setup(char *s) +{ + setup_clear_cpu_cap(X86_FEATURE_XSAVES); + return 1; +} +__setup("noxsaves", x86_xsaves_setup); + #ifdef CONFIG_X86_32 static int cachesize_override = -1; static int disable_x86_serial_nr = 1; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/xsave] x86/alternative: Add alternative_input_2 to support alternative with two features and input
Commit-ID: 5b3e83f46a2a7e8625258dbf84a26e7f4032bfa8 Gitweb: http://git.kernel.org/tip/5b3e83f46a2a7e8625258dbf84a26e7f4032bfa8 Author: Fenghua Yu AuthorDate: Thu, 29 May 2014 11:12:32 -0700 Committer: H. Peter Anvin CommitDate: Thu, 29 May 2014 14:24:53 -0700 x86/alternative: Add alternative_input_2 to support alternative with two features and input alternative_input_2() replaces old instruction with new instructions with input based on two features. In alternative_input_2(oldinstr, newinstr1, feature1, newinstr2, feature2, input...), feature2 has higher priority to replace oldinstr than feature1. If CPU has feature2, newinstr2 replaces oldinstr and newinstr2 is executed during run time. If CPU doesn't have feature2, but it has feature1, newinstr1 replaces oldinstr and newinstr1 is executed during run time. If CPU doesn't have feature2 and feature1, oldinstr is executed during run time. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1401387164-43416-5-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/alternative.h | 14 ++ 1 file changed, 14 insertions(+) diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h index 0a3f9c9..473bdbe 100644 --- a/arch/x86/include/asm/alternative.h +++ b/arch/x86/include/asm/alternative.h @@ -161,6 +161,20 @@ static inline int alternatives_text_reserved(void *start, void *end) asm volatile (ALTERNATIVE(oldinstr, newinstr, feature) \ : : "i" (0), ## input) +/* + * This is similar to alternative_input. But it has two features and + * respective instructions. + * + * If CPU has feature2, newinstr2 is used. + * Otherwise, if CPU has feature1, newinstr1 is used. + * Otherwise, oldinstr is used. + */ +#define alternative_input_2(oldinstr, newinstr1, feature1, newinstr2, \ + feature2, input...) 
\ + asm volatile(ALTERNATIVE_2(oldinstr, newinstr1, feature1,\ + newinstr2, feature2) \ + : : "i" (0), ## input) + /* Like alternative_input, but with a single output argument */ #define alternative_io(oldinstr, newinstr, feature, output, input...) \ asm volatile (ALTERNATIVE(oldinstr, newinstr, feature) \ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
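The priority rule that alternative_input_2() implements can be modelled in ordinary C as shown below. This is a simplified model only: the real macro patches the instruction bytes in place when alternatives are applied, whereas the hypothetical function select_instr() here just returns which of the three candidates would end up being executed.

```c
#include <stdbool.h>

/*
 * Model of the selection rule described for alternative_input_2():
 * feature2 has priority over feature1, and oldinstr is kept only when
 * the CPU has neither feature.
 */
const char *select_instr(const char *oldinstr,
			 const char *newinstr1, bool has_feature1,
			 const char *newinstr2, bool has_feature2)
{
	if (has_feature2)
		return newinstr2;
	if (has_feature1)
		return newinstr1;
	return oldinstr;
}
```

With oldinstr = xsave, newinstr1 = xsaveopt and newinstr2 = xsaves, this matches the order used by xsave_state() elsewhere in this series: xsaves if available, otherwise xsaveopt, otherwise plain xsave.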
[tip:x86/cpufeature] x86, AVX-512: Enable AVX-512 States Context Switch
Commit-ID:  c2bc11f10a39527cd1bb252097b5525664560956
Gitweb:     http://git.kernel.org/tip/c2bc11f10a39527cd1bb252097b5525664560956
Author:     Fenghua Yu
AuthorDate: Thu, 20 Feb 2014 13:24:51 -0800
Committer:  H. Peter Anvin
CommitDate: Thu, 20 Feb 2014 13:56:55 -0800

x86, AVX-512: Enable AVX-512 States Context Switch

This patch enables Opmask, ZMM_Hi256, and Hi16_ZMM AVX-512 states for
xstate context switch.

Signed-off-by: Fenghua Yu
Link: http://lkml.kernel.org/r/1392931491-33237-2-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin
Cc: # hw enabling
---
 arch/x86/include/asm/xsave.h | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 5547389..6c1d741 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -6,11 +6,14 @@
 
 #define XSTATE_CPUID	0x000d
 
-#define XSTATE_FP	0x1
-#define XSTATE_SSE	0x2
-#define XSTATE_YMM	0x4
-#define XSTATE_BNDREGS	0x8
-#define XSTATE_BNDCSR	0x10
+#define XSTATE_FP	0x1
+#define XSTATE_SSE	0x2
+#define XSTATE_YMM	0x4
+#define XSTATE_BNDREGS	0x8
+#define XSTATE_BNDCSR	0x10
+#define XSTATE_OPMASK	0x20
+#define XSTATE_ZMM_Hi256	0x40
+#define XSTATE_Hi16_ZMM	0x80
 
 #define XSTATE_FPSSE	(XSTATE_FP | XSTATE_SSE)
 
@@ -23,7 +26,8 @@
 #define XSAVE_YMM_OFFSET	(XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET)
 
 /* Supported features which support lazy state saving */
-#define XSTATE_LAZY	(XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
+#define XSTATE_LAZY	(XSTATE_FP | XSTATE_SSE | XSTATE_YMM \
+			| XSTATE_OPMASK | XSTATE_ZMM_Hi256 | XSTATE_Hi16_ZMM)
 
 /* Supported features which require eager state saving */
 #define XSTATE_EAGER	(XSTATE_BNDREGS | XSTATE_BNDCSR)
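The effect of this patch on the lazy and eager xstate masks can be checked with a small stand-alone sketch. The XSTATE_* values and the two mask definitions are copied from the diff above; the helper xstate_is_lazy() is a hypothetical name added only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits as defined in the patched xsave.h. */
#define XSTATE_FP		0x1
#define XSTATE_SSE		0x2
#define XSTATE_YMM		0x4
#define XSTATE_BNDREGS		0x8
#define XSTATE_BNDCSR		0x10
#define XSTATE_OPMASK		0x20
#define XSTATE_ZMM_Hi256	0x40
#define XSTATE_Hi16_ZMM		0x80

/* Supported features which support lazy state saving (after the patch). */
#define XSTATE_LAZY	(XSTATE_FP | XSTATE_SSE | XSTATE_YMM \
			| XSTATE_OPMASK | XSTATE_ZMM_Hi256 | XSTATE_Hi16_ZMM)

/* Supported features which require eager state saving. */
#define XSTATE_EAGER	(XSTATE_BNDREGS | XSTATE_BNDCSR)

/* Hypothetical helper: is this single feature bit part of the lazy set? */
bool xstate_is_lazy(uint64_t feature_bit)
{
	return (XSTATE_LAZY & feature_bit) != 0;
}
```

The three AVX-512 components land in the lazy set alongside FP, SSE and YMM, while the MPX components (BNDREGS, BNDCSR) stay in the eager set, so the two masks remain disjoint.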
[tip:x86/cpufeature] x86, AVX-512: AVX-512 Feature Detection
Commit-ID:  8e5780fdeef7dc490b3f0b3a62704593721fa4f3
Gitweb:     http://git.kernel.org/tip/8e5780fdeef7dc490b3f0b3a62704593721fa4f3
Author:     Fenghua Yu
AuthorDate: Thu, 20 Feb 2014 13:24:50 -0800
Committer:  H. Peter Anvin
CommitDate: Thu, 20 Feb 2014 13:56:55 -0800

x86, AVX-512: AVX-512 Feature Detection

AVX-512 is an extension of AVX2. Its spec can be found at:
http://download-software.intel.com/sites/default/files/managed/71/2e/319433-017.pdf

This patch detects AVX-512 features by CPUID.

Signed-off-by: Fenghua Yu
Link: http://lkml.kernel.org/r/1392931491-33237-1-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin
Cc: # hw enabling
---
 arch/x86/include/asm/cpufeature.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index e099f95..5f12968 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -217,9 +217,13 @@
 #define X86_FEATURE_INVPCID	(9*32+10) /* Invalidate Processor Context ID */
 #define X86_FEATURE_RTM	(9*32+11) /* Restricted Transactional Memory */
 #define X86_FEATURE_MPX	(9*32+14) /* Memory Protection Extension */
+#define X86_FEATURE_AVX512F	(9*32+16) /* AVX-512 Foundation */
 #define X86_FEATURE_RDSEED	(9*32+18) /* The RDSEED instruction */
 #define X86_FEATURE_ADX	(9*32+19) /* The ADCX and ADOX instructions */
 #define X86_FEATURE_SMAP	(9*32+20) /* Supervisor Mode Access Prevention */
+#define X86_FEATURE_AVX512PF	(9*32+26) /* AVX-512 Prefetch */
+#define X86_FEATURE_AVX512ER	(9*32+27) /* AVX-512 Exponential and Reciprocal */
+#define X86_FEATURE_AVX512CD	(9*32+28) /* AVX-512 Conflict Detection */
 
 /*
  * BUG word(s)
[tip:x86/urgent] x86/apic, doc: Justification for disabling IO APIC before Local APIC
Commit-ID:  2885432aaf15c1b7e65c787bfe7c5fec428296f0
Gitweb:     http://git.kernel.org/tip/2885432aaf15c1b7e65c787bfe7c5fec428296f0
Author:     Fenghua Yu
AuthorDate: Wed, 4 Dec 2013 16:07:49 -0800
Committer:  H. Peter Anvin
CommitDate: Wed, 4 Dec 2013 19:33:21 -0800

x86/apic, doc: Justification for disabling IO APIC before Local APIC

Since erratum AVR31 in "Intel Atom Processor C2000 Product Family
Specification Update" is now published, I added a justification
comment for disabling IO APIC before Local APIC, as changed in commit:

  522e66464467 x86/apic: Disable I/O APIC before shutdown of the local APIC

Signed-off-by: Fenghua Yu
Link: http://lkml.kernel.org/r/1386202069-51515-1-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin
---
 arch/x86/kernel/reboot.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index da3c599..c752cb4 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -558,6 +558,17 @@ void native_machine_shutdown(void)
 {
 	/* Stop the cpus and apics */
 #ifdef CONFIG_X86_IO_APIC
+	/*
+	 * Disabling IO APIC before local APIC is a workaround for
+	 * erratum AVR31 in "Intel Atom Processor C2000 Product Family
+	 * Specification Update". In this situation, interrupts that target
+	 * a Logical Processor whose Local APIC is either in the process of
+	 * being hardware disabled or software disabled are neither delivered
+	 * nor discarded. When this erratum occurs, the processor may hang.
+	 *
+	 * Even without the erratum, it still makes sense to quiet IO APIC
+	 * before disabling Local APIC.
+	 */
 	disable_IO_APIC();
 #endif
 
[tip:x86/asm] x86-64, copy_user: Remove zero byte check before copy user buffer.
Commit-ID: f4cb1cc18f364d761d5614eb62936647f259 Gitweb: http://git.kernel.org/tip/f4cb1cc18f364d761d5614eb62936647f259 Author: Fenghua Yu AuthorDate: Sat, 16 Nov 2013 12:37:01 -0800 Committer: H. Peter Anvin CommitDate: Sat, 16 Nov 2013 18:00:58 -0800 x86-64, copy_user: Remove zero byte check before copy user buffer. Operation of rep movsb instruction handles zero byte copy. As pointed out by Linus, there is no need to check zero size in kernel. Removing this redundant check saves a few cycles in copy user functions. Reported-by: Linus Torvalds Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1384634221-6006-1-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/lib/copy_user_64.S | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S index a30ca15..ffe4eb9 100644 --- a/arch/x86/lib/copy_user_64.S +++ b/arch/x86/lib/copy_user_64.S @@ -236,8 +236,6 @@ ENDPROC(copy_user_generic_unrolled) ENTRY(copy_user_generic_string) CFI_STARTPROC ASM_STAC - andl %edx,%edx - jz 4f cmpl $8,%edx jb 2f /* less than 8 bytes, go to byte copy loop */ ALIGN_DESTINATION @@ -249,7 +247,7 @@ ENTRY(copy_user_generic_string) 2: movl %edx,%ecx 3: rep movsb -4: xorl %eax,%eax + xorl %eax,%eax ASM_CLAC ret @@ -279,12 +277,10 @@ ENDPROC(copy_user_generic_string) ENTRY(copy_user_enhanced_fast_string) CFI_STARTPROC ASM_STAC - andl %edx,%edx - jz 2f movl %edx,%ecx 1: rep movsb -2: xorl %eax,%eax + xorl %eax,%eax ASM_CLAC ret
[tip:x86/apic] x86/apic: Disable I/O APIC before shutdown of the local APIC
Commit-ID: 522e66464467543c0d88d023336eec4df03ad40b Gitweb: http://git.kernel.org/tip/522e66464467543c0d88d023336eec4df03ad40b Author: Fenghua Yu AuthorDate: Wed, 23 Oct 2013 18:30:12 -0700 Committer: Ingo Molnar CommitDate: Thu, 7 Nov 2013 10:12:37 +0100 x86/apic: Disable I/O APIC before shutdown of the local APIC In reboot and crash path, when we shut down the local APIC, the I/O APIC is still active. This may cause issues because external interrupts can still come in and disturb the local APIC during shutdown process. To quiet external interrupts, disable I/O APIC before shutdown local APIC. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1382578212-4677-1-git-send-email-fenghua...@intel.com Cc: [ I suppose the 'issue' is a hang during shutdown. It's a fine change nevertheless. ] Signed-off-by: Ingo Molnar --- arch/x86/kernel/crash.c | 2 +- arch/x86/kernel/reboot.c | 8 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index e0e0841..18677a9 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -127,12 +127,12 @@ void native_machine_crash_shutdown(struct pt_regs *regs) cpu_emergency_vmxoff(); cpu_emergency_svm_disable(); - lapic_shutdown(); #ifdef CONFIG_X86_IO_APIC /* Prevent crash_kexec() from deadlocking on ioapic_lock. */ ioapic_zap_locks(); disable_IO_APIC(); #endif + lapic_shutdown(); #ifdef CONFIG_HPET_TIMER hpet_disable(); #endif diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index 7e920bf..618ce26 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -550,6 +550,10 @@ static void native_machine_emergency_restart(void) void native_machine_shutdown(void) { /* Stop the cpus and apics */ +#ifdef CONFIG_X86_IO_APIC + disable_IO_APIC(); +#endif + #ifdef CONFIG_SMP /* * Stop all of the others. 
Also disable the local irq to @@ -562,10 +566,6 @@ void native_machine_shutdown(void) lapic_shutdown(); -#ifdef CONFIG_X86_IO_APIC - disable_IO_APIC(); -#endif - #ifdef CONFIG_HPET_TIMER hpet_disable(); #endif
[tip:x86/urgent] x86-32, microcode_intel_early: Fix crash with CONFIG_DEBUG_VIRTUAL
Commit-ID: c83a9d5e425d4678b05ca058fec6254f18601474 Gitweb: http://git.kernel.org/tip/c83a9d5e425d4678b05ca058fec6254f18601474 Author: Fenghua Yu AuthorDate: Tue, 19 Mar 2013 08:04:44 -0700 Committer: H. Peter Anvin CommitDate: Tue, 19 Mar 2013 19:51:08 -0700 x86-32, microcode_intel_early: Fix crash with CONFIG_DEBUG_VIRTUAL In 32-bit, __pa_symbol() in CONFIG_DEBUG_VIRTUAL accesses kernel data (e.g. max_low_pfn) that not only hasn't been setup yet in such early boot phase, but since we are in linear mode, cannot even be detected as uninitialized. Thus, use __pa_nodebug() rather than __pa_symbol() to get a global symbol's physical address. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1363705484-27645-1-git-send-email-fenghua...@intel.com Reported-and-tested-by: Tetsuo Handa Signed-off-by: H. Peter Anvin --- arch/x86/kernel/microcode_intel_early.c | 26 +- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/arch/x86/kernel/microcode_intel_early.c b/arch/x86/kernel/microcode_intel_early.c index 7890bc8..5992ee8 100644 --- a/arch/x86/kernel/microcode_intel_early.c +++ b/arch/x86/kernel/microcode_intel_early.c @@ -90,13 +90,13 @@ microcode_phys(struct microcode_intel **mc_saved_tmp, struct microcode_intel ***mc_saved; mc_saved = (struct microcode_intel ***) - __pa_symbol(&mc_saved_data->mc_saved); + __pa_nodebug(&mc_saved_data->mc_saved); for (i = 0; i < mc_saved_data->mc_saved_count; i++) { struct microcode_intel *p; p = *(struct microcode_intel **) - __pa(mc_saved_data->mc_saved + i); - mc_saved_tmp[i] = (struct microcode_intel *)__pa(p); + __pa_nodebug(mc_saved_data->mc_saved + i); + mc_saved_tmp[i] = (struct microcode_intel *)__pa_nodebug(p); } } #endif @@ -562,7 +562,7 @@ scan_microcode(unsigned long start, unsigned long end, struct cpio_data cd; long offset = 0; #ifdef CONFIG_X86_32 - char *p = (char *)__pa_symbol(ucode_name); + char *p = (char *)__pa_nodebug(ucode_name); #else char *p = ucode_name; #endif @@ -630,8 +630,8 @@ static void 
__cpuinit print_ucode(struct ucode_cpu_info *uci) if (mc_intel == NULL) return; - delay_ucode_info_p = (int *)__pa_symbol(&delay_ucode_info); - current_mc_date_p = (int *)__pa_symbol(&current_mc_date); + delay_ucode_info_p = (int *)__pa_nodebug(&delay_ucode_info); + current_mc_date_p = (int *)__pa_nodebug(&current_mc_date); *delay_ucode_info_p = 1; *current_mc_date_p = mc_intel->hdr.date; @@ -741,15 +741,15 @@ load_ucode_intel_bsp(void) #ifdef CONFIG_X86_32 struct boot_params *boot_params_p; - boot_params_p = (struct boot_params *)__pa_symbol(&boot_params); + boot_params_p = (struct boot_params *)__pa_nodebug(&boot_params); ramdisk_image = boot_params_p->hdr.ramdisk_image; ramdisk_size = boot_params_p->hdr.ramdisk_size; initrd_start_early = ramdisk_image; initrd_end_early = initrd_start_early + ramdisk_size; _load_ucode_intel_bsp( - (struct mc_saved_data *)__pa_symbol(&mc_saved_data), - (unsigned long *)__pa_symbol(&mc_saved_in_initrd), + (struct mc_saved_data *)__pa_nodebug(&mc_saved_data), + (unsigned long *)__pa_nodebug(&mc_saved_in_initrd), initrd_start_early, initrd_end_early, &uci); #else ramdisk_image = boot_params.hdr.ramdisk_image; @@ -772,10 +772,10 @@ void __cpuinit load_ucode_intel_ap(void) unsigned long *initrd_start_p; mc_saved_in_initrd_p = - (unsigned long *)__pa_symbol(mc_saved_in_initrd); - mc_saved_data_p = (struct mc_saved_data *)__pa_symbol(&mc_saved_data); - initrd_start_p = (unsigned long *)__pa_symbol(&initrd_start); - initrd_start_addr = (unsigned long)__pa_symbol(*initrd_start_p); + (unsigned long *)__pa_nodebug(mc_saved_in_initrd); + mc_saved_data_p = (struct mc_saved_data *)__pa_nodebug(&mc_saved_data); + initrd_start_p = (unsigned long *)__pa_nodebug(&initrd_start); + initrd_start_addr = (unsigned long)__pa_nodebug(*initrd_start_p); #else mc_saved_data_p = &mc_saved_data; mc_saved_in_initrd_p = mc_saved_in_initrd;
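The reason __pa_nodebug() is safe this early can be sketched as follows: on 32-bit x86 the conversion is a plain subtraction of PAGE_OFFSET and consults no kernel data, whereas the CONFIG_DEBUG_VIRTUAL variant adds sanity checks that read variables such as max_low_pfn. The code below is a userspace illustration only; SKETCH_PAGE_OFFSET assumes the common 3G/1G split (the real value is configurable), and sketch_pa_nodebug is a hypothetical name.

```c
#include <stdint.h>

/* Assumed value: the common 3G/1G split. The real PAGE_OFFSET
 * depends on the kernel configuration. */
#define SKETCH_PAGE_OFFSET 0xC0000000u

/* Pure arithmetic, no checks and no memory accesses: the idea
 * behind __pa_nodebug() for a directly mapped kernel address. */
static inline uint32_t sketch_pa_nodebug(uint32_t vaddr)
{
	return vaddr - SKETCH_PAGE_OFFSET;
}
```

Because the function touches nothing but its argument, it gives the same answer whether or not paging and the memory bookkeeping have been initialized.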
[tip:x86/microcode] x86/Kconfig: Make early microcode loading a configuration feature
Commit-ID: da76f64e7eb28b718501d15c1b79af560b7ca4ea Gitweb: http://git.kernel.org/tip/da76f64e7eb28b718501d15c1b79af560b7ca4ea Author: Fenghua Yu AuthorDate: Thu, 20 Dec 2012 23:44:32 -0800 Committer: H. Peter Anvin CommitDate: Thu, 31 Jan 2013 13:20:42 -0800 x86/Kconfig: Make early microcode loading a configuration feature MICROCODE_INTEL_LIB, MICROCODE_INTEL_EARLY, and MICROCODE_EARLY are three new configurations to enable or disable the feature. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1356075872-3054-13-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/Kconfig | 18 ++ 1 file changed, 18 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 79795af..e243da7 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1029,6 +1029,24 @@ config MICROCODE_OLD_INTERFACE def_bool y depends on MICROCODE +config MICROCODE_INTEL_LIB + def_bool y + depends on MICROCODE_INTEL + +config MICROCODE_INTEL_EARLY + bool "Early load microcode" + depends on MICROCODE_INTEL && BLK_DEV_INITRD + default y + help + This option provides functionality to read additional microcode data + at the beginning of initrd image. The data tells kernel to load + microcode to CPU's as early as possible. No functional change if no + microcode data is glued to the initrd, therefore it's safe to say Y. + +config MICROCODE_EARLY + def_bool y + depends on MICROCODE_INTEL_EARLY + config X86_MSR tristate "/dev/cpu/*/msr - Model-specific register support" ---help---
[tip:x86/microcode] x86/mm/init.c: Copy ucode from initrd image to kernel memory
Commit-ID: cd745be89e1580e8a1b47454a39f97f9c5c4b1e0 Gitweb: http://git.kernel.org/tip/cd745be89e1580e8a1b47454a39f97f9c5c4b1e0 Author: Fenghua Yu AuthorDate: Thu, 20 Dec 2012 23:44:31 -0800 Committer: H. Peter Anvin CommitDate: Thu, 31 Jan 2013 13:20:26 -0800 x86/mm/init.c: Copy ucode from initrd image to kernel memory Before initrd image is freed, copy valid ucode patches from initrd image to kernel memory. The saved ucode will be used to update AP in resume or hotplug. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1356075872-3054-12-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/mm/init.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index d418152..4903a03 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -16,6 +16,7 @@ #include #include #include/* for MAX_DMA_PFN */ +#include #include "mm_internal.h" @@ -534,6 +535,15 @@ void free_initmem(void) #ifdef CONFIG_BLK_DEV_INITRD void __init free_initrd_mem(unsigned long start, unsigned long end) { +#ifdef CONFIG_MICROCODE_EARLY + /* +* Remember, initrd memory may contain microcode or other useful things. +* Before we lose initrd mem, we need to find a place to hold them +* now that normal virtual memory is enabled. +*/ + save_microcode_in_initrd(); +#endif + /* * end could be not aligned, and We can not align that, * decompresser could be confused by aligned initrd_end
[tip:x86/microcode] x86/head_32.S: Early update ucode in 32-bit
Commit-ID: 63b553c68db5a8d4febcd1010b194333d2b02e1c Gitweb: http://git.kernel.org/tip/63b553c68db5a8d4febcd1010b194333d2b02e1c Author: Fenghua Yu AuthorDate: Thu, 20 Dec 2012 23:44:29 -0800 Committer: H. Peter Anvin CommitDate: Thu, 31 Jan 2013 13:19:20 -0800 x86/head_32.S: Early update ucode in 32-bit This updates ucode in 32-bit kernel on BSP and AP. At this point, there is no paging and no virtual address yet. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1356075872-3054-10-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/kernel/head_32.S | 11 +++ 1 file changed, 11 insertions(+) diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index 8e7f655..2f70530 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -144,6 +144,11 @@ ENTRY(startup_32) movl %eax, pa(olpc_ofw_pgd) #endif +#ifdef CONFIG_MICROCODE_EARLY + /* Early load ucode on BSP. */ + call load_ucode_bsp +#endif + /* * Initialize page tables. This creates a PDE and a set of page * tables, which are located immediately beyond __brk_base. The variable @@ -299,6 +304,12 @@ ENTRY(startup_32_smp) movl %eax,%ss leal -__PAGE_OFFSET(%ecx),%esp +#ifdef CONFIG_MICROCODE_EARLY + /* Early load ucode on AP. */ + call load_ucode_ap +#endif + + default_entry: /* * New page tables may be in 4Mbyte page mode and may
[tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
Commit-ID: ec400ddeff200b068ddc6c70f7321f49ecf32ed5 Gitweb: http://git.kernel.org/tip/ec400ddeff200b068ddc6c70f7321f49ecf32ed5 Author: Fenghua Yu AuthorDate: Thu, 20 Dec 2012 23:44:28 -0800 Committer: H. Peter Anvin CommitDate: Thu, 31 Jan 2013 13:19:18 -0800 x86/microcode_intel_early.c: Early update ucode on Intel's CPU Implementation of early update ucode on Intel's CPU. load_ucode_intel_bsp() scans ucode in initrd image file which is a cpio format ucode followed by ordinary initrd image file. The binary ucode file is stored in kernel/x86/microcode/GenuineIntel.bin in the cpio data. All ucode patches with the same model as BSP are saved in memory. A matching ucode patch is updated on BSP. load_ucode_intel_ap() reads saved ucode patches and updates ucode on AP. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1356075872-3054-9-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/kernel/microcode_intel_early.c | 796 1 file changed, 796 insertions(+) diff --git a/arch/x86/kernel/microcode_intel_early.c b/arch/x86/kernel/microcode_intel_early.c new file mode 100644 index 0000000..7890bc8 --- /dev/null +++ b/arch/x86/kernel/microcode_intel_early.c @@ -0,0 +1,796 @@ +/* + * Intel CPU microcode early update for Linux + * + * Copyright (C) 2012 Fenghua Yu + *H Peter Anvin" + * + * This allows to early upgrade microcode on Intel processors + * belonging to IA-32 family - PentiumPro, Pentium II, + * Pentium III, Xeon, Pentium 4, etc. + * + * Reference: Section 9.11 of Volume 3, IA-32 Intel Architecture + * Software Developer's Manual. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +unsigned long mc_saved_in_initrd[MAX_UCODE_COUNT]; +struct mc_saved_data { + unsigned int mc_saved_count; + struct microcode_intel **mc_saved; +} mc_saved_data; + +static enum ucode_state __cpuinit +generic_load_microcode_early(struct microcode_intel **mc_saved_p, +unsigned int mc_saved_count, +struct ucode_cpu_info *uci) +{ + struct microcode_intel *ucode_ptr, *new_mc = NULL; + int new_rev = uci->cpu_sig.rev; + enum ucode_state state = UCODE_OK; + unsigned int mc_size; + struct microcode_header_intel *mc_header; + unsigned int csig = uci->cpu_sig.sig; + unsigned int cpf = uci->cpu_sig.pf; + int i; + + for (i = 0; i < mc_saved_count; i++) { + ucode_ptr = mc_saved_p[i]; + + mc_header = (struct microcode_header_intel *)ucode_ptr; + mc_size = get_totalsize(mc_header); + if (get_matching_microcode(csig, cpf, ucode_ptr, new_rev)) { + new_rev = mc_header->rev; + new_mc = ucode_ptr; + } + } + + if (!new_mc) { + state = UCODE_NFOUND; + goto out; + } + + uci->mc = (struct microcode_intel *)new_mc; +out: + return state; +} + +static void __cpuinit +microcode_pointer(struct microcode_intel **mc_saved, + unsigned long *mc_saved_in_initrd, + unsigned long initrd_start, int mc_saved_count) +{ + int i; + + for (i = 0; i < mc_saved_count; i++) + mc_saved[i] = (struct microcode_intel *) + (mc_saved_in_initrd[i] + initrd_start); +} + +#ifdef CONFIG_X86_32 +static void __cpuinit +microcode_phys(struct microcode_intel **mc_saved_tmp, + struct mc_saved_data *mc_saved_data) +{ + int i; + struct microcode_intel ***mc_saved; + + mc_saved = (struct microcode_intel ***) + __pa_symbol(&mc_saved_data->mc_saved); + for (i = 0; i < mc_saved_data->mc_saved_count; i++) { + struct microcode_intel *p; + + p = *(struct microcode_intel **) + __pa(mc_saved_data->mc_saved + i); + mc_saved_tmp[i] = (struct microcode_intel *)__pa(p); + } +} +#endif + +static enum ucode_state __cpuinit 
+load_microcode(struct mc_saved_data *mc_saved_data, + unsigned long *mc_saved_in_initrd, + unsigned long initrd_start, + struct ucode_cpu_info *uci) +{ + struct microcode_intel *mc_saved_tmp[MAX_UCODE_COUNT]; + unsigned int count = mc_saved_data->mc_saved_count; + + if (!mc_saved_data->mc_saved) { + microcode_pointer(mc_saved_tmp, mc_saved_in_initrd, + initrd_start, count); + + return generi
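The selection loop in generic_load_microcode_early() above can be sketched in userspace C roughly as follows: walk the saved patches, keep those whose signature and processor flags match the CPU, and remember the one with the highest revision above the current one. This is a simplified sketch, not the kernel code: struct sketch_patch and the sketch_* function names are hypothetical, and the extended signature table that get_matching_microcode() also consults is ignored.

```c
#include <stddef.h>

/* Simplified stand-in for a patch header; the field names follow
 * struct microcode_header_intel, the struct itself is hypothetical. */
struct sketch_patch {
	unsigned int sig; /* CPU signature the patch targets */
	unsigned int pf;  /* processor-flags bitmask */
	int rev;          /* patch revision */
};

/* Same condition as the sigmatch() macro in microcode_intel.h. */
static int sketch_sigmatch(unsigned int s1, unsigned int s2,
			   unsigned int p1, unsigned int p2)
{
	return s1 == s2 && ((p1 & p2) || (p1 == 0 && p2 == 0));
}

/* Return the matching patch with the highest revision above
 * cur_rev, or NULL when no saved patch is both matching and newer. */
static const struct sketch_patch *
sketch_pick_patch(const struct sketch_patch *p, size_t n,
		  unsigned int csig, unsigned int cpf, int cur_rev)
{
	const struct sketch_patch *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!sketch_sigmatch(p[i].sig, csig, p[i].pf, cpf))
			continue;
		if (p[i].rev <= cur_rev)
			continue; /* not newer than the best seen so far */
		cur_rev = p[i].rev;
		best = &p[i];
	}
	return best;
}
```

Raising cur_rev inside the loop is what makes the last assignment win only for strictly newer patches, which mirrors how new_rev is updated in the original function.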
[tip:x86/microcode] x86/microcode_intel_lib.c: Early update ucode on Intel's CPU
Commit-ID: e666dfa273db1b12711eaec91facac5fec2ec851 Gitweb: http://git.kernel.org/tip/e666dfa273db1b12711eaec91facac5fec2ec851 Author: Fenghua Yu AuthorDate: Thu, 20 Dec 2012 23:44:26 -0800 Committer: H. Peter Anvin CommitDate: Thu, 31 Jan 2013 13:19:14 -0800 x86/microcode_intel_lib.c: Early update ucode on Intel's CPU Define interfaces microcode_sanity_check() and get_matching_microcode(). They are called both in early boot time and in microcode Intel driver. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1356075872-3054-7-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/kernel/microcode_intel_lib.c | 174 ++ 1 file changed, 174 insertions(+) diff --git a/arch/x86/kernel/microcode_intel_lib.c b/arch/x86/kernel/microcode_intel_lib.c new file mode 100644 index 0000000..ce69320 --- /dev/null +++ b/arch/x86/kernel/microcode_intel_lib.c @@ -0,0 +1,174 @@ +/* + * Intel CPU Microcode Update Driver for Linux + * + * Copyright (C) 2012 Fenghua Yu + *H Peter Anvin" + * + * This driver allows to upgrade microcode on Intel processors + * belonging to IA-32 family - PentiumPro, Pentium II, + * Pentium III, Xeon, Pentium 4, etc. + * + * Reference: Section 8.11 of Volume 3a, IA-32 Intel? Architecture + * Software Developer's Manual + * Order Number 253668 or free download from: + * + * http://developer.intel.com/Assets/PDF/manual/253668.pdf + * + * For more information, go to http://www.urbanmyth.org/microcode + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + */ +#include +#include +#include +#include + +#include +#include +#include + +static inline int +update_match_cpu(unsigned int csig, unsigned int cpf, +unsigned int sig, unsigned int pf) +{ + return (!sigmatch(sig, csig, pf, cpf)) ? 
0 : 1; +} + +int +update_match_revision(struct microcode_header_intel *mc_header, int rev) +{ + return (mc_header->rev <= rev) ? 0 : 1; +} + +int microcode_sanity_check(void *mc, int print_err) +{ + unsigned long total_size, data_size, ext_table_size; + struct microcode_header_intel *mc_header = mc; + struct extended_sigtable *ext_header = NULL; + int sum, orig_sum, ext_sigcount = 0, i; + struct extended_signature *ext_sig; + + total_size = get_totalsize(mc_header); + data_size = get_datasize(mc_header); + + if (data_size + MC_HEADER_SIZE > total_size) { + if (print_err) + pr_err("error! Bad data size in microcode data file\n"); + return -EINVAL; + } + + if (mc_header->ldrver != 1 || mc_header->hdrver != 1) { + if (print_err) + pr_err("error! Unknown microcode update format\n"); + return -EINVAL; + } + ext_table_size = total_size - (MC_HEADER_SIZE + data_size); + if (ext_table_size) { + if ((ext_table_size < EXT_HEADER_SIZE) +|| ((ext_table_size - EXT_HEADER_SIZE) % EXT_SIGNATURE_SIZE)) { + if (print_err) + pr_err("error! Small exttable size in microcode data file\n"); + return -EINVAL; + } + ext_header = mc + MC_HEADER_SIZE + data_size; + if (ext_table_size != exttable_size(ext_header)) { + if (print_err) + pr_err("error! 
Bad exttable size in microcode data file\n"); + return -EFAULT; + } + ext_sigcount = ext_header->count; + } + + /* check extended table checksum */ + if (ext_table_size) { + int ext_table_sum = 0; + int *ext_tablep = (int *)ext_header; + + i = ext_table_size / DWSIZE; + while (i--) + ext_table_sum += ext_tablep[i]; + if (ext_table_sum) { + if (print_err) + pr_warn("aborting, bad extended signature table checksum\n"); + return -EINVAL; + } + } + + /* calculate the checksum */ + orig_sum = 0; + i = (MC_HEADER_SIZE + data_size) / DWSIZE; + while (i--) + orig_sum += ((int *)mc)[i]; + if (orig_sum) { + if (print_err) + pr_err("aborting, bad checksum\n"); + return -EINVAL; + } + if (!ext_table_size) + return 0; + /* check extended signature checksum */ + for (i = 0; i < ext_sigcount; i++) { + ext_sig = (void *)ext_header + EXT_HEADER_SIZE + +
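The checksum rule that microcode_sanity_check() applies above (the header plus data, summed as 32-bit words, must come to zero) can be sketched in userspace C like this. The sketch uses uint32_t so that wraparound is well defined, whereas the kernel code sums into an int; sketch_dword_sum and sketch_fix_cksum are hypothetical names for this example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum a buffer as 32-bit words, wrapping modulo 2^32.
 * A well-formed image sums to zero. */
static uint32_t sketch_dword_sum(const uint32_t *words, size_t count)
{
	uint32_t sum = 0;

	while (count--)
		sum += words[count];
	return sum;
}

/* Hypothetical helper: given the sum of every other word, return
 * the checksum word that makes the whole buffer sum to zero. */
static uint32_t sketch_fix_cksum(uint32_t sum_without_cksum)
{
	return (uint32_t)0 - sum_without_cksum;
}
```

Any single-word corruption changes the total, so a nonzero sum is enough to reject the image, which is the behavior the "bad checksum" error path relies on.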
[tip:x86/microcode] x86/microcode_core_early.c: Define interfaces for early loading ucode
Commit-ID: a8ebf6d1d6971b90a20f5bd0465e6d520377e33b Gitweb: http://git.kernel.org/tip/a8ebf6d1d6971b90a20f5bd0465e6d520377e33b Author: Fenghua Yu AuthorDate: Thu, 20 Dec 2012 23:44:25 -0800 Committer: H. Peter Anvin CommitDate: Thu, 31 Jan 2013 13:19:12 -0800 x86/microcode_core_early.c: Define interfaces for early loading ucode Define interfaces load_ucode_bsp() and load_ucode_ap() to load ucode on BSP and AP in early boot time. These are generic interfaces. Internally they call vendor specific implementations. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1356075872-3054-6-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/microcode.h | 14 +++ arch/x86/kernel/microcode_core_early.c | 76 ++ 2 files changed, 90 insertions(+) diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h index 43d921b..6825e2e 100644 --- a/arch/x86/include/asm/microcode.h +++ b/arch/x86/include/asm/microcode.h @@ -57,4 +57,18 @@ static inline struct microcode_ops * __init init_amd_microcode(void) static inline void __exit exit_amd_microcode(void) {} #endif +#ifdef CONFIG_MICROCODE_EARLY +#define MAX_UCODE_COUNT 128 +extern void __init load_ucode_bsp(void); +extern __init void load_ucode_ap(void); +extern int __init save_microcode_in_initrd(void); +#else +static inline void __init load_ucode_bsp(void) {} +static inline __init void load_ucode_ap(void) {} +static inline int __init save_microcode_in_initrd(void) +{ + return 0; +} +#endif + #endif /* _ASM_X86_MICROCODE_H */ diff --git a/arch/x86/kernel/microcode_core_early.c b/arch/x86/kernel/microcode_core_early.c new file mode 100644 index 0000000..577db84 --- /dev/null +++ b/arch/x86/kernel/microcode_core_early.c @@ -0,0 +1,76 @@ +/* + * X86 CPU microcode early update for Linux + * + * Copyright (C) 2012 Fenghua Yu + *H Peter Anvin" + * + * This driver allows to early upgrade microcode on Intel processors + * belonging to IA-32 family - PentiumPro, Pentium II, + * Pentium 
III, Xeon, Pentium 4, etc. + * + * Reference: Section 9.11 of Volume 3, IA-32 Intel Architecture + * Software Developer's Manual. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include + +#define QCHAR(a, b, c, d) ((a) + ((b) << 8) + ((c) << 16) + ((d) << 24)) +#define CPUID_INTEL1 QCHAR('G', 'e', 'n', 'u') +#define CPUID_INTEL2 QCHAR('i', 'n', 'e', 'I') +#define CPUID_INTEL3 QCHAR('n', 't', 'e', 'l') +#define CPUID_AMD1 QCHAR('A', 'u', 't', 'h') +#define CPUID_AMD2 QCHAR('e', 'n', 't', 'i') +#define CPUID_AMD3 QCHAR('c', 'A', 'M', 'D') + +#define CPUID_IS(a, b, c, ebx, ecx, edx) \ + (!((ebx ^ (a))|(edx ^ (b))|(ecx ^ (c)))) + +/* + * In early loading microcode phase on BSP, boot_cpu_data is not set up yet. + * x86_vendor() gets vendor id for BSP. + * + * In 32 bit AP case, accessing boot_cpu_data needs linear address. To simplify + * coding, we still use x86_vendor() to get vendor id for AP. + * + * x86_vendor() gets vendor information directly through cpuid. 
+ */ +static int __cpuinit x86_vendor(void) +{ + u32 eax = 0x00000000; + u32 ebx, ecx = 0, edx; + + if (!have_cpuid_p()) + return X86_VENDOR_UNKNOWN; + + native_cpuid(&eax, &ebx, &ecx, &edx); + + if (CPUID_IS(CPUID_INTEL1, CPUID_INTEL2, CPUID_INTEL3, ebx, ecx, edx)) + return X86_VENDOR_INTEL; + + if (CPUID_IS(CPUID_AMD1, CPUID_AMD2, CPUID_AMD3, ebx, ecx, edx)) + return X86_VENDOR_AMD; + + return X86_VENDOR_UNKNOWN; +} + +void __init load_ucode_bsp(void) +{ + int vendor = x86_vendor(); + + if (vendor == X86_VENDOR_INTEL) + load_ucode_intel_bsp(); +} + +void __cpuinit load_ucode_ap(void) +{ + int vendor = x86_vendor(); + + if (vendor == X86_VENDOR_INTEL) + load_ucode_intel_ap(); +}
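The vendor check in x86_vendor() above compares the three registers returned by CPUID leaf 0 against four-character chunks of the vendor string, in EBX, EDX, ECX order. A userspace sketch of that comparison follows; it takes the register values as arguments instead of executing CPUID, and the SK_QCHAR macro, the sk_vendor enum and sk_vendor_from_regs() are hypothetical names modelled on the patch rather than kernel interfaces.

```c
#include <stdint.h>

/* Pack four characters little-endian, as QCHAR() in the patch does. */
#define SK_QCHAR(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

enum sk_vendor { SK_VENDOR_UNKNOWN, SK_VENDOR_INTEL, SK_VENDOR_AMD };

/* Classify the EBX/ECX/EDX values returned by CPUID leaf 0.
 * The 12-byte vendor string is laid out in EBX, EDX, ECX order. */
static enum sk_vendor
sk_vendor_from_regs(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	if (ebx == SK_QCHAR('G', 'e', 'n', 'u') &&
	    edx == SK_QCHAR('i', 'n', 'e', 'I') &&
	    ecx == SK_QCHAR('n', 't', 'e', 'l'))
		return SK_VENDOR_INTEL;

	if (ebx == SK_QCHAR('A', 'u', 't', 'h') &&
	    edx == SK_QCHAR('e', 'n', 't', 'i') &&
	    ecx == SK_QCHAR('c', 'A', 'M', 'D'))
		return SK_VENDOR_AMD;

	return SK_VENDOR_UNKNOWN;
}
```

The patch folds the three comparisons into one expression with XOR and OR (CPUID_IS); the explicit equality tests here are equivalent and were chosen only for readability.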
[tip:x86/microcode] x86/common.c: Make have_cpuid_p() a global function
Commit-ID: d288e1cf8e62f3e4034f1f021f047009c4ac0b3c Gitweb: http://git.kernel.org/tip/d288e1cf8e62f3e4034f1f021f047009c4ac0b3c Author: Fenghua Yu AuthorDate: Thu, 20 Dec 2012 23:44:23 -0800 Committer: H. Peter Anvin CommitDate: Thu, 31 Jan 2013 13:18:58 -0800 x86/common.c: Make have_cpuid_p() a global function Remove static declaration in have_cpuid_p() to make it a global function. The function will be called in early loading microcode. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1356075872-3054-4-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/processor.h | 8 arch/x86/kernel/cpu/common.c | 9 +++-- 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index bdee8bd..3cdf4aa 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -190,6 +190,14 @@ extern void init_amd_cacheinfo(struct cpuinfo_x86 *c); extern void detect_extended_topology(struct cpuinfo_x86 *c); extern void detect_ht(struct cpuinfo_x86 *c); +#ifdef CONFIG_X86_32 +extern int have_cpuid_p(void); +#else +static inline int have_cpuid_p(void) +{ + return 1; +} +#endif static inline void native_cpuid(unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx) { diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 9c3ab43..d7fd246 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -37,6 +37,8 @@ #include #include #include +#include +#include #ifdef CONFIG_X86_LOCAL_APIC #include @@ -213,7 +215,7 @@ static inline int flag_is_changeable_p(u32 flag) } /* Probe for the CPUID instruction */ -static int __cpuinit have_cpuid_p(void) +int __cpuinit have_cpuid_p(void) { return flag_is_changeable_p(X86_EFLAGS_ID); } @@ -249,11 +251,6 @@ static inline int flag_is_changeable_p(u32 flag) { return 1; } -/* Probe for the CPUID instruction */ -static inline int have_cpuid_p(void) -{ - 
return 1; -} static inline void squash_the_stupid_serial_number(struct cpuinfo_x86 *c) { }
[tip:x86/microcode] x86/microcode_intel.h: Define functions and macros for early loading ucode
Commit-ID: 9cd4d78e21cfdc709b1af516214ec4f69ee0e6bd Gitweb: http://git.kernel.org/tip/9cd4d78e21cfdc709b1af516214ec4f69ee0e6bd Author: Fenghua Yu AuthorDate: Thu, 20 Dec 2012 23:44:22 -0800 Committer: H. Peter Anvin CommitDate: Thu, 31 Jan 2013 13:18:50 -0800 x86/microcode_intel.h: Define functions and macros for early loading ucode Define some functions and macros that will be used in early loading ucode. Some of them are moved from microcode_intel.c driver in order to be called in early boot phase before module can be called. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1356075872-3054-3-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/microcode_intel.h | 85 ++ arch/x86/kernel/Makefile | 3 + arch/x86/kernel/microcode_core.c | 7 +- arch/x86/kernel/microcode_intel.c | 198 + 4 files changed, 122 insertions(+), 171 deletions(-) diff --git a/arch/x86/include/asm/microcode_intel.h b/arch/x86/include/asm/microcode_intel.h new file mode 100644 index 000..5356f92 --- /dev/null +++ b/arch/x86/include/asm/microcode_intel.h @@ -0,0 +1,85 @@ +#ifndef _ASM_X86_MICROCODE_INTEL_H +#define _ASM_X86_MICROCODE_INTEL_H + +#include + +struct microcode_header_intel { + unsigned inthdrver; + unsigned intrev; + unsigned intdate; + unsigned intsig; + unsigned intcksum; + unsigned intldrver; + unsigned intpf; + unsigned intdatasize; + unsigned inttotalsize; + unsigned intreserved[3]; +}; + +struct microcode_intel { + struct microcode_header_intel hdr; + unsigned intbits[0]; +}; + +/* microcode format is extended from prescott processors */ +struct extended_signature { + unsigned intsig; + unsigned intpf; + unsigned intcksum; +}; + +struct extended_sigtable { + unsigned intcount; + unsigned intcksum; + unsigned intreserved[3]; + struct extended_signature sigs[0]; +}; + +#define DEFAULT_UCODE_DATASIZE (2000) +#define MC_HEADER_SIZE (sizeof(struct microcode_header_intel)) +#define DEFAULT_UCODE_TOTALSIZE (DEFAULT_UCODE_DATASIZE + 
MC_HEADER_SIZE) +#define EXT_HEADER_SIZE(sizeof(struct extended_sigtable)) +#define EXT_SIGNATURE_SIZE (sizeof(struct extended_signature)) +#define DWSIZE (sizeof(u32)) + +#define get_totalsize(mc) \ + (((struct microcode_intel *)mc)->hdr.totalsize ? \ +((struct microcode_intel *)mc)->hdr.totalsize : \ +DEFAULT_UCODE_TOTALSIZE) + +#define get_datasize(mc) \ + (((struct microcode_intel *)mc)->hdr.datasize ? \ +((struct microcode_intel *)mc)->hdr.datasize : DEFAULT_UCODE_DATASIZE) + +#define sigmatch(s1, s2, p1, p2) \ + (((s1) == (s2)) && (((p1) & (p2)) || (((p1) == 0) && ((p2) == 0 + +#define exttable_size(et) ((et)->count * EXT_SIGNATURE_SIZE + EXT_HEADER_SIZE) + +extern int +get_matching_microcode(unsigned int csig, int cpf, void *mc, int rev); +extern int microcode_sanity_check(void *mc, int print_err); +extern int get_matching_sig(unsigned int csig, int cpf, void *mc, int rev); +extern int +update_match_revision(struct microcode_header_intel *mc_header, int rev); + +#ifdef CONFIG_MICROCODE_INTEL_EARLY +extern void __init load_ucode_intel_bsp(void); +extern void __cpuinit load_ucode_intel_ap(void); +extern void show_ucode_info_early(void); +#else +static inline __init void load_ucode_intel_bsp(void) {} +static inline __cpuinit void load_ucode_intel_ap(void) {} +static inline void show_ucode_info_early(void) {} +#endif + +#if defined(CONFIG_MICROCODE_INTEL_EARLY) && defined(CONFIG_HOTPLUG_CPU) +extern int save_mc_for_early(u8 *mc); +#else +static inline int save_mc_for_early(u8 *mc) +{ + return 0; +} +#endif + +#endif /* _ASM_X86_MICROCODE_INTEL_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 34e923a..052abee 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -88,6 +88,9 @@ obj-$(CONFIG_PARAVIRT_CLOCK) += pvclock.o obj-$(CONFIG_PCSPKR_PLATFORM) += pcspeaker.o +obj-$(CONFIG_MICROCODE_EARLY) += microcode_core_early.o +obj-$(CONFIG_MICROCODE_INTEL_EARLY)+= microcode_intel_early.o +obj-$(CONFIG_MICROCODE_INTEL_LIB) += 
microcode_intel_lib.o microcode-y:= microcode_core.o microcode-$(CONFIG_MICROCODE_INTEL)+= microcode_intel.o microcode-$(CONFIG_MICROCODE_AMD) += microcode_amd.o diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c index 3a04b22..22db92b 100644 --- a/arch/x86/kernel/microcode_core.c +++ b/arch/x86/kernel/microcode_core.c @@ -364,10 +364,7 @@ static struct attribute_group mc_attr_group = { stat
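A minimal userspace sketch, not part of the patch, of how the get_totalsize() and get_datasize() macros above fall back to defaults when the header fields are zero. The struct copies the field order of struct microcode_header_intel from the patch; every name with a demo_ or DEMO_ prefix is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field order as struct microcode_header_intel in the patch. */
struct demo_ucode_header {
	uint32_t hdrver, rev, date, sig, cksum, ldrver, pf;
	uint32_t datasize, totalsize;
	uint32_t reserved[3];
};

#define DEMO_DEFAULT_DATASIZE  2000u
#define DEMO_HEADER_SIZE       ((uint32_t)sizeof(struct demo_ucode_header))
#define DEMO_DEFAULT_TOTALSIZE (DEMO_DEFAULT_DATASIZE + DEMO_HEADER_SIZE)

/* A zero datasize field means "use the default data size". */
static uint32_t demo_get_datasize(const struct demo_ucode_header *h)
{
	assert(h != NULL);
	return h->datasize ? h->datasize : DEMO_DEFAULT_DATASIZE;
}

/* A zero totalsize field means "default data size plus header size". */
static uint32_t demo_get_totalsize(const struct demo_ucode_header *h)
{
	assert(h != NULL);
	return h->totalsize ? h->totalsize : DEMO_DEFAULT_TOTALSIZE;
}
```

With twelve 32-bit fields the header is 48 bytes, so a header with both size fields set to zero is treated as a 2048-byte update carrying 2000 bytes of data.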
[tip:x86/microcode] x86, doc: Documentation for early microcode loading
Commit-ID:  0d91ea86a895b911fd7d999acb3f600706d9c8cd
Gitweb:     http://git.kernel.org/tip/0d91ea86a895b911fd7d999acb3f600706d9c8cd
Author:     Fenghua Yu
AuthorDate: Thu, 20 Dec 2012 23:44:21 -0800
Committer:  H. Peter Anvin
CommitDate: Thu, 31 Jan 2013 13:18:47 -0800

x86, doc: Documentation for early microcode loading

Documentation for the early microcode loading methodology.

Signed-off-by: Fenghua Yu
Link: http://lkml.kernel.org/r/1356075872-3054-2-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin
---
 Documentation/x86/early-microcode.txt | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/Documentation/x86/early-microcode.txt b/Documentation/x86/early-microcode.txt
new file mode 100644
index 0000000..4aaf0df
--- /dev/null
+++ b/Documentation/x86/early-microcode.txt
@@ -0,0 +1,43 @@
+Early load microcode
+
+By Fenghua Yu
+
+Kernel can update microcode in early phase of boot time. Loading microcode early
+can fix CPU issues before they are observed during kernel boot time.
+
+Microcode is stored in an initrd file. The microcode is read from the initrd
+file and loaded to CPUs during boot time.
+
+The format of the combined initrd image is microcode in cpio format followed by
+the initrd image (maybe compressed). Kernel parses the combined initrd image
+during boot time. The microcode file in cpio name space is:
+kernel/x86/microcode/GenuineIntel.bin
+
+During BSP boot (before SMP starts), if the kernel finds the microcode file in
+the initrd file, it parses the microcode and saves matching microcode in memory.
+If matching microcode is found, it will be uploaded in BSP and later on in all
+APs.
+
+The cached microcode patch is applied when CPUs resume from a sleep state.
+
+There are two legacy user space interfaces to load microcode, either through
+/dev/cpu/microcode or through /sys/devices/system/cpu/microcode/reload file
+in sysfs.
+
+In addition to these two legacy methods, the early loading method described
+here is the third method with which microcode can be uploaded to a system's
+CPUs.
+
+The following example script shows how to generate a new combined initrd file in
+/boot/initrd-3.5.0.ucode.img with original microcode microcode.bin and
+original initrd image /boot/initrd-3.5.0.img.
+
+mkdir initrd
+cd initrd
+mkdir kernel
+mkdir kernel/x86
+mkdir kernel/x86/microcode
+cp ../microcode.bin kernel/x86/microcode/GenuineIntel.bin
+find .|cpio -oc >../ucode.cpio
+cd ..
+cat ucode.cpio /boot/initrd-3.5.0.img >/boot/initrd-3.5.0.ucode.img
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
Commit-ID: 474355fe313391de2429ae225e0fb02f67ec6c31 Gitweb: http://git.kernel.org/tip/474355fe313391de2429ae225e0fb02f67ec6c31 Author: Fenghua Yu AuthorDate: Thu, 29 Nov 2012 17:47:43 -0800 Committer: H. Peter Anvin CommitDate: Fri, 30 Nov 2012 15:18:16 -0800 x86/microcode_intel_early.c: Early update ucode on Intel's CPU Implementation of early update ucode on Intel's CPU. load_ucode_intel_bsp() scans ucode in initrd image file which is a cpio format ucode followed by ordinary initrd image file. The binary ucode file is stored in kernel/x86/microcode/GenuineIntel/microcode.bin in the cpio data. All ucode patches with the same model as BSP are saved in memory. A matching ucode patch is updated on BSP. load_ucode_intel_ap() reads saved ucoded patches and updates ucode on AP. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1354240068-9821-6-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/kernel/microcode_intel_early.c | 438 1 file changed, 438 insertions(+) diff --git a/arch/x86/kernel/microcode_intel_early.c b/arch/x86/kernel/microcode_intel_early.c new file mode 100644 index 000..36b1df1 --- /dev/null +++ b/arch/x86/kernel/microcode_intel_early.c @@ -0,0 +1,438 @@ +/* + * Intel CPU Microcode Update Driver for Linux + * + * Copyright (C) 2012 Fenghua Yu + *H Peter Anvin" + * + * This driver allows to early upgrade microcode on Intel processors + * belonging to IA-32 family - PentiumPro, Pentium II, + * Pentium III, Xeon, Pentium 4, etc. + * + * Reference: Section 9.11 of Volume 3, IA-32 Intel Architecture + * Software Developer's Manual. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ +#include +#include +#include +#include +#include +#include +#include + +struct microcode_intel __initdata *mc_saved_in_initrd[MAX_UCODE_COUNT]; +struct mc_saved_data mc_saved_data; + +enum ucode_state +generic_load_microcode_early(int cpu, struct microcode_intel **mc_saved_p, +unsigned int mc_saved_count, +struct ucode_cpu_info *uci) +{ + struct microcode_intel *ucode_ptr, *new_mc = NULL; + int new_rev = uci->cpu_sig.rev; + enum ucode_state state = UCODE_OK; + unsigned int mc_size; + struct microcode_header_intel *mc_header; + unsigned int csig = uci->cpu_sig.sig; + unsigned int cpf = uci->cpu_sig.pf; + int i; + + for (i = 0; i < mc_saved_count; i++) { + ucode_ptr = mc_saved_p[i]; + mc_header = (struct microcode_header_intel *)ucode_ptr; + mc_size = get_totalsize(mc_header); + if (get_matching_microcode(csig, cpf, ucode_ptr, new_rev)) { + new_rev = mc_header->rev; + new_mc = ucode_ptr; + } + } + + if (!new_mc) { + state = UCODE_NFOUND; + goto out; + } + + uci->mc = (struct microcode_intel *)new_mc; +out: + return state; +} +EXPORT_SYMBOL_GPL(generic_load_microcode_early); + +static enum ucode_state __init +load_microcode(struct mc_saved_data *mc_saved_data, int cpu) +{ + struct ucode_cpu_info *uci = mc_saved_data->ucode_cpu_info + cpu; + + return generic_load_microcode_early(cpu, mc_saved_data->mc_saved, + mc_saved_data->mc_saved_count, uci); +} + +static u8 get_x86_family(unsigned long sig) +{ + u8 x86; + + x86 = (sig >> 8) & 0xf; + + if (x86 == 0xf) + x86 += (sig >> 20) & 0xff; + + return x86; +} + +static u8 get_x86_model(unsigned long sig) +{ + u8 x86, x86_model; + + x86 = get_x86_family(sig); + x86_model = (sig >> 4) & 0xf; + + if (x86 == 0x6 || x86 == 0xf) + x86_model += ((sig >> 16) & 0xf) << 4; + + return x86_model; +} + +static enum ucode_state +matching_model_microcode(struct microcode_header_intel *mc_header, + unsigned long sig) +{ + u8 x86, x86_model; + u8 x86_ucode, x86_model_ucode; + + x86 = get_x86_family(sig); + x86_model = 
get_x86_model(sig); + + x86_ucode = get_x86_family(mc_header->sig); + x86_model_ucode = get_x86_model(mc_header->sig); + + if (x86 != x86_ucode || x86_model != x86_model_ucode) + return UCODE_ERROR; + + return UCODE_OK; +} + +static void +save_microcode(struct mc_saved_data *mc_saved_data, + struct microcode_intel **mc_saved_src, + unsigned int mc_saved_count) +{ + int i; + struct microcode_intel **mc_saved_p; + + if (!mc_saved_count) + return; + +
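A minimal userspace sketch, not part of the patch, of the family and model decoding that get_x86_family() and get_x86_model() perform on a CPUID signature. The demo_ names are hypothetical, and the two signatures in the self-check are sample values chosen only to exercise both branches.

```c
#include <assert.h>
#include <stdint.h>

/* Family: bits 11:8, plus the extended family (bits 27:20) when it is 0xf. */
static uint8_t demo_x86_family(uint32_t sig)
{
	uint8_t family = (sig >> 8) & 0xf;

	if (family == 0xf)
		family += (sig >> 20) & 0xff;

	return family;
}

/*
 * Model: bits 7:4, with the extended model (bits 19:16) as the high
 * nibble when the decoded family is 0x6 or 0xf, as in the patch.
 */
static uint8_t demo_x86_model(uint32_t sig)
{
	uint8_t family = demo_x86_family(sig);
	uint8_t model = (sig >> 4) & 0xf;

	if (family == 0x6 || family == 0xf)
		model += ((sig >> 16) & 0xf) << 4;

	return model;
}

/* Worked examples: call this from main() to check the decoding. */
static void demo_decode_selfcheck(void)
{
	assert(demo_x86_family(0x000306a9u) == 6);
	assert(demo_x86_model(0x000306a9u) == 0x3a);
	assert(demo_x86_family(0x00000f41u) == 15);
	assert(demo_x86_model(0x00000f41u) == 0x04);
}
```

matching_model_microcode() in the patch compares exactly these two decoded values, so updates whose signatures differ only in stepping still count as the same model.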
[tip:x86/microcode] x86/microcode_intel_lib.c: Early update ucode on Intel's CPU
Commit-ID: da7d824a00ec0f4d19e2b51653410bde0de40226 Gitweb: http://git.kernel.org/tip/da7d824a00ec0f4d19e2b51653410bde0de40226 Author: Fenghua Yu AuthorDate: Thu, 29 Nov 2012 17:47:42 -0800 Committer: H. Peter Anvin CommitDate: Fri, 30 Nov 2012 15:18:15 -0800 x86/microcode_intel_lib.c: Early update ucode on Intel's CPU Define interfaces microcode_sanity_check() and get_matching_microcode(). They are called both in early boot time and in microcode Intel driver. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1354240068-9821-5-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/kernel/microcode_intel_lib.c | 174 ++ 1 file changed, 174 insertions(+) diff --git a/arch/x86/kernel/microcode_intel_lib.c b/arch/x86/kernel/microcode_intel_lib.c new file mode 100644 index 000..ce69320 --- /dev/null +++ b/arch/x86/kernel/microcode_intel_lib.c @@ -0,0 +1,174 @@ +/* + * Intel CPU Microcode Update Driver for Linux + * + * Copyright (C) 2012 Fenghua Yu + *H Peter Anvin" + * + * This driver allows to upgrade microcode on Intel processors + * belonging to IA-32 family - PentiumPro, Pentium II, + * Pentium III, Xeon, Pentium 4, etc. + * + * Reference: Section 8.11 of Volume 3a, IA-32 Intel? Architecture + * Software Developer's Manual + * Order Number 253668 or free download from: + * + * http://developer.intel.com/Assets/PDF/manual/253668.pdf + * + * For more information, go to http://www.urbanmyth.org/microcode + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + */ +#include +#include +#include +#include + +#include +#include +#include + +static inline int +update_match_cpu(unsigned int csig, unsigned int cpf, +unsigned int sig, unsigned int pf) +{ + return (!sigmatch(sig, csig, pf, cpf)) ? 
0 : 1; +} + +int +update_match_revision(struct microcode_header_intel *mc_header, int rev) +{ + return (mc_header->rev <= rev) ? 0 : 1; +} + +int microcode_sanity_check(void *mc, int print_err) +{ + unsigned long total_size, data_size, ext_table_size; + struct microcode_header_intel *mc_header = mc; + struct extended_sigtable *ext_header = NULL; + int sum, orig_sum, ext_sigcount = 0, i; + struct extended_signature *ext_sig; + + total_size = get_totalsize(mc_header); + data_size = get_datasize(mc_header); + + if (data_size + MC_HEADER_SIZE > total_size) { + if (print_err) + pr_err("error! Bad data size in microcode data file\n"); + return -EINVAL; + } + + if (mc_header->ldrver != 1 || mc_header->hdrver != 1) { + if (print_err) + pr_err("error! Unknown microcode update format\n"); + return -EINVAL; + } + ext_table_size = total_size - (MC_HEADER_SIZE + data_size); + if (ext_table_size) { + if ((ext_table_size < EXT_HEADER_SIZE) +|| ((ext_table_size - EXT_HEADER_SIZE) % EXT_SIGNATURE_SIZE)) { + if (print_err) + pr_err("error! Small exttable size in microcode data file\n"); + return -EINVAL; + } + ext_header = mc + MC_HEADER_SIZE + data_size; + if (ext_table_size != exttable_size(ext_header)) { + if (print_err) + pr_err("error! 
Bad exttable size in microcode data file\n"); + return -EFAULT; + } + ext_sigcount = ext_header->count; + } + + /* check extended table checksum */ + if (ext_table_size) { + int ext_table_sum = 0; + int *ext_tablep = (int *)ext_header; + + i = ext_table_size / DWSIZE; + while (i--) + ext_table_sum += ext_tablep[i]; + if (ext_table_sum) { + if (print_err) + pr_warn("aborting, bad extended signature table checksum\n"); + return -EINVAL; + } + } + + /* calculate the checksum */ + orig_sum = 0; + i = (MC_HEADER_SIZE + data_size) / DWSIZE; + while (i--) + orig_sum += ((int *)mc)[i]; + if (orig_sum) { + if (print_err) + pr_err("aborting, bad checksum\n"); + return -EINVAL; + } + if (!ext_table_size) + return 0; + /* check extended signature checksum */ + for (i = 0; i < ext_sigcount; i++) { + ext_sig = (void *)ext_header + EXT_HEADER_SIZE + +
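A minimal userspace sketch, not part of the patch, of the checksum rule that microcode_sanity_check() applies: all 32-bit words of the covered region must sum to zero. The patch accumulates into an int; this sketch swaps in uint32_t so that wraparound is well defined in portable C. demo_checksum_ok is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when the words sum to zero modulo 2^32, otherwise 0. */
static int demo_checksum_ok(const uint32_t *words, size_t count)
{
	uint32_t sum = 0;

	assert(words != NULL || count == 0);

	/* Same backwards walk as the while (i--) loops in the patch. */
	while (count--)
		sum += words[count];

	return sum == 0;
}
```

A caller would pass (MC_HEADER_SIZE + data_size) / DWSIZE words for the main checksum and ext_table_size / DWSIZE words for the extended signature table, as the patch does.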
[tip:x86/microcode] x86/microcode_core_early.c: Define interfaces for early loading ucode
Commit-ID: d42bdf2139115faa4d5bdb0dc591d435a644fde4 Gitweb: http://git.kernel.org/tip/d42bdf2139115faa4d5bdb0dc591d435a644fde4 Author: Fenghua Yu AuthorDate: Thu, 29 Nov 2012 17:47:41 -0800 Committer: H. Peter Anvin CommitDate: Fri, 30 Nov 2012 15:18:15 -0800 x86/microcode_core_early.c: Define interfaces for early loading ucode Define interfaces load_ucode_bsp() and load_ucode_ap() to load ucode on BSP and AP in early boot time. These are generic interfaces. Internally they call vendor specific implementations. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1354240068-9821-4-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/microcode.h | 23 +++ arch/x86/kernel/microcode_core_early.c | 70 ++ 2 files changed, 93 insertions(+) diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h index 43d921b..2e2ff3a 100644 --- a/arch/x86/include/asm/microcode.h +++ b/arch/x86/include/asm/microcode.h @@ -57,4 +57,27 @@ static inline struct microcode_ops * __init init_amd_microcode(void) static inline void __exit exit_amd_microcode(void) {} #endif +struct mc_saved_data { + unsigned int mc_saved_count; + struct microcode_intel **mc_saved; + struct ucode_cpu_info *ucode_cpu_info; +}; +#ifdef CONFIG_MICROCODE_EARLY +#define MAX_UCODE_COUNT 128 +extern struct ucode_cpu_info ucode_cpu_info_early[NR_CPUS]; +extern struct microcode_intel __initdata *mc_saved_in_initrd[MAX_UCODE_COUNT]; +extern struct mc_saved_data mc_saved_data; +extern void __init load_ucode_bsp(char *real_mode_data); +extern __init void load_ucode_ap(void); +extern void __init +save_microcode_in_initrd(struct mc_saved_data *mc_saved_data, +struct microcode_intel **mc_saved_in_initrd); +#else +static inline void __init load_ucode_bsp(char *real_mode_data) {} +static inline __init void load_ucode_ap(void) {} +static inline void __init +save_microcode_in_initrd(struct mc_saved_data *mc_saved_data, +struct microcode_intel **mc_saved_in_initrd) {} 
+#endif + #endif /* _ASM_X86_MICROCODE_H */ diff --git a/arch/x86/kernel/microcode_core_early.c b/arch/x86/kernel/microcode_core_early.c new file mode 100644 index 0000000..1c6cc8f --- /dev/null +++ b/arch/x86/kernel/microcode_core_early.c @@ -0,0 +1,70 @@ +/* + * X86 CPU microcode early update for Linux + * + * Copyright (C) 2012 Fenghua Yu + *H Peter Anvin" + * + * This driver allows to early upgrade microcode on Intel processors + * belonging to IA-32 family - PentiumPro, Pentium II, + * Pentium III, Xeon, Pentium 4, etc. + * + * Reference: Section 9.11 of Volume 3, IA-32 Intel Architecture + * Software Developer's Manual. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include +#include + +struct ucode_cpu_info ucode_cpu_info_early[NR_CPUS]; +EXPORT_SYMBOL_GPL(ucode_cpu_info_early); + +static inline int __init x86_vendor(void) +{ + unsigned int eax = 0x00000000; + char x86_vendor_id[16]; + int i; + struct { + char x86_vendor_id[16]; + __u8 x86_vendor; + } cpu_vendor_table[] = { + { "GenuineIntel", X86_VENDOR_INTEL }, + { "AuthenticAMD", X86_VENDOR_AMD }, + }; + + memset(x86_vendor_id, 0, ARRAY_SIZE(x86_vendor_id)); + /* Get vendor name */ + native_cpuid(&eax, + (unsigned int *)&x86_vendor_id[0], + (unsigned int *)&x86_vendor_id[8], + (unsigned int *)&x86_vendor_id[4]); + + for (i = 0; i < ARRAY_SIZE(cpu_vendor_table); i++) { + if (!strcmp(x86_vendor_id, cpu_vendor_table[i].x86_vendor_id)) + return cpu_vendor_table[i].x86_vendor; + } + + return X86_VENDOR_UNKNOWN; +} + + +void __init load_ucode_bsp(char *real_mode_data) +{ + /* +* boot_cpu_data is not setup yet in this early phase. +* So we get vendor information directly through cpuid.
+*/ + if (x86_vendor() == X86_VENDOR_INTEL) + load_ucode_intel_bsp(real_mode_data); +} + +void __cpuinit load_ucode_ap(void) +{ + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) + load_ucode_intel_ap(); +}
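A minimal userspace sketch, not part of the patch, of why x86_vendor() stores the CPUID leaf 0 registers at offsets 0, 8 and 4: the vendor string is laid out in EBX, EDX, ECX order. The sketch takes the register values as plain arguments instead of executing CPUID, and all demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store the four bytes of v at dst, least significant byte first. */
static void demo_put_le32(char *dst, uint32_t v)
{
	int i;

	for (i = 0; i < 4; i++)
		dst[i] = (char)((v >> (8 * i)) & 0xff);
}

/*
 * Build the 12-character vendor string from CPUID leaf 0 register values.
 * EBX lands at offset 0, EDX at offset 4 and ECX at offset 8, matching the
 * pointers the patch passes to native_cpuid().
 */
static void demo_vendor_string(uint32_t ebx, uint32_t ecx, uint32_t edx,
			       char out[13])
{
	assert(out != NULL);
	demo_put_le32(out + 0, ebx);
	demo_put_le32(out + 4, edx);
	demo_put_le32(out + 8, ecx);
	out[12] = '\0';
}

/* Returns 1 if the registers spell "GenuineIntel", otherwise 0. */
static int demo_is_intel(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	char id[13];

	demo_vendor_string(ebx, ecx, edx, id);
	return strcmp(id, "GenuineIntel") == 0;
}
```

For example, EBX = 0x756e6547, EDX = 0x49656e69 and ECX = 0x6c65746e are the ASCII bytes of "Genu", "ineI" and "ntel", so demo_is_intel() accepts them.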
[tip:x86/microcode] x86/microcode_intel.h: Define functions and macros for early loading ucode
Commit-ID: 17f1087f1a80d2dfada790c31720eb6a57da2d1f Gitweb: http://git.kernel.org/tip/17f1087f1a80d2dfada790c31720eb6a57da2d1f Author: Fenghua Yu AuthorDate: Thu, 29 Nov 2012 17:47:40 -0800 Committer: H. Peter Anvin CommitDate: Fri, 30 Nov 2012 15:18:14 -0800 x86/microcode_intel.h: Define functions and macros for early loading ucode Define some functions and macros that will be used in early loading ucode. Some of them are moved from microcode_intel.c driver in order to be called in early boot phase before module can be called. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1354240068-9821-3-git-send-email-fenghua...@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/microcode_intel.h | 106 +++ arch/x86/kernel/Makefile | 3 + arch/x86/kernel/microcode_core.c | 7 +- arch/x86/kernel/microcode_intel.c | 185 ++--- 4 files changed, 120 insertions(+), 181 deletions(-) diff --git a/arch/x86/include/asm/microcode_intel.h b/arch/x86/include/asm/microcode_intel.h new file mode 100644 index 000..0544bf4 --- /dev/null +++ b/arch/x86/include/asm/microcode_intel.h @@ -0,0 +1,106 @@ +#ifndef _ASM_X86_MICROCODE_INTEL_H +#define _ASM_X86_MICROCODE_INTEL_H + +#include + +struct microcode_header_intel { + unsigned inthdrver; + unsigned intrev; + unsigned intdate; + unsigned intsig; + unsigned intcksum; + unsigned intldrver; + unsigned intpf; + unsigned intdatasize; + unsigned inttotalsize; + unsigned intreserved[3]; +}; + +struct microcode_intel { + struct microcode_header_intel hdr; + unsigned intbits[0]; +}; + +/* microcode format is extended from prescott processors */ +struct extended_signature { + unsigned intsig; + unsigned intpf; + unsigned intcksum; +}; + +struct extended_sigtable { + unsigned intcount; + unsigned intcksum; + unsigned intreserved[3]; + struct extended_signature sigs[0]; +}; + +#define DEFAULT_UCODE_DATASIZE (2000) +#define MC_HEADER_SIZE (sizeof(struct microcode_header_intel)) +#define DEFAULT_UCODE_TOTALSIZE (DEFAULT_UCODE_DATASIZE + 
MC_HEADER_SIZE) +#define EXT_HEADER_SIZE (sizeof(struct extended_sigtable)) +#define EXT_SIGNATURE_SIZE (sizeof(struct extended_signature)) +#define DWSIZE (sizeof(u32)) + +#define get_totalsize(mc) \ + (((struct microcode_intel *)mc)->hdr.totalsize ? \ +((struct microcode_intel *)mc)->hdr.totalsize : \ +DEFAULT_UCODE_TOTALSIZE) + +#define get_datasize(mc) \ + (((struct microcode_intel *)mc)->hdr.datasize ? \ +((struct microcode_intel *)mc)->hdr.datasize : DEFAULT_UCODE_DATASIZE) + +#define sigmatch(s1, s2, p1, p2) \ + (((s1) == (s2)) && (((p1) & (p2)) || (((p1) == 0) && ((p2) == 0)))) + +#define exttable_size(et) ((et)->count * EXT_SIGNATURE_SIZE + EXT_HEADER_SIZE) + +extern int +get_matching_microcode(unsigned int csig, int cpf, void *mc, int rev); +extern int microcode_sanity_check(void *mc, int print_err); +extern int get_matching_sig(unsigned int csig, int cpf, void *mc, int rev); +extern int +update_match_revision(struct microcode_header_intel *mc_header, int rev); + +#ifdef CONFIG_MICROCODE_INTEL_EARLY +extern enum ucode_state +get_matching_model_microcode(int cpu, void *data, size_t size, +struct mc_saved_data *mc_saved_data, +struct microcode_intel **mc_saved_in_initrd, +enum system_states system_state); +extern enum ucode_state +generic_load_microcode_early(int cpu, struct microcode_intel **mc_saved_p, +unsigned int mc_saved_count, +struct ucode_cpu_info *uci); +extern void __init +load_ucode_intel_bsp(char *real_mode_data); +extern void __init load_ucode_intel_ap(void); +#else +static inline enum ucode_state +get_matching_model_microcode(int cpu, void *data, size_t size, +struct mc_saved_data *mc_saved_data, +struct microcode_intel **mc_saved_in_initrd, +enum system_states system_state) +{ + return UCODE_ERROR; +} +static inline enum ucode_state +generic_load_microcode_early(int cpu, struct microcode_intel **mc_saved_p, +unsigned int mc_saved_count, +struct ucode_cpu_info *uci) +{ + return UCODE_ERROR; +} +static inline __init void +load_ucode_intel_bsp(char
*real_mode_data) +{ +} +static inline __init void +load_ucode_intel_ap(struct ucode_cpu_info *uci, + struct mc_saved_data *mc_saved_data) +{ +} +#endif + +#endif /* _ASM_X86_MICROCO
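A minimal userspace sketch, not part of the patch, of the rule encoded by the sigmatch() macro in this header: the signatures must be equal, and the platform flag masks must either share a bit or both be zero. demo_sigmatch is a hypothetical function form of the macro, and the values in the self-check are sample inputs.

```c
#include <assert.h>
#include <stdint.h>

/* s1/p1 describe one side (say the CPU), s2/p2 the other (the update). */
static int demo_sigmatch(uint32_t s1, uint32_t s2, uint32_t p1, uint32_t p2)
{
	return (s1 == s2) && ((p1 & p2) || (p1 == 0 && p2 == 0));
}

/* Worked examples: call this from main() to check the rule. */
static void demo_sigmatch_selfcheck(void)
{
	/* Same signature, platform masks overlap in bit 1. */
	assert(demo_sigmatch(0x306a9u, 0x306a9u, 0x12u, 0x02u) == 1);
	/* Same signature, but the platform masks share no bit. */
	assert(demo_sigmatch(0x306a9u, 0x306a9u, 0x10u, 0x02u) == 0);
	/* Both platform masks zero still counts as a match. */
	assert(demo_sigmatch(0x306a9u, 0x306a9u, 0u, 0u) == 1);
	/* Different signature never matches. */
	assert(demo_sigmatch(0x306a9u, 0x306a8u, 0x12u, 0x12u) == 0);
}
```

The both-zero case is what lets an update and a CPU that report no platform flags at all still match.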
[tip:x86/microcode] x86, doc: Early microcode loading
Commit-ID:  31ae1d90c127310c67618b8bd79f01c394116187
Gitweb:     http://git.kernel.org/tip/31ae1d90c127310c67618b8bd79f01c394116187
Author:     Fenghua Yu
AuthorDate: Fri, 30 Nov 2012 07:45:51 -0800
Committer:  H. Peter Anvin
CommitDate: Fri, 30 Nov 2012 15:18:14 -0800

x86, doc: Early microcode loading

Documentation for early microcode loading.

Signed-off-by: Fenghua Yu
Link: http://lkml.kernel.org/r/1354290351-20988-1-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin
---
 Documentation/x86/early-microcode.txt | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/Documentation/x86/early-microcode.txt b/Documentation/x86/early-microcode.txt
new file mode 100644
index 0000000..4aaf0df
--- /dev/null
+++ b/Documentation/x86/early-microcode.txt
@@ -0,0 +1,43 @@
+Early load microcode
+
+By Fenghua Yu
+
+Kernel can update microcode in early phase of boot time. Loading microcode early
+can fix CPU issues before they are observed during kernel boot time.
+
+Microcode is stored in an initrd file. The microcode is read from the initrd
+file and loaded to CPUs during boot time.
+
+The format of the combined initrd image is microcode in cpio format followed by
+the initrd image (maybe compressed). Kernel parses the combined initrd image
+during boot time. The microcode file in cpio name space is:
+kernel/x86/microcode/GenuineIntel.bin
+
+During BSP boot (before SMP starts), if the kernel finds the microcode file in
+the initrd file, it parses the microcode and saves matching microcode in memory.
+If matching microcode is found, it will be uploaded in BSP and later on in all
+APs.
+
+The cached microcode patch is applied when CPUs resume from a sleep state.
+
+There are two legacy user space interfaces to load microcode, either through
+/dev/cpu/microcode or through /sys/devices/system/cpu/microcode/reload file
+in sysfs.
+
+In addition to these two legacy methods, the early loading method described
+here is the third method with which microcode can be uploaded to a system's
+CPUs.
+
+The following example script shows how to generate a new combined initrd file in
+/boot/initrd-3.5.0.ucode.img with original microcode microcode.bin and
+original initrd image /boot/initrd-3.5.0.img.
+
+mkdir initrd
+cd initrd
+mkdir kernel
+mkdir kernel/x86
+mkdir kernel/x86/microcode
+cp ../microcode.bin kernel/x86/microcode/GenuineIntel.bin
+find .|cpio -oc >../ucode.cpio
+cd ..
+cat ucode.cpio /boot/initrd-3.5.0.img >/boot/initrd-3.5.0.ucode.img
[tip:x86/bsp-hotplug] x86, topology: Debug CPU0 hotplug
Commit-ID:  a71c8bc5dfefbbf80ef90739791554ef7ea4401b
Gitweb:     http://git.kernel.org/tip/a71c8bc5dfefbbf80ef90739791554ef7ea4401b
Author:     Fenghua Yu
AuthorDate: Tue, 13 Nov 2012 11:32:51 -0800
Committer:  H. Peter Anvin
CommitDate: Wed, 14 Nov 2012 15:28:11 -0800

x86, topology: Debug CPU0 hotplug

CONFIG_DEBUG_HOTPLUG_CPU0 is for debugging the CPU0 hotplug feature. The
switch offlines CPU0 as soon as possible and boots userspace up with CPU0
offlined. The user can bring CPU0 back online after boot. The default value of
the switch is off.

To debug CPU0 hotplug, you need to enable the CPU0 offline/online feature by
either turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during compilation or giving
the cpu0_hotplug kernel parameter at boot.

It is a safe and early place to take down CPU0, after all hotplug notifiers
are installed and SMP is booted.

Please note that some applications or drivers, e.g. some versions of udevd,
may put CPU0 online again during boot in this CPU0 hotplug debug mode.

In this debug mode, setup_local_APIC() may report a warning on max_loops<=0
when CPU0 is brought back online after boot. This is because a pending
interrupt in IRR cannot move to ISR. The warning is not CPU0 specific and it
can happen on other CPUs as well. It is harmless, except that the first CPU0
online takes a bit longer. So this debug mode is useful to expose this issue.
I'll send a separate patch to fix this generic warning issue.

Signed-off-by: Fenghua Yu
Link: http://lkml.kernel.org/r/1352835171-3958-15-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin
---
 arch/x86/Kconfig | 15 ++
 arch/x86/include/asm/cpu.h | 3 +++
 arch/x86/kernel/topology.c | 51 ++
 arch/x86/power/cpu.c | 38 ++
 4 files changed, 107 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 036e89a..b6cfa5f 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1727,6 +1727,21 @@ config BOOTPARAM_HOTPLUG_CPU0 You still can enable the CPU0 hotplug feature at boot by kernel parameter cpu0_hotplug.
+config DEBUG_HOTPLUG_CPU0 + def_bool n + prompt "Debug CPU0 hotplug" + depends on HOTPLUG_CPU && EXPERIMENTAL + ---help--- + Enabling this option offlines CPU0 (if CPU0 can be offlined) as + soon as possible and boots up userspace with CPU0 offlined. User + can online CPU0 back after boot time. + + To debug CPU0 hotplug, you need to enable CPU0 offline/online + feature by either turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during + compilation or giving cpu0_hotplug kernel parameter at boot. + + If unsure, say N. + config COMPAT_VDSO def_bool y prompt "Compat VDSO support" diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h index a119572..5f9a124 100644 --- a/arch/x86/include/asm/cpu.h +++ b/arch/x86/include/asm/cpu.h @@ -29,6 +29,9 @@ struct x86_cpu { extern int arch_register_cpu(int num); extern void arch_unregister_cpu(int); extern void __cpuinit start_cpu0(void); +#ifdef CONFIG_DEBUG_HOTPLUG_CPU0 +extern int _debug_hotplug_cpu(int cpu, int action); +#endif #endif DECLARE_PER_CPU(int, cpu_state); diff --git a/arch/x86/kernel/topology.c b/arch/x86/kernel/topology.c index 0e7b4a7..6e60b5f 100644 --- a/arch/x86/kernel/topology.c +++ b/arch/x86/kernel/topology.c @@ -50,6 +50,57 @@ static int __init enable_cpu0_hotplug(char *str) __setup("cpu0_hotplug", enable_cpu0_hotplug); #endif +#ifdef CONFIG_DEBUG_HOTPLUG_CPU0 +/* + * This function offlines a CPU as early as possible and allows userspace to + * boot up without the CPU. The CPU can be onlined back by user after boot. + * + * This is only called for debugging CPU offline/online feature. 
+ */ +int __ref _debug_hotplug_cpu(int cpu, int action) +{ + struct device *dev = get_cpu_device(cpu); + int ret; + + if (!cpu_is_hotpluggable(cpu)) + return -EINVAL; + + cpu_hotplug_driver_lock(); + + switch (action) { + case 0: + ret = cpu_down(cpu); + if (!ret) { + pr_info("CPU %u is now offline\n", cpu); + kobject_uevent(&dev->kobj, KOBJ_OFFLINE); + } else + pr_debug("Can't offline CPU%d.\n", cpu); + break; + case 1: + ret = cpu_up(cpu); + if (!ret) + kobject_uevent(&dev->kobj, KOBJ_ONLINE); + else + pr_debug("Can't online CPU%d.\n", cpu); + break; + default: + ret = -EINVAL; + } + + cpu_hotplug_driver_unlock(); + + return ret; +} + +static int __init debug_hotplug_cpu(void) +{ + _debug_hotplug_cpu(0, 0); + return 0; +} + +late_initcall_sync(debug_hotplug_cpu); +#endif /* CONFIG_DEBUG_HOTPLUG_CPU0 */ + int __ref arch_regis
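The commit text says the user can bring CPU0 back online after boot but does not say how. A minimal userspace sketch, assuming the usual sysfs CPU hotplug file /sys/devices/system/cpu/cpuN/online (this path is an assumption here, the patch does not name it); the demo_ functions are hypothetical helpers, and writing the file needs root and a hotpluggable CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format the sysfs path of a CPU's "online" file into buf.
 * Returns the path length, or -1 if buf is too small.
 */
static int demo_cpu_online_path(char *buf, size_t len, unsigned int cpu)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/online", cpu);

	return (n > 0 && (size_t)n < len) ? n : -1;
}

/* Write "1" (online) or "0" (offline) to the file. Returns 0 on success. */
static int demo_set_cpu_online(unsigned int cpu, int online)
{
	char path[64];
	FILE *f;
	int ok;

	if (demo_cpu_online_path(path, sizeof(path), cpu) < 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(online ? "1" : "0", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

On a kernel booted in this debug mode, demo_set_cpu_online(0, 1) would be the userspace counterpart of _debug_hotplug_cpu(0, 1).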
[tip:x86/bsp-hotplug] x86, hotplug: The first online processor saves the MTRR state
Commit-ID:  30242aa6023b71325c6b8addac06faf544a85fd0
Gitweb:     http://git.kernel.org/tip/30242aa6023b71325c6b8addac06faf544a85fd0
Author:     Fenghua Yu
AuthorDate: Tue, 13 Nov 2012 11:32:48 -0800
Committer:  H. Peter Anvin
CommitDate: Wed, 14 Nov 2012 15:28:10 -0800

x86, hotplug: The first online processor saves the MTRR state

Ask the first online CPU to save mtrr instead of asking BSP. BSP could be
offline when mtrr_save_state() is called.

Signed-off-by: Fenghua Yu
Link: http://lkml.kernel.org/r/1352835171-3958-12-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin
---
 arch/x86/kernel/cpu/mtrr/main.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 6b96110..e4c1a41 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -695,11 +695,16 @@ void mtrr_ap_init(void)
 }
 
 /**
- * Save current fixed-range MTRR state of the BSP
+ * Save current fixed-range MTRR state of the first cpu in cpu_online_mask.
  */
 void mtrr_save_state(void)
 {
-	smp_call_function_single(0, mtrr_save_fixed_ranges, NULL, 1);
+	int first_cpu;
+
+	get_online_cpus();
+	first_cpu = cpumask_first(cpu_online_mask);
+	smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
+	put_online_cpus();
 }
 
 void set_mtrr_aps_delayed_init(void)
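A minimal userspace sketch, not part of the patch, of the selection step the new mtrr_save_state() relies on: pick the lowest-numbered CPU whose bit is set in the online mask instead of hard-coding CPU 0. A single unsigned long stands in for cpu_online_mask, and the -1 return for an empty mask is a simplification of this sketch, not the kernel's cpumask_first() convention. demo_first_online is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>

/* Returns the index of the lowest set bit in online_mask, or -1 if none. */
static int demo_first_online(unsigned long online_mask)
{
	const int nbits = (int)(sizeof(online_mask) * CHAR_BIT);
	int cpu;

	for (cpu = 0; cpu < nbits; cpu++)
		if (online_mask & (1UL << cpu))
			return cpu;

	return -1;
}

/* Worked examples: call this from main() to check the selection. */
static void demo_first_online_selfcheck(void)
{
	assert(demo_first_online(0xfUL) == 0);	/* CPUs 0-3 online */
	assert(demo_first_online(0xeUL) == 1);	/* CPU0 offlined */
	assert(demo_first_online(0x8UL) == 3);	/* only CPU3 online */
	assert(demo_first_online(0UL) == -1);	/* empty mask */
}
```

The second case is the one the patch cares about: with CPU0 offlined, the request goes to CPU1 rather than to a CPU that cannot service it. The get_online_cpus()/put_online_cpus() pair in the patch keeps the mask stable while the choice is made and used, which this sketch does not model.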
[tip:x86/bsp-hotplug] x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
Commit-ID:  e1c467e69040c3be68959332959c07fb3d818e87
Gitweb:     http://git.kernel.org/tip/e1c467e69040c3be68959332959c07fb3d818e87
Author:     Fenghua Yu
AuthorDate: Wed, 14 Nov 2012 04:36:53 -0800
Committer:  H. Peter Anvin
CommitDate: Wed, 14 Nov 2012 15:28:03 -0800

x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI

On INIT, the BSP does not wait for STARTUP as an AP does; it executes the
BIOS boot-strap code, which is not the desired behavior when waking up the
BSP. To avoid the boot-strap code, wake up CPU0 by NMI instead.

This works to wake up a soft offlined CPU0 only. If CPU0 is hard offlined
(i.e. physically hot removed and then hot added), NMI won't wake it up.
We'll change this code in the future to wake up a hard offlined CPU0 if a
real platform and a request for it become available.

APs are still woken up as before, by the INIT, SIPI, SIPI sequence.

Signed-off-by: Fenghua Yu
Link: http://lkml.kernel.org/r/1352896613-25957-1-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin
---
 arch/x86/include/asm/cpu.h |   1 +
 arch/x86/kernel/smpboot.c  | 111 ++---
 2 files changed, 105 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 4564c8e..a119572 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -28,6 +28,7 @@ struct x86_cpu {
 #ifdef CONFIG_HOTPLUG_CPU
 extern int arch_register_cpu(int num);
 extern void arch_unregister_cpu(int);
+extern void __cpuinit start_cpu0(void);
 #endif

 DECLARE_PER_CPU(int, cpu_state);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c297907..ef53e66 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -138,15 +138,17 @@ static void __cpuinit smp_callin(void)
 	 * we may get here before an INIT-deassert IPI reaches
 	 * our local APIC. We have to wait for the IPI or we'll
 	 * lock up on an APIC access.
+	 *
+	 * Since CPU0 is not wakened up by INIT, it doesn't wait for the IPI.
 	 */
-	if (apic->wait_for_init_deassert)
+	cpuid = smp_processor_id();
+	if (apic->wait_for_init_deassert && cpuid != 0)
 		apic->wait_for_init_deassert(&init_deasserted);

 	/*
 	 * (This works even if the APIC is not enabled.)
 	 */
 	phys_id = read_apic_id();
-	cpuid = smp_processor_id();
 	if (cpumask_test_cpu(cpuid, cpu_callin_mask)) {
 		panic("%s: phys CPU#%d, CPU#%d already present??\n", __func__,
 					phys_id, cpuid);
@@ -228,6 +230,8 @@ static void __cpuinit smp_callin(void)
 	cpumask_set_cpu(cpuid, cpu_callin_mask);
 }

+static int cpu0_logical_apicid;
+static int enable_start_cpu0;
 /*
  * Activate a secondary processor.
  */
@@ -243,6 +247,8 @@ notrace static void __cpuinit start_secondary(void *unused)
 	preempt_disable();
 	smp_callin();

+	enable_start_cpu0 = 0;
+
 #ifdef CONFIG_X86_32
 	/* switch away from the initial page table */
 	load_cr3(swapper_pg_dir);
@@ -492,7 +498,7 @@ void __inquire_remote_apic(int apicid)
  * won't ... remember to clear down the APIC, etc later.
  */
 int __cpuinit
-wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
+wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip)
 {
 	unsigned long send_status, accept_status = 0;
 	int maxlvt;
@@ -500,7 +506,7 @@ wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
 	/* Target chip */
 	/* Boot on the stack */
 	/* Kick the second */
-	apic_icr_write(APIC_DM_NMI | apic->dest_logical, logical_apicid);
+	apic_icr_write(APIC_DM_NMI | apic->dest_logical, apicid);

 	pr_debug("Waiting for send to finish...\n");
 	send_status = safe_apic_wait_icr_idle();
@@ -660,6 +666,63 @@ static void __cpuinit announce_cpu(int cpu, int apicid)
 		node, cpu, apicid);
 }

+static int wakeup_cpu0_nmi(unsigned int cmd, struct pt_regs *regs)
+{
+	int cpu;
+
+	cpu = smp_processor_id();
+	if (cpu == 0 && !cpu_online(cpu) && enable_start_cpu0)
+		return NMI_HANDLED;
+
+	return NMI_DONE;
+}
+
+/*
+ * Wake up AP by INIT, INIT, STARTUP sequence.
+ *
+ * Instead of waiting for STARTUP after INITs, BSP will execute the BIOS
+ * boot-strap code which is not a desired behavior for waking up BSP. To
+ * void the boot-strap code, wake up CPU0 by NMI instead.
+ *
+ * This works to wake up soft offlined CPU0 only. If CPU0 is hard offlined
+ * (i.e. physically hot removed and then hot added), NMI won't wake it up.
+ * We'll change this code in the future to wake up hard offlined CPU0 if
+ * real platform and request are available.
+ */
+static int __cpuinit
+wakeup_cpu_via_init_nmi(int cpu, unsigned long start_ip, int apicid,
+			int *cpu0_nmi_registered)
+{
+	int id;
+	int boot_error;
+
+	/*
+	 * Wake
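The condition tested by wakeup_cpu0_nmi() in the patch above can be tried out in user space. What follows is a rough sketch, not kernel code: struct wake_state, the MODEL_* constants and the model_* functions are hypothetical stand-ins invented for illustration. The part of the patch that sets enable_start_cpu0 is cut off in this message, so the model simply assumes the waking side sets the flag before the NMI is sent.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Stand-ins for the kernel's NMI return codes (values are illustrative). */
enum { MODEL_NMI_DONE = 0, MODEL_NMI_HANDLED = 1 };

/* Hypothetical model state; the patch uses cpu_online() and enable_start_cpu0. */
struct wake_state {
	bool cpu_online[MODEL_NR_CPUS];
	bool enable_start_cpu0;
};

/*
 * Same test as wakeup_cpu0_nmi(): claim the NMI only when it arrives on
 * CPU0, CPU0 is offline, and a CPU0 wake-up has been requested.
 */
static int model_wakeup_cpu0_nmi(const struct wake_state *s, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	if (cpu == 0 && !s->cpu_online[cpu] && s->enable_start_cpu0)
		return MODEL_NMI_HANDLED;

	return MODEL_NMI_DONE;
}

/*
 * Simplified stand-in for start_secondary(): the patch clears
 * enable_start_cpu0 there; marking the CPU online at the same point is a
 * simplification made by this model.
 */
static void model_start_secondary(struct wake_state *s, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	s->cpu_online[cpu] = true;
	s->enable_start_cpu0 = false;
}
```

With CPU0 offline and the flag set, an NMI arriving on CPU0 is reported as handled; the same NMI on any other CPU, or on CPU0 after model_start_secondary() has run, falls through to MODEL_NMI_DONE so that other NMI handlers still get a chance to see it.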
[tip:x86/bsp-hotplug] kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback
Commit-ID:  6e32d479db6079dd5d4309aa66aecbcf2664a5fe
Gitweb:     http://git.kernel.org/tip/6e32d479db6079dd5d4309aa66aecbcf2664a5fe
Author:     Fenghua Yu
AuthorDate: Tue, 13 Nov 2012 11:32:43 -0800
Committer:  H. Peter Anvin
CommitDate: Wed, 14 Nov 2012 09:39:50 -0800

kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback

cpu_hotplug_pm_callback should have higher priority than bsp_pm_callback,
which depends on cpu_hotplug_pm_callback to disable cpu hotplug and so
avoid a race during bsp online checking.

This is to highlight the priorities between the two callbacks in case
people overlook the order.

Ideally the priorities should be defined in a macro/enum instead of as
fixed values. To do that, a separate patchset may be pushed which will
touch several other generic files and is out of scope of this patchset.

Signed-off-by: Fenghua Yu
Link: http://lkml.kernel.org/r/1352835171-3958-7-git-send-email-fenghua...@intel.com
Reviewed-by: Srivatsa S. Bhat
Acked-by: Rafael J. Wysocki
Signed-off-by: H. Peter Anvin
---
 kernel/cpu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 42bd331..a2491a2 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -601,6 +601,11 @@ cpu_hotplug_pm_callback(struct notifier_block *nb,

 static int __init cpu_hotplug_pm_sync_init(void)
 {
+	/*
+	 * cpu_hotplug_pm_callback has higher priority than x86
+	 * bsp_pm_callback which depends on cpu_hotplug_pm_callback
+	 * to disable cpu hotplug to avoid cpu hotplug race.
+	 */
 	pm_notifier(cpu_hotplug_pm_callback, 0);
 	return 0;
 }
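The ordering that the new comment relies on can be illustrated with a small user-space model of a priority-sorted callback chain. This is a sketch under stated assumptions, not the kernel's notifier implementation: struct model_notifier, struct model_log and the model_* functions are hypothetical names. The priority 0 for cpu_hotplug_pm_callback comes from the pm_notifier() call in the diff; the priority of bsp_pm_callback is not shown in this message, so any lower value used with this model is a placeholder.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_LOG_MAX 8

/* Minimal stand-in for a notifier block; not the kernel's struct. */
struct model_notifier {
	const char *name;
	int priority;
	struct model_notifier *next;
};

/* Records the order in which the chain entries were visited. */
struct model_log {
	const char *names[MODEL_LOG_MAX];
	size_t count;
};

/*
 * Insert n so that the chain stays sorted by descending priority.
 * Entries with equal priority keep their registration order.
 */
static void model_chain_register(struct model_notifier **head,
				 struct model_notifier *n)
{
	while (*head && (*head)->priority >= n->priority)
		head = &(*head)->next;

	n->next = *head;
	*head = n;
}

/* Walk the chain from highest to lowest priority, logging each entry. */
static void model_chain_call(const struct model_notifier *head,
			     struct model_log *log)
{
	for (; head; head = head->next) {
		assert(log->count < MODEL_LOG_MAX);
		log->names[log->count++] = head->name;
	}
}
```

Registering a "bsp_pm_callback" entry with a placeholder priority of -1 and then a "cpu_hotplug_pm_callback" entry with priority 0 leaves the hotplug entry at the head of the chain regardless of registration order, which is the property the comment in cpu_hotplug_pm_sync_init() asks readers not to break.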