Re: [GIT PULL] perf x86 updates for v3.20
On 02/15/2015 11:48 PM, Ingo Molnar wrote:
> Linus,
>
> Please pull the latest perf-core-for-linus git tree from:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-core-for-linus
>
>    # HEAD: a66734297f78707ce39d756b656bfae861d53f62 perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks

[...]

> The extra CR4 manipulation adds ~ <50ns to the context switch cost
> between rdpmc-capable and rdpmc-non-capable mms.

That's about the best I could benchmark, too -- if it was more than about 50ns, I'm pretty sure I would have seen a difference, but, as it stands, it seems to have been lost in the noise. Maybe I should find a better benchmark.

In any event, this series is probably a mixed bag performance-wise. In the best case, there's a small extra cost in context switches, and, when switching PCE, there's a CR4 write. On SVM guests, the CR4 write will suck.

To balance that out, I removed a CR4 read from VMX entry and from global TLB flushes. The former mostly fixes a performance regression from a security fix a few releases back, and I expect that the latter will more than offset the added context switch overhead (especially on SVM guests, where even CR4 reads exit AFAIK).

Anyway, I tried and failed to detect any difference at all. Context switch timing was very noisy for me.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
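For readers following the CR4 discussion, here is a rough user-space model of the trade-off described in this message: reads are served from a cached shadow value, and the (simulated) register is written only when the PCE bit actually changes. This is a sketch under stated assumptions, not the code from the series. The fake register, the write counter and all function names are hypothetical stand-ins; only the position of the PCE bit (bit 8 of CR4) is an architectural fact.

```c
#include <assert.h>

#define CR4_PCE (1UL << 8) /* CR4 bit 8: permits RDPMC from user mode */

/* Hypothetical stand-ins for the real register and its per-cpu shadow. */
static unsigned long fake_hw_cr4;   /* simulated hardware register */
static unsigned long shadow_cr4;    /* cached copy, cheap to read */
static unsigned int hw_cr4_writes;  /* number of simulated register writes */

/* Simulated (expensive) register write; it may trap under virtualization. */
void hw_write_cr4(unsigned long val)
{
	fake_hw_cr4 = val;
	hw_cr4_writes++;
}

/* Reads never touch the "hardware"; they come from the shadow. */
unsigned long read_cr4_shadow(void)
{
	return shadow_cr4;
}

/* Set bits, writing the register only if something changes. */
void set_cr4_bits(unsigned long mask)
{
	unsigned long cr4 = shadow_cr4;

	if ((cr4 | mask) != cr4) {
		shadow_cr4 = cr4 | mask;
		hw_write_cr4(shadow_cr4);
	}
}

/* Clear bits, writing the register only if something changes. */
void clear_cr4_bits(unsigned long mask)
{
	unsigned long cr4 = shadow_cr4;

	if ((cr4 & ~mask) != cr4) {
		shadow_cr4 = cr4 & ~mask;
		hw_write_cr4(shadow_cr4);
	}
}

/* Model of a context switch: PCE follows the incoming mm's rdpmc need. */
void switch_to_mm(int next_may_rdpmc)
{
	if (next_may_rdpmc)
		set_cr4_bits(CR4_PCE);
	else
		clear_cr4_bits(CR4_PCE);

	/* The shadow must always mirror the simulated register. */
	assert(shadow_cr4 == fake_hw_cr4);
}

unsigned int cr4_write_count(void)
{
	return hw_cr4_writes;
}
```

In this model, switching between two mms with the same rdpmc capability costs no register write at all. Only a switch across the capable/non-capable boundary pays for one, which is the case the ~50ns figure refers to.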
[GIT PULL] perf x86 updates for v3.20
Linus,

Please pull the latest perf-core-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-core-for-linus

   # HEAD: a66734297f78707ce39d756b656bfae861d53f62 perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks

( I'm sending these changes from Andy Lutomirski separately because they were based on other bits that went upstream in this cycle. )

This series tightens up RDPMC permissions: currently even highly sandboxed x86 execution environments (such as seccomp) have permission to execute RDPMC, which may leak various perf events / PMU state such as timing information and other CPU execution details.

This 'all is allowed' RDPMC mode is still preserved as the (non-default) /sys/devices/cpu/rdpmc=2 setting. The new default is that RDPMC access is only allowed if a perf event is mmap-ed (which is needed to correctly interpret RDPMC counter values in any case).

As a side effect of these changes CR4 handling is cleaned up in the x86 code and a shadow copy of the CR4 value is added.

The extra CR4 manipulation adds ~ <50ns to the context switch cost between rdpmc-capable and rdpmc-non-capable mms.

( Note: shortlog and diffstat created manually due to the somewhat unusual merge base - hopefully the result is still fine. )

 Thanks,

	Ingo

-->
Andy Lutomirski (7):
      x86: Clean up cr4 manipulation
      x86: Store a per-cpu shadow copy of CR4
      x86: Add a comment clarifying LDT context switching
      perf: Add pmu callbacks to track event mapping and unmapping
      perf: Pass the event to arch_perf_update_userpage()
      perf/x86: Only allow rdpmc if a perf_event is mapped
      perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks

Ingo Molnar (1):
      Merge branch 'x86/asm' into perf/x86, to avoid conflicts with upcoming patches

 arch/x86/include/asm/mmu.h           |  2 ++
 arch/x86/include/asm/mmu_context.h   | 33 +-
 arch/x86/include/asm/paravirt.h      |  6 ++---
 arch/x86/include/asm/processor.h     | 33 --
 arch/x86/include/asm/special_insns.h |  6 ++---
 arch/x86/include/asm/tlbflush.h      | 77 ++--
 arch/x86/include/asm/virtext.h       |  5 ++--
 arch/x86/kernel/acpi/sleep.c         |  2 +-
 arch/x86/kernel/cpu/common.c         | 17 ++
 arch/x86/kernel/cpu/mcheck/mce.c     |  3 ++-
 arch/x86/kernel/cpu/mcheck/p5.c      |  3 ++-
 arch/x86/kernel/cpu/mcheck/winchip.c |  3 ++-
 arch/x86/kernel/cpu/mtrr/cyrix.c     |  6 ++---
 arch/x86/kernel/cpu/mtrr/generic.c   |  6 ++---
 arch/x86/kernel/cpu/perf_event.c     | 76 +--
 arch/x86/kernel/cpu/perf_event.h     |  2 ++
 arch/x86/kernel/head32.c             |  1 +
 arch/x86/kernel/head64.c             |  2 ++
 arch/x86/kernel/i387.c               |  3 ++-
 arch/x86/kernel/process.c            |  5 ++--
 arch/x86/kernel/process_32.c         |  2 +-
 arch/x86/kernel/process_64.c         |  2 +-
 arch/x86/kernel/setup.c              |  2 +-
 arch/x86/kernel/xsave.c              |  3 ++-
 arch/x86/kvm/svm.c                   |  2 +-
 arch/x86/kvm/vmx.c                   | 10
 arch/x86/mm/fault.c                  |  2 +-
 arch/x86/mm/init.c                   | 13 --
 arch/x86/mm/tlb.c                    |  3 ---
 arch/x86/power/cpu.c                 | 11 -
 arch/x86/realmode/init.c             |  2 +-
 arch/x86/xen/enlighten.c             |  4 ++--
 drivers/lguest/x86/core.c            |  5 ++--
 include/linux/perf_event.h           |  7 ++
 kernel/events/core.c                 | 14 +--
 35 files changed, 253 insertions(+), 120 deletions(-)
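As an illustration of the /sys/devices/cpu/rdpmc knob described in the pull request, the following minimal sketch reads the setting and maps it to a description. The function names are hypothetical. The pull request itself only spells out that 2 means "all tasks" and that the new default restricts RDPMC to tasks with an mmap-ed perf event; treating the default as value 1 and 0 as "disabled" is an assumption based on my reading of the sysfs handling, so verify it against the kernel you run.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Describe an rdpmc mode value.
 * Assumption: 0 and 1 are not spelled out in the pull request; only 2 is.
 */
const char *rdpmc_mode_name(int mode)
{
	switch (mode) {
	case 0:
		return "rdpmc disabled for user space";
	case 1:
		return "rdpmc allowed only while a perf event is mmap-ed";
	case 2:
		return "rdpmc allowed for all tasks";
	default:
		return "unknown";
	}
}

/*
 * Read the mode from a sysfs-style file (normally /sys/devices/cpu/rdpmc).
 * Returns the parsed integer, or -1 if the file cannot be opened or parsed.
 */
int read_rdpmc_mode(const char *path)
{
	FILE *f;
	int mode;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (fscanf(f, "%d", &mode) != 1)
		mode = -1;

	fclose(f);
	return mode;
}
```

A caller would pass "/sys/devices/cpu/rdpmc" to read_rdpmc_mode() and print rdpmc_mode_name() of the result. Changing the setting means writing the new value to the same file as root.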