On 10/09, Manfred Spraul wrote: >Hi, > >On 10/09/2016 09:05 AM, kernel test robot wrote: >>FYI, we noticed a -8.8% regression of aim9.shared_memory.ops_per_sec due to >>commit: >> >>commit 0882cba0a03bca73acd8fab8fb50db04691908e9 ("ipc/sem.c: fix >>complex_count vs. simple op race") >>https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master >> >>in testcase: aim9 >>on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G >>memory >>with following parameters: >> >> testtime: 300s >> test: shared_memory >> cpufreq_governor: performance >> >>Suite IX is the "AIM Independent Resource Benchmark:" the famous synthetic >>benchmark. >> >> >It is probably caused by this change: >>--- ipc/sem-fast-but-wrong.c 2016-10-09 19:24:47.914825410 +0200 >>+++ ipc/sem.c 2016-10-09 19:24:57.960841540 +0200 >>@@ -363,6 +363,14 @@ >> */ >> spin_lock(&sem->lock); >> >>+ /* >>+ * See 51d7d5205d33 >>+ * ("powerpc: Add smp_mb() to arch_spin_is_locked()"): >>+ * A full barrier is required: the write of sem->lock >>+ * must be visible before the read is executed >>+ */ >>+ smp_mb(); >>+ >> if (!smp_load_acquire(&sma->complex_mode)) { >> /* fast path successful! */ >> return sops->sem_num; >Unfortunately, we need it, at least for powerpc. >And I do not want to add a CONFIG_PPC into ipc/sem.c. > >Is it possible to do a test what happens with patch that avoids >spin_unlock_wait()? > >https://patchwork.kernel.org/patch/9359365/ >https://patchwork.kernel.org/patch/9359371/ > >(if possible, with both patches together)
I applied these two patches on top of 0882cba ("ipc/sem.c: fix complex_count vs. simple op race"), seems they do help recover the performance back, here is the comparison: ========================================================================================= compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime: gcc-6/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-ivb-d03/shared_memory/aim9/300s commit: bfe738a6c274df47650d09360babc8c1b71ff2a8 0882cba0a03bca73acd8fab8fb50db04691908e9 cc5f1c2b5da85fdf3de370189765af88c8df7b55 ("ipc/sem: Add hysteresis") bfe738a6c274df47 0882cba0a03bca73acd8fab8fb cc5f1c2b5da85fdf3de3701897 ---------------- -------------------------- -------------------------- fail:runs %reproduction fail:runs %reproduction fail:runs | | | | | %stddev %change %stddev %change %stddev \ | \ | \ 1377511 ± 0% -8.8% 1256420 ± 0% -0.5% 1370192 ± 0% aim9.shared_memory.ops_per_sec 4133531 ± 0% -8.8% 3770259 ± 0% -0.5% 4111578 ± 0% aim9.time.minor_page_faults 8106 ± 6% -7.7% 7483 ± 8% +24.5% 10092 ± 7% cpuidle.POLL.usage 9.66 ± 0% -2.0% 9.46 ± 0% +0.5% 9.71 ± 0% turbostat.CorWatt 2374 ± 5% -12.4% 2078 ± 1% -5.5% 2243 ± 5% slabinfo.kmalloc-256.active_objs 697.75 ± 4% +33.5% 931.25 ± 5% +43.5% 1001 ± 4% slabinfo.kmalloc-512.active_objs 719.50 ± 3% +32.3% 951.75 ± 5% +41.8% 1020 ± 4% slabinfo.kmalloc-512.num_objs 5184 ± 2% -12.3% 4544 ± 8% -5.9% 4877 ± 8% slabinfo.kmalloc-64.active_objs 5184 ± 2% -12.3% 4544 ± 8% -5.9% 4877 ± 8% slabinfo.kmalloc-64.num_objs 2.124e+09 ± 3% +0.9% 2.144e+09 ± 9% +14.8% 2.439e+09 ± 9% perf-stat.cache-references 10295 ± 6% -8.9% 9376 ± 1% +6.0% 10912 ± 7% perf-stat.cpu-migrations 0.04 ± 24% +59.5% 0.06 ± 51% +508.5% 0.24 ±123% perf-stat.dTLB-load-miss-rate% 5.346e+11 ± 3% -13.1% 4.647e+11 ± 7% +2.9% 5.501e+11 ± 8% perf-stat.dTLB-loads 1.61 ± 1% -9.2% 1.46 ± 1% -0.6% 1.60 ± 1% perf-stat.ipc 4417053 ± 0% -8.2% 4053763 ± 0% -0.5% 4394973 ± 0% perf-stat.minor-faults 4417054 ± 0% -8.2% 4053763 ± 0% -0.5% 4394973 ± 0% perf-stat.page-faults 4478439 ±144% +83.7% 8227939 ± 90% -87.4% 564387 ± 21% sched_debug.cfs_rq:/.load.max 1912666 ±145% +84.7% 3531796 ± 91% -87.7% 235296 ± 19% sched_debug.cfs_rq:/.load.stddev 755.19 ± 79% +4.3% 787.48 ± 63% -55.2% 338.70 ± 15% sched_debug.cfs_rq:/.load_avg.avg 186.08 ± 9% -10.1% 167.38 ± 15% -39.9% 111.77 ± 35% sched_debug.cfs_rq:/.load_avg.min 86.83 ± 22% -8.6% 79.38 ± 31% -38.0% 53.83 ± 27% sched_debug.cfs_rq:/.util_avg.min 222488 ± 2% +0.2% 222984 ± 6% -22.8% 171691 ± 15% sched_debug.cpu.avg_idle.stddev 62.58 ± 56% -18.7% 50.90 ± 41% -93.8% 3.86 ±168% sched_debug.cpu.clock.stddev 62.58 ± 56% -18.7% 50.90 ± 41% -93.8% 3.86 ±168% sched_debug.cpu.clock_task.stddev 4500146 ±144% +82.7% 8220704 ± 92% -87.5% 564387 ± 21% sched_debug.cpu.load.max 1920553 ±145% +83.6% 3525404 ± 92% -87.7% 235296 ± 19% sched_debug.cpu.load.stddev 1.52 ± 11% -26.7% 1.11 ± 33% -24.9% 1.14 ± 18% sched_debug.cpu.nr_running.avg 18404 ± 42% +39.9% 25755 ± 11% +66.9% 30725 ± 7% sched_debug.cpu.ttwu_count.min 7.90 ± 26% -30.4% 5.50 ± 37% -13.0% 6.88 ± 45% sched_debug.rt_rq:/.rt_time.avg 31.58 ± 26% -30.4% 21.97 ± 37% -13.2% 27.42 ± 45% sched_debug.rt_rq:/.rt_time.max 13.67 ± 26% -30.5% 9.51 ± 37% -13.2% 11.86 ± 45% sched_debug.rt_rq:/.rt_time.stddev 3.17 ± 9% -26.7% 2.32 ± 13% +2.0% 3.23 ± 12% perf-profile.calltrace.cycles-pp._raw_spin_lock.sys_semop.entry_SYSCALL_64_fastpath 1.16 ± 4% -15.9% 0.98 ± 14% +1.0% 1.17 ± 11% perf-profile.calltrace.cycles-pp.do_munmap.sys_shmdt.entry_SYSCALL_64_fastpath 5.25 ± 8% -15.7% 4.43 ± 4% -2.3% 5.13 ± 15% perf-profile.calltrace.cycles-pp.do_smart_update.SYSC_semtimedop.sys_semop.entry_SYSCALL_64_fastpath 1.50 ± 5% +68.6% 2.53 ± 10% -36.2% 0.96 ± 28% perf-profile.calltrace.cycles-pp.pid_vnr.SYSC_semtimedop.sys_semop.entry_SYSCALL_64_fastpath 1.24 ± 4% -15.3% 1.05 ± 12% +0.6% 1.25 ± 11% perf-profile.calltrace.cycles-pp.sys_shmdt.entry_SYSCALL_64_fastpath 0.99 ± 12% -20.5% 0.79 ± 6% -5.7% 0.93 ± 17% perf-profile.calltrace.cycles-pp.wake_const_ops.do_smart_wakeup_zero.do_smart_update.SYSC_semtimedop.sys_semop 3.45 ± 9% -24.8% 2.60 ± 12% +1.7% 3.51 ± 13% perf-profile.children.cycles-pp._raw_spin_lock 1.18 ± 4% -15.7% 0.99 ± 14% +0.9% 1.19 ± 12% perf-profile.children.cycles-pp.do_munmap 5.61 ± 8% -15.4% 4.75 ± 3% -3.7% 5.40 ± 15% perf-profile.children.cycles-pp.do_smart_update 1.78 ± 5% +56.7% 2.80 ± 11% -29.1% 1.27 ± 22% perf-profile.children.cycles-pp.pid_vnr 1.24 ± 4% -15.3% 1.05 ± 12% +0.8% 1.25 ± 11% perf-profile.children.cycles-pp.sys_shmdt 1.24 ± 13% -24.7% 0.93 ± 8% -12.7% 1.08 ± 16% perf-profile.children.cycles-pp.wake_const_ops 9.86 ± 8% +47.0% 14.49 ± 9% -8.5% 9.02 ± 13% perf-profile.self.cycles-pp.SYSC_semtimedop 3.45 ± 9% -25.2% 2.58 ± 13% +1.9% 3.51 ± 13% perf-profile.self.cycles-pp._raw_spin_lock 2.26 ± 9% -26.8% 1.65 ± 7% -11.0% 2.01 ± 18% perf-profile.self.cycles-pp.do_smart_update 1.78 ± 5% +56.7% 2.80 ± 11% -29.1% 1.27 ± 22% perf-profile.self.cycles-pp.pid_vnr 1.03 ± 2% -15.3% 0.87 ± 8% -8.7% 0.94 ± 25% perf-profile.self.cycles-pp.security_ipc_permission 1.32 ± 5% -4.0% 1.27 ± 14% +14.8% 1.51 ± 8% perf-profile.self.cycles-pp.security_sem_semop 1.24 ± 13% -24.7% 0.93 ± 8% -12.7% 1.08 ± 16% perf-profile.self.cycles-pp.wake_const_ops Thanks, Xiaolong > >Thanks, > Manfred