Re: [linus:master] [locking] c8afaa1b0f: stress-ng.zero.ops_per_sec 6.3% improvement
On 8/15/23, Linus Torvalds wrote:
> On Tue, 15 Aug 2023 at 07:12, kernel test robot wrote:
>>
>> kernel test robot noticed a 6.3% improvement of stress-ng.zero.ops_per_sec
>> on:
>
> WTF? That's ridiculous. Why would that even test new_inode() at all?
> And why would it make any difference anyway to prefetch a new inode?
>
> The 'zero' test claims to just read /dev/zero in a loop...
>
> [ Goes looking ]

Ye man, I was puzzled myself but just figured it out and was about to
respond ;)

# bpftrace -e 'kprobe:new_inode { @[kstack()] = count(); }'
Attaching 1 probe...

@[
    new_inode+1
    shmem_get_inode+137
    __shmem_file_setup+195
    shmem_zero_setup+46
    mmap_region+1937
    do_mmap+956
    vm_mmap_pgoff+224
    do_syscall_64+46
    entry_SYSCALL_64_after_hwframe+115
]: 2689570

The bench is doing this *A LOT*, and it looks fishy both for the bench
itself and for the kernel doing it, but I'm not going to dig into any
of that.

>>      39.35            -0.3       39.09        perf-profile.calltrace.cycles-pp.inode_sb_list_add.new_inode.shmem_get_inode.__shmem_file_setup.shmem_zero_setup
>
> Ahh. It also does the mmap side, and the shared case ends up always
> creating a new inode.
>
> And while the test only tests *reading* and the mmap is read-only, the
> /dev/zero file descriptor was opened for writing too, for a different
> part of a test.
>
> So even though the mapping is never written to, MAYWRITE is set, and
> so the /dev/zero mapping is done as a shared memory mapping and we
> can't do it as just a private one.
>
> That's kind of stupid and looks unintentional, but whatever.
>
> End result: that benchmark ends up being at least partly (and a fairly
> noticeable part) a shmem setup benchmark, for no actual good reason.
>
> Oh well. I certainly don't mind the removal apparently then also
> helping some odd benchmark case, but I don't think this translates to
> anything real. Very random.
>
>              Linus

-- 
Mateusz Guzik
Re: [linus:master] [locking] c8afaa1b0f: stress-ng.zero.ops_per_sec 6.3% improvement
On Tue, 15 Aug 2023 at 07:12, kernel test robot wrote:
>
> kernel test robot noticed a 6.3% improvement of stress-ng.zero.ops_per_sec
> on:

WTF? That's ridiculous. Why would that even test new_inode() at all?
And why would it make any difference anyway to prefetch a new inode?

The 'zero' test claims to just read /dev/zero in a loop...

[ Goes looking ]

>      39.35            -0.3       39.09        perf-profile.calltrace.cycles-pp.inode_sb_list_add.new_inode.shmem_get_inode.__shmem_file_setup.shmem_zero_setup

Ahh. It also does the mmap side, and the shared case ends up always
creating a new inode.

And while the test only tests *reading* and the mmap is read-only, the
/dev/zero file descriptor was opened for writing too, for a different
part of a test.

So even though the mapping is never written to, MAYWRITE is set, and
so the /dev/zero mapping is done as a shared memory mapping and we
can't do it as just a private one.

That's kind of stupid and looks unintentional, but whatever.

End result: that benchmark ends up being at least partly (and a fairly
noticeable part) a shmem setup benchmark, for no actual good reason.

Oh well. I certainly don't mind the removal apparently then also
helping some odd benchmark case, but I don't think this translates to
anything real. Very random.

             Linus
[linus:master] [locking] c8afaa1b0f: stress-ng.zero.ops_per_sec 6.3% improvement
Hello,

kernel test robot noticed a 6.3% improvement of stress-ng.zero.ops_per_sec on:

commit: c8afaa1b0f8bc93d013ab2ea6b9649958af3f1d3 ("locking: remove spin_lock_prefetch")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	class: memory
	test: zero
	cpufreq_governor: performance

Details are as below:
-->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230815/202308151426.97be5bd8-oliver.s...@intel.com

=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  memory/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp8/zero/stress-ng/60s

commit:
  3feecb1b84 ("Merge tag 'char-misc-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc")
  c8afaa1b0f ("locking: remove spin_lock_prefetch")

3feecb1b848359b1 c8afaa1b0f8bc93d013ab2ea6b9
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     20.98 ±  8%     +12.7%      23.65 ±  4%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
     21.05 ±  8%    +803.4%     190.14 ±196%  perf-sched.total_sch_delay.max.ms
     46437            +2.4%      47564        stress-ng.time.involuntary_context_switches
  87942414            +6.3%   93441484        stress-ng.time.minor_page_faults
  21983137            +6.3%   23357886        stress-ng.zero.ops
    366380            +6.3%     389295        stress-ng.zero.ops_per_sec
    100683            +4.1%     104861 ±  2%  proc-vmstat.nr_shmem
  60215587            +6.2%   63957836        proc-vmstat.numa_hit
  60148996            +6.2%   63889951        proc-vmstat.numa_local
  22046746            +6.2%   23421583        proc-vmstat.pgactivate
  83092777            +6.3%   88309102        proc-vmstat.pgalloc_normal
  88854159            +6.1%   94276960        proc-vmstat.pgfault
  82294936            +6.3%   87489838        proc-vmstat.pgfree
  21970411            +6.3%   23344438        proc-vmstat.unevictable_pgs_culled
  21970116            +6.3%   23344165        proc-vmstat.unevictable_pgs_mlocked
  21970115            +6.3%   23344164        proc-vmstat.unevictable_pgs_munlocked
  21970113            +6.3%   23344161        proc-vmstat.unevictable_pgs_rescued
 1.455e+10            +4.2%  1.517e+10        perf-stat.i.branch-instructions
  58358654            +5.0%   61304729        perf-stat.i.branch-misses
  1.12e+08            +5.2%  1.179e+08        perf-stat.i.cache-misses
 2.569e+08            +5.1%  2.698e+08        perf-stat.i.cache-references
      3.32            -4.4%       3.17        perf-stat.i.cpi
      2031 ±  2%      -5.0%       1930 ±  2%  perf-stat.i.cycles-between-cache-misses
 1.603e+10            +4.4%  1.674e+10        perf-stat.i.dTLB-loads
 7.449e+09            +6.1%  7.901e+09        perf-stat.i.dTLB-stores
  6.52e+10            +4.4%  6.807e+10        perf-stat.i.instructions
      0.31            +5.7%       0.33 ±  3%  perf-stat.i.ipc
    825.05            +4.8%     864.24        perf-stat.i.metric.K/sec
    598.07            +4.7%     626.06        perf-stat.i.metric.M/sec
  12910790            +4.3%   13471810        perf-stat.i.node-load-misses
   7901301 ±  2%      +5.7%    8348185        perf-stat.i.node-loads
  21890957 ±  3%      +6.9%   23410670 ±  2%  perf-stat.i.node-stores
      3.38            -4.3%       3.23        perf-stat.overall.cpi
      1964            -5.1%       1864        perf-stat.overall.cycles-between-cache-misses
      0.30            +4.5%       0.31        perf-stat.overall.ipc
 1.431e+10            +4.3%  1.493e+10        perf-stat.ps.branch-instructions
  57370846            +5.0%   60264193        perf-stat.ps.branch-misses
 1.103e+08            +5.3%   1.16e+08        perf-stat.ps.cache-misses
 2.528e+08            +5.1%  2.657e+08        perf-stat.ps.cache-references
 1.577e+10            +4.5%  1.647e+10        perf-stat.ps.dTLB-loads
  7.33e+09            +6.1%  7.776e+09        perf-stat.ps.dTLB-stores
 6.415e+10            +4.4%  6.699e+10        perf-stat.ps.instructions
  12704753            +4.4%   13259951        perf-stat.ps.node-load-misses
   7778242 ±  2%      +5.7%    8224062        perf-stat.ps.node-loads
  21539559 ±  3%      +7.0%   23044455 ±  2%  perf-stat.ps.node-stores
 4.005e+12            +5.0%  4.205e+12        perf-stat.total.instructions
     38.85            -0.8       38.07