Re: [linus:master] [locking] c8afaa1b0f: stress-ng.zero.ops_per_sec 6.3% improvement

2023-08-15 Thread Mateusz Guzik
On 8/15/23, Linus Torvalds  wrote:
> On Tue, 15 Aug 2023 at 07:12, kernel test robot 
> wrote:
>>
>> kernel test robot noticed a 6.3% improvement of stress-ng.zero.ops_per_sec
>> on:
>
> WTF? That's ridiculous. Why would that even test new_inode() at all?
> And why would it make any difference anyway to prefetch a new inode?
> The 'zero' test claims to just read /dev/zero in a loop...
>
> [ Goes looking ]
>

Yeah, I was puzzled myself, but I just figured it out and was about to respond ;)

# bpftrace -e 'kprobe:new_inode { @[kstack()] = count(); }'
Attaching 1 probe...

@[
new_inode+1
shmem_get_inode+137
__shmem_file_setup+195
shmem_zero_setup+46
mmap_region+1937
do_mmap+956
vm_mmap_pgoff+224
do_syscall_64+46
entry_SYSCALL_64_after_hwframe+115
]: 2689570

The bench is doing this *A LOT*, and it looks fishy both for the bench
itself and for the kernel handling it this way, but I'm not going to dig
into any of that.

>>     39.35            -0.3       39.09        perf-profile.calltrace.cycles-pp.inode_sb_list_add.new_inode.shmem_get_inode.__shmem_file_setup.shmem_zero_setup
>
> Ahh. It also does the mmap side, and the shared case ends up always
> creating a new inode.
>
> And while the test only tests *reading* and the mmap is read-only, the
> /dev/zero file descriptor was opened for writing too, for a different
> part of a test.
>
> So even though the mapping is never written to, MAYWRITE is set, and
> so the /dev/zero mapping is done as a shared memory mapping and we
> can't do it as just a private one.
>
> That's kind of stupid and looks unintentional, but whatever.
>
> End result: that benchmark ends up being at least partly (and a fairly
> noticeable part) a shmem setup benchmark, for no actual good reason.
>
> Oh well. I certainly don't mind that the removal apparently also helps
> some odd benchmark case, but I don't think this translates to anything
> real. Very random.
>
> Linus
>


-- 
Mateusz Guzik 


Re: [linus:master] [locking] c8afaa1b0f: stress-ng.zero.ops_per_sec 6.3% improvement

2023-08-15 Thread Linus Torvalds
On Tue, 15 Aug 2023 at 07:12, kernel test robot  wrote:
>
> kernel test robot noticed a 6.3% improvement of stress-ng.zero.ops_per_sec on:

WTF? That's ridiculous. Why would that even test new_inode() at all?
And why would it make any difference anyway to prefetch a new inode?
The 'zero' test claims to just read /dev/zero in a loop...

[ Goes looking ]

>     39.35            -0.3       39.09        perf-profile.calltrace.cycles-pp.inode_sb_list_add.new_inode.shmem_get_inode.__shmem_file_setup.shmem_zero_setup

Ahh. It also does the mmap side, and the shared case ends up always
creating a new inode.

And while the test only tests *reading* and the mmap is read-only, the
/dev/zero file descriptor was opened for writing too, for a different
part of a test.

So even though the mapping is never written to, MAYWRITE is set, and
so the /dev/zero mapping is done as a shared memory mapping and we
can't do it as just a private one.
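To make that concrete, here is a minimal userspace sketch of the pattern
being described (an illustration, not code taken from stress-ng; the file
name of the source and the iteration count are just for the example): the
fd is opened for writing as well, the mapping itself is read-only but
MAP_SHARED, so every mmap() goes through shmem_zero_setup() and sets up a
fresh shmem inode.

/* mmap-zero-shared.c -- illustrative only, not the stress-ng source */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        /* opened for writing too, as in the test... */
        int fd = open("/dev/zero", O_RDWR);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        for (int i = 0; i < 100000; i++) {
                /*
                 * ...so even this read-only mapping keeps MAYWRITE and
                 * stays a shared mapping, which for /dev/zero means
                 * shmem_zero_setup() and a new shmem inode every time.
                 */
                char *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
                if (p == MAP_FAILED) {
                        perror("mmap");
                        break;
                }
                (void)p[0];             /* read only, never written */
                munmap(p, 4096);
        }
        close(fd);
        return 0;
}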

That's kind of stupid and looks unintentional, but whatever.

End result: that benchmark ends up being at least partly (and a fairly
noticeable part) a shmem setup benchmark, for no actual good reason.

Oh well. I certainly don't mind that the removal apparently also helps
some odd benchmark case, but I don't think this translates to anything
real. Very random.

Linus


[linus:master] [locking] c8afaa1b0f: stress-ng.zero.ops_per_sec 6.3% improvement

2023-08-15 Thread kernel test robot



Hello,

kernel test robot noticed a 6.3% improvement of stress-ng.zero.ops_per_sec on:


commit: c8afaa1b0f8bc93d013ab2ea6b9649958af3f1d3 ("locking: remove spin_lock_prefetch")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
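
For context on what the commit actually changes: the last user of
spin_lock_prefetch() was new_inode() in fs/inode.c, which is also the
function showing up in the profile below. The snippet here is a rough,
from-memory reconstruction of that call site before the change, not a
verbatim quote of the tree; it is only meant to show which prefetch the
commit drops.

/* fs/inode.c, before c8afaa1b0f (reconstructed sketch, not a verbatim diff) */
struct inode *new_inode(struct super_block *sb)
{
        struct inode *inode;

        /*
         * The line the commit removes: prefetch the lock that
         * inode_sb_list_add() will take just below.
         */
        spin_lock_prefetch(&sb->s_inode_list_lock);

        inode = new_inode_pseudo(sb);
        if (inode)
                inode_sb_list_add(inode);
        return inode;
}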

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

nr_threads: 100%
testtime: 60s
class: memory
test: zero
cpufreq_governor: performance






Details are as below:


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230815/202308151426.97be5bd8-oliver.s...@intel.com

=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  memory/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp8/zero/stress-ng/60s

commit: 
  3feecb1b84 ("Merge tag 'char-misc-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc")
  c8afaa1b0f ("locking: remove spin_lock_prefetch")

3feecb1b848359b1 c8afaa1b0f8bc93d013ab2ea6b9
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     20.98 ±  8%     +12.7%      23.65 ±  4%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
     21.05 ±  8%    +803.4%     190.14 ±196%  perf-sched.total_sch_delay.max.ms
     46437            +2.4%      47564        stress-ng.time.involuntary_context_switches
  87942414            +6.3%   93441484        stress-ng.time.minor_page_faults
  21983137            +6.3%   23357886        stress-ng.zero.ops
    366380            +6.3%     389295        stress-ng.zero.ops_per_sec
    100683            +4.1%     104861 ±  2%  proc-vmstat.nr_shmem
  60215587            +6.2%   63957836        proc-vmstat.numa_hit
  60148996            +6.2%   63889951        proc-vmstat.numa_local
  22046746            +6.2%   23421583        proc-vmstat.pgactivate
  83092777            +6.3%   88309102        proc-vmstat.pgalloc_normal
  88854159            +6.1%   94276960        proc-vmstat.pgfault
  82294936            +6.3%   87489838        proc-vmstat.pgfree
  21970411            +6.3%   23344438        proc-vmstat.unevictable_pgs_culled
  21970116            +6.3%   23344165        proc-vmstat.unevictable_pgs_mlocked
  21970115            +6.3%   23344164        proc-vmstat.unevictable_pgs_munlocked
  21970113            +6.3%   23344161        proc-vmstat.unevictable_pgs_rescued
 1.455e+10            +4.2%  1.517e+10        perf-stat.i.branch-instructions
  58358654            +5.0%   61304729        perf-stat.i.branch-misses
  1.12e+08            +5.2%  1.179e+08        perf-stat.i.cache-misses
 2.569e+08            +5.1%  2.698e+08        perf-stat.i.cache-references
      3.32            -4.4%       3.17        perf-stat.i.cpi
      2031 ±  2%      -5.0%       1930 ±  2%  perf-stat.i.cycles-between-cache-misses
 1.603e+10            +4.4%  1.674e+10        perf-stat.i.dTLB-loads
 7.449e+09            +6.1%  7.901e+09        perf-stat.i.dTLB-stores
  6.52e+10            +4.4%  6.807e+10        perf-stat.i.instructions
      0.31            +5.7%       0.33 ±  3%  perf-stat.i.ipc
    825.05            +4.8%     864.24        perf-stat.i.metric.K/sec
    598.07            +4.7%     626.06        perf-stat.i.metric.M/sec
  12910790            +4.3%   13471810        perf-stat.i.node-load-misses
   7901301 ±  2%      +5.7%    8348185        perf-stat.i.node-loads
  21890957 ±  3%      +6.9%   23410670 ±  2%  perf-stat.i.node-stores
      3.38            -4.3%       3.23        perf-stat.overall.cpi
      1964            -5.1%       1864        perf-stat.overall.cycles-between-cache-misses
      0.30            +4.5%       0.31        perf-stat.overall.ipc
 1.431e+10            +4.3%  1.493e+10        perf-stat.ps.branch-instructions
  57370846            +5.0%   60264193        perf-stat.ps.branch-misses
 1.103e+08            +5.3%   1.16e+08        perf-stat.ps.cache-misses
 2.528e+08            +5.1%  2.657e+08        perf-stat.ps.cache-references
 1.577e+10            +4.5%  1.647e+10        perf-stat.ps.dTLB-loads
  7.33e+09            +6.1%  7.776e+09        perf-stat.ps.dTLB-stores
 6.415e+10            +4.4%  6.699e+10        perf-stat.ps.instructions
  12704753            +4.4%   13259951        perf-stat.ps.node-load-misses
   7778242 ±  2%      +5.7%    8224062        perf-stat.ps.node-loads
  21539559 ±  3%      +7.0%   23044455 ±  2%  perf-stat.ps.node-stores
 4.005e+12            +5.0%  4.205e+12        perf-stat.total.instructions
     38.85            -0.8       38.07