Hi Peter,

On 2020/12/15 0:48, Peter Zijlstra wrote:
> Hai, here them patches Mel asked for. They've not (yet) been through the
> robots, so there might be some build fail for configs I've not used.
>
> Benchmark time :-)
>
Here is the data from my side. The benchmarks were run on an x86 4-socket
system with 24 cores per socket and 2 hyperthreads per core, 192 CPUs in
total.

uperf throughput: netperf workload, tcp_nodelay, r/w size = 90

  threads   baseline-avg   %std    patch-avg   %std
  96        1              0.78    1.0072      1.09
  144       1              0.58    1.0204      0.83
  192       1              0.66    1.0151      0.52
  240       1              2.08    0.8990      0.75

hackbench: process mode, 25600 loops, 40 file descriptors per group

  group     baseline-avg   %std    patch-avg   %std
  2(80)     1              10.02   1.0339      9.94
  3(120)    1              6.69    1.0049      6.92
  4(160)    1              6.76    0.8663      8.74
  5(200)    1              2.96    0.9651      4.28

schbench: 99th percentile latency, 16 workers per message thread

  mthread   baseline-avg   %std    patch-avg   %std
  6(96)     1              0.88    1.0055      0.81
  9(144)    1              0.59    1.0007      0.37
  12(192)   1              0.61    0.9973      0.82
  15(240)   1              25.05   0.9251      18.36

sysbench mysql throughput: read/write, table size = 10,000,000

  thread    baseline-avg   %std    patch-avg   %std
  96        1              6.62    0.9668      4.04
  144       1              9.29    0.9579      6.53
  192       1              9.52    0.9503      5.35
  240       1              8.55    0.9657      3.34

It looks like
- hackbench has a significant improvement at 4 groups
- uperf has a significant regression at 240 threads

Please let me know if there are any cases of interest you would like me
to run or rerun.

Thanks,
-Aubrey