STINNER Victor added the comment:

Stefan: "In my experience it is very hard to get stable benchmark results with Python. Even long running benchmarks on an empty machine vary: (...)"
tl;dr: We *can* tune the Linux kernel to avoid most of the system noise when running benchmarks.

I modified Stefan's telco.py to remove all I/O from the hot code: the benchmark is now really CPU-bound. I also modified telco.py to run the benchmark 5 times. One run takes around 2.6 seconds.

I also added the following line to check the CPU affinity and the number of context switches:

    os.system("grep -E -i 'cpu|ctx' /proc/%s/status" % os.getpid())

See the attached telco_haypo.py for the full script.

I used my system_load.py script to get a system load >= 5.0. Without taskset, the benchmark result changes completely: each run takes at least 5 seconds. That is not really surprising: benchmarks are known to depend on the system load.

*BUT* I have a great kernel called Linux which has cool features called "CPU isolation" and "no HZ" (tickless kernel). On my Fedora 23, the kernel is compiled with CONFIG_NO_HZ=y and CONFIG_NO_HZ_FULL=y.

haypo@smithers$ lscpu --extended
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ    MINMHZ
0   0    0      0    0:0:0:0       oui    5900,0000 1600,0000
1   0    0      1    1:1:1:0       oui    5900,0000 1600,0000
2   0    0      2    2:2:2:0       oui    5900,0000 1600,0000
3   0    0      3    3:3:3:0       oui    5900,0000 1600,0000
4   0    0      0    0:0:0:0       oui    5900,0000 1600,0000
5   0    0      1    1:1:1:0       oui    5900,0000 1600,0000
6   0    0      2    2:2:2:0       oui    5900,0000 1600,0000
7   0    0      3    3:3:3:0       oui    5900,0000 1600,0000

(The output uses a French locale: "oui" means "yes" and the comma is the decimal separator.)

My CPU is on a single socket and has 4 physical cores, but Linux sees 8 logical CPUs because of Hyper-Threading.

I modified the Linux command line at boot, in GRUB, to add:

    isolcpus=2,3,6,7 nohz_full=2,3,6,7
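The affinity and context-switch check described above shells out to grep. As a rough sketch only (this is not what telco_haypo.py does), the same fields could be read directly in Python. The helper names filter_status() and read_own_status() are made up for this illustration, and unlike the grep pattern, which matches anywhere in the line, this version matches on the field name only.

```python
import os


def filter_status(text, keywords=("cpu", "ctx")):
    """Return the 'key: value' fields of a /proc/<pid>/status dump whose
    key contains one of the keywords (case-insensitive)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and any(word in key.lower() for word in keywords):
            fields[key.strip()] = value.strip()
    return fields


def read_own_status():
    """Read and filter the status file of the current process (Linux only)."""
    with open("/proc/%s/status" % os.getpid()) as status:
        return filter_status(status.read())


if __name__ == "__main__":
    # /proc/<pid>/status only exists on Linux; skip quietly elsewhere.
    if os.path.exists("/proc/%s/status" % os.getpid()):
        for key, value in read_own_status().items():
            print("%s: %s" % (key, value))
```

Calling read_own_status() at the end of the benchmark should report the same Cpus_allowed* and *_ctxt_switches fields as the grep command, without spawning a shell.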
With isolcpus and nohz_full set, I then forced the CPU frequency governor to "performance" on the isolated CPUs to avoid hiccups (the relative path suggests the command is run as root from /sys/devices/system/cpu):

# for id in 2 3 6 7; do echo performance > cpu$id/cpufreq/scaling_governor; done

Check the config with:

$ cat /sys/devices/system/cpu/isolated
2-3,6-7
$ cat /sys/devices/system/cpu/nohz_full
2-3,6-7
$ cat /sys/devices/system/cpu/cpu[2367]/cpufreq/scaling_governor
performance
performance
performance
performance

OK, now with this kernel config, but still without taskset, on an idle system:
-----------------------
Elapsed time: 2.660088424000037
Elapsed time: 2.5927538629999844
Elapsed time: 2.6135682369999813
Elapsed time: 2.5819260570000324
Elapsed time: 2.5991294099999322
Cpus_allowed: 33
Cpus_allowed_list: 0-1,4-5
voluntary_ctxt_switches: 1
nonvoluntary_ctxt_switches: 21
-----------------------

With system load >= 5.0:
-----------------------
Elapsed time: 5.3484489170000415
Elapsed time: 5.336797472999933
Elapsed time: 5.187413687999992
Elapsed time: 5.24122020599998
Elapsed time: 5.10201246400004
Cpus_allowed_list: 0-1,4-5
voluntary_ctxt_switches: 1
nonvoluntary_ctxt_switches: 1597
-----------------------

And *NOW* using my isolated physical cores #2 and #3 (Linux CPUs 2, 3, 6 and 7), still on the heavily loaded system:
-----------------------
$ taskset -c 2,3,6,7 python3 telco_haypo.py full
Elapsed time: 2.579487486000062
Elapsed time: 2.5827961039999536
Elapsed time: 2.5811954810001225
Elapsed time: 2.5782033600000887
Elapsed time: 2.572370636999949
Cpus_allowed: cc
Cpus_allowed_list: 2-3,6-7
voluntary_ctxt_switches: 2
nonvoluntary_ctxt_switches: 16
-----------------------

The numbers look *more* stable than those of the first test, without taskset on an idle system! You can see that the number of context switches is very low (total: 18).
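The Cpus_allowed values above ("33" and "cc") are hexadecimal bit masks, one bit per Linux CPU. A small sketch of how they map to the Cpus_allowed_list values, plus a way to pin the current process from Python instead of using taskset. The function names mask_to_cpus() and pin_to_cpus() are hypothetical; os.sched_setaffinity() is a real call but only exists on Linux (Python 3.3 and later), and the script attached to this issue does not use it.

```python
import os


def mask_to_cpus(hex_mask):
    """Decode a Cpus_allowed hexadecimal mask into a sorted list of CPU numbers."""
    # Wide masks are printed as comma-separated 32-bit groups: drop the commas.
    mask = int(hex_mask.replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def pin_to_cpus(cpus):
    """Pin the current process to the given CPUs, like 'taskset -c' would.

    Assumption: the CPUs exist on the machine; otherwise the call raises OSError.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity() is not available here")
    os.sched_setaffinity(0, set(cpus))  # pid 0 means the calling process


if __name__ == "__main__":
    print(mask_to_cpus("33"))  # -> [0, 1, 4, 5]
    print(mask_to_cpus("cc"))  # -> [2, 3, 6, 7]
```

On the machine described here, pin_to_cpus([2, 3, 6, 7]) at the start of the benchmark would be roughly equivalent to launching it with "taskset -c 2,3,6,7".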
Example of a second run:
-----------------------
haypo@smithers$ taskset -c 2,3,6,7 python3 telco_haypo.py full
Elapsed time: 2.538398498999868
Elapsed time: 2.544711968999991
Elapsed time: 2.5323677339999904
Elapsed time: 2.536252647000083
Elapsed time: 2.525748182999905
Cpus_allowed: cc
Cpus_allowed_list: 2-3,6-7
voluntary_ctxt_switches: 2
nonvoluntary_ctxt_switches: 15
-----------------------

Third run:
-----------------------
haypo@smithers$ taskset -c 2,3,6,7 python3 telco_haypo.py full
Elapsed time: 2.5819172930000605
Elapsed time: 2.5783024259999365
Elapsed time: 2.578493587999901
Elapsed time: 2.5774198510000588
Elapsed time: 2.5772148999999445
Cpus_allowed: cc
Cpus_allowed_list: 2-3,6-7
voluntary_ctxt_switches: 2
nonvoluntary_ctxt_switches: 15
-----------------------

Well, it's not perfect, but it looks much more stable than the timings without a specific kernel config or CPU pinning.

Statistics on the 15 timings of the 3 runs with tuning on a heavily loaded system:

>>> times
[2.579487486000062, 2.5827961039999536, 2.5811954810001225, 2.5782033600000887, 2.572370636999949, 2.538398498999868, 2.544711968999991, 2.5323677339999904, 2.536252647000083, 2.525748182999905, 2.5819172930000605, 2.5783024259999365, 2.578493587999901, 2.5774198510000588, 2.5772148999999445]
>>> statistics.mean(times)
2.564325343866661
>>> statistics.pvariance(times)
0.0004340411190965491
>>> statistics.stdev(times)
0.021564880156747315

Compare it to the timings without tuning on an idle system:

>>> times
[2.660088424000037, 2.5927538629999844, 2.6135682369999813, 2.5819260570000324, 2.5991294099999322]
>>> statistics.mean(times)
2.6094931981999934
>>> statistics.pvariance(times)
0.0007448087075422725
>>> statistics.stdev(times)
0.030512470965620608

We get (no tuning, idle system => tuning, busy system):

* Population variance: 0.00074 => 0.00043
* Standard deviation: 0.031 => 0.022

It looks *much* better, no?
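For anyone who wants to redo this comparison, here is a minimal sketch that recomputes the figures from the timings listed above. The summarize() helper is hypothetical, and the relative standard deviation (stdev divided by mean) is an extra figure of mine for comparing the two data sets, not something computed in the session above.

```python
import statistics

# Timings copied from the runs above.
NO_TUNING_IDLE = [
    2.660088424000037, 2.5927538629999844, 2.6135682369999813,
    2.5819260570000324, 2.5991294099999322,
]
TUNING_BUSY = [
    2.579487486000062, 2.5827961039999536, 2.5811954810001225,
    2.5782033600000887, 2.572370636999949, 2.538398498999868,
    2.544711968999991, 2.5323677339999904, 2.536252647000083,
    2.525748182999905, 2.5819172930000605, 2.5783024259999365,
    2.578493587999901, 2.5774198510000588, 2.5772148999999445,
]


def summarize(times):
    """Return mean, population variance, sample stdev and relative stdev."""
    mean = statistics.mean(times)
    stdev = statistics.stdev(times)  # sample standard deviation (n - 1)
    return {
        "mean": mean,
        "pvariance": statistics.pvariance(times),  # population variance (n)
        "stdev": stdev,
        "relative_stdev": stdev / mean,
    }


if __name__ == "__main__":
    for label, times in (("no tuning, idle", NO_TUNING_IDLE),
                         ("tuning, busy", TUNING_BUSY)):
        stats = summarize(times)
        print("%-16s mean=%.4f pvariance=%.5f stdev=%.3f (%.2f%% of mean)"
              % (label, stats["mean"], stats["pvariance"], stats["stdev"],
                 100 * stats["relative_stdev"]))
```

Note that statistics.pvariance() divides by n while statistics.stdev() divides by n - 1, so the standard deviation shown is not exactly the square root of the population variance.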
Note that I used only *5* timings for the benchmark without tuning, whereas I used 15 timings for the benchmark with tuning. I would expect a larger variance and deviation with more timings.

--

Just for fun, I ran the benchmark 3 times (so as to get 3x5 timings) on an idle system with tuning:

>>> times
[2.542378394000025, 2.5541740109999864, 2.5456488329998592, 2.54730951800002, 2.5495472409998, 2.56374302800009, 2.5737907220000125, 2.581463170999996, 2.578222832999927, 2.574441839999963, 2.569389365999996, 2.5792129209999075, 2.5689420860001064, 2.5681367900001533, 2.5563378829999692]
>>> import statistics
>>> statistics.mean(times)
2.563515909133321
>>> statistics.pvariance(times)
0.00016384530912002678
>>> statistics.stdev(times)
0.013249473404092065

As expected, it's even better (no tuning, idle system => tuning, busy system => tuning, idle system):

* Population variance: 0.00074 => 0.00043 => 0.00016
* Standard deviation: 0.031 => 0.022 => 0.013

----------
Added file: http://bugs.python.org/file41802/telco_haypo.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26275>
_______________________________________