New submission from STINNER Victor:

Hi, I'm working on optimization projects like FAT Python (PEP 509: issue #26058, PEP 510: issue #26098, and PEP 511: issue #26145) and faster memory allocators (issue #26249).
I have the *feeling* that perf.py output is not reliable, even though it takes more than 20 minutes :-/ Maybe that's why Yury told me to use -r (--rigorous) :-) Example with 5 runs of "python3 perf.py ../default/python ../default/python.orig -b regex_v8":

---------------
Report on Linux smithers 4.3.3-300.fc23.x86_64 #1 SMP Tue Jan 5 23:31:01 UTC 2016 x86_64 x86_64
Total CPU cores: 8

### regex_v8 ###
Min: 0.043237 -> 0.050196: 1.16x slower
Avg: 0.043714 -> 0.050574: 1.16x slower
Significant (t=-19.83)
Stddev: 0.00171 -> 0.00174: 1.0178x larger

### regex_v8 ###
Min: 0.042774 -> 0.051420: 1.20x slower
Avg: 0.043843 -> 0.051874: 1.18x slower
Significant (t=-14.46)
Stddev: 0.00351 -> 0.00176: 2.0009x smaller

### regex_v8 ###
Min: 0.042673 -> 0.048870: 1.15x slower
Avg: 0.043726 -> 0.050474: 1.15x slower
Significant (t=-8.74)
Stddev: 0.00283 -> 0.00467: 1.6513x larger

### regex_v8 ###
Min: 0.044029 -> 0.049445: 1.12x slower
Avg: 0.044564 -> 0.049971: 1.12x slower
Significant (t=-13.97)
Stddev: 0.00175 -> 0.00211: 1.2073x larger

### regex_v8 ###
Min: 0.042692 -> 0.049084: 1.15x slower
Avg: 0.044295 -> 0.050725: 1.15x slower
Significant (t=-7.00)
Stddev: 0.00421 -> 0.00494: 1.1745x larger
---------------

I only care about the "Min"; IMHO it's the most interesting information here. The reported slowdown varies between 12% and 20%, which is a big difference to me.

It looks like some benchmarks have very short iterations compared to others. For example, one iteration of bm_json_v2 takes around 3 seconds, whereas one iteration of bm_regex_v8 takes less than 0.050 second (50 ms):

$ python3 performance/bm_json_v2.py -n 3 --timer perf_counter
3.310384973010514
3.3116717970115133
3.3077902760123834

$ python3 performance/bm_regex_v8.py -n 3 --timer perf_counter
0.0670697659952566
0.04515827298746444
0.045114840992027894

Do you think that bm_regex_v8 is reliable? I see that there is an "iteration scaling" to run the benchmarks with more iterations. Maybe we can start by increasing the "iteration scaling" for bm_regex_v8?
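To illustrate the min/avg/stddev question, here is a minimal standalone sketch (not perf.py's harness; workload and loop counts are illustrative) that times a tiny workload several times with time.perf_counter and reports the same three statistics:

```python
import statistics
import time

def bench(func, runs=5, loops=1000):
    """Time `loops` calls of func() per run; return per-call times in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        for _ in range(loops):
            func()
        samples.append((time.perf_counter() - start) / loops)
    return samples

# A deliberately tiny workload: a single call is far shorter than the
# timer can resolve reliably, so we batch `loops` calls per sample.
samples = bench(lambda: sum(range(100)))
print("min:   %.9f" % min(samples))
print("avg:   %.9f" % statistics.mean(samples))
print("stdev: %.9f" % statistics.stdev(samples))
```

System noise can only make a run slower, never faster, which is why the minimum is usually the most stable of the three numbers on a quiet machine.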
Instead of a fixed number of iterations, we should redesign the benchmarks to be time-based. For example, one iteration must take at least 100 ms and should not take more than 1 second (but may take longer to get more reliable results). The benchmark is then responsible for adjusting its internal parameters. I used this design for my "benchmark.py" script, which is written to get "reliable" microbenchmarks: https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py?fileviewer=file-view-default

The script is time-based and calibrates each benchmark. It also uses the *effective* resolution of the clock used by the benchmark for the calibration.

I will maybe work on such a patch, but it would be good to know your opinion on such a change first. I guess that we should use the base python to calibrate the benchmark and then pass the same parameters to the modified python.

----------
components: Benchmarks
messages: 259469
nosy: brett.cannon, haypo, pitrou, yselivanov
priority: normal
severity: normal
status: open
title: perf.py: bm_regex_v8 doesn't seem reliable even with --rigorous
type: performance
versions: Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26275>
_______________________________________
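The calibration described above can be sketched roughly as follows. This is not the actual benchmark.py code; the 100 ms floor, the doubling strategy, and all names are illustrative assumptions:

```python
import time

def clock_resolution(timer=time.perf_counter, probes=100):
    """Estimate the effective resolution of `timer`: the smallest
    observable non-zero difference between two consecutive readings."""
    deltas = []
    for _ in range(probes):
        t0 = timer()
        t1 = timer()
        while t1 == t0:  # spin until the clock actually advances
            t1 = timer()
        deltas.append(t1 - t0)
    return min(deltas)

def calibrate(func, min_time=0.100, timer=time.perf_counter):
    """Double the loop count until a single sample lasts at least
    `min_time` seconds and is well above the timer's resolution."""
    resolution = clock_resolution(timer)
    # Keep the timer's quantization error under roughly 1% of a sample.
    min_time = max(min_time, resolution * 100)
    loops = 1
    while True:
        start = timer()
        for _ in range(loops):
            func()
        if timer() - start >= min_time:
            return loops
        loops *= 2

loops = calibrate(lambda: sum(range(1000)))
print("calibrated loops per sample:", loops)
```

The calibrated loop count would then be computed once on the base python and reused unchanged on the patched python, so both sides run exactly the same workload.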