Hi,

Over the last few weeks, I researched how to get stable and reliable benchmarks, especially for the corner case of microbenchmarks. The first result is a series of articles; here are the first three:
https://haypo.github.io/journey-to-stable-benchmark-system.html
https://haypo.github.io/journey-to-stable-benchmark-deadcode.html
https://haypo.github.io/journey-to-stable-benchmark-average.html

The second result is a new perf module which includes all the "tricks" discovered in my research: compute the average and standard deviation, spawn multiple worker child processes, automatically calibrate the number of outer-loop iterations, automatically pin worker processes to isolated CPUs, and more.

The perf module can store benchmark results as JSON so they can be analyzed in depth later. It helps to configure a benchmark correctly and to check manually whether it is reliable.

The perf documentation also explains how to get stable and reliable benchmarks (e.g. how to tune Linux to isolate CPUs).

perf has 3 builtin CLI commands:

* python -m perf: show and compare JSON results
* python -m perf.timeit: new, better and more reliable implementation of timeit
* python -m perf.metadata: display collected metadata

Python 3 is recommended to get time.perf_counter(), the new accurate statistics module, automatic CPU pinning (I will implement it on Python 2 later), etc. Python 2.7 is also supported: fallbacks are implemented where needed.

Example with the patched telco benchmark (a benchmark for the decimal module) on a Linux system with two isolated CPUs.

First run the benchmark:
---
$ python3 telco.py --json-file=telco.json
.........................
Average: 26.7 ms +- 0.2 ms
---

Then show the JSON content to see all details:
---
$ python3 -m perf -v show telco.json
Metadata:
- aslr: enabled
- cpu_affinity: 2, 3
- cpu_count: 4
- cpu_model_name: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
- hostname: smithers
- loops: 10
- platform: Linux-4.4.9-300.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three
- python_executable: /usr/bin/python3
- python_implementation: cpython
- python_version: 3.4.3

Run 1/25: warmup (1): 26.9 ms; samples (3): 26.8 ms, 26.8 ms, 26.7 ms
Run 2/25: warmup (1): 26.8 ms; samples (3): 26.7 ms, 26.7 ms, 26.7 ms
Run 3/25: warmup (1): 26.9 ms; samples (3): 26.8 ms, 26.9 ms, 26.8 ms
(...)
Run 25/25: warmup (1): 26.8 ms; samples (3): 26.7 ms, 26.7 ms, 26.7 ms

Average: 26.7 ms +- 0.2 ms (25 runs x 3 samples; 1 warmup)
---

Note: benchmarks can be analyzed with Python 2.

I'm posting this email to python-dev because timeit results are commonly requested in reviews of optimization patches.

The next step is to patch the CPython benchmark suite to use the perf module. I have already forked the repository and started to patch some benchmarks.

If you are interested in Python performance in general, please join us on the speed mailing list!
https://mail.python.org/mailman/listinfo/speed

Victor
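
For readers who want a feel for the "runs x samples; warmup" structure and the "average +- standard deviation" summary shown in the output above, here is a minimal sketch using only the Python 3 standard library. It is not the perf module's implementation: the names collect_samples, summarize and format_result are made up for this illustration, and everything runs in a single process, whereas perf spawns separate worker processes and can pin them to isolated CPUs.

```python
import statistics
import time


def collect_samples(func, runs=5, warmups=1, samples=3, loops=10):
    """Time func() and return per-call timings in seconds.

    Hypothetical helper for illustration only. Each run takes
    `warmups` discarded measurements followed by `samples` kept ones;
    each measurement times `loops` calls and divides by `loops`.
    """
    timings = []
    for _run in range(runs):
        for index in range(warmups + samples):
            start = time.perf_counter()
            for _loop in range(loops):
                func()
            elapsed = (time.perf_counter() - start) / loops
            if index >= warmups:  # skip warmup measurements
                timings.append(elapsed)
    return timings


def summarize(timings):
    """Return (mean, sample standard deviation) of the timings."""
    return statistics.mean(timings), statistics.stdev(timings)


def format_result(mean, stdev):
    """Format seconds as milliseconds, mimicking the output style above."""
    return "Average: %.1f ms +- %.1f ms" % (mean * 1e3, stdev * 1e3)


if __name__ == "__main__":
    data = collect_samples(lambda: sum(range(1000)))
    print(format_result(*summarize(data)))
```

Because all samples here come from one process, this sketch cannot capture variation between processes (for example from address space layout or hash randomization), which is one reason the real module uses multiple worker processes.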