Hi,

I'm working on speed.python.org, the CPython benchmark website. I reworked the benchmark suite, which is now called "performance":
http://pyperformance.readthedocs.io/

performance contains 54 benchmarks and works on Python 2.7 and 3.x. It creates a virtual environment with pinned versions of its requirements to "isolate" the benchmarks from the system and get more reproducible results. I added a few benchmarks from the PyPy benchmark suite, but I haven't added all of them yet.

performance is now based on my perf module, a toolkit to run, analyze and compare benchmarks:
http://perf.readthedocs.io/

I would like to know how to adapt perf and performance to handle the PyPy JIT compiler correctly: I want to measure performance once the code has been optimized by the JIT compiler, and ignore the warmup phase. I already made a few changes in perf and performance when a JIT is detected, but I'm not sure that I did them correctly. My final goal is to have PyPy benchmark results on speed.python.org, to easily compare CPython and PyPy (using the same benchmark runner and the same physical server).

The perf module calibrates a benchmark based on time: it computes the number of outer loops needed to get a timing of at least 100 ms. Basically, a single value is computed as:

    t0 = perf.perf_counter()
    for _ in range(loops):
        func()
    value = perf.perf_counter() - t0

perf spawns a process only to calibrate the benchmark. On PyPy, perf now (in the master branch) spawns a second process which only computes warmup samples, to validate the calibration: if a value becomes smaller than 100 ms, the number of loops is doubled each time. The operation is repeated until the number of loops no longer changes.

After the calibration, perf spawns worker processes sequentially: each worker computes warmup samples and then computes values. By default, each worker computes 1 warmup sample and 3 samples on CPython, and 10 warmup samples and 10 samples on PyPy. The configuration for PyPy is kind of arbitrary, whereas it was finely tuned for CPython.
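To make the calibration concrete, here is a minimal sketch of the "double the loops until one value takes at least 100 ms" logic, using only the stdlib time module. This is an illustration of the idea, not the actual perf implementation (which also spawns processes, records metadata, etc.); the function names calibrate() and compute_value() are mine:

```python
import time

def calibrate(func, min_time=0.1):
    """Double the number of outer loops until a single value
    (total time of 'loops' calls) takes at least min_time seconds."""
    loops = 1
    while True:
        t0 = time.perf_counter()
        for _ in range(loops):
            func()
        value = time.perf_counter() - t0
        if value >= min_time:
            return loops
        loops *= 2  # loops stays a power of two

def compute_value(func, loops):
    """Compute one raw value: the total time of 'loops' calls to func."""
    t0 = time.perf_counter()
    for _ in range(loops):
        func()
    return time.perf_counter() - t0
```

On a JIT, the catch is that func() gets faster as it warms up, which is why re-checking the calibration with warmup samples in a second process makes sense: a loop count calibrated on cold code can become too small once the code is compiled.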
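The aggregation step that comes afterwards (drop the warmup samples, pool the remaining values from all workers, then summarize) can be sketched with the stdlib statistics module. The per-worker numbers below are made up for illustration:

```python
import statistics

# Hypothetical per-worker results: (warmup values, measured values), in seconds.
workers = [
    ([0.130], [0.105, 0.103, 0.104]),
    ([0.128], [0.106, 0.102, 0.105]),
]

# Warmup samples are ignored; only the measured values are pooled.
values = [v for _warmups, samples in workers for v in samples]
mean = statistics.mean(values)
stdev = statistics.stdev(values)
print("%.3f s +- %.3f s" % (mean, stdev))
```

This mirrors the reporting style of mean +- standard deviation over the pooled values.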
At the end, perf ignores all warmup samples and only computes the mean and standard deviation of the other values. For example, on CPython 21 processes are spawned: 1 for calibration + 20 workers; each worker computes 1 warmup sample + 3 values, so the mean is computed over 60 values.

perf stores all data in a JSON file: metadata (hostname, CPU speed, system load, etc.), number of loops, warmup samples, samples, etc. It provides an API to access all of this data.

perf also contains many tools to analyze the data: statistics (min, max, median/MAD, percentiles, ...), rendering a histogram, comparing results and checking whether a difference is significant, detecting unstable benchmarks, etc.

perf also comes with documentation explaining how to run a benchmark, analyze benchmarks, get stable/reproducible results, tune your system to run benchmarks, etc.

To tune your system for benchmarks, run the "sudo python3 -m perf system tune" command. It configures the CPU (disables Turbo Boost, sets a fixed frequency, ...), checks that the power cable is plugged in, sets CPU affinity on IRQs, disables Linux perf events, etc. The command reduces operating system jitter.

Victor
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev