On 6/10/2016 9:20 AM, Steven D'Aprano wrote:
> On Fri, Jun 10, 2016 at 01:13:10PM +0200, Victor Stinner wrote:
>> Hi,
>>
>> In recent weeks I have been researching how to get stable and reliable
>> benchmarks, especially for the corner case of microbenchmarks. The
>> first result is a series of articles; here are the first three:

> Thank you for this! I am very interested in benchmarking.

>> https://haypo.github.io/journey-to-stable-benchmark-system.html
>> https://haypo.github.io/journey-to-stable-benchmark-deadcode.html
>> https://haypo.github.io/journey-to-stable-benchmark-average.html

> I strongly question your statement in the third:
>
>     [quote]
>     But how can we compare performances if results are random?
>     Take the minimum?
>
>     No! You must never (ever again) use the minimum for
>     benchmarking! Compute the average and some statistics like
>     the standard deviation:
>     [end quote]
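(For concreteness, the kind of summary Victor describes can be computed
with the standard-library statistics module. The snippet below is only an
illustration of that idea, not code from his articles; the benchmarked
statement and the repetition counts are arbitrary.)

    import statistics
    import timeit

    # 25 repetitions of the same microbenchmark; each value is the total
    # time for 10**4 executions of the statement, in seconds.
    timings = timeit.repeat("sorted(range(1000))", repeat=25, number=10**4)

    print("min  :", min(timings))
    print("mean :", statistics.mean(timings))
    print("stdev:", statistics.stdev(timings))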


> While I'm happy to see a real-world use for the statistics module, I
> disagree with your logic.
>
> The problem is that random noise can only ever slow the code down; it
> cannot speed it up. To put it another way, the random errors in the
> timings are always positive.
>
> Suppose you micro-benchmark some code snippet and get a series of
> timings. We can model the measured times as:
>
>     measured time t = T + ε
>
> where T is the unknown "true" timing we wish to estimate,

For comparative timings, we do not care about T. So arguments about the best estimate of T miss the point.

What we do wish to estimate is the relationship between two Ts, T0 for 'control' and T1 for 'treatment', in particular T1/T0. I suspect Victor is correct that mean(t1)/mean(t0) is better than min(t1)/min(t0) as an estimate of the true ratio T1/T0 (for a particular machine).
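The two estimators are easy to state side by side. This sketch is mine, not
code from the thread, and assumes t0 and t1 are lists of repeated timings of
'control' and 'treatment':

    import statistics

    def ratio_of_means(t0, t1):
        # Victor's approach: compare averages of the repeated timings.
        return statistics.mean(t1) / statistics.mean(t0)

    def ratio_of_mins(t0, t1):
        # The approach questioned above: compare the best cases.
        return min(t1) / min(t0)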

But given that we have matched pairs of measurements with the same hashseed and address, it may be better yet to estimate T1/T0 from the ratios t1i/t0i, where i indexes experimental conditions. But it has been a long time since I have read about estimation of ratios. What I remember is that this is a nasty subject.
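One way to make the matched-pairs idea concrete (a sketch under the
assumption that t0[i] and t1[i] were measured under the same condition i,
i.e. the same hash seed and address-space layout; the choice of the median
as the summary is mine, not established practice):

    import statistics

    def paired_ratio(t0, t1):
        # Per-pair ratios: each pair shares hash seed and addresses,
        # so much of the systematic noise should cancel.
        ratios = [b / a for a, b in zip(t0, t1)]
        # Summarize the per-pair ratios; the median is one robust choice,
        # but, as said above, estimating ratios is a nasty subject.
        return statistics.median(ratios)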

It is also the case that while an individual with one machine wants the best ratio for that machine, we need to make CPython patch decisions for the universe of machines that run Python.

> and ε is some variable error due to noise in the system.
> But ε is always positive, never negative,

A lognormal distribution might be a first guess. But what we really have is contributions from multiple factors.
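If one wanted to see how the assumed distribution of ε changes the picture,
a toy simulation along these lines would do; the "true" time T, the
candidate distributions, and their parameters are all arbitrary assumptions
on my part, not measurements:

    import random
    import statistics

    T = 1.0      # hypothetical "true" time, in seconds
    N = 1000     # number of repeated measurements

    # Strictly positive additive noise, two candidate shapes for ε.
    exponential = [T + random.expovariate(20.0) for _ in range(N)]
    lognormal = [T + random.lognormvariate(-3.0, 1.0) for _ in range(N)]

    for name, ts in (("exponential", exponential), ("lognormal", lognormal)):
        print(name, "min =", round(min(ts), 4),
              "mean =", round(statistics.mean(ts), 4))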

--
Terry Jan Reedy

