[issue32589] Statistics as a result from timeit

Steven D'Aprano Sat, 20 Jan 2018 00:07:16 -0800

Steven D'Aprano <steve+pyt...@pearwood.info> added the comment:

On Fri, Jan 19, 2018 at 11:48:46PM +0000, STINNER Victor wrote:


> The timeit module of the stdlib computes 5 values by default... I'm 
> not sure that it's relevant to compute the standard deviation only on 
> 5 values.

I made the same mistake: I too thought Matthias was talking 
about calculating the statistics based on timer.repeat(), but after 
looking more closely at the PR, I see that the statistics gathered 
are from the (default) 1000000 runs of each call to timer.timeit().

(Matthias: in future, please don't just dump a PR into the bug tracker 
with barely a word of explanation. We shouldn't have to read the source 
to understand what it does, let alone why it is done. Thank you.)

Given that Matthias is calculating the stats of each call to timeit(), 
I'm not sure whether that makes more or less sense than what I thought.

His patch effectively converts timeit() to measure the time of a single 
run, a million times over, instead of the total time for one million runs. 
That is one of the traps that Tim Peters warns about in the O'Reilly 
book: timing *one* run may give unrealistic results due to the timer's 
quantization. So I think this will be much less accurate.
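To make the distinction concrete, here is a rough sketch of the two measurement strategies. It is not the PR's code: `total_time` and `per_run_times` are hypothetical helpers that call `time.perf_counter()` directly, whereas the real `timeit` machinery also disables garbage collection and times a precompiled loop.

```python
import time

def total_time(func, number):
    """Read the timer once before and once after the whole loop (total time)."""
    start = time.perf_counter()
    for _ in range(number):
        func()
    return time.perf_counter() - start

def per_run_times(func, number):
    """Read the timer around each call: `number` tiny intervals, each affected
    by the timer's resolution and by the cost of reading the timer itself."""
    times = []
    for _ in range(number):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    return times

def noop():
    pass

number = 10_000
total = total_time(noop, number)
runs = per_run_times(noop, number)

print("per loop, total / number:", total / number)
print("per loop, mean of runs:  ", sum(runs) / number)
# A small number of distinct values suggests the intervals are quantized
# by the clock tick.
print("distinct per-run values: ", len(set(runs)))
```

Comparing the two per-loop figures on a given machine gives a feel for how much per-call timer overhead and quantization typically distort the per-run approach.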

(Is this still true now that timeit() uses perf_counter() as the default 
timer, instead of clock() or time()? I don't know.)
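One way to probe this empirically on a given machine, as a sketch rather than a definitive answer: ask `time.get_clock_info()` what the platform reports for `perf_counter`, and measure the smallest step the clock actually takes. `smallest_tick` is a hypothetical helper; the reported resolution and the observed step can differ, so both are worth looking at.

```python
import time
import timeit

# On Python 3.3+, timeit's default timer is time.perf_counter.
print(timeit.default_timer is time.perf_counter)

# What the platform reports about this clock.
info = time.get_clock_info("perf_counter")
print(info.implementation, info.resolution, info.monotonic)

def smallest_tick(samples=1000):
    """Smallest positive step observed between consecutive perf_counter() reads.

    This is the finest granularity actually observed, and it includes the
    cost of calling perf_counter() itself.
    """
    best = float("inf")
    for _ in range(samples):
        t0 = time.perf_counter()
        t1 = time.perf_counter()
        while t1 == t0:  # spin until the clock visibly advances
            t1 = time.perf_counter()
        best = min(best, t1 - t0)
    return best

print("smallest observed step (s):", smallest_tick())
```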

> About the PR itself, I dislike providing a fixed list of statistical 
> functions. For example, what if someone needs the geometric mean? What 
> if you want to count outliers? etc.

I'm not sure that the geometric mean would be physically meaningful. But 
your point is well taken. For example, I think the median might be a 
better choice than the mean -- but if some user disagrees, that's okay.
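With the existing API, the choice of statistic can already be left to the caller: repeat() hands back raw totals, and any summary can be applied afterwards. A sketch, with an arbitrary statement and an illustrative set of summaries:

```python
import statistics
import timeit

number = 10_000
# Each entry is the total time for `number` executions of the statement.
totals = timeit.repeat("sum(range(100))", repeat=7, number=number)
per_loop = [t / number for t in totals]

# The caller picks the summary; nothing here is fixed by timeit itself.
summaries = {
    "min": min,
    "median": statistics.median,
    "mean": statistics.mean,
    "stdev": statistics.stdev,
}
for name, func in summaries.items():
    print(f"{name:>6}: {func(per_loop) * 1e6:.4f} usec per loop")
```

Note that the timeit documentation for repeat() itself cautions that the mean and standard deviation of these values are of limited use, and that the lowest value gives a lower bound for how fast the machine can run the snippet.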

> If someone really wants timeit to evolve, I suggest to return all 
> values rather than only the minimum: as repeat() does.

timeit() doesn't return the minimum.

timeit() calculates the total time for `number` (default is 1000000) 
runs and returns that total; repeat() calls timeit() `repeat` times 
(default is 3 in Python 3.6; Python 3.7 changes it to 5) and returns 
the totals as a list.
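The difference between the two calls, sketched with explicit arguments so the results do not depend on version-specific defaults (the statement timed is arbitrary):

```python
import timeit

# Module-level defaults used when `number` / `repeat` are not passed.
# default_repeat differs between Python versions.
print("default_number:", timeit.default_number)
print("default_repeat:", timeit.default_repeat)

timer = timeit.Timer("x = 2 ** 10")

# timeit(): a single float, the *total* seconds for `number` executions.
total = timer.timeit(number=100_000)

# repeat(): calls timeit() `repeat` times and returns the totals as a list.
totals = timer.repeat(repeat=3, number=100_000)

print(total)
print(totals)
```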

> By the way, using the minimum is IMHO a bad idea, but I already 
> proposed to change timeit, and my change was rejected because of the 
> backward compatibility.

I am interested in why you think it is a bad idea to use the minimum as 
the best estimate of the actual execution time.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32589>
_______________________________________