> After extensive tests with a variety of aggregate functions, I can say firmly that taking the minimum time is by far the best when it comes to assessing the speed of a function.

Like others, I must also disagree in principle. The minimum sounds like a useful metric for functions that (1) do the same amount of work in every test and (2) are microbenchmarks, i.e. they measure a small and simple task. If the benchmark being measured either (1) varies the amount of work each time (e.g. according to some approximation of real-world input, which obviously may vary)* or (2) measures a large system, then the average, the standard deviation and even a histogram may be useful (or perhaps some indicator of whether the runtimes are consistent with a normal distribution or not). If the running time is long, then the max might be useful, because things like task-switching overhead probably do not contribute that much to the total.
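To make that concrete: for the fixed-work microbenchmark case, the kind of report I have in mind is only a few lines of D (task() and the trial count of 100 are placeholders I invented here; none of this is std.benchmark's API):

import std.datetime.stopwatch : StopWatch, AutoStart;
import std.algorithm : map, minElement, sort, sum;
import std.array : array;
import std.math : sqrt;
import std.range : iota;
import std.stdio : writefln;

void task()
{
    // stand-in workload; in reality, the function being benchmarked
    auto a = iota(10_000).array;
    sort!"a > b"(a);
}

void main()
{
    double[] trials;
    foreach (_; 0 .. 100)
    {
        auto sw = StopWatch(AutoStart.yes);
        task();
        trials ~= sw.peek.total!"nsecs" / 1e6;   // per-trial time in ms
    }
    immutable mean = trials.sum / trials.length;
    immutable var  = trials.map!(t => (t - mean) ^^ 2).sum / trials.length;
    writefln("min %.3f ms  mean %.3f ms  stddev %.3f ms",
             trials.minElement, mean, sqrt(var));
}

For a fixed-work microbenchmark the min is the number to read; the stddev mostly tells you whether the mean is even worth looking at.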

* I anticipate that you might respond "so, only test a single input per benchmark", but if I've got 1000 inputs that I want to try, I really don't want to write 1000 functions, nor do I want 1000 lines of output from the benchmark. An average, standard deviation, min and max may be all I need, and if I need more detail, I might break it up into 10 groups of 100 inputs. In any case, the minimum runtime is not the desired output when the input varies.
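Something like this per-group summary is all I'm asking for (makeInput and process are hypothetical stand-ins for realistic input generation and the code under test):

import std.algorithm : map, maxElement, minElement, sort, sum;
import std.array : array;
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.math : sqrt;
import std.range : iota;
import std.stdio : writefln;

int[] makeInput(int i)
{
    // input size varies, as a crude approximation of real-world variation
    return iota(i % 997 + 1).array;
}

void process(int[] a)
{
    sort!"a > b"(a);
}

void main()
{
    foreach (g; 0 .. 10)                 // 10 groups of 100 inputs
    {
        double[] times;
        foreach (i; 0 .. 100)
        {
            auto input = makeInput(g * 100 + i);
            auto sw = StopWatch(AutoStart.yes);
            process(input);
            times ~= sw.peek.total!"usecs" / 1000.0;   // ms
        }
        immutable mean = times.sum / times.length;
        immutable sd   = sqrt(times.map!(t => (t - mean) ^^ 2).sum
                              / times.length);
        writefln("group %2s: avg %.4f  sd %.4f  min %.4f  max %.4f (ms)",
                 g, mean, sd, times.minElement, times.maxElement);
    }
}

Ten summary lines instead of one line per input, and the min column alone clearly wouldn't tell the story.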

It's a little surprising to hear "The purpose of std.benchmark is not to estimate real-world time. (That is the purpose of profiling)"... Firstly, of COURSE I would want to estimate real-world time with some of my benchmarks. For some benchmarks I just want to know which of two or three approaches is faster, or to get a coarse ballpark sense of performance, but for others I really want to know the wall-clock time used for realistic inputs.

Secondly, which D profiler actually helps you answer the question "where does the time go in the real world?"? The D -profile switch creates an instrumented executable, which in my experience (admittedly not with DMD) severely distorts running times. I usually prefer sampling-based profiling, where the executable is left unchanged and a sampling program interrupts it at random intervals and grabs the call stack, avoiding the distortion that instrumentation causes. Of course, instrumentation is useful for finding out which functions are called the most and whether call frequencies are in line with expectations, but I wouldn't trust its time measurements much.
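For what it's worth, the principle is simple enough to sketch. The toy below only polls a phase marker that the worker thread publishes; a real sampling profiler interrupts the process and captures the actual call stack, so treat this purely as an illustration of sampling versus instrumentation, not as a profiler:

import core.atomic : atomicLoad, atomicStore;
import core.thread : Thread;
import core.time : msecs;
import std.stdio : writefln;

immutable string[] phases = ["parse", "transform", "output"];
shared int current;      // index of the phase the worker is currently in
shared bool finished;

void sampler()
{
    auto counts = new size_t[phases.length];
    while (!atomicLoad(finished))
    {
        counts[atomicLoad(current)]++;   // one "sample" of where time goes
        Thread.sleep(1.msecs);           // roughly 1000 samples per second
    }
    foreach (i, name; phases)
        writefln("%-10s %6s samples", name, counts[i]);
}

void busyWork()
{
    import std.algorithm : sort;
    import std.array : array;
    import std.range : iota;
    foreach (_; 0 .. 20)
        iota(100_000).array.sort!"a > b"();
}

void main()
{
    auto t = new Thread(&sampler);
    t.start();
    foreach (i; 0 .. cast(int) phases.length)
    {
        atomicStore(current, i);
        busyWork();                      // stand-in for one program phase
    }
    atomicStore(finished, true);
    t.join();
}

The appeal is exactly that the measured code runs at full speed between samples; the distortion is bounded by the sampling rate rather than by a hook on every function call.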

As far as I know, D doesn't offer a sampling profiler, so one might indeed use a benchmarking library as a (poor) substitute. So I'd want to be able to set up some benchmarks that operate on realistic data, perhaps with different data in different runs, in order to learn how the speed varies with different inputs (if it varies a lot, then I might create more benchmarks to investigate which inputs are processed quickly, and which slowly).
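Concretely, the exploratory loop I have in mind times each input separately and then looks at the extremes to decide where dedicated benchmarks are needed (loadRealisticInputs and process are hypothetical stand-ins):

import std.algorithm : sort;
import std.array : array;
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.range : iota;
import std.stdio : writefln;
import std.typecons : Tuple, tuple;

int[][] loadRealisticInputs()
{
    // stand-in for loading recorded real-world inputs
    int[][] r;
    foreach (n; 1 .. 101)
        r ~= iota(n * 50).array;
    return r;
}

void process(int[] a)
{
    sort!"a > b"(a);
}

void main()
{
    Tuple!(double, size_t)[] timed;    // (time in ms, input index)
    foreach (idx, input; loadRealisticInputs())
    {
        auto sw = StopWatch(AutoStart.yes);
        process(input);
        timed ~= tuple(sw.peek.total!"usecs" / 1000.0, idx);
    }
    timed.sort();                      // ascending by time
    writefln("fastest: input %s at %.4f ms", timed[0][1], timed[0][0]);
    writefln("slowest: input %s at %.4f ms", timed[$ - 1][1], timed[$ - 1][0]);
}

If the slowest inputs all share some property, that property becomes its own benchmark.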

Some random comments about std.benchmark based on its documentation:

- It is very strange that the documentation of printBenchmarks uses neither the word "average" nor "minimum", and doesn't say how many trials are done... I suppose the obvious interpretation is that it does only one trial, but then we wouldn't be having this discussion about averages and minimums, right? Øivind says tests are run 1000 times... but it needs to be configurable per-test (my idea: support a _x1000 suffix in function names, or _for1000ms to run the test for at least 1000 milliseconds, and allow a multiplier when running a group of benchmarks, e.g. a multiplier argument of 0.5 means to run only half as many trials as usual; see the sketch after this list). Also, it is not clear from the documentation what the single parameter to each benchmark is (the term "iterations count" should be defined).

- The "benchmark_relative_" feature looks quite useful. I'm also happy to see benchmarkSuspend() and benchmarkResume(), though benchmarkSuspend() seems redundant in most cases: I'd like to just call one function, say, benchmarkStart() to indicate "setup complete, please start measuring time now."

- I'm glad that StopWatch can auto-start, but the documentation should be clearer: does reset() stop the timer or just reset the time to zero? Does stop() followed by start() resume from zero, or does it keep the time on the clock? I also think there should be a method that returns the value of peek() and restarts the timer at the same time (perhaps stop() and reset() should just return peek()? see the "lap" sketch after this list).

- After reading the documentation of comparingBenchmark and measureTime, I have almost no idea what they do.
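To illustrate the suffix idea from the first point above: parsing a trial-count policy out of a benchmark's name is trivial. This is a sketch of my proposal, not anything std.benchmark currently does:

import std.conv : to;
import std.regex : matchFirst, regex;
import std.stdio : writeln;

struct TrialPolicy
{
    size_t count;    // run exactly this many trials (0 = unused)
    long minMsecs;   // or keep running until this much time has elapsed
}

TrialPolicy parsePolicy(string name, size_t defaultCount = 1000)
{
    if (auto m = name.matchFirst(regex(`_x(\d+)$`)))
        return TrialPolicy(m[1].to!size_t, 0);
    if (auto m = name.matchFirst(regex(`_for(\d+)ms$`)))
        return TrialPolicy(0, m[1].to!long);
    return TrialPolicy(defaultCount, 0);
}

void main()
{
    writeln(parsePolicy("benchmark_append_x500"));     // 500 trials
    writeln(parsePolicy("benchmark_parse_for2000ms")); // run for >= 2 s
    writeln(parsePolicy("benchmark_sort"));            // framework default
}

A group-level multiplier would then just scale count (or minMsecs) before the trials start.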
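And here is the "lap" helper I was asking for regarding StopWatch. It assumes reset() zeroes the elapsed time without stopping a running watch, which is exactly the kind of behaviour the documentation needs to pin down:

import core.thread : Thread;
import core.time : Duration, msecs;
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writeln;

Duration lap(ref StopWatch sw)
{
    auto elapsed = sw.peek();
    sw.reset();     // assumed: zeroes the time but keeps the watch running
    return elapsed;
}

void main()
{
    auto sw = StopWatch(AutoStart.yes);
    Thread.sleep(20.msecs);
    writeln("first lap:  ", lap(sw));
    Thread.sleep(30.msecs);
    writeln("second lap: ", lap(sw));
}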
