> After extensive tests with a variety of aggregate functions, I can say firmly that taking the minimum time is by far the best when it comes to assessing the speed of a function.

Like others, I must also disagree in principle. The minimum sounds like a useful metric for functions that (1) do the same amount of work in every test and (2) are microbenchmarks, i.e. they measure a small and simple task. If the benchmark being measured either (1) varies the amount of work each time (e.g. according to some approximation of real-world input, which obviously may vary)* or (2) measures a large system, then the average, the standard deviation and even a histogram may be useful (or perhaps some indicator of whether the runtimes are consistent with a normal distribution or not). If the running time is long, then the max might be useful, because things like task-switching overhead probably do not contribute that much to the total.
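To make that concrete: for the fixed-work microbenchmark case, the kind of report I have in mind is only a few lines of D (task() and the trial count of 100 are placeholders I invented here; none of this is std.benchmark's API):

import std.datetime.stopwatch : StopWatch, AutoStart;
import std.algorithm : map, minElement, sort, sum;
import std.array : array;
import std.math : sqrt;
import std.range : iota;
import std.stdio : writefln;

void task()
{
    // stand-in workload; in reality, the function being benchmarked
    auto a = iota(10_000).array;
    sort!"a > b"(a);
}

void main()
{
    double[] trials;
    foreach (_; 0 .. 100)
    {
        auto sw = StopWatch(AutoStart.yes);
        task();
        trials ~= sw.peek.total!"nsecs" / 1e6;   // per-trial time in ms
    }
    immutable mean = trials.sum / trials.length;
    immutable var  = trials.map!(t => (t - mean) ^^ 2).sum / trials.length;
    writefln("min %.3f ms  mean %.3f ms  stddev %.3f ms",
             trials.minElement, mean, sqrt(var));
}

For a fixed-work microbenchmark the min is the number to read; the stddev mostly tells you whether the mean is even worth looking at.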

* I anticipate that you might respond "so, only test a single input per benchmark", but if I've got 1000 inputs that I want to try, I really don't want to write 1000 functions, nor do I want 1000 lines of output from the benchmark. An average, standard deviation, min and max may be all I need, and if I need more detail, I might break it up into 10 groups of 100 inputs. In any case, the minimum runtime is not the desired output when the input varies.
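Something like this per-group summary is all I'm asking for (makeInput and process are hypothetical stand-ins for realistic input generation and the code under test):

import std.algorithm : map, maxElement, minElement, sort, sum;
import std.array : array;
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.math : sqrt;
import std.range : iota;
import std.stdio : writefln;

int[] makeInput(int i)
{
    // input size varies, as a crude approximation of real-world variation
    return iota(i % 997 + 1).array;
}

void process(int[] a)
{
    sort!"a > b"(a);
}

void main()
{
    foreach (g; 0 .. 10)                 // 10 groups of 100 inputs
    {
        double[] times;
        foreach (i; 0 .. 100)
        {
            auto input = makeInput(g * 100 + i);
            auto sw = StopWatch(AutoStart.yes);
            process(input);
            times ~= sw.peek.total!"usecs" / 1000.0;   // ms
        }
        immutable mean = times.sum / times.length;
        immutable sd   = sqrt(times.map!(t => (t - mean) ^^ 2).sum
                              / times.length);
        writefln("group %2s: avg %.4f  sd %.4f  min %.4f  max %.4f (ms)",
                 g, mean, sd, times.minElement, times.maxElement);
    }
}

Ten summary lines instead of one line per input, and the min column alone clearly wouldn't tell the story.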

It's a little surprising to hear "The purpose of std.benchmark is not to estimate real-world time. (That is the purpose of profiling)"... Firstly, of COURSE I would want to estimate real-world time with some of my benchmarks. For some benchmarks I just want to know which of two or three approaches is faster, or to get a coarse ballpark sense of performance, but for others I really want to know the wall-clock time used for realistic inputs.

Secondly, which D profiler actually helps you answer the question "where does the time go in the real world?"? The D -profile switch creates an instrumented executable, which in my experience (admittedly not with DMD) severely distorts running times. I usually prefer sampling-based profiling, where the executable is left unchanged and a sampling program interrupts it at random intervals and grabs the call stack, avoiding the distortion that instrumentation causes. Of course, instrumentation is useful for finding out which functions are called the most and whether call frequencies are in line with expectations, but I wouldn't trust its time measurements much.
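For what it's worth, the principle is simple enough to sketch. The toy below only polls a phase marker that the worker thread publishes; a real sampling profiler interrupts the process and captures the actual call stack, so treat this purely as an illustration of sampling versus instrumentation, not as a profiler:

import core.atomic : atomicLoad, atomicStore;
import core.thread : Thread;
import core.time : msecs;
import std.stdio : writefln;

immutable string[] phases = ["parse", "transform", "output"];
shared int current;      // index of the phase the worker is currently in
shared bool finished;

void sampler()
{
    auto counts = new size_t[phases.length];
    while (!atomicLoad(finished))
    {
        counts[atomicLoad(current)]++;   // one "sample" of where time goes
        Thread.sleep(1.msecs);           // roughly 1000 samples per second
    }
    foreach (i, name; phases)
        writefln("%-10s %6s samples", name, counts[i]);
}

void busyWork()
{
    import std.algorithm : sort;
    import std.array : array;
    import std.range : iota;
    foreach (_; 0 .. 20)
        iota(100_000).array.sort!"a > b"();
}

void main()
{
    auto t = new Thread(&sampler);
    t.start();
    foreach (i; 0 .. cast(int) phases.length)
    {
        atomicStore(current, i);
        busyWork();                      // stand-in for one program phase
    }
    atomicStore(finished, true);
    t.join();
}

The appeal is exactly that the measured code runs at full speed between samples; the distortion is bounded by the sampling rate rather than by a hook on every function call.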

As far as I know, D doesn't offer a sampling profiler, so one might indeed use a benchmarking library as a (poor) substitute. So I'd want to be able to set up some benchmarks that operate on realistic data, perhaps with different data in different runs, in order to learn how the speed varies with different inputs (if it varies a lot, then I might create more benchmarks to investigate which inputs are processed quickly, and which slowly).
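Concretely, the exploratory loop I have in mind times each input separately and then looks at the extremes to decide where dedicated benchmarks are needed (loadRealisticInputs and process are hypothetical stand-ins):

import std.algorithm : sort;
import std.array : array;
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.range : iota;
import std.stdio : writefln;
import std.typecons : Tuple, tuple;

int[][] loadRealisticInputs()
{
    // stand-in for loading recorded real-world inputs
    int[][] r;
    foreach (n; 1 .. 101)
        r ~= iota(n * 50).array;
    return r;
}

void process(int[] a)
{
    sort!"a > b"(a);
}

void main()
{
    Tuple!(double, size_t)[] timed;    // (time in ms, input index)
    foreach (idx, input; loadRealisticInputs())
    {
        auto sw = StopWatch(AutoStart.yes);
        process(input);
        timed ~= tuple(sw.peek.total!"usecs" / 1000.0, idx);
    }
    timed.sort();                      // ascending by time
    writefln("fastest: input %s at %.4f ms", timed[0][1], timed[0][0]);
    writefln("slowest: input %s at %.4f ms", timed[$ - 1][1], timed[$ - 1][0]);
}

If the slowest inputs all share some property, that property becomes its own benchmark.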

Some random comments about std.benchmark based on its documentation:

- It is very strange that the documentation of printBenchmarks uses neither the word "average" nor "minimum", and doesn't say how many trials are done... I suppose the obvious interpretation is that it does only one trial, but then we wouldn't be having this discussion about averages and minimums, right? Øivind says tests are run 1000 times... but it needs to be configurable per-test (my idea: support a _x1000 suffix in function names, or _for1000ms to run the test for at least 1000 milliseconds, and allow a multiplier when running a group of benchmarks, e.g. a multiplier argument of 0.5 means to run only half as many trials as usual; see the sketch after this list). Also, it is not clear from the documentation what the single parameter to each benchmark is (the term "iterations count" should be defined).

- The "benchmark_relative_" feature looks quite useful. I'm also happy to see benchmarkSuspend() and benchmarkResume(), though benchmarkSuspend() seems redundant in most cases: I'd like to just call one function, say, benchmarkStart() to indicate "setup complete, please start measuring time now."

- I'm glad that StopWatch can auto-start, but the documentation should be clearer: does reset() stop the timer or just reset the time to zero? Does stop() followed by start() resume from zero, or does it keep the time on the clock? I also think there should be a method that returns the value of peek() and restarts the timer at the same time (perhaps stop() and reset() should just return peek()? see the "lap" sketch after this list).

- After reading the documentation of comparingBenchmark and measureTime, I have almost no idea what they do.
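To illustrate the suffix idea from the first point above: parsing a trial-count policy out of a benchmark's name is trivial. This is a sketch of my proposal, not anything std.benchmark currently does:

import std.conv : to;
import std.regex : matchFirst, regex;
import std.stdio : writeln;

struct TrialPolicy
{
    size_t count;    // run exactly this many trials (0 = unused)
    long minMsecs;   // or keep running until this much time has elapsed
}

TrialPolicy parsePolicy(string name, size_t defaultCount = 1000)
{
    if (auto m = name.matchFirst(regex(`_x(\d+)$`)))
        return TrialPolicy(m[1].to!size_t, 0);
    if (auto m = name.matchFirst(regex(`_for(\d+)ms$`)))
        return TrialPolicy(0, m[1].to!long);
    return TrialPolicy(defaultCount, 0);
}

void main()
{
    writeln(parsePolicy("benchmark_append_x500"));     // 500 trials
    writeln(parsePolicy("benchmark_parse_for2000ms")); // run for >= 2 s
    writeln(parsePolicy("benchmark_sort"));            // framework default
}

A group-level multiplier would then just scale count (or minMsecs) before the trials start.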
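And here is the "lap" helper I was asking for regarding StopWatch. It assumes reset() zeroes the elapsed time without stopping a running watch, which is exactly the kind of behaviour the documentation needs to pin down:

import core.thread : Thread;
import core.time : Duration, msecs;
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writeln;

Duration lap(ref StopWatch sw)
{
    auto elapsed = sw.peek();
    sw.reset();     // assumed: zeroes the time but keeps the watch running
    return elapsed;
}

void main()
{
    auto sw = StopWatch(AutoStart.yes);
    Thread.sleep(20.msecs);
    writeln("first lap:  ", lap(sw));
    Thread.sleep(30.msecs);
    writeln("second lap: ", lap(sw));
}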
