Heya Neil,

I pushed a change to the format of the text logged to the console when you run ./benchmark-guile. It seems that this affected your benchmarking bot. I was hoping that this would not be the case, because the benchmark suite also writes a log to `guile-benchmark.log', and I tried to avoid changing the format of that file.
Can you take a look at your bot and see if it's possible to switch to using benchmark-guile.log instead of the console output? Other suggestions as to a solution are also most welcome.

Thanks!

Andy

On Mon 23 Apr 2012 11:22, Andy Wingo <wi...@pobox.com> writes:

> Hi,
>
> I was going to try to optimize vhash-assoc, but I wanted a good benchmark first, so I started to look at our benchmark suite. We have some issues to deal with.
>
> For those of you who are not familiar with the benchmark suite, we have a bunch of benchmarks in benchmark-suite/benchmarks/: those files that end in ".bm". The format of a .bm file is like our .test files, except that instead of `pass-if' and the like, we have `benchmark'. You run benchmarks via ./benchmark-guile in the $top_builddir.
>
> The benchmarking framework tries to be appropriate for microbenchmarks, as the `benchmark' form includes a suggested number of iterations. Ideally, when you create a benchmark, you give it a number of iterations that makes it run approximately as long as the other benchmarks.
>
> When the benchmarking suite was first made, 10 years ago, there was an empty "reference" benchmark that was created to run for approximately 1 second. Currently it runs in 0.012 seconds. This is one problem: the overall suite has old iteration counts. There is a facility for scaling the iteration counts of the suite as a whole, but it is unused.
>
> Another problem is that the actual runtime of the various benchmarks varies quite a lot, from 3.3 seconds for assoc (srfi-1) to 0.012 seconds for if.bm.
>
> Short runtimes magnify imprecision in measurement. It used to be that the measurement function was "times", but I just changed that to the higher-precision get-internal-real-time / get-internal-run-time. Still, though, there is nothing you can do for a benchmark that runs in a few milliseconds or less.
>
> Another big problem is that some effect-free microbenchmarks are optimized away. For example, the computations in arithmetic.bm fold entirely; the same goes for if.bm. These benchmarks do not measure anything useful.
>
> The benchmarking suite attempts to compensate for the overhead of the harness by computing a "core time": the time taken to run a benchmark, minus the time taken to run an empty benchmark with the same number of iterations. The benchmark itself is compiled as a thunk, and the framework calls the thunk repeatedly. In theory this sounds good. In practice, however, for high-iteration microbenchmarks the overhead of the thunk call outweighs the cost of the benchmark body itself.
>
> For what it's worth, the current overhead appears to be about 35 microseconds per iteration on my laptop. If we inline the iteration into the benchmark itself, rather than calling a thunk repeatedly, we can bring that down to around 13 microseconds. However, it's probably best to leave it as it is, because if we inline the loop, it's liable to be optimized out.
>
> So, those are the problems: benchmarks running for inappropriate, inconsistent durations; inappropriate benchmarks; and benchmarks being optimized out.
>
> My proposal is to rebase the iteration count in 0-reference.bm to run for 0.5s on some modern machine, adjust all benchmarks to match, and remove those benchmarks that do not measure anything useful. Finally, we should perhaps enable automatic scaling of the iteration count. What do folks think about that?
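To make the .bm format described above concrete, an entry looks roughly like the sketch below. The module name, prefix, body, and iteration count are made up for illustration, and I am assuming the suite exports `with-benchmark-prefix' by analogy with the test suite's `with-test-prefix'; `benchmark' itself takes a name, a suggested iteration count, and the body to be timed.

  (define-module (benchmarks example)
    #:use-module (benchmark-suite lib))

  (with-benchmark-prefix "example"
    ;; Name, suggested iteration count, then the expression to time.
    (benchmark "assoc on a three-element alist" 100000
      (assoc 'c '((a . 1) (b . 2) (c . 3)))))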
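To illustrate the "core time" idea and why thunk-call overhead matters for very cheap bodies, here is a rough sketch of the computation. This is not the suite's actual code; the helper names are made up, and it only uses the higher-precision real-time clock mentioned above.

  (define (time-thunk-calls thunk n)
    ;; Wall-clock seconds spent calling THUNK N times.
    (let ((start (get-internal-real-time)))
      (do ((i 0 (+ i 1)))
          ((= i n))
        (thunk))
      (exact->inexact
       (/ (- (get-internal-real-time) start)
          internal-time-units-per-second))))

  (define (core-time thunk n)
    ;; "Core time": total time minus the time for an empty thunk over
    ;; the same number of iterations.  When the body is very cheap, the
    ;; per-call overhead dominates both terms, so the difference is noisy.
    (- (time-thunk-calls thunk n)
       (time-thunk-calls (lambda () #f) n)))

For example, (core-time (lambda () (assoc 'c '((a . 1) (b . 2) (c . 3)))) 100000) gives a rough total from which a per-iteration figure follows by dividing by 100000.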
> On the positive side, all of our benchmarks are very clear that they report a time for a given number of iterations, and so this change should not affect users who measure time per iteration.
>
> Regards,
>
> Andy

--
http://wingolog.org/