Hi Evan,

I am also really interested in this topic and have been doing a bunch of
work on automating statistical benchmarks. I don't have a background in
statistics or formal QA but I am learning as I go along :).

The tools I'm building are outside Smalltalk. Our full performance test
suite takes about a week of machine time to run because it tests ~15,000
QEMU VMs with different software versions / configurations / workloads. A
CI server runs all those tests, getting pretty fast turnaround by
distributing them across a cluster of servers and reusing results from
unmodified software branches, and it spits out a CSV with one row per test
result (giving the benchmark score and the parameters of the test).
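For a sense of the shape: each row is one test run, something along these
lines (column names and values here are made up for illustration, not our
actual schema):

    branch,qemu,config,workload,score
    master,2.9.0,base,iperf,9.42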

Then what to do with that ~15,000-line CSV file? Right now I run Rmarkdown
to make a report on the distribution of results and then manually inspect
that to check for interesting differences. At the moment I lump all of the
different configurations together and treat them as one population.
Here is an example report:
https://hydra.snabb.co/build/1604171/download/2/report.html
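
For anyone curious, the kernel of that kind of report is only a few lines
of R. A minimal sketch, assuming a results.csv with hypothetical "config"
and "score" columns (the real report adds plots and formatting on top):

    # One row per test result: "config" labels the test parameters,
    # "score" is the benchmark result.
    results <- read.csv("results.csv")

    # Five-number summary of the pooled scores -- roughly the
    # one-population view the report takes today.
    summary(results$score)

    # The same summary broken out per configuration, a first step
    # away from lumping everything together.
    tapply(results$score, results$config, summary)

    # Density plot of the pooled score distribution.
    plot(density(results$score), main = "Benchmark scores")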

It's a bit primitive but it gets the job done for release engineering. I'm
reasonably confident that new software releases don't break or slow down in
obscure configurations. We are building network equipment, and performance
regressions are generally not acceptable.

I'm looking into more clever ways to automatically interpret the results,
e.g. fumbling around at
<https://stats.stackexchange.com/questions/288416/non-parametric-test-if-two-samples-are-drawn-from-the-same-distribution>.
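
The classic answer there seems to be the two-sample Kolmogorov-Smirnov
test. A sketch of how I might apply it in R, reusing the hypothetical
columns from above plus a made-up "branch" column:

    # Scores from two software branches (branch names are
    # hypothetical).
    a <- subset(results, branch == "master")$score
    b <- subset(results, branch == "next")$score

    # Two-sample KS test: a small p-value is evidence that the two
    # samples were not drawn from the same distribution.
    # (ks.test warns if the data contain ties.)
    ks.test(a, b)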

Could this relate to your ambitions somehow?


On 19 July 2017 at 02:00, Evan Donahue <emdon...@gmail.com> wrote:

> Hi,
>
> I've been doing a lot of performance testing lately, and I've found myself
> wanting to upgrade my methods from ad hoc use of bench and message tally.
> Is there any kind of framework for like, statistically comparing
> improvements in performance benchmarks across different versions of code,
> or anything that generally helps manage the test-tweak-test loop? Just
> curious what's out there before I go writing something. Too many useful
> little libraries to keep track of!
>
> Evan
>
