Automatic detection of performance regressions would be nice, but it has
non-trivial implementation complexity. Besides the fact that naive or badly
designed benchmarks/performance tests can sometimes be worse than none at
all, many CI systems run on heavily shared computers. Such sharing makes
wall-clock time a metric with very hostile noise distributions.

There are techniques to mitigate this, such as taking the average & sdev of
the minimum of many runs, but they are both very expensive (e.g., 10*10 =
100 runs) and ultimately still not entirely reliable on heavily shared
systems, especially cloud virtual machines. For example, all 100 runs may
get only 5% of a server CPU's L3 cache in one epoch and the full 100% in
another, and a 20x difference in available L3 can have a bigger performance
impact than whatever is to be measured/trapped as a regression.
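
For concreteness, a minimal sketch of that min-of-many-runs idea might look
something like the Nim below. Everything here is a placeholder for
illustration: `workload`, the 10*10 split, and the seconds-as-float timing
are assumptions, not an actual Nim core benchmark harness.

import std/[monotimes, times, stats, math]

var sink = 0   # accumulated so the optimizer cannot discard the work

proc workload() =
  ## Hypothetical stand-in for whatever is actually being benchmarked.
  var s = 0
  for i in 1 .. 100_000:
    s += i mod 7
  sink += s

proc minOfRuns(runs: int): float =
  ## Time `workload` `runs` times and keep only the best (minimum) time,
  ## which filters out most same-epoch scheduler/cache interference.
  result = Inf
  for _ in 1 .. runs:
    let t0 = getMonoTime()
    workload()
    result = min(result, (getMonoTime() - t0).inNanoseconds.float * 1e-9)

when isMainModule:
  var rs: RunningStat
  for _ in 1 .. 10:             # 10 epochs ...
    rs.push(minOfRuns(10))      # ... each the min of 10 runs = 100 runs
  echo "mean of minima: ", rs.mean, " s  sdev: ", rs.standardDeviation, " s"
  echo "(sink = ", sink, ")"

Even then, the mean and sdev of those minima only bound within-epoch noise;
the cross-epoch cache/VM effects described above can still swamp the signal
on shared hardware.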

Beyond this, any such results are at high risk of being idiosyncratic to
the specific hardware (CPU, DIMMs, NVMe/SSD/Winchester, etc.). To do this
right, you really need a diverse suite of dedicated computers, which may be
beyond the resources of Nim core.
