Automatic detection of performance regressions would be nice, but it has non-trivial implementation complexity. Beyond the fact that naive or poorly designed benchmarks/performance tests can be worse than none at all, many CIs run on heavily shared machines, and such sharing makes elapsed wall-clock time a metric with very hostile noise distributions.
There are techniques to mitigate this, such as taking the average & standard deviation of the minimum of many runs, but they are both expensive (e.g., 10*10 = 100 runs) and still not entirely reliable on heavily shared systems, especially cloud virtual machines. For example, each of the 100 runs may get 5% of a server CPU's L3 cache during one epoch and the full 100% during another, and a 20x swing in effective L3 can have a bigger performance impact than the regression being measured/trapped. Beyond this, any such results are at high risk of being idiosyncratic to the specific hardware (CPU, DIMMs, NVMe/SSD/Winchester, etc.). To do this right, you really need a diverse suite of dedicated machines, which may be beyond the resources of Nim core.
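To make the min-of-many-runs idea concrete, here is a minimal sketch in Nim. It is not part of any existing Nim core tooling; the proc names, the toy `work` workload, and the default 10*10 split are illustrative assumptions. It times a candidate as the minimum of several inner repetitions, repeats that outer loop, and reports the mean and standard deviation of the minima:

```nim
# Sketch only: min-of-inner-runs sampling, then mean/sdev over outer samples.
import std/[monotimes, times, stats, math]

proc timeOnce(body: proc ()): float =
  ## Wall-clock nanoseconds for one invocation of `body`.
  let t0 = getMonoTime()
  body()
  result = (getMonoTime() - t0).inNanoseconds.float

proc minOfRuns(body: proc (), inner: int): float =
  ## Minimum over `inner` back-to-back timings; taking the min discards
  ## much (not all) of the scheduling/cache noise on a shared machine.
  result = Inf
  for _ in 1 .. inner:
    result = min(result, timeOnce(body))

proc benchmark(body: proc (), outer = 10, inner = 10): tuple[mean, sdev: float] =
  ## Mean and standard deviation of `outer` min-of-`inner` samples
  ## (10*10 = 100 total runs by default, i.e. the cost noted above).
  var rs: RunningStat
  for _ in 1 .. outer:
    rs.push(minOfRuns(body, inner))
  result = (rs.mean, rs.standardDeviation)

when isMainModule:
  proc work() =   # hypothetical workload standing in for a real benchmark
    var s = 0
    for i in 1 .. 100_000:
      s += i
    doAssert s > 0
  let (mean, sdev) = benchmark(work)
  echo "min-of-runs mean: ", mean, " ns  sdev: ", sdev, " ns"
```

Even with a scheme like this, the sdev only bounds the noise you happened to observe; it cannot correct for the machine-to-machine and epoch-to-epoch effects described above.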