On Wed, 25 Jun 2025 17:22:56 -0700 Mina Almasry wrote:
> What I'm hoping to do is:
> 
> 1. Have nipa run the benchmark always (or at least on patches that
> touch pp code, if that's possible), and always succeed.
> 2. The pp reviewers can always check the contest results to manually
> see if there is a regression. That's still great because it saves us
> the time of cherry-picking the series and running the tests ourselves
> (or asking submitters to do that).
> 3. If we notice that the results between runs are stable, then we can
> change the test to actually fail/warn if it detects a regression
> (e.g. fail if the fast path exceeds a set number of instructions).

That's fine. I don't think putting the data on a graph would be much
work, and clicking through old runs to dig the results out by hand will
be a PITA. Just a little parsing in the runner to propagate the numbers
into JSON, and a fairly trivial bit of charts.js to fetch the runs and
render the UI.
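Roughly, something like the below on the runner side would cover both
the JSON propagation and the fail/warn check from (3). This is just a
sketch, not the actual NIPA code: the output line it parses, the metric
names and the thresholds are all made up.

#!/usr/bin/env python3
# Sketch of the runner-side parsing, not the actual NIPA code.
# Assumes (hypothetically) that the benchmark prints lines like:
#   bench: fast path: 123 instructions
# Metric names, the regexes and the thresholds are all made up here.
import json
import re
import sys

METRICS = {
    "fast_path_insns": re.compile(r"fast path:\s*(\d+)\s*instructions"),
    "slow_path_insns": re.compile(r"slow path:\s*(\d+)\s*instructions"),
}

# None means report-only; flip to a real number once we trust the data.
THRESHOLDS = {"fast_path_insns": None, "slow_path_insns": None}

def parse(output):
    """Pull the interesting numbers out of the benchmark output."""
    results = {}
    for name, rx in METRICS.items():
        m = rx.search(output)
        if m:
            results[name] = int(m.group(1))
    return results

def verdict(results):
    """'fail' only when a metric with a configured threshold exceeds it."""
    for name, limit in THRESHOLDS.items():
        if limit is not None and results.get(name, 0) > limit:
            return "fail"
    return "pass"

if __name__ == "__main__":
    res = parse(sys.stdin.read())
    print(json.dumps({"results": res, "verdict": verdict(res)}, indent=2))

The JSON is what gets archived per run, and it's what the charts.js
side would fetch to draw the graph.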

> 4. If we notice that the results have too much noise, then we can
> improve the now merged benchmark to somehow make it more consistent.
> 
> FWIW, when I run the benchmark, I get very repeatable results across
> runs, especially when measuring the fast path, but nipa's mileage may
> vary.

100% on board. But someone with Meta credentials needs to add a runner
and babysit it; I have enough CI wrangling as is.

Or we wait a couple of months until we migrate to a more public setup.
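If the numbers do turn out noisy once this runs in CI, a cheap first
step would be to have the runner repeat the benchmark a few times and
only flag a regression when the new average is well outside the
baseline spread. Untested sketch, with made-up names and purely
illustrative numbers:

import statistics

def is_regression(baseline, fresh, n_sigma=3.0):
    """True if the mean of the fresh runs is clearly above the baseline
    noise band. Both lists are fast-path instruction (or cycle) counts
    from repeated runs, however the runner ends up collecting them."""
    mean = statistics.mean(baseline)
    stdev = statistics.pstdev(baseline)
    # max(stdev, 1.0) avoids warning on a perfectly flat baseline.
    return statistics.mean(fresh) > mean + n_sigma * max(stdev, 1.0)

if __name__ == "__main__":
    # Purely illustrative numbers.
    print(is_regression([120, 121, 119, 120, 122], [131, 133, 130]))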
