On Wed, 25 Jun 2025 17:22:56 -0700 Mina Almasry wrote:
> What I'm hoping to do is:
>
> 1. Have nipa run the benchmark always (or at least on patches that
> touch pp code, if that's possible), and always succeed.
> 2. The pp reviewers can always check the contest results to manually
> see if there is a regression. That's still great because it saves us
> the time of cherry-picking the series and running the tests ourselves
> (or asking submitters to do that).
> 3. If we notice that the results between runs are stable, then we can
> change the test to actually fail/warn if it detects a regression (if
> fast path is > # of instructions, fail).
That's fine. I don't think putting the data on a graph would be much
work, and clicking old results out of old runs will be a PITA. Just a
little parsing in the runner to propagate it into JSON, and a fairly
trivial bit of charts.js to fetch the runs and render the UI (rough
sketch at the end of this mail).

> 4. If we notice that the results have too much noise, then we can
> improve the now merged benchmark to somehow make it more consistent.
>
> FWIW, when I run the benchmark, I get very repeatable results across
> runs, especially when measuring the fast path, but nipa's mileage may
> vary.

100% on board. But someone with Meta credentials needs to add a runner
and babysit it; I have enough CI wrangling as is. Or we wait a couple
of months until we migrate to a more public setup.
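
To make the charts.js part concrete, something along these lines is all
I have in mind (rough, untested sketch; the file layout, field names and
the instruction threshold are made up, the real shape depends on what
the runner ends up emitting):

// Hypothetical layout: the runner drops one JSON blob per run, e.g.
//   results/<run-id>/page-pool-bench.json
//   { "run": "2025-06-25", "fast_path_insns": 0, "slow_path_insns": 0 }
async function renderPpBench(runIds) {
  const runs = await Promise.all(
    runIds.map(id =>
      fetch(`results/${id}/page-pool-bench.json`).then(r => r.json())
    )
  );

  // One line per metric, one point per run.
  new Chart(document.getElementById('pp-bench'), {
    type: 'line',
    data: {
      labels: runs.map(r => r.run),
      datasets: [
        { label: 'fast path (insns)', data: runs.map(r => r.fast_path_insns) },
        { label: 'slow path (insns)', data: runs.map(r => r.slow_path_insns) },
      ],
    },
  });

  // Once the numbers prove stable, the same data could drive the
  // fail/warn check from point 3 (threshold made up here):
  const FAST_PATH_BUDGET = 150;
  const latest = runs[runs.length - 1];
  if (latest.fast_path_insns > FAST_PATH_BUDGET)
    console.warn('possible pp fast path regression:', latest.fast_path_insns);
}

Chart.js itself can be loaded from a CDN, so the UI side stays a static
page plus the per-run JSON.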