[pypy-dev] A new build-summary.pypy.org service and benchmark runner benchmarker2

matti picus via pypy-dev Sun, 31 May 2026 23:43:48 -0700

Hi. Here is an update about all the churn you might have been seeing
in the github/pypy org and around speed.pypy.org.


Maciej (and baroquesoftware) have been sponsoring our benchmark runner
for quite a while now. It runs in a xenial64 chroot, which provides
gcc 5.4.1. This matches our buildbot linux infrastructure, which runs
in docker images based on manylinux2014, also using gcc 5.4.1. The
buildbot master also runs on a baroquesoftware machine, using buildbot
0.8.8, which is quite old. It uses python2.7 and has a heavily
specialized summary view.

Our 6 remaining buildbot workers are also quite old.
- The linux64 machine was donated by King's College (thanks to them
for that); it has been running for many years now with few
interruptions.
- The linux32 worker runs on the benchmarker machine with a lock to
prevent it from running when the benchmarks run.
- The macOS workers, arm64 and x86_64, both run on an M1 Mac mini in
my house.
- The windows64 worker runs in a Windows 10 VM on my desktop machine.
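The lock that keeps the linux32 worker and the benchmarks apart is a
buildbot lock. Purely to illustrate the same mutual-exclusion idea
outside buildbot, here is a minimal sketch using a POSIX advisory file
lock (fcntl.flock, Unix only). The lock-file path and the helper names
are hypothetical, not part of the actual setup.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock file shared by the benchmark runner and the 32-bit worker.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "benchmarker.lock")


@contextmanager
def exclusive(path=LOCK_PATH, blocking=True):
    """Hold an exclusive advisory lock on `path` for the duration of the block.

    With blocking=False, raise BlockingIOError if someone else holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        fcntl.flock(fd, flags)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def try_run_tests(run):
    """Call `run()` only if no benchmark currently holds the lock; else return None."""
    try:
        with exclusive(blocking=False):
            return run()
    except BlockingIOError:
        return None  # benchmarks are running: skip and retry later
```

The benchmark side would wrap its whole run in `with exclusive():`,
while the test side uses the non-blocking form so it can back off
instead of queueing behind a long benchmark run.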

Maciej and baroquesoftware have become less involved in the project.
That's fair: it's open source, it isn't generating much income (to say
the least), and it isn't actively getting new features. It is time to
update our software and hardware stack, assuming the project is going
to continue to function.

I came up with a plan to:
- set up a new benchmarking machine (done)
- move buildbot master off baroquesoftware machines (done)
- move the 32-bit buildbot worker onto the new benchmarking machine (TBD)
- try to use the github actions we already have as a replacement for
buildbot testing, which will save updating all the worker machines (in
progress)
- update the whole software stack to manylinux_2_28, which uses gcc 14
(in progress).

I have set up benchmarker2, an AMD Ryzen 5 3600 6-core machine. Besides
being less expensive than Zen 5 machines (it is not the latest and
greatest), this Zen 2 CPU has the advantage of two separate core
complexes (CCX) of 3 cores each, each with its own L3 cache. I can
isolate 3 cores for benchmarking and still run the buildbot master and
the 32-bit buildbot worker on it (keeping the existing locking
mechanism).
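Pinning a benchmark run to the reserved cores can be done from Python
with os.sched_setaffinity (Linux only). A minimal sketch, assuming the
cores were reserved beforehand, for example with the isolcpus= kernel
boot parameter. The CPU numbers in BENCH_CPUS and the helper names are
hypothetical; the right numbers depend on how the kernel numbers the
logical CPUs of each CCX.

```python
import os
import subprocess
import sys

# Hypothetical: logical CPUs assumed to be reserved for benchmarking.
BENCH_CPUS = {3, 4, 5}


def run_pinned(cmd, cpus=BENCH_CPUS):
    """Run `cmd` with its CPU affinity restricted to `cpus` (Linux only).

    The affinity is set in the child between fork and exec, so the
    benchmarked interpreter inherits it.
    """
    target = set(cpus)

    def pin():
        os.sched_setaffinity(0, target)  # 0 = the calling (child) process

    return subprocess.run(cmd, preexec_fn=pin, capture_output=True,
                          text=True, check=True)


def affinity_probe_cmd():
    """A command that prints the CPUs its own process is allowed to use."""
    return [sys.executable, "-c",
            "import os; print(sorted(os.sched_getaffinity(0)))"]
```

Running `run_pinned(affinity_probe_cmd())` is a quick way to confirm
that a child really is restricted to the chosen CPUs. Note that the
Python docs warn that preexec_fn is not safe when the parent process
uses threads; the `taskset` utility is an alternative outside Python.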

I also set up a new flask-based service,
https://build-summary.pypy.org [0], that replicates the summary page
PyPy developed on top of buildbot. This is needed for two reasons: to
display the GitHub Actions test results in the way PyPy developers are
used to, and to allow us to move past buildbot 0.8.8. Newer versions of
buildbot do not allow exposing the twistd endpoints we used for the
customized summary. The new service is live, as is benchmarker2.

To update the Docker images, I created a pypy-ci repo [1] with
manylinux_2_28-based Dockerfiles.

I started a pypy branch [2] to do a full rpython test run on the GitHub
Actions workers with the newer OS and compiler versions. The branch
required updating rpython itself to run its tests with gcc 14: the
newer compiler is pickier about function definitions, and the newer
binutils handle assembly slightly differently, so some fixes were
needed. The Windows machine on GitHub Actions is _much_ faster than
the VM. The entire run (split over 5 jobs, with 4 tests running in
parallel on each job) takes around 30 minutes on each of the 6
machines. You can see the branch summary results on the new service
[4]. The service is still a bit of a work in progress; the
comparison-to-main feature may be too noisy. Note the markers that
report the source of the logs: '*' for GitHub Actions and '+' for
buildbot [5].
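How the test files are divided among the 5 jobs is not described here.
One common approach, sketched below as an assumption, is greedy
longest-processing-time-first assignment: sort the files by estimated
duration and always give the next file to the job with the smallest
total so far. The `shard` helper and its duration estimates are
hypothetical.

```python
import heapq


def shard(durations, n_jobs=5):
    """Split test files into `n_jobs` groups with roughly equal estimated time.

    durations: mapping of test path -> estimated seconds (hypothetical data,
    e.g. taken from a previous run). Greedy LPT heuristic.
    """
    heap = [(0.0, i) for i in range(n_jobs)]  # (total seconds so far, job index)
    jobs = [[] for _ in range(n_jobs)]
    # Longest first; ties broken by name so the split is deterministic.
    for path, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, i = heapq.heappop(heap)  # least-loaded job
        jobs[i].append(path)
        heapq.heappush(heap, (total + secs, i))
    return jobs
```

Each resulting group would then go to one job, which runs its share
with 4 parallel workers. Balancing by duration rather than by file
count matters because a few slow test files can otherwise dominate
one job's wall-clock time.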

I also hacked on speed.pypy.org to better display the two benchmarking
machines. The comparison page and the timeline page now both allow
displaying results across more than one environment, so you can see
that the new benchmarker2 machine is slightly faster than the older
benchmarker machine [3]. The difference would be larger, but I disabled
the CPU "turbo" boost on benchmarker2 to keep the results more
consistent.
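Comparing two machines, or a branch against main, comes down to
per-benchmark time ratios against a baseline. A minimal sketch,
assuming times in seconds keyed by benchmark name; the geometric-mean
summary and the 5% noise threshold are illustrative choices, not what
speed.pypy.org or build-summary.pypy.org actually compute.

```python
import math


def compare(baseline, other, noise=0.05):
    """Compare per-benchmark times (seconds) of `other` against `baseline`.

    Returns (geomean, significant): the geometric mean of other/baseline
    ratios (below 1.0 means `other` is faster overall), and a dict of the
    benchmarks whose ratio differs from 1.0 by more than `noise`.
    Only benchmarks present in both runs are compared.
    """
    common = sorted(baseline.keys() & other.keys())
    if not common:
        raise ValueError("no benchmarks in common")
    ratios = {name: other[name] / baseline[name] for name in common}
    geomean = math.exp(sum(math.log(r) for r in ratios.values()) / len(ratios))
    significant = {n: r for n, r in ratios.items() if abs(r - 1.0) > noise}
    return geomean, significant
```

The geometric mean makes a 2x speedup on one benchmark and a 2x
slowdown on another cancel out, which an arithmetic mean of ratios
would not. A fixed threshold is a crude noise filter; reducing
run-to-run variance (as disabling turbo does) is what lets such a
threshold be tightened.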

Besides getting security and other benefits from updating, my real
motivation for all this churn has been to try to improve
cold-interpreter (non-JIT) performance. Carl and I recently worked on
computed gotos and on inlining the stack overflow checks, but neither
seemed to do much. Moving to gcc 14 does not seem to have changed that
either. Making `lst = [None] * size` faster [6] seemed promising from
the microbenchmark, but the full benchmark comparison shows, as with
the other changes, some benchmarks a bit faster and some a bit slower
[7]. In spite of all these "meh" results, I will keep pushing on
performance: I still believe we should be able to unlock some
improvements. Any ideas for things to try are welcome.
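A quick way to run this kind of microbenchmark is sketched below with
the standard timeit module. It is not the benchmark from [6]; the list
size, repeat counts, and the alternative ways of building the list are
assumptions chosen for illustration. Running it under CPython and
under PyPy with the JIT disabled (for example `pypy --jit off`) gives a
rough sense of interpreter-only costs, with the usual caveat that
microbenchmark wins may not show up in the full suite.

```python
import timeit


def bench(size=10_000, number=200, repeat=5):
    """Best time (seconds per call) for a few ways to build a list of `size` Nones."""
    stmts = {
        "multiply": "[None] * size",
        "comprehension": "[None for _ in range(size)]",
        "append loop": (
            "lst = []\n"
            "for _ in range(size):\n"
            "    lst.append(None)"
        ),
    }
    results = {}
    for name, stmt in stmts.items():
        times = timeit.repeat(stmt, repeat=repeat, number=number,
                              globals={"size": size})
        # The minimum is the run least disturbed by other load on the machine.
        results[name] = min(times) / number
    return results


if __name__ == "__main__":
    for name, secs in bench().items():
        print(f"{name:>14}: {secs * 1e6:8.1f} us")
```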
Matti

[0] https://build-summary.pypy.org/about
[1] https://github.com/pypy/pypy-ci
[2] https://github.com/pypy/pypy/pull/5488, the results can be seen
[3] http://127.0.0.1:8000/comparison/?exe=22:L:py3.11&ben=all&env=3,4&hor=true&bas=22:L:py3.11@4&chart=normal+bars
[4] https://build-summary.pypy.org/summary?branch=win-rpython
[5] https://build-summary.pypy.org/summary?revision=3e97da6d7a91&revision=89d59559f278
[6] https://github.com/pypy/pypy/pull/5469
[7] https://speed.pypy.org/comparison/?exe=8:L:main,8:L:memset&ben=all&env=4&hor=true&bas=8:L:main&chart=normal+bars
