Hi.

Here is an update about all the churn you might have been seeing in the github/pypy org and around speed.pypy.org.

Maciej (and baroquesoftware) have been sponsoring our benchmarker runner for quite a while now. It runs in a xenial64 chroot, which provides gcc5.4.1. This matches our buildbot linux infrastructure, which runs in docker images based on manylinux2014, also using gcc5.4.1. The buildbot master also runs on a baroquesoftware machine, using buildbot 0.8.8, which is quite old. It uses python2.7 and has a heavily specialized summary view.

Our 6 remaining buildbot workers are also quite old.

- The linux64 machine is one donated by King's College (thanks to them for that). It has been running for quite a number of years now with few interruptions.
- The linux32 worker runs on the benchmarker machine, with a lock to prevent it from running when the benchmarks run.
- The macos workers run on a mac-mini with an M1 processor in my house; the arm64 and x86_64 workers run together.
- The windows64 worker runs in a windows 10 VM on my desktop machine.

Maciej and baroquesoftware have become less involved in the project. That's fair: it's open source, and it is neither generating much income (to say the least) nor actively getting new features. It is time to update our software and hardware stack, assuming the project is going to continue to function. I came up with a plan to:

- set up a new benchmarking machine (done)
- move buildbot master off baroquesoftware machines (done)
- move the 32-bit buildbot worker onto the new benchmarking machine (TBD)
- try to use the github actions we already have as a replacement for buildbot testing, which will save updating all the worker machines (in progress)
- update the whole software stack to manylinux_2_28, which uses gcc14 (in progress)

I have set up benchmarker2: an AMD Ryzen 5 3600 6-Core machine. The advantage of this zen2 machine, besides being less expensive than zen 5 machines (since it is not the latest and greatest), is that the CPU's cores are split into two separate core complexes (CCX).
I can isolate 3 cores for benchmarking, and still run buildbot master and the 32-bit buildbot worker on it (keeping the existing locking mechanism).

I also set up a new flask-based service, https://build-summary.pypy.org [0], that replicates the summary page pypy developed on top of buildbot. This is needed for two reasons: to display the github actions test results in the way PyPy developers are used to, and to allow us to move forward past buildbot 0.8.8. Newer versions of buildbot do not allow exposing the twistd endpoints we used for the customized summary. The new service is live, as is benchmarker2.

In order to update the dockers, I created a pypy-ci repo [1] with manylinux_2_28-based dockerfiles. I started a pypy branch [2] to do a full rpython run on the github workers with the newer os and compiler versions. The branch required updating rpython itself to run tests with gcc14. The newer compiler is pickier about function definitions, and the binutils assembly is slightly different, so some fixes were needed. The windows machine on github actions is _much_ faster than the vm. The entire run (split over 5 jobs, with 4 parallel tests running on each job) takes around 30 minutes for each of the 6 machines. You can see the branch summary results on the new service [4]. The service is still a bit of a work-in-progress; the comparison-to-main feature may be too noisy. Note the coding to report the source of the logs: '*' for github actions and '+' for buildbot [5].

I also hacked at speed.pypy.org to better display the two benchmarking machines. The comparison page and timeline page now both allow displaying across more than one environment, so you can see that the new benchmarker2 machine is slightly faster than the older benchmarker machine [3]. The difference would be larger, but I disabled "turbo" speed boosting on benchmarker2 to keep the results more consistent.

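For anyone who wants to approximate this kind of setup when timing things locally, here is a minimal sketch of pinning a process to a fixed set of cores before measuring. It is not the benchmarker2 configuration: the core numbers in BENCH_CORES and the helper names pin_to_cores and time_once are hypothetical, os.sched_setaffinity is Linux-only, and pinning alone does not keep other processes off those cores (real isolation needs kernel-level setup as well).

```python
import os
import time

# Hypothetical core numbers, for illustration only.
BENCH_CORES = {3, 4, 5}


def pin_to_cores(wanted):
    """Restrict this process to the wanted cores that exist here.

    Returns the set of cores the process may run on afterwards. If none
    of the wanted cores is available, the affinity is left unchanged.
    """
    available = os.sched_getaffinity(0)
    usable = set(wanted) & available
    if not usable:
        return available
    os.sched_setaffinity(0, usable)
    return os.sched_getaffinity(0)


def time_once(func):
    """Return the wall-clock seconds taken by a single call to func."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("running on cores:", sorted(pin_to_cores(BENCH_CORES)))
    print("seconds:", time_once(lambda: sum(range(1_000_000))))
```

A single timing like this is still noisy; repeating the measurement and comparing the spread gives a better picture of how much the pinning helps.
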
Besides updating to get security and other benefits, my real motivation for all this churn has been to try to improve cold-interpreter (non-JIT) performance. Carl and I recently worked on computed gotos and on inlining stack overflow checks, but neither seemed to do much. Moving to gcc 14 does not seem to have changed that either. Making `lst = [None] * size` faster [6] seemed promising in the microbenchmark, but the full benchmarking comparison shows, as with the other changes, some benchmarks a bit faster and some a bit slower [7].

In spite of all these "meh" results, I will try to keep pushing on performance. I still believe we should be able to unlock some enhancements. Any ideas for things to try are welcome.

Matti

[0] https://build-summary.pypy.org/about
[1] https://github.com/pypy/pypy-ci
[2] https://github.com/pypy/pypy/pull/5488, the results can be seen
[3] http://127.0.0.1:8000/comparison/?exe=22:L:py3.11&ben=all&env=3,4&hor=true&bas=22:L:py3.11@4&chart=normal+bars
[4] https://build-summary.pypy.org/summary?branch=win-rpython
[5] https://build-summary.pypy.org/summary?revision=3e97da6d7a91&revision=89d59559f278
[6] https://github.com/pypy/pypy/pull/5469
[7] https://speed.pypy.org/comparison/?exe=8:L:main,8:L:memset&ben=all&env=4&hor=true&bas=8:L:main&chart=normal+bars
