[issue44037] Broad performance regression from 3.10a7 to 3.10b2 with python.org macOS binaries

2021-08-03 Thread Ned Deily


Ned Deily  added the comment:

Summary: With the 3.10.0rc1 release, the performance regression in 
var_access_benchmark that first appeared with the 3.10.0b1 binaries in the 
python.org macOS universal2 installer is now resolved. With the rc1 release, the 
performance of this micro benchmark has actually improved over alpha7 in many 
cases, most notably on Intel Macs.  We have also taken some steps to reduce the 
chances of significant performance regressions in future releases of the macOS 
binaries going undetected prior to release.

Details: All var_access_benchmark results (Appendix A and B) are from running 
macOS Big Sur 11.5.1 (the current release at the moment). The rc1 binaries were 
also built on 11.5.1 using the current Apple (Xcode) Command Line Tools 12.5.1. 
In general, we build the python.org macOS installers on the most current macOS 
and Command Line Tools that have been released by Apple prior to that Python 
release. The a7 and b2 released universal2 binaries were thus made on then 
current versions of macOS Big Sur and the Command Line Tools, each different 
from rc1.
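
Since results depend on the macOS update, the machine, and the compiler used 
for the build, it can help to record those details next to any benchmark 
numbers. The following is only a minimal sketch using the standard library 
platform module; the helper name describe_environment is hypothetical and not 
part of any benchmark script.

```python
import platform


def describe_environment():
    """Collect the interpreter and OS details worth recording with results."""
    # mac_ver() returns (release, versioninfo, machine); release is an
    # empty string when not running on macOS.
    release, _versioninfo, _machine = platform.mac_ver()
    return {
        "python": platform.python_version(),
        "compiler": platform.python_compiler(),
        "macos": release or None,
        "machine": platform.machine(),  # e.g. "x86_64" or "arm64"
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```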

To put these results in context, let me once again note that the primary goal 
of the python.org macOS installers for many years has been to provide a 
convenient way to run Python on macOS on as many different Mac systems as 
possible with one download. For 3.10, the universal2 installer variant we 
provide is designed to run natively on all Macs that can run any version of 
macOS from 10.9 through the current macOS 11 Big Sur (and soon to include macOS 
12 Monterey), both Intel and Apple Silicon (M1) Macs.  To be able to run on 
such a wide variety of systems obviously requires some compromises. Thus 
providing optimum performance in every situation has *never* been a goal for 
these installers.  That doesn't mean we should totally ignore performance 
issues and I am grateful to Raymond for bringing this issue forward.  But, and 
not to belabor the point: for those situations where optimum performance is 
important, there is no substitute for using a Python built and optimized 
explicitly for that environment; in other words, don't look to the python.org 
binaries for those cases.

As an example, with 3.10.0b1, we introduced the first python.org macOS builds 
that use newer compile- and link-time optimizations (--enable-optimizations and 
--with-lto). There were some kinks with that change that have since been ironed 
out. But the performance improvements aren't uniform across all systems. It 
appears that Intel Macs see much more of an improvement than Apple Silicon Macs 
do. There are probably a couple of reasons for that: for one, the longer 
experience with the tool chain for Intel archs, but, perhaps more importantly, 
we currently build universal2 binaries on Intel-based Macs which means 
performance-based optimizations by the tool chain are based on the performance 
on an Intel arch which may not be the same as performance on an Apple Silicon 
(arm64) CPU. That's a topic for future investigation but it is an example of 
the potential pitfalls when looking at performance.
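
One way to see whether a given interpreter was configured with these options 
is to inspect the configure arguments recorded at build time. This is a rough 
sketch, assuming the build went through configure so that 
sysconfig.get_config_var("CONFIG_ARGS") is populated; the helper name 
build_optimization_flags is hypothetical.

```python
import sysconfig


def build_optimization_flags(config_args=None):
    """Report whether the PGO and LTO configure options were recorded."""
    if config_args is None:
        # CONFIG_ARGS can be None on builds that did not use configure.
        config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return {
        "pgo": "--enable-optimizations" in config_args,
        "lto": "--with-lto" in config_args,
    }


if __name__ == "__main__":
    print(build_optimization_flags())
```

Note that this only reports what was passed to configure; it says nothing 
about how effective the optimizations are on a particular CPU.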

Another example is that while there are some significant differences in the 
var_access_benchmark results, which targets specific micro-operations in the 
Python virtual machine, there is a different story looking at the larger-scale 
"realworld" benchmarks in the pyperformance package.  When I first started 
looking at this issue, I ran pyperformance and var_access_benchmark and found 
that, in general, there were *significant* performance improvements in most 
pyperformance benchmarks between 3.10.0a7 and 3.10.0b2 even though the 
var_access_benchmark showed performance regressions.  For 3.10.0rc1, the 
pyperformance results have mostly improved even more.  I have with some 
trepidation included some pyperformance results in Appendix C (3.10.0a7 vs 
3.10.0b2) and Appendix D (3.10.0a7 vs 3.10.0rc1).  Note that these results were 
run on a different, faster Intel Mac than the var_access_benchmark results and 
were run under different updates of macOS 11 so they should be viewed 
cautiously; as always with performance issues, your mileage may vary.
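
To illustrate what a micro benchmark of this kind measures, here is a small 
sketch in the same spirit: it times repeated reads of a local and a global 
variable with timeit. It is not the var_access_benchmark script itself; the 
function names and the loop sizes are assumptions chosen for illustration, and 
the per-access figure includes loop overhead.

```python
import timeit

v_global = 1


def read_local(trials=range(1000)):
    v_local = 1
    for _ in trials:
        # Five bare name loads per iteration.
        v_local; v_local; v_local; v_local; v_local


def read_global(trials=range(1000)):
    for _ in trials:
        v_global; v_global; v_global; v_global; v_global


def ns_per_access(func, accesses=5000, repeat=5, number=100):
    """Best-of-repeat time per variable access, in nanoseconds."""
    # Each call to func performs `accesses` reads (1000 iterations x 5).
    best = min(timeit.repeat(func, repeat=repeat, number=number))
    return best / (number * accesses) * 1e9


if __name__ == "__main__":
    print(f"read_local:  {ns_per_access(read_local):.1f} ns")
    print(f"read_global: {ns_per_access(read_global):.1f} ns")
```

Numbers from a loop like this say little about whole-program performance, 
which is why suites such as pyperformance can tell a different story.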

So by now, one might be curious as to how the performance regression was fixed 
in rc1. The answer at the moment is: I'm not totally sure! There are a *lot* of 
moving parts in making a Python release and the binaries that we provide for 
macOS. While I do try to control the build environments as much as possible 
(for example, by using dedicated virtual machines for builds) and be 
conservative about making changes to the build process and environments, 
especially later in a development/release cycle, as noted above I normally try 
to keep up with the most recent Apple updates to a given macOS version and 
build tools to ensure everyone is getting the benefit of the latest security 
and performance fixes. There have been Apple updates along the way between a7, 
b2, and rc1. So I can't rule those out 

[issue44037] Broad performance regression from 3.10a7 to 3.10b2 with python.org macOS binaries

2021-07-26 Thread Ondrej Novak


Change by Ondrej Novak :


--
nosy: +andrenvk

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44037] Broad performance regression from 3.10a7 to 3.10b2 with python.org macOS binaries

2021-06-15 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

The problem is still present in Python 3.10b2.

--
title: Broad performance regression from 3.10a7 to 3.10b1 with python.org macOS 
binaries -> Broad performance regression from 3.10a7 to 3.10b2 with python.org 
macOS binaries
