https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628

--- Comment #23 from Ken Jin <kenjin4096 at gmail dot com> ---
> Hi Ken, my patch has been merged into GCC master branch.  Can you give it a 
> try?

I ran a benchmark. Note that this is not 100% what we use in CPython release
builds, as I had to pass `-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer`
to all my configurations to get the main branch of GCC to not miscompile the
current code.

LTO+PGO was enabled for all configurations; I disabled PGO only around the tail
call bytecode handlers, as it regressed performance for those. Intel Turbo
Boost was off.

NO preserve_none:
Pystone(1.1) time for 1000000 passes = 1.98081
This machine benchmarks at 504844 pystones/second

preserve_none:
Pystone(1.1) time for 1000000 passes = 1.7661
This machine benchmarks at 566219 pystones/second

I also took some benchmarks from the pyperformance benchmark suite that are
Python-heavy. Specifically, nbody, spectral_norm, and deltablue.

Mean +- std dev: [NO_preserve_none_nbody] 108 ms +- 2 ms ->
[preserve_none_nbody] 95.3 ms +- 2.0 ms: 1.13x faster
Mean +- std dev: [NO_preserve_none_spectralnorm] 95.7 ms +- 0.4 ms ->
[preserve_none_spectralnorm] 83.8 ms +- 0.3 ms: 1.14x faster
Mean +- std dev: [NO_preserve_none_deltablue] 3.59 ms +- 0.03 ms ->
[preserve_none_deltablue] 3.24 ms +- 0.02 ms: 1.11x faster

So it seems the actual speedup is in the ~10% range (roughly 11-14% on the
benchmarks above) for preserve_none vs no_preserve_none.
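
For anyone who wants to check the arithmetic, here is a minimal sketch that
recomputes the ratios from the timings quoted above. The `results` table and
the `speedup` helper are names made up for this illustration; only the numbers
come from the benchmark output.

```python
# Timings copied from the benchmark output above as (baseline, preserve_none).
# Lower is better for every entry.
results = {
    "pystone (s)": (1.98081, 1.7661),
    "nbody (ms)": (108.0, 95.3),
    "spectral_norm (ms)": (95.7, 83.8),
    "deltablue (ms)": (3.59, 3.24),
}


def speedup(baseline, candidate):
    """Return how many times faster the candidate ran than the baseline."""
    return baseline / candidate


# Hypothetical helper table: benchmark name -> speedup ratio.
speedups = {name: speedup(b, c) for name, (b, c) in results.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x faster")
```

The three pyperformance ratios come out the same as the "x faster" figures
pyperf printed, and pystone lands at about 1.12x.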

On my system, labels-as-values (indirect goto) performs roughly the same as
preserve_none + tail calls. However, note that PGO is disabled for the tail
call handlers, and CPython has been optimizing for the indirect goto style for
over 10 years! So the fact that the performance matches is actually incredibly
good.
