[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-10-10 Thread Guido van Rossum


Change by Guido van Rossum :


--
type: crash -> performance

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-10-10 Thread Guido van Rossum


Guido van Rossum  added the comment:

Someone whose name I don't recognize (MagzoB Mall) just changed the issue type 
to "crash" without explaining why. That user was created today, and has no 
triage permissions. Mind if I change it back? It feels like vandalism.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-10-10 Thread MagzoB Mall


Change by MagzoB Mall :


--
type: performance -> crash




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-10-10 Thread Brandt Bucher


Change by Brandt Bucher :


--
nosy: +brandtbucher -brandtbucher2




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-10-10 Thread Brandt Bucher


Change by Brandt Bucher :


--
nosy: +brandtbucher2




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-10-10 Thread neonene


neonene  added the comment:

The official 3.10.0 binary is as slow as rc2.

Many files are not updated in the source archive or in 
b494f5935c92951e75597bfe1c8b1f3112fec270, so I'm not sure whether the delay is 
intentional.

We have no choice but to wait for 3.10.1.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-10-09 Thread neonene


neonene  added the comment:

PR28475 is not in the official source archive.
https://www.python.org/ftp/python/3.10.0/Python-3.10.0.tar.xz

I'll check later whether the official binary has the fix.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-10-08 Thread Ma Lin


Ma Lin  added the comment:

Today I tested with msvc2022-preview: the `__forceinline` attribute does not 
hang the build.

64-bit PGO builds:

28d28e0~1,vc2022   : baseline
28d28e0~1+F,vc2022 : 1.02x slower  <1>
28d28e0,vc2022     : 1.03x slower  <2>
28d28e0+F,vc2022   : 1.03x slower
3.10 final,vc2022  : 1.03x slower
3.10 final+F,vc2022: 1.03x slower
28d28e0~1,vc2019   : 1.00x slower  <3>

28d28e0~1 is the last fast commit, 28d28e0 is the first slow commit.
`+F` means adding the `__forceinline` attribute to all inline functions in object.h.
vc2019 and vc2022 are the latest versions.

<1> Forcing inline is slower.
<2> 28d28e0 is still slow, but not that much.
<3> Normally, msvc2019 and msvc2022 have the same performance.

Is it possible to write a PGO profile for 28d28e0? 
https://github.com/python/cpython/commit/28d28e053db6b69d91c2dfd579207cd8ccbc39e7

msvc2022 will be released in November this year, and maybe subsequent versions 
can be built with msvc2022.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-10-07 Thread Steve Dower


Steve Dower  added the comment:

If we know which parts of the function are critical, perhaps we should 
be designing a PGO profile that actually hits them all? The current 
profile is very arbitrary, basically just waiting for someone motivated 
enough to figure out a better one.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-10-07 Thread Steve Dower


Steve Dower  added the comment:

I would very much appreciate any new compiler be compatible with the 
standard Windows debuggers (windbg primarily, but I imagine most 
contributors would like it to keep working from VS).

Last I heard, clang is fine as a compiler for debugging if you use the 
MSVC linker to generate debug info, though it still isn't as complete as 
MSVC (ultimately by definition, since MSVC is the 
standard-by-implementation for this stuff). And I've got no idea 
how/whether link-time optimisation works when you mix tools, but I'd 
have to assume it doesn't.

Switching compiler may prevent me from being able to analyse crash 
reports (and by me, I mean the automated internal tools that do it for 
me), and certainly parts of the Windows build rely on MSVC-specific 
functionality right now (not in the main DLL) so we'd end up needing 
both for a full build.

Also, just to put it out there, I'm not volunteering to rewrite the 
build system :) If the steering council signs off on switching, I won't 
block it, but I have more interesting things to work on.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-10-07 Thread STINNER Victor


STINNER Victor  added the comment:

> How feasible would it be to use Clang or GCC on Windows?

clang seems to have good Windows support and tries to be ABI compatible with 
MSC, which is a must-have to keep wheel package support (especially for the 
stable ABI, used by PyQt on Windows for example).

Moreover, there are ways to cross-build Python from another platform to Windows 
which can be convenient ;-)

I don't know the Windows ecosystem. Do people want to get VS debugger for 
example? Is clang compatible with the VS debugger?

See the discussion of 2014: "Status of C compilers for Python on Windows"
https://mail.python.org/archives/list/python-...@python.org/thread/SYWDJ23AQDPWQN7HD6M6YCSGXERCHWA2/

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-10-07 Thread Mark Shannon


Mark Shannon  added the comment:

Sadly the MSVC team are claiming that this isn't a bug in their compiler.
Not sure how we convince them that it is. The website rejects any attempt to 
reopen the issue.

How feasible would it be to use Clang or GCC on Windows?

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-30 Thread Ken Jin


Ken Jin  added the comment:

@Pablo
> I disagree. This is a regression/bug and we don't advertise "known bugs" in 
> the what's new, the same for any other bugfix that has been delayed until 
> 3.10.1

Alright, in hindsight 3.10 What's New was a bad suggestion on my part. I wonder 
if there's a better location for such news though.

>>  Some use cases are slower (by >10%!)
> Can you still reproduce this with PR 28475?

Yes that number is *with* PR28475. Without that PR it was worse. The second 
pyperformance comparison in this file is 3.10a7 vs PR28475 
https://bugs.python.org/file50293/PR28475_vs_310rc2_vs_310a7.txt. Omitting 
python_startup (unstable on Windows) and unpack_sequence (microbenchmark):

- logging_silent: 250 ns +- 7 ns -> 291 ns +- 10 ns: 1.16x slower
- hexiom: 14.0 ms +- 0.3 ms -> 15.7 ms +- 3.0 ms: 1.12x slower
- logging_simple: 16.1 us +- 0.2 us -> 18.0 us +- 0.5 us: 1.12x slower
- nbody: 215 ms +- 7 ms -> 235 ms +- 4 ms: 1.09x slower
- logging_format: 17.8 us +- 0.3 us -> 19.4 us +- 0.5 us: 1.09x slower
- richards: 104 ms +- 6 ms -> 112 ms +- 3 ms: 1.08x slower
- xml_etree_parse: 218 ms +- 3 ms -> 235 ms +- 3 ms: 1.08x slower
- sqlalchemy_imperative: 34.5 ms +- 0.9 ms -> 37.1 ms +- 1.1 ms: 1.08x slower
- xml_etree_iterparse: 158 ms +- 2 ms -> 168 ms +- 2 ms: 1.06x slower
- pathlib: 255 ms +- 6 ms -> 271 ms +- 3 ms: 1.06x slower
- pyflate: 963 ms +- 10 ms -> 1.02 sec +- 0.02 sec: 1.06x slower
- unpickle_pure_python: 446 us +- 11 us -> 471 us +- 9 us: 1.06x slower
(anything <= 1.05x slower is snipped since it could be noise)
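
For readers who want to check how the "N.NNx slower" figures relate to the raw timings, here is a minimal sketch. It uses four of the timing pairs listed above, converted to seconds; the helper names and the 1.05 cut-off constant are assumptions made for illustration and are not part of pyperf.

```python
# Timing pairs copied from the list above, in seconds: (3.10a7, PR28475 build).
timings = {
    "logging_silent": (250e-9, 291e-9),
    "hexiom": (14.0e-3, 15.7e-3),
    "nbody": (215e-3, 235e-3),
    "pyflate": (963e-3, 1.02),
}

# Assumption: same cut-off as the "snipped" note in the comment above.
NOISE_THRESHOLD = 1.05


def slowdown(before: float, after: float) -> float:
    """Return how many times slower `after` is compared to `before`."""
    return after / before


def significant_slowdowns(data, threshold=NOISE_THRESHOLD):
    """Map benchmark name to ratio, keeping only ratios above the threshold."""
    ratios = {name: slowdown(b, a) for name, (b, a) in data.items()}
    return {name: r for name, r in ratios.items() if r > threshold}


# Print the largest slowdowns first.
for name, ratio in sorted(significant_slowdowns(timings).items(),
                          key=lambda item: -item[1]):
    print(f"{name}: {ratio:.2f}x slower")
```

The ratio is simply the new mean divided by the old mean, so anything at or below the threshold is treated as possible noise and dropped.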

At this point I don't know if we have any quick fixes left. So maybe we should 
open another issue for 3.11 and consider factoring out uncommon opcodes into 
functions like Victor and Mark suggested. We could make use of the opcode stats 
the faster-cpython folks have collected https://github.com/faster-cpython/tools.
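
As a rough sketch of how opcode stats could guide that choice: given per-opcode execution counts, list the opcodes whose share of all executed instructions falls below some threshold. The counts, the threshold, and the function name below are made up for illustration; they are not taken from the faster-cpython data.

```python
def cold_opcodes(counts, max_share=0.001):
    """Return the names of opcodes whose execution share is below max_share."""
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(name for name, n in counts.items() if n / total < max_share)


# Made-up counts for illustration; real numbers would come from collected stats.
sample_counts = {
    "LOAD_FAST": 500_000,
    "LOAD_CONST": 300_000,
    "CALL_FUNCTION": 199_000,
    "MATCH_MAPPING": 600,
    "SETUP_ANNOTATIONS": 400,
}

print(cold_opcodes(sample_counts))
```

Opcodes reported as cold would be candidates for moving out of the main evaluation function into helper functions; whether that actually helps the MSVC PGO build would still have to be measured.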

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread neonene


neonene  added the comment:

_PyEval_EvalFrameDefault() may also need to be divided.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread Ma Lin


Ma Lin  added the comment:

I think this is a bug in MSVC2019, not really a regression in CPython. So 
changing the code of CPython is just a workaround. Maybe the right direction is 
to prompt MSVC to fix the bug, otherwise there will be more trouble when 3.11 
is released a year later.

Judging from MSVC's reply, it seems they didn't realize that it was a bug, but 
suggested adjusting the training samples and using `__forceinline`. They don't 
know that `__forceinline` hangs the build process since 28d28e0.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread neonene


neonene  added the comment:

@pablogsal
I'm OK with more effective fixes in 3.10.1 and later.

Thanks all, and thanks kj and malin for all the help.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread neonene


neonene  added the comment:

I submitted 2 drafts in a hurry. Sorry for short explanations.
I'll add more reports.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread neonene


Change by neonene :


--
pull_requests: +27001
pull_request: https://github.com/python/cpython/pull/28631




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread neonene


Change by neonene :


--
pull_requests: +27000
pull_request: https://github.com/python/cpython/pull/28630




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

> IMO, we should note in What's New that only for Windows, 3.10.0 has a slight 
> slowdown.

I disagree. This is a regression/bug and we don't advertise "known bugs" in the 
what's new, the same for any other bugfix that has been delayed until 3.10.1

>  Some use cases are slower (by >10%!)

Can you still reproduce this with PR 28475?

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread Ken Jin


Ken Jin  added the comment:

Sadly, I can't reproduce the speedups OP reported from disabling 
test_patma.TestTracing. It's not any faster than what we have with PR28475. 
(See attached pyperformance).

I'm looking forward to their other fix :). Even if it comes in 3.10.1 that's 
still a huge win. I don't think everyone immediately upgrades when a new Python 
version arrives.

IMO, we should note in What's New that only for Windows, 3.10.0 has a slight 
slowdown. Some use cases are slower (by >10%!), while some people won't feel a 
thing. (Then again, maybe this is offset by the LOAD_ATTR opcache in 3.10 and 
we get a net zero effect?). I'll submit a PR soon if the full fix misses 3.10.0.

--
Added file: https://bugs.python.org/file50315/310rc2patched_vs_310rc2notrace.txt




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

> I have another fix.

If you have another fix, please create a PR ASAP and get it reviewed and merged 
by a core dev in the next 24 hours, otherwise it will need to wait until 3.10.1.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread neonene


neonene  added the comment:

I have another fix.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread Ken Jin


Ken Jin  added the comment:

> If someone wants this issue to be solved in 3.10.0 it must be resolved ASAP.

neonene suggested that the tracing tests for pattern matching (added in 
3.10b4/rc1) caused PGO to wrongly optimize the more uncommon tracing paths in 
ceval. I will verify their one-line fix to disable that test for PGO and report 
back within the next 48 hours.

Related issue: https://bugs.python.org/issue44600

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

This means that if anyone wants to pursue the 4% that is left the fix must be 
committed within 24 hours.

--
priority: release blocker -> 




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

I'm landing PR 28475 for now as it improves the situation from 7%  to 4% 
slowdown and is contained enough.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

If someone wants this issue to be solved in 3.10.0, it must be resolved ASAP. I 
am going to start freezing the release branch in a day or two to start 
testing the final candidate as much as possible, so any fix has 24h at most to 
be merged into 3.10 and cherry-picked into the release branch.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-23 Thread neonene


neonene  added the comment:

3.10rc2 Python/ceval.c (starting at line 1306):

    #define DISPATCH() \
    { \
        if (trace_info.cframe.use_tracing OR_DTRACE_LINE OR_LLTRACE) { \
            goto tracing_dispatch; \

Among the 44 pgo-tests, only test_patma.TestTracing hits the condition above. 
On Windows, it seems that skipping it tightens the profile of PR28475 a bit. 
Additional tests such as test_threading(.ThreadTests.test_frame_tstate_tracing) 
might also cause some amount of variation or vice versa.

3.10rc2 x64 PGO    : 1.00
+ PR28475
  with TestTracing : 1.05x faster (slow  3, fast 46, same  9)
  without          : 1.06x faster (slow  5, fast 52, same  1)

  with TestTracing : 1.00
  without          : 1.01x faster (slow 19, fast 27, same 12)

(Details: PR28475_skip1test_bench.txt)


Does test_patma.TestTracing need training for match-case performance?

--
Added file: https://bugs.python.org/file50296/PR28475_skip1test_bench.txt




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-21 Thread neonene


neonene  added the comment:

PR28475 PGO is 2% slower than the patch I pasted in msg401743.
The function sizes are almost the same (+1:goto,+1:label), and there is no 
performance gap between release builds.

I suspect the following.

1. PGO is too sensitive to function size near the limit.
2. PR28475 is not fully covered by the 44 tests. (msg401346)

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-21 Thread Mark Shannon


Mark Shannon  added the comment:

The only other change of any obvious significance to _PyEval_EvalFrameDefault 
since 3.10a7 are the changes to MATCH_MAPPING and MATCH_SEQUENCE and those make 
_PyEval_EvalFrameDefault smaller.

We may need to look elsewhere for the remaining ~4% performance.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-21 Thread neonene


neonene  added the comment:

To be fair, the slowdowns between PR25244 and b1 seem to be an accumulation of 
"1.00x slower" from every commit. I don't know about the commits after b1.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-21 Thread Ken Jin


Ken Jin  added the comment:

Like what Ma Lin and neonene have mentioned above, PR28475 recovered half of 
the lost performance. It's unfortunately still 4% slower than 3.10a7.

>pyperf compare_to 310a7.json 310rc2.json 310rc2patched.json

Geometric mean (versus 3.10a7)
==============================

310rc2: 1.09x slower
310rc2patched: 1.04x slower
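
For context, a geometric mean over per-benchmark ratios can be computed as in the following minimal sketch. The ratios are hypothetical and are not taken from the attached file; the function name is likewise an assumption, and pyperf's own implementation may differ in detail.

```python
from statistics import geometric_mean


def mean_slowdown(ratios):
    """Geometric mean of per-benchmark ratios (new time / old time)."""
    return geometric_mean(ratios)


# Hypothetical ratios: values above 1.0 are slower, below 1.0 are faster.
example_ratios = [1.16, 1.09, 1.00, 0.97]

print(f"{mean_slowdown(example_ratios):.2f}x slower")
```

A geometric mean is the usual choice for ratios because a 2x slowdown and a 2x speedup cancel out to exactly 1.0, which an arithmetic mean would not give.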

Attached pyperf benchmark comparisons file.

--
Added file: https://bugs.python.org/file50293/PR28475_vs_310rc2_vs_310a7.txt




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-21 Thread Ma Lin


Ma Lin  added the comment:

PR28475:
64-bit build is 1.03x slower than 28d28e0~1
32-bit build is 1.04x slower than 28d28e0~1

28d28e0~1 is the last good commit.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-21 Thread neonene


neonene  added the comment:

I built 3.10rc2 PGO with PR28475 applied, and posted the inliner's log.
In the log, the 4 callees mentioned above, which were "hard reject"ed before, 
are now inlined.

As for the performance, a few more reporters may be needed, but they don't 
need to care about noise in the apparent gap.

310rc2 x64 PGO           : 1.00
 + PR28475 build1 bench1 : 1.05x faster (slower  7, faster 43, nochange  8)
                  bench2 : 1.05x faster (slower  2, faster 43, nochange 13)
           build2        : 1.05x faster (slower  4, faster 45, nochange  9)

310rc2 x64 release       : 1.00
 + PR28475               : 1.01x faster (slower 14, faster 25, nochange 19)


Is Windows involved in the faster-cpython project? If so, the project should be 
provided with Windows machines for validation.

--
Added file: https://bugs.python.org/file50291/PR28475_inline.log




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-20 Thread Guido van Rossum


Change by Guido van Rossum :


--
nosy: +gvanrossum




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-20 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

Can someone repeat the benchmarks with 
https://github.com/python/cpython/pull/28475 ?

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-20 Thread Mark Shannon


Mark Shannon  added the comment:

If we are hitting a size limit for PGO, then we need to reduce the size of 
_PyEval_EvalFrameDefault, to let the compiler do its job.
Force inlining stuff is not going to help.

Reverting https://github.com/python/cpython/pull/25244 for 3.10 seems to be the 
cleanest way to do this.

Would someone with a reliable way to test performance on Windows test the 
effect of https://github.com/python/cpython/pull/28475, please?

Longer term we need to get PGO in MSVC working on larger functions, but I doubt 
that will be possible for 3.10.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-20 Thread Mark Shannon


Change by Mark Shannon :


--
pull_requests: +26873
pull_request: https://github.com/python/cpython/pull/28475




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-20 Thread neonene


neonene  added the comment:

>release with the performance regression

I'm OK with that option. The limitation of PGO seems a bit weird to me, and it 
might be unexpected for the MSVC team.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-19 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

I concur with Ma Lin.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-19 Thread Ma Lin


Ma Lin  added the comment:

As in OP's benchmark, if we convert the inline functions to macros in object.h, 
the 3.10 branch is 1.03x faster, but still 1.07x slower than 28d28e0~1.
@vstinner could you prepare such a PR as a candidate fix?

There seem to be two ways to solve it in the short term:
1. Split the giant function.
2. Contact the MSVC team to see if there is a quick solution, such as 
undocumented options.

But the release date is too close. The worst outcome is to release with the 
performance regression and note on the download page that there is a 
performance regression: if you care about performance, please use 3.9.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-19 Thread Erlend E. Aasland


Change by Erlend E. Aasland :


--
nosy: +erlendaasland




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-18 Thread Ma Lin


Ma Lin  added the comment:

> In my case, pgo got stuck on linking with the object.h.

Me too. Since commit 28d28e0 (the first commit to slow down the PGO build), 
adding the `__forceinline` attribute to the _Py_DECREF() function in object.h 
makes the PGO build hang (>50 minutes).

So PR 28427 may not be a short-term solution.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-18 Thread Ken Jin


Ken Jin  added the comment:

@Pablo,
> If  is correct ...

For some verification, I ran pyperformance on Win10 AMD64 with the Python 
3.10a7 and 3.10rc2 x64 binaries downloaded directly from the python.org 
release pages. The results corroborate neonene's (please see the attached 
file). In short, there was a 1.09x slowdown on average.

FYI, `pyperf system tune` doesn't work on Windows. So I manually disabled turbo 
boost and Intel SpeedStep, but I didn't have time to research setting core 
affinity and the other stabilizations. Nonetheless, most of the benchmarks were 
stable.

> I am marking this as a release blocker until there is some agreement.
Got it. Setting as advised.

--
priority: high -> release blocker
Added file: https://bugs.python.org/file50286/310a7_vs_310rc2_bench.txt




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread neonene


neonene  added the comment:

> (32-bit: "1.07", 64-bit: "1.14": "higher the slower" wrote neonene)

32-bit and 64-bit are reversed. I compared b1 and a7 because anyone can 
confirm this with the official binaries. If the 7% from my patch has little to 
do with the gap, then I will be happy that 3.10 can be far faster.

>How can I build Python with PGO on Windows?

Try the following,

   PCbuild\build.bat -p x64 --no-tkinter --pgo

Before building, your object.h needs to replace
static inline int Py_ALWAYS_INLINE
with
static Py_ALWAYS_INLINE int 

In my case, pgo got stuck on linking with the object.h.
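
The object.h edit described above can be scripted. The following is a minimal sketch that applies the textual replacement to a string; the sample declaration is a hypothetical one-line excerpt and the function name is made up for illustration.

```python
OLD = "static inline int Py_ALWAYS_INLINE"
NEW = "static Py_ALWAYS_INLINE int"


def fix_declarations(header_text: str) -> str:
    """Apply the declaration-prefix replacement described above."""
    return header_text.replace(OLD, NEW)


# Hypothetical one-line excerpt, used only to show the effect.
sample = "static inline int Py_ALWAYS_INLINE example_func(void);"

print(fix_declarations(sample))
```

To use it on a real checkout, read the header's text, pass it through `fix_declarations()`, and write the result back before running the build command.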


I'm waiting for the reply from developercommunity.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread STINNER Victor


STINNER Victor  added the comment:

Can someone compare the main branch (commit 
e4044e9f893350b4623677c048d33414a77edf55) to the main branch + PR 28427 patch?

You can download the patch from:
https://patch-diff.githubusercontent.com/raw/python/cpython/pull/28427.patch

How can I build Python with PGO on Windows? Visual Studio has two profiles: 
PGinstrument and PGupdate.

I cannot find "PGO", "PGinstrument" or "PGupdate" in 
https://devguide.python.org/

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread STINNER Victor


STINNER Victor  added the comment:

I created a draft PR to mark functions like Py_INCREF() and Py_IS_TYPE() with 
__forceinline: PR 28427.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread STINNER Victor


Change by STINNER Victor :


--
pull_requests: +26837
pull_request: https://github.com/python/cpython/pull/28427




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread STINNER Victor


STINNER Victor  added the comment:

> If https://bugs.python.org/file50280/310rc2_benchmarks.txt is correct, this 
> means that we have a 7% slowdown in Windows PGO builds for 3.10, which I 
> don't think is acceptable.

What I understood is that the https://bugs.python.org/msg401743 patch makes 
Python 1.07x faster.

But https://bugs.python.org/issue45116#msg401143 compares Python 3.10b1 to 
Python 3.10a7: Python 3.10b1 is slower (32-bit: "1.07", 64-bit: "1.14": "higher 
the slower" wrote neonene).

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread STINNER Victor


STINNER Victor  added the comment:


New changeset e4044e9f893350b4623677c048d33414a77edf55 by Victor Stinner in 
branch 'main':
bpo-45116: Py_DEBUG ignores Py_ALWAYS_INLINE (GH-28419)
https://github.com/python/cpython/commit/e4044e9f893350b4623677c048d33414a77edf55


--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

If https://bugs.python.org/file50280/310rc2_benchmarks.txt is correct, this 
means that we have a 7% slowdown in Windows PGO builds for 3.10, which I don't 
think is acceptable.

I am marking this as a release blocker until there is some agreement. I remind 
you that if an agreement cannot be reached, you may reach the Steering Council 
for help reaching some conclusion.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread STINNER Victor


STINNER Victor  added the comment:

Raymond: "Please do as Steve asked and revert back to the previous stable, 
reliable code."

Is this issue really about Py_INCREF() being a static inline macro? Or is it 
more about the increased size of the _PyEval_EvalFrameDefault() function?

neonene's analysis seems to show that the PGO optimizer of the MSC compiler 
has thresholds depending on the function size, and _PyEval_EvalFrameDefault() 
crossed these thresholds. The Py_INCREF() static inline function only seems to 
be the tip of the iceberg: it's more a "side effect" than the root issue, no?

neonene showed in msg401743 that even adding *dead code* changes Python 
performance. So for me, it sounds weird to decide to change Py_INCREF() 
implementation only based on this analysis. Or maybe I missed something.

neonene's analysis starts with the commit 
28d28e053db6b69d91c2dfd579207cd8ccbc39e7 of PR 25244. Are you suggesting 
reverting this change? Mark Shannon is pushing many changes in ceval.c and the 
frame object. Multiple changes have been pushed on top of that commit since 
then, so I don't think that it can be easily reverted.

What I understood is that adding __forceinline on some static inline functions 
can make the performance regression of the commit 
28d28e053db6b69d91c2dfd579207cd8ccbc39e7 less bad. But I would prefer to 
validate that, since neonene's comparisons are not the one I'm looking for: 
main compared to main+Py_ALWAYS_INLINE.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread Ken Jin


Ken Jin  added the comment:

> How severe is the regression?

OP provided pyperformance of current 3.10 vs their patched version at 
https://bugs.python.org/file50280/310rc2_benchmarks.txt. The patch is at 
https://bugs.python.org/msg401743.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

> Pablo, should this be a release blocker?

How severe is the regression? If it is severe enough we can mark it as a release 
blocker, but a conclusion needs to be reached ASAP because I don't want to 
change a fundamental macro a few days before the release.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread Marc-Andre Lemburg


Marc-Andre Lemburg  added the comment:

FWIW: Back in the days of Python 1.5.2, the ceval loop was too big for CPU 
caches as well and one of the things I experimented with at the time was 
rearranging the opcodes based on how often they were used and splitting the 
whole switch statement we had back then in two parts. This resulted in a 10-20% 
speedup.

CPU caches have since gotten much larger, but the size of the loop still is 
something to keep in mind and optimize for, as more and more logic gets added 
to the inner loop of Python.

IMO, we should definitely keep forced inlines / macros where they are used 
inside hot loops, perhaps even in all of the CPython code, since the conversion 
to inline functions is mostly for hiding internals from extensions, not to hide 
them from CPython itself.

@neonene: Could you provide more details about the CPU you're using to run the 
tests?

BTW: Perhaps the PSF could get a few sponsors to add more hosts to 
speed.python.org, to provide a better overview. It looks as if the system is 
only compiling on Ubuntu 14.04 and running on an 11-year-old system 
(https://speed.python.org/about/). If that's the case, the system uses a server 
CPU with a 12MB cache 
(https://www.intel.com/content/www/us/en/products/sku/47916/intel-xeon-processor-x5680-12m-cache-3-33-ghz-6-40-gts-intel-qpi/specifications.html).

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Pablo, should this be a release blocker?

--
nosy: +lemburg, pablogsal




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

> Right now, I'm not sure. The heuristic to decide
> if a function is inlined or not seems to depend
> a lot on the compiler and the compiler options.

That is exactly correct.  And it is why we should
use the macro form which is certain to be inlined. 

Please do as Steve asked and revert back to the
previous stable, reliable code.
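
The trade-off under discussion can be sketched with a toy object rather than
the real PyObject. All names here (demo_object, DEMO_DECREF_MACRO,
demo_decref_func, demo_dealloc, demo_freed) are hypothetical and are not
CPython code. The macro is expanded textually by the preprocessor, so no call
is ever left for the compiler to inline or not; the static inline function
leaves that decision to the compiler, but it is type checked and evaluates its
argument exactly once.

```c
#include <assert.h>

/* Toy reference-counted object, for illustration only. */
typedef struct {
    long refcnt;
} demo_object;

/* Counts how many objects were "deallocated" (hypothetical helper). */
static int demo_freed = 0;

static void
demo_dealloc(demo_object *op)
{
    assert(op->refcnt == 0);
    demo_freed++;
}

/* Macro form: pasted into every use site by the preprocessor.
   Note that the argument `op` appears twice, so an argument with
   side effects would be evaluated twice. */
#define DEMO_DECREF_MACRO(op)           \
    do {                                \
        if (--(op)->refcnt == 0) {      \
            demo_dealloc(op);           \
        }                               \
    } while (0)

/* Function form: same logic, argument evaluated once, but whether the
   body ends up inlined is decided by the compiler's heuristics. */
static inline void
demo_decref_func(demo_object *op)
{
    if (--op->refcnt == 0) {
        demo_dealloc(op);
    }
}
```

Both forms behave the same for a plain pointer argument; they differ in who
decides about inlining and in how arguments with side effects are handled.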

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread STINNER Victor


Change by STINNER Victor :


--
pull_requests: +26831
pull_request: https://github.com/python/cpython/pull/28419




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread STINNER Victor


STINNER Victor  added the comment:

I added Py_ALWAYS_INLINE to run benchmarks more easily. Even if Py_INCREF() is 
converted back to a macro, there are now multiple static inline functions which 
are short and performance critical.

Using Py_ALWAYS_INLINE *may* speed up the Python debug builds and the PGO build 
on Windows if I understood correctly.

Right now, I'm not sure. The heuristic to decide if a function is inlined or 
not seems to depend a lot on the compiler and the compiler options.
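
For illustration only, here is a minimal sketch of how an always-inline marker
of this kind can be defined in portable C. The names DEMO_ALWAYS_INLINE,
DEMO_DEBUG, demo_object, demo_incref and demo_decref are hypothetical and are
not the CPython definitions. The sketch assumes only the compiler keywords
__forceinline (MSVC) and __attribute__((always_inline)) (GCC and clang), and
it leaves the marker empty in debug builds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an always-inline marker. */
#if defined(DEMO_DEBUG)
   /* Debug builds: leave inlining to the compiler. */
#  define DEMO_ALWAYS_INLINE
#elif defined(_MSC_VER)
#  define DEMO_ALWAYS_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define DEMO_ALWAYS_INLINE __attribute__((always_inline))
#else
   /* Unknown compiler: fall back to a plain static inline function. */
#  define DEMO_ALWAYS_INLINE
#endif

/* Toy reference-counted object, for illustration only. */
typedef struct {
    ptrdiff_t refcnt;
} demo_object;

/* The marker goes before the return type. */
static inline DEMO_ALWAYS_INLINE void
demo_incref(demo_object *op)
{
    assert(op != NULL);
    op->refcnt++;
}

/* Returns the new reference count so callers can react to zero. */
static inline DEMO_ALWAYS_INLINE ptrdiff_t
demo_decref(demo_object *op)
{
    assert(op != NULL && op->refcnt > 0);
    return --op->refcnt;
}
```

Such a marker is a strong request rather than a guarantee in every situation,
so any gain still has to be confirmed by benchmarks on each compiler.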

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread STINNER Victor


STINNER Victor  added the comment:


New changeset 6b413551284a94cfe31377c9c607ff890aa06c26 by Victor Stinner in 
branch 'main':
bpo-45116: Add the Py_ALWAYS_INLINE macro (GH-28390)
https://github.com/python/cpython/commit/6b413551284a94cfe31377c9c607ff890aa06c26


--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread STINNER Victor


STINNER Victor  added the comment:

> the entire eval function is now too big for PGO on MSVC

I don't think that the issue is specific to MSVC. If a function becomes too 
big, it becomes less efficient for CPU caches.

One idea would be to move the least common opcodes into a slow path, in a 
separate function, and make sure that this function is *not* inlined (use the 
Py_NO_INLINE macro).

@Mark: What do you think?

Maybe we can keep current targets in the big switch, and call the function 
there. Something like:

TARGET(DUP_TOP):
TARGET(DUP_TOP_TWO):
(...)
   ceval_slow_path();
   break;

_PyEval_EvalFrameDefault() takes around 3500 lines of C code.
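
A minimal sketch of that slow-path idea, on a toy stack machine rather than on
ceval.c. Everything here is hypothetical (DEMO_NO_INLINE, the OP_* opcodes,
demo_vm, demo_slow_path, demo_run); it only assumes the compiler keywords
__declspec(noinline) (MSVC) and __attribute__((noinline)) (GCC and clang). The
common opcodes stay in the hot switch, and the rare ones share one case that
forwards to a function that is kept out of line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a no-inline marker. */
#if defined(_MSC_VER)
#  define DEMO_NO_INLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define DEMO_NO_INLINE __attribute__((noinline))
#else
#  define DEMO_NO_INLINE
#endif

enum { OP_PUSH1, OP_ADD, OP_DUP, OP_SWAP, OP_HALT };

typedef struct {
    long stack[16];
    int sp;             /* number of values currently on the stack */
} demo_vm;

/* Rarely used opcodes live here, outside the hot loop. */
static DEMO_NO_INLINE void
demo_slow_path(demo_vm *vm, int opcode)
{
    switch (opcode) {
    case OP_DUP:
        assert(vm->sp >= 1 && vm->sp < 16);
        vm->stack[vm->sp] = vm->stack[vm->sp - 1];
        vm->sp++;
        break;
    case OP_SWAP: {
        assert(vm->sp >= 2);
        long tmp = vm->stack[vm->sp - 1];
        vm->stack[vm->sp - 1] = vm->stack[vm->sp - 2];
        vm->stack[vm->sp - 2] = tmp;
        break;
    }
    default:
        assert(0 && "not a slow-path opcode");
    }
}

/* Hot loop: runs until OP_HALT and returns the top of the stack. */
static long
demo_run(const unsigned char *code)
{
    demo_vm vm = { {0}, 0 };
    for (size_t pc = 0; ; pc++) {
        int opcode = code[pc];
        switch (opcode) {
        case OP_PUSH1:
            assert(vm.sp < 16);
            vm.stack[vm.sp++] = 1;
            break;
        case OP_ADD:
            assert(vm.sp >= 2);
            vm.stack[vm.sp - 2] += vm.stack[vm.sp - 1];
            vm.sp--;
            break;
        case OP_DUP:        /* rare opcodes keep their own targets ... */
        case OP_SWAP:       /* ... but share one out-of-line handler */
            demo_slow_path(&vm, opcode);
            break;
        case OP_HALT:
            assert(vm.sp >= 1);
            return vm.stack[vm.sp - 1];
        default:
            assert(0 && "unknown opcode");
            return -1;
        }
    }
}
```

The hot function shrinks by the size of the rare opcode bodies, at the cost of
one extra call whenever a rare opcode is executed.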

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread Ken Jin


Ken Jin  added the comment:

@neonene
Thanks for the truly excellent investigation!

@Raymond and @Steve,
If I understood OP (neonene) properly, changing Py_DECREF to a macro won't get 
back the entire 7% lost performance in pyperformance. neonene's investigations 
suggest that the entire eval function is now too big for PGO on MSVC. Fixing 
Py_DECREF may get us a few %, but all the other hot functions in eval are 
likely not being inlined as well. Their suggested patch in 
https://bugs.python.org/msg401743 works, but IMO, _may_ slow down DISPATCH() 
_slightly_ since it seems to introduce another jump.

I suggest using their patch only for MSVC until we think of something better in 
3.11. The additional jump may not matter after PGO (and it surely isn't a 7% 
slowdown :). WDYT?

--
priority: normal -> high




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-16 Thread Steve Dower


Steve Dower  added the comment:

I agree with Raymond. Let's stop throwing more code at this until we've figured 
out what's going on and revert the change for now.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-16 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

These should be changed back to macros where inlining is guaranteed.

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-16 Thread STINNER Victor


Change by STINNER Victor :


--
pull_requests: +26803
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/28390




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-16 Thread neonene


neonene  added the comment:

I reported this issue to Microsoft's Developer Community.

https://developercommunity.visualstudio.com/t/1531987

--




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-16 Thread Ken Jin


Change by Ken Jin :


--
nosy: +kj




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-13 Thread neonene


neonene  added the comment:

With MSVC 16.10.3 and 16.11.2 (latest), PR 25244 showed me that the amount of 
code in _PyEval_EvalFrameDefault() is over the limit of PGO.
In the old version of _PyEval_EvalFrameDefault (b98eba5), the same issue can be 
triggered by adding any code, anywhere, with more than 20 
expressions/statements: for example, at the top/middle/end of the function, 
repeating "if (0) {}" 10 times, or adding "if (0) {19 statements}". For Python 
3.9.7, it takes more than 800 expressions/statements.

Here is just a workaround for 3.10rc2 on windows.
==
--- Python/ceval.c
+++ Python/ceval.c
@@ -1306,9 +1306 @@
-#define DISPATCH() \
-    { \
-        if (trace_info.cframe.use_tracing OR_DTRACE_LINE OR_LLTRACE) { \
-            goto tracing_dispatch; \
-        } \
-        f->f_lasti = INSTR_OFFSET(); \
-        NEXTOPARG(); \
-        DISPATCH_GOTO(); \
-    }
+#define DISPATCH() goto tracing_dispatch
@@ -1782,4 +1774,9 @@
     tracing_dispatch:
     {
+        if (!(trace_info.cframe.use_tracing OR_DTRACE_LINE OR_LLTRACE)) {
+            f->f_lasti = INSTR_OFFSET();
+            NEXTOPARG();
+            DISPATCH_GOTO();
+        }
         int instr_prev = f->f_lasti;
         f->f_lasti = INSTR_OFFSET();
==

This patch becomes ineffective if even one expression is added to the DISPATCH 
macro, as below:

   #define DISPATCH() {if (1) goto tracing_dispatch;}

This approach is also not sufficient for 3.11, which has a bigger eval function.
I don't know of a cl/link option to lift this restriction on function size.
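
The shape of the workaround can be sketched on a toy interpreter, assuming a
plain switch instead of the real dispatch machinery. All names (DEMO_DISPATCH,
the OP_* opcodes, demo_eval, the tracing and traced parameters) are
hypothetical. Each opcode body ends in a bare goto, and the check that used to
be expanded inside every opcode is written once at the shared label.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Every opcode ends with a single jump; nothing else is expanded here. */
#define DEMO_DISPATCH() goto dispatch

/* Runs `code` until OP_HALT and returns the accumulator.
   When `tracing` is nonzero, *traced counts the dispatched instructions. */
static long
demo_eval(const unsigned char *code, int tracing, int *traced)
{
    long acc = 0;
    size_t pc = 0;
    int opcode;

dispatch:
    /* The tracing check lives once at the shared label instead of
       being repeated in the body of every opcode. */
    if (tracing) {
        (*traced)++;    /* stands in for the real tracing work */
    }
    opcode = code[pc++];
    switch (opcode) {
    case OP_INC:
        acc += 1;
        DEMO_DISPATCH();
    case OP_DOUBLE:
        acc *= 2;
        DEMO_DISPATCH();
    case OP_HALT:
        return acc;
    default:
        assert(0 && "unknown opcode");
        return -1;
    }
}
```

The function body gets smaller because the check is no longer duplicated per
opcode, but every instruction now goes through one extra jump to the shared
label.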


3.10rc2 x86 pgo : 1.00
        patched : 1.09x faster (slower  5, faster 48, not significant 5)

3.10rc2 x64 pgo : 1.00 (roughly the same speed as official bin)
        patched : 1.07x faster (slower  5, faster 47, not significant 6)
  patched(/Ob3) : 1.07x faster (slower  7, faster 45, not significant 6)

x64 results are posted.

Fixing the inlining rejection also made the __forceinline build complete with 
normal processing time and memory usage.

--
Added file: https://bugs.python.org/file50280/310rc2_benchmarks.txt




[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-13 Thread STINNER Victor


Change by STINNER Victor :


--
title: Performance regression 3.10b1 and later on Windows -> Performance 
regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build
