[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-10-10 Guido van Rossum
Change by Guido van Rossum: type: crash -> performance

2021-10-10 Guido van Rossum
Guido van Rossum added the comment: Someone whose name I don't recognize (MagzoB Mall) just changed the issue type to "crash" without explaining why. That user was created today and has no triage permissions. Mind if I change it back? It feels like vandalism.

2021-10-10 MagzoB Mall
Change by MagzoB Mall: type: performance -> crash

2021-10-10 Brandt Bucher
Change by Brandt Bucher: nosy: +brandtbucher -brandtbucher2

2021-10-10 Brandt Bucher
Change by Brandt Bucher: nosy: +brandtbucher2

2021-10-10 neonene
neonene added the comment: The official 3.10.0 binary is as slow as rc2. Many files are not updated in the source archive or in commit b494f5935c92951e75597bfe1c8b1f3112fec270, so I'm not sure whether the delay is intentional. We have no choice but to wait for 3.10.1.

2021-10-09 neonene
neonene added the comment: PR28475 is not in the official source archive (https://www.python.org/ftp/python/3.10.0/Python-3.10.0.tar.xz). I'll check later whether the official binary has the fix.

2021-10-08 Ma Lin
Ma Lin added the comment: Today I tested with msvc2022-preview; the `__forceinline` attribute does not hang the build. 64-bit PGO builds:
    28d28e0~1,   vc2022 : baseline
    28d28e0~1+F, vc2022 : 1.02x slower <1>
    28d28e0,     vc2022 : 1.03x slower <2>
    28d28e0+F,   vc2022 : 1.03x slower

2021-10-07 Steve Dower
Steve Dower added the comment: If we know which parts of the function are critical, perhaps we should be designing a PGO profile that actually hits them all? The current profile is very arbitrary, basically just waiting for someone motivated enough to figure out a better one.

2021-10-07 Steve Dower
Steve Dower added the comment: I would very much appreciate it if any new compiler were compatible with the standard Windows debuggers (WinDbg primarily, but I imagine most contributors would like it to keep working from VS). Last I heard, clang is fine as a compiler for debugging if you use the […]

2021-10-07 STINNER Victor
STINNER Victor added the comment:
> How feasible would it be to use Clang or GCC on Windows?
clang seems to have good Windows support and tries to be ABI-compatible with MSC, which is a must-have to keep wheel package support (especially for the stable ABI, used by PyQt on Windows for […]

2021-10-07 Mark Shannon
Mark Shannon added the comment: Sadly the MSVC team are claiming that this isn't a bug in their compiler. Not sure how we convince them that it is. The website rejects any attempt to reopen the issue. How feasible would it be to use Clang or GCC on Windows?

2021-09-30 Ken Jin
Ken Jin added the comment: @Pablo
> I disagree. This is a regression/bug and we don't advertise "known bugs" in the what's new, the same for any other bugfix that has been delayed until 3.10.1
Alright, in hindsight 3.10 What's New was a bad suggestion on my part. I wonder if there's a […]

2021-09-29 neonene
neonene added the comment: _PyEval_EvalFrameDefault() may also need to be divided.

2021-09-29 Ma Lin
Ma Lin added the comment: I think this is a bug in MSVC 2019, not really a regression in CPython. Changing CPython's code is just a workaround; maybe the right direction is to prompt the MSVC team to fix the bug, otherwise there will be more trouble when 3.11 is released a year later. Seeing […]

2021-09-29 neonene
neonene added the comment: @pablogsal I'm OK with more effective fixes in 3.10.1 and later. Thanks all, and thanks to kj and malin for all the help.

2021-09-29 neonene
neonene added the comment: I submitted 2 drafts in a hurry. Sorry for the short explanations. I'll add more reports.

2021-09-29 neonene
Change by neonene: pull_requests: +27001 pull_request: https://github.com/python/cpython/pull/28631

2021-09-29 neonene
Change by neonene: pull_requests: +27000 pull_request: https://github.com/python/cpython/pull/28630

2021-09-29 Pablo Galindo Salgado
Pablo Galindo Salgado added the comment:
> IMO, we should note in What's New that only for Windows, 3.10.0 has a slight slowdown.
I disagree. This is a regression/bug and we don't advertise "known bugs" in the what's new, the same for any other bugfix that has been delayed until 3.10.1 […]

2021-09-29 Ken Jin
Ken Jin added the comment: Sadly, I can't reproduce the speedups the OP reported from disabling test_patma.TestTracing. It's not any faster than what we have with PR28475 (see the attached pyperformance results). I'm looking forward to their other fix :). Even if it comes in 3.10.1 that's still a huge […]

2021-09-29 Pablo Galindo Salgado
Pablo Galindo Salgado added the comment:
> I have another fix.
If you have another fix, please create a PR ASAP and get it reviewed and merged by a core dev in the next 24 hours; otherwise it will need to wait until 3.10.1.

2021-09-29 neonene
neonene added the comment: I have another fix.

2021-09-29 Ken Jin
Ken Jin added the comment:
> If someone wants this issue to be solved in 3.10.0 it must be resolved ASAP.
neonene suggested that the tracing tests for pattern matching (added in 3.10b4/rc1) caused PGO to wrongly optimize the more uncommon tracing paths in ceval. I will verify their one-line […]

2021-09-29 Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: This means that if anyone wants to pursue the 4% that is left, the fix must be committed within 24 hours. (priority: release blocker ->)

2021-09-29 Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: I'm landing PR 28475 for now, as it improves the situation from a 7% to a 4% slowdown and is contained enough.

2021-09-29 Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: If someone wants this issue to be solved in 3.10.0 it must be resolved ASAP. I am going to start freezing the release branch in a day or two to start testing the final candidate as much as possible, so this issue has 24 hours at most to be merged into […]

2021-09-23 neonene
neonene added the comment: In 3.10rc2, Python/ceval.c (lines 1306-1309):
    #define DISPATCH() \
    { \
        if (trace_info.cframe.use_tracing OR_DTRACE_LINE OR_LLTRACE) { \
            goto tracing_dispatch; \
Among the 44 PGO tests, only test_patma.TestTracing hits the condition above.
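
The DISPATCH() check quoted above puts a tracing test in front of every instruction. The C sketch below shows that shape in miniature; `run()`, `trace_instruction()` and `trace_calls` are hypothetical names, not CPython code. How a PGO build treats the tracing branch depends on how often the training workload takes it, which is why a single tracing test in the profile could plausibly matter, as suggested elsewhere in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical trace hook: just counts how many instructions were traced. */
static int trace_calls = 0;

static void trace_instruction(size_t pc)
{
    (void)pc;
    trace_calls++;
}

/* Toy "interpreter" that sums vals[], checking a tracing flag before each
   instruction, loosely modelled on the DISPATCH() check quoted above. */
static long run(const long *vals, size_t n, int use_tracing)
{
    assert(vals != NULL || n == 0);
    long acc = 0;
    for (size_t pc = 0; pc < n; pc++) {
        /* Rarely true in normal programs. A PGO build lays this branch out
           based on how often the training run took it. */
        if (use_tracing) {
            trace_instruction(pc);
        }
        acc += vals[pc];
    }
    return acc;
}
```

The result is the same with or without tracing; only the extra work per instruction differs, which is what makes the branch's hot/cold classification a pure optimization decision.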

2021-09-21 neonene
neonene added the comment: A PGO build with PR28475 is 2% slower than one with the patch I pasted in msg401743. The function sizes are almost the same (+1 goto, +1 label), and there is no performance gap between the release builds. I suspect the following:
1. PGO is too sensitive to function size near the limit. […]

2021-09-21 Mark Shannon
Mark Shannon added the comment: The only other changes of any obvious significance to _PyEval_EvalFrameDefault since 3.10a7 are the changes to MATCH_MAPPING and MATCH_SEQUENCE, and those make _PyEval_EvalFrameDefault smaller. We may need to look elsewhere for the remaining ~4% performance.

2021-09-21 neonene
neonene added the comment: To be fair, the slowdown between PR25244 and b1 seems to be an accumulation of "1.00x slower" results from every commit. I don't know about the changes after b1.

2021-09-21 Ken Jin
Ken Jin added the comment: As Ma Lin and neonene mentioned above, PR28475 recovered half of the lost performance. Unfortunately, it's still 4% slower than 3.10a7.
    >pyperf compare_to 310a7.json 310rc2.json 310rc2patched.json
    Geometric mean (versus 3.10a7) == 310rc2: […]

2021-09-21 Ma Lin
Ma Lin added the comment: PR28475:
    64-bit build is 1.03x slower than 28d28e0~1
    32-bit build is 1.04x slower than 28d28e0~1
28d28e0~1 is the last good commit.

2021-09-21 neonene
neonene added the comment: I built 3.10rc2 with PGO and PR28475 applied, and posted the inliner's log. In the log, the 4 callees mentioned above, which were previously "hard reject"ed, are now inlined. As for performance, a few more reporters may be needed, but it's not necessary for them to care […]

2021-09-20 Guido van Rossum
Change by Guido van Rossum: nosy: +gvanrossum

2021-09-20 Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: Can someone repeat the benchmarks with https://github.com/python/cpython/pull/28475?

2021-09-20 Mark Shannon
Mark Shannon added the comment: If we are hitting a size limit for PGO, then we need to reduce the size of _PyEval_EvalFrameDefault to let the compiler do its job. Force inlining stuff is not going to help. Reverting https://github.com/python/cpython/pull/25244 for 3.10 seems to be the […]

2021-09-20 Mark Shannon
Change by Mark Shannon: pull_requests: +26873 pull_request: https://github.com/python/cpython/pull/28475

2021-09-20 neonene
neonene added the comment:
> release with the performance regression
I'm OK with that option. The limitation of PGO seems a bit weird to me, and it might be unexpected for the MSVC team.

2021-09-19 Raymond Hettinger
Raymond Hettinger added the comment: I concur with Ma Lin.

2021-09-19 Ma Lin
Ma Lin added the comment: As in the OP's benchmark, if the inline functions in object.h are converted to macros, the 3.10 branch is 1.03x faster, but still 1.07x slower than 28d28e0~1. @vstinner, could you prepare such a PR as a candidate fix? There seem to be two ways to solve it in the short term:
1. […]

2021-09-19 Erlend E. Aasland
Change by Erlend E. Aasland: nosy: +erlendaasland

2021-09-18 Ma Lin
Ma Lin added the comment:
> In my case, pgo got stuck on linking with the object.h.
Me too. Since commit 28d28e0 (the first commit to slow down the PGO build), if the `__forceinline` attribute is added to the _Py_DECREF() function in object.h, the PGO build hangs (>50 minutes). So PR 28427 may not be a […]

2021-09-18 Ken Jin
Ken Jin added the comment: @Pablo,
> If is correct ...
For some verification, I benched pyperformance on Win10 AMD64 with the Python 3.10a7 and 3.10rc2 x64 binaries downloaded directly from the python.org release pages. The results corroborate neonene's (please see the attached […]

2021-09-17 neonene
neonene added the comment:
> (32-bit: "1.07", 64-bit: "1.14": "higher the slower" wrote neonene)
The 32-bit and 64-bit figures are reversed. I compared b1 and a7 because anyone can confirm this with the official binaries. If the 7% from my patch has little to do with the gap, then I will be happy that […]

2021-09-17 STINNER Victor
STINNER Victor added the comment: Can someone compare the main branch (commit e4044e9f893350b4623677c048d33414a77edf55) to the main branch + PR 28427 patch? You can download the patch from:
https://patch-diff.githubusercontent.com/raw/python/cpython/pull/28427.patch
How can I build Python […]

2021-09-17 STINNER Victor
STINNER Victor added the comment: I created a draft PR to mark functions like Py_INCREF() and Py_IS_TYPE() with __forceinline: PR 28427.

2021-09-17 STINNER Victor
Change by STINNER Victor: pull_requests: +26837 pull_request: https://github.com/python/cpython/pull/28427

2021-09-17 STINNER Victor
STINNER Victor added the comment:
> If https://bugs.python.org/file50280/310rc2_benchmarks.txt is correct, this means that we have a 7% slowdown in Windows PGO builds for 3.10, which I don't think is acceptable.
What I understood is that the https://bugs.python.org/msg401743 patch makes […]

2021-09-17 STINNER Victor
STINNER Victor added the comment: New changeset e4044e9f893350b4623677c048d33414a77edf55 by Victor Stinner in branch 'main': bpo-45116: Py_DEBUG ignores Py_ALWAYS_INLINE (GH-28419) https://github.com/python/cpython/commit/e4044e9f893350b4623677c048d33414a77edf55

2021-09-17 Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: If https://bugs.python.org/file50280/310rc2_benchmarks.txt is correct, this means that we have a 7% slowdown in Windows PGO builds for 3.10, which I don't think is acceptable. I am marking this as a release blocker until there is some agreement. I […]

2021-09-17 STINNER Victor
STINNER Victor added the comment: Raymond: "Please do as Steve asked and revert back to the previous stable, reliable code."
Is this issue really about Py_INCREF() being a static inline macro? Or is it more about the increased size of the _PyEval_EvalFrameDefault() function? neonene's […]

2021-09-17 Ken Jin
Ken Jin added the comment:
> How severe is the regression?
The OP provided pyperformance results for current 3.10 vs their patched version at https://bugs.python.org/file50280/310rc2_benchmarks.txt. The patch is at https://bugs.python.org/msg401743.

2021-09-17 Pablo Galindo Salgado
Pablo Galindo Salgado added the comment:
> Pablo, should this be a release blocker?
How severe is the regression? If it is severe enough we can mark it as a release blocker, but a conclusion needs to be reached ASAP because I don't want to change a fundamental macro a few days before the […]

2021-09-17 Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: FWIW: Back in the days of Python 1.5.2, the ceval loop was too big for CPU caches as well, and one of the things I experimented with at the time was rearranging the opcodes based on how often they were used and splitting the whole switch statement we had […]

2021-09-17 Raymond Hettinger
Raymond Hettinger added the comment: Pablo, should this be a release blocker? (nosy: +lemburg, pablogsal)

2021-09-17 Raymond Hettinger
Raymond Hettinger added the comment:
> Right now, I'm not sure. The heuristic to decide if a function is inlined or not seems to depend a lot on the compiler and the compiler options.
That is exactly correct. And it is why we should use the macro form, which is certain to be inlined.
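
To make the macro-versus-static-inline distinction concrete, here is a minimal C sketch. It uses a hypothetical `Obj` type rather than CPython's real `PyObject`, and `OBJ_DECREF_MACRO`, `obj_decref_inline` and `counting_dealloc` are illustrative names only. Both forms decrement a reference count and call a deallocator when it reaches zero; the macro is expanded textually at each use site, whereas `static inline` is only a request that the compiler may decline, which is the behaviour this issue observed under PGO.

```c
#include <assert.h>

/* Hypothetical stand-in for PyObject (NOT CPython's real layout):
   a reference count plus a deallocator callback. */
typedef struct Obj {
    long refcnt;
    void (*dealloc)(struct Obj *);
} Obj;

/* Macro form: the body is pasted textually at every use site, so it is
   always expanded in place, whatever the optimizer's inlining heuristics.
   The argument is copied into a local so it is evaluated only once. */
#define OBJ_DECREF_MACRO(op)             \
    do {                                 \
        Obj *obj_tmp_ = (op);            \
        assert(obj_tmp_->refcnt > 0);    \
        if (--obj_tmp_->refcnt == 0) {   \
            obj_tmp_->dealloc(obj_tmp_); \
        }                                \
    } while (0)

/* Static inline form: type-checked and evaluates its argument exactly once,
   but "inline" is only a hint; the compiler (or a PGO build) may still emit
   an out-of-line call. */
static inline void obj_decref_inline(Obj *op)
{
    assert(op->refcnt > 0);
    if (--op->refcnt == 0) {
        op->dealloc(op);
    }
}

/* Demo deallocator: counts how many objects were "freed". */
static int freed_count = 0;

static void counting_dealloc(Obj *op)
{
    (void)op;
    freed_count++;
}
```

The trade-off is the one debated above: the function form gives type checking and single evaluation for free, while the macro form trades those for a guarantee that no call is ever emitted.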

2021-09-17 STINNER Victor
Change by STINNER Victor: pull_requests: +26831 pull_request: https://github.com/python/cpython/pull/28419

2021-09-17 STINNER Victor
STINNER Victor added the comment: I added Py_ALWAYS_INLINE to run benchmarks more easily. Even if Py_INCREF() is converted back to a macro, there are now multiple static inline functions which are short and performance-critical. Using Py_ALWAYS_INLINE *may* speed up the Python debug builds […]
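
A portable always-inline macro can be sketched as follows. This is not CPython's actual definition: `MY_ALWAYS_INLINE`, `MY_DEBUG` and `my_incref()` are hypothetical names. The macro maps to MSVC's `__forceinline` or to GCC/Clang's `__attribute__((always_inline))`, and, mirroring the "Py_DEBUG ignores Py_ALWAYS_INLINE" changeset in this thread, expands to nothing in a debug build.

```c
#include <assert.h>

/* Hypothetical portable "always inline" marker (not CPython's definition). */
#if defined(MY_DEBUG)
#  define MY_ALWAYS_INLINE            /* debug build: normal inlining rules */
#elif defined(_MSC_VER)
#  define MY_ALWAYS_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define MY_ALWAYS_INLINE __attribute__((always_inline))
#else
#  define MY_ALWAYS_INLINE            /* unknown compiler: plain inline hint */
#endif

/* Hypothetical helper: increment a reference count in place. */
static inline MY_ALWAYS_INLINE void my_incref(long *refcnt)
{
    assert(*refcnt > 0);   /* a live object always has a positive count */
    (*refcnt)++;
}
```

Forcing inlining is not free, though: this thread also reports PGO builds hanging when `__forceinline` was added to _Py_DECREF(), so such a macro is a tool for measurement rather than a guaranteed fix.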

2021-09-17 STINNER Victor
STINNER Victor added the comment: New changeset 6b413551284a94cfe31377c9c607ff890aa06c26 by Victor Stinner in branch 'main': bpo-45116: Add the Py_ALWAYS_INLINE macro (GH-28390) https://github.com/python/cpython/commit/6b413551284a94cfe31377c9c607ff890aa06c26

2021-09-17 STINNER Victor
STINNER Victor added the comment:
> the entire eval function is now too big for PGO on MSVC
I don't think that the issue is specific to MSVC. If a function becomes too big, it becomes less efficient for CPU caches. One idea would be to move the least common opcodes into a slow path, in a […]
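
The slow-path idea can be sketched with a toy interpreter: common opcodes stay in the hot `switch`, while rare ones move into a separate function marked "never inline" so they do not add to the size of the main loop. Everything here (the opcodes, `eval()`, `eval_rare()`, `MY_NOINLINE`) is hypothetical and far simpler than ceval.c; whether such a split actually helps a given compiler's PGO is an assumption that would need benchmarking.

```c
#include <assert.h>
#include <stddef.h>

/* Toy bytecode for a hypothetical accumulator machine. */
enum { OP_ADD, OP_SUB, OP_HALT, OP_MUL, OP_NEG };

/* Portable "never inline" hint so the cold code stays out of the hot loop. */
#if defined(_MSC_VER)
#  define MY_NOINLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define MY_NOINLINE __attribute__((noinline))
#else
#  define MY_NOINLINE
#endif

/* Slow path: rarely executed opcodes live in their own function.
   Returns 1 if the opcode was handled, 0 if it is unknown. */
static MY_NOINLINE int eval_rare(int op, long arg, long *acc)
{
    switch (op) {
    case OP_MUL: *acc *= arg; return 1;
    case OP_NEG: *acc = -*acc; return 1;
    default:     return 0;
    }
}

/* Hot loop: only the common opcodes are handled inline. Each instruction
   is an (opcode, argument) pair; returns the final accumulator. */
static long eval(const int *ops, const long *args, size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        switch (ops[i]) {
        case OP_ADD:  acc += args[i]; break;
        case OP_SUB:  acc -= args[i]; break;
        case OP_HALT: return acc;
        default: {
            int ok = eval_rare(ops[i], args[i], &acc);
            assert(ok && "unknown opcode");
            (void)ok;  /* silence unused-variable warnings under NDEBUG */
            break;
        }
        }
    }
    return acc;
}
```

For example, the program ADD 5, MUL 3, SUB 1, NEG, HALT returns -14, with MUL and NEG going through the out-of-line `eval_rare()`.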

2021-09-17 Ken Jin
Ken Jin added the comment: @neonene Thanks for the truly excellent investigation!
@Raymond and @Steve, if I understood the OP (neonene) properly, changing Py_DECREF to a macro won't get back the entire 7% of lost performance in pyperformance. neonene's investigations suggest that the entire eval […]

2021-09-16 Steve Dower
Steve Dower added the comment: I agree with Raymond. Let's stop throwing more code at this until we've figured out what's going on and revert the change for now.

2021-09-16 Raymond Hettinger
Raymond Hettinger added the comment: These should be changed back to macros where inlining is guaranteed.

2021-09-16 STINNER Victor
Change by STINNER Victor: pull_requests: +26803 stage: -> patch review pull_request: https://github.com/python/cpython/pull/28390

2021-09-16 neonene
neonene added the comment: I reported this issue to the Microsoft Developer Community: https://developercommunity.visualstudio.com/t/1531987

2021-09-16 Ken Jin
Change by Ken Jin: nosy: +kj

2021-09-13 neonene
neonene added the comment: With MSVC 16.10.3 and 16.11.2 (latest), PR25244 showed me that the amount of code in _PyEval_EvalFrameDefault() is over the PGO limit. In the old version of _PyEval_EvalFrameDefault (b98eba5), the same issue can be caused by adding any code anywhere with more than 20 […]

2021-09-13 STINNER Victor
Change by STINNER Victor: title: Performance regression 3.10b1 and later on Windows -> Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build