[issue45116] Performance regression 3.10b1: inlining issue in the big _PyEval_EvalFrameDefault() function with Visual Studio (MSC)

2022-04-07 Thread neonene


neonene  added the comment:

>What exactly does "pgo hard reject" mean?

As I understand it, "pgo hard reject" comes from the PGO optimizer's heuristics,
and the "reject" is related to the probe count (hot/cold).

  https://developercommunity.visualstudio.com/t/1531987#T-N1535774


The MSVC team has also replied and closed the issue, so MSVC will not be fixed
in the near future.

  https://developercommunity.visualstudio.com/t/1595341#T-N1695626

From the reply and my investigation, 3.11 would need the following:

1. Some call sites, such as calls through tp_* pointers, should not inline their
fast paths into the eval switch-case, because they often conflict. Each pointer
call needs to be wrapped in a function, or _PyEval_EvalFrameDefault may need to
be enclosed with the "inline_depth(0)" pragma.

2. __assume(0) should be replaced with another function wherever it appears
inside the eval switch-case or in the inlined paths of callees. This is
critical with PGO.

3. For inlining, use __forceinline / macro / const function pointer.

   When a large number of Py_DECREF()s are force-inlined into the eval loop,
MSVC's hang can be avoided in many ways, as long as the tp_dealloc call does
not become an inlined call site:

   void
   _Py_Dealloc(PyObject *op)
   {
       ...
   #pragma inline_depth(0)  // takes effect from here; PGO accepts only 0.
       (*dealloc)(op);      // conflicts when inlined.
   }
   #pragma inline_depth()   // can be reset only outside the function.
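A portable sketch of item 1, as an alternative to the pragma: the call through the type's dealloc pointer is moved into its own non-inlinable function, so the small decref fast path can be inlined without also pulling the indirect call into the caller. The types and names (toy_object, call_dealloc, toy_decref) are hypothetical stand-ins, not CPython's API; NOINLINE maps to __declspec(noinline) on MSVC and to __attribute__((noinline)) elsewhere. Whether this actually satisfies MSVC's PGO inliner is an assumption to verify with the inline log.

```c
#include <assert.h>
#include <stddef.h>

/* Portable "never inline this" marker. */
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#else
#  define NOINLINE __attribute__((noinline))
#endif

/* Hypothetical stand-ins for PyObject and its type's tp_dealloc slot. */
typedef struct toy_object toy_object;
typedef void (*toy_destructor)(toy_object *);

struct toy_object {
    long refcnt;
    toy_destructor dealloc;
};

/* The call through the function pointer is isolated in its own
   non-inlined function, so it never becomes an inlined call site
   inside a large caller such as an eval loop. */
static NOINLINE void
call_dealloc(toy_object *op)
{
    assert(op->dealloc != NULL);
    (*op->dealloc)(op);
}

/* Small fast path that stays cheap to inline into a big switch. */
static inline void
toy_decref(toy_object *op)
{
    if (--op->refcnt == 0) {
        call_dealloc(op);
    }
}

/* Example destructor: only records that it ran. */
static int dealloc_calls = 0;

static void
count_dealloc(toy_object *op)
{
    (void)op;
    dealloc_calls++;
}
```

The fast path keeps its single branch; only the rarely taken indirect call is forced out of line.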



* Virtual Call Speculation:
  https://docs.microsoft.com/en-us/cpp/build/profile-guided-optimizations?view=msvc-170#optimizations-performed-by-pgo

* The profiler runs with the /GENPROFILE:PATH option, but for the big ceval
  function the optimizer merges the profiles into one, as in /GENPROFILE:NOPATH
  mode.
  https://docs.microsoft.com/en-us/cpp/build/reference/genprofile-fastgenprofile-generate-profiling-instrumented-build?view=msvc-170#arguments

* __assume(0) (Py_UNREACHABLE):
  https://devblogs.microsoft.com/cppblog/visual-studio-2017-throughput-improvements-and-advice/#remove-usages-of-__assume
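A minimal sketch of item 2, assuming a hypothetical UNREACHABLE() macro rather than CPython's actual Py_UNREACHABLE definition: instead of expanding to __assume(0), it calls a noreturn, non-inlined function, so the optimizer sees an ordinary cold call in the switch default rather than an "impossible" path. The function and macro names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#if defined(_MSC_VER)
#  define NORETURN_NOINLINE __declspec(noreturn) __declspec(noinline)
#else
#  define NORETURN_NOINLINE __attribute__((noreturn, noinline))
#endif

/* Out-of-line failure path: reports and aborts instead of telling the
   optimizer "this cannot happen" via __assume(0). */
static NORETURN_NOINLINE void
fatal_unreachable(const char *file, int line)
{
    fprintf(stderr, "%s:%d: unreachable code reached\n", file, line);
    abort();
}

/* Hypothetical replacement for an __assume(0)-based macro. */
#define UNREACHABLE() fatal_unreachable(__FILE__, __LINE__)

/* A switch in the style of the eval loop. */
static int
opcode_cost(int opcode)
{
    switch (opcode) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 4;
    default:
        UNREACHABLE();
    }
}
```

Because fatal_unreachable is declared noreturn, the compiler still knows control does not fall off the end of opcode_cost, but no undefined-behavior hint is injected into the hot function.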

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47182] _PyUnicode_Fini should invalidate ucnhash_capi capsule pointer

2022-04-04 Thread neonene


Change by neonene :


--
nosy: +neonene
nosy_count: 5.0 -> 6.0
pull_requests: +30375
pull_request: https://github.com/python/cpython/pull/32313

___
Python tracker 
<https://bugs.python.org/issue47182>
___



[issue47103] Copy pgort140.dll when building for PGO

2022-03-27 Thread neonene


Change by neonene :


--
nosy: +neonene
nosy_count: 4.0 -> 5.0
pull_requests: +30226
pull_request: https://github.com/python/cpython/pull/32146

___
Python tracker 
<https://bugs.python.org/issue47103>
___



[issue43166] Unused letters in Windows-specific pragma optimize

2022-03-21 Thread neonene


Change by neonene :


--
nosy: +neonene
nosy_count: 7.0 -> 8.0
pull_requests: +30111
pull_request: https://github.com/python/cpython/pull/32023

___
Python tracker 
<https://bugs.python.org/issue43166>
___



[issue43271] AMD64 Windows10 3.x crash with Windows fatal exception: stack overflow

2022-03-21 Thread neonene


Change by neonene :


--
nosy: +neonene, pablogsal
nosy_count: 8.0 -> 10.0
pull_requests: +30112
pull_request: https://github.com/python/cpython/pull/32023

___
Python tracker 
<https://bugs.python.org/issue43271>
___



[issue46841] Inline bytecode caches

2022-03-02 Thread neonene


neonene  added the comment:

Has UNPACK_SEQUENCE's slowdown already been filed?

https://speed.python.org/timeline/#/?exe=12=unpack_sequence=4=50=off=on=on

On Windows, I see the gap at 424ecab.

--
nosy: +neonene

___
Python tracker 
<https://bugs.python.org/issue46841>
___



[issue45116] Performance regression 3.10b1: inlining issue in the big _PyEval_EvalFrameDefault() function with Visual Studio (MSC)

2022-02-20 Thread neonene


Change by neonene :


--
pull_requests: +29588
pull_request: https://github.com/python/cpython/pull/31459

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1: inlining issue in the big _PyEval_EvalFrameDefault() function with Visual Studio (MSC)

2022-02-19 Thread neonene


Change by neonene :


--
pull_requests: +29570
pull_request: https://github.com/python/cpython/pull/31436

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue46427] Correct MSBuild's configuration for _freeze_module.exe

2022-01-20 Thread neonene


neonene  added the comment:

> +  

This would be a problem if an ARM64 machine takes the blank value as "ARM"
rather than "ARM64", since the "ARM" tools are not necessarily installed. So I
agree with the following proposal from the OP (PR28491):

> Would it be acceptable if a new host platform property is added to
> the project file and keep x64 or x86 as default (depends on target)
> but allow users to configure a different host platform to allow
> native arm64 compilation?

--

___
Python tracker 
<https://bugs.python.org/issue46427>
___



[issue46427] Correct MSBuild's configuration for _freeze_module.exe

2022-01-20 Thread neonene


neonene  added the comment:

> When cross-compiling, tools that are executed as part of the build need to be 
> built for the tool platform, not the target platform.

My PR does not go against that at this point, as the proposed code is based on
your PR28322 (09b4ad11f323f8702cde795e345b75e0fbb1a9a5).
If we now need to prepare for a future MSVC *on* ARM, then the current
_freeze_module configurations in "pcbuild.sln" also need to be reconsidered:

{19C0C13F-47CA-4432-AFF3-799A296A4DDC}.Debug|ARM.ActiveCfg = Debug|Win32
{19C0C13F-47CA-4432-AFF3-799A296A4DDC}.Debug|ARM.Build.0 = Debug|Win32
{19C0C13F-47CA-4432-AFF3-799A296A4DDC}.Debug|ARM64.ActiveCfg = Debug|x64
{19C0C13F-47CA-4432-AFF3-799A296A4DDC}.Debug|ARM64.Build.0 = Debug|x64
{19C0C13F-47CA-4432-AFF3-799A296A4DDC}.PGInstrument|ARM.ActiveCfg = Release|Win32
{19C0C13F-47CA-4432-AFF3-799A296A4DDC}.PGInstrument|ARM.Build.0 = Release|Win32
{19C0C13F-47CA-4432-AFF3-799A296A4DDC}.PGInstrument|ARM64.ActiveCfg = Release|x64
{19C0C13F-47CA-4432-AFF3-799A296A4DDC}.PGInstrument|ARM64.Build.0 = Release|x64
{19C0C13F-47CA-4432-AFF3-799A296A4DDC}.PGUpdate|ARM.ActiveCfg = Release|Win32
{19C0C13F-47CA-4432-AFF3-799A296A4DDC}.PGUpdate|ARM64.ActiveCfg = Release|x64
{19C0C13F-47CA-4432-AFF3-799A296A4DDC}.Release|ARM.ActiveCfg = Release|Win32
{19C0C13F-47CA-4432-AFF3-799A296A4DDC}.Release|ARM.Build.0 = Release|Win32
{19C0C13F-47CA-4432-AFF3-799A296A4DDC}.Release|ARM64.ActiveCfg = Release|x64
{19C0C13F-47CA-4432-AFF3-799A296A4DDC}.Release|ARM64.Build.0 = Release|x64


Anyway, what I care about is the usage of the "PreferredToolArchitecture"
property in the current configuration.
The property has nothing to do with whether the host is ARM* or not; another
property should handle that in the future.

When building x86 Python with the 64-bit compiler (set
PreferredToolArchitecture=x64), _freeze_module is built as an x64 executable.

Is the following change acceptable?

-  $(PreferredToolArchitecture)
+  

Both are the same when the environment variable is not set. _freeze_module is
always 32-bit, though.

--

___
Python tracker 
<https://bugs.python.org/issue46427>
___



[issue46427] Correct MSBuild's configuration for _freeze_module.exe

2022-01-18 Thread neonene


neonene  added the comment:

Definition in general_advanced.xml:

  


  


In my case, the options above correspond to the following folders:

Microsoft Visual Studio\.\VC\Tools\MSVC\\bin\Hostx86
Microsoft Visual Studio\.\VC\Tools\MSVC\\bin\Hostx64

Each of them has the four subfolders below, which contain cl.exe, link.exe, etc.:

  arm
  arm64
  x64
  x86

--

___
Python tracker 
<https://bugs.python.org/issue46427>
___



[issue46427] Correct MSBuild's configuration for _freeze_module.exe

2022-01-18 Thread neonene


neonene  added the comment:

>This also rolls _freeze_module.exe's architecture back to x64

Correcting: from x86 back to x64

As I understand it, currently only the Win32 _freeze_module.exe is built, and it
runs on non-ARM machines to generate the code for Win32/x64/ARM/ARM64 targets.

--

___
Python tracker 
<https://bugs.python.org/issue46427>
___



[issue46427] Correct MSBuild's configuration for _freeze_module.exe

2022-01-18 Thread neonene


Change by neonene :


--
keywords: +patch
pull_requests: +28873
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/30673

___
Python tracker 
<https://bugs.python.org/issue46427>
___



[issue46427] Correct MSBuild's configuration for _freeze_module.exe

2022-01-18 Thread neonene


New submission from neonene :

In pcbuild.proj, the "PreferredToolArchitecture" property looks misused. I think
the property is useful because it gives us a choice of compiler (32-bit or
64-bit) for any target architecture (Win32/x64/ARM/ARM64).

I think the property should not be used there. This means a partial revert of
PR28491, whose described problem I cannot reproduce. This also rolls
_freeze_module.exe's architecture back to x64 when the target platform is x64
or ARM64.

--
components: Build
messages: 410891
nosy: neonene
priority: normal
severity: normal
status: open
title: Correct MSBuild's configuration for _freeze_module.exe
type: behavior
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue46427>
___



[issue46362] os.path.abspath() needs more normalization on Windows

2022-01-14 Thread neonene


Change by neonene :


--
pull_requests: +28793
pull_request: https://github.com/python/cpython/pull/30595

___
Python tracker 
<https://bugs.python.org/issue46362>
___



[issue46362] os.path.abspath() needs more normalization on Windows

2022-01-13 Thread neonene


neonene  added the comment:

Basically, PR30571 aims for compatibility with 3.10 and earlier. Using the
Windows API is the easiest approach, and it is what those versions do:

import os.path

paths = [
    r'C:\CON',
    r'C:\PRN',
    r'C:\AUX',
    r'C:\NUL',
    r'C:\COM1',
    r'C:\COM2',
    r'C:\COM3',
    r'C:\COM9',
    r'C:\LPT1',
    r'C:\LPT2',
    r'C:\LPT3',
    r'C:\LPT9',
    r'C:\foo. . .',
]
for path in paths:
    print(os.path.abspath(path))

"""
3.11 before
C:\CON
C:\PRN
C:\AUX
C:\NUL
C:\COM1
C:\COM2
C:\COM3
C:\COM9
C:\LPT1
C:\LPT2
C:\LPT3
C:\LPT9
C:\foo. . .

3.11 after
\\.\CON
\\.\PRN
\\.\AUX
\\.\NUL
\\.\COM1
\\.\COM2
\\.\COM3
\\.\COM9
\\.\LPT1
\\.\LPT2
\\.\LPT3
\\.\LPT9
C:\foo

3.10.1
\\.\CON
\\.\PRN
\\.\AUX
\\.\NUL
\\.\COM1
\\.\COM2
\\.\COM3
\\.\COM9
\\.\LPT1
\\.\LPT2
\\.\LPT3
\\.\LPT9
C:\foo
"""

--

___
Python tracker 
<https://bugs.python.org/issue46362>
___



[issue46287] UNC path normalisation issues on Windows

2022-01-12 Thread neonene


neonene  added the comment:

> PathCchSkipRoot() doesn't recognize forward slash as a path separator,
 
I opened issue46362 and PR30571 about the mentioned abspath() behaviors.

--

___
Python tracker 
<https://bugs.python.org/issue46287>
___



[issue46362] os.path.abspath() needs more normalization on Windows

2022-01-12 Thread neonene


Change by neonene :


--
keywords: +patch
pull_requests: +28772
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/30571

___
Python tracker 
<https://bugs.python.org/issue46362>
___



[issue46362] os.path.abspath() needs more normalization on Windows

2022-01-12 Thread neonene


New submission from neonene :

3.11a3+ introduced a C version of abspath(), which returns incompletely
normalized absolute paths (see msg410068):

>>> os.path.abspath(r'\\spam\\eggs. . .')
'spameggs. . .'
>>> os.path.abspath('C:\\spam. . .')
'C:\\spam. . .'
>>> os.path.abspath('C:\\nul')
'C:\\nul'

This design is efficient at startup with getpath_abspath(), but the result of
ntpath.abspath() after startup should be more normalized.
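In the example above, 'C:\\spam. . .' comes back unchanged, whereas the Windows API used by 3.10 strips trailing periods and spaces from the final path component (3.10.1 turns C:\foo. . . into C:\foo). A minimal, portable sketch of only that one step; the helper name is hypothetical, and it deliberately ignores device names such as CON or NUL, the \\.\ and \\?\ prefixes, and everything else Windows normalizes. It also assumes "." and ".." segments have already been resolved.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: remove trailing '.' and ' ' from the final path
   component, in place. Paths that end in a separator are left untouched. */
static void
strip_trailing_dots_spaces(char *path)
{
    size_t n;

    assert(path != NULL);
    n = strlen(path);
    if (n == 0 || path[n - 1] == '\\' || path[n - 1] == '/') {
        return;
    }
    /* The loop stops at the first character that is neither '.' nor ' ';
       a separator always stops it, so earlier components are not touched. */
    while (n > 0 && (path[n - 1] == '.' || path[n - 1] == ' ')) {
        n--;
    }
    path[n] = '\0';
}
```

Device names are a separate rule: a path like C:\nul passes through this step unchanged and would need its own handling to become \\.\nul.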

--
components: Windows
messages: 410456
nosy: neonene, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: os.path.abspath() needs more normalization on Windows
type: behavior
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue46362>
___



[issue46287] UNC path normalisation issues on Windows

2022-01-07 Thread neonene


neonene  added the comment:

Regarding https://github.com/python/cpython/pull/30362#issuecomment-1005496892

_Py_abspath/_getfullpathname does not always call GetFullPathNameW on 3.11.

Python 3.10.1
>>> nt._getfullpathname('.\\C:spameggs. . .')
'.\\C:\\spam\\eggs'

Python 3.11.0a3
>>> nt._getfullpathname('.\\C:spameggs. . .')
'.\\C:spameggs. . .'

------
nosy: +neonene

___
Python tracker 
<https://bugs.python.org/issue46287>
___



[issue46208] os.path.normpath change between 3.11.0a2 and 3.11.0a3+

2022-01-03 Thread neonene


neonene  added the comment:

>Here's a branch with a passing ntpath.normpath test and a failing 
>posixpath.normpath test:

I took the test cases for my PR, thanks.
On a Windows machine, the ntpath test fails and the posixpath test passes. It
seems that the passing one is tested with the pure Python code.

--

___
Python tracker 
<https://bugs.python.org/issue46208>
___



[issue46208] os.path.normpath change between 3.11.0a2 and 3.11.0a3+

2022-01-03 Thread neonene


Change by neonene :


--
keywords: +patch
nosy: +neonene
nosy_count: 3.0 -> 4.0
pull_requests: +28576
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/30362

___
Python tracker 
<https://bugs.python.org/issue46208>
___



[issue46217] 3.11 build failure on Win10: new _freeze_module changes?

2022-01-01 Thread neonene


neonene  added the comment:

The flag is not available on Windows 8.1; it is available starting in Windows 10
1703, with the v10.0.15021 SDK.

--
nosy: +neonene

___
Python tracker 
<https://bugs.python.org/issue46217>
___



[issue46123] _freeze_module on Windows can be built faster with no optimization

2021-12-18 Thread neonene


Change by neonene :


--
keywords: +patch
pull_requests: +28400
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/30181

___
Python tracker 
<https://bugs.python.org/issue46123>
___



[issue46123] _freeze_module on Windows can be built faster with no optimization

2021-12-18 Thread neonene


New submission from neonene :

In Makefile.pre.in, LTO is disabled when building _freeze_module. With MSVC,
building _freeze_module.exe without the optimization likewise cuts its build
time in half.

--
components: Build, Windows
messages: 408841
nosy: neonene, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: _freeze_module on Windows can be built faster with no optimization
type: performance
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue46123>
___



[issue40915] multiple problems with mmap.resize() in Windows

2021-12-17 Thread neonene


Change by neonene :


--
nosy: +neonene
nosy_count: 6.0 -> 7.0
pull_requests: +28391
pull_request: https://github.com/python/cpython/pull/30175

___
Python tracker 
<https://bugs.python.org/issue40915>
___



[issue45582] Rewrite getpath.c in Python

2021-12-09 Thread neonene


Change by neonene :


--
pull_requests: +28238
pull_request: https://github.com/python/cpython/pull/30014

___
Python tracker 
<https://bugs.python.org/issue45582>
___



[issue45582] Rewrite getpath.c in Python

2021-12-06 Thread neonene


Change by neonene :


--
pull_requests: +28166
pull_request: https://github.com/python/cpython/pull/29941

___
Python tracker 
<https://bugs.python.org/issue45582>
___



[issue45582] Rewrite getpath.c in Python

2021-12-05 Thread neonene


Change by neonene :


--
pull_requests: +28154
pull_request: https://github.com/python/cpython/pull/29930

___
Python tracker 
<https://bugs.python.org/issue45582>
___



[issue45582] Rewrite getpath.c in Python

2021-12-03 Thread neonene


neonene  added the comment:

On PR29041, the PGO-instrumented binary does not seem to determine the stdlib
directory correctly. I can run it with PYTHONPATH set.


Python path configuration:
  PYTHONHOME = 'C:\Py311\'
  PYTHONPATH = (not set)
  program name = 'C:\Py311\PCbuild\amd64\instrumented\python.exe'
  isolated = 0
  environment = 1
  user site = 1
  import site = 1
  is in build tree = 1
  stdlib dir = 'C:\Py311\PCbuild\Lib'
  sys._base_executable = 'C:\\py311\\PCbuild\\amd64\\instrumented\\python.exe'
  sys.base_prefix = 'C:\\py311\\'
  sys.base_exec_prefix = 'C:\\py311\\'
  sys.platlibdir = 'DLLs'
  sys.executable = 'C:\\py311\\PCbuild\\amd64\\instrumented\\python.exe'
  sys.prefix = 'C:\\py311\\'
  sys.exec_prefix = 'C:\\py311\\'
  sys.path = [
'C:\\py311\\PCbuild\\amd64\\instrumented\\python311.zip',
'C:\\py311\\PCbuild\\Lib',
'C:\\py311\\PCbuild\\amd64\\instrumented',
  ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
ModuleNotFoundError: No module named 'encodings'

--

___
Python tracker 
<https://bugs.python.org/issue45582>
___



[issue45582] Rewrite getpath.c in Python

2021-12-03 Thread neonene


Change by neonene :


--
nosy: +neonene
nosy_count: 6.0 -> 7.0
pull_requests: +28130
pull_request: https://github.com/python/cpython/pull/29906

___
Python tracker 
<https://bugs.python.org/issue45582>
___



[issue45116] Performance regression 3.10b1: inlining issue in the big _PyEval_EvalFrameDefault() function with Visual Studio (MSC)

2021-11-27 Thread neonene


neonene  added the comment:

I requested the MSVC team to reconsider the inlining issues, including 
__forceinline.
https://developercommunity.visualstudio.com/t/1595341


The hang at link time caused by __forceinline can be avoided by completing the
_Py_DECREF optimization outside _PyEval_EvalFrameDefault:

static inline void  // no __forceinline
_Py_DECREF_impl(...) {
    ...
}
static __forceinline void
_Py_DECREF(...) {  // no conditional branch in the function
    _Py_DECREF_impl(...);
}


In _PyEval_EvalFrameDefault, wrapping the callees like above seems better for 
performance than just specifying __forceinline under the current MSVC.
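A self-contained sketch of that wrapping pattern, assuming a toy refcounted struct in place of PyObject (all names are hypothetical): the forced-inline wrapper has no conditional branch and only forwards to a plain static inline implementation, which leaves the decision about the branchy part to the compiler's own heuristics. FORCEINLINE falls back to GCC/Clang's always_inline attribute when not compiling with MSVC.

```c
#include <assert.h>

#if defined(_MSC_VER)
#  define FORCEINLINE static __forceinline
#else
#  define FORCEINLINE static inline __attribute__((always_inline))
#endif

/* Hypothetical stand-in for PyObject; `freed` marks a deallocation. */
typedef struct {
    long refcnt;
    int freed;
} toy_obj;

/* Implementation with the conditional branch: an ordinary inline
   candidate (no __forceinline). */
static inline void
toy_decref_impl(toy_obj *op)
{
    if (--op->refcnt == 0) {
        op->freed = 1;    /* stands in for calling the deallocator */
    }
}

/* Forced-inline wrapper: no conditional branch, it only forwards. */
FORCEINLINE void
toy_decref(toy_obj *op)
{
    toy_decref_impl(op);
}
```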

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1: inlining issue in the big _PyEval_EvalFrameDefault() function with Visual Studio (MSC)

2021-11-19 Thread neonene


neonene  added the comment:

In the eval loop of PR29565, inlining seems to be enabled within about 70
opcode branches when trained with the 44 tests.

log & source: ceval_PR29565_split_func.c  (not for performance)

--
Added file: https://bugs.python.org/file50452/ceval_PR29565_split_func.c

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1: inlining issue in the big _PyEval_EvalFrameDefault() function with Visual Studio (MSC)

2021-11-17 Thread neonene


neonene  added the comment:

>This essentially disables PGO.

Thank you for the suggestion. I'll take another experimental approach to reduce
the size of the 3.11 eval function for stronger validation.


>@neonene what's the importance of PR29565?

While we are discussing function size, I would like to use commits around
PR29565 for consistent reporting. I think any commit can reproduce the issue.

And please ignore the patch to build.bat.

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1: inlining issue in the big _PyEval_EvalFrameDefault() function with Visual Studio (MSC)

2021-11-17 Thread neonene


neonene  added the comment:

Here are three steps to reproduce the issue with minimal PGO training (VS2019).

1. Download and extract the source archive of PR29565:
   https://github.com/python/cpython/archive/6a84d61c55f2e543cf5fa84522d8781a795bba33.zip

2. Apply the following patch.

==
--- PCbuild/build.bat
+++ PCbuild/build.bat
@@ -66 +66 @@
-set pgo_job=-m test --pgo
+set pgo_job=-c"pass"
--- PCbuild/pyproject.props
+++ PCbuild/pyproject.props
@@ -47,2 +47,3 @@
   /utf-8 %(AdditionalOptions)
+  /d2inlinelogfull:_PyEval_EvalFrameDefault %(AdditionalOptions)
 
==

3. Build [Rebuild]

   PCbuild\build --no-tkinter --pgo > build.log [-r]

   According to the inlining section of the log, every function that has one or
more conditional expressions was rejected by the inliner.

   > Inlinee for function _PyEval_EvalFrameDefault 
   >  -_Py_EnsureFuncTstateNotNULL (pgo hard reject)
   >  ...
   >  _Py_INCREF (pgu decision)
   >  _Py_INCREF (pgu decision)
   >  -_Py_XDECREF (pgo hard reject)
   >  -_Py_XDECREF (pgo hard reject)
   >  -_Py_DECREF (pgo hard reject)
   >  -_Py_DECREF (pgo hard reject)
   >  ...
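To tally such entries in a large build.log, a small scanner is enough. A sketch, assuming each inlinee is reported on its own line as in the excerpt above; count_marked_lines is a hypothetical helper, and the marker strings are copied from the excerpt:

```c
#include <assert.h>
#include <string.h>

/* Count the lines of `text` that contain `marker`, for example
   "(pgo hard reject)". A line is counted once even if the marker
   appears in it more than once. */
static int
count_marked_lines(const char *text, const char *marker)
{
    int count = 0;
    const char *line = text;
    size_t mlen = strlen(marker);

    assert(mlen > 0);
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        const char *hit = strstr(line, marker);

        /* strstr may find a match on a later line; only count it here
           if it lies entirely within the current line. */
        if (hit != NULL && (size_t)(hit - line) + mlen <= len) {
            count++;
        }
        line += len;
        if (*line == '\n') {
            line++;
        }
    }
    return count;
}
```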


Profiling scores can be shown in the VS2019 command prompt:

   pgomgr PCbuild\amd64\python311.pgd /summary [/detail] > largefile.txt

   * pgomgr.exe (or the profile itself) has an issue:
     https://developercommunity.visualstudio.com/t/1560909


Unused opcodes in this training

   ROT_THREE, DUP_TOP_TWO, UNARY_POSITIVE, UNARY_NEGATIVE,
   BINARY_OP_ADD_FLOAT, UNARY_INVERT, BINARY_OP_MULTIPLY_INT,
   BINARY_OP_MULTIPLY_FLOAT, GET_LEN, MATCH_MAPPING, MATCH_SEQUENCE,
   MATCH_KEYS, LOAD_ATTR_SLOT, LOAD_METHOD_CLASS, GET_AITER, GET_ANEXT,
   BEFORE_ASYNC_WITH, END_ASYNC_FOR, STORE_ATTR_SLOT,
   STORE_ATTR_WITH_HINT, GET_YIELD_FROM_ITER, PRINT_EXPR, YIELD_FROM,
   GET_AWAITABLE, LOAD_ASSERTION_ERROR, SETUP_ANNOTATIONS, UNPACK_EX,
   DELETE_ATTR, DELETE_GLOBAL, ROT_N, COPY, DELETE_DEREF,
   LOAD_CLASSDEREF, MATCH_CLASS, SET_UPDATE, DO_TRACING

   As an experiment, I managed to activate the inliner by removing these 36
opcode cases from the switch and merging or removing many macros.
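The experiment above shrank the switch by deleting cases outright. A sketch of a gentler variant, assuming (not measured here) that moving rare opcodes out of line has a similar effect on the size of the hot function: a toy stack machine whose cold opcodes are handled by a separate non-inlined function. The opcodes and names are hypothetical, not CPython's.

```c
#include <assert.h>

#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#else
#  define NOINLINE __attribute__((noinline))
#endif

enum { OP_PUSH1, OP_ADD, OP_NEG, OP_HALT };

/* Rare opcodes live out of line, so they do not add to the size of the
   hot dispatch function. */
static NOINLINE void
handle_cold_op(int op, long *stack, int *sp)
{
    switch (op) {
    case OP_NEG:
        stack[*sp - 1] = -stack[*sp - 1];
        break;
    default:
        assert(0 && "unknown cold opcode");
    }
}

/* Hot loop: only the frequent opcodes get their own cases. */
static long
run(const int *code)
{
    long stack[16];
    int sp = 0;

    for (;;) {
        int op = *code++;
        switch (op) {
        case OP_PUSH1:
            stack[sp++] = 1;
            break;
        case OP_ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT:
            return stack[sp - 1];
        default:
            handle_cold_op(op, stack, &sp);
            break;
        }
    }
}
```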


Static instruction counts of _PyEval_EvalFrameDefault()

   PR29565   : 6882 (down to 4400 with the above change)

   PR29482   : 7035
   PR29482~1 : 7742
   3.10.0+   : 3980 (well inlined, sharing the DISPATCH macro)
   3.10.0    : 5559
   3.10b1    : 5680
   3.10a7    : 4117 (well inlined)

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1: inlining issue in the big _PyEval_EvalFrameDefault() function with Visual Studio (MSC)

2021-11-16 Thread neonene


neonene  added the comment:

I still see the issue in current main and in PR29565 with MSVC 2022 (v142 or
v143 toolset).

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1: inlining issue in the big _PyEval_EvalFrameDefault() function with Visual Studio (MSC)

2021-10-16 Thread neonene


neonene  added the comment:

msg402954
>https://github.com/faster-cpython/tools

Based on the suggested stats and pgomgr.exe, I experimentally moved the
LOAD_FAST and LOAD_CONST cases out of the switch as below.

    if (opcode == LOAD_FAST) {
        ...
        DISPATCH();
    }

    if (opcode == LOAD_CONST) {
        ...
        DISPATCH();
    }

    switch (opcode) {
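A runnable toy version of that arrangement, with opcode names borrowed for readability. The interpreter, its two-slot instruction encoding, and the use of `continue` in place of DISPATCH() are hypothetical simplifications: the two hottest opcodes are tested before the switch, and everything else falls through to it.

```c
#include <assert.h>

enum { LOAD_FAST, LOAD_CONST, BINARY_ADD, RETURN_VALUE };

/* Each instruction is an opcode followed by one argument slot. */
static long
eval(const int *code, const long *locals, const long *consts)
{
    long stack[16];
    int sp = 0;

    for (;;) {
        int opcode = code[0];
        int oparg = code[1];
        code += 2;

        /* Hot opcodes are checked first, outside the switch. */
        if (opcode == LOAD_FAST) {
            stack[sp++] = locals[oparg];
            continue;             /* plays the role of DISPATCH() */
        }
        if (opcode == LOAD_CONST) {
            stack[sp++] = consts[oparg];
            continue;
        }

        switch (opcode) {
        case BINARY_ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case RETURN_VALUE:
            return stack[sp - 1];
        default:
            assert(0 && "unknown opcode");
            return 0;
        }
    }
}
```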


x64 performance results after the patch (MSVC 2019)

Good inliner versions:
  3.10.0+      1.03x faster than before
  28d28e0~1    1.04x faster
  3.8.12       1.03x faster

Bad inliner versions (evalfunc too big; has MSVC 2022 increased the capacity?):
  3.10.0/rc2   1.00x faster
  3.11a1+      1.02x faster


It seems to me that, for quite a while now, the optimizer has been stopping at
some point after successful inlining. So performance may be sensitive to code
changes, and it might be possible to detect where the optimization is aborted.

(Benchmarks: switch-case_unarranged_bench.txt)

--
Added file: https://bugs.python.org/file50363/switch-case_unarranged_bench.txt

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-10-10 Thread neonene


neonene  added the comment:

The official 3.10.0 binary is as slow as rc2.

Many files are not updated in the source archive or in
b494f5935c92951e75597bfe1c8b1f3112fec270, so I'm not sure whether the delay is
intentional.

We have no choice but to wait for 3.10.1.

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-10-09 Thread neonene


neonene  added the comment:

PR28475 is not in the official source archive.
https://www.python.org/ftp/python/3.10.0/Python-3.10.0.tar.xz

I'll check later whether the official binary has the fix.

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread neonene


neonene  added the comment:

_PyEval_EvalFrameDefault() may also need to be divided.

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread neonene


neonene  added the comment:

@pablogsal
I'm OK with more effective fixes in 3.10.1 and later.

Thanks, all, and thanks to kj and malin for all the help.

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread neonene


neonene  added the comment:

I submitted two drafts in a hurry; sorry for the short explanations.
I'll add more reports.

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread neonene


Change by neonene :


--
pull_requests: +27001
pull_request: https://github.com/python/cpython/pull/28631

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread neonene


Change by neonene :


--
pull_requests: +27000
pull_request: https://github.com/python/cpython/pull/28630

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread neonene


neonene  added the comment:

I have another fix.

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-23 Thread neonene


neonene  added the comment:

3.10rc2 Python/ceval.c, from line 1306:

    #define DISPATCH() \
    { \
        if (trace_info.cframe.use_tracing OR_DTRACE_LINE OR_LLTRACE) { \
            goto tracing_dispatch; \

Among the 44 PGO training tests, only test_patma.TestTracing hits the condition 
above. On Windows, skipping it seems to tighten the PR28475 profile a bit. 
Additional tests such as test_threading (.ThreadTests.test_frame_tstate_tracing) 
might also cause some variation, or vice versa.

3.10rc2 x64 PGO                : 1.00
 + PR28475 with TestTracing    : 1.05x faster (slow  3, fast 46, same  9)
           without TestTracing : 1.06x faster (slow  5, fast 52, same  1)

 + PR28475 with TestTracing    : 1.00
           without TestTracing : 1.01x faster (slow 19, fast 27, same 12)

(Details: PR28475_skip1test_bench.txt)


Does test_patma.TestTracing need to be in the training set for match-case performance?

--
Added file: https://bugs.python.org/file50296/PR28475_skip1test_bench.txt

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-21 Thread neonene


neonene  added the comment:

PR28475 PGO is 2% slower than the patch I pasted in msg401743.
The function sizes are almost the same (+1: goto, +1: label), and there is no 
performance gap between the release builds.

I suspect the following:

1. PGO is too sensitive to function size near the limit.
2. PR28475 is not fully covered by the 44 training tests. (msg401346)

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-21 Thread neonene


neonene  added the comment:

To be fair, the slowdown between PR25244 and b1 seems to be an accumulation of 
"1.00x slower" results from individual commits. I don't know about the commits after b1.
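To illustrate how commits that each benchmark as "1.00x slower" can still add up to a visible regression: time ratios multiply, and a ratio such as 1.004 already displays as 1.00x. The per-commit value below is a hypothetical number chosen for illustration, not a measurement from this issue.

```python
import math

def cumulative_ratio(per_commit_ratios):
    """Multiply per-commit time ratios (new_time / old_time) to get the
    overall ratio between the first and the last build in the range."""
    return math.prod(per_commit_ratios)

# Hypothetical: 15 commits, each 0.4% slower than its parent.
ratios = [1.004] * 15
print(f"each commit: {ratios[0]:.2f}x slower")            # each commit: 1.00x slower
print(f"overall: {cumulative_ratio(ratios):.2f}x slower")  # overall: 1.06x slower
```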

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-21 Thread neonene


neonene  added the comment:

I built 3.10rc2 PGO with PR28475 applied, and posted the inliner's log.
In the log, the 4 callees mentioned above, which were "hard rejected" before, 
are now inlined.

As for performance, a few more reporters may be needed, but they need not worry 
about noise, given the size of the apparent gap.

3.10rc2 x64 PGO             : 1.00
 + PR28475  build1  bench1  : 1.05x faster (slower  7, faster 43, nochange  8)
                    bench2  : 1.05x faster (slower  2, faster 43, nochange 13)
            build2          : 1.05x faster (slower  4, faster 45, nochange  9)

3.10rc2 x64 release         : 1.00
 + PR28475                  : 1.01x faster (slower 14, faster 25, nochange 19)


Is Windows involved in the faster-cpython project? If so, the project should be 
provided with Windows machines for validation.

--
Added file: https://bugs.python.org/file50291/PR28475_inline.log

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-20 Thread neonene


neonene  added the comment:

>release with the performance regression

I'm OK with that option. The PGO limitation seems a bit weird to me, and it 
might be unexpected to the MSVC team.

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-17 Thread neonene


neonene  added the comment:

> (32-bit: "1.07", 64-bit: "1.14": "higher the slower" wrote neonene)

32-bit and 64-bit are reversed. I compared b1 and a7 because anyone can confirm 
this with the official binaries. If the 7% from my patch has little to do with 
the gap, then I will be happy, since 3.10 could then be far faster.

>How can I build Python with PGO on Windows?

Try the following:

   PCbuild\build.bat -p x64 --no-tkinter --pgo

Before building, replace
   static inline int Py_ALWAYS_INLINE
with
   static Py_ALWAYS_INLINE int
in your object.h.

In my case, the PGO build got stuck at link time with that object.h.


I'm waiting for a reply from Developer Community.

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-16 Thread neonene


neonene  added the comment:

I reported this issue to Microsoft's Developer Community:

https://developercommunity.visualstudio.com/t/1531987

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-13 Thread neonene


neonene  added the comment:

With msvc 16.10.3 and 16.11.2 (latest),
PR25244 showed me that the amount of code in _PyEval_EvalFrameDefault() is over 
the PGO limit.
In the old version of _PyEval_EvalFrameDefault (b98eba5), the same issue can be 
triggered by adding more than 20 expressions/statements of any code anywhere. 
For example, repeating "if (0) {}" 10 times, or adding "if (0) {19 statements}", 
at the top, middle or end of the function. For Python 3.9.7, the threshold is 
more than 800 expressions/statements.

Here is just a workaround for 3.10rc2 on Windows.
==
--- Python/ceval.c
+++ Python/ceval.c
@@ -1306,9 +1306 @@
-#define DISPATCH() \
-{ \
-if (trace_info.cframe.use_tracing OR_DTRACE_LINE OR_LLTRACE) { \
-goto tracing_dispatch; \
-} \
-f->f_lasti = INSTR_OFFSET(); \
-NEXTOPARG(); \
-DISPATCH_GOTO(); \
-}
+#define DISPATCH() goto tracing_dispatch
@@ -1782,4 +1774,9 @@
 tracing_dispatch:
 {
+if (!(trace_info.cframe.use_tracing OR_DTRACE_LINE OR_LLTRACE)) {
+f->f_lasti = INSTR_OFFSET();
+NEXTOPARG();
+DISPATCH_GOTO();
+}
 int instr_prev = f->f_lasti;
 f->f_lasti = INSTR_OFFSET();
==

This patch becomes ineffective if just one expression is added to the DISPATCH 
macro, as below:

   #define DISPATCH() {if (1) goto tracing_dispatch;}

This approach is also not sufficient for 3.11, whose eval function is bigger.
I don't know of a cl/link option that lifts this function-size restriction.


3.10rc2 x86 pgo : 1.00
        patched : 1.09x faster (slower  5, faster 48, not significant 5)

3.10rc2 x64 pgo : 1.00 (roughly the same speed as the official binary)
        patched : 1.07x faster (slower  5, faster 47, not significant 6)
  patched(/Ob3) : 1.07x faster (slower  7, faster 45, not significant 6)

x64 results are posted.

Fixing the inlining rejection also made __forceinline builds complete with 
normal processing time and memory usage.

--
Added file: https://bugs.python.org/file50280/310rc2_benchmarks.txt

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows

2021-09-10 Thread neonene


neonene  added the comment:

According to:
https://docs.microsoft.com/en-us/cpp/build/profile-guided-optimizations?view=msvc-160

PGO seems to override /Ob3.
I posted benchmarks related to this on issue44381.
When building Python, /Ob3 takes effect only for the DLLs that are not built with PGO, such as:

_ctypes_test
_freeze_importlib
_msi
_testbuffer
_testcapi
_testconsole
_testembed
_testimportmultiple
_testinternalcapi
_testmultiphase
_uuid
liblzma
pylauncher
pyshellext
pywlauncher
sqlite3
venvlauncher
venvwlauncher
winsound

I use this option in _msvccompiler.py for my own .pyd files.
I will try, and report if PGO with /Ob3 makes a difference in the log.

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows

2021-09-10 Thread neonene


Change by neonene :


Added file: https://bugs.python.org/file50276/x64_b98e.log

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows

2021-09-10 Thread neonene


Change by neonene :


Added file: https://bugs.python.org/file50275/x64_28d2.log

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows

2021-09-10 Thread neonene


Change by neonene :


Added file: https://bugs.python.org/file50274/pyproject_inlinestat.patch

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows

2021-09-10 Thread neonene


Change by neonene :


Added file: https://bugs.python.org/file50273/b98e-no-inline-in-the-others.diff

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows

2021-09-10 Thread neonene


neonene  added the comment:

Thanks for all the suggestions. I focused on my bisected commit and the previous one.

I ran pyperformance with the following 4 functions

   _Py_DECREF()
   _Py_XDECREF()
   _Py_IS_TYPE()
   _Py_atomic_load_32bit_impl()

forced to be

   (1) never inlined in _PyEval_EvalFrameDefault().
   (2) never inlined in the other functions.
   (3) never inlined in any function.


slowdowns [section where the 4 funcs are never inlined]
----------------------------------------------------------------
Windows x64 PGO (44 jobs)            (*)    (1)    (2)     (3)
rebuild                              none   eval   others  all
----------------------------------------------------------------
b98eba5 (4 funcs inlined in eval)    1.00   1.05   1.09    1.14
PR25244 (not inlined in eval)        1.06   1.07   1.18    1.17

pyperf compare_to upper lower:
   (*) 1.06x slower  (slower 45, faster  4, not significant  9)
   (1) 1.02x slower  (slower 33, faster 13, not significant 12)
   (2) 1.08x slower  (slower 48, faster  6, not significant  4)
   (3) 1.03x slower  (slower 39, faster  6, not significant 13)


----------------------------------------------------------------
Windows x86 PGO (44 jobs)            (*)    (1)    (2)     (3)
rebuild                              none   eval   others  all
----------------------------------------------------------------
b98eba5 (4 funcs inlined in eval)    1.00   1.03   1.06    1.15
PR25244 (not inlined in eval)        1.13   1.13   1.22    1.24

pyperf compare_to upper lower:
   (*) 1.13x slower  (slower 54, faster  2, not significant  2)
   (1) 1.10x slower  (slower 47, faster  3, not significant  8)
   (2) 1.14x slower  (slower 54, faster  1, not significant  3)
   (3) 1.08x slower  (slower 43, faster  3, not significant 12)


In both x64 and x86, columns (*) and (2) appear to have similar gaps.
So I would like to focus simply on the eval loop.

I built PGO with "/d2inlinestats" and 
"/d2inlinelogfull:_PyEval_EvalFrameDefault", following the blog.

I posted the logs. For PR25244, the log is 3x smaller than for the previous 
commit, and PGO rejects the 4 functions above. I will look into it later.


Correction:
> Before the PR, it took 10x~ longer to link than without __forceinline 
> function.

Current build is 10x~ shorter than before to link.
Before the PR, __forceinline had no impact for me.

--
Added file: https://bugs.python.org/file50271/b98e-no-inline-in-all.diff

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows

2021-09-10 Thread neonene


Change by neonene :


Added file: https://bugs.python.org/file50272/b98e-no-inline-in-eval.diff

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows

2021-09-07 Thread neonene


neonene  added the comment:

@vstinner: __forceinline suggestion

Since PR25244 (mentioned above), link.exe seems to get stuck on 
python310.dll.
Before the PR, it took ~10x longer to link than without the __forceinline function.
I can confirm this with _Py_DECREF(), _Py_XDECREF() and one training job (the 
more functions forced / jobs used, the slower the link).
Have you tried __forceinline with PGO?


> I don't understand how to read the table.

The overhead field is the output of the pyperf command, not a subtraction (the 
two just happen to give the same answer here).

e.g.) 3.10rc1 x86 PGO:
 PGO      : pyperf compare_to 3.10a7 left   -> 1.15x slower (slower 52, faster  4, not significant  2)
 patched  : pyperf compare_to 3.10a7 right  -> 1.13x slower (slower 50, faster  4, not significant  4)
 overhead : pyperf compare_to right  left   -> 1.02x slower (slower 29, faster 14, not significant 15)
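Why division and subtraction agree here: when both builds are compared against the same baseline, the geometric mean of per-benchmark ratios is multiplicative, so the direct comparison equals the quotient of the two summary ratios, up to rounding. A minimal sketch of that arithmetic using the 3.10rc1 x86 figures above (pyperf computes its own figure from the raw timings, so small rounding differences are expected):

```python
def overhead(left_vs_base, right_vs_base):
    """Given two geometric-mean ratios against the same baseline
    (e.g. "1.15x slower" -> 1.15), return the ratio of left to right.
    Because the geometric mean of ratios is multiplicative, this matches
    comparing the two builds directly, up to rounding."""
    return left_vs_base / right_vs_base

pgo_vs_a7 = 1.15      # PGO build vs 3.10a7
patched_vs_a7 = 1.13  # patched build vs 3.10a7
print(f"{overhead(pgo_vs_a7, patched_vs_a7):.2f}x slower")  # 1.02x slower
```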


> I'm not sure if PGO builds are reproducible,

MSVC does not produce the same code each time. Inlining (all or nothing) might 
be quite a special case in the hottest section.
I suspect the profiler doesn't work well specifically for _PyEval_EvalFrameDefault(), 
including branch/alignment optimization.
So my posted macro/inlining patch is just for measurement, not the solution.

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows

2021-09-06 Thread neonene


Change by neonene :


Added file: https://bugs.python.org/file50264/ceval_310rc1_patched.c

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows

2021-09-06 Thread neonene


New submission from neonene :

pyperformance on Windows shows some gap between 3.10a7 and 3.10b1.
The following are the ratios compared with 3.10a7 (the higher the slower).

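-------------------------------------------------------
Windows x64      |  PGO    release  official-binary
-----------------+-------------------------------------
20210405         |
3.10a7           |  1.00   1.24     1.00 (PGO?)
20210408-07:58   |
b98eba5          |  0.98
20210408-10:22   |
  * PR25244      |  1.04
20210503         |
3.10b1           |  1.07   1.21     1.07
-------------------------------------------------------
Windows x86      |  PGO    release  official-binary
-----------------+-------------------------------------
20210405         |
3.10a7           |  1.00   1.25     1.27 (release?)
20210408-07:58   |
b98eba5bc        |  1.00
20210408-10:22   |
  * PR25244      |  1.11
20210503         |
3.10b1           |  1.14   1.28     1.29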

Since PR25244 (28d28e053db6b69d91c2dfd579207cd8ccbc39e7),
_PyEval_EvalFrameDefault() in ceval.c appears to be left unoptimized by PGO 
(msvc14.29.16.10).
At least the functions below are no longer inlined there at all.

  (1) _Py_DECREF()                 (from Py_DECREF, Py_CLEAR, Py_SETREF)
  (2) _Py_XDECREF()                (from Py_XDECREF, SETLOCAL)
  (3) _Py_IS_TYPE()                (from PyXXX_CheckExact)
  (4) _Py_atomic_load_32bit_impl() (from CHECK_EVAL_BREAKER)

I tried other linker options, such as thread-safe-profiling, 
aggressive-code-generation and /OPT:NOREF, in vain.
3.10a7 can inline them in the eval loop even when profiling only test_array.py.

I measured the overhead of (1)-(4) on my own build, whose eval loop uses macros 
instead of these functions.

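-----------------------------------------------------------------------
Windows x64      |  PGO    patched  overhead in eval-loop
-----------------+-----------------------------------------------------
3.10a7           |  1.00
20210802         |
3.10rc1          |  1.09   1.05     4%  (slow 43, fast  5, same 10)
20210831-20:42   |
863154c          |  0.95   0.90     5%  (slow 48, fast  3, same  7)
   (3.11a0+)     |
-----------------------------------------------------------------------
Windows x86      |  PGO    patched  overhead in eval-loop
-----------------+-----------------------------------------------------
3.10a7           |  1.00
20210802         |
3.10rc1          |  1.15   1.13     2%  (slow 29, fast 14, same 15)
20210831-20:42   |
863154c          |  1.05   1.02     3%  (slow 44, fast  7, same  7)
   (3.11a0+)     |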

--
components: C API, Interpreter Core, Windows
files: 310rc1_confirm_overhead.patch
keywords: patch
messages: 401143
nosy: Mark.Shannon, neonene, pablogsal, paul.moore, steve.dower, tim.golden, 
vstinner, zach.ware
priority: normal
severity: normal
status: open
title: Performance regression 3.10b1 and later on Windows
type: performance
versions: Python 3.10, Python 3.11
Added file: https://bugs.python.org/file50263/310rc1_confirm_overhead.patch

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue44381] Allow enabling control flow guard in Windows build

2021-08-25 Thread neonene


neonene  added the comment:

I'd like to leave my pyperformance (x64) results here.
cpython: ae5259171b8ef62165e061b9dea7ad645a5131a2 (2021-8-23)

1) release + CFG      : 1.00x

2) release + CFG,/Ob3 : 1.05x faster | 41 faster
                                     |  9 slower
                                     |  8 not significant

3) release (default)  : 1.07x faster | 52 faster
                                     |  4 slower (regex_v8,
                                     |            regex_effbot,
                                     |            nbody,
                                     |            hexiom)
                                     |  2 not significant

4) release + /Ob3     : 1.11x faster | 56 faster
                                     |  1 slower (regex_v8)
                                     |  1 not significant (regex_dna)

5) PGO + CFG          : 1.15x faster | 53 faster
                                     |  2 slower (regex_dna,
                                     |            pidigits)
                                     |  3 not significant

6) PGO + CFG,/Ob3     : 1.15x faster | 54 faster
                                     |  1 slower (regex_dna)
                                     |  3 not significant

7) PGO (default)      : 1.21x faster | 56 faster
                                     |  1 slower (regex_dna)
                                     |  1 not significant (regex_effbot)

8) PGO + /Ob3         : 1.21x faster | 57 faster
                                     |  1 slower (regex_dna)
                                     |  0 not significant

--
nosy: +neonene

___
Python tracker 
<https://bugs.python.org/issue44381>
___



[issue44878] Clumsy dispatching on interpreter entry.

2021-08-18 Thread neonene


neonene  added the comment:

FYI, PR27727 ("Remove loop...") seems to be a bit slower than the previous 
commit (f08e6d1bb3c5655f184af88c6793e90908bb6338) on my Windows build 
(msvc14.29.16.10). pyperformance shows:

  Windows x64 PGO: 34 slower, 11 faster, 13 not significant, Geometric mean: 1.02x slower
  Windows x86 PGO: 28 slower, 17 faster, 13 not significant, Geometric mean: 1.02x slower

Reverting PR27727 on the current cpython main branch also gives a 1-2% speedup on 
average.
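The summaries above report a geometric mean of per-benchmark ratios together with slower/faster/not-significant counts. A minimal sketch of that summary step, assuming per-benchmark mean times are already available as dictionaries (the helper name, benchmark timings and the fixed threshold are hypothetical; pyperf itself decides significance with a statistical test rather than a threshold):

```python
from statistics import geometric_mean

def summarize(base_times, new_times, threshold=0.0):
    """Compare two {benchmark: seconds} dicts.

    Returns (geometric-mean ratio new/base, #slower, #faster, #same).
    A ratio above 1 means the new build is slower. Ratios within
    +/- threshold of 1.0 are counted as 'same'.
    """
    ratios = {name: new_times[name] / base_times[name] for name in base_times}
    slower = sum(r > 1 + threshold for r in ratios.values())
    faster = sum(r < 1 - threshold for r in ratios.values())
    same = len(ratios) - slower - faster
    return geometric_mean(ratios.values()), slower, faster, same

# Hypothetical timings in seconds, not measurements from this issue.
base = {"nbody": 1.00, "regex_v8": 2.00, "pidigits": 4.00}
new = {"nbody": 1.04, "regex_v8": 2.00, "pidigits": 3.90}
gm, slower, faster, same = summarize(base, new, threshold=0.01)
print(f"{slower} slower, {faster} faster, {same} same, geometric mean {gm:.3f}")
```

The geometric mean is used because it treats a 2x slowdown and a 2x speedup symmetrically, which an arithmetic mean of ratios does not.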

--
nosy: +neonene

___
Python tracker 
<https://bugs.python.org/issue44878>
___



[issue44479] Windows build doesn't regenerate some files

2021-07-14 Thread neonene


neonene  added the comment:

When building, some pull requests trigger regeneration of test_frozenmain.h.
In PGO mode, the build tries to run the instrumented python and stops with a 
"pgort140.dll not found" error.
Would it be OK to run the python in the externals folder instead?

--

___
Python tracker 
<https://bugs.python.org/issue44479>
___



[issue44479] Windows build doesn't regenerate some files

2021-07-14 Thread neonene


Change by neonene :


--
nosy: +neonene
nosy_count: 7.0 -> 8.0
pull_requests: +25688
pull_request: https://github.com/python/cpython/pull/27146

___
Python tracker 
<https://bugs.python.org/issue44479>
___



[issue44575] Windows installer prohibits different patches for the same version

2021-07-07 Thread neonene


neonene  added the comment:

To debug pure Python code, use embeddable Pythons in different folders and copy 
the Lib folder from the source archive instead of using python3.9.zip.

When using MSVC (python3*.lib), I think it's enough to install Python as 
follows, or to build from source.

1. Copy the installed Python to any folder.
2. Uninstall Python.
3. Install Python with a different minor version.

--
nosy: +neonene

___
Python tracker 
<https://bugs.python.org/issue44575>
___



[issue43105] [Windows] Can't import extension modules resolved via relative paths in sys.path

2021-06-14 Thread neonene


neonene  added the comment:

After this change, when using a module at the root of a drive (maybe bad 
practice), are the following the expected behaviors?

(1) relative drive in sys.path -> bytecode is not put in a __pycache__ folder.

>>> import sys
>>> sys.path.append('F:')  # flash device, etc...
>>> import foo
>>> foo.__file__
'F:foo.py'
>>> foo.__cached__
'F:foo.cpython-311.pyc'


(2) absolute drive in sys.path -> the __pycache__ path is relative to the 
current directory of the drive, not absolute.

>>> import sys
>>> sys.path.append('F:\\')
>>> import foo
>>> foo.__file__
'F:\\foo.py'
>>> foo.__cached__
'F:__pycache__\\foo.cpython-311.pyc'
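For comparison, `importlib.util.cache_from_source()` returns the `.pyc` path the import system computes for a given source path, which makes it possible to check `__cached__` values like the ones above in isolation. A minimal sketch; the module path is hypothetical and lives under the current directory so the snippet runs on any OS (on Windows, one could pass `'F:foo.py'` or `'F:\\foo.py'` instead to probe the drive cases above):

```python
import importlib.util
import os

def expected_pyc(source_path):
    """Return the bytecode path importlib would use for source_path.

    Note: the result also depends on sys.pycache_prefix and on -O
    (which adds an ".opt-N" tag to the file name).
    """
    return importlib.util.cache_from_source(source_path)

src = os.path.join(os.getcwd(), "foo.py")  # hypothetical module location
print(expected_pyc(src))
```

Under default settings the result is `<dir>/__pycache__/foo.<cache_tag>.pyc`, where the cache tag is `sys.implementation.cache_tag` (e.g. `cpython-311`).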

--
nosy: +neonene

___
Python tracker 
<https://bugs.python.org/issue43105>
___



[issue42986] pegen parser: Crash on SyntaxError with f-string on Windows

2021-01-20 Thread neonene


neonene  added the comment:

I confirmed there is no crash with PR 24279.
Thanks for the quick fix.

--

___
Python tracker 
<https://bugs.python.org/issue42986>
___



[issue42986] pegen parser: Crash on SyntaxError with f-string on Windows

2021-01-20 Thread neonene


New submission from neonene :

On Windows, Python master crashes when an f-string with an invalid character 
inside the braces appears on line 3 or later.
The issue seems to come from commit (e5fe509054183bed9aef42c92da8407d339e8af8).

I tried

1) exec("f'{.}'")
2) exec("\nf'{.}'")
3) exec("\n\nf'{.}'")

The results are:

1) expected
>>> exec("f'{.}'")
Traceback (most recent call last):
  File "", line 1, in 
  File "", line 1
(.)
 ^
SyntaxError: f-string: invalid syntax

2) unexpected (caret indicates nothing)
>>> exec("\nf'{.}'")
Traceback (most recent call last):
  File "", line 1, in 
  File "", line 2

^
SyntaxError: f-string: invalid syntax

3) python crashes
>>> exec("\n\nf'{.}'")

--
components: Interpreter Core
messages: 385377
nosy: lys.nikolaou, neonene, pablogsal
priority: normal
severity: normal
status: open
title: pegen parser: Crash on SyntaxError with f-string on Windows
type: crash
versions: Python 3.10

___
Python tracker 
<https://bugs.python.org/issue42986>
___



[issue42846] Using _multibytecodec module on Windows, test_threading/embed get failure

2021-01-06 Thread neonene


New submission from neonene :

After 
https://github.com/python/cpython/commit/0b858cdd5d114f0890b11b6c4d6559d0ceb468ab
(bpo-1635741: Convert _multibytecodec to multi-phase init),

on Windows x64/x86 with a Chinese/Japanese/Korean system locale,
MultibyteCodec_Check() in multibytecodec.c returns false and a
TypeError is raised. This affects some tests and the PGO training.



1) python -m test --verbose test_threading

==
FAIL: test_daemon_threads_fatal_error (test.test_threading.SubinterpThreadingTests)
--
Traceback (most recent call last):
  File "C:\cpython-0b858\lib\test\test_threading.py", line 1124, in test_daemon_threads_fatal_error
    self.assertIn("Fatal Python error: Py_EndInterpreter: "
AssertionError: 'Fatal Python error: Py_EndInterpreter: not the last thread' not found in 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 003FF980 is still current\nPython runtime state: initialized\n\nThread 0x0710 (most recent call first):\n\n'



2) python -m test --verbose test_embed

==
FAIL: test_audit_subinterpreter (test.test_embed.AuditingTests)
--
Traceback (most recent call last):
  File "C:\cpython-0b858\lib\test\test_embed.py", line 1433, in test_audit_subinterpreter
    self.run_embedded_interpreter("test_audit_subinterpreter")
  File "C:\cpython-0b858\lib\test\test_embed.py", line 104, in run_embedded_interpreter
    self.assertEqual(p.returncode, returncode,
AssertionError: 3221225477 != 0 : bad returncode 3221225477, stderr is 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 0050CAF0 is still current\nPython runtime state: initialized\n\nThread 0x09d8 (most recent call first):\n\n'

==
FAIL: test_subinterps_different_ids (test.test_embed.EmbeddingTests)
--
Traceback (most recent call last):
  File "C:\cpython-0b858\lib\test\test_embed.py", line 169, in test_subinterps_different_ids
    for run in self.run_repeated_init_and_subinterpreters():
  File "C:\cpython-0b858\lib\test\test_embed.py", line 110, in run_repeated_init_and_subinterpreters
    out, err = self.run_embedded_interpreter("test_repeated_init_and_subinterpreters")
  File "C:\cpython-0b858\lib\test\test_embed.py", line 104, in run_embedded_interpreter
    self.assertEqual(p.returncode, returncode,
AssertionError: 3221225477 != 0 : bad returncode 3221225477, stderr is 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 0041C960 is still current\nPython runtime state: initialized\n\nThread 0x0a40 (most recent call first):\n\n'

==
FAIL: test_subinterps_distinct_state (test.test_embed.EmbeddingTests)
--
Traceback (most recent call last):
  File "C:\cpython-0b858\lib\test\test_embed.py", line 177, in test_subinterps_distinct_state
    for run in self.run_repeated_init_and_subinterpreters():
  File "C:\cpython-0b858\lib\test\test_embed.py", line 110, in run_repeated_init_and_subinterpreters
    out, err = self.run_embedded_interpreter("test_repeated_init_and_subinterpreters")
  File "C:\cpython-0b858\lib\test\test_embed.py", line 104, in run_embedded_interpreter
    self.assertEqual(p.returncode, returncode,
AssertionError: 3221225477 != 0 : bad returncode 3221225477, stderr is 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 0047C960 is still current\nPython runtime state: initialized\n\nThread 0x0b34 (most recent call first):\n\n'

==
FAIL: test_subinterps_main (test.test_embed.EmbeddingTests)
--
Traceback (most recent call last):
  File "C:\cpython-0b858\lib\test\test_embed.py", line 163, in test_subinterps_main
    for run in self.run_repeated_init_and_subinterpreters():
  File "C:\cpython-0b858\lib\test\test_embed.py", line 110, in run_repeated_init_and_subinterpreters
    out, err = self.run_embedded_interpreter("test_repeated_init_and_subinterpreters")
  File "C:\cpython-0b858\lib\test\test_embed.py", line 104, in run_embedded_interpreter
    self.assertEqual(p.returncode, returncode,
AssertionError: 3221225477 != 0 : bad returncode 3221225477, stderr is 'TypeError: cod

[issue41766] Python3.10 (x64) crashes after flake8/pyflakes on Windows

2020-09-12 Thread neonene

neonene  added the comment:

I applied PR 21961 to master and confirmed there is no crash.
I'll close this issue. Thanks for your quick reply.

--
stage:  -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue41766>
___



[issue41766] Python3.10 (x64) crashes after flake8/pyflakes on Windows

2020-09-11 Thread neonene


Change by neonene :


--
nosy: +pablogsal, vstinner -paul.moore, steve.dower, tim.golden, zach.ware

___
Python tracker 
<https://bugs.python.org/issue41766>
___



[issue41766] Python3.10 (x64) crashes after flake8/pyflakes on Windows

2020-09-11 Thread neonene


New submission from neonene :

On Python 3.10 (64-bit only) for Windows,
flake8 (with a .pyc cache) frequently crashes after producing its output.

e.g.:
python -m flake8   c:/Python3/Lib/turtle.py
python -m pyflakes c:/Python3/Lib/turtle.py

I think I started encountering the crash after PR21293 was applied.
(bpo-41194: "Convert _ast extension to PEP 489")

--
components: Interpreter Core, Windows
messages: 376747
nosy: neonene, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: Python3.10 (x64) crashes after flake8/pyflakes on Windows
type: crash
versions: Python 3.10

___
Python tracker 
<https://bugs.python.org/issue41766>
___



[issue40958] ASAN/UBSAN: heap-buffer-overflow in pegen.c

2020-06-19 Thread neonene


Change by neonene :


--
nosy: +christian.heimes -neonene

___
Python tracker 
<https://bugs.python.org/issue40958>
___



[issue40958] ASAN/UBSAN: heap-buffer-overflow in pegen.c

2020-06-18 Thread neonene


neonene  added the comment:

FYI, since PR 20875/20919, msvc (x64) has emitted warning C4244 (conversion from 
'Py_ssize_t' to 'int', possible loss of data).
parse.c alone gets more than 700 of them.

--
nosy: +neonene -christian.heimes, miss-islington

___
Python tracker 
<https://bugs.python.org/issue40958>
___



[issue40082] Assertion failure in trip_signal

2020-04-06 Thread neonene


neonene  added the comment:

On Windows, PyGILState_GetThisThreadState() returns NULL when a ^C interrupt 
occurs. The NULL comes from the TlsGetValue() Windows API, and I don't think the 
OS's behavior is wrong. 
In trip_signal(), the crash can be avoided by skipping PyEval_SignalReceived() if 
tstate is invalid. But I'm not sure the skip itself is OK.

--
nosy: +neonene

___
Python tracker 
<https://bugs.python.org/issue40082>
___



[issue37702] memory leak in ssl certification

2019-08-31 Thread neonene


neonene  added the comment:

I raised another PR (15632), which keeps the changes to a minimum.
I hope either PR makes it into the official 3.7.5 / 3.8.0 releases.

--

___
Python tracker 
<https://bugs.python.org/issue37702>
___



[issue37702] memory leak in ssl certification

2019-08-31 Thread neonene


Change by neonene :


--
pull_requests: +15300
pull_request: https://github.com/python/cpython/pull/15632

___
Python tracker 
<https://bugs.python.org/issue37702>
___



[issue37702] memory leak in ssl certification

2019-07-29 Thread neonene


New submission from neonene :

Windows10/7(x86/x64)
After issue35941 (any PR merged)

With HTTPS access, memory usage increases by about 200 KB per urlopen() call
and can easily reach gigabytes.

I found a leak of certificate store handles in _ssl.c and made a patch,
which works fine on my PC.

I suspect some users are affected by this leak.
I'm about to open a PR, so please review it. Thanks!
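The kind of bug described above, a handle acquired on every call but not released on every path, can be modelled in a few lines of Python. This is a hypothetical sketch, not the _ssl.c patch: FakeCertStore stands in for a Windows certificate store handle, and enum_leaky() and enum_fixed() are illustrative names that differ only in whether the handle is closed before returning.

```python
class FakeCertStore:
    """Hypothetical stand-in for a certificate store handle.

    The class attribute counts how many handles are currently open.
    """

    open_handles = 0

    def __init__(self, name):
        self.name = name
        self.closed = False
        FakeCertStore.open_handles += 1

    def close(self):
        # Release the handle once; repeated calls are harmless.
        if not self.closed:
            self.closed = True
            FakeCertStore.open_handles -= 1


def enum_leaky(name):
    store = FakeCertStore(name)  # opened on every call...
    return [f"{store.name}-cert"]  # ...but never closed


def enum_fixed(name):
    store = FakeCertStore(name)
    try:
        return [f"{store.name}-cert"]
    finally:
        store.close()  # released on every path, including errors


for _ in range(100):
    enum_leaky("ROOT")
print(FakeCertStore.open_handles)  # 100

FakeCertStore.open_handles = 0
for _ in range(100):
    enum_fixed("ROOT")
print(FakeCertStore.open_handles)  # 0
```

The growth is linear in the number of calls: at roughly 200 KB per call, about 5,000 calls already add up to 1 GB (200 KB × 5,000 = 1,000,000 KB), which is how a long-running HTTPS client reaches gigabytes.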

--
assignee: christian.heimes
components: SSL, Windows
messages: 348600
nosy: christian.heimes, neonene, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: memory leak in ssl certification
type: resource usage
versions: Python 3.7, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue37702>
___



[issue35941] ssl.enum_certificates() regression

2019-07-12 Thread neonene


neonene  added the comment:

I meant 12609 (x86/x64, Py3.7.4rc1, Py3.8.0a4 and later).
And even after I tried merging 12610 into Py3.7.4, memory usage still increases.
Sorry, I can't find the cause.

--

___
Python tracker 
<https://bugs.python.org/issue35941>
___



[issue37498] request.urlopen(), memory leak?

2019-07-12 Thread neonene


neonene  added the comment:

Duplicate of issue35941.

--
resolution:  -> duplicate
stage:  -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue37498>
___



[issue35941] ssl.enum_certificates() regression

2019-07-12 Thread neonene


neonene  added the comment:

After this patch was applied, memory usage increases with every HTTPS access and 
is not released on my Win7 x64 SP1.
I hope this will be fixed or reverted.

(sample case)
from urllib import request
from time import sleep
import gc
# Each iteration opens an HTTPS connection; memory keeps growing
# even though the garbage collector is run explicitly.
while True:
    request.urlopen(request.Request('https://...'))
    gc.collect()
    sleep(2)
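A network-free variant of the loop above might help isolate the problem, under the assumption that the growth comes from the certificate-loading path rather than from the HTTP traffic itself. On Windows, ssl.create_default_context() calls load_default_certs(), which enumerates the system certificate stores through ssl.enum_certificates(). churn_default_contexts() is a hypothetical helper; memory has to be watched from outside the process (for example in Task Manager), since tracemalloc only tracks Python-level allocations.

```python
import ssl


def churn_default_contexts(iterations=1000):
    """Create and discard default SSL contexts without any network I/O.

    On Windows, each call loads the system certificate stores, so a leak
    in that path should appear as steadily growing process memory.
    Returns the number of contexts created.
    """
    created = 0
    for _ in range(iterations):
        ctx = ssl.create_default_context()
        created += 1
        del ctx  # drop the only reference so the context can be freed
    return created
```

For example, run churn_default_contexts(1000) and compare the process's memory before and after; if it grows without bound while the urlopen() loop is not running, the network layer can be ruled out.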

--
nosy: +neonene

___
Python tracker 
<https://bugs.python.org/issue35941>
___



[issue37498] request.urlopen(), memory leak?

2019-07-04 Thread neonene


New submission from neonene :

Python 3.8.0a4, b1, b2 (x64, x32)
Python 3.7.4rc1, rc2 (x64, x32)
On Windows 7 SP1 (x64), memory usage keeps increasing with the following code.
---
from urllib import request
from time import sleep
while True:
    req = request.Request('https://www.python.org/')
    request.urlopen(req)
    sleep(1)
---
Sorry, I'm not sure why.

--
components: Windows
messages: 347293
nosy: neonene, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: request.urlopen(), memory leak?
type: resource usage
versions: Python 3.7, Python 3.8

___
Python tracker 
<https://bugs.python.org/issue37498>
___