[issue42846] Using _multibytecodec module on Windows, test_threading/embed get failure

2021-01-08 Thread STINNER Victor


STINNER Victor  added the comment:

> bpo-42846: Convert CJK codec extensions to multiphase init (GH-24157)

I added a new test, and it spotted a reference leak, likely a pre-existing
one: bpo-42866 "test test_multibytecodec:
Test_IncrementalEncoder.test_subinterp() leaks references".

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue42846] Using _multibytecodec module on Windows, test_threading/embed get failure

2021-01-07 Thread STINNER Victor


STINNER Victor  added the comment:

> 1) python -m test --verbose test_threading
> 2) python -m test --verbose test_embed

I manually ran these two tests with the cp932 ANSI code page: they now pass
with my fix.

I also added a regression test to test_multibytecodec.py.

Thanks for your quick bug report neonene! It's now fixed.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue42846] Using _multibytecodec module on Windows, test_threading/embed get failure

2021-01-07 Thread STINNER Victor


STINNER Victor  added the comment:


New changeset 07f2cee93f1b619650403981c455f47bfed8d818 by Victor Stinner in 
branch 'master':
bpo-42846: Convert CJK codec extensions to multiphase init (GH-24157)
https://github.com/python/cpython/commit/07f2cee93f1b619650403981c455f47bfed8d818


--




[issue42846] Using _multibytecodec module on Windows, test_threading/embed get failure

2021-01-07 Thread STINNER Victor


STINNER Victor  added the comment:

Ah, if you don't want to change the ANSI code page to cp932 (Japanese language) 
just to reproduce the issue, you can just set the stdio encoding:
-
C:\> set PYTHONIOENCODING=cp932
C:\> python t.py|more
sys.stdout.encoding='cp1250'

TypeError: codec is unexpected type
(...)
-
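
The effect of PYTHONIOENCODING on a redirected stdout can also be checked from Python. The sketch below is not part of the original report, and the helper name child_stdout_encoding is made up for illustration. It starts a child interpreter with its stdout connected to a pipe and asks it to report sys.stdout.encoding. It only inspects the encoding and does not create a subinterpreter, so it does not trigger the crash by itself.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Return sys.stdout.encoding as seen by a child Python whose stdout is a pipe.

    Hypothetical helper for illustration only.
    """
    # Copy the current environment so the child still finds its runtime,
    # then override the stdio encoding the same way "set PYTHONIOENCODING=..." does.
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    proc = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,  # a pipe, so no console I/O object is involved
        check=True,
    )
    return proc.stdout.decode("ascii").strip()


if __name__ == "__main__":
    print(child_stdout_encoding("cp932"))
```

Because stdout is a pipe here, the child takes the same "redirected output" path as "python t.py|more" in the transcript above.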

--




[issue42846] Using _multibytecodec module on Windows, test_threading/embed get failure

2021-01-07 Thread STINNER Victor


STINNER Victor  added the comment:

Simpler way to reproduce the issue with t.py script:
---
import test.support
import sys

import _testcapi

print(f"{sys.stdout.encoding=}", file=sys.stderr)

with test.support.SuppressCrashReport():
    _testcapi.run_in_subinterp("pass")
---

By default, UTF-8 is used, everything is fine:
-
C:\> python t.py
sys.stdout.encoding='utf-8'
-

Disable _WindowsConsoleIO with the PYTHONLEGACYWINDOWSSTDIO environment
variable, and the issue appears:
-
C:\> set PYTHONLEGACYWINDOWSSTDIO=1

C:\> python t.py
Running Debug|x64 interpreter...
sys.stdout.encoding='cp932'
TypeError: codec is unexpected type
Fatal Python error: (...)
-

Redirecting the output to a program or a file also disables _WindowsConsoleIO
and reproduces the issue:
-
C:\> python t.py|more
sys.stdout.encoding='cp932'
TypeError: codec is unexpected type
(...)
-

--




[issue42846] Using _multibytecodec module on Windows, test_threading/embed get failure

2021-01-07 Thread STINNER Victor


STINNER Victor  added the comment:

Attached PR 24157 should fix the issue.

> FAIL: test_daemon_threads_fatal_error 
> (test.test_threading.SubinterpThreadingTests)

This test runs code in a subinterpreter which is run in a subprocess. The
problem is not in the code run in the subinterpreter, but in the creation of
sys.stdout in the subprocess.

The test creates a subprocess and redirects its stdout and stderr. In this 
case, Python doesn't create a _io._WindowsConsoleIO for sys.stdout.buffer.raw, 
but a regular _io.FileIO object. When the raw I/O is a _WindowsConsoleIO 
instance, create_stdio() of Python/pylifecycle.c forces the usage of the UTF-8 
encoding. But for FileIO, it keeps the locale encoding.

If the locale encoding is "cp932", a CJK multibyte codec is used. In the main
interpreter, it's fine. In a subinterpreter, we hit the bug in the _codecs_jp
module, which doesn't use the new multi-phase initialization API.

--




[issue42846] Using _multibytecodec module on Windows, test_threading/embed get failure

2021-01-07 Thread STINNER Victor


Change by STINNER Victor :


--
keywords: +patch
pull_requests: +22985
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/24157




[issue42846] Using _multibytecodec module on Windows, test_threading/embed get failure

2021-01-07 Thread STINNER Victor


STINNER Victor  added the comment:

I'm working on a fix.

--




[issue42846] Using _multibytecodec module on Windows, test_threading/embed get failure

2021-01-07 Thread Erlend Egeberg Aasland


Erlend Egeberg Aasland  added the comment:

It should be sufficient to convert cjkcodecs.h to multi-phase init, then? From
what I can see, the support modules are stateless, right?

--




[issue42846] Using _multibytecodec module on Windows, test_threading/embed get failure

2021-01-07 Thread STINNER Victor


STINNER Victor  added the comment:

It took me a while to understand it: the _multibytecodec module itself is fine.
The issue comes from the _codecs_jp module, which uses the legacy module API:

codec = _codecs_jp.getcodec('cp932')
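
For context, the object returned by that call is the low-level codec that the cp932 encoding is built on. The following sketch assumes a CPython build that ships the internal _codecs_jp extension module; roundtrip_cp932 is a hypothetical helper name. It runs in the main interpreter, where the call works, and does not reproduce the subinterpreter failure.

```python
import _codecs_jp  # CPython-internal extension module; may be absent elsewhere


def roundtrip_cp932(text):
    """Encode and decode text with the low-level cp932 codec object.

    Hypothetical helper for illustration only.
    """
    codec = _codecs_jp.getcodec("cp932")
    # encode() and decode() each return a (result, length consumed) pair.
    encoded, _ = codec.encode(text)
    decoded, _ = codec.decode(encoded)
    return encoded, decoded


if __name__ == "__main__":
    print(roundtrip_cp932("日本語"))
```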

--




[issue42846] Using _multibytecodec module on Windows, test_threading/embed get failure

2021-01-07 Thread STINNER Victor


STINNER Victor  added the comment:

I can reproduce the issue on Windows configured for the Japanese language
(ANSI code page cp932).

I also managed to reproduce the bug on Linux with the attached bug.py.

--
Added file: https://bugs.python.org/file49727/bug.py




[issue42846] Using _multibytecodec module on Windows, test_threading/embed get failure

2021-01-07 Thread Erlend Egeberg Aasland


Erlend Egeberg Aasland  added the comment:

I'm unable to reproduce this on Windows 10 (amd64). What's your exact locale 
setting? Are you compiling with HEAD at 
0b858cdd5d114f0890b11b6c4d6559d0ceb468ab?

--




[issue42846] Using _multibytecodec module on Windows, test_threading/embed get failure

2021-01-06 Thread neonene


New submission from neonene :

After 
https://github.com/python/cpython/commit/0b858cdd5d114f0890b11b6c4d6559d0ceb468ab
(bpo-1635741: Convert _multibytecodec to multi-phase init),

on Windows x64/x86 with a Chinese/Japanese/Korean system locale,
MultibyteCodec_Check() in multibytecodec.c returns false and
a PyExc_TypeError is raised. This affects some tests and PGO training.
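
The same type check can be provoked from pure Python, which may help when reading the failures below. This is only an illustration, assuming a CPython build that exposes the internal _multibytecodec module: it deliberately gives an incremental encoder class a codec attribute of the wrong type. In the reported bug, the codec object is a real one that fails the check inside a subinterpreter, which is a different situation. The function name error_for_bad_codec is hypothetical.

```python
import _multibytecodec  # CPython-internal extension module


def error_for_bad_codec():
    """Return the TypeError message raised for a codec attribute of the wrong type.

    Hypothetical helper for illustration only. Returns None if no TypeError
    is raised.
    """

    class BadEncoder(_multibytecodec.MultibyteIncrementalEncoder):
        # A plain object is not a MultibyteCodec, so the C-level check rejects it.
        codec = object()

    try:
        BadEncoder()
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(error_for_bad_codec())
```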



1) python -m test --verbose test_threading

==
FAIL: test_daemon_threads_fatal_error (test.test_threading.SubinterpThreadingTests)
--
Traceback (most recent call last):
  File "C:\cpython-0b858\lib\test\test_threading.py", line 1124, in test_daemon_threads_fatal_error
    self.assertIn("Fatal Python error: Py_EndInterpreter: "
AssertionError: 'Fatal Python error: Py_EndInterpreter: not the last thread' not found in 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 003FF980 is still current\nPython runtime state: initialized\n\nThread 0x0710 (most recent call first):\n\n'



2) python -m test --verbose test_embed

==
FAIL: test_audit_subinterpreter (test.test_embed.AuditingTests)
--
Traceback (most recent call last):
  File "C:\cpython-0b858\lib\test\test_embed.py", line 1433, in test_audit_subinterpreter
    self.run_embedded_interpreter("test_audit_subinterpreter")
  File "C:\cpython-0b858\lib\test\test_embed.py", line 104, in run_embedded_interpreter
    self.assertEqual(p.returncode, returncode,
AssertionError: 3221225477 != 0 : bad returncode 3221225477, stderr is 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 0050CAF0 is still current\nPython runtime state: initialized\n\nThread 0x09d8 (most recent call first):\n\n'

==
FAIL: test_subinterps_different_ids (test.test_embed.EmbeddingTests)
--
Traceback (most recent call last):
  File "C:\cpython-0b858\lib\test\test_embed.py", line 169, in test_subinterps_different_ids
    for run in self.run_repeated_init_and_subinterpreters():
  File "C:\cpython-0b858\lib\test\test_embed.py", line 110, in run_repeated_init_and_subinterpreters
    out, err = self.run_embedded_interpreter("test_repeated_init_and_subinterpreters")
  File "C:\cpython-0b858\lib\test\test_embed.py", line 104, in run_embedded_interpreter
    self.assertEqual(p.returncode, returncode,
AssertionError: 3221225477 != 0 : bad returncode 3221225477, stderr is 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 0041C960 is still current\nPython runtime state: initialized\n\nThread 0x0a40 (most recent call first):\n\n'

==
FAIL: test_subinterps_distinct_state (test.test_embed.EmbeddingTests)
--
Traceback (most recent call last):
  File "C:\cpython-0b858\lib\test\test_embed.py", line 177, in test_subinterps_distinct_state
    for run in self.run_repeated_init_and_subinterpreters():
  File "C:\cpython-0b858\lib\test\test_embed.py", line 110, in run_repeated_init_and_subinterpreters
    out, err = self.run_embedded_interpreter("test_repeated_init_and_subinterpreters")
  File "C:\cpython-0b858\lib\test\test_embed.py", line 104, in run_embedded_interpreter
    self.assertEqual(p.returncode, returncode,
AssertionError: 3221225477 != 0 : bad returncode 3221225477, stderr is 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 0047C960 is still current\nPython runtime state: initialized\n\nThread 0x0b34 (most recent call first):\n\n'

==
FAIL: test_subinterps_main (test.test_embed.EmbeddingTests)
--
Traceback (most recent call last):
  File "C:\cpython-0b858\lib\test\test_embed.py", line 163, in test_subinterps_main
    for run in self.run_repeated_init_and_subinterpreters():
  File "C:\cpython-0b858\lib\test\test_embed.py", line 110, in run_repeated_init_and_subinterpreters
    out, err = self.run_embedded_interpreter("test_repeated_init_and_subinterpreters")
  File "C:\cpython-0b858\lib\test\test_embed.py", line 104, in run_embedded_interpreter
    self.assertEqual(p.returncode, returncode,
AssertionError: 3221225477 != 0 : bad returncode 3221225477, stderr is 'TypeError: codec is unexpected type\nFatal Python error: _PyThreadState_Delete: tstate 0032C960 is still current\nPython runtime state: initialized\n\nThread 0x0bf0 (most