[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.
Inada Naoki added the comment: New changeset 4216dce04b7d3f329beaaafc82a77c4ac6cf4d57 by Inada Naoki in branch 'main': bpo-47000: Make `io.text_encoding()` respects UTF-8 mode (GH-32003) https://github.com/python/cpython/commit/4216dce04b7d3f329beaaafc82a77c4ac6cf4d57 -- ___ Python tracker <https://bugs.python.org/issue47000> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.
Inada Naoki added the comment:

> Please see https://bugs.python.org/issue47000#msg415769 for what Victor
> suggested.

Of course, I read it.

> In particular, the locale module uses the "no underscore" convention.
> Not sure whether it's good to start using snake case now, but I'm also
> not against it.

Victor didn't mention the "no underscore" convention. I just wanted to see what others prefer. I will remove the underscore.

> I would like to reiterate my concern with the "locale" encoding, though.
>
> As mentioned earlier, I believe it adds too much magic. It would be better
> to leave this in the hands of the applications and not try to guess
> the correct encoding.

I don't recommend the "locale" encoding to users. I strongly recommend considering "utf-8" instead. But the "locale" encoding is needed when users don't want to change the behavior of an existing application. It was already accepted in PEP 597.

> It's better to expose easy to use APIs to access the various different
> settings and point users to those rather than try to do a best effort
> guess... explicit is better than implicit.

In some cases, users need to decide "don't change the encoding for now". If we don't provide "locale", it is difficult to change the default encoding to UTF-8.

> After all, Mojibake potentially corrupts important data, without the
> alerting the user and that's not really what we should be after (e.g.
> UTF-8 is valid Latin-1 in most cases and this is a real problem we often
> run into in Germany with our Umlauts).

Changing the default encoding will temporarily increase this risk. But once the default encoding is UTF-8, the risk will be greatly reduced. Most popular text editors, including VSCode, Atom, Sublime Text, and Notepad.exe, use UTF-8 by default.

-- ___ Python tracker <https://bugs.python.org/issue47000> ___
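The Mojibake concern quoted above (UTF-8 bytes silently read as Latin-1) can be reproduced in a few lines. This is only an illustrative sketch: the helper name `roundtrip` and the sample string are assumptions made for the demo, not anything from the thread.

```python
# Illustrative only: `roundtrip` and the sample text are not from the thread.
def roundtrip(text: str, write_encoding: str, read_encoding: str) -> str:
    """Encode text with one encoding and decode the bytes with another."""
    return text.encode(write_encoding).decode(read_encoding)


sample = "Grüße"

# Same encoding on both sides: the text survives unchanged.
same = roundtrip(sample, "utf-8", "utf-8")

# UTF-8 bytes read as Latin-1: no exception is raised, but the text is
# silently corrupted, which is the risk described in the quoted comment.
garbled = roundtrip(sample, "utf-8", "latin-1")

print(repr(same))
print(repr(garbled))
```

Because every byte is a valid Latin-1 character, the second call cannot fail loudly, which is why an explicit `encoding=` argument is usually the safer choice.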
[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.
Inada Naoki added the comment: @vstinner Since UTF-8 mode affects `locale.getpreferredencoding(False)`, I need to decide on an alternative API in PEP 686. If there are no objections, I will choose `locale.get_encoding()` for the current locale encoding (ACP on Windows). See https://github.com/python/peps/pull/2470/files -- ___ Python tracker <https://bugs.python.org/issue47000> ___
[issue46864] Deprecate ob_shash in BytesObject
Inada Naoki added the comment: OK. Cache efficiency is dropped from the list of motivations. The current motivations are:
* Memory saving (currently 4 BytesObject per code object, i.e. 32 bytes of ob_shash).
* Make bytes objects immutable.
* Share objects among multiple interpreters.
* CoW efficiency.
I am closing this issue for now, because it is only about deprecating direct access to ob_shash. After Python 3.12 reaches beta, we will reconsider whether to remove ob_shash or keep it.
-- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue46864> ___
[issue46864] Deprecate ob_shash in BytesObject
Inada Naoki added the comment:

> I guess not much difference in benchmarks.
> But if put a bytes object into multiple dicts/sets, and len(bytes_key) is
> large, it will take a long time. (1 GiB 0.40 seconds on i5-11500 DDR4-3200)
> The length of bytes can be arbitrary,so computing time may be very different.

I don't think calculating hash() for large bytes is a common use case. Rare use cases may not justify adding 8 bytes to a basic type, especially when users expect it to be compact.

Balance is important. A microbenchmark for a specific case doesn't guarantee a good balance. So I want real-world examples. Do you know of any popular libraries that depend on hash(bytes) performance?

> Is it possible to let code objects use other types? In addition to ob_hash,
> maybe the extra byte \x00 at the end can be saved.

Of course, it is possible. But it needs a large refactoring around code objects, including the pyc cache file format. I will try it before 3.13.

-- ___ Python tracker <https://bugs.python.org/issue46864> ___
[issue46864] Deprecate ob_shash in BytesObject
Inada Naoki added the comment: First of all, this only deprecates direct access to `ob_shash`, so users need to use `PyObject_Hash()` instead. We are not making the final decision about removing it. We are only making it possible to remove it in Python 3.13.

RAM and cache efficiency are not the only motivations for this. There is a discussion about (1) increasing CoW efficiency, and (2) sharing data between subinterpreters after the per-interpreter GIL. Removing ob_shash will help both, especially (2).

But if we stop using bytes objects in code objects by Python 3.13, there is no need to remove ob_shash.

> If put a bytes object into multiple dicts/sets, the hash need to be computed
> multiple times. This seems a common usage.

Doesn't it lose only some milliseconds? I posted remove-bytes-hash.patch in this issue. Would you measure how this affects whole-application performance rather than microbenchmarks?

-- ___ Python tracker <https://bugs.python.org/issue46864> ___
[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.
Inada Naoki added the comment: I am not sure whether we really need the "locale encoding at Python startup". For this issue, I don't want to change the `encoding="locale"` behavior except to make it ignore UTF-8 mode. So what I want is the "current locale encoding", or the ANSI code page on Windows. On the other hand, I know Eryk wants to support locales on Windows. So `locale.get_encoding()` might return the current locale encoding (not the ANSI code page) even on Windows. If so, I will use `sys.getlocaleencoding()` to implement `encoding="locale"` so that it keeps using the ANSI code page, instead of adding yet another "get locale encoding" function. -- nosy: +eryksun ___ Python tracker <https://bugs.python.org/issue47000> ___
[issue46864] Deprecate ob_shash in BytesObject
Inada Naoki added the comment: New changeset 894d0ea5afa822c23286e9e68ed80bb1122b402d by Inada Naoki in branch 'main': bpo-46864: Suppress deprecation warnings for ob_shash. (GH-32042) https://github.com/python/cpython/commit/894d0ea5afa822c23286e9e68ed80bb1122b402d -- ___ Python tracker <https://bugs.python.org/issue46864> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46864] Deprecate ob_shash in BytesObject
Inada Naoki added the comment: Average RAM capacity doesn't grow as fast as the number of CPU cores. Additionally, L1+L2 cache is a really limited resource compared to CPU or RAM. Bytes objects are used for co_code, which is hot, so cache efficiency is important. Would you give us a more realistic (or real-world) example where caching the bytes hash is important? -- ___ Python tracker <https://bugs.python.org/issue46864> ___
[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.
Change by Inada Naoki : -- pull_requests: +30157 pull_request: https://github.com/python/cpython/pull/32068 ___ Python tracker <https://bugs.python.org/issue47000> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.
Inada Naoki added the comment:

> * sys.getfilesystemencoding(): Python filesystem encoding, return "UTF-8" if
> the Python UTF-8 Mode is enabled

Yes, although PYTHONLEGACYWINDOWSFSENCODING takes priority.

> * locale.getencoding(): Get the locale encoding, LC_CTYPE locale encoding or
> the Windows ANSI code page, *read at Python startup*. Ignore the Python UTF-8
> Mode.

I proposed `locale.get_encoding()` in PEP 686. I will remove the underscore if you don't like it.

> * locale.getencoding(current=True): Get the *current* locale encoding. The
> difference with locale.getencoding() is that on Unix, it gets the LC_CTYPE
> locale encoding at each call.

Hmm, I didn't add it to PEP 686 because it is related to neither UTF-8 mode nor EncodingWarning.

Since `locale.getencoding()` returns the locale encoding read at startup, how about this idea?

* sys.getlocaleencoding() -- Get the locale encoding read at Python startup.
* locale.getencoding() -- Get the current locale encoding.

Note that we already have `sys.getdefaultencoding()` and `sys.getfilesystemencoding()`. `sys.getlocaleencoding()` looks consistent with them.

-- ___ Python tracker <https://bugs.python.org/issue47000> ___
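To compare the getters discussed in this message on a given machine, a small report like the following may help. It is a sketch under assumptions: `encoding_report` is a hypothetical helper, `sys.getlocaleencoding()` is only a proposal in this thread and is therefore not called, and `locale.getencoding()` is looked up defensively because it only exists from Python 3.11 on.

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the encodings discussed in this thread into one dict.

    Hypothetical helper for illustration; not part of any proposal here.
    """
    report = {
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
        # This value is affected by UTF-8 mode, which motivates the issue.
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
        "UTF-8 mode enabled": bool(sys.flags.utf8_mode),
    }
    # locale.getencoding() was added in Python 3.11; older versions lack it.
    getencoding = getattr(locale, "getencoding", None)
    if getencoding is not None:
        report["locale.getencoding()"] = getencoding()
    return report


for name, value in encoding_report().items():
    print(f"{name}: {value}")
```

Running it once normally and once with `python -X utf8` shows which values change when UTF-8 mode is enabled.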
[issue46864] Deprecate ob_shash in BytesObject
Inada Naoki added the comment: Since the hash is randomized, using hash(bytes) for such a use case is not recommended. Users should use stable hash functions instead. I agree that there are a few use cases where this change causes a performance regression. But they are really few compared to the overhead of adding 8 bytes to every bytes instance. -- ___ Python tracker <https://bugs.python.org/issue46864> ___
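As a sketch of what a "stable hash function" could look like in practice, assuming the use case needs values that stay the same across interpreter runs: `stable_digest` is a hypothetical name, and SHA-256 is just one possible choice, not something the thread prescribes.

```python
import hashlib


def stable_digest(data: bytes) -> str:
    """Return a digest that is identical across interpreter runs.

    Hypothetical helper; any stable hash (CRC32, BLAKE2, ...) would do.
    """
    return hashlib.sha256(data).hexdigest()


# Always the same value, on every run and every machine.
print(stable_digest(b"abc"))

# The builtin hash of bytes is randomized per process unless
# PYTHONHASHSEED is fixed, so this value may differ between runs.
print(hash(b"abc"))
```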
[issue46864] Deprecate ob_shash in BytesObject
Inada Naoki added the comment: From Python 3.13 on, yes. It will be a bit slower. -- ___ Python tracker <https://bugs.python.org/issue46864> ___
[issue46864] Deprecate ob_shash in BytesObject
Inada Naoki added the comment: I'm sorry. Maybe, ccache hides the warning from me. -- ___ Python tracker <https://bugs.python.org/issue46864> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46864] Deprecate ob_shash in BytesObject
Change by Inada Naoki : -- pull_requests: +30132 stage: needs patch -> patch review pull_request: https://github.com/python/cpython/pull/32042 ___ Python tracker <https://bugs.python.org/issue46864> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46126] Unittest output drives developers to avoid docstrings
Inada Naoki added the comment:

> As you can see, the location of the failing test in the log is masked, and
> instead the description is present.

Could you elaborate?

```
test_index_empty (idlelib.idle_test.test_text.MockTextTest)
Failing test with bad description. ... ERROR
(snip)
==
ERROR: test_index_empty (idlelib.idle_test.test_text.MockTextTest)
Failing test with bad description.
--
```

I can see `test_index_empty (idlelib.idle_test.test_text.MockTextTest)` in both places. What is masked?

-- nosy: +methane ___ Python tracker <https://bugs.python.org/issue46126> ___
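The output quoted above can be reproduced with a throwaway test case. This is a minimal sketch: the class below merely reuses the names from the quoted log and is not the real idlelib test, and `run_demo` is a hypothetical helper.

```python
import io
import unittest


def run_demo() -> str:
    """Run one failing test with a docstring and return the verbose log."""

    class MockTextTest(unittest.TestCase):
        # Stand-in for the idlelib test quoted above; not the real test.
        def test_index_empty(self):
            """Failing test with bad description."""
            raise RuntimeError("deliberate error for the demo")

    stream = io.StringIO()
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MockTextTest)
    unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
    return stream.getvalue()


output = run_demo()
print(output)
```

With `verbosity=2`, both the test identifier and the first docstring line appear in the log, which matches the observation in the comment.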
[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.
Change by Inada Naoki : -- keywords: +patch pull_requests: +30091 stage: -> patch review pull_request: https://github.com/python/cpython/pull/32003 ___ Python tracker <https://bugs.python.org/issue47000> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue47009] Streamline list.append for the common case
Inada Naoki added the comment: Thank you. I agree that inlining is worthwhile. But we have already inlined too many functions in ceval, and there is an issue caused by that... (bpo-45116) -- ___ Python tracker <https://bugs.python.org/issue47009> ___
[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.
Inada Naoki added the comment: I created another topic related to this issue. https://discuss.python.org/t/add-legacy-text-encoding-option-to-make-utf-8-default/14281 If we add another option (e.g. legacy_text_encoding), we do not need to change the UTF-8 mode behavior. -- ___ Python tracker <https://bugs.python.org/issue47000> ___
[issue35228] Index search in CHM help crashes viewer
Inada Naoki added the comment: I know chm is handy. But Microsoft abandoned it already. I think we should stop providing chm. -- ___ Python tracker <https://bugs.python.org/issue35228> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue47009] Streamline list.append for the common case
Inada Naoki added the comment: Hmm. Would you measure the benefit of inlining and the benefit of skipping incref/decref separately? If the benefit of inlining is very small, making _PyList_AppendTakeRef() a regular internal API looks better to me. -- nosy: +methane ___ Python tracker <https://bugs.python.org/issue47009> ___
[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.
Inada Naoki added the comment: I created a related topic on discuss.python.org. https://discuss.python.org/t/jep-400-utf-8-by-default-and-future-of-python/14246 If we recommend `PYTHONUTF8` as the opt-in for "UTF-8 by default", `encoding="locale"` should use the locale encoding even in UTF-8 mode. If we don't change the `PYTHONUTF8` behavior, we need yet another option to opt in to "UTF-8 by default". -- ___ Python tracker <https://bugs.python.org/issue47000> ___
[issue39829] __len__ called twice in the list() constructor
Inada Naoki added the comment: Thanks. -- ___ Python tracker <https://bugs.python.org/issue39829> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39829] __len__ called twice in the list() constructor
Change by Inada Naoki : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue39829> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39829] __len__ called twice in the list() constructor
Inada Naoki added the comment: New changeset 2153daf0a02a598ed5df93f2f224c1ab2a2cca0d by Crowthebird in branch 'main': bpo-39829: Fix `__len__()` is called twice in list() constructor (GH-31816) https://github.com/python/cpython/commit/2153daf0a02a598ed5df93f2f224c1ab2a2cca0d -- ___ Python tracker <https://bugs.python.org/issue39829> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.
New submission from Inada Naoki : Currently, `encoding="locale"` is just a shortcut for `encoding=locale.getpreferredencoding(False)`. `encoding="locale"` means "the locale encoding should be used here, even if the Python default encoding is changed to UTF-8". I am not sure whether UTF-8 mode will become the default. But some users want to use UTF-8 mode to change the default encoding in their Python environments without waiting for the Python default encoding to change. So I think `encoding="locale"` should use the real locale encoding (ACP on Windows) regardless of whether UTF-8 mode is enabled. Currently, UTF-8 mode affects `_Py_GetLocaleEncoding()`, so it is difficult to make `encoding="locale"` ignore UTF-8 mode. Is it safe to use `locale.getlocale(locale.LC_CTYPE)[1] or "UTF-8"`? -- components: Unicode messages: 415028 nosy: ezio.melotti, methane, vstinner priority: normal severity: normal status: open title: Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled. versions: Python 3.11 ___ Python tracker <https://bugs.python.org/issue47000> ___
[issue39829] __len__ called twice in the list() constructor
Inada Naoki added the comment: > Changes compared here: > https://github.com/python/cpython/compare/main...thatbirdguythatuknownot:patch-17 Looks good to me. Would you create a pull request? -- ___ Python tracker <https://bugs.python.org/issue39829> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue43574] Regression in overallocation for literal list initialization in v3.9+
Inada Naoki added the comment: Related issue: https://twitter.com/nedbat/status/1489233208713437190 The current overallocation strategy is rough. We need to make it smoother. -- versions: +Python 3.11 -Python 3.9 ___ Python tracker <https://bugs.python.org/issue43574> ___
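One way to look at overallocation from Python is to watch `sys.getsizeof()` while appending. The sketch below is an observation aid only: `growth_steps` is a hypothetical helper, and the exact sizes and step positions depend on the CPython version and platform, so none are stated here.

```python
import sys


def growth_steps(n: int) -> list:
    """Record (length, size in bytes) each time the list's allocation grows."""
    items = []
    steps = [(0, sys.getsizeof(items))]
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != steps[-1][1]:
            # The allocation changed on this append.
            steps.append((len(items), size))
    return steps


for length, size in growth_steps(100):
    print(f"len={length:3d} size={size} bytes")

# A list built from a literal may be sized differently from one built by
# repeated appends, which is the topic of this issue.
print(sys.getsizeof([0, 1, 2]), sys.getsizeof(list(range(3))))
```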
[issue43574] Regression in overallocation for literal list initialization in v3.9+
Change by Inada Naoki : -- nosy: +methane ___ Python tracker <https://bugs.python.org/issue43574> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39829] __len__ called twice in the list() constructor
Change by Inada Naoki : -- nosy: +methane ___ Python tracker <https://bugs.python.org/issue39829> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46925] Document dict behavior when setting equal but not identical key
Inada Naoki added the comment: I don't know much about Java, but Java's WeakHashMap is the same as Python's WeakKeyDictionary.

https://docs.oracle.com/javase/9/docs/api/java/util/WeakHashMap.html

"""
This class is intended primarily for use with key objects whose equals methods test for object identity using the == operator. Once such a key is discarded it can never be recreated, so it is impossible to do a lookup of that key in a WeakHashMap at some later time and be surprised that its entry has been removed. This class will work perfectly well with key objects whose equals methods are not based upon object identity, such as String instances. With such recreatable key objects, however, the automatic removal of WeakHashMap entries whose keys have been discarded may prove to be confusing.
"""

-- nosy: +methane ___ Python tracker <https://bugs.python.org/issue46925> ___
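The behavior described in the quoted Java documentation can be observed with `weakref.WeakKeyDictionary`. This is a minimal sketch; the `Key` class is a hypothetical identity-based key invented for the demo.

```python
import gc
import weakref


class Key:
    """Identity-based key: inherits the default __eq__ and __hash__."""


d = weakref.WeakKeyDictionary()
k = Key()
d[k] = "value"
before = len(d)

del k          # drop the only strong reference to the key
gc.collect()   # not needed on CPython, but harmless elsewhere
after = len(d)

print(before, after)
```

Since an identity-based key can never be recreated once discarded, nobody can be surprised by the missing entry, which is the point the quoted text makes.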
[issue23882] unittest discovery doesn't detect namespace packages when given no parameters
Change by Inada Naoki : -- resolution: -> not a bug stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue23882> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46864] Deprecate ob_shash in BytesObject
Inada Naoki added the comment: New changeset 2d8b764210c8de10893665aaeec8277b687975cd by Inada Naoki in branch 'main': bpo-46864: Deprecate PyBytesObject.ob_shash. (GH-31598) https://github.com/python/cpython/commit/2d8b764210c8de10893665aaeec8277b687975cd -- ___ Python tracker <https://bugs.python.org/issue46864> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46864] Deprecate ob_shash in BytesObject
Change by Inada Naoki : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue46864> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46906] Make _PyFloat_(Pack|Unpack)(4|8) cpython API, not internal.
Change by Inada Naoki : -- resolution: -> rejected stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue46906> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46906] Make _PyFloat_(Pack|Unpack)(4|8) cpython API, not internal.
Inada Naoki added the comment: OK. With a quick grep, I found that only msgpack and bitstruct use these APIs. That is not enough users to justify making them public. -- ___ Python tracker <https://bugs.python.org/issue46906> ___
[issue40116] Regression in memory use of shared key dictionaries for "compact dicts"
Inada Naoki added the comment: New changeset 4f74052b455a54ac736f38973693aeea2ec14116 by Inada Naoki in branch 'main': bpo-40116: dict: Add regression test for iteration order. (GH-31550) https://github.com/python/cpython/commit/4f74052b455a54ac736f38973693aeea2ec14116 -- ___ Python tracker <https://bugs.python.org/issue40116> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46906] Make _PyFloat_(Pack|Unpack)(4|8) cpython API, not internal.
Change by Inada Naoki : -- keywords: +patch pull_requests: +29769 stage: -> patch review pull_request: https://github.com/python/cpython/pull/31649 ___ Python tracker <https://bugs.python.org/issue46906> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46903] Crash when setting attribute with string subclass as the name (--with-pydebug)
Change by Inada Naoki : -- nosy: +methane ___ Python tracker <https://bugs.python.org/issue46903> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46906] Make _PyFloat_(Pack|Unpack)(4|8) cpython API, not internal.
New submission from Inada Naoki : Original issue: https://github.com/msgpack/msgpack-python/issues/497 _PyFloat_(Pack|Unpack)(4|8) is a very nice API for serializers like msgpack. Converting double and float into char[] is not trivial, and these APIs do it in a very efficient way. These APIs also don't reveal any CPython internal structure. They just convert double and float into char[]. So please keep these APIs public for libraries like msgpack. -- components: C API messages: 414401 nosy: methane, vstinner priority: normal severity: normal status: open title: Make _PyFloat_(Pack|Unpack)(4|8) cpython API, not internal. versions: Python 3.11 ___ Python tracker <https://bugs.python.org/issue46906> ___
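For readers who don't work at the C level, the conversion these APIs perform (a float to its 4-byte or 8-byte IEEE 754 representation and back) has a Python-level analogue in the `struct` module. The sketch below only illustrates the operation and is not the C API; the function names are hypothetical.

```python
import struct


def pack_double_be(value: float) -> bytes:
    """Pack a float as an 8-byte big-endian IEEE 754 double."""
    return struct.pack(">d", value)


def unpack_double_be(data: bytes) -> float:
    """Inverse of pack_double_be."""
    return struct.unpack(">d", data)[0]


def pack_float_be(value: float) -> bytes:
    """Pack a float as a 4-byte big-endian IEEE 754 single (may lose precision)."""
    return struct.pack(">f", value)


print(pack_double_be(1.0).hex())
print(pack_float_be(1.0).hex())
print(unpack_double_be(pack_double_be(0.1)))
```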
[issue46845] dict: Use smaller entry for Unicode-key only dict.
Change by Inada Naoki : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue46845> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46845] dict: Use smaller entry for Unicode-key only dict.
Inada Naoki added the comment: New changeset 9833bb91e4d5c2606421d9ec2085f5c2dfb6f72c by Inada Naoki in branch 'main': bpo-46845: Reduce dict size when all keys are Unicode (GH-31564) https://github.com/python/cpython/commit/9833bb91e4d5c2606421d9ec2085f5c2dfb6f72c -- ___ Python tracker <https://bugs.python.org/issue46845> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue45373] ./configure --enable-optimizations should enable LTO
Inada Naoki added the comment: Can we use --lto=thin when available? And can we skip --lto when building the profiling python? -- nosy: +methane ___ Python tracker <https://bugs.python.org/issue45373> ___
[issue46864] Deprecate ob_shash in BytesObject
Inada Naoki added the comment: With ob_shash removed:

```
## small key
$ ./python -m pyperf timeit --compare-to ../cpython/python -s 'd={b"foo":1, b"bar":2, b"buzz":3}' -- 'b"key" in d'
/home/inada-n/work/python/cpython/python: . 23.2 ns +- 1.7 ns
/home/inada-n/work/python/remove-bytes-hash/python: . 40.0 ns +- 1.5 ns
Mean +- std dev: [/home/inada-n/work/python/cpython/python] 23.2 ns +- 1.7 ns -> [/home/inada-n/work/python/remove-bytes-hash/python] 40.0 ns +- 1.5 ns: 1.73x slower

## large key
$ ./python -m pyperf timeit --compare-to ../cpython/python -s 'd={b"foo":1, b"bar":2, b"buzz":3};k=b"key"*100' -- 'k in d'
/home/inada-n/work/python/cpython/python: . 22.3 ns +- 1.2 ns
/home/inada-n/work/python/remove-bytes-hash/python: . 108 ns +- 2 ns
Mean +- std dev: [/home/inada-n/work/python/cpython/python] 22.3 ns +- 1.2 ns -> [/home/inada-n/work/python/remove-bytes-hash/python] 108 ns +- 2 ns: 4.84x slower
```

I will reconsider the removal before actually removing the cache. We have changed the code object very often. If Python 3.13 doesn't use so many bytes objects, we don't need to remove the hash to save some RAM.

-- Added file: https://bugs.python.org/file50649/remove-bytes-hash.patch ___ Python tracker <https://bugs.python.org/issue46864> ___
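If pyperf is not at hand, a rough version of the same measurement can be sketched with the standard `timeit` module. This is an approximation under assumptions: `time_lookup` is a hypothetical helper, the lambda adds call overhead that pyperf's statement timing does not have, and the numbers are not comparable to the ones quoted above.

```python
import timeit


def time_lookup(key: bytes, number: int = 200_000) -> float:
    """Return the mean time of `key in d` in nanoseconds (rough estimate)."""
    d = {b"foo": 1, b"bar": 2, b"buzz": 3}
    total = timeit.timeit(lambda: key in d, number=number)
    # The result includes the overhead of calling the lambda.
    return total / number * 1e9


small = time_lookup(b"key")
large = time_lookup(b"key" * 100)
print(f"small key: {small:.1f} ns per lookup")
print(f"large key: {large:.1f} ns per lookup")
```

On an interpreter that caches the bytes hash, both cases should cost about the same after the first lookup; without the cache, the large key is rehashed on every lookup.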
[issue46845] dict: Use smaller entry for Unicode-key only dict.
Inada Naoki added the comment: I added _PyDict_FromItems() to the PR. It checks whether all keys are Unicode before creating the dict. _PyDict_NewPresized() now just returns a general-purpose dict, but it is no longer used by CPython core. It is kept only for compatibility (for Cython).

```
$ ./python -m pyperf timeit --compare-to ../cpython/python -- '{"k1":1, "k2":2, "k3":3, "k4":4, "k5":5, "k6":6}'
/home/inada-n/work/python/cpython/python: . 198 ns +- 5 ns
/home/inada-n/work/python/dict-compact/python: . 213 ns +- 6 ns
Mean +- std dev: [/home/inada-n/work/python/cpython/python] 198 ns +- 5 ns -> [/home/inada-n/work/python/dict-compact/python] 213 ns +- 6 ns: 1.07x slower
```

The overhead of checking the key types is not so large. Additionally, we can remove some code from ceval.c.

-- ___ Python tracker <https://bugs.python.org/issue46845> ___
[issue46864] Deprecate ob_shash in BytesObject
Inada Naoki added the comment:

> But some programs can still work with encoded bytes instead of strings. In
> particular os.environ and os.environb are implemented as dict of bytes on
> non-Windows.

This change doesn't affect os.environ. os.environ[key] does `key.encode(sys.getfilesystemencoding(), "surrogateescape")` internally, so the encoded key never has a cached hash. Meanwhile, the dict (`self._data`) has its own hash cache, so it doesn't use the hash cached in the bytes objects.

However, this change will affect `os.environb[key]` if the same key object is used repeatedly.

-- ___ Python tracker <https://bugs.python.org/issue46864> ___
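The encoding step mentioned in this message can be tried directly. The sketch below assumes a POSIX-like setup and uses a hypothetical helper name, `encode_env_key`; it shows that each lookup produces a fresh bytes object, so a hash cached on a previous key object cannot be reused.

```python
import sys


def encode_env_key(key: str) -> bytes:
    """Mimic the key encoding described above for os.environ on POSIX."""
    return key.encode(sys.getfilesystemencoding(), "surrogateescape")


first = encode_env_key("HOME")
second = encode_env_key("HOME")

# Equal contents, but the two results come from separate encode() calls.
print(first == second)
print(first)
```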
[issue46845] dict: Use smaller entry for Unicode-key only dict.
Inada Naoki added the comment: In most cases, the first PyDict_SetItem decides which format should be used.

But _PyDict_NewPresized() can be a problem. It creates a hash table before the first key is inserted, when 5 < (expected size) < 87382.

In the CPython code base, _PyDict_NewPresized() is called from three places:

1. call.c: Building the kwargs dict -- all keys should be Unicode.
2. ceval.c: BUILD_MAP and BUILD_CONST_KEY_MAP -- there is no guarantee that all keys are Unicode.

The current pull request assumes that the dict keys are Unicode-only. So building a dict from non-Unicode keys becomes slower.

```
$ ./python -m pyperf timeit --compare-to ../cpython/python -- '{(1,2):3, (4,5):6, (7,8):9, (10,11):12, (13,14):15, (16,17):18}'
/home/inada-n/work/python/cpython/python: . 233 ns +- 1 ns
/home/inada-n/work/python/dict-compact/python: . 328 ns +- 6 ns
Mean +- std dev: [/home/inada-n/work/python/cpython/python] 233 ns +- 1 ns -> [/home/inada-n/work/python/dict-compact/python] 328 ns +- 6 ns: 1.41x slower
```

There are some approaches to fix this problem:

1. Don't use _PyDict_NewPresized() in BUILD_MAP, BUILD_CONST_KEY_MAP

```
$ ./python -m pyperf timeit --compare-to ../cpython/python -- '{(1,2):3, (4,5):6, (7,8):9, (10,11):12, (13,14):15, (16,17):18}'
/home/inada-n/work/python/cpython/python: . 233 ns +- 1 ns
/home/inada-n/work/python/dict-compact/python: . 276 ns +- 1 ns
Mean +- std dev: [/home/inada-n/work/python/cpython/python] 233 ns +- 1 ns -> [/home/inada-n/work/python/dict-compact/python] 276 ns +- 1 ns: 1.18x slower
```

I think this performance regression is at an acceptable level.

2. Add an argument `unicode` to _PyDict_NewPresized(). -- Breaks some third-party code using internal APIs.

3. Add a new internal C API such as _PyDict_NewPresizedUnicodeKey(). -- Most conservative.

4. Add a new internal C API that creates a dict from keys and values for extreme performance, like this:

// Create a new dict from keys and values.
// Items are received as `{keys[i*keys_offset]: values[i*values_offset] for i in range(length)}`.
// When distinct=1, this function skips checking for duplicated keys.
// So pass distinct=1 only if you can guarantee that there are no duplicated keys.
PyObject *
PyDict_FromKeysAndValues(PyObject **keys, Py_ssize_t keys_offset, PyObject **values, Py_ssize_t values_offset, Py_ssize_t length, int distinct)
{
}

-- ___ Python tracker <https://bugs.python.org/issue46845> ___
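The indexing rule in the comment of option 4 can be modeled in Python to see why separate offsets are useful. This is only a sketch of the proposed semantics: the function below is a hypothetical pure-Python model, the `distinct` flag is omitted because a Python dict handles duplicates by itself, and the two layouts are assumptions about how a caller might arrange its arrays.

```python
def dict_from_keys_and_values(keys, keys_offset, values, values_offset, length):
    """Pure-Python model of the indexing rule from the proposed C API."""
    return {keys[i * keys_offset]: values[i * values_offset] for i in range(length)}


# Assumed layout 1: keys and values interleaved in one array
# (key, value, key, value, ...), so both offsets are 2.
interleaved = ["a", 1, "b", 2, "c", 3]
d1 = dict_from_keys_and_values(interleaved, 2, interleaved[1:], 2, 3)

# Assumed layout 2: keys in one sequence and values in another,
# so both offsets are 1.
d2 = dict_from_keys_and_values(("a", "b", "c"), 1, [1, 2, 3], 1, 3)

print(d1)
print(d2)
```

A single entry point with offsets can serve both layouts without copying the items into a temporary array first.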
[issue46864] Deprecate ob_shash in BytesObject
Change by Inada Naoki : -- keywords: +patch pull_requests: +29721 stage: -> patch review pull_request: https://github.com/python/cpython/pull/31598 ___ Python tracker <https://bugs.python.org/issue46864> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46864] Deprecate ob_shash in BytesObject
New submission from Inada Naoki : Code objects now have more and more bytes attributes. To reduce the RAM used by code objects, I want to remove ob_shash (the cached hash value) from the bytes object. Sets and dicts have their own hash cache. Unless the same bytes object is checked against dicts/sets many times, this doesn't cause a big performance loss. -- components: Interpreter Core messages: 414083 nosy: methane priority: normal severity: normal status: open title: Deprecate ob_shash in BytesObject versions: Python 3.11
[issue46845] dict: Use smaller entry for Unicode-key only dict.
Inada Naoki added the comment:
> Do you propose to
> 1. Only use StringKeyDicts when non-string keys are not possible? (Where would this be?)
> 2. Switch to a normal dict when a non-string key is added? (But likely not switch back when the last non-string key is removed.)
> 3. Deprecate and remove the option to add non-string keys to namespace dicts? (Proposed and rejected at least once as not gaining much.)

2. We already do such a hack for key-sharing dicts. And yes, deleting a non-string key doesn't switch back. A `d[0]=0; del d[0]` loop must be amortized O(1). Only dict.clear() switches back.
[issue46845] dict: Use smaller entry for Unicode-key only dict.
Change by Inada Naoki : -- keywords: +patch pull_requests: +29686 stage: -> patch review pull_request: https://github.com/python/cpython/pull/31564
[issue46606] Large C stack usage of os.getgroups() and os.setgroups()
Inada Naoki added the comment: New changeset ad6c7003e38a9f8bdf8d865fb5fa0f3c03690315 by Inada Naoki in branch 'main': bpo-46606: Remove redundant +1. (GH-31561) https://github.com/python/cpython/commit/ad6c7003e38a9f8bdf8d865fb5fa0f3c03690315
[issue43364] Windows: Make UTF-8 mode more accessible
Change by Inada Naoki : -- resolution: -> rejected stage: patch review -> resolved status: open -> closed
[issue46606] Large C stack usage of os.getgroups() and os.setgroups()
Change by Inada Naoki : -- pull_requests: +29684 pull_request: https://github.com/python/cpython/pull/31561
[issue40116] Regression in memory use of shared key dictionaries for "compact dicts"
Inada Naoki added the comment: PyDict_Keys(), PyDict_Values(), and PyDict_Items() don't respect insertion order either.
[issue46845] dict: Use smaller entry for Unicode-key only dict.
New submission from Inada Naoki : Currently, PyDictKeyEntry is 24 bytes (hash, key, and value). We can drop the hash from the entry when all keys are Unicode, because Unicode objects already cache their hash. This will cause some performance regression on microbenchmarks because the dict needs one more indirect access to compare hash values. On the other hand, this will reduce RAM usage. Additionally, unlike docstrings and annotations, this includes much **hot** RAM. It will make Python more cache efficient. This is work in progress code: https://github.com/methane/cpython/pull/43 The pyperformance result is in the PR too. -- components: Interpreter Core messages: 413892 nosy: Mark.Shannon, methane, rhettinger priority: normal severity: normal status: open title: dict: Use smaller entry for Unicode-key only dict. type: performance versions: Python 3.11
[issue40116] Regression in memory use of shared key dictionaries for "compact dicts"
Change by Inada Naoki : -- pull_requests: +29671 pull_request: https://github.com/python/cpython/pull/31550
[issue40116] Regression in memory use of shared key dictionaries for "compact dicts"
Inada Naoki added the comment: I found a regression caused by GH-28520.

```
class C:
    def __init__(self, n):
        if n:
            self.a = 1
            self.b = 2
            self.c = 3
        else:
            self.c = 1
            self.b = 2
            self.a = 3

o1 = C(True)
o2 = C(False)
print(o2.__dict__)  # {'c': 1, 'b': 2, 'a': 3}

d1 = {}
d1.update(o2.__dict__)  # {'a': 3, 'b': 2, 'c': 1}
print(d1)
```
[issue40255] Fixing Copy on Writes from reference counting and immortal objects
Inada Naoki added the comment: All of these optimizations should be disabled by default.
* They will cause leaks when Python is embedded.
* Even for the python command, they will break __del__ and weakref callbacks.
[issue46606] Large C stack usage of os.getgroups() and os.setgroups()
Change by Inada Naoki : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed
[issue46606] Large C stack usage of os.getgroups() and os.setgroups()
Inada Naoki added the comment: New changeset 74127b89a8224d021fc76f679422b76510844ff9 by Inada Naoki in branch 'main': bpo-46606: Reduce stack usage of getgroups and setgroups (GH-31073) https://github.com/python/cpython/commit/74127b89a8224d021fc76f679422b76510844ff9
[issue46813] Allow developer to resize the dictionary
Inada Naoki added the comment: As I commented in https://github.com/faster-cpython/ideas/discussions/288, your benchmark is not fair. Include `{}` and `{}.resize(len(cases))` in the measured function. -- nosy: +methane
[issue29992] Expose parse_string in JSONDecoder
Inada Naoki added the comment:
> Generally speaking, parsing some things as decimal or datetime are schema dependent.

Totally agree with this.

> In order to provide maximal flexibility it would be much nicer to have a streaming interface available (like SAX for XML parsing), but that is not what this is.

I think that is too difficult and complicated. I think a post-processing approach (e.g. dataclass_json, pydantic) is enough. -- nosy: +methane
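The post-processing approach mentioned above can be sketched with only the standard library: parse the JSON into plain Python values first, then convert schema-dependent fields in a second step. The `Payment` class, its field names, and `payment_from_json` are made up for this illustration; dataclass_json and pydantic automate the same idea.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass
class Payment:
    # Hypothetical schema: which fields become Decimal or datetime is
    # decided here, not inside the JSON decoder.
    amount: Decimal
    paid_at: datetime


def payment_from_json(text):
    raw = json.loads(text)  # plain dict of str values
    return Payment(
        amount=Decimal(raw["amount"]),
        paid_at=datetime.fromisoformat(raw["paid_at"]),
    )


payment = payment_from_json('{"amount": "12.50", "paid_at": "2022-02-01T09:30:00"}')
print(payment)
```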
[issue40255] Fixing Copy on Writes from reference counting and immortal objects
Inada Naoki added the comment: I think making more objects immortal by default will reduce the gap, although I am not sure it can be 2%. (I guess 3% and I think it is an acceptable gap.)
* Code attributes (contents of co_consts, co_names, etc...) in deep frozen modules.
  * only if subinterpreter shares them.
* Statically allocated strings (previously _Py_IDENTIFIER)

To reduce the gap more, we need to reduce Python stack operations in ceval in some way.
[issue46688] Add sys.is_interned
Inada Naoki added the comment: Thank you. I could not find it because it is too old. -- resolution: -> duplicate stage: patch review -> resolved status: open -> closed superseder: -> Add sys.isinterned()
[issue46688] Add sys.is_interned
Inada Naoki added the comment: I thought sys.is_interned() was needed to implement bpo-46430, but GH-30683 looks nice to me. I will close this issue after GH-30683 is merged.
[issue46688] Add sys.is_interned
Change by Inada Naoki : -- keywords: +patch pull_requests: +29397 stage: -> patch review pull_request: https://github.com/python/cpython/pull/31227
[issue46688] Add sys.is_interned
New submission from Inada Naoki : deepfreeze.py needs to know whether a Unicode object is interned. Ref: https://bugs.python.org/issue46430 -- components: Interpreter Core messages: 412890 nosy: methane priority: normal severity: normal status: open title: Add sys.is_interned versions: Python 3.11
[issue46600] Python built with clang -O0 allocates 10x more stack memory than clang -O3 on a Python function call
Inada Naoki added the comment: I didn't mean that _Py_abspath is a problem. I just used it to describe why -O0 and -Og are so different. We can reduce its stack usage easily, but it is less of a problem than _PyEval_EvalFrameDefault. It is difficult to reduce the stack usage of _PyEval_EvalFrameDefault with -O0.
[issue46606] Large C stack usage of os.getgroups() and os.setgroups()
Change by Inada Naoki : -- keywords: +patch pull_requests: +29257 stage: -> patch review pull_request: https://github.com/python/cpython/pull/31073
[issue46606] Large C stack usage of os.getgroups() and os.setgroups()
New submission from Inada Naoki : I checked stack usage for bpo-46600 and found that these two functions use a lot of stack.

os_setgroups: 262200 bytes
os_getgroups_impl: 262184 bytes

Both functions have a local variable like this:

    gid_t grouplist[MAX_GROUPS];

MAX_GROUPS is defined as:

```
#ifdef NGROUPS_MAX
#define MAX_GROUPS NGROUPS_MAX
#else
    /* defined to be 16 on Solaris7, so this should be a small number */
#define MAX_GROUPS 64
#endif
```

NGROUPS_MAX is 65536 and sizeof(gid_t) is 4 on Ubuntu 20.04, so grouplist is 262144 bytes. It seems this grouplist exists just to avoid an allocation:

```
    } else if (n <= MAX_GROUPS) {
        /* groups will fit in existing array */
        alt_grouplist = grouplist;
    } else {
        alt_grouplist = PyMem_New(gid_t, n);
        if (alt_grouplist == NULL) {
            return PyErr_NoMemory();
        }
```

How about just using `#define MAX_GROUPS 64`? Or should we remove this grouplist because os.getgroups() is not called so frequently? -- components: Library (Lib) messages: 412335 nosy: methane priority: normal severity: normal status: open title: Large C stack usage of os.getgroups() and os.setgroups() versions: Python 3.11
[issue46600] Python built with clang -O0 allocates 10x more stack memory than clang -O3 on a Python function call
Inada Naoki added the comment: FWIW, it seems -O0 doesn't merge local variables that live in different paths or lifetimes. For example, see _Py_abspath:

```
    if (path[0] == '\0' || !wcscmp(path, L".")) {
        wchar_t cwd[MAXPATHLEN + 1];
        //(snip)
    }
    //(snip)
    wchar_t cwd[MAXPATHLEN + 1];
```

wchar_t is 4 bytes and MAXPATHLEN is 4096 on Linux. So each cwd is 16388 bytes. -O0 allocates 32856 bytes for it and -Og allocates 16440 bytes for it.

I don't know which specific optimization flag in -Og merges local variables, but I think -Og is very important for _PyEval_EvalFrameDefault() since it has many local variables in huge switch-case statements. -Og allocates 312 bytes for it and -O0 allocates 8280 bytes for it.

By the way, clang 13 has the `-fstack-usage` option like gcc, but clang 12 doesn't have it. Since Ubuntu 20.04 has only clang 12, I use `-fstack-size-section` and https://github.com/mvanotti/stack-sizes to get the stack size. -- nosy: +methane
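The arithmetic behind the _Py_abspath numbers can be checked quickly. This is just a worked calculation using the sizes quoted in the comment; the "overhead" names are labels chosen here for whatever else lives in the frame, not something reported by the compiler.

```python
WCHAR_T_SIZE = 4    # bytes, as stated for Linux
MAXPATHLEN = 4096   # as stated for Linux

# Size of one `wchar_t cwd[MAXPATHLEN + 1]` buffer.
cwd_size = WCHAR_T_SIZE * (MAXPATHLEN + 1)

frame_o0 = 32856    # reported frame size with -O0
frame_og = 16440    # reported frame size with -Og

# If -O0 keeps both buffers and -Og shares one slot, the remainder is
# the rest of the frame (other locals, spills, alignment).
overhead_o0 = frame_o0 - 2 * cwd_size
overhead_og = frame_og - cwd_size

print(cwd_size, overhead_o0, overhead_og)
```

The small remainders are consistent with the explanation that -O0 reserves space for both `cwd` arrays while -Og reuses one slot.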
[issue36346] Prepare for removing the legacy Unicode C API
Change by Inada Naoki : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed
[issue36346] Prepare for removing the legacy Unicode C API
Inada Naoki added the comment: No. I am just waiting for Python 3.11 to reach beta.
[issue33205] GROWTH_RATE prevents dict shrinking
Inada Naoki added the comment: We have not had *fill* since Python 3.6. There is `dk_nentries` instead. But when `insertion_resize()` is called, `dk_nentries` is equal to `USABLE_FRACTION(dk_size)` (dk_size is `1 << dk_log2_size` for now). So it is different from *fill* in the old dict.

I chose `dk_used*3` as GROWTH_RATE because it reserves more space when there are dummies than when there is no dummy, as I described in the same comment:

> In case of dict growing without deletion, dk_size is doubled for each resize as current behavior.
> When there are deletion, dk_size is growing aggressively than Python 3.3 (used*2 -> used*3). And it allows dict shrinking after massive deletions.

For example, when current dk_size == 16 and USABLE_FRACTION(dk_size) == 10, the new dk_size is:

* used = 10 (dummy=0) -> 32 (31.25%)
* used = 9 (dummy=1) -> 32 (28.125%)
(snip)
* used = 6 (dummy=4) -> 32 (18.75%)
* used = 5 (dummy=5) -> 16 (31.25%)
* used = 4 (dummy=6) -> 16 (25%)
(snip)
* used = 2 (dummy=8) -> 8 (25%)

As you can see, the dict is more sparse when there are dummies than when there is no dummy, except for the used=5/dummy=5 case. There may be a small room for improvement, especially for the `used=5/dummy=5` case. But I am not sure it is worth using a more complex GROWTH_RATE than used*3. Any good idea?
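The table above can be reproduced with a few lines of Python. This is a sketch under two assumptions that are not spelled out in the message: the new size is the smallest power of two that is at least `used*3`, and it never drops below a minimum of 8. Both assumptions match every row of the table; the helper names are made up for the illustration.

```python
PYDICT_MINSIZE = 8  # assumed minimum table size


def usable_fraction(size):
    # Two thirds of the table may be filled before a resize.
    return (size * 2) // 3


def new_dk_size(used):
    minsize = used * 3          # GROWTH_RATE from the message
    size = PYDICT_MINSIZE
    while size < minsize:       # round up to a power of two
        size *= 2
    return size


for used in (10, 9, 6, 5, 4, 2):
    size = new_dk_size(used)
    print(f"used = {used} (dummy={10 - used}) -> {size} ({used / size:.3%})")
```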
[issue44723] Codec name normalization breaks custom codecs
Change by Inada Naoki : -- nosy: +methane
[issue46464] concurrent.futures.ProcessPoolExecutor can deadlock when tcmalloc is used
Inada Naoki added the comment:
> The only way to safely launch worker processes on demand is to spawn a worker launcher process spawned prior to any thread creation that remains idle, with a sole job of spawn new worker processes for us. That sounds complicated. That'd be a feature. Lets go with the bugfix first.

fork is not the only way to launch a worker process. We have spawn. And spawn has been the default on macOS since Python 3.8. Simply reverting seems not good for macOS users, since they would need to pay the cost of both pre-spawning and spawn. Can't we just pre-spawn only when fork is used? -- nosy: +methane
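The "pre-spawn only when fork is used" idea amounts to checking the start method of the multiprocessing context. A minimal sketch, assuming a hypothetical helper named `needs_prespawn` (this is not how concurrent.futures is actually structured, only an illustration of the condition):

```python
import multiprocessing


def needs_prespawn(mp_context=None):
    """Hypothetical check: eager worker start-up is only needed for fork."""
    if mp_context is None:
        # Platform default context ("spawn" on macOS since Python 3.8).
        mp_context = multiprocessing.get_context()
    return mp_context.get_start_method() == "fork"


print(needs_prespawn(multiprocessing.get_context("spawn")))
```

With a spawn context the check is false, so workers could still be started on demand there.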
[issue46399] Addition of `mapping` attribute to dict views classes has inadvertently broken type-checkers
Inada Naoki added the comment:
> If we literally ignore the attribute, any usage of `.mapping` will be an error, which basically makes the whole `.mapping` feature useless for statically typed code. It also wouldn't appear in IDE autocompletions.

`.mapping` did not exist in Python 3.0~3.9. And it is not a feature that was long awaited by many users. See https://bugs.python.org/issue40890#msg370841

Raymond said:

Traditionally, we do expose wrapped objects: property() exposes fget, partial() exposes func, bound methods expose __func__, ChainMap() exposes maps, etc. Exposing this attribute would help with introspection, making it possible to write efficient functions that operate on dict views.

Type hints are very useful for application code, especially when it is large. But introspection is very rarely used in such typed code bases. I don't think `.mapping` is useful for many users, much like `.fget` of property. So adding `# type: ignore` on such lines is the "lesser evil".

> If we add it to `KeysView` and `ValuesView`, library authors will end up using `.mapping` with arguments annotated as `Mapping` or `MutableMapping`, not realizing it is purely a dict thing, not required from an arbitrary mapping object.

It doesn't make sense at all, IMO. If we really need `.mapping` in typeshed, we should add it to `KeysViewWithMapping`. Then mapping classes that don't inherit from dict wouldn't be forced to implement `.mapping`.

> If we keep `.mapping` in dict but not anywhere else, as described already, it becomes difficult to override .keys() and .values() in a dict subclass. You can't just return a KeysView or a ValuesView. If that was allowed, how should people annotate code that uses `.mapping`? You can't annotate with `dict`, because that also allows subclasses of dict, which might not have a `.mapping` attribute.

`# type: ignore`.

> Yet another option would be to expose `dict_keys` and `dict_values` somewhere where they don't actually exist at runtime. This leads to code like this:
>
> from typing import Any, TYPE_CHECKING
> if TYPE_CHECKING:
>     # A lie for type checkers to work.
>     from something_that_doesnt_exist_at_runtime import dict_keys, dict_values
> else:
>     # Runtime doesn't check type annotations anyway.
>     dict_keys = Any
>     dict_values = Any
>
> While this works, it isn't very pretty.

What problem does this option solve? `SortedDict.keys()` cannot return `dict_keys`. As far as I understand, your motivation is making dict subclasses happy with type checkers. But this option doesn't make dict subclasses happy at all.
[issue46399] Addition of `mapping` attribute to dict views classes has inadvertently broken type-checkers
Inada Naoki added the comment: In other words,

a. If `.keys()` in all dict subclasses must return a subclass of `dict_keys`: `dict.keys() -> dict_keys`.
b. If `.keys().mapping` must be accessible for all dict subclasses: Add `.mapping` to `KeysView`.
c. If `.keys().mapping` is optional for dict subclasses: typeshed can't add `.mapping` anywhere, AFAIK.
[issue46399] Addition of `mapping` attribute to dict views classes has inadvertently broken type-checkers
Inada Naoki added the comment:
> I agree with Inada that not every internal type should be exposed, but I would make an exception for the dict views classes due to the fact that dict subclasses are much more common than subclasses of other mappings, such as OrderedDict. I don't think it's *particularly* important to expose the OrderedDict views classes in the same way.

I am afraid that you misread me. I used OrderedDict as one example of a dict subclass. I didn't mean that dict_(keys|items|values) shouldn't be exposed because I don't want to expose odict_(keys|items|values).

Anyway, OrderedDict was not a good choice to explain my thought because it is a builtin type and is defined in typeshed. Instead, I use sortedcontainers.SortedDict as an example. See https://github.com/grantjenks/python-sortedcontainers/blob/dff7ef79a21b3f3ceb6a19868f302f0a680aa243/sortedcontainers/sorteddict.py#L43

It is a dict subclass. Its `keys()` method returns `SortedKeysView`. `SortedKeysView` is a subclass of `collections.abc.KeysView`. But it is not a subclass of `dict_keys`. If `dict.keys()` in typeshed is defined to return `dict_keys`, doesn't mypy flag it as an "incompatible override"?

So I propose that typeshed defines that dict.keys() returns KeysView, not dict_keys. Although subclassing dict is very common, it is very rare that a subclass:

* Overrides `keys()`, and
* Returns `super().keys()`, instead of KeysView (or list), and
* `.keys().mapping` is accessed.

It is a very minor inconvenience that users need to ignore a false positive for these very specific cases. Or do you think this case is much more common than classes like SortedDict? Note that dict_(keys|items|values) is an implementation detail and subclassing it doesn't make sense.

Another option is adding another ABC or Protocol that defines the `.mapping` attribute. SortedKeysView can inherit from it and implement `.mapping`.
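The SortedDict situation described above can be reproduced in miniature. This is a simplified stand-in, not the sortedcontainers implementation: the class names are hypothetical, and the view simply sorts the keys on iteration. It relies on `KeysView` storing its mapping in the `_mapping` attribute, which is how `collections.abc.MappingView` is implemented.

```python
from collections.abc import KeysView


class SortedKeysViewSketch(KeysView):
    """Hypothetical stand-in for a third-party view such as SortedKeysView."""

    def __iter__(self):
        # KeysView keeps the wrapped mapping in self._mapping.
        return iter(sorted(self._mapping))


class SortedDictSketch(dict):
    """A dict subclass whose keys() returns a KeysView, not dict_keys."""

    def keys(self):
        return SortedKeysViewSketch(self)


d = SortedDictSketch(b=2, a=1)
view = d.keys()
dict_keys = type({}.keys())

print(list(view))
print(isinstance(view, KeysView), isinstance(view, dict_keys))
print(hasattr(view, "mapping"))
```

The view is a valid `KeysView` but neither an instance of `dict_keys` nor a carrier of `.mapping`, which is why annotating `dict.keys()` as returning `dict_keys` would make such an override incompatible.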
[issue46399] Addition of `mapping` attribute to dict views classes has inadvertently broken type-checkers
Inada Naoki added the comment: I am not happy about exposing every internal type. I prefer duck typing. Like OrderedDict, not all dict subtypes use `dict_keys`, `dict_values`, and `dict_items`. If typeshed annotates dict.keys() as returning `dict_keys`, an "incompatible override" cannot be avoided. I prefer:

* Keep the status quo: keys().mapping causes a false positive and users need to suppress it. This is not a big problem because `.mapping` is very rarely used.
* Or add `.mapping` to `KeysView`, `ValuesView`, and `ItemsView`. Force every dict subclass to implement it.

-- nosy: +methane
[issue45644] Make json.tool soak up input before opening output for writing
Change by Inada Naoki : -- nosy: +methane nosy_count: 4.0 -> 5.0 pull_requests: +28860 pull_request: https://github.com/python/cpython/pull/30659
[issue29241] sys._enablelegacywindowsfsencoding() don't apply to os.fsencode and os.fsdecode
Inada Naoki added the comment: Mercurial still uses it. https://www.mercurial-scm.org/repo/hg-stable/file/tip/mercurial/pycompat.py#l113 Mercurial has a plan to move filesystem names from the ANSI code page to UTF-8, but I don't know about its progress. https://www.mercurial-scm.org/wiki/WindowsUTF8Plan -- nosy: +methane
[issue46376] PyMapping_Check returns 1 for list
Inada Naoki added the comment: collections.abc.Mapping was fixed by https://bugs.python.org/issue43977 We can do the same thing if backward compatibility allows it. -- nosy: +methane
[issue23882] unittest discovery doesn't detect namespace packages when given no parameters
Inada Naoki added the comment: New changeset 0b2b9d251374c5ed94265e28039f82b37d039e3e by Inada Naoki in branch 'main': bpo-23882: unittest: Drop PEP 420 support from discovery. (GH-29745) https://github.com/python/cpython/commit/0b2b9d251374c5ed94265e28039f82b37d039e3e
[issue45661] [meta] Freeze commonly used stdlib modules.
Inada Naoki added the comment: I am not against deep freezing functools and contextlib. But I think we should optimize and utilize zipimport or something similar, because we cannot deep-freeze all of the stdlib or 3rd party libraries. See also: https://github.com/faster-cpython/ideas/discussions/158#discussioncomment-1857198 -- nosy: +methane
[issue46236] PyFunction_GetAnnotations returning Tuple vs Dict
Change by Inada Naoki : -- keywords: +patch pull_requests: +28615 stage: -> patch review pull_request: https://github.com/python/cpython/pull/30409
[issue46143] [docs] IO > Text Encoding info outdated
Inada Naoki added the comment: UTF-8 mode is not enabled by default. So locale encoding is still the default encoding. -- nosy: +methane
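Whether UTF-8 mode is active, and which encoding text I/O therefore falls back to, can be inspected at runtime. A small sketch, assuming Python 3.7 or later (where `sys.flags.utf8_mode` exists); `default_text_encoding` is a name made up for this illustration, not a standard library function.

```python
import locale
import sys


def default_text_encoding():
    """Sketch of the encoding used when open() gets no encoding argument."""
    if sys.flags.utf8_mode:
        # UTF-8 mode (-X utf8 or PYTHONUTF8=1) overrides the locale.
        return "utf-8"
    # Otherwise the locale encoding is used.
    return locale.getpreferredencoding(False)


print(sys.flags.utf8_mode, default_text_encoding())
```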
[issue46085] OrderedDict iterator allocates di_result unnecessarily
Inada Naoki added the comment: Nice catch.
> if ((kind & _odict_ITER_KEYS) && (kind &_odict_ITER_VALUES))

You can reduce one branch by

```
#define _odict_ITER_ITEMS (_odict_ITER_KEYS|_odict_ITER_VALUES)
...
if ((kind & _odict_ITER_ITEMS) == _odict_ITER_ITEMS)
```

-- nosy: +methane
[issue46006] [subinterpreter] _PyUnicode_EqualToASCIIId() issue with subinterpreters
Inada Naoki added the comment: That's too bad. We cannot compare two Unicode objects by pointer anymore, even if both are interned... It was a nice optimization.
[issue46006] [subinterpreter] _PyUnicode_EqualToASCIIId() issue with subinterpreters
Inada Naoki added the comment: Should `_PyUnicode_EqualToASCIIId()` support comparing two Unicode objects from different interpreters? -- nosy: +methane
[issue23882] unittest discovery doesn't detect namespace packages when given no parameters
Change by Inada Naoki : -- versions: +Python 3.11 -Python 3.10, Python 3.8, Python 3.9
[issue23882] unittest discovery doesn't detect namespace packages when given no parameters
Change by Inada Naoki : -- pull_requests: +27982 stage: needs patch -> patch review pull_request: https://github.com/python/cpython/pull/29745
[issue38625] SpooledTemporaryFile does not seek correctly after being rolled over
Inada Naoki added the comment: The other error I found is already reported as #42868.
[issue38625] SpooledTemporaryFile does not seek correctly after being rolled over
Change by Inada Naoki : -- resolution: -> fixed stage: -> resolved status: open -> closed versions: +Python 3.8, Python 3.9
[issue38625] SpooledTemporaryFile does not seek correctly after being rolled over
Inada Naoki added the comment: I confirmed that this bug is fixed, but I found another error.
[issue38625] SpooledTemporaryFile does not seek correctly after being rolled over
Inada Naoki added the comment: Is this bug fixed by #26730?
[issue45521] obmalloc radix tree typo in code
Inada Naoki added the comment: While trying to understand this issue, I saw this segfault. https://gist.github.com/methane/1b83e2abc6739017e0490c5f70a27b52 I am not sure whether this segfault is caused by this issue or not. If it is unrelated, I will create another issue.
[issue45475] gzip fails to read a gzipped file (ValueError: readline of closed file)
Change by Inada Naoki : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed
[issue45475] gzip fails to read a gzipped file (ValueError: readline of closed file)
Inada Naoki added the comment: New changeset 0a4c82ddd34a3578684b45b76f49cd289a08740b by Inada Naoki in branch 'main': bpo-45475: Revert `__iter__` optimization for GzipFile, BZ2File, and LZMAFile. (GH-29016) https://github.com/python/cpython/commit/0a4c82ddd34a3578684b45b76f49cd289a08740b