[issue45025] Reliance on C bit fields in C API is undefined behavior

STINNER Victor Mon, 30 Aug 2021 08:25:00 -0700


STINNER Victor <vstin...@python.org> added the comment:


> The macro PyUnicode_KIND is part of the documented public C API.

IMO it was a mistake to expose it as part of the public C API. This is an 
implementation detail which should not be exposed. The C API should not expose 
*directly* how characters are stored in memory, but provide an abstract way to 
read and write Unicode characters.
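Below is a minimal C sketch of that "abstract way". It assumes a hypothetical, simplified string type: demo_str, demo_str_read_char(), demo_str_write_char() and demo_str_length() are invented names, and this is not CPython's real PyASCIIObject layout. Characters are stored PEP 393-style, in 1, 2 or 4 bytes per code point, but callers only go through the accessor functions, so the storage "kind" never leaks into extension code. CPython's own PyUnicode_ReadChar() and PyUnicode_WriteChar() follow a similar idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified PEP 393-style string: every code point is
 * stored in `kind` bytes (1, 2 or 4). NOT CPython's real struct layout. */
typedef struct {
    size_t length;  /* number of code points */
    int kind;       /* bytes per code point: 1, 2 or 4 */
    void *data;     /* `length` code points, each `kind` bytes wide */
} demo_str;

size_t demo_str_length(const demo_str *s)
{
    return s->length;
}

/* Return the code point at `index`, or UINT32_MAX if out of range
 * (similar in spirit to PyUnicode_ReadChar() returning (Py_UCS4)-1). */
uint32_t demo_str_read_char(const demo_str *s, size_t index)
{
    if (index >= s->length)
        return UINT32_MAX;
    switch (s->kind) {
    case 1: return ((const uint8_t *)s->data)[index];
    case 2: return ((const uint16_t *)s->data)[index];
    case 4: return ((const uint32_t *)s->data)[index];
    default:
        assert(0 && "invalid kind");  /* internal invariant broken */
        return UINT32_MAX;
    }
}

/* Store `ch` at `index`; return 0 on success, -1 if the index is out of
 * range or the code point does not fit in this string's kind. */
int demo_str_write_char(demo_str *s, size_t index, uint32_t ch)
{
    if (index >= s->length)
        return -1;
    switch (s->kind) {
    case 1:
        if (ch > 0xFF) return -1;
        ((uint8_t *)s->data)[index] = (uint8_t)ch;
        return 0;
    case 2:
        if (ch > 0xFFFF) return -1;
        ((uint16_t *)s->data)[index] = (uint16_t)ch;
        return 0;
    case 4:
        if (ch > 0x10FFFF) return -1;
        ((uint32_t *)s->data)[index] = ch;
        return 0;
    default:
        return -1;
    }
}
```

With this design, code that only calls the accessors keeps working even if the storage layout changes; code that switches on `kind` itself does not.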

The PEP 393 implementation broke the old C API in many ways because it exposed 
too many implementation details. Sadly, the new C API is... not better :-(

If CPython is someday modified to use UTF-8 internally (as PyPy does), the C 
API will likely break *again*, in many new (and funny) ways.
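To illustrate why a UTF-8 representation breaks code that depends on the kind, here is a second hypothetical sketch. utf8_str, utf8_str_length() and utf8_str_read_char() are invented for this example, and it assumes the buffer always holds valid UTF-8. Each code point takes 1 to 4 bytes, so there is no single "kind" a macro like PyUnicode_KIND() could report, yet the same read-by-index contract can still be honoured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical UTF-8-backed string. Assumes `data` is valid UTF-8.
 * There is no "kind": each code point uses 1 to 4 bytes. */
typedef struct {
    size_t nbytes;              /* size of the UTF-8 buffer in bytes */
    const unsigned char *data;  /* UTF-8 encoded code points */
} utf8_str;

/* Length of the UTF-8 sequence that starts with lead byte `b`. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;  /* (b & 0xF8) == 0xF0 for valid input */
}

/* Number of code points: count every byte that is not a continuation byte. */
size_t utf8_str_length(const utf8_str *s)
{
    size_t n = 0;
    for (size_t i = 0; i < s->nbytes; i++)
        if ((s->data[i] & 0xC0) != 0x80)
            n++;
    return n;
}

/* Code point at `index`, or UINT32_MAX if out of range.
 * Naive version: scans from the start, so each lookup is O(n). */
uint32_t utf8_str_read_char(const utf8_str *s, size_t index)
{
    size_t pos = 0;
    while (pos < s->nbytes) {
        size_t len = utf8_seq_len(s->data[pos]);
        assert(pos + len <= s->nbytes);  /* guaranteed by valid UTF-8 */
        if (index == 0) {
            const unsigned char *p = s->data + pos;
            switch (len) {
            case 1:
                return p[0];
            case 2:
                return ((uint32_t)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
            case 3:
                return ((uint32_t)(p[0] & 0x0F) << 12)
                     | ((uint32_t)(p[1] & 0x3F) << 6) | (p[2] & 0x3F);
            default:
                return ((uint32_t)(p[0] & 0x07) << 18)
                     | ((uint32_t)(p[1] & 0x3F) << 12)
                     | ((uint32_t)(p[2] & 0x3F) << 6) | (p[3] & 0x3F);
            }
        }
        index--;
        pos += len;
    }
    return UINT32_MAX;
}
```

A production implementation would probably cache code point offsets instead of rescanning the buffer, but the caller-facing contract stays the same either way.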

11 years after PEP 393 (Python 3.3), we are only now starting to fix the old C API :-( 
The work will be completed in 2 or 3 Python releases (Python 3.12 or 3.13):

* https://www.python.org/dev/peps/pep-0623/
* https://www.python.org/dev/peps/pep-0624/

The C API for Unicode strings causes a lot of issues for PyPy, which uses 
UTF-8 internally. C extensions can fail to build on PyPy if they use functions 
(actually macros) like PyUnicode_KIND().

----------
nosy: +methane, serhiy.storchaka

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45025>
_______________________________________
