[issue45025] Reliance on C bit fields in C API is undefined behavior

2021-08-31 Thread STINNER Victor
STINNER Victor added the comment:
> My use case for these low-level APIs is to write tests for low-level string/encoding handling in my custom use of the PyPreConfig and PyConfig structs. I wanted to verify that exact byte sequences were turned into specific representations inside of

[issue45025] Reliance on C bit fields in C API is undefined behavior

2021-08-30 Thread Gregory Szorc
Gregory Szorc added the comment: My use case for these low-level APIs is to write tests for low-level string/encoding handling in my custom use of the PyPreConfig and PyConfig structs. I wanted to verify that exact byte sequences were turned into specific representations inside of Python

[issue45025] Reliance on C bit fields in C API is undefined behavior

2021-08-30 Thread STINNER Victor
STINNER Victor added the comment:
> PyUnicode_KIND does *not* expose the implementation details to the programmer.
PyUnicode_KIND() is very specific to the exact PEP 393 implementation. Documentation of this field:
---
/* Character size:
   - PyUnicode_WCHAR_KIND (0):
     * character type

[issue45025] Reliance on C bit fields in C API is undefined behavior

2021-08-30 Thread Petr Viktorin
Petr Viktorin added the comment: PyUnicode_KIND does *not* expose the implementation details to the programmer. If the internal representation of strings is switched to use masks and shifts instead of bitfields, PyUnicode_KIND (and others) can be adapted to the new details without breaking
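
The "masks and shifts" alternative mentioned in this comment can be sketched roughly as follows. This is an illustration only: the demo_* names, the bit positions, and the single uint32_t state word are assumptions made up for the example and do not reflect CPython's actual string layout or API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, NOT CPython's: every flag lives in one unsigned
   integer, and the bit positions are chosen explicitly by the code
   instead of being left to the compiler. */
#define DEMO_KIND_SHIFT   2u
#define DEMO_KIND_MASK    0x7u   /* 3 bits wide */
#define DEMO_ASCII_SHIFT  6u
#define DEMO_ASCII_MASK   0x1u   /* 1 bit wide */

typedef struct {
    uint32_t state;
} demo_str;

/* Getter: shift the field down to bit 0, then mask off its neighbours. */
static inline unsigned demo_get_kind(const demo_str *s)
{
    return (s->state >> DEMO_KIND_SHIFT) & DEMO_KIND_MASK;
}

/* Setter: clear the old field bits, then OR in the new value. */
static inline void demo_set_kind(demo_str *s, unsigned kind)
{
    assert(kind <= DEMO_KIND_MASK);
    s->state = (s->state & ~(DEMO_KIND_MASK << DEMO_KIND_SHIFT))
             | ((uint32_t)kind << DEMO_KIND_SHIFT);
}

static inline int demo_is_ascii(const demo_str *s)
{
    return (int)((s->state >> DEMO_ASCII_SHIFT) & DEMO_ASCII_MASK);
}

static inline void demo_set_ascii(demo_str *s, int flag)
{
    s->state = (s->state & ~(DEMO_ASCII_MASK << DEMO_ASCII_SHIFT))
             | ((uint32_t)(flag != 0) << DEMO_ASCII_SHIFT);
}
```

Callers only ever go through the accessors, so the shift and mask constants could change later without touching calling code, which is the adaptability the comment describes.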

[issue45025] Reliance on C bit fields in C API is undefined behavior

2021-08-30 Thread STINNER Victor
STINNER Victor added the comment:
> In order to avoid undefined behavior, Python's C API should avoid all use of bit fields.
See also the PEP 620. IMO more generally, the C API should not expose structures, but provide ways to access it through getter and setter functions. See bpo-40120

[issue45025] Reliance on C bit fields in C API is undefined behavior

2021-08-30 Thread STINNER Victor
STINNER Victor added the comment:
> The macro PyUnicode_KIND is part of the documented public C API.
IMO it was a mistake to expose it as part of the public C API. This is an implementation detail which should not be exposed. The C API should not expose *directly* how characters are stored

[issue45025] Reliance on C bit fields in C API is undefined behavior

2021-08-30 Thread Petr Viktorin
Petr Viktorin added the comment: The macro PyUnicode_KIND is part of the documented public C API. It accesses the bit field "state.kind" directly.

[issue45025] Reliance on C bit fields in C API is undefined behavior

2021-08-30 Thread STINNER Victor
STINNER Victor added the comment:
> At least the PyASCIIObject struct in Include/cpython/unicodeobject.h uses bit fields. Various preprocessor macros like PyUnicode_IS_ASCII() and PyUnicode_KIND() access this struct's bit field.
What is your use case? Which functions do you need? You

[issue45025] Reliance on C bit fields in C API is undefined behavior

2021-08-30 Thread Georg Brandl
Change by Georg Brandl: nosy: +georg.brandl

[issue45025] Reliance on C bit fields in C API is undefined behavior

2021-08-30 Thread Georg Brandl
Change by Georg Brandl: nosy: +vstinner

[issue45025] Reliance on C bit fields in C API is undefined behavior

2021-08-27 Thread Erlend E. Aasland
Change by Erlend E. Aasland: nosy: +petr.viktorin

[issue45025] Reliance on C bit fields in C API is undefined behavior

2021-08-26 Thread Gregory Szorc
Change by Gregory Szorc: title: Reliance on C bit fields is C API is undefined behavior -> Reliance on C bit fields in C API is undefined behavior

[issue45025] Reliance on C bit fields is C API is undefined behavior

2021-08-26 Thread Gregory Szorc
New submission from Gregory Szorc: At least the PyASCIIObject struct in Include/cpython/unicodeobject.h uses bit fields. Various preprocessor macros like PyUnicode_IS_ASCII() and PyUnicode_KIND() access this struct's bit field. This is problematic because according to the C specification,
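
For readers unfamiliar with the concern, here is a small sketch of why bit fields are awkward in a struct that other compilers or languages must read. The C standard leaves the order in which bit fields are allocated within a storage unit implementation-defined, so the raw bytes produced below may differ between compilers or ABIs. The demo_bits struct and helper names are hypothetical and only loosely mimic packed state flags; they are not the PyASCIIObject definition.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical struct for illustration only (not CPython's). */
typedef struct {
    unsigned int interned : 2;
    unsigned int kind     : 3;
    unsigned int compact  : 1;
    unsigned int ascii    : 1;
} demo_bits;

/* Reading a field back through the same compiler is always reliable. */
static unsigned demo_roundtrip_kind(unsigned kind)
{
    demo_bits b = {0};
    b.kind = kind;
    return b.kind;
}

/* Copy the raw storage of the bit fields into an integer. Which bits of
   the result hold `kind` is up to the compiler and ABI, so code that
   reads this memory without going through the compiler's own bit-field
   access (another language's FFI, for example) has to guess the layout. */
static uint32_t demo_raw_bits(unsigned kind)
{
    demo_bits b;
    uint32_t raw = 0;
    size_t n = sizeof b < sizeof raw ? sizeof b : sizeof raw;

    memset(&b, 0, sizeof b);   /* zero padding bits as well */
    b.kind = kind;
    memcpy(&raw, &b, n);
    return raw;
}
```

The sketch deliberately gives no expected value for demo_raw_bits(): the point is that the result is only predictable once a specific compiler and target are fixed.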