STINNER Victor <vstin...@python.org> added the comment:

> PyUnicode_KIND does *not* expose the implementation details to the programmer.

PyUnicode_KIND() is very specific to the exact PEP 393 implementation. 
Documentation of this field:
---
/* Character size:

   - PyUnicode_WCHAR_KIND (0):

     * character type = wchar_t (16 or 32 bits, depending on the
       platform)

   - PyUnicode_1BYTE_KIND (1):

     * character type = Py_UCS1 (8 bits, unsigned)
     * all characters are in the range U+0000-U+00FF (latin1)
     * if ascii is set, all characters are in the range U+0000-U+007F
       (ASCII), otherwise at least one character is in the range
       U+0080-U+00FF

   - PyUnicode_2BYTE_KIND (2):

     * character type = Py_UCS2 (16 bits, unsigned)
     * all characters are in the range U+0000-U+FFFF (BMP)
     * at least one character is in the range U+0100-U+FFFF

   - PyUnicode_4BYTE_KIND (4):

     * character type = Py_UCS4 (32 bits, unsigned)
     * all characters are in the range U+0000-U+10FFFF
     * at least one character is in the range U+10000-U+10FFFF
 */
unsigned int kind:3;
---

I don't think that PyUnicode_KIND() makes sense if CPython uses UTF-8 tomorrow.
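
As a rough illustration of how closely the kind value follows the storage layout documented above, here is a minimal sketch that maps a string's largest code point to the character width listed in that comment. `kind_for_max_char` is a hypothetical helper written for this example, not a CPython function, and it ignores the legacy wchar_t kind.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the CPython API: returns the smallest
   PEP 393 character width (1, 2 or 4 bytes) able to store max_char. */
unsigned int
kind_for_max_char(uint32_t max_char)
{
    assert(max_char <= 0x10FFFF);   /* largest valid Unicode code point */
    if (max_char <= 0xFF) {
        return 1;   /* U+0000-U+00FF: PyUnicode_1BYTE_KIND */
    }
    if (max_char <= 0xFFFF) {
        return 2;   /* U+0100-U+FFFF: PyUnicode_2BYTE_KIND */
    }
    return 4;       /* U+10000-U+10FFFF: PyUnicode_4BYTE_KIND */
}
```

The result only has meaning for fixed-width storage. A variable-width encoding such as UTF-8 has no single per-string character width to report.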


> If the internal representation of strings is switched to use masks and shifts
> instead of bitfields, PyUnicode_KIND (and others) can be adapted to the new
> details without breaking API compatibility.

PyUnicode_KIND() was exposed in the *public* C API because unicodeobject.h
provides functions as macros for best performance, and these macros use
PyUnicode_KIND() internally.

Macros like PyUnicode_READ(kind, data, index) are also designed for best
performance with the exact PEP 393 implementation.

The public C API should only contain PyUnicode_READ_CHAR(unicode, index): this
macro doesn't use "kind" or "data", which are (again) specific to PEP 393.
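
The difference between the two access styles can be sketched with a toy model. Everything below (`toy_str`, `TOY_READ`, `toy_read_char`) is hypothetical and only mimics the shape of PyUnicode_READ() and PyUnicode_READ_CHAR(); it is not the real PyUnicodeObject layout or the real macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a PEP 393 string; NOT the real PyUnicodeObject layout. */
typedef struct {
    unsigned int kind;   /* bytes per character: 1, 2 or 4 */
    size_t length;       /* number of characters */
    const void *data;    /* array of uint8_t, uint16_t or uint32_t */
} toy_str;

/* PyUnicode_READ()-style access: the caller must know kind and data,
   so the storage layout leaks into every call site. */
#define TOY_READ(kind, data, index) \
    ((uint32_t)((kind) == 1 ? ((const uint8_t *)(data))[(index)] : \
                (kind) == 2 ? ((const uint16_t *)(data))[(index)] : \
                              ((const uint32_t *)(data))[(index)]))

/* PyUnicode_READ_CHAR()-style access: the caller passes only the string
   and an index, so the storage could change without breaking callers. */
uint32_t
toy_read_char(const toy_str *s, size_t index)
{
    assert(index < s->length);
    return TOY_READ(s->kind, s->data, index);
}
```

Code written against `toy_read_char()` would keep compiling if `toy_str` switched to another encoding, at the cost of a function call per character. Code written against `TOY_READ()` would have to be rewritten.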

In the CPython implementation, we should use the most efficient code: it's fine
to use macros that access structures directly.

But for the public C API, I would recommend providing only abstractions, even
if they are a little bit slower.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45025>
_______________________________________