New submission from STINNER Victor <victor.stin...@haypocalc.com>:

PyUnicode_AsWideChar() doesn't merge surrogate pairs on a system with a 32-bit wchar_t when Python is compiled in narrow mode (sizeof(wchar_t) == 4 and sizeof(Py_UNICODE) == 2) => see issue #8670.
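
To illustrate what "merging surrogate pairs" means here, the following is a minimal sketch in plain C, not the code from the patch or from CPython. The helper name utf16_to_utf32() and its parameters are hypothetical; uint16_t stands in for a narrow-build Py_UNICODE, uint32_t for a 32-bit wchar_t, and malloc() for the Python allocator.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the CPython API: convert a UTF-16
   buffer of `len` code units into a newly malloc()ed, nul-terminated
   buffer of 32-bit code points, merging surrogate pairs.

   If out_len is not NULL, it receives the number of code points written,
   NOT counting the final nul. Returns NULL on allocation failure. */
static uint32_t *
utf16_to_utf32(const uint16_t *src, size_t len, size_t *out_len)
{
    uint32_t *dst;
    size_t i = 0, n = 0;

    /* Guard against overflow in the size computation below. */
    if (len > SIZE_MAX / sizeof(uint32_t) - 1)
        return NULL;

    /* Merging can only shrink the string, so len + 1 slots always suffice. */
    dst = malloc((len + 1) * sizeof(uint32_t));
    if (dst == NULL)
        return NULL;

    while (i < len) {
        uint32_t ch = src[i++];
        /* A high surrogate followed by a low surrogate: merge the pair. */
        if (ch >= 0xD800 && ch <= 0xDBFF && i < len
            && src[i] >= 0xDC00 && src[i] <= 0xDFFF) {
            ch = 0x10000 + ((ch - 0xD800) << 10) + (src[i] - 0xDC00);
            i++;
        }
        /* Lone surrogates are copied unchanged in this sketch. */
        dst[n++] = ch;
    }
    dst[n] = 0;  /* always nul-terminate the output */
    if (out_len != NULL)
        *out_len = n;
    return dst;
}
```

With the input { 0x0041, 0xD83D, 0xDE00, 0x0042 } (four UTF-16 code units), this sketch produces three code points, so the output is shorter than the input. Any caller that assumes both strings have the same length is affected by that difference.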
It is not easy to fix this problem because the callers of PyUnicode_AsWideChar() assume that the output (wide character) string has the same length (in characters) as the input (PyUnicode) string, i.e. they assume that sizeof(wchar_t) == sizeof(Py_UNICODE). In addition, PyUnicode_AsWideChar() doesn't write a nul character at the end if the output string is truncated.

To prepare this change, a new PyUnicode_AsWideCharString() function would help because it computes the size of the output buffer itself (whereas PyUnicode_AsWideChar() requires the caller to pass the output buffer as an argument). The attached patch implements it:

-------
/* Convert the Unicode object to a wide character string. The output string
   always ends with a nul character. If size is not NULL, write the number of
   wide characters (including the final nul character) into *size.

   Returns a buffer allocated by PyMem_Alloc() (use PyMem_Free() to free it)
   on success. On error, returns NULL and *size is undefined. */

PyAPI_FUNC(wchar_t*) PyUnicode_AsWideCharString(
    PyUnicodeObject *unicode,   /* Unicode object */
    Py_ssize_t *size            /* number of characters of the result */
    );
-------

----------
components: Interpreter Core, Unicode
files: pyunicode_aswidecharstring.patch
keywords: patch
messages: 117566
nosy: haypo
priority: normal
severity: normal
status: open
title: Create PyUnicode_AsWideCharString() function
versions: Python 3.2
Added file: http://bugs.python.org/file19054/pyunicode_aswidecharstring.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9979>
_______________________________________
_______________________________________________
Python-bugs-list mailing list