[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

Serhiy Storchaka Thu, 19 Dec 2019 01:43:38 -0800


Serhiy Storchaka <[email protected]> added the comment:


Do you have some concrete code in mind? Several times I have wished for a 
similar feature: to get the UTF-8 cache if it exists, and otherwise to encode 
to UTF-8 without creating a cache.

The private _PyUnicode_UTF8() macro could help:

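if ((s = _PyUnicode_UTF8(str))) {
    /* Cached UTF-8 exists: borrow it, nothing to release later. */
    size = _PyUnicode_UTF8_LENGTH(str);
    tmpbytes = NULL;
}
else {
    /* No cache: encode into a temporary bytes object that the caller owns. */
    tmpbytes = _PyUnicode_AsUTF8String(str, "replace");
    if (tmpbytes == NULL) {
        return NULL;  /* adjust to the enclosing function's error convention */
    }
    s = PyBytes_AS_STRING(tmpbytes);
    size = PyBytes_GET_SIZE(tmpbytes);
}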

but it is not even available outside of unicodeobject.c.

PyUnicode_BorrowUTF8() looks too complex for the public API. I am not sure that 
it will be easy to implement in PyPy. It also does not cover all use cases 
-- sometimes you want to convert to UTF-8 but do not want any memory 
allocation at all (either use an existing buffer or raise an error if there is 
no cached UTF-8 or the string is not ASCII).
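
To make the "use an existing buffer or raise an error" case concrete, here is 
a minimal sketch in plain C. It is not CPython code and not a proposed API: 
the function name encode_utf8_into() and its signature are hypothetical, and 
it works on a bare array of code points rather than on a unicode object. It 
only illustrates the contract: write into the caller's buffer, never allocate, 
and fail if the result does not fit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of CPython): encode `len` code points as
   UTF-8 into the caller-provided buffer. Returns the number of bytes
   written, or -1 if the buffer is too small or a code point cannot be
   encoded (lone surrogate or value above U+10FFFF). Never allocates. */
ptrdiff_t
encode_utf8_into(const uint32_t *cps, size_t len, char *buf, size_t bufsize)
{
    size_t pos = 0;

    assert(cps != NULL && buf != NULL);
    for (size_t i = 0; i < len; i++) {
        uint32_t c = cps[i];
        size_t need;

        if (c < 0x80) {
            need = 1;
        }
        else if (c < 0x800) {
            need = 2;
        }
        else if (c < 0x10000) {
            if (c >= 0xD800 && c <= 0xDFFF) {
                return -1;              /* lone surrogate */
            }
            need = 3;
        }
        else if (c <= 0x10FFFF) {
            need = 4;
        }
        else {
            return -1;                  /* outside the Unicode range */
        }
        if (need > bufsize - pos) {
            return -1;                  /* would overflow the buffer */
        }

        if (need == 1) {
            buf[pos++] = (char)c;
        }
        else if (need == 2) {
            buf[pos++] = (char)(0xC0 | (c >> 6));
            buf[pos++] = (char)(0x80 | (c & 0x3F));
        }
        else if (need == 3) {
            buf[pos++] = (char)(0xE0 | (c >> 12));
            buf[pos++] = (char)(0x80 | ((c >> 6) & 0x3F));
            buf[pos++] = (char)(0x80 | (c & 0x3F));
        }
        else {
            buf[pos++] = (char)(0xF0 | (c >> 18));
            buf[pos++] = (char)(0x80 | ((c >> 12) & 0x3F));
            buf[pos++] = (char)(0x80 | ((c >> 6) & 0x3F));
            buf[pos++] = (char)(0x80 | (c & 0x3F));
        }
    }
    return (ptrdiff_t)pos;
}
```

A real API would take a unicode object and set a Python exception instead of 
returning -1, but the caller-side pattern would be the same: pass a buffer, 
check the result, and never own any temporary object.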

----------
nosy: +serhiy.storchaka

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue39087>
_______________________________________