STINNER Victor added the comment:

> Will not this cause a performance regression? If we work heavily with the
> wchar_t-based API, it looks good to cache the encoded value.

Yes, it will be slower. But I prefer slower code with a lower memory footprint. 
On UNIX, I don't think that anyone will notice the difference.

My concern is that the cache is never released. If the conversion is only
needed once at startup, the cached memory stays allocated until Python exits.
That's not really efficient.

On Windows, conversion to wchar_t* is common because Python uses the Windows
wide character API (the "W" API, as opposed to the "A" ANSI code page API). For
example, most filesystem access uses the wchar_t* type.

In Python < 3.3, Python was compiled in narrow mode and so Unicode strings
already used wchar_t* internally to store their characters. Since Python 3.3
(PEP 393), Python uses a more compact representation: each string stores its
characters with a KIND of 1, 2 or 4 bytes per character. A wchar_t* buffer can
share the Unicode data only if sizeof(wchar_t) == KIND. Examples: "\u20ac" on
Windows (16-bit wchar_t) or "\U0010ffff" on Linux (32-bit wchar_t).
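The per-character KIND can be observed from pure Python (a small sketch: the
absolute sizes reported by sys.getsizeof() vary across CPython versions, so
only the per-character growth is measured here):

```python
import sys

# PEP 393: a string is stored with 1, 2 or 4 bytes per character,
# chosen from the highest code point it contains. Growing a string
# by one character of the same kind adds exactly KIND bytes.
for ch in ("a", "\u20ac", "\U0010ffff"):
    kind = sys.getsizeof(ch * 11) - sys.getsizeof(ch * 10)
    print(hex(ord(ch)), "->", kind, "byte(s) per character")
```

On a 32-bit-wchar_t platform such as Linux, only the 4-bytes-per-character
case matches sizeof(wchar_t), which is why "\U0010ffff" can share its data
with a wchar_t* buffer there.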

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22324>
_______________________________________