New submission from Josh Rosenberg <shadowranger+pyt...@gmail.com>:

Patch is good, but while we're at it, is there any reason why this 
multi-allocation design was used in the first place? It PyMem_Mallocs a buffer, 
builds a C-style string in it, then uses PyUnicode_FromString to convert that 
C-style string to a Python str.

Seems like the correct approach would be to use PyUnicode_New to preallocate 
the final string buffer up front, then pull out the internal buffer with 
PyUnicode_1BYTE_DATA and populate it directly. That saves a pointless 
allocation/deallocation, and it means the failure case requires no cleanup at 
all, while barely changing the code (aside from removing the need to 
explicitly NUL-terminate).

The only reason I can see to avoid this would be if codec names could contain 
arbitrary Unicode encoded as UTF-8 (in which case strlen wouldn't tell you the 
final length in Unicode ordinals), but I'm pretty sure that's not the case 
(and if it is, we're not normalizing properly anyway, since we only lowercase 
ASCII). If Unicode codec names do need to be handled, there are other options, 
though the easy savings go away.

----------
nosy: +josh.r

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33231>
_______________________________________