STINNER Victor added the comment:

Advantages of the patch:

* finer control over how the buffer is allocated: overallocate only if the 
replacement string (produced while handling an encoding error) is longer than 
1 byte/character. For example, the "replace" error handler should never use 
overallocation. Overallocation that was not needed has a cost at the end of 
the encoder, because the buffer must be resized (shrunk).

* use a buffer allocated on the stack for short strings. I'm not really 
convinced by this optimization: the data is still copied when the result is 
converted to a bytes object (PyBytes_FromStringAndSize). It may be worthwhile 
if the encoder has to handle one or more errors: there is no need to resize 
the buffer until the output reaches the size of the small buffer (e.g. 512 
bytes).

* handle integer overflow correctly: most encoders do not catch integer 
overflow errors and may fail to handle (very) long strings (e.g. an encoded 
string longer than PY_SSIZE_T_MAX).
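
The three points above can be sketched in a few lines of standalone C. This is 
not the patch's _PyBytesWriter code: byte_writer, writer_reserve and the other 
names are hypothetical, MAX_SIZE stands in for PY_SSIZE_T_MAX, and the 25% 
growth factor is an arbitrary assumption chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_BUF_SIZE 512               /* assumed; the comment says "e.g. 512 bytes" */
#define MAX_SIZE ((size_t)PTRDIFF_MAX)   /* stand-in for PY_SSIZE_T_MAX */

/* Hypothetical writer. It points into its own small[] array, so it must be
   used in place and never copied by value. */
typedef struct {
    char small[SMALL_BUF_SIZE]; /* stack storage used while the output is short */
    char *buf;                  /* points at small[] or at a heap block */
    size_t size;                /* bytes written so far */
    size_t capacity;            /* bytes available in buf */
    int overallocate;           /* set only if a replacement can exceed 1 byte */
} byte_writer;

static void writer_init(byte_writer *w, int overallocate)
{
    w->buf = w->small;
    w->size = 0;
    w->capacity = SMALL_BUF_SIZE;
    w->overallocate = overallocate;
}

/* Make room for `extra` more bytes. Returns 0 on success, -1 on overflow
   or allocation failure. */
static int writer_reserve(byte_writer *w, size_t extra)
{
    assert(w->size <= w->capacity);
    if (extra > MAX_SIZE - w->size)
        return -1;                       /* size + extra would overflow */
    size_t needed = w->size + extra;
    if (needed <= w->capacity)
        return 0;                        /* still fits: no resize at all */

    size_t new_cap = needed;
    if (w->overallocate && needed / 4 <= MAX_SIZE - needed)
        new_cap += needed / 4;           /* assumed growth factor */

    char *p;
    if (w->buf == w->small) {
        p = malloc(new_cap);             /* first move from stack to heap */
        if (p != NULL)
            memcpy(p, w->small, w->size);
    }
    else {
        p = realloc(w->buf, new_cap);
    }
    if (p == NULL)
        return -1;
    w->buf = p;
    w->capacity = new_cap;
    return 0;
}

static int writer_write(byte_writer *w, const void *data, size_t n)
{
    if (writer_reserve(w, n) < 0)
        return -1;
    memcpy(w->buf + w->size, data, n);
    w->size += n;
    return 0;
}

static void writer_free(byte_writer *w)
{
    if (w->buf != w->small)
        free(w->buf);
}
```

With overallocate set to 0, the heap block is sized exactly, so nothing has to 
be shrunk at the end; with it set to 1, later writes are cheaper but the final 
buffer usually needs one extra resize.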

I'm not convinced that the patch makes it possible to write faster code. 
According to the generated assembly, it is the opposite (when "*writer.str++" 
is used in a loop). I don't know whether it's possible to design a more 
efficient _PyBytesWriter API (to help GCC generate more efficient machine 
code), nor whether the overhead is significant in a "normal case" 
(bench_encoders.py tests corner cases: text with very many errors).
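
To make the "*writer.str++" concern concrete, here are the two loop shapes in 
question. toy_writer and both function names are hypothetical and are not taken 
from the patch. One plausible explanation, offered as an assumption and not as 
a measurement, is that a store through a char pointer may alias the struct 
field itself, so the compiler may have to reload and store the field on every 
iteration in the first form.

```c
#include <stddef.h>

/* Hypothetical stand-in for a writer that exposes its write pointer. */
typedef struct {
    char *str;   /* current write position */
} toy_writer;

/* Form 1: advance the pointer stored in the struct for every byte. */
static void copy_via_field(toy_writer *w, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *w->str++ = (char)src[i];
}

/* Form 2: work on a local copy of the pointer and write it back once. */
static void copy_via_local(toy_writer *w, const unsigned char *src, size_t n)
{
    char *p = w->str;
    for (size_t i = 0; i < n; i++)
        *p++ = (char)src[i];
    w->str = p;
}
```

Both functions produce the same bytes and leave the pointer in the same place; 
whether the second form is measurably faster depends on the compiler and 
would need to be benchmarked.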

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17742>
_______________________________________