STINNER Victor added the comment:

The attached patch changes the _PyUnicodeWriter_Init() API: it now takes only 
one argument (the writer). The minimum length and overallocation must be 
configured using attributes. The problem with the old API was that it was not 
possible to configure the minimum length and overallocation separately.
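
The shape of the new API can be sketched with a toy model. Everything below is 
a hypothetical stand-in, not the real _PyUnicodeWriter: the toy_writer type, 
its functions and the 25% growth factor are assumptions made for illustration. 
Only the idea (init takes the writer alone, the options are plain attributes) 
comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the writer; not the real _PyUnicodeWriter. */
typedef struct {
    size_t min_length;   /* lower bound for the first allocation */
    int overallocate;    /* nonzero: reserve extra room when allocating */
} toy_writer;

/* New-style init: takes only the writer and resets every option. */
static void toy_writer_init(toy_writer *w)
{
    w->min_length = 0;
    w->overallocate = 0;
}

/* Number of characters the toy writer would request for `needed`
   characters. The 25% growth factor is an assumption. */
static size_t toy_writer_alloc_size(const toy_writer *w, size_t needed)
{
    size_t size = needed;
    if (w->overallocate)
        size += size / 4;
    if (size < w->min_length)
        size = w->min_length;
    return size;
}
```

In this model a caller that wants only a minimum length sets w.min_length after 
toy_writer_init(&w) and leaves w.overallocate at 0; the two options no longer 
have to be chosen together.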

The patch also disables overallocation in the CJK decoders: they now only set 
the minimum length.

Other changes:

 * Add a min_char attribute to _PyUnicodeWriter. It is currently unused. Using 
_PyUnicodeWriter_Prepare(writer, 0, min_char) is different because it allocates 
the buffer immediately, and calling _PyUnicodeWriter_Prepare() with size=0 is 
not supported (it does not widen the buffer if maxchar is bigger).
 * unicode_decode_call_errorhandler_writer() only enables overallocation if the 
replacement string is longer than 1 character
 * PyUnicode_DecodeRawUnicodeEscape() and _PyUnicode_DecodeUnicodeInternal() 
set the minimum length instead of preallocating the whole buffer. It avoids the 
need to widen the buffer if the first written character is the biggest 
character. It also avoids a useless memory allocation if the decoder fails 
before the first write.
 * _PyUnicode_DecodeUnicodeInternal() checks for integer overflow when 
computing the minimum length
 * _PyUnicodeWriter_Update() is now responsible for setting size to zero if 
readonly is set
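
The integer overflow check mentioned in the list above can be illustrated in 
isolation. This is a minimal sketch, assuming the minimum length is obtained by 
dividing a byte count by a code unit size and rounding up; that formula, the 
function name min_length_from_bytes() and the use of size_t are assumptions, 
not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute ceil(nbytes / unit) without overflowing.
   Returns 0 on success and stores the result in *out.
   Returns -1 if unit is zero or if nbytes + (unit - 1) would
   exceed SIZE_MAX. */
static int min_length_from_bytes(size_t nbytes, size_t unit, size_t *out)
{
    if (unit == 0 || nbytes > SIZE_MAX - (unit - 1))
        return -1;
    *out = (nbytes + (unit - 1)) / unit;
    return 0;
}
```

The comparison is done before the addition, so the sum is never computed when 
it would wrap around.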

The goal is to delay the first allocation until the first real write, so that 
the maximum character and the buffer size can be chosen correctly. If the 
buffer is allocated before the first write, even the first write may have to 
widen and/or enlarge the buffer.
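
The effect of delaying the allocation can be shown with a small self-contained 
model. The lazy_writer type and its helpers are hypothetical and much simpler 
than the real writer (no overallocation, no overflow checks on sizes); the 
sketch only shows that a buffer created on the first write already has the 
right character width, so that write needs no widening.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical lazy writer; not the real _PyUnicodeWriter. */
typedef struct {
    void *buf;          /* NULL until the first write */
    size_t capacity;    /* in characters */
    size_t kind;        /* bytes per character: 1, 2 or 4 */
    size_t pos;         /* number of characters written */
    size_t min_length;  /* requested minimum capacity */
    int reallocs;       /* times an existing buffer was replaced */
} lazy_writer;

static size_t kind_for(uint32_t ch)
{
    return ch < 0x100 ? 1 : ch < 0x10000 ? 2 : 4;
}

static uint32_t lazy_get(const lazy_writer *w, size_t i)
{
    if (w->kind == 1) return ((const uint8_t *)w->buf)[i];
    if (w->kind == 2) return ((const uint16_t *)w->buf)[i];
    return ((const uint32_t *)w->buf)[i];
}

static void lazy_set(void *buf, size_t kind, size_t i, uint32_t ch)
{
    if (kind == 1) ((uint8_t *)buf)[i] = (uint8_t)ch;
    else if (kind == 2) ((uint16_t *)buf)[i] = (uint16_t)ch;
    else ((uint32_t *)buf)[i] = ch;
}

/* Append one character; returns 0 on success, -1 if malloc fails. */
static int lazy_write(lazy_writer *w, uint32_t ch)
{
    size_t kind = kind_for(ch);
    if (w->buf == NULL) {
        /* First write: width and size are chosen from real data. */
        size_t cap = w->min_length > 0 ? w->min_length : 1;
        w->buf = malloc(cap * kind);
        if (w->buf == NULL) return -1;
        w->capacity = cap;
        w->kind = kind;
    }
    else if (kind > w->kind || w->pos == w->capacity) {
        /* Later writes may still have to widen and/or enlarge. */
        size_t new_kind = kind > w->kind ? kind : w->kind;
        size_t new_cap = w->pos == w->capacity ? w->capacity * 2
                                               : w->capacity;
        void *nbuf = malloc(new_cap * new_kind);
        if (nbuf == NULL) return -1;
        for (size_t i = 0; i < w->pos; i++)
            lazy_set(nbuf, new_kind, i, lazy_get(w, i));
        free(w->buf);
        w->buf = nbuf;
        w->capacity = new_cap;
        w->kind = new_kind;
        w->reallocs++;
    }
    lazy_set(w->buf, w->kind, w->pos++, ch);
    return 0;
}
```

Starting from a zero-initialized lazy_writer with min_length set to 4, writing 
U+20AC first creates a 2-byte-wide buffer of 4 characters directly. Had the 
buffer been allocated up front with 1-byte characters, that same first write 
would have had to widen it.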

----------
Added file: http://bugs.python.org/file29840/writer_minlen.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17694>
_______________________________________