New submission from STINNER Victor:

In Python 3.3, I added the _PyUnicodeWriter API to factorize the code handling a 
Unicode "buffer": just the code that allocates memory and resizes the buffer 
when needed.

I propose to do the same with a new _PyBytesWriter API. The API is very similar 
to _PyUnicodeWriter:

 * _PyBytesWriter_Init(writer)
 * _PyBytesWriter_Prepare(writer, count)
 * _PyBytesWriter_WriteStr(writer, bytes_obj)
 * _PyBytesWriter_WriteChar(writer, ch)
 * _PyBytesWriter_Finish(writer)
 * _PyBytesWriter_Dealloc(writer)
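The buffer-management pattern this API factorizes could be sketched in plain C as shown here. This is not CPython code and not the patch. The bw_* names, the struct layout and the signatures are hypothetical and were chosen only for illustration. The 500-byte stack buffer and the 25% overallocation follow the notes on performance in this message. encode_ascii_xmlref is an invented example encoder. It shows how an encoder loop reduces to write calls plus a final Finish, or a Dealloc on error.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bytes writer; names and signatures are illustrative only. */
typedef struct {
    char *buf;               /* small_buffer or a heap block */
    size_t pos;              /* bytes written so far */
    size_t alloc;            /* capacity of buf */
    int overallocate;        /* add 25% when growing */
    char small_buffer[500];  /* stack buffer: no malloc for short output */
} bw_writer;

void bw_init(bw_writer *w)
{
    w->buf = w->small_buffer;
    w->pos = 0;
    w->alloc = sizeof(w->small_buffer);
    w->overallocate = 1;
}

/* Make room for `count` more bytes. Returns 0 on success, -1 on error. */
int bw_prepare(bw_writer *w, size_t count)
{
    size_t needed, new_alloc;
    char *p;
    if (count > SIZE_MAX - w->pos)
        return -1;                      /* size overflow */
    needed = w->pos + count;
    if (needed <= w->alloc)
        return 0;                       /* fast path: enough room */
    new_alloc = needed;
    if (w->overallocate && new_alloc <= SIZE_MAX - new_alloc / 4)
        new_alloc += new_alloc / 4;     /* overallocate by 25% */
    if (w->buf == w->small_buffer) {
        if ((p = malloc(new_alloc)) == NULL)
            return -1;
        memcpy(p, w->small_buffer, w->pos);  /* move off the stack */
    } else if ((p = realloc(w->buf, new_alloc)) == NULL) {
        return -1;
    }
    w->buf = p;
    w->alloc = new_alloc;
    return 0;
}

int bw_write_str(bw_writer *w, const char *data, size_t size)
{
    if (bw_prepare(w, size) < 0)
        return -1;
    assert(w->pos + size <= w->alloc);
    memcpy(w->buf + w->pos, data, size);
    w->pos += size;
    return 0;
}

int bw_write_char(bw_writer *w, unsigned char ch)
{
    return bw_write_str(w, (const char *)&ch, 1);
}

void bw_dealloc(bw_writer *w)
{
    if (w->buf != w->small_buffer)
        free(w->buf);
    bw_init(w);
}

/* Return an exact-size heap copy (NULL on error) and release the writer. */
char *bw_finish(bw_writer *w, size_t *size)
{
    char *result = malloc(w->pos ? w->pos : 1);
    if (result != NULL) {
        memcpy(result, w->buf, w->pos);
        *size = w->pos;
    }
    bw_dealloc(w);
    return result;
}

/* Hypothetical encoder: ASCII, other code points written as "&#N;". */
char *encode_ascii_xmlref(const unsigned int *cp, size_t len, size_t *out_size)
{
    bw_writer w;
    char ref[16];

    bw_init(&w);
    for (size_t i = 0; i < len; i++) {
        int n = cp[i] < 128 ? 0 : snprintf(ref, sizeof ref, "&#%u;", cp[i]);
        int rc = n == 0 ? bw_write_char(&w, (unsigned char)cp[i])
                        : bw_write_str(&w, ref, (size_t)n);
        if (rc < 0) {
            bw_dealloc(&w);  /* frees the heap block, if any */
            return NULL;
        }
    }
    return bw_finish(&w, out_size);
}
```

With this split, the encoder never computes sizes or calls realloc itself. Only bw_prepare knows about the stack buffer and the growth policy, which is the kind of duplication the proposed API is meant to remove.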

The patch changes the ASCII, Latin1, UTF-8 and charmap encoders to use the 
_PyBytesWriter API. A second patch changes the CJK encoders.

I did not run a benchmark yet. I wrote the patch to factorize the code, not to 
make it faster.

Notes on performance:

 * I took the "small buffer allocated on the stack" idea from the UTF-8 
encoder, but the small buffer is always 500 bytes (instead of a size depending 
on the maximum character of the input Unicode string)
 * _PyBytesWriter overallocates by 25% (when overallocation is enabled), 
whereas the charmap encoders double the buffer:

    /* exponentially overallocate to minimize reallocations */
    if (requiredsize < 2*outsize)
        requiredsize = 2*outsize;

 * I didn't check whether the allocation sizes are the same with the patch. The 
min_size and overallocate attributes must be set correctly so that the code does 
not become slower.
 * The code writing a single character into a _PyUnicodeWriter buffer is 
inlined in unicodeobject.c. The _PyBytesWriter API does not provide an inlined 
function for the same purpose.
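The trade-off between the two growth policies in the second note can be made concrete with a small model. It counts how many reallocations each policy performs when bytes are appended one at a time, and records the final capacity. count_resizes is a hypothetical helper written for illustration; it is not a benchmark of the patch. Real encoders may prepare larger chunks at once, and the min_size handling is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: append `total` bytes one at a time to a buffer of
   capacity `initial`; return the number of resizes and store the final
   capacity in *final_alloc.
   doubling != 0: charmap style, at least twice the old capacity;
   doubling == 0: required size plus 25% overallocation. */
size_t count_resizes(size_t initial, size_t total, int doubling,
                     size_t *final_alloc)
{
    size_t alloc = initial, resizes = 0;

    assert(final_alloc != NULL);
    for (size_t pos = 0; pos < total; pos++) {
        size_t needed = pos + 1;           /* size after this append */
        if (needed <= alloc)
            continue;                      /* still fits */
        if (doubling)
            alloc = needed < 2 * alloc ? 2 * alloc : needed;
        else
            alloc = needed + needed / 4;
        resizes++;
    }
    *final_alloc = alloc;
    return resizes;
}
```

Start from a 500-byte buffer, the stack-buffer size mentioned in the notes. Growing to 1,000 bytes takes 4 resizes with 25% overallocation (final capacity 1,226). With doubling it takes a single resize (final capacity 1,000). Growing to 1,001 bytes makes doubling jump to 2,000 bytes. The trade-off is fewer reallocations versus less wasted memory.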

----------
files: bytes_writer.patch
keywords: patch
messages: 187035
nosy: haypo, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Add _PyBytesWriter API
versions: Python 3.4
Added file: http://bugs.python.org/file29877/bytes_writer.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17742>
_______________________________________
_______________________________________________
Python-bugs-list mailing list