New submission from STINNER Victor:

The attached patch modifies the dict_repr() function to use the _PyUnicodeWriter API instead of building a list of short strings with PyUnicode_AppendAndDel() and calling PyUnicode_Join() at the end to join the list. PyUnicode_Append() is inefficient because it has to allocate a new string instead of reusing the same buffer.
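To make the cost argument concrete, here is a toy model in Python. It is not dict_repr() or the real C code: it only counts how many characters get copied when every append allocates a new string, compared with appending into one buffer that is grown with some slack. The function names and the 25% growth step are assumptions chosen for illustration, not values taken from the patch.

```python
def copies_with_new_string_per_append(pieces):
    """Count characters copied if every append allocates a new string."""
    copied = 0
    length = 0
    for piece in pieces:
        # The new string holds the old contents plus the new piece,
        # so everything written so far is copied again.
        length += len(piece)
        copied += length
    return copied


def copies_with_overallocated_buffer(pieces):
    """Count characters copied when one buffer is reused and overallocated.

    Assumption: the buffer grows by 25% beyond what is needed. The final
    shrink is treated as free, which is a simplification.
    """
    copied = 0
    length = 0
    capacity = 0
    for piece in pieces:
        needed = length + len(piece)
        if needed > capacity:
            # Growing moves the existing contents to a larger buffer.
            copied += length
            capacity = needed + needed // 4
        # Writing the piece itself into the buffer.
        copied += len(piece)
        length = needed
    return copied


if __name__ == "__main__":
    pieces = ["ab"] * 1000
    print("new string per append:", copies_with_new_string_per_append(pieces))
    print("overallocated buffer: ", copies_with_overallocated_buffer(pieces))
```

In this model the first count grows quadratically with the number of pieces, while the second stays within a small constant factor of the final length. Real timings depend on the allocator and on the actual growth policy.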
The _PyUnicodeWriter API has a different design. It overallocates a buffer to write Unicode characters into and shrinks the buffer at the end. It is faster according to my micro benchmark.

$ ./python ~/prog/HG/misc/python/benchmark.py compare_to pyaccu writer
Common platform:
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Python unicode implementation: PEP 393
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Timer precision: 40 ns
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Platform: Linux-3.9.4-200.fc18.x86_64-x86_64-with-fedora-18-Spherical_Cow
Bits: int=32, long=64, long long=64, size_t=64, void*=64
Timer: time.perf_counter

Platform of campaign pyaccu:
Date: 2013-11-18 21:37:44
Python version: 3.4.0a4+ (default:fc7ceb001eec, Nov 18 2013, 21:29:41) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
SCM: hg revision=fc7ceb001eec tag=tip branch=default date="2013-11-18 21:11 +0100"

Platform of campaign writer:
Date: 2013-11-18 22:10:40
Python version: 3.4.0a4+ (default:fc7ceb001eec+, Nov 18 2013, 22:10:12) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
SCM: hg revision=fc7ceb001eec+ tag=tip branch=default date="2013-11-18 21:11 +0100"

--------------------------------------+-------------+--------------
Tests                                 |      pyaccu |        writer
--------------------------------------+-------------+--------------
{"a": 1}                              |  603 ns (*) | 496 ns (-18%)
dict(zip("abc", range(3)))            | 1.05 us (*) | 904 ns (-14%)
{"%03d":"abc" for k in range(10)}     |  631 ns (*) | 501 ns (-21%)
{"%100d":"abc" for k in range(10)}    |  660 ns (*) | 484 ns (-27%)
{k:"a" for k in range(10**3)}         |  235 us (*) | 166 us (-30%)
{k:"abc" for k in range(10**3)}       |  245 us (*) | 177 us (-28%)
{"%100d":"abc" for k in range(10**3)} |  668 ns (*) | 478 ns (-28%)
{k:"a" for k in range(10**6)}         |  258 ms (*) | 186 ms (-28%)
{k:"abc" for k in range(10**6)}       |  265 ms (*) | 184 ms (-31%)
{"%100d":"abc" for k in range(10**6)} |  652 ns (*) | 489 ns (-25%)
--------------------------------------+-------------+--------------
Total                                 |  523 ms (*) | 369 ms (-29%)
--------------------------------------+-------------+--------------

----------
components: Unicode
files: dict_repr_writer.patch
keywords: patch
messages: 203322
nosy: ezio.melotti, haypo, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Use PyUnicodeWriter in repr(dict)
type: enhancement
versions: Python 3.4

Added file: http://bugs.python.org/file32694/dict_repr_writer.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19646>
_______________________________________