STINNER Victor added the comment:

I split my patch into two parts:
- calloc-4.patch: add new "Calloc" functions, including _PyObject_GC_Calloc()
- use_calloc.patch: patch types (bytes, dict, list, set, tuple, etc.) and various modules to use calloc

I reverted my changes to _PyObject_GC_Malloc() and added _PyObject_GC_Calloc() instead: the performance regressions are gone. Creating a large tuple is a little bit (8%) faster. But the real speedup comes when building a large bytes string of null bytes:

$ ./python.orig -m timeit 'bytes(50*1024*1024)'
100 loops, best of 3: 5.7 msec per loop
$ ./python.calloc -m timeit 'bytes(50*1024*1024)'
100000 loops, best of 3: 4.12 usec per loop

On Linux, no physical memory is allocated, even if you read the bytes content: RSS is almost unchanged.

OK, now the real use case where it becomes faster: I implemented the same optimization for bytearray.

$ ./python.orig -m timeit 'bytearray(50*1024*1024)'
100 loops, best of 3: 6.33 msec per loop
$ ./python.calloc -m timeit 'bytearray(50*1024*1024)'
100000 loops, best of 3: 4.09 usec per loop

If you overallocate a bytearray and only write a few bytes, the unused pages at the end of the bytearray are never allocated (at least on Linux).

Result of bench_alloc.py comparing the original Python to the patched Python (calloc-4.patch + use_calloc.patch):

Common platform:
SCM: hg revision=4b97092aa4bd+ tag=tip branch=default date="2014-04-27 18:02 +0100"
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Python unicode implementation: PEP 393
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Bits: int=32, long=64, long long=64, size_t=64, void*=64
Timer: time.perf_counter
CPU model: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
Platform: Linux-3.13.9-200.fc20.x86_64-x86_64-with-fedora-20-Heisenbug

Platform of campaign orig:
Timer precision: 42 ns
Date: 2014-04-28 00:27:19
Python version: 3.5.0a0 (default:4b97092aa4bd, Apr 28 2014, 00:24:03) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]

Platform of campaign calloc:
Timer precision: 54 ns
Date: 2014-04-28 00:28:35
Python version: 3.5.0a0 (default:4b97092aa4bd+, Apr 28 2014, 00:25:56) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]

-----------------------------------+-------------+--------------
Tests                              | orig        | calloc
-----------------------------------+-------------+--------------
object()                           | 61 ns (*)   | 71 ns (+16%)
b'A' * 10                          | 54 ns (*)   | 52 ns
b'A' * 10**3                       | 124 ns (*)  | 110 ns (-12%)
b'A' * 10**6                       | 38.4 us (*) | 38.5 us
'A' * 10                           | 59 ns (*)   | 62 ns
'A' * 10**3                        | 132 ns (*)  | 107 ns (-19%)
'A' * 10**6                        | 38.5 us (*) | 38.5 us
'A' * 10**8                        | 10.3 ms (*) | 10.6 ms
decode 10 null bytes from ASCII    | 264 ns (*)  | 263 ns
decode 10**3 null bytes from ASCII | 403 ns (*)  | 379 ns (-6%)
decode 10**6 null bytes from ASCII | 80.5 us (*) | 80.5 us
decode 10**8 null bytes from ASCII | 17.7 ms (*) | 17.3 ms
(None,) * 10**0                    | 29 ns (*)   | 28 ns
(None,) * 10**1                    | 75 ns (*)   | 76 ns
(None,) * 10**2                    | 461 ns (*)  | 460 ns
(None,) * 10**3                    | 3.6 us (*)  | 3.57 us
(None,) * 10**4                    | 35.7 us (*) | 35.7 us
(None,) * 10**5                    | 364 us (*)  | 365 us
(None,) * 10**6                    | 4.12 ms (*) | 4.11 ms
(None,) * 10**7                    | 43.5 ms (*) | 40.3 ms (-7%)
(None,) * 10**8                    | 433 ms (*)  | 400 ms (-8%)
([None] * 10)[1:-1]                | 121 ns (*)  | 134 ns (+11%)
([None] * 10**3)[1:-1]             | 3.62 us (*) | 3.61 us
([None] * 10**6)[1:-1]             | 4.24 ms (*) | 4.22 ms
([None] * 10**8)[1:-1]             | 440 ms (*)  | 402 ms (-9%)
-----------------------------------+-------------+--------------
Total                              | 954 ms (*)  | 880 ms (-8%)
-----------------------------------+-------------+--------------

----------
Added file: http://bugs.python.org/file35063/calloc-4.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21233>
_______________________________________
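The RSS claim for bytes(n) can be checked from Python. The sketch below uses hypothetical helpers: rss_bytes and rss_growth are invented names and are not part of the patch. It makes three assumptions:

- Linux, since it reads /proc/self/statm.
- glibc's malloc.
- A CPython in which bytes(n) is served by calloc, as in this patch.

It compares the RSS growth caused by bytes(n) with that of b"\x00" * n, which presumably fills a malloc'ed buffer with memset. Both buffers are read one byte per page.

```python
import mmap

# 64 MiB: assumed to be above glibc's mmap threshold, so the block comes from mmap
SIZE = 64 * 1024 * 1024


def rss_bytes() -> int:
    """Current resident set size in bytes (Linux only: reads /proc/self/statm)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * mmap.PAGESIZE


def rss_growth(make_zeros, size: int = SIZE) -> int:
    """RSS growth after building a zero-filled buffer and reading one byte per page."""
    before = rss_bytes()
    buf = make_zeros(size)
    # Indexing a bytes object returns an int, so this reads without copying.
    for offset in range(0, size, mmap.PAGESIZE):
        if buf[offset] != 0:
            raise ValueError("buffer is not zero-filled")
    grown = rss_bytes() - before
    del buf  # release the buffer before the caller takes another measurement
    return grown


if __name__ == "__main__":
    mib = 1024 * 1024
    # bytes(n) may be served by calloc; b"\x00" * n is (presumably) malloc + memset.
    print("bytes(n):      %d MiB" % (rss_growth(bytes) // mib))
    print('b"\\x00" * n:  %d MiB' % (rss_growth(lambda n: b"\x00" * n) // mib))
```

On such a system, the first figure should stay close to zero, while the second should grow by roughly the full 64 MiB. Reading an untouched anonymous page maps the kernel's shared zero page, which is why reading the content does not raise RSS either.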
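bench_alloc.py is not included in this message. As a rough stand-in, the hypothetical helpers below (best_per_loop, format_time, format_change and format_row are invented names) do two things:

- time a statement with the standard timeit module;
- format a row the way the table above does, with the orig column as the reference marked (*) and the percentage giving the change relative to it.

The 5% cut-off, below which no percentage is shown, is inferred from the table rather than taken from the script.

```python
import timeit

# Assumption: the table seems to omit relative changes smaller than about 5%.
THRESHOLD = 0.05


def best_per_loop(stmt: str, number: int = 100, repeat: int = 3) -> float:
    """Best-of-`repeat` time in seconds for a single execution of `stmt`."""
    return min(timeit.repeat(stmt, number=number, repeat=repeat)) / number


def format_time(seconds: float) -> str:
    """Format a duration with 3 significant digits, e.g. '38.4 us' or '433 ms'."""
    seconds = float(f"{seconds:.3g}")  # round first, so 999.6 ns becomes '1 us'
    for unit, scale in (("s", 1.0), ("ms", 1e-3), ("us", 1e-6)):
        if seconds >= scale:
            return f"{seconds / scale:.3g} {unit}"
    return f"{seconds / 1e-9:.3g} ns"


def format_change(ref: float, new: float) -> str:
    """Relative change versus the reference timing, or '' below THRESHOLD."""
    ratio = (new - ref) / ref
    return f"({ratio:+.0%})" if abs(ratio) >= THRESHOLD else ""


def format_row(name: str, ref: float, new: float) -> str:
    """One table row: reference column marked (*), then the new timing."""
    orig = f"{format_time(ref)} (*)"
    calloc = f"{format_time(new)} {format_change(ref, new)}".rstrip()
    return f"{name:<35}| {orig:<12}| {calloc}"


if __name__ == "__main__":
    # This only times the running interpreter; run it once per build to compare.
    t = best_per_loop("bytes(50*1024*1024)")
    print("bytes(50*1024*1024):", format_time(t))
```

The table's percentages were presumably computed from unrounded timings, so recomputing them from the rounded values can be off by a point. For example, 110 ns versus 124 ns gives -11%, while the table shows -12%.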