Serhiy Storchaka <storchaka+cpyt...@gmail.com> added the comment:
As for whether the optimization depends on the size of the CPU cache, I have repeated the microbenchmarks on a computer with a 6 MiB cache and on two computers with 512 KiB caches (64- and 32-bit).

Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (cache size: 6144 KB):

+---------------------------------+----------+------------------------------+
| Benchmark                       | baseline | inline                       |
+=================================+==========+==============================+
| round_(4.2)                     | 113 ns   | 81.3 ns: 1.39x faster (-28%) |
+---------------------------------+----------+------------------------------+
| sum_(())                        | 83.8 ns  | 56.7 ns: 1.48x faster (-32%) |
+---------------------------------+----------+------------------------------+
| sum_(a)                         | 98.0 ns  | 72.1 ns: 1.36x faster (-26%) |
+---------------------------------+----------+------------------------------+
| 'abc'.split()                   | 107 ns   | 83.1 ns: 1.29x faster (-22%) |
+---------------------------------+----------+------------------------------+
| b'abc'.split()                  | 101 ns   | 75.4 ns: 1.34x faster (-25%) |
+---------------------------------+----------+------------------------------+
| 'abc'.split('-')                | 123 ns   | 89.9 ns: 1.37x faster (-27%) |
+---------------------------------+----------+------------------------------+
| 'abc'.encode()                  | 79.6 ns  | 59.2 ns: 1.34x faster (-26%) |
+---------------------------------+----------+------------------------------+
| b'abc'.decode()                 | 105 ns   | 84.7 ns: 1.24x faster (-20%) |
+---------------------------------+----------+------------------------------+
| int_(4.2)                       | 88.9 ns  | 64.1 ns: 1.39x faster (-28%) |
+---------------------------------+----------+------------------------------+
| int_('5')                       | 137 ns   | 108 ns: 1.28x faster (-22%)  |
+---------------------------------+----------+------------------------------+
| 42 .to_bytes(2, 'little')       | 113 ns   | 77.6 ns: 1.45x faster (-31%) |
+---------------------------------+----------+------------------------------+
| int_from_bytes(b'ab', 'little') | 83.4 ns  | 51.5 ns: 1.62x faster (-38%) |
+---------------------------------+----------+------------------------------+
| struct_i32_unpack_from(b'abcd') | 96.0 ns  | 71.6 ns: 1.34x faster (-25%) |
+---------------------------------+----------+------------------------------+
| re_word_match('a')              | 221 ns   | 180 ns: 1.22x faster (-18%)  |
+---------------------------------+----------+------------------------------+
| datetime_now()                  | 282 ns   | 248 ns: 1.14x faster (-12%)  |
+---------------------------------+----------+------------------------------+

Not significant (1): zlib_compress(b'abc')

AMD Athlon(tm) 64 X2 Dual Core Processor 4600+ (cache size: 512 KB):

+---------------------------------+----------+-----------------------------+
| Benchmark                       | baseline | inline                      |
+=================================+==========+=============================+
| round_(4.2)                     | 391 ns   | 272 ns: 1.44x faster (-31%) |
+---------------------------------+----------+-----------------------------+
| sum_(())                        | 212 ns   | 160 ns: 1.32x faster (-24%) |
+---------------------------------+----------+-----------------------------+
| sum_(a)                         | 256 ns   | 211 ns: 1.21x faster (-18%) |
+---------------------------------+----------+-----------------------------+
| 'abc'.split()                   | 290 ns   | 233 ns: 1.25x faster (-20%) |
+---------------------------------+----------+-----------------------------+
| b'abc'.split()                  | 263 ns   | 226 ns: 1.16x faster (-14%) |
+---------------------------------+----------+-----------------------------+
| 'abc'.split('-')                | 316 ns   | 262 ns: 1.21x faster (-17%) |
+---------------------------------+----------+-----------------------------+
| 'abc'.encode()                  | 197 ns   | 154 ns: 1.28x faster (-22%) |
+---------------------------------+----------+-----------------------------+
| b'abc'.decode()                 | 303 ns   | 250 ns: 1.21x faster (-18%) |
+---------------------------------+----------+-----------------------------+
| int_(4.2)                       | 234 ns   | 171 ns: 1.37x faster (-27%) |
+---------------------------------+----------+-----------------------------+
| int_('5')                       | 372 ns   | 310 ns: 1.20x faster (-17%) |
+---------------------------------+----------+-----------------------------+
| 42 .to_bytes(2, 'little')       | 370 ns   | 245 ns: 1.51x faster (-34%) |
+---------------------------------+----------+-----------------------------+
| int_from_bytes(b'ab', 'little') | 251 ns   | 167 ns: 1.50x faster (-33%) |
+---------------------------------+----------+-----------------------------+
| struct_i32_unpack_from(b'abcd') | 252 ns   | 202 ns: 1.24x faster (-20%) |
+---------------------------------+----------+-----------------------------+
| re_word_match('a')              | 625 ns   | 524 ns: 1.19x faster (-16%) |
+---------------------------------+----------+-----------------------------+
| datetime_now()                  | 2.05 us  | 1.99 us: 1.03x faster (-3%) |
+---------------------------------+----------+-----------------------------+
| zlib_compress(b'abc')           | 28.6 us  | 28.0 us: 1.02x faster (-2%) |
+---------------------------------+----------+-----------------------------+

Intel(R) Atom(TM) CPU N570 @ 1.66GHz (cache size: 512 KB), 32-bit:

+---------------------------------+----------+------------------------------+
| Benchmark                       | baseline | inline                       |
+=================================+==========+==============================+
| round_(4.2)                     | 1.95 us  | 1.29 us: 1.51x faster (-34%) |
+---------------------------------+----------+------------------------------+
| sum_(())                        | 1.15 us  | 821 ns: 1.40x faster (-29%)  |
+---------------------------------+----------+------------------------------+
| sum_(a)                         | 1.32 us  | 1.02 us: 1.30x faster (-23%) |
+---------------------------------+----------+------------------------------+
| 'abc'.split()                   | 1.32 us  | 1.11 us: 1.19x faster (-16%) |
+---------------------------------+----------+------------------------------+
| b'abc'.split()                  | 1.22 us  | 1.03 us: 1.18x faster (-15%) |
+---------------------------------+----------+------------------------------+
| 'abc'.split('-')                | 1.78 us  | 1.15 us: 1.54x faster (-35%) |
+---------------------------------+----------+------------------------------+
| 'abc'.encode()                  | 1.05 us  | 883 ns: 1.19x faster (-16%)  |
+---------------------------------+----------+------------------------------+
| b'abc'.decode()                 | 1.34 us  | 1.17 us: 1.15x faster (-13%) |
+---------------------------------+----------+------------------------------+
| int_(4.2)                       | 1.23 us  | 859 ns: 1.43x faster (-30%)  |
+---------------------------------+----------+------------------------------+
| int_('5')                       | 2.20 us  | 1.41 us: 1.56x faster (-36%) |
+---------------------------------+----------+------------------------------+
| 42 .to_bytes(2, 'little')       | 1.45 us  | 1.09 us: 1.33x faster (-25%) |
+---------------------------------+----------+------------------------------+
| int_from_bytes(b'ab', 'little') | 1.07 us  | 737 ns: 1.45x faster (-31%)  |
+---------------------------------+----------+------------------------------+
| struct_i32_unpack_from(b'abcd') | 1.31 us  | 1.08 us: 1.21x faster (-18%) |
+---------------------------------+----------+------------------------------+
| re_word_match('a')              | 2.85 us  | 2.06 us: 1.39x faster (-28%) |
+---------------------------------+----------+------------------------------+
| datetime_now()                  | 6.20 us  | 5.92 us: 1.05x faster (-4%)  |
+---------------------------------+----------+------------------------------+
| zlib_compress(b'abc')           | 28.7 us  | 26.9 us: 1.07x faster (-6%)  |
+---------------------------------+----------+------------------------------+

The speedup is significant on all computers.
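
For readers checking the numbers: the "inline" column gives each result as a speedup factor and a relative change in run time. The following is only a minimal sketch of that arithmetic, assuming the factor is baseline divided by inline and the percentage is the change relative to baseline. The function name describe_change is hypothetical and not part of any benchmarking tool. The tables show rounded timings, so recomputing from them can differ by one in the last digit on some rows.

```python
def describe_change(baseline: float, inline: float) -> str:
    """Format a timing change like the 'inline' column of the tables.

    Hypothetical helper for illustration only. Both timings must use
    the same unit, and inline is assumed to be the smaller one.
    """
    ratio = baseline / inline                      # speedup factor
    percent = (inline - baseline) / baseline * 100 # relative change in time
    return f"{ratio:.2f}x faster ({percent:+.0f}%)"


# round_(4.2) on the i7-6700HQ: 113 ns baseline, 81.3 ns inline
print(describe_change(113, 81.3))  # 1.39x faster (-28%)
```

The two figures describe the same change in different ways: a 1.39x speedup corresponds to a 28% shorter run time, not a 39% shorter one.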
----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36127>
_______________________________________