On 2019-07-09, Inada Naoki wrote:
> PyObject_Malloc inlines pymalloc_alloc, and PyObject_Free inlines
> pymalloc_free.
> But compiler doesn't know which is the hot part in pymalloc_alloc and
> pymalloc_free.

Hello Inada,

I don't see this on my PC.  I'm using GCC 8.3.0.  I have configured
the build with --enable-optimizations.  To speed up the profile
generation, I have changed PROFILE_TASK to only run these tests:

    test_shelve test_set test_pprint test_pickletools
    test_ordered_dict test_tabnanny test_difflib test_pickle
    test_json test_collections

I haven't spent much time trying to figure out what set of tests is
best but the above set runs pretty quickly and seems to work okay.

I have run pyperformance to compare CPython 'master' with your PR
14674.  There doesn't seem to be a difference (table below).  If I
look at the disassembly, it seems that the hot paths of pymalloc_alloc
and pymalloc_free are being inlined as you would hope, without needing
the LIKELY/UNLIKELY annotations.

OTOH, your addition of LIKELY() and UNLIKELY() in the PR is a pretty
small change and probably doesn't hurt anything.  So, I think it would
be fine to merge it.

Regards,

  Neil

+-------------------------+---------+-----------------------------+
| Benchmark               | master  | PR-14674                    |
+=========================+=========+=============================+
| 2to3                    | 305 ms  | 304 ms: 1.00x faster (-0%)  |
+-------------------------+---------+-----------------------------+
| chaos                   | 109 ms  | 110 ms: 1.01x slower (+1%)  |
+-------------------------+---------+-----------------------------+
| crypto_pyaes            | 118 ms  | 117 ms: 1.01x faster (-1%)  |
+-------------------------+---------+-----------------------------+
| django_template         | 112 ms  | 114 ms: 1.02x slower (+2%)  |
+-------------------------+---------+-----------------------------+
| fannkuch                | 446 ms  | 440 ms: 1.01x faster (-1%)  |
+-------------------------+---------+-----------------------------+
| float                   | 119 ms  | 120 ms: 1.01x slower (+1%)  |
+-------------------------+---------+-----------------------------+
| go                      | 247 ms  | 250 ms: 1.01x slower (+1%)  |
+-------------------------+---------+-----------------------------+
| json_loads              | 25.1 us | 24.4 us: 1.03x faster (-3%) |
+-------------------------+---------+-----------------------------+
| logging_simple          | 8.86 us | 8.66 us: 1.02x faster (-2%) |
+-------------------------+---------+-----------------------------+
| meteor_contest          | 97.5 ms | 97.7 ms: 1.00x slower (+0%) |
+-------------------------+---------+-----------------------------+
| nbody                   | 140 ms  | 142 ms: 1.01x slower (+1%)  |
+-------------------------+---------+-----------------------------+
| pathlib                 | 19.2 ms | 18.9 ms: 1.01x faster (-1%) |
+-------------------------+---------+-----------------------------+
| pickle                  | 8.95 us | 9.08 us: 1.02x slower (+2%) |
+-------------------------+---------+-----------------------------+
| pickle_dict             | 18.1 us | 18.0 us: 1.01x faster (-1%) |
+-------------------------+---------+-----------------------------+
| pickle_list             | 2.75 us | 2.68 us: 1.03x faster (-3%) |
+-------------------------+---------+-----------------------------+
| pidigits                | 182 ms  | 184 ms: 1.01x slower (+1%)  |
+-------------------------+---------+-----------------------------+
| python_startup          | 7.83 ms | 7.81 ms: 1.00x faster (-0%) |
+-------------------------+---------+-----------------------------+
| python_startup_no_site  | 5.36 ms | 5.36 ms: 1.00x faster (-0%) |
+-------------------------+---------+-----------------------------+
| raytrace                | 495 ms  | 499 ms: 1.01x slower (+1%)  |
+-------------------------+---------+-----------------------------+
| regex_dna               | 173 ms  | 170 ms: 1.01x faster (-1%)  |
+-------------------------+---------+-----------------------------+
| regex_effbot            | 2.79 ms | 2.67 ms: 1.05x faster (-4%) |
+-------------------------+---------+-----------------------------+
| regex_v8                | 21.1 ms | 21.2 ms: 1.00x slower (+0%) |
+-------------------------+---------+-----------------------------+
| richards                | 68.2 ms | 68.7 ms: 1.01x slower (+1%) |
+-------------------------+---------+-----------------------------+
| scimark_monte_carlo     | 103 ms  | 102 ms: 1.01x faster (-1%)  |
+-------------------------+---------+-----------------------------+
| scimark_sparse_mat_mult | 4.37 ms | 4.35 ms: 1.00x faster (-0%) |
+-------------------------+---------+-----------------------------+
| spectral_norm           | 132 ms  | 133 ms: 1.01x slower (+1%)  |
+-------------------------+---------+-----------------------------+
| sqlalchemy_imperative   | 30.3 ms | 30.7 ms: 1.01x slower (+1%) |
+-------------------------+---------+-----------------------------+
| sympy_sum               | 88.2 ms | 89.2 ms: 1.01x slower (+1%) |
+-------------------------+---------+-----------------------------+
| telco                   | 6.63 ms | 6.58 ms: 1.01x faster (-1%) |
+-------------------------+---------+-----------------------------+
| tornado_http            | 178 ms  | 179 ms: 1.01x slower (+1%)  |
+-------------------------+---------+-----------------------------+
| unpickle                | 12.0 us | 12.4 us: 1.03x slower (+3%) |
+-------------------------+---------+-----------------------------+
| unpickle_list           | 3.93 us | 3.75 us: 1.05x faster (-4%) |
+-------------------------+---------+-----------------------------+

Not significant (25): deltablue; dulwich_log; hexiom; html5lib;
json_dumps; logging_format; logging_silent; mako; nqueens;
pickle_pure_python; regex_compile; scimark_fft; scimark_lu;
scimark_sor; sqlalchemy_declarative; sqlite_synth; sympy_expand;
sympy_integrate; sympy_str; unpack_sequence; unpickle_pure_python;
xml_etree_parse; xml_etree_iterparse; xml_etree_generate;
xml_etree_process

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/6E44YQ4EOFCO6CNYFXT7PQJUCCFR5YXS/
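
For readers unfamiliar with the LIKELY()/UNLIKELY() annotations discussed in this message, here is a minimal sketch of how such branch hints are commonly written on top of GCC's `__builtin_expect`. The macro definitions, the `toy_pool` type, and the `toy_alloc` functions are illustrative assumptions, not the code from the PR or from pymalloc. The point is only that the hint marks the "pool exhausted" branch as cold. With profile-guided optimization, as in an --enable-optimizations build, the compiler may already learn this from the profile data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hint macros: the names follow the email, but these
 * definitions are an assumption, not copied from the PR. */
#if defined(__GNUC__) || defined(__clang__)
#  define LIKELY(x)   __builtin_expect(!!(x), 1)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define LIKELY(x)   (x)
#  define UNLIKELY(x) (x)
#endif

#define TOY_BLOCK_SIZE 16
#define TOY_NBLOCKS    4

/* Toy fixed-size pool; NOT pymalloc's real data structure. */
typedef struct {
    unsigned char storage[TOY_NBLOCKS][TOY_BLOCK_SIZE];
    size_t next_free;  /* index of the next unused block */
} toy_pool;

/* Cold path: the pool is exhausted.  A real allocator would obtain a
 * new pool here; this sketch simply reports failure. */
static void *toy_alloc_slow(toy_pool *pool)
{
    (void)pool;
    return NULL;
}

/* Hot path: hand out the next block.  The UNLIKELY() hint tells the
 * compiler that exhaustion is rare, so it can keep the common case on
 * the fall-through path when this function is inlined. */
static inline void *toy_alloc(toy_pool *pool)
{
    assert(pool != NULL);
    if (UNLIKELY(pool->next_free >= TOY_NBLOCKS)) {
        return toy_alloc_slow(pool);
    }
    return pool->storage[pool->next_free++];
}
```

Comparing the disassembly of a caller with and without the hint, as described above, is one way to see whether the annotation changes the generated code at all.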