Jeethu Rao <jee...@jeethurao.com> added the comment:

Victor: I’m booting with the isolcpus and rcu_nocbs flags and running 
pyperformance with the --affinity flag to pin the benchmark to the isolated CPU 
cores. I’ve also run `perf system tune`. The OS is Ubuntu 17.10. Thanks for 
the tip on using perf timeit instead of timeit. I’ve run the benchmark that 
you suggested with a minor change (to avoid the cost of LOAD_ATTR) and 
attached the output in a gist[1].
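
For readers unfamiliar with the LOAD_ATTR point, here is a minimal sketch of 
the idea, not the benchmark from the gist. The statements, the list size and 
the helper names (opnames, best_time) are hypothetical, and it uses the 
standard-library timeit only to stay dependency-free; the runs described 
above used perf timeit. Binding the method to a name in the setup means the 
timed statement no longer pays for an attribute lookup on every iteration.

```python
import dis
import timeit

# Hypothetical workload: statements and list size are illustrative only.
SETUP_ATTR = "l = list(range(100))"
STMT_ATTR = "l.insert(0, None); l.pop(0)"

# Same workload, but the bound methods are looked up once in the setup,
# so the timed statement contains no attribute lookup.
SETUP_BOUND = "l = list(range(100)); insert = l.insert; pop = l.pop"
STMT_BOUND = "insert(0, None); pop(0)"


def opnames(stmt):
    """Return the bytecode instruction names that `stmt` compiles to."""
    code = compile(stmt, "<stmt>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]


def best_time(stmt, setup, number=100_000, repeat=5):
    """Return the best per-iteration time in seconds over `repeat` runs."""
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(timings) / number


if __name__ == "__main__":
    for label, stmt, setup in (
        ("attribute lookup", STMT_ATTR, SETUP_ATTR),
        ("pre-bound method", STMT_BOUND, SETUP_BOUND),
    ):
        # Depending on the Python version, the method lookup shows up as
        # LOAD_ATTR or LOAD_METHOD in the first variant only.
        lookups = [n for n in opnames(stmt) if n in ("LOAD_ATTR", "LOAD_METHOD")]
        print(f"{label}: {best_time(stmt, setup) * 1e9:.1f} ns, lookups={lookups}")
```

The absolute numbers from a sketch like this are noisy; the point is only 
that the second variant isolates the cost of the insert itself.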

Antoine: Thanks for benchmarking it. After looking at the generated 
assembly[2], I found that ins1 was being inlined and that the call to memmove 
appeared before the loop (possibly because the compiler assumed that the call 
to memmove was the more likely path). I made a minor change and increased the 
threshold to 32. I’ve attached the newly generated assembly in a gist[3] (the 
relevant sequence is around line 8406, if you’re interested), and here’s the 
pyperformance comparison[4]. Could you please try benchmarking this version on 
your machine?

[1]: https://gist.github.com/jeethu/2d2de55afdb8ea4ad03b6a5d04d5227f
[2]: Generated with `gcc -DNDEBUG -fwrapv -O3 -std=c99 -I. -I./Include 
-DPy_BUILD_CORE -S -masm=intel Objects/listobject.c`
[3]: https://gist.github.com/jeethu/596bfc1251590bc51cc230046b52fb38
[4]: https://gist.github.com/jeethu/d6e4045f7932136548a806380eddd030

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32534>
_______________________________________