Bruce Merry <bme...@gmail.com> added the comment:

I've attached a benchmark script and CSV results for master (whichever version 
that was at the point I forked) and with unconditional dropping of the GIL. It 
shows up to a 3x performance improvement when using 4 threads. That's on my home 
desktop, which is quite old (Sandy Bridge). I'm expecting more significant 
gains on server CPUs, whose memory systems are optimised for multi-threaded 
workloads. The columns are chunk size, number of chunks, number of threads, and 
per-thread throughput.
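
For reference, a minimal sketch of the kind of benchmark described above (the attached script may differ; the function names and the repeat count here are my own, just to illustrate the measurement):

    import threading
    import time

    def worker(chunks, repeats, results, idx):
        # Join the same chunks over and over; record this thread's throughput.
        start = time.perf_counter()
        for _ in range(repeats):
            b"".join(chunks)
        elapsed = time.perf_counter() - start
        results[idx] = sum(len(c) for c in chunks) * repeats / elapsed

    def bench(chunk_size, n_chunks, n_threads, repeats=100):
        results = [0.0] * n_threads
        threads = []
        for i in range(n_threads):
            # Give each thread its own copy so the working sets are independent.
            chunks = [bytes(chunk_size) for _ in range(n_chunks)]
            threads.append(threading.Thread(
                target=worker, args=(chunks, repeats, results, i)))
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results  # per-thread throughput in bytes/second

    if __name__ == "__main__":
        for n_threads in (1, 2, 4):
            print(65536, 16, n_threads, bench(65536, 16, n_threads))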

There are also cases where using multiple threads causes a slowdown, but I think 
that's an artifact of the benchmark: it repeatedly joins the same strings, so 
performance is higher when they all fit in the cache; when using 4 threads that 
execute in parallel, the working set is 4x larger and may cease to fit in 
cache. In real-world usage one is unlikely to be joining the same strings again 
and again.

In the single-threaded case, the benchmark seems to show that dropping the GIL 
improves performance for sizes of 64K and above (I'm guessing that improvement is 
statistical noise, since nothing else should be contending for the GIL), which is 
my reasoning behind the 65536 threshold.
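
As a rough illustration of how one might probe that single-threaded crossover (the sizes and helper below are illustrative, not taken from the attached script or CSV), run the same sweep on builds with and without the change:

    import timeit

    def join_throughput(chunk_size, n_chunks, number=1000):
        chunks = [bytes(chunk_size) for _ in range(n_chunks)]
        seconds = timeit.timeit(lambda: b"".join(chunks), number=number)
        return chunk_size * n_chunks * number / seconds  # bytes/second

    for total in (16 * 1024, 64 * 1024, 256 * 1024, 1024 * 1024):
        print(total, join_throughput(total // 16, 16))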

I'll take a look at extra unit tests soon. Do you know off the top of your head 
where to look for existing `join` tests to add to?

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36051>
_______________________________________