Bruce Merry <bme...@gmail.com> added the comment:

I'm realising that the benchmark makes it difficult to see what's going on 
because it doesn't separate overhead costs (slowdowns because 
releasing/acquiring the GIL is not free, particularly when contended) from 
cache effects (slowdowns due to parallel threads creating more cache pressure 
than threads that take turns). inada.naoki's version of the benchmark is better 
here because it uses the same input data for all the threads, but the output 
data will still be different in each thread.
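To make the shared-input idea concrete, here is a minimal sketch of a benchmark in that spirit. It assumes the operation under discussion is a large bytes.join (this is issue36051); all sizes, repeat counts, and names here are illustrative, not the actual benchmark code. Every thread joins the same chunk list, so only the per-call output buffer differs between threads.

```python
# Hedged sketch: all threads share the same input chunks, so run-to-run
# differences come from GIL overhead and per-thread output buffers,
# not from divergent input data. Sizes are illustrative only.
import threading
import time

CHUNK = 8192                # size of each input chunk (illustrative)
NCHUNKS = 256               # 256 * 8 KiB = 2 MiB of input per join
REPEATS = 100               # joins per thread (illustrative)

chunks = [bytes(CHUNK) for _ in range(NCHUNKS)]  # shared across threads

def worker():
    for _ in range(REPEATS):
        b"".join(chunks)    # output buffer is allocated per call

def timed(nthreads):
    """Wall-clock time for nthreads workers running concurrently."""
    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - t0

if __name__ == "__main__":
    t1 = timed(1)
    t4 = timed(4)
    # speedup relative to running the same total work serially
    print(f"1 thread: {t1:.3f}s  4 threads: {t4:.3f}s  "
          f"speedup: {4 * t1 / t4:.2f}x")
```

Varying CHUNK * NCHUNKS per thread relative to the L3 cache size is what exposes the cache-pressure component of the slowdown.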

For example, on my system I see a big drop in speedup (although I still get 
speedup) with the new benchmark once the buffer size gets to 2MB per thread, 
which is not surprising with an 8MB L3 cache.

My feeling is that we should try to ignore cache effects when picking a 
threshold, because we can't predict them generically (they'll vary by workload, 
thread count, CPU, etc.), whereas users can benchmark specific use cases to decide 
whether multithreading gives them a benefit. If the threshold is too low then 
users can always choose not to use multithreading (and in general one doesn't 
expect much from it in Python) but if the threshold is too high then users have 
no recourse. That being said, 65536 does still seem a bit low based on the 
results available.

I'll try to write a variant of the benchmark in which other threads just spin 
in Python without creating memory pressure to see if that gives a different 
picture. I'll also run the benchmark on a server CPU when I'm back at work.
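The spinning variant could look something like the sketch below: one thread does the join work while its peers run a pure-Python busy loop that touches almost no memory, which should isolate GIL-handoff cost from cache pressure. This is my own illustrative sketch, not the benchmark itself; all names and sizes are assumptions.

```python
# Hedged sketch of the "spinning peers" variant: peers burn GIL time
# in pure Python with negligible memory traffic, so any slowdown in the
# join thread reflects GIL contention rather than cache pressure.
import threading
import time

chunks = [bytes(8192) for _ in range(256)]   # 2 MiB input, illustrative
stop = threading.Event()

def spinner():
    x = 0
    while not stop.is_set():
        x += 1              # busy loop; no significant memory pressure

def join_worker(repeats=100):
    for _ in range(repeats):
        b"".join(chunks)

def timed_with_spinners(nspin):
    """Time join_worker while nspin pure-Python threads spin."""
    stop.clear()
    spinners = [threading.Thread(target=spinner) for _ in range(nspin)]
    for t in spinners:
        t.start()
    t0 = time.perf_counter()
    join_worker()
    elapsed = time.perf_counter() - t0
    stop.set()
    for t in spinners:
        t.join()
    return elapsed

if __name__ == "__main__":
    alone = timed_with_spinners(0)
    contended = timed_with_spinners(3)
    print(f"alone: {alone:.3f}s  with 3 spinners: {contended:.3f}s")
```

Comparing the two timings against the original benchmark's numbers would show how much of the slowdown is attributable to contention alone.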

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36051>
_______________________________________