Steve Dower <steve.do...@python.org> added the comment:

> I would expect that the negative impact on branch predictability would easily 
> outweigh the cost of the memory write (A guaranteed L1 hit)

If that were true then Spectre and Meltdown wouldn't have been so interesting :)

Pipelined processors will speculatively execute past the branch, and skipping 
the write is much quicker than performing it. Meanwhile, nothing should have 
tried to read the value, so that path hasn't had to block. I'm not aware of any 
processor that detects no-op writes and skips synchronising across cores - the 
dirty bit of the cache line is just set unconditionally.

Benchmarking already showed that the branching version is faster. It's possible 
that "refcount += (refcount & IMMORTAL) ? 0 : 1" could generate different code 
(should be mov,test,lea,cmovz rather than mov,and,add,mov or 
mov,and,jz,add,mov), but it's totally reasonable for a branch to be faster than 
unconditionally modifying memory.
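
For reference, the two forms being compared could be sketched roughly as 
follows. This is only an illustration: the IMMORTAL bit position, the 64-bit 
refcount type and the function names are assumptions made for the sketch, not 
CPython's actual layout or API, and which instructions a compiler emits for 
either form depends on the compiler and its flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bit marking an object as immortal (assumed for this
   sketch only). */
#define IMMORTAL ((uint64_t)1 << 62)

/* Branching form: the store is skipped entirely for immortal objects,
   so their cache line is never dirtied. */
void incref_branch(uint64_t *refcount)
{
    if (!(*refcount & IMMORTAL)) {
        *refcount += 1;
    }
}

/* Select form from the comment above: adds 0 or 1, so as written in C
   the store happens on every call. A compiler may or may not turn the
   select into a conditional move. */
void incref_select(uint64_t *refcount)
{
    *refcount += (*refcount & IMMORTAL) ? 0 : 1;
}

/* Sanity check: both forms must produce the same resulting value. */
void check_incref_variants(void)
{
    uint64_t mortal_a = 5, mortal_b = 5;
    uint64_t immortal_a = IMMORTAL | 5, immortal_b = IMMORTAL | 5;

    incref_branch(&mortal_a);
    incref_select(&mortal_b);
    assert(mortal_a == 6 && mortal_b == 6);

    incref_branch(&immortal_a);
    incref_select(&immortal_b);
    assert(immortal_a == (IMMORTAL | 5) && immortal_b == (IMMORTAL | 5));
}
```

Both functions leave the same value behind; the difference is only whether 
memory is written when the object is immortal, which is the cost discussed 
above.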

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40255>
_______________________________________