On Wednesday, 30 April 2014 at 22:08:23 UTC, John Colvin wrote:
I don't think I fully understand.

Either all RC changes for a given type need to be atomic or none do, and that information is given by the type (everything that is immutable/const/shared). I don't see any feasible way of escaping this, or any advantage to a runtime convention like the odd/even trick above.

If your CPU is decent, you have some cache coherency protocol in place. This won't ensure that things appear sequentially consistent, but you don't care here.

You can proceed as follows, in pseudo assembly:

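count = load count_addr          // plain load; may be stale if shared
need_atomic = count & 0x01       // low bit set means the object is shared
brtr need_atomic atomic
count = count + 2                // step by 2 so the low bit is preserved
store count count_addr
br follow_up

atomic:
atomic_add count_addr 2          // does not reuse the loaded count

follow_up:
// Code after increment goes here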

Note that this works: when the object is shared, count may not be the correct number, but it will always have the same parity. So even reading a wrong value makes you branch properly, and the loaded value of count is not used for the increment in the atomic block.
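For readers who prefer something compilable, the pseudo assembly above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: the names RcHeader, rc_increment, rc_count, rc_is_shared and rc_demo are hypothetical, the low bit is assumed to be fixed when the object is created, and relaxed atomic loads and stores stand in for the plain load and store (on common hardware they typically compile to ordinary moves).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical header word: bit 0 is the "shared" flag, the remaining bits
// hold the reference count, so the stored value is count * 2 (+ 1 if shared).
struct RcHeader {
    std::atomic<std::uintptr_t> word;
    explicit RcHeader(bool shared) : word(2u | (shared ? 1u : 0u)) {}
};

// Increment following the odd/even scheme from the pseudo assembly.
inline void rc_increment(RcHeader& h) {
    // Plain-style load: the value may be stale if the object is shared,
    // but its parity never changes, so the branch below is still correct.
    std::uintptr_t count = h.word.load(std::memory_order_relaxed);
    if (count & 1u) {
        // Shared object: atomic read-modify-write, loaded value is not reused.
        h.word.fetch_add(2, std::memory_order_relaxed);
    } else {
        // Thread-local object: non-atomic style update (load, add, store).
        h.word.store(count + 2, std::memory_order_relaxed);
    }
}

inline std::uintptr_t rc_count(const RcHeader& h) {
    return h.word.load(std::memory_order_relaxed) >> 1;
}

inline bool rc_is_shared(const RcHeader& h) {
    return (h.word.load(std::memory_order_relaxed) & 1u) != 0;
}

// Small usage check: both paths advance the logical count by one
// and leave the shared flag untouched.
inline void rc_demo() {
    RcHeader local(false), shared(true);
    rc_increment(local);
    rc_increment(shared);
    assert(rc_count(local) == 2 && rc_count(shared) == 2);
    assert(!rc_is_shared(local) && rc_is_shared(shared));
}
```

Stepping by 2 is what keeps the flag bit intact on both paths. A matching decrement would need stronger ordering on the shared path before freeing the object; that is left out of this sketch.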

I'm not happy with this solution, because:
- You still have an atomic in there, and the compiler can't remove it. This greatly reduces the compiler's ability to optimize. For instance, the compiler cannot optimize away redundant increment/decrement pairs.
- You have a branch in there. Atomics are expensive, but so are branches. Especially since both paths store (and one atomically), so they can't be speculated.
- If we start using that all over the place, the codegen will end up being quite fat. That means less cache-friendly behavior.
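To make the first point concrete, here is a small C++ sketch of a redundant increment/decrement pair. The function names touch_plain and touch_atomic are hypothetical. The claim that the atomic pair survives is an observation about how mainstream compilers tend to behave today, not something the language forbids them from changing.

```cpp
#include <atomic>
#include <cassert>

// Plain counter: the +1/-1 pair cancels out, and an optimizer is free to
// delete both operations since no other thread may legally observe them.
inline void touch_plain(long& count) {
    count += 1;
    count -= 1;
}

// Atomic counter: same net effect, but compilers generally keep both
// read-modify-write operations because another thread could observe
// the intermediate value.
inline void touch_atomic(std::atomic<long>& count) {
    count.fetch_add(1, std::memory_order_relaxed);
    count.fetch_sub(1, std::memory_order_relaxed);
}
```

With the odd/even scheme, every increment site contains the atomic path, so the whole pair falls into the second category even when the object turns out to be thread-local at run time.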

That odd/even solution surely works, but ultimately it does not solve the issue: if you want full speed, you'll have to provide both a const and a mutable version of the code, which defeats the purpose of const. Note that the exact same issue exists with inout.
