> Any atomics can be emulated in SW (using compiler built-ins or locks
> directly). The point here is the missing HW support:
> * E.g. MIPS, Power, ARMv7 do not have 128 bit CAS
> * 128 bit fetch-and-add is not supported in any of the architectures
>
> We need to ensure on any operations added that those can be implemented
> efficiently on most of the targets.

Yes, I fully appreciate that this matters when it comes to adding 128-bit support to the ODP atomics API. For this example in particular, though, I would think the more important factor is whether the built-ins are supported at all. After all, this isn't for performance measurement; it is merely an illustrative demonstration of fragmentation and reassembly using ODP, and we don't want to break builds on targets without that support.

That being said, it would be nice if performance were reasonably acceptable in general, so that the example gives a more realistic view of what might be used in real systems. I believe that to be the case here, even on 32-bit machines, where 64-bit rather than 128-bit atomics are used. (This is good news for ARMv7 at least; I'm not sure whether it helps MIPS.)

Unfortunately, there is a further complication: the doubleword (ARMv7) and quadword (ARMv8) atomic primitives aren't always available by default either. In my working copy, I'm currently bundling lock-free 64-bit and 128-bit CAS implementations to fill this gap for ARMv7 and ARMv8 respectively. This is a slight annoyance, but it saves a dependency on the external "libatomic" and gives a more efficient implementation than the lock-based one that library uses.
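
To make the width selection and the "emulate it with CAS" point concrete, here is a minimal sketch. It is not the code from the working copy. The names `dw_t`, `dw_cas`, `dw_fetch_add` and `dw_demo` are hypothetical, and it assumes a GCC-compatible compiler that provides the `__sync` built-ins and predefines `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16` when a 16-byte CAS is available for the target. Otherwise it falls back to a 64-bit word, and fetch-and-add is built from a CAS retry loop since no native instruction is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: use the 128-bit type only when the compiler reports
 * that a 16-byte CAS is available for the selected target. */
#if defined(__SIZEOF_INT128__) && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
typedef unsigned __int128 dw_t;	/* 128-bit word */
#else
typedef uint64_t dw_t;		/* 64-bit fallback, e.g. on 32-bit machines */
#endif

/* Hypothetical helper: returns true if *ptr held 'expected' and has
 * been replaced by 'desired'; returns false and changes nothing otherwise. */
static inline bool dw_cas(volatile dw_t *ptr, dw_t expected, dw_t desired)
{
	return __sync_bool_compare_and_swap(ptr, expected, desired);
}

/* Fetch-and-add emulated with a CAS retry loop; returns the old value. */
static inline dw_t dw_fetch_add(volatile dw_t *ptr, dw_t inc)
{
	dw_t old;

	do {
		/* Plain read: a torn or stale value only makes the CAS
		 * fail, in which case we read again and retry. */
		old = *ptr;
	} while (!dw_cas(ptr, old, old + inc));

	return old;
}

/* Single-threaded smoke test of the two helpers. */
static void dw_demo(void)
{
	volatile dw_t v = 5;
	bool swapped = dw_cas(&v, 5, 7);	/* 5 matches, v becomes 7 */
	dw_t old = dw_fetch_add(&v, 3);		/* returns 7, v becomes 10 */

	assert(swapped && old == 7 && v == 10);
}
```

Whether the built-in is expanded inline or turned into a call to an external helper depends on the compiler version and the `-march` style flags in use, which is the complication described above. On a target where the call cannot be resolved, the build fails at link time rather than silently taking a slower path.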