> Any atomics can be emulated in SW (using compiler built-ins or locks 
> directly). The point here is the missing HW support:
>  * E.g. MIPS, Power, ARMv7 do not have 128 bit CAS
>  * 128 bit fetch-and-add is not supported in any of the architectures
> 
> We need to ensure on any operations added that those can be implemented 
> efficiently on most of the targets.

Yes, I totally appreciate that this is important with respect to adding
128-bit support to the ODP atomics API. In terms of this example in
particular, though, I would think that the more important factor is having
support for the built-ins at all. After all, this isn't for performance
measurement, but is merely an illustrative demonstration of fragmentation and
reassembly using ODP. And we don't want to break the build on targets that
lack that support.

That being said, it would be nice if the performance were reasonably
acceptable in general, so that the example gives a more realistic view of what
might be used in real systems. I believe that to be the case here, even for
32-bit machines, where 64-bit rather than 128-bit atomics are used. (This is
good news for ARMv7 at least; I'm not sure if it helps MIPS out.)
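
To illustrate the 32-bit case, here is a minimal sketch of a 64-bit CAS over
two 32-bit fields, using the GCC/Clang __atomic built-ins. The pair32 type,
its field names and the helper functions are hypothetical, made up for
illustration, and are not taken from the actual example code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-field state that has to change as a single unit. */
struct pair32 {
    uint32_t head; /* e.g. an index or a 32-bit pointer */
    uint32_t tag;  /* e.g. a counter bumped on every update */
};

/* Pack both fields into one 64-bit word: tag in the high half. */
static inline uint64_t pair32_pack(struct pair32 p)
{
    return ((uint64_t)p.tag << 32) | p.head;
}

static inline struct pair32 pair32_unpack(uint64_t v)
{
    struct pair32 p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

/* Store `desired` in *dst only if *dst still equals *expected.
 * On failure, *expected is refreshed with the value actually found. */
static inline bool pair32_cas(uint64_t *dst, struct pair32 *expected,
                              struct pair32 desired)
{
    uint64_t exp = pair32_pack(*expected);
    bool ok = __atomic_compare_exchange_n(dst, &exp, pair32_pack(desired),
                                          false, /* strong CAS */
                                          __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
    if (!ok)
        *expected = pair32_unpack(exp);
    return ok;
}
```

On a 64-bit target the same pattern would need a 128-bit word, which is
where the availability of the primitives becomes the deciding factor.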

Unfortunately, there is a further complication here in that the doubleword
(ARMv7) and quadword (ARMv8) atomic primitives aren't always there by default
either. In my working copy, I'm currently bundling lock-free 64-bit and
128-bit CAS implementations to fill this gap for ARMv7 and ARMv8
respectively. This is a slight annoyance, but it avoids a dependency on the
external "libatomic" and gives a more efficient implementation than the
lock-based solution that library uses.
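
As a rough sketch of how a build could tell whether a bundled implementation
is needed, the GCC/Clang built-in __atomic_always_lock_free reports at
compile time whether atomics of a given size are guaranteed to be emitted
inline as lock-free code. The macro and function names below are made up for
illustration and are not the code from my working copy.

```c
#include <stdint.h>

/* Non-zero if the compiler guarantees inline, lock-free atomics for `type`
 * on this target. When it is zero, the __atomic built-ins for that size may
 * be lowered to calls into an external library such as libatomic. */
#define DW_INLINE_LOCK_FREE(type) \
    __atomic_always_lock_free(sizeof(type), 0)

/* Doubleword case for 32-bit targets: 64-bit CAS. */
static inline int cas64_is_inline(void)
{
    return DW_INLINE_LOCK_FREE(uint64_t) ? 1 : 0;
}

/* Quadword case for 64-bit targets: 128-bit CAS. */
static inline int cas128_is_inline(void)
{
#ifdef __SIZEOF_INT128__
    return DW_INLINE_LOCK_FREE(unsigned __int128) ? 1 : 0;
#else
    return 0; /* no 128-bit integer type on this compiler/target */
#endif
}
```

Because the result is a compile-time constant, the same test could also be
used in a preprocessor-free "if" that the compiler folds away, selecting the
built-in where it is inlined and a bundled CAS everywhere else.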
