On Wed, 2019-01-23 at 16:02 +0000, Jerin Jacob Kollanukkaran wrote:
> On Tue, 2019-01-22 at 09:27 +0000, Ola Liljedahl wrote:
> > On Fri, 2019-01-18 at 09:23 -0600, Gage Eads wrote:
> > > v3:
> > >  - Avoid the ABI break by putting 64-bit head and tail values in the
> > >    same cacheline as struct rte_ring's prod and cons members.
> > >  - Don't attempt to compile rte_atomic128_cmpset without
> > >    ALLOW_EXPERIMENTAL_API, as this would break a large number of
> > >    libraries.
> > >  - Add a helpful warning to __rte_ring_do_nb_enqueue_mp() in case
> > >    someone tries to use RING_F_NB without the ALLOW_EXPERIMENTAL_API
> > >    flag.
> > >  - Update the ring mempool to use experimental APIs
> > >  - Clarify that RING_F_NB is currently limited to x86_64; ARMv8.1-A
> > >    builds can eventually support it with the CASP instruction.
> >
> > ARMv8.0 should be able to implement a 128-bit atomic compare exchange
> > operation using LDXP/STXP.
>
> Just wondering what would be the performance difference between CASP and
> LDXP/STXP on an LSE-supported machine?

I think that is up to the microarchitecture. But one of the ideas behind introducing the LSE atomics was that they should be "better" than the equivalent code using exclusives. I think non-conditional LDxxx and STxxx atomics could be better than using exclusives, while conditional atomics (CAS, CASP) might not be so different (the reason has to do with cache coherency: a core can speculatively snoop-unique the cache line targeted by an atomic instruction, but to what extent that provides a benefit could depend on whether the atomic actually performs a store or not).
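For concreteness, here is a minimal sketch of what a 128-bit compare-and-swap built on exclusives could look like. This is an illustration, not DPDK's rte_atomic128_cmpset: the AArch64 path uses an LDXP/STXP loop (available on ARMv8.0), while the fallback for other hosts is a plain non-atomic stand-in just so the example compiles and runs anywhere.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical 128-bit value, two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128_t;

/* Sketch of a 128-bit CAS. Returns true on success; on failure the
 * observed value is written back to *expected. */
static bool cas128(u128_t *ptr, u128_t *expected, u128_t desired)
{
#if defined(__aarch64__)
    uint64_t lo, hi;
    uint32_t fail;
    do {
        /* Load-exclusive of the 128-bit pair */
        __asm__ volatile("ldxp %0, %1, [%2]"
                         : "=&r"(lo), "=&r"(hi)
                         : "r"(ptr) : "memory");
        if (lo != expected->lo || hi != expected->hi) {
            /* Mismatch: drop the exclusive monitor and fail */
            __asm__ volatile("clrex" ::: "memory");
            expected->lo = lo;
            expected->hi = hi;
            return false;
        }
        /* Store-exclusive; fail != 0 means the monitor was lost */
        __asm__ volatile("stxp %w0, %1, %2, [%3]"
                         : "=&r"(fail)
                         : "r"(desired.lo), "r"(desired.hi), "r"(ptr)
                         : "memory");
    } while (fail);
    return true;
#else
    /* NOT atomic: illustration-only fallback for non-AArch64 hosts */
    if (ptr->lo == expected->lo && ptr->hi == expected->hi) {
        *ptr = desired;
        return true;
    }
    *expected = *ptr;
    return false;
#endif
}
```

On an ARMv8.1+ part the whole loop could instead be a single CASP, which is exactly the dispatch question discussed below.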
> I think we can not detect the presence of LSE support at compile time.
> Right?

Unfortunately, AFAIK GCC doesn't notify the source code that it is targeting v8.1+ with LSE support. If there were intrinsics for (certain) LSE instructions (e.g. those not generated by the compiler, such as STxxx and CASP), we could use a corresponding preprocessor define to detect the presence of such intrinsics (such defines exist for other intrinsics, e.g. __ARM_FEATURE_QRDMX for the SQRDMLAH/SQRDMLSH instructions and corresponding intrinsics). I have tried to interest the Arm GCC developers in this but have not yet succeeded. Perhaps if we have more use cases where atomics intrinsics would be useful, we could convince them to add such intrinsics to the ACLE (ARM C Language Extensions).

But we will never get intrinsics for exclusives; they are deemed unsafe for explicit use from C. Instead one needs to provide inline assembler that contains the complete exclusives sequence. But in practice it seems to work to use inline assembler for LDXR and STXR as I do in the lockfree code linked below.

> The dynamic one will be costly like,

Do you think so? Shouldn't this branch be perfectly predictable? Once in a while it will fall out of the branch history table, but doesn't that mean the application hasn't been executing this code for some time, so it is not really performance critical?

> if (hwcaps & HWCAP_ATOMICS) {
>         casp
> } else {
>         ldxp
>         stxp
> }

> > From an ARM perspective, I want all atomic operations to take memory
> > ordering arguments (e.g. acquire, release). Not all usages of e.g.

> +1

> > atomic compare exchange require sequential consistency (which I think is
> > what the x86 cmpxchg instruction provides). DPDK functions should not be
> > modelled after x86 behaviour.
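The ordering-parameterised style argued for above already exists in C11. A minimal sketch (the function name and use case are hypothetical, not a DPDK API): a CAS that requests only acquire ordering on success and relaxed on failure, rather than the sequentially consistent default that x86 cmpxchg effectively provides.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical example: claim a slot identified by a 64-bit tag.
 * Success needs only acquire semantics (we will subsequently read
 * data published by the previous owner); failure needs no ordering. */
static bool claim_slot(_Atomic uint64_t *slot, uint64_t expected,
                       uint64_t desired)
{
    return atomic_compare_exchange_strong_explicit(
        slot, &expected, desired,
        memory_order_acquire,   /* ordering on success */
        memory_order_relaxed);  /* ordering on failure */
}
```

On a weakly ordered machine like AArch64, relaxing the ordering like this lets the compiler emit cheaper barriers (or none), which is exactly why hard-wiring sequential consistency into the DPDK API would leave performance on the table.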
> > Lock-free 128-bit atomics implementations for ARM/AArch64 and x86-64
> > are available here:
> > https://github.com/ARM-software/progress64/blob/master/src/lockfree.h

-- 
Ola Liljedahl, Networking System Architect, Arm
Phone +46706866373, Skype ola.liljedahl