>-----Original Message-----
>From: Ola Liljedahl [mailto:ola.liljed...@arm.com]
>On 28/09/2018, 02:43, "Wang, Yipeng1" <yipeng1.w...@intel.com> wrote:
>
>    Some general comments for the various __atomic_store/load added,
>
>    1. Although it passes the compiler check, I just want to confirm whether
>    we should use the GCC/clang builtins, or whether there are higher-level
>    APIs in DPDK to do atomic operations?
>[Ola] Adding "higher level" APIs on top of the basic language/compiler
>support is not a good idea.
>There is an infinite number of base types for the atomic operations; multiply
>that by all the different kinds of atomic operations (e.g. load, store,
>fetch_add, add, cas, etc.) and the different memory orderings, and you create
>a very large API (of which likely only a small but irregular subset will be
>used). So lots of work for little gain, and it is difficult to test every
>single item in the API.
>
>For a compiler that does not support __atomic builtins, one could write an
>__atomic emulation layer. But I think GCC __atomic is already the ideal
>source code abstraction.
[Wang, Yipeng] Thanks for the explanation. I think OVS does something
similar, using macros to abstract the various atomic functions across
different compilers/architectures. But anyway, since rte_ring uses the
builtins as well and the compiler check passed, I am OK with the
implementation. Another comment I made earlier is that rte_ring seems to
have a C11 header for using them. Should we assume a similar approach?
>
>    2. We believe the compiler will translate the atomic_store/load to
>    regular MOV instructions on Total Store Order architectures (e.g.
>    x86_64). But we ran the perf test on x86 and there is a relative
>    slowdown on lookup compared to master head. I am not sure whether the
>    performance drop comes from the atomic builtins.
>[Ola] Any performance difference is most likely not from the use of atomic
>builtins because, as you write, on x86 they should translate to normal loads
>and stores in most situations. But the code and data structures have changed,
>so there are some differences in e.g. memory accesses; couldn't this explain
>the performance difference?
[Wang, Yipeng] Yes, it might be.

>    [Wang, Yipeng] I did not quite understand why we need synchronization
>    for the hash data update.
>    Since the pdata write is already atomic, the lookup will either read
>    the stale data or the new data, which should be fine without
>    synchronization.
>    Is it to ensure the order of multiple reads in lookup threads?
>[Ola] If pdata is used as a reference to access other shared data, you need
>to ensure that the access of pdata and the accesses to the other data are
>ordered appropriately (e.g. with acquire/release). I think reading a new
>pdata but stale associated data is a bad thing.
[Wang, Yipeng] Thanks for the explanation. I got it now!