>-----Original Message-----
>From: Ola Liljedahl [mailto:ola.liljed...@arm.com]
>On 28/09/2018, 02:43, "Wang, Yipeng1" <yipeng1.w...@intel.com> wrote:
>
>    Some general comments for the various __atomic_store/load calls added:
>
>    1. Although it passes the compiler check, I just want to confirm 
> whether we should use the GCC/clang builtins directly, or whether
>    there are higher-level APIs in DPDK for atomic operations?
>[Ola] Adding "higher-level" APIs on top of the basic language/compiler 
>support is not a good idea.
>There are many base types for the atomic operations; multiply 
>that by all the different kinds of atomic operations (e.g.
>load, store, fetch_add, add, cas etc.) and the different memory orderings, 
>and you get a very large API (of which likely only a small but
>irregular subset will be used). So it is a lot of work for little gain, and it is 
>difficult to test every single item in the API.
>
>For compilers that do not support the __atomic builtins, one could write an 
>__atomic emulation layer. But I think GCC __atomic is
>already the ideal source-code abstraction.
[Wang, Yipeng] Thanks for the explanation. I think OVS does something similar, using 
macros to abstract the various atomic
functions across different compilers/architectures. But anyway,
since rte_ring uses the builtins as well and the compiler check passed, I am 
OK with the implementation.
Another comment I made earlier is that rte_ring seems to have a C11 header 
for using them. Should we
assume a similar structure here?

>
>
>    2. We believe the compiler will translate the atomic store/load to regular MOV 
> instructions on
>    Total Store Order architectures (e.g. x86_64). But we ran the perf test on 
> x86 and here is the relative slowdown on
>    lookup compared to master head. I am not sure if the performance drop 
> comes from the atomic builtins.
>[Ola] Any performance difference is most likely not from the use of the atomic 
>builtins because, as you write, on x86 they should translate
>to normal loads and stores in most situations. But the code and data 
>structures have changed, so there is some difference in e.g.
>memory accesses; couldn't this explain the performance difference?
[Wang, Yipeng] Yes, it might be. 


>    [Wang, Yipeng] I did not quite understand why we need synchronization 
> for hash data updates.
>    Since the pdata write is already atomic, the lookup will either read out the 
> stale data or the new data,
>    which should be fine without synchronization.
>    Is it to ensure the order of multiple reads in lookup threads?
>[Ola] If pdata is used as a reference to access other shared data, you need to 
>ensure that the access of pdata and the accesses to that other
>data are ordered appropriately (e.g. with acquire/release). I think reading a 
>new pdata but stale associated data is a bad thing.
>
[Wang, Yipeng] Thanks for the explanation. I got it now!

