>-----Original Message----- >From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com] >Sent: Sunday, September 30, 2018 3:21 PM >To: Wang, Yipeng1 <yipeng1.w...@intel.com>; Richardson, Bruce ><bruce.richard...@intel.com>; De Lara Guarch, Pablo ><pablo.de.lara.gua...@intel.com> >Cc: dev@dpdk.org; Gavin Hu (Arm Technology China) <gavin...@arm.com>; Steve >Capper <steve.cap...@arm.com>; Ola Liljedahl ><ola.liljed...@arm.com>; nd <n...@arm.com> >Subject: RE: [dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race >conditions > >> >> Some general comments for the various __atomic_store/load added, >> >> 1. Although it passes the compiler check, but I just want to confirm that if >> we >> should use GCC/clang builtins, or if There are higher level APIs in DPDK to >> do >> atomic operations? >> >I have used gcc builtins (just like rte_ring does) [Wang, Yipeng] I checked rte_ring, it also has a specific header for C11, since it is a C11 standard, do we need something similar here? > >> 2. We believe compiler will translate the atomic_store/load to regular MOV >> instruction on Total Store Order architecture (e.g. X86_64). But we run the >> perf test on x86 and here is the relative slowdown on lookup comparing to >> master head. I am not sure if the performance drop comes from the atomic >> buitins. >> >C11 atomics also block compiler reordering. Other than this, the retry loop is >an addition to lookup. >The patch also has the alignment corrected. I am not sure how is that >affecting the perf numbers. > >> Keysize | single lookup | bulk lookup >> 4 | 0.93 | 0.95 >> 8 | 0.95 | 0.96 >> 16 | 0.97 | 0.96 >> 32 | 0.97 | 1.00 >> 48 | 1.03 | 0.99 >> 64 | 1.04 | 0.98 >> 9 | 0.91 | 0.96 >> 13 | 0.97 | 0.98 >> 37 | 1.04 | 1.03 >> 40 | 1.02 | 0.98 >> >I assume this is the data from the test cases in test_hash_perf.c file. I >tried to reproduce this data, but my data is worse. Can you >specify the actual test from test_hash_perf.c you are using (With >locks/Pre-computed hash/With data/Elements in primary)? >IMO, the differences you have provided are not high.
[Wang, Yipeng] I remember the performance data I used is the no-lock, without hash, with 8-byte data, in both primary and secondary. I compared the master head to the one with your first two commits. > >> [Wang, Yipeng] I think even for current code, we need to check empty_slot. >> Could you export this as a bug fix commit? >> >In the existing code, there is check 'if (!!key_idx & !rte_hash....)'. Are you >referring to '!!key_idx'? I think this should be changed to >'(key_idx != EMPTY_SLOT)'. [Wang, Yipeng] Yeah, I guess I did not see that part. Then I guess it is no need to export as a bug fix for now since it is not a functional issue. Your change is good.