Hi Dong, > -----Original Message----- > From: Wang Dong [mailto:dong.wang.pro at hotmail.com] > Sent: Saturday, May 09, 2015 11:24 AM > To: Ananyev, Konstantin; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for > IA processor's rte_wmb/rte_rmb. > > Hi Konstantin, > > > > > Hi Dong, > > > >> -----Original Message----- > >> From: Wang Dong [mailto:dong.wang.pro at hotmail.com] > >> Sent: Thursday, May 07, 2015 4:28 PM > >> To: Ananyev, Konstantin; dev at dpdk.org > >> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier > >> for IA processor's rte_wmb/rte_rmb. > >> > >> Hi Konstantin, > >> > >>> Hi Dong, > >>> > >>>> -----Original Message----- > >>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of WangDong > >>>> Sent: Tuesday, May 05, 2015 4:38 PM > >>>> To: dev at dpdk.org > >>>> Subject: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for > >>>> IA processor's rte_wmb/rte_rmb. > >>>> > >>>> The current implementation of rte_wmb/rte_rmb for x86 is using processor > >>>> memory barrier. It's unnessary for IA processor, > >> compiler > >>>> memory barrier is enough. > >>> > >>> I wouldn't say they are 'unnecessary'. > >>> There are situations, even on IA, when you need _fence_ isntructions. > >>> So, please leave rte_*mb() macros unmodified. > >> OK, leave them unmodified, but I really can't find a situation to use > >> sfence and lfence instructions. > > > > For example: > > http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/ > > http://dpdk.org/ml/archives/dev/2014-May/002613.html > > > >> > >> > >>> I still think that we need to create a new set of architecture dependent > >>> macros, as what discussed before. > >>> Probably by analogy with linux kernel rte_smp_*mb() is a good name for > >>> them. > >>> Though if you have some better name in mind, I am open to suggestions > >>> here. > >> What abount rte_dma_*mb()? I find dma_*mb() in linux-4.0.1, it looks good~~ > > > > Hmm, but why _dma_? > > We need same thing for multi-core communication too. > > If rte_smp_ is not good enough, might be: rte_arch_? > I want these two macro only used in PMD, so I think _dma_ is better. The > memory barrier of processor-processor maybe more complex, and I'm not > familiar with it... Someone can add rte_smp_*mb for multi-core.
Sorry, what you are talking about? At the end, it will use same instructions, whateve we'll name it: _dma_, _smp_, _arch_. Konstantin > > I think _arch_ is means nothing here, because rte_*mb is already for > architectures that dpdk supported, they are redefined in these architecture. > > > > >> > >>> > >>>> But if dpdk runing on a AMD processor, maybe we should use processor > >>>> memory barrier. > >>> > >>> As far as I remember, amd has the same memory ordering model. > >> It's too hard to find a AMD's software developer manual..... > > > > There for example: > > http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf > > ? > Search such document on AMD offical website for a long time, this manual > is what I want, thanks very much!!! > > Dong > > > > > Konstantin > > > >> > >> Dong > >> > >>> So, I don't think we need #ifdef RTE_ARCH_X86_IA here. > >>> > >>> Konstantin > >>> > >>>> I add a macro to distinguish them, if we compile DPDK for IA processor, > >>>> add the macro (RTE_ARCH_X86_IA) can improve > >> performance > >>>> with compiler memory barrier. Or we can add RTE_ARCH_X86_AMD for using > >>>> processor memory barrier, in this case, if didn't > add > >> the > >>>> macro, the memory ordering will not be guaranteed. Which macro is better? > >>>> If this patch applied, the PMD's old implementation of compiler memory > >>>> barrier (some volatile variable) can be fixed with > >> rte_rmb() > >>>> and rte_wmb() for any architecture. > >>>> > >>>> --- > >>>> lib/librte_eal/common/include/arch/x86/rte_atomic.h | 10 ++++++++++ > >>>> 1 file changed, 10 insertions(+) > >>>> > >>>> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h > >>>> b/lib/librte_eal/common/include/arch/x86/rte_atomic.h > >>>> index e93e8ee..52b1e81 100644 > >>>> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h > >>>> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h > >>>> @@ -49,10 +49,20 @@ extern "C" { > >>>> > >>>> #define rte_mb() _mm_mfence() > >>>> > >>>> +#ifdef RTE_ARCH_X86_IA > >>>> + > >>>> +#define rte_wmb() rte_compiler_barrier() > >>>> + > >>>> +#define rte_rmb() rte_compiler_barrier() > >>>> + > >>>> +#else > >>>> + > >>>> #define rte_wmb() _mm_sfence() > >>>> > >>>> #define rte_rmb() _mm_lfence() > >>>> > >>>> +#endif > >>>> + > >>>> /*------------------------- 16 bit atomic operations > >>>> -------------------------*/ > >>>> > >>>> #ifndef RTE_FORCE_INTRINSICS > >>>> -- > >>>> 1.9.1 > >>>