The current x86 implementation of rte_wmb()/rte_rmb() uses processor memory
barriers (sfence/lfence). That is unnecessary on IA processors, where a
compiler memory barrier is enough. If DPDK is running on an AMD processor,
though, a processor memory barrier may still be needed.
This patch adds a macro to distinguish the two cases: when DPDK is compiled
for an IA processor, defining RTE_ARCH_X86_IA improves performance by using
only a compiler memory barrier.
Alternatively, we could add RTE_ARCH_X86_AMD to select the processor memory
barrier; in that case, memory ordering would not be guaranteed on AMD if the
macro were not defined.
Which macro is better?
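For comparison, the alternative could look roughly like the sketch below. It is not part of this patch; DEMO_BARRIER_KIND and demo_barrier_kind() are hypothetical names added only so the selected variant can be inspected, and rte_compiler_barrier() is redefined locally to keep the example self-contained.

```c
/* Local stand-in for the EAL compiler barrier: no instruction is emitted. */
#define rte_compiler_barrier() __asm__ volatile("" ::: "memory")

/* Alternative polarity: processor fences are opt-in via RTE_ARCH_X86_AMD. */
#ifdef RTE_ARCH_X86_AMD
#include <emmintrin.h>
#define rte_wmb() _mm_sfence()
#define rte_rmb() _mm_lfence()
#define DEMO_BARRIER_KIND "fence"     /* hypothetical, illustration only */
#else
#define rte_wmb() rte_compiler_barrier()
#define rte_rmb() rte_compiler_barrier()
#define DEMO_BARRIER_KIND "compiler"  /* hypothetical, illustration only */
#endif

/* Exercises both macros and reports which variant was compiled in. */
static const char *demo_barrier_kind(void)
{
	rte_wmb();
	rte_rmb();
	return DEMO_BARRIER_KIND;
}
```

With this polarity, a build that forgets to define RTE_ARCH_X86_AMD silently gets the weaker barrier, which is the risk described above.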
If this patch is applied, the PMDs' old way of getting a compiler memory
barrier (some volatile variables) can be replaced with rte_rmb() and
rte_wmb() on any architecture.
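
As a rough illustration of that last point, here is a minimal sketch of how a PMD could order a descriptor update with rte_wmb()/rte_rmb() instead of relying on volatile. struct demo_desc, demo_publish() and demo_consume() are hypothetical names, not existing DPDK code, and the sketch defines RTE_ARCH_X86_IA itself so that it picks the compiler barrier.

```c
#include <stdint.h>

/* Assumption for this sketch: build as if for an IA target. */
#define RTE_ARCH_X86_IA 1

/* Local stand-in for the EAL compiler barrier: no instruction is emitted. */
#define rte_compiler_barrier() __asm__ volatile("" ::: "memory")

#ifdef RTE_ARCH_X86_IA
#define rte_wmb() rte_compiler_barrier()
#define rte_rmb() rte_compiler_barrier()
#else
#include <emmintrin.h>
#define rte_wmb() _mm_sfence()
#define rte_rmb() _mm_lfence()
#endif

/* Hypothetical descriptor, not a real PMD structure. */
struct demo_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t done;
};

/* Producer side: fill the payload first, then publish it via 'done'. */
static void demo_publish(struct demo_desc *d, uint64_t addr, uint32_t len)
{
	d->addr = addr;
	d->len = len;
	rte_wmb();	/* payload stores must not be moved after the flag store */
	d->done = 1;
}

/* Consumer side: check 'done' first, then read the payload. */
static int demo_consume(const struct demo_desc *d, uint64_t *addr, uint32_t *len)
{
	if (!d->done)
		return 0;
	rte_rmb();	/* payload loads must not be moved before the flag load */
	*addr = d->addr;
	*len = d->len;
	return 1;
}
```

The same two functions compile unchanged on the #else path; only the instruction emitted at the barrier differs.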

---
 lib/librte_eal/common/include/arch/x86/rte_atomic.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
index e93e8ee..52b1e81 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
@@ -49,10 +49,20 @@ extern "C" {

 #define        rte_mb() _mm_mfence()

+#ifdef RTE_ARCH_X86_IA
+
+#define rte_wmb() rte_compiler_barrier()
+
+#define rte_rmb() rte_compiler_barrier()
+
+#else
+
 #define        rte_wmb() _mm_sfence()

 #define        rte_rmb() _mm_lfence()

+#endif
+
 /*------------------------- 16 bit atomic operations -------------------------*/

 #ifndef RTE_FORCE_INTRINSICS
-- 
1.9.1
