> Subject: Re: [dpdk-dev] [memnic PATCH 4/7] pmd: use compiler barrier
>
> 2014-09-11 07:48, Hiroshi Shimamoto:
> > x86 can keep store ordering with standard operations.
>
> Are we sure it's always the case (including old 32-bit CPU)?
> I would prefer to have a reference here. I know we already discussed
> this kind of things but having a reference in commit log could help
> for future discussions.
>
> > Using memory barrier is much expensive in main packet processing loop.
> > Removing this improves xmit/recv packet performance.
> >
> > We can see performance improvements with memnic-tester.
> > Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
> >  size |  before  |  after
> >    64 | 4.18Mpps | 4.59Mpps
> >   128 | 3.85Mpps | 4.87Mpps
> >   256 | 4.01Mpps | 4.72Mpps
> >   512 | 3.52Mpps | 4.41Mpps
> >  1024 | 3.18Mpps | 3.64Mpps
> >  1280 | 2.86Mpps | 3.15Mpps
> >  1518 | 2.59Mpps | 2.87Mpps
> >
> > Note: we have to take care if we use temporal cache.
>
> Please, could you explain this last sentence?

Oops, I used the wrong word: "temporal" should be "non-temporal".

By the way, some instructions, such as the MOVNTx series, perform
non-temporal stores. Store ordering is not kept for these instructions,
so a compiler barrier alone is not enough if we use them.

Ref. Intel Software Developer Manual
 Vol.1 10.4.6.2 Caching of Temporal vs. Non-Temporal Data
 Vol.3 8.2 Memory Ordering

thanks,
Hiroshi

>
> Thanks
> --
> Thomas
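
To illustrate the difference, here is a minimal sketch. It is not taken from the memnic PMD: struct slot, publish_plain() and publish_nt() are hypothetical names, and a GCC/Clang-style toolchain is assumed. With ordinary stores on x86 a compiler barrier is enough to publish data before a flag; with a non-temporal store (MOVNTI via _mm_stream_si32) an SFENCE is needed before the flag is written.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32 (MOVNTI) and _mm_sfence */
#endif

/* Compiler-only barrier: stops the compiler from reordering memory
 * accesses across this point; emits no CPU instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical shared slot: the consumer polls 'ready', then reads 'data'. */
struct slot {
    uint32_t data;
    volatile uint32_t ready;
};

/* Ordinary stores: x86 does not reorder a store with an older store,
 * so only the compiler has to be restrained. */
static void publish_plain(struct slot *s, uint32_t v)
{
    s->data = v;
    compiler_barrier();
    s->ready = 1;
}

/* Non-temporal store: weakly ordered, so it must be fenced before
 * the flag becomes visible. */
static void publish_nt(struct slot *s, uint32_t v)
{
#if defined(__SSE2__)
    _mm_stream_si32((int *)&s->data, (int)v);
    _mm_sfence();               /* order the NT store before 'ready' */
#else
    /* Fallback for non-x86 builds (assumption): plain store + release fence. */
    s->data = v;
    __atomic_thread_fence(__ATOMIC_RELEASE);
#endif
    s->ready = 1;
}
```

So the "take care" note applies only to the publish_nt() style: if non-temporal stores are ever introduced in the data path, the fence has to come back for those stores.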