> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Thursday, September 17, 2020 11:13 AM
>
> On Thu, Sep 17, 2020 at 09:37:29AM +0200, Morten Brørup wrote:
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Wenzhuo Lu
> > > Sent: Thursday, September 17, 2020 3:40 AM
> > >
> > > AVX512 instructions are supported by more and more platforms. These
> > > instructions can be used in the data path to enhance the per-core
> > > performance of packet processing.
> > > Compared with the existing implementation, this patch set introduces
> > > some AVX512 instructions into the iavf data path, and we get a better
> > > per-core throughput.
> > >
> > > v2:
> > > Update meson.build.
> > > Replace the deprecated 'buf_physaddr' by 'buf_iova'.
> > >
> > > Wenzhuo Lu (3):
> > >   net/iavf: enable AVX512 for legacy RX
> > >   net/iavf: enable AVX512 for flexible RX
> > >   net/iavf: enable AVX512 for TX
> > >
> > >  doc/guides/rel_notes/release_20_11.rst  |    3 +
> > >  drivers/net/iavf/iavf_ethdev.c          |    3 +-
> > >  drivers/net/iavf/iavf_rxtx.c            |   69 +-
> > >  drivers/net/iavf/iavf_rxtx.h            |   18 +
> > >  drivers/net/iavf/iavf_rxtx_vec_avx512.c | 1720 +++++++++++++++++++++++++++++++
> > >  drivers/net/iavf/meson.build            |   17 +
> > >  6 files changed, 1818 insertions(+), 12 deletions(-)
> > >  create mode 100644 drivers/net/iavf/iavf_rxtx_vec_avx512.c
> > >
> > > --
> > > 1.9.3
> >
> > I am not sure I understand the full context here, so please bear with
> > me if I'm completely off...
> >
> > With this patch set, it looks like the driver manipulates the mempool
> > cache directly, bypassing the libraries encapsulating it.
> >
> > Isn't that going deeper into a library than expected? What if the
> > implementation of the mempool library changes radically?
> >
> > And if there are performance gains to be achieved by using vector
> > instructions for manipulating the mempool, perhaps your vector
> > optimizations should go into the mempool library instead?
>
> Looking specifically at the descriptor re-arm code, the benefit from
> working off the mempool cache directly comes from saving loads by
> merging the code blocks, rather than directly from the vectorization
> itself - though the vectorization doesn't hurt. The original code, with
> its separate mempool function, worked roughly as below:
>
> 1. mempool code loads mbuf pointers from the cache
> 2. mempool code writes the mbuf pointers to the SW ring for the NIC
> 3. driver code loads the mbuf pointers back from the SW ring
> 4. driver code then does the rest of the descriptor re-arm.
>
> The benefit comes from eliminating step 3, the loads in the driver,
> which are dependent upon the previous stores. By having the driver
> itself read from the mempool cache (the code still uses mempool
> functions for every other part, since everything beyond the cache
> depends on the ring/stack/bucket implementation), we can issue the
> stores, and while they are completing, reuse the already-loaded data
> to do the descriptor re-arm.
>
> Hope this clarifies things.
>
> /Bruce
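To make sure I follow, the two flows could be sketched roughly like
below. This is only an illustration of the data flow you describe: the
function names, the 'sw_ring'/'rxdp' parameters and the bare uint64_t
descriptor write are placeholders of mine, not the actual iavf code.

#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Before: a separate mempool call fills the SW ring (steps 1+2),
 * then the driver reloads each pointer from that ring (step 3)
 * before writing the descriptor (step 4). */
void
rearm_via_mempool_call(struct rte_mempool *mp, struct rte_mbuf **sw_ring,
                       volatile uint64_t *rxdp, unsigned int n)
{
    /* Steps 1+2: mempool loads pointers from its cache and stores
     * them into the SW ring. */
    if (rte_mempool_get_bulk(mp, (void **)sw_ring, n) < 0)
        return;

    for (unsigned int i = 0; i < n; i++) {
        /* Step 3: this load depends on the store made just above. */
        struct rte_mbuf *mb = sw_ring[i];
        /* Step 4: program the HW descriptor with the buffer address. */
        rxdp[i] = mb->buf_iova + RTE_PKTMBUF_HEADROOM;
    }
}

/* After: the driver reads the per-lcore mempool cache itself, so each
 * pointer is loaded once and reused for both the SW-ring store and the
 * descriptor write - step 3 disappears. */
void
rearm_from_cache(struct rte_mempool *mp, struct rte_mbuf **sw_ring,
                 volatile uint64_t *rxdp, unsigned int n)
{
    struct rte_mempool_cache *cache =
        rte_mempool_default_cache(mp, rte_lcore_id());

    if (cache == NULL || cache->len < n)
        return; /* real code would fall back to the ordinary bulk path */

    /* Objects come off the top of the cache's object array (LIFO). */
    struct rte_mbuf **cached =
        (struct rte_mbuf **)&cache->objs[cache->len - n];

    for (unsigned int i = 0; i < n; i++) {
        struct rte_mbuf *mb = cached[i];               /* one load ... */
        sw_ring[i] = mb;                               /* ... stored to the SW ring */
        rxdp[i] = mb->buf_iova + RTE_PKTMBUF_HEADROOM; /* ... and reused here */
    }
    cache->len -= n;
}

So the pointer loaded from the cache feeds both the SW-ring store and
the descriptor write, and the driver never stalls on a store-to-load
dependency.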
Thank you for the detailed explanation, Bruce. It makes sense to me now. So,

Acked-By: Morten Brørup <m...@smartsharesystems.com>


Med venlig hilsen / kind regards
- Morten Brørup