Hi

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, October 3, 2017 00:39
> To: Li, Xiaoyun <[email protected]>; Richardson, Bruce
> <[email protected]>
> Cc: Lu, Wenzhuo <[email protected]>; Zhang, Helin
> <[email protected]>; [email protected]
> Subject: RE: [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy
>
>
>
> > -----Original Message-----
> > From: Li, Xiaoyun
> > Sent: Monday, October 2, 2017 5:13 PM
> > To: Ananyev, Konstantin <[email protected]>; Richardson, Bruce <[email protected]>
> > Cc: Lu, Wenzhuo <[email protected]>; Zhang, Helin <[email protected]>; [email protected]; Li, Xiaoyun <[email protected]>
> > Subject: [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy
> >
> > This patch dynamically selects memcpy functions at run-time based
> > on the CPU flags that the current machine supports. This patch uses function
> > pointers which are bound to the relevant functions at constructor time.
> > In addition, the AVX512 instruction set is compiled only if users
> > enable it in the config and the compiler supports it.
> >
> > Signed-off-by: Xiaoyun Li <[email protected]>
> > ---
> > v2
> > * Use gcc function multi-versioning to avoid compilation issues.
> > * Add macros for AVX512 and AVX2. Only if users enable AVX512 and the
> > compiler supports it, the AVX512 code would be compiled. Only if the
> > compiler supports AVX2, the AVX2 code would be compiled.
> >
> > v3
> > * Reduce function calls by keeping only rte_memcpy_xxx.
> > * Add conditions so that when the copy size is small, the inline code path is used.
> > Otherwise, the dynamic code path is used.
> > * To support attribute target, clang version must be greater than 3.7.
> > Otherwise, the SSE/AVX code path would be chosen, the same as before.
> > * Move two macro functions to the top of the code since they would be
> > used in inline SSE/AVX and dynamic SSE/AVX code.
> >
> > v4
> > * Split rte_memcpy.h into several .c files and modify makefiles to compile
> > AVX2 and AVX512 files.
> >
> Could you explain to me why, instead of reusing the existing rte_memcpy() code
> to generate _sse/_avx2/avx512f flavors, you keep pushing changes with 3
> separate implementations?
> Obviously that is much more expensive in terms of maintenance and doesn't
> look like a feasible solution to me.
> Is the existing rte_memcpy() implementation not good enough in terms of
> functionality and/or performance?
> If so, can you outline these problems and try to fix them first?
> Konstantin
>
I only merged the many small functions into a single function in each of those 3 separate implementations. The existing code is entirely inline, including rte_memcpy() itself, so the compiler expands every rte_memcpy() call into basic operations like xmm0=xxx. The existing code is fine when used that way. But with run-time dispatch, it would bring lots of function calls and cause a perf drop.

Best Regards,
Xiaoyun Li
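
P.S. To make the dispatch scheme from the patch description concrete, here is a minimal sketch, not the actual patch code. The names copy_sse, copy_avx2, copy_avx512f, copy_ptr, copy_dispatch and the 128-byte threshold are all hypothetical, and every variant simply wraps memcpy() as a stand-in for the real per-ISA implementation. It assumes gcc or a recent clang (for the constructor attribute and __builtin_cpu_supports).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-ISA variants. In a real build each would sit in its own
 * .c file compiled with the matching -m flags; here they only wrap memcpy(). */
static void *copy_sse(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

static void *copy_avx2(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

static void *copy_avx512f(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

typedef void *(*copy_fn_t)(void *dst, const void *src, size_t n);

/* Function pointer bound once at constructor time; SSE is the fallback. */
static copy_fn_t copy_ptr = copy_sse;

__attribute__((constructor))
static void copy_init(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        copy_ptr = copy_avx512f;
    else if (__builtin_cpu_supports("avx2"))
        copy_ptr = copy_avx2;
#endif
}

/* Assumed threshold, chosen only for illustration. */
#define SMALL_COPY_THRESHOLD 128

static inline void *copy_dispatch(void *dst, const void *src, size_t n)
{
    /* Small copies stay on the inline path (memcpy() is a stand-in here). */
    if (n <= SMALL_COPY_THRESHOLD)
        return memcpy(dst, src, n);
    /* Larger copies pay one indirect call into the selected variant. */
    return copy_ptr(dst, src, n);
}
```

The point of the sketch is that each large copy costs exactly one indirect call. If the selected variant were itself built from many small helpers that are not inlined into it, each of those would add a further call, which is the perf drop mentioned above.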

