OK. Got it. Thanks!
> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, October 3, 2017 20:12
> To: Li, Xiaoyun <[email protected]>; Richardson, Bruce
> <[email protected]>
> Cc: Lu, Wenzhuo <[email protected]>; Zhang, Helin
> <[email protected]>; [email protected]
> Subject: RE: [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy
>
> > Hi
> > You mean just use rte_memcpy_internal in rte_memcpy_avx2,
> > rte_memcpy_avx512?
>
> Yes, exactly, and for rte_memcpy_sse() too.
> Basically, for rte_memcpy_avx512() we force the compiler to use the
> AVX512F path inside rte_memcpy_internal(), for rte_memcpy_avx2() we use
> the AVX2 path inside rte_memcpy_internal(), etc.
> To do that we set up:
> CFLAGS_rte_memcpy_avx512f.o += -mavx512f
> CFLAGS_rte_memcpy_avx512f.o += -DRTE_MACHINE_CPUFLAG_AVX512F
> inside the Makefile.
>
> For rte_memcpy_avx2() we force the compiler to use the AVX2 path inside
> rte_memcpy_internal(), etc.
>
> > But if RTE_MACHINE_CPUFLAG_AVX2 means only whether the compiler
> > supports avx2, then internal would only be compiled with avx2 codes,
> > then cannot choose other code path. What if the HW cannot support avx2?
>
> If the HW can't support AVX2 then rte_memcpy_init() just wouldn't select
> rte_memcpy_avx2(), it would select rte_memcpy_sse() instead:
>
> if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2)) {...} - that is a runtime
> check that the underlying HW does support AVX2.
>
> Konstantin
>
> > If RTE_MACHINE_CPUFLAG_AVX2 means as before, suggests whether both
> > compiler and HW support avx2, then the function has no difference
> > right now.
> > The macro is determined at compilation time. But selection is hoped
> > to be at runtime.
> > Did I consider something wrong?
> >
> > Best Regards,
> > Xiaoyun Li
> >
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Tuesday, October 3, 2017 19:16
> > > To: Li, Xiaoyun <[email protected]>; Richardson, Bruce
> > > <[email protected]>
> > > Cc: Lu, Wenzhuo <[email protected]>; Zhang, Helin
> > > <[email protected]>; [email protected]
> > > Subject: RE: [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy
> > >
> > > Hi,
> > >
> > > > Hi
> > > >
> > > > > -----Original Message-----
> > > > > From: Ananyev, Konstantin
> > > > > Sent: Tuesday, October 3, 2017 00:39
> > > > > To: Li, Xiaoyun <[email protected]>; Richardson, Bruce
> > > > > <[email protected]>
> > > > > Cc: Lu, Wenzhuo <[email protected]>; Zhang, Helin
> > > > > <[email protected]>; [email protected]
> > > > > Subject: RE: [PATCH v4 1/3] eal/x86: run-time dispatch over
> > > > > memcpy
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Li, Xiaoyun
> > > > > > Sent: Monday, October 2, 2017 5:13 PM
> > > > > > To: Ananyev, Konstantin <[email protected]>;
> > > > > > Richardson, Bruce <[email protected]>
> > > > > > Cc: Lu, Wenzhuo <[email protected]>; Zhang, Helin
> > > > > > <[email protected]>; [email protected]; Li, Xiaoyun
> > > > > > <[email protected]>
> > > > > > Subject: [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy
> > > > > >
> > > > > > This patch dynamically selects functions of memcpy at run-time
> > > > > > based on CPU flags that current machine supports. This patch
> > > > > > uses function pointers which are bound to the relative
> > > > > > functions at constructor time.
> > > > > > In addition, AVX512 instructions set would be compiled only if
> > > > > > users config it enabled and the compiler supports it.
> > > > > >
> > > > > > Signed-off-by: Xiaoyun Li <[email protected]>
> > > > > > ---
> > > > > > v2
> > > > > > * Use gcc function multi-versioning to avoid compilation issues.
> > > > > > * Add macros for AVX512 and AVX2. Only if users enable AVX512
> > > > > >   and the compiler supports it, the AVX512 codes would be
> > > > > >   compiled. Only if the compiler supports AVX2, the AVX2 codes
> > > > > >   would be compiled.
> > > > > >
> > > > > > v3
> > > > > > * Reduce function calls via only keep rte_memcpy_xxx.
> > > > > > * Add conditions that when copy size is small, use inline code
> > > > > >   path. Otherwise, use dynamic code path.
> > > > > > * To support attribute target, clang version must be greater
> > > > > >   than 3.7. Otherwise, would choose SSE/AVX code path, the same
> > > > > >   as before.
> > > > > > * Move two macro functions to the top of the code since they
> > > > > >   would be used in inline SSE/AVX and dynamic SSE/AVX codes.
> > > > > >
> > > > > > v4
> > > > > > * Modify rte_memcpy.h to several .c files and modify makefiles
> > > > > >   to compile AVX2 and AVX512 files.
> > > > >
> > > > > Could you explain to me why instead of reusing existing
> > > > > rte_memcpy() code to generate _sse/_avx2/avx512f flavors you keep
> > > > > pushing changes with 3 separate implementations?
> > > > > Obviously that is much more expensive in terms of maintenance
> > > > > and doesn't look like a feasible solution to me.
> > > > > Is the existing rte_memcpy() implementation not good enough in
> > > > > terms of functionality and/or performance?
> > > > > If so, can you outline these problems and try to fix them first.
> > > > > Konstantin
> > > > >
> > > >
> > > > I just change many small functions to one function in those 3
> > > > separate functions.
> > >
> > > Yes, so with what you suggest we'll have 4 implementations for
> > > rte_memcpy to support.
> > > That's very expensive in terms of maintenance and I believe totally
> > > unnecessary.
> > >
> > > > Because the existing codes are totally inline, including
> > > > rte_memcpy() itself. So the compilation will change all
> > > > rte_memcpy() calls into the basic codes like xmm0=xxx.
> > > >
> > > > The existing codes in this way are OK.
> > >
> > > Good.
> > >
> > > > But when run-time, it will bring lots of function calls and cause
> > > > perf drop.
> > >
> > > I believe it wouldn't if we do it properly.
> > > All internal functions (mov16, mov32, etc.) will still be inlined by
> > > the compiler for each flavor (sse/avx2/etc.) - have a look at the
> > > patch I sent.
> > >
> > > Konstantin
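
For reference, the dispatch scheme discussed in the quoted thread could be sketched roughly as below. This is only an illustration, not the code from the patch: all demo_* names are hypothetical, the shared body simply calls memcpy(), and the GCC/clang builtin __builtin_cpu_supports() stands in for rte_cpu_get_flag_enabled() so the example builds without DPDK. In the real layout each flavor would sit in its own .c file built with different CFLAGS (e.g. -mavx2), so that the same inline body compiles to different instructions.

```c
#include <stddef.h>
#include <string.h>

/* Shared body. In the scheme from the thread this would be the existing
 * inline rte_memcpy code; here it is just memcpy() (assumption). */
static inline void *
demo_memcpy_internal(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Hypothetical flavors. In a real build each wrapper would live in a
 * separate object compiled with its own CFLAGS, with the shared body
 * inlined into it. */
static void *
demo_memcpy_sse(void *dst, const void *src, size_t n)
{
	return demo_memcpy_internal(dst, src, n);
}

static void *
demo_memcpy_avx2(void *dst, const void *src, size_t n)
{
	return demo_memcpy_internal(dst, src, n);
}

typedef void *(*demo_memcpy_t)(void *, const void *, size_t);

/* Default to the baseline flavor so the pointer is valid even before
 * the constructor has run. */
static demo_memcpy_t demo_memcpy_ptr = demo_memcpy_sse;

/* Runs before main(): a run-time check of what the HW supports,
 * independent of the flags each flavor was compiled with. */
static void __attribute__((constructor))
demo_memcpy_init(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx2"))
		demo_memcpy_ptr = demo_memcpy_avx2;
#endif
}

/* Public entry point: one indirect call, everything below it inlined
 * per flavor. */
static inline void *
demo_memcpy(void *dst, const void *src, size_t n)
{
	return demo_memcpy_ptr(dst, src, n);
}
```

The compile-time macro only decides which code path ends up inside each flavor; which flavor actually gets called is decided once, at constructor time, from the CPU flags of the running machine.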

