Hi all,

After further investigation, we have found that the patchset brings some benefits. So the plan is to add a config parameter, CONFIG_RTE_ENABLE_RUNTIME_DISPATCH. By default its value is "n" and the current memcpy code is used. Only if users set it to "y" will the run-time dispatch code (without inlining) be used.
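To make the idea concrete, here is a rough sketch of how such a switch could select between the two paths. It is not the code from the patchset: the names DEMO_ENABLE_RUNTIME_DISPATCH, demo_memcpy, demo_memcpy_init and the two variant functions are hypothetical, and both variants simply call the libc memcpy as placeholders for bodies that would really be built with different ISA flags. The CPU check uses the GCC/Clang x86 builtins and is skipped on other compilers or architectures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CONFIG_RTE_ENABLE_RUNTIME_DISPATCH=y. */
#ifndef DEMO_ENABLE_RUNTIME_DISPATCH
#define DEMO_ENABLE_RUNTIME_DISPATCH 1
#endif

typedef void *(*demo_memcpy_fn)(void *dst, const void *src, size_t n);

/* Placeholder variants: a real build would compile each body with its
 * own ISA flags. Here both just forward to the libc memcpy. */
static void *demo_memcpy_generic(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

static void *demo_memcpy_avx2(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

#if DEMO_ENABLE_RUNTIME_DISPATCH

/* Run-time path: every copy goes through this pointer, so the call
 * cannot be inlined at the call site. */
static demo_memcpy_fn demo_memcpy_ptr = demo_memcpy_generic;

/* Select a variant once, based on what the running CPU reports. */
static void demo_memcpy_init(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        demo_memcpy_ptr = demo_memcpy_avx2;
#endif
}

static void *demo_memcpy(void *dst, const void *src, size_t n)
{
    return demo_memcpy_ptr(dst, src, n);
}

#else

/* Build-time path: the variant is fixed at compile time and the
 * wrapper can be inlined into the caller. */
static void demo_memcpy_init(void)
{
}

static inline void *demo_memcpy(void *dst, const void *src, size_t n)
{
    return demo_memcpy_generic(dst, src, n);
}

#endif
```

A caller would run demo_memcpy_init() once at startup and then use demo_memcpy() everywhere. With the switch enabled, each copy pays for an indirect call, which is where the small-size penalty reported in my earlier mail comes from.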

Best Regards,
Xiaoyun Li

> -----Original Message-----
> From: dev [mailto:[email protected]] On Behalf Of Li, Xiaoyun
> Sent: Tuesday, September 12, 2017 10:27
> To: Wang, Liang-min <[email protected]>; Richardson, Bruce
> <[email protected]>; Ananyev, Konstantin
> <[email protected]>
> Cc: Zhang, Qi Z <[email protected]>; Lu, Wenzhuo
> <[email protected]>; Zhang, Helin <[email protected]>;
> [email protected]; [email protected]
> Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/x86: run-time dispatch over
> memcpy
>
> Hi ALL
>
> After investigating, most DPDK codes are already run-time dispatching. Only
> rte_memcpy chooses the ISA at build-time.
>
> To modify memcpy, there are two ways. The first one is function pointers
> and another is function multi-versioning in GCC.
>
> But memcpy has been greatly optimized and gets benefit from total inline. If
> changing it to run-time dispatching via function pointers, the perf will drop
> a lot especially when copy size is small.
>
> And function multi-versioning in GCC only works for C++. Even if it is said
> that GCC6 can support C, but in fact it does not support C in my trial.
>
>
>
> The attachment is the perf results of memcpy with and without my patch and
> original DPDK codes but without inline.
>
> It's just for comparison, so right now, I only tested on Broadwell, using
> AVX2.
>
> The results are from running test/test/test_memcpy_perf.c.
>
> (C = compile-time constant)
>
> /* Do aligned tests where size is a variable */
>
> /* Do aligned tests where size is a compile-time constant */
>
> /* Do unaligned tests where size is a variable */
>
> /* Do unaligned tests where size is a compile-time constant */
>
>
>
> 4-7 means dpdk costs time 4 and glibc costs time 7
>
> For size smaller than 128 bytes. This patch's perf is bad and even worse than
> glibc.
>
> When size grows, the perf is better than glibc but worse than original dpdk.
>
> And when grows above about 1024 bytes, it performs similarly to original
> dpdk.
>
> Furthermore, if delete inline in original dpdk, the perf are similar to the
> perf with patch.
>
> Different situations(4 types, such as cache to cache) perform differently but
> the trend is the same (size grows, perf grows).
>
>
>
> So if needs dynamic, needs sacrifices some perf and needs to compile for the
> minimum target (e.g. compile for target avx, run on avx, avx2, avx512f).
>
>
>
> Thus, I think this feature shouldn't be delivered in this release.
>
>
>
> Best Regards,
>
> Xiaoyun Li

