On Wed, Jun 10, 2020 at 3:47 AM Harry van Haaren <harry.van.haa...@intel.com> wrote:
>
> v3 Changes Summary:
> - Added new "subtable lookup get" command for ease of use
> - Changed set command to include "prio" aligning with other commands
> - Improved output of "subtable lookup prio set" command
> - Added documentation
> - Minor code cleanups, #defines for magic numbers, typos etc
> - Implement fix for hash-mismatch issue (reported by William Tu)
>

Thanks, I benchmarked v3 with EMC disabled. As with v2, overall
performance is slower with the AVX512 lookup enabled (241.70 vs. 233.21
avg cycles per packet in the stats below), although miniflow_lookup
itself is faster.
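
To put a number on "slower", here is a small sketch that recomputes the avg cycles per packet from the raw counters in the two pmd-stats-show dumps that follow and reports the relative change. The run labels and helper names are made up for illustration; only the counter values come from the output in this mail. The two runs received different packet counts, so per-packet normalization is assumed to be the fair basis for comparison.

```python
# Counters copied from the two pmd-stats-show dumps in this mail.
# The dictionary keys ("generic", "avx512") are illustrative labels only.
RUNS = {
    "generic": {"cycles": 49_779_856_442, "packets": 213_457_536},
    "avx512": {"cycles": 31_506_266_904, "packets": 130_351_552},
}


def cycles_per_packet(run):
    """Processing cycles divided by packets received."""
    return run["cycles"] / run["packets"]


def relative_change_pct(baseline, candidate):
    """Percentage change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100.0


generic_cpp = cycles_per_packet(RUNS["generic"])
avx512_cpp = cycles_per_packet(RUNS["avx512"])
delta_pct = relative_change_pct(generic_cpp, avx512_cpp)

print(f"generic: {generic_cpp:.2f} cycles/packet")
print(f"avx512:  {avx512_cpp:.2f} cycles/packet")
print(f"change:  {delta_pct:+.2f}%")
```

With these counters the AVX512 run comes out roughly 3.6% more cycles per packet than the generic run. This covers the whole datapath pass, not just the subtable lookup, so it does not contradict miniflow_lookup being faster on its own.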

=== Without AVX ===
root@instance-3:~/ovs# ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 0:
  packets received: 213457536
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 0
  smc hits: 0
  megaflow hits: 213457119
  avg. subtable lookups per megaflow hit: 1.00
  miss with success upcall: 1
  miss with failed upcall: 416
  avg. packets per output batch: 0.00
  idle cycles: 0 (0.00%)
  processing cycles: 49779856442 (100.00%)
  avg cycles per packet: 233.21 (49779856442/213457536)
  avg processing cycles per packet: 233.21 (49779856442/213457536)

=== With AVX512 ===
./boot.sh && ./configure CFLAGS="-g -O2 -mpopcnt -msse4.2 -march=native" --enable-Werror --with-dpdk=/usr/src/dpdk/build/ && make -j4 && make install
root@instance-3:~/ovs# ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 0:
  packets received: 130351552
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 0
  smc hits: 0
  megaflow hits: 130351071
  avg. subtable lookups per megaflow hit: 1.00
  miss with success upcall: 1
  miss with failed upcall: 480
  avg. packets per output batch: 0.00
  idle cycles: 0 (0.00%)
  processing cycles: 31506266904 (100.00%)
  avg cycles per packet: 241.70 (31506266904/130351552)
  avg processing cycles per packet: 241.70 (31506266904/130351552)

> v4 Planned work:
> - Add NEWS section
> - Investigate/fix --enable-shared builds link-time issues
> - Enable autovalidator to run with unit-tests without recompilation
>   (Already works now, but requires manual priority change at compile time)
> - Address other feedback on v3
>
>
> This patchset implements the changes as proposed during the
> OVS Conf '19, in the talk "Next steps for SW Datapath".
> Youtube link: https://youtu.be/x0bOpojnpmU
>
> The talk raises 3 main requirements for CPU ISA Optimizations,
> each of which is addressed in some of the patches below.
> - Test & Validation (video @ 2:20)
> - Usability & Debug (video @ 6:00)
> - Package & Deploy (video @ 8:45)
>
> Patch 1/7:
> The test and validation requirements proposed above are implemented,
> with the refactor of the subtable function pointer registration,
> and the autovalidator implementation is added.
>
> Patch 2 & 3 / 7:
> Adds the commands for usability & debug. Now improved with a "get" and
> "set" command. Get returns current priorities and a list of each lookup
> implementation. Set provides feedback to the user as to the number of
> DPCLS ports/subtables that have new lookup functions due to the command
> that was executed.
>
> Patch 4/7:
> Enable CPU ISA detection at runtime, providing information for future
> ISA optimized functions.
>
> Patch 5/7:
> Build system changes to enable the Package & Deploy requirements,
> allowing a single OVS binary to run on all CPUs, but also gain best
> performance from CPU specific ISA optimizations.
>
> Patch 6/7:
> Actual AVX-512 implementation for DPCLS subtable search. This is the
> actual SIMD vector code, which performs DPCLS miniflow iteration in
> parallel.
>
> Patch 7/7:
> Add section in dpdk/bridges.rst on how to use the DPCLS commands, and
> what they can be used for. Testing and validation using autovalidator
> concept introduced, and command to set its priority is provided.
>
>
> Thanks for reading, any questions please let me know.
> Regards, -Harry
>
>
> Harry van Haaren (7):
>   dpif: implement subtable lookup validation
>   dpif-netdev: add subtable lookup set command
>   dpif-netdev: add subtable-lookup-get command for usability
>   dpcls: enable cpu feature detection
>   lib/automake: split build multiple static library
>   dpif-lookup: add avx512 gather implementation
>   docs/dpdk/bridge: add datapath performance section
>
>  Documentation/topics/dpdk/bridge.rst   |  63 ++++++
>  lib/automake.mk                        |  69 +++++--
>  lib/dpdk-stub.c                        |  13 ++
>  lib/dpdk.c                             |  27 +++
>  lib/dpdk.h                             |   2 +
>  lib/dpif-netdev-lookup-autovalidator.c | 106 ++++++++++
>  lib/dpif-netdev-lookup-avx512-gather.c | 265 +++++++++++++++++++++++++
>  lib/dpif-netdev-lookup-generic.c       |   9 +-
>  lib/dpif-netdev-lookup.c               | 111 +++++++++++
>  lib/dpif-netdev-lookup.h               |  82 ++++++++
>  lib/dpif-netdev-private.h              |  15 --
>  lib/dpif-netdev.c                      | 165 ++++++++++++++-
>  12 files changed, 884 insertions(+), 43 deletions(-)
>  create mode 100644 lib/dpif-netdev-lookup-autovalidator.c
>  create mode 100644 lib/dpif-netdev-lookup-avx512-gather.c
>  create mode 100644 lib/dpif-netdev-lookup.c
>  create mode 100644 lib/dpif-netdev-lookup.h
>
> --
> 2.17.1
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev