> -----Original Message-----
> From: Eelco Chaudron <echau...@redhat.com>
> Sent: Thursday, July 14, 2022 2:25 PM
> To: Van Haaren, Harry <harry.van.haa...@intel.com>
> Cc: d...@openvswitch.org; i.maxim...@ovn.org; Amber, Kumar
> <kumar.am...@intel.com>; Pai G, Sunil <sunil.pa...@intel.com>; Finn, Emma
> <emma.f...@intel.com>; Stokes, Ian <ian.sto...@intel.com>
> Subject: Re: [PATCH v10 10/10] odp-execute: Add ISA implementation of
> set_masked IPv4 action
>
> > From: Emma Finn <emma.f...@intel.com>
> >
> > This commit adds support for the AVX512 implementation of the
> > ipv4_set_addrs action as well as an AVX512 implementation of
> > updating the checksums.

<snip>

> > +    /* Update the IP checksum based on updated IP values. */
> > +    uint16_t delta = avx512_ipv4_update_csum(v_res, v_packet);
> > +    uint32_t new_csum = old_csum + delta;
> > +    delta = csum_finish(new_csum);
> > +
> > +    /* Insert new checksum. */
> > +    v_res = _mm256_insert_epi16(v_res, delta, 5);
> > +
> > +    /* If ip_src or ip_dst has been modified, L4 checksum needs to
> > +     * be updated too. */
> > +    if (mask->ipv4_src || mask->ipv4_dst) {
> > +
> > +        uint16_t delta_checksum = avx512_l4_update_csum(v_packet, v_res);
> > +
>
> Wondering if all this AVX code being executed really is faster than
> recalc_csum32(uh->udp_csum, old_addr, new_addr)?

Ultimately, measuring is worth more than talking about it. In our
measurements here, yes, it absolutely is; the results are available in the
cover letter of the patchset.

Note that the code here is compute-bound: it's juggling values between
registers, and with XMM/YMM registers a SIMD IPC of 3 can be achieved. That
means that in theory the SIMD code executes ~3 intrinsics *per cycle*, but
in practice the IPC is often *more* due to interleaved scalar code and the
out-of-order execution capabilities of the CPU.

Although the code is verbose (lots of typing), the resulting instruction
stream is generally optimized very well by the compiler and reduced to very
small, dense and hot loops.

I recommend using "perf top" to investigate the hotspots. For those unaware
of the tools and methods, a DPDK Userspace presentation covers exactly this,
using OVS DPCLS as the example code!
https://youtu.be/ZmwOKR5JyPk

Regards, -Harry

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
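
For readers following the thread, the scalar alternative raised in the question is an incremental checksum update. Below is a minimal sketch of that technique, assuming the standard RFC 1624 one's-complement update, HC' = ~(~HC + ~m + m'). The names fold16(), csum_full(), csum_update32() and csum_self_check() are made up for illustration and are not the OVS recalc_csum32() implementation. Values are treated as host-order 16-bit words to keep it short, and the six-word buffer is invented test data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) sum;
}

/* Full checksum over 'n' 16-bit words (hypothetical helper). */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        sum += words[i];
    }
    return (uint16_t) ~fold16(sum);
}

/* Incremental update for a changed 32-bit field, per RFC 1624:
 * HC' = ~(~HC + ~m + m'), applied to both 16-bit halves. */
static uint16_t csum_update32(uint16_t old_csum, uint32_t old_val,
                              uint32_t new_val)
{
    uint32_t sum = (uint16_t) ~old_csum;

    sum += (uint16_t) ~(old_val >> 16);
    sum += (uint16_t) ~(old_val & 0xffff);
    sum += new_val >> 16;
    sum += new_val & 0xffff;
    return (uint16_t) ~fold16(sum);
}

/* Rewrite a 32-bit field in an invented buffer and confirm that the
 * incremental result matches a full recompute. */
static void csum_self_check(void)
{
    uint16_t words[6] = { 0x4500, 0x0054, 0xc0a8, 0x0001, 0xc0a8, 0x00c7 };
    uint16_t old_csum = csum_full(words, 6);
    uint32_t old_addr = ((uint32_t) words[2] << 16) | words[3];
    uint32_t new_addr = 0x0a000001;

    words[2] = (uint16_t) (new_addr >> 16);
    words[3] = (uint16_t) (new_addr & 0xffff);

    assert(csum_update32(old_csum, old_addr, new_addr)
           == csum_full(words, 6));
}
```

Calling csum_self_check() from a test harness exercises the update. A sketch like this only shows what the scalar path computes; it says nothing about relative speed, which is why measuring both paths with "perf top" on the real code remains the deciding factor.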