On Fri, Jan 9, 2026 at 10:58 AM Morten Brørup <[email protected]> wrote: > > > From: Scott Mitchell [mailto:[email protected]] > > Sent: Friday, 9 January 2026 16.27 > > > > On Fri, Jan 9, 2026 at 4:26 AM Morten Brørup <[email protected]> > > wrote: > > > > > > > Changes in v8: > > > > - __rte_raw_cksum: use native pointer arithmetic instead of > > RTE_PTR_ADD > > > > to avoid incorrect results with -O3 for UDP checksums. Also > > improves > > > > performance due to less assembly generated with Clang. > > > > > > Personally, I also have observed GCC's optimizer behave as if it > > loses some contextual information when using RTE_PTR_ADD, and thus > > emitting less optimal code. > > > I didn't look further into it, and thus have no data or examples to > > back up the claim. Which is why I haven't started a discussion about > > discouraging the use of RTE_PTR_ADD. > > > In other words: I support this change. > > > > Sounds good! I observed ~600 (dpdk ptr macros) vs ~500 (native c ptr > > operations) TSC cycles/block in cksum_perf_autotest. > > That is a significant performance degradation caused by the RTE_PTR_ADD() > macros. We really should look into that - some day. ;-) > Our application code base has RTE_CONST_PTR_ADD/SUB() for type consistency > reasons (not for performance reasons). But I haven't gotten around to > submitting them to the DPDK project yet. > I wonder if the implicit stripping of "const" when using the RTE_PTR_ADD() > macros makes the difference, or if the difference stems from other optimizer > context getting lost. > > > > > > > > > > /* if length is odd, keeping it byte order independent */ > > > > - if (unlikely(len % 2)) { > > > > + if (len & 1) { > > > > uint16_t left = 0; > > > > - > > > > memcpy(&left, end, 1); > > > > sum += left; > > > > } > > > > > > Changing "len % 2" to "len & 1" made sense for consistency in > > previous versions handling 32/16/8/4/2-byte chunks before this 1-byte > > chunk; now it makes no difference, so consider not changing this part > > at all. > > > Under all circumstances, don't remove the unlikely() for handling odd > > length in __rte_raw_cksum(). The vast majority of packets (and partial > > packets, e.g. headers) being checksummed are even length. > > > > > > > Sounds good. I will restore the original. > > > > The use case that motivated these changes was software interfaces > > (veth) > > with encapsulation requiring software checksum on inner IPv4 payloads, > > where lengths may be odd/even. > > You might want to mention the use case in the cover letter or patch > description. > Having a real use case often helps getting a patch accepted, especially for > optimization patches. >
Will do. Big win for this patch is enabling vectorization on clang which doesn't require the changes to the last byte. > > However, I agree that header checksums > > with even lengths are the more common case and unlikely() is > > appropriate. > > In my experience (and based on statistics from our appliances deployed), > internet traffic is dominated by "max size" (1500 byte or QUIC "safe max" > somewhat below 1500 byte) packets and empty TCP ACK packets, which are even > size. > So I also consider the unlikely()applicable to "real life" traffic on the > internet. > Although it's not in the order of magnitude many people advocate should be a > requirement for unlikely(). > 1.5k internet common-case makes sense. We have use cases using jumbo frames where non-full-frames aren't uncommon.

