On Fri, Jan 9, 2026 at 10:58 AM Morten Brørup <[email protected]> 
wrote:
>
> > From: Scott Mitchell [mailto:[email protected]]
> > Sent: Friday, 9 January 2026 16.27
> >
> > On Fri, Jan 9, 2026 at 4:26 AM Morten Brørup <[email protected]>
> > wrote:
> > >
> > > > Changes in v8:
> > > > - __rte_raw_cksum: use native pointer arithmetic instead of
> > RTE_PTR_ADD
> > > >   to avoid incorrect results with -O3 for UDP checksums. Also
> > improves
> > > >   performance due to less assembly generated with Clang.
> > >
> > > Personally, I also have observed GCC's optimizer behave as if it
> > loses some contextual information when using RTE_PTR_ADD, and thus
> > emitting less optimal code.
> > > I didn't look further into it, and thus have no data or examples to
> > back up the claim. Which is why I haven't started a discussion about
> > discouraging the use of RTE_PTR_ADD.
> > > In other words: I support this change.
> >
> > Sounds good! I observed ~600 (dpdk ptr macros) vs ~500 (native c ptr
> > operations) TSC cycles/block in cksum_perf_autotest.
>
> That is a significant performance degradation caused by the RTE_PTR_ADD() 
> macros. We really should look into that - some day. ;-)
> Our application code base has RTE_CONST_PTR_ADD/SUB() for type consistency 
> reasons (not for performance reasons). But I haven't gotten around to 
> submitting them to the DPDK project yet.
> I wonder if the implicit stripping of "const" when using the RTE_PTR_ADD() 
> macros makes the difference, or if the difference stems from other optimizer 
> context getting lost.
>
> >
> > >
> > > >       /* if length is odd, keeping it byte order independent */
> > > > -     if (unlikely(len % 2)) {
> > > > +     if (len & 1) {
> > > >               uint16_t left = 0;
> > > > -
> > > >               memcpy(&left, end, 1);
> > > >               sum += left;
> > > >       }
> > >
> > > Changing "len % 2" to "len & 1" made sense for consistency in
> > previous versions handling 32/16/8/4/2-byte chunks before this 1-byte
> > chunk; now it makes no difference, so consider not changing this part
> > at all.
> > > Under all circumstances, don't remove the unlikely() for handling odd
> > length in __rte_raw_cksum(). The vast majority of packets (and partial
> > packets, e.g. headers) being checksummed are even length.
> > >
> >
> > Sounds good. I will restore the original.
> >
> > The use case that motivated these changes was software interfaces
> > (veth)
> > with encapsulation requiring software checksum on inner IPv4 payloads,
> > where lengths may be odd/even.
>
> You might want to mention the use case in the cover letter or patch 
> description.
> Having a real use case often helps getting a patch accepted, especially for 
> optimization patches.
>

Will do. Big win for this patch is enabling vectorization on clang
which doesn't require the changes to the last byte.

> > However, I agree that header checksums
> > with even lengths are the more common case and unlikely() is
> > appropriate.
>
> In my experience (and based on statistics from our appliances deployed), 
> internet traffic is dominated by "max size" (1500 byte or QUIC "safe max" 
> somewhat below 1500 byte) packets and empty TCP ACK packets, which are even 
> size.
> So I also consider the unlikely()applicable to "real life" traffic on the 
> internet.
> Although it's not in the order of magnitude many people advocate should be a 
> requirement for unlikely().
>

1.5k internet common-case makes sense. We have use cases using jumbo frames
where non-full-frames aren't uncommon.

Reply via email to