On Sat, 7 Feb 2026 at 02:29, Scott Mitchell <[email protected]> wrote:
>
> Thanks for testing! I included my build/host config, results on the
> main branch, and then with this patch applied below. What are your build
> flags/configuration (e.g. cpu_instruction_set, march, optimization
> level, etc.)? I wasn't able to get any Clang version (18, 19, 20) to
> vectorize on Godbolt https://godbolt.org/z/8149r7sq8, and I'm curious if
> your config enables vectorization.
>
> #### build / host config
>   User defined options
>     b_lto              : false
>     buildtype          : release
>     c_args             : -fno-omit-frame-pointer
> -DPACKET_QDISC_BYPASS=1 -DRTE_MEMCPY_AVX512=1
>     cpu_instruction_set: cascadelake
>     default_library    : static
>     max_lcores         : 128
>     optimization       : 3
> $ clang --version
> clang version 18.1.8 (Red Hat, Inc. 18.1.8-3.el9)
> $ cat /etc/redhat-release
> Red Hat Enterprise Linux release 9.4 (Plow)
>
> #### main branch
> $ echo "cksum_perf_autotest" | /usr/local/bin/dpdk-test
> ### rte_raw_cksum() performance ###
> Alignment  Block size    TSC cycles/block  TSC cycles/byte
> Aligned           20                10.0             0.50
> Unaligned         20                10.1             0.50
> Aligned           21                11.1             0.53
> Unaligned         21                11.6             0.55
> Aligned          100                39.4             0.39
> Unaligned        100                67.3             0.67
> Aligned          101                43.3             0.43
> Unaligned        101                41.5             0.41
> Aligned         1500               728.2             0.49
> Unaligned       1500               805.8             0.54
> Aligned         1501               768.8             0.51
> Unaligned       1501               787.3             0.52
> Test OK
>
> #### with this patch
> $ echo "cksum_perf_autotest" | /usr/local/bin/dpdk-test
> ### rte_raw_cksum() performance ###
> Alignment  Block size    TSC cycles/block  TSC cycles/byte
> Aligned           20                12.6             0.63
> Unaligned         20                12.3             0.62
> Aligned           21                13.6             0.65
> Unaligned         21                13.6             0.65
> Aligned          100                22.7             0.23
> Unaligned        100                22.6             0.23
> Aligned          101                47.4             0.47
> Unaligned        101                23.9             0.24
> Aligned         1500                73.9             0.05
> Unaligned       1500                73.9             0.05
> Aligned         1501                95.7             0.06
> Unaligned       1501                73.9             0.05
> Aligned         9000               459.8             0.05
> Unaligned       9000               523.5             0.06
> Aligned         9001               536.7             0.06
> Unaligned       9001               507.5             0.06
> Aligned        65536              3158.4             0.05
> Unaligned      65536              3506.1             0.05
> Aligned        65537              3277.6             0.05
> Unaligned      65537              3697.6             0.06
> Test OK

I redid my benchmark from scratch, and I do see an improvement with clang:
-Aligned         1500               905.3             0.60
-Unaligned       1500               924.9             0.62
-Aligned         1501               907.6             0.60
-Unaligned       1501               932.1             0.62
-Aligned         9000              5252.1             0.58
-Unaligned       9000              5433.0             0.60
-Aligned         9001              5260.9             0.58
-Unaligned       9001              5440.4             0.60
-Aligned        65536             38395.2             0.59
-Unaligned      65536             39639.5             0.60
-Aligned        65537             38030.3             0.58
-Unaligned      65537             39292.7             0.60

+Aligned         1500               104.0             0.07
+Unaligned       1500               106.5             0.07
+Aligned         1501               104.1             0.07
+Unaligned       1501               107.0             0.07
+Aligned         9000               596.7             0.07
+Unaligned       9000               655.1             0.07
+Aligned         9001               597.6             0.07
+Unaligned       9001               657.2             0.07
+Aligned        65536              4139.3             0.06
+Unaligned      65536              4583.2             0.07
+Aligned        65537              4139.9             0.06
+Unaligned      65537              4585.9             0.07

Something was most likely wrong in my earlier test: seeing how close the
gcc and clang numbers looked, I may have been using the gcc binary.
The improvement is noticeable with clang, with no special
cpu_instruction_set or compiler optimisation level set.
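
As a rough illustration of the scale of the change, the before/after
cycles/block figures from the clang run above can be turned into speedup
ratios. This is only a sketch: the helper names (speedup, cycles_per_byte)
are made up for this example and are not part of DPDK or its test suite,
and only a subset of the rows is copied in.

```python
# TSC cycles/block from the clang run quoted above:
# (alignment, block size) -> (before patch, after patch)
results = {
    ("Aligned", 1500): (905.3, 104.0),
    ("Unaligned", 1500): (924.9, 106.5),
    ("Aligned", 9000): (5252.1, 596.7),
    ("Unaligned", 9000): (5433.0, 655.1),
    ("Aligned", 65536): (38395.2, 4139.3),
    ("Unaligned", 65536): (39639.5, 4583.2),
}


def speedup(before, after):
    """Ratio of cycles/block before the patch to cycles/block after it."""
    return before / after


def cycles_per_byte(cycles_per_block, block_size):
    """Same quantity as the last column of the cksum_perf_autotest table."""
    return cycles_per_block / block_size


for (alignment, size), (before, after) in results.items():
    print(f"{alignment:<10}{size:>6}  {speedup(before, after):.1f}x  "
          f"{cycles_per_byte(after, size):.2f} cycles/byte")
```

For these rows the ratio comes out at roughly 8x to 9x fewer cycles per
block with the patch applied.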

I'll finish my checks and merge this nice improvement for rc1.


-- 
David Marchand
