On Sat, 7 Feb 2026 at 02:29, Scott Mitchell <[email protected]> wrote:
>
> Thanks for testing! I included my build/host config, results on the
> main branch, and then with this patch applied below. What are your build
> flags/configuration (e.g. cpu_instruction_set, march, optimization
> level, etc.)? I wasn't able to get any Clang version (18, 19, 20) to
> vectorize on Godbolt https://godbolt.org/z/8149r7sq8, and am curious if
> your config enables vectorization.
>
> #### build / host config
> User defined options
>   b_lto              : false
>   buildtype          : release
>   c_args             : -fno-omit-frame-pointer
>                        -DPACKET_QDISC_BYPASS=1 -DRTE_MEMCPY_AVX512=1
>   cpu_instruction_set: cascadelake
>   default_library    : static
>   max_lcores         : 128
>   optimization       : 3
> $ clang --version
> clang version 18.1.8 (Red Hat, Inc. 18.1.8-3.el9)
> $ cat /etc/redhat-release
> Red Hat Enterprise Linux release 9.4 (Plow)
>
> #### main branch
> $ echo "cksum_perf_autotest" | /usr/local/bin/dpdk-test
> ### rte_raw_cksum() performance ###
> Alignment  Block size  TSC cycles/block  TSC cycles/byte
> Aligned            20              10.0             0.50
> Unaligned          20              10.1             0.50
> Aligned            21              11.1             0.53
> Unaligned          21              11.6             0.55
> Aligned           100              39.4             0.39
> Unaligned         100              67.3             0.67
> Aligned           101              43.3             0.43
> Unaligned         101              41.5             0.41
> Aligned          1500             728.2             0.49
> Unaligned        1500             805.8             0.54
> Aligned          1501             768.8             0.51
> Unaligned        1501             787.3             0.52
> Test OK
>
> #### with this patch
> $ echo "cksum_perf_autotest" | /usr/local/bin/dpdk-test
> ### rte_raw_cksum() performance ###
> Alignment  Block size  TSC cycles/block  TSC cycles/byte
> Aligned            20              12.6             0.63
> Unaligned          20              12.3             0.62
> Aligned            21              13.6             0.65
> Unaligned          21              13.6             0.65
> Aligned           100              22.7             0.23
> Unaligned         100              22.6             0.23
> Aligned           101              47.4             0.47
> Unaligned         101              23.9             0.24
> Aligned          1500              73.9             0.05
> Unaligned        1500              73.9             0.05
> Aligned          1501              95.7             0.06
> Unaligned        1501              73.9             0.05
> Aligned          9000             459.8             0.05
> Unaligned        9000             523.5             0.06
> Aligned          9001             536.7             0.06
> Unaligned        9001             507.5             0.06
> Aligned         65536            3158.4             0.05
> Unaligned       65536            3506.1             0.05
> Aligned         65537            3277.6             0.05
> Unaligned       65537            3697.6             0.06
> Test OK

I redid my bench from scratch and I do see an improvement for clang.

-Aligned         1500             905.3             0.60
-Unaligned       1500             924.9             0.62
-Aligned         1501             907.6             0.60
-Unaligned       1501             932.1             0.62
-Aligned         9000            5252.1             0.58
-Unaligned       9000            5433.0             0.60
-Aligned         9001            5260.9             0.58
-Unaligned       9001            5440.4             0.60
-Aligned        65536           38395.2             0.59
-Unaligned      65536           39639.5             0.60
-Aligned        65537           38030.3             0.58
-Unaligned      65537           39292.7             0.60
+Aligned         1500             104.0             0.07
+Unaligned       1500             106.5             0.07
+Aligned         1501             104.1             0.07
+Unaligned       1501             107.0             0.07
+Aligned         9000             596.7             0.07
+Unaligned       9000             655.1             0.07
+Aligned         9001             597.6             0.07
+Unaligned       9001             657.2             0.07
+Aligned        65536            4139.3             0.06
+Unaligned      65536            4583.2             0.07
+Aligned        65537            4139.9             0.06
+Unaligned      65537            4585.9             0.07

Something was most likely wrong in my earlier test (seeing how close the
gcc and clang numbers looked... I may have been using the gcc binary...).

The improvement is noticeable with clang, with no special
cpu_instruction_set or compiler optimisation level set.

I'll finish my checks and merge this nice improvement for rc1.

-- 
David Marchand
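
For a quick read of the clang numbers above, the before/after ratio per row can be computed with a few lines of Python. This is only an illustrative sketch: the cycles/block values are copied from the diff in this mail, while the names `CLANG_CYCLES`, `speedup` and `speedups` are made up for the example and are not part of DPDK or its test suite.

```python
# TSC cycles/block copied from the clang before/after diff in this mail.
# Keys are (alignment, block size in bytes); values are (before, after).
# All names here are hypothetical helpers, not DPDK APIs.
CLANG_CYCLES = {
    ("Aligned", 1500): (905.3, 104.0),
    ("Unaligned", 1500): (924.9, 106.5),
    ("Aligned", 1501): (907.6, 104.1),
    ("Unaligned", 1501): (932.1, 107.0),
    ("Aligned", 9000): (5252.1, 596.7),
    ("Unaligned", 9000): (5433.0, 655.1),
    ("Aligned", 9001): (5260.9, 597.6),
    ("Unaligned", 9001): (5440.4, 657.2),
    ("Aligned", 65536): (38395.2, 4139.3),
    ("Unaligned", 65536): (39639.5, 4583.2),
    ("Aligned", 65537): (38030.3, 4139.9),
    ("Unaligned", 65537): (39292.7, 4585.9),
}


def speedup(before, after):
    """Return how many times fewer cycles the patched build needs."""
    return before / after


def speedups(results):
    """Map each (alignment, size) key to its before/after ratio."""
    return {key: speedup(b, a) for key, (b, a) in results.items()}


if __name__ == "__main__":
    for (alignment, size), ratio in speedups(CLANG_CYCLES).items():
        print(f"{alignment:<9} {size:>6}  {ratio:.1f}x")
```

With these figures the ratios fall between about 8.3x and 9.3x for every block size from 1500 bytes up, which matches the drop from roughly 0.60 to 0.07 cycles/byte.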

