https://github.com/pcc created https://github.com/llvm/llvm-project/pull/99260

We can use UMAXV.4S to reduce the comparison result in a single
instruction. This improves performance by roughly 4% on Apple M1:

Summary
  bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode 
--sweep-max-size=128 --output=/dev/null --num-trials=10 ran
    1.01 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark3 
--study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null 
--num-trials=10
    1.01 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark3 
--study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null 
--num-trials=10
    1.01 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark3 
--study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null 
--num-trials=10
    1.01 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark2 
--study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null 
--num-trials=10
    1.02 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 
--study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null 
--num-trials=10
    1.03 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 
--study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null 
--num-trials=10
    1.03 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 
--study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null 
--num-trials=10
    1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 
--study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null 
--num-trials=10
    1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 
--study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null 
--num-trials=10
    1.05 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark1 
--study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null 
--num-trials=10
    1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 
--study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null 
--num-trials=10

(The numeric suffix on each benchmark binary identifies the build: 1 = original,
2 = a variant of this patch that uses UMAXV.16B, 3 = this patch.)
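
For readers unfamiliar with the instruction, here is a minimal sketch of the kind of reduction described above. It is not the code from this patch. The helper name `neq16` is hypothetical, and the non-AArch64 branch is only a portable emulation so the example builds anywhere. The idea: XOR the two 16-byte blocks, view the result as four 32-bit lanes, and take the horizontal unsigned maximum (UMAXV.4S, exposed as the `vmaxvq_u32` intrinsic). The result is nonzero exactly when any byte differs.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper, for illustration only: returns a nonzero value if the
// two 16-byte blocks differ and zero if they are identical (bcmp semantics).
inline uint32_t neq16(const unsigned char *a, const unsigned char *b) {
#if defined(__aarch64__)
  uint8x16_t va = vld1q_u8(a);
  uint8x16_t vb = vld1q_u8(b);
  // XOR is zero in every byte where the inputs agree.
  uint8x16_t diff = veorq_u8(va, vb);
  // Reinterpret as 4 x 32-bit lanes and reduce with a single horizontal
  // unsigned max (UMAXV.4S). Any set bit makes the maximum nonzero.
  return vmaxvq_u32(vreinterpretq_u32_u8(diff));
#else
  // Portable emulation of the same lane-wise reduction (assumption: used
  // here only so the sketch compiles on non-AArch64 hosts).
  uint32_t x[4], y[4];
  std::memcpy(x, a, sizeof x);
  std::memcpy(y, b, sizeof y);
  uint32_t max = 0;
  for (int i = 0; i < 4; ++i) {
    uint32_t d = x[i] ^ y[i];
    if (d > max)
      max = d;
  }
  return max;
#endif
}
```

Because bcmp only has to report equal versus not equal, the magnitude of the returned value does not matter. That is why a maximum over wider 32-bit lanes can stand in for a per-byte check.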


