eisenwave wrote: FYI I compared some of this in a benchmark:
<img width="1735" height="975" alt="image" src="https://github.com/user-attachments/assets/ef03118e-9d36-4b97-ac1e-5dfb79fa5d1c" />

https://quick-bench.com/q/lqt9N8l715lwl9I4On2-hNdrV_o

The `naive` versions are simple linear loops, although I did make them branchless; making them branch makes them much slower. The `fast` versions are the Hacker's Delight algorithms, and `native` just uses the `pext`/`pdep` instructions.

So for the codegen, I'm definitely going to go with the Hacker's Delight algorithm, because it always seems to beat the naive form. To be fair, this benchmark isn't representative: it uses uniformly random masks and inputs, which isn't a realistic situation in practice. However, the Hacker's Delight versions are faster even when tested on 100% zeroed data.

https://github.com/llvm/llvm-project/pull/200114
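
For readers who want to see the two software strategies side by side, here is a minimal 32-bit sketch. It is not the code from the benchmark or the PR: the function names (`pext_naive`, `pdep_naive`, `pext_hd`) are hypothetical, and `pext_hd` follows the `compress` routine from Hacker's Delight as commonly reproduced, so the actual implementation may differ in width and details. It assumes C++14 or later for the `constexpr` loops.

```cpp
#include <cstdint>

// Naive, branchless extract: visit every bit position once. The mask bit `m`
// is used as data (0 or 1) instead of being tested with an `if`.
constexpr std::uint32_t pext_naive(std::uint32_t x, std::uint32_t mask) {
    std::uint32_t result = 0;
    unsigned out = 0;  // index of the next output bit to fill
    for (unsigned i = 0; i < 32; ++i) {
        std::uint32_t m = (mask >> i) & 1u;
        result |= ((x >> i) & m) << out;
        out += m;  // advance only when the mask bit was set
    }
    return result;
}

// Naive, branchless deposit: the inverse walk, consuming the low bits of x
// and scattering them to the positions where the mask is set.
constexpr std::uint32_t pdep_naive(std::uint32_t x, std::uint32_t mask) {
    std::uint32_t result = 0;
    unsigned in = 0;  // index of the next input bit to consume
    for (unsigned i = 0; i < 32; ++i) {
        std::uint32_t m = (mask >> i) & 1u;
        result |= ((x >> in) & m) << i;
        in += m;
    }
    return result;
}

// Extract in the style of the Hacker's Delight `compress` routine: five
// rounds (log2 of 32), each moving selected bits right by 1, 2, 4, 8, 16.
constexpr std::uint32_t pext_hd(std::uint32_t x, std::uint32_t m) {
    x &= m;                      // clear bits not selected by the mask
    std::uint32_t mk = ~m << 1;  // we count the 0 bits to the right
    for (unsigned i = 0; i < 5; ++i) {
        std::uint32_t mp = mk ^ (mk << 1);  // parallel suffix (running XOR)
        mp ^= mp << 2;
        mp ^= mp << 4;
        mp ^= mp << 8;
        mp ^= mp << 16;
        std::uint32_t mv = mp & m;          // mask bits that move this round
        m = (m ^ mv) | (mv >> (1u << i));   // compress the mask
        std::uint32_t t = x & mv;
        x = (x ^ t) | (t >> (1u << i));     // compress the data
        mk &= ~mp;
    }
    return x;
}

// Compile-time checks: selecting the low nibble of every byte.
static_assert(pext_naive(0x12345678u, 0x0F0F0F0Fu) == 0x2468u, "naive extract");
static_assert(pext_hd(0x12345678u, 0x0F0F0F0Fu) == 0x2468u, "HD extract");
static_assert(pdep_naive(0x2468u, 0x0F0F0F0Fu) == 0x02040608u, "naive deposit");
```

The naive loops always run 32 iterations, while the Hacker's Delight form runs five rounds of shifts and XORs regardless of the mask. On x86 with BMI2, the `native` counterpart would be the `_pext_u32`/`_pdep_u32` intrinsics from `<immintrin.h>`.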
