On Fri, Dec 5, 2025 at 10:40 PM Nathan Bossart <[email protected]> wrote:
> I don't think the proposed improvements are relevant for either of the
> machines you used for your benchmarks. For x86, we've optimized our
> popcount code to use SSE4.2 or AVX-512, and for AArch64, we've optimized it
> to use Neon or SVE. And for other systems, we still try to use
> __builtin_popcount() and friends in the fallback paths, which IIUC are
> available on both gcc and clang (and maybe elsewhere). IMHO we need to run
> the benchmarks on a compiler/architecture combination where it would
> actually be used in practice.

Yeah, if we did anything here, I'd rather arrange things so that architectures with unconditional hardware support can inline it at compile time. I believe ppc64le and aarch64 can do that unconditionally. For x86, we might be able to detect a symbol defined by the compiler and do the same thing for OSes that require such support.

--
John Naylor
Amazon Web Services
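
For illustration only, here is a rough sketch of the compile-time selection idea discussed above. It is not taken from any patch in this thread: the names HAVE_INLINE_POPCOUNT_SKETCH, popcount64_portable and popcount64_sketch are hypothetical, and the choice of predefined macros is an assumption (gcc and clang define __POPCNT__ on x86 when the build flags allow the POPCNT instruction, and __aarch64__ / __powerpc64__ with __LITTLE_ENDIAN__ identify the other two targets).

```c
#include <stdint.h>

/*
 * Assumption: these compiler-predefined macros are a reasonable proxy for
 * "the hardware instruction is always available for this build".
 */
#if defined(__POPCNT__) || defined(__aarch64__) || \
	(defined(__powerpc64__) && defined(__LITTLE_ENDIAN__))
#define HAVE_INLINE_POPCOUNT_SKETCH 1
#endif

/* Portable fallback: clear the lowest set bit until none remain. */
static inline int
popcount64_portable(uint64_t word)
{
	int			count = 0;

	while (word != 0)
	{
		word &= word - 1;
		count++;
	}
	return count;
}

/*
 * Hypothetical entry point: use the builtin directly when the target is
 * known at compile time to support it, so the call can be inlined without
 * a runtime CPU check; otherwise use the portable loop.
 */
static inline int
popcount64_sketch(uint64_t word)
{
#ifdef HAVE_INLINE_POPCOUNT_SKETCH
	return __builtin_popcountll(word);
#else
	return popcount64_portable(word);
#endif
}
```

With something like this, an x86 build compiled with -mpopcnt (or a baseline such as -march=x86-64-v2) would take the builtin path, while a plain x86-64 build would fall through to the portable loop unless a runtime check is layered on top.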
