On Thu, Jan 15, 2026 at 3:40 AM Nathan Bossart <[email protected]> wrote: > Right now, the organization of this code is weird. All AArch64-specific > implementations live in an AArch64-specific file, the AVX-512 > implementations live in their own file, and the architecture-agnostic and > SSE4.2 implementations live in pg_bitutils.c. The attached patches move > the SSE4.2 implementations to the AVX-512 file (which is renamed > appropriately), and they update some function names to be more descriptive, > i.e., "fast" is replaced with "sse42" and "slow" is replaced with > "generic".
Thanks for taking on some technical debt! 0001 --- a/src/port/pg_popcount_avx512.c +++ b/src/port/pg_popcount_x86_64.c Can we get away with just "x86" for brevity? We generally don't target 32-bit CPUs for this kind of work, so there's no chance of confusion. 0002 ``` -#ifdef USE_AVX512_POPCNT_WITH_RUNTIME_CHECK +#include "port/pg_bitutils.h" + +#ifdef TRY_POPCNT_X86_64 #if defined(HAVE__GET_CPUID) || defined(HAVE__GET_CPUID_COUNT) #include <cpuid.h> #endif ``` With the above in the x86 .c file, I wonder we can get rid of this stanza and the "try" symbol and gate only on HAVE_X86_64_POPCNTQ: #ifdef HAVE_X86_64_POPCNTQ #if defined(HAVE__GET_CPUID) || defined(HAVE__CPUID) #define TRY_POPCNT_X86_64 1 #endif #endif If we have to be cautious, we could just turn the #error on no CPUID symbol into "return false". 0003 s/fast/sse42/: Seems okay in this file, but this isn't the best name, either. Maybe a comment to head off future "corrections", something like: "Technically, POPCNT is not part of SSE 4.2, and is not even a vector operation, but many compilers emit the popcnt instruction with -msse4.2 anyway." s/slow/generic/: I'm ambivalent about this. The "slow" designation is flat-out wrong since at least Power and aarch64 can emit a single instruction here without prodding the compiler. On the other hand, "generic" seems wrong too, since e.g. pg_popcount64_slow() has three configure symbols and two compiler builtins. :-D A possible future project would be to have a truly generic simple fallback in pure C and put all the fancy stuff in the header for architectures that have unconditional hardware support. It would make more sense to revisit the name then. -- John Naylor Amazon Web Services
