Thomas Munro <thomas.mu...@enterprisedb.com> writes:
> On Thu, Feb 14, 2019 at 4:38 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>> I'd be inclined to rip out all of the run-time-detection logic here;
>> I doubt any of it is buying anything that's worth the price of an
>> indirect call.
> No view on that but apparently there were Intel Atom and AMD C chips
> sold in the early part of this decade that lack POPCNT so I suspect
> the distros can't ship software that requires it with no fallback.

Ah, I was not looking at the business with the optional -mpopcnt compiler
flag.  I agree that we probably should not assume that code compiled with
that will run anywhere.  But it's silly to build all this infrastructure
and then throw away the opportunity to optimize for anything but late-model
Intel.  A survey of the buildfarm results so far says that __builtin_clz
and __builtin_ctz exist just about everywhere, and even __builtin_popcount
is available on some non-Intel architectures.  It is reasonable to assume
that those builtins, where they exist, are faster than the C equivalents,
even on old-school Intel hardware.

The way this should have been done is to have a separate file that's
compiled with -mpopcnt if the compiler has that (and has the builtins),
and for the mainline file to have "slow" versions that use the
less-optimized builtins if available, falling back to raw C code only
when HAVE__BUILTIN_WHATEVER is not defined.

Also, in

	#if defined(HAVE__GET_CPUID) && defined(HAVE__BUILTIN_POPCOUNT)

	static bool
	pg_popcount_available(void)
	{
		unsigned int exx[4] = { 0, 0, 0, 0 };

	#if defined(HAVE__GET_CPUID)
		__get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]);
	#elif defined(HAVE__CPUID)
		__cpuid(exx, 1);
	#else
	#error cpuid instruction not available
	#endif

		return (exx[2] & (1 << 23)) != 0;	/* POPCNT */
	}

	#endif

it's obvious to the naked eye that the __cpuid() and #error branches are
unreachable: the outer #if already requires HAVE__GET_CPUID, so the inner
#if always takes its first branch.  I don't think that was the design
intention.

			regards, tom lane
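A minimal C sketch of the layering described above, under several stated assumptions: the HAVE__BUILTIN_* and HAVE__GET_CPUID/HAVE__CPUID symbols would normally come from configure, but here they are derived from compiler predefined macros purely for illustration; every function name (popcount32_slow, popcount32_fast, popcount_available, popcount32, rightmost_one_pos32) is hypothetical and not taken from the patch; and popcount32_fast is only a stand-in for code that would really live in a separate file compiled with -mpopcnt. The CPUID probe uses an outer guard that admits either intrinsic, so the __cpuid() branch is reachable, and it returns false when neither is available.

```c
/*
 * Illustrative sketch only.  The HAVE__* symbols below are derived from
 * compiler macros here; in a real build configure would define them.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
#define HAVE__BUILTIN_POPCOUNT 1
#define HAVE__BUILTIN_CTZ 1
#endif

#if (defined(__GNUC__) || defined(__clang__)) && \
	(defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE__GET_CPUID 1
#elif defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#define HAVE__CPUID 1
#endif

/* Mainline "slow" popcount: plain builtin if available, else raw C. */
static int
popcount32_slow(uint32_t word)
{
#ifdef HAVE__BUILTIN_POPCOUNT
	return __builtin_popcount(word);
#else
	int			result = 0;

	while (word != 0)
	{
		result += word & 1;
		word >>= 1;
	}
	return result;
#endif
}

/* Same pattern for ctz; the input must be nonzero. */
static int
rightmost_one_pos32(uint32_t word)
{
	assert(word != 0);
#ifdef HAVE__BUILTIN_CTZ
	return __builtin_ctz(word);
#else
	int			result = 0;

	while ((word & 1) == 0)
	{
		word >>= 1;
		result++;
	}
	return result;
#endif
}

/*
 * Stand-in for the -mpopcnt variant.  In a real build this would be a
 * separate translation unit compiled with -mpopcnt, so that the builtin
 * becomes a single POPCNT instruction.
 */
static int
popcount32_fast(uint32_t word)
{
	return popcount32_slow(word);
}

/* Outer guard accepts either intrinsic, so both inner branches are live. */
#if defined(HAVE__GET_CPUID) || defined(HAVE__CPUID)
static bool
popcount_available(void)
{
	unsigned int exx[4] = {0, 0, 0, 0};

#if defined(HAVE__GET_CPUID)
	__get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]);
#else
	__cpuid((int *) exx, 1);
#endif
	return (exx[2] & (1u << 23)) != 0;	/* POPCNT bit in ECX */
}
#else
static bool
popcount_available(void)
{
	return false;				/* no way to ask, so assume no POPCNT */
}
#endif

static int	popcount32_choose(uint32_t word);

/* Starts at the chooser; the first call repoints it at the real variant. */
static int	(*popcount32) (uint32_t word) = popcount32_choose;

static int
popcount32_choose(uint32_t word)
{
	popcount32 = popcount_available() ? popcount32_fast : popcount32_slow;
	return popcount32(word);
}
```

With this arrangement, popcount32(0xF0F0u) yields 8 whichever variant is picked. The CPUID probe runs only once, but every later call still goes through the function pointer; that indirect call is the cost weighed in the quoted message, which is why the plain builtins are worth using directly in the mainline code.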