Hi, On 2024-07-30 20:20:34 -0500, Nathan Bossart wrote: > On Tue, Jul 30, 2024 at 05:49:59PM -0700, Andres Freund wrote: > > Ah, I somehow thought we'd avoid the runtime check in case we determine at > > compile time we don't need any extra flags to enable the AVX512 stuff > > (similar > > to how we deal with crc32). But it looks like that's not the case - which > > seems pretty odd to me: > > > > This turns something that can be a single instruction into an indirect > > function call, even if we could know that it's guaranteed to be available > > for > > the compilation target, due to -march=.... > > > > It's one thing for the avx512 path to have that overhead, but it's > > particularly absurd for pg_popcount32/pg_popcount64, where > > > > a) The function call overhead is a larger proportion of the cost. > > b) the instruction is almost universally available, including in the > > architecture baseline x86-64-v2, which several distros are using as the > > x86-64 baseline. > > Yeah, pg_popcount32/64 have been doing this since v12 (02a6a54). Until v17 > (cc4826d), pg_popcount() repeatedly calls these function pointers, too. I > think it'd be awesome if we could start requiring some of these "almost > universally available" instructions, but AFAICT that brings its own > complexity [0].
I'll respond there... > > Why are we actually checking for xsave? We're not using xsave itself and I > > couldn't find a comment in 792752af4eb5 explaining what we're using it as a > > proxy for? Is that just to know if _xgetbv() exists? Is it actually > > possible > > that xsave isn't available when avx512 is? > > Yes, it's to verify we have XGETBV, which IIUC requires support from both > the processor and the OS (see 598e011 and upthread discussion). AFAIK the > way we are detecting AVX-512 support is quite literally by-the-book unless > I've gotten something wrong. I'm basically wondering whether we need to check for compiler (not OS support) support for xsave if we also check for -mavx512vpopcntdq -mavx512bw support. Afaict the latter implies support for xsave. andres@alap6:~$ echo|gcc -c - -march=x86-64 -xc -dM -E - -o -|grep '__XSAVE__' andres@alap6:~$ echo|gcc -c - -march=x86-64 -mavx512vpopcntdq -mavx512bw -xc -dM -E - -o -|grep '__XSAVE__' #define __XSAVE__ 1 #define __XSAVE__ 1 Greetings, Andres Freund