On 29.11.23 18:15, Nathan Bossart wrote:
> Using the same benchmark as we did for the SSE2 linear searches in
> XidInMVCCSnapshot() (commit 37a6e5d) [1] [2], I see the following:
>
>   writers  sse2  avx2    %
>       256  1195  1188   -1
>       512   928  1054  +14
>      1024   633   716  +13
>      2048   332   420  +27
>      4096   162   203  +25
>      8192   162   182  +12

AFAICT, your patch merely provides an alternative AVX2 implementation
for the places where SSE2 is currently supported, but it doesn't provide
any new API calls or new functionality. One might naively expect that these are
just two different ways to call the underlying primitives in the CPU, so
these performance improvements are surprising to me. Or do the CPUs
actually have completely separate machinery for SSE2 and AVX2, and just
using the latter to do the same thing is faster?
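
For context on the question above, here is a minimal sketch of the same
linear search written once with SSE2 intrinsics and once with AVX2
intrinsics. It is not the code from the patch: the function names
(lfind32_scalar, lfind32_sse2, lfind32_avx2, lfind32) and the
SKETCH_HAVE_X86_SIMD macro are hypothetical, and it assumes GCC or Clang
on x86-64, falling back to a scalar loop elsewhere. The one difference it
is meant to show is register width: SSE2 integer instructions operate on
128-bit registers (four 32-bit values per comparison), while AVX2 ones
operate on 256-bit registers (eight per comparison).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_HAVE_X86_SIMD 1	/* hypothetical macro, for this sketch only */
#endif

/* Scalar reference: one comparison per element. */
static bool
lfind32_scalar(uint32_t key, const uint32_t *base, size_t nelem)
{
	for (size_t i = 0; i < nelem; i++)
	{
		if (base[i] == key)
			return true;
	}
	return false;
}

#ifdef SKETCH_HAVE_X86_SIMD
/* SSE2: 128-bit registers, four 32-bit lanes per comparison. */
static bool
lfind32_sse2(uint32_t key, const uint32_t *base, size_t nelem)
{
	const __m128i keys = _mm_set1_epi32((int) key);
	size_t		i = 0;

	for (; i + 4 <= nelem; i += 4)
	{
		__m128i		chunk = _mm_loadu_si128((const __m128i *) &base[i]);

		/* any matching lane sets bits in the byte mask */
		if (_mm_movemask_epi8(_mm_cmpeq_epi32(chunk, keys)) != 0)
			return true;
	}
	/* fewer than four elements left: finish with the scalar loop */
	return lfind32_scalar(key, base + i, nelem - i);
}

/* AVX2: 256-bit registers, eight 32-bit lanes per comparison. */
__attribute__((target("avx2")))
static bool
lfind32_avx2(uint32_t key, const uint32_t *base, size_t nelem)
{
	const __m256i keys = _mm256_set1_epi32((int) key);
	size_t		i = 0;

	for (; i + 8 <= nelem; i += 8)
	{
		__m256i		chunk = _mm256_loadu_si256((const __m256i *) &base[i]);

		if (_mm256_movemask_epi8(_mm256_cmpeq_epi32(chunk, keys)) != 0)
			return true;
	}
	/* fewer than eight elements left: finish with the scalar loop */
	return lfind32_scalar(key, base + i, nelem - i);
}
#endif

/* Pick the widest variant the running CPU supports. */
static bool
lfind32(uint32_t key, const uint32_t *base, size_t nelem)
{
	assert(base != NULL || nelem == 0);
#ifdef SKETCH_HAVE_X86_SIMD
	if (__builtin_cpu_supports("avx2"))
		return lfind32_avx2(key, base, nelem);
	return lfind32_sse2(key, base, nelem);
#else
	return lfind32_scalar(key, base, nelem);
#endif
}
```

Both variants return the same answers; the AVX2 loop just needs about
half as many iterations over the same array. Whether that accounts for
the numbers quoted above is not something this sketch can show.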