On Wed, 12 Jun 2024, Paolo Bonzini wrote:

> On Wed, Jun 12, 2024 at 1:19 PM Alexander Monakov <amona...@ispras.ru> wrote:
> > On Wed, 12 Jun 2024, Paolo Bonzini wrote:
> > > I didn't do this because of RHEL9, I did it because it's silly that
> > > QEMU cannot use POPCNT and has to waste 2% of the L1 d-cache to
> > > compute the x86 parity flag (and POPCNT was introduced at the same
> > > time as SSE4.2).
> >
> > From looking at that POPCNT patch I understood that Qemu detects
> > presence of POPCNT at runtime and will only use the fallback when
> > POPCNT is unavailable. Did I misunderstand?
>
> -mpopcnt allows GCC to generate the POPCNT instruction for helper
> code. Right now we have code like this in
> target/i386/tcg/cc_helper_template.h:
>
>     pf = parity_table[(uint8_t)dst];
>
> and it could be instead something like
>
> #if defined __i386__ || defined __x86_64__ || defined __s390x__||
> defined __riscv_zbb

GCC also predefines __POPCNT__ when -mpopcnt is active, so that would be
available for ifdef testing like above, but...

> static inline unsigned int compute_pf(uint8_t x)
> {
>     return __builtin_parity(x) * CC_P;
> }
> #else
> extern const uint8_t parity_table[256];
> static inline unsigned int compute_pf(uint8_t x)
> {
>     return parity_table[x];
> }
> #endif
>
> The code generated for __builtin_parity, if you don't have it
> available in hardware, is pretty bad.

On x86 parity _is_ available in baseline ISA, no? Here's what gcc-14 generates:

        xor     eax, eax
        test    dil, dil
        setnp   al
        sal     eax, 2

and with -mpopcnt:

        movsx   eax, dil
        popcnt  eax, eax
        and     eax, 1
        sal     eax, 2

Alexander