On Wed, Jun 12, 2024 at 1:19 PM Alexander Monakov <amona...@ispras.ru> wrote:
> On Wed, 12 Jun 2024, Paolo Bonzini wrote:
> > I didn't do this because of RHEL9, I did it because it's silly that
> > QEMU cannot use POPCNT and has to waste 2% of the L1 d-cache to
> > compute the x86 parity flag (and POPCNT was introduced at the same
> > time as SSE4.2).
>
> From looking at that POPCNT patch I understood that Qemu detects
> presence of POPCNT at runtime and will only use the fallback when
> POPCNT is unavailable. Did I misunderstand?

-mpopcnt allows GCC to generate the POPCNT instruction for helper code.
Right now we have code like this in target/i386/tcg/cc_helper_template.h:

    pf = parity_table[(uint8_t)dst];

and it could be instead something like

    #if defined __i386__ || defined __x86_64__ || defined __s390x__ || defined __riscv_zbb
    static inline unsigned int compute_pf(uint8_t x)
    {
        /* x86 PF is set for an even number of 1 bits, hence the negation */
        return !__builtin_parity(x) * CC_P;
    }
    #else
    extern const uint8_t parity_table[256];
    static inline unsigned int compute_pf(uint8_t x)
    {
        return parity_table[x];
    }
    #endif

The code generated for __builtin_parity, if you don't have it available
in hardware, is pretty bad.

Paolo
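
As a rough sanity check of the idea (a standalone sketch, not code from
the QEMU tree), the builtin-based computation can be compared against a
bit-by-bit reference for all 256 byte values. The names pf_reference,
pf_builtin and pf_mismatches are made up for illustration, and CC_P is
assumed to be the PF bit of EFLAGS (0x0004). Note the negation:
__builtin_parity returns 1 for an odd number of set bits, while PF is set
when the count is even.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: CC_P is the parity flag bit in EFLAGS (bit 2). */
#define CC_P 0x0004

/* Hypothetical reference: count the set bits of the byte one at a time. */
static unsigned int pf_reference(uint8_t x)
{
    unsigned int ones = 0;
    for (int i = 0; i < 8; i++) {
        ones += (x >> i) & 1u;
    }
    /* PF is set when the number of 1 bits is even. */
    return (ones % 2 == 0) ? CC_P : 0;
}

/* Builtin-based variant; negated because the builtin reports odd parity. */
static unsigned int pf_builtin(uint8_t x)
{
    return !__builtin_parity(x) * CC_P;
}

/* Returns the number of byte values on which the two variants disagree. */
static int pf_mismatches(void)
{
    int bad = 0;
    for (int v = 0; v < 256; v++) {
        if (pf_reference((uint8_t)v) != pf_builtin((uint8_t)v)) {
            bad++;
        }
    }
    return bad;
}
```

If pf_mismatches() returns 0, the two agree on every byte, which is the
property a table-free compute_pf would need in order to replace the
parity_table lookup.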