On Wed, 12 Jun 2024, Paolo Bonzini wrote:

> On Wed, Jun 12, 2024 at 1:19 PM Alexander Monakov <amona...@ispras.ru> wrote:
> > On Wed, 12 Jun 2024, Paolo Bonzini wrote:
> > > I didn't do this because of RHEL9, I did it because it's silly that
> > > QEMU cannot use POPCNT and has to waste 2% of the L1 d-cache to
> > > compute the x86 parity flag (and POPCNT was introduced at the same
> > > time as SSE4.2).
> >
> > From looking at that POPCNT patch I understood that Qemu detects
> > presence of POPCNT at runtime and will only use the fallback when
> > POPCNT is unavailable. Did I misunderstand?
> 
> -mpopcnt allows GCC to generate the POPCNT instruction for helper
> code. Right now we have code like this in
> target/i386/tcg/cc_helper_template.h:
> 
>     pf = parity_table[(uint8_t)dst];
> 
> and it could be instead something like
> 
> #if defined __i386__ || defined __x86_64__ || defined __s390x__ || \
>     defined __riscv_zbb

GCC also predefines __POPCNT__ when -mpopcnt is active, so that would be
available for ifdef testing like above, but...

> static inline unsigned int compute_pf(uint8_t x)
> {
>     return __builtin_parity(x) * CC_P;
> }
> #else
> extern const uint8_t parity_table[256];
> static inline unsigned int compute_pf(uint8_t x)
> {
>     return parity_table[x];
> }
> #endif
> 
> The code generated for __builtin_parity, if you don't have it
> available in hardware, is pretty bad.

On x86 parity _is_ available in the baseline ISA, no? Here's what gcc-14 generates:

        xor     eax, eax
        test    dil, dil
        setnp   al
        sal     eax, 2

and with -mpopcnt:

        movsx   eax, dil
        popcnt  eax, eax
        and     eax, 1
        sal     eax, 2
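
For anyone who wants to check the polarity, here is a minimal sketch, not the
QEMU code. It assumes CC_P is 0x0004 (consistent with the shift by 2 in both
listings), and pf_reference(), pf_builtin() and pf_mismatches() are
hypothetical names used only for illustration. x86 sets PF when the low byte
has an even number of set bits, while __builtin_parity() returns 1 for an odd
count, so the builtin is negated here.

```c
#include <assert.h>
#include <stdint.h>

#define CC_P 0x0004  /* assumed PF mask, i.e. bit 2 of EFLAGS */

/* Hypothetical stand-in for a table lookup: CC_P when x has an even
 * number of set bits, 0 otherwise (the x86 PF convention). */
static unsigned int pf_reference(uint8_t x)
{
    unsigned int ones = 0;

    for (int i = 0; i < 8; i++) {
        ones += (x >> i) & 1;
    }
    return (ones % 2 == 0) ? CC_P : 0;
}

/* __builtin_parity() yields 1 for an odd number of set bits, so it is
 * negated to obtain PF polarity before scaling by CC_P. */
static unsigned int pf_builtin(uint8_t x)
{
    return (!__builtin_parity(x)) * CC_P;
}

/* Counts the byte values on which the two versions disagree. */
static int pf_mismatches(void)
{
    int bad = 0;

    for (int x = 0; x < 256; x++) {
        bad += pf_reference((uint8_t)x) != pf_builtin((uint8_t)x);
    }
    return bad;
}
```

Calling pf_mismatches() from a test harness should report zero mismatches if
the negation is right; compiling the same file with and without -mpopcnt is
one way to compare the generated code for pf_builtin().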

Alexander
