On Wed, Jun 12, 2024 at 1:19 PM Alexander Monakov <amona...@ispras.ru> wrote:
> On Wed, 12 Jun 2024, Paolo Bonzini wrote:
> > I didn't do this because of RHEL9, I did it because it's silly that
> > QEMU cannot use POPCNT and has to waste 2% of the L1 d-cache to
> > compute the x86 parity flag (and POPCNT was introduced at the same
> > time as SSE4.2).
>
> From looking at that POPCNT patch I understood that Qemu detects
> presence of POPCNT at runtime and will only use the fallback when
> POPCNT is unavailable. Did I misunderstand?

-mpopcnt allows GCC to generate the POPCNT instruction for helper
code. Right now we have code like this in
target/i386/tcg/cc_helper_template.h:

    pf = parity_table[(uint8_t)dst];

and it could instead be something like:

#if defined __i386__ || defined __x86_64__ || defined __s390x__ || \
    defined __riscv_zbb
static inline unsigned int compute_pf(uint8_t x)
{
    /* PF is set for an even number of 1 bits; __builtin_parity is 1 for odd. */
    return !__builtin_parity(x) * CC_P;
}
#else
extern const uint8_t parity_table[256];
static inline unsigned int compute_pf(uint8_t x)
{
    return parity_table[x];
}
#endif

Without hardware support for it, the code GCC generates for
__builtin_parity is pretty bad, hence the table fallback on other hosts.
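
As a sanity check on the semantics, here is a minimal standalone sketch
(not QEMU code) that compares a builtin-based PF computation against a
slow bit-counting reference for all 256 byte values. CC_P is defined
locally as 0x0004, the EFLAGS parity bit; pf_reference, pf_builtin and
count_pf_mismatches are hypothetical names used only for illustration.
x86 sets PF when the low byte has an even number of 1 bits, while
__builtin_parity() returns 1 for an odd count, so the builtin result is
inverted before scaling.

```c
#include <stdint.h>

#define CC_P 0x0004  /* x86 EFLAGS parity flag (bit 2), defined locally */

/* Reference: count the set bits one by one; PF is set when the count is even. */
static unsigned int pf_reference(uint8_t x)
{
    unsigned int ones = 0;
    for (int i = 0; i < 8; i++) {
        ones += (x >> i) & 1;
    }
    return (ones % 2 == 0) ? CC_P : 0;
}

/* Builtin variant: __builtin_parity() is 1 for an odd bit count, so invert it. */
static unsigned int pf_builtin(uint8_t x)
{
    return !__builtin_parity(x) * CC_P;
}

/* Hypothetical helper: number of byte values on which the two versions disagree. */
static int count_pf_mismatches(void)
{
    int mismatches = 0;
    for (int v = 0; v < 256; v++) {
        if (pf_reference((uint8_t)v) != pf_builtin((uint8_t)v)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Calling count_pf_mismatches() from a small main() should return 0 if
the two versions agree. Building with and without -mpopcnt and looking
at the disassembly of pf_builtin is one way to see what the compiler
emits in each case.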

Paolo

