Il gio 28 dic 2023, 21:45 Richard Henderson <richard.hender...@linaro.org>
ha scritto:

> On 12/28/23 23:05, Paolo Bonzini wrote:
> > In the case where OR or XOR has an 8-bit immediate between 128 and 255,
> we can
> > operate on a low-byte register and shorten the output by two or three
> bytes
> > (two if a prefix byte is needed for REX.B).
> >
> > Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> > ---
> >   tcg/i386/tcg-target.c.inc | 7 +++++++
> >   1 file changed, 7 insertions(+)
>
> At least once upon a time the partial register stall like this was quite
> slow.  IIRC there
> have been improvements in the last couple of generations, but it's still
> slower.
>
> Data to show this is worthwhile?
>

To be honest I simply had noticed that GCC generates it just fine these
days.

However, Agner Fog says that the (previously very high) penalty for partial
register access became just 1 uop starting with the Pentium D, and it's
gone completely except for AH/BH/CH/DH starting with Haswell.

On Atom and AMD processors there's a false dependency on the rest of the
register, but you'd have a (true) dependency anyway for OR r32, imm. The
only case where the false dependency matters is for instructions such as
MOV AL, imm; these have such a dependency on Atom and AMD processors but
not on recent Intel processors.

Paolo


>
> r~
>
> >
> > diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
> > index 1791b959738..a24a23f43b1 100644
> > --- a/tcg/i386/tcg-target.c.inc
> > +++ b/tcg/i386/tcg-target.c.inc
> > @@ -244,6 +244,7 @@ static bool tcg_target_const_match(int64_t val,
> TCGType type, int ct, int vece)
> >   #define P_VEXL          0x80000         /* Set VEX.L = 1 */
> >   #define P_EVEX          0x100000        /* Requires EVEX encoding */
> >
> > +#define OPC_ARITH_EbIb       (0x80)
> >   #define OPC_ARITH_EvIz      (0x81)
> >   #define OPC_ARITH_EvIb      (0x83)
> >   #define OPC_ARITH_GvEv      (0x03)          /* ... plus (ARITH_FOO <<
> 3) */
> > @@ -1366,6 +1367,12 @@ static void tgen_arithi(TCGContext *s, int c, int
> r0,
> >           tcg_out8(s, val);
> >           return;
> >       }
> > +    if (val == (uint8_t)val && (c == ARITH_OR || c == ARITH_XOR) &&
> > +        (r0 < 4 || TCG_TARGET_REG_BITS == 64)) {
> > +        tcg_out_modrm(s, OPC_ARITH_EbIb + P_REXB_RM, c, r0);
> > +        tcg_out8(s, val);
> > +        return;
> > +    }
> >       if (rexw == 0 || val == (int32_t)val) {
> >           tcg_out_modrm(s, OPC_ARITH_EvIz + rexw, c, r0);
> >           tcg_out32(s, val);
>
>

Reply via email to