The following patch uses correct SSE register class; vzeroupper
operates only on lower 16 (8 on 32bit target) SSE registers.

2019-10-08  Uroš Bizjak  <ubiz...@gmail.com>

    PR target/91994
    * config/i386/i386.c (x86_avx_u128_mode_needed): Use SSE_REG
    instead of ALL_SSE_REG to check if function call preserves some
    256-bit SSE registers.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.

On Tue, Oct 1, 2019 at 12:14 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Wed, Sep 25, 2019 at 5:48 PM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>
> > > The comment suggests that this code is only needed for Win64 and that
> > > not testing for Win64 is just a simplification.  But in practice it was
> > > needed for correctness on GNU/Linux and other targets too, since without
> > > it the RA would be able to keep 256-bit and 512-bit values in SSE
> > > registers across calls that are known not to clobber them.
> > >
> > > This patch conservatively treats calls as AVX_U128_ANY if the RA can see
> > > that some SSE registers are not touched by a call.  There are then no
> > > regressions if the ix86_hard_regno_call_part_clobbered check is disabled
> > > for GNU/Linux (not something we should do, was just for testing).
>
> If RA can sse that some SSE regs are not touched by the call, then we
> are sure that the called function is part of the current TU. In this
> case, the called function will be compiled using VEX instructions,
> where there is no AVX-SSE transition penalty. So, skipping VZEROUPPER
> is beneficial here.
>
> Uros.
>
> > > If in fact we want -fipa-ra to pretend that all functions clobber
> > > SSE registers above 128 bits, it'd certainly be possible to arrange
> > > that.  But IMO that would be an optimisation decision, whereas what
> > > the patch is fixing is a correctness decision.  So I think we should
> > > have this check even so.
> >
> > 2019-09-25  Richard Sandiford  <richard.sandif...@arm.com>
> >
> > gcc/
> >         * config/i386/i386.c: Include function-abi.h.
> >         (ix86_avx_u128_mode_needed): Treat function calls as AVX_U128_ANY
> >         if they preserve some 256-bit or 512-bit SSE registers.
> >
> > Index: gcc/config/i386/i386.c
> > ===================================================================
> > --- gcc/config/i386/i386.c      2019-09-25 16:47:48.000000000 +0100
> > +++ gcc/config/i386/i386.c      2019-09-25 16:47:49.089962608 +0100
> > @@ -95,6 +95,7 @@ #define IN_TARGET_CODE 1
> >  #include "i386-builtins.h"
> >  #include "i386-expand.h"
> >  #include "i386-features.h"
> > +#include "function-abi.h"
> >
> >  /* This file should be included last.  */
> >  #include "target-def.h"
> > @@ -13511,6 +13512,15 @@ ix86_avx_u128_mode_needed (rtx_insn *ins
> >             }
> >         }
> >
> > +      /* If the function is known to preserve some SSE registers,
> > +        RA and previous passes can legitimately rely on that for
> > +        modes wider than 256 bits.  It's only safe to issue a
> > +        vzeroupper if all SSE registers are clobbered.  */
> > +      const function_abi &abi = insn_callee_abi (insn);
> > +      if (!hard_reg_set_subset_p (reg_class_contents[ALL_SSE_REGS],
> > +                                 abi.mode_clobbers (V4DImode)))
> > +       return AVX_U128_ANY;
> > +
> >        return AVX_U128_CLEAN;
> >      }
> >
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c  (revision 276677)
+++ config/i386/i386.c  (working copy)
@@ -13530,7 +13530,7 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
         modes wider than 256 bits.  It's only safe to issue a
         vzeroupper if all SSE registers are clobbered.  */
       const function_abi &abi = insn_callee_abi (insn);
-      if (!hard_reg_set_subset_p (reg_class_contents[ALL_SSE_REGS],
+      if (!hard_reg_set_subset_p (reg_class_contents[SSE_REGS],
                                  abi.mode_clobbers (V4DImode)))
        return AVX_U128_ANY;
 

Reply via email to