The following patch uses correct SSE register class; vzeroupper operates only on lower 16 (8 on 32bit target) SSE registers.
2019-10-08 Uroš Bizjak <ubiz...@gmail.com> PR target/91994 * config/i386/i386.c (x86_avx_u128_mode_needed): Use SSE_REG instead of ALL_SSE_REG to check if function call preserves some 256-bit SSE registers. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Committed to mainline SVN. Uros. On Tue, Oct 1, 2019 at 12:14 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > On Wed, Sep 25, 2019 at 5:48 PM Richard Sandiford > <richard.sandif...@arm.com> wrote: > > > > The comment suggests that this code is only needed for Win64 and that > > > not testing for Win64 is just a simplification. But in practice it was > > > needed for correctness on GNU/Linux and other targets too, since without > > > it the RA would be able to keep 256-bit and 512-bit values in SSE > > > registers across calls that are known not to clobber them. > > > > > > This patch conservatively treats calls as AVX_U128_ANY if the RA can see > > > that some SSE registers are not touched by a call. There are then no > > > regressions if the ix86_hard_regno_call_part_clobbered check is disabled > > > for GNU/Linux (not something we should do, was just for testing). > > If RA can sse that some SSE regs are not touched by the call, then we > are sure that the called function is part of the current TU. In this > case, the called function will be compiled using VEX instructions, > where there is no AVX-SSE transition penalty. So, skipping VZEROUPPER > is beneficial here. > > Uros. > > > > If in fact we want -fipa-ra to pretend that all functions clobber > > > SSE registers above 128 bits, it'd certainly be possible to arrange > > > that. But IMO that would be an optimisation decision, whereas what > > > the patch is fixing is a correctness decision. So I think we should > > > have this check even so. > > > > 2019-09-25 Richard Sandiford <richard.sandif...@arm.com> > > > > gcc/ > > * config/i386/i386.c: Include function-abi.h. > > (ix86_avx_u128_mode_needed): Treat function calls as AVX_U128_ANY > > if they preserve some 256-bit or 512-bit SSE registers. > > > > Index: gcc/config/i386/i386.c > > =================================================================== > > --- gcc/config/i386/i386.c 2019-09-25 16:47:48.000000000 +0100 > > +++ gcc/config/i386/i386.c 2019-09-25 16:47:49.089962608 +0100 > > @@ -95,6 +95,7 @@ #define IN_TARGET_CODE 1 > > #include "i386-builtins.h" > > #include "i386-expand.h" > > #include "i386-features.h" > > +#include "function-abi.h" > > > > /* This file should be included last. */ > > #include "target-def.h" > > @@ -13511,6 +13512,15 @@ ix86_avx_u128_mode_needed (rtx_insn *ins > > } > > } > > > > + /* If the function is known to preserve some SSE registers, > > + RA and previous passes can legitimately rely on that for > > + modes wider than 256 bits. It's only safe to issue a > > + vzeroupper if all SSE registers are clobbered. */ > > + const function_abi &abi = insn_callee_abi (insn); > > + if (!hard_reg_set_subset_p (reg_class_contents[ALL_SSE_REGS], > > + abi.mode_clobbers (V4DImode))) > > + return AVX_U128_ANY; > > + > > return AVX_U128_CLEAN; > > } > >
Index: config/i386/i386.c =================================================================== --- config/i386/i386.c (revision 276677) +++ config/i386/i386.c (working copy) @@ -13530,7 +13530,7 @@ ix86_avx_u128_mode_needed (rtx_insn *insn) modes wider than 256 bits. It's only safe to issue a vzeroupper if all SSE registers are clobbered. */ const function_abi &abi = insn_callee_abi (insn); - if (!hard_reg_set_subset_p (reg_class_contents[ALL_SSE_REGS], + if (!hard_reg_set_subset_p (reg_class_contents[SSE_REGS], abi.mode_clobbers (V4DImode))) return AVX_U128_ANY;