Re: [PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]

2024-02-14 Thread Richard Sandiford
Alex Coplan  writes:
> This is a backport of the GCC 13 fix for PR111677 to the GCC 12 branch.
> The only part of the patch that isn't a straight cherry-pick is due to
> the TX iterator lacking TDmode for GCC 12, so this version adjusts
> TX_V16QI accordingly.
>
> Bootstrapped/regtested on aarch64-linux-gnu, the only changes in the
> testsuite I saw were in
> gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c where the dg-output
> "READ of size 4 [...]" check appears to be flaky on the GCC 12 branch
> since libhwasan gained the short granule tag feature, I've requested a
> backport of the following patch (committed as
> r13-100-g3771486daa1e904ceae6f3e135b28e58af33849f) which should fix that
> (independent) issue for GCC 12:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645278.html
>
> OK for the GCC 12 branch?

OK, thanks.

Richard

> Thanks,
> Alex
>
> -- >8 --
>
> The PR shows us ICEing due to an unrecognizable TFmode save emitted by
> aarch64_process_components.  The problem is that for T{I,F,D}mode we
> conservatively require mems to be in range for x-register ldp/stp.  That
> is because (at least for TImode) it can be allocated to both GPRs and
> FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
> a q-register load/store.
>
> As Richard pointed out in the PR, aarch64_get_separate_components
> already checks that the offsets are suitable for a single load, so we
> just need to choose a mode in aarch64_reg_save_mode that gives the full
> q-register range.  In this patch, we choose V16QImode as an alternative
> 16-byte "bag-of-bits" mode that doesn't have the artificial range
> restrictions imposed on T{I,F,D}mode.
>
> Unlike for GCC 14 we need additional handling in the load/store pair
> code as various cases are not expecting to see V16QImode (particularly
> the writeback patterns, but also aarch64_gen_load_pair).
>
> gcc/ChangeLog:
>
>   PR target/111677
>   * config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
>   V16QImode for the full 16-byte FPR saves in the vector PCS case.
>   (aarch64_gen_storewb_pair): Handle V16QImode.
>   (aarch64_gen_loadwb_pair): Likewise.
>   (aarch64_gen_load_pair): Likewise.
>   * config/aarch64/aarch64.md (loadwb_pair_):
>   Rename to ...
>   (loadwb_pair_): ... this, extending to
>   V16QImode.
>   (storewb_pair_): Rename to ...
>   (storewb_pair_): ... this, extending to
>   V16QImode.
>   * config/aarch64/iterators.md (TX_V16QI): New.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/111677
>   * gcc.target/aarch64/torture/pr111677.c: New test.
>
> (cherry picked from commit 2bd8264a131ee1215d3bc6181722f9d30f5569c3)
> ---
>  gcc/config/aarch64/aarch64.cc | 13 ++-
>  gcc/config/aarch64/aarch64.md | 35 ++-
>  gcc/config/aarch64/iterators.md   |  3 ++
>  .../gcc.target/aarch64/torture/pr111677.c | 28 +++
>  4 files changed, 61 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 3bccd96a23d..2bbba323770 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -4135,7 +4135,7 @@ aarch64_reg_save_mode (unsigned int regno)
>case ARM_PCS_SIMD:
>   /* The vector PCS saves the low 128 bits (which is the full
>  register on non-SVE targets).  */
> - return TFmode;
> + return V16QImode;
>  
>case ARM_PCS_SVE:
>   /* Use vectors of DImode for registers that need frame
> @@ -8602,6 +8602,10 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx base, 
> rtx reg, rtx reg2,
>return gen_storewb_pairtf_di (base, base, reg, reg2,
>   GEN_INT (-adjustment),
>   GEN_INT (UNITS_PER_VREG - adjustment));
> +case E_V16QImode:
> +  return gen_storewb_pairv16qi_di (base, base, reg, reg2,
> +GEN_INT (-adjustment),
> +GEN_INT (UNITS_PER_VREG - adjustment));
>  default:
>gcc_unreachable ();
>  }
> @@ -8647,6 +8651,10 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx base, 
> rtx reg, rtx reg2,
>  case E_TFmode:
>return gen_loadwb_pairtf_di (base, base, reg, reg2, GEN_INT 
> (adjustment),
>  GEN_INT (UNITS_PER_VREG));
> +case E_V16QImode:
> +  return gen_loadwb_pairv16qi_di (base, base, reg, reg2,
> +   GEN_INT (adjustment),
> +   GEN_INT (UNITS_PER_VREG));
>  default:
>gcc_unreachable ();
>  }
> @@ -8730,6 +8738,9 @@ aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx 
> mem1, rtx reg2,
>  case E_V4SImode:
>return gen_load_pairv4siv4si (reg1, mem1, reg2, mem2);
>  
> +case E_V16QImode:
> +  return 

Re: [PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]

2024-02-15 Thread Alex Coplan
On 14/02/2024 11:18, Richard Sandiford wrote:
> Alex Coplan  writes:
> > This is a backport of the GCC 13 fix for PR111677 to the GCC 12 branch.
> > The only part of the patch that isn't a straight cherry-pick is due to
> > the TX iterator lacking TDmode for GCC 12, so this version adjusts
> > TX_V16QI accordingly.
> >
> > Bootstrapped/regtested on aarch64-linux-gnu, the only changes in the
> > testsuite I saw were in
> > gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c where the dg-output
> > "READ of size 4 [...]" check appears to be flaky on the GCC 12 branch
> > since libhwasan gained the short granule tag feature, I've requested a
> > backport of the following patch (committed as
> > r13-100-g3771486daa1e904ceae6f3e135b28e58af33849f) which should fix that
> > (independent) issue for GCC 12:
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645278.html
> >
> > OK for the GCC 12 branch?
> 
> OK, thanks.

Thanks.  The patch cherry-picks cleanly on the GCC 11 branch, and
bootstraps/regtests OK there.  Is it OK for GCC 11 too, even though the
issue is latent there (at least for the testcase in the patch)?

Alex

> 
> Richard
> 
> > Thanks,
> > Alex
> >
> > -- >8 --
> >
> > The PR shows us ICEing due to an unrecognizable TFmode save emitted by
> > aarch64_process_components.  The problem is that for T{I,F,D}mode we
> > conservatively require mems to be in range for x-register ldp/stp.  That
> > is because (at least for TImode) it can be allocated to both GPRs and
> > FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
> > a q-register load/store.
> >
> > As Richard pointed out in the PR, aarch64_get_separate_components
> > already checks that the offsets are suitable for a single load, so we
> > just need to choose a mode in aarch64_reg_save_mode that gives the full
> > q-register range.  In this patch, we choose V16QImode as an alternative
> > 16-byte "bag-of-bits" mode that doesn't have the artificial range
> > restrictions imposed on T{I,F,D}mode.
> >
> > Unlike for GCC 14 we need additional handling in the load/store pair
> > code as various cases are not expecting to see V16QImode (particularly
> > the writeback patterns, but also aarch64_gen_load_pair).
> >
> > gcc/ChangeLog:
> >
> > PR target/111677
> > * config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
> > V16QImode for the full 16-byte FPR saves in the vector PCS case.
> > (aarch64_gen_storewb_pair): Handle V16QImode.
> > (aarch64_gen_loadwb_pair): Likewise.
> > (aarch64_gen_load_pair): Likewise.
> > * config/aarch64/aarch64.md (loadwb_pair_):
> > Rename to ...
> > (loadwb_pair_): ... this, extending to
> > V16QImode.
> > (storewb_pair_): Rename to ...
> > (storewb_pair_): ... this, extending to
> > V16QImode.
> > * config/aarch64/iterators.md (TX_V16QI): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/111677
> > * gcc.target/aarch64/torture/pr111677.c: New test.
> >
> > (cherry picked from commit 2bd8264a131ee1215d3bc6181722f9d30f5569c3)
> > ---
> >  gcc/config/aarch64/aarch64.cc | 13 ++-
> >  gcc/config/aarch64/aarch64.md | 35 ++-
> >  gcc/config/aarch64/iterators.md   |  3 ++
> >  .../gcc.target/aarch64/torture/pr111677.c | 28 +++
> >  4 files changed, 61 insertions(+), 18 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 3bccd96a23d..2bbba323770 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -4135,7 +4135,7 @@ aarch64_reg_save_mode (unsigned int regno)
> >case ARM_PCS_SIMD:
> > /* The vector PCS saves the low 128 bits (which is the full
> >register on non-SVE targets).  */
> > -   return TFmode;
> > +   return V16QImode;
> >  
> >case ARM_PCS_SVE:
> > /* Use vectors of DImode for registers that need frame
> > @@ -8602,6 +8602,10 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx 
> > base, rtx reg, rtx reg2,
> >return gen_storewb_pairtf_di (base, base, reg, reg2,
> > GEN_INT (-adjustment),
> > GEN_INT (UNITS_PER_VREG - adjustment));
> > +case E_V16QImode:
> > +  return gen_storewb_pairv16qi_di (base, base, reg, reg2,
> > +  GEN_INT (-adjustment),
> > +  GEN_INT (UNITS_PER_VREG - adjustment));
> >  default:
> >gcc_unreachable ();
> >  }
> > @@ -8647,6 +8651,10 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx 
> > base, rtx reg, rtx reg2,
> >  case E_TFmode:
> >return gen_loadwb_pairtf_di (base, base, reg, reg2, GEN_INT 
> > (adjustment),
> >GEN_INT (UNITS_PER_VREG));
> > +case E_V16QImode:
> > +  return gen_loadwb_pairv16qi_di (base, base, 

Re: [PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]

2024-02-20 Thread Richard Sandiford
Alex Coplan  writes:
> On 14/02/2024 11:18, Richard Sandiford wrote:
>> Alex Coplan  writes:
>> > This is a backport of the GCC 13 fix for PR111677 to the GCC 12 branch.
>> > The only part of the patch that isn't a straight cherry-pick is due to
>> > the TX iterator lacking TDmode for GCC 12, so this version adjusts
>> > TX_V16QI accordingly.
>> >
>> > Bootstrapped/regtested on aarch64-linux-gnu, the only changes in the
>> > testsuite I saw were in
>> > gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c where the dg-output
>> > "READ of size 4 [...]" check appears to be flaky on the GCC 12 branch
>> > since libhwasan gained the short granule tag feature, I've requested a
>> > backport of the following patch (committed as
>> > r13-100-g3771486daa1e904ceae6f3e135b28e58af33849f) which should fix that
>> > (independent) issue for GCC 12:
>> > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645278.html
>> >
>> > OK for the GCC 12 branch?
>> 
>> OK, thanks.
>
> Thanks.  The patch cherry-picks cleanly on the GCC 11 branch, and
> bootstraps/regtests OK there.  Is it OK for GCC 11 too, even though the
> issue is latent there (at least for the testcase in the patch)?

Yeah, I think we should apply it there too, since the bug is likely
to manifest with other testcases.

Richard

> Alex
>
>> 
>> Richard
>> 
>> > Thanks,
>> > Alex
>> >
>> > -- >8 --
>> >
>> > The PR shows us ICEing due to an unrecognizable TFmode save emitted by
>> > aarch64_process_components.  The problem is that for T{I,F,D}mode we
>> > conservatively require mems to be in range for x-register ldp/stp.  That
>> > is because (at least for TImode) it can be allocated to both GPRs and
>> > FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
>> > a q-register load/store.
>> >
>> > As Richard pointed out in the PR, aarch64_get_separate_components
>> > already checks that the offsets are suitable for a single load, so we
>> > just need to choose a mode in aarch64_reg_save_mode that gives the full
>> > q-register range.  In this patch, we choose V16QImode as an alternative
>> > 16-byte "bag-of-bits" mode that doesn't have the artificial range
>> > restrictions imposed on T{I,F,D}mode.
>> >
>> > Unlike for GCC 14 we need additional handling in the load/store pair
>> > code as various cases are not expecting to see V16QImode (particularly
>> > the writeback patterns, but also aarch64_gen_load_pair).
>> >
>> > gcc/ChangeLog:
>> >
>> >PR target/111677
>> >* config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
>> >V16QImode for the full 16-byte FPR saves in the vector PCS case.
>> >(aarch64_gen_storewb_pair): Handle V16QImode.
>> >(aarch64_gen_loadwb_pair): Likewise.
>> >(aarch64_gen_load_pair): Likewise.
>> >* config/aarch64/aarch64.md (loadwb_pair_):
>> >Rename to ...
>> >(loadwb_pair_): ... this, extending to
>> >V16QImode.
>> >(storewb_pair_): Rename to ...
>> >(storewb_pair_): ... this, extending to
>> >V16QImode.
>> >* config/aarch64/iterators.md (TX_V16QI): New.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >PR target/111677
>> >* gcc.target/aarch64/torture/pr111677.c: New test.
>> >
>> > (cherry picked from commit 2bd8264a131ee1215d3bc6181722f9d30f5569c3)
>> > ---
>> >  gcc/config/aarch64/aarch64.cc | 13 ++-
>> >  gcc/config/aarch64/aarch64.md | 35 ++-
>> >  gcc/config/aarch64/iterators.md   |  3 ++
>> >  .../gcc.target/aarch64/torture/pr111677.c | 28 +++
>> >  4 files changed, 61 insertions(+), 18 deletions(-)
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
>> >
>> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> > index 3bccd96a23d..2bbba323770 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -4135,7 +4135,7 @@ aarch64_reg_save_mode (unsigned int regno)
>> >case ARM_PCS_SIMD:
>> >/* The vector PCS saves the low 128 bits (which is the full
>> >   register on non-SVE targets).  */
>> > -  return TFmode;
>> > +  return V16QImode;
>> >  
>> >case ARM_PCS_SVE:
>> >/* Use vectors of DImode for registers that need frame
>> > @@ -8602,6 +8602,10 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx 
>> > base, rtx reg, rtx reg2,
>> >return gen_storewb_pairtf_di (base, base, reg, reg2,
>> >GEN_INT (-adjustment),
>> >GEN_INT (UNITS_PER_VREG - adjustment));
>> > +case E_V16QImode:
>> > +  return gen_storewb_pairv16qi_di (base, base, reg, reg2,
>> > + GEN_INT (-adjustment),
>> > + GEN_INT (UNITS_PER_VREG - adjustment));
>> >  default:
>> >gcc_unreachable ();
>> >  }
>> > @@ -8647,6 +8651,10 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx 
>> > base, rtx reg, rtx reg2,
>> >  case E_TFmode:
>> >