[PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]
This is a backport of the GCC 13 fix for PR111677 to the GCC 12 branch. The only part of the patch that isn't a straight cherry-pick is due to the TX iterator lacking TDmode for GCC 12, so this version adjusts TX_V16QI accordingly. Bootstrapped/regtested on aarch64-linux-gnu, the only changes in the testsuite I saw were in gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c where the dg-output "READ of size 4 [...]" check appears to be flaky on the GCC 12 branch since libhwasan gained the short granule tag feature, I've requested a backport of the following patch (committed as r13-100-g3771486daa1e904ceae6f3e135b28e58af33849f) which should fix that (independent) issue for GCC 12: https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645278.html OK for the GCC 12 branch? Thanks, Alex -- >8 -- The PR shows us ICEing due to an unrecognizable TFmode save emitted by aarch64_process_components. The problem is that for T{I,F,D}mode we conservatively require mems to be in range for x-register ldp/stp. That is because (at least for TImode) it can be allocated to both GPRs and FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is a q-register load/store. As Richard pointed out in the PR, aarch64_get_separate_components already checks that the offsets are suitable for a single load, so we just need to choose a mode in aarch64_reg_save_mode that gives the full q-register range. In this patch, we choose V16QImode as an alternative 16-byte "bag-of-bits" mode that doesn't have the artificial range restrictions imposed on T{I,F,D}mode. Unlike for GCC 14 we need additional handling in the load/store pair code as various cases are not expecting to see V16QImode (particularly the writeback patterns, but also aarch64_gen_load_pair). gcc/ChangeLog: PR target/111677 * config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use V16QImode for the full 16-byte FPR saves in the vector PCS case. (aarch64_gen_storewb_pair): Handle V16QImode. (aarch64_gen_loadwb_pair): Likewise. (aarch64_gen_load_pair): Likewise. * config/aarch64/aarch64.md (loadwb_pair_): Rename to ... (loadwb_pair_): ... this, extending to V16QImode. (storewb_pair_): Rename to ... (storewb_pair_): ... this, extending to V16QImode. * config/aarch64/iterators.md (TX_V16QI): New. gcc/testsuite/ChangeLog: PR target/111677 * gcc.target/aarch64/torture/pr111677.c: New test. (cherry picked from commit 2bd8264a131ee1215d3bc6181722f9d30f5569c3) --- gcc/config/aarch64/aarch64.cc | 13 ++- gcc/config/aarch64/aarch64.md | 35 ++- gcc/config/aarch64/iterators.md | 3 ++ .../gcc.target/aarch64/torture/pr111677.c | 28 +++ 4 files changed, 61 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/torture/pr111677.c diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 3bccd96a23d..2bbba323770 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -4135,7 +4135,7 @@ aarch64_reg_save_mode (unsigned int regno) case ARM_PCS_SIMD: /* The vector PCS saves the low 128 bits (which is the full register on non-SVE targets). */ - return TFmode; + return V16QImode; case ARM_PCS_SVE: /* Use vectors of DImode for registers that need frame @@ -8602,6 +8602,10 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2, return gen_storewb_pairtf_di (base, base, reg, reg2, GEN_INT (-adjustment), GEN_INT (UNITS_PER_VREG - adjustment)); +case E_V16QImode: + return gen_storewb_pairv16qi_di (base, base, reg, reg2, + GEN_INT (-adjustment), + GEN_INT (UNITS_PER_VREG - adjustment)); default: gcc_unreachable (); } @@ -8647,6 +8651,10 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2, case E_TFmode: return gen_loadwb_pairtf_di (base, base, reg, reg2, GEN_INT (adjustment), GEN_INT (UNITS_PER_VREG)); +case E_V16QImode: + return gen_loadwb_pairv16qi_di (base, base, reg, reg2, + GEN_INT (adjustment), + GEN_INT (UNITS_PER_VREG)); default: gcc_unreachable (); } @@ -8730,6 +8738,9 @@ aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx mem1, rtx reg2, case E_V4SImode: return gen_load_pairv4siv4si (reg1, mem1, reg2, mem2); +case E_V16QImode: + return gen_load_pairv16qiv16qi (reg1, mem1, reg2, mem2); + default: gcc_unreachable (); } diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index fb100bdf6b3..99f185718c9 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1874,17 +1874,18 @@ (define_insn "loadwb_pair_" [(set_attr "type" "neon_load1_2reg")] ) -(define_insn "loadwb_pair_" +(define_insn "l
Re: [PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]
Alex Coplan writes: > This is a backport of the GCC 13 fix for PR111677 to the GCC 12 branch. > The only part of the patch that isn't a straight cherry-pick is due to > the TX iterator lacking TDmode for GCC 12, so this version adjusts > TX_V16QI accordingly. > > Bootstrapped/regtested on aarch64-linux-gnu, the only changes in the > testsuite I saw were in > gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c where the dg-output > "READ of size 4 [...]" check appears to be flaky on the GCC 12 branch > since libhwasan gained the short granule tag feature, I've requested a > backport of the following patch (committed as > r13-100-g3771486daa1e904ceae6f3e135b28e58af33849f) which should fix that > (independent) issue for GCC 12: > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645278.html > > OK for the GCC 12 branch? OK, thanks. Richard > Thanks, > Alex > > -- >8 -- > > The PR shows us ICEing due to an unrecognizable TFmode save emitted by > aarch64_process_components. The problem is that for T{I,F,D}mode we > conservatively require mems to be in range for x-register ldp/stp. That > is because (at least for TImode) it can be allocated to both GPRs and > FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is > a q-register load/store. > > As Richard pointed out in the PR, aarch64_get_separate_components > already checks that the offsets are suitable for a single load, so we > just need to choose a mode in aarch64_reg_save_mode that gives the full > q-register range. In this patch, we choose V16QImode as an alternative > 16-byte "bag-of-bits" mode that doesn't have the artificial range > restrictions imposed on T{I,F,D}mode. > > Unlike for GCC 14 we need additional handling in the load/store pair > code as various cases are not expecting to see V16QImode (particularly > the writeback patterns, but also aarch64_gen_load_pair). > > gcc/ChangeLog: > > PR target/111677 > * config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use > V16QImode for the full 16-byte FPR saves in the vector PCS case. > (aarch64_gen_storewb_pair): Handle V16QImode. > (aarch64_gen_loadwb_pair): Likewise. > (aarch64_gen_load_pair): Likewise. > * config/aarch64/aarch64.md (loadwb_pair_): > Rename to ... > (loadwb_pair_): ... this, extending to > V16QImode. > (storewb_pair_): Rename to ... > (storewb_pair_): ... this, extending to > V16QImode. > * config/aarch64/iterators.md (TX_V16QI): New. > > gcc/testsuite/ChangeLog: > > PR target/111677 > * gcc.target/aarch64/torture/pr111677.c: New test. > > (cherry picked from commit 2bd8264a131ee1215d3bc6181722f9d30f5569c3) > --- > gcc/config/aarch64/aarch64.cc | 13 ++- > gcc/config/aarch64/aarch64.md | 35 ++- > gcc/config/aarch64/iterators.md | 3 ++ > .../gcc.target/aarch64/torture/pr111677.c | 28 +++ > 4 files changed, 61 insertions(+), 18 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/torture/pr111677.c > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index 3bccd96a23d..2bbba323770 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -4135,7 +4135,7 @@ aarch64_reg_save_mode (unsigned int regno) >case ARM_PCS_SIMD: > /* The vector PCS saves the low 128 bits (which is the full > register on non-SVE targets). */ > - return TFmode; > + return V16QImode; > >case ARM_PCS_SVE: > /* Use vectors of DImode for registers that need frame > @@ -8602,6 +8602,10 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx base, > rtx reg, rtx reg2, >return gen_storewb_pairtf_di (base, base, reg, reg2, > GEN_INT (-adjustment), > GEN_INT (UNITS_PER_VREG - adjustment)); > +case E_V16QImode: > + return gen_storewb_pairv16qi_di (base, base, reg, reg2, > +GEN_INT (-adjustment), > +GEN_INT (UNITS_PER_VREG - adjustment)); > default: >gcc_unreachable (); > } > @@ -8647,6 +8651,10 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx base, > rtx reg, rtx reg2, > case E_TFmode: >return gen_loadwb_pairtf_di (base, base, reg, reg2, GEN_INT > (adjustment), > GEN_INT (UNITS_PER_VREG)); > +case E_V16QImode: > + return gen_loadwb_pairv16qi_di (base, base, reg, reg2, > + GEN_INT (adjustment), > + GEN_INT (UNITS_PER_VREG)); > default: >gcc_unreachable (); > } > @@ -8730,6 +8738,9 @@ aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx > mem1, rtx reg2, > case E_V4SImode: >return gen_load_pairv4siv4si (reg1, mem1, reg2, mem2); > > +case E_V16QImode: > + return
Re: [PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]
On 14/02/2024 11:18, Richard Sandiford wrote: > Alex Coplan writes: > > This is a backport of the GCC 13 fix for PR111677 to the GCC 12 branch. > > The only part of the patch that isn't a straight cherry-pick is due to > > the TX iterator lacking TDmode for GCC 12, so this version adjusts > > TX_V16QI accordingly. > > > > Bootstrapped/regtested on aarch64-linux-gnu, the only changes in the > > testsuite I saw were in > > gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c where the dg-output > > "READ of size 4 [...]" check appears to be flaky on the GCC 12 branch > > since libhwasan gained the short granule tag feature, I've requested a > > backport of the following patch (committed as > > r13-100-g3771486daa1e904ceae6f3e135b28e58af33849f) which should fix that > > (independent) issue for GCC 12: > > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645278.html > > > > OK for the GCC 12 branch? > > OK, thanks. Thanks. The patch cherry-picks cleanly on the GCC 11 branch, and bootstraps/regtests OK there. Is it OK for GCC 11 too, even though the issue is latent there (at least for the testcase in the patch)? Alex > > Richard > > > Thanks, > > Alex > > > > -- >8 -- > > > > The PR shows us ICEing due to an unrecognizable TFmode save emitted by > > aarch64_process_components. The problem is that for T{I,F,D}mode we > > conservatively require mems to be in range for x-register ldp/stp. That > > is because (at least for TImode) it can be allocated to both GPRs and > > FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is > > a q-register load/store. > > > > As Richard pointed out in the PR, aarch64_get_separate_components > > already checks that the offsets are suitable for a single load, so we > > just need to choose a mode in aarch64_reg_save_mode that gives the full > > q-register range. In this patch, we choose V16QImode as an alternative > > 16-byte "bag-of-bits" mode that doesn't have the artificial range > > restrictions imposed on T{I,F,D}mode. > > > > Unlike for GCC 14 we need additional handling in the load/store pair > > code as various cases are not expecting to see V16QImode (particularly > > the writeback patterns, but also aarch64_gen_load_pair). > > > > gcc/ChangeLog: > > > > PR target/111677 > > * config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use > > V16QImode for the full 16-byte FPR saves in the vector PCS case. > > (aarch64_gen_storewb_pair): Handle V16QImode. > > (aarch64_gen_loadwb_pair): Likewise. > > (aarch64_gen_load_pair): Likewise. > > * config/aarch64/aarch64.md (loadwb_pair_): > > Rename to ... > > (loadwb_pair_): ... this, extending to > > V16QImode. > > (storewb_pair_): Rename to ... > > (storewb_pair_): ... this, extending to > > V16QImode. > > * config/aarch64/iterators.md (TX_V16QI): New. > > > > gcc/testsuite/ChangeLog: > > > > PR target/111677 > > * gcc.target/aarch64/torture/pr111677.c: New test. > > > > (cherry picked from commit 2bd8264a131ee1215d3bc6181722f9d30f5569c3) > > --- > > gcc/config/aarch64/aarch64.cc | 13 ++- > > gcc/config/aarch64/aarch64.md | 35 ++- > > gcc/config/aarch64/iterators.md | 3 ++ > > .../gcc.target/aarch64/torture/pr111677.c | 28 +++ > > 4 files changed, 61 insertions(+), 18 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/aarch64/torture/pr111677.c > > > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > > index 3bccd96a23d..2bbba323770 100644 > > --- a/gcc/config/aarch64/aarch64.cc > > +++ b/gcc/config/aarch64/aarch64.cc > > @@ -4135,7 +4135,7 @@ aarch64_reg_save_mode (unsigned int regno) > >case ARM_PCS_SIMD: > > /* The vector PCS saves the low 128 bits (which is the full > >register on non-SVE targets). */ > > - return TFmode; > > + return V16QImode; > > > >case ARM_PCS_SVE: > > /* Use vectors of DImode for registers that need frame > > @@ -8602,6 +8602,10 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx > > base, rtx reg, rtx reg2, > >return gen_storewb_pairtf_di (base, base, reg, reg2, > > GEN_INT (-adjustment), > > GEN_INT (UNITS_PER_VREG - adjustment)); > > +case E_V16QImode: > > + return gen_storewb_pairv16qi_di (base, base, reg, reg2, > > + GEN_INT (-adjustment), > > + GEN_INT (UNITS_PER_VREG - adjustment)); > > default: > >gcc_unreachable (); > > } > > @@ -8647,6 +8651,10 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx > > base, rtx reg, rtx reg2, > > case E_TFmode: > >return gen_loadwb_pairtf_di (base, base, reg, reg2, GEN_INT > > (adjustment), > >GEN_INT (UNITS_PER_VREG)); > > +case E_V16QImode: > > + return gen_loadwb_pairv16qi_di (base, base,
Re: [PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]
Alex Coplan writes: > On 14/02/2024 11:18, Richard Sandiford wrote: >> Alex Coplan writes: >> > This is a backport of the GCC 13 fix for PR111677 to the GCC 12 branch. >> > The only part of the patch that isn't a straight cherry-pick is due to >> > the TX iterator lacking TDmode for GCC 12, so this version adjusts >> > TX_V16QI accordingly. >> > >> > Bootstrapped/regtested on aarch64-linux-gnu, the only changes in the >> > testsuite I saw were in >> > gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c where the dg-output >> > "READ of size 4 [...]" check appears to be flaky on the GCC 12 branch >> > since libhwasan gained the short granule tag feature, I've requested a >> > backport of the following patch (committed as >> > r13-100-g3771486daa1e904ceae6f3e135b28e58af33849f) which should fix that >> > (independent) issue for GCC 12: >> > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645278.html >> > >> > OK for the GCC 12 branch? >> >> OK, thanks. > > Thanks. The patch cherry-picks cleanly on the GCC 11 branch, and > bootstraps/regtests OK there. Is it OK for GCC 11 too, even though the > issue is latent there (at least for the testcase in the patch)? Yeah, I think we should apply it there too, since the bug is likely to manifest with other testcases. Richard > Alex > >> >> Richard >> >> > Thanks, >> > Alex >> > >> > -- >8 -- >> > >> > The PR shows us ICEing due to an unrecognizable TFmode save emitted by >> > aarch64_process_components. The problem is that for T{I,F,D}mode we >> > conservatively require mems to be in range for x-register ldp/stp. That >> > is because (at least for TImode) it can be allocated to both GPRs and >> > FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is >> > a q-register load/store. >> > >> > As Richard pointed out in the PR, aarch64_get_separate_components >> > already checks that the offsets are suitable for a single load, so we >> > just need to choose a mode in aarch64_reg_save_mode that gives the full >> > q-register range. In this patch, we choose V16QImode as an alternative >> > 16-byte "bag-of-bits" mode that doesn't have the artificial range >> > restrictions imposed on T{I,F,D}mode. >> > >> > Unlike for GCC 14 we need additional handling in the load/store pair >> > code as various cases are not expecting to see V16QImode (particularly >> > the writeback patterns, but also aarch64_gen_load_pair). >> > >> > gcc/ChangeLog: >> > >> >PR target/111677 >> >* config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use >> >V16QImode for the full 16-byte FPR saves in the vector PCS case. >> >(aarch64_gen_storewb_pair): Handle V16QImode. >> >(aarch64_gen_loadwb_pair): Likewise. >> >(aarch64_gen_load_pair): Likewise. >> >* config/aarch64/aarch64.md (loadwb_pair_): >> >Rename to ... >> >(loadwb_pair_): ... this, extending to >> >V16QImode. >> >(storewb_pair_): Rename to ... >> >(storewb_pair_): ... this, extending to >> >V16QImode. >> >* config/aarch64/iterators.md (TX_V16QI): New. >> > >> > gcc/testsuite/ChangeLog: >> > >> >PR target/111677 >> >* gcc.target/aarch64/torture/pr111677.c: New test. >> > >> > (cherry picked from commit 2bd8264a131ee1215d3bc6181722f9d30f5569c3) >> > --- >> > gcc/config/aarch64/aarch64.cc | 13 ++- >> > gcc/config/aarch64/aarch64.md | 35 ++- >> > gcc/config/aarch64/iterators.md | 3 ++ >> > .../gcc.target/aarch64/torture/pr111677.c | 28 +++ >> > 4 files changed, 61 insertions(+), 18 deletions(-) >> > create mode 100644 gcc/testsuite/gcc.target/aarch64/torture/pr111677.c >> > >> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc >> > index 3bccd96a23d..2bbba323770 100644 >> > --- a/gcc/config/aarch64/aarch64.cc >> > +++ b/gcc/config/aarch64/aarch64.cc >> > @@ -4135,7 +4135,7 @@ aarch64_reg_save_mode (unsigned int regno) >> >case ARM_PCS_SIMD: >> >/* The vector PCS saves the low 128 bits (which is the full >> > register on non-SVE targets). */ >> > - return TFmode; >> > + return V16QImode; >> > >> >case ARM_PCS_SVE: >> >/* Use vectors of DImode for registers that need frame >> > @@ -8602,6 +8602,10 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx >> > base, rtx reg, rtx reg2, >> >return gen_storewb_pairtf_di (base, base, reg, reg2, >> >GEN_INT (-adjustment), >> >GEN_INT (UNITS_PER_VREG - adjustment)); >> > +case E_V16QImode: >> > + return gen_storewb_pairv16qi_di (base, base, reg, reg2, >> > + GEN_INT (-adjustment), >> > + GEN_INT (UNITS_PER_VREG - adjustment)); >> > default: >> >gcc_unreachable (); >> > } >> > @@ -8647,6 +8651,10 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx >> > base, rtx reg, rtx reg2, >> > case E_TFmode: >> >