[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

Tamar Christina changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tnfchris at gcc dot gnu.org
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114575

--- Comment #10 from Tamar Christina ---
This has also broken our addressing modes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114575
[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

--- Comment #9 from Jeffrey A. Law ---
Thanks for that info Edwin -- my tester flagged them too and mentally I'd figured it was most likely the combiner change.
[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

--- Comment #8 from Edwin Lu ---
(In reply to Robin Dapp from comment #7)
> There is some riscv fallout as well. Edwin has the details.

I haven't done an in-depth analysis, but the full list of new riscv scan-dump failures can be found here:
https://github.com/patrick-rivos/gcc-postcommit-ci/issues/694
[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

Robin Dapp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ewlu at rivosinc dot com,
                   |                            |rdapp at gcc dot gnu.org

--- Comment #7 from Robin Dapp ---
There is some riscv fallout as well. Edwin has the details.
[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P1
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-04-02
             Status|UNCONFIRMED                 |NEW

--- Comment #6 from Richard Biener ---
Note I think given the offending rev fixed a very old bug we should eventually revert the fix and rework it during next stage1. This was at least unexpectedly big fallout AFAIU.
[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

Jeffrey A. Law changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
                 CC|                            |law at gcc dot gnu.org
[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

--- Comment #5 from Richard Sandiford ---
For the record, the associated new testsuite failures are:

FAIL: gcc.target/aarch64/ashltidisi.c scan-assembler-times asr 3
FAIL: gcc.target/aarch64/asimd-mull-elem.c scan-assembler-times \\s+fmul\\tv[0-9]+\\.4s, v[0-9]+\\.4s, v[0-9]+\\.s\\[0\\] 4
FAIL: gcc.target/aarch64/asimd-mull-elem.c scan-assembler-times \\s+mul\\tv[0-9]+\\.4s, v[0-9]+\\.4s, v[0-9]+\\.s\\[0\\] 4
FAIL: gcc.target/aarch64/ccmp_3.c scan-assembler-not \tcbnz\t
FAIL: gcc.target/aarch64/pr100056.c scan-assembler-times \\t[us]bfiz\\tw[0-9]+, w[0-9]+, 11 2
FAIL: gcc.target/aarch64/pr100056.c scan-assembler-times \\tadd\\tw[0-9]+, w[0-9]+, w[0-9]+, uxtb\\n 2
FAIL: gcc.target/aarch64/pr108840.c scan-assembler-not and\\tw[0-9]+, w[0-9]+, 31
FAIL: gcc.target/aarch64/pr112105.c scan-assembler-not \\tdup\\t
FAIL: gcc.target/aarch64/pr112105.c scan-assembler-times (?n)\\tfmul\\t.*v[0-9]+\\.s\\[0\\]\\n 2
FAIL: gcc.target/aarch64/rev16_2.c scan-assembler-times rev16\\tx[0-9]+ 2
FAIL: gcc.target/aarch64/vaddX_high_cost.c scan-assembler-not dup\\t
FAIL: gcc.target/aarch64/vmul_element_cost.c scan-assembler-not dup\\t
FAIL: gcc.target/aarch64/vmul_high_cost.c scan-assembler-not dup\\t
FAIL: gcc.target/aarch64/vsubX_high_cost.c scan-assembler-not dup\\t
FAIL: gcc.target/aarch64/sve/pr98119.c scan-assembler \\tand\\tx[0-9]+, x[0-9]+, #?-31\\n
FAIL: gcc.target/aarch64/sve/pred-not-gen-1.c scan-assembler-not \\tbic\\t
FAIL: gcc.target/aarch64/sve/pred-not-gen-1.c scan-assembler-times \\tnot\\tp[0-9]+\\.b, p[0-9]+/z, p[0-9]+\\.b\\n 1
FAIL: gcc.target/aarch64/sve/pred-not-gen-4.c scan-assembler-not \\tbic\\t
FAIL: gcc.target/aarch64/sve/pred-not-gen-4.c scan-assembler-times \\tnot\\tp[0-9]+\\.b, p[0-9]+/z, p[0-9]+\\.b\\n 1
FAIL: gcc.target/aarch64/sve/var_stride_2.c scan-assembler-times \\tubfiz\\tx[0-9]+, x2, 10, 16\\n 1
FAIL: gcc.target/aarch64/sve/var_stride_2.c scan-assembler-times \\tubfiz\\tx[0-9]+, x3, 10, 16\\n 1
FAIL: gcc.target/aarch64/sve/var_stride_4.c scan-assembler-times \\tsbfiz\\tx[0-9]+, x2, 10, 32\\n 1
FAIL: gcc.target/aarch64/sve/var_stride_4.c scan-assembler-times \\tsbfiz\\tx[0-9]+, x3, 10, 32\\n 1
[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

--- Comment #4 from Richard Sandiford ---
(In reply to Richard Biener from comment #1)
> Btw, why does forwprop not do this?

Not 100% sure (I wasn't involved in choosing the current heuristics). But fwprop can propagate across blocks, so there is probably more risk of increasing register pressure.
[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

--- Comment #3 from Richard Sandiford ---
In RTL terms, the dup is vec_duplicate. The combination is:

Trying 10 -> 13:
   10: r107:V4SF=vec_duplicate(r115:SF)
      REG_DEAD r115:SF
   13: r110:V4SF=r111:V4SF*r107:V4SF
      REG_DEAD r111:V4SF
Failed to match this instruction:
(parallel [
        (set (reg:V4SF 110 [ _2 ])
            (mult:V4SF (vec_duplicate:V4SF (reg:SF 115))
                (reg:V4SF 111 [ *ptr_6(D) ])))
        (set (reg:V4SF 107)
            (vec_duplicate:V4SF (reg:SF 115)))
    ])
Failed to match this instruction:
(parallel [
        (set (reg:V4SF 110 [ _2 ])
            (mult:V4SF (vec_duplicate:V4SF (reg:SF 115))
                (reg:V4SF 111 [ *ptr_6(D) ])))
        (set (reg:V4SF 107)
            (vec_duplicate:V4SF (reg:SF 115)))
    ])
Successfully matched this instruction:
(set (reg:V4SF 107)
    (vec_duplicate:V4SF (reg:SF 115)))
Successfully matched this instruction:
(set (reg:V4SF 110 [ _2 ])
    (mult:V4SF (vec_duplicate:V4SF (reg:SF 115))
        (reg:V4SF 111 [ *ptr_6(D) ])))
allowing combination of insns 10 and 13
original costs 8 + 20 = 28
replacement costs 8 + 20 = 28
modifying insn i2    10: r107:V4SF=vec_duplicate(r115:SF)
deferring rescan insn with uid = 10.
modifying insn i3    13: r110:V4SF=vec_duplicate(r115:SF)*r111:V4SF
      REG_DEAD r115:SF
      REG_DEAD r111:V4SF
deferring rescan insn with uid = 13.
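For readers following the dump, here is a minimal sketch of C source with the same shape: a scalar float splatted into a V4SF vector and multiplied into vectors loaded through a pointer. It is an illustration only and is not claimed to be the testcase from this bug; the names v4sf, scale_two and splat are assumptions made for the example. The splat has two uses here, which is one way the vec_duplicate in insn 10 could stay live after being combined into the multiply. Judging by the scan-assembler patterns in the failing tests, the hoped-for aarch64 output is the lane form (fmul with a v[0-9]+.s[0] operand) and no separate dup.

```c
/* Sketch only: uses the GCC/Clang generic vector extension, so it
   compiles on any target, not just aarch64.  All names are hypothetical.  */
typedef float v4sf __attribute__((vector_size(16)));

/* Multiply two consecutive vectors by the same scalar.  The splat
   corresponds to vec_duplicate in RTL terms; each multiply corresponds
   to a mult:V4SF of a loaded vector and the splat.  */
void scale_two(v4sf *ptr, float f)
{
    v4sf splat = { f, f, f, f };   /* broadcast f to all four lanes */
    ptr[0] = ptr[0] * splat;
    ptr[1] = ptr[1] * splat;
}
```

Compiling something like this with optimization for aarch64 and looking for dup in the assembly is one way to see whether the lane form was used.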
[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

--- Comment #2 from Segher Boessenkool ---
The PR101523 fix makes sure we do not get the same I2 back, because that violates algorithmic assumptions of combine. Importantly, the way it was, things could be changed back time and time again, and that actually happened.

There is no "canonical form" in combine; which form combine prefers all depends on what little piece of context is and is not considered. Things can -- and DID -- oscillate.

So, what is happening here? The "dup" here is really a "splat"? Should the backend have some extra define_insn or define_split, or maybe even a peephole?
[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

--- Comment #1 from Richard Biener ---
Btw, why does forwprop not do this?
[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |aarch64
   Target Milestone|---                         |14.0
                 CC|                            |rguenth at gcc dot gnu.org