[Bug target/89967] Inefficient code generation for vld2q_lane_u8 under aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89967

Tamar Christina changed:

What     |Removed |Added
CC       |        |tnfchris at gcc dot gnu.org
See Also |        |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106106

--- Comment #3 from Tamar Christina ---
This is caused by SRA scalarizing the structural registers, i.e. it breaks the uint8x16x2_t apart into two uint8x16_t values. For use with vld2 we need them as a whole, so we recreate the type again. This causes a copy through scalarization, followed by the type being constructed again in RTL. Reload is able to remove one copy but not the other.

The fix for #106106 will also fix this.
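For readers following along, here is a minimal sketch of the kind of code the comment above is talking about. It is not the test case from the report. The names `u8x16x2_model`, `ld2_lane_model` and `load_pair_lane0` are hypothetical and only illustrate the shape of the problem. The first function is a portable scalar model of what a two-register lane load does, assuming the usual ld2 semantics (two consecutive bytes go to the same lane of the two halves). The second, compiled only on AArch64, calls the real `vld2q_lane_u8` intrinsic, where the uint8x16x2_t argument and result are the aggregate that has to stay whole.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for uint8x16x2_t: two 16-byte halves. */
typedef struct {
    uint8_t val[2][16];
} u8x16x2_model;

/* Model of a two-register lane load: src[0] goes to the chosen lane of
   the first half, src[1] to the same lane of the second half.  All
   other lanes of acc are passed through unchanged. */
u8x16x2_model ld2_lane_model(const uint8_t *src, u8x16x2_model acc,
                             unsigned lane)
{
    assert(lane < 16);
    acc.val[0][lane] = src[0];
    acc.val[1][lane] = src[1];
    return acc;
}

#if defined(__aarch64__)
#include <arm_neon.h>

/* Same operation with the real intrinsic, lane fixed at 0 because the
   lane argument must be a compile-time constant.  The pair is passed in
   and returned as one value; splitting it into its two uint8x16_t
   halves and rebuilding it is the extra copying described above. */
uint8x16x2_t load_pair_lane0(const uint8_t *src, uint8x16x2_t acc)
{
    return vld2q_lane_u8(src, acc, 0);
}
#endif
```

Compiling the AArch64 function at -O2 and inspecting the assembly for register moves around the ld2 instruction is one way to check whether a given compiler version still emits the extra copies.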
[Bug target/89967] Inefficient code generation for vld2q_lane_u8 under aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89967
Bug 89967 depends on bug 89057, which changed state.

Bug 89057 Summary: [9 Regression] AArch64 ld3 st4 less optimized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89057

What       |Removed  |Added
Status     |ASSIGNED |RESOLVED
Resolution |---      |FIXED
[Bug target/89967] Inefficient code generation for vld2q_lane_u8 under aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89967
Bug 89967 depends on bug 94052, which changed state.

Bug 94052 Summary: Paradoxical subregs out of expand causes ICE with multi register modes at -O2 or higher
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94052

What       |Removed  |Added
Status     |ASSIGNED |RESOLVED
Resolution |---      |FIXED
[Bug target/89967] Inefficient code generation for vld2q_lane_u8 under aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89967

Andrew Pinski changed:

What       |Removed |Added
Depends on |        |89057

--- Comment #2 from Andrew Pinski ---
Related to PR 89057.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89057
[Bug 89057] [8/9/10 Regression] AArch64 ld3 st4 less optimized
[Bug target/89967] Inefficient code generation for vld2q_lane_u8 under aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89967

Andrew Pinski changed:

What             |Removed     |Added
Keywords         |            |ra
Status           |UNCONFIRMED |NEW
Last reconfirmed |            |2019-04-04
Ever confirmed   |0           |1

--- Comment #1 from Andrew Pinski ---
There are two separate issues causing the inefficient code.

The first is that the first set of mov instructions should really be movi; I don't know why the constant formation was moved out of the loop.

The second is that the register allocator is not doing a good job for OImode. There are some clobber RTL there which might be getting in the way; I have not looked into, and do not understand, why the clobber RTL is needed.