[Bug target/89967] Inefficient code generation for vld2q_lane_u8 under aarch64

2023-08-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89967

Tamar Christina changed:

   What|Removed |Added

 CC||tnfchris at gcc dot gnu.org
   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106106

--- Comment #3 from Tamar Christina ---
This is caused by SRA scalarizing the structural registers, i.e. it breaks
the uint8x16x2_t apart into two uint8x16_t values. For use with vld2 we need
them as a whole, so we have to recreate the type again.

This causes one copy during scalarization and another when the type is
constructed again in RTL. Reload is able to remove one copy but not the other.


The fix for #106106 will also fix this.
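
A minimal sketch of the kind of code comment #3 is describing, under stated
assumptions: this is NOT the test case attached to the PR, and the function
name load_pairs and its parameters are hypothetical. It only shows the shape of
the problem, where a uint8x16x2_t has to be handed to vld2q_lane_u8 as a whole
while its two uint8x16_t halves are also used individually. The #else branch is
a scalar stand-in so the file still builds on other targets.

```c
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>

/* Hypothetical reproducer shape, not the test case from the PR. */
void load_pairs(const uint8_t *src, uint8_t *dst0, uint8_t *dst1, int n)
{
    for (int i = 0; i < n; i++) {
        /* The two-vector aggregate whose halves are set and read
           individually below. */
        uint8x16x2_t acc;
        acc.val[0] = vdupq_n_u8(0);
        acc.val[1] = vdupq_n_u8(0);

        /* The intrinsic takes and returns the pair as a whole:
           src[2*i] goes to lane 0 of val[0], src[2*i + 1] to lane 0
           of val[1]; all other lanes are left unchanged. */
        acc = vld2q_lane_u8(src + 2 * i, acc, 0);

        vst1q_u8(dst0 + 16 * i, acc.val[0]);
        vst1q_u8(dst1 + 16 * i, acc.val[1]);
    }
}
#else
/* Scalar stand-in with the same observable result, for non-AArch64 builds. */
void load_pairs(const uint8_t *src, uint8_t *dst0, uint8_t *dst1, int n)
{
    for (int i = 0; i < n; i++) {
        memset(dst0 + 16 * i, 0, 16);
        memset(dst1 + 16 * i, 0, 16);
        dst0[16 * i] = src[2 * i];
        dst1[16 * i] = src[2 * i + 1];
    }
}
#endif
```

On an AArch64 compiler, comparing the -O2 assembly with and without
-fno-tree-sra may show whether the extra vector moves around the ld2 come from
the scalarization described above; the result will depend on the GCC version.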

[Bug target/89967] Inefficient code generation for vld2q_lane_u8 under aarch64

2021-05-17 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89967
Bug 89967 depends on bug 89057, which changed state.

Bug 89057 Summary: [9 Regression] AArch64 ld3 st4 less optimized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89057

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug target/89967] Inefficient code generation for vld2q_lane_u8 under aarch64

2020-03-24 Thread tnfchris at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89967
Bug 89967 depends on bug 94052, which changed state.

Bug 94052 Summary: Paradoxical subregs out of expand causes ICE with multi register modes at -O2 or higher
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94052

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug target/89967] Inefficient code generation for vld2q_lane_u8 under aarch64

2020-03-05 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89967

Andrew Pinski changed:

   What|Removed |Added

 Depends on||89057

--- Comment #2 from Andrew Pinski ---
Related to PR 89057.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89057
[Bug 89057] [8/9/10 Regression] AArch64 ld3 st4 less optimized

[Bug target/89967] Inefficient code generation for vld2q_lane_u8 under aarch64

2019-04-04 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89967

Andrew Pinski changed:

   What|Removed |Added

   Keywords||ra
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-04-04
 Ever confirmed|0   |1

--- Comment #1 from Andrew Pinski ---
There are two separate issues causing the inefficient code.
The first is that the first set of mov instructions should really be movi; I
don't know why the constant formation was moved out of the loop.
The second is that register allocation is not doing a good job for OImode.
There is some clobber RTL there which might be getting in the way; I have not
looked into why the clobber RTL is needed and do not understand it.