Compile the following code with -maltivec -O2 -mabi=altivec -m64:

#define vector __attribute__((vector_size(16)))

vector float f(vector float t, vector float t1)
{
  return t / t1;
}

We currently get:
        addi 9,1,-36
        stvewx 2,0,9
        addi 9,1,-20
        stvewx 3,0,9
        addi 9,1,-144
        stvewx 2,0,9
        addi 9,1,-128
        stvewx 3,0,9
        addi 9,1,-108
        stvewx 2,0,9
        addi 9,1,-92
        stvewx 3,0,9
        addi 9,1,-72
        stvewx 2,0,9
        addi 9,1,-56
        stvewx 3,0,9
        addi 9,1,-16
        ....
This stores each element of the vector out one by one, instead of just storing the whole vector out:
        stvx 2, ...
        stvx 3, ...

--
           Summary: Divide with vectors cause extra stores (and more stack space) (with VMX)
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pinskia at gcc dot gnu dot org
GCC target triplet: powerpc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28366
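
For illustration only, here is a rough C sketch of what the report is asking for at the source level: each operand is written to memory once as a whole vector (the stvx case), and the four lanes are then divided with scalar operations. It assumes, as the generated code suggests, that the division is being split into per-element scalar divides, presumably because VMX has no vector float divide instruction. The helper name div_elementwise is hypothetical and is not part of GCC or of the original test case.

```c
#include <string.h>

/* Same macro as in the test case (GCC generic vector extension). */
#define vector __attribute__((vector_size(16)))

/* Hypothetical helper, for illustration only: divide two float vectors
   lane by lane, going through memory. */
static vector float div_elementwise(vector float t, vector float t1)
{
  float a[4], b[4], r[4];
  vector float out;
  int i;

  /* One whole-vector copy per operand, instead of one store per element. */
  memcpy(a, &t, sizeof a);
  memcpy(b, &t1, sizeof b);

  /* Scalar divide for each of the four lanes. */
  for (i = 0; i < 4; i++)
    r[i] = a[i] / b[i];

  /* Reassemble the result vector from the four scalar quotients. */
  memcpy(&out, r, sizeof out);
  return out;
}
```

With this shape, the only stores needed for the inputs are the two whole-vector ones; whether the compiler actually emits stvx for the memcpy calls depends on the target and optimization level.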