On Tue, Dec 15, 2015 at 11:35:45AM +0000, Wilco Dijkstra wrote:
> Add support for vector permute cost since various permutes can expand into a
> complex sequence of instructions.  This fixes major performance regressions
> due to recent changes in the SLP vectorizer (which now vectorizes more
> aggressively and emits many complex permutes).
>
> Set the cost to > 1 for all microarchitectures so that the number of permutes
> is usually zero and regressions disappear.  An example of the kind of code
> that might be emitted for VEC_PERM_EXPR {0, 3} where registers happen to be
> in the wrong order:
>
>         adrp    x4, .LC16
>         ldr     q5, [x4, #:lo12:.LC16]
>         eor     v1.16b, v1.16b, v0.16b
>         eor     v0.16b, v1.16b, v0.16b
>         eor     v1.16b, v1.16b, v0.16b
>         tbl     v0.16b, {v0.16b - v1.16b}, v5.16b
>
> Regress passes.  This fixes regressions that were introduced recently, so OK
> for commit?
OK.

Thanks,
James

> ChangeLog:
> 2015-12-15  Wilco Dijkstra  <wdijk...@arm.com>
>
>         * gcc/config/aarch64/aarch64.c (generic_vector_cost):
>         Set vec_permute_cost.
>         (cortexa57_vector_cost): Likewise.
>         (exynosm1_vector_cost): Likewise.
>         (xgene1_vector_cost): Likewise.
>         (aarch64_builtin_vectorization_cost): Use vec_permute_cost.
>         * gcc/config/aarch64/aarch64-protos.h (cpu_vector_cost):
>         Add vec_permute_cost entry.
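
For readers following the ChangeLog, the shape of the change can be sketched
roughly as below.  This is a minimal, self-contained illustration and not the
patch itself: only the field name vec_permute_cost comes from the ChangeLog.
The struct vector_cost_sketch, the enum stmt_kind_sketch, the table
example_cost, the function sketch_vectorization_cost and the numeric values
are all hypothetical stand-ins for the real GCC definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for a per-CPU vector cost table.
   Only the field name vec_permute_cost is taken from the ChangeLog.  */
struct vector_cost_sketch
{
  int vec_stmt_cost;     /* Assumed: cost of an ordinary vector statement.  */
  int vec_permute_cost;  /* New entry: cost of a vector permute.  */
};

/* Hypothetical statement kinds the cost query distinguishes.  */
enum stmt_kind_sketch
{
  SKETCH_VECTOR_STMT,
  SKETCH_VEC_PERM
};

/* Illustrative values only.  The point from the mail is that the
   permute cost is set greater than 1.  */
const struct vector_cost_sketch example_cost = { 1, 3 };

/* Return the cost the vectorizer would be told for KIND, reading the
   permute cost from its own field instead of the generic statement cost.  */
int
sketch_vectorization_cost (const struct vector_cost_sketch *costs,
                           enum stmt_kind_sketch kind)
{
  assert (costs != NULL);

  switch (kind)
    {
    case SKETCH_VEC_PERM:
      return costs->vec_permute_cost;

    case SKETCH_VECTOR_STMT:
    default:
      return costs->vec_stmt_cost;
    }
}
```

With a table like this, a candidate vectorization that needs permutes is
charged more per permute than per ordinary vector statement, which is what
steers the SLP vectorizer away from the permute-heavy sequences shown above.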