On Tue, Dec 15, 2015 at 11:35:45AM +0000, Wilco Dijkstra wrote:
> 
> Add support for vector permute cost since various permutes can expand into a
> complex sequence of instructions.  This fixes major performance regressions
> due to recent changes in the SLP vectorizer (which now vectorizes more
> aggressively and emits many complex permutes).
> 
> Set the cost to > 1 for all microarchitectures so that the number of permutes
> is usually zero and regressions disappear.  An example of the kind of code
> that might be emitted for VEC_PERM_EXPR {0, 3} where registers happen to be
> in the wrong order:
> 
>         adrp    x4, .LC16
>         ldr     q5, [x4, #:lo12:.LC16]
>         eor     v1.16b, v1.16b, v0.16b
>         eor     v0.16b, v1.16b, v0.16b
>         eor     v1.16b, v1.16b, v0.16b
>         tbl     v0.16b, {v0.16b - v1.16b}, v5.16b
> 
> Regression tests pass.  This fixes regressions that were introduced
> recently, so OK for commit?

OK.

Thanks,
James

> ChangeLog:
> 2015-12-15  Wilco Dijkstra  <wdijk...@arm.com>
> 
>       * gcc/config/aarch64/aarch64.c (generic_vector_cost):
>       Set vec_permute_cost.
>       (cortexa57_vector_cost): Likewise.
>       (exynosm1_vector_cost): Likewise.
>       (xgene1_vector_cost): Likewise.
>       (aarch64_builtin_vectorization_cost): Use vec_permute_cost.
>       * gcc/config/aarch64/aarch64-protos.h (cpu_vector_cost):
>       Add vec_permute_cost entry.
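
For readers without the patch at hand, the shape of the change described in the ChangeLog can be sketched roughly as below. This is a simplified illustration, not the actual GCC code: only the field name vec_permute_cost comes from the ChangeLog, while the struct, enum, function and table names (all suffixed _sketch or prefixed example_) and the numeric costs are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a per-CPU vector cost table.  Only the
   vec_permute_cost field name is taken from the ChangeLog; the other
   field is a hypothetical placeholder.  */
struct cpu_vector_cost_sketch
{
  int vec_stmt_cost;     /* hypothetical: cost of an ordinary vector op */
  int vec_permute_cost;  /* cost charged for a vector permute */
};

/* Hypothetical statement kinds the vectorizer might ask about.  */
enum stmt_kind_sketch
{
  SKETCH_VECTOR_STMT,
  SKETCH_VEC_PERM
};

/* Illustrative table: the permute cost is set above 1, as the patch
   description asks; the exact values here are assumptions.  */
const struct cpu_vector_cost_sketch example_vector_cost =
{
  1, /* vec_stmt_cost */
  2  /* vec_permute_cost */
};

/* Return the cost of one statement of kind KIND according to COSTS.
   Permutes get their own entry instead of the generic statement cost.  */
int
vectorization_cost_sketch (const struct cpu_vector_cost_sketch *costs,
                           enum stmt_kind_sketch kind)
{
  assert (costs != NULL);

  switch (kind)
    {
    case SKETCH_VEC_PERM:
      return costs->vec_permute_cost;

    case SKETCH_VECTOR_STMT:
    default:
      return costs->vec_stmt_cost;
    }
}
```

With a table like this, vectorization_cost_sketch (&example_vector_cost, SKETCH_VEC_PERM) returns the higher permute cost, so a candidate vectorization that needs permutes looks more expensive than one that does not.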
 
