On 05/02/13 18:18, Christophe Lyon wrote:
Hi,
Following the discussion about "disable peeling" [1] a few weeks ago,
it turned out that the vectorizer cost model needed some
implementation for ARM.
The attached patch implements arm_builtin_vectorization_cost and
arm_add_stmt_cost, providing default costs when aligned and unaligned
loads/stores have the same cost (=1). init_cost and finish_cost still
use the default implementation. (I noticed that x86 has chosen to
duplicate the default implementation without changing it; why?)
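
For readers without the patch at hand, here is a minimal sketch of what a cost lookup of this kind might look like. It is not the code from the patch: the enum, the struct and every sketch_* name are illustrative stand-ins rather than GCC's own declarations. The only thing taken from the description above is that aligned and unaligned loads/stores all get the same default cost of 1.

```c
#include <assert.h>

/* Hypothetical statement kinds: a simplified stand-in for the
   vectorizer's real statement classification.  */
enum sketch_stmt_kind
{
  SK_VECTOR_STMT,
  SK_VECTOR_LOAD,
  SK_UNALIGNED_LOAD,
  SK_VECTOR_STORE,
  SK_UNALIGNED_STORE,
  SK_NUM_KINDS
};

/* Hypothetical per-CPU cost table, one field per statement kind.  */
struct sketch_vec_costs
{
  int vec_stmt_cost;
  int vec_align_load_cost;
  int vec_unalign_load_cost;
  int vec_store_cost;
  int vec_unalign_store_cost;
};

/* Default table: aligned and unaligned accesses cost the same (1).  */
static const struct sketch_vec_costs sketch_default_costs = { 1, 1, 1, 1, 1 };

/* Return the cost of one statement of KIND according to COSTS.  */
int
sketch_vectorization_cost (const struct sketch_vec_costs *costs,
                           enum sketch_stmt_kind kind)
{
  assert ((int) kind >= 0 && (int) kind < SK_NUM_KINDS);

  switch (kind)
    {
    case SK_VECTOR_LOAD:
      return costs->vec_align_load_cost;
    case SK_UNALIGNED_LOAD:
      return costs->vec_unalign_load_cost;
    case SK_VECTOR_STORE:
      return costs->vec_store_cost;
    case SK_UNALIGNED_STORE:
      return costs->vec_unalign_store_cost;
    default:
      /* Any other vector statement uses the generic cost.  */
      return costs->vec_stmt_cost;
    }
}
```

With a table like this, the cost model sees no difference between an aligned and an unaligned access, which is why peeling for alignment cannot pay for itself under these defaults.
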
Benchmarking shows very little variation, except a noticeable +1.6% on coremark.
If this is OK, we can then discuss how to disable peeling completely
when aligned and unaligned accesses have the same cost (and thus where
peeling is a loss of performance). I think adding a new hook is
necessary, since target descriptions may use different models for
these costs (e.g. x86 makes no difference between unaligned loads and
unaligned stores).
Thanks,
Christophe.
[1] http://gcc.gnu.org/ml/gcc/2012-12/msg00036.html
2013-02-05 Christophe Lyon <christophe.l...@linaro.org>
* config/arm/arm.c (arm_builtin_vectorization_cost)
(arm_add_stmt_cost): New functions.
(TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST)
(TARGET_VECTORIZE_ADD_STMT_COST): Define.
(struct processor_costs): New struct type.
(default_arm_cost): New struct of type processor_costs.
Christophe,
Thanks for the patch. This is mostly OK, but please can you make the
following changes.
+struct processor_costs {
Please name this something like cpu_vec_costs. It's not the only cost
table in the back-end.
+struct processor_costs default_arm_cost = { /* arm generic costs. */
Similarly, use something like default_arm_vec_cost.
+const struct processor_costs *arm_cost = &default_arm_cost;
And here. But better still, link this through the current_tune table
rather than introducing a new global.
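
To illustrate the suggestion, a rough sketch of how the renamed table could hang off a tuning record instead of a separate global. The names cpu_vec_costs and default_arm_vec_cost follow the renaming proposed above; the sketch_* struct, its fields and the accessor are assumptions made up for this example and are much smaller than the real tuning structure in the ARM back-end.

```c
#include <assert.h>
#include <stddef.h>

/* Vectorizer cost table, named as suggested in the review.
   The field list is an assumption made for this sketch.  */
struct cpu_vec_costs
{
  int vec_align_load_cost;
  int vec_unalign_load_cost;
  int vec_store_cost;
  int vec_unalign_store_cost;
};

/* Generic defaults: aligned and unaligned accesses cost the same.  */
static const struct cpu_vec_costs default_arm_vec_cost = { 1, 1, 1, 1 };

/* Hypothetical, heavily reduced stand-in for a per-CPU tuning record.
   The vectorizer costs are one more member of the record.  */
struct sketch_tune_params
{
  const char *name;
  const struct cpu_vec_costs *vec_costs;
};

static const struct sketch_tune_params sketch_generic_tune =
{
  "generic",
  &default_arm_vec_cost
};

/* The currently selected tuning; a real back-end would set this
   from the -mtune/-mcpu selection.  */
static const struct sketch_tune_params *sketch_current_tune
  = &sketch_generic_tune;

/* Cost queries go through the tuning record, so no extra global
   cost pointer is needed.  */
int
sketch_unaligned_load_cost (void)
{
  assert (sketch_current_tune->vec_costs != NULL);
  return sketch_current_tune->vec_costs->vec_unalign_load_cost;
}
```

The point of this layout is that a CPU with different vector costs only needs its own table and a different pointer in its tuning record; the cost hooks themselves stay unchanged.
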
Finally,
@@ -27256,4 +27272,130 @@ arm_validize_comparison (rtx *comparison, rtx * op1, rtx * op2)
}
+/* Vectorizer cost model implementation. */
Please put the patch in a more suitable location rather than just
dumping it at the end of the file. There are already numerous functions
related to costs that are mostly grouped together. I suggest this goes
near the rtx_costs code.
R.