On Wed, Dec 12, 2012 at 6:50 PM, Andi Kleen <a...@firstfloor.org> wrote:
> "H.J. Lu" <hjl.to...@gmail.com> writes:
>>
>> i386.c has
>>
>>   {
>>     /* When not optimize for size, enable vzeroupper optimization for
>>        TARGET_AVX with -fexpensive-optimizations and split 32-byte
>>        AVX unaligned load/store.  */
>
> This is only for the load, not for deciding whether peeling is
> worthwhile or not.
>
> I believe it's unimplemented for x86 at this point.  There isn't even a
> hook for it.  Any hook that is added should ideally work for both ARM64
> and x86.  This would imply it would need to handle different vector
> sizes.
There is

  /* Implement targetm.vectorize.builtin_vectorization_cost.  */
  static int
  ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
                                   tree vectype,
                                   int misalign ATTRIBUTE_UNUSED)
  {
    ...
      case unaligned_load:
      case unaligned_store:
        return ix86_cost->vec_unalign_load_cost;

which indeed doesn't distinguish between unaligned load and unaligned
store cost.  Still, it does distinguish between aligned and unaligned
load/store cost.  Now look at the cost tables and see different
unaligned vs. aligned costs depending on the target CPU.  generic32 and
generic64 have:

  1,                                    /* vec_align_load_cost.  */
  2,                                    /* vec_unalign_load_cost.  */
  1,                                    /* vec_store_cost.  */

The missing piece in the vectorizer is that peeling for alignment
should have the option to turn itself off based on those costs (and
on analysis).

Richard.
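[Editor's note: the cost comparison Richard alludes to can be sketched in
a few lines of C.  This is an illustrative model only, not the actual
vectorizer code: the struct fields are named after the cost-table
comments quoted above, while `peeling_profitable_p` and its parameters
are hypothetical.]

```c
#include <assert.h>

/* Per-statement vector costs, modeled on the fields shown in the
   processor cost tables above (illustrative struct, not GCC's).  */
struct vec_costs {
  int vec_align_load_cost;
  int vec_unalign_load_cost;
  int vec_store_cost;
};

/* Sketch of a cost-based peeling decision: peeling for alignment only
   pays off when the savings from aligned loads inside the vectorized
   loop outweigh the scalar prologue that peeling creates.  The formula
   and all parameters are assumptions for illustration.  */
static int
peeling_profitable_p (const struct vec_costs *c,
                      int vector_iters,     /* vectorized loop iterations */
                      int peel_iters,       /* scalar prologue iterations */
                      int scalar_stmt_cost) /* cost of one peeled scalar stmt */
{
  int unpeeled = vector_iters * c->vec_unalign_load_cost;
  int peeled   = vector_iters * c->vec_align_load_cost
                 + peel_iters * scalar_stmt_cost;
  return peeled < unpeeled;
}
```

With the generic32/generic64 numbers quoted above (aligned 1, unaligned
2), peeling a few scalar iterations off a long loop wins; on a CPU whose
table charged aligned and unaligned loads equally, the same check would
turn peeling off, which is exactly the option the vectorizer is missing.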