[Bug target/68928] AVX loops on unaligned arrays could generate more efficient startup/cleanup code when peeling

peter at cordes dot ca Wed, 16 Dec 2015 13:47:13 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68928


--- Comment #2 from Peter Cordes <peter at cordes dot ca> ---
Richard wrote: 
> [...] avoid peeling for alignment on x86_64 and just use unaligned ops

Yeah, that's what clang does, and may be optimal.  Certainly it's easy, and
gives optimal performance when buffers *are* in fact aligned, even when the
programmer has neglected to inform the compiler of any guarantee.

However, with vector sizes getting closer to the cache-line size, unaligned
accesses will cross cache lines more of the time (e.g. with 64-byte lines, a
loop of 32-byte AVX accesses over an unaligned buffer will have a cache-line
split on every other iteration).  Iff we can *cheaply* avoid this, it may be
worth it.
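
To make the arithmetic concrete, here is a rough sketch (not anything gcc
emits) that counts how many 32-byte accesses split a 64-byte line for a given
start address, and how many scalar elements a peeled prologue would need to
reach 32-byte alignment.  The names VEC_BYTES, LINE_BYTES, crosses_line,
count_splits and peel_count are made up for illustration, and the 32/64-byte
sizes are assumptions matching AVX on current x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes: 32-byte AVX vectors, 64-byte cache lines. */
enum { VEC_BYTES = 32, LINE_BYTES = 64 };

/* Nonzero if a VEC_BYTES-wide access starting at addr touches two lines. */
static int crosses_line(uintptr_t addr)
{
    return (addr / LINE_BYTES) != ((addr + VEC_BYTES - 1) / LINE_BYTES);
}

/* Count line splits over n_vecs back-to-back vector accesses from addr. */
static size_t count_splits(uintptr_t addr, size_t n_vecs)
{
    size_t splits = 0;
    for (size_t i = 0; i < n_vecs; i++)
        splits += (size_t)crosses_line(addr + i * VEC_BYTES);
    return splits;
}

/* Scalar elements to peel so the first vector access is VEC_BYTES-aligned.
   Assumes addr is a multiple of elem_size and elem_size divides VEC_BYTES. */
static size_t peel_count(uintptr_t addr, size_t elem_size)
{
    assert(elem_size != 0 && VEC_BYTES % elem_size == 0);
    return ((VEC_BYTES - addr % VEC_BYTES) % VEC_BYTES) / elem_size;
}
```

For a buffer starting 4 bytes past a line boundary, count_splits(4, 16) is 8
(every other access splits), and peel_count(4, 4) is 7, i.e. seven scalar
floats before the first aligned 32-byte access.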

IIRC, all modern x86 / x86-64 CPUs have no penalty for unaligned loads, as long
as they don't actually cross a cache-line boundary.  (True for Intel since
Nehalem).  Store-forwarding doesn't work well if the stores don't line up with
the loads, though.

