Jan Hubicka <hubi...@ucw.cz> writes:

> Note that I think Core has similar characteristics - at least for string
> operations it fares well with unaligned accesses.

Nehalem and later have very fast unaligned vector loads. There's still
some penalty when they cross cache lines, however.

iirc the rule of thumb is to use unaligned accesses for 128-bit vectors,
but avoid them for 256-bit vectors, because the cache line crossing
penalty is larger on Sandy Bridge and a crossing is more likely with the
larger vectors.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only
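
A minimal sketch of why a cache line crossing is more likely with the larger vectors, assuming 64-byte cache lines and a start address that is equally likely to land on any byte offset within a line. The names `CACHE_LINE`, `crosses_cache_line` and `crossing_offsets` are hypothetical, chosen only for illustration; this is plain address arithmetic, not a measurement of the actual penalty.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE 64

/* Returns 1 if an access of `size` bytes starting at `addr`
   touches two different cache lines, 0 otherwise. */
int crosses_cache_line(uintptr_t addr, size_t size)
{
    return (addr / CACHE_LINE) != ((addr + size - 1) / CACHE_LINE);
}

/* Counts how many of the CACHE_LINE possible byte offsets within
   a line make a `size`-byte access spill into the next line. */
int crossing_offsets(size_t size)
{
    int count = 0;
    for (uintptr_t off = 0; off < CACHE_LINE; off++)
        count += crosses_cache_line(off, size);
    return count;
}
```

Under these assumptions, a 16-byte (128-bit) access crosses a line for 15 of the 64 possible offsets, while a 32-byte (256-bit) access crosses for 31 of 64, roughly twice as often.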