https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114908
--- Comment #10 from Matthias Kretz (Vir) <mkretz at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #9)
> One issue with
>
> V load3(const unsigned long* ptr)
> {
>   V ret = {};
>   __builtin_memcpy(&ret, ptr, 3 * sizeof(unsigned long));
>
> is that we cannot load a vector worth of data from ptr because that might
> trap

Unless the target has a masked load instruction (e.g. AVX512) or ptr is known
to be aligned to at least 16 Bytes (in which case we know there cannot be a
page boundary at ptr + 24 Bytes). No?

In this specific example, ptr is pointing to a 32-Byte vector object. The
library can do this and it makes a difference:

  if (__builtin_object_size(ptr, 0) >= 4 * sizeof(T))
    __builtin_memcpy(&ret, ptr, 4 * sizeof(T));
  else
    __builtin_memcpy(&ret, ptr, 3 * sizeof(T));
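For reference, a minimal self-contained sketch of that dispatch, assuming T is unsigned long and V is a 4-element GNU vector type (both are my assumptions; the comment does not define them). It deviates from the snippet above in one respect: it asks __builtin_object_size for type 2 rather than type 0. GCC documents that types 0 and 1 yield (size_t)-1 when the object size is unknown, which would select the 4-element copy, whereas types 2 and 3 yield 0, so the unknown case falls back to the 3-element copy. The always_inline attribute is also an assumption; it is only there so the builtin has a chance to see the caller's object.

```cpp
#include <cassert>
#include <cstddef>

// Assumed types, not taken from the PR: 4 x unsigned long = 32 bytes.
using T = unsigned long;
typedef T V __attribute__((vector_size(4 * sizeof(T))));

// Hypothetical helper mirroring load3 from comment #9.
// Elements 0..2 are always loaded; element 3 is either zero or the
// fourth source element, depending on which branch is taken.
__attribute__((always_inline)) inline V load3(const T* ptr)
{
  assert(ptr != nullptr);
  V ret = {};
  // Type 2 gives a lower bound on the remaining object size and
  // returns 0 when nothing is known, so the full-width copy is only
  // chosen when 32 readable bytes are proven at compile time.
  if (__builtin_object_size(ptr, 2) >= 4 * sizeof(T))
    __builtin_memcpy(&ret, ptr, 4 * sizeof(T));
  else
    __builtin_memcpy(&ret, ptr, 3 * sizeof(T));
  return ret;
}
```

Whether the first branch is actually taken depends on optimization level and inlining, so callers should treat element 3 of the result as unspecified.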