https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114908
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Thanks, and it might be enough to handle

typedef unsigned long V [[gnu::vector_size(32)]];

V load3(const unsigned long* ptr)
{
  V ret = {};
  __builtin_memcpy(&ret, ptr, 3 * sizeof(unsigned long));
  return ret;
}

where with -O2 .optimized still has

  <bb 2> [local count: 1073741824]:
  ret = { 0, 0, 0, 0 };
  __builtin_memcpy (&ret, ptr_3(D), 24);
  _5 = ret;
  ret ={v} {CLOBBER(eos)};
  return _5;

and thus 'ret' not promoted to a register.
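
For anyone wanting to reproduce this, here is a minimal sketch that places the testcase next to a lane-by-lane reference, so the two can be checked for equal results and their dumps compared with -O2 -fdump-tree-optimized. The names load3_by_lane and load3_forms_agree are hypothetical helpers, not part of the report, and the sketch assumes GCC or Clang on an LP64 target, where unsigned long is 8 bytes and V has four lanes. It only establishes that both forms compute the same value; it makes no claim about what code either one produces.

```cpp
// Assumption: GCC or Clang on an LP64 target (unsigned long is 8 bytes),
// so a 32-byte vector of unsigned long has exactly four lanes.
typedef unsigned long V [[gnu::vector_size(32)]];

// The form from the comment: zero-initialize the vector, then overwrite
// the first three lanes with a 24-byte memcpy.
V load3(const unsigned long* ptr)
{
  V ret = {};
  __builtin_memcpy(&ret, ptr, 3 * sizeof(unsigned long));
  return ret;
}

// Hypothetical reference (not from the report): build the same value
// lane by lane, with the fourth lane set to zero explicitly.
V load3_by_lane(const unsigned long* ptr)
{
  V ret = {ptr[0], ptr[1], ptr[2], 0};
  return ret;
}

// Hypothetical helper: returns true when both forms agree on all four
// lanes for the three values starting at ptr.
bool load3_forms_agree(const unsigned long* ptr)
{
  V a = load3(ptr);
  V b = load3_by_lane(ptr);
  for (int i = 0; i < 4; ++i)
    if (a[i] != b[i])
      return false;
  return true;
}
```

Calling load3_forms_agree on any array of at least three elements should return true. On x86-64 without AVX enabled, GCC may emit a -Wpsabi note about returning a 32-byte vector; that is expected and does not affect the result.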