https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65369
--- Comment #32 from Alan Modra <amodra at gmail dot com> ---
Richi, ptr+12 is nonsense. Suppose ptr is 16k+1 for some integer k. Then the first vector load is from 16k, and the second is from the same address, since (16k+1+12) & ~15 == 16k. But we want to end up with 15 bytes from the first 16-byte aligned block and one byte from the *next* 16-byte aligned block, which means we must use ptr+16.