https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86735
--- Comment #28 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
It seems the object file is not assembled correctly; note the wrong offset in
the last instruction (vmovdqu32):

.s:
        vpgatherqd      (%rax,%ymm3,4), %xmm4{%k4}
        vpgatherqd      (%rax,%ymm1,4), %xmm0{%k1}
        vshufi32x4      $0, %ymm0, %ymm4, %ymm0
        vmovdqu32       %ymm0, 32(,%rsi,4)

.o:
 52a:   62 f2 7d 2c 91 24 98    vpgatherqd (%rax,%ymm3,4),%xmm4{%k4}
 531:   62 f2 7d 29 91 04 88    vpgatherqd (%rax,%ymm1,4),%xmm0{%k1}
 538:   62 f3 5d 28 43 c0 00    vshufi32x4 $0x0,%ymm0,%ymm4,%ymm0
 53f:   62 f1 7e 28 7f 04 b5    vmovdqu32 %ymm0,0x1(,%rsi,4)
 546:   01 00 00 00

With older Binutils I get vmovdqu32 %ymm0,0x20(,%rsi,4), as expected.

Probably relevant Binutils bugs:
https://sourceware.org/bugzilla/show_bug.cgi?id=23465
https://sourceware.org/bugzilla/show_bug.cgi?id=23314
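
To make the wrong offset easier to check, here is a minimal Python sketch that decodes the memory operand of the last instruction from the bytes in the .o listing above. The helper name decode_mem_operand is made up for illustration, and it handles only this instruction shape (4-byte EVEX prefix, one-byte opcode, SIB addressing). It shows that the emitted field is a 32-bit displacement with value 1. Since 32 / 32 == 1 for a 32-byte vector, one possible reading, which is my assumption and not something confirmed in this comment, is that the assembler applied the EVEX disp8*N scaling to a displacement it then emitted as a plain disp32.

```python
# Bytes of the last instruction, copied from the .o listing above.
INSN = bytes.fromhex("62 f1 7e 28 7f 04 b5 01 00 00 00")


def decode_mem_operand(insn: bytes) -> dict:
    """Decode the SIB memory operand of an EVEX instruction.

    Hypothetical helper: assumes a 4-byte EVEX prefix, a one-byte opcode,
    and a ModRM byte with rm == 0b100 (SIB follows). Not a general decoder.
    """
    if insn[0] != 0x62:
        raise ValueError("not an EVEX-encoded instruction")
    # EVEX.L'L (bits 6:5 of the fourth prefix byte) selects 128/256/512 bits.
    vector_bytes = 16 << ((insn[3] >> 5) & 0b11)

    modrm, sib = insn[5], insn[6]
    mod, rm = modrm >> 6, modrm & 0b111
    if rm != 0b100:
        raise ValueError("expected a SIB byte")
    scale = 1 << (sib >> 6)
    index = (sib >> 3) & 0b111
    base = sib & 0b111

    if mod == 0b00 and base == 0b101:
        # No base register: a 32-bit displacement follows and is not scaled.
        kind, raw = "disp32", insn[7:11]
    elif mod == 0b01:
        # EVEX compressed displacement: the CPU multiplies disp8 by N.
        kind, raw = "disp8", insn[7:8]
    elif mod == 0b10:
        kind, raw = "disp32", insn[7:11]
    else:
        kind, raw = "none", b""

    return {
        "vector_bytes": vector_bytes,
        "scale": scale,
        "index": index,
        "base": base,
        "disp_kind": kind,
        "disp": int.from_bytes(raw, "little", signed=True),
    }


if __name__ == "__main__":
    op = decode_mem_operand(INSN)
    print(op)
    # The source asked for displacement 32; the object file holds 1.
    print("disp * vector_bytes =", op["disp"] * op["vector_bytes"])
```

For these bytes the decoder reports a disp32 of 1 with scale 4 and index 6 (%rsi), and disp * vector_bytes comes out as 32, the value written in the .s file.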