https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95021
--- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> ---
STV generates:

  8d b6 00 00 00 00       lea    0x0(%esi),%esi
  a1 00 00 00 00          mov    0x0,%eax
                          R_386_32 target_p
  83 ec 08                sub    $0x8,%esp
  f3 0f 7e 00             movq   (%eax),%xmm0
  a1 00 00 00 00          mov    0x0,%eax
                          R_386_32 c
  66 0f 6f c8             movdqa %xmm0,%xmm1
  66 0f 7e 44 24 10       movd   %xmm0,0x10(%esp)
  66 0f 73 d1 20          psrlq  $0x20,%xmm1
  66 0f d6 00             movq   %xmm0,(%eax)
  66 0f 7e 4c 24 14       movd   %xmm1,0x14(%esp)
  ff 74 24 14             pushl  0x14(%esp)
  ff 74 24 14             pushl  0x14(%esp)
  e8 fc ff ff ff          call   <d+0x53>
                          R_386_PC32 e

instead of

  8d b6 00 00 00 00       lea    0x0(%esi),%esi
  a1 00 00 00 00          mov    0x0,%eax
                          R_386_32 target_p
  8b 0d 00 00 00 00       mov    0x0,%ecx
                          R_386_32 c
  83 ec 08                sub    $0x8,%esp
  8b 50 04                mov    0x4(%eax),%edx
  8b 00                   mov    (%eax),%eax
  89 51 04                mov    %edx,0x4(%ecx)
  89 01                   mov    %eax,(%ecx)
  52                      push   %edx
  50                      push   %eax
  e8 fc ff ff ff          call   <d+0x3b>
                          R_386_PC32 e

It is hard to tell if vector is faster.
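
For readers following the two listings, here is a rough C sketch of the operation both sequences seem to implement: load a 64-bit value through one pointer, store it through another, and pass it to a function. This is a hypothetical reconstruction, not the test case from the bug report. Only the names target_p, c, d and e are taken from the relocations and the call target in the disassembly; the long long type, the body of e, and the last_arg variable are assumptions added so the sketch is self-contained.

```c
/* Hypothetical reconstruction: the names target_p, c, d and e come from
   the relocations in the listings; the long long type and everything
   else here are assumptions. */
long long *target_p;            /* pointer read by "mov target_p,%eax" */
long long *c;                   /* pointer read by "mov c,%eax" / "%ecx" */

static long long last_arg;      /* stand-in state, not in the original */

void e(long long v)             /* stand-in body; the real e is external */
{
    last_arg = v;
}

void d(void)
{
    long long v = *target_p;    /* one movq load, or two 32-bit loads */
    *c = v;                     /* one movq store, or two 32-bit stores */
    e(v);                       /* ia32 passes v as two 32-bit stack words */
}
```

With STV the copy goes through %xmm0 and the argument halves are spilled to the stack before being pushed, while the scalar version keeps both halves in %eax/%edx and pushes them directly.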