http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54231
--- Comment #11 from Thiago Macieira <thiago at kde dot org> 2012-08-13 10:12:48 UTC ---
Attaching __attribute__((target("xxx"))) to the function does help. It
generates the following for the my_bzero function from comment 2:

00000000000002e0 <bzero_avx.2362>:
 2e0:   test   %rsi,%rsi
 2e3:   vpxor  %xmm0,%xmm0,%xmm0
 2e7:   je     2fe <bzero_avx.2362+0x1e>
 2e9:   nopl   0x0(%rax)
 2f0:   vmovntdq %xmm0,(%rdi)
 2f4:   add    $0x10,%rdi
 2f8:   sub    $0x1,%rsi
 2fc:   jne    2f0 <bzero_avx.2362+0x10>
 2fe:   repz retq

0000000000000300 <my_bzero>:
 300:   mov    0x200171(%rip),%rax        # 200478 <my_bzero+0x200178>
 307:   mov    (%rax),%eax
 309:   test   %eax,%eax
 30b:   jne    330 <my_bzero+0x30>
 30d:   test   %rsi,%rsi
 310:   pxor   %xmm0,%xmm0
 314:   je     332 <my_bzero+0x32>
 316:   nopw   %cs:0x0(%rax,%rax,1)
 320:   movntdq %xmm0,(%rdi)
 324:   add    $0x10,%rdi
 328:   sub    $0x1,%rsi
 32c:   jne    320 <my_bzero+0x20>
 32e:   repz retq
 330:   jmp    2e0 <bzero_avx.2362>
 332:   repz retq

This workaround might be useful for me in a few places where the inlining
provided by LTO is desired (even though, in this example, the AVX variant is
exactly what it would have been if no LTO had been used).

But it does not scale: if I have 400+ functions in a file to be compiled, plus
possibly inline functions from headers, it won't work without major changes to
the code.
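
For readers without comment 2 at hand, here is a minimal sketch of the kind of source the attribute workaround applies to. It is not the code from comment 2: the helper name bzero_sse2, the signatures, and the use of __builtin_cpu_supports in place of the global flag that the disassembly loads are all assumptions made for illustration. It targets x86 only.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_stream_si128 */
#include <stddef.h>

/* Per-function target attribute: only this function is compiled with AVX
 * enabled, so the same intrinsic is expected to become the VEX-encoded
 * vmovntdq here. The signature is an assumption, not taken from comment 2. */
__attribute__((target("avx")))
static void bzero_avx(__m128i *ptr, size_t count)
{
    const __m128i zero = _mm_setzero_si128();
    for (size_t i = 0; i < count; ++i)
        _mm_stream_si128(ptr + i, zero);   /* non-temporal 16-byte store */
}

/* Baseline variant, built with the file's default flags (SSE2 on x86-64).
 * Hypothetical helper; in the disassembly this loop lives in my_bzero. */
static void bzero_sse2(__m128i *ptr, size_t count)
{
    const __m128i zero = _mm_setzero_si128();
    for (size_t i = 0; i < count; ++i)
        _mm_stream_si128(ptr + i, zero);
}

/* Runtime dispatch. The original tests a global flag; the CPU query below
 * is a stand-in for it (an assumption). ptr must be 16-byte aligned and
 * count is the number of 16-byte blocks to clear. */
void my_bzero(__m128i *ptr, size_t count)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        bzero_avx(ptr, count);
    else
        bzero_sse2(ptr, count);
}
```

Every function that needs the non-default instruction set has to carry the attribute itself, which is why this approach becomes a major change once a file holds hundreds of such functions.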