http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54231

--- Comment #11 from Thiago Macieira <thiago at kde dot org> 2012-08-13 10:12:48 UTC ---
Attaching __attribute__((target("xxx"))) to the function does help.

With the my_bzero function from comment 2, GCC generates the following:

00000000000002e0 <bzero_avx.2362>:
 2e0:   test   %rsi,%rsi
 2e3:   vpxor  %xmm0,%xmm0,%xmm0
 2e7:   je     2fe <bzero_avx.2362+0x1e>
 2e9:   nopl   0x0(%rax)
 2f0:   vmovntdq %xmm0,(%rdi)
 2f4:   add    $0x10,%rdi
 2f8:   sub    $0x1,%rsi
 2fc:   jne    2f0 <bzero_avx.2362+0x10>
 2fe:   repz retq 

0000000000000300 <my_bzero>:
 300:   mov    0x200171(%rip),%rax        # 200478 <my_bzero+0x200178>
 307:   mov    (%rax),%eax
 309:   test   %eax,%eax
 30b:   jne    330 <my_bzero+0x30>
 30d:   test   %rsi,%rsi
 310:   pxor   %xmm0,%xmm0
 314:   je     332 <my_bzero+0x32>
 316:   nopw   %cs:0x0(%rax,%rax,1)
 320:   movntdq %xmm0,(%rdi)
 324:   add    $0x10,%rdi
 328:   sub    $0x1,%rsi
 32c:   jne    320 <my_bzero+0x20>
 32e:   repz retq 
 330:   jmp    2e0 <bzero_avx.2362>
 332:   repz retq 
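For readers without comment 2 at hand, source of roughly the following shape
is consistent with the listing above. This is only a sketch reconstructed from
the disassembly, not the original test case: the flag name have_avx, the helper
init_cpu_flags, the exact signatures, and the alignment assert are all
assumptions, and the count is taken to be a number of 16-byte blocks because
the loop advances the pointer by 0x10 per iteration.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_setzero_si128, _mm_stream_si128 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical runtime flag; the listing only shows that some global int
   is loaded and tested before dispatching. */
int have_avx = 0;

/* Only this function is compiled with AVX enabled, so the same intrinsics
   are emitted in their VEX-encoded form (vpxor, vmovntdq). */
__attribute__((target("avx")))
static void bzero_avx(void *ptr, size_t blocks)
{
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)ptr;
    for (size_t i = 0; i < blocks; ++i)
        _mm_stream_si128(&p[i], zero);  /* non-temporal 16-byte store */
}

/* Compiled with the file's default flags (plain SSE2 on x86-64). */
void my_bzero(void *ptr, size_t blocks)
{
    /* movntdq requires a 16-byte aligned destination. */
    assert(((uintptr_t)ptr & 15) == 0);

    if (have_avx) {
        bzero_avx(ptr, blocks);
        return;
    }

    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)ptr;
    for (size_t i = 0; i < blocks; ++i)
        _mm_stream_si128(&p[i], zero);
}

/* Assumed helper: set the flag from GCC's CPU detection builtins. */
void init_cpu_flags(void)
{
    __builtin_cpu_init();
    have_avx = __builtin_cpu_supports("avx");
}
```

A caller would run init_cpu_flags() once at startup and then call
my_bzero(buf, n) on a 16-byte aligned buffer of n * 16 bytes.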


This workaround might be useful for me in a few places where the inlining
provided by LTO is desired (even though, in this example, the AVX variant is
exactly what it would be if no LTO had been used). But it does not scale: if I
have 400+ functions in a file to be compiled, plus possibly inline functions
from headers, each one would need the attribute, and that means major changes
to the code.
