http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61043

            Bug ID: 61043
           Summary: LTO accumulates CPU requirements from all input
                    objects
           Product: gcc
           Version: 4.8.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: lto
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andysem at mail dot ru

Created attachment 32726
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32726&action=edit
A test case to reproduce the problem

I have a test case (attached) with several input files. main.cpp contains
generic code that should run on any CPU, while add_sse2.c and add_avx2.c
contain code optimized with SSE2 and AVX2 intrinsics, respectively. main.cpp
detects CPU features at run time and invokes the optimized code when possible.
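For reference, a minimal sketch of the kind of layout described above (the
attached files may differ in details; the use of __builtin_cpu_supports() for
run-time detection is an assumption on my part):

/* add_sse2.c -- built with -msse2 and without any AVX option.
 * Assumes n is a multiple of 8 to keep the sketch short. */
#include <emmintrin.h>

void add_sse2(int *dst, const int *a, const int *b, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(va, vb));
    }
}

/* add_avx2.c -- built with -mavx2. */
#include <immintrin.h>

void add_avx2(int *dst, const int *a, const int *b, int n)
{
    for (int i = 0; i < n; i += 8) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_add_epi32(va, vb));
    }
}

/* main.cpp -- generic code, no SSE2/AVX2 options; picks an
 * implementation at run time. */
extern "C" void add_sse2(int *, const int *, const int *, int);
extern "C" void add_avx2(int *, const int *, const int *, int);

void add(int *dst, const int *a, const int *b, int n)
{
    if (__builtin_cpu_supports("avx2"))
        add_avx2(dst, a, b, n);
    else
        add_sse2(dst, a, b, n);
}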

The problem is that when this test is compiled with LTO enabled, the resulting
executable contains an add_sse2 function with VEX-encoded instructions (i.e.
AVX-128 code instead of legacy SSE2). This does not happen when LTO is
disabled. My guess is that LTO computes the highest CPU requirement across all
input object files (in this case, the one with AVX2) and generates code for
that target, instead of using the CPU that was specified when each file was
compiled. The expected behavior would be to record the target-related compiler
options for every function and use those options at the LTO stage.
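As an illustration of "target-related options per function": GCC already
allows attaching ISA options to an individual function with the target
attribute. A hedged sketch of a variant of add_sse2.c annotated this way is
below; whether this annotation is actually preserved through -flto in 4.8.2
and avoids the VEX re-encoding has not been verified here.

/* Hypothetical variant of add_sse2.c: the ISA choice is recorded on the
 * function itself rather than only in the per-file -msse2 option. */
#include <emmintrin.h>

__attribute__((target("sse2")))
void add_sse2(int *dst, const int *a, const int *b, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(va, vb));
    }
}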

To compile the test, use compile.sh; to obtain the disassembled code, use
disasm.sh. Look for the add_sse2 code in the disassembly.
