I'm not sure what LTO is supposed to do -- the documentation is not exactly clear. But I assumed it should make things faster and/or smaller.
So I tried it on an application: a processor emulator, which is CPU-intensive code with a lot of 64-bit integer arithmetic. Using a compile/assemble run on the emulated system as the benchmark, I compared three builds on x86_64-linux with gcc 4.7.0:

  1. -O2 plain
  2. -O2 -fprofile-use (after having done -fprofile-generate)
  3. -O2 -fprofile-use -flto (using a separate set of profile data files from -fprofile-generate -flto)

Results: profiling speeds things up by about 8%, but the LTO build is 50% (!) slower than without LTO.

Any suggestions on what to look at for this?

paul