Hi,

On Thu, Sep 14, 2017 at 11:55:21AM +0200, Richard Biener wrote:
> On Wed, Sep 13, 2017 at 5:08 PM, Allan Sandfeld Jensen
> <li...@carewolf.com> wrote:
> > On Wednesday, 13 September 2017 15:46:09 CEST Jakub Jelinek wrote:
> >> On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote:
> >> > On its own -O3 doesn't add much (some loop opts and slightly more
> >> > aggressive inlining/unrolling), so whatever it does we
> >> > should consider doing at -O2 eventually.
> >>
> >> Well, -O3 adds vectorization, which we don't enable at -O2 by default.
> >>
> > Would it be possible to enable basic block vectorization on -O2? I
> > assume that doesn't increase binary size since it doesn't unroll loops.
>
> Somebody needs to provide benchmarking looking at the compile-time cost
> vs. the runtime benefit and the code size effect. There's also room to
> tune aggressiveness of BB vectorization as it currently allows for cases
> where the scalar computation is not fully replaced by vector code.
A good candidate to look at might be 525.x264_r from the SPEC CPU 2017
suite. With just -O2, GCC is about 70% slower than LLVM (which I think
must be doing some vectorization at -O2). When I give -O2 -ftree-vectorize
to GCC, the difference drops to 20%, so vectorization is not the whole
story either. There is no real difference in run-time between executables
generated by the two compilers at -Ofast.

(But no, I'm not volunteering to analyze it further in the foreseeable
future.)

Martin