Hi,

On Thu, Sep 14, 2017 at 11:55:21AM +0200, Richard Biener wrote:
> On Wed, Sep 13, 2017 at 5:08 PM, Allan Sandfeld Jensen
> <li...@carewolf.com> wrote:
> > On Mittwoch, 13. September 2017 15:46:09 CEST Jakub Jelinek wrote:
> >> On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote:
> >> > On its own -O3 doesn't add much (some loop opts and slightly more
> >> > aggressive inlining/unrolling), so whatever it does we
> >> > should consider doing at -O2 eventually.
> >>
> >> Well, -O3 adds vectorization, which we don't enable at -O2 by default.
> >>
> > Would it be possible to enable basic block vectorization on -O2? I assume
> > that doesn't increase binary size since it doesn't unroll loops.
> 
> Somebody needs to provide benchmarking looking at the compile-time cost
> vs. the runtime benefit and the code size effect.  There's also room to tune
> aggressiveness of BB vectorization as it currently allows for cases where
> the scalar computation is not fully replaced by vector code.
> 
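
Just to make the BB vectorization point above concrete, here is a
minimal sketch of the kind of straight-line code the basic-block (SLP)
vectorizer targets (an illustration of mine, not code from any
benchmark).  With -ftree-vectorize GCC can typically merge the four
independent adds into a single vector add, with no unrolling and so no
real code growth:

  /* Hypothetical example: four independent element-wise adds in a
     single basic block.  The SLP vectorizer can replace the four
     scalar additions with one vector addition.  */
  void
  add4 (float *restrict d, const float *restrict a,
        const float *restrict b)
  {
    d[0] = a[0] + b[0];
    d[1] = a[1] + b[1];
    d[2] = a[2] + b[2];
    d[3] = a[3] + b[3];
  }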

A good candidate to look at might be 525.x264_r from the SPEC2017 CPU
suite.  With just -O2, GCC is about 70% slower than LLVM (which I
think must be doing some vectorization at -O2).  When I give -O2
-ftree-vectorize to GCC, the difference drops to 20%, so vectorization
is not the whole story either.  There is no real difference in the
run time of the executables generated by the two compilers at -Ofast.
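
525.x264_r also rewards vectorization more than most benchmarks,
presumably because its hot spots are simple pixel kernels.  A
hypothetical sketch of that shape of loop (not actual x264 source,
just an illustration):

  #include <stdlib.h>

  /* Hypothetical sum-of-absolute-differences kernel; short,
     fixed-count loops over bytes like this vectorize very well.  */
  int
  sad16 (const unsigned char *a, const unsigned char *b)
  {
    int sum = 0;
    for (int i = 0; i < 16; i++)
      sum += abs (a[i] - b[i]);
    return sum;
  }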

(But no, I'm not volunteering to analyze it further in the foreseeable
future.)

Martin
