On Sunday, 27 May 2018 03:23:36 CEST Segher Boessenkool wrote:
> On Sun, May 27, 2018 at 01:25:25AM +0200, Allan Sandfeld Jensen wrote:
> > On Sunday, 27 May 2018 00:05:32 CEST Segher Boessenkool wrote:
> > > On Sat, May 26, 2018 at 11:32:29AM +0200, Allan Sandfeld Jensen wrote:
> > > > I brought this subject up earlier, and was told to suggest it again
> > > > for gcc 9, so I have attached the preliminary changes.
> > > > 
> > > > My studies have shown that with generic x86-64 optimization it reduces
> > > > binary size by around 0.5%, and when optimizing for x64 targets with
> > > > SSE4 or better, it reduces binary size by 2-3% on average. The
> > > > performance changes are negligible, however*, and I haven't been able
> > > > to detect changes in compile time big enough to penetrate general
> > > > noise on my platform, but perhaps someone has a better setup for that?
> > > > 
> > > > * I believe that is because it currently works best on non-optimized
> > > > code: it is better at big basic blocks doing all kinds of things than
> > > > at tightly written inner loops.
> > > > 
> > > > Anything else I should test or report?
> > > 
> > > What does it do on other architectures?
> > 
> > I believe NEON would do the same as SSE4, but I can do a check. For
> > architectures without SIMD it essentially does nothing.
> 
> Sorry, I wasn't clear.  What does it do to performance on other
> architectures?  Is it (almost) always a win (or neutral)?  If not, it
> doesn't belong in -O2, not for the generic options at least.
> 
It shouldn't have any way of producing slower code, so it is neutral or a win
in performance, and similarly in code size: merged instructions mean fewer
instructions.

I never found a benchmark where it really made a measurable difference in
performance, but I found many large binaries, such as Qt or Chromium, where it
made the binaries a few percent smaller.

Allan

