On 2017.08.26 at 12:40 +0200, Allan Sandfeld Jensen wrote:
> On Saturday, 26 August 2017 10:56:16 CEST Markus Trippelsdorf wrote:
> > On 2017.08.26 at 01:39 -0700, Andrew Pinski wrote:
> > > First let me put -Os usage into some perspective, with some history:
> > > 1) -Os is not useful for non-embedded users.
> > > 2) The embedded folks really need the smallest code possible and are
> > > usually willing to accept the performance hit.
> > > 3) -Os was a mistake for Apple to use in the first place; they used it,
> > > and then GCC got better for PowerPC and used the string instructions,
> > > which is why -Oz was added :)
> > > 4) -Os is used heavily by the ARM/Thumb2 folks in bare-metal applications.
> > >
> > > Comparing -O3 to -Os is not entirely fair on x86, due to the many
> > > different instructions and encodings.
> > > Compare it on ARM/Thumb2 or MIPS/MIPS16 (or microMIPS), where size is a
> > > big issue.
> > > I will soon need to keep overall (bare-metal) application size down to
> > > just 256k.
> > > Microcontrollers are where -Os matters the most.
> > >
> > > This comment does not help my use case. It rather hurts it and goes
> > > against what -Os is really about. It is not about reducing icache
> > > pressure but overall application code size. I really need the code to
> > > fit into a specific size.
> >
> > For many applications, using -flto reduces code size more than just
> > going from -O2 to -Os.
>
> I added the option to optimize with -Os in Qt, and it gives an average 15%
> reduction in binary size, sometimes as high as 25%. Using LTO gives almost
> the same (slightly less), but the two options combine perfectly, and using
> both can reduce binary size by 20 to 40%. And that is on a shared library,
> not even a statically linked binary.
>
> The only real minus is that some of the libraries, especially QtGui, would
> benefit from auto-vectorization, so it would be nice if there existed an
> -O3s version that vectorized the most obviously vectorizable functions; a
> few hundred bytes for an additional version here and there would do good.
> Fortunately it doesn't do too much damage, as we have manually vectorized
> routines to get good performance on MSVC as well; if we relied more on
> auto-vectorization it would be worse.
In that case profile-guided optimization will help. It optimizes cold
functions with -Os and hot functions with -O3 (when using e.g.
"-flto -O3 -fprofile-use"). Of course you will have to compile twice and
collect training data from your library in between.

-- 
Markus