Richard Freeman <[EMAIL PROTECTED]> posted [EMAIL PROTECTED],
excerpted below, on  Sat, 04 Oct 2008 07:56:11 -0400:

> Ben de Groot wrote:
>> 
>> -Os optimizes for size, while -O2 optimizes for speed. There is no need
>> at all to use -Os on a modern desktop machine, and it will run
>> comparatively slower than -O2 optimized code, which is probably not
>> what you want.
>> 
>> 
> There are a couple of schools of thought on that, and I think
> performance can depend a great deal on what program you're talking
> about.
> 
> On any machine, memory is a limited resource.  Oh sure, you could just
> "spend a little more on decent RAM", but you could also "spend a little
> more on a decent CPU" or whatever.  For a given amount of money you can
> only buy so much hardware, so any dollar spent on RAM is a dollar not
> spent on something else.

I agree, but I'd stress the limits of the L1/L2(/L3?) caches more than 
general system memory.  You can generally buy more system memory fairly 
easily, but Lx cache sizes aren't so easy to change, and are likely to 
remain quite limited resources for some time.

So I generally see the benefits of -Os over -O2, with a few exceptions.  
-freorder-blocks-and-partition (only in CFLAGS; it doesn't work on C++/
CXXFLAGS) can increase the raw size but manages cache better, because it 
separates code into hot and cold blocks, giving the hot code a better 
chance of staying in-cache (according to the gcc manpage).  There are a 
couple of other similar flags.
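A make.conf-style sketch of that flag placement might look like the 
following; the surrounding flags are just illustrative, not a tested 
recommendation:

```shell
# Sketch only: keep -freorder-blocks-and-partition in CFLAGS but out of
# CXXFLAGS, since per the gcc manpage it doesn't work for C++ code.
CFLAGS="-O2 -pipe -freorder-blocks-and-partition"
CXXFLAGS="-O2 -pipe"
```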

Of course, to /really/ get performance, one would need to compile with 
code profiling instrumentation turned on, run the program normally (but 
profiled) for a while to generate some profiling history, then recompile 
using that history to help optimize things.  This, BTW, is one of the 
reasons I wonder about -ftracer when I see it in someone's CFLAGS.  The 
gcc manpage says it helps other optimizations, but then links it to 
-fprofile-use.  How much help it is without the profiling isn't covered, 
but given the increase in size and the effect of that on caches, it's 
likely not worth it without the profiling.  How many people compile first 
for profiling, run the program to generate profiles, then recompile using 
the profile data?  Right, not so many, at least for most apps.  In that 
case, why do they have -ftracer in their general CFLAGS?

That said, I recently switched to -O2 from my long-time -Os.  Much of the 
difference in gcc-3 was due to -funit-at-a-time and similar 
optimizations, enabled by default early on for -Os, but not for -O2 until 
gcc-4.something, I believe.  Modern gcc is more cache-usage-performance 
aware than gcc-3 was, and I think most of the remaining differences are 
like the -freorder-blocks-and-partition thing: they affect CPU cache 
usage negatively enough that you don't want them enabled except on old 
machines, embedded systems, and perhaps the now-popular netbook/atom type 
applications.

Speaking of which... I just got my Acer Aspire One (32-bit Atom N270 
CPU), and intend to do a 32-bit chroot on my main machine and create 
binpkgs to merge to the AA1.  Any idea what sort of CFLAGS to use on it?  
I know it doesn't have all that fancy branch prediction and prefetch 
stuff of a normal modern x86_(32/64) CPU.  One suggestion I've seen is 
-march=i686, and I'll probably do -Os for it, but what about stuff like 
-fweb, -frename-registers, etc?  It does support up through SSE3 at 
least, so I can enable that too.
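As a starting point, the chroot's make.conf might look something like the 
sketch below, following the -march=i686 suggestion plus explicit SSE 
flags.  This is a guess for discussion, not a tested recommendation:

```shell
# Hedged sketch of make.conf for a 32-bit Atom chroot.
CHOST="i686-pc-linux-gnu"
# -march=i686 per the suggestion above; -msse3 (with the earlier SSE
# levels spelled out) since the N270 supports up through SSE3.
CFLAGS="-Os -march=i686 -msse -msse2 -msse3 -pipe"
CXXFLAGS="${CFLAGS}"
```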

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman