Richard Freeman <[EMAIL PROTECTED]> posted [EMAIL PROTECTED], excerpted below, on Sat, 04 Oct 2008 07:56:11 -0400:
> Ben de Groot wrote:
>> -Os optimizes for size, while -O2 optimizes for speed. There is no
>> need at all to use -Os on a modern desktop machine, and it will run
>> comparatively slower than -O2 optimized code, which is probably not
>> what you want.
>
> There are a couple of schools of thought on that, and I think
> performance can depend a great deal on what program you're talking
> about.
>
> On any machine, memory is a limited resource. Oh sure, you could just
> "spend a little more on decent RAM", but you could also "spend a little
> more on a decent CPU" or whatever. For a given amount of money you can
> only buy so much hardware, so any dollar spent on RAM is a dollar not
> spent on something else.

I agree, but I'd stress the limits of the L1/L2(/L3?) caches more than 
general system memory. You can generally buy more system memory fairly 
easily, but Lx cache sizes aren't so easy to change, and they're likely 
to remain quite limited resources for some time. So I generally see the 
benefits of -Os over -O2, with a few exceptions.

-freorder-blocks-and-partition (only in CFLAGS; it doesn't work on C++/
CXXFLAGS) can increase the raw size but manages cache better, because it 
separates code into hot and cold blocks, giving the hot code a better 
chance of staying in-cache (according to the gcc manpage). There are a 
couple of other similar flags.

Of course, to /really/ get performance, one would need to compile with 
code profiling instrumentation turned on, run the program as one 
normally would (but profiled) for a while to generate some profiling 
history, then recompile using that history to help optimize things.

This, BTW, is one of the reasons I wonder about -ftracer when I see it 
in someone's CFLAGS. The gcc manpage says it helps other optimizations, 
but then links it to -fprofile-use. How much help it is without the 
profiling isn't covered, but given the increase in size and the effect 
of that on the caches, it's likely not worth it without the profiling.
How many people compile first for profiling, run the program to generate 
profiles, then recompile using the profile data? Right, not so many, at 
least for most apps. In that case, why do they have -ftracer in their 
general CFLAGS?

That said, I recently switched to -O2 from my long-time -Os. Much of the 
difference in gcc-3 was due to -funit-at-a-time and similar 
optimizations, enabled by default early on for -Os, but not for -O2 
until gcc-4.something, I believe. Modern gcc is more aware of cache-
usage performance than gcc-3 was, and I think most of the remaining 
differences are like the -freorder-blocks-and-partition thing: they 
affect CPU cache usage negatively enough that you don't want them 
enabled except for old machines, embedded systems, and perhaps the now-
popular netbook/Atom type applications.

Speaking of which... I just got my Acer Aspire One (32-bit Atom N270 
CPU), and I intend to do a 32-bit chroot on my main machine and create 
binpkgs to merge to the AA1. Any idea what sort of CFLAGS to use on it? 
I know it doesn't have all that fancy branch prediction and prefetch 
stuff of a normal modern x86_(32/64) CPU. One suggestion I've seen is 
-march=i686, and I'll probably do -Os for it, but what about stuff like 
-fweb, -frename-registers, etc.? It does have through SSE3 at least, so 
I can enable that too.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
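P.S. For the curious, the sort of make.conf fragment I have in mind for 
the AA1 chroot is below. This is a guess assembled from the flags 
discussed above, not something I've benchmarked, and every value in it 
is an assumption open to correction:

```
# Hypothetical make.conf fragment for a 32-bit Atom N270 chroot (untested guess)
CHOST="i686-pc-linux-gnu"
CFLAGS="-Os -march=i686 -msse3 -mfpmath=sse -fomit-frame-pointer -pipe"
CXXFLAGS="${CFLAGS}"
```

Whether -fweb and -frename-registers belong in there too is exactly the 
open question.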