On Tuesday 09 December 2008 18:07:38 Duncan wrote:
> Sami Näätänen <[EMAIL PROTECTED]> posted
> [EMAIL PROTECTED], excerpted below, on Tue, 09 Dec
> 2008 14:23:30 +0200:
> > My system is an Intel quad core core2 with a 2.4 GHz clock speed coupled
> > with a 4GB of memory. No overclocking etc. Want this to be stable. :)
> >
> > I'm just curious what people use as their stable CFLAGS in amd64 Gentoo?
> > (Sorry if this has been up lately, but I just switched to 64bit env
> > so...)
> >
> > Here is mine and some explanation of why (And I use ~arch system with
> > gcc 4.3)
>
> Well, you say you want stable, but then say you use ~arch, so I see
> you're not too stick in the mud. =:^)

Well, stable binaries, as I said in my (at least a little) clarifying second post. :)

> Here's mine, for a dual Opteron 290:
>
> CFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb
> -fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload
> -ftree-vectorize -fdirectives-only -freorder-blocks-and-partition -combine"
>
> CXXFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb
> -fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload
> -ftree-vectorize -fdirectives-only"
>
> You can look them up in the gcc manpage, or look back a year or so when I
> explained most of them, altho that was a couple gcc versions ago and they
> weren't quite the same.
>
> But my basic strategy is this: Because memory is so much slower than
> cache on a modern processor, in general it should pay to optimize for
> size even if it costs a few CPU cycles once in awhile. Thus, until
> fairly recently I used -Os, but with gcc-4.3, decided to switch to -O2
> since gcc is getting smarter about such optimizations with -O2 now, and
> the few additional size optimizations with -Os now tend to be at the
> expense of cache (think -freorder-blocks-and-partition). In any case, I
> certainly don't want -O3 or too much loop unrolling and inlining, at the
> expense of cache.
>
> -frename-registers and -fweb are useful for taking advantage of the
> additional registers x86_64 has. -fdirectives-only is there because it
> works better with ccache, which I use. You know about -ftree-vectorize
> and -combine is discussed elsewhere on-thread. -fmerge-all-constants
> isn't strictly C standard, but I've had absolutely zero issues with it,
> and it's going to help with cache. -freorder-blocks-and-partition won't
> work on most C++ code, thus (along with -combine) the reason I split
> CFLAGS and CXXFLAGS, but it tells gcc to keep hot code together so it
> stays in cache better.
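To make the CFLAGS/CXXFLAGS split above concrete, here is a minimal bash sketch that derives the CXXFLAGS line from the CFLAGS line by dropping the two flags kept C-only (-freorder-blocks-and-partition and -combine). It is only an illustration: as far as I know make.conf supports plain ${VAR} expansion, not bash functions, so there you would still write both lines out. The helper name strip_flags is hypothetical.

```shell
#!/bin/bash
# Sketch: derive CXXFLAGS from CFLAGS by removing the C-only flags
# (-freorder-blocks-and-partition and -combine).

CFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb -fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload -ftree-vectorize -fdirectives-only -freorder-blocks-and-partition -combine"

# strip_flags FLAGS DROP...   (hypothetical helper)
# Prints FLAGS with every word exactly equal to one of the DROP words removed.
strip_flags() {
    local flags=$1 f drop keep
    local -a words out=()
    shift
    read -r -a words <<< "$flags"      # split on whitespace, no globbing
    for f in "${words[@]}"; do
        keep=1
        for drop in "$@"; do
            [[ $f == "$drop" ]] && keep=0
        done
        if (( keep )); then
            out+=("$f")
        fi
    done
    printf '%s\n' "${out[*]}"          # rejoin with single spaces
}

CXXFLAGS=$(strip_flags "$CFLAGS" -freorder-blocks-and-partition -combine)
echo "CXXFLAGS=\"$CXXFLAGS\""
```

The printed value is identical to the CXXFLAGS line quoted above, which is the point: the C++ flags are the C flags minus the two that break C++ builds.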
>
> The various -fgcse-* options make gcc stricter about global common
> subexpression elimination (gcse) under various conditions. This
> shouldn't add to size and may in fact reduce size by reducing
> instruction count (or moving it out of loops, size neutral), but it can
> increase compile time, the reason a few of them are enabled at -O3
> only, by default.
>
> -combine is the one that causes the most problems, handled per
> trouble-package as mentioned in the other thread using /etc/portage/env/*
> files. The -freorder-blocks-and-partition can in some cases as well, but
> if you don't have either of those in CXXFLAGS, you'll avoid a lot of the
> problem right there. Those are the only C(XX)FLAGS I have had issues with
> lately. The others have worked just fine.
>
> With quad-core you will likely be interested in upping your MAKEOPTS job
> count as well. Just be aware that it too can cause issues at times.
> Again, however, it's easily worked around per-package as you come across
> them using the env/* files to set MAKEOPTS=-j1 or whatever.

Yeah, forgot to mention that too. In fact I like to use -j <num cores>, as
then there is no need for renicing in most cases and the system stays
smooth.

> Since you mentioned running ~arch, and assuming your PM is still portage,
> you may also want to take a look at the emerge's --jobs and --load-average
> options, for parallel emerges, if you haven't already. If you use them
> you'll probably find --keep-going useful as well, so it doesn't stop
> just because one of the parallel merges failed.

Well, I've been a paludis man for quite a while; much better dependency
handling.

> Finally, if you haven't already, consider pointing PORTAGE_TMPDIR at a
> tmpfs. With 4 gig memory it should speed things up dramatically, and the
> worst-case is that it uses swap, sending to disk what would be 100%
> guaranteed to go to disk if you had PORTAGE_TMPDIR on disk.

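A minimal sketch of the per-package workaround described above, assuming portage's /etc/portage/env/* mechanism sources a bash file named /etc/portage/env/<category>/<package> for that package only; the some-category/some-package path below is a placeholder, and paludis handles per-package environments differently. Part 1 merely simulates make.conf defaults so the snippet runs standalone, using getconf _NPROCESSORS_ONLN to apply the "-j <num cores>" rule.

```shell
#!/bin/bash
# Part 1: stand-ins for make.conf defaults. In make.conf itself you would
# write a literal value, e.g. MAKEOPTS="-j4" on a quad core.
CFLAGS="-march=opteron-sse3 -pipe -O2 -combine"
MAKEOPTS="-j$(getconf _NPROCESSORS_ONLN)"   # one make job per online CPU
echo "default:     MAKEOPTS=$MAKEOPTS CFLAGS=$CFLAGS"

# Part 2: body of a per-package env file, e.g. the placeholder path
# /etc/portage/env/some-category/some-package (sourced for that package only).
CFLAGS="${CFLAGS//-combine/}"   # drop -combine for a package that breaks with it
MAKEOPTS="-j1"                  # serialize make for a package whose parallel build fails
echo "per-package: MAKEOPTS=$MAKEOPTS CFLAGS=$CFLAGS"
```

Only the two assignments in Part 2 belong in the env file; everything else in make.conf stays as it is for all other packages.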
Yeah, I have a 3GB tmpfs for /var/tmp/paludis and a 1GB tmpfs for /tmp to
speed things up in normal operation. And as memory seems to be quite
cheap, I might upgrade to 8GB. After all, there is no such thing as too
much memory... (Actually there can be, but then one has the wrong HW to
use that memory ;) )
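For reference, a sketch of how the 3GB/1GB tmpfs split above might look as /etc/fstab entries (shown as comments, since fstab is not shell), plus a small bash check that a build directory really is on tmpfs. The helper name on_tmpfs is hypothetical; it relies on GNU coreutils `stat -f -c %T`, which prints the file system type name. Depending on the package manager's build user, extra ownership or mode options may be needed on the build-dir mount.

```shell
#!/bin/bash
# Possible /etc/fstab lines for the setup described above (not shell):
#   tmpfs  /var/tmp/paludis  tmpfs  size=3g  0 0
#   tmpfs  /tmp              tmpfs  size=1g  0 0

# on_tmpfs DIR (hypothetical helper): succeed only if DIR is on a tmpfs.
# "stat -f" reports on the file system containing DIR; %T is its type name.
on_tmpfs() {
    local type
    type=$(stat -f -c %T -- "$1" 2>/dev/null) || return 1   # missing dir -> fail
    [[ $type == tmpfs ]]
}

for dir in /var/tmp/paludis /tmp; do
    if on_tmpfs "$dir"; then
        echo "$dir: on tmpfs"
    else
        echo "$dir: NOT on tmpfs (or missing)"
    fi
done
```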