Re: [gentoo-amd64] Re: CFLAGS question from a AMD64 newbie

Sami Näätänen Tue, 09 Dec 2008 12:34:49 -0800

On Tuesday 09 December 2008 18:07:38 Duncan wrote:
> Sami Näätänen <[EMAIL PROTECTED]> posted
> [EMAIL PROTECTED], excerpted below, on Tue, 09 Dec 2008 14:23:30 +0200:
> > My system is an Intel quad-core Core 2 with a 2.4 GHz clock speed coupled
> > with 4GB of memory. No overclocking etc.; I want this to be stable. :)
> >
> > I'm just curious what people use as their stable CFLAGS on amd64 Gentoo?
> > (Sorry if this has come up lately, but I just switched to a 64-bit
> > environment, so...)
> >
> > Here are mine, with some explanation of why (and I use a ~arch system
> > with gcc 4.3)
>
> Well, you say you want stable, but then say you use ~arch, so I see
> you're not too much of a stick-in-the-mud. =:^)


Well, I meant stable binaries, as I said in my (at least a little) clarifying
second post. :)

> Here's mine, for a dual Opteron 290:
>
> CFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb -fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload -ftree-vectorize -fdirectives-only -freorder-blocks-and-partition -combine"
>
> CXXFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb -fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload -ftree-vectorize -fdirectives-only"
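Here is a minimal sketch of the same C/C++ split written in /etc/make.conf. It assumes Portage expands ${VAR} references in that file, which is what the stock CXXFLAGS="${CFLAGS}" line relies on. COMMON_FLAGS is a hypothetical helper name, not a variable Portage itself reads. The flags are exactly the ones listed above.

```shell
# /etc/make.conf (sketch). COMMON_FLAGS is a hypothetical helper variable;
# Portage only uses CFLAGS and CXXFLAGS.
COMMON_FLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb -fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload -ftree-vectorize -fdirectives-only"

# C only: -freorder-blocks-and-partition (breaks most C++ code) and -combine.
CFLAGS="${COMMON_FLAGS} -freorder-blocks-and-partition -combine"

# C++ gets the shared set without those two.
CXXFLAGS="${COMMON_FLAGS}"
```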
>
> You can look them up in the gcc manpage, or look back a year or so to when I
> explained most of them, although that was a couple of gcc versions ago and
> they weren't quite the same.
>
> But my basic strategy is this: because memory is so much slower than
> cache on a modern processor, in general it should pay to optimize for
> size even if it costs a few CPU cycles once in a while. Thus, until
> fairly recently I used -Os, but with gcc-4.3 I decided to switch to -O2,
> since gcc is now getting smarter about such optimizations at -O2, and
> the few additional size optimizations -Os adds now tend to come at the
> expense of cache (think -freorder-blocks-and-partition). In any case, I
> certainly don't want -O3, or too much loop unrolling and inlining at the
> expense of cache.
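You can check the size side of this trade-off yourself. The following sketch compiles the same small, hypothetical demo.c at -Os, -O2 and -O3 and reports the code (text) size using binutils' size(1). A smaller text size is not automatically faster. It only shows how much each level grows the code, so point text_size at real sources for a meaningful comparison.

```shell
#!/bin/sh
# Sketch: compare how much code gcc emits for the same source at -Os, -O2, -O3.
# demo.c below is a hypothetical stand-in; point text_size at real sources instead.

# text_size SRC OPT: print the size in bytes of the text (code) segment of SRC at OPT.
text_size() {
    obj=$(mktemp)
    gcc -std=c99 "$2" -c "$1" -o "$obj"
    size "$obj" | awk 'NR == 2 { print $1 }'   # size(1): line 2, column 1 = text
    rm -f "$obj"
}

src=$(mktemp -d)/demo.c
cat > "$src" <<'EOF'
/* A simple loop: higher -O levels may unroll or vectorize it, trading size for speed. */
void scale(float *restrict dst, const float *restrict src, float k, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
EOF

for opt in -Os -O2 -O3; do
    echo "$opt: $(text_size "$src" "$opt") bytes of text"
done

rm -rf "$(dirname "$src")"
```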
>
> -frename-registers and -fweb are useful for taking advantage of the
> additional registers x86_64 has. -fdirectives-only is there because it
> works better with ccache, which I use. You already know about
> -ftree-vectorize, and -combine is discussed elsewhere in the thread.
> -fmerge-all-constants isn't strictly standard-conforming C, but I've had
> absolutely zero issues with it, and it's going to help with cache.
> -freorder-blocks-and-partition won't work on most C++ code, which (along
> with -combine) is why I split CFLAGS and CXXFLAGS, but it tells gcc to
> keep hot code together so it stays in cache better. The various -fgcse-*
> options make gcc more aggressive about global common subexpression
> elimination (gcse) under various conditions. This shouldn't add to size
> and may in fact reduce it by reducing instruction count (or by moving
> instructions out of loops, which is size-neutral), but it can increase
> compile time, which is why none of them is enabled by default below -O3.
>
> -combine is the one that causes the most problems; I handle it per
> troublesome package, as mentioned in the other thread, using
> /etc/portage/env/* files. -freorder-blocks-and-partition can cause problems
> in some cases as well, but if you don't have either of those in CXXFLAGS,
> you'll avoid a lot of the problems right there. Those are the only
> C(XX)FLAGS I have had issues with lately. The others have worked just fine.
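Here is a sketch of such a per-package file. It assumes Portage's /etc/portage/env/<category>/<package> convention, where bash sources the file before building that one package. app-misc/example is a hypothetical package name, so substitute the package that actually fails.

```shell
# /etc/portage/env/app-misc/example (hypothetical package; sourced by bash)
# Strip the troublesome flags for this package only, leaving the global
# CFLAGS in /etc/make.conf untouched. ${VAR/pattern/} is bash substitution.
CFLAGS="${CFLAGS/-combine/}"
CFLAGS="${CFLAGS/-freorder-blocks-and-partition/}"
```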
>
> With a quad-core you will likely be interested in upping your MAKEOPTS
> job count as well. Just be aware that it too can cause issues at times.
> Again, however, those are easily worked around per package as you come
> across them, using the env/* files to set MAKEOPTS=-j1 or whatever.
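Here are the two pieces as config fragments. The job count of 4 and the package app-misc/example are both hypothetical placeholders.

```shell
# --- /etc/make.conf ---
# Parallel make for every package; 4 is only an example job count.
MAKEOPTS="-j4"

# --- /etc/portage/env/app-misc/example --- (hypothetical package name)
# Serial make for one package that breaks when built in parallel.
MAKEOPTS="-j1"
```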

Yeah, I forgot to mention that too. In fact, I like to use -j <num cores>, as
then there is no need for renicing in most cases and the system stays smooth.
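To get that number without counting by hand, here is a small Linux-only sketch. It reads /proc/cpuinfo and prints a line you could copy into /etc/make.conf. It counts logical CPUs, so on a hyperthreaded machine the number will be higher than the physical core count.

```shell
#!/bin/sh
# Sketch: derive a "-j <num cores>" MAKEOPTS line from /proc/cpuinfo (Linux only).
# Each logical CPU has one "processor : N" line, so hyperthreads are counted too.
cores=$(grep -c '^processor' /proc/cpuinfo)
echo "MAKEOPTS=\"-j${cores}\""
```

On the quad-core Core 2 from the first post this should print MAKEOPTS="-j4".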

> Since you mentioned running ~arch, and assuming your PM (package manager)
> is still portage, you may also want to take a look at emerge's --jobs and
> --load-average options for parallel emerges, if you haven't already. If
> you use them, you'll probably find --keep-going useful as well, so emerge
> doesn't stop just because one of the parallel merges failed.
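If you want those options on every run, one way is EMERGE_DEFAULT_OPTS in /etc/make.conf. The numbers below are example values, not recommendations.

```shell
# /etc/make.conf: options added to every emerge invocation (example values).
# --jobs:         how many packages to build in parallel
# --load-average: don't start new jobs while the load average is above this
# --keep-going:   continue with the remaining packages if one fails
EMERGE_DEFAULT_OPTS="--jobs=3 --load-average=4 --keep-going"
```

You can pass the same options on a single command line instead, e.g. emerge --jobs=3 --load-average=4 --keep-going --update world.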

Well, I've been a paludis man for quite a while now: much better dependency
handling.

> Finally, if you haven't already, consider pointing PORTAGE_TMPDIR at a
> tmpfs. With 4 gig of memory it should speed things up dramatically, and the
> worst case is that it uses swap, sending to disk what would be 100%
> guaranteed to go to disk if you had PORTAGE_TMPDIR on disk.
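Here is a sketch of that setup. It assumes the default PORTAGE_TMPDIR of /var/tmp, under which Portage builds in /var/tmp/portage. The 3G size cap is an example value. The fstab line is shown as a comment because fstab is not shell syntax.

```shell
# /etc/make.conf: Portage builds under $PORTAGE_TMPDIR/portage.
# /var/tmp is the default; set it explicitly only if the tmpfs lives elsewhere.
PORTAGE_TMPDIR="/var/tmp"

# Matching /etc/fstab entry (fstab syntax, not shell; size=3G is an example cap):
# tmpfs   /var/tmp/portage   tmpfs   size=3G,noatime   0 0
```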

Yeah, I have a 3GB tmpfs for /var/tmp/paludis and a 1GB tmpfs for /tmp, to
speed things up in normal operation. And as memory seems to be quite cheap, I
might upgrade to 8GB. After all, there is no such thing as too much memory...
(Actually there can be, but then one has the wrong HW to use that memory ;) )
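Here is a sketch for checking that such mounts are actually in place. It reports which filesystem type backs each directory, using GNU stat. The directory list and the fstab lines in the comments mirror the layout described above. The fstab lines are shown as comments because fstab is not shell syntax.

```shell
#!/bin/sh
# Sketch: show which filesystem backs each temp/build directory.
# GNU stat: -f queries the filesystem, %T prints its type name (e.g. "tmpfs").
#
# Matching /etc/fstab entries for the layout above (fstab syntax, not shell):
#   tmpfs   /var/tmp/paludis   tmpfs   size=3G   0 0
#   tmpfs   /tmp               tmpfs   size=1G   0 0
for dir in /tmp /var/tmp/paludis; do
    if [ -d "$dir" ]; then
        printf '%-18s %s\n' "$dir" "$(stat -f -c %T "$dir")"
    else
        printf '%-18s %s\n' "$dir" "(no such directory)"
    fi
done
```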
