Duncan wrote:
> Daniel Iliev <[EMAIL PROTECTED]> posted [EMAIL PROTECTED],
> excerpted below, on Wed, 27 Sep 2006 08:50:03 +0300:
>
>> So let me start with 2 newbie questions caused by my first impressions
>> from the x86_64 world:
>>
>> 1) I use CFLAGS="-march=athlon64 -mfpmath=sse -msse -msse2 -msse3
>> -m3dnow -mmmx -O3 -fomit-frame-pointer -pipe -fpic". Portage complains
>> with *red letters* about the fpic flag. Every time I emerge something it
>> says that "fpic breaks things", but I haven't met a single breakage so
>> far. Is that a bug? Actually there was an ebuild which could not be
>> compiled if mysql was compiled w/o "fpic". I'm not 100% sure but AFAIR
>> it was dev-perl/DBD-mysql.
>>
>> 2) I see too many flags that are disabled by the profile - the kind with
>> the parentheses around them, like "(-3dnow)". Why? As I mentioned above
>> I enable some of these through my CFLAGS - e.g. (-mmx), (-mmxext),
>> (-sse) and (-sse2) and everything works perfectly.
>>
>
> It seems that you missed some of the Gentoo/AMD64 documentation.
> Many/most of your questions are answered there. Unfortunately, I'm not
> aware of a simple, easy-to-use list of everything in one spot, so it's
> reading a bit of documentation here, a bit more there, etc.
>
> The main Gentoo/AMD64 project page. (This would be the logical place for
> such a list, but it's more of a project page. It links some of the
> docs, tho those links aren't as easy to find as they could be.)
> http://amd64.gentoo.org
>
> Gentoo/AMD64 FAQ:
> http://www.gentoo.org/doc/en/gentoo-amd64-faq.xml
>
> Gentoo/AMD64 HOWTOs. (There's one on -fPIC here, tho the explanation is
> a bit developer-centric.)
> http://www.gentoo.org/proj/en/base/amd64/howtos/index.xml
>
> A brief direct answer to your questions follows:
>
> * The sse etc CFLAGS are arch dependent. Unlike x86, where the
> mmx/sse/other-extensions instructions were added as the arch matured, on
> amd64 they are part of the definition of the arch itself.
> All x86_64 (amd64) CPUs will have mmx/sse/sse2, etc. Thus,
> -march=athlon64 already tells gcc these are available to use where it
> wants/needs to. The others therefore don't provide gcc any more
> information than what it already has.
>
> * -fomit-frame-pointer isn't needed on 64-bit amd64 either, as it's turned
> on for all -O levels on archs (including amd64) where doing so doesn't
> interfere with debugging. (See the gcc manpage, under -O optimization.)
> You may wish to continue to specify it for stuff that's compiled for
> 32-bit, however, including parts of gcc, a version of glibc, a version of
> the (portage) sandbox library, etc.
>
> * Generally speaking, -fPIC is required on amd64 for ALL LIBRARIES, but the
> ebuilds normally take care of it. Under certain circumstances (like
> unsupported CFLAGS), the configure scripts will turn it off by mistake; see
> the above-mentioned -fPIC HOWTO link for details. The solution isn't
> to add it to your CFLAGS, however, as that means it will be used for
> executable applications as well as libraries, and /some/ applications /do/
> break with it. Not many, but some, and if it's in your CFLAGS, you WILL
> have bugs you file closed as INVALID or the like, due to CFLAG abuse. If
> there's something not working without it, then THAT'S a bug and should be
> filed as such (unless it's due to use of CFLAGS gcc doesn't support and
> warns about, thus triggering the configure script detection problem
> discussed above and in the HOWTO).
>
> * The profile "disabled" USE flags are simply hard-locked either on or
> off by the profile, so aren't a USE flag option. It does NOT mean whatever
> the USE flag controls is actually disabled. Sometimes, as with the
> multilib USE flag, it can mean it's /enabled/. It just means that the
> profile is set up to control it, generally for a pretty good reason.
> In the particular cases you mention, the way Gentoo uses the SSE and
> similar USE flags is 32-bit specific, enabling 32-bit specific assembler
> code in the ebuild, for instance. As already mentioned, the AMD64 arch by
> definition already has these features activated, so no 64-bit USE flags
> are necessary, and enabling the 32-bit USE flags will cause breakage since
> it activates 32-bit specific code in many instances. Thus the amd64
> profiles have a /very/ good reason to hard-lock these USE flags "off". An
> example where a USE flag is hard-locked ON by a profile would be multilib.
> The normal AMD64 profiles are all multilib and thus lock this flag ON (tho
> it's still shown as disabled), while 64-bit-only profiles lock it OFF.
>
> A couple of other notes:
>
> Portage now supports per-package CFLAGS and certain other variables as
> controlled by the environment (as long as they are used in an ebuild.sh
> phase, not the python phase, since execution is via a bashrc hook).
> Create /etc/portage/env/<category> as a directory, populated with package
> or package-version files. The contents of these files will be sourced
> into the ebuild.sh execution environment for every phase that uses
> ebuild.sh. CFLAGS and similar variables as found in these files REPLACE
> (that is, they don't add to, they replace entirely) the default make.conf
> CFLAGS. You can use this mechanism to specify specific CFLAGS for
> specific packages, and could thus set -fomit-frame-pointer and other
> 32-bit x86 specific CFLAGS here if desired, avoiding them in your regular
> make.conf.
>
> You may wish to read a bit of the archives for this list, in particular,
> the recent threads on gcc 4.1.1 CFLAGS, where I discuss mine.
> Specifically, it's likely -O3 is actually /worse/ performing in many
> instances than -O2 or even -Os (my choice).
> The reasoning is this: CPU cycles are fairly cheap in a modern processor,
> while the expense of waiting on main memory in the case of a cache miss is
> MUCH HIGHER, due to the fact that main memory is clocked so much slower
> than cache. Smaller code fits in cache better and is thus often faster
> than larger code, even when the smaller code isn't as theoretically CPU
> cycle efficient. While there will certainly be some applications where
> -O3 is beneficial, I believe if you do actual comparisons, you will find
> -O2 or -Os faster on a system-wide basis. Of course, it's up to you and
> much virtual ink has been spilled discussing this issue, but that's just
> my take on things. If you've actually done speed comparisons on AMD64 or
> can point to some, I'd certainly be interested, as I've honestly not cared
> enough about it to do my own, but that's my general take in the absence of
> specific hard data to the contrary. Rather than optimizing for CPU cycles
> (-O3), I choose to optimize for better register usage (registers being at
> full CPU speed, therefore faster even than L1 cache; -frename-registers,
> etc.), size (-Os, disabling loop unrolling), whole and multiple unit
> optimization (-funit-at-a-time, -combine), and hot/cold partitioning
> (-freorder-blocks-and-partition, tho it can't be used on C++ code, etc.).
> A few of my flags fail on a very few specific packages, which is another
> use for the package-specific CFLAGS stuff above.
>

Very detailed answer! Thank you!
Yes, you are right. I missed the "AMD64 HowTo" documentation. I found
only the FAQ via Google. It was the easiest (fastest) way to get some
answers. ;-)

Thank you all.

-- 
gentoo-amd64@gentoo.org mailing list
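
For anyone finding this thread later, here is a minimal sketch of what the advice quoted above might look like in practice. It is an illustration only, not Duncan's actual configuration: the choice of -O2, the PORTAGE_ENV_DIR variable, and the "some-category/some-package" names are all assumptions made for the example. On a real system the per-package file would live under /etc/portage/env/<category>, as described in the quoted reply.

```shell
# Trimmed-down global flags, as they might appear in make.conf.
# -fpic and the -m flags the quoted reply calls redundant are dropped.
# Assumption: -O2 is used here; the thread also discusses -Os and -O3.
# Note: as far as I know, GCC's -march=athlon64 does not imply SSE3, so
# add -msse3 back only if your CPU actually supports it.
CFLAGS="-march=athlon64 -O2 -pipe"
CXXFLAGS="${CFLAGS}"

# Per-package override. PORTAGE_ENV_DIR is a hypothetical stand-in so the
# sketch does not touch the real /etc/portage/env; it defaults to a
# temporary directory.
PORTAGE_ENV_DIR="${PORTAGE_ENV_DIR:-$(mktemp -d)}"
mkdir -p "${PORTAGE_ENV_DIR}/some-category"

# The file is named after the (hypothetical) package. Its CFLAGS replace
# the make.conf CFLAGS entirely, so every flag you want must be repeated.
cat > "${PORTAGE_ENV_DIR}/some-category/some-package" <<'EOF'
CFLAGS="-march=athlon64 -O2 -fomit-frame-pointer -pipe"
CXXFLAGS="${CFLAGS}"
EOF
```

Because the per-package file replaces rather than extends the global value, -march and -pipe are repeated in it instead of being inherited.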