Duncan wrote:
> Daniel Iliev <[EMAIL PROTECTED]> posted [EMAIL PROTECTED],
> excerpted below, on  Wed, 27 Sep 2006 08:50:03 +0300:
>
>   
>> So let me start with 2 newbie questions caused by my first impressions
>> from the x86_64 world:
>>
>> 1) I use CFLAGS="-march=athlon64 -mfpmath=sse -msse -msse2 -msse3
>> -m3dnow -mmmx -O3 -fomit-frame-pointer -pipe -fpic". Portage complains
>> with *red letters* about the fpic flag. Every time I emerge something it
>> says that "fpic breaks things", but I haven't met a single breakage so
>> far. Is that a bug? Actually there was an ebuild which could not be
>> compiled if mysql was compiled w/o "fpic". I'm not 100% sure but AFAIR
>> it was dev-perl/DBD-mysql.
>>
>> 2) I see too many flags that are disabled by the profile - the kind with
>> parentheses around them, like "(-3dnow)". Why? As I mentioned above
>> I enable some of these through my CFLAGS - e.g. (-mmx), (-mmxext),
>> (-sse) and (-sse2) and everything works perfectly.
>>     
>
> It seems that you missed some of the Gentoo/AMD64 documentation. 
> Many/most of your questions are answered there.  Unfortunately, I'm not
> aware of a simple easy to use list of everything in one spot, so it's
> reading a bit of documentation here, a bit more there, etc.
>
> The main Gentoo/AMD64 project page.  (This would be the logical place for
> such a list, but it's more of a project page.  It links some of the docs,
> but those links aren't as easy to find as they could be.)
> http://amd64.gentoo.org
>
> Gentoo/AMD64 FAQ:
> http://www.gentoo.org/doc/en/gentoo-amd64-faq.xml
>
> Gentoo/AMD64 HOWTOs.  (There's one on -fPIC here, tho the explanation is
> a bit developer-centric.)
> http://www.gentoo.org/proj/en/base/amd64/howtos/index.xml
>
> A brief direct answer to your questions follows:
>
> *  The sse etc CFLAGS are arch dependent.  Unlike x86 where the
> mmx/sse/other-extensions instructions were added as the arch matured, on
> amd64, they are part of the definition of the arch itself.  All x86_64
> (amd64) CPUs will have mmx/sse/sse2, etc.  Thus, -march=athlon64 already
> tells gcc these are available to use where it wants/needs to.  The others
> don't therefore provide gcc any more information than what it already has.
>
> * -fomit-frame-pointer isn't needed on 64-bit amd64 either, as it's turned
> on for all -O levels on archs (including amd64) where doing so doesn't
> interfere with debugging.  (See the gcc manpage, under -O optimization.)
> You may wish to continue to specify it for stuff that's compiled for
> 32-bit, however, including parts of gcc, a version of glibc, a version of
> the (portage) sandbox library, etc.
>
> * Generally speaking, -fPIC is required on amd64 for ALL LIBRARIES, but the
> ebuilds normally take care of it.  Under certain circumstances (like
> unsupported CFLAGS), the configure scripts will turn it off by mistake; see
> the above-mentioned -fPIC HOWTO link for details.  The solution isn't to
> add it to your CFLAGS, however, as that means it will be used for
> executable applications as well as libraries, and /some/ applications /do/
> break with it.  Not many, but some, and if it's in your CFLAGS, bugs you
> file WILL be closed as INVALID or the like, due to CFLAG abuse.  If
> something doesn't work without it, then THAT'S a bug and should be filed as
> such (unless it's due to CFLAGS that gcc doesn't support and warns about,
> which trigger the configure script detection problem discussed above and
> in the HOWTO).
>
> * The profile "disabled" USE flags are simply hard-locked either on or
> off by the profile, so aren't a USE flag option.  It does NOT mean whatever
> the USE flag controls is actually disabled.  Sometimes, as with the
> multilib USE flag, it can mean it's /enabled/.  It just means that the
> profile is set up to control it, generally for a pretty good reason.  In
> the particular cases you mention, the way Gentoo uses the SSE and similar
> USE flags is 32-bit specific, enabling 32-bit specific assembler code in
> the ebuild, for instance.   As already mentioned, the AMD64 arch by
> definition already has these features activated, so no 64-bit USE flags
> are necessary, and enabling the 32-bit USE flags will cause breakage since
> it activates 32-bit specific code in many instances.  Thus the amd64
> profiles have a /very/ good reason to hard-lock these USE flags "off".  An
> example where a USE flag is hard-locked ON by a profile would be multilib.
> The normal AMD64 profiles are all multilib and thus lock this flag ON (tho
> it's still shown as disabled), while 64-bit-only profiles lock it OFF.
>
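
One way to check the first bullet on a given machine is to ask gcc which
feature macros -march=athlon64 predefines. This is a minimal sketch, assuming
gcc is installed and targets x86; implied_features is a hypothetical helper
name, not part of gcc or Portage.

```shell
#!/bin/sh
# Hypothetical helper: pick the SIMD feature macros out of a gcc macro dump
# read from stdin and print their names, sorted.
implied_features() {
    grep -E '^#define __(MMX|SSE|SSE2|SSE3|3dNOW)__ ' | awk '{print $2}' | sort
}

# Dump the macros gcc predefines for -march=athlon64, if this gcc accepts it.
if command -v gcc >/dev/null 2>&1 &&
   gcc -march=athlon64 -dM -E - </dev/null >/dev/null 2>&1; then
    gcc -march=athlon64 -dM -E - </dev/null | implied_features
else
    echo "gcc with -march=athlon64 support not available; skipping" >&2
fi
```

Any macro that appears is already implied by -march, so the matching -m flag
adds nothing. A macro that does not appear is not implied by it.
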
> A couple of other notes:
>
> Portage now supports per-package CFLAGS and certain other variables as
> controlled by the environment (as long as they are used in an ebuild.sh
> phase, not the python phase, since execution is via a bashrc hook). 
> Create /etc/portage/env/<category> as a directory, populated with package
> or package-version files.  The contents of these files will be sourced
> into the ebuild.sh execution environment for every phase that uses
> ebuild.sh.  CFLAGS and similar variables as found in these files REPLACE
> (that is, they don't add to, they replace entirely) the default make.conf
> CFLAGS.  You can use this mechanism to specify specific CFLAGS for
> specific packages, and could thus set -fomit-frame-pointer and other
> 32-bit x86 specific CFLAGS here if desired, avoiding them in your regular
> make.conf.
>
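
The replace-not-append behaviour described above is what plain shell sourcing
of an assignment gives you. Here is a minimal sketch that simulates it in a
scratch directory rather than /etc/portage; the package name app-misc/example
and the flag values are made up, and this is ordinary sourcing, not Portage's
actual hook code.

```shell
#!/bin/sh
# Simulate /etc/portage/env/<category>/<package> in a scratch directory.
envroot=$(mktemp -d)
mkdir -p "$envroot/app-misc"

# Hypothetical per-package file: a plain assignment, so it replaces CFLAGS.
cat > "$envroot/app-misc/example" <<'EOF'
CFLAGS="-march=athlon64 -O2 -pipe -fomit-frame-pointer"
EOF

# Stand-in for the make.conf default.
CFLAGS="-march=athlon64 -Os -pipe"

# Source the per-package file, as a bashrc hook might.
. "$envroot/app-misc/example"
echo "effective CFLAGS: $CFLAGS"
# prints: effective CFLAGS: -march=athlon64 -O2 -pipe -fomit-frame-pointer

rm -rf "$envroot"
```

In plain shell terms, writing CFLAGS="${CFLAGS} -fomit-frame-pointer" in the
file would append instead of replace. Whether that behaves the same under a
given Portage version is an assumption worth checking first.
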
> You may wish to read a bit of the archives for this list, in particular,
> the recent threads on gcc 4.1.1 CFLAGS, where I discuss mine. 
> Specifically, it's likely -O3 is actually /worse/ performing in many
> instances than -O2 or even -Os (my choice).  The reasoning is this:  CPU
> cycles are fairly cheap in a modern processor, while the expense of
> waiting on main memory in the case of a cache miss is MUCH HIGHER, due to
> the fact that main memory is clocked so much slower than cache.  Smaller
> code fits in cache better and is thus often faster than larger code, even
> when the smaller code isn't as theoretically CPU cycle efficient.  While
> there will certainly be some applications where -O3 is beneficial, I
> believe if you do actual comparisons, you will find -O2 or -Os faster on a
> system-wide basis.  Of course, it's up to you, and much virtual ink has
> been spilled discussing this issue.  If you've actually done speed
> comparisons on AMD64 or can point to some, I'd certainly be interested, as
> I've honestly not cared enough about it to do my own; this is just my
> general take in the absence of specific hard data to the contrary.  Rather
> than optimizing for CPU cycles (-O3), I choose to optimize for better
> register usage (-frename-registers etc., since registers run at full CPU
> speed and are therefore faster even than L1 cache), for size (-Os,
> disabling loop unrolling), for whole- and multiple-unit optimization
> (-funit-at-a-time, -combine), and for hot/cold partitioning
> (-freorder-blocks-and-partition, tho it can't be used on C++ code).  A few
> of my flags fail on a very few specific packages, which is another use for
> the package-specific CFLAGS mechanism above.
>
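
The size half of the -O3 versus -Os argument is easy to look at directly. Here
is a rough sketch, assuming gcc is installed; the test function is made up,
and object-file size is only a proxy for cache footprint, so this says nothing
about actual speed.

```shell
#!/bin/sh
# Compare object-file size of one source file across optimization levels.
workdir=$(mktemp -d)
cat > "$workdir/loop.c" <<'EOF'
/* Made-up test function: a simple reduction loop. */
int sum(const int *v, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += v[i] * 3;
    return s;
}
EOF

if command -v gcc >/dev/null 2>&1; then
    for opt in -Os -O2 -O3; do
        gcc -std=c99 "$opt" -c "$workdir/loop.c" -o "$workdir/loop.o" || continue
        # Whole-file size in bytes; size(1) would show the text segment alone.
        printf '%s\t%s bytes\n' "$opt" "$(wc -c < "$workdir/loop.o")"
    done
else
    echo "gcc not found; skipping" >&2
fi
rm -rf "$workdir"
```

A real comparison would time whole applications, since a single small
function cannot show system-wide cache effects.
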
>   
Very detailed answer! Thank you!

Yes, you are right. I have missed the "AMD64 HowTo" documentation. I
found only the FAQ via Google.
It was the easiest (fastest) way to get some answers. ;-)

Thank you all.

-- 
gentoo-amd64@gentoo.org mailing list
