RE: [gentoo-user] bad gentoo performance

2003-11-03 Thread Van Eps, Nathan D. (James Tower)
Small correction: another list member pointed out to me that
-fomit-frame-pointer isn't enabled at any of the -O settings on x86
(according to the documentation). I wanted to make sure, so I emailed
the gcc-help address. Someone sent me the following procedure to
determine exactly which flags get set at the different -O settings.


To figure out what the differences are between the various optimization
settings, do this:

touch foo.cpp
g++ -O2 -save-temps -fverbose-asm -c foo.cpp
cat foo.s

Replace -O2 with the level(s) that you are interested in and compare the
resulting .s files.  Each run overwrites foo.s, so save a copy of the one
you want to compare against... :-)
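
For readers on a newer GCC than the 3.2.x discussed in this thread, a different route to the same comparison is the -Q --help=optimizers query, which prints each optimizer flag with an [enabled] or [disabled] marker. This is a swapped-in technique, not the gcc-help procedure quoted above. The function names and the CXX override are made up for this sketch, and the result depends on the GCC version and target.

```shell
#!/bin/sh
# Sketch: list the optimizer flags that one -O level enables and another does not.
# Assumes a GCC new enough to support -Q --help=optimizers (not the 3.2.x of this thread).

# Print the flags reported as [enabled] at the given level, one per line, sorted.
enabled_flags() {
    "${CXX:-g++}" -Q --help=optimizers "$1" 2>/dev/null |
        awk '$2 == "[enabled]" { print $1 }' | LC_ALL=C sort
}

# Print the flags enabled at level $2 but not at level $1.
compare_levels() {
    list_a=$(mktemp) || return 1
    list_b=$(mktemp) || return 1
    enabled_flags "$1" > "$list_a"
    enabled_flags "$2" > "$list_b"
    LC_ALL=C comm -13 "$list_a" "$list_b"
    rm -f "$list_a" "$list_b"
}

compare_levels -O2 -O3
```

Swapping the arguments (compare_levels -O3 -O2) would show anything that -O2 enables and -O3 does not.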




Re: [gentoo-user] bad gentoo performance

2003-11-03 Thread Peter Ruskin
On Monday 03 Nov 2003 16:12, Van Eps, Nathan D. (James Tower) wrote:
> <snip> I wanted to make sure, so I emailed
> gcc-help email address. Someone emailed me the following procedure to
> determine exactly what flags are getting set for the different O
> settings.
>
>
> To figure out what the differences are between the various
> optimization settings, do this:
>
> touch foo.cpp
> g++ -O2 -save-temps -fverbose-asm -c foo.cpp
> cat foo.s
>
> Replace -O2 with the one(s) that you are interested in.  Compare
> the differences in the .s files.  Make sure you save the foo.s that
> you are interested to compare against... :-)

That's interesting.  I just tried that.  I could find no difference at 
all between these two:

$ g++ -Os -save-temps -fverbose-asm -c foo.cpp
$ g++ -O2 -save-temps -fverbose-asm -c foo.cpp
...which somewhat surprised me

$ g++ -O3 -save-temps -fverbose-asm -c foo.cpp
...gave an extra -frename-registers

I'll stick with this one:
CFLAGS="-march=athlon-xp -O2 -pipe"

Peter
-- 
==
Portage 2.0.49-r15 (default-x86-1.4, gcc-3.2.3, glibc-2.3.2-r1, 
2.4.23_pre8-gss)
i686 AMD Athlon(tm) XP 3200+
==


--
[EMAIL PROTECTED] mailing list



Re: [gentoo-user] bad gentoo performance

2003-11-03 Thread Martin LORANG
On Monday 3 November 2003 00:26, SN wrote:
> -fomit-frame-pointer
> Don't keep the frame pointer in a register for functions that don't need
> one. This avoids the instructions to save, set up and restore frame
> pointers; it also makes an extra register available in many functions. It
> also makes debugging impossible on some machines.
> On some machines, such as the VAX, this flag has no effect, because the
> standard calling sequence automatically handles the frame pointer and
> nothing is saved by pretending it doesn't exist. The machine-description
> macro FRAME_POINTER_REQUIRED controls whether a target machine supports
> this flag. See Register Usage.
>
> Enabled at levels -O, -O2, -O3, -Os.

That is true, BUT man gcc also says:
<snip>
   -O also turns on -fomit-frame-pointer on machines where doing so
does not interfere with debugging.
</snip>

and it does interfere on the x86 architecture, so it is not turned on by
default, on x86 at least.
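
One way to check this on a given machine, sketched here for a GCC that supports -Q --help=optimizers (newer than the compilers in this thread), is to ask the compiler for the state of a single flag at several -O levels. The flag_state name and the CC override are assumptions of the sketch. The answer differs between GCC versions and targets, so no expected output is shown.

```shell
#!/bin/sh
# Sketch: report whether one optimizer flag is enabled at each given -O level.
# Assumes a GCC that supports -Q --help=optimizers.

flag_state() {
    flag=$1
    shift
    for level in "$@"; do
        # The second column of the matching line is [enabled] or [disabled].
        state=$("${CC:-gcc}" -Q --help=optimizers "$level" 2>/dev/null |
            awk -v f="$flag" '$1 == f { print $2 }')
        printf '%s %s %s\n' "$level" "$flag" "${state:-unknown}"
    done
}

flag_state -fomit-frame-pointer -O0 -O1 -O2 -O3 -Os
```

A state of "unknown" means the compiler was not found or did not list the flag.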

Martin





Re: [gentoo-user] bad gentoo performance

2003-11-03 Thread Stroller
On Nov 2, 2003, at 10:13 pm, William Kenworthy wrote:
> There was actually a lot of misinformation and pure rubbish spread on
> this list about that ... I was present and can say what was done.  In
> that case, debian and Mandrake WERE faster than gentoo - the figures
> are there in black and white.
>
> There were some good reasons for that result

And you address the one I had in mind by using the same kernel on all
three machines in your revisited article.

But to suggest, as the original article did, that these distributions 
are faster than Gentoo is NOT empirical when the distros tested have 
such a glaring difference between them. As you know, the Gentoo-sources 
(with pre-empt patches) kernel was tested in the original article 
against another with Redhat's patch-set.

Stroller.



Re: [gentoo-user] bad gentoo performance

2003-11-03 Thread William Kenworthy
Fact: take 3 distros, install to the recommended settings (as far as is
practical) and see which is faster.  Gentoo was slowest.  At the time
(is it still the case?) -O3 was being recommended for gentoo in general,
celerons in particular (was a few months back now!)  This flag has a
rather drastic effect on performance, particularly with celeries.  It's
not just this test: there have been a couple of posts from people who
have dual booted with other distros and found gentoo slower until
properly tuned (did the last poster with the custom application
eventually get gentoo to run faster than debian?)

I did some tests afterwards (not on the same machine unfortunately) and
found that in general, gentoo-sources is slightly faster than open-mosix
(no thread migration) and vanilla kernels (hey, I gotta download this
stuff through a modem!).  Might be hardware dependent but I think this
will hold up in other configs.  pre-empt etc on/off made little/no
difference.  It is also worth noting that these machines were all new
installs with no cruft or extra services running, so these features
would have little effect.  So using the recommended (at the time)
gentoo-sources kernel for performance was ok.  This time around I
chose gs-sources, mainly for the new hardware support, but also because
it seemed to perform better than gentoo-sources, which had not been
updated at the time.  So unless you have some specific tests which are
biased to some kernel feature, you are not going to see much
advantage/disadvantage there.

Attempts were made to use the gentoo kernel on the other machines
(debian), but proved too much work for the time available.  But what's
the point of using the same kernel on each machine when gentoo recommends
gentoo-sources, Mandrake recommends ...  It would have been nice to
check each distro with a vanilla kernel just to see what would happen,
but there were only so many hours in that day.

The test we did early last month (www.linmagau.org) showed that with a
better match of CFLAGS to hardware, you can expect about a 10% gain
(sometimes, if the CFLAGS and hardware and application are a good match)
- but can lose it by making poor choices elsewhere.

10% is not to be sneezed at, but it's hardly earth shattering either.  As
I have stated previously, these are empirical tests, not definitive
scientific ones, but they should hold up in the real world when doing the
kind of work that I and the others do in our day jobs.

BillK

-- 
William Kenworthy [EMAIL PROTECTED]





[gentoo-user] bad gentoo performance

2003-11-02 Thread Eric Marchionni
hi

what about that?
http://articles.linmagau.org/modules.php?op=modload&name=Sections&file=index&req=viewarticle&artid=227
cheers,
eric


Re: [gentoo-user] bad gentoo performance

2003-11-02 Thread Stroller
On Nov 2, 2003, at 1:24 pm, Eric Marchionni wrote:

> what about that?
> http://articles.linmagau.org/modules.php?op=modload&name=Sections&file=index&req=viewarticle&artid=227

It's not new & it's already been discredited empirically. The author
posted here some time ago.

See http://tinyurl.com/tc3z and http://tinyurl.com/tc47

More interesting is this: http://www.gentoo.org/main/en/performance.xml

Stroller.



Re: [gentoo-user] bad gentoo performance

2003-11-02 Thread William Kenworthy

There was actually a lot of misinformation and pure rubbish spread on
this list about that (my favourite was the guy with the distcc farm who
proudly boasted that the farm would compile anything faster than a
single debian or mandrake system could do so gentoo must be better!) -
unlike him I was present and can say what was done.  In that case,
debian and Mandrake WERE faster than gentoo - the figures are there in
black and white.

There were some good reasons for that result: I have been looking at this
closely since then and it seems that choosing CFLAGS from a list
based on hardware (what we did the first time) or from what you read on
this list can be flawed - how many gentoo systems have been set up that way?

Also see here for another go which looked a bit better for gentoo, but
brought up a whole lot of factors no-one expected.

"http://www.linmagau.org/"  issue 9

My current thinking is you can go with something very mundane, and lose
only a fraction in ultimate performance, or ***test and tune*** which
seems to give max of ~10% in most cases (run time, not startup which is
a different case).  Just picking CFLAGS out of a hat will get the same
result as a lottery: winners and losers.  It also seems that some
applications will do better with one flagset, and others with a
different set.  You will also need to take the intended use into
account: is startup time more important, or knocking a few hours off a
batch job?  Then there's hardware ...
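
As a rough illustration of the "test and tune" idea, the sketch below times a command several times and keeps the fastest wall-clock run, so that the same program built with two different flag sets can be compared. It is only a minimal harness, not the method used in the article. It assumes GNU date (for %N nanoseconds), and the best_of name and the ./app-O2 and ./app-O3 binaries in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: print the fastest wall-clock time, in milliseconds, of N runs of a command.
# Assumes GNU date, which supports %N (nanoseconds).

best_of() {
    runs=$1
    shift
    best=
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s%N)
        "$@" >/dev/null 2>&1
        end=$(date +%s%N)
        ms=$(( (end - start) / 1000000 ))
        # Keep the smallest time seen so far.
        if [ -z "$best" ] || [ "$ms" -lt "$best" ]; then
            best=$ms
        fi
        i=$((i + 1))
    done
    echo "$best"
}

# Usage with two hypothetical builds of the same program:
#   best_of 5 ./app-O2
#   best_of 5 ./app-O3
```

Taking the best of several runs reduces noise from caches and background load, but it measures total run time only and says nothing about startup time, which the post treats as a separate case.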

I suggest if you want to discredit it, you get some debian, Mandrake and
gentoo enthusiasts together and go for it.  It is what we did and it was a
lot of fun, with everyone learning along the way.  Also, having people
knowledgeable in each distro present means that there is less chance of
bias.

BillK




Re: [gentoo-user] bad gentoo performance

2003-11-02 Thread SN
Well, since I had a crash 3 days ago I can tell you about my experience with
CFLAGS:

First install:

Filesystem: ext3
-march=athlon-xp -O3 -pipe  -funroll-loops

Note: according to the gcc manual, -fomit-frame-pointer, -finline-functions and
all the crap are already turned on by -O3

Second install:

Filesystem: reiserfs
-march=athlon-xp -O2 -pipe


By accident I was already doing a little test of my own: on the first
install, the startup time of konqueror, prelinked, was 0.7 s; on the
second install it was only 0.6 s.
Also the files (compiled binaries, libs) were almost 10% smaller. I guess
that's one part of the faster startup.
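
To put the two startup times quoted above on the same footing as the size figure, the relative change can be worked out with a one-line helper. The pct_change name is made up for this sketch, and the times are written with a decimal point.

```shell
#!/bin/sh
# Sketch: percentage reduction from an old value to a new value.

pct_change() {
    LC_ALL=C awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}

pct_change 0.7 0.6   # startup times from the post; prints 14.3
```

So the second install started konqueror about 14% faster, on a single measurement with a resolution of 0.1 s and a different filesystem, which is worth keeping in mind before reading too much into it.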

After studying the gcc manual up and down, I don't believe that the
gentoo-suggested -O3 is the best flag for compiling the whole distro; I think
it makes things much worse. Only people who already know that they have
certain functions in their programs that will benefit from -O3 should use
that flag, and in most cases that doesn't happen.

Also, some people still believe that they have to add 50 other flags to
their make.conf because the gcc man page shows them. Here is what the gcc
manual on their homepage says:

O3 contains:


  -fforce-mem
  -foptimize-sibling-calls
  -fstrength-reduce
  -fcse-follow-jumps  -fcse-skip-blocks
  -frerun-cse-after-loop  -frerun-loop-opt
  -fgcse   -fgcse-lm   -fgcse-sm
  -fdelete-null-pointer-checks
  -fexpensive-optimizations
  -fregmove
  -fschedule-insns  -fschedule-insns2
  -fsched-interblock -fsched-spec
  -fcaller-saves
  -fpeephole2
  -freorder-blocks  -freorder-functions
  -fstrict-aliasing
  -falign-functions  -falign-jumps
  -falign-loops  -falign-labels  -fdefer-pop
  -fmerge-constants
  -fthread-jumps
  -floop-optimize
  -fcrossjumping
  -fif-conversion
  -fif-conversion2
  -fdelayed-branch
  -fguess-branch-probability
  -fcprop-registers

-fforce-mem
Force memory operands to be copied into registers before doing arithmetic on
them. This produces better code by making all memory references potential
common subexpressions. When they are not common subexpressions, instruction
combination should eliminate the separate register-load.
Enabled at levels -O2, -O3, -Os.




-fomit-frame-pointer
Don't keep the frame pointer in a register for functions that don't need
one. This avoids the instructions to save, set up and restore frame
pointers; it also makes an extra register available in many functions. It
also makes debugging impossible on some machines.
On some machines, such as the VAX, this flag has no effect, because the
standard calling sequence automatically handles the frame pointer and
nothing is saved by pretending it doesn't exist. The machine-description
macro FRAME_POINTER_REQUIRED controls whether a target machine supports this
flag. See Register Usage.

Enabled at levels -O, -O2, -O3, -Os.




-foptimize-sibling-calls
Optimize sibling and tail recursive calls.
Enabled at levels -O2, -O3, -Os.




-finline-functions
Integrate all simple functions into their callers. The compiler
heuristically decides which functions are simple enough to be worth
integrating in this way.
If all calls to a given function are integrated, and the function is
declared static, then the function is normally not output as assembler code
in its own right.

Enabled at level -O3



So guys, please don't fall for the CFLAGS hype. Most of these optimization
tales are just plain myth, and I'd say 99% of the people who set CFLAGS
don't even know what they are doing: they read posts in threads, then they
post them themselves, and so on. It's just plain crap.



Set the right -march and -O2 and you won't lose out to any other distro. If
you want fast startup times, prelink will do its job. Also note that some
distros use kernel patches that enhance speed; the kernel from kernel.org
is usually very stable and works for most people, but isn't tuned much
for best performance.
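
In make.conf terms, that advice could look like the fragment below. The -march value is the one from this post and has to match your own CPU. Reusing CFLAGS for CXXFLAGS is a common Gentoo convention and is an addition here, not something stated in the post.

```shell
# /etc/make.conf fragment (sketch; adjust -march to your own CPU)
CFLAGS="-march=athlon-xp -O2 -pipe"
# Assumption: C++ packages get the same flags as C packages.
CXXFLAGS="${CFLAGS}"
```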










Re: [gentoo-user] bad gentoo performance

2003-11-02 Thread Jason Stubbs
On Monday 03 November 2003 07:13, William Kenworthy wrote:
> Also see here for another go which looked a bit better for gentoo, but
> brought up a whole lot of factors no-one expected.
>
> "http://www.linmagau.org/"  issue 9

You made a small mistake in your description of CFLAGS with -mfpmath=387,sse.
You wrote "use sse if possible, 387 math co-processor instructions if not"
but it actually means to use both at the same time to attempt to double the
amount of math registers.

Jason
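
To see which floating-point unit setting a given set of options resolves to, a GCC that supports -Q --help=target can be queried as sketched below. The fpmath_setting name and the CC override are assumptions of the sketch, the option is written in the sse,387 spelling from the GCC manual, and the exact text printed varies by GCC version and target, so no expected output is shown.

```shell
#!/bin/sh
# Sketch: print the -mfpmath value GCC reports for the given target options.
# Assumes an x86 GCC that supports -Q --help=target.

fpmath_setting() {
    "${CC:-gcc}" -Q --help=target "$@" 2>/dev/null |
        awk '$1 ~ /^-mfpmath=/ { print $2 }'
}

fpmath_setting -msse2 -mfpmath=sse,387
```

An empty result means the compiler was not found, rejected the options, or is not an x86 compiler.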

Re: [gentoo-user] bad gentoo performance

2003-11-02 Thread Bill Kenworthy
Thanks, you're correct. :)  I should have checked that instead of relying
on my memory!

BillK
