RE: [gentoo-user] bad gentoo performance
Small correction: another list member pointed out to me that -fomit-frame-pointer isn't enabled by any of the -O settings on x86 (according to the documentation).

I wanted to make sure, so I emailed the gcc-help address. Someone emailed me the following procedure to determine exactly which flags get set at the different -O settings:

To figure out what the differences are between the various optimization settings, do this:

    touch foo.cpp
    g++ -O2 -save-temps -fverbose-asm -c foo.cpp
    cat foo.s

Replace -O2 with the one(s) that you are interested in. Compare the differences in the .s files. Make sure you save the foo.s that you are interested to compare against... :-)

-----Original Message-----
From: SN [mailto:[EMAIL PROTECTED]
Sent: Sunday, November 02, 2003 5:26 PM
To: [EMAIL PROTECTED]
Subject: Re: [gentoo-user] bad gentoo performance

> [...]
> Note: according to the gcc manual, -fomit-frame-pointer, -finline-functions and all the crap are already turned on by -O3
> [...]
Re: [gentoo-user] bad gentoo performance
On Monday 03 Nov 2003 16:12, Van Eps, Nathan D. (James Tower) wrote:

[snip]
> I wanted to make sure, so I emailed gcc-help email address. Someone emailed me the following procedure to determine exactly what flags are getting set for the different O settings.
>
> To figure out what the differences are between the various optimization settings, do this:
>
>     touch foo.cpp
>     g++ -O2 -save-temps -fverbose-asm -c foo.cpp
>     cat foo.s
>
> Replace -O2 with the one(s) that you are interested in. Compare the differences in the .s files. Make sure you save the foo.s that you are interested to compare against... :-)

That's interesting. I just tried that. I could find no difference at all between these two:

    $ g++ -Os -save-temps -fverbose-asm -c foo.cpp
    $ g++ -O2 -save-temps -fverbose-asm -c foo.cpp

...which somewhat surprised me.

    $ g++ -O3 -save-temps -fverbose-asm -c foo.cpp

...gave an extra -frename-registers.

I'll stick with this one:

    CFLAGS="-march=athlon-xp -O2 -pipe"

Peter
--
== Portage 2.0.49-r15 (default-x86-1.4, gcc-3.2.3, glibc-2.3.2-r1, 2.4.23_pre8-gss) i686 AMD Athlon(tm) XP 3200+ ==
--
[EMAIL PROTECTED] mailing list
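The comparison described above can be scripted rather than done by eye. The following is a minimal sketch, not the procedure the gcc-help respondent gave: the helper names flags_in and extra_flags and the file names foo-O2.s and foo-O3.s are made up for illustration, and it assumes the -fverbose-asm listing names the enabled options as whitespace-separated -f... and -m... tokens, as the results in this thread suggest. Other GCC releases may format that header differently or leave it out.

```shell
#!/bin/sh
# Sketch only: flags_in and extra_flags are illustrative names, not GCC tools.

# Print every -f... or -m... token found in a -fverbose-asm listing,
# one per line, sorted and de-duplicated.
flags_in() {
    tr -s ' \t' '\n\n' < "$1" | grep -E '^-[fm]' | sort -u
}

# Print the flags present in listing $2 but absent from listing $1.
extra_flags() {
    base_flags=$(mktemp) || return 1
    flags_in "$1" > "$base_flags"
    # grep exits 1 when nothing differs; that is not an error here.
    flags_in "$2" | grep -vxF -f "$base_flags" || :
    rm -f "$base_flags"
}

# Typical use (needs g++; -S -o keeps one listing per level, so no
# foo.s gets overwritten between runs):
#   touch foo.cpp
#   g++ -O2 -S -fverbose-asm -o foo-O2.s foo.cpp
#   g++ -O3 -S -fverbose-asm -o foo-O3.s foo.cpp
#   extra_flags foo-O2.s foo-O3.s
```

Writing each listing to its own file sidesteps the "make sure you save the foo.s" caveat, since -save-temps would otherwise reuse the same foo.s name for every run.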
Re: [gentoo-user] bad gentoo performance
On Monday 3 November 2003 00:26, SN wrote:

> -fomit-frame-pointer
>     Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines.
>
>     On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro FRAME_POINTER_REQUIRED controls whether a target machine supports this flag. See Register Usage.
>
>     Enabled at levels -O, -O2, -O3, -Os.

That is true, BUT man gcc also says:

<snip>
-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging.
</snap>

and it does interfere on the x86 architecture, so it is not turned on by default, on x86 at least.

Martin
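Whether -fomit-frame-pointer is on at a given -O level can also be asked of the compiler directly. Later GCC releases (not the 3.x compilers discussed in this thread) accept -Q --help=optimizers, which prints each optimization flag followed by an [enabled] or [disabled] marker. The sketch below assumes that output format; the helper name flag_enabled is made up for illustration and is not part of GCC.

```shell
#!/bin/sh
# Sketch only: flag_enabled is an illustrative helper, not part of GCC.

# Read `gcc -Q --help=optimizers` output on stdin and succeed (exit 0)
# only if the flag named in $1 is listed as [enabled].
flag_enabled() {
    awk -v flag="$1" '
        $1 == flag && $2 == "[enabled]" { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Typical use (needs a GCC that supports -Q --help=optimizers):
#   if gcc -Q -O2 --help=optimizers | flag_enabled -fomit-frame-pointer; then
#       echo "frame pointer omitted at -O2 on this target"
#   fi
```

On a compiler without that option, the foo.s comparison from earlier in the thread remains the way to check.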
Re: [gentoo-user] bad gentoo performance
On Nov 2, 2003, at 10:13 pm, William Kenworthy wrote:

> There was actually a lot of misinformation and pure rubbish spread on this list about that ... I was present and can say what was done. In that case, debian and Mandrake WERE faster than gentoo - the figures are there in black and white. There was some good reasons for that result

And you address the one I had in mind by using the same kernel on all three machines in your revisited article.

But to suggest, as the original article did, that these distributions are faster than Gentoo is NOT empirical when the distros tested have such a glaring difference between them. As you know, the Gentoo-sources (with pre-empt patches) kernel was tested in the original article against another with Redhat's patch-set.

Stroller.
Re: [gentoo-user] bad gentoo performance
Fact: take 3 distros, install to the recommended settings (as far as is practical) and see which is faster. Gentoo was slowest. At the time (is it still the case?) -O3 was being recommended for gentoo in general, celerons in particular (was a few months back now!). This flag has a rather drastic effect on performance, particularly with celeries. It's not just this test: there have been a couple of posts where people who have dual booted with other distros found gentoo slower until properly tuned (did the last poster with the custom application eventually get gentoo to run faster than debian?)

I did some tests afterwards (not on the same machine, unfortunately) and found that in general, gentoo-sources is slightly faster than open-mosix (no thread migration) and vanilla kernels (hey, I gotta download this stuff through a modem!). Might be hardware dependent, but I think this will hold up in other configs. pre-empt etc. on/off made little/no difference. It is also worth noting that these machines were all new installs and had no cruft or extra services running, so these features would have little effect. So using the recommended (at the time) gentoo-sources kernel for performance was OK. This time around I chose gs-sources, mainly for the new hardware support, but also because it seemed to perform better than gentoo-sources, which had not been updated at the time. So unless you have some specific tests which are biased to some kernel feature, you are not going to see much advantage/disadvantage there.

Attempts were made to use the gentoo kernel on the other machines (debian), but it proved too much work for the time involved. But what's the point of using the same kernel on each machine when gentoo recommends gentoo-sources, Mandrake recommends ... It would have been nice to check each distro with a vanilla kernel just to see what would happen, but there were only so many hours in that day.

The test we did early last month (www.linmagau.org) showed that with a better match of CFLAGS to hardware, you can expect about a 10% gain (sometimes, if the CFLAGS and hardware and application are a good match) - but you can lose it by making poor choices elsewhere. 10% is not to be sneezed at, but it's hardly earth-shattering either.

As I have stated previously, these are empirical tests, not definitive scientific ones, but they should hold up in the real world when doing the kind of work that I and the others do in our day jobs.

BillK

On Tue, 2003-11-04 at 02:56, Stroller wrote:
> On Nov 2, 2003, at 10:13 pm, William Kenworthy wrote:
>> There was actually a lot of misinformation and pure rubbish spread on ...
>> There was some good reasons for that result
>
> And you address the one I had in mind by using the same kernel on all three machines in your revisited article.
>
> But to suggest, as the original article did, that these distributions are faster than Gentoo is NOT empirical when the distros tested have such a glaring difference between them. As you know, the Gentoo-sources (with pre-empt patches) kernel was tested in the original article against another with Redhat's patch-set.
>
> Stroller.

--
William Kenworthy [EMAIL PROTECTED]
[gentoo-user] bad gentoo performance
hi

what about that?

http://articles.linmagau.org/modules.php?op=modloadname=Sectionsfile=indexreq=viewarticleartid=227

cheers,
eric
Re: [gentoo-user] bad gentoo performance
On Nov 2, 2003, at 1:24 pm, Eric Marchionni wrote:

> what about that?
> http://articles.linmagau.org/modules.php?op=modloadname=Sectionsfile=indexreq=viewarticleartid=227

It's not new; it has already been discredited empirically. The author posted here some time ago. See http://tinyurl.com/tc3z and http://tinyurl.com/tc47

More interesting is this: http://www.gentoo.org/main/en/performance.xml

Stroller.
Re: [gentoo-user] bad gentoo performance
There was actually a lot of misinformation and pure rubbish spread on this list about that (my favourite was the guy with the distcc farm who proudly boasted that the farm would compile anything faster than a single debian or mandrake system could, so gentoo must be better!). Unlike him, I was present and can say what was done. In that case, debian and Mandrake WERE faster than gentoo - the figures are there in black and white.

There were some good reasons for that result. I have been looking at this closely since then, and it seems that recommending CFLAGS from a list based on hardware (what we did the first time), or from what you read on this list, can be flawed - how many gentoo systems have been set up that way? Also see here for another go, which looked a bit better for gentoo but brought up a whole lot of factors no-one expected:

http://www.linmagau.org/ issue 9

My current thinking is that you can go with something very mundane and lose only a fraction in ultimate performance, or ***test and tune***, which seems to give a max of ~10% in most cases (run time, not startup, which is a different case). Just picking CFLAGS out of a hat will get the same result as a lottery: winners and losers. It also seems that some applications will do better with one flagset, and others with a different set. You will also need to take the intended use into account: is startup time more important, or knocking a few hours off a batch job? Then there's hardware ...

I suggest that if you want to discredit it, you get some debian, Mandrake and gentoo enthusiasts together and go for it. It is what we did, and it was a lot of fun, with everyone learning along the way. Also, having people knowledgeable in each distro present means that there is less chance of bias.

BillK

On Sun, 2003-11-02 at 22:09, Stroller wrote:
> On Nov 2, 2003, at 1:24 pm, Eric Marchionni wrote:
>> what about that?
>> http://articles.linmagau.org/modules.php?op=modloadname=Sectionsfile=indexreq=viewarticleartid=227
>
> It's not new it's already been discredited empirically. The author has posted here some time ago. See http://tinyurl.com/tc3z and http://tinyurl.com/tc47
>
> More interesting is this: http://www.gentoo.org/main/en/performance.xml
>
> Stroller.
Re: [gentoo-user] bad gentoo performance
Well, since I had a crash 3 days ago, I can tell you what my experience with CFLAGS was:

First install:
  Filesystem: ext3
  -march=athlon-xp -O3 -pipe -funroll-loops
  Note: according to the gcc manual, -fomit-frame-pointer, -finline-functions and all the crap are already turned on by -O3

Second install:
  Filesystem: reiserfs
  -march=athlon-xp -O2 -pipe

By accident I was already doing a little test of my own. Startup time of konqueror, prelinked, was 0.7 s on the first install; the second install took only 0.6 s. Also the files (compiled binaries, libs) were almost 10% smaller. I guess that's one part of the faster startup.

After studying the gcc manual up and down, I don't believe that the Gentoo-suggested -O3 is the best flag for compiling the whole distro. I think it makes things much worse. Only people who already know that they have certain functions in their programs that will benefit from -O3 should use that flag, which in most cases doesn't happen. Also, some people still believe that they have to add 50 other flags to their make.conf because the gcc man page shows them.

Here is what the gcc manual on their homepage says:

O3 contains:
  -fforce-mem -foptimize-sibling-calls -fstrength-reduce
  -fcse-follow-jumps -fcse-skip-blocks
  -frerun-cse-after-loop -frerun-loop-opt
  -fgcse -fgcse-lm -fgcse-sm
  -fdelete-null-pointer-checks -fexpensive-optimizations -fregmove
  -fschedule-insns -fschedule-insns2 -fsched-interblock -fsched-spec
  -fcaller-saves -fpeephole2 -freorder-blocks -freorder-functions
  -fstrict-aliasing
  -falign-functions -falign-jumps -falign-loops -falign-labels
  -fdefer-pop -fmerge-constants -fthread-jumps -floop-optimize
  -fcrossjumping -fif-conversion -fif-conversion2 -fdelayed-branch
  -fguess-branch-probability -fcprop-registers

-fforce-mem
    Force memory operands to be copied into registers before doing arithmetic on them. This produces better code by making all memory references potential common subexpressions. When they are not common subexpressions, instruction combination should eliminate the separate register-load.

    Enabled at levels -O2, -O3, -Os.

-fomit-frame-pointer
    Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines.

    On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro FRAME_POINTER_REQUIRED controls whether a target machine supports this flag. See Register Usage.

    Enabled at levels -O, -O2, -O3, -Os.

-foptimize-sibling-calls
    Optimize sibling and tail recursive calls.

    Enabled at levels -O2, -O3, -Os.

-finline-functions
    Integrate all simple functions into their callers. The compiler heuristically decides which functions are simple enough to be worth integrating in this way.

    If all calls to a given function are integrated, and the function is declared static, then the function is normally not output as assembler code in its own right.

    Enabled at level -O3.

So guys, please don't fall for that CFLAG hype. Most of these optimization tales are just plain myth, and I'd say 99% of the people that set CFLAGS don't even know what they are doing: they read posts in threads, then they post them themselves, and so on. It's just plain crap.

Set the right -march and -O2 and you won't lose out to any other distro. If you want fast startup times, prelink will do its job.

Also you have to note that some distros use kernel patches that enhance speed. The kernel from kernel.org is usually very stable and works for most, but isn't very much tuned for best performance.

----- Original Message -----
From: Eric Marchionni [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, November 02, 2003 2:24 PM
Subject: [gentoo-user] bad gentoo performance

> hi
> what about that?
> http://articles.linmagau.org/modules.php?op=modloadname=Sectionsfile=indexreq=viewarticleartid=227
> cheers,
> eric
Re: [gentoo-user] bad gentoo performance
On Monday 03 November 2003 07:13, William Kenworthy wrote:

> Also see here for another go which looked a bit better for gentoo, but
> brought up a whole lot of factors no-one expected.
>
> "http://www.linmagau.org/" issue 9

You made a small mistake in your description of CFLAGS with -mfpmath=387,sse. You wrote "use sse if possible, 387 math co-processor instructions if not" but it actually means to use both at the same time to attempt to double the amount of math registers.

Jason
Re: [gentoo-user] bad gentoo performance
Thanks, you're correct. :) I should have checked that, instead of relying on my memory!

BillK

On Mon, 2003-11-03 at 08:53, Jason Stubbs wrote:
> On Monday 03 November 2003 07:13, William Kenworthy wrote:
>> Also see here for another go which looked a bit better for gentoo, but brought up a whole lot of factors no-one expected.
>>
>> http://www.linmagau.org/ issue 9
>
> You made a small mistake in your description of CFLAGS with -mfpmath=387,sse. You wrote "use sse if possible, 387 math co-processor instructions if not" but it actually means to use both at the same time to attempt to double the amount of math registers.
>
> Jason
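For reference, here is roughly how the option under discussion would sit in a make.conf. This is only a sketch under stated assumptions: the -march value and the base flags are borrowed from earlier in the thread, the GCC manual spells the combined mode sse,387 (the messages above write it as 387,sse) and describes it as experimental, and nothing here is meant as a tuning recommendation.

```shell
# Illustrative make.conf fragment; the flag values are assumptions
# borrowed from this thread, not a recommended setup.

# Alternative: scalar floating-point math in SSE registers.
# CFLAGS="-march=athlon-xp -O2 -pipe -mfpmath=sse"

# Both units at once, the mode Jason describes: the compiler may use
# the SSE and the 387 register sets together.
CFLAGS="-march=athlon-xp -O2 -pipe -mfpmath=sse,387"

# Keep C++ builds on the same flags.
CXXFLAGS="${CFLAGS}"
```

Since any gain depends on the hardware and the application, the test-and-tune advice from earlier in the thread applies to this flag as much as to the -O level.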