On Sat, Feb 10, 2007 at 07:18:04PM -0000, Athena P wrote:
> Hi Randy
>
> Thanks for your reply.
>
> Surely specifying the CPU architecture is still worthwhile. For example,
> using this string must give speed improvements:
>
> "-O3 -march=prescott -mtune=prescott -mmmx -msse -msse2 -msse3
> -m3dnow -pipe -mfpmath=sse -fomit-frame-pointer"
>
> Or am I missing something?
>
Yes, I think you are. I haven't played with toolchain optimization for a
*long* while, but I think you need to consider *what* is important to you.
Among the possibilities are:
(i) How long it takes to compile - in general, adding extra optimisations
will slow down any particular compile, so the whole system will take
longer to build.

(ii) Execution speed - this might be how long the system takes to build a
particular package, or to run a particular task; for server applications
it might be throughput.

(iii) Impact on your processor's caches - a bigger binary increases the
pressure on your caches, and may mean more pages have to be read when a
program or library is loaded. For a desktop, it is sometimes asserted
that smaller binaries (smaller code, not just stripping the symbols to
give shorter files) will provide a more responsive system.

(iv) There might be other things that matter to some other people, e.g.
memory pressure on a heavily-used system, perhaps running bloatware (OOo
and a leaky firefox) while trying to do big compiles and simultaneously
encoding some media.

For a developer, being able to debug problems is important - that might
constrain use of -fomit-frame-pointer. The other flags look 'mostly
harmless', although you move towards less-tested territory.

The best thing you can do is identify what you hope to achieve, then come
up with some repeatable testcases which actually measure what you are
interested in, then run them, ideally several times to remove random
variation. My personal view is that there is enough variability in a
running system to make a single short test meaningless; it needs to be
repeated several times with a method to handle the variation (average all
the results, or run x times, eliminate the best and worst and average the
others, or whatever).

In testing optimisations of the base system, not only do you need to
build two systems to compare, but you probably want to *run* them from
the same partition, and if doing file i/o (including compiles) perhaps
test on the same empty or pre-loaded-after-mkfs partition, to eliminate
variables.
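The "run it several times, drop the best and worst, average the rest"
method can be sketched in shell. This is only a sketch, not a polished
harness: './mytest' is a placeholder for whatever repeatable workload you
actually care about, and GNU date (for sub-second timestamps), sed and awk
are assumed.

```shell
#!/bin/sh
# Time a testcase several times, discard the fastest and slowest runs,
# and print the mean of the remainder.

# trimmed_mean: read one timing per line on stdin, drop the first (best)
# and last (worst) after sorting, print the mean of the rest.
trimmed_mean() {
    sort -n | sed '1d;$d' | \
        awk '{ s += $1; n++ } END { if (n) printf "%.3f\n", s/n }'
}

runs=7
i=0
while [ "$i" -lt "$runs" ]; do
    start=$(date +%s.%N)
    ./mytest >/dev/null 2>&1    # hypothetical testcase - substitute your own
    end=$(date +%s.%N)
    awk -v a="$start" -v b="$end" 'BEGIN { printf "%.3f\n", b - a }'
    i=$((i + 1))
done | trimmed_mean
```

Seven runs is an arbitrary choice; the point is only that one run tells
you very little, and trimming the extremes crudely protects the average
from a run that happened to coincide with cron waking up.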
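For point (iii), a crude way to see what different optimisation levels do
to code size is to build the same source at each level and compare with
'size'. A minimal sketch, assuming gcc and binutils are installed - the
trivial hello.c generated here is only a stand-in for a real package:

```shell
#!/bin/sh
# Compare the code size the same source produces at -O2, -O3 and -Os.
cat > hello.c <<'EOF'
#include <stdio.h>
int main(void) { printf("hello\n"); return 0; }
EOF

if command -v gcc >/dev/null 2>&1; then
    for opt in -O2 -O3 -Os; do
        gcc "$opt" -o "hello$opt" hello.c
        # 'size' reports text/data/bss in bytes - more informative than
        # the file length, which also counts symbols and padding
        size "hello$opt"
    done
else
    echo "gcc not available; skipping the comparison"
fi
```

On a toy program the differences are tiny; on a large library the text
size is what matters for the cache and paging pressure described above.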
All disks I've ever tested get slower the further in you go - try making
a few partitions and using hdparm on the individual partitions to see
this. Sometimes the fall-off isn't major. Similarly, filesystem
performance may vary according to the filesystem's past history (e.g.
where it puts a new file).

Now you can maybe see why hardly anybody has performed meaningful tests
on toolchain optimisation - for most users there isn't enough likely gain
to make the testing worthwhile. For those supporting a package across
many similar machines, testing optimisations for their package is
possible, but the host system will normally be a given.

The worst thing about testing optimizations is that the results are
specific to a processor model and the toolchain. Just because a
particular optimization is best today doesn't mean it will be best in a
year's time. Mostly, optimization is based on assertion or gut feeling,
e.g. those package developers who throw in -O9 when their users are
likely to be using gcc, or people who claim that they can see the
difference with a particular optimization.

ĸen
-- 
das eine Mal als Tragödie, das andere Mal als Farce
[the one time as tragedy, the other time as farce]
-- 
http://linuxfromscratch.org/mailman/listinfo/lfs-support
FAQ: http://www.linuxfromscratch.org/lfs/faq.html
Unsubscribe: See the above information page