On Tue, Dec 09, 2003 at 11:17:23PM +0100, John Nilsson wrote:
> >That's the express purpose that genflags was created for, to provide
> >users with a known good set of high-performance CFLAGS so they
> >didn't need to mess around with it too much.
> Still there is no room for improvement when dealing with system wide
> optimization.
Your wording here is unclear; I think you mean to say that there IS room for improvement over system wide constant CFLAGS?
English is not my native language. Writing this GLEP made me realize I have problems expressing my thoughts in ANY language =). If this GLEP is going to survive, comments on formulations and wordings (and spelling) are greatly appreciated.
Yes, I meant that it is next to impossible to find a system wide optimization beyond "-march=<arch> -O2 -pipe" that fits the majority of users.
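(For reference, the kind of conservative system-wide setting being discussed would look roughly like this in /etc/make.conf; the -march value shown is illustrative and is per-machine, not a recommendation from this thread:)

```shell
# /etc/make.conf (illustrative sketch; pick -march for your own CPU)
CFLAGS="-march=athlon-xp -O2 -pipe"
CXXFLAGS="${CFLAGS}"
```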
> >strip-flags to remove problematic flags on a per ebuild basis is the
> >best solution. I do agree that unstable gcc settings are a big
> >problem, eg in a recent bug it turned out the submitter's system (an
> >older Pentium I) couldn't handle -O3 without flaking out. Reduce it
> >to -O2 and the box went fine (both for compiling and already compiled
> >packages).
> This is a bug in GCC. While a workaround may be a quick solution for
> Gentoo, one shouldn't base the whole system on bugs.
No, it isn't a bug in GCC, it's a bug with the user's specific hardware. I have an older Pentium I system that runs just fine with -O3 and the user's specified CFLAGS. I didn't force everybody to use -O2, I just got that user to change his own system down to -O2.
I misunderstood you. Still, this is a bug in a specific CPU. You can't guarantee stability in any case if the hardware is broken.
> >again, genflags was created for this. I've considered a sequel to
> >genflags based on the genetic optimization of compiler flags as
> >mentioned on Slashdot a while ago, but for lack of time, i'm not even
> >looking at doing it now.
> You might want to check:
> http://www.coyotegulch.com/potential/gccga/gccga.html
This is the original item I was referencing, but you still run into the
problem that you need to run things on a system basis to get effective
results.
Yeah, I had the page open when I read your mail so I thought I'd spare you the trouble of looking it up =)
http://www.coyotegulch.com/acovea/index.html is the rest of the article.
> http://www.rocklinux.net/packages/ccbench.html
This basically brute forces the genetic algorithms, with absolutely no
thought as to the net effect of the given flags on the results. For example, on
my home server (an AthlonXP 2400+), it returns these results:
gcc -O3 -march=athlon -fomit-frame-pointer -funroll-loops
-frerun-loop-opt -funroll-all-loops -fschedule-insns
Of that, '-frerun-loop-opt' and '-fschedule-insns' are redundant as they are implied by -O3.
With -fomit-frame-pointer I can't debug code properly anymore, and if I
try to use -funroll-all-loops to compile mysql, even with its
--with-low-memory option, gcc wants 600mb of memory to compile its
sql_yacc.cc.
I had the same reaction. ccbench was what made me realize that any kind of systemwide optimization is only guesswork (often bad guesswork at that).
> I meant by evolution: the process of users submitting patches to improve
> individual ebuilds.
What improves the performance of a given application on one machine does NOT necessarily improve it on another machine.
True, but you would be in a much better situation to test that fact than what we have now.
Read the gcc manpage and see: -fprofile-arcs -fbranch-probabilities (also read http://gcc.gnu.org/news/profiledriven.html)
Just adding these to ccbench doubles the amount of time taken to
test (as you must compile with -fprofile-arcs, run, compile with
-fbranch-probabilities, run again). It also provides some extremely
interesting and varying results. The bubblesort test for example,
improves between +15% and +300% depending on the other compiler flags.
Towers of Hanoi goes from -20% to +50%.
If users submitted _good_ non-interactive testcases for every ebuild, it
wouldn't be difficult at all to apply -fprofile-arcs/-fbranch-probabilities
and/or acovea to most packages, apart from the massive increase in
compile time.
Couldn't one save the profile data in the portage tree once a generic usecase was found?
> >Stable and high-performance is a per-system definition, as evidenced
> >by the bug I mentioned with -O3.
> And should as such be fixed... in gcc. If gcc can't optimize correctly
> knowing the cache size of the cpu, gcc is broken. Fix gcc.
Again, it isn't a gcc bug, it's an issue with a specific machine (not even a class of systems or cpus).
Let's take a tangent on this whole issue for a moment. Ignoring the implementation concerns, the end goal of your GLEP is this: the basic gain you want is support for per-package CFLAG modifications (inside the ebuilds), for the purpose of performance optimization.
Do I have this correct?
Yes, please ignore implementation details; they were only provided as an alternative example scenario. Very open for discussion =)
The goal is not the speed as such, but the testability of it. I want to move from the current situation, where you have absolutely no knowledge of the optimization results, to a situation where you would actually be able to give evidence of improvements (or the reverse).
Reusability of CFLAGS, if you wish =)
/John