On 12/10/03 02:15:02, Robin H. Johnson wrote:
On Tue, Dec 09, 2003 at 11:17:23PM +0100, John Nilsson wrote:
> >That's the express purpose that genflags was created for, to provide
> >users with a known good set of high-performance CFLAGS so they didn't
> >need to mess around with it too much.
> Still there is no room for improvement when dealing with system wide
> optimization.
Your wording here is unclear; I think you mean to say that there IS room
for improvement over system-wide constant CFLAGS?

English is not my native language. Writing this GLEP made me realize I have trouble expressing my thoughts in ANY language =). If this GLEP is going to survive, comments on formulation, wording (and spelling) are greatly appreciated.

Yes, I meant that it is next to impossible to find a system-wide optimization beyond "-march=<arch> -O2 -pipe" that fits the majority of users.
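For context, the system-wide setting under discussion is the CFLAGS line in Gentoo's /etc/make.conf; a conservative configuration looks something like the sketch below (the -march value is purely illustrative):

```shell
# /etc/make.conf -- these flags are applied to every package built,
# which is exactly why anything beyond -O2 is hard to justify globally.
CFLAGS="-march=athlon-xp -O2 -pipe"
CXXFLAGS="${CFLAGS}"
```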


> >strip-flags to remove problematic flags on a per ebuild basis is the
> >best solution. I do agree that unstable gcc settings are a big
> >problem, eg in a recent bug it turned out the submitter's system (an
> >older Pentium I) couldn't handle -O3 without flaking out. Reduce it
> >to -O2 and the box went fine (both for compiling and already compiled
> >packages).
> This is a bug in GCC. While a workaround may be a quick solution for
> Gentoo, one shouldn't base the whole system on bugs.
No it isn't a bug in GCC, it's a bug with the user's specific hardware.
I have an older Pentium I system that runs just fine with -O3 and the
user's specified CFLAGS. I didn't force everybody to use -O2, I just
got that user to change his own system down to -O2.

I misunderstood you. Still, this is a bug in a specific CPU. You can't guarantee stability in any case if the hardware is broken.

> >again, genflags was created for this. I've considered a sequel to
> >genflags based on the genetic optimization of compiler flags as
> >mentioned on Slashdot a while ago, but for lack of time, i'm not even
> >looking at doing it now.
> You might want to check:
> http://www.coyotegulch.com/potential/gccga/gccga.html
This is the original item I was referencing, but you still run into the
problem that you need to run things on a system basis to get effective
results.

Yeah, I had the page open when I read your mail, so I thought I'd spare you the trouble of looking it up =)

http://www.coyotegulch.com/acovea/index.html is the rest of the article.
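The genetic optimization of compiler flags mentioned above can be sketched roughly as follows. This is only an illustrative toy: the flag list comes from the ccbench output quoted below, but the fitness function and its weights are made up, where a real tool like acovea would compile and time an actual benchmark with each candidate flag set:

```python
import random

# Candidate flags (taken from the ccbench output discussed in this thread).
FLAGS = ["-fomit-frame-pointer", "-funroll-loops", "-fschedule-insns",
         "-frerun-loop-opt", "-funroll-all-loops"]

def fitness(genome):
    # Stand-in scoring function: a real run would build and time a test
    # program with the selected flags. The weights here are invented
    # purely to give the search something to optimize.
    weights = [3, 2, 1, -1, -2]
    return sum(w for on, w in zip(genome, weights) if on)

def evolve(pop_size=20, generations=30):
    # Random initial population of on/off flag sets.
    pop = [[random.random() < 0.5 for _ in FLAGS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]            # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            child = [random.choice(pair) for pair in zip(a, b)]  # crossover
            if random.random() < 0.1:                            # mutation
                i = random.randrange(len(FLAGS))
                child[i] = not child[i]
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return [f for f, on in zip(FLAGS, best) if on]

print(" ".join(evolve()))
```

The point of the sketch is the shape of the search (selection, crossover, mutation over flag sets), not the particular flags it converges on.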

> http://www.rocklinux.net/packages/ccbench.html
This basically brute forces the genetic algorithms, with absolutely no
thought as to the net effects on the results of the given flags, eg, on
my home server (an AthlonXP 2400+), it returns these results:
gcc -O3 -march=athlon -fomit-frame-pointer -funroll-loops
-frerun-loop-opt -funroll-all-loops -fschedule-insns

Of that, '-frerun-loop-opt' and '-fschedule-insns' are redundant as
they are implied by -O3.

With -fomit-frame-pointer I can't debug code properly anymore, and if I
try to use -funroll-all-loops to compile mysql, even with its
--with-low-memory option, gcc wants 600mb of memory to compile its
sql_yacc.cc.

I had the same reaction. ccbench was what made me realize that any kind of system-wide optimization is only guesswork (and often bad guesswork at that).



> I meant by evolution: the process of users submitting patches to
> improve individual ebuilds.
What improves the performance of a given application on one machine
does NOT necessarily improve it on another machine.

True, but you would be in a much better position to test that fact than what we have now.


Read the gcc manpage and see:
-fprofile-arcs
-fbranch-probabilities
(also read http://gcc.gnu.org/news/profiledriven.html)

Just adding these to ccbench doubles the amount of time taken to
test (as you must compile with -fprofile-arcs, run, compile with
-fbranch-probabilities, run again). It also provides some extremely
interesting and varying results. The bubblesort test for example,
improves between +15% and +300% depending on the other compiler flags.
Towers of Hanoi goes from -20% to +50%.

If users submitted _good_ non-interactive testcases for every ebuild,
it wouldn't be difficult at all to apply
-fprofile-arcs/-fbranch-probabilities and/or acovea to most packages,
apart from the massive increase in compile time.

Couldn't one save the profile data in the portage tree once a generic use case was found?


> >Stable and high-performance is a per-system definition, as evidenced
> >by the bug I mentioned with -O3.
> And should as such be fixed... in gcc. If gcc can't optimize correctly
> knowing the cache size of the cpu, gcc is broken. Fix gcc.
Again, it isn't a gcc bug, it's an issue with a specific machine (not
even a class of systems or cpus).

Let's take a tangent on this whole issue for a moment. Ignoring the
implementation concerns, the end goal of your GLEP is this: the basic
gain you want is support for per-package CFLAG modifications (inside
the ebuilds), for the purpose of performance optimization.

Do I have this correct?

Yes, please ignore implementation details; they were only provided as an alternative example scenario. Very open for discussion =)

The goal is not the speed as such, but the testability of it. I want to move from the current situation, where you have absolutely no knowledge of the optimization results, to one where you would actually be able to give evidence of improvements (or the reverse).

Reusability of CFLAGS, if you wish =)



/John



