Re: GCC mini-summit - compiling for a particular architecture

2007-04-30 Thread Ben Elliston
On Mon, 2007-04-23 at 14:06 -0400, Diego Novillo wrote:

  So, I think there's a middle ground between "exactly the same passes on
  all targets" and "use Acovea for every CPU to pick what -O2 means".
  Using Acovea to reveal some of the surprising, but beneficial, results
  seems like a fine idea, though.
 
 I'm hoping to hear something along those lines at the next GCC Summit. I
 have heard of a bunch of work in academia doing extensive optimization
 space searches looking for combinations of pass sequencing and
 repetition to achieve optimal results.

When I finished my Master's project a couple of years ago, I felt that
iterative compilation was a failing idea.  It seems, though, that I
should have considered submitting a GCC Summit paper on my experiences
with GCC.  Perhaps next year, if it's still relevant!

Ben

-- 
Ben Elliston [EMAIL PROTECTED]
Australia Development Lab, IBM



RE: GCC mini-summit - compiling for a particular architecture

2007-04-30 Thread Ben Elliston
On Mon, 2007-04-23 at 19:26 +0100, Dave Korn wrote:

   Has any of the Acovea research demonstrated whether there actually is any
 such thing as a good default set of flags in all cases?  If the results
 obtained diverge significantly according to the nature/coding
 style/architecture/other uncontrolled variable factors of the application, we
 may be labouring under a false premise wrt. the entire idea, mightn't we?

My experimentation found that the sequences were highly dependent on the
input programs, as you might expect.  Therefore, it would be quite hard
to choose a good default set in all cases.  In fact, you could argue
that this is totally contrary to the whole idea of iterative
compilation, which assumes that there is no single good default set and
you're better off searching.

Cheers, Ben

-- 
Ben Elliston [EMAIL PROTECTED]
Australia Development Lab, IBM



Re: GCC mini-summit - compiling for a particular architecture

2007-04-24 Thread Sebastian Pop

On 4/23/07, Diego Novillo [EMAIL PROTECTED] wrote:

[EMAIL PROTECTED] wrote on 04/23/07 14:40:

 Any references?

Yes, at the last HiPEAC conference Grigori Fursin presented their
interactive compilation interface, which could be used for this.
http://gcc-ici.sourceforge.net/



That work is part of a European project called MilePost.  We will present
part of this ongoing work at the summit; see the abstract:
http://www.gccsummit.org/2007/view_abstract.php?content_key=37

Another related presentation at the summit is that of Haiping Wu:
http://www.gccsummit.org/2007/view_abstract.php?content_key=9

Sebastian


Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Richard Earnshaw
On Sun, 2007-04-22 at 17:32 -0700, Mark Mitchell wrote:
 Steve Ellcey wrote:
  It came up in a few side conversations.  As I understand it, RMS has
  decreed that the -On optimizations shall be architecture independent.
  That said, there are generic optimizations which really only apply
  to a single architecture, so there is some precedent for bending this
  rule.
  
  This seems unfortunate. 
 
 As others have said downthread, I don't think the idea that -O2 should
 enable the same set of optimizations on all processors is necessary or
 desirable.
 
 (In fact, there's nothing inherent in even using the same algorithms on
 all processors; I can well imagine that the best register allocation
 algorithms for x86 and Itanium might be entirely different.  I'm in no
 way trying to encourage an entire set of per-architecture optimization
 passes; clearly the more we can keep common the better!  But, our goal
 is to produce a compiler that generates the best possible code on
 multiple architectures, not to produce a compiler that uses the same
 algorithms and optimization options on all architectures.)
 
 I have never heard RMS opine on this issue.  However, I don't think that
 this is something that the SC or FSF need to decide.  The SC has made
 very clear that it doesn't want to interfere with day-to-day technical
 development of the compiler.  If the consensus of the maintainers is
 that it's OK to turn on some extra optimizations at -O2 on Itanium, then
 I think we can just make the change.  Of course, if people would prefer
 that I ask the SC, I'm happy to do so.


I think it would be nicer if this could be done in an MI way by examining
certain target properties.  For example, the generic framework might say
something like: 'if there are more than N gp registers, enable opt_foo at
-O2 or above'.

That way we can keep the framework centralized while at the same time
allowing bigger systems to exploit useful optimizations with the
standard options.

I know a number of targets turn off sched1 at -O2 because it simply
makes code much worse (due to increased register pressure).  We could
eliminate much of this machine-dependent tweaking with suitable generic
tests.
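
To make that concrete, here is a rough sketch in C of what such a
property-driven rule could look like (the structures, field names and
thresholds are hypothetical illustrations, not GCC's actual option
machinery):

#include <stdio.h>

/* Hypothetical target description; real targets expose far more.  */
struct target_props {
  int n_gp_registers;     /* general-purpose registers the target has */
  int has_rotating_regs;  /* e.g. IA-64 style rotating register files */
};

struct opt_settings {
  int enable_opt_foo;     /* stand-in for some -O2-gated optimization */
  int enable_sched1;      /* first scheduling pass                    */
};

/* Generic rule: decide -O2 defaults from target properties, not from
   the target's name.  */
static void
set_o2_defaults (const struct target_props *t, struct opt_settings *o)
{
  /* "If there are more than N gp registers, enable opt_foo at -O2."  */
  o->enable_opt_foo = t->n_gp_registers > 16;

  /* Register-poor targets tend to lose from early scheduling because
     of the added register pressure, so gate sched1 the same way.  */
  o->enable_sched1 = t->n_gp_registers >= 32;
}

int main (void)
{
  struct target_props big = { 128, 1 };   /* made-up numbers */
  struct target_props tiny = { 8, 0 };
  struct opt_settings o;

  set_o2_defaults (&big, &o);
  printf ("big:  opt_foo=%d sched1=%d\n", o.enable_opt_foo, o.enable_sched1);
  set_o2_defaults (&tiny, &o);
  printf ("tiny: opt_foo=%d sched1=%d\n", o.enable_opt_foo, o.enable_sched1);
  return 0;
}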

R.



Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Richard Kenner
 (In fact, there's nothing inherent in even using the same algorithms on
 all processors; I can well imagine that the best register allocation
 algorithms for x86 and Itanium might be entirely different.  I'm in no
 way trying to encourage an entire set of per-architecture optimization
 passes; clearly the more we can keep common the better!  But, our goal
 is to produce a compiler that generates the best possible code on
 multiple architectures, not to produce a compiler that uses the same
 algorithms and optimization options on all architectures.)
 
 I have never heard RMS opine on this issue.  However, I don't think that
 this is something that the SC or FSF need to decide.  

As far as I can recall, there's always been SOME hardware dependence
on the exact meaning of -O2 since, in at least a few cases, which
options were included in it depended on target options.


Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Mark Mitchell
Richard Earnshaw wrote:

 I think it would be nicer if this could be done in an MI way by examining
 certain target properties.  For example, the generic framework might say
 something like: 'if there are more than N gp registers, enable opt_foo at
 -O2 or above'.

Yes, I agree; wherever that technique works, it would make sense.
Perhaps that can be made to handle all the cases; I'm not sure.

I'm certainly not trying to suggest that we run SPEC on every
architecture, and then make -O2 be the set of optimization options that
happens to do best there, however bizarre.

But, I would certainly be happy with better performance, even at the
cost of some mild inconsistency between optimizations running on
different CPUs.  Your example of turning off sched1 is exactly the sort
of thing that seems reasonable to me, although I agree that if it can be
disabled in a machine-independent way, by checking some property of the
back end, that would be superior.

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: GCC mini-summit - benchmarks

2007-04-23 Thread Steve Ellcey
Jim Wilson wrote:

 Kenneth Hoste wrote:
  I'm not sure what 'tests' means here... Are test cases being extracted
  from the SPEC CPU2006 sources? Or are you referring to the validity tests
  of the SPEC framework itself (to check whether the output generated by
  some binary conforms with their reference output)?
 
 The claim is that SPEC CPU2006 has source code bugs that cause it to
 fail when compiled by gcc.  We weren't given a specific list of problems.

HJ, can you give us the specifics on the SPEC 2006 failures you were
seeing?

I remember the perlbench failure; it was IA64 specific and was due to
the SPEC config file spec_config.h, which defines the attribute keyword to
be null, thus eliminating all attributes.  On IA64 Linux, in the
/usr/include/bits/setjmp.h header file, the __jmp_buf buffer is defined
to have an aligned attribute on it.  If the buffer isn't aligned, the
perlbench program fails.
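
A rough C illustration of that failure mode (the macro spelling below is a
guess at what the SPEC config header effectively does, not its actual
contents):

#define __attribute__(x)   /* strip every attribute, as the SPEC config does */

#include <setjmp.h>        /* on IA64 glibc, __jmp_buf carries an aligned attribute */
#include <stdio.h>

static jmp_buf env;        /* with the attribute stripped, this may be under-aligned */

int main (void)
{
  /* setjmp/longjmp on a buffer that has lost its required alignment is
     what made perlbench fail at run time on IA64.  */
  printf ("alignment of jmp_buf here: %lu\n",
          (unsigned long) __alignof__ (env));
  return 0;
}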

I believe another problem was an uninitialized local variable in a 
Fortran program, but I don't recall which program or which variable
that was.

Steve Ellcey
[EMAIL PROTECTED]


Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Janis Johnson
On Sun, Apr 22, 2007 at 04:39:23PM -0700, Joe Buck wrote:
 
 On Sun, 2007-04-22 at 14:44 +0200, Richard Guenther wrote:
  At work we use -O3 since it gives a 5% performance gain over -O2.
  Profile feedback has many flags and there is no overview of it in the
  docs, IIRC. Who will use it except GCC developers? Who knows about your
  advice?
 
 On Sun, Apr 22, 2007 at 03:22:56PM +0200, Jan Hubicka wrote:
  Well, this is why -fprofile-generate and -fprofile-use were invented.
  Perhaps the docs can be improved so people actually discover them.  Do you
  have any suggestions?
 
 Docs could be improved, but this also might be a case where a tutorial
 would be needed, to teach users how to use it effectively.
 
  (Perhaps a chapter for FDO or a subchapter of the gcov docs would do?)

We could also have examples, with lots of comments, in the testsuite,
with references to them in the docs.  That way there is code that people
can try to see what kind of effect an optimization has on their system.
This would also, of course, provide at least minimal testing for more
optimizations; last time I looked, there were lots of them that were never
used in the testsuite.
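
For instance, a minimal, heavily commented example of the
-fprofile-generate / -fprofile-use workflow might look like this (the file
name, input size and hot/cold split are illustrative):

/* Build and run in three steps:
     gcc -O2 -fprofile-generate example.c -o example    (instrumented build)
     ./example                                          (collect a profile)
     gcc -O2 -fprofile-use example.c -o example         (optimized rebuild)
   The profile tells the compiler which branch below is actually hot, so it
   can lay out and optimize the function accordingly.  */

#include <stdio.h>

static long work (long n)
{
  long i, sum = 0;
  for (i = 0; i < n; i++)
    {
      if (i % 100 == 0)    /* cold path */
        sum -= i;
      else                 /* hot path, learned from the profile */
        sum += i;
    }
  return sum;
}

int main (void)
{
  printf ("%ld\n", work (100000000L));
  return 0;
}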

Janis


Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Kaveh R. GHAZI
On Mon, 23 Apr 2007, Mark Mitchell wrote:

 I'm certainly not trying to suggest that we run SPEC on every
 architecture, and then make -O2 be the set of optimization options that
 happens to do best there, however bizarre.

Why not?  Is your objection because SPEC doesn't reflect real-world apps
or because the option set might be bizarre?

The bizarreness of the resulting flag set should not be a consideration
IMHO.  Humans are bad at predicting how complex systems like this behave.
The fact that the best flags may be non-intuitive is not surprising. I
find the case of picking compiler options for optimizing code very much
like choosing which part of your code to hand optimize programmatically.
People often guess wrongly where their code spends its time; that's why we
have profilers.

So I feel we should choose our per-target flags using a tool like Acovea
to find what the best options are on a per-target basis.  Then we could
insert those flags into -O2 or -Om in each backend.  Whether we use SPEC
or the testcases included in Acovea or maybe GCC itself as the app to tune
for could be argued.  And some care would be necessary to ensure that the
resulting flags don't completely hose other large classes of apps.  But
IMHO once you decide to do per-target flags, something like this seems
like the natural conclusion.

http://www.coyotegulch.com/products/acovea/
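
As a toy sketch of the kind of search such a tool performs (the flag list,
the hill-climbing loop and the stubbed-out timing step are illustrative,
not Acovea's actual implementation):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char *flags[] = {
  "-funroll-loops", "-fno-strict-aliasing", "-fschedule-insns",
  "-frerun-cse-after-loop",
};
#define NFLAGS (sizeof flags / sizeof flags[0])

/* Placeholder fitness function: a real harness would compile and run a
   benchmark with the chosen flags appended to -O2 and return its time.  */
static double evaluate (unsigned mask)
{
  char cmd[256] = "gcc -O2";
  unsigned i;

  for (i = 0; i < NFLAGS; i++)
    if (mask & (1u << i))
      {
        strcat (cmd, " ");
        strcat (cmd, flags[i]);
      }
  printf ("would measure: %s benchmark.c\n", cmd);
  return (double) rand () / RAND_MAX;   /* stand-in for a measured runtime */
}

int main (void)
{
  unsigned best = 0;
  double best_time = evaluate (best);
  int step;

  /* Simple hill climb: flip one flag at a time, keep improvements.  */
  for (step = 0; step < 20; step++)
    {
      unsigned cand = best ^ (1u << (rand () % NFLAGS));
      double t = evaluate (cand);
      if (t < best_time)
        {
          best = cand;
          best_time = t;
        }
    }
  printf ("best flag mask found: %#x\n", best);
  return 0;
}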

--Kaveh
--
Kaveh R. Ghazi  [EMAIL PROTECTED]


Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Joe Buck

On Mon, 23 Apr 2007, Mark Mitchell wrote:
  I'm certainly not trying to suggest that we run SPEC on every
  architecture, and then make -O2 be the set of optimization options that
  happens to do best there, however bizarre.

On Mon, Apr 23, 2007 at 01:21:20PM -0400, Kaveh R. GHAZI wrote:
 Why not?  Is your objection because SPEC doesn't reflect real-world apps
 or because the option set might be bizarre?

In this case bizarre would mean untested: if we find some set of 15
options that maximizes SPEC performance, it's quite likely that no one
has used that option combination before; building complete distros with
this untested option set would almost certainly find bugs.

Still might be worth trying, but would require extra and careful testing.


Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Mark Mitchell
Kaveh R. GHAZI wrote:
 On Mon, 23 Apr 2007, Mark Mitchell wrote:
 
 I'm certainly not trying to suggest that we run SPEC on every
 architecture, and then make -O2 be the set of optimization options that
 happens to do best there, however bizarre.
 
 Why not?  Is your objection because SPEC doesn't reflect real-world apps
 or because the option set might be bizarre?

Some of both, I guess.  Certainly, SPEC isn't a good benchmark for some
CPUs or some users.  Also, I'd be very surprised to find that at -O2 my
inline functions weren't inlined, but I'd not be entirely amazed to find
out that on some processor inlining was, for some reason, a net negative on SPEC.
I'd be concerned that a SPEC run on Pentium IV doesn't necessarily guide
things well for Athlon, so, in practice, we'd still need some kind of
fallback default for CPUs where we didn't do the Acovea thing.

I'd also suspect if we move various chips to differing options, we'll
see more bugs due to rarely-tested combinations.  Yes, that will help us
fix latent bugs, but it may not improve the user experience.  We'll see
packages that used to compile on lots of platforms start to break more
often on one platform or another due either to latent bugs in GCC, or
latent bugs in the application.

So, I think there's a middle ground between "exactly the same passes on
all targets" and "use Acovea for every CPU to pick what -O2 means".
Using Acovea to reveal some of the surprising, but beneficial, results
seems like a fine idea, though.

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Diego Novillo
Mark Mitchell wrote on 04/23/07 13:56:

 So, I think there's a middle ground between "exactly the same passes on
 all targets" and "use Acovea for every CPU to pick what -O2 means".
 Using Acovea to reveal some of the surprising, but beneficial, results
 seems like a fine idea, though.

I'm hoping to hear something along those lines at the next GCC Summit. I
have heard of a bunch of work in academia doing extensive optimization
space searches looking for combinations of pass sequencing and
repetition to achieve optimal results.

My naive idea is for someone to test all these different combinations
and give us a set of -Ox recipes that we can use by default in the compiler.


Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Kenneth . Hoste

Citeren Kaveh R. GHAZI [EMAIL PROTECTED]:


On Mon, 23 Apr 2007, Mark Mitchell wrote:


I'm certainly not trying to suggest that we run SPEC on every
architecture, and then make -O2 be the set of optimization options that
happens to do best there, however bizarre.


Why not?  Is your objection because SPEC doesn't reflect real-world apps
or because the option set might be bizarre?

The bizarreness of the resulting flag set should not be a consideration
IMHO.  Humans are bad at predicting how complex systems like this behave.
The fact that the best flags may be non-intuitive is not surprising. I
find the case of picking compiler options for optimizing code very much
like choosing which part of your code to hand optimize programmatically.
People often guess wrongly where their code spends its time; that's why we
have profilers.

So I feel we should choose our per-target flags using a tool like Acovea
to find what the best options are on a per-target basis.  Then we could
insert those flags into -O2 or -Om in each backend.  Whether we use SPEC
or the testcases included in Acovea or maybe GCC itself as the app to tune
for could be argued.  And some care would be necessary to ensure that the
resulting flags don't completely hose other large classes of apps.  But
IMHO once you decide to do per-target flags, something like this seems
like the natural conclusion.

http://www.coyotegulch.com/products/acovea/



I totally agree with you, Acovea would be a good tool to help with this.
But, in my opinion, it has one big downside: it doesn't allow a
tradeoff between, for example, compilation time and execution time.


My work tries to tackle this: I'm using an evolutionary approach (like
Acovea) which is multi-objective (unlike Acovea), meaning it optimizes
a Pareto curve for compilation time, execution time and code size.  I'm also
trying to speed things up compared to the way Acovea handles them, which I
won't describe any further for now.


I'm currently only using the SPEC CPU2000 benchmarks (and I believe  
Acovea does too), but this shouldn't be a drawback: the methodology is  
completely independent of the benchmarks used. I'm also trying to make  
sure everything is parameterizable (size of population, number of  
generations, crossover/mutation/migration rates, ...).
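
The core of the multi-objective part is just a Pareto-dominance test; a
minimal sketch of that idea (illustrative structures and made-up numbers,
not the actual framework):

#include <stdio.h>

struct result {
  double compile_time;   /* seconds */
  double run_time;       /* seconds */
  double code_size;      /* bytes   */
};

/* a dominates b if it is no worse in every objective and strictly
   better in at least one; non-dominated points form the Pareto curve.  */
static int dominates (const struct result *a, const struct result *b)
{
  int no_worse = a->compile_time <= b->compile_time
                 && a->run_time <= b->run_time
                 && a->code_size <= b->code_size;
  int better   = a->compile_time < b->compile_time
                 || a->run_time < b->run_time
                 || a->code_size < b->code_size;
  return no_worse && better;
}

int main (void)
{
  struct result o2 = { 10.0, 8.0, 90000.0 };    /* made-up measurements */
  struct result o3 = { 14.0, 7.5, 110000.0 };

  /* Neither dominates the other: both stay on the Pareto curve, which
     is exactly the tradeoff a single fitness score would hide.  */
  printf ("o3 dominates o2? %d\n", dominates (&o3, &o2));
  printf ("o2 dominates o3? %d\n", dominates (&o2, &o3));
  return 0;
}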


Unfortunately, I'm having quite a lot of problems with weird  
combinations of flags (which generate bugs, as was mentioned in this  
thread), which is slowing things down. Instead of trying to fix these  
bugs, I'm just ignoring combinations of flags which generate them for  
now (although I'll try to report each bug I run into in Bugzilla).


Hopefully, this work will produce nice results by June, and I'll make  
sure to report on it on this mailing list once it's done.


greetings,

Kenneth

--

Statistics are like a bikini. What they reveal is suggestive, but what  
they conceal is vital (Aaron Levenstein)


Kenneth Hoste
ELIS - Ghent University
[EMAIL PROTECTED]
http://www.elis.ugent.be/~kehoste







RE: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Dave Korn
On 23 April 2007 19:07, Diego Novillo wrote:

 Mark Mitchell wrote on 04/23/07 13:56:
 
 So, I think there's a middle ground between "exactly the same passes on
 all targets" and "use Acovea for every CPU to pick what -O2 means".
 Using Acovea to reveal some of the surprising, but beneficial, results
 seems like a fine idea, though.
 
 I'm hoping to hear something along those lines at the next GCC Summit. I
 have heard of a bunch of work in academia doing extensive optimization
 space searches looking for combinations of pass sequencing and
 repetition to achieve optimal results.
 
 My naive idea is for someone to test all these different combinations
 and give us a set of -Ox recipes that we can use by default in the compiler.


  Has any of the Acovea research demonstrated whether there actually is any
such thing as a good default set of flags in all cases?  If the results
obtained diverge significantly according to the nature/coding
style/architecture/other uncontrolled variable factors of the application, we
may be labouring under a false premise wrt. the entire idea, mightn't we?



cheers,
  DaveK
-- 
Can't think of a witty .sigline today



Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Kenneth . Hoste

Citeren Diego Novillo [EMAIL PROTECTED]:


Mark Mitchell wrote on 04/23/07 13:56:


So, I think there's a middle ground between "exactly the same passes on
all targets" and "use Acovea for every CPU to pick what -O2 means".
Using Acovea to reveal some of the surprising, but beneficial, results
seems like a fine idea, though.


I'm hoping to hear something along those lines at the next GCC Summit. I
have heard of a bunch of work in academia doing extensive optimization
space searches looking for combinations of pass sequencing and
repetition to achieve optimal results.

My naive idea is for someone to test all these different combinations
and give us a set of -Ox recipes that we can use by default in the compiler.




Sorry to be blunt, but that's indeed quite naive :-)

Currently, the -On flags set/unset 60 flags, which yields 2^60 combinations.
If you also count the passes not controlled by a flag, but decided upon
depending on the optimization level, that adds another, virtual flag
(i.e. using -O1, -O2, -O3 or -Os as the base setting).


Evaluating a serious set of programs can easily take tens of minutes, so I
think you can easily see that trying all possible combinations is totally
infeasible.  The nice thing is you don't have to: you can learn from
the combinations you've already evaluated, you can start with only a subset
of programs and add others gradually, ...
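
A quick back-of-envelope check of that claim (illustrative arithmetic
only, assuming just one second per evaluation, which is far cheaper than
reality):

#include <stdio.h>

int main (void)
{
  double combos = 1152921504606846976.0;          /* 2^60 flag combinations */
  double years  = combos / (3600.0 * 24 * 365);   /* at one second each     */
  printf ("%.2e combinations ~ %.2e years\n", combos, years);
  return 0;
}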


My work is actually concentrating on building a framework to do  
exactly that: give a set of recipes for -On flags which allow a  
choice, and which are determined by trading off compilation time,  
execution time and code size.


I won't be at the GCC summit in Canada (I'm in San Diego then  
presenting some other work), but I'll make sure to announce our work  
when it's finished...


greetings,

Kenneth

--

Statistics are like a bikini. What they reveal is suggestive, but what  
they conceal is vital (Aaron Levenstein)


Kenneth Hoste
ELIS - Ghent University
[EMAIL PROTECTED]
http://www.elis.ugent.be/~kehoste




Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Diego Novillo
Dave Korn wrote on 04/23/07 14:26:

   Has any of the Acovea research demonstrated whether there actually is any
 such thing as a good default set of flags in all cases?  If the results

Not Acovea itself.  The research I'm talking about involves a compiler
whose pipeline can be modified and resequenced.  It's not just a matter
of adding -f/-m flags.  The research I've seen does a fairly good job of
modelling AI systems that traverse the immense search space looking for
different pass sequences.


 obtained diverge significantly according to the nature/coding
 style/architecture/other uncontrolled variable factors of the application, we
 may be labouring under a false premise wrt. the entire idea, mightn't we?

Yes.  That's why it's called research ;)  I'm sure we'll get an earful
of at least a couple of these at the summit.


RE: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Kenneth . Hoste

Citeren Dave Korn [EMAIL PROTECTED]:


On 23 April 2007 19:07, Diego Novillo wrote:


Mark Mitchell wrote on 04/23/07 13:56:


So, I think there's a middle ground between "exactly the same passes on
all targets" and "use Acovea for every CPU to pick what -O2 means".
Using Acovea to reveal some of the surprising, but beneficial, results
seems like a fine idea, though.


I'm hoping to hear something along those lines at the next GCC Summit. I
have heard of a bunch of work in academia doing extensive optimization
space searches looking for combinations of pass sequencing and
repetition to achieve optimal results.

My naive idea is for someone to test all these different combinations
and give us a set of -Ox recipes that we can use by default in the compiler.



  Has any of the Acovea research demonstrated whether there actually is any
such thing as a good default set of flags in all cases?  If the results
obtained diverge significantly according to the nature/coding
style/architecture/other uncontrolled variable factors of the application, we
may be labouring under a false premise wrt. the entire idea, mightn't we?


I don't think that has been shown. Acovea evaluation has only been  
done on a few architectures, and I don't believe a comparison was made.


But, you guys are probably the best team to try that out: you have
access to a wide range of platforms, and the experience needed to
solve problems when they present themselves.  Hopefully, my upcoming
framework will boost such an effort.  Remember: adjusting the way in
which GCC handles -On flags is only needed if the tests suggest it
will be useful.


greetings,

Kenneth






Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Diego Novillo
[EMAIL PROTECTED] wrote on 04/23/07 14:37:

 Currently, the -On flags set/unset 60 flags, which yields 2^60 combinations.
 If you also count the passes not controlled by a flag, but decided upon
 depending on the optimization level, that adds another, virtual flag
 (i.e. using -O1, -O2, -O3 or -Os as the base setting).

No, that's not what I want.  I want a static recipe.  I do *not* want
-Ox to do this search every time.

It goes like this: Somebody does a study over a set of applications that
represent certain usage patterns (say, FP and INT just to mention the
two more common classes of apps).  The slow search is done offline and
after a few months, we get the results in the form of a table that says
for each class and for each -Ox what set of passes to execute and in
what order they should be executed.

Not to say that the current sequencing and repetition are worthless, but
 I think they could be improved in a quasi-systematic way using this
process (which is slow and painful, I know).
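
The end product would be nothing more than a fixed lookup table; a toy
sketch of that shape (the class names, pass names and orderings are
invented placeholders, not a proposal for actual recipes):

#include <stdio.h>

enum app_class { APP_INT, APP_FP };

struct recipe {
  enum app_class klass;
  int opt_level;          /* 1, 2 or 3 */
  const char *passes;     /* ordered pass list, found by the offline search */
};

static const struct recipe recipes[] = {
  { APP_INT, 2, "ccp dce pre loop-invariant combine sched2" },
  { APP_FP,  2, "ccp dce loop-unroll pre combine sched1 sched2" },
  { APP_INT, 3, "ccp dce pre inline loop-unroll combine sched2" },
};

/* At -Ox the compiler would just consult the table; no search happens
   at compile time.  */
static const char *lookup (enum app_class k, int level)
{
  size_t i;
  for (i = 0; i < sizeof recipes / sizeof recipes[0]; i++)
    if (recipes[i].klass == k && recipes[i].opt_level == level)
      return recipes[i].passes;
  return "default pipeline";
}

int main (void)
{
  printf ("-O2, FP recipe: %s\n", lookup (APP_FP, 2));
  return 0;
}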


 My work is actually concentrating on building a framework to do  
 exactly that: give a set of recipes for -On flags which allow a  
 choice, and which are determined by trading off compilation time,  
 execution time and code size.

Right.  This is what I want.

 I won't be at the GCC summit in Canada (I'm in San Diego then  
 presenting some other work), but I'll make sure to announce our work  
 when it's finished...

Excellent.  Looking forward to those results.


Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Diego Novillo
[EMAIL PROTECTED] wrote on 04/23/07 14:40:

 Any references?

Yes, at the last HiPEAC conference Grigori Fursin presented their
interactive compilation interface, which could be used for this.
http://gcc-ici.sourceforge.net/

Ben Elliston had also experimented with a framework to allow GCC to
change the sequence of the passes from the command line.  Ben, where was
your thesis again?


Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Jeffrey Law
On Mon, 2007-04-23 at 10:56 -0700, Mark Mitchell wrote:
 Kaveh R. GHAZI wrote:
  On Mon, 23 Apr 2007, Mark Mitchell wrote:
  
  I'm certainly not trying to suggest that we run SPEC on every
  architecture, and then make -O2 be the set of optimization options that
  happens to do best there, however bizarre.
  
  Why not?  Is your objection because SPEC doesn't reflect real-world apps
  or because the option set might be bizarre?
 
 Some of both, I guess.  Certainly, SPEC isn't a good benchmark for some
 CPUs or some users.  Also, I'd be very surprised to find that at -O2 my
 inline functions weren't inlined, but I'd not be entirely amazed to find
 out that on some processor inlining was, for some reason, a net negative on SPEC.
 I'd be concerned that a SPEC run on Pentium IV doesn't necessarily guide
 things well for Athlon, so, in practice, we'd still need some kind of
 fallback default for CPUs where we didn't do the Acovea thing.
 
 I'd also suspect if we move various chips to differing options, we'll
 see more bugs due to rarely-tested combinations.  Yes, that will help us
 fix latent bugs, but it may not improve the user experience.  We'll see
 packages that used to compile on lots of platforms start to break more
 often on one platform or another due either to latent bugs in GCC, or
 latent bugs in the application.
 
 So, I think there's a middle ground between "exactly the same passes on
 all targets" and "use Acovea for every CPU to pick what -O2 means".
 Using Acovea to reveal some of the surprising, but beneficial, results
 seems like a fine idea, though.
If every target had a different set of optimizations on at -O2, then
I'd claim that we've gone horribly wrong somewhere -- both from a 
design and implementation standpoint.

If we're going to go down this path, I'd much rather see us tune 
towards target characteristics than towards targets themselves.

Enabling/disabling a generic optimizer on a per-target basis should
be the option of last resort.

Acovea can be helpful in the process of identifying issues, but I 
strongly believe that using it in the way some have suggested is
a cop-out for really delving into performance issues.


Jeff



Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Kenneth . Hoste

On 23 Apr 2007, at 20:43, Diego Novillo wrote:

[EMAIL PROTECTED] wrote on 04/23/07 14:37:

Currently, the -On flags set/unset 60 flags, which yields 2^60 combinations.
If you also count the passes not controlled by a flag, but decided upon
depending on the optimization level, that adds another, virtual flag
(i.e. using -O1, -O2, -O3 or -Os as the base setting).

No, that's not what I want.  I want a static recipe.  I do *not* want
-Ox to do this search every time.

I'm not saying you need to do this every time you install GCC. I'm  
saying trying out 2^60 combinations (even offline) is totally  
infeasible.



It goes like this: Somebody does a study over a set of applications that
represent certain usage patterns (say, FP and INT just to mention the
two most common classes of apps).  The slow search is done offline and
after a few months, we get the results in the form of a table that says
for each class and for each -Ox what set of passes to execute and in
what order they should be executed.

Not to say that the current sequencing and repetition are worthless, but
 I think they could be improved in a quasi-systematic way using this
process (which is slow and painful, I know).

Exactly my idea.



My work is actually concentrating on building a framework to do
exactly that: give a set of recipes for -On flags which allow a
choice, and which are determined by trading off compilation time,
execution time and code size.

Right.  This is what I want.

I won't be at the GCC summit in Canada (I'm in San Diego then
presenting some other work), but I'll make sure to announce our work
when it's finished...

Excellent.  Looking forward to those results.

Cool!

--

Statistics are like a bikini. What they reveal is suggestive, but what  
they conceal is vital (Aaron Levenstein)


Kenneth Hoste
ELIS - Ghent University
[EMAIL PROTECTED]
http://www.elis.ugent.be/~kehoste




Re: GCC mini-summit - benchmarks

2007-04-23 Thread H. J. Lu
On Mon, Apr 23, 2007 at 09:49:04AM -0700, Steve Ellcey wrote:
 Jim Wilson wrote:
 
  Kenneth Hoste wrote:
   I'm not sure what 'tests' means here... Are test cases being extracted
   from the SPEC CPU2006 sources? Or are you referring to the validity tests
   of the SPEC framework itself (to check whether the output generated by
   some binary conforms with their reference output)?
  
  The claim is that SPEC CPU2006 has source code bugs that cause it to
  fail when compiled by gcc.  We weren't given a specific list of problems.
 
 HJ, can you give us the specifics on the SPEC 2006 failures you were
 seeing?
 
 I remember the perlbench failure; it was IA64 specific and was due to
 the SPEC config file spec_config.h, which defines the attribute keyword to
 be null, thus eliminating all attributes.  On IA64 Linux, in the
 /usr/include/bits/setjmp.h header file, the __jmp_buf buffer is defined
 to have an aligned attribute on it.  If the buffer isn't aligned, the
 perlbench program fails.
 
 I believe another problem was an uninitialized local variable in a 
 Fortran program, but I don't recall which program or which variable
 that was.


This is what I sent to SPEC.


H.J.

465.tonto in SPEC CPU 2006 has many checks for pointers like:

ENSURE(NOT associated(self%ex),SHELL1:copy_1 ... ex not destroyed)

It calls the associated intrinsic on a pointer to check whether the
pointer is pointing to some data.

According to the Fortran standard, associated should be used on
initialized pointers. The behavior of calling associated on
uninitialized pointers is undefined or compiler dependent.

Tonto tries to initialize pointers in structures by calling
nullify_ptr_part_ on stack variables to ensure that pointers
in stack variables are properly initialized. However, not all
pointers in stack variables are initialized. As a result,
the binary compiled by gcc 4.2 failed at runtime with

Error in routine SHELL1:copy_1 ... ex not destroyed

I am enclosing a patch which adds those missing calls of
nullify_ptr_part_.
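
A C analogue of the problem (illustrative only, not tonto's actual code):
testing an uninitialized pointer is just as undefined as calling
associated on an uninitialized Fortran pointer.

#include <stdio.h>

struct shell { double *ex; };

int main (void)
{
  struct shell s;            /* s.ex is never nullified or assigned       */
  struct shell t = { NULL }; /* the fix: nullify the pointer parts first  */

  /* Undefined: s.ex holds whatever happens to be on the stack, so this
     test can go either way -- the same way tonto's ENSURE check fired.  */
  if (s.ex != NULL)
    printf ("looks associated (bogus)\n");

  if (t.ex == NULL)
    printf ("properly nullified\n");
  return 0;
}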

Thanks.


H.J.


Re: GCC mini-summit - compiling for a particular architecture

2007-04-22 Thread Zdenek Dvorak
 Look at what we're starting from:
 
 
 @item -funroll-loops
 @opindex funroll-loops
 Unroll loops whose number of iterations can be determined at compile
 time or upon entry to the loop.  @option{-funroll-loops} implies
 @option{-frerun-cse-after-loop}.  This option makes code larger,
 and may or may not make it run faster.
 
 @item -funroll-all-loops
 @opindex funroll-all-loops
 Unroll all loops, even if their number of iterations is uncertain when
 the loop is entered.  This usually makes programs run more slowly.
 @option{-funroll-all-loops} implies the same options as
 @option{-funroll-loops},
 
 
 It could gain a few more paragraphs written by knowledgeable people.
 And expanding documentation doesn't introduce regressions :).

but it also does not make anyone actually use the options.  Nobody reads
the documentation.  Of course, this is a bit of an overstatement, but with a
few exceptions, people in general do not enable non-default flags.
Zdenek


Re: GCC mini-summit - compiling for a particular architecture

2007-04-22 Thread Laurent GUERBY
  but it also does not make anyone actually use the options.  Nobody reads
  the documentation.  Of course, this is a bit of an overstatement, but with a
  few exceptions, people in general do not enable non-default flags.
 
 I don't think this is fair.
 Most people don't read the docs because they don't care about
 performance, but most people who develop code that spends a lot of CPU
 cycles actually read the docs at least up to loop unrolling.

Exactly my experience.

Unfortunately there's no useful information on this topic in the GCC
manual...

Laurent




Re: GCC mini-summit - compiling for a particular architecture

2007-04-22 Thread Richard Guenther

On 4/22/07, Laurent GUERBY [EMAIL PROTECTED] wrote:

On Sun, 2007-04-22 at 14:44 +0200, Richard Guenther wrote:
 On 4/22/07, Laurent GUERBY [EMAIL PROTECTED] wrote:
   but it also does not make anyone actually use the options.  Nobody reads
   the documentation.  Of course, this is a bit of an overstatement, but with a
   few exceptions, people in general do not enable non-default flags.
  
   I don't think this is fair.
   Most people don't read the docs because they don't care about
   performance, but most people who develop code that spends a lot of CPU
   cycles actually read the docs at least up to loop unrolling.
 
  Exactly my experience.
 
  Unfortunately there's no useful information on this topic in the GCC
  manual...

 Well, we have too many switches really.  So the default is use -O2.  If you
 want extra speed, try -O3, or even better use profile feedback.  (Not many
 people realize that with profile feedback you get faster code than with
 -O3 and smaller code than with -Os - at least for C++ programs)

At work we use -O3 since it gives a 5% performance gain over -O2.
Profile feedback has many flags and there is no overview of it in the
docs, IIRC. Who will use it except GCC developers? Who knows about your
advice?

The GCC user documentation is the place...


Well, I agree.  A GCC Optimization Guide would be a nice thing to have,
besides the individual flags documentation.  Of course unless someone is
volunteering...

Richard.


Re: GCC mini-summit - compiling for a particular architecture

2007-04-22 Thread Zdenek Dvorak
Hello,

 On Sun, 2007-04-22 at 14:44 +0200, Richard Guenther wrote:
  On 4/22/07, Laurent GUERBY [EMAIL PROTECTED] wrote:
  but it also does not make anyone actually use the options.  Nobody reads
  the documentation.  Of course, this is a bit of an overstatement, but with a
  few exceptions, people in general do not enable non-default flags.
   
I don't think this is fair.
Most people don't read the docs because they don't care about
performance, but most people who develop code that spends a lot of CPU
cycles actually read the docs at least up to loop unrolling.
  
   Exactly my experience.
  
   Unfortunately there's no useful information on this topic in the GCC
   manual...
  
  Well, we have too many switches really.  So the default is use -O2.  If you
  want extra speed, try -O3, or even better use profile feedback.  (Not many
  people realize that with profile feedback you get faster code than with
  -O3 and smaller code than with -Os - at least for C++ programs)
 
 At work we use -O3 since it gives a 5% performance gain over -O2.
 Profile feedback has many flags

actually, only two that are really important -- -fprofile-generate
and -fprofile-use.

Zdenek


Re: GCC mini-summit - compiling for a particular architecture

2007-04-22 Thread Laurent GUERBY
On Sun, 2007-04-22 at 15:22 +0200, Jan Hubicka wrote:
  At work we use -O3 since it gives a 5% performance gain over -O2.
  Profile feedback has many flags and there is no overview of it in the
  docs, IIRC. Who will use it except GCC developers? Who knows about your
  advice?
 
 Well, this is why -fprofile-generate and -fprofile-use were invented.
 Perhaps the docs can be improved so people actually discover them.  Do you
 have any suggestions?
 (Perhaps a chapter for FDO or a subchapter of the gcov docs would do?)

I don't have approval rights for documentation, but I'd say just
create an Optimization Guide chapter with a subchapter detailing an
example of how to use feedback-based compilation on a simple piece of
source code that gives a nice speedup on a popular architecture.

Then let's hope others will come in with their favourite advice :).

Laurent




Re: GCC mini-summit - compiling for a particular architecture

2007-04-22 Thread Joe Buck

On Sun, 2007-04-22 at 14:44 +0200, Richard Guenther wrote:
  At work we use -O3 since it gives a 5% performance gain over -O2.
  Profile feedback has many flags and there is no overview of it in the
  docs, IIRC. Who will use it except GCC developers? Who knows about your
  advice?

On Sun, Apr 22, 2007 at 03:22:56PM +0200, Jan Hubicka wrote:
 Well, this is why -fprofile-generate and -fprofile-use were invented.
 Perhaps the docs can be improved so people actually discover them.  Do you
 have any suggestions?

Docs could be improved, but this also might be a case where a tutorial
would be needed, to teach users how to use it effectively.

 (Perhaps a chapter for FDO or a subchapter of the gcov docs would do?)


Re: GCC mini-summit - compiling for a particular architecture

2007-04-22 Thread Mark Mitchell
Steve Ellcey wrote:
 It came up in a few side conversations.  As I understand it, RMS has
 decreed that the -On optimizations shall be architecture independent.
 That said, there are generic optimizations which really only apply
 to a single architecture, so there is some precedent for bending this
 rule.
 
 This seems unfortunate. 

As others have said downthread, I don't think the idea that -O2 should
enable the same set of optimizations on all processors is necessary or
desirable.

(In fact, there's nothing inherent in even using the same algorithms on
all processors; I can well imagine that the best register allocation
algorithms for x86 and Itanium might be entirely different.  I'm in no
way trying to encourage an entire set of per-architecture optimization
passes; clearly the more we can keep common the better!  But, our goal
is to produce a compiler that generates the best possible code on
multiple architectures, not to produce a compiler that uses the same
algorithms and optimization options on all architectures.)

I have never heard RMS opine on this issue.  However, I don't think that
this is something that the SC or FSF need to decide.  The SC has made
very clear that it doesn't want to interfere with day-to-day technical
development of the compiler.  If the consensus of the maintainers is
that it's OK to turn on some extra optimizations at -O2 on Itanium, then
I think we can just make the change.  Of course, if people would prefer
that I ask the SC, I'm happy to do so.

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: GCC mini-summit

2007-04-22 Thread Mark Mitchell
Ian Lance Taylor wrote:
 We held a GCC mini-summit at Google on Wednesday, April 18.  About 40
 people came.  This is my very brief summary of what we talked about.
 Corrections and additions very welcome.

Thank you for the summary.

I am disappointed that I wasn't able to attend, as it sounds like it was
a very productive day.  Had it not been for other obligations, I would
certainly have been there.

I particularly appreciate the summary of feedback around 4.2 and the
release process.  I've been thinking about what's there, the feedback in
response to my last GCC 4.2 status report, and various other opinions
that have been presented to me.  I've got some ideas, but I want to let
them percolate a bit before trying to write them down.  In any case, I
want people to know that I'm listening.

Thanks,

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: GCC mini-summit - compiling for a particular architecture

2007-04-21 Thread Robert Dewar

Mike Stump wrote:

On Apr 20, 2007, at 6:42 PM, Robert Dewar wrote:

One possibility would be to have a -Om switch (or whatever) that
says do all optimizations for this machine that help.


Ick, gross.  No.


Well OK, Ick, but below you recommend removing the overly
pedantic rule. I agree with that, but the above is a
compromise suggestion if we can't remove the rule.

So, Mike, my question is: assuming we cannot remove the
rule, what do you want to do?

a) nothing
b) something like the above
c) something else, please specify



I must say the rule about all optimizations being the same on
all machines seems odd to me


I'd look at it this way: it isn't unreasonable to have cost metrics
that are in fact different for each cpu, and possibly for each tune choice,
that greatly affect _any_ codegen choice.  Sure, we can always unroll
loops on all targets, but we can crank up the costs of extra
instructions on chips where those costs are high; net result, almost
no unrolling.  For chips where the costs are cheap and which need to
expose instructions to be able to optimize further, trivially, the
costs involved are totally different.  Net result, better code gen
for each.
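
A rough sketch of that shape of decision in C (hypothetical cost
structures and thresholds, not GCC's real cost hooks):

#include <stdio.h>

struct cpu_costs {
  int extra_insn_cost;   /* relative cost of each duplicated instruction */
  int branch_cost;       /* relative cost of a loop back-edge branch     */
};

/* One generic rule: keep doubling the unroll factor while the branch we
   remove is worth more than the body copy we add.  On a chip where extra
   instructions are expensive this collapses to almost no unrolling.  */
static int choose_unroll_factor (const struct cpu_costs *c, int body_insns)
{
  int factor = 1;
  while (factor < 8)
    {
      int saved = c->branch_cost;
      int added = body_insns * c->extra_insn_cost;
      if (added >= saved)
        break;
      factor *= 2;
    }
  return factor;
}

int main (void)
{
  struct cpu_costs tight = { 4, 2 };   /* made-up: costly extra insns       */
  struct cpu_costs wide  = { 1, 6 };   /* made-up: cheap insns, dear branch */

  printf ("tight chip: unroll x%d, wide chip: unroll x%d\n",
          choose_unroll_factor (&tight, 5), choose_unroll_factor (&wide, 5));
  return 0;
}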


I do however think the concept of not allowing targets to set and  
unset optimization choices is, well, overly pedantic.




Re: GCC mini-summit - benchmarks

2007-04-21 Thread Jim Wilson

Kenneth Hoste wrote:
I'm not sure what 'tests' means here... Are test cases being extracted 
from the SPEC CPU2006 sources? Or are you referring to the validity tests 
of the SPEC framework itself (to check whether the output generated by 
some binary conforms with their reference output)?


The claim is that SPEC CPU2006 has source code bugs that cause it to 
fail when compiled by gcc.  We weren't given a specific list of problems.


There are known problems with older SPEC benchmarks though.  For 
instance, vortex fails on some targets unless compiled with 
-fno-strict-aliasing.
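
The classic illustration of that kind of aliasing problem (not vortex's
actual code) is type punning through an incompatible pointer:

#include <stdio.h>

/* The compiler may assume ip and fp never alias (their types are
   incompatible), so under strict aliasing it is free to reorder or cache
   the accesses; -fno-strict-aliasing disables that assumption.  */
static int set_and_read (int *ip, float *fp)
{
  *ip = 1;
  *fp = 2.0f;
  return *ip;
}

int main (void)
{
  int x = 0;
  /* Writing x through a float * violates C's aliasing rules; with
     strict aliasing the result need not be the bit pattern of 2.0f.  */
  printf ("%d\n", set_and_read (&x, (float *) &x));
  return 0;
}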

--
Jim Wilson, GNU Tools Support, http://www.specifix.com


Re: GCC mini-summit - compiling for a particular architecture

2007-04-21 Thread Laurent GUERBY
On Fri, 2007-04-20 at 19:28 -0400, Robert Dewar wrote:
 Steve Ellcey wrote:
 
  This seems unfortunate.  I was hoping I might be able to turn on loop
  unrolling for IA64 at -O2 to improve performance.  I have only started
  looking into this idea but it seems to help performance quite a bit,
  though it is also increasing size quite a bit too so it may need some
  modification of the unrolling parameters to make it practical.
 
 To me it is obvious that optimizations are target dependent. For
 instance loop unrolling is really a totally different optimization
 on the ia64 as a result of the rotating registers.

My feeling is that it would be much more useful to have more detailed
documentation on optimization flags in the GCC manual that at least
mentions the types of source code and architectures where each
optimization option is interesting, rather than to mess with new flags or
change longstanding -On policies.

Look at what we're starting from:


@item -funroll-loops
@opindex funroll-loops
Unroll loops whose number of iterations can be determined at compile
time or upon entry to the loop.  @option{-funroll-loops} implies
@option{-frerun-cse-after-loop}.  This option makes code larger,
and may or may not make it run faster.

@item -funroll-all-loops
@opindex funroll-all-loops
Unroll all loops, even if their number of iterations is uncertain when
the loop is entered.  This usually makes programs run more slowly.
@option{-funroll-all-loops} implies the same options as
@option{-funroll-loops},


It could gain a few more paragraphs written by knowledgeable people.
And expanding documentation doesn't introduce regressions :).
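
For instance, one small worked example such a paragraph could show
(illustrative only; actual results depend heavily on the target):

/* Compare:   gcc -O2 sum.c        vs.   gcc -O2 -funroll-loops sum.c
   A fixed-trip-count reduction loop is the classic candidate: unrolling
   removes most of the branch and induction-variable overhead, at the
   price of larger code.  */

#include <stdio.h>

int main (void)
{
  double a[1024], sum = 0.0;
  int i, rep;

  for (i = 0; i < 1024; i++)
    a[i] = i * 0.5;

  for (rep = 0; rep < 100000; rep++)
    for (i = 0; i < 1024; i++)    /* trip count known at compile time */
      sum += a[i];

  printf ("%f\n", sum);
  return 0;
}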

Laurent




Re: GCC mini-summit - compiling for a particular architecture

2007-04-21 Thread Mike Stump

On Apr 21, 2007, at 3:12 AM, Robert Dewar wrote:
So, Mike, my question is: assuming we cannot remove the rule, what
do you want to do?


I think in the end each situation is different and we have to find
the best solution for each one.  So, in that spirit, let's open
a discussion about the exact case you're thinking of.


Now, the closest I've come to -Om in the past would be -fast, which
means 'tune for SPEC'.  :-)


Re: GCC mini-summit - benchmarks

2007-04-20 Thread Kenneth Hoste


On 20 Apr 2007, at 08:30, Ian Lance Taylor wrote:


11) H.J. Lu discussed SPEC CPU 2006.  He reported that a couple of the
tests do not run successfully, and it appears to be due to bugs in
the tests which cause gcc to compile them in unexpected ways.  He
has been reporting the problems to the SPEC committee, but is
being ignored.  He encouraged other people to try it themselves
and make their own reports.



I'm not sure what 'tests' means here... Are test cases being extracted
from the SPEC CPU2006 sources? Or are you referring to the validity
tests of the SPEC framework itself (to check whether the output  
generated by some binary conforms with their reference output)?




12) Jose Dana reported his results comparing different versions of gcc
and icc at CERN.  They have a lot of performance-sensitive C++
code, and have gathered a large number of standalone snippets
which they use for performance comparisons.  Some of these can
directly become bugzilla enhancement requests.  In general this
could ideally be turned into a free performance testsuite.  All
the code is under free software licenses.


Is this performance testsuite available somewhere? Sounds interesting  
to add to my (long) list of benchmark suites.


greetings,

Kenneth

--

Statistics are like a bikini. What they reveal is suggestive, but  
what they conceal is vital (Aaron Levenstein)


Kenneth Hoste
ELIS - Ghent University
[EMAIL PROTECTED]
http://www.elis.ugent.be/~kehoste


Re: GCC mini-summit - compiling for a particular architecture

2007-04-20 Thread Kenneth Hoste


On 20 Apr 2007, at 08:30, Ian Lance Taylor wrote:


13) Michael Meissner raised the idea of compiling functions
differently for different processors, choosing the version based
on a runtime decision.  This led to some discussion of how this
could be done effectively.  In particular if there is an
architectural difference, such as Altivec, you may get prologue
instructions which save and restore registers which are not
present on all architectures.



Related to this: have you guys ever considered making the -On
flags dependent on the architecture?
Right now, the decision of which flags are activated at each -On level is
independent of the architecture, I believe (except for flags which
need to be disabled to ensure correct code generation, such as
-fschedule-insns for x86). I must say I haven't looked into this in
great detail, but at least for the passes controlled by flags on x86,
this seems to be the case.


I think choosing the flags as a function of the architecture you are
compiling for might be highly beneficial (see
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31528 for example).


greetings,

Kenneth

--

Statistics are like a bikini. What they reveal is suggestive, but  
what they conceal is vital (Aaron Levenstein)


Kenneth Hoste
ELIS - Ghent University
[EMAIL PROTECTED]
http://www.elis.ugent.be/~kehoste


Re: GCC mini-summit - compiling for a particular architecture

2007-04-20 Thread Ollie Wild

Related to this: have you guys ever considered making the -On
flags dependent on the architecture?


It came up in a few side conversations.  As I understand it, RMS has
decreed that the -On optimizations shall be architecture independent.
That said, there are generic optimizations which really only apply
to a single architecture, so there is some precedent for bending this
rule.

There were also suggestions of making the order of optimizations
command line configurable and allowing dynamically loaded libraries to
register new passes.

Ollie


Re: GCC mini-summit - unicorn with rainbows

2007-04-20 Thread Benjamin Kosnik

 10) Eric Christopher reported that Tom Tromey (who was not present)
 had suggested a new mascot for gcc: a unicorn with rainbows.  This
was met with general approval, and Eric suggested that everybody
e-mail Tom with their comments.  I personally would like to see
the drawing.

This sounds fantastic. Rainbow tattoos, or more like shooting a
rainbow out of the uni-horn?

So cool! 

Ian, thanks for hosting this, for championing the idea of a free
mini-conf, and for writing up what was talked about for the rest of
us.

I'm sorry to have missed the ice cream.

-benjamin



Re: GCC mini-summit

2007-04-20 Thread Steven Bosscher

On 4/20/07, Vladimir N. Makarov [EMAIL PROTECTED] wrote:

I am afraid that merging it earlier stops progress on the df
infrastructure (e.g. Ken will work only on LTO)


There's nothing holding you, and many others, back from helping out,
other than that the work is on a branch.  By merging, the rest of the
community will hopefully start helping to exploit the good things
of the df framework.


and also that
further transition of some optimizations to the df infrastructure
will make code even slower, and finally we will again have a slower
compiler with worse code.


Ah, speculation.  Why do you think this?  Have you even looked at what
is _really_ going on? Like, some optimizations computing things they
would already have available if they used the df infrastructure?  Maybe you
can be more specific about your concerns, instead of spreading FUD.

Gr.
Steven


Re: GCC mini-summit

2007-04-20 Thread Bernd Schmidt

Vladimir N. Makarov wrote:


 And I disagree that it is within the compilation time guidelines set
by the SC.  Ken fixed a big compilation time degradation a few days ago
and preliminarily what I see now (in comparison with the last merge point)
is

x86_64
SPECInt2000 5.7%
SPECFp2000  8.7%

ppc64
SPECInt2000 6.5%
SPECFp2000  5.5%

Itanium
SPECInt2000 9%
SPECFp2000  10.9%

Besides, as I understand it, the SC criteria mean that there should be
no degradation in code quality.  There is a code size degradation of about
1% and some degradation on SPEC2000 (e.g. about 2% degradation on a
few tests on ia64).


I'll be away for a week, so I'll miss most of the big flamewar, but I'd 
just like to throw in my opinion that based on these numbers I don't see 
why we're even considering it for inclusion at this point.


I also agree with Vlad's point that it needs to be reviewed before being 
committed to mainline.



Bernd
--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH  Wilhelm-Wagenfeld-Str. 6  80807 Muenchen
Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, Vincent Roche, Joseph E. McDonough


Re: GCC mini-summit

2007-04-20 Thread Steven Bosscher

On 4/20/07, Vladimir N. Makarov [EMAIL PROTECTED] wrote:

Did I not write several times that the data structure of DF is too fat
(because of rtl info duplication) and that this is probably the problem?


Yes, you have complained that you believe the data structure of DF is
too fat. I guess that is a valid complaint. I don't see the rtl info
duplication though. You've only complained about the current data
structures, but I have not really seen you propose anything leaner.



Is it
not reasonable that using fatter structures without changing
algorithms makes the compiler slower?


No, because the "without changing algorithms" part is what you're assuming.
But if you look at, e.g., regmove, you see that many of the insn
chain walks could easily be replaced with simpler/faster reg-def
chains, which are always available in the df framework. Likewise for
the changed registers for CPROP, which runs three times(!).  It is
easy to really speed this pass up by using the df framework to, e.g.,
replace things like oprs_unchanged_p and the whole reg_avail mess.
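
In sketch form (deliberately not the real df API, just the shape of the
idea):

/* With per-register def/use chains maintained by a dataflow framework,
   a question like "is this register defined exactly once?" becomes a
   chain lookup instead of a walk over the whole insn stream.  */

#include <stdio.h>

struct insn;                      /* opaque instruction                  */

struct df_ref {
  struct insn *insn;              /* instruction containing the ref      */
  struct df_ref *next_in_chain;   /* next def (or use) of the same reg   */
};

struct reg_info {
  struct df_ref *defs;            /* all definitions of this register    */
  struct df_ref *uses;            /* all uses of this register           */
};

static int single_def_p (const struct reg_info *reg)
{
  return reg->defs != 0 && reg->defs->next_in_chain == 0;
}

int main (void)
{
  struct df_ref d = { 0, 0 };
  struct reg_info r = { &d, 0 };
  printf ("single def: %d\n", single_def_p (&r));
  return 0;
}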

Gr.
Steven


Re: GCC mini-summit

2007-04-20 Thread Vladimir N. Makarov

Steven Bosscher wrote:


On 4/20/07, Vladimir N. Makarov [EMAIL PROTECTED] wrote:


Did I not write several times that the data structure of DF is too fat
(because of rtl info duplication) and that this is probably the problem?



Yes, you have complained that you believe the data structure of DF is
too fat. I guess that is a valid complaint. I don't see the rtl info
duplication though. You've only complained about the current data
structures, but I have not really seen you propose anything leaner.

I did propose to attach the info to the rtl reg.  It is a more difficult way
than the current approach because of code sharing before reload.
But maybe it is more rewarding.  To be honest, I don't know.





Is it
not reasonable that using fatter structures without changing
algorithms makes the compiler slower?



No, because the "without changing algorithms" part is what you're assuming.
But if you look at, e.g., regmove, you see that many of the insn
chain walks could easily be replaced with simpler/faster reg-def
chains, which are always available in the df framework. Likewise for
the changed registers for CPROP, which runs three times(!).  It is
easy to really speed this pass up by using the df framework to, e.g.,
replace things like oprs_unchanged_p and the whole reg_avail mess.


Sorry, I wrote several times that I am not against a
df infrastructure, because I believed in the example you just gave too.
But according to your logic the compiler should be faster.  That has not
happened yet, because the df advantages are less significant than its
disadvantages for now, imho.


Your argument tells me that if you find another such optimization,
rewrite it and speed it up, the df advantages will overtake its
disadvantages and it can be merged without my objections (although who
cares about my opinion).  That is another proposal of mine.




Re: GCC mini-summit

2007-04-20 Thread Steven Bosscher

On 4/20/07, Vladimir N. Makarov [EMAIL PROTECTED] wrote:

 Yes, you have complained that you believe the data structure of DF is
 too fat. I guess that is a valid complaint. I don't see the rtl info
 duplication though. You've only complained about the current data
 structures, but I have not really seen you propose anything leaner.

I did propose to attach the info to the rtl reg.  It is a more difficult way
than the current approach because of code sharing before reload.
But maybe it is more rewarding.  To be honest, I don't know.


Changing the dataflow solvers is one thing.  Changing a fundamental
assumption in the compiler, namely that pseudoregs are shared, is
another.  We rely on that property in so many places. It would not
require rewriting parts of the compiler, but rewriting the entire
backend.

Gr.
Steven


Re: GCC mini-summit

2007-04-20 Thread Vladimir N. Makarov

Steven Bosscher wrote:


On 4/20/07, Vladimir N. Makarov [EMAIL PROTECTED] wrote:


I am afraid that merging it earlier stops progress on the df
infrastructure (e.g. Ken will work only on LTO)



There's nothing holding you, and many others, back from helping out,
other than that the work is on a branch.  By merging, the rest of the
community will hopefully start helping to exploit the good things
of the df framework.


and also that
further transition of some optimizations to the df infrastructure
will make code even slower, and finally we will again have a slower
compiler with worse code.



Ah, speculation.  Why do you think this?  Have you even looked at what
is _really_ going on? Like, some optimizations computing things they
would already have available if they used the df infrastructure?  Maybe you
can be more specific about your concerns, instead of spreading FUD.


Steven, could you stop spreading FUD about me spreading FUD?

Did I not write several times that the data structure of DF is too fat 
(because of rtl info duplication) and that this is probably the problem?  Is it 
not reasonable that using fatter structures without changing 
algorithms makes the compiler slower?




Re: GCC mini-summit - unicorn with rainbows

2007-04-20 Thread Joe Buck


  10) Eric Christopher reported that Tom Tromey (who was not present)
  had suggested a new mascot for gcc: a unicorn with rainbows.  This
 was met with general approval, and Eric suggested that everybody
 e-mail Tom with their comments.  I personally would like to see
 the drawing.

On Fri, Apr 20, 2007 at 06:02:46AM -0400, Benjamin Kosnik wrote:
 This sounds fantastic. Rainbow tattoos, or more like shooting a
 rainbow out of the uni-horn?

I originally proposed the gnu-emerging-from-an-egg idea; it was based
on the fact that some of us were pronouncing egcs "eggs", and suggested
that the compiler was being re-born.  But we haven't been egcs for a long
time, and I no longer like the existing logo much.  So I'm fine with the
unicorn idea.  (I had to leave the summit before this topic came up.)

 Ian, thanks for hosting this, for championing the idea of a free
 mini-conf, and for writing up what was talked about for the rest of
 us.

Agreed; thanks Ian.

 I'm sorry to have missed the ice cream.

I missed the ice cream, but I did get free lunch in the famous Google
cafeteria.



Re: GCC mini-summit - compiling for a particular architecture

2007-04-20 Thread Joe Buck
On Fri, Apr 20, 2007 at 12:58:39AM -0700, Ollie Wild wrote:
 Related to this: have you guys ever considered making the -On
 flags dependent on the architecture?
 
 It came up in a few side conversations.  As I understand it, RMS has
 decreed that the -On optimizations shall be architecture independent.

But decrees of this kind from RMS (on purely technical matters) are
negotiable.  On matters of free software principle, RMS is the law.  On
technical matters he's (IMHO) just one hacker, though as the original
author of gcc he should get respect and a certain amount of deference.

If champions of this idea can make the case that the benefits outweigh the
costs by a significant factor, it could be considered.

But there are considerable costs: paths that get lots of testing are
solid; paths that get less testing aren't.  If every port uses a different
set of optimizations we will see more apparently target-specific bugs
that really aren't target-specific.



Re: GCC mini-summit - Patch tracker

2007-04-20 Thread Tom Tromey
Ian I proposed automatic e-mail pings, but that wasn't generally
Ian welcomed.

Bummer.  Why?

Dan If people are okay with this, I have no problem implementing it.

If you're taking feature requests, it would be handy to canonize the
Area field somehow.  I was filtering based on preprocessor and then
yesterday noticed things filed against libcpp and cpp.
Alternatively, filtering by regex would work just as well for me.

Tom


RE: GCC mini-summit - Patch tracker

2007-04-20 Thread Dave Korn
On 20 April 2007 18:43, Tom Tromey wrote:

 Ian I proposed automatic e-mail pings, but that wasn't generally
 Ian welcomed.
 
 Bummer.  Why?
 
 Dan If people are okay with this, I have no problem implementing it.
 
 If you're taking feature requests, it would be handy to canonize the
 Area field somehow.  I was filtering based on preprocessor and then
 yesterday noticed things filed against libcpp and cpp.

  Heh.  Guilty as charged.

 Alternatively, filtering by regex would work just as well for me.

  Or just suggesting a list of canonical names for people to use.

cheers,
  DaveK
-- 
Can't think of a witty .sigline today



Re: GCC mini-summit - Patch tracker

2007-04-20 Thread Tom Tromey
 Dave == Dave Korn [EMAIL PROTECTED] writes:

 If you're taking feature requests, it would be handy to canonize the
 Area field somehow.  I was filtering based on preprocessor and then
 yesterday noticed things filed against libcpp and cpp.

Dave   Heh.  Guilty as charged.

Sorry, wasn't trying to single anybody out.  The problem perhaps can't
be solved on the submission end since people forget, there are
misspellings, etc.  

 Alternatively, filtering by regex would work just as well for me.

Dave   Or just suggesting a list of canonical names for people to use.

That would also help.

Tom


Re: GCC mini-summit - Patch tracker

2007-04-20 Thread Daniel Berlin

On 20 Apr 2007 11:42:57 -0600, Tom Tromey [EMAIL PROTECTED] wrote:

Ian I proposed automatic e-mail pings, but that wasn't generally
Ian welcomed.

Bummer.  Why?

Dan If people are okay with this, I have no problem implementing it.

If you're taking feature requests, it would be handy to canonize the
Area field somehow.  I was filtering based on preprocessor and then
yesterday noticed things filed against libcpp and cpp.

I may just have it list all the maintenance areas.


Alternatively, filtering by regex would work just as well for me.


It *is* a regex :)


Tom



Re: GCC mini-summit - compiling for a particular architecture

2007-04-20 Thread Zdenek Dvorak
Hello,

 Steve Ellcey wrote:
 
 This seems unfortunate.  I was hoping I might be able to turn on loop
 unrolling for IA64 at -O2 to improve performance.  I have only started
 looking into this idea but it seems to help performance quite a bit,
 though it is also increasing size quite a bit too so it may need some
 modification of the unrolling parameters to make it practical.
 
 To me it is obvious that optimizations are target dependent. For
 instance loop unrolling is really a totally different optimization
 on the ia64 as a result of the rotating registers.

that we do not use.  Nevertheless, there are still compelling reasons
why unrolling is more useful on ia64 than on other architectures
(importance of scheduling, insensitivity to code size growth).

Another option would be to consider enabling (e.g.) -funroll-loops
-fprefetch-loop-arrays by default at -O3.  I think it is fairly rare
for these flags to cause performance regressions (although of course
more measurements would be needed to support this claim).
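
For illustration, a loop of the kind these flags target (the function name,
the compile command, and the prefetch distance below are made up for the
example; the explicit __builtin_prefetch only mimics the sort of code that
-fprefetch-loop-arrays inserts automatically):

  /* Something like: gcc -O3 -funroll-loops -fprefetch-loop-arrays -c saxpy.c
     would replicate the loop body and insert prefetches ahead of the
     streamed array accesses.  The hand-written prefetch here just shows
     the shape of that code; 64 elements is an arbitrary distance.  */
  void
  saxpy (float *restrict y, const float *restrict x, float a, int n)
  {
    for (int i = 0; i < n; i++)
      {
        if (i + 64 < n)
          __builtin_prefetch (&x[i + 64]);
        y[i] += a * x[i];
      }
  }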

Zdenek


Re: GCC mini-summit - compiling for a particular architecture

2007-04-20 Thread Robert Dewar

Zdenek Dvorak wrote:

Hello,


Steve Ellcey wrote:


This seems unfortunate.  I was hoping I might be able to turn on loop
unrolling for IA64 at -O2 to improve performance.  I have only started
looking into this idea but it seems to help performance quite a bit,
though it is also increasing size quite a bit too so it may need some
modification of the unrolling parameters to make it practical.

To me it is obvious that optimizations are target dependent. For
instance loop unrolling is really a totally different optimization
on the ia64 as a result of the rotating registers.


that we do not use. 

Right, but we might in the future.


Nevertheless, there are still compelling reasons
why unrolling is more useful on ia64 than on other architectures
(importance of scheduling, insensitivity to code size growth).

And the large number of registers.


Another option would be to consider enabling (e.g.) -funroll-loops
-fprefetch-loop-arrays by default on -O3.  I think it is fairly rare
for these flags to cause performance regressions (although of course
more measurements to support this claim would be necessary).


Well, unrolling loops blows up code size, so it has to have positive
value, not merely no negative value :-)
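
As a rough illustration of that size cost, here is what a 4x unroll of a
trivial loop expands into at the source level (a sketch only; GCC's unroller
of course works on the compiler's internal representation, not on source):

  /* The body is replicated four times and a remainder loop is added, so
     the loop's text size grows several-fold before any scheduling or
     prefetching benefit shows up.  */
  void
  scale (float *a, float s, int n)
  {
    int i = 0;
    for (; i + 4 <= n; i += 4)
      {
        a[i + 0] *= s;
        a[i + 1] *= s;
        a[i + 2] *= s;
        a[i + 3] *= s;
      }
    for (; i < n; i++)   /* remainder iterations */
      a[i] *= s;
  }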


Zdenek




Re: GCC mini-summit - compiling for a particular architecture

2007-04-20 Thread Robert Dewar

Diego Novillo wrote:

H. J. Lu wrote on 04/20/07 21:30:


-fprefetch-loop-arrays shouldn't be on by default since HW prefetch
usually will have negative performance impact on Intel.


We are talking about one specific architecture where it usually helps: ia64.


Right, but the follow-on discussion was: *if* we have to have the same
set of optimization options for all architectures, *then* could we
consider turning on loop unrolling by default.

One possibility would be to have a -Om switch (or whatever) that
says "do all optimizations for this machine that help".

I must say the rule about all optimizations being the same on
all machines seems odd to me in another respect.  What if you
have an optimization that applies *only* to one machine (ia64
has a number of such possibilities!)?


Re: GCC mini-summit - compiling for a particular architecture

2007-04-20 Thread Diego Novillo
H. J. Lu wrote on 04/20/07 21:30:

 -fprefetch-loop-arrays shouldn't be on by default since HW prefetch
 usually will have negative performance impact on Intel.

We are talking about one specific architecture where it usually helps: ia64.


Re: GCC mini-summit - compiling for a particular architecture

2007-04-20 Thread Diego Novillo
Robert Dewar wrote on 04/20/07 21:42:

 One possibility would be to have a -Om switch (or whatever) that
 says "do all optimizations for this machine that help".

I think this is a good compromise.  I personally don't think we should
limit ourselves to doing the exact same optimizations across all
architectures, but I can see why some people may find that useful.


Re: GCC mini-summit - compiling for a particular architecture

2007-04-20 Thread Mike Stump

On Apr 20, 2007, at 6:42 PM, Robert Dewar wrote:

One possibility would be to have a -Om switch (or whatever) that
says do all optimizations for this machine that help.


Ick, gross.  No.


I must say the rule about all optimizations being the same on
all machines seems odd to me


I'd look at it this way: it isn't unreasonable to have cost metrics
that are in fact different for each cpu, and possibly for each tune
choice, and that greatly affect _any_ codegen choice.  Sure, we can
always unroll the loops on all targets, but we can crank up the costs
of extra instructions on chips where those costs are high; net result,
almost no unrolling.  For chips where the costs are cheap and
instructions need to be exposed to optimize further, the costs involved
are, trivially, totally different.  Net result: better code gen
for each.
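
A toy sketch of that idea in C (the structure, field names, and numbers are
invented for illustration and are not GCC's actual cost interface): the same
generic decision procedure yields almost no unrolling on a target whose cost
table prices extra instructions highly, and aggressive unrolling where they
are cheap and expose parallelism.

  struct target_costs
  {
    int insn_size_cost;   /* per-target penalty for each replicated insn */
    int sched_gain;       /* per-target benefit of exposing more insns   */
  };

  static int
  choose_unroll_factor (const struct target_costs *t, int body_insns)
  {
    int best = 1, best_score = 0;
    for (int factor = 2; factor <= 8; factor *= 2)
      {
        int score = factor * t->sched_gain
                    - (factor - 1) * body_insns * t->insn_size_cost;
        if (score > best_score)
          {
            best_score = score;
            best = factor;
          }
      }
    return best;
  }

  /* E.g. with body_insns == 8:
     { .insn_size_cost = 4, .sched_gain = 10 } -> factor 1 (no unrolling),
     { .insn_size_cost = 1, .sched_gain = 10 } -> factor 8.  */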


I do, however, think the concept of not allowing targets to set and
unset optimization choices is, well, overly pedantic.