Re: GCC mini-summit - compiling for a particular architecture
On Mon, 2007-04-23 at 14:06 -0400, Diego Novillo wrote:

So, I think there's a middle ground between "exactly the same passes on all targets" and "use Acovea for every CPU to pick what -O2 means". Using Acovea to reveal some of the surprising, but beneficial, results seems like a fine idea, though. I'm hoping to hear something along those lines at the next GCC Summit. I have heard of a bunch of work in academia doing extensive optimization space searches looking for combinations of pass sequencing and repetition to achieve optimal results.

When I finished my Masters project a couple of years ago, I felt that iterative compilation was a failing idea. It seems, though, that I should have considered submitting a GCC Summit paper on my experiences with GCC. Perhaps next year, if it's still relevant!

Ben
-- Ben Elliston [EMAIL PROTECTED] Australia Development Lab, IBM
RE: GCC mini-summit - compiling for a particular architecture
On Mon, 2007-04-23 at 19:26 +0100, Dave Korn wrote:

Has any of the Acovea research demonstrated whether there actually is any such thing as a good default set of flags in all cases? If the results obtained diverge significantly according to the nature/coding style/architecture/other uncontrolled variable factors of the application, we may be labouring under a false premise wrt. the entire idea, mightn't we?

My experimentation found that the sequences were highly dependent on the input programs, as you might expect. It would therefore be quite hard to choose a good default set in all cases. In fact, you could argue that this is totally contrary to the whole idea of iterative compilation, which assumes that there is no single good default set and that you're better off searching.

Cheers, Ben
-- Ben Elliston [EMAIL PROTECTED] Australia Development Lab, IBM
Re: GCC mini-summit - compiling for a particular architecture
On 4/23/07, Diego Novillo [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote on 04/23/07 14:40: Any references?

Yes, at the last HiPEAC conference Grigori Fursin presented their interactive compilation interface, which could be used for this. http://gcc-ici.sourceforge.net/ That work is part of a European project called MilePost. We will present part of this ongoing work at the summit; see the abstract: http://www.gccsummit.org/2007/view_abstract.php?content_key=37 Another related presentation at the summit is that of Haiping Wu: http://www.gccsummit.org/2007/view_abstract.php?content_key=9

Sebastian
Re: GCC mini-summit - compiling for a particular architecture
On Sun, 2007-04-22 at 17:32 -0700, Mark Mitchell wrote:

Steve Ellcey wrote: It came up in a few side conversations. As I understand it, RMS has decreed that the -On optimizations shall be architecture independent. That said, there are generic optimizations which really only apply to a single architecture, so there is some precedent for bending this rule.

This seems unfortunate. As others have said downthread, I don't think the idea that -O2 should enable the same set of optimizations on all processors is necessary or desirable. (In fact, there's nothing inherent in even using the same algorithms on all processors; I can well imagine that the best register allocation algorithms for x86 and Itanium might be entirely different. I'm in no way trying to encourage an entire set of per-architecture optimization passes; clearly the more we can keep common the better! But, our goal is to produce a compiler that generates the best possible code on multiple architectures, not to produce a compiler that uses the same algorithms and optimization options on all architectures.) I have never heard RMS opine on this issue. However, I don't think that this is something that the SC or FSF need to decide. The SC has made very clear that it doesn't want to interfere with day-to-day technical development of the compiler. If the consensus of the maintainers is that it's OK to turn on some extra optimizations at -O2 on Itanium, then I think we can just make the change. Of course, if people would prefer that I ask the SC, I'm happy to do so.

I think it would be nicer if this could be done in a MI way by examining certain target properties. For example, the generic framework might say something like: 'if there's more than N gp registers, enable opt_foo at -O2 or above'. That way we can keep the framework centralized while at the same time allowing bigger systems to exploit useful optimizations with the standard options. 
I know a number of targets turn off sched1 at -O2 because it simply makes code much worse (due to increased register pressure). We could eliminate much of this machine-dependent tweaking with suitable generic tests. R.
Re: GCC mini-summit - compiling for a particular architecture
(In fact, there's nothing inherent in even using the same algorithms on all processors; I can well imagine that the best register allocation algorithms for x86 and Itanium might be entirely different. I'm in no way trying to encourage an entire set of per-architecture optimization passes; clearly the more we can keep common the better! But, our goal is to produce a compiler that generates the best possible code on multiple architectures, not to produce a compiler that uses the same algorithms and optimization options on all architectures.) I have never heard RMS opine on this issue. However, I don't think that this is something that the SC or FSF need to decide.

As far as I can recall, there's always been SOME hardware dependence on the exact meaning of -O2 since, in at least a few cases, which options were included in it depended on target options.
Re: GCC mini-summit - compiling for a particular architecture
Richard Earnshaw wrote: I think it would be nicer if this could be done in a MI way by examining certain target properties. For example, the generic framework might say something like: 'if there's more than N gp registers, enable opt_foo at -O2 or above'. Yes, I agree; wherever that technique works, it would make sense. Perhaps that can be made to handle all the cases; I'm not sure. I'm certainly not trying to suggest that we run SPEC on every architecture, and then make -O2 be the set of optimization options that happens to do best there, however bizarre. But, I would certainly be happy with better performance, even at the cost of some mild inconsistency between optimizations running on different CPUs. Your example of turning off sched1 is exactly the sort of thing that seems reasonable to me, although I agree that if it can be disabled in a machine-independent way, by checking some property of the back end, that would be superior. -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: GCC mini-summit - benchmarks
Jim Wilson wrote:

Kenneth Hoste wrote: I'm not sure what 'tests' mean here... Are test cases being extracted from the SPEC CPU2006 sources? Or are you referring to the validity tests of the SPEC framework itself (to check whether the output generated by some binary conforms with their reference output)?

The claim is that SPEC CPU2006 has source code bugs that cause it to fail when compiled by gcc. We weren't given a specific list of problems. HJ, can you give us the specifics on the SPEC 2006 failures you were seeing?

I remember the perlbench failure; it was IA64 specific, and was due to the SPEC config file spec_config.h that defines the attribute keyword to be null, thus eliminating all attributes. On IA64 Linux, in the /usr/include/bits/setjmp.h header file, the __jmp_buf buffer is defined to have an aligned attribute on it. If the buffer isn't aligned, the perlbench program fails. I believe another problem was an uninitialized local variable in a Fortran program, but I don't recall which program or which variable that was.

Steve Ellcey [EMAIL PROTECTED]
Re: GCC mini-summit - compiling for a particular architecture
On Sun, Apr 22, 2007 at 04:39:23PM -0700, Joe Buck wrote: On Sun, 2007-04-22 at 14:44 +0200, Richard Guenther wrote: At work we use -O3 since it gives a 5% performance gain against -O2. profile-feedback has many flags and there is no overview of it in the doc IIRC. Who will use it except GCC developers? Who knows about your advice?

On Sun, Apr 22, 2007 at 03:22:56PM +0200, Jan Hubicka wrote: Well, this is why -fprofile-generate and -fprofile-use were invented. Perhaps docs can be improved so people actually discover it. Do you have any suggestions?

Docs could be improved, but this also might be a case where a tutorial would be needed, to teach users how to use it effectively. (Perhaps a chapter for FDO or a subchapter of the gcov docs would do?) We could also have examples, with lots of comments, in the testsuite, with references to them in the docs. That way there is code that people can try to see what kind of effect an optimization has on their system. This would also, of course, provide at least minimal testing for more optimizations; last time I looked there were lots of them that are never used in the testsuite.

Janis
Re: GCC mini-summit - compiling for a particular architecture
On Mon, 23 Apr 2007, Mark Mitchell wrote: I'm certainly not trying to suggest that we run SPEC on every architecture, and then make -O2 be the set of optimization options that happens to do best there, however bizarre. Why not? Is your objection because SPEC doesn't reflect real-world apps or because the option set might be bizarre? The bizarreness of the resulting flag set should not be a consideration IMHO. Humans are bad at predicting how complex systems like this behave. The fact that the best flags may be non-intuitive is not surprising. I find the case of picking compiler options for optimizing code very much like choosing which part of your code to hand optimize programmatically. People often guess wrongly where their code spends its time, that's why we have profilers. So I feel we should choose our per-target flags using a tool like Acovea to find what the best options are on a per-target basis. Then we could insert those flags into -O2 or -Om in each backend. Whether we use SPEC or the testcases included in Acovea or maybe GCC itself as the app to tune for could be argued. And some care would be necessary to ensure that the resulting flags don't completely hose other large classes of apps. But IMHO once you decide to do per-target flags, something like this seems like the natural conclusion. http://www.coyotegulch.com/products/acovea/ --Kaveh -- Kaveh R. Ghazi [EMAIL PROTECTED]
Re: GCC mini-summit - compiling for a particular architecture
On Mon, 23 Apr 2007, Mark Mitchell wrote: I'm certainly not trying to suggest that we run SPEC on every architecture, and then make -O2 be the set of optimization options that happens to do best there, however bizarre. On Mon, Apr 23, 2007 at 01:21:20PM -0400, Kaveh R. GHAZI wrote: Why not? Is your objection because SPEC doesn't reflect real-world apps or because the option set might be bizarre? In this case bizarre would mean untested: if we find some set of 15 options that maximizes SPEC performance, it's quite likely that no one has used that option combination before; building complete distros with this untested option set would almost certainly find bugs. Still might be worth trying, but would require extra and careful testing.
Re: GCC mini-summit - compiling for a particular architecture
Kaveh R. GHAZI wrote: On Mon, 23 Apr 2007, Mark Mitchell wrote: I'm certainly not trying to suggest that we run SPEC on every architecture, and then make -O2 be the set of optimization options that happens to do best there, however bizarre. Why not? Is your objection because SPEC doesn't reflect real-world apps or because the option set might be bizarre?

Some of both, I guess. Certainly, SPEC isn't a good benchmark for some CPUs or some users. Also, I'd be very surprised to find that at -O2 my inline functions weren't inlined, but I'd not be entirely amazed to find out that on some processor that was for some reason negative on SPEC. I'd be concerned that a SPEC run on Pentium IV doesn't necessarily guide things well for Athlon, so, in practice, we'd still need some kind of fallback default for CPUs where we didn't do the Acovea thing. I'd also suspect that if we move various chips to differing options, we'll see more bugs due to rarely-tested combinations. Yes, that will help us fix latent bugs, but it may not improve the user experience. We'll see packages that used to compile on lots of platforms start to break more often on one platform or another, due either to latent bugs in GCC or latent bugs in the application.

So, I think there's a middle ground between "exactly the same passes on all targets" and "use Acovea for every CPU to pick what -O2 means". Using Acovea to reveal some of the surprising, but beneficial, results seems like a fine idea, though.

-- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: GCC mini-summit - compiling for a particular architecture
Mark Mitchell wrote on 04/23/07 13:56: So, I think there's a middle ground between "exactly the same passes on all targets" and "use Acovea for every CPU to pick what -O2 means". Using Acovea to reveal some of the surprising, but beneficial, results seems like a fine idea, though.

I'm hoping to hear something along those lines at the next GCC Summit. I have heard of a bunch of work in academia doing extensive optimization space searches looking for combinations of pass sequencing and repetition to achieve optimal results. My naive idea is for someone to test all these different combinations and give us a set of -Ox recipes that we can use by default in the compiler.
Re: GCC mini-summit - compiling for a particular architecture
Citeren Kaveh R. GHAZI [EMAIL PROTECTED]: On Mon, 23 Apr 2007, Mark Mitchell wrote: I'm certainly not trying to suggest that we run SPEC on every architecture, and then make -O2 be the set of optimization options that happens to do best there, however bizarre. Why not? Is your objection because SPEC doesn't reflect real-world apps or because the option set might be bizarre? The bizarreness of the resulting flag set should not be a consideration IMHO. Humans are bad at predicting how complex systems like this behave. The fact that the best flags may be non-intuitive is not surprising. I find the case of picking compiler options for optimizing code very much like choosing which part of your code to hand optimize. People often guess wrongly where their code spends its time; that's why we have profilers. So I feel we should choose our per-target flags using a tool like Acovea to find what the best options are on a per-target basis. Then we could insert those flags into -O2 or -Om in each backend. Whether we use SPEC or the testcases included in Acovea or maybe GCC itself as the app to tune for could be argued. And some care would be necessary to ensure that the resulting flags don't completely hose other large classes of apps. But IMHO once you decide to do per-target flags, something like this seems like the natural conclusion. http://www.coyotegulch.com/products/acovea/

I totally agree with you: Acovea would be a good tool for helping with this. But, in my opinion, it has one big downside: it doesn't allow a tradeoff between, for example, compilation time and execution time. My work tries to tackle this: I'm using an evolutionary approach (like Acovea) which is multi-objective (unlike Acovea), meaning it optimizes a Pareto curve for compilation time, execution time and code size. I'm also trying to speed things up relative to the way Acovea handles things, which I won't describe any further for now. 
I'm currently only using the SPEC CPU2000 benchmarks (and I believe Acovea does too), but this shouldn't be a drawback: the methodology is completely independent of the benchmarks used. I'm also trying to make sure everything is parameterizable (size of population, number of generations, crossover/mutation/migration rates, ...). Unfortunately, I'm having quite a lot of problems with weird combinations of flags (which generate bugs, as was mentioned in this thread), which is slowing things down. Instead of trying to fix these bugs, I'm just ignoring combinations of flags which generate them for now (although I'll try to report each bug I run into in Bugzilla). Hopefully, this work will produce nice results by June, and I'll make sure to report on it on this mailing list once it's done.

greetings, Kenneth
-- Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital (Aaron Levenstein) Kenneth Hoste ELIS - Ghent University [EMAIL PROTECTED] http://www.elis.ugent.be/~kehoste This message was sent using IMP, the Internet Messaging Program.
RE: GCC mini-summit - compiling for a particular architecture
On 23 April 2007 19:07, Diego Novillo wrote: Mark Mitchell wrote on 04/23/07 13:56: So, I think there's a middle ground between "exactly the same passes on all targets" and "use Acovea for every CPU to pick what -O2 means". Using Acovea to reveal some of the surprising, but beneficial, results seems like a fine idea, though. I'm hoping to hear something along those lines at the next GCC Summit. I have heard of a bunch of work in academia doing extensive optimization space searches looking for combinations of pass sequencing and repetition to achieve optimal results. My naive idea is for someone to test all these different combinations and give us a set of -Ox recipes that we can use by default in the compiler.

Has any of the Acovea research demonstrated whether there actually is any such thing as a good default set of flags in all cases? If the results obtained diverge significantly according to the nature/coding style/architecture/other uncontrolled variable factors of the application, we may be labouring under a false premise wrt. the entire idea, mightn't we?

cheers, DaveK
-- Can't think of a witty .sigline today
Re: GCC mini-summit - compiling for a particular architecture
Citeren Diego Novillo [EMAIL PROTECTED]: Mark Mitchell wrote on 04/23/07 13:56: So, I think there's a middle ground between "exactly the same passes on all targets" and "use Acovea for every CPU to pick what -O2 means". Using Acovea to reveal some of the surprising, but beneficial, results seems like a fine idea, though. I'm hoping to hear something along those lines at the next GCC Summit. I have heard of a bunch of work in academia doing extensive optimization space searches looking for combinations of pass sequencing and repetition to achieve optimal results. My naive idea is for someone to test all these different combinations and give us a set of -Ox recipes that we can use by default in the compiler.

Sorry to be blunt, but that's indeed quite naive :-) Currently, the -On flags set/unset 60 flags, which yields 2^60 combinations. If you also count the passes not controlled by a flag, but decided upon depending on the optimization level, that adds another, virtual flag (i.e. using -O1, -O2, -O3 or -Os as a base setting). Evaluating a serious set of programs can easily take tens of minutes, so I think you can easily see that trying all possible combinations is totally infeasible. The nice thing is you don't have to: you can learn from the combinations you've already evaluated, you can start with only a subset of programs and add others gradually, ...

My work is actually concentrating on building a framework to do exactly that: give a set of recipes for -On flags which allow a choice, and which are determined by trading off compilation time, execution time and code size. I won't be at the GCC summit in Canada (I'm in San Diego then, presenting some other work), but I'll make sure to announce our work when it's finished...

greetings, Kenneth
Re: GCC mini-summit - compiling for a particular architecture
Dave Korn wrote on 04/23/07 14:26: Has any of the Acovea research demonstrated whether there actually is any such thing as a good default set of flags in all cases? If the results Not Acovea itself. The research I'm talking about involves a compiler whose pipeline can be modified and resequenced. It's not just a matter of adding -f/-m flags. The research I've seen does a fairly good job at modelling AI systems that traverse the immense search space looking for different sequences. obtained diverge significantly according to the nature/coding style/architecture/other uncontrolled variable factors of the application, we may be labouring under a false premise wrt. the entire idea, mightn't we? Yes. That's why it's called research ;) I'm sure we'll get an earful of at least a couple of these at the summit.
RE: GCC mini-summit - compiling for a particular architecture
Citeren Dave Korn [EMAIL PROTECTED]: On 23 April 2007 19:07, Diego Novillo wrote: Mark Mitchell wrote on 04/23/07 13:56: So, I think there's a middle ground between "exactly the same passes on all targets" and "use Acovea for every CPU to pick what -O2 means". Using Acovea to reveal some of the surprising, but beneficial, results seems like a fine idea, though. I'm hoping to hear something along those lines at the next GCC Summit. I have heard of a bunch of work in academia doing extensive optimization space searches looking for combinations of pass sequencing and repetition to achieve optimal results. My naive idea is for someone to test all these different combinations and give us a set of -Ox recipes that we can use by default in the compiler.

Has any of the Acovea research demonstrated whether there actually is any such thing as a good default set of flags in all cases? If the results obtained diverge significantly according to the nature/coding style/architecture/other uncontrolled variable factors of the application, we may be labouring under a false premise wrt. the entire idea, mightn't we?

I don't think that has been shown. Acovea evaluation has only been done on a few architectures, and I don't believe a comparison was made. But you guys are probably the best team to try that out: you have access to a wide range of platforms, and the experience needed to solve problems when they present themselves. Hopefully, my upcoming framework will boost such an effort. Remember: adjusting the way in which GCC handles the -On flags is only needed if the tests suggest it will be useful.

greetings, Kenneth
Re: GCC mini-summit - compiling for a particular architecture
[EMAIL PROTECTED] wrote on 04/23/07 14:37: Currently, the -On flags set/unset 60 flags, which yields 2^60 combinations. If you also count the passes not controlled by a flag, but decided upon depending on the optimization level, that adds another, virtual flag (i.e. using -O1, -O2, -O3 or -Os as a base setting).

No, that's not what I want. I want a static recipe. I do *not* want -Ox to do this search every time. It goes like this: somebody does a study over a set of applications that represent certain usage patterns (say, FP and INT, just to mention the two more common classes of apps). The slow search is done offline, and after a few months we get the results in the form of a table that says, for each class and for each -Ox, what set of passes to execute and in what order they should be executed. Not to say that the current sequencing and repetition are worthless, but I think they could be improved in a quasi-systematic way using this process (which is slow and painful, I know).

My work is actually concentrating on building a framework to do exactly that: give a set of recipes for -On flags which allow a choice, and which are determined by trading off compilation time, execution time and code size.

Right. This is what I want.

I won't be at the GCC summit in Canada (I'm in San Diego then, presenting some other work), but I'll make sure to announce our work when it's finished...

Excellent. Looking forward to those results.
Re: GCC mini-summit - compiling for a particular architecture
[EMAIL PROTECTED] wrote on 04/23/07 14:40: Any references? Yes, at the last HiPEAC conference Grigori Fursin presented their interactive compilation interface, which could be used for this. http://gcc-ici.sourceforge.net/ Ben Elliston had also experimented with a framework to allow GCC to change the sequence of the passes from the command line. Ben, where was your thesis again?
Re: GCC mini-summit - compiling for a particular architecture
On Mon, 2007-04-23 at 10:56 -0700, Mark Mitchell wrote: Kaveh R. GHAZI wrote: On Mon, 23 Apr 2007, Mark Mitchell wrote: I'm certainly not trying to suggest that we run SPEC on every architecture, and then make -O2 be the set of optimization options that happens to do best there, however bizarre. Why not? Is your objection because SPEC doesn't reflect real-world apps or because the option set might be bizarre? Some of both, I guess. Certainly, SPEC isn't a good benchmark for some CPUs or some users. Also, I'd be very surprised to find that at -O2 my inline functions weren't inlined, but I'd not be entirely amazed to find out that on some processor that was for some reason negative on SPEC. I'd be concerned that a SPEC run on Pentium IV doesn't necessarily guide things well for Athlon, so, in practice, we'd still need some kind of fallback default for CPUs where we didn't do the Acovea thing. I'd also suspect that if we move various chips to differing options, we'll see more bugs due to rarely-tested combinations. Yes, that will help us fix latent bugs, but it may not improve the user experience. We'll see packages that used to compile on lots of platforms start to break more often on one platform or another, due either to latent bugs in GCC or latent bugs in the application. So, I think there's a middle ground between "exactly the same passes on all targets" and "use Acovea for every CPU to pick what -O2 means". Using Acovea to reveal some of the surprising, but beneficial, results seems like a fine idea, though.

If every target had a different set of optimizations on at -O2, then I'd claim that we've gone horribly wrong somewhere -- both from a design and implementation standpoint. If we're going to go down this path, I'd much rather see us tune towards target characteristics than towards targets themselves. Enabling/disabling a generic optimizer on a per-target basis should be the option of last resort. 
Acovea can be helpful in the process of identifying issues, but I strongly believe that using it in the way some have suggested is a cop-out for really delving into performance issues. Jeff
Re: GCC mini-summit - compiling for a particular architecture
On 23 Apr 2007, at 20:43, Diego Novillo wrote: [EMAIL PROTECTED] wrote on 04/23/07 14:37: Currently, the -On flags set/unset 60 flags, which yields 2^60 combinations. If you also count the passes not controlled by a flag, but decided upon depending on the optimization level, that adds another, virtual flag (i.e. using -O1, -O2, -O3 or -Os as a base setting). No, that's not what I want. I want a static recipe. I do *not* want -Ox to do this search every time.

I'm not saying you need to do this every time you install GCC. I'm saying trying out 2^60 combinations (even offline) is totally infeasible.

It goes like this: somebody does a study over a set of applications that represent certain usage patterns (say, FP and INT, just to mention the two more common classes of apps). The slow search is done offline, and after a few months we get the results in the form of a table that says, for each class and for each -Ox, what set of passes to execute and in what order they should be executed. Not to say that the current sequencing and repetition are worthless, but I think they could be improved in a quasi-systematic way using this process (which is slow and painful, I know).

Exactly my idea.

My work is actually concentrating on building a framework to do exactly that: give a set of recipes for -On flags which allow a choice, and which are determined by trading off compilation time, execution time and code size. Right. This is what I want. I won't be at the GCC summit in Canada (I'm in San Diego then, presenting some other work), but I'll make sure to announce our work when it's finished... Excellent. Looking forward to those results.

Cool!
-- Kenneth Hoste ELIS - Ghent University [EMAIL PROTECTED] http://www.elis.ugent.be/~kehoste
Re: GCC mini-summit - benchmarks
On Mon, Apr 23, 2007 at 09:49:04AM -0700, Steve Ellcey wrote: Jim Wilson wrote: Kenneth Hoste wrote: I'm not sure what 'tests' mean here... Are test cases being extracted from the SPEC CPU2006 sources? Or are you referring to the validity tests of the SPEC framework itself (to check whether the output generated by some binary conforms with their reference output)? The claim is that SPEC CPU2006 has source code bugs that cause it to fail when compiled by gcc. We weren't given a specific list of problems. HJ, can you give us the specifics on the SPEC 2006 failures you were seeing? I remember the perlbench failure; it was IA64 specific, and was due to the SPEC config file spec_config.h that defines the attribute keyword to be null, thus eliminating all attributes. On IA64 Linux, in the /usr/include/bits/setjmp.h header file, the __jmp_buf buffer is defined to have an aligned attribute on it. If the buffer isn't aligned, the perlbench program fails. I believe another problem was an uninitialized local variable in a Fortran program, but I don't recall which program or which variable that was.

This is what I sent to SPEC. H.J.

465.tonto in SPEC CPU 2006 has many checks for pointers like: ENSURE(NOT associated(self%ex),SHELL1:copy_1 ... ex not destroyed) It calls the associated intrinsic on a pointer to check if a pointer is pointing to some data. According to the Fortran standard, associated should only be used on initialized pointers; the behavior of calling associated on uninitialized pointers is undefined or compiler dependent. Tonto tries to initialize pointers in structures by calling nullify_ptr_part_ on stack variables to ensure that pointers in stack variables are properly initialized. However, not all pointers in stack variables are initialized. As a result, the binary compiled by gcc 4.2 failed at runtime with Error in routine SHELL1:copy_1 ... ex not destroyed I am enclosing a patch which adds those missing calls of nullify_ptr_part_. Thanks. H.J.
Re: GCC mini-summit - compiling for a particular architecture
Look from what we're starting:

@item -funroll-loops @opindex funroll-loops Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. @option{-funroll-loops} implies @option{-frerun-cse-after-loop}. This option makes code larger, and may or may not make it run faster. @item -funroll-all-loops @opindex funroll-all-loops Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly. @option{-funroll-all-loops} implies the same options as @option{-funroll-loops},

It could gain a few more paragraphs written by knowledgeable people. And expanding documentation doesn't introduce regressions :).

but also does not make anyone actually use the options. Nobody reads the documentation. Of course, this is a bit of an overstatement, but with a few exceptions, people in general do not enable non-default flags.

Zdenek
Re: GCC mini-summit - compiling for a particular architecture
but also does not make anyone actually use the options. Nobody reads the documentation. Of course, this is a bit of an overstatement, but with a few exceptions, people in general do not enable non-default flags.

I don't think this is fair. Most people don't read the docs because they don't care about performance, but most people who develop code that spends a lot of CPU cycles actually read the docs, at least up to loop unrolling.

Exactly my experience. Unfortunately there's no useful information on this topic in the GCC manual...

Laurent
Re: GCC mini-summit - compiling for a particular architecture
On 4/22/07, Laurent GUERBY [EMAIL PROTECTED] wrote: On Sun, 2007-04-22 at 14:44 +0200, Richard Guenther wrote: On 4/22/07, Laurent GUERBY [EMAIL PROTECTED] wrote: but it also does not make anyone actually use the options. Nobody reads the documentation. Of course, this is a bit of an overstatement, but with a few exceptions, people in general do not enable non-default flags.

I don't think this is fair. Most people don't read the docs because they don't care about performance, but most people who develop code that spends a lot of CPU cycles actually read the docs at least up to loop unrolling. Exactly my experience. Unfortunately there's no useful information on this topic in the GCC manual...

Well, we have too many switches really. So the default is use -O2. If you want extra speed, try -O3, or even better use profile feedback. (Not many people realize that with profile feedback you get faster code than with -O3 and smaller code than with -Os - at least for C++ programs)

At work we use -O3 since it gives a 5% performance gain over -O2. profile-feedback has many flags and there is no overview of it in the doc IIRC. Who will use it except GCC developers? Who knows about your advice? The GCC user documentation is the place...

Well, I agree. A GCC Optimization Guide would be a nice thing to have, besides the individual flags documentation. Of course, unless someone is volunteering... Richard.
Re: GCC mini-summit - compiling for a particular architecture
Hello,

On Sun, 2007-04-22 at 14:44 +0200, Richard Guenther wrote: On 4/22/07, Laurent GUERBY [EMAIL PROTECTED] wrote: but it also does not make anyone actually use the options. Nobody reads the documentation. Of course, this is a bit of an overstatement, but with a few exceptions, people in general do not enable non-default flags.

I don't think this is fair. Most people don't read the docs because they don't care about performance, but most people who develop code that spends a lot of CPU cycles actually read the docs at least up to loop unrolling. Exactly my experience. Unfortunately there's no useful information on this topic in the GCC manual...

Well, we have too many switches really. So the default is use -O2. If you want extra speed, try -O3, or even better use profile feedback. (Not many people realize that with profile feedback you get faster code than with -O3 and smaller code than with -Os - at least for C++ programs)

At work we use -O3 since it gives a 5% performance gain over -O2. profile-feedback has many flags

actually, only two that are really important -- -fprofile-generate and -fprofile-use. Zdenek
Re: GCC mini-summit - compiling for a particular architecture
On Sun, 2007-04-22 at 15:22 +0200, Jan Hubicka wrote: At work we use -O3 since it gives a 5% performance gain over -O2. profile-feedback has many flags and there is no overview of it in the doc IIRC. Who will use it except GCC developers? Who knows about your advice?

Well, this is why -fprofile-generate and -fprofile-use were invented. Perhaps the docs can be improved so people actually discover them. Do you have any suggestions? (Perhaps a chapter for FDO or a subchapter of the gcov docs would do?)

I don't have approval rights for documentation, but I'd say just create an Optimization Guide chapter with a subchapter detailing an example of how to use feedback-based compiling on a simple source code that gives a nice speedup on a popular architecture. Then let's hope others will come in with their favourite advice :). Laurent
Re: GCC mini-summit - compiling for a particular architecture
On Sun, 2007-04-22 at 14:44 +0200, Richard Guenther wrote: At work we use -O3 since it gives a 5% performance gain over -O2. profile-feedback has many flags and there is no overview of it in the doc IIRC. Who will use it except GCC developers? Who knows about your advice?

On Sun, Apr 22, 2007 at 03:22:56PM +0200, Jan Hubicka wrote: Well, this is why -fprofile-generate and -fprofile-use were invented. Perhaps the docs can be improved so people actually discover them. Do you have any suggestions?

Docs could be improved, but this also might be a case where a tutorial would be needed, to teach users how to use it effectively. (Perhaps a chapter for FDO or a subchapter of the gcov docs would do?)
Re: GCC mini-summit - compiling for a particular architecture
Steve Ellcey wrote: It came up in a few side conversations. As I understand it, RMS has decreed that the -On optimizations shall be architecture independent. That said, there are generic optimizations which really only apply to a single architecture, so there is some precedent for bending this rule.

This seems unfortunate. As others have said downthread, I don't think the idea that -O2 should enable the same set of optimizations on all processors is necessary or desirable. (In fact, there's nothing inherent in even using the same algorithms on all processors; I can well imagine that the best register allocation algorithms for x86 and Itanium might be entirely different. I'm in no way trying to encourage an entire set of per-architecture optimization passes; clearly the more we can keep common, the better! But our goal is to produce a compiler that generates the best possible code on multiple architectures, not to produce a compiler that uses the same algorithms and optimization options on all architectures.)

I have never heard RMS opine on this issue. However, I don't think that this is something that the SC or FSF needs to decide. The SC has made very clear that it doesn't want to interfere with day-to-day technical development of the compiler. If the consensus of the maintainers is that it's OK to turn on some extra optimizations at -O2 on Itanium, then I think we can just make the change. Of course, if people would prefer that I ask the SC, I'm happy to do so. -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: GCC mini-summit
Ian Lance Taylor wrote: We held a GCC mini-summit at Google on Wednesday, April 18. About 40 people came. This is my very brief summary of what we talked about. Corrections and additions very welcome. Thank you for the summary. I am disappointed that I wasn't able to attend, as it sounds like it was a very productive day. Had it not been for other obligations, I would certainly have been there. I particularly appreciate the summary of feedback around 4.2 and the release process. I've been thinking about what's there, the feedback in response to my last GCC 4.2 status report, and various other opinions that have been presented to me. I've got some ideas, but I want to let them percolate a bit before trying to write them down. In any case, I want people to know that I'm listening. Thanks, -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: GCC mini-summit - compiling for a particular architecture
Mike Stump wrote: On Apr 20, 2007, at 6:42 PM, Robert Dewar wrote: One possibility would be to have a -Om switch (or whatever) that says do all optimizations for this machine that help. Ick, gross. No.

Well OK, Ick, but below you recommend removing the overly pedantic rule. I agree with that, but the above is a compromise suggestion if we can't remove the rule. So, Mike, my question is, assuming we cannot remove the rule, what do you want to do: a) nothing, b) something like the above, c) something else, please specify.

I must say the rule about all optimizations being the same on all machines seems odd to me

I'd look at it this way: it isn't unreasonable to have cost metrics that are in fact different for each cpu, and possibly each tune choice, that greatly affect _any_ codegen choice. Sure, we can unroll the loops always on all targets, but we can crank up the costs of extra instructions on chips where those costs are high; net result, almost no unrolling. For chips where the costs are cheap and they need exposed instructions to be able to optimize further, trivially, the costs involved are totally different. Net result, better code gen for each.

I do however think the concept of not allowing targets to set and unset optimization choices is, well, overly pedantic.
Re: GCC mini-summit - benchmarks
Kenneth Hoste wrote: I'm not sure what 'tests' mean here... Are test cases being extracted from the SPEC CPU2006 sources? Or are you refering to the validity tests of the SPEC framework itself (to check whether the output generated by some binary conforms with their reference output)? The claim is that SPEC CPU2006 has source code bugs that cause it to fail when compiled by gcc. We weren't given a specific list of problem. There are known problems with older SPEC benchmarks though. For instance, vortex fails on some targets unless compiled with -fno-strict-aliasing. -- Jim Wilson, GNU Tools Support, http://www.specifix.com
Re: GCC mini-summit - compiling for a particular architecture
On Fri, 2007-04-20 at 19:28 -0400, Robert Dewar wrote: Steve Ellcey wrote: This seems unfortunate. I was hoping I might be able to turn on loop unrolling for IA64 at -O2 to improve performance. I have only started looking into this idea but it seems to help performance quite a bit, though it is also increasing size quite a bit too, so it may need some modification of the unrolling parameters to make it practical.

To me it is obvious that optimizations are target dependent. For instance loop unrolling is really a totally different optimization on the ia64 as a result of the rotating registers.

My feeling is that it would be much more useful to have more detailed documentation on optimization flags in the GCC manual, at least mentioning the type of source code and architectures where each optimization option is interesting, rather than to mess with new flags or change longstanding -On policies. Look at what we're starting from:

@item -funroll-loops @opindex funroll-loops Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. @option{-funroll-loops} implies @option{-frerun-cse-after-loop}. This option makes code larger, and may or may not make it run faster. @item -funroll-all-loops @opindex funroll-all-loops Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly. @option{-funroll-all-loops} implies the same options as @option{-funroll-loops},

It could gain a few more paragraphs written by knowledgeable people. And expanding documentation doesn't introduce regressions :). Laurent
Re: GCC mini-summit - compiling for a particular architecture
On Apr 21, 2007, at 3:12 AM, Robert Dewar wrote: So, Mike, my question is, assuming we cannot remove the rule, what do you want to do

I think in the end, each situation is different and we have to find the best solution for each situation. So, in that spirit, let's open a discussion for the exact case you're thinking of. Now, the closest I've come to -Om in the past would be -fast, which means, tune for spec. :-)
Re: GCC mini-summit - benchmarks
On 20 Apr 2007, at 08:30, Ian Lance Taylor wrote: 11) H.J. Lu discussed SPEC CPU 2006. He reported that a couple of the tests do not run successfully, and it appears to be due to bugs in the tests which cause gcc to compile them in unexpected ways. He has been reporting the problems to the SPEC committee, but is being ignored. He encouraged other people to try it themselves and make their own reports.

I'm not sure what 'tests' means here... Are test cases being extracted from the SPEC CPU2006 sources? Or are you referring to the validity tests of the SPEC framework itself (to check whether the output generated by some binary conforms with their reference output)?

12) Jose Dana reported his results comparing different versions of gcc and icc at CERN. They have a lot of performance sensitive C++ code, and have gathered a large number of standalone snippets which they use for performance comparisons. Some of these can directly become bugzilla enhancement requests. In general this could ideally be turned into a free performance testsuite. All the code is under free software licenses.

Is this performance testsuite available somewhere? Sounds interesting to add to my (long) list of benchmark suites. greetings, Kenneth -- Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital (Aaron Levenstein) Kenneth Hoste ELIS - Ghent University [EMAIL PROTECTED] http://www.elis.ugent.be/~kehoste
Re: GCC mini-summit - compiling for a particular architecture
On 20 Apr 2007, at 08:30, Ian Lance Taylor wrote: 13) Michael Meissner raised the idea of compiling functions differently for different processors, choosing the version based on a runtime decision. This led to some discussion of how this could be done effectively. In particular if there is an architectural difference, such as Altivec, you may get prologue instructions which save and restore registers which are not present on all architectures.

Related to this: have you guys ever considered making the -On flags dependent on the architecture? Right now, the decision of which flags are activated at each -On level is independent of the architecture, I believe (except for flags which need to be disabled to ensure correct code generation, such as -fschedule-insns for x86). I must say I haven't looked into this in great detail, but at least for the passes controlled by flags on x86, this seems to be the case. I think choosing the flags based on the architecture you are compiling for might be highly beneficial (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31528 for example). greetings, Kenneth -- Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital (Aaron Levenstein) Kenneth Hoste ELIS - Ghent University [EMAIL PROTECTED] http://www.elis.ugent.be/~kehoste
Re: GCC mini-summit - compiling for a particular architecture
Related to this: have you guys ever considered to making the -On flags dependent on the architecture? It came up in a few side conversations. As I understand it, RMS has decreed that the -On optimizations shall be architecture independent. That said, there are generic optimizations which really only apply to a single architecture, so there is some precedent for bending this rule. There were also suggestions of making the order of optimizations command line configurable and allowing dynamically loaded libraries to register new passes. Ollie
Re: GCC mini-summit - unicorn with rainbows
10) Eric Christopher reported that Tom Tromey (who was not present) had suggested a new mascot for gcc: a unicorn with rainbows. This was met with general approval, and Eric suggested that everybody e-mail Tom with their comments. I personally would like to see the drawing. This sounds fantastic. Rainbow tattoos, or more like shooting a rainbow out of the uni-horn? So cool! Ian, thanks for hosting this, for championing the idea of a free mini-conf, and for writing up what was talked about for the rest of us. I'm sorry to have missed the ice cream. -benjamin
Re: GCC mini-summit
On 4/20/07, Vladimir N. Makarov [EMAIL PROTECTED] wrote: I am afraid that merging it earlier stops progress on the df infrastructure (e.g. Ken will work only on LTO)

There's nothing holding you, and many others, back from helping out, other than that the work is on a branch. By merging, the rest of the community will hopefully start helping to exploit the good things of the df framework.

and that also further transition of some optimizations to the df infrastructure will make code even slower and finally again we will have a slower compiler with worse code.

Ah, speculation. Why do you think this? Have you even looked at what is _really_ going on? Like, some optimizations computing things they already have available if they'd use the df infrastructure? Maybe you can be more specific about your concerns, instead of spreading FUD. Gr. Steven
Re: GCC mini-summit
Vladimir N. Makarov wrote: And I disagree that it is within the compilation time guidelines set by the SC. Ken fixed a big compilation time degradation a few days ago, and preliminarily what I see now (comparison with the last merge point) is x86_64 SPECInt2000 5.7% SPECFp2000 8.7% ppc64 SPECInt2000 6.5% SPECFp2000 5.5% Itanium SPECInt2000 9% SPECFp2000 10.9% Besides, if I understand correctly, the SC criteria mean that there is no degradation in code quality. There is a code size degradation of about 1% and some degradation on SPEC2000 (e.g. about 2% degradation on a few tests on ia64).

I'll be away for a week, so I'll miss most of the big flamewar, but I'd just like to throw in my opinion that based on these numbers I don't see why we're even considering it for inclusion at this point. I also agree with Vlad's point that it needs to be reviewed before being committed to mainline. Bernd -- This footer brought to you by insane German lawmakers. Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen Registergericht Muenchen HRB 40368 Geschaeftsfuehrer Thomas Wessel, Vincent Roche, Joseph E. McDonough
Re: GCC mini-summit
On 4/20/07, Vladimir N. Makarov [EMAIL PROTECTED] wrote: Didn't I write several times that the data structure of DF is too fat (because of rtl info duplication) and that that is probably the problem?

Yes, you have complained that you believe the data structure of DF is too fat. I guess that is a valid complaint. I don't see the rtl info duplication, though. You've only complained about the current data structures, but I have not really seen you propose anything leaner.

Is it not reasonable that using fatter structures without changing algorithms makes the compiler slower?

No, because the "without changing algorithms" is what you're assuming. But if you look at e.g. regmove, you see that many of the insn chain walks could easily be replaced with simpler/faster reg-def chains, which are always available in the df framework. Likewise for the changed registers for CPROP, which runs three times(!). It is easy to really speed this pass up by using the df framework to e.g. replace things like oprs_unchanged_p and the whole reg_avail mess. Gr. Steven
Re: GCC mini-summit
Steven Bosscher wrote: On 4/20/07, Vladimir N. Makarov [EMAIL PROTECTED] wrote: Didn't I write several times that the data structure of DF is too fat (because of rtl info duplication) and that that is probably the problem?

Yes, you have complained that you believe the data structure of DF is too fat. I guess that is a valid complaint. I don't see the rtl info duplication, though. You've only complained about the current data structures, but I have not really seen you propose anything leaner.

I did propose to attach the info to the rtl reg. It is a more difficult way than the current approach because of code sharing before the reload. But maybe it is more rewarding. To be honest, I don't know.

Is it not reasonable that using fatter structures without changing algorithms makes the compiler slower?

No, because the "without changing algorithms" is what you're assuming. But if you look at e.g. regmove, you see that many of the insn chain walks could easily be replaced with simpler/faster reg-def chains, which are always available in the df framework. Likewise for the changed registers for CPROP, which runs three times(!). It is easy to really speed this pass up by using the df framework to e.g. replace things like oprs_unchanged_p and the whole reg_avail mess.

Sorry, I wrote several times that I am not against a df infrastructure, because I believed in the example you just wrote too. But according to your logic the compiler should be faster. It has not happened yet, because the df advantages are less significant than its disadvantages for now, imho. Your argument tells me that if you rewrite and speed up another such optimization, the df advantages will overtake its disadvantages and it can be merged without my objections (although who cares about my opinion). That is my other proposal.
Re: GCC mini-summit
On 4/20/07, Vladimir N. Makarov [EMAIL PROTECTED] wrote: Yes, you have complained that you believe the data structure of DF is too fat. I guess that is a valid complaint. I don't see the rtl info duplication, though. You've only complained about the current data structures, but I have not really seen you propose anything leaner.

I did propose to attach the info to the rtl reg. It is a more difficult way than the current approach because of code sharing before the reload. But maybe it is more rewarding. To be honest, I don't know.

Changing the dataflow solvers is one thing. Changing a fundamental assumption in the compiler, namely that pseudoregs are shared, is another. We rely on that property in so many places. It would not require rewriting parts of the compiler, but rewriting the entire backend. Gr. Steven
Re: GCC mini-summit
Steven Bosscher wrote: On 4/20/07, Vladimir N. Makarov [EMAIL PROTECTED] wrote: I am afraid that merging it earlier stops progress on the df infrastructure (e.g. Ken will work only on LTO)

There's nothing holding you, and many others, back from helping out, other than that the work is on a branch. By merging, the rest of the community will hopefully start helping to exploit the good things of the df framework.

and that also further transition of some optimizations to the df infrastructure will make code even slower and finally again we will have a slower compiler with worse code.

Ah, speculation. Why do you think this? Have you even looked at what is _really_ going on? Like, some optimizations computing things they already have available if they'd use the df infrastructure? Maybe you can be more specific about your concerns, instead of spreading FUD.

Steven, could you stop spreading FUD about me spreading FUD. Didn't I write several times that the data structure of DF is too fat (because of rtl info duplication) and that that is probably the problem? Is it not reasonable that using fatter structures without changing algorithms makes the compiler slower?
Re: GCC mini-summit - unicorn with rainbows
10) Eric Christopher reported that Tom Tromey (who was not present) had suggested a new mascot for gcc: a unicorn with rainbows. This was met with general approval, and Eric suggested that everybody e-mail Tom with their comments. I personally would like to see the drawing. On Fri, Apr 20, 2007 at 06:02:46AM -0400, Benjamin Kosnik wrote: This sounds fantastic. Rainbow tattoos, or more like shooting a rainbow out of the uni-horn? I originally proposed the gnu emerging from an egg idea, it was based on the fact that some of us were pronouncing egcs eggs, and suggested that the compiler was being re-born. But we haven't been egcs for a long time, and I no longer like the existing logo much. So I'm fine with the unicorn idea. (I had to leave the summit before this topic came up). Ian, thanks for hosting this, for championing the idea of a free mini-conf, and for writing up what was talked about for the rest of us. Agreed; thanks Ian. I'm sorry to have missed the ice cream. I missed the ice cream, but I did get free lunch in the famous Google cafeteria.
Re: GCC mini-summit - compiling for a particular architecture
On Fri, Apr 20, 2007 at 12:58:39AM -0700, Ollie Wild wrote: Related to this: have you guys ever considered making the -On flags dependent on the architecture? It came up in a few side conversations. As I understand it, RMS has decreed that the -On optimizations shall be architecture independent.

But decrees of this kind from RMS (on purely technical matters) are negotiable. On matters of free software principle, RMS is the law. On technical matters he's (IMHO) just one hacker, though as the original author of gcc he should get respect and a certain amount of deference. If champions of this idea can make the case that the benefits outweigh the costs by a significant factor, it could be considered. But there are considerable costs: paths that get lots of testing are solid; paths that get less testing aren't. If every port uses a different set of optimizations we will see more bugs that look target-specific but really aren't.
Re: GCC mini-summit - Patch tracker
Ian I proposed automatic e-mail pings, but that wasn't generally Ian welcomed. Bummer. Why? Dan If people are okay with this, I have no problem implementing it. If you're taking feature requests, it would be handy to canonize the Area field somehow. I was filtering based on preprocessor and then yesterday noticed things filed against libcpp and cpp. Alternatively, filtering by regex would work just as well for me. Tom
RE: GCC mini-summit - Patch tracker
On 20 April 2007 18:43, Tom Tromey wrote: Ian I proposed automatic e-mail pings, but that wasn't generally Ian welcomed. Bummer. Why? Dan If people are okay with this, I have no problem implementing it. If you're taking feature requests, it would be handy to canonize the Area field somehow. I was filtering based on preprocessor and then yesterday noticed things filed against libcpp and cpp. Heh. Guilty as charged. Alternatively, filtering by regex would work just as well for me. Or just suggesting a list of canonical names for people to use. cheers, DaveK -- Can't think of a witty .sigline today
Re: GCC mini-summit - Patch tracker
Dave == Dave Korn [EMAIL PROTECTED] writes: If you're taking feature requests, it would be handy to canonize the Area field somehow. I was filtering based on preprocessor and then yesterday noticed things filed against libcpp and cpp. Dave Heh. Guilty as charged. Sorry, wasn't trying to single anybody out. The problem perhaps can't be solved on the submission end since people forget, there are misspellings, etc. Alternatively, filtering by regex would work just as well for me. Dave Or just suggesting a list of canonical names for people to use. That would also help. Tom
Re: GCC mini-summit - Patch tracker
On 20 Apr 2007 11:42:57 -0600, Tom Tromey [EMAIL PROTECTED] wrote: Ian I proposed automatic e-mail pings, but that wasn't generally Ian welcomed. Bummer. Why? Dan If people are okay with this, I have no problem implementing it. If you're taking feature requests, it would be handy to canonize the Area field somehow. I was filtering based on preprocessor and then yesterday noticed things filed against libcpp and cpp. I may just have it list all the maintenance areas. Alternatively, filtering by regex would work just as well for me. It *is* a regex :) Tom
Re: GCC mini-summit - compiling for a particular architecture
Hello,

Steve Ellcey wrote: This seems unfortunate. I was hoping I might be able to turn on loop unrolling for IA64 at -O2 to improve performance. I have only started looking into this idea but it seems to help performance quite a bit, though it is also increasing size quite a bit too, so it may need some modification of the unrolling parameters to make it practical.

To me it is obvious that optimizations are target dependent. For instance loop unrolling is really a totally different optimization on the ia64 as a result of the rotating registers.

...which we do not use, actually. Nevertheless, there are still compelling reasons why unrolling is more useful on ia64 than on other architectures (the importance of scheduling, insensitivity to code size growth). Another option would be to consider enabling (e.g.) -funroll-loops -fprefetch-loop-arrays by default at -O3. I think it is fairly rare for these flags to cause performance regressions (although of course more measurements to support this claim would be necessary). Zdenek
Re: GCC mini-summit - compiling for a particular architecture
Zdenek Dvorak wrote: Hello, Steve Ellcey wrote: This seems unfortunate. I was hoping I might be able to turn on loop unrolling for IA64 at -O2 to improve performance. I have only started looking into this idea but it seems to help performance quite a bit, though it is also increasing size quite a bit too, so it may need some modification of the unrolling parameters to make it practical. To me it is obvious that optimizations are target dependent. For instance loop unrolling is really a totally different optimization on the ia64 as a result of the rotating registers. ...which we do not use.

Right, but we might in the future.

Nevertheless, there are still compelling reasons why unrolling is more useful on ia64 than on other architectures (the importance of scheduling, insensitivity to code size growth).

And the large number of registers.

Another option would be to consider enabling (e.g.) -funroll-loops -fprefetch-loop-arrays by default at -O3. I think it is fairly rare for these flags to cause performance regressions (although of course more measurements to support this claim would be necessary).

Well, unrolling loops blows up code size, so it has to have positive value, not merely no negative value :-) Zdenek
Re: GCC mini-summit - compiling for a particular architecture
Diego Novillo wrote: H. J. Lu wrote on 04/20/07 21:30: -fprefetch-loop-arrays shouldn't be on by default since HW prefetch usually will have a negative performance impact on Intel. We are talking about one specific architecture where it usually helps: ia64.

Right, but the follow-on discussion was *if* we have to have the same set of optimization options for all architectures, *then* could we consider turning on loop unrolling by default. One possibility would be to have a -Om switch (or whatever) that says do all optimizations for this machine that help.

I must say the rule about all optimizations being the same on all machines seems odd to me in another respect: what if you have an optimization that applies *only* to one machine (ia64 has a number of such possibilities!)?
Re: GCC mini-summit - compiling for a particular architecture
H. J. Lu wrote on 04/20/07 21:30: -fprefetch-loop-arrays shouldn't be on by default since HW prefetch usually will have negative performance impact on Intel. We are talking about one specific architecture where it usually helps: ia64.
Re: GCC mini-summit - compiling for a particular architecture
Robert Dewar wrote on 04/20/07 21:42: One possibility would be to have a -Om switch (or whatever) that says do all optimizations for this machine that help. I think this is a good compromise. I personally don't think we should limit ourselves to doing the exact same optimizations across all architectures, but I can see why some people may find that useful.
Re: GCC mini-summit - compiling for a particular architecture
On Apr 20, 2007, at 6:42 PM, Robert Dewar wrote: One possibility would be to have a -Om switch (or whatever) that says do all optimizations for this machine that help.

Ick, gross. No.

I must say the rule about all optimizations being the same on all machines seems odd to me

I'd look at it this way: it isn't unreasonable to have cost metrics that are in fact different for each cpu, and possibly each tune choice, that greatly affect _any_ codegen choice. Sure, we can unroll the loops always on all targets, but we can crank up the costs of extra instructions on chips where those costs are high; net result, almost no unrolling. For chips where the costs are cheap and they need exposed instructions to be able to optimize further, trivially, the costs involved are totally different. Net result, better code gen for each.

I do however think the concept of not allowing targets to set and unset optimization choices is, well, overly pedantic.