GCC mini-summit
We held a GCC mini-summit at Google on Wednesday, April 18. About 40 people came. This is my very brief summary of what we talked about. Corrections and additions very welcome.

The goal of the mini-summit was just to let gcc developers meet face to face and talk. There was no goal of actually making any decisions, and, indeed, no decisions were made.

If you comment on some particular item here, I'd like to suggest that you change the Subject line so that conversations can be tracked more easily.

I didn't do a good job of recording who said what. And indeed most of this is from memory. I apologize for any mistakes or slights. They are entirely unintentional.

1) Introductions.

2) A phone call with Uday Khedker and other gcc developers in India from IIT Bombay in Mumbai. David Edelsohn arranged for this to happen. The call had to be from another room, and I didn't participate, so I don't have any details.

3) A discussion of gcc 4.2 and the gcc release process in general.

Nobody objected to shipping gcc 4.2 today, even though it does not meet the goal of fewer than 100 regressions from an earlier release. People did comment that we should fix all the P1 bugs before shipping.

We count regressions from any old release. That means that we have open bugs which were regressions in 3.x, but we are still counting them as release-blocking regressions. There were a few suggestions about how to avoid this:

a) only count regressions from the last two releases.
b) discount old regressions over time, perhaps by lowering their priority.
c) add voting to bugzilla for which old regressions should count as release-blocking. Somebody suggested using the number of e-mail addresses CC'ed on the bug as votes.
d) some people commented that this was a good thing, since it encourages us to fix old regressions, and that we should not avoid it at all.

Somebody observed that our focus on regressions can cause us to ignore important wrong-code bugs which should be fixed even if they are not actually regressions--perhaps they are bugs in new features which were not present in earlier versions.

4) A discussion of dataflow.

Ken Zadeck described the current state of dataflow branch. It seems stable, and just about within the compilation time guidelines set by the SC. He will do more testing and retesting this weekend, and hopes to commit it to mainline quite soon, maybe as early as next week if the testing goes well.

5) A discussion of tuples, the IR, and LTO.

Diego Novillo described the tuples proposal, which is an incremental change to the IR.

Ken Zadeck described the current state of his LTO work. He described LTO as having three parts: writing out types, which is being done in DWARF; serializing trees; eliminating langhooks.

Discussion about writing out types in DWARF, since apparently several new DWARF attributes had to be invented to capture everything that gcc cares about. Michael Eager pointed out that if the new attributes would be of use for the debugger, he and the DWARF standardization committee would like to hear about them.

Discussion of LTO and IMA; Geoff Keating said that if LTO works better than IMA, IMA should be removed. There was some concern that LTO would be slower than IMA, since in some ways it has to do more work, since the trees have to be written out and read in; however, in other ways it does less work.

Discussion of a middle-end type system. Daniel Berlin said that it wasn't clear that we really needed a type system. He proposed that the middle-end use structural equivalence for the type system, and that front-ends annotate types which appear the same but are not. Interestingly, the types_compatible_p langhook is only used for C and C++ at present. People talked about the consequences for debugging information, with no clear outcome. Frontends would have to annotate types with their alias set in order for TBAA to work properly.

6) Lunch at the main Google cafeteria.

7) Daniel Berlin described the most recent draft of GPLv3. His conclusion is that, unlike earlier drafts, it is not substantially different from GPLv2. There was a fair amount of discussion. In particular there was some concern about the libgcc exception license, and whether that could ever go away. The general feeling was that that was unlikely.

8) A discussion of register allocation. This didn't go too far as none of the people working on register allocation were there. Diego Novillo briefly described Vlad Makarov's work and mentioned Andrew MacLeod's work. David Edelsohn mentioned Peter Bergner's work. There was some discussion of eliminating reload as the first step, and of taking pieces out of reload as the first incremental step.

9) A discussion of maintainership and code review. We discussed the
Re: GCC mini-summit - benchmarks
On 20 Apr 2007, at 08:30, Ian Lance Taylor wrote:

> 11) H.J. Lu discussed SPEC CPU 2006. He reported that a couple of the tests do not run successfully, and it appears to be due to bugs in the tests which cause gcc to compile them in unexpected ways. He has been reporting the problems to the SPEC committee, but is being ignored. He encouraged other people to try it themselves and make their own reports.

I'm not sure what 'tests' means here... Are test cases being extracted from the SPEC CPU2006 sources? Or are you referring to the validity tests of the SPEC framework itself (to check whether the output generated by some binary conforms with their reference output)?

> 12) Jose Dana reported his results comparing different versions of gcc and icc at CERN. They have a lot of performance sensitive C++ code, and have gathered a large number of standalone snippets which they use for performance comparisons. Some of these can directly become bugzilla enhancement requests. In general this could ideally be turned into a free performance testsuite. All the code is under free software licenses.

Is this performance testsuite available somewhere? Sounds interesting to add to my (long) list of benchmark suites.

greetings,
Kenneth

--
Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital (Aaron Levenstein)

Kenneth Hoste
ELIS - Ghent University
[EMAIL PROTECTED]
http://www.elis.ugent.be/~kehoste
Re: GCC mini-summit - compiling for a particular architecture
On 20 Apr 2007, at 08:30, Ian Lance Taylor wrote:

> 13) Michael Meissner raised the idea of compiling functions differently for different processors, choosing the version based on a runtime decision. This led to some discussion of how this could be done effectively. In particular if there is an architectural difference, such as Altivec, you may get prologue instructions which save and restore registers which are not present on all architectures.

Related to this: have you guys ever considered making the -On flags dependent on the architecture?

Now, the decision which flags are activated for the -On flags is independent of the architecture I believe (except for flags which need to be disabled to ensure correct code generation, such as -fschedule-insns for x86). I must say I haven't looked into this in great detail, but at least for the passes controlled by flags on x86, this seems to be the case.

I think choosing the flags as a function of the architecture you are compiling for might be highly beneficial (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31528 for example).

greetings,
Kenneth
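The runtime-selection idea in item 13 can be illustrated by hand with a function pointer. This is only a sketch under stated assumptions: the names sum_generic, sum_tuned and select_sum are hypothetical, the "tuned" version is ordinary C unrolled by four standing in for a processor-specific variant, and the has_feature flag stands in for whatever hardware probe a real program would use. It is not how gcc would implement the feature.

```c
#include <stddef.h>

/* Common signature shared by every variant of the routine. */
typedef long (*sum_fn)(const int *values, size_t count);

/* Baseline version: assumed safe on every processor in the family. */
static long sum_generic(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Stand-in for a processor-specific version. Here it is plain C
 * unrolled by four; in the scenario discussed it would be compiled
 * with extra instructions (for example vector ones) enabled. */
static long sum_tuned(const int *values, size_t count)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= count; i += 4)
        total += (long)values[i] + values[i + 1]
               + values[i + 2] + values[i + 3];
    for (; i < count; i++)      /* leftover elements */
        total += values[i];
    return total;
}

/* has_feature is a hypothetical flag; a real program would probe the
 * hardware or operating system once at startup. */
static sum_fn select_sum(int has_feature)
{
    return has_feature ? sum_tuned : sum_generic;
}
```

A caller would do the selection once, for example `sum_fn sum = select_sum(has_feature);`, and call through the pointer afterwards. Keeping each variant in its own function also keeps any processor-specific register saves out of the generic path, which is the prologue concern raised in the summary.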
Re: GCC mini-summit - compiling for a particular architecture
> Related to this: have you guys ever considered making the -On flags dependent on the architecture?

It came up in a few side conversations. As I understand it, RMS has decreed that the -On optimizations shall be architecture independent. That said, there are generic optimizations which really only apply to a single architecture, so there is some precedent for bending this rule.

There were also suggestions of making the order of optimizations command line configurable and allowing dynamically loaded libraries to register new passes.

Ollie
arm,gcc and dsp instructions
Hello all,

I'm working with an arm core 9260EJ-S under Linux (Linux kernel 2.6.15; armv5l-linux toolchain with compiler gnu gcc 3.4.2). I would like to take advantage of the DSP-like assembly instructions the core provides. I compile this way:

    arm-linux-gnu -msoft-float -mtune=arm926ejs -S mul.c

The generated code sometimes uses these DSP assembly instructions (I think critical parts should be written in assembly), but is there any kind of patch or derivative to tell gcc to improve the generated code by using these 1-cycle instructions more often?

Thanks in advance.

Victor Librado Sancho
Re: arm,gcc and dsp instructions
On Apr 20, 2007, at 1:24 AM, Victor Librado wrote:

> Hello all,
>
> I'm working with an arm core 9260EJ-S under Linux (Linux kernel 2.6.15; armv5l-linux toolchain with compiler gnu gcc 3.4.2). I would like to take advantage of the DSP-like assembly instructions the core provides. I compile this way:
>
>     arm-linux-gnu -msoft-float -mtune=arm926ejs -S mul.c
>
> The generated code sometimes uses these DSP assembly instructions (I think critical parts should be written in assembly), but is there any kind of patch or derivative to tell gcc to improve the generated code by using these 1-cycle instructions more often?

You'll probably want to use -march=variant so that it will generate the instructions. Tuning is otherwise just costs, and some universal features of the core that would also run on more generic arm chips.

-eric
Re: arm,gcc and dsp instructions
On Fri, 2007-04-20 at 10:24 +0200, Victor Librado wrote:

> Hello all,
>
> I'm working with an arm core 9260EJ-S under Linux (Linux kernel 2.6.15; armv5l-linux toolchain with compiler gnu gcc 3.4.2). I would like to take advantage of the DSP-like assembly instructions the core provides. I compile this way:
>
>     arm-linux-gnu -msoft-float -mtune=arm926ejs -S mul.c
>
> The generated code sometimes uses these DSP assembly instructions (I think critical parts should be written in assembly), but is there any kind of patch or derivative to tell gcc to improve the generated code by using these 1-cycle instructions more often?

You should really be asking this sort of question on gcc-help...

As Eric has already pointed out, -mtune does not affect the set of available instructions that the compiler can choose from, it just adjusts the preference tables that are used for making a choice. To change the available instructions for the arm926ej-s you need to use -march=armv5tej. However, the flag -mcpu=arm926ej-s is short-hand for setting both the -mtune and the -march flags appropriately: in your case, I think you should use that.

Given the above, will GCC emit any of the DSP instructions available on that core? Answer: not many. It does occasionally use the additional multiply instructions, but the saturating operations do not map onto normal C constructs and so aren't used.

There's a current development project for the compiler to add support for fixed-point arithmetic (http://gcc.gnu.org/wiki/FixedPointArithmetic). Once that project is integrated, gcc could be adapted to use the 926's dsp instructions in some cases.

R.
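To illustrate why saturating operations "do not map onto normal C constructs", here is a minimal sketch of a signed 32-bit saturating add written in portable C. The function name sat_add32 is hypothetical. The point is that the C source needs a widening add and two comparisons, so the compiler would have to recognise this whole idiom before it could replace it with a single saturating instruction.

```c
#include <stdint.h>

/* Signed 32-bit add that clamps to the representable range instead of
 * wrapping. Written with a 64-bit intermediate so that the C code
 * itself never overflows. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;

    if (wide > INT32_MAX)
        return INT32_MAX;   /* clamp on positive overflow */
    if (wide < INT32_MIN)
        return INT32_MIN;   /* clamp on negative overflow */
    return (int32_t)wide;
}
```

Until the compiler can express such operations directly, hand-written assembly for the critical routines remains the practical option the original poster mentions.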
Re: GCC mini-summit - unicorn with rainbows
> 10) Eric Christopher reported that Tom Tromey (who was not present) had suggested a new mascot for gcc: a unicorn with rainbows. This was met with general approval, and Eric suggested that everybody e-mail Tom with their comments.

I personally would like to see the drawing. This sounds fantastic. Rainbow tattoos, or more like shooting a rainbow out of the uni-horn? So cool!

Ian, thanks for hosting this, for championing the idea of a free mini-conf, and for writing up what was talked about for the rest of us. I'm sorry to have missed the ice cream.

-benjamin
[M32C] Incorrect Frame information generated
Hi,

I have built a tool chain for m32c target using the latest sources. I am using a third party debugger to debug the application built using this tool chain. However, I am not able to view the complete call stack. It seems that the .debug_frame section is not generating the correct unwind information.

Please consider the .debug_frame section generated for the following code:

    int foo();
    int foo2();

    int main()
    {
      int j = 23;
      j += foo();
      return 0;
    }

    int foo()
    {
      int k = 45;
      int l = 56;
      int m = 6;
      m += foo2();
      return (m+l+k) ;
    }

    int foo2()
    {
      int n = 9;
      int o = 26;
      return (n+o) ;
    }

Command used to compile the code:

    m32c-elf-gcc -mcpu=r8c main.c -g -nostartfiles

Command used to display debug information:

    m32c-elf-readelf -w a.out

///
The section .debug_frame contains:

    0010 CIE
      Version: 1
      Augmentation:
      Code alignment factor: 1
      Data alignment factor: -1
      Return address column: 13

      DW_CFA_def_cfa: r12 ofs 3
      DW_CFA_offset: r13 at cfa-3
      DW_CFA_nop
      DW_CFA_nop

    0014 0014 FDE cie= pc=c000..c013
      DW_CFA_advance_loc: 3 to c003
      DW_CFA_def_cfa_offset: 5
      DW_CFA_def_cfa_reg: r11
      DW_CFA_offset: r11 at cfa-5
      DW_CFA_nop

    002c 0014 FDE cie= pc=c013..c035
      DW_CFA_advance_loc: 3 to c016
      DW_CFA_def_cfa_offset: 5
      DW_CFA_def_cfa_reg: r11
      DW_CFA_offset: r11 at cfa-5
      DW_CFA_nop

    0044 0014 FDE cie= pc=c035..c04a
      DW_CFA_advance_loc: 3 to c038
      DW_CFA_def_cfa_offset: 5
      DW_CFA_def_cfa_reg: r11
      DW_CFA_offset: r11 at cfa-5
      DW_CFA_nop
///

The Frame Description Entries (FDE) generated for all the functions are identical. The number of local variables are different in all the functions. Therefore the frame size allocated for every function will be different. But this is not reflected in the debug information.

The stack adjustment (accounting for the local variables) is emitted correctly in the function prologue using enter instruction. But it seems that the debug frame information for this stack adjustment is not emitted.

Is my understanding correct? If yes, how can I generate the information that will account for the local variables?

Regards,
Ina Pandit
KPIT Cummins InfoSystems Ltd.
Pune, India

~~
Free download of GNU based tool-chains for Renesas' SH, H8, R8C, M16C and M32C Series. The following site also offers free technical support to its users. Visit http://www.kpitgnutools.com for details. Latest versions of KPIT GNU tools were released on February 6, 2007.
~~
Re: assign numbers to warnings; treat selected warnings as errors
[adjusting Subject and also forwarding to [EMAIL PROTECTED]

On Wed, 2007-04-18 at 12:12 -0700, Vivek Rao wrote:

> Here is a feature of g95 that I would like to see in gfortran. G95 assigns numbers to warnings and allows selected warnings to be treated as errors.
>
> [...]
>
>     g95 -Wall -Wextra -Werror=113,115,137 xunused.f90
>
> turns those warnings into errors. Gfortran does not assign numbers to warnings, and the option -Werror turns ALL warnings into errors. I'd like finer control.

This does sound like a useful feature, not only for gfortran, but for all of gcc.

Thoughts, comments?

Thomas
Re: New option: -fstatic-libgfortran
On Fri, 20 Apr 2007, François-Xavier Coudert wrote:

> Attached is a first draft of a patch to add a -fstatic-libgfortran option. This new option is recognized by the driver and instead of

I think -static-libgfortran (no initial f) would be a better spelling, for consistency with -static-libgcc.

--
Joseph S. Myers
[EMAIL PROTECTED]
Re: GCC mini-summit
On 4/20/07, Vladimir N. Makarov [EMAIL PROTECTED] wrote:

> I am afraid that merging it earlier stops progress on the df infrastructure (e.g. Ken will work only on LTO)

There's nothing holding you, and many others, back from helping out, other than that the work is on a branch. By merging, the rest of the community will hopefully start helping to exploit the good things of the df framework.

> and that also further transition of some optimizations to the df infrastructure will make code even slower and finally again we will have slower compiler with worse code.

Ah, speculation. Why do you think this? Have you even looked at what is _really_ going on? Like, some optimizations computing things they already have available if they'd use the df infrastructure?

Maybe you can be more specific about your concerns, instead of spreading FUD.

Gr.
Steven
Re: GCC mini-summit
Vladimir N. Makarov wrote:

> And I disagree that it is within compilation time guidelines set by SC. Ken fixed a big compilation time degradation a few days ago and preliminary what I see now (comparison with the last merge point) is
>
>              SPECInt2000   SPECFp2000
>     x86_64   5.7%          8.7%
>     ppc64    6.5%          5.5%
>     Itanium  9%            10.9%
>
> Besides as I understand correctly the SC criteria means that there is no degradation on code quality. There is code size degradation about 1% and some degradation on SPEC2000 (e.g. about 2% degradation on a few tests on ia64).

I'll be away for a week, so I'll miss most of the big flamewar, but I'd just like to throw in my opinion that based on these numbers I don't see why we're even considering it for inclusion at this point. I also agree with Vlad's point that it needs to be reviewed before being committed to mainline.

Bernd

--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen
Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, Vincent Roche, Joseph E. McDonough
Re: GCC mini-summit
On 4/20/07, Vladimir N. Makarov [EMAIL PROTECTED] wrote:

> Did not I write several times that the data structure of DF is too fat (because rtl info duplication) and that is probably the problem?

Yes, you have complained that you believe the data structure of DF is too fat. I guess that is a valid complaint. I don't see the rtl info duplication though. You've only complained about the current data structures, but I have not really seen you propose anything leaner.

> Is it not reasonable that using more fat structures without changing algorithms makes compiler slower?

No, because the "without changing algorithms" is what you're assuming. But if you look at e.g. regmove, you see that many of the insn chain walks could easily be replaced with simpler/faster reg-def chains, which are always available in the df framework.

Likewise for the changed registers for CPROP, which runs three times(!). It is easy to really speed this pass up by using the df framework to e.g. replace things like oprs_unchanged_p and the whole reg_avail_mess.

Gr.
Steven
Re: GCC mini-summit
Steven Bosscher wrote:

> On 4/20/07, Vladimir N. Makarov [EMAIL PROTECTED] wrote:
>
>> Did not I write several times that the data structure of DF is too fat (because rtl info duplication) and that is probably the problem?
>
> Yes, you have complained that you believe the data structure of DF is too fat. I guess that is a valid complaint. I don't see the rtl info duplication though. You've only complained about the current data structures, but I have not really seen you propose anything leaner.

I did propose to attach the info to rtl reg. It is a more difficult way than the current approach because of code sharing before the reload. But maybe it is more rewarding. To be honest I don't know.

>> Is it not reasonable that using more fat structures without changing algorithms makes compiler slower?
>
> No, because the "without changing algorithms" is what you're assuming. But if you look at e.g. regmove, you see that many of the insn chain walks could easily be replaced with simpler/faster reg-def chains, which are always available in the df framework.
>
> Likewise for the changed registers for CPROP, which runs three times(!). It is easy to really speed this pass up by using the df framework to e.g. replace things like oprs_unchanged_p and the whole reg_avail_mess.

Sorry, I wrote several times that I am not against a df-infrastructure because I believed in the example you just wrote too. But according to your logic the compiler should be faster. It did not happen yet because the df advantages are less significant than its disadvantages for now, imho.

Your argument tells me that if you find another such optimization, rewrite it and speed it up, the df advantages will overtake its disadvantages and it can be merged without my objections (although who cares about my opinion). That is another proposal of mine.
Re: GCC mini-summit
On 4/20/07, Vladimir N. Makarov [EMAIL PROTECTED] wrote:

>> Yes, you have complained that you believe the data structure of DF is too fat. I guess that is a valid complaint. I don't see the rtl info duplication though. You've only complained about the current data structures, but I have not really seen you propose anything leaner.
>
> I did propose to attach the info to rtl reg. It is a more difficult way than the current approach because of code sharing before the reload. But maybe it is more rewarding. To be honest I don't know.

Changing the dataflow solvers is one thing. Changing a fundamental assumption in the compiler, namely that pseudoregs are shared, is another. We rely on that property in so many places. It would not require rewriting parts of the compiler, but rewriting the entire backend.

Gr.
Steven
Re: [M32C] Incorrect Frame information generated
Those frame offsets are relative to $fp, not $sp. *Those* offsets are the same for those functions. Your debugger needs to interpret the DW_CFA_def_cfa_reg codes.
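A minimal sketch of that point, assuming a much-simplified model of how a debugger tracks the CFA rule. The enum, struct and function names are hypothetical and are not taken from any real debugger or DWARF library. The three operations mirror the DW_CFA_def_cfa, DW_CFA_def_cfa_offset and DW_CFA_def_cfa_reg entries in the dump, with column r12 taken to be the stack pointer and r11 the frame pointer, as the dump suggests.

```c
#include <stdint.h>

/* Simplified subset of call-frame operations (hypothetical names). */
enum cfa_op { OP_DEF_CFA, OP_DEF_CFA_OFFSET, OP_DEF_CFA_REGISTER };

struct cfa_insn {
    enum cfa_op op;
    unsigned reg;       /* used by OP_DEF_CFA and OP_DEF_CFA_REGISTER */
    uint32_t offset;    /* used by OP_DEF_CFA and OP_DEF_CFA_OFFSET */
};

/* The rule being tracked: CFA = value of register `reg` + `offset`. */
struct cfa_rule {
    unsigned reg;
    uint32_t offset;
};

/* Apply the operations in order, as an unwinder would. */
static struct cfa_rule apply_cfa_insns(const struct cfa_insn *insns, int n)
{
    struct cfa_rule rule = { 0, 0 };
    for (int i = 0; i < n; i++) {
        switch (insns[i].op) {
        case OP_DEF_CFA:            /* sets both register and offset */
            rule.reg = insns[i].reg;
            rule.offset = insns[i].offset;
            break;
        case OP_DEF_CFA_OFFSET:     /* changes the offset only */
            rule.offset = insns[i].offset;
            break;
        case OP_DEF_CFA_REGISTER:   /* changes the base register only */
            rule.reg = insns[i].reg;
            break;
        }
    }
    return rule;
}

/* Evaluate the rule against the current register values. */
static uint32_t compute_cfa(struct cfa_rule rule, const uint32_t *regs)
{
    return regs[rule.reg] + rule.offset;
}
```

After the three operations from any of the FDEs the rule is "r11 + 5". Since r11 does not move when space for locals is taken from the stack pointer, the same rule is valid for all three functions even though their frame sizes differ.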
Does vectorizer support extension?
Hi Dorit,

SSE4 has vector zero/sign-extensions like:

    (define_insn "sse4_1_zero_extendv2siv2di2"
      [(set (match_operand:V2DI 0 "register_operand" "=x")
            (zero_extend:V2DI
              (vec_select:V2SI
                (match_operand:V4SI 1 "nonimmediate_operand" "xm")
                (parallel [(const_int 0)
                           (const_int 1)]))))]
      "TARGET_SSE4_1"
      "pmovzxdq\t{%1, %0|%0, %1}"
      [(set_attr "type" "ssemov")
       (set_attr "mode" "TI")])

Does vectorizer support them?

Thanks.

H.J.
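For context, the kind of source loop that would need such a widening conversion looks roughly like the sketch below. The function name widen_u32_to_u64 is hypothetical, and nothing here claims that any particular gcc version vectorizes this loop; it only shows where a 32-bit to 64-bit zero-extension arises in plain C.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy an array of 32-bit unsigned values into 64-bit slots. Each
 * assignment is an implicit zero-extension, which is the per-element
 * operation the quoted pattern performs on two elements at once. */
void widen_u32_to_u64(uint64_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```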
Re: [MIPS] MADD issue
Nigel Stephens [EMAIL PROTECTED] writes:

> OK, so maybe as the person who removed adddi3 from the MIPS backend, and the main proponent of the new fused opcodes, you get to volunteer to implement this? :)

Hey, I was pretty happy with the status quo ;)

Richard
Re: [MIPS] MADD issue
Richard Sandiford wrote:

> Nigel Stephens [EMAIL PROTECTED] writes:
>
>> I notice that at least the 32-bit rs6000, i386, sparc, frv, sh, cris, mcore, score, arm, pa backends still implement adddi3 as either a define_insn which outputs two instructions or an explicit define_expand followed by define_split and associated sub patterns. Are we setting the bar too high for MIPS? :)
>
> I don't think that follows. The main reason that ports like rs6000, i386, arm, sparc and pa define adddi3 is because those architectures provide special add-with-carry type instructions, or similar architecture-specific optimisations.

Right, good point.

> MIPS has nothing like that.

Actually the MIPS DSP ASE does have addsc and addwc, which could be used for this purpose. Sadly not subsc and subwc, though.

> The old MIPS patterns just re-implemented the standard optabs version (and often did so less efficiently, as I said before).
>
>> Whilst I'm sure that your proposal is the right one going forward, it still feels like it could be significant amount of work to implement. And the simplified optab/expand support would only work for multiply-add sequences expressed within a single expression, and wouldn't be able to optimise disjoint multiply, then add expressions. Or have I missed something.
>
> I don't think that follows either. Out-of-ssa should coalesce them if they are in the same basic block. And if they aren't in the same basic block, that's presumably because the tree optimisers think they shouldn't be. Combine wouldn't look across basic block boundaries either AFAIK.
>
> E.g. compiling:
>
>     typedef unsigned long long ull;
>     typedef unsigned int ui;
>
>     ull foo (ui x, ui y, ull z)
>     {
>       ull tmp = (ull) x * y;
>       return tmp + z;
>     }
>
> with a recent snapshot and -O2 -fdump-tree-all shows that the final_cleanup dump does indeed have the combined form:
>
>     ;; Function foo (foo)
>
>     foo (x, y, z)
>     {
>     <bb 2>:
>       return (long long unsigned int) y * (long long unsigned int) x + z;
>     }

Ah I see. Fair enough.

> I realise no-one else has spoken out in support of me, so perhaps I'm in a minority of one here. But it does seem to me that in the Tree-SSA world, it makes less sense to duplicate standard optabs in the backend purely for the reason of keeping DImode arithmetic around as DImode arithmetic for longer.
>
>> in the short term we really do need to reenable madd/msub for MIPS32 targets in GCC 4. We could do that with a local patch to put back adddi3, but it would be better if we could coordinate this with you.
>
> If removing the patterns had been purely a clean-up, I would be more open to the idea of putting the patterns back. But given that removing the patterns had an optimisation benefit of its own, I'm less open to the idea of adding them back, especially when (as far as I'm concerned) there's a clean way of getting the best of both worlds.

OK, so maybe as the person who removed adddi3 from the MIPS backend, and the main proponent of the new fused opcodes, you get to volunteer to implement this? :)

In the meantime the performance gain from being able to use a widening madd is more important to us than the benefit of improved optimisation of 64-bit addition, so we'll probably have to put adddi3 back in as a local patch.

Nigel
Re: GCC -On optimization passes: flag and doc issues
Kenneth Hoste [EMAIL PROTECTED] writes:

> A related question: how is decided which priority a bug gets?

In general the release manager, Mark Mitchell, sets the priorities of bugs in the bug database. He follows general guidelines where wrong-code is more important, primary platforms are more important, etc.

That said, setting priorities has only a weak effect on which bugs actually get fixed. Which bugs are fixed is more related to which ones catch the interest of individual developers, and which ones are related to customers of people and organizations which provide paid support for gcc.

>>> - the -fipa-X flags are not mentioned in the 4.1.2 documentation
>>
>> Looks like a bug.
>
> A documentation bug you mean?

Yes.

>>> - the documentation for fmove-loop-invariants and ftree-loop-optimize mentions they are enabled at -O1 when they are not
>>
>> Actually I think they are enabled at -O1. This is done in a slightly tricky way: they default to being on, but they have no effect unless the other loop optimization passes are run. I think it would be appropriate to clean this up to make the code more obvious.
>
> Hmm, okay, thank you. Do you know of any other flags being activated like this in a 'hidden' way?

I don't know. One way to find some would be to look at common.opt for the variables which have Init(1), and consider whether they are really enabled at all times or whether they only take effect when other flags are turned on. A general case is that many passes are only run when optimization is turned on.

There are also interactions which may be unexpected, such as the way that -frename-registers is turned on if -funroll-loops or -fpeel-loops is used. Look in process_options() in toplev.c for those sorts of interactions.

>>> - When new optimization passes are completed, how is decided where they go to (-O1, -O2, -O3, -Os, none)? For example, why did funit-at-a-time switch from -O2 to -O1 a while ago? And, as I noticed from the 4.3.0 docs, why is fsplit-wide-types enabled at -O1, and not -O2? Is there some testing done, or is it just people saying I think X belongs there?
>>
>> Where to put optimizations is a trade-off between the amount of time they take and the amount of good they do. The testing which is done is compilation time testing.
>
> Hmm, okay. I still feel this is a bit ad-hoc. What about flags that don't get activated in any of the -On flags? Are these not stable enough yet on important architectures, or are there other reasons?

Flags which are not enabled at any optimization option are generally considered to not pay off in significant numbers of cases: they sometimes help and sometimes hurt. For example, -funroll-all-loops. Ideally, anybody who uses -funroll-all-loops would benchmark the code with and without the option when deciding whether to use it.

We try to avoid flags which are unstable. There have been a few here and there over the years, but I'm not aware of any at present. That said obviously flags which are not turned on by default get less testing.

But, yes, it is definitely ad-hoc. If somebody had a more organized proposal, it would be considered seriously. The current assumption is that we will converge on appropriate settings over time.

Ian
Re: assign numbers to warnings; treat selected warnings as errors
On 20/04/07, Joseph S. Myers [EMAIL PROTECTED] wrote:

> On Fri, 20 Apr 2007, Thomas Koenig wrote:
>
>> This does sound like a useful feature, not only for gfortran, but for all of gcc.
>
> GCC has -Werror=foo in 4.2 or later (with warning option names, not numbers). That gives you the command-line syntax and semantics; to use it in gfortran, you'd need either to use the common diagnostics infrastructure or add the feature to the Fortran-specific diagnostics code.

Not only that, but you can do -Werror -Wno-error=foo, to get errors for everything except -Wfoo. Also, you can do -fdiagnostics-show-option to find out which -Wfoo option generates each warning message. It is unfortunate that this is missing from gcc-4.2/changes.html.

Cheers,
Manuel.
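A small illustration of the options mentioned in this thread, assuming GCC 4.2 or later as stated above. The file name warn_demo.c, the function scale and the DEMO_UNUSED macro are hypothetical. The unused variable is only compiled in when DEMO_UNUSED is defined, so the file builds cleanly by default.

```c
/* warn_demo.c (hypothetical file name)
 *
 * Promote only the unused-variable warning to an error:
 *   gcc -Wall -Werror=unused-variable -DDEMO_UNUSED -c warn_demo.c
 *
 * Make every warning an error except unused-variable, which is then
 * still reported as a plain warning:
 *   gcc -Wall -Werror -Wno-error=unused-variable -DDEMO_UNUSED -c warn_demo.c
 */

int scale(int x)
{
#ifdef DEMO_UNUSED
    int unused_tmp;     /* never read: reported by -Wunused-variable */
#endif
    return 2 * x;
}
```

The first command is expected to stop at the unused variable, while the second should report it and still produce an object file.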
Re: GCC mini-summit
Steven Bosscher wrote:

> On 4/20/07, Vladimir N. Makarov [EMAIL PROTECTED] wrote:
>
>> I am afraid that merging it earlier stops progress on the df infrastructure (e.g. Ken will work only on LTO)
>
> There's nothing holding you, and many others, back from helping out, other than that the work is on a branch. By merging, the rest of the community will hopefully start helping to exploit the good things of the df framework.
>
>> and that also further transition of some optimizations to the df infrastructure will make code even slower and finally again we will have slower compiler with worse code.
>
> Ah, speculation. Why do you think this? Have you even looked at what is _really_ going on? Like, some optimizations computing things they already have available if they'd use the df infrastructure?
>
> Maybe you can be more specific about your concerns, instead of spreading FUD.

Steven, could you stop spreading FUD about me spreading FUD? Did not I write several times that the data structure of DF is too fat (because rtl info duplication) and that is probably the problem? Is it not reasonable that using more fat structures without changing algorithms makes compiler slower?
Re: [MIPS] MADD issue
Richard Sandiford wrote:

> Nigel Stephens [EMAIL PROTECTED] writes:
>
>> While I agree with you philosophically, it feels like (b) might be quite a major task. A number of optimisation passes which currently recognise MUL and PLUS separately (e.g. loop strength reduction) would now need to be extended to handle the fused MULPLUS and MULSUB operators. And although the reduction in instruction count due to your previous change is good, what is it as a percentage of the total? After all it only helps code which uses 64-bit integer types with a 32-bit ABI, which is probably quite a small proportion of most real-life applications -- whereas for some algorithms the ability to use MADD is absolutely critical to performance, and for them losing the ability to generate MADD is a significant backward step for the compiler.
>>
>> How about, as a workaround until (b) sees the light of day, we reimplement adddi3 and subdi3 only (not the other di mode patterns), qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented more cleanly nowadays, using define_insn_and_split and/or a # template, to avoid generating multi-instruction assembler sequences.
>
> The old patterns had a define_split too. That wasn't really the problem.
>
> If you don't want to add a tree code yet, it would still be possible to add the optab and expand support, recognising mult-add sequences in a similar way to how we recognise widening multiplies now. I feel at least that's a step in the right direction.
>
> Richard

I notice that at least the 32-bit rs6000, i386, sparc, frv, sh, cris, mcore, score, arm, pa backends still implement adddi3 as either a define_insn which outputs two instructions or an explicit define_expand followed by define_split and associated sub patterns. Are we setting the bar too high for MIPS? :)

Whilst I'm sure that your proposal is the right one going forward, it still feels like it could be significant amount of work to implement. And the simplified optab/expand support would only work for multiply-add sequences expressed within a single expression, and wouldn't be able to optimise disjoint multiply, then add expressions. Or have I missed something.

in the short term we really do need to reenable madd/msub for MIPS32 targets in GCC 4. We could do that with a local patch to put back adddi3, but it would be better if we could coordinate this with you.

Nigel
Re: [MIPS] MADD issue
Nigel Stephens [EMAIL PROTECTED] writes:
> I notice that at least the 32-bit rs6000, i386, sparc, frv, sh, cris, mcore, score, arm and pa backends still implement adddi3 as either a define_insn which outputs two instructions or an explicit define_expand followed by define_split and associated sub patterns. Are we setting the bar too high for MIPS? :)

I don't think that follows. The main reason that ports like rs6000, i386, arm, sparc and pa define adddi3 is because those architectures provide special add-with-carry type instructions, or similar architecture-specific optimisations. MIPS has nothing like that. The old MIPS patterns just re-implemented the standard optabs version (and often did so less efficiently, as I said before).

> Whilst I'm sure that your proposal is the right one going forward, it still feels like it could be a significant amount of work to implement. And the simplified optab/expand support would only work for multiply-add sequences expressed within a single expression, and wouldn't be able to optimise disjoint multiply, then add expressions. Or have I missed something?

I don't think that follows either. Out-of-ssa should coalesce them if they are in the same basic block. And if they aren't in the same basic block, that's presumably because the tree optimisers think they shouldn't be. Combine wouldn't look across basic block boundaries either AFAIK. E.g. compiling:

    typedef unsigned long long ull;
    typedef unsigned int ui;
    ull foo (ui x, ui y, ull z)
    {
      ull tmp = (ull) x * y;
      return tmp + z;
    }

with a recent snapshot and -O2 -fdump-tree-all shows that the final_cleanup dump does indeed have the combined form:

    ;; Function foo (foo)

    foo (x, y, z)
    {
    <bb 2>:
      return (long long unsigned int) y * (long long unsigned int) x + z;
    }

I realise no-one else has spoken out in support of me, so perhaps I'm in a minority of one here. But it does seem to me that in the Tree-SSA world, it makes less sense to duplicate standard optabs in the backend purely for the reason of keeping DImode arithmetic around as DImode arithmetic for longer.

> In the short term we really do need to reenable madd/msub for MIPS32 targets in GCC 4. We could do that with a local patch to put back adddi3, but it would be better if we could coordinate this with you.

If removing the patterns had been purely a clean-up, I would be more open to the idea of putting the patterns back. But given that removing the patterns had an optimisation benefit of its own, I'm less open to the idea of adding them back, especially when (as far as I'm concerned) there's a clean way of getting the best of both worlds.

Richard
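For anyone who wants to repeat the comparison discussed in this message, here is a minimal sketch that puts the single-expression form and the split form side by side. The function names madd_single and madd_split are made up for illustration. The sketch only shows that both forms compute the same value; whether a given compiler version ends up with the same combined form for both is something to check in the dumps, not something this code guarantees.

```c
#include <stdint.h>

typedef uint64_t ull;
typedef uint32_t ui;

/* Multiply-add written as one expression: a widening 32x32->64
   multiply followed by a 64-bit add.  */
ull madd_single (ui x, ui y, ull z)
{
  return (ull) x * y + z;
}

/* The same computation with the multiply and the add in separate
   statements, as in the foo() example from the message.  */
ull madd_split (ui x, ui y, ull z)
{
  ull tmp = (ull) x * y;
  return tmp + z;
}
```

Compiling both functions with -O2 -fdump-tree-all, as in the message, and comparing the late tree dumps would be one way to see whether the split form is coalesced on a particular snapshot.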
Re: [MIPS] MADD issue
Richard Sandiford [EMAIL PROTECTED] writes: I realise no-one else has spoken out in support of me, so perhaps I'm in a minority of one here. But it does seem to me that in the Tree-SSA world, it makes less sense to duplicate standard optabs in the backend purely for the reason of keeping DImode arithmetic around as DImode arithmetic for longer. The main issue I know of is the RTL level loop optimizers (combine and CSE can mostly work off of REG_EQUAL notes). If you define_expand adddi3, they won't be able to handle loops using long long types. Whether this matters in practice for real code, I don't know. Certainly adddi3 and friends should not be straight define_insns, as they used to be for MIPS. With the lower-subreg pass, they should be either define_expand to individual insns or define_insn_and_split with an unconditional split before reload. Ian
Re: [MIPS] MADD issue
Ian Lance Taylor [EMAIL PROTECTED] writes: Richard Sandiford [EMAIL PROTECTED] writes: I realise no-one else has spoken out in support of me, so perhaps I'm in a minority of one here. But it does seem to me that in the Tree-SSA world, it makes less sense to duplicate standard optabs in the backend purely for the reason of keeping DImode arithmetic around as DImode arithmetic for longer. The main issue I know of is the RTL level loop optimizers (combine and CSE can mostly work off of REG_EQUAL notes). If you define_expand adddi3, they won't be able to handle loops using long long types. Whether this matters in practice for real code, I don't know. My point was that I thought the interesting multi-word optimisations (including loop optimisations) ought to be done at the tree level instead, and that the main focus of the RTL optimisers ought to be optimising things after machine-specific information has been exposed. In contrast, the MIPS define_insn define_splits existed specifically to avoid exposing machine-specific information to those optimisations. I'm not sure from your reply whether you disagree (although it sounds like you might). Certainly adddi3 and friends should not be straight define_insns, as they used to be for MIPS. With the lower-subreg pass, they should be either define_expand to individual insns or define_insn_and_split with an unconditional split before reload. Well, the old patterns had define_splits too. I don't think that was really the problem. Richard
Re: [MIPS] MADD issue
Richard Sandiford [EMAIL PROTECTED] writes: Ian Lance Taylor [EMAIL PROTECTED] writes: Richard Sandiford [EMAIL PROTECTED] writes: I realise no-one else has spoken out in support of me, so perhaps I'm in a minority of one here. But it does seem to me that in the Tree-SSA world, it makes less sense to duplicate standard optabs in the backend purely for the reason of keeping DImode arithmetic around as DImode arithmetic for longer. The main issue I know of is the RTL level loop optimizers (combine and CSE can mostly work off of REG_EQUAL notes). If you define_expand adddi3, they won't be able to handle loops using long long types. Whether this matters in practice for real code, I don't know. My point was that I thought the interesting multi-word optimisations (including loop optimisations) ought to be done at the tree level instead, and that the main focus of the RTL optimisers ought to be optimising things after machine-specific information has been exposed. In contrast, the MIPS define_insn define_splits existed specifically to avoid exposing machine-specific information to those optimisations. I'm not sure from your reply whether you disagree (although it sounds like you might). I suppose I neither agree nor disagree. It's a matter for testing. It's clear that with our present scheme there are loop optimization opportunities at the RTL level in the form of hoisting new loop invariants created by expanding the addressing modes. And, of course, some machines have specific loop instructions which currently can only be handled at the RTL level. However, those should be more or less independent of adddi3. Ian
Re: [MIPS] MADD issue
Ian Lance Taylor [EMAIL PROTECTED] writes: Richard Sandiford [EMAIL PROTECTED] writes: Ian Lance Taylor [EMAIL PROTECTED] writes: Richard Sandiford [EMAIL PROTECTED] writes: I realise no-one else has spoken out in support of me, so perhaps I'm in a minority of one here. But it does seem to me that in the Tree-SSA world, it makes less sense to duplicate standard optabs in the backend purely for the reason of keeping DImode arithmetic around as DImode arithmetic for longer. The main issue I know of is the RTL level loop optimizers (combine and CSE can mostly work off of REG_EQUAL notes). If you define_expand adddi3, they won't be able to handle loops using long long types. Whether this matters in practice for real code, I don't know. My point was that I thought the interesting multi-word optimisations (including loop optimisations) ought to be done at the tree level instead, and that the main focus of the RTL optimisers ought to be optimising things after machine-specific information has been exposed. In contrast, the MIPS define_insn define_splits existed specifically to avoid exposing machine-specific information to those optimisations. I'm not sure from your reply whether you disagree (although it sounds like you might). I suppose I neither agree nor disagree. It's a matter for testing. It's clear that with our present scheme there are loop optimization opportunities at the RTL level in the form of hoisting new loop invariants created by expanding the addressing modes. And, of course, some machines have specific loop instructions which currently can only be handled at the RTL level. Right. In case my message might be interpreted as saying that we shouldn't have RTL loop optimisers, that wasn't the intention at all. MIPS definitely benefits from them with %hi accesses, etc. It was more the opposite: splitting the instructions early ought to give the RTL optimisers the opportunity to do more things that the tree optimisers simply couldn't. 
OTOH, trying to give the RTL optimisers the same sort of arithmetic operations that the tree level had seems like going out of our way to make the RTL optimisers repeat work. However, those should be more or less independent of adddi3. Yeah, I hope so. Richard
Re: GCC mini-summit - unicorn with rainbows
10) Eric Christopher reported that Tom Tromey (who was not present) had suggested a new mascot for gcc: a unicorn with rainbows. This was met with general approval, and Eric suggested that everybody e-mail Tom with their comments. I personally would like to see the drawing. On Fri, Apr 20, 2007 at 06:02:46AM -0400, Benjamin Kosnik wrote: This sounds fantastic. Rainbow tattoos, or more like shooting a rainbow out of the uni-horn? I originally proposed the gnu emerging from an egg idea, it was based on the fact that some of us were pronouncing egcs eggs, and suggested that the compiler was being re-born. But we haven't been egcs for a long time, and I no longer like the existing logo much. So I'm fine with the unicorn idea. (I had to leave the summit before this topic came up). Ian, thanks for hosting this, for championing the idea of a free mini-conf, and for writing up what was talked about for the rest of us. Agreed; thanks Ian. I'm sorry to have missed the ice cream. I missed the ice cream, but I did get free lunch in the famous Google cafeteria.
Re: GCC mini-summit - compiling for a particular architecture
On Fri, Apr 20, 2007 at 12:58:39AM -0700, Ollie Wild wrote: Related to this: have you guys ever considered to making the -On flags dependent on the architecture? It came up in a few side conversations. As I understand it, RMS has decreed that the -On optimizations shall be architecture independent. But decrees of this kind from RMS (on purely technical matters) are negotiable. On matters of free software principle, RMS is the law. On technical matters he's (IMHO) just one hacker, though as the original author of gcc he should get respect and a certain amount of deference. If champions of this idea can make the case that the benefits outweigh the costs by a significant factor, it could be considered. But there are considerable costs: paths that get lots of testing are solid; paths that get less testing aren't. If every port uses a different set of optimizations we will see more target-specific bugs that really aren't.
tuples: initial infrastructure
Hi folks.

I have some preliminary code laying out the tuples infrastructure as has been documented in the tuples design document. I'd like to get this out for public review sooner rather than later, to make sure I'm not taking any wrong approaches, and folks know where things are.

I'll be reviving the gimple-tuples-branch again, and starting to offload things there. The current plan is to disable the passes that haven't been converted, start with the gimplifier and out-of-ssa, then work towards the middle. I'm currently working on the gimplifier. I also suspect we'll have to write a tuples back to trees pass for the rtl expander, while the expander is converted.

Richard, Diego, could y'all take a look at this?

A few points...

I have currently converted gimplify_return_expr() in the gimplifier to give an idea of how I envision things happening. Trees are converted into a sequence, similar to the way we expand rtl, and the current {pre,post}_q's become sequences that get passed around. Sequences are just a structure with a first and last gimple statement (see struct gs_sequence). At rth's suggestion, all tuple fields `body' are now sequences, instead of trees.

I am a little uncomfortable with gimple_statement_structure() when it comes to determining what subcode to use for GS_ASSIGN. Richard had pointed out that it was too convoluted. I simplified things a bit, but it seems the underlying problem may be that we may be overloading GS_ASSIGN too much. I don't know. Suggestions welcome.

This is just a general idea to make sure we're on the same page, or at least the same chapter.
Aldy

Index: gengtype.c
===================================================================
--- gengtype.c (revision 123746)
+++ gengtype.c (working copy)
@@ -1534,7 +1534,7 @@ open_base_files (void)
       "hard-reg-set.h", "basic-block.h", "cselib.h", "insn-addr.h",
       "optabs.h", "libfuncs.h", "debug.h", "ggc.h", "cgraph.h",
       "tree-flow.h", "reload.h", "cpp-id-data.h", "tree-chrec.h",
-      "cfglayout.h", "except.h", "output.h", NULL
+      "cfglayout.h", "except.h", "output.h", "gimple-ir.h", NULL
     };
     const char *const *ifp;
     outf_p gtype_desc_c;
Index: tree-gimple.h
===================================================================
--- tree-gimple.h (revision 123746)
+++ tree-gimple.h (working copy)
@@ -24,6 +24,7 @@ Boston, MA 02110-1301, USA.  */
 
 
 #include "tree-iterator.h"
+#include "gimple-ir.h"
 
 extern tree create_tmp_var_raw (tree, const char *);
 extern tree create_tmp_var_name (const char *);
@@ -110,13 +111,13 @@ enum gimplify_status {
   GS_ALL_DONE = 1 /* The expression is fully gimplified.  */
 };
 
-extern enum gimplify_status gimplify_expr (tree *, tree *, tree *,
+extern enum gimplify_status gimplify_expr (tree *, gs_seq, tree *, tree *,
                                            bool (*) (tree), fallback_t);
 extern void gimplify_type_sizes (tree, tree *);
 extern void gimplify_one_sizepos (tree *, tree *);
 extern void gimplify_stmt (tree *);
 extern void gimplify_to_stmt_list (tree *);
-extern void gimplify_body (tree *, tree, bool);
+extern void gimplify_body (tree *, gs_seq, tree, bool);
 extern void push_gimplify_context (void);
 extern void pop_gimplify_context (tree);
 extern void gimplify_and_add (tree, tree *);
Index: gsstruct.def
===================================================================
--- gsstruct.def (revision 0)
+++ gsstruct.def (revision 0)
@@ -0,0 +1,55 @@
+/* This file contains the definitions for the gimple IR structure
+   enumeration used in GCC.
+
+   Copyright (C) 2007 Free Software Foundation, Inc.
+   Contributed by Aldy Hernandez [EMAIL PROTECTED]
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 2, or (GSS_at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING.  If not, write to the Free
+Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
+02110-1301, USA.  */
+
+/* The format of this file is
+   DEFGSSTRUCT(GSS_enumeration value, printable name).
+   Each enum value should correspond with a single member of the union
+   gimple_statement_d.
+   */
+
+DEFGSSTRUCT(GSS_BASE, "base")
+DEFGSSTRUCT(GSS_WITH_OPS, "with_ops")
+DEFGSSTRUCT(GSS_WITH_MEM_OPS, "with_mem_ops")
+DEFGSSTRUCT(GSS_OMP, "omp")
+
+DEFGSSTRUCT(GSS_BIND, "bind")
+DEFGSSTRUCT(GSS_CATCH, "catch")
+DEFGSSTRUCT(GSS_EH_FILTER, "eh_filter")
+DEFGSSTRUCT(GSS_LABEL, "label")
+DEFGSSTRUCT(GSS_PHI, "phi")
+DEFGSSTRUCT(GSS_RESX, "resx")
+DEFGSSTRUCT(GSS_TRY, "try")
+DEFGSSTRUCT(GSS_ASSIGN_BINARY, "assign_binary")
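The message describes a sequence as "just a structure with a first and last gimple statement", with the pre and post queues becoming sequences that get passed around. The following is a rough sketch of that idea only. The definition of struct gs_sequence is not shown in this excerpt, so every name here (struct stmt, struct stmt_seq, seq_init, seq_add, seq_append, seq_length) is hypothetical and not taken from the patch.

```c
#include <stddef.h>

/* Stand-in for a statement; only the chaining fields are modelled.  */
struct stmt
{
  struct stmt *prev;
  struct stmt *next;
  int code;
};

/* A sequence is only a pair of pointers to its first and last statement.  */
struct stmt_seq
{
  struct stmt *first;
  struct stmt *last;
};

void seq_init (struct stmt_seq *seq)
{
  seq->first = NULL;
  seq->last = NULL;
}

/* Link S at the end of SEQ.  */
void seq_add (struct stmt_seq *seq, struct stmt *s)
{
  s->next = NULL;
  s->prev = seq->last;
  if (seq->last != NULL)
    seq->last->next = s;
  else
    seq->first = s;
  seq->last = s;
}

/* Move every statement of SRC to the end of DST, leaving SRC empty.
   No statement is copied; only the end pointers are relinked.  */
void seq_append (struct stmt_seq *dst, struct stmt_seq *src)
{
  if (src->first == NULL)
    return;
  if (dst->last != NULL)
    {
      dst->last->next = src->first;
      src->first->prev = dst->last;
    }
  else
    dst->first = src->first;
  dst->last = src->last;
  seq_init (src);
}

/* Count the statements in SEQ by walking the chain.  */
size_t seq_length (const struct stmt_seq *seq)
{
  size_t n = 0;
  const struct stmt *s;
  for (s = seq->first; s != NULL; s = s->next)
    n++;
  return n;
}
```

With a first/last pair, joining two queues is a constant-time pointer update, which is presumably part of the appeal of passing sequences around instead of building statement-list trees.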
GCC mini-summit - Patch tracker
We discussed the patch tracker. None of the active maintainers who were there appear to use it very much or at all. This is because it does not enable them to easily review patches, only to see which they have missed ;) I proposed automatic e-mail pings, but that wasn't generally welcomed. Richard Henderson suggested that the patch tracker should have a way to quickly and easily approve a patch. A couple of people said it would be nice to be able to use the tracker to send an e-mail reply; it does provide a mechanism for that, but it is web-based rather than being part of one's usual mail reader. One possibility mentioned was to have a button for the patch tracker to send the maintainer a copy of the e-mail, which would be a convenient way to write the reply in a preferred mail reader. I am happy to do either (let it approve patches or email). For email, I could provide a nice interface that displayed the actual message/diff of the patch in question, properly quoted into a text box where one could add their comments. For approving directly, I would probably need to add usernames and passwords. If people are okay with this, I have no problem implementing it.
Re: gcc preprocessor
Ok can you tell me what directives does it provide to do what I have said. And I am not a beginner to gcc.

1. Repeating a block a certain number of times, for example repeat expr foo() end. Then you can call expr 5 to have foo called 5 times.
2. Multiline macros with new lines. Using #defines with escape characters is very clumsy and it just concatenates it into one line as far as I understand. Please correct me if that is wrong.
3. Setting a symbolic constant inside a #define, for example #set a i+1 foo(a)

Thanks for any help
dz

On 4/18/07, Joe Buck [EMAIL PROTECTED] wrote:
> On Wed, Apr 18, 2007 at 09:07:07PM -0400, drizzle drizzle wrote:
>> Can some one tell me if gcc preprocessor can support in some way the following features
> You are asking a beginner C programming question. gcc's preprocessor does what standard C preprocessors do.
RE: gcc preprocessor
On 20 April 2007 18:28, drizzle drizzle wrote: Ok can you tell me what directives does it provide to do what I have said . And I am not a beginner to gcc. Then you should have RTFMd by now. cheers, DaveK -- Can't think of a witty .sigline today
Re: GCC mini-summit - Patch tracker
Ian I proposed automatic e-mail pings, but that wasn't generally Ian welcomed. Bummer. Why? Dan If people are okay with this, I have no problem implementing it. If you're taking feature requests, it would be handy to canonize the Area field somehow. I was filtering based on preprocessor and then yesterday noticed things filed against libcpp and cpp. Alternatively, filtering by regex would work just as well for me. Tom
RE: GCC mini-summit - Patch tracker
On 20 April 2007 18:43, Tom Tromey wrote: Ian I proposed automatic e-mail pings, but that wasn't generally Ian welcomed. Bummer. Why? Dan If people are okay with this, I have no problem implementing it. If you're taking feature requests, it would be handy to canonize the Area field somehow. I was filtering based on preprocessor and then yesterday noticed things filed against libcpp and cpp. Heh. Guilty as charged. Alternatively, filtering by regex would work just as well for me. Or just suggesting a list of canonical names for people to use. cheers, DaveK -- Can't think of a witty .sigline today
Re: GCC mini-summit - Patch tracker
Dave == Dave Korn [EMAIL PROTECTED] writes: If you're taking feature requests, it would be handy to canonize the Area field somehow. I was filtering based on preprocessor and then yesterday noticed things filed against libcpp and cpp. Dave Heh. Guilty as charged. Sorry, wasn't trying to single anybody out. The problem perhaps can't be solved on the submission end since people forget, there are misspellings, etc. Alternatively, filtering by regex would work just as well for me. Dave Or just suggesting a list of canonical names for people to use. That would also help. Tom
Re: gcc preprocessor
On Fri, Apr 20, 2007 at 01:27:48PM -0400, drizzle drizzle wrote: Ok can you tell me what directives does it provide to do what I have said . And I am not a beginner to gcc. The answer is that gcc provides what the C standard specifies and nothing more. You appear to want a more complicated macro-processing language. You'll need to look elsewhere; m4 is one possibility, though I'm not a fan of it.
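As a rough illustration of what the standard preprocessor does offer for the first two requests in this thread, here is a small sketch. The macro and function names (SWAP_INT, REPEAT_1 through REPEAT_5, call_foo_five_times, swapped_first) are invented for this example. It shows a macro written across several source lines with backslash continuations, and a fixed-count repetition built from nested macros. The expanded text still ends up on one logical line, as the original poster suspected, and there is no standard directive that assigns a new value to a symbol from inside a macro body.

```c
/* A macro written over several source lines.  Each backslash-newline
   pair is spliced away before the directive is processed, so the
   preprocessor sees the body as a single logical line.  */
#define SWAP_INT(a, b) \
  do {                 \
    int tmp_ = (a);    \
    (a) = (b);         \
    (b) = tmp_;        \
  } while (0)

/* Fixed-count repetition built from nested macros.  The count is part
   of the macro name; standard C has no loop construct at this level.  */
#define REPEAT_1(stmt) stmt
#define REPEAT_2(stmt) REPEAT_1(stmt) REPEAT_1(stmt)
#define REPEAT_4(stmt) REPEAT_2(stmt) REPEAT_2(stmt)
#define REPEAT_5(stmt) REPEAT_4(stmt) REPEAT_1(stmt)

static int calls;

static void foo (void)
{
  calls++;
}

/* Expands to five consecutive calls of foo and returns how many ran.  */
int call_foo_five_times (void)
{
  calls = 0;
  REPEAT_5 (foo ();)
  return calls;
}

/* Swaps its two arguments with the multi-line macro and returns the
   new value of the first one.  */
int swapped_first (int a, int b)
{
  SWAP_INT (a, b);
  return a;
}
```

Anything beyond this, such as a repeat count chosen at the call site or true multi-line output, is where an external macro processor like the m4 mentioned above comes in.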
Re: GCC mini-summit - Patch tracker
On 20 Apr 2007 11:42:57 -0600, Tom Tromey [EMAIL PROTECTED] wrote: Ian I proposed automatic e-mail pings, but that wasn't generally Ian welcomed. Bummer. Why? Dan If people are okay with this, I have no problem implementing it. If you're taking feature requests, it would be handy to canonize the Area field somehow. I was filtering based on preprocessor and then yesterday noticed things filed against libcpp and cpp. I may just have it list all the maintenance areas. Alternatively, filtering by regex would work just as well for me. It *is* a regex :) Tom
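Since the Area filter is said to be a regex, the three spellings mentioned above could be caught with one alternation. The sketch below uses POSIX regcomp/regexec purely for illustration. How the patch tracker actually evaluates its filter is not stated in this thread, so the helper name area_matches, the flags, and the pattern syntax are all assumptions.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if AREA matches the POSIX extended regex PATTERN
   (case-insensitively), 0 if it does not, and -1 if PATTERN does not
   compile.  */
int area_matches (const char *pattern, const char *area)
{
  regex_t re;
  int rc;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB | REG_ICASE) != 0)
    return -1;
  rc = regexec (&re, area, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

A pattern such as ^(preprocessor|libcpp|cpp)$ would then accept all three area names in one query.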
gcc-4.3-20070420 is now available
Snapshot gcc-4.3-20070420 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.3-20070420/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.3 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 124004

You'll find:

gcc-4.3-20070420.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.3-20070420.tar.bz2       C front end and core compiler
gcc-ada-4.3-20070420.tar.bz2        Ada front end and runtime
gcc-fortran-4.3-20070420.tar.bz2    Fortran front end and runtime
gcc-g++-4.3-20070420.tar.bz2        C++ front end and runtime
gcc-java-4.3-20070420.tar.bz2       Java front end and runtime
gcc-objc-4.3-20070420.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.3-20070420.tar.bz2  The GCC testsuite

Diffs from 4.3-20070413 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.3 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: gcc preprocessor
Any developer sense on what it might take to extend the gcc preprocessor to do these? I have some experience with gcc front end. I am especially keen about multiline macros, so that the lines can be on separate lines. Any neat trick that can accomplish this by using #define?

dz

On 4/20/07, Joe Buck [EMAIL PROTECTED] wrote:
> On Fri, Apr 20, 2007 at 01:27:48PM -0400, drizzle drizzle wrote:
>> Ok can you tell me what directives does it provide to do what I have said. And I am not a beginner to gcc.
> The answer is that gcc provides what the C standard specifies and nothing more. You appear to want a more complicated macro-processing language. You'll need to look elsewhere; m4 is one possibility, though I'm not a fan of it.
Re: gcc preprocessor
drizzle drizzle wrote: Any developer sense on what it might take to extend the gcc preprocessor to do these ? I have some experience with gcc front end. I am especially keen abt multiline macros, so that the lines can be on separate lines. Any neat trick that can accomplish this by using #define ? The standard answer applies here: Use emacs as your editor. You don't hack up the compiler to work around deficiencies in your source code editor. David Daney
Re: gcc preprocessor
drizzle drizzle wrote:
> Any developer sense on what it might take to extend the gcc preprocessor to do these? I have some experience with gcc front end. I am especially keen about multiline macros, so that the lines can be on separate lines. Any neat trick that can accomplish this by using #define?

As was said in the answer, the intention is that the gcc preprocessor should provide what is required by the standard, nothing more, nothing less. If you want more, then follow the advice of looking at general purpose macro languages.
Re: GCC mini-summit - compiling for a particular architecture
Hello,

> Steve Ellcey wrote:
>> This seems unfortunate. I was hoping I might be able to turn on loop unrolling for IA64 at -O2 to improve performance. I have only started looking into this idea but it seems to help performance quite a bit, though it is also increasing size quite a bit too so it may need some modification of the unrolling parameters to make it practical.
> To me it is obvious that optimizations are target dependent. For instance loop unrolling is really a totally different optimization on the ia64 as a result of the rotating registers.

that we do not use. Nevertheless, there are still compelling reasons why unrolling is more useful on ia64 than on other architectures (importance of scheduling, insensitivity to code size growth).

Another option would be to consider enabling (e.g.) -funroll-loops -fprefetch-loop-arrays by default on -O3. I think it is fairly rare for these flags to cause performance regressions (although of course more measurements to support this claim would be necessary).

Zdenek
Re: GCC mini-summit - compiling for a particular architecture
Zdenek Dvorak wrote:
> Hello,
>> Steve Ellcey wrote:
>>> This seems unfortunate. I was hoping I might be able to turn on loop unrolling for IA64 at -O2 to improve performance. I have only started looking into this idea but it seems to help performance quite a bit, though it is also increasing size quite a bit too so it may need some modification of the unrolling parameters to make it practical.
>> To me it is obvious that optimizations are target dependent. For instance loop unrolling is really a totally different optimization on the ia64 as a result of the rotating registers.
> that we do not use.

Right, but we might in the future.

> Nevertheless, there are still compelling reasons why unrolling is more useful on ia64 than on other architectures (importance of scheduling, insensitivity to code size growth).

And large number of registers.

> Another option would be to consider enabling (e.g.) -funroll-loops -fprefetch-loop-arrays by default on -O3. I think it is fairly rare for these flags to cause performance regressions (although of course more measurements to support this claim would be necessary).

Well, unrolling loops blows up code size, so it has to have positive value, not merely no negative value :-)

> Zdenek
Re: GCC mini-summit - compiling for a particular architecture
Diego Novillo wrote:
> H. J. Lu wrote on 04/20/07 21:30:
>> -fprefetch-loop-arrays shouldn't be on by default since HW prefetch usually will have negative performance impact on Intel.
> We are talking about one specific architecture where it usually helps: ia64.

Right, but the follow-on discussion was: *if* we have to have the same set of optimization options for all architectures, *then* could we consider turning on loop unrolling by default.

One possibility would be to have a -Om switch (or whatever) that says do all optimizations for this machine that help.

I must say the rule about all optimizations being the same on all machines seems odd to me in another respect. What if you have an optimization that applies *only* to one machine (ia64 has a number of such possibilities!)?
Re: GCC mini-summit - compiling for a particular architecture
H. J. Lu wrote on 04/20/07 21:30: -fprefetch-loop-arrays shouldn't be on by default since HW prefetch usually will have negative performance impact on Intel. We are talking about one specific architecture where it usually helps: ia64.
Re: GCC mini-summit - compiling for a particular architecture
Robert Dewar wrote on 04/20/07 21:42: One possibility would be to have a -Om switch (or whatever) that says do all optimizations for this machine that help. I think this is a good compromise. I personally don't think we should limit ourselves to doing the exact same optimizations across all architectures, but I can see why some people may find that useful.
Re: GCC mini-summit - compiling for a particular architecture
On Apr 20, 2007, at 6:42 PM, Robert Dewar wrote:
> One possibility would be to have a -Om switch (or whatever) that says do all optimizations for this machine that help.

Ick, gross. No.

> I must say the rule about all optimizations being the same on all machines seems odd to me

I'd look at it this way: it isn't unreasonable to have cost metrics that are in fact different for each cpu, and possibly each tune choice, and that greatly affect _any_ codegen choice. Sure, we can unroll the loops always on all targets, but we can crank up the costs of extra instructions on chips where those costs are high; net result, almost no unrolling. For chips where the costs are cheap and which need exposed instructions to be able to optimize further, trivially, the costs involved are totally different. Net result, better code gen for each.

I do however think the concept of not allowing targets to set and unset optimization choices is, well, overly pedantic.
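The cost-metric argument can be made concrete with a toy model. The sketch below is not GCC's unroller or its cost interface. The struct, its fields, and the decision rule are all invented here to show how the same target-independent decision code can end up with "almost no unrolling" on one chip and aggressive unrolling on another, just by changing the per-target numbers.

```c
/* Hypothetical per-target tuning record; the field names are made up
   for this sketch and are not GCC's.  */
struct tune_costs
{
  int insn_cost;    /* relative cost charged per extra instruction */
  int branch_cost;  /* relative cost of one loop back-edge branch */
};

/* Return the largest unroll factor F in {2, 4, 8} for which
   branch_cost * trip_count exceeds insn_cost * body_insns * F,
   or 1 (no unrolling) if no such factor exists.  This is a toy rule:
   it trades branch overhead against code growth and nothing else.  */
int choose_unroll_factor (const struct tune_costs *t,
                          int body_insns, int trip_count)
{
  int best = 1;
  int f;

  for (f = 2; f <= 8; f *= 2)
    {
      long saved = (long) t->branch_cost * trip_count;
      long added = (long) t->insn_cost * body_insns * f;
      if (saved > added)
        best = f;
    }
  return best;
}
```

With a body of 10 instructions and 100 iterations, a target that charges 1 per extra instruction gets factor 8, while a target that charges 20 gets factor 1. The decision code is identical for both; only the cost table differs.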
[Bug libfortran/27740] libgfortran should use versioned symbols
--- Comment #13 from patchapp at dberlin dot org 2007-04-20 07:45 --- Subject: Bug number PR 27740 A patch for this bug has been added to the patch tracker. The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01253.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27740
[Bug libfortran/27740] libgfortran should use versioned symbols
--- Comment #14 from jb at gcc dot gnu dot org 2007-04-20 07:45 ---
Patch here: http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01253.html

Also removed the dependency on PR25709 (ISO_C_BINDING), since we don't depend on that one anymore for this functionality.

--
jb at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|25709                       |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27740
[Bug target/29826] __attribute__ dllimport makes optimization crash on cygwin
--- Comment #14 from dannysmith at users dot sourceforge dot net 2007-04-20 07:49 ---
(In reply to comment #13)
> I'm going to try again since it seems my last post was just ignored. The test case works fine on 4.2 but it still occurs under some circumstances.

If you provide preprocessed source that reproduces the circumstances, as suggested, I will not ignore it.

Danny

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29826
[Bug target/31403] wrong branch instructions generated with -m2a on sh-elf
--- Comment #2 from kkojima at gcc dot gnu dot org 2007-04-20 08:45 ---
A binary search shows that this started to fail from the revision 123295

  * config/sh/sh.md (movsi_ie): Fix memory constraints attribute length.

I'd like to add Christian to the cc list because he must be interested in this issue. movsi_ie is used also for SH2A and I didn't know that SH2A has 4-byte move instructions like mov.l reg,@(12-bit_disp,reg'). An easy fix would be the patch below, though I can't test it until the other bootstrap/regtest cycles end up.

--- ORIG/trunk/gcc/config/sh/sh.md 2007-03-29 08:44:33.0 +0900
+++ LOCAL/trunk/gcc/config/sh/sh.md 2007-04-19 20:36:20.0 +0900
@@ -4968,7 +4968,36 @@ label:
   ! move optimized away
   [(set_attr "type" "pcload_si,move,movi8,move,*,load_si,mac_gp,prget,arith,store,mac_mem,pstore,gp_mac,prset,mem_mac,pload,load,fstore,pcload_si,gp_fpul,fpul_gp,fmove,fmove,fmove,nil")
    (set_attr "late_fp_use" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes,*,*,yes,*,*,*,*")
-   (set_attr "length" "*,*,*,4,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,0")])
+   (set_attr_alternative "length"
+     [(const_int 2)
+      (const_int 2)
+      (const_int 2)
+      (const_int 4)
+      (const_int 2)
+      (if_then_else
+        (ne (symbol_ref "TARGET_SH2A") (const_int 0))
+        (const_int 4) (const_int 2))
+      (const_int 2)
+      (const_int 2)
+      (const_int 2)
+      (if_then_else
+        (ne (symbol_ref "TARGET_SH2A") (const_int 0))
+        (const_int 4) (const_int 2))
+      (const_int 2)
+      (const_int 2)
+      (const_int 2)
+      (const_int 2)
+      (const_int 2)
+      (const_int 2)
+      (const_int 2)
+      (const_int 2)
+      (const_int 2)
+      (const_int 2)
+      (const_int 2)
+      (const_int 2)
+      (const_int 2)
+      (const_int 2)
+      (const_int 0)])])
 
 (define_insn "movsi_i_lowpart"
   [(set (strict_low_part (match_operand:SI 0 "general_movdst_operand" "+r,r,r,r,r,r,r,m,r"))

--
kkojima at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kkojima at gcc dot gnu dot
                   |                            |org, christian dot bruel at
                   |                            |st dot com
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|-00-00 00:00:00             |2007-04-20 08:45:54
               date|                            |
            Summary|Problem while compiling gcc |wrong branch instructions
                   |for sh-elf                  |generated with -m2a on sh-
                   |                            |elf

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31403
[Bug libstdc++/31638] [4.0/4.1/4.2/4.3 Regression] string usage leads to warning with -Wcast-align
--- Comment #2 from pcarlini at suse dot de 2007-04-20 08:50 --- I will check, but I don't think this can possibly happen in mainline, after we fixed c++/30500. Otherwise, the underlying issue is libstdc++/19495, of course. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31638
[Bug target/31640] New: cache align alignment is too aggressive on sh-elf
The sh4 port aligns blocks that have no fallthrus and that are either frequently executed (JUMP_ALIGN) or preceded by a barrier (LABEL_ALIGN_AFTER_BARRIER) on a cache line. While in theory this helps to avoid cache misses if the block splits over 2 cache lines, in practice it reduces cache locality and lengthens the distance between blocks. The number of issued instructions is also impacted. For example, the relative indirect address in jump tables needs a byte zero-extend instruction if the distance occupies 8 bits instead of 7 bits.

I ran some experiments and benchmarked (eembc) with 2 strategies:
1) -falign-jumps=1
2) Align the block if the size is bigger than a given threshold (empirically set to 16 bytes, half of the cache line size). See the illustrating attached patch.

My conclusion is that in -O3 the performance never degrades when removing this padding (option 2 is a little bit better, even improving dhrystone by 3%), and the text size improves by ~15%. So I was not able to measure any benefit of the cache line padding, although the code size impact is big (even in -O2/-O3 a code size bloat should be motivated by some performance improvement). Is there a motivating test that justifies this micro-optimisation?

In the illustrating patch I still align the basic blocks on 4 bytes to account for better instruction fetch accesses.

-- 
           Summary: cache align alignment is too aggressive on sh-elf
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: chrbr at gcc dot gnu dot org
GCC target triplet: sh-superh-elf

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31640
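The second strategy in the report above (pad to a cache line only when the block is bigger than a threshold, otherwise keep 4-byte alignment) can be sketched roughly as follows. This is an illustration only, not the attached patch: the function names, the 32-byte cache line (implied by "16 bytes, half of the cache line size") and the default arguments are assumptions.

```cpp
#include <cassert>

// Hypothetical model of strategy 2. Returns log2 of the alignment (in bytes)
// to request for a basic block of the given size.
int block_align_log(unsigned block_size_bytes,
                    unsigned cache_line_log = 5,    // assumed 32-byte cache line
                    unsigned threshold_bytes = 16)  // half a cache line
{
    assert(cache_line_log >= 2);
    // Large blocks are padded to a cache line; small ones only to 4 bytes,
    // which keeps instruction fetch accesses aligned.
    return block_size_bytes > threshold_bytes ? static_cast<int>(cache_line_log)
                                              : 2;
}

// Number of padding bytes needed before a block that would otherwise start
// at `address`, given the requested alignment.
unsigned padding_for(unsigned address, int align_log)
{
    const unsigned align = 1u << align_log;
    return (align - address % align) % align;
}
```

For instance, a block that would start at offset 36 costs 28 bytes of padding when aligned to a 32-byte line and none when aligned to 4 bytes, which is the kind of text-size and jump-distance growth the report describes.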
[Bug target/31640] cache block alignment is too aggressive on sh-elf
-- 

chrbr at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |minor
            Summary|cache align alignment is too|cache block alignment is too
                   |aggressive on sh-elf        |aggressive on sh-elf

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31640
[Bug tree-optimization/31632] [4.1 regression] ICE in compare_values
--- Comment #4 from jakub at gcc dot gnu dot org 2007-04-20 12:40 ---

Subject: Bug 31632

Author: jakub
Date: Fri Apr 20 12:40:47 2007
New Revision: 123988

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=123988
Log:
	PR tree-optimization/31632
	* fold-const.c (fold_binary): Use op0 and op1 instead of arg0 and arg1
	for optimizations of comparison against min/max values. Fold arg0 to
	arg1's type for optimizations of comparison against min+1 and max-1
	values.

	* gcc.c-torture/compile/20070419-1.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/compile/20070419-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
    trunk/gcc/testsuite/ChangeLog

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31632
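For readers unfamiliar with the "comparison against min/max values" folding that the log entry above refers to, the sketch below checks the underlying identities exhaustively on a small integer type. It only illustrates the arithmetic facts a folder can rely on; it is not code from fold-const.c, and the function name is made up for this example.

```cpp
#include <cassert>
#include <limits>

// For every value x of a small integer type T, check the identities that
// hold when x is compared against the type's maximum or maximum minus one.
template <typename T>
bool max_comparison_identities_hold()
{
    const int max = std::numeric_limits<T>::max();
    const int min = std::numeric_limits<T>::min();
    assert(min < max);
    for (int v = min; v <= max; ++v) {
        const T x = static_cast<T>(v);
        if (x > max) return false;                         // x >  MAX    is never true
        if (!(x <= max)) return false;                     // x <= MAX    is always true
        if ((x >= max) != (x == max)) return false;        // x >= MAX    is  x == MAX
        if ((x < max) != (x != max)) return false;         // x <  MAX    is  x != MAX
        if ((x > max - 1) != (x == max)) return false;     // x >  MAX-1  is  x == MAX
        if ((x <= max - 1) != (x != max)) return false;    // x <= MAX-1  is  x != MAX
    }
    return true;
}
```

The identities depend on using the maximum of the operand's own type, which is presumably why the choice of operand and type matters in the fix described above.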
[Bug tree-optimization/31602] Overflow warning causes GDB -Werror build failure
--- Comment #3 from ian at airs dot com 2007-04-20 16:17 ---

Created an attachment (id=13394)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13394&action=view)
Proposed patch

This patch fixes the test case in the PR. I am testing it. It would be interesting to hear whether it also fixes the actual code in gdb.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31602
[Bug target/31640] cache block alignment is too aggressive on sh-elf
--- Comment #1 from chrbr at gcc dot gnu dot org 2007-04-20 14:13 ---

Created an attachment (id=13391)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13391&action=view)
Illustrative patch to not align small basic blocks

I used this patch to reduce the number of basic blocks aligned on cache lines. Not aligning blocks smaller than 16 bytes (I also tried 32 bytes) seems to give the best results. Note that never aligning doesn't degrade eembc perfs (similar to -falign-jumps=1).

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31640
[Bug target/31641] [4.1/4.2/4.3 Regression] ICE in s390_expand_setmem, at config/s390/s390.c:3618
--- Comment #1 from tbm at cyrius dot com 2007-04-20 15:25 ---

Created an attachment (id=13392)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13392&action=view)
testcase

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31641
[Bug target/31641] New: [4.1/4.2/4.3 Regression] ICE in s390_expand_setmem, at config/s390/s390.c:3618
[ Forwarded from http://bugs.debian.org/395533 ]

We're seeing the following ICE on s390. This is with gcc 4.1, but I can also reproduce it with current 4.3 from SVN. I haven't tried 4.0, but 3.4 works.

raptor% /usr/lib/gcc-snapshot/bin/g++ -c -O min4.c
min4.c: In member function 'void StringTest::constructor()':
min4.c:64: internal compiler error: in s390_expand_setmem, at config/s390/s390.c:3618
Please submit a full bug report,

-- 
           Summary: [4.1/4.2/4.3 Regression] ICE in s390_expand_setmem, at
                    config/s390/s390.c:3618
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tbm at cyrius dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31641
[Bug tree-optimization/29789] Missed invariant out of the loop with conditionals and shifts
--- Comment #5 from rguenth at gcc dot gnu dot org 2007-04-20 15:33 ---

Note that this fixes the loop invariant motion only in the case where the two ifs can be merged (because that re-instantiates the A & (1 << B) form). The following parts are still not resolved:

void quantum_cnot(int control, int target, unsigned long *state, int size)
{
  int i;

  for(i=0; i<size; i++)
    {
      if (state[i] & ((unsigned long) 1 << control))
        state[i] ^= ((unsigned long) 1 << target);
    }
}

(and more similar loops in libquantum). It would be nice if rtl loop-invariant motion could detect this form:

(insn 23 22 24 4 (parallel [
            (set (reg:DI 67)
                (lshiftrt:DI (reg:DI 62 [ D.1992 ])
                    (subreg:QI (reg/v:SI 63 [ control ]) 0)))
            (clobber (reg:CC 17 flags))
        ]) 470 {*lshrdi3_1_rex64} (nil)
    (nil))

(insn 24 23 25 4 (parallel [
            (set (reg:SI 68)
                (and:SI (subreg:SI (reg:DI 67) 0)
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) 301 {*andsi_1} (nil)
    (nil))

and move the invariant (1 << control). It does move the (1 << target), which looks like

(insn 30 28 31 5 (set (reg:DI 70)
        (const_int 1 [0x1])) 82 {*movdi_1_rex64} (nil)
    (nil))

(insn 31 30 32 5 (parallel [
            (set (reg:DI 69)
                (ashift:DI (reg:DI 70)
                    (subreg:QI (reg/v:SI 64 [ target ]) 0)))
            (clobber (reg:CC 17 flags))
        ]) 411 {*ashldi3_1_rex64} (nil)
    (expr_list:REG_EQUAL (ashift:DI (const_int 1 [0x1])
            (subreg:QI (reg/v:SI 64 [ target ]) 0))
        (nil)))

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29789
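As an illustration of what the requested invariant motion amounts to for the quantum_cnot loop in PR 29789, here is a hand-hoisted version next to the loop as written. This is only a source-level sketch; the names cnot_plain, cnot_hoisted and same_result are invented for the example, and the shift counts are restricted to 0..31 so the code behaves the same where unsigned long is 32 bits.

```cpp
#include <cassert>
#include <vector>

// The loop as written in the report: both masks are recomputed on every
// iteration unless the compiler hoists them.
void cnot_plain(int control, int target, unsigned long *state, int size)
{
    for (int i = 0; i < size; i++) {
        if (state[i] & (1UL << control))
            state[i] ^= (1UL << target);
    }
}

// Hand-hoisted equivalent: neither mask depends on i or on state[],
// so both can be computed once before the loop.
void cnot_hoisted(int control, int target, unsigned long *state, int size)
{
    const unsigned long control_mask = 1UL << control;
    const unsigned long target_mask = 1UL << target;
    for (int i = 0; i < size; i++) {
        if (state[i] & control_mask)
            state[i] ^= target_mask;
    }
}

// Runs both versions on copies of the same input and compares the results.
bool same_result(int control, int target, const std::vector<unsigned long> &input)
{
    assert(control >= 0 && control < 32 && target >= 0 && target < 32);
    std::vector<unsigned long> a = input, b = input;
    cnot_plain(control, target, a.data(), static_cast<int>(a.size()));
    cnot_hoisted(control, target, b.data(), static_cast<int>(b.size()));
    return a == b;
}
```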
[Bug c/31642] New: -O2 -fno-guess-branch-probability -fno-tree-ch -fno-tree-dominator-opts -fno-tree-lrs -fno-tree-dce -fno-tree-vrp -funit-at-a-time -ftree-copy-prop -ftree-copyrename yields an infi
Using -O2 -fno-guess-branch-probability -fno-tree-ch -fno-tree-dominator-opts -fno-tree-lrs -fno-tree-dce -fno-tree-vrp -funit-at-a-time -ftree-copy-prop -ftree-copyrename on the SPEC CPU2000 benchmark 176.gcc yields an infinite loop (with the train input, a run that usually takes about 2s goes on for hours).

This is confirmed on Fedora Core 4 / x86 (Intel Pentium 4), and also on a Linux/amd64 system for which I have no details.

Removing any of the -fno-X flags seems to solve the issue, and adding -ffloat-store doesn't change a thing, which suggests this is not floating-point (convergence) related.

-- 
           Summary: -O2 -fno-guess-branch-probability -fno-tree-ch
                    -fno-tree-dominator-opts -fno-tree-lrs -fno-tree-dce
                    -fno-tree-vrp -funit-at-a-time -ftree-copy-prop
                    -ftree-copyrename yields an infinite loop in SPEC
                    benchmark 176.gcc
           Product: gcc
           Version: 4.1.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: kenneth dot hoste at elis dot ugent dot be
  GCC host triplet: i686-Linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31642
[Bug tree-optimization/29789] Missed invariant out of the loop with conditionals and shifts
--- Comment #6 from rakdver at atrey dot karlin dot mff dot cuni dot cz 2007-04-20 16:09 ---

Subject: Re:  Missed invariant out of the loop with conditionals and shifts

> void quantum_cnot(int control, int target, unsigned long *state, int size)
> {
>   int i;
>
>   for(i=0; i<size; i++)
>     {
>       if (state[i] & ((unsigned long) 1 << control))
>         state[i] ^= ((unsigned long) 1 << target);
>     }
> }
>
> (and more similar loops in libquantum). It would be nice if rtl
> loop-invariant motion could detect this form:
>
> (insn 23 22 24 4 (parallel [
>             (set (reg:DI 67)
>                 (lshiftrt:DI (reg:DI 62 [ D.1992 ])
>                     (subreg:QI (reg/v:SI 63 [ control ]) 0)))
>             (clobber (reg:CC 17 flags))
>         ]) 470 {*lshrdi3_1_rex64} (nil)
>     (nil))
>
> (insn 24 23 25 4 (parallel [
>             (set (reg:SI 68)
>                 (and:SI (subreg:SI (reg:DI 67) 0)
>                     (const_int 1 [0x1])))
>             (clobber (reg:CC 17 flags))
>         ]) 301 {*andsi_1} (nil)
>     (nil))
>
> and move the invariant (1 << control).

how exactly do you imagine this transformation should work? The insns you show essentially are

tmp = x >> control;
z = tmp & 1;

I do not see how to transform just this pattern profitably (without also seeing that z is only used in condition). I could hack something in, however handling just this single pattern specially seems a bit weird to me.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29789
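The difficulty raised in the comment above is that the RTL tests the bit as (x >> control) & 1, where no subexpression is loop invariant because x is the loaded word, whereas the source form x & (1 << control) contains an invariant mask. The two forms agree only when the result is used as a truth value, which matches the remark about z being used only in a condition. A small sketch of that equivalence, with function names chosen for this example:

```cpp
#include <cassert>

// Form seen in the RTL: shift the loaded word, then test bit 0.
// Nothing here can be hoisted, since x changes on every iteration.
bool test_by_shifting_word(unsigned long x, int control)
{
    return ((x >> control) & 1UL) != 0;
}

// Mask that depends on `control` only, hence loop invariant.
unsigned long make_mask(int control)
{
    assert(control >= 0 && control < 32);
    return 1UL << control;
}

// Form written in the source: test the word against the invariant mask.
bool test_by_mask(unsigned long x, unsigned long mask)
{
    return (x & mask) != 0;
}

// Checks that both forms give the same truth value for every bit position.
bool forms_agree(unsigned long x)
{
    for (int control = 0; control < 32; ++control) {
        if (test_by_shifting_word(x, control) != test_by_mask(x, make_mask(control)))
            return false;
    }
    return true;
}
```

Note that the integer values differ (the shifted form yields 0 or 1, the masked form 0 or the mask), so the rewrite is valid only for a use as a condition.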
[Bug target/31640] cache block alignment is too aggressive on sh-elf
--- Comment #2 from chrbr at gcc dot gnu dot org 2007-04-20 15:51 ---

Created an attachment (id=13393)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13393&action=view)
testcase for new instruction introduced by increased distance

In this example, the max distance between the jump table and the cases is artificially increased by the padding, although each basic block is very small and has very little chance of spreading over several cache blocks. In addition the extu.b r1,r1 instruction can be avoided.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31640
[Bug tree-optimization/29789] Missed invariant out of the loop with conditionals and shifts
--- Comment #4 from rguenth at gcc dot gnu dot org 2007-04-20 13:51 ---

Mine. I have a tree if-conversion patch that produces

L0:;
  D.1993 = MEM[base: state, index: ivtmp.32, step: 8];
  if (pretmp.25 == (pretmp.25 & D.1993)) goto L1; else goto L3;

L1:;
  MEM[base: state, index: ivtmp.32, step: 8] = 1 << target ^ D.1993;

L3:;
  ivtmp.32 = ivtmp.32 + 1;
  if (size > (int) ivtmp.32) goto L0; else goto L6;

and finally

.L4:
        movq    (%r11,%r9,8), %rdi
        movq    %rsi, %rax
        andq    %rdi, %rax
        cmpq    %rax, %rsi
        jne     .L5
        xorq    %rdx, %rdi
        movq    %rdi, (%r11,%r9,8)
.L5:
        addq    $1, %r9
        cmpl    %r9d, %r8d
        jg      .L4

for the inner loop.

-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rguenth at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2006-11-10 01:55:58         |2007-04-20 13:51:52
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29789
[Bug tree-optimization/31632] [4.1 regression] ICE in compare_values
--- Comment #5 from jakub at gcc dot gnu dot org 2007-04-20 12:46 ---

Subject: Bug 31632

Author: jakub
Date: Fri Apr 20 12:46:06 2007
New Revision: 123990

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=123990
Log:
	PR tree-optimization/31632
	* fold-const.c (fold_binary): Use op0 and op1 instead of arg0 and arg1
	for optimizations of comparison against min/max values. Fold arg0 to
	arg1's type for optimizations of comparison against min+1 and max-1
	values.

	* gcc.c-torture/compile/20070419-1.c: New test.

Added:
    branches/gcc-4_2-branch/gcc/testsuite/gcc.c-torture/compile/20070419-1.c
Modified:
    branches/gcc-4_2-branch/gcc/ChangeLog
    branches/gcc-4_2-branch/gcc/fold-const.c
    branches/gcc-4_2-branch/gcc/testsuite/ChangeLog

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31632
[Bug tree-optimization/31632] [4.1 regression] ICE in compare_values
--- Comment #6 from jakub at gcc dot gnu dot org 2007-04-20 12:49 ---

Subject: Bug 31632

Author: jakub
Date: Fri Apr 20 12:49:37 2007
New Revision: 123992

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=123992
Log:
	PR tree-optimization/31632
	* fold-const.c (fold_binary): Use op0 and op1 instead of arg0 and arg1
	for optimizations of comparison against min/max values. Fold arg0 to
	arg1's type for optimizations of comparison against min+1 and max-1
	values.

	* gcc.c-torture/compile/20070419-1.c: New test.

Added:
    branches/gcc-4_1-branch/gcc/testsuite/gcc.c-torture/compile/20070419-1.c
Modified:
    branches/gcc-4_1-branch/gcc/ChangeLog
    branches/gcc-4_1-branch/gcc/fold-const.c
    branches/gcc-4_1-branch/gcc/testsuite/ChangeLog

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31632
[Bug tree-optimization/29789] Missed invariant out of the loop with conditionals and shifts
--- Comment #7 from rguenth at gcc dot gnu dot org 2007-04-20 16:59 --- I posted a patch for the tree level im instead to handle this special case right after the reciprocal special case. I agree it's a special case, but as it happens in spec 2k6... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29789
[Bug target/29826] __attribute__ dllimport makes optimization crash on cygwin
--- Comment #15 from arcangelpip at hotmail dot com 2007-04-20 17:06 --- Don't you just hate idiots in these cases. (Yes, I am referring to myself here) Well, it's completely broken again on my system, look here. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31636 I apologise for being less than helpful here. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29826
[Bug java/31622] Segment violation in the “toString” method on a mathematical expression
--- Comment #4 from tromey at gcc dot gnu dot org 2007-04-20 17:39 --- Please post the output of running gcj -C -v Fail.java Thanks. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31622
[Bug middle-end/31642] -O2 -fno-guess-branch-probability -fno-tree-ch -fno-tree-dominator-opts -fno-tree-lrs -fno-tree-dce -fno-tree-vrp -funit-at-a-time -ftree-copy-prop -ftree-copyrename yields an i
--- Comment #1 from pinskia at gcc dot gnu dot org 2007-04-20 18:24 ---

We need a testcase.

-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu dot
                   |                            |org
             Status|UNCONFIRMED                 |WAITING
          Component|c                           |middle-end

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31642
[Bug c++/17000] parse error calling member template function of non-lvalue from within template class member
--- Comment #10 from ddneilson at gmail dot com 2007-04-20 18:51 ---

(In reply to comment #9)
> (In reply to comment #8)
> > The work around is doing:
> > get_a().template func2<int>();
> >
> > Which tells the compiler for sure func2 is a template.
>
> Thanks, yeh I figured that out just now. Should it happen like this though?
> Surely the compiler should be able to work out it's a template and therefore
> not need the qualification?

I was just about to report a similar bug, but it was also fixed by adding a template qualifier to the function call. However, when studying the bug I ran my test case through the xlC compiler. It compiles get_a().func2<int>(); just fine; it doesn't need the get_a().template func2<int>(); syntax.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17000
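For reference, the workaround syntax discussed in PR 17000 looks like this in a complete example. The types A and Holder and the function call_through_holder are invented for illustration and are not the test case from the PR. In this sketch the object expression is genuinely type-dependent, which is the situation where the language requires the `template` keyword; whether the code in the PR falls into that category is exactly what the comments above are debating.

```cpp
#include <cassert>

struct A {
    // Member template called with an explicit template argument.
    template <typename T>
    T func2() const { return T(42); }
};

template <typename U>
struct Holder {
    // Returns a temporary (a non-lvalue), as in the PR title.
    U get_a() const { return U(); }

    int call() const
    {
        // get_a() has the dependent type U, so while parsing the template
        // the compiler cannot know that func2 names a member template.
        // The 'template' keyword says that the following '<' opens a
        // template argument list rather than being a less-than operator.
        return get_a().template func2<int>();
    }
};

int call_through_holder()
{
    assert(sizeof(Holder<A>) >= 1);
    return Holder<A>().call();
}
```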
[Bug c/31642] -O2 -fno-guess-branch-probability -fno-tree-ch -fno-tree-dominator-opts -fno-tree-lrs -fno-tree-dce -fno-tree-vrp -funit-at-a-time -ftree-copy-prop -ftree-copyrename yields an infinite l
--- Comment #2 from kenneth dot hoste at elis dot ugent dot be 2007-04-20 19:24 ---

(In reply to comment #1)
> We need a testcase.

I would provide one if I knew how, I'm quite new at this... Any pointers?

-- 

kenneth dot hoste at elis dot ugent dot be changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |c

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31642
[Bug c++/31498] [4.1/4.2/4.3 Regression] ICE with vector initializer in template
--- Comment #1 from janis at gcc dot gnu dot org 2007-04-20 19:35 --- This starts passing for me between 2007-03-10 and 2007-03-20. Andrew, if it fails for you with a later mainline than that, perhaps it's an intermittent failure rather than a regression. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31498
[Bug tree-optimization/31602] Overflow warning causes GDB -Werror build failure
--- Comment #4 from drow at gcc dot gnu dot org 2007-04-20 20:04 ---

Subject: Re:  Overflow warning causes GDB -Werror build failure

On Fri, Apr 20, 2007 at 03:17:19PM -, ian at airs dot com wrote:
> This patch fixes the test case in the PR. I am testing it. It would be
> interesting to hear whether it also fixes the actual code in gdb.

With this patch I can successfully build GDB for mips-linux - thanks! (barring PR31605).

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31602
[Bug c++/31498] [4.1 Regression] ICE with vector initializer in template
--- Comment #2 from pinskia at gcc dot gnu dot org 2007-04-20 20:30 ---

I was testing at the time 4.3.0 20070306 and I tested with yesterday's trunk and it passes. It also works on the 4.2 branch as of 4.2.0 20070415.

-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|3.4.0 4.0.2 4.1.0 4.2.0     |3.4.0 4.0.2 4.1.0 4.2.0
                   |4.3.0                       |
      Known to work|3.3.3                       |3.3.3 4.3.0
            Summary|[4.1/4.2/4.3 Regression] ICE|[4.1 Regression] ICE with
                   |with vector initializer in  |vector initializer in
                   |template                    |template

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31498
[Bug target/28623] [4.1/4.2/4.3 regression] ICE in extract_insn, at recog.c:2077 (nrecognizable insn) [alpha]
--- Comment #4 from rth at gcc dot gnu dot org 2007-04-20 20:36 ---

Subject: Bug 28623

Author: rth
Date: Fri Apr 20 20:35:55 2007
New Revision: 124002

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=124002
Log:
	PR target/28623
	* config/alpha/alpha.c (get_unaligned_address): Remove extra_offset
	argument; update all callers.
	(get_unaligned_offset): New.
	* config/alpha/alpha.md (extendqidi2, extendhidi2): Don't use
	get_unaligned_address, just pass on the address directly.
	(unaligned_extendqidi): Use gen_lowpart instead of open-coding
	the subreg in the helper patterns.
	(unaligned_extendqidi_le): Use get_unaligned_offset.
	(unaligned_extendqidi_be, unaligned_extendhidi_le): Likewise.
	(unaligned_extendhidi_be): Likewise.
	(unaligned_extendhidi): Tidy.
	* config/alpha/alpha-protos.h: Update.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/alpha/alpha-protos.h
    trunk/gcc/config/alpha/alpha.c
    trunk/gcc/config/alpha/alpha.md

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28623
[Bug middle-end/31642] -O2 -fno-guess-branch-probability -fno-tree-ch -fno-tree-dominator-opts -fno-tree-lrs -fno-tree-dce -fno-tree-vrp -funit-at-a-time -ftree-copy-prop -ftree-copyrename yields an i
--- Comment #3 from kenneth dot hoste at elis dot ugent dot be 2007-04-20 20:51 ---

identifying source code lines corresponding to infinite loop using GDB:

(gdb) backtrace
#0  regclass (f=0x9ac29f4, nregs=71) at regclass.c:732
#1  0x08065d5c in rest_of_compilation (decl=0x9b4526c) at toplev.c:3056
#2  0x0805713b in finish_function (nested=0) at c-decl.c:6791
#3  0x08048a25 in yyparse () at c-parse.c:1684
#4  0x08066a83 in compile_file (name=Variable name is not available.
) at toplev.c:2227
#5  0x080681a3 in main (argc=4, argv=0xbf945ea4) at toplev.c:3921
(gdb) n
736           else if ((GET_CODE (insn) == INSN
(gdb) n
717       for (insn = f; insn; insn = NEXT_INSN (insn))
(gdb) n
729           if (GET_CODE (insn) == NOTE
(gdb) n
732           else if (GET_CODE (insn) == NOTE
(gdb) n
736           else if ((GET_CODE (insn) == INSN
(gdb) n
717       for (insn = f; insn; insn = NEXT_INSN (insn))
(gdb) backtrace
#0  regclass (f=0x9ac29f4, nregs=71) at regclass.c:717
#1  0x08065d5c in rest_of_compilation (decl=0x9b4526c) at toplev.c:3056
#2  0x0805713b in finish_function (nested=0) at c-decl.c:6791
#3  0x08048a25 in yyparse () at c-parse.c:1684
#4  0x08066a83 in compile_file (name=Variable name is not available.
) at toplev.c:2227
#5  0x080681a3 in main (argc=4, argv=0xbf945ea4) at toplev.c:3921
(gdb)

The program seems stuck in the for-loop at line 717. The value of insn seems to be cycling (it takes different values, but cycles back to a previous value).

Any pointers on how to reduce this further are welcome...

Since this is GCC code (pretty old though; 176.gcc corresponds to a 2.x version of GCC, I believe), maybe someone can shed some light on this? I'm not at all familiar with GCC's insides...

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31642
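One way to confirm the observation above, that following the next pointers eventually returns to an earlier element, is a standard cycle check on the chain. The sketch below uses Floyd's two-pointer technique on a stand-in Node type; it is not GCC's rtx or NEXT_INSN, and the helper demo_chain_has_cycle exists only to build test chains.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for an element of a singly linked chain (not GCC's rtx).
struct Node {
    Node *next;
};

// Floyd's cycle detection: advance one pointer by one step and another by
// two; they can only meet again if the chain loops back on itself.
bool chain_has_cycle(const Node *head)
{
    const Node *slow = head;
    const Node *fast = head;
    while (fast != nullptr && fast->next != nullptr) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return true;
    }
    return false;  // reached the end of the chain
}

// Builds a chain of `length` nodes. If loop_back_to >= 0, the last node
// points back to the node at that index instead of terminating the chain.
bool demo_chain_has_cycle(int length, int loop_back_to)
{
    assert(length >= 1 && loop_back_to < length);
    std::vector<Node> nodes(static_cast<std::size_t>(length));
    for (int i = 0; i + 1 < length; ++i)
        nodes[i].next = &nodes[i + 1];
    nodes[length - 1].next = loop_back_to >= 0 ? &nodes[loop_back_to] : nullptr;
    return chain_has_cycle(&nodes[0]);
}
```

A check of this kind, run on the chain just before the loop in question, would distinguish a corrupted (cyclic) chain from a loop that is merely slow.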
[Bug fortran/30285] gfortran excessive memory usage with large modules
--- Comment #12 from bdavis at gcc dot gnu dot org 2007-04-20 20:56 ---

I can confirm the attached patch is wrong.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30285
[Bug rtl-optimization/323] optimized code gives strange floating point results
--- Comment #96 from David dot Monniaux at ens dot fr 2007-04-20 21:19 --- The following paper explains how this kind of behaviour occurs, why it is correct, why it is difficult to fix but how it can be partly fixed, and how this breaks many testing and proving techniques: http://hal.archives-ouvertes.fr/hal-00128124 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
[Bug rtl-optimization/17387] Redundant instructions in loop optimization
--- Comment #20 from steven at gcc dot gnu dot org 2007-04-20 21:58 ---

It is my intention to fix see.c to work on x86* hardware, so I'm taking this bug.

-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |steven at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2006-01-15 20:30:45         |2007-04-20 21:58:36
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17387
[Bug rtl-optimization/17387] Redundant instructions in loop optimization
--- Comment #21 from steven at gcc dot gnu dot org 2007-04-20 22:10 ---

Collection of important related links:

http://gcc.gnu.org/ml/gcc-patches/2006-04/msg00766.html
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27437#c5

-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |27437

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17387
[Bug fortran/31587] Module files shouldn't be updated if their content doesn't change
--- Comment #12 from fxcoudert at gcc dot gnu dot org 2007-04-20 22:13 ---

(In reply to comment #11)
> Time to CC Janis?

No need. There's nothing a bit of trial-and-error can't help you write :)

The following adds the necessary dejagnu directive, and uses it in a new test. I guess the MD5 sum should be the same on all platforms, so it's a good test, although it means the test will need updating every time someone changes the format of module files :)

Index: lib/gcc-dg.exp
===================================================================
--- lib/gcc-dg.exp (revision 123942)
+++ lib/gcc-dg.exp (working copy)
@@ -455,6 +455,24 @@
     }
 }
 
+# Scan Fortran modules for a given regexp.
+#
+# Argument 0 is the module name
+# Argument 1 is the regexp to match
+proc scan-module { args } {
+    set modfilename "[string tolower [lindex $args 0]].mod"
+    set fd [open $modfilename r]
+    set text [read $fd]
+    close $fd
+
+    upvar 2 name testcase
+    if [regexp -- [lindex $args 1] $text] {
+      pass "$testcase scan-module [lindex $args 1]"
+    } else {
+      fail "$testcase scan-module [lindex $args 1]"
+    }
+}
+
 # Verify that the compiler output file exists, invoked via dg-final.
 proc output-exists { args } {
     # Process an optional target or xfail list.
Index: gfortran.dg/module_md5_1.f90
===================================================================
--- gfortran.dg/module_md5_1.f90	(revision 0)
+++ gfortran.dg/module_md5_1.f90	(revision 0)
@@ -0,0 +1,14 @@
+! Check that we can write a module file, that it has a correct MD5 sum,
+! and that we can read it back.
+!
+! { dg-do compile }
+module foo
+  integer(kind=4), parameter :: pi = 3_4
+end module foo
+
+program test
+  use foo
+  print *, pi
+end program test
+! { dg-final { scan-module "foo" "MD5:18a257e13c90e3872b7b9400c2fc6e4b" } }
+! { dg-final { cleanup-modules "foo" } }

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31587
[Bug libstdc++/31643] New: Codecvt facets with UTF-8 encoding fail to recognize partial encoding sequences
Problem:

When calling the in() method of a codecvt facet for a locale that specifies UTF-8 encoding, the method fails to recognize partial (i.e., incomplete) UTF-8 encoding sequences at the end of the source string. Instead of returning the expected std::codecvt_base::partial status code with the returned source position (arg-4) indexing the start of the incomplete sequence, the method returns std::codecvt_base::ok with the returned source position just past the end of the source string. Nothing from the partial sequence ends up in the destination wide string (as expected).

Compilation:

gcc -v --save-temps -Wall -ansi -pedantic -g -o localetest localetest.cxx

Compilation output:

Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=i386-redhat-linux
Thread model: posix
gcc version 4.1.1 20070105 (Red Hat 4.1.1-51)
 /usr/libexec/gcc/i386-redhat-linux/4.1.1/cc1plus -E -quiet -v -D_GNU_SOURCE localetest.cxx -mtune=generic -ansi -Wall -pedantic -fworking-directory -fpch-preprocess -o localetest.ii
ignoring nonexistent directory /usr/lib/gcc/i386-redhat-linux/4.1.1/../../../../i386-redhat-linux/include
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/i386-redhat-linux/4.1.1/../../../../include/c++/4.1.1
 /usr/lib/gcc/i386-redhat-linux/4.1.1/../../../../include/c++/4.1.1/i386-redhat-linux
 /usr/lib/gcc/i386-redhat-linux/4.1.1/../../../../include/c++/4.1.1/backward
 /usr/local/include
 /usr/lib/gcc/i386-redhat-linux/4.1.1/include
 /usr/include
End of search list.
 /usr/libexec/gcc/i386-redhat-linux/4.1.1/cc1plus -fpreprocessed localetest.ii -quiet -dumpbase localetest.cxx -mtune=generic -ansi -auxbase localetest -g -Wall -pedantic -ansi -version -o localetest.s
GNU C++ version 4.1.1 20070105 (Red Hat 4.1.1-51) (i386-redhat-linux)
        compiled by GNU C version 4.1.1 20070105 (Red Hat 4.1.1-51).
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 4720743fdfefd64206c8550433f6e508
 as -V -Qy -o localetest.o localetest.s
GNU assembler version 2.17.50.0.6-2.fc6 (i386-redhat-linux) using BFD version 2.17.50.0.6-2.fc6 20061020
 /usr/libexec/gcc/i386-redhat-linux/4.1.1/collect2 --eh-frame-hdr -m elf_i386 --hash-style=gnu -dynamic-linker /lib/ld-linux.so.2 -o localetest /usr/lib/gcc/i386-redhat-linux/4.1.1/../../../crt1.o /usr/lib/gcc/i386-redhat-linux/4.1.1/../../../crti.o /usr/lib/gcc/i386-redhat-linux/4.1.1/crtbegin.o -L/usr/lib/gcc/i386-redhat-linux/4.1.1 -L/usr/lib/gcc/i386-redhat-linux/4.1.1 -L/usr/lib/gcc/i386-redhat-linux/4.1.1/../../.. localetest.o -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /usr/lib/gcc/i386-redhat-linux/4.1.1/crtend.o /usr/lib/gcc/i386-redhat-linux/4.1.1/../../../crtn.o

Test Source File (localetest.cxx):

//
// This test demonstrates that UTF-8 codecvt facets are ignoring incomplete
// trailing encoding sequences. The expected behavior is a return of the
// status value std::codecvt_base::partial, with the returned current source
// position at the start of the failed sequence. The actual behavior is a
// return of std::codecvt_base::ok, with the returned current source position
// at the end of the source string (i.e., the incomplete sequence is ignored).
//
#include <iostream>
#include <string>
#include <locale>
#include <cstring>  // for memset

using namespace std;

//
// Some typedefs to help with facet access.
//
typedef codecvt_base::result Result;
typedef string::traits_type::state_type State;
typedef codecvt<wstring::value_type, string::value_type, State> Converter;

wchar_t to[256];  // Destination buffer.

//
// Perform each test iteration fresh, just to make sure that there isn't any
// lingering context between tests.
//
void dotest( const string test_name, const char *const locale_name, const string test_string )
{
    State q;                           // Shift state context.
    const string::value_type *me = 0;  // Multibyte source current position.
    wstring::value_type *we = 0;       // Wide destination current position.
    Result status;                     // Conversion status.

    //
    // Set the current locale.
    //
    locale loc(locale_name);
    locale::global(loc);
    cout.imbue(loc);

    //
    // Start with a clear output buffer.
    //
    memset(to, 0, sizeof(to));

    //
    // Do the conversion from narrow multibyte to wide unicode.
    //
    const Converter &cvt = use_facet<Converter>(loc);
    memset(&q, 0, sizeof(q));
    string::size_type src_size = test_string.size();
    status = cvt.in(
[Bug fortran/31618] backspace intrinsic is not working on an unformatted file
--- Comment #4 from tkoenig at gcc dot gnu dot org 2007-04-20 22:29 --- Strictly speaking, this is not a violation of the Fortran standard, as there are no guarantees of what happens after an error. Nonetheless, we should fix it. I'm investigating. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31618
[Bug libstdc++/31643] Codecvt facets with UTF-8 encoding fail to recognize partial encoding sequences
--- Comment #1 from jcavalla at postini dot com 2007-04-20 22:30 ---

Created an attachment (id=13395)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13395&action=view)
Verbose compilation output

Produced with: g++ -v --save-temps -Wall -ansi -pedantic -g -o localetest localetest.cxx

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31643
[Bug libstdc++/31643] Codecvt facets with UTF-8 encoding fail to recognize partial encoding sequences
--- Comment #2 from jcavalla at postini dot com 2007-04-20 22:31 ---

Created an attachment (id=13396)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13396&action=view)
Original test case source file

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31643
[Bug libstdc++/31643] Codecvt facets with UTF-8 encoding fail to recognize partial encoding sequences
--- Comment #3 from jcavalla at postini dot com 2007-04-20 22:31 ---

Created an attachment (id=13397)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13397&action=view)
Preprocessed intermediate file

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31643
[Bug libstdc++/31643] Codecvt facets with UTF-8 encoding fail to recognize partial encoding sequences
--- Comment #4 from jcavalla at postini dot com 2007-04-20 22:32 ---

Created an attachment (id=13398)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13398&action=view)
Test results

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31643
[Bug libstdc++/31643] Codecvt facets with UTF-8 encoding fail to recognize partial encoding sequences
--- Comment #5 from jcavalla at postini dot com 2007-04-20 22:37 ---

1. Please note that 'g++' was used to compile, not 'gcc' as stated below. Sorry.

2. I marked this bug 'major' instead of 'normal' because callers will not be able to determine whether or not they need to supply more input to complete a sequence. In a read/convert type loop with the shift state preserved across convert calls, this may not matter. I will run such a test and post the results.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31643
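Until the facet reports partial, a caller that needs to know whether more input is required could inspect the tail of the buffer itself. The sketch below is one hedged way to do that; it is not libstdc++ code, the function names are invented for this example, and it only looks at lead and continuation byte patterns without fully validating the UTF-8 (overlong forms, surrogates and out-of-range values are not rejected).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of the sequence announced by a UTF-8 lead byte, or 0 if the byte
// cannot start a sequence (continuation byte or invalid lead).
std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

// Number of bytes at the end of `s` that begin a sequence whose remaining
// bytes have not arrived yet; 0 if `s` ends on a sequence boundary.
std::size_t incomplete_tail(const std::string &s)
{
    // Walk back over at most three trailing continuation bytes (10xxxxxx).
    std::size_t back = 0;
    while (back < s.size() && back < 3 &&
           (static_cast<unsigned char>(s[s.size() - 1 - back]) & 0xC0) == 0x80)
        ++back;
    if (back == s.size())
        return 0;  // nothing but continuation bytes: malformed, not partial

    const std::size_t need =
        utf8_sequence_length(static_cast<unsigned char>(s[s.size() - 1 - back]));
    const std::size_t have = back + 1;  // lead byte plus continuations seen
    assert(need <= 4);
    return need > have ? have : 0;
}
```

A caller could hold back incomplete_tail(chunk) bytes and prepend them to the next chunk before converting.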
[Bug libstdc++/31643] Codecvt facets with UTF-8 encoding fail to recognize partial encoding sequences
--- Comment #6 from jcavalla at postini dot com 2007-04-20 22:59 ---

I ran additional tests just to make sure that the shift state was valid across calls, even though partial is not returned when a chunk ends in a partial encoding sequence. I split several 2-, 3-, and 4-byte UTF-8 character sequences across two calls to the codecvt in() method. Each time, the sequence was correctly widened into 1 UTF-32 character code. Thus, the shift state appears to be OK. Just the return value of 'ok' is incorrect.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31643
[Bug target/31628] stdcall function is miscompiled
--- Comment #6 from rth at gcc dot gnu dot org 2007-04-21 00:53 ---

Subject: Bug 31628

Author: rth
Date: Sat Apr 21 00:53:37 2007
New Revision: 124014

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=124014
Log:
	PR target/31628
	* config/i386/i386.c (type_has_variadic_args_p): Look for any
	TREE_LIST with a void_type_node value, not void_list_node exactly.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31628
[Bug target/31628] stdcall function is miscompiled
--- Comment #7 from rth at gcc dot gnu dot org 2007-04-21 00:58 ---

Fixed.

-- 

rth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31628
[Bug tree-optimization/31605] [4.2/4.3 Regression] VRP eliminates a useful test due with conversion from unsigned int to int
--- Comment #5 from ian at airs dot com 2007-04-21 02:08 ---

Created an attachment (id=13399)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13399&action=view)
Proposed patch

Currently testing.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31605
[Bug preprocessor/11242] [mingw32] #include <memory> takes my memory directory instead of the standard memory header file
--- Comment #6 from zackw at panix dot com 2007-04-21 02:56 --- I am inclined to think that this is an operating system bug and should be worked around in the mingw32 libraries, not in cpplib. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11242