Re: A recent patch increased GCC's memory consumption in some cases!
On Sun, 22 Apr 2007, Andrew Pinski wrote:

On 4/22/07, Andrew Pinski [EMAIL PROTECTED] wrote: I think it was just by accident that the libfuncs would fit in the phi_node+3 operand slot. Also, I think extra_order_size_table needs to be looked at again after my phi_node and gimple_stmt patches, as I think we now have some duplicates and the sizes have changed slightly.

OK, it just happened on x86 that sizeof(struct tree_function_decl) == sizeof(struct tree_phi_node) + sizeof(struct phi_arg_d) * 3, which is no longer true. Now we are wasting 32 bytes on x86 per function decl. This was never true on any other target anyway, as sizeof(struct tree_function_decl) was always bigger. So the only thing to do is to add sizeof(struct tree_function_decl) to extra_order_size_table or change how we deal with small allocations.

This is what I suspected. I suppose I can do another round of tuning here at the end of stage2, and maybe before that push a slight interface change to request alignment on memory allocation; this would allow for even more compaction (remember that failed because we cannot validly compute alignment requirements out of object size). Richard.
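To make the waste described above concrete, here is a minimal sketch of how rounding an object up to the nearest allocator size class loses space, and how adding an exact-fit class removes the loss. It is not GCC's allocator: the function name, the size classes and the 96-byte object size are all hypothetical, picked only so that the waste comes out at the 32 bytes mentioned in the message.

```python
from bisect import bisect_left


def wasted_bytes(object_size, size_classes):
    """Bytes lost when object_size is rounded up to the smallest class that fits."""
    classes = sorted(size_classes)
    index = bisect_left(classes, object_size)
    if index == len(classes):
        raise ValueError("object larger than every size class")
    return classes[index] - object_size


# Hypothetical numbers, NOT the real struct sizes or GCC's real size table.
POWER_OF_TWO_CLASSES = [16, 32, 64, 128, 256]
HYPOTHETICAL_DECL_SIZE = 96

# Without an exact-fit class the object lands in the 128-byte class.
waste_before = wasted_bytes(HYPOTHETICAL_DECL_SIZE, POWER_OF_TWO_CLASSES)

# Adding the object's own size as an extra class removes the rounding loss.
waste_after = wasted_bytes(
    HYPOTHETICAL_DECL_SIZE, POWER_OF_TWO_CLASSES + [HYPOTHETICAL_DECL_SIZE]
)
```

The second call corresponds to the "add the size to the extra table" option; changing how small allocations are handled would be a different design and is not modelled here.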
Re: GCC mini-summit - compiling for a particular architecture
On Sun, 2007-04-22 at 17:32 -0700, Mark Mitchell wrote:

Steve Ellcey wrote: It came up in a few side conversations. As I understand it, RMS has decreed that the -On optimizations shall be architecture independent. That said, there are generic optimizations which really only apply to a single architecture, so there is some precedent for bending this rule.

This seems unfortunate. As others have said downthread, I don't think the idea that -O2 should enable the same set of optimizations on all processors is necessary or desirable. (In fact, there's nothing inherent in even using the same algorithms on all processors; I can well imagine that the best register allocation algorithms for x86 and Itanium might be entirely different. I'm in no way trying to encourage an entire set of per-architecture optimization passes; clearly the more we can keep common the better! But, our goal is to produce a compiler that generates the best possible code on multiple architectures, not to produce a compiler that uses the same algorithms and optimization options on all architectures.) I have never heard RMS opine on this issue. However, I don't think that this is something that the SC or FSF need to decide. The SC has made very clear that it doesn't want to interfere with day-to-day technical development of the compiler. If the consensus of the maintainers is that it's OK to turn on some extra optimizations at -O2 on Itanium, then I think we can just make the change. Of course, if people would prefer that I ask the SC, I'm happy to do so.

I think it would be nicer if this could be done in an MI way by examining certain target properties. For example, the generic framework might say something like: 'if there's more than N gp registers, enable opt_foo at -O2 or above'. That way we can keep the framework centralized while at the same time allowing bigger systems to exploit useful optimizations with the standard options.

I know a number of targets turn off sched1 at -O2 because it simply makes code much worse (due to increased register pressure). We could eliminate much of this machine-dependent tweaking with suitable generic tests. R.
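The "generic test" idea could be sketched roughly as below: the optimization set is derived from a target property (the number of general-purpose registers) instead of from a per-target override. This is only an illustration in Python, not GCC code; the function name, the pass names as strings, and both thresholds are assumptions made up for the example.

```python
def optimizations_for(opt_level, num_gp_registers,
                      opt_foo_threshold=16, sched1_min_registers=16):
    """Return the set of (hypothetical) passes enabled for a target.

    opt_level is the numeric -O level; num_gp_registers is a property the
    back end would report. Both thresholds are illustrative assumptions.
    """
    enabled = set()
    if opt_level >= 2:
        # 'if there's more than N gp registers, enable opt_foo at -O2 or above'
        if num_gp_registers > opt_foo_threshold:
            enabled.add("opt_foo")
        # Skip the first scheduling pass where register pressure is assumed
        # to make it a net loss, instead of hard-coding this per target.
        if num_gp_registers >= sched1_min_registers:
            enabled.add("sched1")
    return enabled


small_target = optimizations_for(2, num_gp_registers=8)
large_target = optimizations_for(2, num_gp_registers=32)
```

The point of the design is that the decision lives in one machine-independent place and targets only describe themselves; whether register count alone is a good enough predictor is exactly the open question in the thread.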
volunteer to help with Fortran
Hi there, I'd like to offer to help with GNU Fortran development. Please would you let me know in what areas volunteer help is needed, and I can then tell you which areas fit in with my expertise. I like working on Apple Macs, and I think I know them fairly well, but I also have a couple of PCs if necessary. I live in London; as my wife is Dutch we pop across to Holland a few times every year, so in case you wanted to meet up, that's not a problem. Are you going to proceed with a Fortran 200x project? If so, is that an area where you need volunteer assistance? Thank you. Michael Steinbock
Re: GCC mini-summit - compiling for a particular architecture
(In fact, there's nothing inherent in even using the same algorithms on all processors; I can well imagine that the best register allocation algorithms for x86 and Itanium might be entirely different. I'm in no way trying to encourage an entire set of per-architecture optimization passes; clearly the more we can keep common the better! But, our goal is to produce a compiler that generates the best possible code on multiple architectures, not to produce a compiler that uses the same algorithms and optimization options on all architectures.) I have never heard RMS opine on this issue. However, I don't think that this is something that the SC or FSF need to decide.

As far as I can recall, there's always been SOME hardware dependence on the exact meaning of -O2 since, in at least a few cases, which options were included in it depended on target options.
Re: GCC mini-summit - compiling for a particular architecture
Richard Earnshaw wrote: I think it would be nicer if this could be done in a MI way by examining certain target properties. For example, the generic framework might say something like: 'if there's more than N gp registers, enable opt_foo at -O2 or above'. Yes, I agree; wherever that technique works, it would make sense. Perhaps that can be made to handle all the cases; I'm not sure. I'm certainly not trying to suggest that we run SPEC on every architecture, and then make -O2 be the set of optimization options that happens to do best there, however bizarre. But, I would certainly be happy with better performance, even at the cost of some mild inconsistency between optimizations running on different CPUs. Your example of turning off sched1 is exactly the sort of thing that seems reasonable to me, although I agree that if it can be disabled in a machine-independent way, by checking some property of the back end, that would be superior. -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Bootstrap failure for current gcc trunk on cygwin: in set_curr_insn_source_location, at cfglayout.c:284
I hit a recent problem, not there in revision 124020:

../../xgcc -B../../ -c -O2 -g -O2 -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -fno-common -gnatpg -gnata -I- -I../rts -I. -I/usr/local/src/trunk/gcc/gcc/ada /usr/local/src/trunk/gcc/gcc/ada/tree_io.adb -o tree_io.o
+===========================GNAT BUG DETECTED==============================+
| 4.3.0 20070423 (experimental) (i686-pc-cygwin) GCC error:                |
| in set_curr_insn_source_location, at cfglayout.c:284                     |
| Error detected around /usr/local/src/trunk/gcc/gcc/ada/tree_io.adb:511   |
| Please submit a bug report; see http://gcc.gnu.org/bugs.html.            |
| Use a subject line meaningful to you and us to track the bug.            |
| Include the entire contents of this bug box in the report.               |
| Include the exact gcc or gnatmake command that you entered.              |
| Also include sources listed below in gnatchop format                     |
| (concatenated together with no headers between files).                   |
+==========================================================================+

Please include these source files with error report
Note that list may not be accurate in some cases, so please double check
that the problem can still be reproduced with the set of files listed.

/usr/local/src/trunk/gcc/gcc/ada/tree_io.adb
/usr/local/src/trunk/gcc/gcc/ada/tree_io.ads
/usr/local/src/trunk/gcc/gcc/ada/types.ads
/usr/local/src/trunk/gcc/gcc/ada/debug.ads
/usr/local/src/trunk/gcc/gcc/ada/output.ads
/usr/local/src/trunk/gcc/gcc/ada/hostparm.ads

raised TYPES.UNRECOVERABLE_ERROR : comperr.adb:398
make[3]: *** [tree_io.o] Error 1
make[3]: Leaving directory `/usr/local/src/trunk/objdir/gcc/ada/tools'
make[2]: *** [gnattools-native] Error 2
make[2]: Leaving directory `/usr/local/src/trunk/objdir/gnattools'
make[1]: *** [all-gnattools] Error 2
make[1]: Leaving directory `/usr/local/src/trunk/objdir'
make: *** [all] Error 2

-- Cheers, /ChJ
Re: Bootstrap failure for current gcc trunk on cygwin: in set_curr_insn_source_location, at cfglayout.c:284
On 4/23/07, Christian Joensson [EMAIL PROTECTED] wrote: I hit a recent problem, not there in revision 124020

This is a known problem until the Ada people fix their frontend. I suppose the upfront notice was not sent.

Richard.
DF-branch benchmarking on SPEC2000
I've promised to make a more thorough and accurate comparison of df-branch and mainline at the last merge point to the branch. The df-branch compiler does not include Steven's patch from Sunday, which uses a separate obstack for df bitmaps. It does not change code but it can speed up the df-branch compiler (I'll do a compilation time measurement of the branch with the patch later this week). It does contain Ken's patch that fixed a huge compilation time degradation on SPECFP2000.

All toolchains were compiled with --enable-checking=release (-O2 was used for the SPEC2000 benchmarks). Four platforms were used for benchmarking:

  x86_64:    2.6 GHz Core2
  Itanium:   1.6 GHz Itanium2
  PPC64:     2.5 GHz G5
  Pentium4:  3.2 GHz Pentium4 (additionally -mtune=pentium4 was used)

The attachments contain more verbose info about the SPEC2000 results (there base is df-branch, peak is mainline at the last merge point). Here is the summary:

Compilation time SPECINT2000 (user time, sec):
             Mainline   Branch   Change
  x86_64      141.69    150.23    +6.0%
  Itanium     448.07    487.33    +8.8%
  PPC64       385.21    408.67    +6.0%
  Pentium4    260.19    275.23    +5.8%

Compilation time SPECFP2000 (user time, sec):
             Mainline   Branch   Change
  x86_64      104.64    113.24    +8.2%
  Itanium     366.86    412.67   +12.5%
  PPC64       311.49    329.11    +5.7%
  Pentium4    188.05    201.78    +7.3%

Code size SPECINT2000 (text segment):
             Change
  x86_64     +0.42%
  Itanium    -0.04%
  PPC64      +0.60%
  Pentium4   +0.44%

Code size SPECFP2000 (text segment):
             Change
  x86_64     +0.18%
  Itanium    -0.09%
  PPC64      +0.42%
  Pentium4   -0.09%

SPECINT2000 scores:
             Mainline   Branch   Change
  x86_64       1934      1934     0%
  Itanium       864       862    -0.23%
  PPC64         671       667    -0.59%
  Pentium4      892       886    -0.67%

SPECFP2000 scores:
             Mainline   Branch   Change
  x86_64       1846      1838    -0.43%
  Itanium       568       566    -0.35%
  PPC64         661       660    -0.15%
  Pentium4      770       765    -0.65%

To improve the scores, I'd recommend paying attention to the big degradations in SPEC score:

  9% perlbmk degradation on Pentium4
  3% fma3d degradation on Core2
  3% eon and art degradation on Itanium
  3% gap and wupwise degradation on PPC64
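For readers who want to redo the arithmetic, the Change columns in the summary above look like the relative difference of the branch against mainline. That reading is an inference from the numbers, not something the message states, and the helper name below is made up for this sketch.

```python
def percent_change(mainline, branch):
    """Relative change of the branch against mainline, in percent."""
    return (branch - mainline) / mainline * 100.0


# Compilation time, SPECINT2000, user seconds (figures from the summary).
x86_64_change = percent_change(141.69, 150.23)
itanium_change = percent_change(448.07, 487.33)

# SPECINT2000 score on Itanium: a lower score on the branch gives a negative change.
itanium_score_change = percent_change(864, 862)

print(f"x86_64 compile time:  {x86_64_change:+.1f}%")
print(f"Itanium compile time: {itanium_change:+.1f}%")
print(f"Itanium SPECINT:      {itanium_score_change:+.2f}%")
```

Note the sign convention differs in meaning between the tables: a positive change is worse for compilation time and code size, while a negative change is worse for SPEC scores.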
                                Estimated                     Estimated
               Base      Base      Base      Peak      Peak      Peak
Benchmarks   Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
164.gzip       1400      100      1399*      1400      101      1392
164.gzip       1400      100      1399       1400      101      1391
164.gzip       1400      100      1398       1400      101      1392*
175.vpr        1400      86.9     1612*      1400      87.3     1603
175.vpr        1400      86.8     1613       1400      86.8     1613*
175.vpr        1400      87.0     1610       1400      86.8     1614
176.gcc        1100      61.0     1803       1100      61.2     1796
176.gcc        1100      61.1     1801       1100      61.2     1799
176.gcc        1100      61.0     1802*      1100      61.2     1797*
181.mcf        1800      114      1577*      1800      113      1588
181.mcf        1800      114      1573       1800      113      1594*
181.mcf        1800      113      1591       1800      113      1594
186.crafty     1000      37.8     2647       1000      37.9     2638
186.crafty     1000      37.8     2647*      1000      37.9     2636
186.crafty     1000      37.8     2647       1000      37.9     2636*
197.parser     1800      143      1255       1800      143      1259
197.parser     1800      144      1254       1800      143      1260*
197.parser     1800      144      1254*      1800      143      1260
252.eon        1300      52.6     2472*      1300      51.8     2509*
252.eon        1300      52.6     2473       1300      51.9     2504
252.eon        1300      52.7     2469       1300      51.8     2509
253.perlbmk    1800      71.8     2506       1800      73.3     2456
253.perlbmk    1800      72.6     2479       1800      73.5     2449
253.perlbmk    1800      71.9     2503*      1800      73.4     2451*
254.gap        1100      54.7     2012*      1100      54.8     2008*
254.gap        1100      54.7     2012       1100      54.8     2008
254.gap        1100      54.6     2015       1100      54.9     2005
255.vortex     1900      85.9     2213       1900      83.9     2265
255.vortex     1900      86.0     2210       1900      83.9     2266*
255.vortex     1900      85.9     2211*      1900      83.8
Re: DF-branch benchmarking on SPEC2000
On 4/23/07, Vladimir Makarov [EMAIL PROTECTED] wrote: ... To improve the scores I'd recommend to pay attention to big degradation in SPEC score: 9% perlbmk degradation on Pentium4 3% fma3d degradation on Core2 3% eon and art degradation on Itanium 3% gap and wupwise degradation on PPC64.

Vlad, thanks a LOT for the measurement and the summary. Thanks to your previous report, I've been looking at wupwise on PPC64, and have found at least one missed optimization opportunity. I'm testing the fix. Hopefully it will address some of the remaining regressions above.

As for the perlbmk slowdown on P4, my initial guess is that it might be due to cross-jumping or block ordering; those are the areas where I noticed the dataflow branch generates slightly different code than mainline. I didn't try to narrow down where the difference comes from, as most of the differences seemed unimportant.

-- #pragma ident Seongbae Park, compiler, http://seongbae.blogspot.com;
Re: GCC mini-summit - benchmarks
Jim Wilson wrote:

Kenneth Hoste wrote: I'm not sure what 'tests' mean here... Are test cases being extracted from the SPEC CPU2006 sources? Or are you referring to the validity tests of the SPEC framework itself (to check whether the output generated by some binary conforms with their reference output)?

The claim is that SPEC CPU2006 has source code bugs that cause it to fail when compiled by gcc. We weren't given a specific list of problems. HJ, can you give us the specifics on the SPEC 2006 failures you were seeing?

I remember the perlbench failure: it was IA64 specific, and was due to the SPEC config file spec_config.h that defines the attribute keyword to be null, thus eliminating all attributes. On IA64 Linux, in the /usr/include/bits/setjmp.h header file, the __jmp_buf buffer is defined to have an aligned attribute on it. If the buffer isn't aligned, the perlbench program fails. I believe another problem was an uninitialized local variable in a Fortran program, but I don't recall which program or which variable that was.

Steve Ellcey [EMAIL PROTECTED]
Re: Where is gstdint.h
Tim Prince wrote:

[EMAIL PROTECTED] wrote: Where is gstdint.h? Does it actually exist? libdecnumber seems to use it. decimal32|64|128.h's include decNumber.h which includes deccontext.h which includes gstdint.h

When you configure libdecnumber (e.g. by running top-level gcc configure), gstdint.h should be created, by modifying stdint.h. Since you said nothing about the conditions where you had a problem, you can't expect anyone to fix it for you. If you do want it fixed, you should at least file a complete PR. As it is more likely to happen with a poorly supported target, you may have to look into it in more detail than that. When this happened to me, I simply made a copy of stdint.h to get over the hump.

This might happen when you run the top level gcc configure in its own directory. You may want to try to make a new directory elsewhere and run configure there.

  pwd
  .../my-gcc-source-tree
  mkdir ../build
  cd ../build
  ../my-gcc-source-tree/configure
  make
Re: DF-branch benchmarking on SPEC2000
On 4/23/07, Seongbae Park [EMAIL PROTECTED] wrote: As for the perlbmk slowdown on P4. my initial guess is that it might be due to cross-jumping or block ordering - those are things I noticed the dataflow branch generates slightly different code than mainline. I didn't try to narrow down where the difference comes from, as most of the differences seemed unimportant. The crossjumping issue is already filed: http://gcc.gnu.org/PR30905 The perlbmk benchmark always seems to jump up and down for no obvious reason. This one will be tough to analyze, I think. Gr. Steven
Re: GCC mini-summit - compiling for a particular architecture
On Sun, Apr 22, 2007 at 04:39:23PM -0700, Joe Buck wrote:

On Sun, 2007-04-22 at 14:44 +0200, Richard Guenther wrote: At work we use -O3 since it gives 5% performance gain against -O2. profile-feedback has many flags and there is no overview of it in the doc IIRC. Who will use it except GCC developers? Who knows about your advice?

On Sun, Apr 22, 2007 at 03:22:56PM +0200, Jan Hubicka wrote: Well, this is why -fprofile-generate and -fprofile-use were invented. Perhaps docs can be improved so people actually discover it. Do you have any suggestions?

Docs could be improved, but this also might be a case where a tutorial would be needed, to teach users how to use it effectively.

(Perhaps a chapter for FDO or a subchapter of the gcov docs would do?)

We could also have examples, with lots of comments, in the testsuite, with references to them in the docs. That way there is code that people can try to see what kind of effect an optimization has on their system. This would also, of course, provide at least minimal testing for more optimizations; last time I looked there were lots of them that are never used in the testsuite.

Janis
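As a starting point for the kind of commented example discussed above, the profile-feedback cycle with -fprofile-generate and -fprofile-use has three steps: build instrumented, run on representative input, rebuild using the collected profile. The sketch below only assembles the command lines; the helper name and the file names hot_loop.c and hot_loop are hypothetical placeholders.

```python
def fdo_commands(source="hot_loop.c", exe="hot_loop", opt_level="-O2"):
    """Return the three command lines of a profile-feedback build cycle.

    source and exe are placeholder names; substitute your own program.
    """
    # Step 1: build an instrumented binary that records execution counts.
    instrumented = ["gcc", opt_level, "-fprofile-generate", source, "-o", exe]
    # Step 2: run it on representative input so the profile data gets written.
    training_run = ["./" + exe]
    # Step 3: rebuild, letting the compiler use the recorded profile.
    optimized = ["gcc", opt_level, "-fprofile-use", source, "-o", exe]
    return [instrumented, training_run, optimized]


if __name__ == "__main__":
    for command in fdo_commands():
        print(" ".join(command))
```

The quality of the result depends on how representative the training run is; a profile gathered on unrepresentative input can steer the optimizer the wrong way.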
Re: GCC mini-summit - compiling for a particular architecture
On Mon, 23 Apr 2007, Mark Mitchell wrote: I'm certainly not trying to suggest that we run SPEC on every architecture, and then make -O2 be the set of optimization options that happens to do best there, however bizarre. Why not? Is your objection because SPEC doesn't reflect real-world apps or because the option set might be bizarre? The bizarreness of the resulting flag set should not be a consideration IMHO. Humans are bad at predicting how complex systems like this behave. The fact that the best flags may be non-intuitive is not surprising. I find the case of picking compiler options for optimizing code very much like choosing which part of your code to hand optimize programmatically. People often guess wrongly where their code spends its time, that's why we have profilers. So I feel we should choose our per-target flags using a tool like Acovea to find what the best options are on a per-target basis. Then we could insert those flags into -O2 or -Om in each backend. Whether we use SPEC or the testcases included in Acovea or maybe GCC itself as the app to tune for could be argued. And some care would be necessary to ensure that the resulting flags don't completely hose other large classes of apps. But IMHO once you decide to do per-target flags, something like this seems like the natural conclusion. http://www.coyotegulch.com/products/acovea/ --Kaveh -- Kaveh R. Ghazi [EMAIL PROTECTED]
Re: GCC mini-summit - compiling for a particular architecture
On Mon, 23 Apr 2007, Mark Mitchell wrote: I'm certainly not trying to suggest that we run SPEC on every architecture, and then make -O2 be the set of optimization options that happens to do best there, however bizarre. On Mon, Apr 23, 2007 at 01:21:20PM -0400, Kaveh R. GHAZI wrote: Why not? Is your objection because SPEC doesn't reflect real-world apps or because the option set might be bizarre? In this case bizarre would mean untested: if we find some set of 15 options that maximizes SPEC performance, it's quite likely that no one has used that option combination before; building complete distros with this untested option set would almost certainly find bugs. Still might be worth trying, but would require extra and careful testing.
Re: GCC mini-summit - compiling for a particular architecture
Kaveh R. GHAZI wrote:

On Mon, 23 Apr 2007, Mark Mitchell wrote: I'm certainly not trying to suggest that we run SPEC on every architecture, and then make -O2 be the set of optimization options that happens to do best there, however bizarre.

Why not? Is your objection because SPEC doesn't reflect real-world apps or because the option set might be bizarre?

Some of both, I guess. Certainly, SPEC isn't a good benchmark for some CPUs or some users. Also, I'd be very surprised to find that at -O2 my inline functions weren't inlined, but I'd not be entirely amazed to find out that on some processor that was for some reason negative on SPEC. I'd be concerned that a SPEC run on Pentium IV doesn't necessarily guide things well for Athlon, so, in practice, we'd still need some kind of fallback default for CPUs where we didn't do the Acovea thing.

I'd also suspect if we move various chips to differing options, we'll see more bugs due to rarely-tested combinations. Yes, that will help us fix latent bugs, but it may not improve the user experience. We'll see packages that used to compile on lots of platforms start to break more often on one platform or another due either to latent bugs in GCC, or latent bugs in the application.

So, I think there's a middle ground between exactly the same passes on all targets and use Acovea for every CPU to pick what -O2 means. Using Acovea to reveal some of the surprising, but beneficial results, seems like a fine idea, though.

-- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: Does vectorizer support extension?
H. J. Lu [EMAIL PROTECTED] wrote on 23/04/2007 01:34:39:

On Mon, Apr 23, 2007 at 12:55:26AM +0300, Dorit Nuzman wrote: H. J. Lu [EMAIL PROTECTED] wrote on 23/04/2007 00:29:16: On Sun, Apr 22, 2007 at 11:14:20PM +0300, Dorit Nuzman wrote: H. J. Lu [EMAIL PROTECTED] wrote on 20/04/2007 18:02:09:

Hi Dorit, SSE4 has vector zero/sign-extensions like:

  (define_insn "sse4_1_zero_extendv2siv2di2"
    [(set (match_operand:V2DI 0 "register_operand" "=x")
          (zero_extend:V2DI
            (vec_select:V2SI
              (match_operand:V4SI 1 "nonimmediate_operand" "xm")
              (parallel [(const_int 0) (const_int 1)]))))]
    "TARGET_SSE4_1"
    "pmovzxdq\t{%1, %0|%0, %1}"
    [(set_attr "type" "ssemov")
     (set_attr "mode" "TI")])

Does vectorizer support them?

(sorry, I was away from email during Friday-Saturday) - so this looks like a vec_unpacku_hi_v4si (or _lo?), i.e. what is now modeled as follows in sse.md:

  (define_expand "vec_unpacku_hi_v4si"
    [(match_operand:V2DI 0 "register_operand" "")
     (match_operand:V4SI 1 "register_operand" "")]
    "TARGET_SSE2"
  {
    ix86_expand_sse_unpack (operands, true, true);
    DONE;
  })

I am not sure if they are the same since SSE4.1 instructions extend the first 2 elements in the vector, not the high/low parts.

unpack high/low means the high/low elements of the vector

SSE4.1 has

1. The first 8 elements of V16QI zero/sign extended to V8HI.

This one is equivalent to vec_unpacku/s_hi_v16qi.

2. The first 4 elements of V16QI/V8HI zero/sign extended to V4SI.

The second of these two - first 4 elements of V8HI zero/sign extended to V4SI - is equivalent to vec_unpacku/s_hi_v8hi.

3. The first 2 elements of V16QI/V8HI/V4SI zero/sign extended to V2DI.

The last of these three - first 2 elements of V4SI zero/sign extended to V2DI - is equivalent to vec_unpacku/s_hi_v4si. We currently don't have idioms that represent the other forms.

By the way, the vectorizer will not be able to make use of these vec_unpacku/s_hi_* insns if you don't define the corresponding vec_unpacku/s_lo_* patterns (although I think these are already defined in sse.md, though maybe less efficiently than the way sse4 can support them?).

dorit

H.J.
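The "first N elements zero/sign extended" operation that H.J. lists can be modelled lane by lane, which may help when comparing it against the unpack idioms. The sketch below is a plain-Python model of the semantics as described in the thread, not of any instruction encoding or of GCC internals; the function name and the representation of lanes as unsigned integers are assumptions of this example.

```python
def extend_first_lanes(lanes, src_bits, dst_bits, signed):
    """Widen the leading lanes of a vector so the result has the same total width.

    lanes holds unsigned bit patterns of src_bits each; the result holds
    unsigned bit patterns of dst_bits each.
    """
    vector_bits = len(lanes) * src_bits
    count = vector_bits // dst_bits      # e.g. 128 // 64 = 2 lanes for V2DI
    src_mask = (1 << src_bits) - 1
    dst_mask = (1 << dst_bits) - 1
    sign_bit = 1 << (src_bits - 1)
    result = []
    for lane in lanes[:count]:
        value = lane & src_mask
        if signed and value & sign_bit:
            value -= 1 << src_bits       # reinterpret as a negative number
        result.append(value & dst_mask)  # store as a dst_bits-wide pattern
    return result


v4si = [0xFFFFFFFF, 1, 2, 3]
zero_extended = extend_first_lanes(v4si, 32, 64, signed=False)
sign_extended = extend_first_lanes(v4si, 32, 64, signed=True)
```

The same function covers the other forms in the list, for example a 16-lane, 8-bit input widened to 32 bits uses only its first 4 lanes. Which of these corresponds to "hi" or "lo" in GCC's naming is deliberately left out of the model.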
Re: GCC mini-summit - compiling for a particular architecture
Mark Mitchell wrote on 04/23/07 13:56: So, I think there's a middle ground between exactly the same passes on all targets and use Acovea for every CPU to pick what -O2 means. Using Acovea to reveal some of the surprising, but beneficial results, seems like a fine idea, though.

I'm hoping to hear something along those lines at the next GCC Summit. I have heard of a bunch of work in academia doing extensive optimization space searches looking for combinations of pass sequencing and repetition to achieve optimal results. My naive idea is for someone to test all these different combinations and give us a set of -Ox recipes that we can use by default in the compiler.
Re: GCC mini-summit - compiling for a particular architecture
Quoting Kaveh R. GHAZI [EMAIL PROTECTED]:

On Mon, 23 Apr 2007, Mark Mitchell wrote: I'm certainly not trying to suggest that we run SPEC on every architecture, and then make -O2 be the set of optimization options that happens to do best there, however bizarre.

Why not? Is your objection because SPEC doesn't reflect real-world apps or because the option set might be bizarre? The bizarreness of the resulting flag set should not be a consideration IMHO. Humans are bad at predicting how complex systems like this behave. The fact that the best flags may be non-intuitive is not surprising. I find the case of picking compiler options for optimizing code very much like choosing which part of your code to hand optimize programmatically. People often guess wrongly where their code spends its time, that's why we have profilers. So I feel we should choose our per-target flags using a tool like Acovea to find what the best options are on a per-target basis. Then we could insert those flags into -O2 or -Om in each backend. Whether we use SPEC or the testcases included in Acovea or maybe GCC itself as the app to tune for could be argued. And some care would be necessary to ensure that the resulting flags don't completely hose other large classes of apps. But IMHO once you decide to do per-target flags, something like this seems like the natural conclusion. http://www.coyotegulch.com/products/acovea/

I totally agree with you, Acovea would be a good tool for helping with this. But, in my opinion, it has one big downside: it doesn't allow a tradeoff between, for example, compilation time and execution time. My work tries to tackle this: I'm using an evolutionary approach (like Acovea) which is multi-objective (unlike Acovea), meaning it optimizes a Pareto curve for compilation time, execution time and code size. I'm also trying to speed things up compared to the way Acovea handles things, which I won't describe any further for now.

I'm currently only using the SPEC CPU2000 benchmarks (and I believe Acovea does too), but this shouldn't be a drawback: the methodology is completely independent of the benchmarks used. I'm also trying to make sure everything is parameterizable (size of population, number of generations, crossover/mutation/migration rates, ...). Unfortunately, I'm having quite a lot of problems with weird combinations of flags (which generate bugs, as was mentioned in this thread), which is slowing things down. Instead of trying to fix these bugs, I'm just ignoring combinations of flags which generate them for now (although I'll try to report each bug I run into in Bugzilla). Hopefully, this work will produce nice results by June, and I'll make sure to report on it on this mailing list once it's done.

greetings, Kenneth

-- Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital (Aaron Levenstein) Kenneth Hoste ELIS - Ghent University [EMAIL PROTECTED] http://www.elis.ugent.be/~kehoste This message was sent using IMP, the Internet Messaging Program.
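The multi-objective idea can be illustrated with a small Pareto filter: a flag set is kept only if no other flag set is at least as good on every objective and strictly better on one. This is a generic sketch, not Kenneth's framework; the function names and the four measurement triples (compilation time, execution time, code size, lower is better) are invented for the example.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (lower is better everywhere)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (compile time, execution time, code size) per flag set.
measurements = [
    (10, 50, 100),  # flag set A
    (12, 45, 110),  # flag set B: slower to compile, faster code
    (15, 55, 120),  # flag set C: worse than A on every objective
    (9, 60, 90),    # flag set D: cheapest build, smallest, slowest code
]

front = pareto_front(measurements)
```

Here only flag set C drops out; A, B and D each win on some objective, so the choice among them is a tradeoff left to the user. An evolutionary search would generate and refine the candidate points, a step this sketch does not attempt.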
RE: GCC mini-summit - compiling for a particular architecture
On 23 April 2007 19:07, Diego Novillo wrote:

Mark Mitchell wrote on 04/23/07 13:56: So, I think there's a middle ground between exactly the same passes on all targets and use Acovea for every CPU to pick what -O2 means. Using Acovea to reveal some of the surprising, but beneficial results, seems like a fine idea, though.

I'm hoping to hear something along those lines at the next GCC Summit. I have heard of a bunch of work in academia doing extensive optimization space searches looking for combinations of pass sequencing and repetition to achieve optimal results. My naive idea is for someone to test all these different combinations and give us a set of -Ox recipes that we can use by default in the compiler.

Has any of the Acovea research demonstrated whether there actually is any such thing as a good default set of flags in all cases? If the results obtained diverge significantly according to the nature/coding style/architecture/other uncontrolled variable factors of the application, we may be labouring under a false premise wrt. the entire idea, mightn't we?

cheers, DaveK -- Can't think of a witty .sigline today
Re: DF-branch benchmarking on SPEC2000
Vladimir Makarov wrote: I've promised to make more thorough and accurate comparison of df-branch and mainline on last merge point to the branch. The df-branch compiler does not include sunday's Steven's patch which uses a separate obstack for df bitmaps. It does not change code but it can speedup the df-branch compiler (I'll do compilation time measurement of the branch with the patch later on this week).

Here is the result of how Steven's last patch affects compilation times.

SPECINT2000:

x86_64
  without the patch: 150.23user 14.31system 3:23.17elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
  with the patch:    149.97user 14.28system 3:26.54elapsed 79%CPU (0avgtext+0avgdata 0maxresident)k
Itanium
  without the patch: 487.33user 14.10system 9:18.45elapsed 89%CPU (0avgtext+0avgdata 0maxresident)k
  with the patch:    487.30user 14.20system 9:27.26elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k
PPC64
  without the patch: 408.67user 32.05system 7:42.78elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
  with the patch:    409.13user 31.94system 7:57.40elapsed 92%CPU (0avgtext+0avgdata 0maxresident)k
Pentium4
  without the patch: 275.23user 22.49system 5:31.34elapsed 89%CPU (0avgtext+0avgdata 0maxresident)k
  with the patch:    275.40user 22.67system 6:02.62elapsed 82%CPU (0avgtext+0avgdata 0maxresident)k

SPECFP2000:

x86_64
  without the patch: 113.24user 8.43system 2:24.88elapsed 83%CPU (0avgtext+0avgdata 0maxresident)k
  with the patch:    113.02user 7.96system 2:19.89elapsed 86%CPU (0avgtext+0avgdata 0maxresident)k
Itanium
  without the patch: 412.67user 10.47system 7:33.24elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
  with the patch:    412.52user 9.52system 7:36.70elapsed 92%CPU (0avgtext+0avgdata 0maxresident)k
PPC64
  without the patch: 329.11user 24.04system 6:05.33elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
  with the patch:    327.45user 22.14system 6:06.81elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
Pentium4
  without the patch: 201.78user 13.76system 3:57.78elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
  with the patch:    200.08user 14.13system 4:05.18elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
Re: GCC mini-summit - compiling for a particular architecture
Quoting Diego Novillo [EMAIL PROTECTED]:
> Mark Mitchell wrote on 04/23/07 13:56:
>> So, I think there's a middle ground between "exactly the same passes on all targets" and "use Acovea for every CPU to pick what -O2 means". Using Acovea to reveal some of the surprising, but beneficial results, seems like a fine idea, though.
> I'm hoping to hear something along those lines at the next GCC Summit. I have heard of a bunch of work in academia doing extensive optimization space searches looking for combinations of pass sequencing and repetition to achieve optimal results. My naive idea is for someone to test all these different combinations and give us a set of -Ox recipes that we can use by default in the compiler.

Sorry to be blunt, but that's indeed quite naive :-) Currently, the -On flags set/unset 60 flags, which yields 2^60 combinations. If you also count the passes not controlled by a flag, but decided upon depending on the optimization level, that adds another, virtual flag (i.e. using -O1, -O2, -O3 or -Os as base setting). A serious set of programs can easily take tens of minutes, so I think you can easily see that trying all possible combinations is totally unfeasible. The nice thing is you don't have to: you can learn from the combinations you've evaluated, you can start with only a subset of programs and add others gradually, ...

My work is actually concentrating on building a framework to do exactly that: give a set of recipes for -On flags which allow a choice, and which are determined by trading off compilation time, execution time and code size. I won't be at the GCC summit in Canada (I'm in San Diego then presenting some other work), but I'll make sure to announce our work when it's finished...

greetings, Kenneth

-- 
Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital (Aaron Levenstein)
Kenneth Hoste ELIS - Ghent University [EMAIL PROTECTED] http://www.elis.ugent.be/~kehoste
This message was sent using IMP, the Internet Messaging Program.
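A rough sketch of the arithmetic behind the "2^60 combinations" argument, assuming 60 independent on/off flags and, as an arbitrary illustrative figure, 10 minutes per build-and-run (the message only says "tens of minutes"). The function name is hypothetical.

```c
/* Years needed to try every on/off combination of `flag_count` flags,
   one after another, at `minutes_per_run` minutes per combination.
   Uses a 365-day year; the result is an order-of-magnitude estimate. */
double exhaustive_search_years(int flag_count, double minutes_per_run)
{
    double combinations = 1.0;

    /* 2^flag_count, computed in floating point to avoid overflow. */
    for (int i = 0; i < flag_count; i++)
        combinations *= 2.0;

    return combinations * minutes_per_run / (60.0 * 24.0 * 365.0);
}
```

With those assumed inputs, exhaustive_search_years(60, 10.0) comes out on the order of 10^13 years on a single machine, which is why the thread turns to sampling and learning from a subset of combinations rather than enumerating them.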
Re: GCC mini-summit - compiling for a particular architecture
Dave Korn wrote on 04/23/07 14:26:
> Has any of the Acovea research demonstrated whether there actually is any such thing as a good default set of flags in all cases? If the results

Not Acovea itself. The research I'm talking about involves a compiler whose pipeline can be modified and resequenced. It's not just a matter of adding -f/-m flags. The research I've seen does a fairly good job at modelling AI systems that traverse the immense search space looking for different sequences.

> obtained diverge significantly according to the nature/coding style/architecture/other uncontrolled variable factors of the application, we may be labouring under a false premise wrt. the entire idea, mightn't we?

Yes. That's why it's called research ;) I'm sure we'll get an earful of at least a couple of these at the summit.
RE: GCC mini-summit - compiling for a particular architecture
Quoting Dave Korn [EMAIL PROTECTED]:
> On 23 April 2007 19:07, Diego Novillo wrote:
>> Mark Mitchell wrote on 04/23/07 13:56:
>>> So, I think there's a middle ground between "exactly the same passes on all targets" and "use Acovea for every CPU to pick what -O2 means". Using Acovea to reveal some of the surprising, but beneficial results, seems like a fine idea, though.
>> I'm hoping to hear something along those lines at the next GCC Summit. I have heard of a bunch of work in academia doing extensive optimization space searches looking for combinations of pass sequencing and repetition to achieve optimal results. My naive idea is for someone to test all these different combinations and give us a set of -Ox recipes that we can use by default in the compiler.
> Has any of the Acovea research demonstrated whether there actually is any such thing as a good default set of flags in all cases? If the results obtained diverge significantly according to the nature/coding style/architecture/other uncontrolled variable factors of the application, we may be labouring under a false premise wrt. the entire idea, mightn't we?

I don't think that has been shown. Acovea evaluation has only been done on a few architectures, and I don't believe a comparison was made. But you guys are probably the best team to try that out: you have access to a wide range of platforms, and the experience needed to solve problems when they present themselves. Hopefully, my upcoming framework will boost such an effort. Remember: adjusting the way in which GCC handles -On flags is only needed if the tests suggest it will be useful.

greetings, Kenneth
Re: GCC mini-summit - compiling for a particular architecture
[EMAIL PROTECTED] wrote on 04/23/07 14:37:
> Currently, the -On flags set/unset 60 flags, which yields 2^60 combinations. If you also count the passes not controlled by a flag, but decided upon depending on the optimization level, that adds another, virtual flag (i.e. using -O1, -O2, -O3 or -Os as base setting).

No, that's not what I want. I want a static recipe. I do *not* want -Ox to do this search every time. It goes like this: Somebody does a study over a set of applications that represent certain usage patterns (say, FP and INT just to mention the two most common classes of apps). The slow search is done offline and after a few months, we get the results in the form of a table that says, for each class and for each -Ox, what set of passes to execute and in what order they should be executed. Not to say that the current sequencing and repetition are worthless, but I think they could be improved in a quasi-systematic way using this process (which is slow and painful, I know).

> My work is actually concentrating on building a framework to do exactly that: give a set of recipes for -On flags which allow a choice, and which are determined by trading off compilation time, execution time and code size.

Right. This is what I want.

> I won't be at the GCC summit in Canada (I'm in San Diego then presenting some other work), but I'll make sure to announce our work when it's finished...

Excellent. Looking forward to those results.
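One way to picture the static recipe table described in this message is sketched below. Everything in it is an assumption made for illustration: the enums, the table layout, and the pass names (pass_a, pass_b, pass_c) are placeholders that do not correspond to real GCC passes or to any measured result.

```c
#include <stddef.h>

/* Application classes and optimization levels used to index the table. */
enum app_class { APP_INT, APP_FP, APP_CLASS_COUNT };
enum opt_level { OPT_O1, OPT_O2, OPT_O3, OPT_OS, OPT_LEVEL_COUNT };

#define MAX_PASSES 8

/* One recipe: a NULL-terminated list of pass names in execution order.
   A pass may appear more than once (repetition is part of the recipe). */
struct recipe {
    const char *passes[MAX_PASSES];
};

/* The table would be filled in offline from the search results.
   Entries left out here are zero-initialized, i.e. empty recipes. */
static const struct recipe recipes[APP_CLASS_COUNT][OPT_LEVEL_COUNT] = {
    [APP_INT] = {
        [OPT_O1] = { { "pass_a", NULL } },
        [OPT_O2] = { { "pass_a", "pass_b", "pass_a", NULL } },
    },
    [APP_FP] = {
        [OPT_O2] = { { "pass_a", "pass_c", NULL } },
    },
};

/* Constant-time lookup at compile time: no search is done per compilation. */
const struct recipe *lookup_recipe(enum app_class cls, enum opt_level level)
{
    return &recipes[cls][level];
}

/* Number of passes in a recipe. */
size_t recipe_length(const struct recipe *r)
{
    size_t n = 0;

    while (n < MAX_PASSES && r->passes[n] != NULL)
        n++;
    return n;
}
```

The point of the layout is that the expensive part (finding good sequences) happens once, offline, and the compiler only ever does a table lookup.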
Re: Does vectorizer support extension?
On Mon, Apr 23, 2007 at 09:05:05PM +0300, Dorit Nuzman wrote:
H. J. Lu [EMAIL PROTECTED] wrote on 23/04/2007 01:34:39:
On Mon, Apr 23, 2007 at 12:55:26AM +0300, Dorit Nuzman wrote:
H. J. Lu [EMAIL PROTECTED] wrote on 23/04/2007 00:29:16:
On Sun, Apr 22, 2007 at 11:14:20PM +0300, Dorit Nuzman wrote:
H. J. Lu [EMAIL PROTECTED] wrote on 20/04/2007 18:02:09:

Hi Dorit, SSE4 has vector zero/sign-extensions like:

  (define_insn "sse4_1_zero_extendv2siv2di2"
    [(set (match_operand:V2DI 0 "register_operand" "=x")
          (zero_extend:V2DI
            (vec_select:V2SI
              (match_operand:V4SI 1 "nonimmediate_operand" "xm")
              (parallel [(const_int 0)
                         (const_int 1)]))))]
    "TARGET_SSE4_1"
    "pmovzxdq\t{%1, %0|%0, %1}"
    [(set_attr "type" "ssemov")
     (set_attr "mode" "TI")])

Does vectorizer support them?

(sorry, I was away from email during Friday-Saturday) - so this looks like a vec_unpacku_hi_v4si (or _lo?), i.e. what is now modeled as follows in sse.md:

  (define_expand "vec_unpacku_hi_v4si"
    [(match_operand:V2DI 0 "register_operand" "")
     (match_operand:V4SI 1 "register_operand" "")]
    "TARGET_SSE2"
  {
    ix86_expand_sse_unpack (operands, true, true);
    DONE;
  })

I am not sure if they are the same since SSE4.1 instructions extend the first 2 elements in the vector, not the high/low parts.

unpack high/low means the high/low elements of the vector

SSE4.1 has

1. The first 8 elements of V16QI zero/sign extend to V8HI.

This one is equivalent to vec_unpacku/s_hi_v16qi.

Did you mean vec_unpacku/s_lo_v16qi?

2. The first 4 elements of V16QI/V8HI zero/sign extend to V4SI.

The second of these two - first 4 elements of V8HI zero/sign extend to V4SI - is equivalent to vec_unpacku/s_hi_v8hi.

Did you mean vec_unpacku/s_lo_v8hi?

3. The first 2 elements of V16QI/V8HI/V4SI zero/sign extend to V2DI.

The last of these three - first 2 elements of V4SI zero/sign extend to V2DI - is equivalent to vec_unpacku/s_hi_v4si.

Did you mean vec_unpacku/s_lo_v4si?

We currently don't have idioms that represent the other forms. By the way, the vectorizer will not be able to make use of these vec_unpacku/s_hi_* insns if you don't define the corresponding vec_unpacku/s_lo_* patterns (although I think these are already defined in sse.md, though maybe less efficiently than the way sse4 can support them?).

With my SSE4.1 patch applied, for

  typedef char vec_t;
  typedef short vecx_t;
  extern __attribute__((aligned(16))) vec_t x [64];
  extern __attribute__((aligned(16))) vecx_t y [64];
  void foo ()
  {
    int i;
    for (i = 0; i < 64; i++)
      y [i] = x [i];
  }

I got

  movdqa    x(%rip), %xmm0
  movl      $16, %eax
  pxor      %xmm2, %xmm2
  pmovsxbw  %xmm0, %xmm1
  movdqa    %xmm1, y(%rip)
  movdqa    %xmm2, %xmm1
  pcmpgtb   %xmm0, %xmm1
  punpckhbw %xmm1, %xmm0
  movdqa    %xmm0, y+16(%rip)

When extension is a single instruction, it is better to extend one low element at a time:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31667

H.J.
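For readers following the assembly in this message, here is a scalar C model of the two ways the sign extension appears to be done there: the low eight bytes in one step (as pmovsxbw does), and the high eight bytes by building a sign mask and interleaving it with the data (the pxor / pcmpgtb / punpckhbw sequence). It assumes x86 little-endian lane layout; the function names are illustrative and this is not GCC code.

```c
#include <stdint.h>

#define VEC_BYTES 16
#define HALF_LANES 8

/* Model of pmovsxbw: sign-extend the low 8 bytes to 8 words directly. */
void extend_low_direct(const int8_t in[VEC_BYTES], int16_t out[HALF_LANES])
{
    for (int i = 0; i < HALF_LANES; i++)
        out[i] = (int16_t) in[i];
}

/* Model of pxor + pcmpgtb + punpckhbw on the high 8 bytes. */
void extend_high_via_mask(const int8_t in[VEC_BYTES], int16_t out[HALF_LANES])
{
    uint8_t mask[VEC_BYTES];

    /* pcmpgtb against a zeroed register: 0xFF where 0 > in[i], else 0x00. */
    for (int i = 0; i < VEC_BYTES; i++)
        mask[i] = (0 > in[i]) ? 0xFF : 0x00;

    /* punpckhbw: interleave each high data byte (low half of the word)
       with its mask byte (high half of the word). */
    for (int i = 0; i < HALF_LANES; i++) {
        unsigned word = (unsigned) (uint8_t) in[HALF_LANES + i]
                        | ((unsigned) mask[HALF_LANES + i] << 8);

        /* Reinterpret the 16-bit pattern as a signed value. */
        out[i] = (int16_t) (word >= 0x8000u ? (int) word - 0x10000 : (int) word);
    }
}
```

Both routines produce the same values as a plain cast from char to short; the difference is only in how many vector instructions each route costs, which is what the bug report linked above is about.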
Re: GCC mini-summit - compiling for a particular architecture
[EMAIL PROTECTED] wrote on 04/23/07 14:40: Any references? Yes, at the last HiPEAC conference Grigori Fursin presented their interactive compilation interface, which could be used for this. http://gcc-ici.sourceforge.net/ Ben Elliston had also experimented with a framework to allow GCC to change the sequence of the passes from the command line. Ben, where was your thesis again?
Re: GCC mini-summit - compiling for a particular architecture
On Mon, 2007-04-23 at 10:56 -0700, Mark Mitchell wrote:
> Kaveh R. GHAZI wrote:
>> On Mon, 23 Apr 2007, Mark Mitchell wrote:
>>> I'm certainly not trying to suggest that we run SPEC on every architecture, and then make -O2 be the set of optimization options that happens to do best there, however bizarre.
>> Why not? Is your objection because SPEC doesn't reflect real-world apps or because the option set might be bizarre?
> Some of both, I guess. Certainly, SPEC isn't a good benchmark for some CPUs or some users. Also, I'd be very surprised to find that at -O2 my inline functions weren't inlined, but I'd not be entirely amazed to find out that on some processor that was for some reason negative on SPEC.
>
> I'd be concerned that a SPEC run on Pentium IV doesn't necessarily guide things well for Athlon, so, in practice, we'd still need some kind of fallback default for CPUs where we didn't do the Acovea thing.
>
> I'd also suspect if we move various chips to differing options, we'll see more bugs due to rarely-tested combinations. Yes, that will help us fix latent bugs, but it may not improve the user experience. We'll see packages that used to compile on lots of platforms start to break more often on one platform or another due either to latent bugs in GCC, or latent bugs in the application.
>
> So, I think there's a middle ground between "exactly the same passes on all targets" and "use Acovea for every CPU to pick what -O2 means". Using Acovea to reveal some of the surprising, but beneficial results, seems like a fine idea, though.

If every target had a different set of optimizations on at -O2, then I'd claim that we've gone horribly wrong somewhere -- both from a design and implementation standpoint. If we're going to go down this path, I'd much rather see us tune towards target characteristics than towards targets themselves. Enabling/disabling a generic optimizer on a per-target basis should be the option of last resort.

Acovea can be helpful in the process of identifying issues, but I strongly believe that using it in the way some have suggested is a cop-out for really delving into performance issues.

Jeff
Re: GCC mini-summit - compiling for a particular architecture
On 23 Apr 2007, at 20:43, Diego Novillo wrote:
> [EMAIL PROTECTED] wrote on 04/23/07 14:37:
>> Currently, the -On flags set/unset 60 flags, which yields 2^60 combinations. If you also count the passes not controlled by a flag, but decided upon depending on the optimization level, that adds another, virtual flag (i.e. using -O1, -O2, -O3 or -Os as base setting).
> No, that's not what I want. I want a static recipe. I do *not* want -Ox to do this search every time.

I'm not saying you need to do this every time you install GCC. I'm saying trying out 2^60 combinations (even offline) is totally unfeasible.

> It goes like this: Somebody does a study over a set of applications that represent certain usage patterns (say, FP and INT just to mention the two most common classes of apps). The slow search is done offline and after a few months, we get the results in the form of a table that says, for each class and for each -Ox, what set of passes to execute and in what order they should be executed. Not to say that the current sequencing and repetition are worthless, but I think they could be improved in a quasi-systematic way using this process (which is slow and painful, I know).

Exactly my idea.

>> My work is actually concentrating on building a framework to do exactly that: give a set of recipes for -On flags which allow a choice, and which are determined by trading off compilation time, execution time and code size.
> Right. This is what I want.
>> I won't be at the GCC summit in Canada (I'm in San Diego then presenting some other work), but I'll make sure to announce our work when it's finished...
> Excellent. Looking forward to those results.

Cool!

-- 
Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital (Aaron Levenstein)
Kenneth Hoste ELIS - Ghent University [EMAIL PROTECTED] http://www.elis.ugent.be/~kehoste
Re: GCC mini-summit - benchmarks
On Mon, Apr 23, 2007 at 09:49:04AM -0700, Steve Ellcey wrote:
> Jim Wilson wrote:
>> Kenneth Hoste wrote:
>>> I'm not sure what 'tests' mean here... Are test cases being extracted from the SPEC CPU2006 sources? Or are you referring to the validity tests of the SPEC framework itself (to check whether the output generated by some binary conforms with their reference output)?
>> The claim is that SPEC CPU2006 has source code bugs that cause it to fail when compiled by gcc. We weren't given a specific list of problems.
> HJ, can you give us the specifics on the SPEC 2006 failures you were seeing?
>
> I remember the perlbench failure, it was IA64 specific, and was due to the SPEC config file spec_config.h that defines the attribute keyword to be null, thus eliminating all attributes. On IA64 Linux, in the /usr/include/bits/setjmp.h header file, the __jmp_buf buffer is defined to have an aligned attribute on it. If the buffer isn't aligned the perlbench program fails.
>
> I believe another problem was an uninitialized local variable in a Fortran program, but I don't recall which program or which variable that was.

This is what I sent to SPEC.

H.J.

465.tonto in SPEC CPU 2006 has many checks for pointers like:

  ENSURE(NOT associated(self%ex),"SHELL1:copy_1 ... ex not destroyed")

It calls the associated intrinsic on a pointer to check if a pointer is pointing to some data. According to the Fortran standard, associated should be used on initialized pointers. The behavior of calling associated on uninitialized pointers is undefined or compiler dependent. Tonto tries to initialize pointers in structures by calling nullify_ptr_part_ on stack variables to ensure that pointers in stack variables are properly initialized. However, not all pointers in stack variables are initialized. As a result, the binary compiled by gcc 4.2 failed at runtime with

  Error in routine SHELL1:copy_1 ... ex not destroyed

I am enclosing a patch which adds those missing calls of nullify_ptr_part_. Thanks.

H.J.
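The perlbench failure mentioned in this message comes down to a macro that defines the attribute keyword away. The sketch below shows the effect on a made-up structure; the type names are hypothetical, the real case involved __jmp_buf and spec_config.h, and the code relies on the GCC/Clang __attribute__ and __alignof__ extensions.

```c
#include <stddef.h>

/* A buffer type that requests 16-byte alignment. */
struct aligned_buf {
    long words[4];
} __attribute__((aligned(16)));

/* What a config header that "defines the attribute keyword to be null"
   effectively does (illustrative; not the actual SPEC header). */
#define __attribute__(x)

/* Same declaration, but the attribute now expands to nothing, so the
   type falls back to the natural alignment of its members. */
struct unaligned_buf {
    long words[4];
} __attribute__((aligned(16)));

#undef __attribute__

size_t aligned_buf_alignment(void)
{
    return __alignof__(struct aligned_buf);
}

size_t unaligned_buf_alignment(void)
{
    return __alignof__(struct unaligned_buf);
}
```

Code that assumes the stronger alignment (for example, instructions that require aligned operands) can then fail at run time even though the source looks unchanged.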
Re: Bootstrap failure for current gcc trunk on cygwin: in set_curr_insn_source_location, at cfglayout.c:284
This is a known problem until the Ada people fix their frontend. Could you elaborate? What's the known problem exactly? I suppose the upfront notice was not sent. Indeed, and the timing is quite unfortunate since the Ada compiler was independently broken yesterday too and is moreover plagued by serious problems due to the SRA bit-field patch. -- Eric Botcazou
Re: tuples: initial infrastructure
On Fri, Apr 20, 2007 at 01:07:14PM -0400, Aldy Hernandez wrote:
> + /* There can be 3 types of unary operations:
> +
> +    SYM = constant ==> GSS_ASSIGN_UNARY_REG
> +    SYM = SYM2 ==> GSS_ASSIGN_UNARY_MEM

Um, ssa_name = ssa_name isn't a memory

> +/* A sequences of gimple statements. */
> +#define GS_SEQP_FIRST(S) (S)->first
> +#define GS_SEQP_LAST(S) (S)->last
> +#define GS_SEQ_FIRST(S) (S).first
> +#define GS_SEQ_LAST(S) (S).last

Why do you have both of these?

Otherwise it looks ok. I figure you'll want to build a set of iterators and such for gs_sequences, like for tree-iterator.[ch].

r~
gcov in cross-compile: have a patch, seek direction
I am working on the cegcc project (http://cegcc.sourceforge.net), which bundles a bunch of the GNU development tools to produce a cross-development environment for ARM devices running Windows CE. The development hosts supported are Linux and Cygwin.

Gcov normally puts the files where it writes profiling information in the source directory. In a cross-development environment, that directory isn't always available. Gcc has support for overriding that directory at run time. Unfortunately, on Windows CE, that is not always easy.

I've patched my copy of gcc to be able to specify a different directory at compile time (instead of at run time). I can clean up and submit my patch if there's interest. Prior to that, I have a question: should this support be steered via parameters on the compiler command line, or from environment values?

I've currently implemented a command line argument -fcoverage-base=xx which can be used like this:

  arm-wince-cegcc-gcc -g -D_WIN32_IE=0x0400 --coverage -fcoverage-base=\ \storage card\\devel -o fibo.exe fibo.c

Danny
-- 
Danny Backx ; danny.backx - at - scarlet.be ; http://danny.backx.info
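The run-time override mentioned in this message is, in stock GCC, driven by the GCOV_PREFIX and GCOV_PREFIX_STRIP environment variables: a number of leading directory components is stripped from the absolute path recorded at compile time and a prefix is prepended. The function below is only a model of that path rewriting, written for illustration; it is not libgcov's implementation, and the function name is invented. A compile-time option such as the proposed -fcoverage-base would presumably fix the same kind of result when the object file is built.

```c
#include <stdio.h>

/* Rewrite an absolute `path` by dropping up to `strip` leading directory
   components and prepending `prefix`.  The file name itself is never
   stripped.  Returns 0 on success, -1 if the result does not fit. */
int rebase_path(const char *prefix, int strip, const char *path,
                char *out, size_t out_size)
{
    const char *rest = path;

    while (strip > 0 && *rest == '/') {
        const char *next = rest + 1;

        /* Find the end of the current component. */
        while (*next != '\0' && *next != '/')
            next++;
        if (*next == '\0')
            break;              /* only the file name is left */
        rest = next;
        strip--;
    }

    int n = snprintf(out, out_size, "%s%s", prefix, rest);
    return (n < 0 || (size_t) n >= out_size) ? -1 : 0;
}
```

For instance, with prefix "/target/run" and strip 1, the path "/user/build/foo.gcda" is rewritten to "/target/run/build/foo.gcda".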
Re: tuples: initial infrastructure
>> +/* A sequences of gimple statements. */
>> +#define GS_SEQP_FIRST(S) (S)->first
>> +#define GS_SEQP_LAST(S) (S)->last
>> +#define GS_SEQ_FIRST(S) (S).first
>> +#define GS_SEQ_LAST(S) (S).last
> Why do you have both of these?

Most places in the gimplifier we will send sequences as pointers, but for saving state (see gimplify_and_add), we use local variables. I figured it'd be better than doing GS_SEQP_FIRST(non_pointer), but I can if you prefer.

> Otherwise it looks ok. I figure you'll want to build a set of iterators and such for gs_sequences, like for tree-iterator.[ch].

Yee haw! Thanks so much for reviewing this. I'll commit to the branch and start the long haul.

Aldy
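A small sketch of the distinction being discussed: one macro family for sequences held by pointer, one for sequences held by value. The gs_node and gs_sequence structures here are hypothetical stand-ins (the real definitions are in the patch, which is not shown in full), and the two helper functions are made up to exercise the macros.

```c
#include <stddef.h>

/* Hypothetical stand-in for a gimple statement. */
struct gs_node {
    int id;
};

/* Hypothetical sequence type with the two fields the macros refer to. */
struct gs_sequence {
    struct gs_node *first;
    struct gs_node *last;
};

/* Pointer flavour: S is a `struct gs_sequence *`. */
#define GS_SEQP_FIRST(S) (S)->first
#define GS_SEQP_LAST(S)  (S)->last

/* Value flavour: S is a `struct gs_sequence` (e.g. a local variable). */
#define GS_SEQ_FIRST(S)  (S).first
#define GS_SEQ_LAST(S)   (S).last

/* Typical "passed as pointer" use. */
void seq_init_single(struct gs_sequence *seq, struct gs_node *node)
{
    GS_SEQP_FIRST(seq) = node;
    GS_SEQP_LAST(seq) = node;
}

/* Typical "local variable" use, as when saving state. */
int seq_is_single(struct gs_sequence seq)
{
    return GS_SEQ_FIRST(seq) != NULL && GS_SEQ_FIRST(seq) == GS_SEQ_LAST(seq);
}
```

The alternative raised in the review is to keep a single family and write the address-of explicitly at the call site, e.g. GS_SEQP_FIRST(&local), at the cost of slightly noisier code where locals are used.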
Re: Bootstrap failure for current gcc trunk on cygwin: in set_curr_insn_source_location, at cfglayout.c:284
I presume that this: -I../../trunk/gcc/../libdecnumber/bid -I../libdecnumber ../../trunk/gcc/gimplify.c -o gimplify.o ../../trunk/gcc/gimplify.c: In function 'create_tmp_var_name': ../../trunk/gcc/gimplify.c:431: internal compiler error: in set_curr_insn_source_location, at cfglayout.c:284 on x86_ia64/fc5 is not a coincidence? Paul -- Success is the ability to go from one failure to another with no loss of enthusiasm. - Winston Churchill
Re: Bootstrap failure for current gcc trunk on cygwin: in set_curr_insn_source_location, at cfglayout.c:284
On 4/23/07, Paul Richard Thomas [EMAIL PROTECTED] wrote:
> on x86_ia64/fc5 is not a coincidence?

Moreover, a lot of targets were affected by this patch because they would call insn_locators_initialize when generating the thunks (x86 did not because it uses text-based thunks and not RTL-based thunks).

-- Pinski
Re: Does vectorizer support extension?
H. J. Lu [EMAIL PROTECTED] wrote on 23/04/2007 21:56:37:

... so this looks like a vec_unpacku_hi_v4si (or _lo?), i.e. what is now modeled as follows in sse.md:

  (define_expand "vec_unpacku_hi_v4si"
    [(match_operand:V2DI 0 "register_operand" "")
     (match_operand:V4SI 1 "register_operand" "")]
    "TARGET_SSE2"
  {
    ix86_expand_sse_unpack (operands, true, true);
    DONE;
  })

I am not sure if they are the same since SSE4.1 instructions extend the first 2 elements in the vector, not the high/low parts.

unpack high/low means the high/low elements of the vector

SSE4.1 has

1. The first 8 elements of V16QI zero/sign extend to V8HI.

This one is equivalent to vec_unpacku/s_hi_v16qi.

Did you mean vec_unpacku/s_lo_v16qi?

2. The first 4 elements of V16QI/V8HI zero/sign extend to V4SI.

The second of these two - first 4 elements of V8HI zero/sign extend to V4SI - is equivalent to vec_unpacku/s_hi_v8hi.

Did you mean vec_unpacku/s_lo_v8hi?

3. The first 2 elements of V16QI/V8HI/V4SI zero/sign extend to V2DI.

The last of these three - first 2 elements of V4SI zero/sign extend to V2DI - is equivalent to vec_unpacku/s_hi_v4si.

Did you mean vec_unpacku/s_lo_v4si?

yes, I might be confusing high and low on x86, sorry about that. I hope the point is clear though which sse4 insns can be modeled using existing idioms that the vectorizer can use? (I think it is because I think you already included it in your patch?)

We currently don't have idioms that represent the other forms. By the way, the vectorizer will not be able to make use of these vec_unpacku/s_hi_* insns if you don't define the corresponding vec_unpacku/s_lo_* patterns (although I think these are already defined in sse.md, though maybe less efficiently than the way sse4 can support them?).

With my SSE4.1 patch applied, for

  typedef char vec_t;
  typedef short vecx_t;
  extern __attribute__((aligned(16))) vec_t x [64];
  extern __attribute__((aligned(16))) vecx_t y [64];
  void foo ()
  {
    int i;
    for (i = 0; i < 64; i++)
      y [i] = x [i];
  }

I got

  movdqa    x(%rip), %xmm0
  movl      $16, %eax
  pxor      %xmm2, %xmm2
  pmovsxbw  %xmm0, %xmm1
  movdqa    %xmm1, y(%rip)
  movdqa    %xmm2, %xmm1
  pcmpgtb   %xmm0, %xmm1
  punpckhbw %xmm1, %xmm0
  movdqa    %xmm0, y+16(%rip)

When extension is a single instruction, it is better to extend one low element at a time:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31667

So I haven't had a chance to look at your patch, but judging from the code that was generated above I'm assuming that:

- the code above was generated using the vectorizer,
- the above is a body of a loop that iterates 4 times,
- the vectorizer used the vec_unpack_lo/hi idioms to vectorize this loop,
- with your patch, sse.md models vec_unpack_lo_* using the new pmovsx* insns,
- sse.md still models vec_unpack_hi_* using the old mechanisms.

(correct so far?)

IIUC, icc used a vectorization factor of 8, and in each vector iteration expanded 8 chars to 8 shorts, using the new pmovzx* insn, and then completely unrolled the loop. gcc OTOH used a vectorization factor of 16, and in each vector iteration expanded 16 chars to 16 shorts, using a combination of the new and old sse insns for vector unpacking.

In order for us to vectorize like icc, i.e. using VF=8, I think the vectorizer needs to be extended to work with multiple vector sizes at once (a vector of 8 bytes to hold 8 chars, and a vector of 16 bytes to hold 8 shorts). We currently assume a single VS (16 in this case) and determine the VF according to the smallest elements in the loop (chars in this case).

Note however, that while the code generated by icc looks much more compact indeed, it does generate twice as many loads as gcc (each of the 8 pmovzx* directly operates on memory), and in addition every other load is unaligned. gcc OTOH generated only loads, all aligned.

So, in short, the two important notes are that (1) the vectorizer could be extended to support multiple vector sizes, or anyhow the way the VF is determined could be extended to consider other alternatives, and (2) from the vectorizer's perspective, it's not obvious that working with VF=8 like icc is necessarily better than working with VF=16, and a cost model would hopefully help gcc make the right choice...

A few minor comments:

* The icc code you show operates on unsigned types, and the gcc code you show operates on signed types. I don't know if it makes much of a difference, just for the record.
* Also I wonder how the gcc code looks when complete unrolling is applied (did you use -funroll-loops?). (Like the point above, this is just so that we compare apples with apples.)
* I don't entirely follow the code that gcc generates; what's that for exactly?: pxor %xmm2, %xmm2 movdqa %xmm2,
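The rule stated in this message (a single vector size, with the vectorization factor set by the smallest element type in the loop) can be written down as a short sketch. The function names are invented for illustration and this is not the vectorizer's actual code.

```c
#include <stddef.h>

/* Vectorization factor under the "single vector size" assumption:
   the number of elements of the smallest type that fit in one vector.
   `elem_sizes` lists the sizes in bytes of the element types in the loop. */
size_t vectorization_factor(size_t vector_bytes,
                            const size_t *elem_sizes, size_t count)
{
    size_t smallest = elem_sizes[0];

    for (size_t i = 1; i < count; i++)
        if (elem_sizes[i] < smallest)
            smallest = elem_sizes[i];

    return vector_bytes / smallest;
}

/* How many full vectors one vector iteration needs for a given type. */
size_t vectors_per_iteration(size_t vf, size_t elem_bytes, size_t vector_bytes)
{
    return vf * elem_bytes / vector_bytes;
}
```

For the char-to-short loop with 16-byte vectors this gives VF=16, one vector of chars loaded and two vectors of shorts stored per iteration, which matches the low/high unpack pair in the gcc output. Reaching VF=8 as icc does would require mixing an 8-byte and a 16-byte vector in the same loop, which this rule cannot express.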
Re: tuples: initial infrastructure
On Mon, Apr 23, 2007 at 04:30:40PM -0400, Aldy Hernandez wrote: I figured it'd be better than doing GS_SEQP_FIRST(non_pointer), but I can if you prefer. I think I would, though without the P. r~
Re: HTML of -fdump-tree-XXXX proposal.
J.C. Pizarro wrote:
> Your idea with JavaScript, CSS, XSLT, .. is very good! :)

Thank you - but ideas are cheap. Turning a vague idea into something useful is a different matter.

-- 
--Per Bothner [EMAIL PROTECTED] http://per.bothner.com/
bootstrap broken on powerpc: implicit declaration of function 'pthread_getaffinity_np'
Since the change listed below, bootstrap on powerpc is broken when you configure for both powerpc-linux and powerpc64-linux:

  2007-04-04  Jakub Jelinek  [EMAIL PROTECTED]

      * libgomp.h (gomp_cpu_affinity, gomp_cpu_affinity_len): New extern decls.

The error I get is:

  ../../../../src/libgomp/config/linux/affinity.c: In function 'gomp_init_affinity':
  ../../../../src/libgomp/config/linux/affinity.c:51: error: implicit declaration of function 'pthread_getaffinity_np'
  ../../../../src/libgomp/config/linux/affinity.c:77: error: implicit declaration of function 'pthread_setaffinity_np'
  ../../../../src/libgomp/config/linux/affinity.c: In function 'gomp_init_thread_affinity':
  ../../../../src/libgomp/config/linux/affinity.c:100: error: implicit declaration of function 'pthread_attr_setaffinity_np'
  make[8]: *** [affinity.lo] Error 1

You can reproduce this with:

  ../src/configure -v --enable-languages=c,c++ --enable-targets=powerpc-linux,powerpc64-linux powerpc-linux-gnu
  make

-- 
Martin Michlmayr http://www.cyrius.com/
gcc-4.1-20070423 is now available
Snapshot gcc-4.1-20070423 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.1-20070423/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.1 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_1-branch revision 124082

You'll find:

  gcc-4.1-20070423.tar.bz2            Complete GCC (includes all of below)
  gcc-core-4.1-20070423.tar.bz2       C front end and core compiler
  gcc-ada-4.1-20070423.tar.bz2        Ada front end and runtime
  gcc-fortran-4.1-20070423.tar.bz2    Fortran front end and runtime
  gcc-g++-4.1-20070423.tar.bz2        C++ front end and runtime
  gcc-java-4.1-20070423.tar.bz2       Java front end and runtime
  gcc-objc-4.1-20070423.tar.bz2       Objective-C front end and runtime
  gcc-testsuite-4.1-20070423.tar.bz2  The GCC testsuite

Diffs from 4.1-20070416 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.1 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
debug output and register allocation
Hi, I am wondering how gcc handles producing debug information for automatic variables that do not reside in the stack, for example when register allocation decides to assign a particular register to a variable, or decides that the value is a constant. Can someone point me to the relevant pieces of code? Thanks for any pointers. dz
Re: Does vectorizer support extension?
They should be independent of vec_unpack.

H.J.

          .file   "pmovsxbw.c"
          .text
          .p2align 4,,15
  .globl foo
          .type   foo, @function
  foo:
          pxor      %xmm2, %xmm2
          movdqa    x(%rip), %xmm9
          movdqa    x+16(%rip), %xmm6
          movdqa    %xmm2, %xmm10
          movdqa    %xmm2, %xmm7
          movdqa    x+32(%rip), %xmm3
          movdqa    %xmm2, %xmm4
          pmovsxbw  %xmm9, %xmm11
          movdqa    x+48(%rip), %xmm0
          pcmpgtb   %xmm9, %xmm10
          pcmpgtb   %xmm6, %xmm7
          pmovsxbw  %xmm6, %xmm8
          pcmpgtb   %xmm3, %xmm4
          pmovsxbw  %xmm3, %xmm5
          pcmpgtb   %xmm0, %xmm2
          pmovsxbw  %xmm0, %xmm1
          punpckhbw %xmm10, %xmm9
          punpckhbw %xmm7, %xmm6
          punpckhbw %xmm4, %xmm3
          punpckhbw %xmm2, %xmm0
          movdqa    %xmm11, y(%rip)
          movdqa    %xmm9, y+16(%rip)
          movdqa    %xmm8, y+32(%rip)
          movdqa    %xmm6, y+48(%rip)
          movdqa    %xmm5, y+64(%rip)
          movdqa    %xmm3, y+80(%rip)
          movdqa    %xmm1, y+96(%rip)
          movdqa    %xmm0, y+112(%rip)
          ret
          .size   foo, .-foo
          .ident  "GCC: (GNU) 4.3.0 20070423 (experimental) [trunk revision 124056]"
          .section        .note.GNU-stack,"",@progbits
Re: tuples: initial infrastructure
>> I figured it'd be better than doing GS_SEQP_FIRST(non_pointer), but I can if you prefer.
> I think I would, though without the P.

Ok, everything fixed, except I haven't added the sequence iterators yet. I am committing the patch below to the gimple-tuples-branch. Thanks again.

    * gimple-ir.c: New file.
    * gimple-ir.h: New file.
    * gsstruct.def: New file.
    * gs.def: New file.
    * gengtype.c (open_base_files): Add gimple-ir.h.
    * tree-gimple.h: Include gimple-ir.h.
    Add sequence to gimplify_expr and gimplify_body prototypes.
    * gimplify.c: Include gimple-ir.h.
    (gimplify_and_add): Adjust for gimple IR.
    (gimplify_return_expr): Same.
    (gimplify_stmt): Add seq_p argument.
    (gimplify_expr): Add seq_p sequence and adjust accordingly.
    (gimplify_body): Same.
    * coretypes.h: Add gimple_statement_d and gimple definitions.
    * Makefile.in (GIMPLE_IR_H): New.
    (TREE_GIMPLE_H): Add gimple-ir.h.
    (OBJS-common): Add gimple-ir.o.
    (gimplify.o): Add GIMPLE_IR_H.
    (gimple-ir.o): New.
    (build/gencheck.o): Add gs.def.

Index: gimple-ir.c
===================================================================
--- gimple-ir.c (revision 0)
+++ gimple-ir.c (revision 0)
@@ -0,0 +1,154 @@
+/* Gimple IR support functions.
+
+   Copyright 2007 Free Software Foundation, Inc.
+   Contributed by Aldy Hernandez [EMAIL PROTECTED]
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 2, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING.  If not, write to the Free
+Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
+02110-1301, USA.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "ggc.h"
+#include "errors.h"
+#include "tree-gimple.h"
+#include "gimple-ir.h"
+
+#define DEFGSCODE(SYM, NAME) NAME,
+static const char *gs_code_name[] = {
+#include "gs.def"
+};
+#undef DEFGSCODE
+
+/* Gimple tuple constructors.  */
+
+/* Construct a GS_RETURN statement.  */
+
+gimple
+gs_build_return (bool result_decl_p, tree retval)
+{
+  gimple p = ggc_alloc_cleared (sizeof (struct gimple_statement_return));
+
+  GS_CODE (p) = GS_RETURN;
+  GS_SUBCODE_FLAGS (p) = (int) result_decl_p;
+  GS_RETURN_OPERAND_RETVAL (p) = retval;
+  return p;
+}
+
+/* Return which gimple structure is used by T.  The enums here are defined
+   in gsstruct.def.  */
+
+enum gimple_statement_structure_enum
+gimple_statement_structure (gimple gs)
+{
+  unsigned int code = GS_CODE (gs);
+  unsigned int subcode = GS_SUBCODE_FLAGS (gs);
+
+  switch (code)
+    {
+    case GS_ASSIGN:
+      {
+        enum tree_code_class class = TREE_CODE_CLASS (subcode);
+
+        if (class == tcc_binary
+            || class == tcc_comparison)
+          return GSS_ASSIGN_BINARY;
+        else
+          {
+            /* There can be 3 types of unary operations:
+
+                 SYM = constant      ==> GSS_ASSIGN_UNARY_REG
+                 SYM = SSA_NAME      ==> GSS_ASSIGN_UNARY_REG
+                 SYM = SYM2          ==> GSS_ASSIGN_UNARY_MEM
+                 SYM = UNARY_OP SYM2 ==> GSS_ASSIGN_UNARY_MEM
+            */
+            if (class == tcc_constant || subcode == SSA_NAME)
+              return GSS_ASSIGN_UNARY_REG;
+
+            /* Must be class == tcc_unary.  */
+            return GSS_ASSIGN_UNARY_MEM;
+          }
+      }
+    case GS_ASM:           return GSS_ASM;
+    case GS_BIND:          return GSS_BIND;
+    case GS_CALL:          return GSS_CALL;
+    case GS_CATCH:         return GSS_CATCH;
+    case GS_COND:          return GSS_COND;
+    case GS_EH_FILTER:     return GSS_EH_FILTER;
+    case GS_GOTO:          return GSS_GOTO;
+    case GS_LABEL:         return GSS_LABEL;
+    case GS_NOP:           return GSS_BASE;
+    case GS_PHI:           return GSS_PHI;
+    case GS_RESX:          return GSS_RESX;
+    case GS_RETURN:        return GSS_RETURN;
+    case GS_SWITCH:        return GSS_SWITCH;
+    case GS_TRY:           return GSS_TRY;
+    case GS_OMP_CRITICAL:  return GSS_OMP_CRITICAL;
+    case GS_OMP_FOR:       return GSS_OMP_FOR;
+    case GS_OMP_CONTINUE:
+    case GS_OMP_MASTER:
+    case GS_OMP_ORDERED:
+    case GS_OMP_RETURN:
+    case GS_OMP_SECTION:
+      return GSS_OMP;
+    case GS_OMP_PARALLEL:  return GSS_OMP_PARALLEL;
+    case GS_OMP_SECTIONS:  return GSS_OMP_SECTIONS;
+    case
Re: Bootstrap failure for current gcc trunk on cygwin: in set_curr_insn_source_location, at cfglayout.c:284
> On 4/23/07, Paul Richard Thomas [EMAIL PROTECTED] wrote:
>> on x86_ia64/fc5 is not a coincidence?
> Moreover, a lot of targets were affected by this patch because they would call insn_locators_initialize when generating the thunks (x86 did not because it uses text-based thunks and not RTL-based thunks).

I've reverted the patch, so it should be OK now. My apologies for the breakage.

Honza

> -- Pinski
Re: Bootstrap failure for current gcc trunk on cygwin: in set_curr_insn_source_location, at cfglayout.c:284
It happens! It also meant that I got to bed early :)

Thanks

Paul

On 4/24/07, Jan Hubicka [EMAIL PROTECTED] wrote:
>> On 4/23/07, Paul Richard Thomas [EMAIL PROTECTED] wrote:
>>> on x86_ia64/fc5 is not a coincidence?
>> Moreover, a lot of targets were affected by this patch because they would call insn_locators_initialize when generating the thunks (x86 did not because it uses text-based thunks and not RTL-based thunks).
> I've reverted the patch, so it should be OK now. My apologies for the breakage.
>
> Honza
>> -- Pinski

-- 
Success is the ability to go from one failure to another with no loss of enthusiasm. - Winston Churchill
Re: Where is gstdint.h
[EMAIL PROTECTED] wrote:
> Tim Prince wrote:
>> [EMAIL PROTECTED] wrote:
>>> Where is gstdint.h? Does it actually exist? libdecnumber seems to use it. The decimal32|64|128.h headers include decNumber.h, which includes decContext.h, which includes gstdint.h.
>> When you configure libdecnumber (e.g. by running top-level gcc configure), gstdint.h should be created, by modifying stdint.h. Since you said nothing about the conditions where you had a problem, you can't expect anyone to fix it for you. If you do want it fixed, you should at least file a complete PR. As it is more likely to happen with a poorly supported target, you may have to look into it in more detail than that. When this happened to me, I simply made a copy of stdint.h to get over the hump.
> This might happen when you run the top level gcc configure in its own directory. You may want to try to make a new directory elsewhere and run configure there.
>
>   pwd
>   .../my-gcc-source-tree
>   mkdir ../build
>   cd ../build
>   ../my-gcc-source-tree/configure
>   make

If you're suggesting trying to build in the top level directory to see if the same problem occurs, I would expect other problems to arise. If it would help diagnose the problem, and the problem persists for a few weeks, I'd be willing to try it.
[Bug java/31622] Segment violation in the “toString” method on a mathematical expression
--- Comment #5 from eduardo dot iniesta at aquiline dot es 2007-04-23 07:53 ---
(In reply to comment #4)
> Please post the output of running gcj -C -v Fail.java Thanks.

The output of running gcj -C -v Fail.java is:

$ gcj -C -v Fail.java salida_compila.txt
Using built-in specs.
Target: i486-linux-gnu
Configured with: /usr/src/gcc/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr/src/gcc/build --enable-shared --with-system-zlib --enable-libstdcxx-debug --enable-mpfr --with-tune=i686 --enable-checking=release i486-linux-gnu : (reconfigured) /usr/src/gcc/configure -v --enable-languages=all --prefix=/usr/src/gcc/build --enable-shared --with-system-zlib --enable-libstdcxx-debug --enable-mpfr --with-tune=i686 --enable-checking=release i486-linux-gnu : (reconfigured) /usr/src/gcc/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang,java --prefix=/usr/src/gcc/build --enable-shared --with-system-zlib --enable-libstdcxx-debug --enable-mpfr --enable-libgcj --with-tune=i686 --enable-checking=release i486-linux-gnu : (reconfigured) /usr/src/gcc/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang,java --prefix=/usr/src/gcc/build --enable-shared --enable-libstdcxx-debug --enable-mpfr --enable-libgcj --with-tune=i686 --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.3.0 20061212 (experimental)
 /usr/src/gcc/build/libexec/gcc/i486-linux-gnu/4.3.0/jc1 Fail.java -quiet -dumpbase Fail.java -mtune=i686 -auxbase-strip NONE -g1 -version -fsyntax-only -femit-class-files -o /dev/null
GNU Java version 4.3.0 20061212 (experimental) (i486-linux-gnu)
        compiled by GNU C version 4.3.0 20061212 (experimental).
GGC heuristics: --param ggc-min-expand=99 --param ggc-min-heapsize=129589
Class path starts here:
    ./
    /usr/src/gcc/build/share/java/libgcj-4.3.0.jar/ (system) (zip)
Fail.java: In class 'Fail':
Fail.java: In method 'Fail.main(java.lang.String[])':
Fail.java:5: internal compiler error: Segmentation fault
Please submit a full bug report, with preprocessed source if appropriate.
See URL:http://gcc.gnu.org/bugs.html for instructions.

Best regards

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31622
[Bug target/31403] wrong branch instructions generated with -m2a on sh-elf
--- Comment #3 from chrbr at gcc dot gnu dot org 2007-04-23 07:59 ---
Hi Kaj,

The same problem seems to arise from the movsf_ie pattern for the sh2a-fpu, which also has 32-bit memory instructions. So your fix also applies there.

Note that traditional sh memory move instructions can also have a length of 2, so your fix is conservative (but not more than the previous code). Shouldn't the new 4-byte instructions be described later with a new memory constraint?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31403
[Bug c++/31663] New: [4.3 Regression] Segfault in constrain_class_visibility with anonymous namespace
I get the following segfault with current gcc 4.3. This was introduced between 20070330-r123378 (works) and 20070417-r123941 (segfault). This is related to the use of an anonymous namespace. I believe it was caused by:

2007-04-16  Seongbae Park  [EMAIL PROTECTED]

        PR c++/29365
        * cp/decl2.c (constrain_class_visibility): Do not warn about the use of anonymous namespace in the main input file.

(sid)19349:[EMAIL PROTECTED]: ~] /usr/lib/gcc-snapshot/bin/g++ -c kdevelop-mainwindowshare.cc
kdevelop-mainwindowshare.cc:6: internal compiler error: Segmentation fault
Please submit a full bug report,

Testcase:

namespace {
  class ToolbarListView;
}
class KEditToolbarWidget {
  ToolbarListView *m_inactiveList;
};

Backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x00497f92 in constrain_class_visibility (type=0x2aae86bd2540) at /home/tbm/scratch/gcc/gcc/cp/decl2.c:1865
1865    DECL_SOURCE_FILE (TYPE_MAIN_DECL (ftype
(gdb) where
#0  0x00497f92 in constrain_class_visibility (type=0x2aae86bd2540) at /home/tbm/scratch/gcc/gcc/cp/decl2.c:1865
#1  0x004915ff in finish_struct_1 (t=0x2aae86bd2540) at /home/tbm/scratch/gcc/gcc/cp/class.c:5103
#2  0x00492560 in finish_struct (t=0x2aae868df960, attributes=0x0) at /home/tbm/scratch/gcc/gcc/cp/class.c:5221
#3  0x004b9146 in cp_parser_type_specifier (parser=0x2aae86a3f550, flags=<value optimized out>, decl_specs=0x7fff247a2f30, is_declaration=1 '\001', declares_class_or_enum=0x7fff247a2ee0, is_cv_qualifier=<value optimized out>) at /home/tbm/scratch/gcc/gcc/cp/parser.c:13821
#4  0x004be14a in cp_parser_decl_specifier_seq (parser=0x2aae86a3f550, flags=CP_PARSER_FLAGS_OPTIONAL, decl_specs=0x7fff247a2f30, declares_class_or_enum=0x7fff247a2f88) at /home/tbm/scratch/gcc/gcc/cp/parser.c:8023
#5  0x004c34d5 in cp_parser_simple_declaration (parser=0x2aae86a3f550, function_definition_allowed_p=1 '\001') at /home/tbm/scratch/gcc/gcc/cp/parser.c:7725
#6  0x004c56e7 in cp_parser_block_declaration (parser=0x2aae86a3f550, statement_p=0 '\0') at /home/tbm/scratch/gcc/gcc/cp/parser.c:7686
#7  0x004c805e in cp_parser_declaration (parser=0x2aae86a3f550) at /home/tbm/scratch/gcc/gcc/cp/parser.c:7594
#8  0x004c7976 in cp_parser_declaration_seq_opt (parser=0x2aae86a3f550) at /home/tbm/scratch/gcc/gcc/cp/parser.c:7489
#9  0x004c7c91 in c_parse_file () at /home/tbm/scratch/gcc/gcc/cp/parser.c:2941
#10 0x005771ea in c_common_parse_file (set_yydebug=<value optimized out>) at /home/tbm/scratch/gcc/gcc/c-opts.c:1268
#11 0x00786f04 in toplev_main (argc=<value optimized out>, argv=<value optimized out>) at /home/tbm/scratch/gcc/gcc/toplev.c:1050
#12 0x2aae866b2314 in __libc_start_main () from /lib/libc.so.6

--
           Summary: [4.3 Regression] Segfault in constrain_class_visibility with anonymous namespace
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tbm at cyrius dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31663
[Bug fortran/31616] testsuite failures in gfortran.dg/open_errors.f90
--- Comment #6 from ghazi at gcc dot gnu dot org 2007-04-23 08:52 ---
Subject: Bug 31616

Author: ghazi
Date: Mon Apr 23 08:52:24 2007
New Revision: 124059

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=124059
Log:
        PR fortran/31616
        * gfortran.dg/open_errors.f90: Allow a different error message.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gfortran.dg/open_errors.f90

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31616
[Bug target/31403] wrong branch instructions generated with -m2a on sh-elf
--- Comment #4 from kkojima at gcc dot gnu dot org 2007-04-23 09:53 ---
> The same problem seems to arise from the movsf_ie pattern for the sh2a-fpu,
> which also has 32-bit memory instructions. So your fix also applies there.

Ah, thanks! I'll add the movsf_ie part when I return to this problem.

> Shouldn't the new 4-byte instructions be described later with a new memory
> constraint?

Maybe, though I'm not sure it's worth the effort. Of course, it'd be interesting to collect some numbers with such a change.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31403
[Bug bootstrap/31523] bootstrap xgcc internal compiler error (using -O3)
--- Comment #9 from anirkko at insel dot ch 2007-04-23 11:35 ---
(In reply to comment #8)
> > The bootstrap works fine with all flags equal to '-O2'
> Great, thanks for the confirmation.
> > Still worrisome, because somewhere in the installation instructions, it is
> > recommended to bootstrap the compiler with the flags one intends to later
> > use it with.
> Please point me to this and I'll immediately propose to delete it. 99% of
> the testing of the compiler is done with the default bootstrap settings.
> > Does it mean the offending flag is broken and should never be used at all?
> -mcpu=supersparc is totally untested these days so bugs can be expected, yes.

> Please point me to this...

It is in the main directory, file FAQ, section "Optimizing the compiler itself": "If you want to test a particular optimization option, it's useful to try bootstrapping the compiler with that option turned on."

But please don't let this be removed, because:

1) I find the suggestion useful, and it seems to somewhat parallel the reason why you want the compiler to bootstrap itself through all stages in the first place: to make sure it can compile itself. Likewise, if it can compile itself reproducibly through the bootstrap using different options, you have made sure these options are pretty well tested on real code by the time the compiler is finished.

2) You closed this bug on the premise that -mcpu=supersparc is the culprit. Well, it isn't: I have now bootstrapped with '-O2 -mcpu=supersparc' and everything went fine!

Therefore, I still think there is a bug in -O3 (or, alternatively, in the combination of the two). Would you consider reopening this bug if it was -O3? (In this case, I might re-run the bootstrap with only '-O3' next weekend.)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31523
[Bug bootstrap/31523] bootstrap xgcc internal compiler error (using -O3)
--- Comment #10 from ebotcazou at gcc dot gnu dot org 2007-04-23 11:55 ---
> It is in the main directory, file FAQ, section Optimizing the compiler
> itself: If you want to test a particular optimization option, it's useful to
> try bootstrapping the compiler with that option turned on.

OK, but note that it's quite different from what you said. First, it's not recommended, it's useful to try. Second, it's only for *testing* the option, not for production use.

> 1) I find the suggestion useful, and it seems to somewhat parallel the reason
> why you want the compiler bootstrap itself through all stages in the first
> place: to make sure it can compile itself. Likewise, if it can compile itself
> reproducibly through the bootstrap using different options, you have made
> sure these options are pretty well tested for real code by the time you have
> the compiler finished

OK, that makes sense.

> Therefore, I still think there is a bug in -O3. (alternatively, in the
> combination of the two). Would you consider reopening this bug if it was -O3?
> (In this case, I might re-run the bootstrap with only '-O3', next weekend).

Yes, I'm going to reopen it, but I'm not sure someone will tackle it.

--
ebotcazou at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|WONTFIX                     |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31523
[Bug fortran/29523] ICE with some non up-to date .mod files
--- Comment #6 from keinstein_junior at gmx dot net 2007-04-23 12:08 ---
I tried to add a new testcase which shows the error with my gfortran-4.2, but got an error. It's located at http://rcswww.urz.tu-dresden.de/~s7935097/src-differror2.tgz now.

FYI alloys.mod depends on the other two, is outdated and should be rebuilt. But an ICE is not a real solution. I'd expect some error about inconsistent .mod files (probably noting which one should be rebuilt).

[EMAIL PROTECTED]:~/bugs/src-differror2$ ./start-gfc
diffussion.F90: In function ‘MAIN__’:
diffussion.F90:9: internal compiler error: in gfc_get_derived_type, at fortran/trans-types.c:1452
Please submit a full bug report, with preprocessed source if appropriate.
See URL:http://gcc.gnu.org/bugs.html for instructions.
[EMAIL PROTECTED]:~/bugs/src-differror2$ vi start-gfc
[EMAIL PROTECTED]:~/bugs/src-differror2$ ./start-gfc
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /projects/tob/gcc-4_2-branch/configure --enable-languages=c,fortran --prefix=/projects/tob/gcc-4.2
Thread model: posix
gcc version 4.2.0 20070215 (prerelease)
 /usr/local/gcc-4.2/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.2.0/cc1 -E -lang-fortran -traditional-cpp -D_LANGUAGE_FORTRAN -quiet -v -I.
-I../../src -I ../../includes -I potentials -I tools -I lattices -I Verfahren -I filter -iprefix /usr/local/gcc-4.2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.0/ -DPACKAGE_NAME=matprop -DPACKAGE_TARNAME=matprop -DPACKAGE_VERSION=0.1.0 -DPACKAGE_STRING=matprop 0.1.0 -DPACKAGE_BUGREPORT=[EMAIL PROTECTED] -DPACKAGE=matprop -DVERSION=0.1.0 -DDEBUG diffussion.F90 -mtune=generic -Wall -Wsurprising -fbounds-check -ftree-vectorize -fworking-directory -O3 -o /tmp/ccDZMQzD.f95 ignoring nonexistent directory /usr/local/gcc-4.2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.0/../../../../x86_64-unknown-linux-gnu/include ignoring nonexistent directory /projects/tob/gcc-4.2/include ignoring nonexistent directory /projects/tob/gcc-4.2/lib/gcc/x86_64-unknown-linux-gnu/4.2.0/include ignoring nonexistent directory /projects/tob/gcc-4.2/x86_64-unknown-linux-gnu/include ignoring nonexistent directory ../../src ignoring nonexistent directory ../../includes ignoring nonexistent directory potentials ignoring nonexistent directory tools ignoring nonexistent directory lattices ignoring nonexistent directory Verfahren ignoring nonexistent directory filter #include ... search starts here: #include ... search starts here: . /usr/local/gcc-4.2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.0/include /usr/local/include /usr/include End of search list. /usr/local/gcc-4.2/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.2.0/f951 /tmp/ccDZMQzD.f95 -ffree-form -quiet -dumpbase diffussion.F90 -mtune=generic -auxbase-strip diffussion.o -g -O3 -Wall -Wsurprising -version -p -fbounds-check -ftree-vectorize -I. -I../../src -I ../../includes -I potentials -I tools -I lattices -I Verfahren -I filter -fpreprocessed -I /usr/local/gcc-4.2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.0/finclude -o /tmp/ccsyx4u7.s GNU F95 version 4.2.0 20070215 (prerelease) (x86_64-unknown-linux-gnu) compiled by GNU C version 4.2.0 20070215 (prerelease). 
GGC heuristics: --param ggc-min-expand=98 --param ggc-min-heapsize=128471
diffussion.F90: In function ‘MAIN__’:
diffussion.F90:9: internal compiler error: in gfc_get_derived_type, at fortran/trans-types.c:1452
Please submit a full bug report, with preprocessed source if appropriate.
See URL:http://gcc.gnu.org/bugs.html for instructions.

--
keinstein_junior at gmx dot net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29523
[Bug target/31641] [4.1/4.2/4.3 Regression] ICE in s390_expand_setmem, at config/s390/s390.c:3618
--- Comment #2 from krebbel at gcc dot gnu dot org 2007-04-23 12:21 ---
In your example the memset function is called with -1 as the length argument. When GCC tries to expand this as a builtin function, an assertion in the s390 back end function s390_expand_setmem is triggered. Although an ICE is the wrong response, I would consider it a bug in the code as well.

I've proposed a patch to issue a proper error message and call the library function in that situation. For a -1 length the library function would probably write one byte below the target address and cause a segfault, which is most likely not what the programmer intended, but that's what would happen at -O0 as well.

--
krebbel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|-00-00 00:00:00             |2007-04-23 12:21:59
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31641
[Bug tree-optimization/15353] [tree-ssa] Merge two ifs if one subsumes the other.
--- Comment #10 from rguenth at gcc dot gnu dot org 2007-04-23 12:47 ---
Mine.

--
rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rguenth at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2006-10-24 12:39:16         |2007-04-23 12:47:56
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15353
[Bug libstdc++/31638] [4.0/4.1/4.2 Regression] string usage leads to warning with -Wcast-align
--- Comment #6 from pcarlini at suse dot de 2007-04-23 13:35 --- (In reply to comment #5) It is OK with me. Ok, excellent. For 4_2-branch we have a nuisance, in that the original testcase involves -Wconversion which in that branch does nothing in C++. Thus I tested on ia64-linux the attached, which adds a new test. If requested I can add it to mainline too, otherwise I will simply commit to the branch and close this PR. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31638
[Bug libstdc++/31638] [4.0/4.1/4.2 Regression] string usage leads to warning with -Wcast-align
--- Comment #7 from pcarlini at suse dot de 2007-04-23 13:36 ---
Created an attachment (id=13430)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13430&action=view)
Patch for 4_2-branch

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31638
[Bug target/31641] [4.1/4.2/4.3 Regression] ICE in s390_expand_setmem, at config/s390/s390.c:3618
--- Comment #3 from uweigand at gcc dot gnu dot org 2007-04-23 14:51 --- I don't think the patch is correct; according to the C standard, the third argument of memset is of type size_t, which must be an *unsigned* type, so it cannot in fact be negative. What apparently happens is that the argument (after conversion to size_t) is so big that it appears to be negative in its representation as CONST_INT, so the assert in s390.c triggers. A proper fix would probably be to remove the assert in s390_expand_setmem and at the same time make sure those big sizes are handled correctly. (In any case, the testcase certainly is broken anyway.) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31641
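The conversion described in this comment can be illustrated with a minimal sketch. It relies only on standard C conversion rules; the helper names `length_as_size_t` and `looks_negative_as_signed` are hypothetical and are not taken from s390.c or any other GCC source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the value memset's size_t parameter receives when
   a caller passes a signed length such as -1.  Conversion to an unsigned
   type is well defined in C and wraps modulo SIZE_MAX + 1.  */
static size_t
length_as_size_t (long n)
{
  return (size_t) n;
}

/* Hypothetical helper: nonzero when LEN has its top bit set, i.e. when a
   signed two's-complement integer of the same width would read it as a
   negative number.  This is only an analogy for how a huge length can
   "appear to be negative" in a signed representation.  */
static int
looks_negative_as_signed (size_t len)
{
  return len > SIZE_MAX / 2;
}

/* Small self-check showing the two views of the same length.  */
static void
check_length_views (void)
{
  size_t len = length_as_size_t (-1);

  assert (len == SIZE_MAX);                /* never negative as size_t */
  assert (looks_negative_as_signed (len)); /* but "negative" when signed */
}
```

Calling `check_length_views ()` from a test driver exercises both helpers; the point is that the length is a very large unsigned value, so a back end has to treat it as such rather than assert that it is non-negative.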
[Bug c++/31665] New: %s substituted with built-in/library can't be properly translated
In cp/decl.c there is this code:

  warning (OPT_Wshadow, "shadowing %s function %q#D",
           DECL_BUILT_IN (olddecl) ? "built-in" : "library", olddecl);

The strings substituted for the first %s are not available for translation, so this cannot be properly translated. Even if they were, composing a sentence like this is not in general possible for an arbitrary language.

--
           Summary: %s substituted with built-in/library can't be properly translated
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: goeran at uddeborg dot se

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31665
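One common way around the problem in this report is to keep each variant as a complete sentence, so that every message can be translated as a whole. The following is a minimal sketch of that idea, not the actual GCC fix: `shadow_message` and `format_shadow_warning` are hypothetical helpers, `_()` is a local stand-in for the gettext lookup macro, and plain `%s` is used in place of GCC's own `%q#D` directive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for gettext's _() so the sketch builds without libintl; a real
   build would look the string up in the message catalog here.  */
#define _(msgid) (msgid)

/* Hypothetical helper: choose one complete, separately translatable
   message instead of splicing "built-in" or "library" into a "%s".  */
static const char *
shadow_message (int is_builtin)
{
  return is_builtin
         ? _("shadowing built-in function %s")
         : _("shadowing library function %s");
}

/* Hypothetical helper: format the chosen message for function NAME into
   BUF, returning what snprintf returns.  */
static int
format_shadow_warning (char *buf, size_t size, int is_builtin,
                       const char *name)
{
  return snprintf (buf, size, shadow_message (is_builtin), name);
}
```

With this shape a translator sees two full sentences and can reorder or inflect words freely in each, which is not possible when a fragment is substituted at run time.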
[Bug c++/31663] [4.3 Regression] Segfault in constrain_class_visibility with anonymous namespace
--- Comment #1 from pinskia at gcc dot gnu dot org 2007-04-23 15:07 --- http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01191.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31663
[Bug c++/31666] New: [4.3 regression]: g++.old-deja/g++.other/vbase5.C execution test
In http://gcc.gnu.org/ml/gcc-testresults/2007-04/msg01156.html there is FAIL: g++.old-deja/g++.other/vbase5.C execution test -- Summary: [4.3 regression]: g++.old-deja/g++.other/vbase5.C execution test Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: hjl at lucon dot org GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31666
[Bug c++/31666] [4.3 regression]: g++.old-deja/g++.other/vbase5.C execution test
--- Comment #1 from rguenth at gcc dot gnu dot org 2007-04-23 15:26 ---
Confirmed, I also see this.

--
rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|-00-00 00:00:00             |2007-04-23 15:26:32
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31666
[Bug preprocessor/30468] [4.0/4.1/4.2 Regression] -M not fully chops dirname
--- Comment #7 from tromey at gcc dot gnu dot org 2007-04-23 15:26 ---
Subject: Bug 30468

Author: tromey
Date: Mon Apr 23 15:26:21 2007
New Revision: 124067

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=124067
Log:
        PR preprocessor/30468:
        * mkdeps.c (apply_vpath): Strip successive '/'s if we stripped './'.

Modified:
    branches/gcc-4_1-branch/libcpp/ChangeLog
    branches/gcc-4_1-branch/libcpp/mkdeps.c

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30468
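The behaviour named in this log entry can be sketched in a few lines. This is an illustration only, under the assumption that the goal is to turn names such as ".//foo.c" into "foo.c"; `strip_dot_slash` is a hypothetical helper and is not the code of apply_vpath in mkdeps.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: skip every leading "./" and, after each one, any
   run of extra '/' characters.  Slashes are only dropped after a "./"
   has been removed, so absolute paths are left alone.  */
static const char *
strip_dot_slash (const char *path)
{
  while (path[0] == '.' && path[1] == '/')
    {
      path += 2;
      /* Strip successive '/'s left behind by the "./" we just removed.  */
      while (path[0] == '/')
        path++;
    }
  return path;
}

/* Small self-check covering the cases discussed above.  */
static void
check_strip_dot_slash (void)
{
  assert (strcmp (strip_dot_slash ("./foo.c"), "foo.c") == 0);
  assert (strcmp (strip_dot_slash (".//foo.c"), "foo.c") == 0);
  assert (strcmp (strip_dot_slash ("/abs/foo.c"), "/abs/foo.c") == 0);
}
```

The helper returns a pointer into the original string rather than copying, which keeps the sketch short; a real dependency generator may need to own the resulting string.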
[Bug preprocessor/30468] [4.0/4.1/4.2 Regression] -M not fully chops dirname
--- Comment #8 from tromey at gcc dot gnu dot org 2007-04-23 15:27 ---
Subject: Bug 30468

Author: tromey
Date: Mon Apr 23 15:26:51 2007
New Revision: 124068

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=124068
Log:
        PR preprocessor/30468:
        * mkdeps.c (apply_vpath): Strip successive '/'s if we stripped './'.

Modified:
    branches/gcc-4_2-branch/libcpp/ChangeLog
    branches/gcc-4_2-branch/libcpp/mkdeps.c

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30468
[Bug tree-optimization/31667] New: Integer externsions aren't vectorized
SSE4.1 has pmovzx and pmovsx. For code like:

[EMAIL PROTECTED] vect]$ cat pmovzxbw.c
typedef unsigned char vec_t;
typedef unsigned short vecx_t;

extern __attribute__((aligned(16))) vec_t x [64];
extern __attribute__((aligned(16))) vecx_t y [64];

void
foo ()
{
  int i;

  for (i = 0; i < 64; i++)
    y [i] = x [i];
}

Icc generates

        pmovzxbw  x(%rip), %xmm0        #13.14
        pmovzxbw  8+x(%rip), %xmm1      #13.14
        pmovzxbw  16+x(%rip), %xmm2     #13.14
        pmovzxbw  24+x(%rip), %xmm3     #13.14
        pmovzxbw  32+x(%rip), %xmm4     #13.14
        pmovzxbw  40+x(%rip), %xmm5     #13.14
        pmovzxbw  48+x(%rip), %xmm6     #13.14
        pmovzxbw  56+x(%rip), %xmm7     #13.14
        movdqa    %xmm0, y(%rip)        #13.5
        movdqa    %xmm1, 16+y(%rip)     #13.5
        movdqa    %xmm2, 32+y(%rip)     #13.5
        movdqa    %xmm3, 48+y(%rip)     #13.5
        movdqa    %xmm4, 64+y(%rip)     #13.5
        movdqa    %xmm5, 80+y(%rip)     #13.5
        movdqa    %xmm6, 96+y(%rip)     #13.5
        movdqa    %xmm7, 112+y(%rip)    #13.5
        ret                             #14.1

--
           Summary: Integer externsions aren't vectorized
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: hjl at lucon dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31667
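For readers who want to see the requested transformation written by hand, here is a rough sketch using compiler intrinsics. It is an assumption-laden illustration rather than anything from the report: `widen_u8_to_u16` is a hypothetical helper, the SSE4.1 path is only compiled when the compiler defines `__SSE4_1__` (for example with -msse4.1), and unaligned stores are used so the helper does not depend on the 16-byte alignment of the arrays in the testcase.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE4_1__
#include <smmintrin.h>
#endif

/* Hypothetical helper: zero-extend N unsigned bytes from SRC into
   unsigned shorts in DST.  */
static void
widen_u8_to_u16 (const unsigned char *src, unsigned short *dst, size_t n)
{
  size_t i = 0;

#ifdef __SSE4_1__
  /* Eight elements per step: load 64 bits, zero-extend each byte to
     16 bits (the pmovzxbw operation), then store 128 bits.  */
  for (; i + 8 <= n; i += 8)
    {
      __m128i bytes = _mm_loadl_epi64 ((const __m128i *) (src + i));
      __m128i words = _mm_cvtepu8_epi16 (bytes);
      _mm_storeu_si128 ((__m128i *) (dst + i), words);
    }
#endif

  /* Scalar loop: the whole job without SSE4.1, the tail otherwise.  */
  for (; i < n; i++)
    dst[i] = src[i];
}
```

The scalar loop at the end is the same operation as the loop in the testcase, so comparing the two paths on the same input is a simple way to check the vector version.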
[Bug preprocessor/30468] [4.0/4.1/4.2 Regression] -M not fully chops dirname
--- Comment #9 from tromey at gcc dot gnu dot org 2007-04-23 15:27 ---
Fixed on 4.1 and 4.2 branches.

--
tromey at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30468
[Bug java/31570] ports/gcc43 fails on FreeBSD 6.2 with signal 9
--- Comment #2 from vaclav dot kocian at wo dot cz 2007-04-23 15:33 --- My RAM is 512MB. Well, the newer version of gcc says: You need to increase the datasize limit to at least 70 (and set kern.maxdsiz=734003200 in /boot/loader.conf) to build with Java support. I do not understand it fully, but it seems quite different. For now, I'm satisfied with gcc42. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31570
[Bug fortran/31620] [4.3 regression] Zeroing one component of array of derived types zeros the whole structure.
--- Comment #8 from pault at gcc dot gnu dot org 2007-04-23 16:14 ---
Subject: Bug 31620

Author: pault
Date: Mon Apr 23 16:13:48 2007
New Revision: 124069

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=124069
Log:
2007-04-23  Paul Thomas  [EMAIL PROTECTED]

        PR fortran/31630
        * resolve.c (resolve_symbol): Allow resolution of formal namespaces nested within formal namespaces coming from modules.

        PR fortran/31620
        * trans-expr.c (gfc_trans_assignment): Make the call to gfc_trans_zero_assign conditional on the lhs array ref being the only reference.

2007-04-23  Paul Thomas  [EMAIL PROTECTED]

        PR fortran/31630
        * gfortran.dg/used_types_17.f90: New test.

        PR fortran/31620
        * gfortran.dg/zero_array_components_1.f90: New test.

Added:
    trunk/gcc/testsuite/gfortran.dg/used_types_17.f90
    trunk/gcc/testsuite/gfortran.dg/zero_array_components_1.f90
Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/resolve.c
    trunk/gcc/fortran/trans-expr.c
    trunk/gcc/testsuite/ChangeLog

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31620
[Bug fortran/31630] [4.3 regression] ICE on nasty derived types code
--- Comment #3 from pault at gcc dot gnu dot org 2007-04-23 16:14 ---
Subject: Bug 31630

Author: pault
Date: Mon Apr 23 16:13:48 2007
New Revision: 124069

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=124069
Log:
2007-04-23  Paul Thomas  [EMAIL PROTECTED]

        PR fortran/31630
        * resolve.c (resolve_symbol): Allow resolution of formal namespaces nested within formal namespaces coming from modules.

        PR fortran/31620
        * trans-expr.c (gfc_trans_assignment): Make the call to gfc_trans_zero_assign conditional on the lhs array ref being the only reference.

2007-04-23  Paul Thomas  [EMAIL PROTECTED]

        PR fortran/31630
        * gfortran.dg/used_types_17.f90: New test.

        PR fortran/31620
        * gfortran.dg/zero_array_components_1.f90: New test.

Added:
    trunk/gcc/testsuite/gfortran.dg/used_types_17.f90
    trunk/gcc/testsuite/gfortran.dg/zero_array_components_1.f90
Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/resolve.c
    trunk/gcc/fortran/trans-expr.c
    trunk/gcc/testsuite/ChangeLog

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31630
[Bug c++/31598] g++ does not accept some OpenMP code
--- Comment #2 from theodore dot papadopoulo at sophia dot inria dot fr 2007-04-23 16:46 --- (From update of attachment 13378) Slightly simplified the testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31598
[Bug c++/31663] [4.3 Regression] Segfault in constrain_class_visibility with anonymous namespace
--- Comment #2 from spark at gcc dot gnu dot org 2007-04-23 16:53 ---
My patch (which is still waiting for a review) should fix this.

--
spark at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |spark at gcc dot gnu dot org
                   |dot org                     |
             Status|UNCONFIRMED                 |ASSIGNED
     Ever Confirmed|0                           |1
   Last reconfirmed|-00-00 00:00:00             |2007-04-23 16:53:04
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31663
[Bug c++/31598] g++ does not accept some OpenMP code
--- Comment #3 from theodore dot papadopoulo at sophia dot inria dot fr 2007-04-23 17:01 ---
Sorry to have added you without asking, Jakub, but it looks like you are one of the people who deal with OpenMP, and this bug seems to have gone unnoticed up to now...

It seems that this problem is related to the template instantiation mechanism (as far as I can tell). The trouble is that finish_omp_clauses ensures that the variable b is complete by calling require_complete_type, but since we are in template instantiation mode, this function returns immediately without doing anything.

Further on in finish_omp_clauses (semantics.c:3655), assuming that the type is complete, the code tries to build a call to a (copy?) constructor. build_call calls complete_type_or_else, which calls complete_type, which calls instantiate_class_template, which returns immediately because TYPE_BEING_DEFINED is true, so complete_type_or_else complains.

Is there a way to defer the completeness check until after the type has been instantiated?

--
theodore dot papadopoulo at sophia dot inria dot fr changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at redhat dot com
            Summary|g++ does not accept some    |g++ does not accept some
                   |OpenMP code                 |OpenMP code

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31598
[Bug c++/31419] [4.1/4.2/4.3 regression] template user defined conversion operator instantiated for conversion to self
--- Comment #5 from janis at gcc dot gnu dot org 2007-04-23 17:02 ---
A regression hunt on powerpc-linux using the submitter's testcase identified the following patch:

http://gcc.gnu.org/viewcvs?view=rev&rev=64815

r64815 | nathan | 2003-03-24 19:47:17 + (Mon, 24 Mar 2003)

--
janis at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nathan at gcc dot gnu dot
                   |                            |org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31419
[Bug c++/31338] [4.1/4.2/4.3 regression] Cannot apply ! to complex constants
--- Comment #3 from janis at gcc dot gnu dot org 2007-04-23 17:06 ---
A regression hunt on powerpc-linux identified the following patch:

http://gcc.gnu.org/viewcvs?view=rev&rev=69715

r69715 | mmitchel | 2003-07-23 18:44:43 + (Wed, 23 Jul 2003)

--
janis at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mmitchel at gcc dot gnu dot
                   |                            |org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31338
[Bug c++/31337] [4.2/4.3 regression] ICE with statement expression
--- Comment #3 from janis at gcc dot gnu dot org 2007-04-23 17:19 ---
A regression hunt on powerpc-linux identified the following patch:

http://gcc.gnu.org/viewcvs?view=rev&rev=116311

r116311 | jason | 2006-08-21 13:54:57 -0700 (Mon, 21 Aug 2006)

--
janis at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jason at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31337
[Bug java/31622] Segment violation in the “toString” method on a mathematical expression
--- Comment #6 from tromey at gcc dot gnu dot org 2007-04-23 17:24 --- Ok. You are running a version of gcj 4.3 from *before* the gcj-eclipse merge. So, this is correctly marked as a duplicate. If you update and rebuild, and follow the new instructions vis a vis ecj1, you will get a working gcj. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31622
[Bug c++/31584] ICE on probably invalid code
--- Comment #4 from numerical dot simulation at web dot de 2007-04-23 17:24 --- Sorry, the link was wrong, must be http://groups.google.de/group/comp.lang.c++.moderated/browse_thread/thread/8c3b8a84ed78b003/4d9603171894a75d?hl=de#4d9603171894a75d -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31584
[Bug c++/31411] [4.1/4.2/4.3 Regression] ICE in gimplify_expr with throw/special copy constructor with initializer with a deconstructor
--- Comment #7 from janis at gcc dot gnu dot org 2007-04-23 17:27 --- A regression hunt identified the tree-ssa merge to mainline. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31411
[Bug fortran/31620] [4.3 regression] Zeroing one component of array of derived types zeros the whole structure.
--- Comment #9 from pault at gcc dot gnu dot org 2007-04-23 17:52 ---
Fixed on trunk

Paul

--
pault at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31620
[Bug fortran/31630] [4.3 regression] ICE on nasty derived types code
--- Comment #4 from pault at gcc dot gnu dot org 2007-04-23 17:52 ---
Fixed on trunk

Paul

--
pault at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31630
[Bug fortran/31668] New: %VAL rejected for PROC_MODULE and PROC_INTERNAL procedures
Reported by Arjan van Dijk, http://gcc.gnu.org/ml/fortran/2007-04/msg00367.html

gfortran rejects the following code with the error:

  By-value argument at (1) is not allowed in this context

This is because the following check is matched:

resolve.c, resolve_actual_arglist():

  /* Intrinsics are still PROC_UNKNOWN here.  However,
     since same file external procedures are not resolvable
     in gfortran, it is a good deal easier to leave them to
     intrinsic.c.  */
  if (ptype != PROC_UNKNOWN && ptype != PROC_DUMMY && ptype != PROC_EXTERNAL)

However, in this case: ptype == PROC_MODULE.

The following values are possible: PROC_UNKNOWN, PROC_MODULE, PROC_INTERNAL, PROC_DUMMY, PROC_INTRINSIC, PROC_ST_FUNCTION, PROC_EXTERNAL

I have to think about which cases make sense and which don't. What speaks against allowing PROC_INTRINSIC, PROC_INTERNAL, PROC_ST_FUNCTION? Is there one which always needs to be rejected? If yes, a test case would be nice.

For PROC_INTERNAL I would argue it should be allowed:

  SUBROUTINE Grid2BMP(NX)
    INTEGER, INTENT(IN) :: NX
    call bmp_write(%val(nx))
  contains
    subroutine bmp_write(nx)
      integer, intent(in) :: nx
    end subroutine bmp_write
  END SUBROUTINE Grid2BMP
  end

For statement functions, I agree it should be invalid (for %VAL, %REF, %DESCR):

  SUBROUTINE Grid2BMP(NX)
    INTEGER, INTENT(IN) :: NX
    integer :: i,f
    f(i)=i**2 ! statement function
    i = f(%VAL(i))
  END SUBROUTINE Grid2BMP

This agrees with ifort, which also rejects it.

And for PROC_INTRINSIC: ifort rejects sin(%VAL(x)) as above (%VAL/%REF/%DESCR invalid in this context); gfortran has the error: Argument list function at (1) is not allowed in this context. (I prefer Intel's error message.)

At the moment, I don't see any intrinsic which would work with %VAL, but maybe I am missing something. Is there an example for PROC_INTRINSIC or PROC_UNKNOWN which needs to be supported?
Example:
a) Longer example, see follow-up email (URL above)
b) Short example (works if one moves the interface into the procedure body):

  module x
    interface
      subroutine bmp_write(nx)
        integer, intent(in) :: nx
      end subroutine bmp_write
    end interface
  contains
    SUBROUTINE Grid2BMP(NX)
      INTEGER, INTENT(IN) :: NX
      call bmp_write(%val(nx))
    END SUBROUTINE Grid2BMP
  END module
  end

--
           Summary: %VAL rejected for PROC_MODULE and PROC_INTERNAL procedures
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Keywords: rejects-valid
          Severity: normal
          Priority: P3
         Component: fortran
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: burnus at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31668
[Bug fortran/31668] %VAL rejected for PROC_MODULE and PROC_INTERNAL procedures
-- 

fxcoudert at gcc dot gnu dot org changed:

           What    |Removed                     |Added
                 CC|                            |fxcoudert at gcc dot gnu dot
                   |                            |org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|-00-00 00:00:00             |2007-04-23 18:05:48
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31668
[Bug ada/31669] New: GNAT blows up during compilation
The Ada compiler, GNAT, blows up while compiling the code below, with the unhandled exception message:

  raised RTSIFND.RE_NOT_AVAILABLE : rts.find.adb:210

  package Abstract_Base is
     pragma Pure;
     type Base_Type is abstract tagged limited private;
     procedure Update (This : access Base_Type; Data : in Integer) is abstract;
  private
     type Base_Type is abstract tagged limited null record;
  end Abstract_Base;

  with Abstract_Base;
  package Re_Not_Available is
     pragma Remote_Call_Interface;
     type Base_Access is access all Abstract_Base.Base_Type'Class;
     type Subscription is private;
     Null_Sub : constant Subscription;
     procedure Subscribe (This : in out Subscription; Callback : in Base_Access);
     procedure Unsubscribe (This : in out Subscription);
     type Publication is Private;
     Null_Pub : constant Publication;
     procedure Register (This : in out Publication);
     function Register return Publication;
     procedure Unregister (This : in out Publication);
     procedure Publish (This : Publication; Data : Integer);
  private
     Max_Subscribers : constant := 20;
     type Subscriber_Id is range 0 .. Max_Subscribers;
     type Subscription is new Subscriber_Id;
     type Publication is new Boolean;
     Null_Sub : constant Subscription := 0;
     Null_Pub : constant Publication := False;
  end Re_Not_Available;

-- 
Summary: GNAT blows up during compilation
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: blocker
Priority: P3
Component: ada
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: anhvofrcaus at gmail dot com
GCC build triplet: gcc-4.3-20070420
GCC host triplet: RedHat 10.0 on X86
GCC target triplet: RedHat 10.0 on X86

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31669
[Bug ada/31669] GNAT blows up during compilation
-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
                 CC|                            |pinskia at gcc dot gnu dot
                   |                            |org
           Severity|blocker                     |normal

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31669
[Bug c/31670] New: Error calculating size of structs
#include <stdio.h>

typedef struct{
    char data[261];
    int n;
} packet;

int main(int argc, char *argv[]){
    packet p;
    //It should print packet=265... it prints packet=268
    printf("packet = %d\n", sizeof(packet));
    //It should print p=265... it prints p=268
    printf("p = %d\n", sizeof(p));
    //It should print p.n=4... OK
    printf("p.n = %d\n", sizeof(p.n));
    //It should print p.data=261... OK
    printf("p.data = %d\n", sizeof(p.data));
    return 0;
}

-- 
Summary: Error calculating size of structs
Product: gcc
Version: 4.1.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: padrinator at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31670
[Bug c/31670] Error calculating size of structs
--- Comment #1 from pinskia at gcc dot gnu dot org 2007-04-23 19:32 ---
Learn about alignment when doing struct layout. Basically you want to use the attribute packed to get the sizes you want.

-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31670
[Bug rtl-optimization/28812] RTL aliasing vs may_alias and structs
--- Comment #6 from amylaar at gcc dot gnu dot org 2007-04-23 19:37 ---
(In reply to comment #5)
> Fixed.

I also see mayalias-2 failing with gcc 4.2 on arc at -O3 -fomit-frame-pointer.
Life analysis finds a 'clever' way to initialize the pointer p by copying the frame pointer and then using pre-decrement addressing.

The following rtl dump is from mayalias-2.c.134r.life2; note that insn 13 is the store of 1, and insn 15 is the dependent load. Because the address in insn 13 does not show the frame pointer explicitly, fixed_scalar_and_varying_struct_p decides that 'p' is varying, and hence the two memory accesses don't alias.

(insn:HI 42 7 8 2 (set (reg/v/f:SI 149 [ p ]) (reg/f:SI 27 fp)) 12 {*movsi_insn} (nil) (nil))

(insn:HI 8 42 10 2 (set (mem/c/i:SI (pre_dec:SI (reg/v/f:SI 149 [ p ])) [3 a+0 S4 A32]) (reg:SI 151)) 12 {*movsi_insn} (insn_list:REG_DEP_TRUE 7 (insn_list:REG_DEP_TRUE 42 (nil))) (expr_list:REG_DEAD (reg:SI 151) (expr_list:REG_INC (reg/v/f:SI 149 [ p ]) (expr_list:REG_EQUAL (const_int 10 [0xa]) (nil)

(note:HI 10 8 12 2 NOTE_INSN_DELETED)

(insn:HI 12 10 13 2 (set (reg:HI 152) (const_int 1 [0x1])) 7 {*movhi_insn} (nil) (nil))

(insn:HI 13 12 15 2 (set (mem/s/j:HI (reg/v/f:SI 149 [ p ]) [0 variable.x+0 S2 A16]) (reg:HI 152)) 7 {*movhi_insn} (insn_list:REG_DEP_TRUE 10 (insn_list:REG_DEP_TRUE 12 (nil))) (expr_list:REG_DEAD (reg:HI 152) (expr_list:REG_DEAD (reg/v/f:SI 149 [ p ]) (expr_list:REG_EQUAL (const_int 1 [0x1]) (nil)

(insn:HI 15 13 16 2 (set (reg:SI 153 [ a ]) (mem/c/i:SI (plus:SI (reg/f:SI 27 fp) (const_int -4 [0xfffc])) [3 a+0 S4 A32])) 12 {*movsi_insn} (nil) (nil))

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28812
[Bug c++/31671] New: Non-type template of type const ref accepted as a non-const ref
The following code compiles in all the gcc versions I tested (4.0.1, 4.1.1, 4.1.2):

  template<int &i> void doit() {
    i = 0;
  }

  template<const int &i> class X {
  public:
    void foo() {
      doit<i>();
    }
  };

  int i;
  X<i> x;

  int main(int argc, char **argv) {
    x.foo();
  }

Note that if i is declared const then the code will not compile.

-- 
Summary: Non-type template of type const ref accepted as a non-const ref
Product: gcc
Version: 4.1.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: chgros at coverity dot com
GCC build triplet: i486-linux-gnu
GCC host triplet: i686-linux-gnu
GCC target triplet: i686-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31671
[Bug c++/31671] Non-type template of type const ref accepted as a non-const ref
--- Comment #1 from pinskia at gcc dot gnu dot org 2007-04-23 20:14 ---
Confirmed, not a regression as far as I can tell.

-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
  GCC build triplet|i486-linux-gnu              |
   GCC host triplet|i686-linux-gnu              |
 GCC target triplet|i686-linux-gnu              |
           Keywords|                            |accepts-invalid
      Known to fail|                            |3.3.3 4.3.0 4.2.0
   Last reconfirmed|-00-00 00:00:00             |2007-04-23 20:14:52
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31671
[Bug c++/31337] [4.2/4.3 regression] ICE with statement expression
-- 

jason at gcc dot gnu dot org changed:

           What    |Removed                     |Added
         AssignedTo|unassigned at gcc dot gnu   |jason at gcc dot gnu dot org
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2007-03-25 07:46:59         |2007-04-23 20:19:45
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31337
[Bug fortran/31672] New: Not Implemented: Initialization of overlapping variables
This feature is used in classic Fortran 77 libraries such as Starpac, as mentioned in comp.lang.fortran on 23-Apr-2007.

c:\fortran>type d1mach.f90
function d1mach(i)
implicit none
double precision d1mach,dmach(5)
integer i,large(4),small(4)
equivalence ( dmach(1), small(1) )
equivalence ( dmach(2), large(1) )
data small(1),small(2) / 0, 1048576/
data large(1),large(2) /-1,2146435071/
d1mach = 0.0d0
end function d1mach

c:\fortran>gfortran -c -v d1mach.f90
Using built-in specs.
Target: i386-pc-mingw32
Configured with: ../trunk/configure --prefix=/mingw --enable-languages=c,fortran,c++,objc,obj-c++ --with-gmp=/home/coudert/local --disable-nls --with-ld=/mingw/bin/ld --with-as=/mingw/bin/as --disable-werror --enable-bootstrap --enable-threads --build=i386-pc-mingw32 --disable-shared --enable-libgomp
Thread model: win32
gcc version 4.3.0 20070406 (experimental)
 c:/programs/gfortran/bin/../libexec/gcc/i386-pc-mingw32/4.3.0/f951.exe d1mach.f90 -quiet -dumpbase d1mach.f90 -mtune=i386 -auxbase d1mach -version -fintrinsic-modules-path c:/programs/gfortran/bin/../lib/gcc/i386-pc-mingw32/4.3.0/finclude -o C:\DOCUME~1\vrao\LOCALS~1\Temp/cchM3Bia.s
GNU F95 version 4.3.0 20070406 (experimental) (i386-pc-mingw32)
        compiled by GNU C version 4.3.0 20070406 (experimental).
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
d1mach.f90: In function 'd1mach':
d1mach.f90:1: fatal error: gfc_todo: Not Implemented: Initialization of overlapping variables
compilation terminated.

-- 
Summary: Not Implemented: Initialization of overlapping variables
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: beliavsky at aol dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31672
[Bug fortran/31618] backspace intrinsic is not working on an unformatted file
--- Comment #8 from tkoenig at gcc dot gnu dot org 2007-04-23 20:44 ---
Subject: Bug 31618

Author: tkoenig
Date: Mon Apr 23 20:43:54 2007
New Revision: 124079

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=124079
Log:
2007-04-23  Thomas Koenig  [EMAIL PROTECTED]

        PR fortran/31618
        * io/transfer.c (read_block_direct): Instead of calling us_read,
        set dtp->u.p.current_unit->current_record = 0 so that pre_position
        will read the record marker.
        (data_transfer_init): For different error conditions, call
        generate_error, then return.

2007-04-23  Thomas Koenig  [EMAIL PROTECTED]

        PR fortran/31618
        * gfortran.dg/backspace_8.f: New test case.

Added:
    trunk/gcc/testsuite/gfortran.dg/backspace_8.f
Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/libgfortran/ChangeLog
    trunk/libgfortran/io/transfer.c

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31618
[Bug fortran/31618] [4.2, 4.1 only] backspace intrinsic is not working on an unformatted file
--- Comment #9 from tkoenig at gcc dot gnu dot org 2007-04-23 20:46 ---
Fixed on trunk. Maybe we should backport this once 4.2.1 is open.

-- 

tkoenig at gcc dot gnu dot org changed:

           What    |Removed                     |Added
      Known to fail|                            |4.2.0 4.1.2
      Known to work|                            |4.3.0
            Summary|backspace intrinsic is not  |[4.2, 4.1 only] backspace
                   |working on an unformatted   |intrinsic is not working on
                   |file                        |an unformatted file

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31618
[Bug fortran/29786] [4.1/4.2/4.3 Regression] rejects equivalence
--- Comment #8 from brooks at gcc dot gnu dot org 2007-04-23 20:48 ---
*** Bug 31672 has been marked as a duplicate of this bug. ***

-- 

brooks at gcc dot gnu dot org changed:

           What    |Removed                     |Added
                 CC|                            |beliavsky at aol dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29786
[Bug fortran/31672] Not Implemented: Initialization of overlapping variables
--- Comment #1 from brooks at gcc dot gnu dot org 2007-04-23 20:48 ---
Thanks for reporting this -- this is a rather nicer testcase than the one we had for this already. I've also changed the title of PR #29786 to make it easier to find.

*** This bug has been marked as a duplicate of 29786 ***

-- 

brooks at gcc dot gnu dot org changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |DUPLICATE

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31672
[Bug fortran/29786] [4.1/4.2/4.3 Regression] Initialization of overlapping variables: Not implemented
--- Comment #9 from brooks at gcc dot gnu dot org 2007-04-23 20:49 ---
I'm changing the name of this bug to make it a lot easier to find, now that we know what the actual problem is.

Also, PR #31672 contains an excellent testcase for this, which I'll quote here:

--
function d1mach(i)
implicit none
double precision d1mach,dmach(5)
integer i,large(4),small(4)
equivalence ( dmach(1), small(1) )
equivalence ( dmach(2), large(1) )
data small(1),small(2) / 0, 1048576/
data large(1),large(2) /-1,2146435071/
d1mach = 0.0d0
end function d1mach
--

-- 

brooks at gcc dot gnu dot org changed:

           What    |Removed                     |Added
            Summary|[4.1/4.2/4.3 Regression]    |[4.1/4.2/4.3 Regression]
                   |rejects equivalence         |Initialization of
                   |                            |overlapping variables: Not
                   |                            |implemented

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29786
[Bug target/29826] __attribute__ dllimport makes optimization crash on cygwin
--- Comment #17 from arcangelpip at hotmail dot com 2007-04-23 20:51 --- Yup, that did it. Building the cross compiler with that patch fixed the test case ICE and the one I kept getting from gettext. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29826
[Bug fortran/29786] [4.1/4.2/4.3 Regression] Initialization of overlapping variables: Not implemented
-- 

brooks at gcc dot gnu dot org changed:

           What    |Removed                     |Added
           Priority|P5                          |P2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29786