Trunk frozen for VTA merge
Subject says it all, I guess. Jakub
Re: Call for testers: MPC 0.7 prerelease tarball
Kaveh R. GHAZI wrote:

Hello, A prerelease tarball of the upcoming MPC 0.7 is available here: http://www.multiprecision.org/mpc/download/mpc-0.7-dev.tar.gz Please help test it for portability and bugs by downloading and compiling it on systems you have access to.

Fell at the first hurdle for me:

gcc-4 -shared-libgcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D_FORTIFY_SOURCE=2 -pedantic -Wall -Wextra -Werror -O2 -pipe -MT inp_str.lo -MD -MP -MF .deps/inp_str.Tpo -c inp_str.c -DDLL_EXPORT -DPIC -o .libs/inp_str.o
cc1: warnings being treated as errors
inp_str.c: In function 'extract_string':
inp_str.c:113:10: error: array subscript has type 'char'
inp_str.c:114:10: error: array subscript has type 'char'
inp_str.c:115:10: error: array subscript has type 'char'
inp_str.c:118:13: error: array subscript has type 'char'
inp_str.c:119:13: error: array subscript has type 'char'
inp_str.c:120:13: error: array subscript has type 'char'
make[2]: *** [inp_str.lo] Error 1
make[2]: *** Waiting for unfinished jobs

I'd like a report to contain your target triplet and the versions of your compiler, GMP and MPFR used when building MPC.

$ /gnu/gcc/gcc/config.guess
i686-pc-cygwin
$ gcc-4 -v
Using built-in specs.
Target: i686-pc-cygwin
Configured with: /gnu/gcc/gcc-patched/configure --prefix=/opt/gcc-tools -v --with-gmp=/usr --with-mpfr=/usr --enable-bootstrap --enable-version-specific-runtime-libs --enable-static --enable-shared --enable-shared-libgcc --disable-__cxa_atexit --with-gnu-ld --with-gnu-as --with-dwarf2 --disable-sjlj-exceptions --disable-symvers --disable-libjava --disable-interpreter --program-suffix=-4 --disable-libgomp --enable-libssp --enable-libada --enable-threads=posix --with-arch=i686 --with-tune=generic CC=gcc-4 CXX=g++-4 CC_FOR_TARGET=gcc-4 CXX_FOR_TARGET=g++-4 --with-ecj-jar=/usr/share/java/ecj.jar LD=/opt/gcc-tools/bin/ld.exe LD_FOR_TARGET=/opt/gcc-tools/bin/ld.exe AS=/opt/gcc-tools/bin/as.exe AS_FOR_TARGET=/opt/gcc-tools/bin/as.exe --disable-win32-registry --disable-libgcj-debug --enable-languages=c,c++,ada
Thread model: posix
gcc version 4.5.0 20090730 (experimental) (GCC)
$ cygcheck -c gmp mpfr libgmp3 libmpfr1
Cygwin Package Information
Package              Version        Status
gmp                  4.3.1-3        OK
libgmp3              4.3.1-3        OK
libmpfr1             2.4.1-4        OK
mpfr                 2.4.1-4        OK
$

BTW, I configured mpc with --prefix=/usr --disable-static --enable-shared (after first receiving

configure: error: gmp.h is a DLL: use --disable-static --enable-shared

when I tried with just --prefix).

Also please include your results from make check.

N/A !

cheers, DaveK
Re: Call for testers: MPC 0.7 prerelease tarball
Dave Korn wrote:

Fell at the first hurdle for me:

gcc-4 -shared-libgcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D_FORTIFY_SOURCE=2 -pedantic -Wall -Wextra -Werror -O2 -pipe -MT inp_str.lo -MD -MP -MF .deps/inp_str.Tpo -c inp_str.c -DDLL_EXPORT -DPIC -o .libs/inp_str.o
cc1: warnings being treated as errors
inp_str.c: In function 'extract_string':
inp_str.c:113:10: error: array subscript has type 'char'
inp_str.c:114:10: error: array subscript has type 'char'
inp_str.c:115:10: error: array subscript has type 'char'
inp_str.c:118:13: error: array subscript has type 'char'
inp_str.c:119:13: error: array subscript has type 'char'
inp_str.c:120:13: error: array subscript has type 'char'
make[2]: *** [inp_str.lo] Error 1
make[2]: *** Waiting for unfinished jobs

Attached allowed it to build, and seems to be what the function was already doing for isspace earlier. Test results will follow.

cheers, DaveK

--- orig/mpc-0.7-dev/src/inp_str.c 2009-08-26 21:24:41.0 +0100
+++ mpc-0.7-dev/src/inp_str.c 2009-09-01 12:17:04.546875000 +0100
@@ -110,14 +110,14 @@ extract_string (FILE *stream)
   /* (n-char-sequence) only after a NaN */
   if ((nread != 3
-       || tolower (str[0]) != 'n'
-       || tolower (str[1]) != 'a'
-       || tolower (str[2]) != 'n')
+       || tolower ((unsigned char) str[0]) != 'n'
+       || tolower ((unsigned char) str[1]) != 'a'
+       || tolower ((unsigned char) str[2]) != 'n')
       && (nread != 5
           || str[0] != '@'
-          || tolower (str[1]) != 'n'
-          || tolower (str[2]) != 'a'
-          || tolower (str[3]) != 'n'
+          || tolower ((unsigned char) str[1]) != 'n'
+          || tolower ((unsigned char) str[2]) != 'a'
+          || tolower ((unsigned char) str[3]) != 'n'
           || str[4] != '@'))
     {
       ungetc (c, stream);
       return str;
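For readers following the patch: the <ctype.h> functions take an int whose value must be representable as unsigned char (or be EOF), so a plain char that happens to be negative is not a valid argument. On Cygwin these functions are evidently implemented as table-lookup macros, which is why GCC reports "array subscript has type 'char'". The following is only a minimal sketch of the same idiom outside MPC; the helper name is_nan_word is made up for illustration and does not appear in the MPC sources.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of MPC): case-insensitive test for the
   three letters "nan", written the way the patched extract_string does
   it.  Each char is cast to unsigned char before it reaches tolower, so
   a byte with the high bit set never becomes a negative argument.  */
static int is_nan_word(const char *s, size_t n)
{
    return n == 3
        && tolower((unsigned char) s[0]) == 'n'
        && tolower((unsigned char) s[1]) == 'a'
        && tolower((unsigned char) s[2]) == 'n';
}
```

Calling is_nan_word("NaN", 3) yields 1, and a buffer starting with a byte such as 0xE9 is rejected without ever passing a negative value to tolower.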
Re: Call for testers: MPC 0.7 prerelease tarball
Dave Korn wrote: Attached allowed it to build, And with that patch: === All 45 tests passed === cheers, DaveK
Re: Why no strings in error messages?
On Wed, Aug 26, 2009 at 03:02:44PM -0400, Bradley Lucier wrote: On Wed, 2009-08-26 at 20:38 +0200, Paolo Bonzini wrote: When I worked at AMD, I was starting to suspect that it may be more beneficial to re-enable the first schedule insns pass if you were compiling in 64-bit mode, since you have more registers available, and the new registers do not have hard wired uses, which in the past always meant a lot of spills (also, the default floating point unit is SSE instead of the x87 stack). I never got around to testing this before AMD and I parted company. Unfortunately, hardwired use of %ecx for shifts is still enough to kill -fschedule-insns on AMD64. The AMD64 Architecture manual I found said that various combinations of the RSI, RDI, and RCX registers are used implicitly by ten instructions or prefixes, and RBX is used by XLAT, XLATB. So it appears that there are 12 general-purpose registers available for allocation. XLATB is essentially useless (well maybe had some uses back in 16 bit days, when only a few registers could be used for addressing) and never generated by GCC. However %ebx is used for PIC addressing in 32 bit mode so it is not always free either (I don't know about PIE code). In 64 bit mode, PIC/PIE use PC relative addressing, so this gives you actually 9 more free registers than in 32 bit mode. However for some reason you glossed over the case of integer division which always use %edx and %eax. This is true even when dividing by a constant (non power of 2) in which case gcc will often use a widening multiply instead, whose results are in %edx:%eax, so it's almost a wash in terms of fixed register usage (not exactly, the divisions use %edx:%eax as dividends and need the divisor somewhere else, while the widening multiply use %eax as one input but %edx can be used for the other). (As a side note, %edx and %eax are also special with regard to I/O port accesses but this is only of interest in device drivers). 
Are 12 registers not enough, in principle, to do scheduling before register allocation?

I don't know, but I would say that you have about 14 registers for address computations/indexing since you seem to be interested in FP code. I would think that it is sufficient for many inner loops (but not all, it really depends on the number of arrays that you access and the number of independent indexes that you have to keep).

I was getting a 15% speedup on some numerical codes, as pre-scheduling spaced out the vector loads among the floating-point computations.

Well, vector loads and floating point computations do not have anything to do with integer register choices. The 16 FP registers are nicely orthogonal (compared to the real nightmare that the x87 stack was). In practice you schedule on 16 FP registers and 14 (15 if you omit the frame pointer) addressing/indexing/counting registers. In this type of code there are typically very few instructions with fixed register constraints, and the least likely are the string instructions. Shifts of variable amount and integer divides are still possible, but unlikely.

Gabriel
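As an aside on the remark that division by a non-power-of-2 constant is often turned into a widening multiply: the sketch below shows, in portable C, the arithmetic such a transformation performs for an unsigned divide by 10. The function name div10 is made up, and the constant and shift are the commonly used pair for 32-bit division by 10 rather than anything quoted in this thread. On x86 the 32x32 to 64-bit product is what lands in %edx:%eax.

```c
#include <stdint.h>

/* Hypothetical illustration: x / 10 for 32-bit unsigned x, computed with
   a widening multiply and a shift instead of a divide instruction.
   0xCCCCCCCD is ceil(2^35 / 10); the 64-bit product corresponds to the
   %edx:%eax pair produced by a one-operand mul on x86.  */
static uint32_t div10(uint32_t x)
{
    uint64_t wide = (uint64_t) x * UINT64_C(0xCCCCCCCD);
    return (uint32_t) (wide >> 35);
}
```

Either way the operation ties up %eax and %edx, which is the "almost a wash" point made above.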
question about -mpush-args -maccumulate-outgoing-args on gcc for x86
Hi, I'm using gcc version 4.1.2 20080704 (Red Hat 4.1.2-44) for an x86 target. The info page says:

`-mpush-args'
`-mno-push-args'
    Use PUSH operations to store outgoing parameters. This method is shorter and usually equally fast as method using SUB/MOV operations and is enabled by default. In some cases disabling it may improve performance because of improved scheduling and reduced dependencies.

`-maccumulate-outgoing-args'
    If enabled, the maximum amount of space required for outgoing arguments will be computed in the function prologue. This is faster on most modern CPUs because of reduced dependencies, improved scheduling and reduced stack usage when preferred stack boundary is not equal to 2. The drawback is a notable increase in code size. This switch implies `-mno-push-args'.

This information is also found on http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html

Is this information up-to-date? It appears to me that '-mno-push-args' is enabled by default (*), and not '-mpush-args'. Moreover, since -maccumulate-outgoing-args implies -mno-push-args, it appears that the only way to obtain 'push-args' behavior is to specify '-mno-accumulate-outgoing-args' - a switch which the documentation doesn't even mention.

I have searched the mailing list archives and the only post I found was this one: http://gcc.gnu.org/ml/gcc/2005-01/msg00761.html which is at odds with the documentation above.

Thanks. - Godmar

(*) for instance, see:

gb...@setzer [39](~/tmp) cat call.c
void caller(void) { extern void callee(int); callee(5); }
gb...@setzer [40](~/tmp) gcc -mno-push-args -S call.c
gb...@setzer [41](~/tmp) cat call.s
        .file   "call.c"
        .text
.globl caller
        .type   caller, @function
caller:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    $5, (%esp)
        call    callee
        leave
        ret
        .size   caller, .-caller
        .ident  "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-44)"
        .section        .note.GNU-stack,"",@progbits
Replacing certain operations with function calls
Dear all, I have also been looking into how to generate a function call for certain operations. I've looked at various other targets for a similar problem/solution but have not seen anything. On my target architecture, we have certain optimized versions of the multiplication, for example. I wanted to replace certain multiplications with a function call. The solution I found was to perform a FAIL on the define_expand of the multiplication for these cases. This forces the compiler to generate a function call to __muldi3. I then go in the define_expand of the function call and check the symbol_ref to see what function is called. I can then modify the call at that point. My question is: is this a good approach or is there another solution that you would use? Thanks again for your time, Jean Christophe Beyler
Re: IRA undoing scheduling decisions
Peter Bergner wrote:

On Mon, 2009-08-24 at 23:56 +, Charles J. Tabony wrote:

I am seeing a performance regression on the port I maintain, and I would appreciate some pointers. When I compile the following code

void f(int *x, int *y){ *x = 7; *y = 4; }

with GCC 4.3.2, I get the desired sequence of instructions. I'll call it sequence A:

r0 = 7
r1 = 4
[x] = r0
[y] = r1

When I compile the same code with GCC 4.4.0, I get a sequence that is lower performance for my target machine. I'll call it sequence B:

r0 = 7
[x] = r0
r0 = 4
[y] = r0

This is caused by update_equiv_regs() which IRA inherited from local-alloc.c. Although with gcc 4.3 and earlier, you don't see the problem, it is still there, because if you look at the 4.3 dumps, you will see that update_equiv_regs() unordered them for us. What is saving us is that sched2 reschedules them again for us in the order we want. With 4.4, IRA happens to reuse the same register for both pseudos, so sched2 has its hands tied and cannot schedule them back again for us.

Peter, thanks for the investigation. We could do update_equiv_regs in a separate pass before the 1st insn scheduling as it was before IRA. I'll try this and see how it works for mainstream targets (x86, ppc).

Looking at update_equiv_regs(), if I disable the replacement for regs that are local to one basic block (patch below) like it existed before John Wehle's patch way back in Oct 2000: http://gcc.gnu.org/ml/gcc-patches/2000-09/msg00782.html then we get the ordering we want. Does anyone know why John removed that part of the test in his patch? Thoughts anyone?

I have no idea. But if it works well, we could use it.
Re: Bit fields
On 08/31/2009 07:20 PM, Jean Christophe Beyler wrote: Ok, is it normal to see a ashift with a negative value though or is this already sign of a (potentially) different problem? I seem to recall that it's normal. Combine was originally written in the days of VAX, where negative shifts were allowed. You'll just want to reject them in your patterns. r~
Re: asm goto vs simulate_block
On 08/31/2009 05:06 PM, Richard Henderson wrote:

The following patch appears to work for both. I'll commit it after a bootstrap and test cycle completes.

Committed with one additional change, to prevent VRP from crashing. r~

(vrp_visit_stmt): Be prepared for non-interesting stmts.

@@ -6087,7 +6090,9 @@ vrp_visit_stmt (gimple stmt, edge *taken_edge_p, tree *output_p)
       fprintf (dump_file, "\n");
     }
 
-  if (is_gimple_assign (stmt) || is_gimple_call (stmt))
+  if (!stmt_interesting_for_vrp (stmt))
+    gcc_assert (stmt_ends_bb_p (stmt));
+  else if (is_gimple_assign (stmt) || is_gimple_call (stmt))
     {
       /* In general, assignments with virtual operands are not useful
          for deriving ranges, with the obvious exception of calls to
Re: question about -mpush-args -maccumulate-outgoing-args on gcc for x86
Minor correction to my previous email:

On Tue, Sep 1, 2009 at 10:08 AM, Godmar Back <god...@gmail.com> wrote:

gb...@setzer [39](~/tmp) cat call.c
void caller(void) { extern void callee(int); callee(5); }

This:

gb...@setzer [40](~/tmp) gcc -mno-push-args -S call.c

should be '-mpush-args' as in:

gb...@cyan [4](~/tmp) gcc -S -mpush-args call.c
gb...@cyan [5](~/tmp) cat call.s
        .file   "call.c"
        .text
.globl caller
        .type   caller, @function
caller:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    $5, (%esp)
        call    callee
        leave
        ret
        .size   caller, .-caller
        .ident  "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-44)"
        .section        .note.GNU-stack,"",@progbits

The point here is that '-mpush-args' is ineffective unless '-mno-accumulate-outgoing-args' is given, and that the documentation, in my opinion, may be misleading by

a) not mentioning the -mno-accumulate-outgoing-args switch
b) saying that '-mpush-args' is the default when it's an ineffective default (since the default -maccumulate-outgoing-args appears to override it)
c) not mentioning that -maccumulate-outgoing-args is the default - in fact, the discussion in the section of push-args/no-push-args appears to imply that it shouldn't be the default.

Thanks. - Godmar
[lto] Reader-writer compatibility?
Is it required that the same compiler that generated lto objects be used to read them? I've come across a couple of ICEs with the current revision reading lto objects created by a slightly older version but same configuration. Is this simply invalid usage on my part? Regards, Ryan Mansfield
Re: [lto] Reader-writer compatibility?
On Tue, Sep 1, 2009 at 11:42, Ryan Mansfield <rmansfi...@qnx.com> wrote:

Is it required that the same compiler that generated lto objects be used to read them? I've come across a couple of ICEs with the current revision reading lto objects created by a slightly older version but same configuration. Is this simply invalid usage on my part?

It's likely. How much drift between the two revisions? Can you recreate the ICE if you write and read with the exact same revision? If so, please file a bug.

Diego.
Re: Using MEM_EXPR inside a call expression
On 08/28/2009 12:38 AM, Adam Nemet wrote: ... To assist the linker we need to annotate the indirect call with the function symbol. Since the call is expanded early... Having experimented with this on Alpha a few years back, the only thing I can suggest is to not expand them early. I use a combination of peep2's and normal splitters to determine if the post-call GP reload is needed, and to expand the call itself. r~
Re: question about -mpush-args -maccumulate-outgoing-args on gcc for x86
Godmar Back <god...@gmail.com> writes:

It appears to me that '-mno-push-args' is enabled by default (*), and not '-mpush-args'.

The default varies by processor--it depends on the -mtune option.

Moreover, since -maccumulate-outgoing-args implies -mno-push-args, it appears that the only way to obtain 'push-args' behavior is to specify '-mno-accumulate-outgoing-args' - a switch which the documentation doesn't even mention.

That is likely true. If you want to send a patch for the docs, that would be great.

Ian
Re: Replacing certain operations with function calls
Jean Christophe Beyler <jean.christophe.bey...@gmail.com> writes:

I have also been looking into how to generate a function call for certain operations. I've looked at various other targets for a similar problem/solution but have not seen anything. On my target architecture, we have certain optimized versions of the multiplication, for example. I wanted to replace certain multiplications with a function call. The solution I found was to perform a FAIL on the define_expand of the multiplication for these cases. This forces the compiler to generate a function call to __muldi3. I then go in the define_expand of the function call and check the symbol_ref to see what function is called. I can then modify the call at that point. My question is: is this a good approach or is there another solution that you would use?

I think that what you describe will work. I would probably generate a call to a builtin function in the define_expand. Look for the way targets use init_builtins and expand_builtin. Normally expand_builtin expands to some target-specific RTL, but it can expand to a function call too.

Ian
Re: DI mode and endianess
On 08/19/2009 06:50 AM, Mohamed Shafi wrote:

mov _h,d4
mov _h+4,d5
mov _j,d2
mov _j+4,d3
add d4,d2
adc d5,d3

irrespective of which endian it is. What could I be missing here? Should I add anything specific for this in the back-end?

Given that the compiler is generating adc, I have to assume that you have an adddi3 pattern. At which point I have to assume that you're doing something wrong in there that's producing the little-endian sequence even for big-endian.

r~
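To make the endianness point concrete, here is a small C model of a DImode add done as two 32-bit halves. It is only a sketch with made-up helper names (low_word_index, di_store, di_load, di_add, add_via_words) and says nothing about how the real adddi3 pattern for that port is written. The carry must always flow from the low-order word into the high-order word; what changes with endianness is which memory word (offset 0 or offset 4) is the low one.

```c
#include <stdint.h>

/* Hypothetical model: a 64-bit value kept as two 32-bit words in memory
   order, w[0] at the lower address.  */
static int low_word_index(int big_endian)
{
    return big_endian ? 1 : 0;
}

static void di_store(uint32_t w[2], uint64_t v, int big_endian)
{
    int lo = low_word_index(big_endian);
    w[lo] = (uint32_t) v;
    w[1 - lo] = (uint32_t) (v >> 32);
}

static uint64_t di_load(const uint32_t w[2], int big_endian)
{
    int lo = low_word_index(big_endian);
    return ((uint64_t) w[1 - lo] << 32) | w[lo];
}

/* "add" on the low words, then "adc" on the high words.  */
static void di_add(uint32_t r[2], const uint32_t a[2], const uint32_t b[2],
                   int big_endian)
{
    int lo = low_word_index(big_endian);
    int hi = 1 - lo;
    uint32_t low = a[lo] + b[lo];
    uint32_t carry = low < a[lo];   /* unsigned wrap-around means carry out */
    r[lo] = low;
    r[hi] = a[hi] + b[hi] + carry;
}

static uint64_t add_via_words(uint64_t x, uint64_t y, int big_endian)
{
    uint32_t a[2], b[2], r[2];
    di_store(a, x, big_endian);
    di_store(b, y, big_endian);
    di_add(r, a, b, big_endian);
    return di_load(r, big_endian);
}
```

In the quoted sequence the add is applied to the words loaded from _h and _j (offset 0) and the adc to the words from _h+4 and _j+4, which matches the little-endian case of this model; for big-endian the two roles would be swapped.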
Re: [lto] Reader-writer compatibility?
Diego Novillo wrote:

On Tue, Sep 1, 2009 at 11:42, Ryan Mansfield <rmansfi...@qnx.com> wrote:

Is it required that the same compiler that generated lto objects be used to read them? I've come across a couple of ICEs with the current revision reading lto objects created by a slightly older version but same configuration. Is this simply invalid usage on my part?

It's likely. How much drift between the two revisions? Can you recreate the ICE if you write and read with the exact same revision? If so, please file a bug.

The objects were created with rev 15 and being read using 151271. No, I can't reproduce the ICE using the same version. Thanks for confirming this is not expected to work.

Regards, Ryan Mansfield
Re: Replacing certain operations with function calls
I have looked at how other targets use the init_builtins/expand_builtins. Of course, I don't understand everything there but it seems indeed to be more for generating a series of instructions instead of a function call. I haven't seen anything resembling what I want to do.

I had also first thought of going directly in the define_expand and expanding to the function call I would want. The problem I have is that it is unclear to me how to handle (set up) the arguments of the builtin_function I am trying to define. To go from no function call to:

- Potentially spill output registers
- Potentially spill scratch registers
- Setup output registers with the operands
- Perform function call
- Copy return to output operand
- Potentially restore scratch registers
- Potentially restore output registers

seems a bit difficult to do at the define_expand level and might not generate good code. I guess I could potentially perform a pass in the tree representation to do what I am looking for but I am not sure that that is the best solution either.

For the moment, I will continue looking at what you suggest and also see if my solution works. I see that, for example, the compiler will not always generate the call I need to change. Therefore, it does seem that I need another solution than the one I propose. I'm more and more considering a pass in the middle-end to get what I need. Do you think this is better?

Thanks for your input, Jc

On Tue, Sep 1, 2009 at 12:34 PM, Ian Lance Taylor <i...@google.com> wrote:

Jean Christophe Beyler <jean.christophe.bey...@gmail.com> writes:

I have also been looking into how to generate a function call for certain operations. I've looked at various other targets for a similar problem/solution but have not seen anything. On my target architecture, we have certain optimized versions of the multiplication, for example. I wanted to replace certain multiplications with a function call. The solution I found was to perform a FAIL on the define_expand of the multiplication for these cases. This forces the compiler to generate a function call to __muldi3. I then go in the define_expand of the function call and check the symbol_ref to see what function is called. I can then modify the call at that point. My question is: is this a good approach or is there another solution that you would use?

I think that what you describe will work. I would probably generate a call to a builtin function in the define_expand. Look for the way targets use init_builtins and expand_builtin. Normally expand_builtin expands to some target-specific RTL, but it can expand to a function call too.

Ian
Re: [lto] Reader-writer compatibility?
Ryan Mansfield rmansfi...@qnx.com writes: The objects were created with rev 15 and being read using 151271. No, I can't reproduce the ICE using the same version. Thanks for confirming this is not expected to work. Is it the intent that this work properly in the future? It is not absurd to imagine that someone with a treeful of .o files might suffer an unexpected compiler upgrade before a later reuse/relink attempt. - FChE
Re: [lto] Reader-writer compatibility?
On Tue, Sep 1, 2009 at 14:32, Frank Ch. Eigler <f...@redhat.com> wrote:

Ryan Mansfield <rmansfi...@qnx.com> writes:

The objects were created with rev 15 and being read using 151271. No, I can't reproduce the ICE using the same version. Thanks for confirming this is not expected to work.

Is it the intent that this work properly in the future?

Yes. We likely want to maintain streamer compatibility within the same major release. I actually don't think we'll change the bytecode format too much. It will mostly depend on how much gimple changes in a single release. Clearly, we need better version drift detection.

Diego.
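As a rough illustration of what "version drift detection" could mean in practice, the sketch below checks a small header before any of the payload is read. Everything in it is hypothetical: the header layout, the "OBJS" magic, the type and function names, and the acceptance policy (same major version, object not newer than the reader) are assumptions made for this example and are not the LTO streamer's actual format.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk header written in front of a bytecode section.  */
struct stream_header {
    char     magic[4];     /* made-up tag identifying the section kind */
    uint16_t ver_major;    /* format version of the writer             */
    uint16_t ver_minor;
};

enum stream_compat {
    STREAM_OK,
    STREAM_BAD_MAGIC,
    STREAM_MAJOR_MISMATCH,
    STREAM_TOO_NEW
};

static struct stream_header make_header(uint16_t major_v, uint16_t minor_v)
{
    struct stream_header h;
    memcpy(h.magic, "OBJS", 4);
    h.ver_major = major_v;
    h.ver_minor = minor_v;
    return h;
}

/* Assumed policy: readers accept objects from the same major version
   whose minor version is not newer than their own.  */
static enum stream_compat check_header(struct stream_header h,
                                       uint16_t reader_major,
                                       uint16_t reader_minor)
{
    if (memcmp(h.magic, "OBJS", 4) != 0)
        return STREAM_BAD_MAGIC;
    if (h.ver_major != reader_major)
        return STREAM_MAJOR_MISMATCH;
    if (h.ver_minor > reader_minor)
        return STREAM_TOO_NEW;
    return STREAM_OK;
}
```

A reader that performs a check of this kind can report a clear "object written by an incompatible compiler" diagnostic instead of hitting an ICE partway through the stream.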
Re: question about -mpush-args -maccumulate-outgoing-args on gcc for x86
On Tue, Sep 1, 2009 at 12:31 PM, Ian Lance Taylor <i...@google.com> wrote:

Godmar Back <god...@gmail.com> writes:

It appears to me that '-mno-push-args' is enabled by default (*), and not '-mpush-args'.

The default varies by processor--it depends on the -mtune option.

I don't know how to find out which tuning is enabled by default; I assume -mtune=generic? Would statements about what the default is apply to the default -mtune setting?

Moreover, since -maccumulate-outgoing-args implies -mno-push-args, it appears that the only way to obtain 'push-args' behavior is to specify '-mno-accumulate-outgoing-args' - a switch which the documentation doesn't even mention.

That is likely true. If you want to send a patch for the docs, that would be great.

Whilst in general I am not opposed to this, and have contributed to many open source projects in the past, I feel that the documentation should be updated by someone who can actually vouch for the completeness and accuracy of what's written, which I definitely cannot. I also cannot verify the accuracy of the claims with respect to speeds of the two options. Moreover, these claims are made in a section of the documentation that applies to an entire architecture rather than a specific processor implementation. Perhaps they should simply be removed?

I'm also uncertain where exactly the difference between accumulate-outgoing-args and push-args is. -maccumulate-outgoing-args implies -mno-push-args, and -mno-accumulate-outgoing-args with -mpush-args is the traditional approach, but what does -mno-accumulate-outgoing-args with -mno-push-args look like, and does it even make sense?

It would also be great if '-mpush-args' without -mno-accumulate-outgoing-args would trigger a warning:

Warning: -mpush-args ignored while -maccumulate-outgoing-args is in effect.

- Godmar
GCC 4.4.2 Status Report (2009-09-01)
Status
======

The 4.4 branch is open for commits under the usual release branch rules. The timing of the 4.4.2 release (at least two months after the 4.4.1 release, so no sooner than September 22, at a point when there are no P1 regressions open for the branch) has yet to be determined.

Quality Data
============

Priority          #   Change from Last Report
--------        ---   -----------------------
P1                4   + 3
P2               89   + 1
P3                1   - 1
--------        ---   -----------------------
Total            94   + 3

Previous Report
===============

http://gcc.gnu.org/ml/gcc/2009-08/msg00373.html

The next report for 4.4.2 will be sent by Richard.
Re: Using MEM_EXPR inside a call expression
Richard Henderson writes: On 08/28/2009 12:38 AM, Adam Nemet wrote: ... To assist the linker we need to annotate the indirect call with the function symbol. Since the call is expanded early... Having experimented with this on Alpha a few years back, the only thing I can suggest is to not expand them early. I use a combination of peep2's and normal splitters to determine if the post-call GP reload is needed, and to expand the call itself. I see. So I guess you're saying that there is little chance to optimize the loop I had in my previous email ;(. Now suppose we split late, shouldn't we still assume that data-flow can change later. IOW, wouldn't we be required to use the literal/lituse counting that alpha does? If yes then I guess it's still better to use MEM_EXPR. MEM_EXPR also has the benefit that it does not deem indirect calls as different when cross-jumping compares the insns. I don't know how important this is though. Adam
Re: IRA undoing scheduling decisions
On Wed, 2009-08-26 at 17:12 -0500, Peter Bergner wrote: On Wed, 2009-08-26 at 23:30 +0200, Richard Guenther wrote: Hmm. I suppose if you conditionalize it on flag_schedule_insns it might be an overall win. Care to SPEC test that change? I assume you mean like the change below? Yeah, I can SPEC test that. Peter Index: ira.c === --- ira.c (revision 15) +++ ira.c (working copy) @@ -2510,6 +2510,8 @@ update_equiv_regs (void) calls. */ if (REG_N_REFS (regno) == 2 +(!flag_schedule_insns + || REG_BASIC_BLOCK (regno) NUM_FIXED_BLOCKS) (rtx_equal_p (x, src) || ! equiv_init_varies_p (src)) NONJUMP_INSN_P (insn) Pat ran the patch on SPEC2000 and it was very neutral. The overall SPECFP number didn't change and the SPECINT number only improved by 0.2%, which is pretty much in the noise. I think Vlad's suggestion of moving update_equiv_regs() to its own pass before sched1 sounds interesting. If that works, it's probably better than this patch. Peter
Re: IRA undoing scheduling decisions
On Tue, 2009-09-01 at 10:38 -0400, Vladimir Makarov wrote: We could do update_equiv_regs in a separate pass before the 1st insn scheduling as it was before IRA. IIRC, update_equiv_regs() was always called as part of local-alloc, so it was always after sched1 even before IRA. That said, moving it to its own pass before sched1 sounds like an interesting idea. My patch from the other note basically didn't affect SPEC2000 at all, and we could use it, but if your idea works, I'm more than happy to dump my patch. :) Were you going to whip that patch up or did you want me to? Peter
Re: IRA undoing scheduling decisions
Peter Bergner wrote: On Tue, 2009-09-01 at 10:38 -0400, Vladimir Makarov wrote: We could do update_equiv_regs in a separate pass before the 1st insn scheduling as it was before IRA. IIRC, update_equiv_regs() was always called as part of local-alloc, so it was always after sched1 even before IRA. That said, moving it to its own pass before sched1 sounds like an interesting idea. My patch from the other note basically didn't affect SPEC2000 at all, and we could use it, but if your idea works, I'm more than happy to dump my patch. :) Were you going to whip that patch up or did you want me to? I am going to do it by myself. Thanks for testing your patch, Peter.
Re: Using MEM_EXPR inside a call expression
On 09/01/2009 12:48 PM, Adam Nemet wrote: I see. So I guess you're saying that there is little chance to optimize the loop I had in my previous email ;(. Not at the rtl level. Gimple-level loop splitting should do it though. Now suppose we split late, shouldn't we still assume that data-flow can change later. IOW, wouldn't we be required to use the literal/lituse counting that alpha does? If you split post-reload, data flow isn't going to change in any significant way. If yes then I guess it's still better to use MEM_EXPR. MEM_EXPR also has the benefit that it does not deem indirect calls as different when cross-jumping compares the insns. I don't know how important this is though. It depends on how much benefit you get from the direct branch. On alpha it's quite a bit, so we work hard to make sure that we can get one, if at all possible. r~
Re: [lto] Reader-writer compatibility?
Diego Novillo wrote:

On Tue, Sep 1, 2009 at 11:42, Ryan Mansfield <rmansfi...@qnx.com> wrote:

Is it required that the same compiler that generated lto objects be used to read them? I've come across a couple of ICEs with the current revision reading lto objects created by a slightly older version but same configuration. Is this simply invalid usage on my part?

It's likely. How much drift between the two revisions? Can you recreate the ICE if you write and read with the exact same revision? If so, please file a bug.

Please add version checking. gfortran's module files (extension .mod) that are generated from source files that contain MODULE ... END MODULE constructs *now* contain version information. I still occasionally get bitten by picking up modules from 4.3 that don't have this - you'll get all sorts of unintelligible error messages that just distract from what's really wrong.

-- Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.org/~toon/ Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html
Re: IRA undoing scheduling decisions
On Tue, 2009-09-01 at 16:46 -0400, Vladimir Makarov wrote: Peter Bergner wrote: Were you going to whip that patch up or did you want me to? I am going to do it by myself. Great! I'd like to see how your patch affects POWER6 performance. Do you have access to a POWER6 box? If not, can you send Pat and me the patch and we'll fire off a run on our POWER6 benchmark system. Thanks. Peter
Re: Using MEM_EXPR inside a call expression
Richard Henderson writes: On 09/01/2009 12:48 PM, Adam Nemet wrote: I see. So I guess you're saying that there is little chance to optimize the loop I had in my previous email ;(. Not at the rtl level. Gimple-level loop splitting should do it though. Now suppose we split late, shouldn't we still assume that data-flow can change later. IOW, wouldn't we be required to use the literal/lituse counting that alpha does? If you split post-reload, data flow isn't going to change in any significant way. If yes then I guess it's still better to use MEM_EXPR. MEM_EXPR also has the benefit that it does not deem indirect calls as different when cross-jumping compares the insns. I don't know how important this is though. It depends on how much benefit you get from the direct branch. On alpha it's quite a bit, so we work hard to make sure that we can get one, if at all possible. Thanks, RTH. RichardS, Can you comment on what RTH is suggesting? Besides cross-jumping I haven't seen indirect PIC calls get optimized much, and it seems that splitting late will avoid the data-flow complications. I can experiment with this but it would be nice to get some early buy-in. BTW, I have the R_MIPS_JALR patch ready for submission but if we don't need to worry about data-flow changes then using MEM_EXPR is not necessary. Adam
Re: Replacing certain operations with function calls
Actually, what I've done is probably something in between what you were suggesting and what I was initially doing. If we consider the multiplication, I've modified the define_expand, for example, to:

(define_expand "muldi3"
  [(set (match_operand:DI 0 "register_operand" "")
        (mult:DI (match_operand:DI 1 "register_operand" "")
                 (match_operand:DI 2 "register_operand" "")))]
  ""
  {
    emit_function_call_2args (DImode, DImode, DImode,
                              \"my_version_of_mull\",
                              operands[0], operands[1], operands[2]);
    DONE;
  })

and my emit function is:

void
emit_function_call_2args (
    enum machine_mode return_mode,
    enum machine_mode arg1_mode,
    enum machine_mode arg2_mode,
    const char *fname,
    rtx op0,
    rtx op1,
    rtx op2)
{
  tree id;
  rtx insn;

  /* Move arguments */
  emit_move_insn (gen_rtx_REG (arg1_mode, GP_ARG_FIRST), op1);
  emit_move_insn (gen_rtx_REG (arg2_mode, GP_ARG_FIRST + 1), op2);

  /* Get name */
  id = get_identifier (fname);

  /* Generate call value */
  insn = gen_call_value (
      gen_rtx_REG (return_mode, 6),
      gen_rtx_MEM (DImode,
                   gen_rtx_SYMBOL_REF (Pmode, IDENTIFIER_POINTER (id))),
      GEN_INT (64),
      NULL
      );

  /* Annotate the call to say we are using both argument registers */
  use_reg (CALL_INSN_FUNCTION_USAGE (insn), gen_rtx_REG (arg1_mode, GP_ARG_FIRST));
  use_reg (CALL_INSN_FUNCTION_USAGE (insn), gen_rtx_REG (arg1_mode, GP_ARG_FIRST + 1));

  /* Emit call */
  emit_call_insn (insn);

  /* Set back return to op0 */
  emit_move_insn (op0, gen_rtx_REG (return_mode, GP_RETURN));
}

First off: does this seem correct?

Second, I have a bit of a worry about the following case. If we consider this C code:

bar (a * b, c * d);

it is possible that the compiler would normally have generated this:

mult output1, a, b
mult output2, c, d
call bar

which would be problematic for my expand system, since this would expand into:

mov output1, a
mov output2, b
call internal_mult
mov output1, return_reg
mov output1, c          #Rewriting on output1...
mov output2, d
call internal_mult
mov output2, return_reg
call bar

However, I am unsure whether this is possible in the expand stage. Would the expand stage automatically have this instead:

mult tmp1, a, b
mult tmp2, c, d
mov output1, tmp1
mov output2, tmp2
call bar

in which case I know I can do what I am currently doing.

Thanks again for your help and I apologize for these basic questions...

Jc

On Tue, Sep 1, 2009 at 2:30 PM, Jean Christophe Beyler <jean.christophe.bey...@gmail.com> wrote:
> I have looked at how other targets use the init_builtins/expand_builtins. Of course, I don't understand everything there but it seems indeed to be more for generating a series of instructions instead of a function call. I haven't seen anything resembling what I want to do.
>
> I had also first thought of going directly in the define_expand and expanding to the function call I would want. The problem I have is that it is unclear to me how to handle (set up) the arguments of the builtin_function I am trying to define.
>
> To go from no function call to:
>
> - Potentially spill output registers
> - Potentially spill scratch registers
> - Setup output registers with the operands
> - Perform function call
> - Copy return to output operand
> - Potentially restore scratch registers
> - Potentially restore output registers
>
> seems a bit difficult to do at the define_expand level and might not generate good code. I guess I could potentially perform a pass in the tree representation to do what I am looking for but I am not sure that that is the best solution either.
>
> For the moment, I will continue looking at what you suggest and also see if my solution works. I see that, for example, the compiler will not always generate the call I need to change. Therefore, it does seem that I need another solution than the one I propose.
>
> I'm more and more considering a pass in the middle-end to get what I need. Do you think this is better?
>
> Thanks for your input,
> Jc
>
> On Tue, Sep 1, 2009 at 12:34 PM, Ian Lance Taylor <i...@google.com> wrote:
>> Jean Christophe Beyler <jean.christophe.bey...@gmail.com> writes:
>>> I have also been looking into how to generate a function call for certain operations. I've looked at various other targets for a similar problem/solution but have not seen anything. On my target architecture, we have certain optimized versions of the multiplication for example. I wanted to replace certain multiplications with a function call. The solution I found was to perform a FAIL on the define_expand of the multiplication for these cases. This forces the compiler to
Re: Replacing certain operations with function calls
Jean Christophe Beyler <jean.christophe.bey...@gmail.com> writes:
> First off: does this seem correct?

Awkward though it is, it may be more reliable to build a small tree here and pass it to expand_call. This assumes that you want to use the standard ABI when calling this function.

Then your second issue would go away.

Ian
Re: Replacing certain operations with function calls
I don't think I quite understand what you mean. I want to use the standard ABI; basically I want to transform certain operations into function calls.

In regard to what you said, do you mean I should build the tree before the expand pass, by writing a new pass that will work on the trees instead of rtx? Otherwise, I fail to see how that is different from what I'm already doing. Would you have an example?

Thanks,
Jc

PS: Although when I look at what GCC generates at the expand stage, it really does seem that it first generates the calculation of the parameters in pseudo-registers and then moves them to the actual output registers. It's the next phases that will combine the two to save a move.

On Tue, Sep 1, 2009 at 6:26 PM, Ian Lance Taylor <i...@google.com> wrote:
> Jean Christophe Beyler <jean.christophe.bey...@gmail.com> writes:
>> First off: does this seem correct?
>
> Awkward though it is, it may be more reliable to build a small tree here and pass it to expand_call. This assumes that you want to use the standard ABI when calling this function.
>
> Then your second issue would go away.
>
> Ian
Re: Replacing certain operations with function calls
Finally, I guess the one thing I can do is simply generate pseudo-registers and copy all my registers into the pseudos before the call I make. Then I do my expand like I showed above. And finally, move everything back.

Later passes will remove anything that was not needed; anything that was needed will be kept. This could be a solution to the second issue but I'll wait to understand what you meant first.

Jc

On Tue, Sep 1, 2009 at 6:35 PM, Jean Christophe Beyler <jean.christophe.bey...@gmail.com> wrote:
> I don't think I quite understand what you mean. I want to use the standard ABI; basically I want to transform certain operations into function calls.
>
> In regard to what you said, do you mean I should build the tree before the expand pass, by writing a new pass that will work on the trees instead of rtx? Otherwise, I fail to see how that is different from what I'm already doing. Would you have an example?
>
> Thanks,
> Jc
>
> PS: Although when I look at what GCC generates at the expand stage, it really does seem that it first generates the calculation of the parameters in pseudo-registers and then moves them to the actual output registers. It's the next phases that will combine the two to save a move.
>
> On Tue, Sep 1, 2009 at 6:26 PM, Ian Lance Taylor <i...@google.com> wrote:
>> Jean Christophe Beyler <jean.christophe.bey...@gmail.com> writes:
>>> First off: does this seem correct?
>>
>> Awkward though it is, it may be more reliable to build a small tree here and pass it to expand_call. This assumes that you want to use the standard ABI when calling this function.
>>
>> Then your second issue would go away.
>>
>> Ian
Re: Replacing certain operations with function calls
Jean Christophe Beyler <jean.christophe.bey...@gmail.com> writes:
> In regard to what you said, do you mean I should build the tree before the expand pass, by writing a new pass that will work on the trees instead of rtx?

Oh, sorry, I'm an idiot. I forgot that you only have RTL at this point.

I would go with what you wrote and see what happens.

Ian
[Bug target/38306] [4.4/4.5 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
--- Comment #14 from jv244 at cam dot ac dot uk 2009-09-01 06:56 ---
I wanted to try Vladimir Makarov's new patch for this testcase, but on an unpatched trunk I notice a serious runtime regression with '-fschedule-insns' with respect to 4.3.3.

Using as base options (for the attached testcase)

gfortran -O3 -march=native -funroll-loops -ffast-math test.f90

4.3.3 w   -fschedule-insns : 3.372s
4.3.3 w/o -fschedule-insns : 4.384s
4.4.2 w   -fschedule-insns : 4.748s
4.4.2 w/o -fschedule-insns : 4.408s
4.5.0 w   -fschedule-insns : 4.712s
4.5.0 w/o -fschedule-insns : 4.408s

so 4.3 against 4.5 'w -fschedule-insns' is about 40% faster.

I guess this is pretty target specific. I'm running this on an Opteron; this is what -v reports:

Target: x86_64-unknown-linux-gnu
Configured with: /data03/vondele/gcc_trunk/gcc/configure --disable-bootstrap --prefix=/data03/vondele/gcc_trunk/build --enable-languages=c,c++,fortran --disable-multilib --with-ppl=/data03/vondele/gcc_trunk/build/ --with-cloog=/data03/vondele/gcc_trunk/build/
Thread model: posix
gcc version 4.5.0 20090830 (experimental) [trunk revision 151229] (GCC)
COLLECT_GCC_OPTIONS='-O3' '-funroll-loops' '-ffast-math' '-fschedule-insns' '-v' '-shared-libgcc'
 /data03/vondele/gcc_trunk/build/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/f951 test.f90 -march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 --param l2-cache-size=1024 -mtune=k8 -quiet -dumpbase test.f90 -auxbase test -O3 -version -funroll-loops -ffast-math -fschedule-insns -fintrinsic-modules-path /data03/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/finclude -o /tmp/ccvGq2CO.s

--
jv244 at cam dot ac dot uk changed:

What              |Removed                          |Added
CC                |                                 |vmakarov at redhat dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306
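The "about 40%" figure in comment #14 follows from the listed run times. A minimal sketch of the arithmetic, where the helper name `percent_slower` is just an illustration:

```c
#include <assert.h>

/* Percentage by which run time SLOW exceeds run time FAST.  */
static double
percent_slower (double slow, double fast)
{
  assert (fast > 0.0);
  return (slow / fast - 1.0) * 100.0;
}
```

With the numbers from the comment, `percent_slower (4.712, 3.372)` comes out at a little under 40, i.e. 4.5.0 with -fschedule-insns takes roughly 40% longer than 4.3.3 with the same flag.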
[Bug bootstrap/41205] [4.5 Regression] Bootstrap broken on i686-apple-darwin9 by revision 151249
--
dodji at gcc dot gnu dot org changed:

What              |Removed                          |Added
AssignedTo        |unassigned at gcc dot gnu dot org|dodji at gcc dot gnu dot org
Status            |UNCONFIRMED                      |ASSIGNED
Ever Confirmed    |0                                |1
Last reconfirmed  |-00-00 00:00:00                  |2009-09-01 07:00:06
date              |                                 |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41205
[Bug bootstrap/41205] [4.5 Regression] Bootstrap broken on i686-apple-darwin9 by revision 151249
--- Comment #2 from dodji at gcc dot gnu dot org 2009-09-01 07:01 ---
Created an attachment (id=18459)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18459&action=view)
Obvious fix candidate

Could you please test this patch on darwin and see if it fixes bootstrap for you? I am sorry I don't have any darwin system at hand to test.

Thanks.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41205
[Bug c/41203] -mtune=pentium2 structure related tree-outof-ssa internal compiler error
--- Comment #1 from rguenth at gcc dot gnu dot org 2009-09-01 08:16 --- Works for me on a i686 host with current trunk. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41203
[Bug bootstrap/41205] [4.5 Regression] Bootstrap broken on i686-apple-darwin9 by revision 151249
--- Comment #3 from dodji at gcc dot gnu dot org 2009-09-01 08:46 ---
Subject: Bug 41205

Author: dodji
Date: Tue Sep 1 08:45:38 2009
New Revision: 151262

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=151262
Log:
Fix bootstrap after patch PR debug/30161

gcc/ChangeLog:
	PR bootstrap/41205
	Fix AIX bootstrap after PR debug/30161
	* dwarf2out.c (make_ith_pack_parameter_name): Don't use strnlen that is a GNU extension.
	(tmpl_value_parm_die_table): Move the definition of this global outside #ifdef DWARF2_DEBUGGING_INFO region.

gcc/cp/ChangeLog:
	PR bootstrap/41205
	* pt.c (make_ith_pack_parameter_name): Don't use strnlen that is a GNU extension.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/pt.c
    trunk/gcc/dwarf2out.c

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41205
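The committed patch itself is not shown in this notification. As an illustration only, one common way to avoid depending on strnlen is a bounded length helper built on the standard memchr; the name `bounded_strlen` is hypothetical and not taken from the GCC sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of S, but never more than MAXLEN.  Uses only ISO C library
   calls, so it does not rely on strnlen being available on the host.  */
static size_t
bounded_strlen (const char *s, size_t maxlen)
{
  const char *end;

  assert (s != NULL);
  end = memchr (s, '\0', maxlen);
  return end != NULL ? (size_t) (end - s) : maxlen;
}
```

When the terminator lies within the first `maxlen` bytes the result equals `strlen (s)`; otherwise the bound itself is returned, which mirrors what strnlen does where it exists.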
[Bug debug/30161] GCC should generate dwarf info about template parameters
--- Comment #10 from dodji at gcc dot gnu dot org 2009-09-01 08:46 ---
Subject: Bug 30161

Author: dodji
Date: Tue Sep 1 08:45:38 2009
New Revision: 151262

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=151262
Log:
Fix bootstrap after patch PR debug/30161

gcc/ChangeLog:
	PR bootstrap/41205
	Fix AIX bootstrap after PR debug/30161
	* dwarf2out.c (make_ith_pack_parameter_name): Don't use strnlen that is a GNU extension.
	(tmpl_value_parm_die_table): Move the definition of this global outside #ifdef DWARF2_DEBUGGING_INFO region.

gcc/cp/ChangeLog:
	PR bootstrap/41205
	* pt.c (make_ith_pack_parameter_name): Don't use strnlen that is a GNU extension.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/pt.c
    trunk/gcc/dwarf2out.c

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30161
[Bug target/38306] [4.4/4.5 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
--- Comment #15 from bonzini at gnu dot org 2009-09-01 08:54 --- Please try -O2 and -O2 -funroll-loops too, since -O3 is not always good for speed. (It would be even better if -O2 is not slower and you can find out what the culprit is at -O3; this is not necessarily possible though). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306
[Bug bootstrap/41205] [4.5 Regression] Bootstrap broken on i686-apple-darwin9 by revision 151249
--- Comment #4 from dodji at gcc dot gnu dot org 2009-09-01 08:55 ---
Fixed in trunk.

--
dodji at gcc dot gnu dot org changed:

What              |Removed                          |Added
Status            |ASSIGNED                         |RESOLVED
Resolution        |                                 |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41205
[Bug c++/41207] New: The resulting 64-bit binary doesn't get recognized as proper binary by windows vista x64
The binary produced by x86_64-w64-mingw32-g++ will not run.

Runtime:
libqt4_plugin.dll' (%1 is not a valid Win32 application. (error 193))

plugins/libqt4_plugin.dll: file format pei-x86-64

Disassembly of section .text:

68e81000 _pre_c_init:
68e81000: 53                      push   %rbx
68e81001: b9 00 01 00 00          mov    $0x100,%ecx
68e81006: 48 83 ec 20             sub    $0x20,%rsp
68e8100a: e8 01 16 71 00          callq  69592610 _malloc
68e8100f: 48 89 c3                mov    %rax,%rbx
68e81012: 48 89 c1                mov    %rax,%rcx
68e81015: e8 f6 7f 6f 00          callq  69579010 __encode_pointer
68e8101a: 48 85 db                test   %rbx,%rbx
68e8101d: 48 89 05 bc a4 cb 00    mov    %rax,0xcba4bc(%rip)    # 69b3b4e0 ___onexitbegin
68e81024: 48 89 05 c5 a4 cb 00    mov    %rax,0xcba4c5(%rip)    # 69b3b4f0 ___onexitend
68e8102b: b8 01 00 00 00          mov    $0x1,%eax
68e81030: 74 09                   je     68e8103b _pre_c_init+0x3b
68e81032: 48 c7 03 00 00 00 00    movq   $0x0,(%rbx)
68e81039: 30 c0                   xor    %al,%al
68e8103b: 48 83 c4 20             add    $0x20,%rsp
68e8103f: 5b                      pop    %rbx
68e81040: c3                      retq
68e81041: 66 66 66 66 66 66 2e    nopw   %cs:0x0(%rax,%rax,1)
68e81048: 0f 1f 84 00 00 00 00
68e8104f: 00

68e81050 __CRT_INIT:
68e81050: 41 54                   push   %r12
68e81052: 55                      push   %rbp
68e81053: 57                      push   %rdi
68e81054: 4c 89 c7                mov    %r8,%rdi
68e81057: 56                      push   %rsi
68e81058: 53                      push   %rbx
68e81059: 48 89 cb                mov    %rcx,%rbx
68e8105c: 48 83 ec 20             sub    $0x20,%rsp
68e81060: 85 d2                   test   %edx,%edx
68e81062: 75 7d                   jne    68e810e1 __CRT_INIT+0x91
68e81064: 8b 15 96 ef ca 00       mov    0xcaef96(%rip),%edx    # 69b3 __bss_start__
68e8106a: 31 c0                   xor    %eax,%eax
68e8106c: 85 d2                   test   %edx,%edx
68e8106e: 7e 66                   jle    68e810d6 __CRT_INIT+0x86
68e81070: 83 ea 01                sub    $0x1,%edx
68e81073: 31 c0                   xor    %eax,%eax
68e81075: 89 15 85 ef ca 00       mov    %edx,0xcaef85(%rip)    # 69b3 __bss_start__
68e8107b: ba 01 00 00 00          mov    $0x1,%edx
68e81080: f0 48 0f b1 15 87 a4    lock cmpxchg %rdx,0xcba487(%rip)    # 69b3b510 ___native_startup_lock
68e81087: cb 00
68e81089: 48 85 c0                test   %rax,%rax
68e8108c: 74 2a                   je     68e810b8 __CRT_INIT+0x68
68e8108e: 48 8b 35 6b ec cb 00    mov    0xcbec6b(%rip),%rsi    # 69b3fd00 __imp__Sleep
68e81095: bf 01 00 00 00          mov    $0x1,%edi
68e8109a: 31 db                   xor    %ebx,%ebx
68e8109c: 0f 1f 40 00             nopl   0x0(%rax)
68e810a0: b9 e8 03 00 00          mov    $0x3e8,%ecx
68e810a5: ff d6                   callq  *%rsi
68e810a7: 48 89 d8                mov    %rbx,%rax
68e810aa: f0 48 0f b1 3d 5d a4    lock cmpxchg %rdi,0xcba45d(%rip)    # 69b3b510 ___native_startup_lock
68e810b1: cb 00
68e810b3: 48 85 c0                test   %rax,%rax
68e810b6: 75 e8                   jne    68e810a0 __CRT_INIT+0x50
68e810b8: 8b 05 42 a4 cb 00       mov    0xcba442(%rip),%eax    # 69b3b500 ___native_startup_state
68e810be: 83 f8 02                cmp    $0x2,%eax
68e810c1: 0f 84 e9 00 00 00       je     68e811b0 __CRT_INIT+0x160
68e810c7: b9 1f 00 00 00          mov    $0x1f,%ecx
68e810cc: e8 47 15 71 00          callq  69592618 __amsg_exit

--
           Summary: The resulting 64-bit binary doesn't get recognized as proper binary by windows vista x64
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: t7 at gmail dot com
 GCC build triplet: x86_64-w64-mingw32
  GCC host triplet: x86_64-w64-mingw32
GCC target triplet: x86_64-w64-mingw32

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41207
[Bug boehm-gc/41208] New: illegal instruction lwsync reported on e500
When running applications on e500 we got an illegal instruction error. We eventually found that it is caused by the asm code below:

gcc-4.3.2/boehm-gc/include/private/gc_locks.h

    __asm__ __volatile__("lwsync" : : : "memory");

lwsync is not supported by e500, even though powerpc claims that it is. There have been similar issues before, which is why __NO_LWSYNC__ was introduced. The code should be modified to use msync or sync in this case.

+ #ifdef __NO_LWSYNC__
+     __asm__ __volatile__("msync" : : : "memory");
+ #else
      __asm__ __volatile__("lwsync" : : : "memory");
+ #endif

--
           Summary: illegal instruction lwsync reported on e500
           Product: gcc
           Version: 4.3.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: boehm-gc
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: harry dot he at freescale dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: powerpc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41208
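The selection proposed in the report can be sketched as a self-contained macro. This is an illustration, not the actual gc_locks.h code: the macro name `SKETCH_BARRIER` and the `publish` helper are hypothetical, and the non-PowerPC branch (GCC's `__sync_synchronize` builtin) is an added assumption so the example also builds on other hosts.

```c
#include <assert.h>
#include <stddef.h>

/* Pick a barrier instruction the target actually implements.  */
#if defined(__powerpc__) || defined(__PPC__)
# ifdef __NO_LWSYNC__
   /* e500 and similar cores: no lwsync, use msync as the report suggests.  */
#  define SKETCH_BARRIER() __asm__ __volatile__ ("msync" : : : "memory")
# else
#  define SKETCH_BARRIER() __asm__ __volatile__ ("lwsync" : : : "memory")
# endif
#else
  /* Assumption for other targets: a full barrier from the compiler.  */
# define SKETCH_BARRIER() __sync_synchronize ()
#endif

/* Store VALUE, then raise READY, with the barrier ordering the two
   stores as seen by other processors.  */
static void
publish (int *data, volatile int *ready, int value)
{
  assert (data != NULL && ready != NULL);
  *data = value;
  SKETCH_BARRIER ();
  *ready = 1;
}
```

Keying on `__NO_LWSYNC__` keeps the choice with the compiler's knowledge of the selected CPU, so the same source builds for both e500 and cores that do implement lwsync.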
[Bug target/38306] [4.4/4.5 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
--- Comment #16 from jv244 at cam dot ac dot uk 2009-09-01 09:13 ---
(In reply to comment #15)
> Please try -O2 and -O2 -funroll-loops too, since -O3 is not always good for speed. (It would be even better if -O2 is not slower and you can find out what the culprit is at -O3; this is not necessarily possible though).

You're right that, without -fschedule-insns, -O2 is faster than -O3 in this case, but nothing comes close to 4.3 performance. Adding '-fschedule-insns' to the fastest -O2 choice makes it 20% slower.

All numbers with trunk:

-O2 -march=native -funroll-loops -ffast-math                   : 4.032
-O2 -march=native -funroll-loops -ffast-math -fschedule-insns  : 4.712
-O3 -march=native -funroll-loops -ffast-math                   : 4.408
-O2 -march=native -ffast-math                                  : 11.373
-O2 -march=native -ffast-math -fschedule-insns                 : 11.409
-O3 -march=native -ffast-math                                  : 4.296
-O3 -march=native -ffast-math -fschedule-insns                 : 4.656

I can test other flags if you have a hint.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306
[Bug target/38306] [4.4/4.5 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
--- Comment #17 from jv244 at cam dot ac dot uk 2009-09-01 09:17 ---
(In reply to comment #16)
> All numbers with trunk:

With 4.3 there is no difference between -O2 and -O3:

-O2 -march=native -funroll-loops -ffast-math                   : 4.388
-O2 -march=native -funroll-loops -ffast-math -fschedule-insns  : 3.352
-O3 -march=native -funroll-loops -ffast-math                   : 4.380
-O3 -march=native -funroll-loops -ffast-math -fschedule-insns  : 3.372

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306
[Bug target/41156] [4.4/4.5 Regression] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize
--- Comment #6 from jakub at gcc dot gnu dot org 2009-09-01 09:28 --- IMHO either standard options compiled code shouldn't be called from -mpreferred-stack-boundary=2 code, or it needs to be compiled with -mincoming-stack-boundary=2. But it should be user's responsibility. Ensuring by default outgoing calls are 16 byte aligned, but not assuming it is just a very stupid thing to do and unnecessarily penalizes normal users. It is certainly not true that most code is compiled with -mpreferred-stack-boundary=2, only kernel and a handful of packages is by default, and kernel has its own ABI (and doesn't use FPU nor SSE*). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41156
[Bug c++/41153] [4.4 Regression] ICE in building Qt4 src/core
--- Comment #11 from jakub at gcc dot gnu dot org 2009-09-01 09:32 ---
Mark, I don't think this should be P1, __optimize__ attribute is new in 4.4 (so considering it regression is already quite weird, though the attribute is ignored in older releases, so technically it is a regression, albeit one wouldn't use it with pre-4.4), but more importantly is known to be broken in many ways.

--
jakub at gcc dot gnu dot org changed:

What              |Removed                          |Added
CC                |                                 |mmitchel at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41153
[Bug fortran/41209] New: Add full ATTRIBUTE support to gfortran (ALIGN, (WEAK)ALIAS, ...)
gfortran currently supports only STDCALL, FASTCALL and CDECL as attributes using

!GCC$ ATTRIBUTE list :: symbol

The attributes are saved as bits in the attr struct. For full attribute support, one presumably should save the string and convert it later in trans*.c into a TREE. The string should then be saved in the .mod file. (That is also the reason that one cannot directly save the attributes into a TREE.)

Issue: For STDCALL etc. we do some conformance checking for proc-pointer assignments. That should continue to work. Maybe one needs to do further checks.

See:
http://gcc.gnu.org/onlinedocs/gfortran/GNU-Fortran-Compiler-Directives.html
http://gcc.gnu.org/onlinedocs/gcc/Variable-Attributes.html
http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
PR 34112 and PR 40955

--
           Summary: Add full ATTRIBUTE support to gfortran (ALIGN, (WEAK)ALIAS, ...)
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: burnus at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41209
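For readers unfamiliar with the C side of the attributes named in the summary, here is a small sketch of what `aligned` and `weak` look like in GNU C. It shows only the existing C syntax that the linked GCC manual pages describe, not any proposed gfortran syntax; the identifiers `buffer`, `sketch_hook` and `check_attributes` are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Variable attribute: ask for 32-byte alignment of this array.  */
static double buffer[4] __attribute__ ((aligned (32)));

/* Function attribute: a weak definition, which another object file may
   override with a strong definition of the same name.  */
__attribute__ ((weak)) int
sketch_hook (void)
{
  return 1;
}

/* Check that the requested alignment was honoured and that, with no
   override linked in, the weak default is the one that gets called.  */
static void
check_attributes (void)
{
  assert ((uintptr_t) buffer % 32 == 0);
  assert (sketch_hook () == 1);
}
```

Attributes like these carry arguments (an alignment value, an alias target), which is presumably why the report suggests storing them as strings rather than as single bits.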
[Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
--- Comment #29 from dominiq at lps dot ens dot fr 2009-09-01 09:37 ---
Does anyone understand why commenting a write can change crtl->maybe_hot_insn_p from 1 to 0?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
[Bug target/41210] New: ICE with vsx_movv2df
gcc.target/powerpc/vsx-builtin-7.c testcase ICEs with -m32 -fstack-protector or -m64:

./cc1 -I include/ -O2 -m32 -fstack-protector -mcpu=power7 vsx-builtin-7.c
(insn 19 31 22 2 vsx-builtin-7.c:21 (set (reg/i:V2DF 3 3)
        (mem/c/i:V2DF (plus:SI (reg:SI 9 9)
                (reg:SI 0 0 [135])) [6 D.1882+0 S16 A128])) 651 {*vsx_movv2df} (nil))
vsx-builtin-7.c:21:1: internal compiler error: in reload_cse_simplify_operands, at postreload.c:396
Please submit a full bug report, with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

./cc1 -I include/ -O2 -m64 -mcpu=power7 vsx-builtin-7.c
vsx-builtin-7.c: In function 'insert_df_n':
vsx-builtin-7.c:21:1: error: insn does not satisfy its constraints:
(insn 19 32 22 2 vsx-builtin-7.c:21 (set (reg/i:V2DF 3 3)
        (mem/c/i:V2DF (plus:DI (reg:DI 9 9)
                (reg:DI 0 0 [136])) [6 D.1879+0 S16 A128])) 651 {*vsx_movv2df} (nil))
vsx-builtin-7.c:21:1: internal compiler error: in reload_cse_simplify_operands, at postreload.c:396
Please submit a full bug report, with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

Similarly, vsx-vector-5.c ICEs with -m64 -fstack-protector the same way.

--
           Summary: ICE with vsx_movv2df
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Keywords: ice-on-valid-code
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jakub at gcc dot gnu dot org
GCC target triplet: powerpc*-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41210
[Bug testsuite/41199] gcc.dg/20081223-1.c should be in gcc.dg/lto/
--- Comment #1 from rguenth at gcc dot gnu dot org 2009-09-01 10:34 ---
Fixed.

--
rguenth at gcc dot gnu dot org changed:

What              |Removed                          |Added
Status            |UNCONFIRMED                      |RESOLVED
Resolution        |                                 |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41199
[Bug testsuite/41199] gcc.dg/20081223-1.c should be in gcc.dg/lto/
--- Comment #2 from rguenth at gcc dot gnu dot org 2009-09-01 10:34 ---
Subject: Bug 41199

Author: rguenth
Date: Tue Sep 1 10:34:17 2009
New Revision: 151265

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=151265
Log:
2009-09-01  Richard Guenther  <rguent...@suse.de>

	PR lto/41199
	* gcc.dg/20081223-1.c: Conditionalize -fwhopr on target lto.

Modified:
    branches/lto/gcc/testsuite/ChangeLog.lto
    branches/lto/gcc/testsuite/gcc.dg/20081223-1.c

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41199
[Bug c/41196] The use of ARM NEON vshll_n_u8 intrinsic results in compile error on valid code
--- Comment #1 from ramana at gcc dot gnu dot org 2009-09-01 11:13 ---
Also occurs with trunk.

--
ramana at gcc dot gnu dot org changed:

What              |Removed                          |Added
Status            |UNCONFIRMED                      |NEW
Ever Confirmed    |0                                |1
Last reconfirmed  |-00-00 00:00:00                  |2009-09-01 11:13:16
date              |                                 |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41196
[Bug c/41211] New: internal compiler error when using x86_64-w64-mingw32-gcc to build sqlite3 alter.c
I built an x86_64-w64-mingw32 cross compiler under x86_64 linux using the latest gcc SVN code, then used this cross compiler to build sqlite3.

The compile failed with the following message:

E:\code\sqlite3_sep>make
gcc -pipe -Wall -g -O2 -save-temps -c -o alter.o alter.c
gcc: warning: -pipe ignored because -save-temps specified
alter.c: In function 'sqlite3AlterFinishAddColumn':
alter.c:465:6: internal compiler error: Segmentation fault
Please submit a full bug report, with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
make: *** [alter.o] Error 1

--
           Summary: internal compiler error when using x86_64-w64-mingw32-gcc to build sqlite3 alter.c
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: drangon dot mail at gmail dot com
 GCC build triplet: x86_64-gnu-linux
  GCC host triplet: x86_64-gnu-linux
GCC target triplet: x86_64-w64-mingw32

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41211
[Bug c/41211] internal compiler error when using x86_64-w64-mingw32-gcc to build sqlite3 alter.c
--- Comment #1 from drangon dot mail at gmail dot com 2009-09-01 11:22 ---
Created an attachment (id=18460)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18460&action=view)
gzip of alter.i, -save-temps output

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41211
[Bug c/41211] internal compiler error when using x86_64-w64-mingw32-gcc to build sqlite3 alter.c
--- Comment #2 from drangon dot mail at gmail dot com 2009-09-01 11:25 ---
Created an attachment (id=18461)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18461&action=view)
alter.s, -save-temps output

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41211
[Bug tree-optimization/41212] New: miscompilation at -O2
The program below is miscompiled. Using

gfortran -O2 -o m m.f90; ./m

gives:

y= 0.60653065945526063 2*y= 2.

(the second of the printed numbers should equal twice the first). Using

gfortran -fno-inline -O2 -o m m.f90

works OK. The compiler is:

Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: /home/jerry/gcc/trunk/configure --prefix=/usr/local/gfortran --enable-languages=c,fortran --disable-libmudflap --enable-libgomp --disable-shared
Thread model: posix
gcc version 4.5.0 20090831 (experimental) [trunk revision 151238] (GCC)

program m
  double precision :: y,z
  call b(1.0d0,y,z)
  print*,'y= ', y, ' 2*y=', z
contains
  subroutine b( x, y, z)
    implicit none
    double precision :: x,y,z
    integer :: i, k
    double precision :: h, r
    y = 1.0d0
    z = 0.0d0
    h = 0
    DO k = 1,10
      h = h + 1.0d0/k
      r = 1
      DO i = 1,k
        r = (x/(2*i) ) * r
      END DO
      y = y + (-1)**k * r
      z = z + (-1)**(k+1) * h * r
      IF ( ABS(2*k/x*r) < 1d-6 ) EXIT
    END DO
    z = 2*y
  end subroutine b
end program m

--
           Summary: miscompilation at -O2
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jpr at csc dot fi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41212
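As a cross-check of what the test program is expected to compute, here is a hedged C rendering of subroutine b. The function name `series_b` is made up, and the `<` in the EXIT test is an assumption, since the comparison operator appears to have been lost from the archived report.

```c
#include <assert.h>

/* C rendering of subroutine b from the report.  On return *Z must be
   exactly twice *Y, whatever the loop accumulated into *Z before.  */
static void
series_b (double x, double *y, double *z)
{
  double h = 0.0;
  double r, t, sign;
  int i, k;

  assert (x != 0.0);
  *y = 1.0;
  *z = 0.0;
  for (k = 1; k <= 10; k++)
    {
      sign = (k % 2 == 0) ? 1.0 : -1.0;    /* (-1)**k */
      h = h + 1.0 / k;
      r = 1.0;
      for (i = 1; i <= k; i++)
        r = (x / (2 * i)) * r;
      *y = *y + sign * r;
      *z = *z - sign * h * r;              /* (-1)**(k+1) * h * r */
      t = 2 * k / x * r;
      if ((t < 0.0 ? -t : t) < 1e-6)       /* assumed "<" comparison */
        break;
    }
  *z = 2 * *y;
}
```

For x = 1 the sum accumulated in y approximates exp(-1/2), which agrees with the y value printed in the report, so a correct build should print a second number of about 1.213 rather than 2.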
[Bug fortran/41165] -std=f95: Reject PRODUCT in initialization expressions.
--- Comment #3 from dfranke at gcc dot gnu dot org 2009-09-01 11:49 ---
It turns out that the PRODUCT is already simplified to EXPR_CONST before it is checked in expr.c (check_init_expr).

To be more specific, in resolve.c (resolve_unknown_f) the simplification is implied via intrinsic.c (gfc_intrinsic_func_interface). The latter returns MATCH_YES if the call corresponds to an intrinsic; simplification is done if possible.

--
dfranke at gcc dot gnu dot org changed:

What              |Removed                          |Added
Status            |UNCONFIRMED                      |NEW
Ever Confirmed    |0                                |1
Last reconfirmed  |-00-00 00:00:00                  |2009-09-01 11:49:55
date              |                                 |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41165
[Bug libstdc++/41058] FAIL: ext/pb_ds/regression/hash_data_map_rand.cc
--- Comment #29 from ubizjak at gmail dot com 2009-09-01 12:00 ---
(In reply to comment #28)
> This may be related to PR 37144.

No, it was an assembler bug with 2.19.1 in my case.

--
ubizjak at gmail dot com changed:

What              |Removed                          |Added
Status            |REOPENED                         |RESOLVED
Resolution        |                                 |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41058
[Bug libgcj/40868] ecjx.cc should be compiled by host gcc
--- Comment #1 from ramana at gcc dot gnu dot org 2009-09-01 12:03 ---
This sounds correct to me. Adding one of the libjava maintainers to comment on this.

Patches should be submitted to the correct mailing list.

--
ramana at gcc dot gnu dot org changed:

What              |Removed                          |Added
CC                |                                 |aph at redhat dot com
Status            |UNCONFIRMED                      |NEW
Ever Confirmed    |0                                |1
Last reconfirmed  |-00-00 00:00:00                  |2009-09-01 12:03:59
date              |                                 |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40868
[Bug rtl-optimization/24319] [4.3/4.4/4.5 regression] amd64 register spill error with -fschedule-insns
--- Comment #17 from ubizjak at gmail dot com 2009-09-01 12:08 ---
Patch at http://gcc.gnu.org/ml/gcc-patches/2009-09/msg3.html

--
ubizjak at gmail dot com changed:

What              |Removed                          |Added
URL               |                                 |http://gcc.gnu.org/ml/gcc-patches/2009-09/msg3.html

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24319
[Bug tree-optimization/41213] New: miscompilation at -O2
The program below is miscompiled. Using

gfortran -O2 -o m m.f90; ./m

gives:

y= 0.60653065945526063 2*y= 2.

(the second of the printed numbers should equal twice the first). Using

gfortran -fno-inline -O2 -o m m.f90

works OK. The compiler is:

Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: /home/jerry/gcc/trunk/configure --prefix=/usr/local/gfortran --enable-languages=c,fortran --disable-libmudflap --enable-libgomp --disable-shared
Thread model: posix
gcc version 4.5.0 20090831 (experimental) [trunk revision 151238] (GCC)

program m
  double precision :: y,z
  call b(1.0d0,y,z)
  print*,'y= ', y, ' 2*y=', z
contains
  subroutine b( x, y, z)
    implicit none
    double precision :: x,y,z
    integer :: i, k
    double precision :: h, r
    y = 1.0d0
    z = 0.0d0
    h = 0
    DO k = 1,10
      h = h + 1.0d0/k
      r = 1
      DO i = 1,k
        r = (x/(2*i) ) * r
      END DO
      y = y + (-1)**k * r
      z = z + (-1)**(k+1) * h * r
      IF ( ABS(2*k/x*r) < 1d-6 ) EXIT
    END DO
    z = 2*y
  end subroutine b
end program m

--
           Summary: miscompilation at -O2
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jpr at csc dot fi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41213
[Bug c++/41058] FAIL: ext/pb_ds/regression/hash_data_map_rand.cc
--- Comment #30 from paolo dot carlini at oracle dot com 2009-09-01 12:12 ---
Even if the bug is fixed, I think it would be nice to have it properly categorized: I can see only a C++ front-end patch in the trail, thus I'm changing the category to C++. If I'm wrong, please improve it...

--
paolo dot carlini at oracle dot com changed:

What              |Removed                          |Added
Component         |libstdc++                        |c++

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41058
[Bug libstdc++/41214] New: [4.5 regression] Null pointer dereferenced in _Unwind_SetGR()
GCC 4.5.0 20090827

When I run any program which throws an exception, I get a segfault:

__cxa_throw              libstdc++-v3/libsupc++/eh_throw.cc:78
_Unwind_RaiseException   gcc/unwind.inc:62
__gxx_personality_v0     libsupc++/eh_personality.cc:706
_Unwind_SetGR            gcc/unwind-dw2.c:215

GCC is compiled with -O3, maybe it makes a difference.

--
Summary: [4.5 regression] Null pointer dereferenced in _Unwind_SetGR()
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: d dot g dot gorbachev at gmail dot com
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41214
[Bug tree-optimization/41212] miscompilation at -O2
--- Comment #1 from jpr at csc dot fi 2009-09-01 12:15 --- *** Bug 41213 has been marked as a duplicate of this bug. *** -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41212
[Bug tree-optimization/41213] miscompilation at -O2
--- Comment #1 from jpr at csc dot fi 2009-09-01 12:15 --- Sorry about the duplicate... *** This bug has been marked as a duplicate of 41212 *** -- jpr at csc dot fi changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||DUPLICATE http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41213
[Bug c++/41214] [4.5 regression] Null pointer dereferenced in _Unwind_SetGR()
--- Comment #1 from paolo dot carlini at oracle dot com 2009-09-01 12:21 --- For sure, this isn't a library issue. -- paolo dot carlini at oracle dot com changed: What|Removed |Added Component|libstdc++ |c++ http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41214
[Bug lto/41058] FAIL: ext/pb_ds/regression/hash_data_map_rand.cc
--- Comment #31 from rguenth at gcc dot gnu dot org 2009-09-01 12:21 --- Let's reopen it as LTO specific. The test still fails on i?86 with the original multi-file testcase and -flto. There are also other similar pb_ds fails. -- rguenth at gcc dot gnu dot org changed: What|Removed |Added Status|RESOLVED|REOPENED Component|c++ |lto Resolution|FIXED | http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41058
[Bug fortran/41209] Add full ATTRIBUTE support to gfortran (ALIGN, (WEAK)ALIAS, ...)
--- Comment #1 from burnus at gcc dot gnu dot org 2009-09-01 13:29 --- For fun, one can also think of supporting alignment within a TYPE, similarly to C's: struct foo { int x[2] __attribute__ ((aligned (8))); }; See http://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html for details. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41209
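To make the C side of that comparison concrete, here is a minimal sketch of what the quoted attribute does to a struct's alignment. It assumes a GCC-compatible compiler (the `__attribute__` and `__alignof__` spellings are GNU extensions); `struct plain` and the three helper functions are hypothetical names added only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the struct quoted above: the member requests 8-byte alignment. */
struct foo {
    int x[2] __attribute__ ((aligned (8)));
};

/* Hypothetical comparison type without the attribute. */
struct plain {
    int x[2];
};

/* Alignment of each type as seen by the compiler (GNU __alignof__). */
size_t foo_alignment(void)   { return __alignof__ (struct foo); }
size_t plain_alignment(void) { return __alignof__ (struct plain); }

/* The member's alignment request propagates to the enclosing struct,
   and the struct size stays a multiple of that alignment. */
void check_layout(void)
{
    assert(foo_alignment() >= 8);
    assert(foo_alignment() >= plain_alignment());
    assert(sizeof(struct foo) % foo_alignment() == 0);
}
```

A Fortran equivalent would presumably need the same propagation from a component to the enclosing derived type, which is one of the open points for the proposed ATTRIBUTE support.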
[Bug target/41156] [4.4/4.5 Regression] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize
--- Comment #7 from hjl dot tools at gmail dot com 2009-09-01 13:20 ---

Realigning the incoming stack for the vectorizer has very limited impact on performance. Here are the differences of -m32 -O3 -msse2 -mfpmath=sse -ffast-math -funroll-loops before and after my patch:

400.perlbench         -0.384615%
401.bzip2              0%
403.gcc               -0.362319%
429.mcf               -0.813008%
445.gobmk              0.921659%
456.hmmer              0.549451%
458.sjeng             -0.438596%
462.libquantum         0%
464.h264ref            0%
471.omnetpp           -0.478469%
473.astar             -0.645161%
483.xalancbmk         -0.727273%
SPECint(R)_base2006   -0.411523%

410.bwaves            -0.406504%
416.gamess             0%
433.milc              -1.36986%
434.zeusmp            -0.44843%
435.gromacs            0%
436.cactusADM          0%
437.leslie3d          -0.89%
444.namd               1.20482%
447.dealII            -0.350877%
450.soplex            -0.31746%
453.povray             0.458716%
454.calculix           0%
459.GemsFDTD           0%
465.tonto              0%
470.lbm                0%
481.wrf                0.480769%
482.sphinx3            0.940439%
SPECfp(R)_base2006     0%

It won't change generated code if the vectorizer isn't enabled. Its benefits outweigh its drawbacks.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41156
[Bug tree-optimization/41212] miscompilation at -O2
--- Comment #2 from dominiq at lps dot ens dot fr 2009-09-01 13:37 --- Confirmed on (powerpc|i686)-apple-darwin9 in 32 bit mode and -O2 or above. This is a regression: I get 1.21306131891052 with gcc 4.2.4, 4.3.4, and 4.4.1. I also get 1.21306131891052 with recent revisions of trunk with -m64. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41212
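As a cross-check of the expected value, the following is a rough C transliteration of subroutine b from the original report. It is only a sketch: the name `series_b` is hypothetical, and the exit test is assumed to be a `<` comparison because the operator is missing from the archived text. With x = 1 it should give y close to 0.6065306594 and z exactly 2*y, in line with the 1.21306131891052 reported above.

```c
#include <assert.h>

/* Hypothetical C version of subroutine b(x, y, z) from the report. */
void series_b(double x, double *y, double *z)
{
    double h = 0.0, r, t;
    int i, k;

    assert(x != 0.0);              /* the exit test divides by x */
    *y = 1.0;
    *z = 0.0;
    for (k = 1; k <= 10; k++) {
        double sign = (k % 2 == 0) ? 1.0 : -1.0;   /* (-1)**k */
        h += 1.0 / k;
        r = 1.0;
        for (i = 1; i <= k; i++)
            r = (x / (2 * i)) * r;
        *y += sign * r;
        *z += -sign * h * r;                       /* (-1)**(k+1) * h * r */
        t = 2 * k / x * r;
        if ((t < 0.0 ? -t : t) < 1e-6)             /* assumed: ABS(...) < 1d-6 */
            break;
    }
    *z = 2.0 * *y;                 /* final assignment overrides the accumulated z */
}
```

Because the last statement overwrites z, any result other than twice y points at the optimizer rather than at the arithmetic.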
[Bug c++/41153] [4.4 Regression] ICE in building Qt4 src/core
--- Comment #12 from mmitchel at gcc dot gnu dot org 2009-09-01 13:54 --- I think the question is whether the use of __optimize__ is in a standard Qt release. If it is, then I'm quite concerned; it's bad if GCC 4.4.2 can't build Qt/KDE. (TBH, I'm concerned anyhow; if __optimize__ is unreliable, then perhaps we should be ignoring/warning about it in 4.4.x until we get it solid...) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41153
[Bug target/41021] [ARM] Suboptimal code generated to store a struct
--- Comment #4 from jamborm at gcc dot gnu dot org 2009-09-01 14:01 --- Indeed. SRA should not trigger here, that would make it too eager in other cases (thus I'm removing myself from the CC, feel free to add me again if there's any discussion that might concern me or SRA again). -- jamborm at gcc dot gnu dot org changed: What|Removed |Added CC|jamborm at gcc dot gnu dot | |org | http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41021
[Bug libgcj/40868] ecjx.cc should be compiled by host gcc
--- Comment #2 from aph at gcc dot gnu dot org 2009-09-01 14:06 --- Assigning to Tom tromey: this is his area. -- aph at gcc dot gnu dot org changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |tromey at gcc dot gnu dot |dot org |org Status|NEW |ASSIGNED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40868
[Bug c/41215] New: request: option to suppress discarded qualifiers warnings
There is currently no GCC option that will turn off discarded qualifiers warnings, which typically arise from const/non-const mismatches. These warnings look like this: warning: assignment discards qualifiers from pointer target type warning: passing argument 1 of foo discards qualifiers from pointer target type This is a request for an option to suppress these warnings. A typical C programmer should see these warnings so that they can fix them. But in tools that automatically generate C code it may sometimes be difficult to avoid large numbers of warnings like this, and so as a practical matter it may be helpful to be able to suppress them. (One such tool is the Vala compiler; see http://live.gnome.org/Vala). -- Summary: request: option to suppress discarded qualifiers warnings Product: gcc Version: 4.3.3 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: adam at yorba dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41215
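For readers unfamiliar with the diagnostic, here is a minimal sketch of the kind of code that triggers it and of the cast a code generator could emit instead. The function names are hypothetical and are not taken from Vala's output; the offending call is left in a comment so the example compiles quietly.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper with a non-const parameter, although it only reads. */
static size_t legacy_length(char *s)
{
    assert(s != NULL);
    return strlen(s);
}

size_t length_of_constant(const char *name)
{
    /* Writing
     *     return legacy_length(name);
     * draws "passing argument 1 of 'legacy_length' discards qualifiers
     * from pointer target type".  */

    /* An explicit cast states the intent and avoids the diagnostic. */
    return legacy_length((char *) name);
}
```

Emitting such casts everywhere is exactly the burden the request wants to spare generators, which is why a suppression option is being asked for.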
[Bug target/41074] Invalid code generation on ARM when using '-fno-omit-frame-pointer' option
--- Comment #2 from ramana at gcc dot gnu dot org 2009-09-01 14:51 --- I'm afraid nothing much can be done without a smaller testcase than this. What happens if you don't use -fno-omit-frame-pointer ? -- ramana at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |WAITING http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41074
[Bug bootstrap/40651] bootstrap error on arm-linux-gnueabi: segfault in next_const_call_expr_arg
--- Comment #3 from ramana at gcc dot gnu dot org 2009-09-01 14:56 --- Does this still occur with trunk ? -- ramana at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |WAITING http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40651
[Bug target/41074] Invalid code generation on ARM when using '-fno-omit-frame-pointer' option
--- Comment #3 from siarhei dot siamashka at gmail dot com 2009-09-01 15:08 --- It works fine if '-fno-omit-frame-pointer' is removed. I agree that this is quite a large and convoluted function. Unfortunately I did not manage to reduce it to something smaller that would still result in broken behaviour. My only guess is that the stack frame, which is bigger than 4K, may make some difference. I have a full linux system compiled with -fno-omit-frame-pointer (to get stack backtraces and generate callgraphs in oprofile). If anything simpler happens to be broken too, I'll try to investigate it and provide additional details. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41074
[Bug c++/41195] floating point optimization
--- Comment #3 from Vikas dot Mehta at roguewave dot com 2009-09-01 15:22 --- Subject: RE: floating point optimization Thanks for looking into this issue. Target: x86_64-redhat-linux -Original Message- From: pinskia at gcc dot gnu dot org [mailto:gcc-bugzi...@gcc.gnu.org] Sent: Monday, August 31, 2009 12:21 AM To: Vikas Mehta Subject: [Bug c++/41195] floating point optimization --- Comment #2 from pinskia at gcc dot gnu dot org 2009-08-31 06:21 --- I think this is a duplicate of bug 323. What target are you using? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41195
[Bug c++/41216] New: G++ failed to correctly resolve the template parameters.
Platform: WinXP SP3 + Eclipse CDT 6.0 + MinGW 5.1.4 + GCC 4.4.1 (TDM's Build)

Bug description: GCC fails to correctly resolve the template parameters when friend function templates of class templates are called in a nested fashion.

Sample code:

/* TestMinGW.cpp */
#include <iostream>
#include <vector>

/* Class Inner */
struct Inner
{
    template< class IStream >
    friend IStream& operator>> ( IStream& in, Inner& inner );

    Inner() : data(0) {}

    template< class IStream >
    explicit Inner( IStream& in ) { operator>>(in,*this); }

    int data;
};

template< class IStream >
IStream& operator>> ( IStream& in, Inner& inner )
{
    in >> inner.data;
    return in;
}

/* Class Outter */
struct Outter
{
    template< class IStream >
    friend IStream& operator>> ( IStream& in, Outter& outter );

    template< class IStream >
    explicit Outter( IStream& in )
    {
        Inner inner(in);
        inners.push_back(inner);
    }

    std::vector< Inner > inners;
};

template< class IStream >
IStream& operator>> ( IStream& in, Outter& inner )
{
    in >> inner;
    return in;
}

/* Main */
int main ( int argc, char* argv[] )
{
    Inner inner(std::cin);      // OK.
    Outter outter(std::cin);    // Compilation error.
    return 0;
}

Compiler log:

g++ -v -save-temps -O3 -Wall -c -fmessage-length=0 -oSrc\TestMinGW.o ..\Src\TestMinGW.cpp
Using built-in specs.
Target: mingw32 Configured with: ../../gcc-4.4.1/configure --prefix=/mingw --build=mingw32 --enable-languages=c,ada,c++,fortran,objc,obj-c++ --disable-nls --disable-win32-registry --enable-libgomp --enable-cxx-flags='-fno-function-sections -fno-data-sections' --disable-werror --enable-threads --disable-symvers --enable-version-specific-runtime-libs --enable-fully-dynamic-string --with-pkgversion='TDM-1 mingw32' --enable-sjlj-exceptions --with-bugurl=http://www.tdragon.net/recentgcc/bugs.php Thread model: win32 gcc version 4.4.1 (TDM-1 mingw32) COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-Wall' '-c' '-fmessage-length=0' '-oSrc\TestMinGW.o' '-mtune=i386' d:/mingw-5.1.4/bin/../libexec/gcc/mingw32/4.4.1/cc1plus.exe -E -quiet -v -iprefix d:\mingw-5.1.4\bin\../lib/gcc/mingw32/4.4.1/ ..\Src\TestMinGW.cpp -mtune=i386 -Wall -fmessage-length=0 -O3 -fpch-preprocess -o TestMinGW.ii ignoring nonexistent directory d:\mingw-5.1.4\bin\../lib/gcc/mingw32/4.4.1/../../../../mingw32/include ignoring duplicate directory d:/mingw-5.1.4/lib/gcc/../../lib/gcc/mingw32/4.4.1/include/c++ ignoring duplicate directory d:/mingw-5.1.4/lib/gcc/../../lib/gcc/mingw32/4.4.1/include/c++/mingw32 ignoring duplicate directory d:/mingw-5.1.4/lib/gcc/../../lib/gcc/mingw32/4.4.1/include/c++/backward ignoring nonexistent directory /mingw/include ignoring duplicate directory d:/mingw-5.1.4/lib/gcc/../../include ignoring duplicate directory d:/mingw-5.1.4/lib/gcc/../../lib/gcc/mingw32/4.4.1/include ignoring duplicate directory d:/mingw-5.1.4/lib/gcc/../../lib/gcc/mingw32/4.4.1/include-fixed ignoring nonexistent directory d:/mingw-5.1.4/lib/gcc/../../lib/gcc/mingw32/4.4.1/../../../../mingw32/include ignoring nonexistent directory /mingw/include #include ... search starts here: #include ... 
search starts here: D:/boost-1.40.0/include D:/dlib-17.21 d:\mingw-5.1.4\bin\../lib/gcc/mingw32/4.4.1/include/c++ d:\mingw-5.1.4\bin\../lib/gcc/mingw32/4.4.1/include/c++/mingw32 d:\mingw-5.1.4\bin\../lib/gcc/mingw32/4.4.1/include/c++/backward d:\mingw-5.1.4\bin\../lib/gcc/mingw32/4.4.1/../../../../include d:\mingw-5.1.4\bin\../lib/gcc/mingw32/4.4.1/include d:\mingw-5.1.4\bin\../lib/gcc/mingw32/4.4.1/include-fixed End of search list. COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-Wall' '-c' '-fmessage-length=0' '-oSrc\TestMinGW.o' '-mtune=i386' d:/mingw-5.1.4/bin/../libexec/gcc/mingw32/4.4.1/cc1plus.exe -fpreprocessed TestMinGW.ii -quiet -dumpbase TestMinGW.cpp -mtune=i386 -auxbase-strip Src\TestMinGW.o -O3 -Wall -version -fmessage-length=0 -o TestMinGW.s GNU C++ (TDM-1 mingw32) version 4.4.1 (mingw32) compiled by GNU C version 4.4.1, GMP version 4.3.0, MPFR version 2.4.1. GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 Compiler executable checksum: 8d2e404a57d82e42e16008cfa0818446 ..\Src\TestMinGW.cpp: In function 'IStream operator(IStream, Inner) [with IStream = Inner]': ..\Src\TestMinGW.cpp:8: instantiated from 'Inner::Inner(IStream) [with IStream = Inner]' d:\mingw-5.1.4\bin\../lib/gcc/mingw32/4.4.1/include/c++/bits/stl_uninitialized.h:74: instantiated from 'static _ForwardIterator std::__uninitialized_copyanonymous ::uninitialized_copy(_InputIterator, _InputIterator, _ForwardIterator) [with _InputIterator = Inner*, _ForwardIterator = Inner*, bool anonymous = false]' d:\mingw-5.1.4\bin\../lib/gcc/mingw32/4.4.1/include/c++/bits/stl_uninitialized.h:117: instantiated from '_ForwardIterator std::uninitialized_copy(_InputIterator, _InputIterator, _ForwardIterator) [with _InputIterator = Inner*, _ForwardIterator = Inner*]' d:\mingw-5.1.4\bin\../lib/gcc/mingw32/4.4.1/include/c++/bits/stl_uninitialized.h:257: instantiated from '_ForwardIterator
[Bug c++/41216] G++ failed to correctly resolve the template parameters.
--- Comment #1 from yimingli0126 at 163 dot com 2009-09-01 16:16 --- Created an attachment (id=18462) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18462&action=view) The source file causing the compilation failure. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41216
[Bug driver/39356] assembler isn't called
--- Comment #19 from ktietz at gcc dot gnu dot org 2009-09-01 16:16 --- I verified it by myself and it is a duplicate of 41184 *** This bug has been marked as a duplicate of 41184 *** -- ktietz at gcc dot gnu dot org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||DUPLICATE http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39356
[Bug c/41184] wrong optimise code, epilogue code adjust wrong rsp before pop
--- Comment #8 from ktietz at gcc dot gnu dot org 2009-09-01 16:16 --- *** Bug 39356 has been marked as a duplicate of this bug. *** -- ktietz at gcc dot gnu dot org changed: What|Removed |Added CC||rainer at emrich-ebersheim ||dot de http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41184
[Bug target/35124] Method _alloca is defined different by MSVCRT as by gcc.
--- Comment #9 from ktietz at gcc dot gnu dot org 2009-09-01 16:20 --- As the original cause of this bug is solved, I am closing it. The __chkstk function here is in fact incompatible with the VC version, but that should be handled as a separate feature request. -- ktietz at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||INVALID http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35124
[Bug libgcj/40868] ecjx.cc should be compiled by host gcc
--- Comment #3 from tromey at gcc dot gnu dot org 2009-09-01 16:58 --- I think it isn't correct to use gcc directly. You probably have to introduce a new variable. But, I don't see why we need ecjx.cc at all. I think it must be to work around some other problem. Maybe instead we could just fix that problem directly. Apparently it came in here, though I don't see why: http://gcc.gnu.org/ml/java-patches/2008-q4/msg00067.html See also PR 38396 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40868
[Bug libstdc++/41216] G++ failed to correctly resolve the template parameters.
--- Comment #2 from paolo dot carlini at oracle dot com 2009-09-01 17:07 --- Seems a problem with the stricter uninitialized_copy we have got since 4.3... Investigating. -- paolo dot carlini at oracle dot com changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |paolo dot carlini at oracle |dot org |dot com Component|c++ |libstdc++ http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41216
[Bug libgcj/40868] ecjx.cc should be compiled by host gcc
--- Comment #4 from aph at gcc dot gnu dot org 2009-09-01 17:09 --- Hmm, I seem to have approved that patch. I agree with you: I can't see why the specfile change requires ecjx.cc. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40868
[Bug c++/41214] [4.5 regression] Null pointer dereferenced in _Unwind_SetGR()
--- Comment #2 from ubizjak at gmail dot com 2009-09-01 17:26 --- No testcase - no analysis - no solution. -- ubizjak at gmail dot com changed: What|Removed |Added Status|UNCONFIRMED |WAITING http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41214
[Bug tree-optimization/41212] [4.5 Regression] miscompilation at -O2
--- Comment #3 from ubizjak at gmail dot com 2009-09-01 17:31 --- Per comment #2. -- ubizjak at gmail dot com changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2009-09-01 17:31:02 date|| Summary|miscompilation at -O2 |[4.5 Regression] ||miscompilation at -O2 Target Milestone|--- |4.5.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41212
[Bug fortran/41209] Add full ATTRIBUTE support to gfortran (ALIGN, (WEAK)ALIAS, ...)
--- Comment #2 from burnus at gcc dot gnu dot org 2009-09-01 17:39 --- Created an attachment (id=18463) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18463&action=view) Draft patch - first steps but incomplete will not work

Draft patch (does not really work yet).

TODO:
a) Copy attributes properly and free them at the end
b) Write/read them from the .mod file
c) Use an ENUM for stdcall etc.? Requires a check whether they are already set, but those options are orthogonal, thus that makes sense.

TODO 2: Check whether derived types are properly handled: bind(C) vs. sequence vs. extensible types. And attributes on components vs. attributes on the type as a whole. Plus checking whether we set some attributes by default which clash with user settings. Plus: Are additional front-end checks needed besides stdcall etc. conformance for proc-pointers?

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41209
[Bug target/39112] incorrect value of a static const double class member
--- Comment #4 from ktietz at gcc dot gnu dot org 2009-09-01 17:40 --- *** This bug has been marked as a duplicate of 41184 *** -- ktietz at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||DUPLICATE http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39112
[Bug c/41184] wrong optimise code, epilogue code adjust wrong rsp before pop
--- Comment #9 from ktietz at gcc dot gnu dot org 2009-09-01 17:40 --- *** Bug 39112 has been marked as a duplicate of this bug. *** -- ktietz at gcc dot gnu dot org changed: What|Removed |Added CC||alexey dot pushkin at ||mererand dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41184
[Bug target/41211] internal compiler error when using x86_64-w64-mingw32-gcc to build sqlite3 alter.c
--- Comment #4 from ubizjak at gmail dot com 2009-09-01 17:53 --- Works for me with a crosscompiler from linux to mingw: Target: x86_64-pc-mingw32 Configured with: ../gcc-svn/trunk/configure --target=x86_64-pc-mingw32 Thread model: win32 gcc version 4.5.0 20090901 (experimental) [trunk revision 151274] (GCC) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41211
[Bug bootstrap/39686] ./gcc/xgcc crashes on mingw
--- Comment #1 from ktietz at gcc dot gnu dot org 2009-09-01 18:26 --- I can't reproduce this failure, neither for msys, linux, and linux64, nor for cygwin. Does it still exist for you? Which host and build environment are you using? -- ktietz at gcc dot gnu dot org changed: What|Removed |Added CC||ktietz at gcc dot gnu dot ||org Summary|./gcc/xgcc crashes on mingw |./gcc/xgcc crashes on mingw http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39686
[Bug target/40802] Libstdc++ is broken for win64 host
--- Comment #16 from ktietz at gcc dot gnu dot org 2009-09-01 18:38 --- (In reply to comment #15) GCC 4.5 [Trunk], SVN Revision 149872. Because Win64 testing is so hard to come by, I took the initiative of deleting the entire tree, re-checking it out, and building from scratch. I am sorry, I am still encountering the following: make[3]: Entering directory `/home/peter/mount/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3' Making all in include make[4]: Entering directory `/home/peter/mount/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include' mkdir -p ./x86_64-w64-mingw32/bits/stdc++.h.gch x86_64-w64-mingw32-c++ -L/home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/winsup/mingw -L/home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/winsup/w32api/lib -isystem /home/peter/build/GCC/gcc-trunk/winsup/mingw/include -isystem /home/peter/build/GCC/gcc-trunk/winsup/w32api/include-x c++-header -g -O2 -I/home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/x86_64-w64-mingw32 -I/home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include -I/home/peter/build/GCC/gcc-trunk/libstdc++-v3/libsupc++ -O2 -g -std=gnu++0x /home/peter/build/GCC/gcc-trunk/libstdc++-v3/include/precompiled/stdc++.h \ -o x86_64-w64-mingw32/bits/stdc++.h.gch/O2ggnu++0x.gch In file included from /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/bits/move.h:38:0, from /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/bits/stl_pair.h:60, from /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/bits/stl_algobase.h:66, from /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/bits/char_traits.h:41, from 
/home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/ios:41, from /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/istream:40, from /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/sstream:39, from /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/complex:47, from /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/ccomplex:42, from /home/peter/build/GCC/gcc-trunk/libstdc++-v3/include/precompiled/stdc++.h:51: /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/type_traits:185:62: error: a function call cannot appear in a constant-expression /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/type_traits:185:63: error: template argument 2 is invalid /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/type_traits:215:54: error: a function call cannot appear in a constant-expression /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/type_traits:215:55: error: template argument 2 is invalid In file included from /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/fenv.h:50:0, from /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/cfenv:44, from /home/peter/build/GCC/gcc-trunk/libstdc++-v3/include/precompiled/stdc++.h:52: /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/tr1_impl/cfenv:49:11: error: ::fenv_t has not been declared /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/tr1_impl/cfenv:50:11: error: ::fexcept_t has not been declared 
/home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/tr1_impl/cfenv:53:11: error: ::feclearexcept has not been declared /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/tr1_impl/cfenv:54:11: error: ::fegetexceptflag has not been declared /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/tr1_impl/cfenv:55:11: error: ::feraiseexcept has not been declared /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/tr1_impl/cfenv:56:11: error: ::fesetexceptflag has not been declared /home/peter/build/GCC/gcc-trunk/build-win-149872-20090721/x86_64-w64-mingw32/libstdc++-v3/include/tr1_impl/cfenv:57:11:
[Bug target/41211] internal compiler error when using x86_64-w64-mingw32-gcc to build sqlite3 alter.c
--- Comment #5 from ktietz at gcc dot gnu dot org 2009-09-01 18:59 --- (In reply to comment #4) Works for me with a crosscompiler from linux to mingw: Target: x86_64-pc-mingw32 Configured with: ../gcc-svn/trunk/configure --target=x86_64-pc-mingw32 Thread model: win32 gcc version 4.5.0 20090901 (experimental) [trunk revision 151274] (GCC) Works for me too. Maybe a duplicate of PR/41184 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41211
[Bug driver/41217] New: Driver crashes if -o specified without filename
$ ./xgcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ../configure --enable-languages=c
Thread model: posix
gcc version 4.5.0 20090901 (experimental) [trunk revision 151276] (GCC)

$ ./xgcc -B. -o
Segmentation fault

(gdb) bt
#0 0xb7f43613 in strlen () from /lib/tls/i686/cmov/libc.so.6
#1 0x08065b27 in xstrdup (s=0x0) at ../../libiberty/xstrdup.c:33
#2 0x0804f55e in process_command (argc=3, argv=0x9ff62b8) at ../../gcc/gcc.c:4164
#3 0x0805720f in main (argc=3, argv=0xbfe5d574) at ../../gcc/gcc.c:6823

Introduced by http://gcc.gnu.org/viewcvs?view=revision&revision=145470

Only happens in gcc 4.5.0.

Index: gcc.c
===================================================================
--- gcc.c (revision 151276)
+++ gcc.c (working copy)
@@ -4161,7 +4161,10 @@
           argv[i] = convert_filename (argv[i], ! have_c, 0);
 #endif
           /* Save the output name in case -save-temps=obj was used. */
-          save_temps_prefix = xstrdup ((p[1] == 0) ? argv[i + 1] : argv[i] + 1);
+          if ((p[1] == 0) && argv[i + 1])
+            save_temps_prefix = xstrdup(argv[i + 1]);
+          else
+            save_temps_prefix = xstrdup(argv[i] + 1);
           goto normal_switch;

         default:

--
Summary: Driver crashes if -o specified without filename
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: driver
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rmansfield at qnx dot com
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41217
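The underlying pattern is a missing NULL check on the argument that follows -o. Below is a standalone sketch of that check, not the driver's actual code: `output_name_for` is a hypothetical helper, and it relies only on the C guarantee that argv[argc] is a null pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the file name given with -o at argv[i],
   or NULL if argv[i] is not -o or no name was supplied.  */
const char *output_name_for(char **argv, int i)
{
    const char *arg;

    assert(argv != NULL && argv[i] != NULL);
    arg = argv[i];
    if (strncmp(arg, "-o", 2) != 0)
        return NULL;                 /* not an -o switch */
    if (arg[2] != '\0')
        return arg + 2;              /* attached form: -ofile */
    return argv[i + 1];              /* separate form; NULL when -o is last */
}
```

A caller would then test the result before duplicating it, instead of handing a possible null pointer to a strdup-style function as in the backtrace above.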