Re: [gcc-in-cxx] replacing qsort with std::sort
Hi,

On Mon, 31 Aug 2009, Pedro Lamarão wrote:
> 2009/8/28 Pedro Lamarão:
> > > I have not yet made complete size and execution speed measurements, though.
> >
> > I've run the test suite and there are some failures; I think many of
> > them are not regressions when compared with a pure build with C++.
>
> Comparing trunk -r151160 and trunk -r151160 --enable-build-with-cxx
> + patches, these are the sizes of xgcc and g++ before strip:
>
> [psi...@joana obj]$ ls -lh gcc/xgcc gcc/g++
> -rwxrwxr-x. 1 psilva psilva 481K Ago 31 12:58 gcc/g++
> -rwxrwxr-x. 1 psilva psilva 477K Ago 31 12:58 gcc/xgcc

That's not the real compiler, only the compiler driver. Look for files named cc1 (the C compiler) and cc1plus (the C++ compiler) :-)

Ciao,
Michael.
gcc-4.4-20090901 is now available
Snapshot gcc-4.4-20090901 is now available on
ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20090901/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch with the following options:
svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch revision 151295

You'll find:

gcc-4.4-20090901.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.4-20090901.tar.bz2       C front end and core compiler
gcc-ada-4.4-20090901.tar.bz2        Ada front end and runtime
gcc-fortran-4.4-20090901.tar.bz2    Fortran front end and runtime
gcc-g++-4.4-20090901.tar.bz2        C++ front end and runtime
gcc-java-4.4-20090901.tar.bz2       Java front end and runtime
gcc-objc-4.4-20090901.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.4-20090901.tar.bz2  The GCC testsuite

Diffs from 4.4-20090825 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: Replacing certain operations with function calls
Jean Christophe Beyler writes:
> In regard to what you said, do you mean I should build the tree before
> the expand pass, by writing a new pass that will work on the trees
> instead of rtx?

Oh, sorry, I'm an idiot. I forgot that you only have RTL at this point. I would go with what you wrote and see what happens.

Ian
Re: Replacing certain operations with function calls
Finally, I guess the one thing I can do is simply generate pseudo-registers and copy all my registers into the pseudos before the call I make. Then I do my expand as I showed above and, finally, move everything back. Later passes will remove anything that is not needed and keep anything that is.

This could be a solution to the second issue, but I'll wait until I understand what you meant first.

Jc

On Tue, Sep 1, 2009 at 6:35 PM, Jean Christophe Beyler wrote:
> I don't think I quite understand what you mean. I want to use the
> standard ABI; basically I want to transform certain operations into
> function calls.
>
> In regard to what you said, do you mean I should build the tree before
> the expand pass, by writing a new pass that will work on the trees
> instead of rtx?
>
> Otherwise, I fail to see how that is different from what I'm already
> doing. Would you have an example?
>
> Thanks,
> Jc
>
> PS: Although when I look at what GCC generates at the expand stage, it
> really does seem that it first generates the calculation of the
> parameters in pseudo-registers and then moves them to the actual
> output registers. It's the next phases that will combine the two to
> save a move.
>
> On Tue, Sep 1, 2009 at 6:26 PM, Ian Lance Taylor wrote:
>> Jean Christophe Beyler writes:
>>
>>> First off: does this seem correct?
>>
>> Awkward though it is, it may be more reliable to build a small tree
>> here and pass it to expand_call. This assumes that you want to use
>> the standard ABI when calling this function.
>>
>> Then your second issue would go away.
>>
>> Ian
Re: Replacing certain operations with function calls
I don't think I quite understand what you mean. I want to use the standard ABI; basically I want to transform certain operations into function calls.

In regard to what you said, do you mean I should build the tree before the expand pass, by writing a new pass that will work on the trees instead of rtx?

Otherwise, I fail to see how that is different from what I'm already doing. Would you have an example?

Thanks,
Jc

PS: Although when I look at what GCC generates at the expand stage, it really does seem that it first generates the calculation of the parameters in pseudo-registers and then moves them to the actual output registers. It's the next phases that will combine the two to save a move.

On Tue, Sep 1, 2009 at 6:26 PM, Ian Lance Taylor wrote:
> Jean Christophe Beyler writes:
>
>> First off: does this seem correct?
>
> Awkward though it is, it may be more reliable to build a small tree here
> and pass it to expand_call. This assumes that you want to use the
> standard ABI when calling this function.
>
> Then your second issue would go away.
>
> Ian
Re: Replacing certain operations with function calls
Jean Christophe Beyler writes: > First off: does this seem correct? Awkward though it is, it may be more reliable to build a small tree here and pass it to expand_call. This assumes that you want to use the standard ABI when calling this function. Then your second issue would go away. Ian
Re: Replacing certain operations with function calls
Actually, what I've done is probably something in between what you were suggesting and what I was initially doing. If we consider the multiplication, I've modified the define_expand, for example, to:

(define_expand "muldi3"
  [(set (match_operand:DI 0 "register_operand" "")
        (mult:DI (match_operand:DI 1 "register_operand" "")
                 (match_operand:DI 2 "register_operand" "")))]
  ""
  "
{
  emit_function_call_2args (DImode, DImode, DImode, \"my_version_of_mull\",
                            operands[0], operands[1], operands[2]);
  DONE;
}")

and my emit function is:

void
emit_function_call_2args (enum machine_mode return_mode,
                          enum machine_mode arg1_mode,
                          enum machine_mode arg2_mode,
                          const char *fname, rtx op0, rtx op1, rtx op2)
{
  tree id;
  rtx insn;

  /* Move arguments.  */
  emit_move_insn (gen_rtx_REG (arg1_mode, GP_ARG_FIRST), op1);
  emit_move_insn (gen_rtx_REG (arg2_mode, GP_ARG_FIRST + 1), op2);

  /* Get name.  */
  id = get_identifier (fname);

  /* Generate call value.  */
  insn = gen_call_value (gen_rtx_REG (return_mode, 6),
                         gen_rtx_MEM (DImode,
                                      gen_rtx_SYMBOL_REF (Pmode,
                                                          IDENTIFIER_POINTER (id))),
                         GEN_INT (64), NULL);

  /* Annotate the call to say we are using both argument registers.  */
  use_reg (&CALL_INSN_FUNCTION_USAGE (insn),
           gen_rtx_REG (arg1_mode, GP_ARG_FIRST));
  use_reg (&CALL_INSN_FUNCTION_USAGE (insn),
           gen_rtx_REG (arg1_mode, GP_ARG_FIRST + 1));

  /* Emit call.  */
  emit_call_insn (insn);

  /* Set back return to op0.  */
  emit_move_insn (op0, gen_rtx_REG (return_mode, GP_RETURN));
}

First off: does this seem correct?

Second, I have a bit of a worry in the case where, if we consider this C code:

  bar (a * b, c * d);

it is possible that the compiler would normally have generated this:

  mult output1, a, b
  mult output2, c, d
  call bar

which would be problematic for my expand scheme, since it would expand into:

  mov output1, a
  mov output2, b
  call internal_mult
  mov output1, return_reg
  mov output1, c    # Rewriting on output1...
  mov output2, d
  call internal_mult
  mov output2, return_reg
  call bar

However, I am unsure this is possible at the expand stage; would the expand stage automatically produce this instead:

  mult tmp1, a, b
  mult tmp2, c, d
  mov output1, tmp1
  mov output2, tmp2
  call bar

in which case, I know I can do what I am currently doing. Thanks again for your help, and I apologize for these basic questions...

Jc

On Tue, Sep 1, 2009 at 2:30 PM, Jean Christophe Beyler wrote:
> I have looked at how other targets use the
> init_builtins/expand_builtins. Of course, I don't understand
> everything there, but it seems indeed to be more for generating a
> series of instructions instead of a function call. I haven't seen
> anything resembling what I want to do.
>
> I had also first thought of going directly into the define_expand and
> expanding to the function call I would want. The problem I have is
> that it is unclear to me how to handle (set up) the arguments of the
> builtin_function I am trying to define.
>
> To go from no function call to:
>
> - Potentially spill output registers
> - Potentially spill scratch registers
> - Set up output registers with the operands
> - Perform function call
> - Copy return to output operand
> - Potentially restore scratch registers
> - Potentially restore output registers
>
> seems a bit difficult to do at the define_expand level and might not
> generate good code. I guess I could potentially perform a pass in the
> tree representation to do what I am looking for, but I am not sure
> that that is the best solution either.
>
> For the moment, I will continue looking at what you suggest and also
> see if my solution works. I see that, for example, the compiler will
> not always generate the call I need to change. Therefore, it does seem
> that I need another solution than the one I propose.
>
> I'm more and more considering a pass in the middle end to get what I
> need. Do you think this is better?
>
> Thanks for your input,
> Jc
>
> On Tue, Sep 1, 2009 at 12:34 PM, Ian Lance Taylor wrote:
>> Jean Christophe Beyler writes:
>>
>>> I have also been looking into how to generate a function call for
>>> certain operations. I've looked at various other targets for a
>>> similar problem/solution but have not seen anything. On my target
>>> architecture, we have certain optimized versions of the
>>> multiplication, for example.
>>>
>>> I wanted to replace certain multiplications with a function call.
>>> The solution I found was to perform a FAIL on the define_expand of
>>> the multiplication for these cases. This forces the comp
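The clobbering worry raised in this thread - that expanding the second multiplication directly into the argument registers overwrites the first call's result - can be simulated in plain C. This is an illustrative sketch with made-up "register" variables and function names, not GCC code:

```c
#include <assert.h>

/* Simulated machine state: two argument/output registers and a
   return register, mirroring the expansion discussed above.  */
static long output1, output2, return_reg;

/* Stand-in for the internal multiply routine.  */
static void internal_mult(void) { return_reg = output1 * output2; }

/* Bad expansion: the second product is computed directly in the
   argument registers, clobbering the first call's result.  */
static long bad_first_arg(long a, long b, long c, long d)
{
    output1 = a; output2 = b; internal_mult();
    output1 = return_reg;
    output1 = c;                 /* rewriting on output1... */
    output2 = d; internal_mult();
    output2 = return_reg;
    return output1;              /* a*b has been lost */
}

/* Safe expansion: each product goes into a temporary first, the way
   pseudo-registers would be used before register allocation.  */
static long good_first_arg(long a, long b, long c, long d)
{
    long tmp1, tmp2;
    output1 = a; output2 = b; internal_mult(); tmp1 = return_reg;
    output1 = c; output2 = d; internal_mult(); tmp2 = return_reg;
    output1 = tmp1; output2 = tmp2;
    return output1;
}
```

With (a, b, c, d) = (2, 3, 4, 5), the safe variant hands bar 6 as its first argument, while the bad variant hands it 4 - the second operand leaked into the first argument register.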
Re: Using MEM_EXPR inside a call expression
Richard Henderson writes: > On 09/01/2009 12:48 PM, Adam Nemet wrote: > > I see. So I guess you're saying that there is little chance to optimize the > > loop I had in my previous email ;(. > > Not at the rtl level. Gimple-level loop splitting should do it though. > > > Now suppose we split late, shouldn't we still assume that data-flow can > > change > > later. IOW, wouldn't we be required to use the literal/lituse counting that > > alpha does? > > If you split post-reload, data flow isn't going to change > in any significant way. > > > If yes then I guess it's still better to use MEM_EXPR. MEM_EXPR also has > > the > > benefit that it does not deem indirect calls as different when cross-jumping > > compares the insns. I don't know how important this is though. > > It depends on how much benefit you get from the direct > branch. On alpha it's quite a bit, so we work hard to > make sure that we can get one, if at all possible. Thanks, RTH. RichardS, Can you comment on what RTH is suggesting? Besides cross-jumping I haven't seen indirect PIC calls get optimized much, and it seems that splitting late will avoid the data-flow complications. I can experiment with this but it would be nice to get some early buy-in. BTW, I have the R_MIPS_JALR patch ready for submission but if we don't need to worry about data-flow changes then using MEM_EXPR is not necessary. Adam
Re: IRA undoing scheduling decisions
On Tue, 2009-09-01 at 16:46 -0400, Vladimir Makarov wrote:
> Peter Bergner wrote:
> > Were you going to whip that patch up or did you want me to?
>
> I am going to do it by myself.

Great! I'd like to see how your patch affects POWER6 performance. Do you have access to a POWER6 box? If not, can you send Pat and me the patch and we'll fire off a run on our POWER6 benchmark system. Thanks.

Peter
Re: [lto] Reader-writer compatibility?
Diego Novillo wrote:
> On Tue, Sep 1, 2009 at 11:42, Ryan Mansfield wrote:
> > Is it required that the same compiler that generated lto objects be
> > used to read them? I've come across a couple ICEs with the current
> > revision reading lto objects created by a slightly older version but
> > same configuration. Is this simply invalid usage on my part?
>
> It's likely. How much drift between the two revisions? Can you
> recreate the ICE if you write and read with the exact same revision?
> If so, please file a bug.

Please add version checking. gfortran's module files (extension .mod), which are generated from source files that contain MODULE ... END MODULE constructs, *now* contain version information. I still occasionally get beaten by picking up modules from 4.3 that don't have this - you'll get all sorts of unintelligible error messages that just distract from what's really wrong.

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html
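The version stamping Toon describes can be sketched as a small magic-plus-version header that the reader validates before touching any payload. The magic string and version number below are invented for illustration and have nothing to do with the actual LTO bytecode or .mod formats:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical header for a serialized object: a 4-byte magic string
   followed by a one-byte format version.  A reader that checks this
   can report "format drift" instead of ICEing on stale data.  */
#define OBJ_MAGIC   "LTO1"   /* made-up magic */
#define OBJ_VERSION 2        /* made-up version */

/* Write the header; returns 1 on success, 0 on I/O error.  */
int write_object_header(FILE *f)
{
    unsigned char ver = OBJ_VERSION;
    return fwrite(OBJ_MAGIC, 1, 4, f) == 4 && fwrite(&ver, 1, 1, f) == 1;
}

/* Validate the header; returns 1 if it matches this reader's format,
   0 on version drift, wrong magic, or truncation.  */
int check_object_header(FILE *f)
{
    char magic[4];
    unsigned char ver;
    if (fread(magic, 1, 4, f) != 4 || fread(&ver, 1, 1, f) != 1)
        return 0;
    return memcmp(magic, OBJ_MAGIC, 4) == 0 && ver == OBJ_VERSION;
}
```

A reader built this way rejects an object written by a different format revision up front, which is exactly the "better version drift detection" asked for elsewhere in the thread.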
Re: Using MEM_EXPR inside a call expression
On 09/01/2009 12:48 PM, Adam Nemet wrote:
> I see. So I guess you're saying that there is little chance to
> optimize the loop I had in my previous email ;(.

Not at the rtl level. Gimple-level loop splitting should do it though.

> Now suppose we split late, shouldn't we still assume that data-flow
> can change later. IOW, wouldn't we be required to use the
> literal/lituse counting that alpha does?

If you split post-reload, data flow isn't going to change in any significant way.

> If yes then I guess it's still better to use MEM_EXPR. MEM_EXPR also
> has the benefit that it does not deem indirect calls as different when
> cross-jumping compares the insns. I don't know how important this is
> though.

It depends on how much benefit you get from the direct branch. On alpha it's quite a bit, so we work hard to make sure that we can get one, if at all possible.

r~
Re: IRA undoing scheduling decisions
Peter Bergner wrote:
> On Tue, 2009-09-01 at 10:38 -0400, Vladimir Makarov wrote:
> > We could do update_equiv_regs in a separate pass before the 1st insn
> > scheduling as it was before IRA.
>
> IIRC, update_equiv_regs() was always called as part of local-alloc, so
> it was always after sched1 even before IRA. That said, moving it to
> its own pass before sched1 sounds like an interesting idea. My patch
> from the other note basically didn't affect SPEC2000 at all, and we
> could use it, but if your idea works, I'm more than happy to dump my
> patch. :)
>
> Were you going to whip that patch up or did you want me to?

I am going to do it by myself.

Thanks for testing your patch, Peter.
Re: IRA undoing scheduling decisions
On Tue, 2009-09-01 at 10:38 -0400, Vladimir Makarov wrote: > We could do update_equiv_regs in a separate pass before the 1st insn > scheduling as it was before IRA. IIRC, update_equiv_regs() was always called as part of local-alloc, so it was always after sched1 even before IRA. That said, moving it to its own pass before sched1 sounds like an interesting idea. My patch from the other note basically didn't affect SPEC2000 at all, and we could use it, but if your idea works, I'm more than happy to dump my patch. :) Were you going to whip that patch up or did you want me to? Peter
Re: IRA undoing scheduling decisions
On Wed, 2009-08-26 at 17:12 -0500, Peter Bergner wrote:
> On Wed, 2009-08-26 at 23:30 +0200, Richard Guenther wrote:
> > Hmm. I suppose if you conditionalize it on flag_schedule_insns it
> > might be an overall win. Care to SPEC test that change?
>
> I assume you mean like the change below? Yeah, I can SPEC test that.
>
> Peter
>
> Index: ira.c
> ===
> --- ira.c (revision 15)
> +++ ira.c (working copy)
> @@ -2510,6 +2510,8 @@ update_equiv_regs (void)
>           calls. */
>
>        if (REG_N_REFS (regno) == 2
> +          && (!flag_schedule_insns
> +              || REG_BASIC_BLOCK (regno) < NUM_FIXED_BLOCKS)
>            && (rtx_equal_p (x, src)
>                || ! equiv_init_varies_p (src))
>            && NONJUMP_INSN_P (insn)

Pat ran the patch on SPEC2000 and it was very neutral. The overall SPECFP number didn't change, and the SPECINT number only improved by 0.2%, which is pretty much in the noise.

I think Vlad's suggestion of moving update_equiv_regs() to its own pass before sched1 sounds interesting. If that works, it's probably better than this patch.

Peter
Re: Using MEM_EXPR inside a call expression
Richard Henderson writes: > On 08/28/2009 12:38 AM, Adam Nemet wrote: > > ... To assist the linker we need to annotate the indirect call > > with the function symbol. > > > > Since the call is expanded early... > > Having experimented with this on Alpha a few years back, > the only thing I can suggest is to not expand them early. > > I use a combination of peep2's and normal splitters to > determine if the post-call GP reload is needed, and to > expand the call itself. I see. So I guess you're saying that there is little chance to optimize the loop I had in my previous email ;(. Now suppose we split late, shouldn't we still assume that data-flow can change later. IOW, wouldn't we be required to use the literal/lituse counting that alpha does? If yes then I guess it's still better to use MEM_EXPR. MEM_EXPR also has the benefit that it does not deem indirect calls as different when cross-jumping compares the insns. I don't know how important this is though. Adam
GCC 4.4.2 Status Report (2009-09-01)
Status
======

The 4.4 branch is open for commits under the usual release branch rules. The timing of the 4.4.2 release (at least two months after the 4.4.1 release, so no sooner than September 22, at a point when there are no P1 regressions open for the branch) has yet to be determined.

Quality Data
============

Priority      #    Change from Last Report
--------    ---    -----------------------
P1            4    + 3
P2           89    + 1
P3            1    - 1
--------    ---
Total        94    + 3

Previous Report
===============

http://gcc.gnu.org/ml/gcc/2009-08/msg00373.html

The next report for 4.4.2 will be sent by Richard.
Re: question about -mpush-args -maccumulate-outgoing-args on gcc for x86
On Tue, Sep 1, 2009 at 12:31 PM, Ian Lance Taylor wrote:
> Godmar Back writes:
>
>> It appears to me that '-mno-push-args' is enabled by default (*), and
>> not '-mpush-args'.
>
> The default varies by processor--it depends on the -mtune option.

I don't know how to find out which tuning is enabled by default; I assume -mtune=generic? Would statements with respect to what "default" is apply to the "default" mtune setting?

>> Moreover, since -maccumulate-outgoing-args implies -mno-push-args, it
>> appears that the only way to obtain 'push-args' behavior is to specify
>> '-mno-accumulate-outgoing-args' - a switch which the documentation
>> doesn't even mention.
>
> That is likely true.
>
> If you want to send a patch for the docs, that would be great.

Whilst in general I am not opposed to this, and have contributed to many open source projects in the past, I feel that the documentation should be updated by someone who can actually vouch for the completeness and accuracy of what's written, which I definitely cannot. I also cannot verify the accuracy of the claims with respect to the speed of the two options. Moreover, these claims are made in a section of the documentation that applies to an entire architecture rather than a specific processor implementation. Perhaps they should simply be removed?

I'm also uncertain where exactly the difference between accumulate-outgoing-args and push-args lies: -maccumulate-outgoing-args implies -mno-push-args, and -mno-accumulate-outgoing-args with -mpush-args is the traditional approach, but what does -mno-accumulate-outgoing-args with -mno-push-args look like, and does it even make sense?

It would also be great if '-mpush-args' without -mno-accumulate-outgoing-args would trigger a warning:

Warning: -mpush-args ignored while -maccumulate-outgoing-args is in effect.

- Godmar
Re: [lto] Reader-writer compatibility?
On Tue, Sep 1, 2009 at 14:32, Frank Ch. Eigler wrote: > Ryan Mansfield writes: > >> The objects were created with rev 15 and being read using 151271. >> No, I can't reproduce the ICE using the same version. >> Thanks for confirming this is not expected to work. > > Is it the intent that this work properly in the future? Yes. We likely want to maintain streamer compatibility within the same major release. I actually don't think we'll change the bytecode format too much. It will mostly depend on how much gimple changes in a single release. Clearly, we need better version drift detection. Diego.
Re: [lto] Reader-writer compatibility?
Ryan Mansfield writes: > The objects were created with rev 15 and being read using 151271. > No, I can't reproduce the ICE using the same version. > Thanks for confirming this is not expected to work. Is it the intent that this work properly in the future? It is not absurd to imagine that someone with a treeful of .o files might suffer an unexpected compiler upgrade before a later reuse/relink attempt. - FChE
Re: Replacing certain operations with function calls
I have looked at how other targets use the init_builtins/expand_builtins. Of course, I don't understand everything there, but it seems indeed to be more for generating a series of instructions instead of a function call. I haven't seen anything resembling what I want to do.

I had also first thought of going directly into the define_expand and expanding to the function call I would want. The problem I have is that it is unclear to me how to handle (set up) the arguments of the builtin_function I am trying to define.

To go from no function call to:

- Potentially spill output registers
- Potentially spill scratch registers
- Set up output registers with the operands
- Perform function call
- Copy return to output operand
- Potentially restore scratch registers
- Potentially restore output registers

seems a bit difficult to do at the define_expand level and might not generate good code. I guess I could potentially perform a pass in the tree representation to do what I am looking for, but I am not sure that that is the best solution either.

For the moment, I will continue looking at what you suggest and also see if my solution works. I see that, for example, the compiler will not always generate the call I need to change. Therefore, it does seem that I need another solution than the one I propose.

I'm more and more considering a pass in the middle end to get what I need. Do you think this is better?

Thanks for your input,
Jc

On Tue, Sep 1, 2009 at 12:34 PM, Ian Lance Taylor wrote:
> Jean Christophe Beyler writes:
>
>> I have also been looking into how to generate a function call for
>> certain operations. I've looked at various other targets for a similar
>> problem/solution but have not seen anything. On my target
>> architecture, we have certain optimized versions of the
>> multiplication, for example.
>>
>> I wanted to replace certain multiplications with a function call. The
>> solution I found was to perform a FAIL on the define_expand of the
>> multiplication for these cases. This forces the compiler to generate a
>> function call to __multdi3.
>>
>> I then go into the define_expand of the function call and check the
>> symbol_ref to see what function is called. I can then modify the call
>> at that point.
>>
>> My question is: is this a good approach, or is there another solution
>> that you would use?
>
> I think that what you describe will work. I would probably generate a
> call to a builtin function in the define_expand. Look for the way
> targets use init_builtins and expand_builtin. Normally expand_builtin
> expands to some target-specific RTL, but it can expand to a function
> call too.
>
> Ian
Re: [lto] Reader-writer compatibility?
Diego Novillo wrote:
> On Tue, Sep 1, 2009 at 11:42, Ryan Mansfield wrote:
> > Is it required that the same compiler that generated lto objects be
> > used to read them? I've come across a couple ICEs with the current
> > revision reading lto objects created by a slightly older version but
> > same configuration. Is this simply invalid usage on my part?
>
> It's likely. How much drift between the two revisions? Can you
> recreate the ICE if you write and read with the exact same revision?
> If so, please file a bug.

The objects were created with rev 15 and read using 151271. No, I can't reproduce the ICE using the same version. Thanks for confirming this is not expected to work.

Regards,

Ryan Mansfield
Re: DI mode and endianess
On 08/19/2009 06:50 AM, Mohamed Shafi wrote:
> mov _h, d4
> mov _h+4, d5
> mov _j, d2
> mov _j+4, d3
> add d4, d2
> adc d5, d3
>
> irrespective of which endian it is. What could I be missing here?
> Should I add anything specific for this in the back-end?

Given that the compiler is generating adc, I have to assume that you have an adddi3 pattern. At which point I have to assume that you're doing something wrong in there that's producing the little-endian sequence even for big-endian.

r~
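The add/adc pair in the sequence above computes a 64-bit sum from 32-bit halves: add the low words, then add the high words plus the carry out of the low addition. Which memory word feeds the "low" add is exactly the endianness question in the thread. A minimal C model of the arithmetic (illustrative only, not the back-end code):

```c
#include <stdint.h>

/* Model of an adddi3 "add / add-with-carry" expansion on a 32-bit
   machine: the low halves are added first ("add"), and the carry out
   of that addition is folded into the sum of the high halves ("adc").
   On a big-endian target the word at the lower address is the HIGH
   half, so feeding it to the low-half add is the bug rth describes.  */
uint64_t add_with_carry(uint32_t a_lo, uint32_t a_hi,
                        uint32_t b_lo, uint32_t b_hi)
{
    uint32_t lo = a_lo + b_lo;           /* "add"  */
    uint32_t carry = lo < a_lo;          /* carry out of the low word */
    uint32_t hi = a_hi + b_hi + carry;   /* "adc"  */
    return ((uint64_t)hi << 32) | lo;
}
```

For example, 0xFFFFFFFF + 1 in the low words produces a carry that must land in the high-word sum; swapping high and low halves (the wrong word order for the target's endianness) would silently compute a different 64-bit value.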
Re: Replacing certain operations with function calls
Jean Christophe Beyler writes:
> I have also been looking into how to generate a function call for
> certain operations. I've looked at various other targets for a similar
> problem/solution but have not seen anything. On my target
> architecture, we have certain optimized versions of the
> multiplication, for example.
>
> I wanted to replace certain multiplications with a function call. The
> solution I found was to perform a FAIL on the define_expand of the
> multiplication for these cases. This forces the compiler to generate a
> function call to __multdi3.
>
> I then go into the define_expand of the function call and check the
> symbol_ref to see what function is called. I can then modify the call
> at that point.
>
> My question is: is this a good approach, or is there another solution
> that you would use?

I think that what you describe will work. I would probably generate a call to a builtin function in the define_expand. Look for the way targets use init_builtins and expand_builtin. Normally expand_builtin expands to some target-specific RTL, but it can expand to a function call too.

Ian
Re: question about -mpush-args -maccumulate-outgoing-args on gcc for x86
Godmar Back writes:
> It appears to me that '-mno-push-args' is enabled by default (*), and
> not '-mpush-args'.

The default varies by processor--it depends on the -mtune option.

> Moreover, since -maccumulate-outgoing-args implies -mno-push-args, it
> appears that the only way to obtain 'push-args' behavior is to specify
> '-mno-accumulate-outgoing-args' - a switch which the documentation
> doesn't even mention.

That is likely true.

If you want to send a patch for the docs, that would be great.

Ian
Re: Using MEM_EXPR inside a call expression
On 08/28/2009 12:38 AM, Adam Nemet wrote:
> ... To assist the linker we need to annotate the indirect call with
> the function symbol.
>
> Since the call is expanded early...

Having experimented with this on Alpha a few years back, the only thing I can suggest is to not expand them early.

I use a combination of peep2's and normal splitters to determine if the post-call GP reload is needed, and to expand the call itself.

r~
Re: [lto] Reader-writer compatibility?
On Tue, Sep 1, 2009 at 11:42, Ryan Mansfield wrote:
> Is it required that the same compiler that generated lto objects be
> used to read them? I've come across a couple ICEs with the current
> revision reading lto objects created by a slightly older version but
> same configuration. Is this simply invalid usage on my part?

It's likely. How much drift between the two revisions? Can you recreate the ICE if you write and read with the exact same revision? If so, please file a bug.

Diego.
[lto] Reader-writer compatibility?
Is it required that the same compiler that generated lto objects be used to read them? I've come across a couple ICEs with the current revision reading lto objects created by a slightly older version but same configuration. Is this simply invalid usage on my part?

Regards,

Ryan Mansfield
Re: question about -mpush-args -maccumulate-outgoing-args on gcc for x86
Minor correction to my previous email:

On Tue, Sep 1, 2009 at 10:08 AM, Godmar Back wrote:
>
> gb...@setzer [39](~/tmp) > cat call.c
> void caller(void) {
>   extern void callee(int);
>   callee(5);
> }

This:

> gb...@setzer [40](~/tmp) > gcc -mno-push-args -S call.c

should be '-mpush-args', as in:

gb...@cyan [4](~/tmp) > gcc -S -mpush-args call.c
gb...@cyan [5](~/tmp) > cat call.s
        .file   "call.c"
        .text
.globl caller
        .type   caller, @function
caller:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    $5, (%esp)
        call    callee
        leave
        ret
        .size   caller, .-caller
        .ident  "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-44)"
        .section        .note.GNU-stack,"",@progbits

The point here is that '-mpush-args' is ineffective unless '-mno-accumulate-outgoing-args' is given, and that the documentation, in my opinion, may be misleading by:

a) not mentioning the -mno-accumulate-outgoing-args switch

b) saying that '-mpush-args' is the default when it's an ineffective default (since the default -maccumulate-outgoing-args appears to override it)

c) not mentioning that -maccumulate-outgoing-args is the default - in fact, the discussion in the section on push-args/no-push-args appears to imply that it shouldn't be the default.

Thanks.

- Godmar
Re: asm goto vs simulate_block
On 08/31/2009 05:06 PM, Richard Henderson wrote:
> The following patch appears to work for both. I'll commit it after a
> bootstrap and test cycle completes.

Committed, with one additional change to prevent VRP from crashing.

r~

(vrp_visit_stmt): Be prepared for non-interesting stmts.

@@ -6087,7 +6090,9 @@ vrp_visit_stmt (gimple stmt, edge *taken_edge_p, tree *output_p)
       fprintf (dump_file, "\n");
     }

-  if (is_gimple_assign (stmt) || is_gimple_call (stmt))
+  if (!stmt_interesting_for_vrp (stmt))
+    gcc_assert (stmt_ends_bb_p (stmt));
+  else if (is_gimple_assign (stmt) || is_gimple_call (stmt))
     {
       /* In general, assignments with virtual operands are not
          useful for deriving ranges, with the obvious exception of calls to
Re: Bit fields
On 08/31/2009 07:20 PM, Jean Christophe Beyler wrote:
> Ok, is it normal to see an ashift with a negative value, though, or is
> this already a sign of a (potentially) different problem?

I seem to recall that it's normal. Combine was originally written in the days of the VAX, where negative shifts were allowed. You'll just want to reject them in your patterns.

r~
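A hedged sketch of the VAX-style convention being alluded to: a negative shift count reverses the shift direction. C itself cannot express this directly (shifting by a negative or oversized count is undefined behavior), which is one reason a pattern that does not explicitly handle such counts should reject them; the helper below makes every case explicit:

```c
#include <stdint.h>

/* Illustrative model of "negative shift count means shift the other
   way" semantics (an assumption about the old VAX-era convention, not
   GCC code).  The count is range-checked before each C shift, since
   x << n and x >> n are undefined in C for n < 0 or n >= 32.  */
uint32_t vax_style_lshift(uint32_t x, int count)
{
    if (count >= 0)
        return count < 32 ? x << count : 0;   /* ordinary left shift */
    count = -count;                           /* negative: shift right */
    return count < 32 ? x >> count : 0;
}
```

A machine-description pattern for a target without this convention would instead use a predicate or condition that refuses negative (and out-of-range) constant shift counts, so combine's speculative RTL never matches.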
Re: IRA undoing scheduling decisions
Peter Bergner wrote:
> On Mon, 2009-08-24 at 23:56 +, Charles J. Tabony wrote:
> > I am seeing a performance regression on the port I maintain, and I
> > would appreciate some pointers.
> >
> > When I compile the following code
> >
> > void f(int *x, int *y)
> > {
> >   *x = 7;
> >   *y = 4;
> > }
> >
> > with GCC 4.3.2, I get the desired sequence of instructions. I'll
> > call it sequence A:
> >
> >   r0 = 7
> >   r1 = 4
> >   [x] = r0
> >   [y] = r1
> >
> > When I compile the same code with GCC 4.4.0, I get a sequence that
> > is lower performance for my target machine. I'll call it sequence B:
> >
> >   r0 = 7
> >   [x] = r0
> >   r0 = 4
> >   [y] = r0
>
> This is caused by update_equiv_regs(), which IRA inherited from
> local-alloc.c. Although with gcc 4.3 and earlier you don't see the
> problem, it is still there, because if you look at the 4.3 dumps, you
> will see that update_equiv_regs() unordered them for us. What is
> saving us is that sched2 reschedules them again for us in the order we
> want. With 4.4, IRA happens to reuse the same register for both
> pseudos, so sched2 is hand-tied and cannot schedule them back again
> for us.

Peter, thanks for the investigation.

We could do update_equiv_regs in a separate pass before the 1st insn scheduling, as it was before IRA. I'll try this and see how it will work for mainstream targets (x86, ppc).

> Looking at update_equiv_regs(), if I disable the replacement for regs
> that are local to one basic block (patch below), like it existed
> before John Wehle's patch way back in Oct 2000:
>
>   http://gcc.gnu.org/ml/gcc-patches/2000-09/msg00782.html
>
> then we get the ordering we want. Does anyone know why John removed
> that part of the test in his patch? Thoughts anyone?

I have no idea. But if it works well, we could use it.
Replacing certain operations with function calls
Dear all,

I have also been looking into how to generate a function call for certain operations. I've looked at various other targets for a similar problem/solution but have not seen anything. On my target architecture, we have certain optimized versions of the multiplication, for example.

I wanted to replace certain multiplications with a function call. The solution I found was to perform a FAIL on the define_expand of the multiplication for these cases. This forces the compiler to generate a function call to __multdi3.

I then go into the define_expand of the function call and check the symbol_ref to see what function is called. I can then modify the call at that point.

My question is: is this a good approach, or is there another solution that you would use?

Thanks again for your time,
Jean Christophe Beyler
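For concreteness, the kind of routine a target might substitute for the generic __multdi3 libcall can be sketched as a portable shift-and-add multiply. The function name and algorithm here are illustrative only - a real target routine would use the machine's optimized multiply instructions, which is the whole point of the substitution:

```c
#include <stdint.h>

/* Hypothetical stand-in for a __multdi3-style support routine:
   a plain shift-and-add 64-bit multiply.  Unsigned arithmetic gives
   the same low 64 bits as a signed multiply would, modulo 2^64.  */
uint64_t my_version_of_mull(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    while (b != 0) {
        if (b & 1)        /* this bit of b contributes a shifted copy of a */
            result += a;
        a <<= 1;          /* advance to the next bit position */
        b >>= 1;
    }
    return result;
}
```

Any such routine must agree with the ABI the compiler expects for the libcall it replaces (argument registers, return register), which is what the define_expand machinery discussed later in the thread arranges.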
question about -mpush-args -maccumulate-outgoing-args on gcc for x86
Hi,

I'm using gcc version 4.1.2 20080704 (Red Hat 4.1.2-44) for an x86 target. The info page says:

`-mpush-args'
`-mno-push-args'
     Use PUSH operations to store outgoing parameters.  This method is
     shorter and usually equally fast as method using SUB/MOV operations
     and is enabled by default.  In some cases disabling it may improve
     performance because of improved scheduling and reduced dependencies.

`-maccumulate-outgoing-args'
     If enabled, the maximum amount of space required for outgoing
     arguments will be computed in the function prologue.  This is
     faster on most modern CPUs because of reduced dependencies,
     improved scheduling and reduced stack usage when preferred stack
     boundary is not equal to 2.  The drawback is a notable increase in
     code size.  This switch implies `-mno-push-args'.

This information is also found on http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html

Is this information up to date? It appears to me that '-mno-push-args' is the one enabled by default (*), and not '-mpush-args'. Moreover, since -maccumulate-outgoing-args implies -mno-push-args, it appears that the only way to obtain 'push-args' behavior is to specify '-mno-accumulate-outgoing-args' - a switch which the documentation doesn't even mention.

I have searched the mailing list archives and the only post I found was this one:

http://gcc.gnu.org/ml/gcc/2005-01/msg00761.html

which is at odds with the documentation above.

Thanks.

 - Godmar

(*) for instance, see:

gb...@setzer [39](~/tmp) > cat call.c
void caller(void)
{
    extern void callee(int);
    callee(5);
}
gb...@setzer [40](~/tmp) > gcc -mno-push-args -S call.c
gb...@setzer [41](~/tmp) > cat call.s
        .file   "call.c"
        .text
.globl caller
        .type   caller, @function
caller:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    $5, (%esp)
        call    callee
        leave
        ret
        .size   caller, .-caller
        .ident  "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-44)"
        .section        .note.GNU-stack,"",@progbits
Re: Why no strings in error messages?
On Wed, Aug 26, 2009 at 03:02:44PM -0400, Bradley Lucier wrote:
> On Wed, 2009-08-26 at 20:38 +0200, Paolo Bonzini wrote:
> > > When I worked at AMD, I was starting to suspect that it may be more
> > > beneficial to re-enable the first schedule insns pass if you were
> > > compiling in 64-bit mode, since you have more registers available,
> > > and the new registers do not have hard wired uses, which in the
> > > past always meant a lot of spills (also, the default floating point
> > > unit is SSE instead of the x87 stack).  I never got around to
> > > testing this before AMD and I parted company.
> >
> > Unfortunately, hardwired use of %ecx for shifts is still enough to
> > kill -fschedule-insns on AMD64.
>
> The AMD64 Architecture manual I found said that various combinations of
> the RSI, RDI, and RCX registers are used implicitly by ten instructions
> or prefixes, and RBX is used by XLAT, XLATB.  So it appears that there
> are 12 general-purpose registers available for allocation.

XLATB is essentially useless (well, maybe it had some uses back in the 16-bit days, when only a few registers could be used for addressing) and is never generated by GCC. However, %ebx is used for PIC addressing in 32-bit mode, so it is not always free either (I don't know about PIE code). In 64-bit mode, PIC/PIE use PC-relative addressing, so this actually gives you 9 more free registers than in 32-bit mode.

However, for some reason you glossed over the case of integer division, which always uses %edx and %eax. This is true even when dividing by a constant (non power of 2), in which case gcc will often use a widening multiply instead, whose result lands in %edx:%eax, so it's almost a wash in terms of fixed register usage (not exactly: the divisions use %edx:%eax as the dividend and need the divisor somewhere else, while the widening multiply uses %eax as one input but %edx can be used for the other).
(As a side note, %edx and %eax are also special with regard to I/O port accesses, but this is only of interest in device drivers.)

> Are 12 registers not enough, in principle, to do scheduling before
> register allocation?

I don't know, but I would say that you have about 14 registers for address computations/indexing, since you seem to be interested in FP code. I would think that is sufficient for many inner loops (but not all; it really depends on the number of arrays that you access and the number of independent indexes that you have to keep).

> I was getting a 15% speedup on some numerical
> codes, as pre-scheduling spaced out the vector loads among the
> floating-point computations.

Well, vector loads and floating-point computations do not have anything to do with integer register choices. The 16 FP registers are nicely orthogonal (compared to the real nightmare that the x87 stack was). In practice you schedule on 16 FP registers and 14 (15 if you omit the frame pointer) addressing/indexing/counting registers. In this type of code there are typically very few instructions with fixed register constraints, and the least likely are the string instructions. Shifts by a variable amount and integer divides are still possible, but unlikely.

Gabriel
Re: Call for testers: MPC 0.7 prerelease tarball
Dave Korn wrote:
> Attached allowed it to build,

And with that patch:

> ===
> All 45 tests passed
> ===

cheers,
  DaveK
Re: Call for testers: MPC 0.7 prerelease tarball
Dave Korn wrote:
> Fell at the first hurdle for me:
>
> gcc-4 -shared-libgcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D_FORTIFY_SOURCE=2
> -pedantic -Wall -Wextra -Werror -O2 -pipe -MT inp_str.lo -MD -MP -MF
> .deps/inp_str.Tpo -c inp_str.c -DDLL_EXPORT -DPIC -o .libs/inp_str.o
> cc1: warnings being treated as errors
> inp_str.c: In function 'extract_string':
> inp_str.c:113:10: error: array subscript has type 'char'
> inp_str.c:114:10: error: array subscript has type 'char'
> inp_str.c:115:10: error: array subscript has type 'char'
> inp_str.c:118:13: error: array subscript has type 'char'
> inp_str.c:119:13: error: array subscript has type 'char'
> inp_str.c:120:13: error: array subscript has type 'char'
> make[2]: *** [inp_str.lo] Error 1
> make[2]: *** Waiting for unfinished jobs

Attached allowed it to build, and seems to be what the function was already doing for isspace earlier. Test results will follow.

cheers,
  DaveK

--- orig/mpc-0.7-dev/src/inp_str.c	2009-08-26 21:24:41.0 +0100
+++ mpc-0.7-dev/src/inp_str.c	2009-09-01 12:17:04.546875000 +0100
@@ -110,14 +110,14 @@ extract_string (FILE *stream)
   /* (n-char-sequence) only after a NaN */
   if ((nread != 3
-       || tolower (str[0]) != 'n'
-       || tolower (str[1]) != 'a'
-       || tolower (str[2]) != 'n')
+       || tolower ((unsigned char) str[0]) != 'n'
+       || tolower ((unsigned char) str[1]) != 'a'
+       || tolower ((unsigned char) str[2]) != 'n')
       && (nread != 5
           || str[0] != '@'
-          || tolower (str[1]) != 'n'
-          || tolower (str[2]) != 'a'
-          || tolower (str[3]) != 'n'
+          || tolower ((unsigned char) str[1]) != 'n'
+          || tolower ((unsigned char) str[2]) != 'a'
+          || tolower ((unsigned char) str[3]) != 'n'
           || str[4] != '@'))
     {
       ungetc (c, stream);
       return str;
Re: Call for testers: MPC 0.7 prerelease tarball
Kaveh R. GHAZI wrote:
> Hello,
>
> A prerelease tarball of the upcoming MPC 0.7 is available here:
> http://www.multiprecision.org/mpc/download/mpc-0.7-dev.tar.gz
>
> Please help test it for portability and bugs by downloading and compiling
> it on systems you have access to.

Fell at the first hurdle for me:

gcc-4 -shared-libgcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D_FORTIFY_SOURCE=2
-pedantic -Wall -Wextra -Werror -O2 -pipe -MT inp_str.lo -MD -MP -MF
.deps/inp_str.Tpo -c inp_str.c -DDLL_EXPORT -DPIC -o .libs/inp_str.o
cc1: warnings being treated as errors
inp_str.c: In function 'extract_string':
inp_str.c:113:10: error: array subscript has type 'char'
inp_str.c:114:10: error: array subscript has type 'char'
inp_str.c:115:10: error: array subscript has type 'char'
inp_str.c:118:13: error: array subscript has type 'char'
inp_str.c:119:13: error: array subscript has type 'char'
inp_str.c:120:13: error: array subscript has type 'char'
make[2]: *** [inp_str.lo] Error 1
make[2]: *** Waiting for unfinished jobs

> I'd like a report to contain your
> target triplet and the versions of your compiler, GMP and MPFR used when
> building MPC.

$ /gnu/gcc/gcc/config.guess
i686-pc-cygwin
$ gcc-4 -v
Using built-in specs.
Target: i686-pc-cygwin
Configured with: /gnu/gcc/gcc-patched/configure --prefix=/opt/gcc-tools -v --with-gmp=/usr --with-mpfr=/usr --enable-bootstrap --enable-version-specific-runtime-libs --enable-static --enable-shared --enable-shared-libgcc --disable-__cxa_atexit --with-gnu-ld --with-gnu-as --with-dwarf2 --disable-sjlj-exceptions --disable-symvers --disable-libjava --disable-interpreter --program-suffix=-4 --disable-libgomp --enable-libssp --enable-libada --enable-threads=posix --with-arch=i686 --with-tune=generic CC=gcc-4 CXX=g++-4 CC_FOR_TARGET=gcc-4 CXX_FOR_TARGET=g++-4 --with-ecj-jar=/usr/share/java/ecj.jar LD=/opt/gcc-tools/bin/ld.exe LD_FOR_TARGET=/opt/gcc-tools/bin/ld.exe AS=/opt/gcc-tools/bin/as.exe AS_FOR_TARGET=/opt/gcc-tools/bin/as.exe --disable-win32-registry --disable-libgcj-debug --enable-languages=c,c++,ada
Thread model: posix
gcc version 4.5.0 20090730 (experimental) (GCC)
$ cygcheck -c gmp mpfr libgmp3 libmpfr1
Cygwin Package Information
Package              Version        Status
gmp                  4.3.1-3        OK
libgmp3              4.3.1-3        OK
libmpfr1             2.4.1-4        OK
mpfr                 2.4.1-4        OK
$

BTW, I configured mpc with "--prefix=/usr --disable-static --enable-shared" (after first receiving "configure: error: gmp.h is a DLL: use --disable-static --enable-shared" when I tried with just --prefix).

> Also please include your results from "make check".

N/A !

cheers,
  DaveK
Trunk frozen for VTA merge
Subject says it all, I guess. Jakub