Re: [gcc-in-cxx] replacing qsort with std::sort

2009-09-01 Thread Michael Matz
Hi,

On Mon, 31 Aug 2009, Pedro Lamarão wrote:

> 2009/8/28 Pedro Lamarão :
> 
> > I have not yet made complete size and execution speed measurements, though.
> > I've run the test suite and there are some failures; I think many of
> > them are not regressions when compared with a pure build with C++.
> 
> Comparing trunk -r151160 and trunk -r151160 --enable-build-with-cxx
> + patches, these are the sizes of xgcc and g++ before strip:
> 
> [psi...@joana obj]$ ls -lh gcc/xgcc gcc/g++
> -rwxrwxr-x. 1 psilva psilva 481K Aug 31 12:58 gcc/g++
> -rwxrwxr-x. 1 psilva psilva 477K Aug 31 12:58 gcc/xgcc

That's not the real compiler, only the compiler driver.  Look for files 
named cc1 (the C compiler) and cc1plus (the C++ compiler)  :-)


Ciao,
Michael.

gcc-4.4-20090901 is now available

2009-09-01 Thread gccadmin
Snapshot gcc-4.4-20090901 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20090901/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch 
revision 151295

You'll find:

gcc-4.4-20090901.tar.bz2  Complete GCC (includes all of below)

gcc-core-4.4-20090901.tar.bz2 C front end and core compiler

gcc-ada-4.4-20090901.tar.bz2  Ada front end and runtime

gcc-fortran-4.4-20090901.tar.bz2  Fortran front end and runtime

gcc-g++-4.4-20090901.tar.bz2  C++ front end and runtime

gcc-java-4.4-20090901.tar.bz2 Java front end and runtime

gcc-objc-4.4-20090901.tar.bz2 Objective-C front end and runtime

gcc-testsuite-4.4-20090901.tar.bz2  The GCC testsuite

Diffs from 4.4-20090825 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Replacing certain operations with function calls

2009-09-01 Thread Ian Lance Taylor
Jean Christophe Beyler  writes:

> In regard to what you said, do you mean I should build the tree before
> the expand pass, by writing a new pass that will work on the trees
> instead of rtx?

Oh, sorry, I'm an idiot.  I forgot that you only have RTL at this point.

I would go with what you wrote and see what happens.

Ian


Re: Replacing certain operations with function calls

2009-09-01 Thread Jean Christophe Beyler
Finally, I guess the one thing I can do is simply generate
pseudo-registers and copy all my registers into the pseudos before the
call I make.

Then I do my expand like I showed above.

And finally, move everything back.

Later passes will remove anything that is not needed and keep anything
that is. This could be a solution to the second issue, but I'll wait
to understand what you meant first.

Jc

On Tue, Sep 1, 2009 at 6:35 PM, Jean Christophe
Beyler wrote:
> I don't think I quite understand what you mean. I want to use
> the standard ABI, basically I want to transform certain operations
> into function calls.
>
> In regard to what you said, do you mean I should build the tree before
> the expand pass, by writing a new pass that will work on the trees
> instead of rtx?
>
> Otherwise, I fail to see how that is different from what I'm already
> doing. Would you have an example?
>
> Thanks,
> Jc
>
> PS: Although when I look at what GCC generates at the expand stage, it
> really does seem that it first generates the calculation of the
> parameters in pseudo-registers and then moves them to the actual
> output registers. It's the next phases that will combine the two to
> save a move.
>
> On Tue, Sep 1, 2009 at 6:26 PM, Ian Lance Taylor wrote:
>> Jean Christophe Beyler  writes:
>>
>>> First off: does this seem correct?
>>
>> Awkward though it is, it may be more reliable to build a small tree here
>> and pass it to expand_call.  This assumes that you want to use the
>> standard ABI when calling this function.
>>
>> Then your second issue would go away.
>>
>> Ian
>>
>


Re: Replacing certain operations with function calls

2009-09-01 Thread Jean Christophe Beyler
I don't think I quite understand what you mean. I want to use
the standard ABI, basically I want to transform certain operations
into function calls.

In regard to what you said, do you mean I should build the tree before
the expand pass, by writing a new pass that will work on the trees
instead of rtx?

Otherwise, I fail to see how that is different from what I'm already
doing. Would you have an example?

Thanks,
Jc

PS: Although when I look at what GCC generates at the expand stage, it
really does seem that it first generates the calculation of the
parameters in pseudo-registers and then moves them to the actual
output registers. It's the next phases that will combine the two to
save a move.

On Tue, Sep 1, 2009 at 6:26 PM, Ian Lance Taylor wrote:
> Jean Christophe Beyler  writes:
>
>> First off: does this seem correct?
>
> Awkward though it is, it may be more reliable to build a small tree here
> and pass it to expand_call.  This assumes that you want to use the
> standard ABI when calling this function.
>
> Then your second issue would go away.
>
> Ian
>


Re: Replacing certain operations with function calls

2009-09-01 Thread Ian Lance Taylor
Jean Christophe Beyler  writes:

> First off: does this seem correct?

Awkward though it is, it may be more reliable to build a small tree here
and pass it to expand_call.  This assumes that you want to use the
standard ABI when calling this function.

Then your second issue would go away.
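A minimal sketch of the kind of thing I mean (untested, and the type,
function, and helper names below are only placeholders for whatever fits
your port):

static rtx
expand_muldi3_as_call (rtx target, rtx op1, rtx op2)
{
  /* Build a decl and a CALL_EXPR for the helper, then let the normal
     call expander (expand_expr routes a CALL_EXPR through expand_call)
     deal with all of the ABI details.  */
  tree di = long_long_integer_type_node;
  tree fntype = build_function_type_list (di, di, di, NULL_TREE);
  tree fndecl = build_fn_decl ("my_version_of_mull", fntype);
  tree call = build_call_expr (fndecl, 2,
                               make_tree (di, op1), make_tree (di, op2));

  return expand_expr (call, target, DImode, EXPAND_NORMAL);
}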

Ian


Re: Replacing certain operations with function calls

2009-09-01 Thread Jean Christophe Beyler
Actually, what I've done is probably something in between what you
were suggesting and what I was initially doing. For the multiplication,
for example, I've modified the define_expand to:

(define_expand "muldi3"
  [(set (match_operand:DI 0 "register_operand" "")
        (mult:DI (match_operand:DI 1 "register_operand" "")
                 (match_operand:DI 2 "register_operand" "")))]
  ""
  "
{
  emit_function_call_2args (DImode, DImode, DImode,
                            \"my_version_of_mull\",
                            operands[0], operands[1], operands[2]);
  DONE;
}")

and my emit function is:

void
emit_function_call_2args (enum machine_mode return_mode,
                          enum machine_mode arg1_mode,
                          enum machine_mode arg2_mode,
                          const char *fname,
                          rtx op0, rtx op1, rtx op2)
{
  tree id;
  rtx insn;

  /* Move the arguments into the argument registers.  */
  emit_move_insn (gen_rtx_REG (arg1_mode, GP_ARG_FIRST), op1);
  emit_move_insn (gen_rtx_REG (arg2_mode, GP_ARG_FIRST + 1), op2);

  /* Get the function name.  */
  id = get_identifier (fname);

  /* Generate the call_value.  */
  insn = gen_call_value (gen_rtx_REG (return_mode, 6),
                         gen_rtx_MEM (DImode,
                                      gen_rtx_SYMBOL_REF (Pmode,
                                                          IDENTIFIER_POINTER (id))),
                         GEN_INT (64),
                         NULL);

  /* Annotate the call to say we are using both argument registers.  */
  use_reg (&CALL_INSN_FUNCTION_USAGE (insn),
           gen_rtx_REG (arg1_mode, GP_ARG_FIRST));
  use_reg (&CALL_INSN_FUNCTION_USAGE (insn),
           gen_rtx_REG (arg2_mode, GP_ARG_FIRST + 1));

  /* Emit the call.  */
  emit_call_insn (insn);

  /* Copy the return value back into op0.  */
  emit_move_insn (op0, gen_rtx_REG (return_mode, GP_RETURN));
}
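As an aside, I suppose GCC's generic libcall machinery could emit much
the same sequence while taking care of the argument registers and the
return value itself; a rough, untested sketch (assuming the helper
follows the standard ABI, and the function name here is just something
I made up):

static void
emit_function_call_2args_libcall (enum machine_mode return_mode,
                                  enum machine_mode arg1_mode,
                                  enum machine_mode arg2_mode,
                                  const char *fname,
                                  rtx op0, rtx op1, rtx op2)
{
  /* init_one_libfunc gives a SYMBOL_REF for the named helper;
     emit_library_call_value loads the argument registers, emits the
     call and returns wherever the result ended up.  */
  rtx sym = init_one_libfunc (fname);
  rtx result = emit_library_call_value (sym, op0, LCT_NORMAL, return_mode,
                                        2, op1, arg1_mode, op2, arg2_mode);

  if (result != op0)
    emit_move_insn (op0, result);
}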

First off: does this seem correct?

Second, I have a bit of a worry. If we consider this C code:

bar (a * b, c * d);

it is possible that the compiler would normally have generated this:

mult output1, a, b
mult output2, c, d
call bar

That would be problematic for my expand system, since it would expand into:

mov output1, a
mov output2, b
call internal_mult
mov output1, return_reg

mov output1, c   # clobbers output1...
mov output2, d
call internal_mult
mov output2, return_reg

call bar


However, I am unsure whether this can happen at the expand stage; would
expand automatically produce this instead:

mult tmp1, a, b
mult tmp2, c, d

mov output1, tmp1
mov output2, tmp2
call bar

in which case, I know I can do what I am currently doing.

Thanks again for your help and I apologize for these basic questions...
Jc


On Tue, Sep 1, 2009 at 2:30 PM, Jean Christophe
Beyler wrote:
> I have looked at how other targets use the
> init_builtins/expand_builtins. Of course, I don't understand
> everything there but it seems indeed to be more for generating a
> series of instructions instead of a function call. I haven't seen
> anything resembling what I want to do.
>
> I had also first thought of going directly in the define_expand and
> expanding to the function call I would want. The problem I have is
> that it is unclear to me how to handle (set-up) the arguments of the
> builtin_function I am trying to define.
>
> To go from no function call to :
>
> - Potentially spill output registers
> - Potentially spill scratch registers
>
> - Setup output registers with the operands
> - Perform function call
> - Copy return to output operand
>
> - Potentially restore scratch registers
> - Potentially restore output registers
>
> Seems a bit difficult to do at the define_expand level and might not
> generate good code. I guess I could potentially perform a pass in the
> tree representation to do what I am looking for but I am not sure that
> that is the best solution either.
>
> For the moment, I will continue looking at what you suggest and also
> see if my solution works. I see that, for example, the compiler will
> not always generate the call I need to change. Therefore, it does seem
> that I need another solution than the one I propose.
>
> I'm more and more considering a pass in the middle-end to get what I
> need. Do you think this is better?
>
> Thanks for your input,
> Jc
>
> On Tue, Sep 1, 2009 at 12:34 PM, Ian Lance Taylor wrote:
>> Jean Christophe Beyler  writes:
>>
> >> I have also been looking into how to generate a function call for
>>> certain operations. I've looked at various other targets for a similar
>>> problem/solution but have not seen anything. On my target
>>> architecture, we have certain optimized versions of the multiplication
>>> for example.
>>>
> >> I wanted to replace certain multiplications with a function call. The
> >> solution I found was to perform a FAIL on the define_expand of the
> >> multiplication for these cases. This forces the compiler to generate a
> >> function call to __multdi3.

Re: Using MEM_EXPR inside a call expression

2009-09-01 Thread Adam Nemet
Richard Henderson writes:
> On 09/01/2009 12:48 PM, Adam Nemet wrote:
> > I see.  So I guess you're saying that there is little chance to optimize the
> > loop I had in my previous email ;(.
> 
> Not at the rtl level.  Gimple-level loop splitting should do it though.
> 
> > Now suppose we split late, shouldn't we still assume that data-flow can 
> > change
> > later.  IOW, wouldn't we be required to use the literal/lituse counting that
> > alpha does?
> 
> If you split post-reload, data flow isn't going to change
> in any significant way.
> 
> > If yes then I guess it's still better to use MEM_EXPR.  MEM_EXPR also has 
> > the
> > benefit that it does not deem indirect calls as different when cross-jumping
> > compares the insns.  I don't know how important this is though.
> 
> It depends on how much benefit you get from the direct
> branch.  On alpha it's quite a bit, so we work hard to
> make sure that we can get one, if at all possible.

Thanks, RTH.

RichardS,

Can you comment on what RTH is suggesting?  Besides cross-jumping I haven't
seen indirect PIC calls get optimized much, and it seems that splitting late
will avoid the data-flow complications.

I can experiment with this but it would be nice to get some early buy-in.

BTW, I have the R_MIPS_JALR patch ready for submission but if we don't need to
worry about data-flow changes then using MEM_EXPR is not necessary.

Adam


Re: IRA undoing scheduling decisions

2009-09-01 Thread Peter Bergner
On Tue, 2009-09-01 at 16:46 -0400, Vladimir Makarov wrote:
> Peter Bergner wrote:
> > Were you going to whip that patch up or did you want me to?
> >
> I am going to do it by myself.

Great!  I'd like to see how your patch affects POWER6 performance.
Do you have access to a POWER6 box?  If not, can you send Pat and me
the patch and we'll fire off a run on our POWER6 benchmark system.
Thanks.

Peter





Re: [lto] Reader-writer compatibility?

2009-09-01 Thread Toon Moene

Diego Novillo wrote:


On Tue, Sep 1, 2009 at 11:42, Ryan Mansfield wrote:



Is it required that the same compiler that generated lto objects be used to
read them? I've come across a couple ICEs with the current revision reading
lto objects created by a slightly older version but the same configuration. Is
this simply invalid usage on my part?


It's likely.  How much drift between the two revisions?  Can you
recreate the ICE if you write and read with the exact same revision?
If so, please file a bug.


Please add version checking.  gfortran's module files (extension .mod) 
that are generated from source files that contain MODULE ... END MODULE 
constructs *now* contain version information.


I still get occasionally beaten by picking up modules from 4.3 that 
don't have this - you'll get all sorts of unintelligible error messages 
that just distract from what's really wrong.


--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html


Re: Using MEM_EXPR inside a call expression

2009-09-01 Thread Richard Henderson

On 09/01/2009 12:48 PM, Adam Nemet wrote:

I see.  So I guess you're saying that there is little chance to optimize the
loop I had in my previous email ;(.


Not at the rtl level.  Gimple-level loop splitting should do it though.


Now suppose we split late, shouldn't we still assume that data-flow can change
later.  IOW, wouldn't we be required to use the literal/lituse counting that
alpha does?


If you split post-reload, data flow isn't going to change
in any significant way.


If yes then I guess it's still better to use MEM_EXPR.  MEM_EXPR also has the
benefit that it does not deem indirect calls as different when cross-jumping
compares the insns.  I don't know how important this is though.


It depends on how much benefit you get from the direct
branch.  On alpha it's quite a bit, so we work hard to
make sure that we can get one, if at all possible.


r~


Re: IRA undoing scheduling decisions

2009-09-01 Thread Vladimir Makarov

Peter Bergner wrote:

On Tue, 2009-09-01 at 10:38 -0400, Vladimir Makarov wrote:
  
We could do update_equiv_regs in a separate pass before the 1st insn 
scheduling as it was before IRA.



IIRC, update_equiv_regs() was always called as part of local-alloc,
so it was always after sched1 even before IRA.  That said, moving it
to its own pass before sched1 sounds like an interesting idea.
My patch from the other note basically didn't affect SPEC2000 at all,
and we could use it, but if your idea works, I'm more than happy to
dump my patch. :)

Were you going to whip that patch up or did you want me to?


  

I am going to do it by myself.  Thanks for testing your patch, Peter.






Re: IRA undoing scheduling decisions

2009-09-01 Thread Peter Bergner
On Tue, 2009-09-01 at 10:38 -0400, Vladimir Makarov wrote:
> We could do update_equiv_regs in a separate pass before the 1st insn 
> scheduling as it was before IRA.

IIRC, update_equiv_regs() was always called as part of local-alloc,
so it was always after sched1 even before IRA.  That said, moving it
to its own pass before sched1 sounds like an interesting idea.
My patch from the other note basically didn't affect SPEC2000 at all,
and we could use it, but if your idea works, I'm more than happy to
dump my patch. :)

Were you going to whip that patch up or did you want me to?

Peter





Re: IRA undoing scheduling decisions

2009-09-01 Thread Peter Bergner
On Wed, 2009-08-26 at 17:12 -0500, Peter Bergner wrote:
> On Wed, 2009-08-26 at 23:30 +0200, Richard Guenther wrote:
> > Hmm.  I suppose if you conditionalize it on flag_schedule_insns it might be
> > an overall win.  Care to SPEC test that change?
> 
> I assume you mean like the change below?  Yeah, I can SPEC test that.
> 
> Peter
> 
> 
> Index: ira.c
> ===
> --- ira.c (revision 15)
> +++ ira.c (working copy)
> @@ -2510,6 +2510,8 @@ update_equiv_regs (void)
>calls.  */
> 
> if (REG_N_REFS (regno) == 2
> +   && (!flag_schedule_insns
> +   || REG_BASIC_BLOCK (regno) < NUM_FIXED_BLOCKS)
> && (rtx_equal_p (x, src)
> || ! equiv_init_varies_p (src))
> && NONJUMP_INSN_P (insn)

Pat ran the patch on SPEC2000 and it was very neutral.  The overall
SPECFP number didn't change and the SPECINT number only improved by
0.2%, which is pretty much in the noise.

I think Vlad's suggestion of moving update_equiv_regs() to its own pass
before sched1 sounds interesting.  If that works, it's probably better
than this patch.

Peter





Re: Using MEM_EXPR inside a call expression

2009-09-01 Thread Adam Nemet
Richard Henderson writes:
> On 08/28/2009 12:38 AM, Adam Nemet wrote:
> > ... To assist the linker we need to annotate the indirect call
> > with the function symbol.
> >
> > Since the call is expanded early...
> 
> Having experimented with this on Alpha a few years back,
> the only thing I can suggest is to not expand them early.
> 
> I use a combination of peep2's and normal splitters to
> determine if the post-call GP reload is needed, and to
> expand the call itself.

I see.  So I guess you're saying that there is little chance to optimize the
loop I had in my previous email ;(.

Now suppose we split late, shouldn't we still assume that data-flow can change
later.  IOW, wouldn't we be required to use the literal/lituse counting that
alpha does?

If yes then I guess it's still better to use MEM_EXPR.  MEM_EXPR also has the
benefit that it does not deem indirect calls as different when cross-jumping
compares the insns.  I don't know how important this is though.

Adam


GCC 4.4.2 Status Report (2009-09-01)

2009-09-01 Thread Mark Mitchell

Status
======

The 4.4 branch is open for commits under the usual release branch
rules.

The timing of the 4.4.2 release (at least two months after the 4.4.1
release, so no sooner than September 22, and at a point when there are
no P1 regressions open for the branch) has yet to be determined.

Quality Data


Priority          #   Change from Last Report
--------        ---   -----------------------
P1                4   +  3
P2               89   +  1
P3                1   -  1
--------        ---   -----------------------
Total            94   +  3

Previous Report
===============

http://gcc.gnu.org/ml/gcc/2009-08/msg00373.html

The next report for 4.4.2 will be sent by Richard.


Re: question about -mpush-args -maccumulate-outgoing-args on gcc for x86

2009-09-01 Thread Godmar Back
On Tue, Sep 1, 2009 at 12:31 PM, Ian Lance Taylor wrote:
> Godmar Back  writes:
>
>> It appears to me that '-mno-push-args' is enabled by default (*),
>> and not '-mpush-args'.
>
> The default varies by processor--it depends on the -mtune option.

I don't know how to find out which tuning is enabled by default; I
assume -mtune=generic?
Do the statements about what the "default" is apply to the default
-mtune setting?

>
>> Moreover, since -maccumulate-outgoing-args
>> implies -mno-push-args, it appears that the only way to obtain
>> 'push-args' behavior is to specify '-mno-accumulate-outgoing-args' - a
>> switch which the documentation doesn't even mention.
>
> That is likely true.
>
> If you want to send a patch for the docs, that would be great.
>

Whilst in general I am not opposed to this, and have contributed to
many open source projects in the past, I feel that the documentation
should be updated by someone who can actually vouch for the
completeness and accuracy of what's written, which I definitely
cannot. I also cannot verify the accuracy of the claims with respect
to the speeds of the two options. Moreover, these claims are made in a
section of the documentation that applies to an entire architecture
rather than a specific processor implementation. Perhaps they should
simply be removed?

I'm also uncertain what exactly the difference between
accumulate-outgoing-args and push-args is.
accumulate implies no-push-args, and no-accumulate+push-args is the
traditional approach, but what does no-accumulate+no-push-args look
like, and does it even make sense?

It would also be great if '-mpush-args' without
-mno-accumulate-outgoing-args would trigger a warning:
Warning: -mpush-args ignored while -maccumulate-outgoing-args is in effect.

 - Godmar


Re: [lto] Reader-writer compatibility?

2009-09-01 Thread Diego Novillo
On Tue, Sep 1, 2009 at 14:32, Frank Ch. Eigler wrote:
> Ryan Mansfield  writes:
>
>> The objects were created with rev 15 and being read using 151271.
>> No, I can't reproduce the ICE using the same version.
>> Thanks for confirming this is not expected to work.
>
> Is it the intent that this work properly in the future?

Yes.  We likely want to maintain streamer compatibility within the
same major release.  I actually don't think we'll change the bytecode
format too much.  It will mostly depend on how much gimple changes in
a single release.

Clearly, we need better version drift detection.
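Even something as simple as stamping every section with the writer's
version and refusing anything else would catch this; a purely
hypothetical sketch (invented names, not the actual streamer code):

#include <string.h>

/* Hypothetical version stamp written at the start of each LTO section.  */
struct lto_version_stamp
{
  char magic[4];          /* e.g. "GLTO" */
  unsigned short major;   /* bumped on incompatible bytecode changes */
  unsigned short minor;
};

/* Reader-side check: accept only sections written by this very version.  */
static int
stamp_matches_reader_p (const struct lto_version_stamp *s)
{
  return memcmp (s->magic, "GLTO", 4) == 0
         && s->major == 4 && s->minor == 5;
}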

Diego.


Re: [lto] Reader-writer compatibility?

2009-09-01 Thread Frank Ch. Eigler
Ryan Mansfield  writes:

> The objects were created with rev 15 and being read using 151271.
> No, I can't reproduce the ICE using the same version.
> Thanks for confirming this is not expected to work.

Is it the intent that this work properly in the future?  It is not
absurd to imagine that someone with a treeful of .o files might suffer
an unexpected compiler upgrade before a later reuse/relink attempt.

- FChE


Re: Replacing certain operations with function calls

2009-09-01 Thread Jean Christophe Beyler
I have looked at how other targets use the
init_builtins/expand_builtins. Of course, I don't understand
everything there but it seems indeed to be more for generating a
series of instructions instead of a function call. I haven't seen
anything resembling what I want to do.

I had also first thought of going directly in the define_expand and
expanding to the function call I would want. The problem I have is
that it is unclear to me how to handle (set-up) the arguments of the
builtin_function I am trying to define.

To go from no function call to:

- Potentially spill output registers
- Potentially spill scratch registers

- Setup output registers with the operands
- Perform function call
- Copy return to output operand

- Potentially restore scratch registers
- Potentially restore output registers

Seems a bit difficult to do at the define_expand level and might not
generate good code. I guess I could potentially perform a pass in the
tree representation to do what I am looking for but I am not sure that
that is the best solution either.

For the moment, I will continue looking at what you suggest and also
see if my solution works. I see that, for example, the compiler will
not always generate the call I need to change. Therefore, it does seem
that I need another solution than the one I propose.

I'm more and more considering a pass in the middle-end to get what I
need. Do you think this is better?

Thanks for your input,
Jc

On Tue, Sep 1, 2009 at 12:34 PM, Ian Lance Taylor wrote:
> Jean Christophe Beyler  writes:
>
>> I have also been looking into how to generate a function call for
>> certain operations. I've looked at various other targets for a similar
>> problem/solution but have not seen anything. On my target
>> architecture, we have certain optimized versions of the multiplication
>> for example.
>>
>> I wanted to replace certain multiplications with a function call. The
>> solution I found was to perform a FAIL on the define_expand of the
>> multiplication for these cases. This forces the compiler to generate a
>> function call to __multdi3.
>>
>> I then go in the define_expand of the function call and check the
>> symbol_ref to see what function is called. I can then modify the call
>> at that point.
>>
>> My question is: is this a good approach or is there another solution
>> that you would use?
>
> I think that what you describe will work.  I would probably generate a
> call to a builtin function in the define_expand.  Look for the way
> targets use init_builtins and expand_builtin.  Normally expand_builtin
> expands to some target-specific RTL, but it can expand to a function
> call too.
>
> Ian
>


Re: [lto] Reader-writer compatibility?

2009-09-01 Thread Ryan Mansfield

Diego Novillo wrote:

On Tue, Sep 1, 2009 at 11:42, Ryan Mansfield wrote:

Is it required that the same compiler that generated lto objects be used to
read them? I've come across a couple ICEs with the current revision reading
lto objects created by a slightly older version but the same configuration. Is
this simply invalid usage on my part?


It's likely.  How much drift between the two revisions?  Can you
recreate the ICE if you write and read with the exact same revision?
If so, please file a bug.


The objects were created with rev 15 and being read using 151271.
No, I can't reproduce the ICE using the same version.

Thanks for confirming this is not expected to work.

Regards,

Ryan Mansfield




Re: DI mode and endianess

2009-09-01 Thread Richard Henderson

On 08/19/2009 06:50 AM, Mohamed Shafi wrote:

 mov  _h,d4
 mov  _h+4,d5
 mov  _j,d2
 mov  _j+4,d3
 add  d4,d2
 adc  d5,d3

irrespective of the endianness.
What could I be missing here? Should I add anything specific for this
in the back-end?


Given that the compiler is generating adc, I have to
assume that you have an adddi3 pattern.  At which point
I have to assume that you're doing something wrong in
there that's producing the little-endian sequence even
for big-endian.


r~


Re: Replacing certain operations with function calls

2009-09-01 Thread Ian Lance Taylor
Jean Christophe Beyler  writes:

> I have also been looking into how to generate a function call for
> certain operations. I've looked at various other targets for a similar
> problem/solution but have not seen anything. On my target
> architecture, we have certain optimized versions of the multiplication
> for example.
>
> I wanted to replace certain multiplications with a function call. The
> solution I found was to perform a FAIL on the define_expand of the
> multiplication for these cases. This forces the compiler to generate a
> function call to __multdi3.
>
> I then go in the define_expand of the function call and check the
> symbol_ref to see what function is called. I can then modify the call
> at that point.
>
> My question is: is this a good approach or is there another solution
> that you would use?

I think that what you describe will work.  I would probably generate a
call to a builtin function in the define_expand.  Look for the way
targets use init_builtins and expand_builtin.  Normally expand_builtin
expands to some target-specific RTL, but it can expand to a function
call too.
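A rough, untested sketch of that route (the names and the zero function
code are only placeholders):

static GTY(()) tree my_mull_decl;

/* TARGET_INIT_BUILTINS: register a machine-specific builtin whose
   library name is the helper the call should resolve to.  */
static void
my_init_builtins (void)
{
  tree di = long_long_integer_type_node;
  tree fntype = build_function_type_list (di, di, di, NULL_TREE);

  my_mull_decl = add_builtin_function ("__builtin_my_mull", fntype,
                                       0, BUILT_IN_MD,
                                       "my_version_of_mull", NULL_TREE);
}

/* TARGET_EXPAND_BUILTIN: instead of emitting target-specific RTL, let
   the generic call expander emit an ordinary ABI-conforming call.  */
static rtx
my_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
                   enum machine_mode mode ATTRIBUTE_UNUSED, int ignore)
{
  return expand_call (exp, target, ignore);
}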

Ian


Re: question about -mpush-args -maccumulate-outgoing-args on gcc for x86

2009-09-01 Thread Ian Lance Taylor
Godmar Back  writes:

> It appears to me that '-mno-push-args' is enabled by default (*),
> and not '-mpush-args'.

The default varies by processor--it depends on the -mtune option.

> Moreover, since -maccumulate-outgoing-args
> implies -mno-push-args, it appears that the only way to obtain
> 'push-args' behavior is to specify '-mno-accumulate-outgoing-args' - a
> switch which the documentation doesn't even mention.

That is likely true.

If you want to send a patch for the docs, that would be great.

Ian


Re: Using MEM_EXPR inside a call expression

2009-09-01 Thread Richard Henderson

On 08/28/2009 12:38 AM, Adam Nemet wrote:

... To assist the linker we need to annotate the indirect call
with the function symbol.

Since the call is expanded early...


Having experimented with this on Alpha a few years back,
the only thing I can suggest is to not expand them early.

I use a combination of peep2's and normal splitters to
determine if the post-call GP reload is needed, and to
expand the call itself.


r~


Re: [lto] Reader-writer compatibility?

2009-09-01 Thread Diego Novillo
On Tue, Sep 1, 2009 at 11:42, Ryan Mansfield wrote:
> Is it required that the same compiler that generated lto objects be used to
> read them? I've come across a couple ICEs with the current revision reading
> lto objects created by a slightly older version but the same configuration. Is
> this simply invalid usage on my part?

It's likely.  How much drift between the two revisions?  Can you
recreate the ICE if you write and read with the exact same revision?
If so, please file a bug.


Diego.


[lto] Reader-writer compatibility?

2009-09-01 Thread Ryan Mansfield
Is it required that the same compiler that generated lto objects be used
to read them? I've come across a couple ICEs with the current revision
reading lto objects created by a slightly older version but the same
configuration. Is this simply invalid usage on my part?


Regards,

Ryan Mansfield


Re: question about -mpush-args -maccumulate-outgoing-args on gcc for x86

2009-09-01 Thread Godmar Back
Minor correction to my previous email:

On Tue, Sep 1, 2009 at 10:08 AM, Godmar Back wrote:
>
> gb...@setzer [39](~/tmp) > cat call.c
> void caller(void) {
>    extern void callee(int);
>    callee(5);
> }

This:

> gb...@setzer [40](~/tmp) > gcc -mno-push-args -S call.c

should be '-mpush-args' as in:

gb...@cyan [4](~/tmp) > gcc -S -mpush-args call.c
gb...@cyan [5](~/tmp) > cat call.s
.file   "call.c"
.text
.globl caller
.type   caller, @function
caller:
pushl   %ebp
movl    %esp, %ebp
subl    $8, %esp
movl    $5, (%esp)
call    callee
leave
ret
.size   caller, .-caller
.ident  "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-44)"
.section        .note.GNU-stack,"",@progbits

The point here is that '-mpush-args' is ineffective unless
'-mno-accumulate-outgoing-args' is given, and that the documentation,
in my opinion, may be misleading by

a) not mentioning the -mno-accumulate-outgoing-args switch

b) saying that '-mpush-args' is the default when it's an ineffective
default (since the default -maccumulate-outgoing-args appears to
override it)

c) not mentioning that -maccumulate-outgoing-args is the default - in
fact, the discussion in the push-args/no-push-args section appears
to imply that it shouldn't be the default.

Thanks.

 - Godmar


Re: asm goto vs simulate_block

2009-09-01 Thread Richard Henderson

On 08/31/2009 05:06 PM, Richard Henderson wrote:

The following patch appears to work for both. I'll commit
it after a bootstrap and test cycle completes.


Committed with one additional change, to prevent VRP
from crashing.


r~


        (vrp_visit_stmt): Be prepared for non-interesting stmts.

@@ -6087,7 +6090,9 @@ vrp_visit_stmt (gimple stmt, edge *taken_edge_p, tree *output_p)
       fprintf (dump_file, "\n");
     }
 
-  if (is_gimple_assign (stmt) || is_gimple_call (stmt))
+  if (!stmt_interesting_for_vrp (stmt))
+    gcc_assert (stmt_ends_bb_p (stmt));
+  else if (is_gimple_assign (stmt) || is_gimple_call (stmt))
     {
       /* In general, assignments with virtual operands are not useful
          for deriving ranges, with the obvious exception of calls to



Re: Bit fields

2009-09-01 Thread Richard Henderson

On 08/31/2009 07:20 PM, Jean Christophe Beyler wrote:

Ok, is it normal to see an ashift with a negative value, though, or is
this already a sign of a (potentially) different problem?


I seem to recall that it's normal.  Combine was originally
written in the days of VAX, where negative shifts were allowed.
You'll just want to reject them in your patterns.


r~


Re: IRA undoing scheduling decisions

2009-09-01 Thread Vladimir Makarov

Peter Bergner wrote:

On Mon, 2009-08-24 at 23:56 +, Charles J. Tabony wrote:
  

I am seeing a performance regression on the port I maintain, and I would 
appreciate some pointers.

When I compile the following code

void f(int *x, int *y){
  *x = 7;
  *y = 4;
}

with GCC 4.3.2, I get the desired sequence of instructions.  I'll call it 
sequence A:

r0 = 7
r1 = 4
[x] = r0
[y] = r1

When I compile the same code with GCC 4.4.0, I get a sequence that is lower 
performance for my target machine.  I'll call it sequence B:

r0 = 7
[x] = r0
r0 = 4
[y] = r0



This is caused by update_equiv_regs(), which IRA inherited from local-alloc.c.
Although with gcc 4.3 and earlier you don't see the problem, it is still there:
if you look at the 4.3 dumps, you will see that update_equiv_regs() reorders
them there too.  What saves us is that sched2 reschedules them again into the
order we want.  With 4.4, IRA happens to reuse the same register for both
pseudos, so sched2's hands are tied and it cannot schedule them back again
for us.

  

Peter, thanks for the investigation.

We could do update_equiv_regs in a separate pass before the 1st insn 
scheduling as it was before IRA.


I'll try this and see how it will work for mainstream targets (x86, ppc).

Looking at update_equiv_regs(), if I disable the replacement for regs
that are local to one basic block (patch below), as it existed before
John Wehle's patch way back in Oct 2000:

  http://gcc.gnu.org/ml/gcc-patches/2000-09/msg00782.html

then we get the ordering we want.  Does anyone know why John removed
that part of the test in his patch?  Thoughts anyone?

  

I have no idea.  But if it works well, we could use it.




Replacing certain operations with function calls

2009-09-01 Thread Jean Christophe Beyler
Dear all,

I have also been looking into how to generate a function call for
certain operations. I've looked at various other targets for a similar
problem/solution but have not seen anything. On my target
architecture, we have certain optimized versions of the multiplication
for example.

I wanted to replace certain multiplications with a function call. The
solution I found was to perform a FAIL on the define_expand of the
multiplication for these cases. This forces the compiler to generate a
function call to __multdi3.

I then go in the define_expand of the function call and check the
symbol_ref to see what function is called. I can then modify the call
at that point.

My question is: is this a good approach or is there another solution
that you would use?

Thanks again for your time,
Jean Christophe Beyler


question about -mpush-args -maccumulate-outgoing-args on gcc for x86

2009-09-01 Thread Godmar Back
Hi,

I'm using gcc version 4.1.2 20080704 (Red Hat 4.1.2-44) for a x86
target. The info page says:

`-mpush-args'
`-mno-push-args'
 Use PUSH operations to store outgoing parameters.  This method is
 shorter and usually equally fast as method using SUB/MOV
 operations and is enabled by default.  In some cases disabling it
 may improve performance because of improved scheduling and reduced
 dependencies.

`-maccumulate-outgoing-args'
 If enabled, the maximum amount of space required for outgoing
 arguments will be computed in the function prologue.  This is
 faster on most modern CPUs because of reduced dependencies,
 improved scheduling and reduced stack usage when preferred stack
 boundary is not equal to 2.  The drawback is a notable increase in
 code size.  This switch implies `-mno-push-args'.

This information is also found on
http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html


Is this information up-to-date?

It appears to me that '-mno-push-args' is enabled by default (*),
and not '-mpush-args'.  Moreover, since -maccumulate-outgoing-args
implies -mno-push-args, it appears that the only way to obtain
'push-args' behavior is to specify '-mno-accumulate-outgoing-args' - a
switch which the documentation doesn't even mention.

I have searched the mailing list archives and the only post I found
was this one:
http://gcc.gnu.org/ml/gcc/2005-01/msg00761.html which is at odds with
the documentation above.

Thanks.

 - Godmar

(*) for instance, see:

gb...@setzer [39](~/tmp) > cat call.c
void caller(void) {
    extern void callee(int);
    callee(5);
}
gb...@setzer [40](~/tmp) > gcc -mno-push-args -S call.c
gb...@setzer [41](~/tmp) > cat call.s
.file   "call.c"
.text
.globl caller
.type   caller, @function
caller:
pushl   %ebp
movl    %esp, %ebp
subl    $8, %esp
movl    $5, (%esp)
call    callee
leave
ret
.size   caller, .-caller
.ident  "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-44)"
.section        .note.GNU-stack,"",@progbits


Re: Why no strings in error messages?

2009-09-01 Thread Gabriel Paubert
On Wed, Aug 26, 2009 at 03:02:44PM -0400, Bradley Lucier wrote:
> On Wed, 2009-08-26 at 20:38 +0200, Paolo Bonzini wrote:
> > 
> > > When I worked at AMD, I was starting to suspect that it may be more 
> > > beneficial
> > > to re-enable the first schedule insns pass if you were compiling in 64-bit
> > > mode, since you have more registers available, and the new registers do 
> > > not
> > > have hard wired uses, which in the past always meant a lot of spills 
> > > (also, the
> > > default floating point unit is SSE instead of the x87 stack).  I never got
> > > around to testing this before AMD and I parted company.
> > 
> > Unfortunately, hardwired use of %ecx for shifts is still enough to kill 
> > -fschedule-insns on AMD64.
> 
> The AMD64 Architecture manual I found said that various combinations of
> the RSI, RDI, and RCX registers are used implicitly by ten instructions
> or prefixes, and RBX is used by XLAT, XLATB.  So it appears that there
> are 12 general-purpose registers available for allocation.

XLATB is essentially useless (well, it may have had some uses back in the
16-bit days, when only a few registers could be used for addressing) and is
never generated by GCC.

However, %ebx is used for PIC addressing in 32-bit mode, so it is not
always free either (I don't know about PIE code).

In 64-bit mode, PIC/PIE code uses PC-relative addressing, so you actually
get 9 more free registers than in 32-bit mode.

However, for some reason you glossed over the case of integer division,
which always uses %edx and %eax. This is true even when dividing by a
constant (non-power-of-2), in which case gcc will often use a widening
multiply instead, whose result is in %edx:%eax, so it's almost a wash
in terms of fixed register usage (not exactly: the divisions use %edx:%eax
as the dividend and need the divisor somewhere else, while the widening
multiply uses %eax as one input but %edx can be used for the other).
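A tiny (hypothetical, not from the thread) example of the constant case:

/* Division by a non-power-of-2 constant: gcc usually lowers this to a
   widening multiply by a precomputed reciprocal, so %edx:%eax (or
   %rdx:%rax in 64-bit mode) is still tied up even though no div
   instruction is emitted.  */
unsigned long
div_by_10 (unsigned long x)
{
  return x / 10;
}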

(As a side note, %edx and %eax are also special with regard to I/O port
accesses but this is only of interest in device drivers).

> Are 12 registers not enough, in principle, to do scheduling before
> register allocation? 

I don't know, but I would say that you have about 14 registers
for address computations/indexing since you seem to be interested
in FP code. I would think that it is sufficient for many inner
loops (but not all, it really depends on the number of arrays
that you access and the number of independent indexes that
you have to keep).

> I was getting a 15% speedup on some numerical
> codes, as pre-scheduling spaced out the vector loads among the
> floating-point computations.

Well, vector loads and floating-point computations do not have anything 
to do with integer register choices. The 16 FP registers are 
nicely orthogonal (compared to the real nightmare that the x87 stack was).
In practice you schedule on 16 FP registers and 14 (15 if you omit
the frame pointer) addressing/indexing/counting registers.

In this type of code there are typically very few instructions with
fixed register constraints, and the less likely are the string
instructions. Shifts of variable amount and integer divides
are still possible, but unlikely.

Gabriel


Re: Call for testers: MPC 0.7 prerelease tarball

2009-09-01 Thread Dave Korn
Dave Korn wrote:

>   Attached allowed it to build, 

  And with that patch:

> ===
> All 45 tests passed
> ===

cheers,
  DaveK


Re: Call for testers: MPC 0.7 prerelease tarball

2009-09-01 Thread Dave Korn
Dave Korn wrote:

>   Fell at the first hurdle for me:
> 
>  gcc-4 -shared-libgcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D_FORTIFY_SOURCE=2 -pedantic -Wall -Wextra -Werror -O2 -pipe -MT inp_str.lo -MD -MP -MF .deps/inp_str.Tpo -c inp_str.c  -DDLL_EXPORT -DPIC -o .libs/inp_str.o
> cc1: warnings being treated as errors
> inp_str.c: In function 'extract_string':
> inp_str.c:113:10: error: array subscript has type 'char'
> inp_str.c:114:10: error: array subscript has type 'char'
> inp_str.c:115:10: error: array subscript has type 'char'
> inp_str.c:118:13: error: array subscript has type 'char'
> inp_str.c:119:13: error: array subscript has type 'char'
> inp_str.c:120:13: error: array subscript has type 'char'
> make[2]: *** [inp_str.lo] Error 1
> make[2]: *** Waiting for unfinished jobs


  The attached patch allowed it to build, and it seems to be what the function
was already doing for isspace earlier.  Test results will follow.

cheers,
  DaveK
--- orig/mpc-0.7-dev/src/inp_str.c	2009-08-26 21:24:41.0 +0100
+++ mpc-0.7-dev/src/inp_str.c	2009-09-01 12:17:04.546875000 +0100
@@ -110,14 +110,14 @@ extract_string (FILE *stream)
 
 /* (n-char-sequence) only after a NaN */
 if ((nread != 3
- || tolower (str[0]) != 'n'
- || tolower (str[1]) != 'a'
- || tolower (str[2]) != 'n')
+ || tolower ((unsigned char) str[0]) != 'n'
+ || tolower ((unsigned char) str[1]) != 'a'
+ || tolower ((unsigned char) str[2]) != 'n')
 && (nread != 5
 || str[0] != '@'
-|| tolower (str[1]) != 'n'
-|| tolower (str[2]) != 'a'
-|| tolower (str[3]) != 'n'
+|| tolower ((unsigned char) str[1]) != 'n'
+|| tolower ((unsigned char) str[2]) != 'a'
+|| tolower ((unsigned char) str[3]) != 'n'
 || str[4] != '@')) {
   ungetc (c, stream);
   return str;
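For reference, the reason the casts are needed: the <ctype.h> functions
take an int that must be either EOF or representable as unsigned char,
so passing a plain char that happens to be negative is undefined
behaviour - which is what the "array subscript has type 'char'" warning
is pointing at, since the ctype macros on this target index a lookup
table with their argument.  A stand-alone illustration (not part of
MPC):

#include <ctype.h>

/* Hypothetical helper: check for a leading "nan", case-insensitively.
   tolower (str[0]) without the cast would be undefined if str[0] is
   negative, which can happen wherever plain char is signed; converting
   to unsigned char first, as the patch above does, is the fix.  */
static int
starts_with_nan (const char *str)
{
  return tolower ((unsigned char) str[0]) == 'n'
         && tolower ((unsigned char) str[1]) == 'a'
         && tolower ((unsigned char) str[2]) == 'n';
}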


Re: Call for testers: MPC 0.7 prerelease tarball

2009-09-01 Thread Dave Korn
Kaveh R. GHAZI wrote:
> Hello,
> 
> A prerelease tarball of the upcoming MPC 0.7 is available here:
> http://www.multiprecision.org/mpc/download/mpc-0.7-dev.tar.gz
> 
> Please help test it for portability and bugs by downloading and compiling
> it on systems you have access to.

  Fell at the first hurdle for me:

 gcc-4 -shared-libgcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D_FORTIFY_SOURCE=2 -pedantic -Wall -Wextra -Werror -O2 -pipe -MT inp_str.lo -MD -MP -MF .deps/inp_str.Tpo -c inp_str.c  -DDLL_EXPORT -DPIC -o .libs/inp_str.o
cc1: warnings being treated as errors
inp_str.c: In function 'extract_string':
inp_str.c:113:10: error: array subscript has type 'char'
inp_str.c:114:10: error: array subscript has type 'char'
inp_str.c:115:10: error: array subscript has type 'char'
inp_str.c:118:13: error: array subscript has type 'char'
inp_str.c:119:13: error: array subscript has type 'char'
inp_str.c:120:13: error: array subscript has type 'char'
make[2]: *** [inp_str.lo] Error 1
make[2]: *** Waiting for unfinished jobs

>  I'd like a report to contain your
> target triplet and the versions of your compiler, GMP and MPFR used when
> building MPC.  

$ /gnu/gcc/gcc/config.guess
i686-pc-cygwin

$ gcc-4 -v
Using built-in specs.
Target: i686-pc-cygwin
Configured with: /gnu/gcc/gcc-patched/configure --prefix=/opt/gcc-tools -v
--with-gmp=/usr --with-mpfr=/usr --enable-bootstrap
--enable-version-specific-runtime-libs --enable-static --enable-shared
--enable-shared-libgcc --disable-__cxa_atexit --with-gnu-ld --with-gnu-as
--with-dwarf2 --disable-sjlj-exceptions --disable-symvers --disable-libjava
--disable-interpreter --program-suffix=-4 --disable-libgomp --enable-libssp
--enable-libada --enable-threads=posix --with-arch=i686 --with-tune=generic
CC=gcc-4 CXX=g++-4 CC_FOR_TARGET=gcc-4 CXX_FOR_TARGET=g++-4
--with-ecj-jar=/usr/share/java/ecj.jar LD=/opt/gcc-tools/bin/ld.exe
LD_FOR_TARGET=/opt/gcc-tools/bin/ld.exe AS=/opt/gcc-tools/bin/as.exe
AS_FOR_TARGET=/opt/gcc-tools/bin/as.exe --disable-win32-registry
--disable-libgcj-debug --enable-languages=c,c++,ada
Thread model: posix
gcc version 4.5.0 20090730 (experimental) (GCC)

$ cygcheck -c gmp mpfr libgmp3 libmpfr1
Cygwin Package Information
Package  VersionStatus
gmp  4.3.1-3OK
libgmp3  4.3.1-3OK
libmpfr1 2.4.1-4OK
mpfr 2.4.1-4OK

$

  BTW, I configured mpc with "--prefix=/usr --disable-static --enable-shared"
(after first receiving "configure: error: gmp.h is a DLL: use --disable-static
--enable-shared" when I tried with just --prefix).

> Also please include your results from "make check".

  N/A !


cheers,
  DaveK



Trunk frozen for VTA merge

2009-09-01 Thread Jakub Jelinek
Subject says it all, I guess.

Jakub