A question about mov pattern

2010-06-24 Thread Revital1 Eres

Hello,

In the new target I'm working on there are branch regs and GPRs.
The load and store instructions only work to/from the GPRs, so if a
branch reg needs to be spilled it first needs to be moved to a GPR and
then stored to memory.  I've implemented a mov pattern in the machine
description file for the GPRs and a mov pattern between GPRs and branch
regs; however, I'm not sure whether I need to add more to model the
behavior described above, and if so, how to do it.

Thanks,
Revital



Generated files and patches

2010-06-24 Thread Sebastian Huber
Hi,

Someone told me that generated files should not be included in patches.  It
would be nice if this were mentioned at

http://gcc.gnu.org/contribute.html

Have a nice day!

-- 
Sebastian Huber, embedded brains GmbH

Address : Obere Lagerstr. 30, D-82178 Puchheim, Germany
Phone   : +49 89 18 90 80 79-6
Fax     : +49 89 18 90 80 79-9
E-Mail  : sebastian.hu...@embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.


Re: Generated files and patches

2010-06-24 Thread Manuel López-Ibáñez
On 24 June 2010 12:34, Sebastian Huber wrote:
> Hi,
>
> Someone told me that generated files should not be included in patches.  It
> would be nice if this were mentioned at
>
> http://gcc.gnu.org/contribute.html
>
> Have a nice day!

Yes, it would be nice. Unfortunately, I know from experience that if you
care about this, you should submit a patch against the web page.
Otherwise, it will *never* get done.

Cheers,

Manuel.


Re: A question about mov pattern

2010-06-24 Thread Jeff Law

On 06/24/10 02:02, Revital1 Eres wrote:
> Hello,
>
> In the new target I'm working on there are branch regs and GPRs.
> The load and store instructions only work to/from the GPRs, so if a
> branch reg needs to be spilled it first needs to be moved to a GPR and
> then stored to memory.  I've implemented a mov pattern in the machine
> description file for the GPRs and a mov pattern between GPRs and branch
> regs; however, I'm not sure whether I need to add more to model the
> behavior described above, and if so, how to do it.

Secondary reloads are the answer.

This isn't a terribly uncommon situation.  Handling of the shift
register (SAR) on the PA would be a good example.  You can move the SAR
to/from a GPR, but SAR cannot be stored directly to memory.  Searches
for SAR in pa.c will get you a long way.



Jeff


Re: patch: honor volatile bitfield types

2010-06-24 Thread Hans-Peter Nilsson

(I wrote:)
> > Can we similarly promise or say something for accesses of the
> > containing struct as a whole?

No takers?

> Date: Wed, 23 Jun 2010 11:34:04 -0400
> From: DJ Delorie 

> Should be the same as before, I would think.

Primarily I want them similarly defined.  I wasn't expecting
those accesses to actually be changed by your patches.  Of course
it'd be nice if they could tag along. :)  The first step is this:
to check if anyone is against them being well-defined (in GNU C
terms, of course not in ISO C terms), and perhaps whether people
believe that it's obvious one way (like I do) or the other (like
it seemed other people did).

Thanks BTW, for pushing through the subject as well as the patches. :)

brgds, H-P
PS. Happy midsummer!


Re: A question about mov pattern

2010-06-24 Thread Peter Bergner
On Thu, 2010-06-24 at 08:57 -0600, Jeff Law wrote:
> On 06/24/10 02:02, Revital1 Eres wrote:
> > Hello,
> >
> > In the new target I'm working on there are branch regs and GPRs.
> > The load and store instructions only work to/from the GPRs, so if a
> > branch reg needs to be spilled it first needs to be moved to a GPR and
> > then stored to memory.  I've implemented a mov pattern in the machine
> > description file for the GPRs and a mov pattern between GPRs and branch
> > regs; however, I'm not sure whether I need to add more to model the
> > behavior described above, and if so, how to do it.
> >
> Secondary reloads are the answer.
>
> This isn't a terribly uncommon situation.  Handling of the shift
> register (SAR) on the PA would be a good example.  You can move the SAR
> to/from a GPR, but SAR cannot be stored directly to memory.  Searches
> for SAR in pa.c will get you a long way.

The same is true for the condition register on PowerPC.

Peter





Massive performance regression from switching to gcc 4.5

2010-06-24 Thread Taras Glek

Hi,
Just wanted to give a heads up on what might be the biggest 
compiler-upgrade-related performance difference we've seen at Mozilla.


We switched from gcc 4.3 to gcc 4.5, and our automated benchmarking
infrastructure reported a 4-19% slowdown on most of our performance
metrics on 32- and 64-bit Linux.


A lone 8% speedup was measured on the SunSpider JavaScript benchmark on
64-bit Linux.


Here are some of the slowdowns reported:
http://groups.google.com/group/mozilla.dev.tree-management/browse_thread/thread/77951ccb76b5e630#
http://groups.google.com/group/mozilla.dev.tree-management/browse_thread/thread/624246d7d900ed41#


Most of the code is compiled with -fPIC -fno-rtti -fno-exceptions -Os
-freorder-blocks -fomit-frame-pointer. The only difference in 4.5 is
that we link with -static-libstdc++ and compile libstdc++ with -fPIC.
However, we barely make use of libstdc++, so I doubt that's the problem.
We needed to link statically because 4.5 uses a handful of newer
libstdc++ symbols.


We were upgrading to gcc 4.5.0 because of plugins and the fact that it
can compile Firefox with PGO on (the builds above were not built with PGO).
Now we have to reconsider a complete switchover to 4.5.


I'm not sure how to proceed from here,
Taras


Re: Massive performance regression from switching to gcc 4.5

2010-06-24 Thread Andrew Pinski



On Jun 24, 2010, at 11:50 AM, Taras Glek wrote:

> Hi,
> Just wanted to give a heads up on what might be the biggest
> compiler-upgrade-related performance difference we've seen at Mozilla.
>
> We switched from gcc 4.3 to gcc 4.5, and our automated benchmarking
> infrastructure reported a 4-19% slowdown on most of our performance
> metrics on 32- and 64-bit Linux.
>
> A lone 8% speedup was measured on the SunSpider JavaScript benchmark
> on 64-bit Linux.
>
> Here are some of the slowdowns reported:
> http://groups.google.com/group/mozilla.dev.tree-management/browse_thread/thread/77951ccb76b5e630#
> http://groups.google.com/group/mozilla.dev.tree-management/browse_thread/thread/624246d7d900ed41#
>
> Most of the code is compiled with -fPIC -fno-rtti -fno-exceptions
> -Os


Stop right there. You are compiling at -Os, which is tuned for size, not
speed. So the question is whether the size went down, not whether the
speed decreased. Try -O2 and report back. I doubt we are going to trade
size for speed at -Os at all.

Thanks,
Andrew Pinski




Re: Massive performance regression from switching to gcc 4.5

2010-06-24 Thread Benjamin Smedberg

On 6/24/10 3:06 PM, Andrew Pinski wrote:
>> Most of the code is compiled with -fPIC -fno-rtti -fno-exceptions -Os
>
> Stop right there. You are compiling at -Os, which is tuned for size, not
> speed. So the question is whether the size went down, not whether the
> speed decreased. Try -O2 and report back. I doubt we are going to trade
> size for speed at -Os at all.


For what it's worth, Mozilla compiled with GCC has historically been faster
at -Os than at -O2. This is because the vast majority of our code
is cold, and -O2 has produced substantially larger code, which causes our
hot code to be evicted from processor caches more often.


We will definitely try -O2 to see if the previous measurements are no longer 
valid with GCC 4.5.


Looking through our codesize comparison logs, some of our methods are
thousands of bytes longer with GCC 4.5 than with 4.3 (same -Os compiler flags):

+796    nsHTMLEditRules::nsHTMLEditRules()
+1088   nsCrypto::GenerateCRMFRequest(nsIDOMCRMFObject**)

In addition, it appears at first glance that GCC is either no longer 
inlining at -Os, even when it would be a size advantage to do so, or is 
making some very poor inlining choices.


e.g. +72     nsTArray::nsTArray(nsTArray const&)

We can turn some of these observations into bug reports if that would be 
helpful, but if it would make more sense we could perhaps just tune the 
inlining parameters directly to get the "real -Os" that we usually want.


--BDS



Re: Massive performance regression from switching to gcc 4.5

2010-06-24 Thread Eric Botcazou
> In addition, it appears at first glance that GCC is either no longer
> inlining at -Os, even when it would be a size advantage to do so, or is
> making some very poor inlining choices.
>
> e.g. +72  nsTArray::nsTArray(nsTArray const&)
>
> We can turn some of these observations into bug reports if that would be
> helpful, but if it would make more sense we could perhaps just tune the
> inlining parameters directly to get the "real -Os" that we usually want.

We ran into similar inlining regressions in Ada; the heuristics have indeed
changed significantly.  The attached patchlet alone saves 3% in code size
at -Os on a 50 MB executable and yields a 5% speedup at -O2 on another
program.


* ipa-inline.c (likely_eliminated_by_inlining_p): Really consider that
loads from parameters passed by reference are free after inlining.


-- 
Eric Botcazou
*** gcc/ipa-inline.c.0	2010-06-12 17:01:09.0 +0200
--- gcc/ipa-inline.c	2010-06-12 18:26:32.0 +0200
*** likely_eliminated_by_inlining_p (gimple
*** 1736,1754 
  	bool rhs_free = false;
  	bool lhs_free = false;
  
!  	while (handled_component_p (inner_lhs) || TREE_CODE (inner_lhs) == INDIRECT_REF)
  	  inner_lhs = TREE_OPERAND (inner_lhs, 0);
!  	while (handled_component_p (inner_rhs)
! 	   || TREE_CODE (inner_rhs) == ADDR_EXPR || TREE_CODE (inner_rhs) == INDIRECT_REF)
  	  inner_rhs = TREE_OPERAND (inner_rhs, 0);
  
- 
  	if (TREE_CODE (inner_rhs) == PARM_DECL
  	|| (TREE_CODE (inner_rhs) == SSA_NAME
  		&& SSA_NAME_IS_DEFAULT_DEF (inner_rhs)
  		&& TREE_CODE (SSA_NAME_VAR (inner_rhs)) == PARM_DECL))
  	  rhs_free = true;
! 	if (rhs_free && is_gimple_reg (lhs))
  	  lhs_free = true;
  	if (((TREE_CODE (inner_lhs) == PARM_DECL
  	  || (TREE_CODE (inner_lhs) == SSA_NAME
--- 1736,1757 
  	bool rhs_free = false;
  	bool lhs_free = false;
  
! 	while (handled_component_p (inner_lhs)
! 		   || TREE_CODE (inner_lhs) == INDIRECT_REF)
  	  inner_lhs = TREE_OPERAND (inner_lhs, 0);
! 	while (handled_component_p (inner_rhs)
! 	   || TREE_CODE (inner_rhs) == ADDR_EXPR
! 		   || TREE_CODE (inner_rhs) == INDIRECT_REF)
  	  inner_rhs = TREE_OPERAND (inner_rhs, 0);
  
  	if (TREE_CODE (inner_rhs) == PARM_DECL
  	|| (TREE_CODE (inner_rhs) == SSA_NAME
  		&& SSA_NAME_IS_DEFAULT_DEF (inner_rhs)
  		&& TREE_CODE (SSA_NAME_VAR (inner_rhs)) == PARM_DECL))
  	  rhs_free = true;
! 	if (rhs_free
! 		&& (is_gimple_reg (lhs)
! 		|| !is_gimple_reg_type (TREE_TYPE (lhs))))
  	  lhs_free = true;
  	if (((TREE_CODE (inner_lhs) == PARM_DECL
  	  || (TREE_CODE (inner_lhs) == SSA_NAME
*** likely_eliminated_by_inlining_p (gimple
*** 1759,1765 
  	|| (TREE_CODE (inner_lhs) == SSA_NAME
  		&& TREE_CODE (SSA_NAME_VAR (inner_lhs)) == RESULT_DECL))
  	  lhs_free = true;
! 	if (lhs_free && (is_gimple_reg (rhs) || is_gimple_min_invariant (rhs)))
  	  rhs_free = true;
  	if (lhs_free && rhs_free)
  	  return true;
--- 1762,1771 
  	|| (TREE_CODE (inner_lhs) == SSA_NAME
  		&& TREE_CODE (SSA_NAME_VAR (inner_lhs)) == RESULT_DECL))
  	  lhs_free = true;
! 	if (lhs_free
! 		&& (is_gimple_reg (rhs)
! 		|| !is_gimple_reg_type (TREE_TYPE (rhs))
! 		|| is_gimple_min_invariant (rhs)))
  	  rhs_free = true;
  	if (lhs_free && rhs_free)
  	  return true;


Re: Massive performance regression from switching to gcc 4.5

2010-06-24 Thread Richard Guenther
On Thu, Jun 24, 2010 at 10:24 PM, Eric Botcazou wrote:
>> In addition, it appears at first glance that GCC is either no longer
>> inlining at -Os, even when it would be a size advantage to do so, or is
>> making some very poor inlining choices.
>>
>> e.g. +72      nsTArray::nsTArray(nsTArray const&)
>>
>> We can turn some of these observations into bug reports if that would be
>> helpful, but if it would make more sense we could perhaps just tune the
>> inlining parameters directly to get the "real -Os" that we usually want.
>
> We ran into similar inlining regressions in Ada; the heuristics have indeed
> changed significantly.  The attached patchlet alone saves 3% in code size
> at -Os on a 50 MB executable and yields a 5% speedup at -O2 on another
> program.
>
>
>        * ipa-inline.c (likely_eliminated_by_inlining_p): Really consider that
>        loads from parameters passed by reference are free after inlining.

I don't understand this change.  Minus whitespace changes, it seems to be

!   if (lhs_free && (is_gimple_reg (rhs) || is_gimple_min_invariant (rhs)))
  rhs_free = true;

vs.

!   if (lhs_free
!   && (is_gimple_reg (rhs)
!   || !is_gimple_reg_type (TREE_TYPE (rhs))
!   || is_gimple_min_invariant (rhs)))
  rhs_free = true;

so the stmt is likely being eliminated if either the LHS or the RHS is based
on a parameter and the other side is a register or an invariant.  You change
that to also count aggregate stores/loads to/from parameters as
free.

Which you could have simplified to just say

  if (lhs_free || rhs_free)
    return true;

and drop the code you are changing.

I never considered the heuristic that makes loads/stores from parameters
free a very good one.  It makes *p free but not *(p+1), for example.
I would rather have seen the call statement's actual argument list
taken into account.

By the way, there are some bugs in 4.5 in the accounting of functions
called once being inlined; they were fixed on trunk, and the fixes allow
extra inlining.  See

2010-04-13  Jan Hubicka  

* ipa-inline.c (cgraph_mark_inline_edge): Avoid double accounting
of optimized out static functions.
...

Richard.


gcc-4.5-20100624 is now available

2010-06-24 Thread gccadmin
Snapshot gcc-4.5-20100624 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20100624/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch 
revision 161342

You'll find:

gcc-4.5-20100624.tar.bz2            Complete GCC (includes all of below)

gcc-core-4.5-20100624.tar.bz2       C front end and core compiler

gcc-ada-4.5-20100624.tar.bz2        Ada front end and runtime

gcc-fortran-4.5-20100624.tar.bz2    Fortran front end and runtime

gcc-g++-4.5-20100624.tar.bz2        C++ front end and runtime

gcc-java-4.5-20100624.tar.bz2       Java front end and runtime

gcc-objc-4.5-20100624.tar.bz2       Objective-C front end and runtime

gcc-testsuite-4.5-20100624.tar.bz2  The GCC testsuite

Diffs from 4.5-20100617 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Massive performance regression from switching to gcc 4.5

2010-06-24 Thread Jonathan Adamczewski
On 25/06/10 06:39, Richard Guenther wrote:
> There are btw. some bugs wrt accounting of functions called once
> being inlined in 4.5 which were fixed on trunk which allow extra
> inlining.
>   

Are these changes likely to make it onto the 4.5 branch and into (say)
4.5.1?

j.


Re: Massive performance regression from switching to gcc 4.5

2010-06-24 Thread Eric Botcazou
> Minus whitespace changes it seems to be
>
> !   if (lhs_free && (is_gimple_reg (rhs) || is_gimple_min_invariant (rhs)))
>   rhs_free = true;
>
> vs.
>
> !   if (lhs_free
> !   && (is_gimple_reg (rhs)
> !   || !is_gimple_reg_type (TREE_TYPE (rhs))
> !   || is_gimple_min_invariant (rhs)))
>   rhs_free = true;
>
> so the stmt is likely being eliminated if either the LHS or the RHS is
> based on a parameter and the other side is a register or an invariant.  You
> change that to also discount aggregate stores/loads to/from parameters to
> be free.

There is also the counterpart for the RHS:

!   if (rhs_free && is_gimple_reg (lhs))
  lhs_free = true;
vs

!   if (rhs_free
!   && (is_gimple_reg (lhs)
!   || !is_gimple_reg_type (TREE_TYPE (lhs))))
  lhs_free = true;

> Which you could have simplified to just say
>
>   if (lhs_free || rhs_free)
>     return true;
>
> and drop the code you are changing.

I don't think so; compare your version and mine for scalar stores/loads
from/to parameters or return values.

-- 
Eric Botcazou