A question about mov pattern
Hello, In the new target I'm working on there are branch registers and GPRs. The load and store instructions only move data to/from the GPRs, so if a branch register needs to be spilled it must first be moved to a GPR and then stored to memory. I've implemented a mov pattern in the machine description file for the GPRs and a mov pattern between GPRs and branch registers; however, I'm not sure whether I need to add more to model the behavior described above and, if so, how to do it. Thanks, Revital
Generated files and patches
Hi, someone told me that generated files should not be included in patches. It would be nice if this were mentioned at http://gcc.gnu.org/contribute.html Have a nice day! -- Sebastian Huber, embedded brains GmbH Address : Obere Lagerstr. 30, D-82178 Puchheim, Germany Phone : +49 89 18 90 80 79-6 Fax : +49 89 18 90 80 79-9 E-Mail : sebastian.hu...@embedded-brains.de PGP : Public key available on request. This message is not a business communication within the meaning of the EHUG.
Re: Generated files and patches
On 24 June 2010 12:34, Sebastian Huber wrote:
> Hi,
>
> someone told me that generated files should be not included in patches. It
> would be nice if this is mentioned at
>
> http://gcc.gnu.org/contribute.html
>
> Have a nice day!

Yes, it would be nice. Unfortunately, I know from experience that if you care about this, you should submit a patch against the webpage. Otherwise, it will *never* get done.

Cheers,
Manuel.
Re: A question about mov pattern
On 06/24/10 02:02, Revital1 Eres wrote:
> Hello,
>
> In the new target I'm working on there are branch regs and gprs.
> The loads and store instructions are only to/from the gprs, so if a
> branch reg needs to be spilled it first needs to be moved to a gpr and
> then stored to memory. I've implemented mov pattern in the machine
> description file for the gprs and a mov pattern between gprs and branch
> regs; however I'm am not sure if I need to add more to model the behavior
> described above and if so how to do it.

Secondary reloads is the answer.

This isn't a terribly uncommon situation. Handling of the shift register (SAR) on the PA would be a good example. You can move the SAR to/from a GPR, but SAR can not be stored directly to memory. Searches for SAR in pa.c will get you a long way.

Jeff
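To make the idea behind a secondary reload concrete, here is a minimal toy model in C. It is only a sketch under stated assumptions: the enum values and both function names are hypothetical and invented for illustration, and GCC's real secondary-reload hook has a different signature (see the GCC internals manual and the SAR handling in pa.c for the real thing). The model answers one question: when a register of a given class has to be moved to or from memory, is an intermediate register needed, and of which class?

```c
#include <stdbool.h>

/* Hypothetical register classes for a toy target. These are NOT taken
   from any real GCC port; they only mirror the situation in the thread. */
enum toy_reg_class { TOY_NO_REGS, TOY_GPR_REGS, TOY_BRANCH_REGS };

/* Toy stand-in for a secondary-reload query (assumed shape, not GCC's API).
   rclass is the class of the register being reloaded; other_is_mem says
   whether the other operand of the move is a memory location.
   Returns the class of the intermediate register that is required, or
   TOY_NO_REGS when the move can be done directly. */
enum toy_reg_class
toy_secondary_reload_class(enum toy_reg_class rclass, bool other_is_mem)
{
    /* Branch registers cannot be loaded or stored directly, so a
       memory <-> branch register move has to go through a GPR. */
    if (rclass == TOY_BRANCH_REGS && other_is_mem)
        return TOY_GPR_REGS;

    /* GPR <-> memory and register <-> register moves need no help. */
    return TOY_NO_REGS;
}

/* Number of move instructions needed to spill a register of the given
   class to memory in this toy model: one direct store, or a copy into
   the intermediate register followed by a store. */
int
toy_spill_moves(enum toy_reg_class rclass)
{
    return toy_secondary_reload_class(rclass, true) == TOY_NO_REGS ? 1 : 2;
}
```

In this model, spilling a branch register costs two moves (branch register to GPR, then GPR to memory), while spilling a GPR costs one. A real port would express the same decision through the target's secondary-reload machinery rather than a helper like this.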
Re: patch: honor volatile bitfield types
(I wrote:)
> > Can we similarly promise or say something for accesses of the
> > containing struct as a whole?

No takers?

> Date: Wed, 23 Jun 2010 11:34:04 -0400
> From: DJ Delorie

> Should be the same as before, I would think.

Primarily I want them similarly defined. I wasn't expecting those accesses to be actually changed by your patches. Of course it'd be nice if they could tag along. :)

The first step is this: to check whether anyone is against them being well-defined (in GNU C terms, of course, not in ISO C terms), and perhaps whether people believe that it's obvious one way (like I do) or the other (like it seemed other people did).

Thanks BTW, for pushing through the subject as well as the patches. :)

brgds, H-P
PS. Happy midsummer!
Re: A question about mov pattern
On Thu, 2010-06-24 at 08:57 -0600, Jeff Law wrote: > On 06/24/10 02:02, Revital1 Eres wrote: > > Hello, > > > > In the new target I'm working on there are branch regs and gprs. > > The loads and store instructions are only to/from the gprs, so if a > > branch reg needs to be spilled it first needs to be moved to a gpr and > > then stored to memory. I've implemented mov pattern in the machine > > description file for the gprs and a mov pattern between gprs and branch > > regs; however I'm am not sure if I need to add more to model the behavior > > described above and if so how to do it. > > > Secondary reloads is the answer. > > This isn't a terribly uncommon situation. Handling of the shift > register (SAR) on the PA would be a good example. You can move the SAR > to/from a GPR, but SAR can not be stored directly to memory. Searches > for SAR in pa.c will get you a long way. The same is true for the condition register on PowerPC. Peter
Massive performance regression from switching to gcc 4.5
Hi,

Just wanted to give a heads up on what might be the biggest compiler-upgrade-related performance difference we've seen at Mozilla. We switched from gcc 4.3 to gcc 4.5 and our automated benchmarking infrastructure reported a 4-19% slowdown on most of our performance metrics on 32-bit and 64-bit Linux. A lone 8% speedup was measured on the SunSpider JavaScript benchmark on 64-bit Linux.

Here are some of the slowdowns reported:
http://groups.google.com/group/mozilla.dev.tree-management/browse_thread/thread/77951ccb76b5e630#
http://groups.google.com/group/mozilla.dev.tree-management/browse_thread/thread/624246d7d900ed41#

Most of the code is compiled with -fPIC -fno-rtti -fno-exceptions -Os -freorder-blocks -fomit-frame-pointer. The only difference in 4.5 is that we link with -static-libstdc++ and compile libstdc++ with -fPIC. However, we barely make use of libstdc++, so I doubt that's the problem. We needed to link statically because 4.5 uses a handful of newer libstdc++ symbols.

We were upgrading to gcc 4.5.0 because of plugins and the fact that it can compile Firefox with PGO on (the above builds were not built with PGO). Now we have to reconsider a complete switchover to 4.5.

I'm not sure how to proceed from here,
Taras
Re: Massive performance regression from switching to gcc 4.5
On Jun 24, 2010, at 11:50 AM, Taras Glek wrote:
> Hi,
> Just wanted to give a heads up on what might be the biggest
> compiler-upgrade-related performance difference we've seen at Mozilla.
>
> We switched gcc4.3 for gcc4.5 and our automated benchmarking
> infrastructure reported 4-19% slowdown on most of our performance
> metrics on 32 and 64bit Linux. A lone 8% speedup was measured on the
> Sunspider javascript benchmark on 64bit linux.
>
> Here are some of the slowdowns reported:
> http://groups.google.com/group/mozilla.dev.tree-management/browse_thread/thread/77951ccb76b5e630#
> http://groups.google.com/group/mozilla.dev.tree-management/browse_thread/thread/624246d7d900ed41#
>
> Most of the code is compiled with -fPIC -fno-rtti -fno-exceptions -Os

Stop right there. You are compiling at -Os, which is tuned for size, not speed. So the question is whether the size went down, not whether the speed decreased. Try at -O2 and report back. I doubt we are going to make a tradeoff for speed at -Os at all.

Thanks,
Andrew Pinski

> -freorder-blocks -fomit-frame-pointer. The only difference in 4.5 is
> that we link with -static-libstdc++ and compile libstdc++ with -fPIC.
> However we barely make use of libstdc++, so I doubt that's the problem.
> We needed to link statically because of 4.5 uses a handful of newer
> libstdc++ symbols.
>
> We were upgrading to gcc 4.5.0 because of plugins and the fact that it
> can compile Firefox with PGO on(above builds were not built with PGO).
> Now we have to reconsider a complete switchover to 4.5.
>
> I'm not sure how to proceed from here,
> Taras
Re: Massive performance regression from switching to gcc 4.5
On 6/24/10 3:06 PM, Andrew Pinski wrote:
>> Most of the code is compiled with -fPIC -fno-rtti -fno-exceptions -Os
>
> Stop right there. You are compiling at -Os, that is tuned for size and
> not speed. So the question is did the size go down? Not the speed
> decreased. Try at -O2 and report back. I doubt we are going to do a
> tradeoff for speed at -Os at all.

For what it's worth, Mozilla compiled with GCC has historically been faster at -Os than at -O2. This is because the vast majority of our code is cold, and -O2 has produced substantially larger code, which causes our hot code to be evicted from processor caches more often. We will definitely try -O2 to see whether the previous measurements are no longer valid with GCC 4.5.

Looking through our codesize comparison logs, some of our methods are thousands of bytes longer with GCC 4.5 than with 4.3 (same -Os compiler flags):

+796  nsHTMLEditRules::nsHTMLEditRules()
+1088 nsCrypto::GenerateCRMFRequest(nsIDOMCRMFObject**)

In addition, it appears at first glance that GCC is either no longer inlining at -Os, even when it would be a size advantage to do so, or is making some very poor inlining choices.

e.g. +72 nsTArray::nsTArray(nsTArray const&)

We can turn some of these observations into bug reports if that would be helpful, but if it would make more sense we could perhaps just tune the inlining parameters directly to get the "real -Os" that we usually want.

--BDS
Re: Massive performance regression from switching to gcc 4.5
> In addition, it appears at first glance that GCC is either no longer
> inlining at -Os, even when it would be a size advantage to do so, or is
> making some very poor inlining choices.
>
> e.g. +72 nsTArray::nsTArray(nsTArray const&)
>
> We can turn some of these observations into bug reports if that would be
> helpful, but if it would make more sense we could perhaps just tune the
> inlining parameters directly to get the "real -Os" that we usually want.

We ran into similar inlining regressions in Ada, the heuristics have indeed changed significantly. The attached patchlet alone saves 3% in code size at -Os on a 50 MB executable and yields a 5% speedup at -O2 on another code.

	* ipa-inline.c (likely_eliminated_by_inlining_p): Really consider that
	loads from parameters passed by reference are free after inlining.

-- 
Eric Botcazou

*** gcc/ipa-inline.c.0 2010-06-12 17:01:09.0 +0200
--- gcc/ipa-inline.c 2010-06-12 18:26:32.0 +0200
*************** likely_eliminated_by_inlining_p (gimple
*** 1736,1754 ****
          bool rhs_free = false;
          bool lhs_free = false;
  
!         while (handled_component_p (inner_lhs) || TREE_CODE (inner_lhs) == INDIRECT_REF)
            inner_lhs = TREE_OPERAND (inner_lhs, 0);
!         while (handled_component_p (inner_rhs)
!                || TREE_CODE (inner_rhs) == ADDR_EXPR || TREE_CODE (inner_rhs) == INDIRECT_REF)
            inner_rhs = TREE_OPERAND (inner_rhs, 0);
- 
  
          if (TREE_CODE (inner_rhs) == PARM_DECL
              || (TREE_CODE (inner_rhs) == SSA_NAME
                  && SSA_NAME_IS_DEFAULT_DEF (inner_rhs)
                  && TREE_CODE (SSA_NAME_VAR (inner_rhs)) == PARM_DECL))
            rhs_free = true;
!         if (rhs_free && is_gimple_reg (lhs))
            lhs_free = true;
          if (((TREE_CODE (inner_lhs) == PARM_DECL
                || (TREE_CODE (inner_lhs) == SSA_NAME
--- 1736,1757 ----
          bool rhs_free = false;
          bool lhs_free = false;
  
!         while (handled_component_p (inner_lhs)
!                || TREE_CODE (inner_lhs) == INDIRECT_REF)
            inner_lhs = TREE_OPERAND (inner_lhs, 0);
!         while (handled_component_p (inner_rhs)
!                || TREE_CODE (inner_rhs) == ADDR_EXPR
!                || TREE_CODE (inner_rhs) == INDIRECT_REF)
            inner_rhs = TREE_OPERAND (inner_rhs, 0);
  
          if (TREE_CODE (inner_rhs) == PARM_DECL
              || (TREE_CODE (inner_rhs) == SSA_NAME
                  && SSA_NAME_IS_DEFAULT_DEF (inner_rhs)
                  && TREE_CODE (SSA_NAME_VAR (inner_rhs)) == PARM_DECL))
            rhs_free = true;
!         if (rhs_free
!             && (is_gimple_reg (lhs)
!                 || !is_gimple_reg_type (TREE_TYPE (lhs))))
            lhs_free = true;
          if (((TREE_CODE (inner_lhs) == PARM_DECL
                || (TREE_CODE (inner_lhs) == SSA_NAME
*************** likely_eliminated_by_inlining_p (gimple
*** 1759,1765 ****
              || (TREE_CODE (inner_lhs) == SSA_NAME
                  && TREE_CODE (SSA_NAME_VAR (inner_lhs)) == RESULT_DECL))
            lhs_free = true;
!         if (lhs_free && (is_gimple_reg (rhs) || is_gimple_min_invariant (rhs)))
            rhs_free = true;
          if (lhs_free && rhs_free)
            return true;
--- 1762,1771 ----
              || (TREE_CODE (inner_lhs) == SSA_NAME
                  && TREE_CODE (SSA_NAME_VAR (inner_lhs)) == RESULT_DECL))
            lhs_free = true;
!         if (lhs_free
!             && (is_gimple_reg (rhs)
!                 || !is_gimple_reg_type (TREE_TYPE (rhs))
!                 || is_gimple_min_invariant (rhs)))
            rhs_free = true;
          if (lhs_free && rhs_free)
            return true;
Re: Massive performance regression from switching to gcc 4.5
On Thu, Jun 24, 2010 at 10:24 PM, Eric Botcazou wrote:
>> In addition, it appears at first glance that GCC is either no longer
>> inlining at -Os, even when it would be a size advantage to do so, or is
>> making some very poor inlining choices.
>>
>> e.g. +72 nsTArray::nsTArray(nsTArray const&)
>>
>> We can turn some of these observations into bug reports if that would be
>> helpful, but if it would make more sense we could perhaps just tune the
>> inlining parameters directly to get the "real -Os" that we usually want.
>
> We ran into similar inlining regressions in Ada, the heuristics have indeed
> changed significantly. The attached patchlet alone saves 3% in code size
> at -Os on a 50 MB executable and yields a 5% speedup at -O2 on another code.
>
>
> * ipa-inline.c (likely_eliminated_by_inlining_p): Really consider that
> loads from parameters passed by reference are free after inlining.

I don't understand this change. Minus whitespace changes it seems to be

!         if (lhs_free && (is_gimple_reg (rhs) || is_gimple_min_invariant (rhs)))
            rhs_free = true;

vs.

!         if (lhs_free
!             && (is_gimple_reg (rhs)
!                 || !is_gimple_reg_type (TREE_TYPE (rhs))
!                 || is_gimple_min_invariant (rhs)))
            rhs_free = true;

so the stmt is likely being eliminated if either the LHS or the RHS is based on a parameter and the other side is a register or an invariant. You change that to also treat aggregate stores/loads to/from parameters as free. Which you could have simplified to just say

          if (lhs_free || rhs_free)
            return true;

and drop the code you are changing.

I never considered the heuristic of making loads/stores from parameters free a very good one. It makes *p free but not *(p+1), for example. I would rather have seen the call stmt's actual argument list be considered.

There are btw. some bugs wrt accounting of functions called once being inlined in 4.5 which were fixed on trunk and which allow extra inlining. See

2010-04-13  Jan Hubicka

	* ipa-inline.c (cgraph_mark_inline_edge): Avoid double accounting
	of optimized out static functions.
	...

Richard.
gcc-4.5-20100624 is now available
Snapshot gcc-4.5-20100624 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20100624/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch revision 161342

You'll find:

gcc-4.5-20100624.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.5-20100624.tar.bz2       C front end and core compiler
gcc-ada-4.5-20100624.tar.bz2        Ada front end and runtime
gcc-fortran-4.5-20100624.tar.bz2    Fortran front end and runtime
gcc-g++-4.5-20100624.tar.bz2        C++ front end and runtime
gcc-java-4.5-20100624.tar.bz2       Java front end and runtime
gcc-objc-4.5-20100624.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.5-20100624.tar.bz2  The GCC testsuite

Diffs from 4.5-20100617 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: Massive performance regression from switching to gcc 4.5
On 25/06/10 06:39, Richard Guenther wrote: > There are btw. some bugs wrt accounting of functions called once > being inlined in 4.5 which were fixed on trunk which allow extra > inlining. > Are these changes likely to make it onto the 4.5 branch and into (say) 4.5.1? j.
Re: Massive performance regression from switching to gcc 4.5
> Minus whitespace changes it seems to be
>
> !         if (lhs_free && (is_gimple_reg (rhs) ||
> is_gimple_min_invariant (rhs)))
>             rhs_free = true;
>
> vs.
>
> !         if (lhs_free
> !             && (is_gimple_reg (rhs)
> !                 || !is_gimple_reg_type (TREE_TYPE (rhs))
> !                 || is_gimple_min_invariant (rhs)))
>             rhs_free = true;
>
> so the stmt is likely being eliminated if either the LHS or the RHS is
> based on a parameter and the other side is a register or an invariant. You
> change that to also discount aggregate stores/loads to/from parameters to
> be free.

There is also the counterpart for the RHS:

!         if (rhs_free && is_gimple_reg (lhs))
            lhs_free = true;

vs

!         if (rhs_free
!             && (is_gimple_reg (lhs)
!                 || !is_gimple_reg_type (TREE_TYPE (lhs))))
            lhs_free = true;

> Which you could have simplified to just say
>
>           if (lhs_free || rhs_free)
>             return true;
>
> and drop the code you are changing.

I don't think so, compare your version and mine for scalar stores/loads from/to parameters or return values.

-- 
Eric Botcazou
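The disagreement over the simplification can be checked with a small boolean model in C. This is a sketch under assumptions, not GCC code: the struct and function names are hypothetical, each flag stands in for one predicate quoted in the thread, and the model says nothing about which flag combinations actually occur in valid GIMPLE. `toy_patched` follows the order of the tests in the patched function; `toy_simplified` is the proposed `lhs_free || rhs_free` form with the two modified tests dropped.

```c
#include <stdbool.h>

/* Hypothetical summary of one assignment "lhs = rhs". Each flag is a
   stand-in for a predicate from the thread, evaluated ahead of time. */
struct toy_stmt {
    bool lhs_param_or_result; /* inner_lhs is based on a parameter or the result */
    bool rhs_param;           /* inner_rhs is based on a parameter */
    bool lhs_is_reg;          /* is_gimple_reg (lhs) */
    bool lhs_reg_type;        /* is_gimple_reg_type (TREE_TYPE (lhs)) */
    bool rhs_is_reg;          /* is_gimple_reg (rhs) */
    bool rhs_reg_type;        /* is_gimple_reg_type (TREE_TYPE (rhs)) */
    bool rhs_invariant;       /* is_gimple_min_invariant (rhs) */
};

/* Model of the patched heuristic: a side only becomes "free" through the
   other side if it is a register, an aggregate, or (for the RHS) an
   invariant. Both sides must end up free. */
bool
toy_patched(const struct toy_stmt *s)
{
    bool rhs_free = s->rhs_param;
    bool lhs_free = false;

    if (rhs_free && (s->lhs_is_reg || !s->lhs_reg_type))
        lhs_free = true;
    if (s->lhs_param_or_result)
        lhs_free = true;
    if (lhs_free
        && (s->rhs_is_reg || !s->rhs_reg_type || s->rhs_invariant))
        rhs_free = true;

    return lhs_free && rhs_free;
}

/* Model of the proposed simplification: either side being based on a
   parameter (or the result) is enough. */
bool
toy_simplified(const struct toy_stmt *s)
{
    return s->lhs_param_or_result || s->rhs_param;
}
```

In this model the two versions agree when the other side is a register or an aggregate, but they differ when the RHS is parameter-based and the LHS is a scalar that is not a register: `toy_patched` returns false while `toy_simplified` returns true. That is one reading of the "scalar stores/loads" objection above.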