Re: "ld -r" on mixed IR/non-IR objects (
On 08/12/2010 18:40, Andi Kleen wrote:

> Fat LTO is just too slow. I suspect with that kind of performance
> penalty most people simply would not use it at all.

How slow is "too" slow? How many people out of a hundred won't use it?
Got numbers, or just a gut feeling?

cheers,
DaveK
Re: "ld -r" on mixed IR/non-IR objects (
On Wed, 8 Dec 2010, Andrew Pinski wrote:

> On Wed, Dec 8, 2010 at 10:40 AM, Andi Kleen wrote:
> > The gcc maintainers unfortunately didn't want to integrate the
> > wrapper scripts to make it easy, but they can be always downloaded
> > separately and I assume distributions will eventually ship
> > them anyways.
>
> No we do just not as scripts. We want real programs rather shell
> based scripts so it is more portable.

And programs that take proper account of the transformed name under which
the compiler driver they are running will be installed, rather than
hardcoding "gcc" (and bad assumptions that the presence of "xgcc" means a
GCC build directory), to mention another piece of my feedback that was
ignored in the resubmission.

--
Joseph S. Myers
jos...@codesourcery.com
Re: "ld -r" on mixed IR/non-IR objects (
On Wed, Dec 8, 2010 at 10:40 AM, Andi Kleen wrote:
> The gcc maintainers unfortunately didn't want to integrate the
> wrapper scripts to make it easy, but they can be always downloaded
> separately and I assume distributions will eventually ship
> them anyways.

No we do just not as scripts. We want real programs rather shell
based scripts so it is more portable.

-- Pinski
Re: wrong output of print_generic_decl() called from a plugin
On Wed, Dec 8, 2010 at 1:52 PM, I wrote:

> This outputs "static void barfunc (int);" but the function is neither
> static nor does it expect only one int parameter...

here's another example where print_generic_decl() fails:

---
typedef void (*Handler)( int , void * );
Handler GetFunctionPointer();
---

This would output
"extern void (*Handler) (int, void *) GetFunctionPointer (void);"

Any other function I could use that is more reliable?

Thanks,
Joachim
Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org
On Wed, Dec 08, 2010 at 09:16:23PM +0100, Richard Guenther wrote: > On Wed, 8 Dec 2010, Jack Howarth wrote: > > > On Wed, Dec 08, 2010 at 01:44:38PM -0500, Dennis Clarke wrote: > > > > > > > On Wed, Dec 08, 2010 at 02:42:56PM +0100, Richard Guenther wrote: > > > >> > > > > This was built against ppl 0.10.2 and cloog 0.15.10. > > > > > > Have you tried a bootstrap with neither ppl nor cloog ? I have yet to see > > > their value and I generally exclude them. This results ( thus far ) in > > > nice clean bootstrap builds. > > > > > > > Dennis, > >Considering that distros like Fedora ship their gcc's with graphite > > support built-in, allowing graphite to regress like this between gcc > > maintenance releases doesn't seem like a very good idea. > > The SUSE builds look fine. You have to investigate why it doesn't > work for you, but it won't hold the 4.5.2 release. Are your > ppl and cloog testsuite runs clean? Did you by chance build them > with a different GCC release (and thus libstdc++)? Richard, I see the problem now and it confirms my fears about the loose version control on gcc vs ppl vs cloog. I had built a cloog deb package against a ppl2 0.11 package but forgot that and reinstalled the ppl 0.10.2 package. This resulted in a build of gcc with... 
[MacPro:gcc/x86_64-apple-darwin10.5.0/4.5.2] howarth% otool -L cc1
cc1:
        /sw/lib/libintl.8.dylib (compatibility version 9.0.0, current version 9.2.0)
        /sw/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
        /sw/lib/libcloog.0.dylib (compatibility version 1.0.0, current version 1.0.0)
        /sw/lib/libppl_c.2.dylib (compatibility version 4.0.0, current version 4.0.0)
        /sw/lib/libppl.7.dylib (compatibility version 9.0.0, current version 9.0.0)
        /sw/lib/libgmpxx.4.dylib (compatibility version 6.0.0, current version 6.2.0)
        /sw/lib/libmpc.2.dylib (compatibility version 3.0.0, current version 3.0.0)
        /sw/lib/libmpfr.1.dylib (compatibility version 4.0.0, current version 4.2.0)
        /sw/lib/libgmp.3.dylib (compatibility version 9.0.0, current version 9.2.0)
        /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.3)
        /usr/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 625.0.0)
        /sw/lib/gcc4.5/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 125.2.1)

[MacPro:gcc/x86_64-apple-darwin10.5.0/4.5.2] howarth% otool -L /sw/lib/libcloog.0.dylib
/sw/lib/libcloog.0.dylib:
        /sw/lib/libcloog.0.dylib (compatibility version 1.0.0, current version 1.0.0)
        /sw/lib/libgmp.3.dylib (compatibility version 9.0.0, current version 9.2.0)
        /sw/lib/libppl_c.4.dylib (compatibility version 5.0.0, current version 5.0.0)
        /sw/lib/libppl.9.dylib (compatibility version 10.0.0, current version 10.0.0)
        /sw/lib/libgmpxx.4.dylib (compatibility version 6.0.0, current version 6.2.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 125.2.1)

I believe in the past I may have tested FSF gcc built against ppl 0.11 vs
a cloog built against ppl 0.10.2 and that worked. Apparently it is the
inverse that breaks graphite (ie FSF built against ppl 0.10.2 vs a cloog
built against ppl 0.11).

Jack

>
> Thanks,
> Richard.
> > > Jack > > > > > > > > -- > > > Dennis Clarke > > > dcla...@opensolaris.ca <- Email related to the open source Solaris > > > dcla...@blastwave.org <- Email related to open source for Solaris > > > > > > > > > -- > Richard Guenther > Novell / SUSE Labs > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus > Rex
Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org
> On Wed, 8 Dec 2010, Jack Howarth wrote: >> On Wed, Dec 08, 2010 at 01:44:38PM -0500, Dennis Clarke wrote: >> > > On Wed, Dec 08, 2010 at 02:42:56PM +0100, Richard Guenther wrote: >> > >> >> > > This was built against ppl 0.10.2 and cloog 0.15.10. >> > >> > Have you tried a bootstrap with neither ppl nor cloog ? I have yet to >> see >> > their value and I generally exclude them. This results ( thus far ) in >> > nice clean bootstrap builds. >> Dennis, >>Considering that distros like Fedora ship their gcc's with graphite >> support built-in, allowing graphite to regress like this between gcc maintenance releases doesn't seem like a very good idea. > > The SUSE builds look fine. You have to investigate why it doesn't work for you, but it won't hold the 4.5.2 release. Are your > ppl and cloog testsuite runs clean? Did you by chance build them with a different GCC release (and thus libstdc++)? > > Thanks, > Richard. Good question ! I generally do a double bootstrap in which my first build is done with a previous version of GCC. Once I see reasonable testsuite results I then use the resultant compiler from the first bootstrap to build the "release" version. This then explains why the compiler that build GCC 4.5.1 on Solaris 8 is in fact, GCC 4.5.1 : http://gcc.gnu.org/ml/gcc-testresults/2010-09/msg02183.html However, having said all this I have yet to see either the ppl or cloog software components build once on the legacy Solaris platform I must support baseline legacy Solaris 8 which in turn assures functionality upwards to Solaris 10 and possibly 11. http://www.blastwave.org/jir/pkgcontents.ftd?software=gcc4&style=brief&state=5&arch=sparc -- Dennis Clarke dcla...@opensolaris.ca <- Email related to the open source Solaris dcla...@blastwave.org <- Email related to open source for Solaris
Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org
On Wed, 8 Dec 2010, Jack Howarth wrote: > On Wed, Dec 08, 2010 at 01:44:38PM -0500, Dennis Clarke wrote: > > > > > On Wed, Dec 08, 2010 at 02:42:56PM +0100, Richard Guenther wrote: > > >> > > > This was built against ppl 0.10.2 and cloog 0.15.10. > > > > Have you tried a bootstrap with neither ppl nor cloog ? I have yet to see > > their value and I generally exclude them. This results ( thus far ) in > > nice clean bootstrap builds. > > > > Dennis, >Considering that distros like Fedora ship their gcc's with graphite > support built-in, allowing graphite to regress like this between gcc > maintenance releases doesn't seem like a very good idea. The SUSE builds look fine. You have to investigate why it doesn't work for you, but it won't hold the 4.5.2 release. Are your ppl and cloog testsuite runs clean? Did you by chance build them with a different GCC release (and thus libstdc++)? Thanks, Richard. > Jack > > > > > -- > > Dennis Clarke > > dcla...@opensolaris.ca <- Email related to the open source Solaris > > dcla...@blastwave.org <- Email related to open source for Solaris > > > > -- Richard Guenther Novell / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex
Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org
On Wed, 8 Dec 2010, David Fang wrote: > Hi, > Is there time to include the 4.5 backport patch for: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46170 > > (is fixed on trunk, a 4.5.0 regression, 4.4.3 branch regression) > The comments indicate that the patch is good to go for 4.5, but I didn't see > an entry log that it was actually committed. We'll fix it for 4.5.3, the patch seems pretty big so is not appropriate at this stage. Richard. > Fang > > > A release candidate for GCC 4.5.2 is available from > > > > ftp://gcc.gnu.org/pub/gcc/snapshots/4.5.2-RC-20101208 > > > > and shortly its mirrors. It has been generated from SVN revision 167585. > > > > I have so far bootstrapped and tested the release candidate on > > x86_64-linux, bootstraps and tests on > > {i686,ia64,ppc,ppc64,s390,s390x}-linux are running. > > > > Please test it and report any issues to bugzilla. > > > > The branch remains frozen and all checkins until after the final release > > of GCC 4.5.2 require explicit RM approval. > > > > If all goes well, I'd like to release 4.5.2 early next week. > > > > > > Richard. > > > > > > David Fang > http://www.csl.cornell.edu/~fang/ > http://www.achronix.com/ > > -- Richard Guenther Novell / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex
Re: PowerPC optimization regression
Joakim Tjernlund writes:

> I already sent in a bug with gccbug, hope it shows up
> How long do one have to wait until it is visible?

The gccbug script no longer works and has been removed from current
versions of gcc. You should get a bounce message. Please use
http://gcc.gnu.org/bugzilla/ instead, as described at
http://gcc.gnu.org/bugs/ .

Sorry for the confusion.

Ian
rsync'd repo size
http://gcc.gnu.org/rsync.html says 17 Gb. I just did it, and it's up to 22 Gb.
Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org
> On Wed, Dec 08, 2010 at 01:44:38PM -0500, Dennis Clarke wrote: >> >> > On Wed, Dec 08, 2010 at 02:42:56PM +0100, Richard Guenther wrote: >> >> >> > This was built against ppl 0.10.2 and cloog 0.15.10. >> >> Have you tried a bootstrap with neither ppl nor cloog ? I have yet to >> see >> their value and I generally exclude them. This results ( thus far ) in >> nice clean bootstrap builds. >> > > Dennis, >Considering that distros like Fedora ship their gcc's with graphite > support built-in, allowing graphite to regress like this between gcc > maintenance releases doesn't seem like a very good idea. > Jack Of course I agree completely. -- Dennis Clarke dcla...@opensolaris.ca <- Email related to the open source Solaris dcla...@blastwave.org <- Email related to open source for Solaris
Re: Making a new port
"viv0411.par...@gmail.com" writes: > Sir i plan to make gcc port for android. I only know c++. Please tell me how > should i make. There already is a gcc port for Android. If you mean that you want to build gcc for the Android target, see http://gcc.gnu.org/install/ . Please take any questions to the mailing list gcc-h...@gcc.gnu.org, rather than g...@gcc.gnu.org. Thanks. Ian
Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org
On Wed, Dec 8, 2010 at 11:05 AM, Jack Howarth wrote: > On Wed, Dec 08, 2010 at 01:44:38PM -0500, Dennis Clarke wrote: >> >> > On Wed, Dec 08, 2010 at 02:42:56PM +0100, Richard Guenther wrote: >> >> >> > This was built against ppl 0.10.2 and cloog 0.15.10. >> >> Have you tried a bootstrap with neither ppl nor cloog ? I have yet to see >> their value and I generally exclude them. This results ( thus far ) in >> nice clean bootstrap builds. >> > > Dennis, > Considering that distros like Fedora ship their gcc's with graphite > support built-in, allowing graphite to regress like this between gcc > maintenance releases doesn't seem like a very good idea. > Jack > graphite tests on 4.5 branch seem OK for Fedora 14/x86-64: http://gcc.gnu.org/ml/gcc-testresults/2010-12/msg00662.html gcc 4.5 configure reports: checking for the correct version of gmp.h... yes checking for the correct version of mpfr.h... yes checking for the correct version of mpc.h... yes checking for the correct version of the gmp/mpfr/mpc libraries... yes checking for version 0.10 of PPL... yes checking for version 0.15.5 (or later revision) of CLooG... buggy but acceptable -- H.J.
Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org
On Wed, Dec 08, 2010 at 01:44:38PM -0500, Dennis Clarke wrote: > > > On Wed, Dec 08, 2010 at 02:42:56PM +0100, Richard Guenther wrote: > >> > > This was built against ppl 0.10.2 and cloog 0.15.10. > > Have you tried a bootstrap with neither ppl nor cloog ? I have yet to see > their value and I generally exclude them. This results ( thus far ) in > nice clean bootstrap builds. > Dennis, Considering that distros like Fedora ship their gcc's with graphite support built-in, allowing graphite to regress like this between gcc maintenance releases doesn't seem like a very good idea. Jack > > -- > Dennis Clarke > dcla...@opensolaris.ca <- Email related to the open source Solaris > dcla...@blastwave.org <- Email related to open source for Solaris >
wrong output of print_generic_decl() called from a plugin
While testing how to parse C and C++ code for function prototypes from a
plugin (see http://gcc.gnu.org/ml/gcc/2010-12/msg00179.html) I noticed
that print_generic_decl() seems to output wrong data. Consider the
following function definition:

--
void barfunc (int foo, int abc, ... )
{
}
--

This outputs "static void barfunc (int);" but the function is neither
static nor does it expect only one int parameter...

Am I doing something wrong? I am calling
"print_generic_decl(file, decl, 0);" from the PLUGIN_PRE_GENERICIZE hook
and this is gcc version 4.5.1 (GCC) on Solaris.

Thanks,
Joachim
Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org
> On Wed, Dec 08, 2010 at 02:42:56PM +0100, Richard Guenther wrote: >> > This was built against ppl 0.10.2 and cloog 0.15.10. Have you tried a bootstrap with neither ppl nor cloog ? I have yet to see their value and I generally exclude them. This results ( thus far ) in nice clean bootstrap builds. -- Dennis Clarke dcla...@opensolaris.ca <- Email related to the open source Solaris dcla...@blastwave.org <- Email related to open source for Solaris
Re: "ld -r" on mixed IR/non-IR objects (
> As someone who encountered slim LTO on Unix 17 years ago (on MIPS) I can > promise you that unless fat LTO is supported, there will never be a Fat LTO is just too slow. I suspect with that kind of performance penalty most people simply would not use it at all. > successful transition. The amount of work to deal with the make > environment every time simply made it not worth it. It's not too hard in my experience. I did it in a few cases for gcc. The gcc maintainers unfortunately didn't want to integrate the wrapper scripts to make it easy, but they can be always downloaded separately and I assume distributions will eventually ship them anyways. -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org
On Wed, Dec 08, 2010 at 02:42:56PM +0100, Richard Guenther wrote:
>
> A release candidate for GCC 4.5.2 is available from
>
> ftp://gcc.gnu.org/pub/gcc/snapshots/4.5.2-RC-20101208
>
> and shortly its mirrors. It has been generated from SVN revision 167585.
>
> I have so far bootstrapped and tested the release candidate on
> x86_64-linux, bootstraps and tests on
> {i686,ia64,ppc,ppc64,s390,s390x}-linux are running.
>
> Please test it and report any issues to bugzilla.
>
> The branch remains frozen and all checkins until after the final release
> of GCC 4.5.2 require explicit RM approval.
>
> If all goes well, I'd like to release 4.5.2 early next week.

Richard,
I am seeing a large number of regressions in gcc-4.5.2-RC-20101208 on
x86_64-apple-darwin10 in the graphite tests. So far I have...

=== gcc Summary for unix/-m32 ===

# of expected passes            70022
# of unexpected failures        227
# of expected failures          175
# of unresolved testcases       36
# of unsupported tests          1281

The failures all seem to be of the form...

Executing on host: /sw/src/fink.build/gcc45-4.5.2-1000/darwin_objdir/gcc/xgcc -B/sw/src/fink.build/gcc45-4.5.2-1000/darwin_objdir/gcc/ /sw/src/fink.build/gcc45-4.5.2-1000/gcc-4.5.2-RC-20101208/gcc/testsuite/gcc.dg/graphite/scop-0.c -O2 -fgraphite -fdump-tree-graphite-all -S -m32 -o scop-0.s (timeout = 300)
/sw/src/fink.build/gcc45-4.5.2-1000/gcc-4.5.2-RC-20101208/gcc/testsuite/gcc.dg/graphite/scop-0.c: In function 'toto':
/sw/src/fink.build/gcc45-4.5.2-1000/gcc-4.5.2-RC-20101208/gcc/testsuite/gcc.dg/graphite/scop-0.c:4:5: internal compiler error: Segmentation fault

which backtraces as...

gdb /sw/src/fink.build/gcc45-4.5.2-1000/darwin_objdir/gcc/cc1
GNU gdb 6.3.50-20050815 (Apple version gdb-1472) (Wed Jul 21 10:53:12 UTC 2010)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries ... warning: Could not find object file "/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/block.o" - no debug information available for "source/block.c". warning: Could not find object file "/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/loop.o" - no debug information available for "source/loop.c". warning: Could not find object file "/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/names.o" - no debug information available for "source/names.c". warning: Could not find object file "/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/options.o" - no debug information available for "source/options.c". warning: Could not find object file "/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/clast.o" - no debug information available for "source/ppl/clast.c". warning: Could not find object file "/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/domain.o" - no debug information available for "source/ppl/domain.c". warning: Could not find object file "/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/matrix.o" - no debug information available for "source/ppl/matrix.c". warning: Could not find object file "/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/pprint.o" - no debug information available for "source/pprint.c". warning: Could not find object file "/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/program.o" - no debug information available for "source/program.c". warning: Could not find object file "/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/statement.o" - no debug information available for "source/statement.c". warning: Could not find object file "/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/version.o" - no debug information available for "source/version.c". 
..b done ^R (gdb) break fancy_abort Breakpoint 1 at 0x10034a400: file ../../gcc-4.5.2-RC-20101208/gcc/diagnostic.c, line 762. (gdb) r -quiet -v -imultilib i386 -iprefix /sw/src/fink.build/gcc45-4.5.2-1000/darwin_objdir/gcc/../lib/gcc/x86_64-apple-darwin10.5.0/4.5.2/ -isystem /sw/src/fink.build/gcc45-4.5.2-1000/darwin_objdir/gcc/include -isystem /sw/src/fink.build/gcc45-4.5.2-1000/darwin_objdir/gcc/include-fixed -D__DYNAMIC__ /sw/src/fink.build/gcc45-4.5.2-1000/gcc-4.5.2-RC-20101208/gcc/testsuite/gcc.dg/graphite/scop-0.c -fPIC -quiet -dumpbase scop-0.c -mm
Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org
Hi, Is there time to include the 4.5 backport patch for: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46170 (is fixed on trunk, a 4.5.0 regression, 4.4.3 branch regression) The comments indicate that the patch is good to go for 4.5, but I didn't see an entry log that it was actually committed. Fang A release candidate for GCC 4.5.2 is available from ftp://gcc.gnu.org/pub/gcc/snapshots/4.5.2-RC-20101208 and shortly its mirrors. It has been generated from SVN revision 167585. I have so far bootstrapped and tested the release candidate on x86_64-linux, bootstraps and tests on {i686,ia64,ppc,ppc64,s390,s390x}-linux are running. Please test it and report any issues to bugzilla. The branch remains frozen and all checkins until after the final release of GCC 4.5.2 require explicit RM approval. If all goes well, I'd like to release 4.5.2 early next week. Richard. David Fang http://www.csl.cornell.edu/~fang/ http://www.achronix.com/
Re: "ld -r" on mixed IR/non-IR objects (
On Wed, Dec 8, 2010 at 1:19 AM, Andi Kleen wrote: >> On 12/07/2010 04:20 PM, Andi Kleen wrote: >>> >>> The only problem left is mixing of lto and non lto objects. this right >>> now is not handled. IMHO still the best way to handle it is to use >>> slim lto and then simply separate link the "left overs" after deleting >>> the LTO objects. This can be actually done with objcopy (with some >>> limitations), doesn't even need linker support. I agree that FAT lto objects are not necessary to make the everything work and the integration of LTO with existing build environment 'transparent' --- there is compiler out in the world that does just that -- produces IL only objects (wrapped in elf format); works with archives with mixed objects; works with ld -r with mixed objects; builds unix kernel successfully with LTO ... David >>> >> >> Quite possibly a better way to deal with that is to provide a mechanism >> for encapsulating arbitrary binary code objects inside the LTO IR. > > Then you would need to teach your assembler and everything > else that may generate ELF objects to generate this magic object. But why > not just ELF directly? that is what it is after all. > > To be honest I don't really see the point of all this complexity you > guys are proposing just to save fat LTO. Fat LTO is always a bad idea > because it's slow and does lots of redundant work. If LTO is to become > a more wide spread mode it has to go simply because of the poor > performance. > > With slim LTO passthrough is very straight-forward: simple pass > through every section that is not LTO and generate code for the LTO > sections. No new magic sections needed at all. > > -Andi > >
Making a new port
Sir i plan to make gcc port for android. I only know c++. Please tell me how should i make.
Re: combine two load insns
On 8 December 2010 17:37, Jeff Law wrote:
> On 12/08/10 09:18, Frederic Riss wrote:
>>
>> OK, I see your point, but I tend to think the odds of the register
>> allocator being able to coalesce the additional DI->SI moves in the
>> pre-IRA approach are by far higher than the odds of having merge
>> candidates after register allocation.
>
> I agree, but note that failure to coalesce leads to code quality regression.

Well, it really depends on the architecture. Moving between SImode
registers is usually nearly free, whereas accessing the memory is so
much more costly... If your architecture has a DI sized datapath to
memory, you actually divide the memory bandwidth requirement by 2 when
you pack SI loads together. This seems like a net win to me even if you
add 1 or 2 moves to the equation.

Fred
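The bandwidth argument above can be illustrated with a small C sketch. This is not code from the thread: the function names are hypothetical, and the mapping of "SImode" to uint32_t and "DImode" to uint64_t is an assumption made for illustration. The first function issues two 32-bit loads; the second issues a single 64-bit load and then recovers the two halves with register-only operations (the "1 or 2 moves" mentioned above).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Runtime byte-order probe, so the sketch works on either endianness. */
static int is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Two separate 32-bit (SImode-like) loads: two memory accesses. */
static void load_two_si(const uint32_t *p, uint32_t *first, uint32_t *second)
{
    assert(p && first && second);
    *first = p[0];
    *second = p[1];
}

/* One 64-bit (DImode-like) load, then two register-only extractions. */
static void load_one_di(const uint32_t *p, uint32_t *first, uint32_t *second)
{
    uint64_t wide;

    assert(p && first && second);
    memcpy(&wide, p, sizeof wide);          /* single 8-byte access */

    if (is_little_endian()) {
        *first = (uint32_t)wide;            /* low half holds p[0]  */
        *second = (uint32_t)(wide >> 32);   /* high half holds p[1] */
    } else {
        *first = (uint32_t)(wide >> 32);
        *second = (uint32_t)wide;
    }
}
```

Both functions produce the same values; whether the second form is actually cheaper depends on the target, which is exactly the point under discussion.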
Re: "ld -r" on mixed IR/non-IR objects (
On 12/08/2010 01:19 AM, Andi Kleen wrote: > > To be honest I don't really see the point of all this complexity you > guys are proposing just to save fat LTO. Fat LTO is always a bad idea > because it's slow and does lots of redundant work. If LTO is to become > a more wide spread mode it has to go simply because of the poor > performance. > As someone who encountered slim LTO on Unix 17 years ago (on MIPS) I can promise you that unless fat LTO is supported, there will never be a successful transition. The amount of work to deal with the make environment every time simply made it not worth it. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf.
Re: "ld -r" on mixed IR/non-IR objects (
On 12/08/2010 01:19 AM, Andi Kleen wrote: >> >> Quite possibly a better way to deal with that is to provide a mechanism >> for encapsulating arbitrary binary code objects inside the LTO IR. > > Then you would need to teach your assembler and everything > else that may generate ELF objects to generate this magic object. But why > not just ELF directly? that is what it is after all. > No. You just need to teach the linker to generate it when you're doing a ld -r on mixed objects. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf.
Re: "ld -r" on mixed IR/non-IR objects (
On Wed, Dec 8, 2010 at 5:54 AM, H.J. Lu wrote:
> On Wed, Dec 8, 2010 at 1:19 AM, Andi Kleen wrote:
>>> On 12/07/2010 04:20 PM, Andi Kleen wrote:
>>>>
>>>> The only problem left is mixing of lto and non lto objects. this right
>>>> now is not handled. IMHO still the best way to handle it is to use
>>>> slim lto and then simply separate link the "left overs" after deleting
>>>> the LTO objects. This can be actually done with objcopy (with some
>>>> limitations), doesn't even need linker support.
>>>
>>> Quite possibly a better way to deal with that is to provide a mechanism
>>> for encapsulating arbitrary binary code objects inside the LTO IR.
>>
>> Then you would need to teach your assembler and everything
>
> The magic section is generated by linker directly. No changes to
> assembler is required.
>
>> else that may generate ELF objects to generate this magic object. But why
>> not just ELF directly? that is what it is after all.
>
> My proposal isn't specific to ELF.
>
>>
>> To be honest I don't really see the point of all this complexity you
>> guys are proposing just to save fat LTO. Fat LTO is always a bad idea
>> because it's slow and does lots of redundant work. If LTO is to become
>> a more wide spread mode it has to go simply because of the poor
>> performance.
>>
>> With slim LTO passthrough is very straight-forward: simple pass
>> through every section that is not LTO and generate code for the LTO
>> sections. No new magic sections needed at all.
>>
>
> My proposal works on both fat and slim LTO objects. The idea is
> you can use "ld -r" on any combination of inputs and its output
> still works as before "ld -r".
>

Here is the revised proposal.

--
H.J.

Link with mixed IR/non-IR objects

* 2 kinds of object files
  o non-IR object file has
    * non-IR sections
  o IR object file has
    * IR sections
    * non-IR sections
* The output of "ld -r" with mixed IR/non-IR objects should work with:
  o Compilers/linkers with IR support.
  o Compilers/linkers without IR support.
* Add the mixed object file which has
  o IR sections
  o non-IR sections:
    * Object codes from IR sections.
    * Object codes from non-IR object files.
  o Object-only section:
    * With section name ".gnu_object_only" and SHT_GNU_OBJECT_ONLY
      (0x6ff8) type on ELF.
    * Contain non-IR object file.
    * Input is discarded after link.
* Linker action:
  o Classify each input object file:
    * If there is a ".gnu_object_only" section, it is a mixed object file.
    * If there is a IR section, it is an IR object file.
    * Otherwise, it is a non-IR object file.
  o Relocatable non-IR link:
    * Prepare for an object-only output.
    * Prepare for a regular output.
    * For each mixed object file:
      * Add IR and non-IR sections to the regular output.
      * For object-only section:
        * Extract object only file.
        * Add it to the object-only output.
        * Discard object-only section.
    * For each IR object file:
      * Add IR and non-IR sections to the regular output.
    * For each non-IR object file:
      * Add non-IR sections to the regular output.
      * Add non-IR sections to the object-only output.
    * Final output:
      * If there are IR objects, non-IR objects and the object-only
        output isn't empty:
        * Put the object-only output into the object-only section.
        * Add the object-only section to the regular output.
        * Remove the object-only output.
  o Normal link and relocatable IR link:
    * Prepare for output.
    * IR link:
      * For each mixed object file:
        * Compile and add IR sections to the output.
        * Discard non-IR sections.
        * Object-only section:
          * Extract object only file.
          * Add it to the output.
          * Discard object-only section.
      * For each IR object file:
        * Compile and add IR sections to the output.
        * Discard non-IR sections.
      * For each non-IR object file:
        * Add non-IR sections to the output.
    * Non-IR link:
      * For each mixed object file:
        * Add non-IR sections to the output.
        * Discard IR sections and object-only section.
      * For each IR object file:
        * Add non-IR sections to the output.
        * Discard IR sections.
      * For each non-IR object file:
        * Add non-IR sections to the output.
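The "Classify each input object file" step of the proposal can be sketched in C as follows. This is only an illustration under stated assumptions, not linker code: the enum and function names are hypothetical, the input is simplified to a list of section names, and the ".gnu.lto_" prefix used to recognise IR sections is an assumption based on how GCC names its LTO sections (the proposal itself only says "IR section"). The ".gnu_object_only" name comes from the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three kinds of input described in the proposal. */
enum obj_kind { OBJ_NON_IR, OBJ_IR, OBJ_MIXED };

/* Section name taken from the proposal. */
#define OBJECT_ONLY_SECTION ".gnu_object_only"
/* Assumption: IR sections are recognised by GCC's LTO section prefix. */
#define IR_SECTION_PREFIX   ".gnu.lto_"

/* Classify one object file from its section names (hypothetical helper). */
enum obj_kind classify_object(const char *const *section_names, size_t count)
{
    int has_ir = 0;
    size_t i;

    assert(section_names != NULL || count == 0);

    for (i = 0; i < count; i++) {
        /* An object-only section makes it a mixed object file. */
        if (strcmp(section_names[i], OBJECT_ONLY_SECTION) == 0)
            return OBJ_MIXED;
        if (strncmp(section_names[i], IR_SECTION_PREFIX,
                    strlen(IR_SECTION_PREFIX)) == 0)
            has_ir = 1;
    }
    /* Otherwise any IR section makes it an IR object file. */
    return has_ir ? OBJ_IR : OBJ_NON_IR;
}
```

The order of the checks mirrors the order in the proposal: the object-only section is tested first, so a file that carries both IR sections and ".gnu_object_only" is treated as mixed.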
Re: software pipelining
Hi Roy,

I guess SMS didn't pipeline your loop, and the "prologue" code mentioned
in your email is an iteration peeled off from the loop. It has nothing to
do with prologue code. I think there are two reasons that can explain why
your code is not pipelined:

1. Alias information is not enough to disambiguate x and y. x and y are
   pointers from outside. Currently, at least in SMS phase, GCC does not
   know whether x aliases to y. This may prohibit GCC from pipelining your
   loop. As far as I'm aware, alias information from array data dependence
   stage is not propagated to SMS, at least I didn't find in the main
   trunk. See the last bullet in "In Progress" section in here:
   http://gcc.gnu.org/wiki/SwingModuloScheduling
   Andrey, correct me if I'm wrong.

2. GCC does not pipeline loops that contain "auto-inc/post-inc"
   operations. See line 1025 and 1039 in modulo-sched.c (gcc-4.5.1).

Please try the codelet below. It works after you comment out line 1025 in
gcc-4.5.1 and rebuild your compiler.

    void foo(void)
    {
        int ii, jj, kk;
        int R0,R1,R2,R3;
        for (ii = 1; ii < 12; ii++) {
            for (jj = 0; jj < ii; jj++) {
                (*((int *) ((char *) R3 + 0))) = R0;
                R3 += 4;
                R0 = (*((int *) ((char *) R2 + 0)));
                R2 = R2+48;
            }
        }
    }

I hope this can help you.

Gan

2010/12/8 roy rosen :
> I have tried to play a bit with SMS on ia64 and I can't understand
> what it is doing.
> It seems that instead of getting some of the first insns out of the
> loop into the prologue it simply gets an entire iteration out of the
> loop and the loop's content stays approximately the same.
>
> For example for
>
> void x(long long* y, long long* x)
> {
>     int i;
>     for (i = 0; i < 100; i++)
>     {
>         *x = *y;
>         x+=20;y+=30;
>     }
> }
>
> with ./cc1 ./a.c -O3 -fmodulo-sched.
> Can someone show an example where it actually works as it should?
>
> Roy.
>
> 2010/11/10 Andrey Belevantsev :
>> Hi,
>>
>> On 10.11.2010 12:32, roy rosen wrote:
>>>
>>> Hi,
>>>
>>> I was wondering if gcc has software pipelining.
>>> I saw options -fsel-sched-pipelining -fselective-scheduling
>>> -fselective-scheduling2 but I don't see any pipelining happening
>>> (tried with ia64).
>>> Is there a gcc VLIW port in which I can see it working?
>>
>> You need to try -fmodulo-sched. Selective scheduling works by default on
>> ia64 with -O3, otherwise you need -fselective-scheduling2
>> -fsel-sched-pipelining. Note that selective scheduling disables autoinc
>> generation for the pipelining to work, and modulo scheduling will likely
>> refuse to pipeline a loop with autoincs.
>>
>> Modulo scheduling implementation in GCC may be improved, but that's a
>> different topic.
>>
>> Andrey
>>
>>>
>>> For an example function like
>>>
>>> int nor(char* __restrict__ c, char* __restrict__ d)
>>> {
>>>     int i, sum = 0;
>>>     for (i = 0; i< 256; i++)
>>>         d[i] = c[i]<< 3;
>>>     return sum;
>>> }
>>>
>>> with no pipelining a code like
>>>
>>> r1 = 0
>>> r2 = c
>>> r3 = d
>>> _startloop
>>> if r1 == 256 jmp _end
>>> r4 = [r2]+
>>> r4>>= r4
>>> [r3]+ = r4
>>> r1++
>>> jmp _startloop
>>> _end
>>>
>>> here inside the loop there is a data dependency between all 3 insns
>>> (only the r1++ is independent) which does not permit any parallelism
>>>
>>> with pipelining I expect a code like
>>>
>>> r1 = 2
>>> r2 = c
>>> r3 = d
>>> // peel first iteration
>>> r4 = [r2]+
>>> r4>>= r4
>>> r5 = [r2]+
>>> _startloop
>>> if r1 == 256 jmp _end
>>> [r3]+ = r4 ; r4>>= r5 ; r5 = [r2]+
>>> r1++
>>> jmp _startloop
>>> _end
>>>
>>> Now the data dependecy is broken and parlallism is possible.
>>> As I said I could not see that happening.
>>> Can someone please tell me on which port and with what options can I
>>> get such a result?
>>>
>>> Thanks, Roy.
>>
>>
>

--
Best Regards
Gan
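For readers following the thread, the transformation Roy expects can be written out by hand in C. This is a minimal sketch, not GCC output and not how modulo-sched.c works internally: the function names are hypothetical, and unsigned char is used instead of char so that the shift is well defined. The restrict qualifiers state the no-alias assumption that point 1 above is about; without it, hoisting the load of c[i + 2] above the store to d[i] would not be valid.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop: load, shift and store all belong to one iteration. */
void shift_plain(const unsigned char *restrict c,
                 unsigned char *restrict d, size_t n)
{
    size_t i;

    assert((c != NULL && d != NULL) || n == 0);
    for (i = 0; i < n; i++)
        d[i] = (unsigned char)(c[i] << 3);
}

/* Hand-pipelined version with three stages: load, shift, store. */
void shift_pipelined(const unsigned char *restrict c,
                     unsigned char *restrict d, size_t n)
{
    unsigned int loaded, shifted;
    size_t i;

    if (n < 2) {                 /* too short to fill the pipeline */
        shift_plain(c, d, n);
        return;
    }

    /* Prologue: start iterations 0 and 1. */
    shifted = (unsigned int)c[0] << 3;
    loaded = c[1];

    /* Kernel: each statement works on a different iteration, and each
       reads only values produced in the previous trip through the loop. */
    for (i = 0; i + 2 < n; i++) {
        d[i] = (unsigned char)shifted;   /* store for iteration i     */
        shifted = loaded << 3;           /* shift for iteration i + 1 */
        loaded = c[i + 2];               /* load for iteration i + 2  */
    }

    /* Epilogue: drain the two iterations still in flight. */
    d[i] = (unsigned char)shifted;
    d[i + 1] = (unsigned char)(loaded << 3);
}
```

In the kernel there is no true data dependence between the three statements within a single trip, which is what lets a VLIW target issue them together.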
Re: PowerPC optimization regression
David Edelsohn wrote on 2010/12/08 17:38:11:
>
> On Wed, Dec 8, 2010 at 4:37 AM, Joakim Tjernlund wrote:
> >
> > I have noticed gcc 4.4.5 often produces less optimized code
> > than the old 3.4.6. Below is the latest example. I am
> > starting to wonder if I need rebuild gcc 4.4.5 and/or
> > add new options to gcc when I compile. Any insight?
>
> Jocke,
>
> As Ian mentioned, please open a performance regression bug report on
> GCC Bugzilla and add me, Mike Meissner and Peter Bergner to the CC
> list.
>
> This might be a lingering result of the GCC SSA transition and not a
> PowerPC-specific regression, although the symptom has more impact on
> PowerPC.

I already sent in a bug with gccbug; I hope it shows up.
How long does one have to wait until it is visible?

Jocke
Re: combine two load insns
On 12/08/10 09:43, Paul Koning wrote:
> On Dec 8, 2010, at 11:37 AM, Jeff Law wrote:
>> On 12/08/10 09:18, Frederic Riss wrote:
>>> OK, I see your point, but I tend to think the odds of the register
>>> allocator being able to coalesce the additional DI->SI moves in the
>>> pre-IRA approach are by far higher than the odds of having merge
>>> candidates after register allocation.
>> I agree, but note that failure to coalesce leads to code quality regression.
>>
>> Also note that handling of double-word values is, IMHO, the allocator's
>> biggest problem area. This has been greatly helped by Bernd's recent work,
>> but there's still significant amounts of work to do here.
> This probably has been discussed at length in the past, but as a
> relative newcomer I'll make this observation... I wonder how much is
> lost by GCC's insistence that multi-register values must be in
> adjacent registers. Obviously that's hard to change (the registers
> would have to be explicitly listed instead of implied by the first
> register number). And in some cases it is actually required. But in
> many cases, it's not (in some machines, never). And I would think
> that register allocation could benefit from not having such a
> restriction. The item in question here is just one example.

Or get even smarter about splitting up multi-word values into word-sized
component values (yes, we're starting to get off-topic here). The
subreg-lowering code does an OK job, but it has certain restrictions that
prevent it from doing a good job.

The fundamental problem with lower-subreg is that it's unable to perform
lowering at anything other than function granularity. If we added the
ability to copy-in/copy-out at regional boundaries we could lower within
regions and drastically reduce the amount of double-word operations left
in the IL.

jeff
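To make "splitting a multi-word value into word-sized components" concrete, here is a sketch in C rather than RTL: a 64-bit addition written as two 32-bit operations plus a carry. It only illustrates the idea of treating the two halves as independent values that need not live in adjacent registers; it does not describe how lower-subreg is implemented, and the type and function names are invented for the example.

```c
#include <stdint.h>

/* A 64-bit value kept as two independent 32-bit words. */
struct dword {
    uint32_t lo;
    uint32_t hi;
};

/* Add two double-word values using only word-sized operations. */
struct dword dword_add(struct dword a, struct dword b)
{
    struct dword r;
    r.lo = a.lo + b.lo;                 /* wraps modulo 2^32 */
    r.hi = a.hi + b.hi + (r.lo < a.lo); /* add the carry out of the low word */
    return r;
}

/* Reassemble the halves, for checking against native 64-bit arithmetic. */
uint64_t dword_join(struct dword v)
{
    return ((uint64_t)v.hi << 32) | v.lo;
}
```

Once the operation is expressed this way, each half is an ordinary word-sized value, which is the property the discussion above is after.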
Re: combine two load insns
Paul Koning writes: > This probably has been discussed at length in the past, but as a > relative newcomer I'll make this observation... I wonder how much is > lost by GCC's insistence that multi-register values must be in > adjacent registers. Obviously that's hard to change (the registers > would have to be explicitly listed instead of implied by the first > register number). And in some cases it is actually required. But in > many cases, it's not (in some machines, never). And I would think > that register allocation could benefit from not having such a > restriction. The item in question here is just one example. You may want to look at the lower-subreg pass. Ian
Re: combine two load insns
On Dec 8, 2010, at 11:37 AM, Jeff Law wrote:
> On 12/08/10 09:18, Frederic Riss wrote:
>>
>> OK, I see your point, but I tend to think the odds of the register
>> allocator being able to coalesce the additional DI->SI moves in the
>> pre-IRA approach are by far higher than the odds of having merge
>> candidates after register allocation.
> I agree, but note that failure to coalesce leads to code quality regression.
>
> Also note that handling of double-word values is, IMHO, the allocator's
> biggest problem area. This has been greatly helped by Bernd's recent work,
> but there's still significant amounts of work to do here.

This probably has been discussed at length in the past, but as a relative
newcomer I'll make this observation... I wonder how much is lost by GCC's
insistence that multi-register values must be in adjacent registers.
Obviously that's hard to change (the registers would have to be explicitly
listed instead of implied by the first register number). And in some cases
it is actually required. But in many cases, it's not (in some machines,
never). And I would think that register allocation could benefit from not
having such a restriction. The item in question here is just one example.

paul
Re: PowerPC optimization regression
On Wed, Dec 8, 2010 at 4:37 AM, Joakim Tjernlund wrote:
>
> I have noticed gcc 4.4.5 often produces less optimized code
> than the old 3.4.6. Below is the latest example. I am
> starting to wonder if I need rebuild gcc 4.4.5 and/or
> add new options to gcc when I compile. Any insight?

Jocke,

As Ian mentioned, please open a performance regression bug report on
GCC Bugzilla and add me, Mike Meissner and Peter Bergner to the CC
list.

This might be a lingering result of the GCC SSA transition and not a
PowerPC-specific regression, although the symptom has more impact on
PowerPC.

Thanks, David
Re: combine two load insns
On 12/08/10 09:18, Frederic Riss wrote:
> OK, I see your point, but I tend to think the odds of the register
> allocator being able to coalesce the additional DI->SI moves in the
> pre-IRA approach are by far higher than the odds of having merge
> candidates after register allocation.

I agree, but note that failure to coalesce leads to code quality regression.

Also note that handling of double-word values is, IMHO, the allocator's
biggest problem area. This has been greatly helped by Bernd's recent work,
but there's still significant amounts of work to do here.

> I agree with your suggestion of being able to do that in the scheduler
> though, it might be a good fit, even if it's not a scheduling issue in
> the first place.

It may not be a scheduling issue, but it's been known for 20 years that
GCC's scheduler has the necessary bits to do these kinds of memory
optimizations. We've just never taken the time to utilize the dependency
information available in the scheduler in any way other than to reorder
insns to improve pipeline behavior.

One could even argue that the dependency info in the scheduler should be
pushed out to other passes that could easily make use of such information.

Jeff
Re: software pipelining
I have tried to play a bit with SMS on ia64 and I can't understand
what it is doing.
It seems that instead of moving some of the first insns out of the
loop into the prologue, it simply moves an entire iteration out of the
loop, and the loop's content stays approximately the same.

For example, for

void x(long long* y, long long* x)
{
  int i;
  for (i = 0; i < 100; i++)
  {
    *x = *y;
    x+=20;y+=30;
  }
}

with ./cc1 ./a.c -O3 -fmodulo-sched.
Can someone show an example where it actually works as it should?

Roy.

2010/11/10 Andrey Belevantsev :
> Hi,
>
> On 10.11.2010 12:32, roy rosen wrote:
>>
>> Hi,
>>
>> I was wondering if gcc has software pipelining.
>> I saw options -fsel-sched-pipelining -fselective-scheduling
>> -fselective-scheduling2 but I don't see any pipelining happening
>> (tried with ia64).
>> Is there a gcc VLIW port in which I can see it working?
>
> You need to try -fmodulo-sched. Selective scheduling works by default on
> ia64 with -O3, otherwise you need -fselective-scheduling2
> -fsel-sched-pipelining. Note that selective scheduling disables autoinc
> generation for the pipelining to work, and modulo scheduling will likely
> refuse to pipeline a loop with autoincs.
>
> Modulo scheduling implementation in GCC may be improved, but that's a
> different topic.
>
> Andrey
>
>>
>> For an example function like
>>
>> int nor(char* __restrict__ c, char* __restrict__ d)
>> {
>>     int i, sum = 0;
>>     for (i = 0; i< 256; i++)
>>         d[i] = c[i]<< 3;
>>     return sum;
>> }
>>
>> with no pipelining a code like
>>
>> r1 = 0
>> r2 = c
>> r3 = d
>> _startloop
>> if r1 == 256 jmp _end
>> r4 = [r2]+
>> r4>>= r4
>> [r3]+ = r4
>> r1++
>> jmp _startloop
>> _end
>>
>> here inside the loop there is a data dependency between all 3 insns
>> (only the r1++ is independent) which does not permit any parallelism
>>
>> with pipelining I expect a code like
>>
>> r1 = 2
>> r2 = c
>> r3 = d
>> // peel first iteration
>> r4 = [r2]+
>> r4>>= r4
>> r5 = [r2]+
>> _startloop
>> if r1 == 256 jmp _end
>> [r3]+ = r4 ; r4>>= r5 ; r5 = [r2]+
>> r1++
>> jmp _startloop
>> _end
>>
>> Now the data dependency is broken and parallelism is possible.
>> As I said I could not see that happening.
>> Can someone please tell me on which port and with what options can I
>> get such a result?
>>
>> Thanks, Roy.
>
>
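The transformation Roy expects for the quoted `nor` loop can be written out by hand in C. This is a sketch of the general prologue/kernel/epilogue shape of a software-pipelined loop, not the output of any GCC version. It uses `unsigned char` instead of `char` so that the left shift is well defined for every input, takes the trip count as a parameter, and the function names are invented for the example.

```c
/* Straightforward loop: the load, shift and store of one element form a
   dependent chain inside each iteration. */
void shift_plain(const unsigned char *restrict c,
                 unsigned char *restrict d, int n)
{
    int i;
    for (i = 0; i < n; i++)
        d[i] = (unsigned char)(c[i] << 3);
}

/* Hand-pipelined version: each kernel iteration stores element i, shifts
   element i + 1 and loads element i + 2, so its three operations belong
   to three different elements. */
void shift_pipelined(const unsigned char *restrict c,
                     unsigned char *restrict d, int n)
{
    int i;
    unsigned char shifted, loaded;

    if (n < 2) {                /* too short to fill the pipeline */
        shift_plain(c, d, n);
        return;
    }

    /* Prologue: start the first two elements. */
    shifted = (unsigned char)(c[0] << 3);
    loaded = c[1];

    /* Kernel: steady state. */
    for (i = 0; i < n - 2; i++) {
        d[i] = shifted;
        shifted = (unsigned char)(loaded << 3);
        loaded = c[i + 2];
    }

    /* Epilogue: drain the two elements still in flight. */
    d[n - 2] = shifted;
    d[n - 1] = (unsigned char)(loaded << 3);
}
```

On a machine that can issue several operations per cycle, the three statements in the kernel can be issued together, which is the parallelism the message asks about.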
Re: combine two load insns
On 8 December 2010 15:39, Jeff Law wrote:
> On 12/08/10 01:40, Frederic Riss wrote:
>> Sorry, I think I wasn't clear. I didn't mean constraints in terms of
>> RTL template constraints, but 'constraints' coming from the new DI
>> destination of the load. More specifically: 2 SI loads can target
>> totally independent registers whereas a standard DI load must target a
>> contiguous SI register pair. If you don't do that before IRA, it will
>> most likely be impossible to do cleanly, won't it?
>
> I tend to look at it the other way -- prior to allocation & reload you're
> going to have two SImode pseudos and there's no way to guarantee they'll end
> up in consecutive hard registers. You'd have to create a new DImode pseudo
> as the destination of the memory load, then copy from the DImode pseudo into
> the two SImode pseudos and rely on the register allocator to allocate the
> DImode pseudo to the same hard registers as the two SImode pseudos. There's
> no guarantee that'll happen (it often will, but in the cases where it
> doesn't you end up with useless copies).
>
> With that in mind, I tend to see the right way to address this optimization
> as an optimization which runs *after* register allocation and reloading
> where we know the precise set of registers used and thus can determine if
> two SImode loads target a pair of consecutive registers and thus are
> potential candidates for merging the SImode loads into a DImode load. The
> difficulty here is the data dependency analysis, thus my suggestion that the
> scheduler's dependency analysis be used to drive this optimization.

OK, I see your point, but I tend to think the odds of the register
allocator being able to coalesce the additional DI->SI moves in the
pre-IRA approach are by far higher than the odds of having merge
candidates after register allocation.

I agree with your suggestion of being able to do that in the scheduler
though, it might be a good fit, even if it's not a scheduling issue in
the first place.

Fred
Re: combine two load insns
On 12/08/10 01:40, Frederic Riss wrote:
> On 8 December 2010 00:12, Jeff Law wrote:
>> On 12/07/10 12:29, Frédéric RISS wrote:
>>> On Tuesday 07 December 2010 at 06:18 -0700, Jeff Law wrote:
>>>> On 12/06/10 15:07, Ian Lance Taylor wrote:
>>>> Given the two loads don't have a def-use data dependency combine won't
>>>> ever get the opportunity to do anything with them. In general there is
>>>> no pass which combines insns without a true data dependency and targets
>>>> which have such insns have had to handle those combinations in machine
>>>> dependent reorg. In fact, it was the combination of independent insns
>>>> which led to the introduction of the machine dependent reorg pass eons
>>>> ago.
>>> The issue with this approach is that reorg runs very late. I suppose
>>> that if one wants to combine 2 SI loads into a DI load, it needs to be
>>> done before IRA to satisfy the generated register constraints.
>> Constraints aren't checked until after register allocation is complete --
>> they're going to be of no help in performing this optimization. Right now
>> the machine dependent reorg pass or a peephole are the only places this
>> optimization can be performed. However, I believe it would be possible to
>> make the scheduler perform this optimization with some work.
> Sorry, I think I wasn't clear. I didn't mean constraints in terms of
> RTL template constraints, but 'constraints' coming from the new DI
> destination of the load. More specifically: 2 SI loads can target
> totally independent registers whereas a standard DI load must target a
> contiguous SI register pair. If you don't do that before IRA, it will
> most likely be impossible to do cleanly, won't it?

I tend to look at it the other way -- prior to allocation & reload you're
going to have two SImode pseudos and there's no way to guarantee they'll
end up in consecutive hard registers. You'd have to create a new DImode
pseudo as the destination of the memory load, then copy from the DImode
pseudo into the two SImode pseudos and rely on the register allocator to
allocate the DImode pseudo to the same hard registers as the two SImode
pseudos. There's no guarantee that'll happen (it often will, but in the
cases where it doesn't you end up with useless copies).

With that in mind, I tend to see the right way to address this
optimization as an optimization which runs *after* register allocation and
reloading where we know the precise set of registers used and thus can
determine if two SImode loads target a pair of consecutive registers and
thus are potential candidates for merging the SImode loads into a DImode
load. The difficulty here is the data dependency analysis, thus my
suggestion that the scheduler's dependency analysis be used to drive this
optimization.

jeff

> Fred
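The post-allocation check described above (two word loads that happen to target consecutive hard registers and adjacent memory) can be sketched as a small predicate. This is a toy model, not GCC code: the `word_load` structure, its fields and `can_merge_loads` are all hypothetical, a register pair is assumed to be any two consecutively numbered registers with the low word first, and the alignment and data dependency questions that Jeff identifies as the hard part are assumed to be answered elsewhere.

```c
#include <stdbool.h>

#define WORD_BYTES 4

/* Toy description of a word-sized load after register allocation.
   All fields are hypothetical and unrelated to GCC's RTL. */
struct word_load {
    int dest_reg;   /* hard register written by the load */
    int base_reg;   /* hard register holding the base address */
    long offset;    /* byte offset from the base */
};

/* Return true if first and second could in principle be replaced by one
   double-word load into the pair starting at first.dest_reg.  Only the
   register and address shape is checked here. */
bool can_merge_loads(struct word_load first, struct word_load second)
{
    return second.dest_reg == first.dest_reg + 1       /* consecutive regs */
        && second.base_reg == first.base_reg           /* same base */
        && second.offset == first.offset + WORD_BYTES  /* adjacent words */
        && first.dest_reg != first.base_reg;           /* base not clobbered
                                                          by the first load */
}
```

Before allocation the `dest_reg` fields would be unrelated pseudos, which is why a check of this kind only becomes meaningful once hard registers are known.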
Re: PowerPC optimization regression
Joakim Tjernlund writes:
> I have noticed gcc 4.4.5 often produces less optimized code
> than the old 3.4.6. Below is the latest example. I am
> starting to wonder if I need rebuild gcc 4.4.5 and/or
> add new options to gcc when I compile. Any insight?

This question as stated is not really appropriate for the mailing list
gcc@gcc.gnu.org, which is for discussion about the development of gcc
itself. It would be appropriate for the mailing list
gcc-h...@gcc.gnu.org.

Unfortunately I don't have any good news here. Building gcc differently
won't help, and it looks like you are using appropriate options. I think
this is an optimization regression. I encourage you to file a bug report
about it.

Ian
Re: "ld -r" on mixed IR/non-IR objects (
On Wed, Dec 8, 2010 at 1:19 AM, Andi Kleen wrote:
>> On 12/07/2010 04:20 PM, Andi Kleen wrote:
>>>
>>> The only problem left is mixing of lto and non lto objects. this right
>>> now is not handled. IMHO still the best way to handle it is to use
>>> slim lto and then simply separate link the "left overs" after deleting
>>> the LTO objects. This can be actually done with objcopy (with some
>>> limitations), doesn't even need linker support.
>>>
>>
>> Quite possibly a better way to deal with that is to provide a mechanism
>> for encapsulating arbitrary binary code objects inside the LTO IR.
>
> Then you would need to teach your assembler and everything

The magic section is generated by the linker directly. No changes to
the assembler are required.

> else that may generate ELF objects to generate this magic object. But why
> not just ELF directly? that is what it is after all.

My proposal isn't specific to ELF.

>
> To be honest I don't really see the point of all this complexity you
> guys are proposing just to save fat LTO. Fat LTO is always a bad idea
> because it's slow and does lots of redundant work. If LTO is to become
> a more wide spread mode it has to go simply because of the poor
> performance.
>
> With slim LTO passthrough is very straight-forward: simple pass
> through every section that is not LTO and generate code for the LTO
> sections. No new magic sections needed at all.
>

My proposal works on both fat and slim LTO objects. The idea is that
you can use "ld -r" on any combination of inputs and its output still
works as it did before "ld -r".

--
H.J.
GCC 4.5.2 Release Candidate available from gcc.gnu.org
A release candidate for GCC 4.5.2 is available from ftp://gcc.gnu.org/pub/gcc/snapshots/4.5.2-RC-20101208 and shortly its mirrors. It has been generated from SVN revision 167585. I have so far bootstrapped and tested the release candidate on x86_64-linux, bootstraps and tests on {i686,ia64,ppc,ppc64,s390,s390x}-linux are running. Please test it and report any issues to bugzilla. The branch remains frozen and all checkins until after the final release of GCC 4.5.2 require explicit RM approval. If all goes well, I'd like to release 4.5.2 early next week. Richard. -- Richard Guenther Novell / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex
GCC 4.5 branch frozen for release (candidate)
The GCC 4.5 branch is now frozen in preparation for a release candidate of GCC 4.5.2 and a release of GCC 4.5.2 about a week later. Please refrain from checking in any patches to the branch without an explicit approval from a release manager. Thanks, Richard.
Re: question about alias-analysis in gcc 4.5
On Tue, Dec 7, 2010 at 8:31 PM, Eugen Wagner wrote:
> Hi,
> Are any kinds of flow-dependent points-to analysis computed on gimple
> in ssa form?
> in which pass?

In tree-ssa-structalias.c we compute points-to analysis. It is
flow-sensitive only for pointers in SSA form.

Richard.

>
> regards,
> Eugen
>
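A small C illustration of that distinction, under the assumption that a local pointer whose address is never taken is rewritten into SSA names, while a pointer stored in a global variable stays in memory. The comments describe what one would expect from such an analysis, not the verified contents of any GCC dump, and all names are invented for the example.

```c
int a, b;
int *in_memory;     /* global pointer: lives in memory, not an SSA name */

int ssa_pointer(void)
{
    int *p = &a;    /* first definition of p: expected to point to {a} */
    *p = 1;
    p = &b;         /* second definition gets its own SSA name: {b} */
    *p = 2;
    return a + b;
}

int memory_pointer(void)
{
    in_memory = &a;
    *in_memory = 3;
    in_memory = &b; /* one variable for both stores: a flow-insensitive
                       result would contain both a and b at every use */
    *in_memory = 4;
    return a + b;
}
```

Both functions behave identically at run time; the difference is only in how precisely an analysis can say which object each store may modify.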
PowerPC optimization regression
I have noticed gcc 4.4.5 often produces less optimized code
than the old 3.4.6. Below is the latest example. I am
starting to wonder if I need rebuild gcc 4.4.5 and/or
add new options to gcc when I compile. Any insight?

Jocke

const char *test(int i)
{
    const char *p = "abc\0def\0gef";
    for(; i; --i)
        while(*++p);
    return p;
}

/* gcc 4.4.5 -O2 -S
        .section ".text"
        .align 2
        .globl test
        .type test, @function
test:
        mr. 0,3
        mtctr 0
        beq- 0,.L10
        lis 3,.lanch...@ha
        la 3,.lanch...@l(3)
.L8:
        lbzu 0,1(3)
        cmpwi 7,0,0
        bne+ 7,.L8
        bdnz .L8
        blr
.L10:
        lis 3,.lanch...@ha
        la 3,.lanch...@l(3)
        blr
        .size test, .-test
        .section .rodata
        .align 2
        .set .LANCHOR0,. + 0
.LC0:
        .string "abc"
        .string "def"
        .string "gef"
        .ident "GCC: (Gentoo 4.4.5 p1.0, pie-0.4.5) 4.4.5"
*/

/* gcc 4.4.5 -Os -S
        .globl test
        .type test, @function
test:
        mr 9,3
        lis 3,.lanch...@ha
        la 3,.lanch...@l(3)
        b .L2
.L5:
        lbzu 0,1(3)
        cmpwi 7,0,0
        bne+ 7,.L5
        addi 9,9,-1
.L2:
        cmpwi 7,9,0
        bne+ 7,.L5
        blr
        .size test, .-test
        .section .rodata
        .set .LANCHOR0,. + 0
.LC0:
        .string "abc"
        .string "def"
        .string "gef"
        .ident "GCC: (Gentoo 4.4.5 p1.0, pie-0.4.5) 4.4.5"
*/

/* gcc 3.4.6 -Os -S and gcc -O2 -S
        .section .rodata
        .align 2
.LC0:
        .string "abc"
        .string "def"
        .string "gef"
        .section ".text"
        .align 2
        .globl test
        .type test, @function
test:
        mr. 0,3
        lis 9,@ha
        la 3,@l(9)
        mtctr 0
        beqlr- 0
.L13:
        lbzu 0,1(3)
        cmpwi 7,0,0
        bne- 7,.L13
        bdnz .L13
        blr
        .size test, .-test
        .section .note.GNU-stack,"",@progbits
        .ident "GCC: (GNU) 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.9)"
*/
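When comparing code generated by different compiler versions, it can help to have a quick check that each build of `test` still returns the same pointers. The following is a minimal sketch of such a check: `test` is copied from the message above, while `test_offset` is a helper invented here. It relies on `test(0)` returning the start of the string literal and on every call seeing the same literal object.

```c
#include <stddef.h>

const char *test(int i)
{
    const char *p = "abc\0def\0gef";
    for (; i; --i)
        while (*++p)
            ;
    return p;
}

/* Distance of test(i) from the start of the literal.  Each outer
   iteration advances p to the next NUL byte, so valid arguments are
   0 to 3; anything larger would read past the end of the literal. */
ptrdiff_t test_offset(int i)
{
    return test(i) - test(0);
}
```

For the literal "abc\0def\0gef" the NUL bytes sit at offsets 3, 7 and 11, so those are the values to expect for arguments 1, 2 and 3.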
Re: "ld -r" on mixed IR/non-IR objects (
> On 12/07/2010 04:20 PM, Andi Kleen wrote:
>>
>> The only problem left is mixing of lto and non lto objects. this right
>> now is not handled. IMHO still the best way to handle it is to use
>> slim lto and then simply separate link the "left overs" after deleting
>> the LTO objects. This can be actually done with objcopy (with some
>> limitations), doesn't even need linker support.
>>
>
> Quite possibly a better way to deal with that is to provide a mechanism
> for encapsulating arbitrary binary code objects inside the LTO IR.

Then you would need to teach your assembler and everything
else that may generate ELF objects to generate this magic object. But why
not just ELF directly? That is what it is after all.

To be honest I don't really see the point of all this complexity you
guys are proposing just to save fat LTO. Fat LTO is always a bad idea
because it's slow and does lots of redundant work. If LTO is to become
a more widespread mode, fat LTO has to go simply because of the poor
performance.

With slim LTO, passthrough is very straightforward: simply pass
through every section that is not LTO and generate code for the LTO
sections. No new magic sections are needed at all.

-Andi
Re: combine two load insns
On 8 December 2010 00:12, Jeff Law wrote:
> On 12/07/10 12:29, Frédéric RISS wrote:
>>
>> On Tuesday 07 December 2010 at 06:18 -0700, Jeff Law wrote:
>>>
>>> On 12/06/10 15:07, Ian Lance Taylor wrote:
>>> Given the two loads don't have a def-use data dependency combine won't
>>> ever get the opportunity to do anything with them. In general there is
>>> no pass which combines insns without a true data dependency and targets
>>> which have such insns have had to handle those combinations in machine
>>> dependent reorg. In fact, it was the combination of independent insns
>>> which led to the introduction of the machine dependent reorg pass eons
>>> ago.
>>
>> The issue with this approach is that reorg runs very late. I suppose
>> that if one wants to combine 2 SI loads into a DI load, it needs to be
>> done before IRA to satisfy the generated register constraints.
>
> Constraints aren't checked until after register allocation is complete --
> they're going to be of no help in performing this optimization. Right now
> the machine dependent reorg pass or a peephole are the only places this
> optimization can be performed. However, I believe it would be possible to
> make the scheduler perform this optimization with some work.

Sorry, I think I wasn't clear. I didn't mean constraints in terms of
RTL template constraints, but 'constraints' coming from the new DI
destination of the load. More specifically: 2 SI loads can target
totally independent registers whereas a standard DI load must target a
contiguous SI register pair. If you don't do that before IRA, it will
most likely be impossible to do cleanly, won't it?

Fred