Re: "ld -r" on mixed IR/non-IR objects (

2010-12-08 Thread Dave Korn
On 08/12/2010 18:40, Andi Kleen wrote:

> Fat LTO is just too slow. I suspect with that kind of performance
> penalty most people simply would not use it at all.

  How slow is "too" slow?  How many people out of a hundred won't use it?  Got
numbers, or just a gut feeling?

cheers,
  DaveK


Re: "ld -r" on mixed IR/non-IR objects (

2010-12-08 Thread Joseph S. Myers
On Wed, 8 Dec 2010, Andrew Pinski wrote:

> On Wed, Dec 8, 2010 at 10:40 AM, Andi Kleen  wrote:
> > The gcc maintainers unfortunately didn't want to integrate the
> > wrapper scripts to make it easy, but they can be always downloaded
> > separately and I assume distributions will eventually ship
> > them anyways.
> 
> No, we do, just not as scripts.  We want real programs rather than
> shell-based scripts so they are more portable.

And programs that take proper account of the transformed name under which 
the compiler driver they are running will be installed, rather than 
hardcoding "gcc" (and bad assumptions that the presence of "xgcc" means a 
GCC build directory), to mention another piece of my feedback that was 
ignored in the resubmission.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: "ld -r" on mixed IR/non-IR objects (

2010-12-08 Thread Andrew Pinski
On Wed, Dec 8, 2010 at 10:40 AM, Andi Kleen  wrote:
> The gcc maintainers unfortunately didn't want to integrate the
> wrapper scripts to make it easy, but they can be always downloaded
> separately and I assume distributions will eventually ship
> them anyways.

No, we do, just not as scripts.  We want real programs rather than
shell-based scripts so they are more portable.

-- Pinski


Re: wrong output of print_generic_decl() called from a plugin

2010-12-08 Thread Joachim Wieland
On Wed, Dec 8, 2010 at 1:52 PM, I wrote:
> This outputs "static void barfunc (int);" but the function is neither
> static nor does it expect only one int parameter...

here's another example where print_generic_decl() fails:

---
typedef void (*Handler)( int , void * );
Handler GetFunctionPointer();
---

This would output "extern void (*Handler) (int, void *)
GetFunctionPointer (void);"

Any other function I could use that is more reliable?
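
For comparison, here is a minimal sketch of how the second declaration reads when the typedef is expanded by hand. In C the function's own parameter list sits inside the pointer declarator, which is not the shape of the output quoted above. The handler on_event and the body of GetFunctionPointer are hypothetical, added only so the sketch compiles; "(void)" is the C spelling of the empty C++ parameter list.

```c
typedef void (*Handler)(int, void *);

/* Hypothetical handler, present only so there is something to return. */
static void on_event(int code, void *data)
{
    *(int *)data = code;
}

/* The same function declared twice: once through the typedef and once
   with the typedef expanded.  The compiler accepts both because they
   name the same type. */
Handler GetFunctionPointer(void);
void (*GetFunctionPointer(void))(int, void *);

/* Hypothetical definition using the expanded spelling. */
void (*GetFunctionPointer(void))(int, void *)
{
    return on_event;
}
```

A printer that expands the typedef would have to produce the second spelling; simply placing the typedef's text in front of the function name does not give a valid declaration.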


Thanks,
Joachim


Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org

2010-12-08 Thread Jack Howarth
On Wed, Dec 08, 2010 at 09:16:23PM +0100, Richard Guenther wrote:
> On Wed, 8 Dec 2010, Jack Howarth wrote:
> 
> > On Wed, Dec 08, 2010 at 01:44:38PM -0500, Dennis Clarke wrote:
> > > 
> > > > On Wed, Dec 08, 2010 at 02:42:56PM +0100, Richard Guenther wrote:
> > > >> 
> > > > This was built against ppl 0.10.2 and cloog 0.15.10.
> > > 
> > > Have you tried a bootstrap with neither ppl nor cloog ?  I have yet to see
> > > their value and I generally exclude them. This results ( thus far ) in
> > > nice clean bootstrap builds.
> > > 
> > 
> > Dennis,
> >Considering that distros like Fedora ship their gcc's with graphite
> > support built-in, allowing graphite to regress like this between gcc
> > maintenance releases doesn't seem like a very good idea.
> 
> The SUSE builds look fine.  You have to investigate why it doesn't
> work for you, but it won't hold the 4.5.2 release.  Are your
> ppl and cloog testsuite runs clean?  Did you by chance build them
> with a different GCC release (and thus libstdc++)?

Richard,
   I see the problem now and it confirms my fears about the loose version
control on gcc vs ppl vs cloog. I had built a cloog deb package against
a ppl2 0.11 package but forgot that and reinstalled the ppl 0.10.2 package.
This resulted in a build of gcc with...

[MacPro:gcc/x86_64-apple-darwin10.5.0/4.5.2] howarth% otool -L cc1
cc1:
/sw/lib/libintl.8.dylib (compatibility version 9.0.0, current version 9.2.0)
/sw/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
/sw/lib/libcloog.0.dylib (compatibility version 1.0.0, current version 1.0.0)
/sw/lib/libppl_c.2.dylib (compatibility version 4.0.0, current version 4.0.0)
/sw/lib/libppl.7.dylib (compatibility version 9.0.0, current version 9.0.0)
/sw/lib/libgmpxx.4.dylib (compatibility version 6.0.0, current version 6.2.0)
/sw/lib/libmpc.2.dylib (compatibility version 3.0.0, current version 3.0.0)
/sw/lib/libmpfr.1.dylib (compatibility version 4.0.0, current version 4.2.0)
/sw/lib/libgmp.3.dylib (compatibility version 9.0.0, current version 9.2.0)
/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.3)
/usr/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 625.0.0)
/sw/lib/gcc4.5/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 125.2.1)
[MacPro:gcc/x86_64-apple-darwin10.5.0/4.5.2] howarth% otool -L /sw/lib/libcloog.0.dylib
/sw/lib/libcloog.0.dylib:
/sw/lib/libcloog.0.dylib (compatibility version 1.0.0, current version 1.0.0)
/sw/lib/libgmp.3.dylib (compatibility version 9.0.0, current version 9.2.0)
/sw/lib/libppl_c.4.dylib (compatibility version 5.0.0, current version 5.0.0)
/sw/lib/libppl.9.dylib (compatibility version 10.0.0, current version 10.0.0)
/sw/lib/libgmpxx.4.dylib (compatibility version 6.0.0, current version 6.2.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 125.2.1)

I believe in the past I may have tested FSF gcc built against ppl 0.11 vs a
cloog built against ppl 0.10.2 and that worked. Apparently it is the inverse
that breaks graphite (i.e. FSF built against ppl 0.10.2 vs a cloog built
against ppl 0.11).
  Jack
> 
> Thanks,
> Richard.
> 
> >   Jack
> > 
> > > 
> > > -- 
> > > Dennis Clarke
> > > dcla...@opensolaris.ca  <- Email related to the open source Solaris
> > > dcla...@blastwave.org   <- Email related to open source for Solaris
> > > 
> > 
> > 
> 
> -- 
> Richard Guenther 
> Novell / SUSE Labs
> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus 
> Rex


Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org

2010-12-08 Thread Dennis Clarke
> On Wed, 8 Dec 2010, Jack Howarth wrote:
>> On Wed, Dec 08, 2010 at 01:44:38PM -0500, Dennis Clarke wrote:
>> > > On Wed, Dec 08, 2010 at 02:42:56PM +0100, Richard Guenther wrote:
>> > >> 
>> > > This was built against ppl 0.10.2 and cloog 0.15.10.
>> >
>> > Have you tried a bootstrap with neither ppl nor cloog ?  I have yet to
>> > see their value and I generally exclude them. This results ( thus far )
>> > in nice clean bootstrap builds.
>> Dennis,
>>    Considering that distros like Fedora ship their gcc's with graphite
>> support built-in, allowing graphite to regress like this between gcc
>> maintenance releases doesn't seem like a very good idea.
>
> The SUSE builds look fine.  You have to investigate why it doesn't work
> for you, but it won't hold the 4.5.2 release.  Are your
> ppl and cloog testsuite runs clean?  Did you by chance build them with a
> different GCC release (and thus libstdc++)?
>
> Thanks,
> Richard.

Good question !

I generally do a double bootstrap in which my first build is done with a
previous version of GCC. Once I see reasonable testsuite results I then
use the resultant compiler from the first bootstrap to build the "release"
version. This then explains why the compiler that built GCC 4.5.1 on
Solaris 8 is, in fact, GCC 4.5.1 :

http://gcc.gnu.org/ml/gcc-testresults/2010-09/msg02183.html

However, having said all this, I have yet to see either the ppl or cloog
software components build even once on the legacy Solaris platform. I must
support baseline legacy Solaris 8, which in turn assures functionality
upwards to Solaris 10 and possibly 11.

http://www.blastwave.org/jir/pkgcontents.ftd?software=gcc4&style=brief&state=5&arch=sparc

-- 
Dennis Clarke
dcla...@opensolaris.ca  <- Email related to the open source Solaris
dcla...@blastwave.org   <- Email related to open source for Solaris






Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org

2010-12-08 Thread Richard Guenther
On Wed, 8 Dec 2010, Jack Howarth wrote:

> On Wed, Dec 08, 2010 at 01:44:38PM -0500, Dennis Clarke wrote:
> > 
> > > On Wed, Dec 08, 2010 at 02:42:56PM +0100, Richard Guenther wrote:
> > >> 
> > > This was built against ppl 0.10.2 and cloog 0.15.10.
> > 
> > Have you tried a bootstrap with neither ppl nor cloog ?  I have yet to see
> > their value and I generally exclude them. This results ( thus far ) in
> > nice clean bootstrap builds.
> > 
> 
> Dennis,
>Considering that distros like Fedora ship their gcc's with graphite
> support built-in, allowing graphite to regress like this between gcc
> maintenance releases doesn't seem like a very good idea.

The SUSE builds look fine.  You have to investigate why it doesn't
work for you, but it won't hold the 4.5.2 release.  Are your
ppl and cloog testsuite runs clean?  Did you by chance build them
with a different GCC release (and thus libstdc++)?

Thanks,
Richard.

>   Jack
> 
> > 
> > -- 
> > Dennis Clarke
> > dcla...@opensolaris.ca  <- Email related to the open source Solaris
> > dcla...@blastwave.org   <- Email related to open source for Solaris
> > 
> 
> 

-- 
Richard Guenther 
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex


Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org

2010-12-08 Thread Richard Guenther
On Wed, 8 Dec 2010, David Fang wrote:

> Hi,
>   Is there time to include the 4.5 backport patch for:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46170
> 
> (is fixed on trunk, a 4.5.0 regression, 4.4.3 branch regression)
> The comments indicate that the patch is good to go for 4.5, but I didn't see
> an entry log that it was actually committed.

We'll fix it for 4.5.3, the patch seems pretty big so is not
appropriate at this stage.

Richard.

> Fang
> 
> > A release candidate for GCC 4.5.2 is available from
> > 
> > ftp://gcc.gnu.org/pub/gcc/snapshots/4.5.2-RC-20101208
> > 
> > and shortly its mirrors.  It has been generated from SVN revision 167585.
> > 
> > I have so far bootstrapped and tested the release candidate on
> > x86_64-linux, bootstraps and tests on
> > {i686,ia64,ppc,ppc64,s390,s390x}-linux are running.
> > 
> > Please test it and report any issues to bugzilla.
> > 
> > The branch remains frozen and all checkins until after the final release
> > of GCC 4.5.2 require explicit RM approval.
> > 
> > If all goes well, I'd like to release 4.5.2 early next week.
> > 
> > 
> > Richard.
> > 
> > 
> 
> David Fang
> http://www.csl.cornell.edu/~fang/
> http://www.achronix.com/
> 
> 

-- 
Richard Guenther 
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex


Re: PowerPC optimization regression

2010-12-08 Thread Ian Lance Taylor
Joakim Tjernlund  writes:

> I already sent in a bug with gccbug, hope it shows up.
> How long does one have to wait until it is visible?

The gccbug script no longer works and has been removed from current
versions of gcc.  You should get a bounce message.  Please use
http://gcc.gnu.org/bugzilla/ instead, as described at
http://gcc.gnu.org/bugs/ .  Sorry for the confusion.

Ian


rsync'd repo size

2010-12-08 Thread DJ Delorie

http://gcc.gnu.org/rsync.html says 17 Gb.

I just did it, and it's up to 22 Gb.


Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org

2010-12-08 Thread Dennis Clarke

> On Wed, Dec 08, 2010 at 01:44:38PM -0500, Dennis Clarke wrote:
>>
>> > On Wed, Dec 08, 2010 at 02:42:56PM +0100, Richard Guenther wrote:
>> >> 
>> > This was built against ppl 0.10.2 and cloog 0.15.10.
>>
>> Have you tried a bootstrap with neither ppl nor cloog ?  I have yet to
>> see
>> their value and I generally exclude them. This results ( thus far ) in
>> nice clean bootstrap builds.
>>
>
> Dennis,
>Considering that distros like Fedora ship their gcc's with graphite
> support built-in, allowing graphite to regress like this between gcc
> maintenance releases doesn't seem like a very good idea.
>   Jack

Of course I agree completely.

-- 
Dennis Clarke
dcla...@opensolaris.ca  <- Email related to the open source Solaris
dcla...@blastwave.org   <- Email related to open source for Solaris




Re: Making a new port

2010-12-08 Thread Ian Lance Taylor
"viv0411.par...@gmail.com"  writes:

> Sir, I plan to make a gcc port for Android. I only know C++. Please tell me
> how I should go about it.

There already is a gcc port for Android.

If you mean that you want to build gcc for the Android target, see
http://gcc.gnu.org/install/ .  Please take any questions to the mailing
list gcc-h...@gcc.gnu.org, rather than g...@gcc.gnu.org.  Thanks.

Ian


Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org

2010-12-08 Thread H.J. Lu
On Wed, Dec 8, 2010 at 11:05 AM, Jack Howarth  wrote:
> On Wed, Dec 08, 2010 at 01:44:38PM -0500, Dennis Clarke wrote:
>>
>> > On Wed, Dec 08, 2010 at 02:42:56PM +0100, Richard Guenther wrote:
>> >> 
>> > This was built against ppl 0.10.2 and cloog 0.15.10.
>>
>> Have you tried a bootstrap with neither ppl nor cloog ?  I have yet to see
>> their value and I generally exclude them. This results ( thus far ) in
>> nice clean bootstrap builds.
>>
>
> Dennis,
>   Considering that distros like Fedora ship their gcc's with graphite
> support built-in, allowing graphite to regress like this between gcc
> maintenance releases doesn't seem like a very good idea.
>                      Jack
>

graphite tests on 4.5 branch seem OK for Fedora 14/x86-64:

http://gcc.gnu.org/ml/gcc-testresults/2010-12/msg00662.html

gcc 4.5 configure reports:

checking for the correct version of gmp.h... yes
checking for the correct version of mpfr.h... yes
checking for the correct version of mpc.h... yes
checking for the correct version of the gmp/mpfr/mpc libraries... yes
checking for version 0.10 of PPL... yes
checking for version 0.15.5 (or later revision) of CLooG... buggy but acceptable

-- 
H.J.


Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org

2010-12-08 Thread Jack Howarth
On Wed, Dec 08, 2010 at 01:44:38PM -0500, Dennis Clarke wrote:
> 
> > On Wed, Dec 08, 2010 at 02:42:56PM +0100, Richard Guenther wrote:
> >> 
> > This was built against ppl 0.10.2 and cloog 0.15.10.
> 
> Have you tried a bootstrap with neither ppl nor cloog ?  I have yet to see
> their value and I generally exclude them. This results ( thus far ) in
> nice clean bootstrap builds.
> 

Dennis,
   Considering that distros like Fedora ship their gcc's with graphite
support built-in, allowing graphite to regress like this between gcc
maintenance releases doesn't seem like a very good idea.
  Jack

> 
> -- 
> Dennis Clarke
> dcla...@opensolaris.ca  <- Email related to the open source Solaris
> dcla...@blastwave.org   <- Email related to open source for Solaris
> 


wrong output of print_generic_decl() called from a plugin

2010-12-08 Thread Joachim Wieland
While testing how to parse C and C++ code for function prototypes from a plugin

(see http://gcc.gnu.org/ml/gcc/2010-12/msg00179.html)

I noticed that print_generic_decl() seems to output wrong data.

Consider the following function definition:

--
void barfunc (int foo, int abc, ... ) {

}
--

This outputs "static void barfunc (int);" but the function is neither
static nor does it expect only one int parameter...

Am I doing something wrong? I am calling "print_generic_decl(file,
decl, 0);" from the PLUGIN_PRE_GENERICIZE hook and this is gcc version
4.5.1 (GCC) on Solaris.


Thanks,
Joachim


Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org

2010-12-08 Thread Dennis Clarke

> On Wed, Dec 08, 2010 at 02:42:56PM +0100, Richard Guenther wrote:
>> 
> This was built against ppl 0.10.2 and cloog 0.15.10.

Have you tried a bootstrap with neither ppl nor cloog ?  I have yet to see
their value and I generally exclude them. This results ( thus far ) in
nice clean bootstrap builds.


-- 
Dennis Clarke
dcla...@opensolaris.ca  <- Email related to the open source Solaris
dcla...@blastwave.org   <- Email related to open source for Solaris




Re: "ld -r" on mixed IR/non-IR objects (

2010-12-08 Thread Andi Kleen
> As someone who encountered slim LTO on Unix 17 years ago (on MIPS) I can
> promise you that unless fat LTO is supported, there will never be a

Fat LTO is just too slow. I suspect with that kind of performance
penalty most people simply would not use it at all.

> successful transition.  The amount of work to deal with the make
> environment every time simply made it not worth it.

It's not too hard in my experience. I did it in a few cases for gcc. 

The gcc maintainers unfortunately didn't want to integrate the
wrapper scripts to make it easy, but they can be always downloaded
separately and I assume distributions will eventually ship
them anyways.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org

2010-12-08 Thread Jack Howarth
On Wed, Dec 08, 2010 at 02:42:56PM +0100, Richard Guenther wrote:
> 
> A release candidate for GCC 4.5.2 is available from
> 
>  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5.2-RC-20101208
> 
> and shortly its mirrors.  It has been generated from SVN revision 167585.
> 
> I have so far bootstrapped and tested the release candidate on
> x86_64-linux, bootstraps and tests on
> {i686,ia64,ppc,ppc64,s390,s390x}-linux are running.
> 
> Please test it and report any issues to bugzilla.
> 
> The branch remains frozen and all checkins until after the final release
> of GCC 4.5.2 require explicit RM approval.
> 
> If all goes well, I'd like to release 4.5.2 early next week.

Richard,
   I am seeing a large number of regressions in gcc-4.5.2-RC-20101208
on x86_64-apple-darwin10 in the graphite tests. So far I have...

=== gcc Summary for unix/-m32 ===

# of expected passes            70022
# of unexpected failures        227
# of expected failures          175
# of unresolved testcases       36
# of unsupported tests          1281

The failures all seem to be of the form...

Executing on host: /sw/src/fink.build/gcc45-4.5.2-1000/darwin_objdir/gcc/xgcc -B/sw/src/fink.build/gcc45-4.5.2-1000/darwin_objdir/gcc/ /sw/src/fink.build/gcc45-4.5.2-1000/gcc-4.5.2-RC-20101208/gcc/testsuite/gcc.dg/graphite/scop-0.c   -O2 -fgraphite -fdump-tree-graphite-all -S  -m32 -o scop-0.s    (timeout = 300)
/sw/src/fink.build/gcc45-4.5.2-1000/gcc-4.5.2-RC-20101208/gcc/testsuite/gcc.dg/graphite/scop-0.c: In function 'toto':
/sw/src/fink.build/gcc45-4.5.2-1000/gcc-4.5.2-RC-20101208/gcc/testsuite/gcc.dg/graphite/scop-0.c:4:5: internal compiler error: Segmentation fault

which backtraces as...

gdb /sw/src/fink.build/gcc45-4.5.2-1000/darwin_objdir/gcc/cc1 
GNU gdb 6.3.50-20050815 (Apple version gdb-1472) (Wed Jul 21 10:53:12 UTC 2010)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared 
libraries ...
warning: Could not find object file 
"/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/block.o" - no debug 
information available for "source/block.c".


warning: Could not find object file 
"/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/loop.o" - no debug 
information available for "source/loop.c".


warning: Could not find object file 
"/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/names.o" - no debug 
information available for "source/names.c".


warning: Could not find object file 
"/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/options.o" - no 
debug information available for "source/options.c".


warning: Could not find object file 
"/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/clast.o" - no debug 
information available for "source/ppl/clast.c".


warning: Could not find object file 
"/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/domain.o" - no 
debug information available for "source/ppl/domain.c".


warning: Could not find object file 
"/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/matrix.o" - no 
debug information available for "source/ppl/matrix.c".


warning: Could not find object file 
"/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/pprint.o" - no 
debug information available for "source/pprint.c".


warning: Could not find object file 
"/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/program.o" - no 
debug information available for "source/program.c".


warning: Could not find object file 
"/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/statement.o" - no 
debug information available for "source/statement.c".


warning: Could not find object file 
"/sw/src/fink.build/cloog-0.15.10-0/cloog-ppl-0.15.10/.libs/version.o" - no 
debug information available for "source/version.c".

..b done
^R

(gdb) break fancy_abort
Breakpoint 1 at 0x10034a400: file ../../gcc-4.5.2-RC-20101208/gcc/diagnostic.c, 
line 762.
(gdb) r -quiet -v -imultilib i386 -iprefix 
/sw/src/fink.build/gcc45-4.5.2-1000/darwin_objdir/gcc/../lib/gcc/x86_64-apple-darwin10.5.0/4.5.2/
 -isystem /sw/src/fink.build/gcc45-4.5.2-1000/darwin_objdir/gcc/include 
-isystem /sw/src/fink.build/gcc45-4.5.2-1000/darwin_objdir/gcc/include-fixed 
-D__DYNAMIC__ 
/sw/src/fink.build/gcc45-4.5.2-1000/gcc-4.5.2-RC-20101208/gcc/testsuite/gcc.dg/graphite/scop-0.c
 -fPIC -quiet -dumpbase scop-0.c -mm

Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org

2010-12-08 Thread David Fang

Hi,
Is there time to include the 4.5 backport patch for:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46170

(is fixed on trunk, a 4.5.0 regression, 4.4.3 branch regression)
The comments indicate that the patch is good to go for 4.5, but I didn't 
see an entry log that it was actually committed.


Fang


> A release candidate for GCC 4.5.2 is available from
>
> ftp://gcc.gnu.org/pub/gcc/snapshots/4.5.2-RC-20101208
>
> and shortly its mirrors.  It has been generated from SVN revision 167585.
>
> I have so far bootstrapped and tested the release candidate on
> x86_64-linux, bootstraps and tests on
> {i686,ia64,ppc,ppc64,s390,s390x}-linux are running.
>
> Please test it and report any issues to bugzilla.
>
> The branch remains frozen and all checkins until after the final release
> of GCC 4.5.2 require explicit RM approval.
>
> If all goes well, I'd like to release 4.5.2 early next week.
>
>
> Richard.




David Fang
http://www.csl.cornell.edu/~fang/
http://www.achronix.com/



Re: "ld -r" on mixed IR/non-IR objects (

2010-12-08 Thread Xinliang David Li
On Wed, Dec 8, 2010 at 1:19 AM, Andi Kleen  wrote:
>> On 12/07/2010 04:20 PM, Andi Kleen wrote:
>>>
>>> The only problem left is mixing of lto and non lto objects. this right
>>> now is not handled. IMHO still the best way to handle it is to use
>>> slim lto and then simply separate link the "left overs" after deleting
>>> the LTO objects. This can be actually done with objcopy (with some
>>> limitations), doesn't even need linker support.


I agree that fat LTO objects are not necessary to make everything
work and to make the integration of LTO with an existing build environment
'transparent'. There is a compiler out in the world that does just
that: it produces IL-only objects (wrapped in ELF format); works with
archives with mixed objects; works with ld -r with mixed objects;
and builds a Unix kernel successfully with LTO ...


David



>>>
>>
>> Quite possibly a better way to deal with that is to provide a mechanism
>> for encapsulating arbitrary binary code objects inside the LTO IR.
>
> Then you would need to teach your assembler and everything
> else that may generate ELF objects to generate this magic object. But why
> not just ELF directly? that is what it is after all.
>
> To be honest I don't really see the point of all this complexity you
> guys are proposing just to save fat LTO. Fat LTO is always a bad idea
> because it's slow and  does lots of redundant work. If LTO is to become
> a more wide spread mode it has to go simply because of the poor
> performance.
>
> With slim LTO passthrough is  very straight-forward: simple pass
> through every section that is not LTO and generate code for the LTO
> sections. No new magic sections needed at all.
>
> -Andi
>
>


Making a new port

2010-12-08 Thread viv0411.par...@gmail.com
Sir, I plan to make a gcc port for Android. I only know C++. Please tell me
how I should go about it.


Re: combine two load insns

2010-12-08 Thread Frederic Riss
On 8 December 2010 17:37, Jeff Law  wrote:
> On 12/08/10 09:18, Frederic Riss wrote:
>>
>> OK, I see your point, but I tend to think that the odds of the register
>> allocator being able to coalesce the additional DI->SI moves in the
>> pre-IRA approach are by far higher than the odds of having merge
>> candidates after register allocation.
>
> I agree, but note that failure to coalesce leads to code quality regression.

Well, it really depends on the architecture. Moving between SImode
registers is usually nearly free, whereas accessing the memory is so
much more costly... If your architecture has a DI sized datapath to
memory, you actually divide the memory bandwidth requirement by 2 when
you pack SI loads together. This seems like a net win to me even if
you add 1 or 2 moves to the equation.
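
The trade-off can be illustrated at source level. This is only a sketch, not what a combine pass would emit: load_pair is a hypothetical helper that performs one 64-bit access and then splits the value with register-only operations, standing in for a DImode load followed by DI->SI moves. Byte order is probed at run time so that the two halves come out in array order on either a little-endian or a big-endian host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch two adjacent 32-bit words with a single
   64-bit memory access, then split the value in registers. */
static void load_pair(const uint32_t *p, uint32_t *first, uint32_t *second)
{
    uint64_t wide;
    const uint16_t probe = 1;
    unsigned char low_byte;

    memcpy(&wide, p, sizeof wide);   /* one DI-sized access to memory */
    memcpy(&low_byte, &probe, 1);    /* 1 on little-endian hosts */

    if (low_byte) {
        *first  = (uint32_t)wide;            /* p[0] is the low half */
        *second = (uint32_t)(wide >> 32);
    } else {
        *first  = (uint32_t)(wide >> 32);    /* p[0] is the high half */
        *second = (uint32_t)wide;
    }
}
```

The two assignments after the load touch no memory, which is the "1 or 2 moves" side of the equation; the memory side drops from two accesses to one.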

Fred


Re: "ld -r" on mixed IR/non-IR objects (

2010-12-08 Thread H. Peter Anvin
On 12/08/2010 01:19 AM, Andi Kleen wrote:
> 
> To be honest I don't really see the point of all this complexity you
> guys are proposing just to save fat LTO. Fat LTO is always a bad idea
> because it's slow and  does lots of redundant work. If LTO is to become
> a more wide spread mode it has to go simply because of the poor
> performance.
> 

As someone who encountered slim LTO on Unix 17 years ago (on MIPS) I can
promise you that unless fat LTO is supported, there will never be a
successful transition.  The amount of work to deal with the make
environment every time simply made it not worth it.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: "ld -r" on mixed IR/non-IR objects (

2010-12-08 Thread H. Peter Anvin
On 12/08/2010 01:19 AM, Andi Kleen wrote:
>>
>> Quite possibly a better way to deal with that is to provide a mechanism
>> for encapsulating arbitrary binary code objects inside the LTO IR.
> 
> Then you would need to teach your assembler and everything
> else that may generate ELF objects to generate this magic object. But why
> not just ELF directly? that is what it is after all.
> 

No.  You just need to teach the linker to generate it when you're doing
a ld -r on mixed objects.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: "ld -r" on mixed IR/non-IR objects (

2010-12-08 Thread H.J. Lu
On Wed, Dec 8, 2010 at 5:54 AM, H.J. Lu  wrote:
> On Wed, Dec 8, 2010 at 1:19 AM, Andi Kleen  wrote:
>>> On 12/07/2010 04:20 PM, Andi Kleen wrote:

>>>> The only problem left is mixing of lto and non lto objects. this right
>>>> now is not handled. IMHO still the best way to handle it is to use
>>>> slim lto and then simply separate link the "left overs" after deleting
>>>> the LTO objects. This can be actually done with objcopy (with some
>>>> limitations), doesn't even need linker support.

>>>
>>> Quite possibly a better way to deal with that is to provide a mechanism
>>> for encapsulating arbitrary binary code objects inside the LTO IR.
>>
>> Then you would need to teach your assembler and everything
>
> The magic section is generated by linker directly. No changes to
> assembler is required.
>
>> else that may generate ELF objects to generate this magic object. But why
>> not just ELF directly? that is what it is after all.
>
> My proposal isn't specific to ELF.
>
>>
>> To be honest I don't really see the point of all this complexity you
>> guys are proposing just to save fat LTO. Fat LTO is always a bad idea
>> because it's slow and  does lots of redundant work. If LTO is to become
>> a more wide spread mode it has to go simply because of the poor
>> performance.
>>
>> With slim LTO passthrough is  very straight-forward: simple pass
>> through every section that is not LTO and generate code for the LTO
>> sections. No new magic sections needed at all.
>>
>
> My proposal works on both fat and slim LTO objects.  The idea is
> you can use "ld -r" on any combination of inputs and its output
> still works as before "ld -r".
>

Here is the revised proposal.


-- 
H.J.
Link with mixed IR/non-IR objects

* 2 kinds of object files
  o non-IR object file has
    * non-IR sections
  o IR object file has
    * IR sections
    * non-IR sections
* The output of "ld -r" with mixed IR/non-IR objects should work with:
  o Compilers/linkers with IR support.
  o Compilers/linkers without IR support.
* Add the mixed object file which has
  o IR sections
  o non-IR sections:
    * Object codes from IR sections.
    * Object codes from non-IR object files.
  o Object-only section:
    * With section name ".gnu_object_only" and SHT_GNU_OBJECT_ONLY
      (0x6ff8) type on ELF.
    * Contain non-IR object file.
    * Input is discarded after link.
* Linker action:
  o Classify each input object file:
    * If there is a ".gnu_object_only" section, it is a mixed object file.
    * If there is an IR section, it is an IR object file.
    * Otherwise, it is a non-IR object file.
  o Relocatable non-IR link:
    * Prepare for an object-only output.
    * Prepare for a regular output.
    * For each mixed object file:
      * Add IR and non-IR sections to the regular output.
      * For object-only section:
        * Extract object only file.
        * Add it to the object-only output.
        * Discard object-only section.
    * For each IR object file:
      * Add IR and non-IR sections to the regular output.
    * For each non-IR object file:
      * Add non-IR sections to the regular output.
      * Add non-IR sections to the object-only output.
    * Final output:
      * If there are IR objects, non-IR objects and the object-only
        output isn't empty:
        * Put the object-only output into the object-only section.
        * Add the object-only section to the regular output.
        * Remove the object-only output.
  o Normal link and relocatable IR link:
    * Prepare for output.
    * IR link:
      * For each mixed object file:
        * Compile and add IR sections to the output.
        * Discard non-IR sections.
        * Object-only section:
          * Extract object only file.
          * Add it to the output.
          * Discard object-only section.
      * For each IR object file:
        * Compile and add IR sections to the output.
        * Discard non-IR sections.
      * For each non-IR object file:
        * Add non-IR sections to the output.
    * Non-IR link:
      * For each mixed object file:
        * Add non-IR sections to the output.
        * Discard IR sections and object-only section.
      * For each IR object file:
        * Add non-IR sections to the output.
        * Discard IR sections.
      * For each non-IR object file:
        * Add non-IR sections to the output.
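
The "Classify each input object file" step of the proposal could be sketched as follows. This is an illustration over a plain list of section names, not linker code: the enum, classify_object, and the assumption that IR sections are recognised by the ".gnu.lto_" name prefix are not part of the proposal, while ".gnu_object_only" is the section name it gives.

```c
#include <stddef.h>
#include <string.h>

enum object_kind { OBJ_NON_IR, OBJ_IR, OBJ_MIXED };

/* Assumption: IR sections are identified by this name prefix. */
#define IR_SECTION_PREFIX   ".gnu.lto_"
/* Section name taken from the proposal. */
#define OBJECT_ONLY_SECTION ".gnu_object_only"

/* Classify one input file from its section names, in the order the
   proposal gives: an object-only section wins, then any IR section,
   otherwise the file is a non-IR object. */
static enum object_kind
classify_object(const char *const *names, size_t count)
{
    int has_ir = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(names[i], OBJECT_ONLY_SECTION) == 0)
            return OBJ_MIXED;
        if (strncmp(names[i], IR_SECTION_PREFIX,
                    strlen(IR_SECTION_PREFIX)) == 0)
            has_ir = 1;
    }
    return has_ir ? OBJ_IR : OBJ_NON_IR;
}
```

A real linker would more likely test the section type (SHT_GNU_OBJECT_ONLY on ELF) than the name; names are used here only to keep the sketch free of any object-file library.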


Re: software pipelining

2010-12-08 Thread Gan
Hi Roy,

I guess SMS didn't pipeline your loop, and the
"prologue" code mentioned in your email is
an iteration peeled off from the loop. It has
nothing to do with prologue code.

I think there are two reasons that can explain why
your code is not pipelined:

1. Alias information is not enough to disambiguate
x and y. x and y are pointers from outside. Currently,
at least in SMS phase, GCC does not know whether
x aliases to y. This may prohibit GCC from pipelining
your loop. As far as I'm aware, alias information from
the array data dependence stage is not propagated to SMS;
at least I didn't find it in the main trunk. See the last bullet
in the "In Progress" section here:
http://gcc.gnu.org/wiki/SwingModuloScheduling
Andrey, correct me if I'm wrong.

2. GCC does not pipeline loops that contain "auto-inc/post-inc"
operations. See lines 1025 and 1039 in modulo-sched.c (gcc-4.5.1).

Please try the codelet below. It works after you comment out
line 1025 in gcc-4.5.1 and rebuild your compiler.

void foo(void)
{
  int ii, jj, kk;
  int R0, R1, R2, R3;

  for (ii = 1; ii < 12; ii++)
  {
    for (jj = 0; jj < ii; jj++)
    {
      (*((int *) ((char *) R3 + 0))) = R0;
      R3 += 4;
      R0 = (*((int *) ((char *) R2 + 0)));
      R2 = R2 + 48;
    }
  }

}
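
For comparison, here is a sketch of Roy's loop rewritten so that neither of the two obstacles above is visible in the source: the pointers are restrict-qualified, and the accesses use an index instead of bumping the pointers. This is only an assumption about what might help, not a tested recipe; the compiler may still turn the indexing back into auto-increments, and whether SMS then pipelines the loop depends on the target and the GCC version. The name copy_strided is made up for the example.

```c
/* Same data movement as Roy's loop: 100 elements, destination
   stride 20, source stride 30.  "restrict" promises the compiler
   that dst and src do not overlap. */
void copy_strided(long long *restrict dst, const long long *restrict src)
{
    int i;

    for (i = 0; i < 100; i++)
        dst[i * 20] = src[i * 30];
}
```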


I hope this helps.

Gan


2010/12/8 roy rosen :
> I have tried to play a bit with SMS on ia64 and I can't understand
> what it is doing.
> It seems that instead of getting some of the first insns out of the
> loop into the prologue it simply gets an entire iteration out of the
> loop and the loop's content stays approximately the same.
>
> For example for
>
> void x(long long*  y, long long* x)
> {
>    int i;
>    for (i = 0; i < 100; i++)
>    {
>        *x = *y;
>        x+=20;y+=30;
>    }
> }
>
> with ./cc1 ./a.c -O3 -fmodulo-sched.
> Can someone show an example where it actually works as it should?
>
> Roy.
>
> 2010/11/10 Andrey Belevantsev :
>> Hi,
>>
>> On 10.11.2010 12:32, roy rosen wrote:
>>>
>>> Hi,
>>>
>>> I was wondering if gcc has software pipelining.
>>> I saw options -fsel-sched-pipelining -fselective-scheduling
>>> -fselective-scheduling2 but I don't see any pipelining happening
>>> (tried with ia64).
>>> Is there a gcc VLIW port in which I can see it working?
>>
>> You need to try -fmodulo-sched.  Selective scheduling works by default on
>> ia64 with -O3, otherwise you need -fselective-scheduling2
>> -fsel-sched-pipelining.  Note that selective scheduling disables autoinc
>> generation for the pipelining to work, and modulo scheduling will likely
>> refuse to pipeline a loop with autoincs.
>>
>> Modulo scheduling implementation in GCC may be improved, but that's a
>> different topic.
>>
>> Andrey
>>
>>>
>>> For an example function like
>>>
>>> int nor(char* __restrict__ c, char* __restrict__ d)
>>> {
>>>     int i, sum = 0;
>>>     for (i = 0; i<  256; i++)
>>>         d[i] = c[i]<<  3;
>>>     return sum;
>>> }
>>>
>>> with no pipelining a code like
>>>
>>> r1 = 0
>>> r2 = c
>>> r3 = d
>>> _startloop
>>> if r1 == 256 jmp _end
>>> r4 = [r2]+
>>> r4 <<= 3
>>> [r3]+ = r4
>>> r1++
>>> jmp _startloop
>>> _end
>>>
>>> here inside the loop there is a data dependency between all 3 insns
>>> (only the r1++ is independent) which does not permit any parallelism
>>>
>>> with pipelining I expect a code like
>>>
>>> r1 = 2
>>> r2 = c
>>> r3 = d
>>> // peel first iteration
>>> r4 = [r2]+
>>> r4 <<= 3
>>> r5 = [r2]+
>>> _startloop
>>> if r1 == 256 jmp _end
>>> [r3]+ = r4 ; r4 = r5 << 3 ; r5 = [r2]+
>>> r1++
>>> jmp _startloop
>>> _end
>>>
>>> Now the data dependency is broken and parallelism is possible.
>>> As I said I could not see that happening.
>>> Can someone please tell me on which port and with what options can I
>>> get such a result?
>>>
>>> Thanks, Roy.
>>
>>
>



-- 
Best Regards

Gan


Re: PowerPC optimization regression

2010-12-08 Thread Joakim Tjernlund
David Edelsohn  wrote on 2010/12/08 17:38:11:
>
> On Wed, Dec 8, 2010 at 4:37 AM, Joakim Tjernlund
>  wrote:
> >
> > I have noticed gcc 4.4.5 often produces less optimized code
> > than the old 3.4.6. Below is the latest example. I am
> > starting to wonder if I need rebuild gcc 4.4.5 and/or
> > add new options to gcc when I compile. Any insight?
>
> Jocke,
>
> As Ian mentioned, please open a performance regression bug report on
> GCC Bugzilla and add me, Mike Meissner and Peter Bergner to the CC
> list.
>
> This might be a lingering result of the GCC SSA transition and not a
> PowerPC-specific regression, although the symptom has more impact on
> PowerPC.

I already sent in a bug report with gccbug; I hope it shows up.
How long does one have to wait until it is visible?

 Jocke



Re: combine two load insns

2010-12-08 Thread Jeff Law

On 12/08/10 09:43, Paul Koning wrote:
> On Dec 8, 2010, at 11:37 AM, Jeff Law wrote:
>
>> On 12/08/10 09:18, Frederic Riss wrote:
>>>
>>> OK, I see your point, but I tend to think the odds of the register
>>> allocator being able to coalesce the additional DI->SI moves in the
>>> pre-IRA approach are by far higher than the odds of having merge
>>> candidates after register allocation.
>>
>> I agree, but note that failure to coalesce leads to code quality regression.
>>
>> Also note that handling of double-word values is, IMHO, the allocator's biggest
>> problem area.  This has been greatly helped by Bernd's recent work, but there's
>> still significant amounts of work to do here.
>
> This probably has been discussed at length in the past, but as a relative
> newcomer I'll make this observation...  I wonder how much is lost by GCC's
> insistence that multi-register values must be in adjacent registers.  Obviously
> that's hard to change (the registers would have to be explicitly listed instead
> of implied by the first register number).  And in some cases it is actually
> required.  But in many cases, it's not (in some machines, never).  And I would
> think that register allocation could benefit from not having such a
> restriction.  The item in question here is just one example.

Or get even smarter about splitting up multi-word values into word sized 
component values (yes, we're starting to get off-topic here).  The 
subreg-lowering code does an OK job, but it has certain restrictions 
that prevent it from doing a good job.   The fundamental problem with 
lower-subreg is that it's unable to perform lowering at anything other 
than function granularity.  If we added the ability to copy-in/copy-out 
at regional boundaries we could lower within regions and drastically 
reduce the amount of double-word operations left in the IL.
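
As a source-level analogy for what lowering a double-word operation means, here is a minimal sketch of a 64-bit add written as two word-sized adds with a carry. lower-subreg itself works on RTL, not on C, so this only illustrates the idea; the dword type and the helper names are made up for the example. Once the value is split like this, the two halves are independent word-sized values and nothing in the code requires them to live in adjacent registers.

```c
#include <stdint.h>

/* A double-word value kept as two independent word-sized halves. */
typedef struct { uint32_t lo, hi; } dword;

/* Split a 64-bit value into its halves. */
static dword split(uint64_t v)
{
    dword d;
    d.lo = (uint32_t) v;
    d.hi = (uint32_t) (v >> 32);
    return d;
}

/* Join the halves back into a 64-bit value. */
static uint64_t join(dword d)
{
    return ((uint64_t) d.hi << 32) | d.lo;
}

/* 64-bit add expressed purely with 32-bit operations. */
static dword add_lowered(dword x, dword y)
{
    dword r;
    r.lo = x.lo + y.lo;
    r.hi = x.hi + y.hi + (r.lo < x.lo);  /* carry out of the low word */
    return r;
}
```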


jeff




Re: combine two load insns

2010-12-08 Thread Ian Lance Taylor
Paul Koning  writes:

> This probably has been discussed at length in the past, but as a
> relative newcomer I'll make this observation...  I wonder how much is
> lost by GCC's insistence that multi-register values must be in
> adjacent registers.  Obviously that's hard to change (the registers
> would have to be explicitly listed instead of implied by the first
> register number).  And in some cases it is actually required.  But in
> many cases, it's not (in some machines, never).  And I would think
> that register allocation could benefit from not having such a
> restriction.  The item in question here is just one example.

You may want to look at the lower-subreg pass.

Ian


Re: combine two load insns

2010-12-08 Thread Paul Koning

On Dec 8, 2010, at 11:37 AM, Jeff Law wrote:

> On 12/08/10 09:18, Frederic Riss wrote:
>> 
>> OK, I see your point, but I tend to think the odds of the register
>> allocator being able to coalesce the additional DI->SI moves in the
>> pre-IRA approach are by far higher than the odds of having merge
>> candidates after register allocation.
> I agree, but note that failure to coalesce leads to code quality regression.
> 
> Also note that handling of double-word values is, IMHO, the allocator's 
> biggest problem area.  This has been greatly helped by Bernd's recent work, 
> but there's still significant amounts of work to do here.

This probably has been discussed at length in the past, but as a relative 
newcomer I'll make this observation...  I wonder how much is lost by GCC's 
insistence that multi-register values must be in adjacent registers.  Obviously 
that's hard to change (the registers would have to be explicitly listed instead 
of implied by the first register number).  And in some cases it is actually 
required.  But in many cases, it's not (in some machines, never).  And I would 
think that register allocation could benefit from not having such a 
restriction.  The item in question here is just one example.

paul



Re: PowerPC optimization regression

2010-12-08 Thread David Edelsohn
On Wed, Dec 8, 2010 at 4:37 AM, Joakim Tjernlund
 wrote:
>
> I have noticed gcc 4.4.5 often produces less optimized code
> than the old 3.4.6. Below is the latest example. I am
> starting to wonder if I need rebuild gcc 4.4.5 and/or
> add new options to gcc when I compile. Any insight?

Jocke,

As Ian mentioned, please open a performance regression bug report on
GCC Bugzilla and add me, Mike Meissner and Peter Bergner to the CC
list.

This might be a lingering result of the GCC SSA transition and not a
PowerPC-specific regression, although the symptom has more impact on
PowerPC.

Thanks, David


Re: combine two load insns

2010-12-08 Thread Jeff Law

On 12/08/10 09:18, Frederic Riss wrote:
> OK, I see your point, but I tend to think the odds of the register
> allocator being able to coalesce the additional DI->SI moves in the
> pre-IRA approach are by far higher than the odds of having merge
> candidates after register allocation.

I agree, but note that failure to coalesce leads to code quality regression.

Also note that handling of double-word values is, IMHO, the allocator's
biggest problem area.  This has been greatly helped by Bernd's recent
work, but there's still significant amounts of work to do here.

> I agree with your suggestion of
> being able to do that in the scheduler though, it might be a good fit,
> even if it's not a scheduling issue in the first place.

It may not be a scheduling issue, but it's been known for 20 years that
GCC's scheduler has the necessary bits to do these kinds of memory
optimizations.   We've just never taken the time to utilize the
dependency information available in the scheduler in any way other than
to reorder insns to improve pipeline behavior.   One could even argue
that the dependency info in the scheduler should be pushed out to other
passes that could easily make use of such information.


Jeff



Re: software pipelining

2010-12-08 Thread roy rosen
I have tried to play a bit with SMS on ia64 and I can't understand
what it is doing.
It seems that instead of getting some of the first insns out of the
loop into the prologue it simply gets an entire iteration out of the
loop and the loop's content stays approximately the same.

For example for

void x(long long*  y, long long* x)
{
    int i;
    for (i = 0; i < 100; i++)
    {
        *x = *y;
        x+=20;y+=30;
    }
}

with ./cc1 ./a.c -O3 -fmodulo-sched.
Can someone show an example where it actually works as it should?
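
For reference, the shape being asked for can be written out by hand at the source level. The sketch below is not what GCC emits; it only shows a prologue, a kernel and an epilogue for the loop above, and the name x_pipelined is made up. In the kernel, the store belongs to iteration i and the load to iteration i+1, so the two no longer depend on each other through the loaded value. The store is kept ahead of the load in program order so the result is the same even if x and y overlap.

```c
/* Hand-pipelined version of the copy loop: 100 loads, 100 stores. */
void x_pipelined(long long *y, long long *x)
{
    int i;
    long long t;

    t = *y;                 /* prologue: load for iteration 0 */
    y += 30;

    for (i = 0; i < 99; i++)
    {
        *x = t;             /* kernel: store for iteration i ...      */
        x += 20;
        t = *y;             /* ... next to the load for iteration i+1 */
        y += 30;
    }

    *x = t;                 /* epilogue: store for iteration 99 */
}
```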

Roy.

2010/11/10 Andrey Belevantsev :
> Hi,
>
> On 10.11.2010 12:32, roy rosen wrote:
>>
>> Hi,
>>
>> I was wondering if gcc has software pipelining.
>> I saw options -fsel-sched-pipelining -fselective-scheduling
>> -fselective-scheduling2 but I don't see any pipelining happening
>> (tried with ia64).
>> Is there a gcc VLIW port in which I can see it working?
>
> You need to try -fmodulo-sched.  Selective scheduling works by default on
> ia64 with -O3, otherwise you need -fselective-scheduling2
> -fsel-sched-pipelining.  Note that selective scheduling disables autoinc
> generation for the pipelining to work, and modulo scheduling will likely
> refuse to pipeline a loop with autoincs.
>
> Modulo scheduling implementation in GCC may be improved, but that's a
> different topic.
>
> Andrey
>
>>
>> For an example function like
>>
>> int nor(char* __restrict__ c, char* __restrict__ d)
>> {
>>     int i, sum = 0;
>>     for (i = 0; i<  256; i++)
>>         d[i] = c[i]<<  3;
>>     return sum;
>> }
>>
>> with no pipelining a code like
>>
>> r1 = 0
>> r2 = c
>> r3 = d
>> _startloop
>> if r1 == 256 jmp _end
>> r4 = [r2]+
>> r4 <<= 3
>> [r3]+ = r4
>> r1++
>> jmp _startloop
>> _end
>>
>> here inside the loop there is a data dependency between all 3 insns
>> (only the r1++ is independent) which does not permit any parallelism
>>
>> with pipelining I expect a code like
>>
>> r1 = 2
>> r2 = c
>> r3 = d
>> // peel first iteration
>> r4 = [r2]+
>> r4 <<= 3
>> r5 = [r2]+
>> _startloop
>> if r1 == 256 jmp _end
>> [r3]+ = r4 ; r4 = r5 << 3 ; r5 = [r2]+
>> r1++
>> jmp _startloop
>> _end
>>
>> Now the data dependency is broken and parallelism is possible.
>> As I said I could not see that happening.
>> Can someone please tell me on which port and with what options can I
>> get such a result?
>>
>> Thanks, Roy.
>
>


Re: combine two load insns

2010-12-08 Thread Frederic Riss
On 8 December 2010 15:39, Jeff Law  wrote:
> On 12/08/10 01:40, Frederic Riss wrote:
>> Sorry, I think I wasn't clear. I didn't mean constraints in terms of
>> RTL template constraints, but 'constraints' coming from the new DI
>> destination of the load. More specifically: 2 SI loads can target
>> totally independent registers whereas a standard DI load must target a
>> contiguous SI register pair. If you don't do that before IRA, it will
>> most likely be impossible to do cleanly, won't it?
>
> I tend to look at it the other way -- prior to allocation & reload you're
> going to have two SImode pseudos and there's no way to guarantee they'll end
> up in consecutive hard registers.  You'd have to create a new DImode pseudo
> as the destination of the memory load, then copy from the DImode pseudo into
> the two SImode pseudos and rely on the register allocator to allocate the
> DImode pseudo to the same hard registers as the two SImode pseudos.  There's
> no guarantee that'll happen (it often will, but in the cases where it
> doesn't you end up with useless copies).
>
> With that in mind, I tend to see the right way to address this optimization
> as an optimization which runs *after* register allocation and reloading
> where we know the precise set of registers used and thus can determine if
> two SImode loads target a pair of consecutive registers and thus are
> potential candidates for merging the SImode loads into a DImode load.  The
> difficulty here is the data dependency analysis, thus my suggestion that the
> scheduler's dependency analysis be used to drive this optimization.

OK, I see your point, but I tend to think the odds of the register
allocator being able to coalesce the additional DI->SI moves in the
pre-IRA approach are by far higher than the odds of having merge
candidates after register allocation. I agree with your suggestion of
being able to do that in the scheduler though; it might be a good fit,
even if it's not a scheduling issue in the first place.

Fred


Re: combine two load insns

2010-12-08 Thread Jeff Law

On 12/08/10 01:40, Frederic Riss wrote:
> On 8 December 2010 00:12, Jeff Law  wrote:
>> On 12/07/10 12:29, Frédéric RISS wrote:
>>> On Tuesday 07 December 2010 at 06:18 -0700, Jeff Law wrote:
>>>> On 12/06/10 15:07, Ian Lance Taylor wrote:
>>>> Given the two loads don't have a def-use data dependency combine won't
>>>> ever get the opportunity to do anything with them.  In general there is
>>>> no pass which combines insns without a true data dependency and targets
>>>> which have such insns have had to handle those combinations in machine
>>>> dependent reorg.  In fact, it was the combination of independent insns
>>>> which led to the introduction of the machine dependent reorg pass eons
>>>> ago.
>>>
>>> The issue with this approach is that reorg runs very late. I suppose
>>> that if one wants to combine 2 SI loads into a DI load, it needs to be
>>> done before IRA to satisfy the generated register constraints.
>>
>> Constraints aren't checked until after register allocation is complete --
>> they're going to be of no help in performing this optimization.  Right now
>> the machine dependent reorg pass or a peephole are the only places this
>> optimization can be performed.  However, I believe it would be possible to
>> make the scheduler perform this optimization with some work.
>
> Sorry, I think I wasn't clear. I didn't mean constraints in terms of
> RTL template constraints, but 'constraints' coming from the new DI
> destination of the load. More specifically: 2 SI loads can target
> totally independent registers whereas a standard DI load must target a
> contiguous SI register pair. If you don't do that before IRA, it will
> most likely be impossible to do cleanly, won't it?

I tend to look at it the other way -- prior to allocation & reload 
you're going to have two SImode pseudos and there's no way to guarantee 
they'll end up in consecutive hard registers.  You'd have to create a 
new DImode pseudo as the destination of the memory load, then copy from 
the DImode pseudo into the two SImode pseudos and rely on the register 
allocator to allocate the DImode pseudo to the same hard registers as 
the two SImode pseudos.  There's no guarantee that'll happen (it often 
will, but in the cases where it doesn't you end up with useless copies).


With that in mind, I tend to see the right way to address this 
optimization as an optimization which runs *after* register allocation 
and reloading where we know the precise set of registers used and thus 
can determine if two SImode loads target a pair of consecutive registers 
and thus are potential candidates for merging the SImode loads into a 
DImode load.  The difficulty here is the data dependency analysis, thus 
my suggestion that the scheduler's dependency analysis be used to drive 
this optimization.
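
To make the two forms concrete, here is a source-level sketch of the same data fetched by two word loads and by one double-word load followed by the DI->SI split. It says nothing about how GCC would implement the merge; the struct and function names are hypothetical. The split goes through memory order, so the comparison does not depend on endianness.

```c
#include <stdint.h>
#include <string.h>

/* Two 32-bit values, as they would sit in a pair of SImode registers. */
struct si_pair { uint32_t first, second; };

/* Two separate 32-bit loads from adjacent words. */
static struct si_pair load_two_si(const uint32_t *p)
{
    struct si_pair r;
    r.first = p[0];
    r.second = p[1];
    return r;
}

/* One 64-bit load, then the copies into the two halves. */
static struct si_pair load_one_di(const uint32_t *p)
{
    uint64_t wide;
    struct si_pair r;

    memcpy(&wide, p, sizeof wide);  /* the merged double-word load */
    memcpy(&r, &wide, sizeof r);    /* split back in memory order */
    return r;
}
```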


jeff

Fred




Re: PowerPC optimization regression

2010-12-08 Thread Ian Lance Taylor
Joakim Tjernlund  writes:

> I have noticed gcc 4.4.5 often produces less optimized code
> than the old 3.4.6. Below is the latest example. I am
> starting to wonder if I need rebuild gcc 4.4.5 and/or
> add new options to gcc when I compile. Any insight?

This question as stated is not really appropriate for the mailing list
gcc@gcc.gnu.org, which is for discussion about the development of gcc
itself.  It would be appropriate for the mailing list
gcc-h...@gcc.gnu.org.

Unfortunately I don't have any good news here.  Building gcc differently
won't help, and it looks like you are using appropriate options.  I
think this is an optimization regression.  I encourage you to file a bug
report about it.

Ian


Re: "ld -r" on mixed IR/non-IR objects (

2010-12-08 Thread H.J. Lu
On Wed, Dec 8, 2010 at 1:19 AM, Andi Kleen  wrote:
>> On 12/07/2010 04:20 PM, Andi Kleen wrote:
>>>
>>> The only problem left is mixing of lto and non lto objects. this right
>>> now is not handled. IMHO still the best way to handle it is to use
>>> slim lto and then simply separate link the "left overs" after deleting
>>> the LTO objects. This can be actually done with objcopy (with some
>>> limitations), doesn't even need linker support.
>>>
>>
>> Quite possibly a better way to deal with that is to provide a mechanism
>> for encapsulating arbitrary binary code objects inside the LTO IR.
>
> Then you would need to teach your assembler and everything

The magic section is generated by the linker directly. No changes to
the assembler are required.

> else that may generate ELF objects to generate this magic object. But why
> not just ELF directly? that is what it is after all.

My proposal isn't specific to ELF.

>
> To be honest I don't really see the point of all this complexity you
> guys are proposing just to save fat LTO. Fat LTO is always a bad idea
> because it's slow and does lots of redundant work. If LTO is to become
> a more widespread mode, fat LTO has to go simply because of the poor
> performance.
>
> With slim LTO, passthrough is very straightforward: simply pass
> through every section that is not LTO and generate code for the LTO
> sections. No new magic sections needed at all.
>

My proposal works on both fat and slim LTO objects.  The idea is
that you can use "ld -r" on any combination of inputs, and its output
still works the same way the inputs did before "ld -r".

-- 
H.J.


GCC 4.5.2 Release Candidate available from gcc.gnu.org

2010-12-08 Thread Richard Guenther

A release candidate for GCC 4.5.2 is available from

 ftp://gcc.gnu.org/pub/gcc/snapshots/4.5.2-RC-20101208

and shortly its mirrors.  It has been generated from SVN revision 167585.

I have so far bootstrapped and tested the release candidate on
x86_64-linux; bootstraps and tests on
{i686,ia64,ppc,ppc64,s390,s390x}-linux are running.

Please test it and report any issues to bugzilla.

The branch remains frozen and all checkins until after the final release
of GCC 4.5.2 require explicit RM approval.

If all goes well, I'd like to release 4.5.2 early next week.


Richard.

-- 
Richard Guenther 
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex


GCC 4.5 branch frozen for release (candidate)

2010-12-08 Thread Richard Guenther

The GCC 4.5 branch is now frozen in preparation for a release candidate
of GCC 4.5.2 and a release of GCC 4.5.2 about a week later.

Please refrain from checking in any patches to the branch without
an explicit approval from a release manager.

Thanks,
Richard.


Re: question about alias-analysis in gcc 4.5

2010-12-08 Thread Richard Guenther
On Tue, Dec 7, 2010 at 8:31 PM, Eugen Wagner
 wrote:
> Hi,
> Are any kinds of flow-dependent points-to analysis computed on gimple
> in ssa form?
> in which pass?

In tree-ssa-structalias.c we compute points-to analysis.  It is flow-sensitive
only for pointers in SSA form.
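
A small illustration of the distinction, as a sketch only. In ssa_pointer the local pointer p is never address-taken, so it becomes an SSA name and every assignment creates a new version with its own points-to set. In memory_pointer the pointer lives in a global, i.e. in memory, so one would expect a single set covering all stores to it. The points-to sets in the comments are what one would expect, not dumped from GCC, and the function names are made up.

```c
static int a, b;
static int *gp;             /* pointer that lives in memory */

int ssa_pointer(int flag)
{
    int *p;

    a = b = 0;
    p = &a;                 /* p_1: expected to point to { a } */
    *p = 1;
    if (flag)
        p = &b;             /* p_2: expected to point to { b } */
    *p = 2;                 /* p_3 = PHI (p_1, p_2): { a, b } */
    return a + b;
}

int memory_pointer(int flag)
{
    a = b = 0;
    gp = &a;                /* gp: one set for all program points, */
    *gp = 1;                /* expected to be { a, b } */
    if (flag)
        gp = &b;
    *gp = 2;
    return a + b;
}
```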

Richard.

>
> regards,
> Eugen
>


PowerPC optimization regression

2010-12-08 Thread Joakim Tjernlund

I have noticed gcc 4.4.5 often produces less optimized code
than the old 3.4.6. Below is the latest example. I am
starting to wonder if I need rebuild gcc 4.4.5 and/or
add new options to gcc when I compile. Any insight?

 Jocke

const char *test(int i)
{
    const char *p = "abc\0def\0gef";
    for(; i; --i)
        while(*++p);
    return p;
}

/* gcc 4.4.5 -O2 -S
.section ".text"
.align 2
.globl test
.type   test, @function
test:
mr. 0,3
mtctr 0
beq- 0,.L10
lis 3,.lanch...@ha
la 3,.lanch...@l(3)
.L8:
lbzu 0,1(3)
cmpwi 7,0,0
bne+ 7,.L8
bdnz .L8
blr
.L10:
lis 3,.lanch...@ha
la 3,.lanch...@l(3)
blr
.size   test, .-test
.section .rodata
.align 2
.set .LANCHOR0,. + 0
.LC0:
.string "abc"
.string "def"
.string "gef"
.ident  "GCC: (Gentoo 4.4.5 p1.0, pie-0.4.5) 4.4.5"
 */
/* gcc 4.4.5 -Os -S
.globl test
.type   test, @function
test:
mr 9,3
lis 3,.lanch...@ha
la 3,.lanch...@l(3)
b .L2
.L5:
lbzu 0,1(3)
cmpwi 7,0,0
bne+ 7,.L5
addi 9,9,-1
.L2:
cmpwi 7,9,0
bne+ 7,.L5
blr
.size   test, .-test
.section .rodata
.set .LANCHOR0,. + 0
.LC0:
.string "abc"
.string "def"
.string "gef"
.ident  "GCC: (Gentoo 4.4.5 p1.0, pie-0.4.5) 4.4.5"
 */

/* gcc 3.4.6 -Os -S and gcc -O2 -S
.section .rodata
.align 2
.LC0:
.string "abc"
.string "def"
.string "gef"
.section ".text"
.align 2
.globl test
.type   test, @function
test:
mr. 0,3
lis 9,@ha
la 3,@l(9)
mtctr 0
beqlr- 0
.L13:
lbzu 0,1(3)
cmpwi 7,0,0
bne- 7,.L13
bdnz .L13
blr
.size   test, .-test
.section .note.GNU-stack,"",@progbits
.ident  "GCC: (GNU) 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.9)"
*/



Re: "ld -r" on mixed IR/non-IR objects (

2010-12-08 Thread Andi Kleen
> On 12/07/2010 04:20 PM, Andi Kleen wrote:
>>
>> The only problem left is mixing of lto and non lto objects. this right
>> now is not handled. IMHO still the best way to handle it is to use
>> slim lto and then simply separate link the "left overs" after deleting
>> the LTO objects. This can be actually done with objcopy (with some
>> limitations), doesn't even need linker support.
>>
>
> Quite possibly a better way to deal with that is to provide a mechanism
> for encapsulating arbitrary binary code objects inside the LTO IR.

Then you would need to teach your assembler and everything
else that may generate ELF objects to generate this magic object. But why
not just ELF directly? That is what it is, after all.

To be honest I don't really see the point of all this complexity you
guys are proposing just to save fat LTO. Fat LTO is always a bad idea
because it's slow and does lots of redundant work. If LTO is to become
a more widespread mode, fat LTO has to go simply because of the poor
performance.

With slim LTO, passthrough is very straightforward: simply pass
through every section that is not LTO and generate code for the LTO
sections. No new magic sections needed at all.

-Andi



Re: combine two load insns

2010-12-08 Thread Frederic Riss
On 8 December 2010 00:12, Jeff Law  wrote:
> On 12/07/10 12:29, Frédéric RISS wrote:
>>
>> On Tuesday 07 December 2010 at 06:18 -0700, Jeff Law wrote:
>>>
>>> On 12/06/10 15:07, Ian Lance Taylor wrote:
>>> Given the two loads don't have a def-use data dependency combine won't
>>> ever get the opportunity to do anything with them.  In general there is
>>> no pass which combines insns without a true data dependency and targets
>>> which have such insns have had to handle those combinations in machine
>>> dependent reorg.  In fact, it was the combination of independent insns
>>> which led to the introduction of the machine dependent reorg pass eons
>>> ago.
>>
>> The issue with this approach is that reorg runs very late. I suppose
>> that if one wants to combine 2 SI loads into a DI load, it needs to be
>> done before IRA to satisfy the generated register constraints.
>
> Constraints aren't checked until after register allocation is complete --
> they're going to be of no help in performing this optimization.  Right now
> the machine dependent reorg pass or a peephole are the only places this
> optimization can be performed.    However, I believe it would be possible to
> make the scheduler perform this optimization with some work.

Sorry, I think I wasn't clear. I didn't mean constraints in terms of
RTL template constraints, but 'constraints' coming from the new DI
destination of the load. More specifically: 2 SI loads can target
totally independent registers whereas a standard DI load must target a
contiguous SI register pair. If you don't do that before IRA, it will
most likely be impossible to do cleanly, won't it?

Fred