Boehm-gc performance data

2006-06-23 Thread Laurynas Biveinis

I'm still waiting for the testsuite to complete (it's been running
for about 24 hours so far). In the meantime I'd like to discuss
the first performance results, which I've put on the Wiki:

First number is GCC with Boehm's GC and the number in parentheses is
GCC with page collector.

combine.c: top mem usage: 52180k (13915k). GC execution time 0.66
(0.61) 4% (4%). User running time: 0m16 (0m14).

reload.c: top mem usage: 35764k (10049k). GC execution time 0.44
(0.53) 5% (6%). User running time: 0m10 (0m9).

PR/8361 (C++): top mem usage: 289128k (62510k). GC execution time 3.97
(5.77) 5% (9%). User running time: 1m17 (1m6). System running time:
0m2 (0m2).

PR/19614 (C++): top mem usage: 289140k (139520k). GC execution time 5
(4.68) 15% (17%). User running time: 0m35 (0m27). System running time:
0m1 (0m1).
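
To put the peak-memory figures above in proportion, here is a small C++ sketch that computes the Boehm-to-page-collector ratio per test case. The struct and function names are made up for this illustration; only the numbers come from the results above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the results: peak memory in kilobytes for each collector.
// The type and field names are illustrative, not part of GCC.
struct PeakMem {
    std::string test;
    double boehm_k;  // GCC with Boehm's GC
    double page_k;   // GCC with the page collector
};

// The peak memory figures quoted in the message above.
inline std::vector<PeakMem> peak_mem_results() {
    return {
        {"combine.c", 52180.0, 13915.0},
        {"reload.c", 35764.0, 10049.0},
        {"PR/8361", 289128.0, 62510.0},
        {"PR/19614", 289140.0, 139520.0},
    };
}

// How many times more memory the Boehm build needed at its peak.
inline double peak_ratio(const PeakMem& r) {
    assert(r.page_k > 0.0);
    return r.boehm_k / r.page_k;
}
```

With these inputs the ratios fall between roughly 2x (PR/19614) and 4.6x (PR/8361).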

My observations and some hypotheses:
1. Top memory usage is rather bad - the collector is not as aggressive
as it should be. I've done a sanity check and verified that the number
of GC allocated bytes is the same in both cases.
2. The GC share of total runtime has decreased, but the total runtime
has increased - caused by more efficient GC algorithms, but worse
locality of the allocated data?

Why this data might have some inaccuracies:
1. The debugging version of Boehm's GC API is used now, though the
collector itself is not compiled with debug options. I will re-run the
tests with the non-debugging API, but I don't expect a significant
impact on the results.

2. The decision when to collect is made a bit differently than with
the old collectors. They track the increase in the number of allocated
bytes in the GC heap, whereas I track the increase of the GC heap size
itself. Again, it will be easy to re-run the tests the other way around.
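
The difference between the two triggers can be sketched as follows. This is only an illustration under assumptions: the struct, the function names and the `growth` parameter are hypothetical and are not taken from GCC's collectors or from Boehm's GC.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping for one point in time.
struct HeapSnapshot {
    std::size_t allocated_bytes;  // bytes currently handed out to objects
    std::size_t heap_bytes;       // total size of the GC heap itself
};

// Trigger on growth of allocated bytes since the last collection
// (the style described above for the old collectors).
inline bool collect_on_allocation_growth(const HeapSnapshot& at_last_gc,
                                         const HeapSnapshot& now,
                                         double growth) {
    assert(growth >= 0.0);
    return static_cast<double>(now.allocated_bytes) >
           static_cast<double>(at_last_gc.allocated_bytes) * (1.0 + growth);
}

// Trigger on growth of the heap size since the last collection
// (the style described above for the Boehm-based build).
inline bool collect_on_heap_growth(const HeapSnapshot& at_last_gc,
                                   const HeapSnapshot& now,
                                   double growth) {
    assert(growth >= 0.0);
    return static_cast<double>(now.heap_bytes) >
           static_cast<double>(at_last_gc.heap_bytes) * (1.0 + growth);
}
```

If the heap already has free space, allocations can grow a lot without the heap itself growing, so the second trigger may fire later than the first for the same program.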

All in all, IMHO this data argues against Boehm's GC in GCC. But
before deciding I would like to enable generational GC features, if
that will help with run time. On the other hand, I don't see how peak
memory usage could be reduced.

What do you think?

--
Laurynas


Re: Boehm-gc performance data

2006-06-23 Thread Steven Bosscher

On 6/23/06, Laurynas Biveinis <[EMAIL PROTECTED]> wrote:

All in all, IMHO this data argues against Boehm's GC in GCC. But
before deciding I would like to enable generational GC features, if
that will help with run time. On the other hand, I don't see how peak
memory usage could be reduced.

What do you think?


First of all, I think I'm impressed by how quickly you've done all this.

Don't write off Boehm's GC just yet.  You can't expect to beat
something that has seen a lot of tuning for GCC with something that
you got working only a few days ago. There are a lot of special tricks
especially in ggc-page that may put it at an advantage, but with some
tuning perhaps you can get Boehm's to perform better for GCC.

For the locality thing: Have you already tried using something like
cachegrind or oprofile to compare the cache behavior of gcc with
Boehm's and gcc with ggc?  What about allocation strategies?  Perhaps
that's another thing you could toy with to improve the peak memory
usage issue. I don't know how Boehm's GC works, but in ggc-page e.g.
all binary expression 'tree's are allocated on the same bag of pages,
which may help (or not, dunno).

Keep up the good work!

Gr.
Steven


Re: Boehm-gc performance data

2006-06-23 Thread Mike Stump

On Jun 23, 2006, at 8:51 AM, Laurynas Biveinis wrote:

First number is GCC with Boehm's GC and the number in parentheses is
GCC with page collector.

combine.c: top mem usage: 52180k (13915k). GC execution time 0.66
(0.61) 4% (4%). User running time: 0m16 (0m14).


Are these with checking on or off?  Normally checking is on, you have  
to go out of your way to turn it off.  If it were on, the real  
numbers are going to look much worse than the ones you've presented.


Also, I've not been following real closely, but the GTY markers are  
used by PCH and the dual use of them by GC allows one to find PCH bugs
more quickly and easily.  If we moved entirely to Boehm's, did you  
have a plan for the GTY markers and PCH?


Re: Boehm-gc performance data

2006-06-23 Thread Andrew Pinski
> 
> On Jun 23, 2006, at 8:51 AM, Laurynas Biveinis wrote:
> > First number is GCC with Boehm's GC and the number in parentheses is
> > GCC with page collector.
> >
> > combine.c: top mem usage: 52180k (13915k). GC execution time 0.66
> > (0.61) 4% (4%). User running time: 0m16 (0m14).
> 
> Are these with checking on or off?  Normally checking is on, you have  
> to go out of your way to turn it off.  If it were on, the real  
> numbers are going to look much worse than the ones you've presented.
> 
> Also, I've not been following real closely, but the GTY markers are  
> used by PCH and the dual use of them by GC allows one to find PCH bugs
> more quickly and easily.  If we moved entirely to Boehm's, did you  
> have a plan for the GTY markers and PCH?

GTY markers are still used to mark roots with the boehm-gc.

Thanks,
Andrew Pinski



Re: Getting to the GCC Summit web page

2006-06-23 Thread Dan Kegel

Thanks!  I put an updated page up at
  http://kegel.com/gcc/summit2006.html

I won't be attending myself this year (I needed a break from travel),
but if anyone's blogging the event, please let me know and I'll
link to their blog from my page.
- Dan

On 6/23/06, Andrey Belevantsev <[EMAIL PROTECTED]> wrote:

Hi Daniel,

Last year, when I was at the GCC Summit for the first time, I found
your web page with directions on how to get there really helpful
(http://kegel.com/gcc/summit2005.html).  By now, some links from the
page are not working:

1.  The transitway info and map is now at the same page at
http://www.octranspo.com/mapscheds/transitway/transitway_map.html
instead of http://www.octranspo.com/mapscheds/transitway/tway_map.html

2.  Mackenzie King is now
http://www.octranspo.com/mapscheds/transitway/station_layout.asp?station_id=MAC

instead of http://www.octranspo.com/mapscheds/transitway/mackenzie_king.htm

3. Area walking map is now
http://www.octranspo.com/mapscheds/transitway/area_map.asp?station_id=MAC
instead of
http://www.octranspo.com/mapscheds/transitway/areamaps/mackenzie_king_area.htm

All the others seem to be ok.  Hope that helps.

Andrey






--
Wine for Windows ISVs: http://kegel.com/wine/isv


Fwd: Lots of gfortrans testsuite failuers on sparc64-linux: undefined reference to `_gfortran_reshape_r8

2006-06-23 Thread Christian Joensson

Bugger, this went to testresults instead of here... sorry for that...

-- Forwarded message --
From: Christian Joensson <[EMAIL PROTECTED]>
Date: Jun 23, 2006 8:09 PM
Subject: Lots of gfortrans testsuite failuers on sparc64-linux:
undefined reference to `_gfortran_reshape_r8
To: [EMAIL PROTECTED]


Aurora SPARC Linux release 2.1 (Snowshoe FC3)/TI UltraSparc IIi (Sabre) sun4u:

binutils 2.17.50 20060610
bison-1.875c-2.sparc
dejagnu-1.4.4-2.noarch
expect-5.42.1-1.sparc
gcc-3.4.2-6.fc3.sparc
gcc-c++-3.4.2-6.fc3.sparc
gcc-gnat-3.4.2-6.fc3.sparc
glibc-2.3.6-0.fc3.1.sparc64
glibc-2.3.6-0.fc3.1.sparcv9
glibc-devel-2.3.6-0.fc3.1.sparc64
glibc-devel-2.3.6-0.fc3.1.sparc
glibc-headers-2.3.6-0.fc3.1.sparc
glibc-kernheaders-2.6-20sparc.sparc
gmp-4.1.4-3sparc.sparc
gmp-4.1.4-3sparc.sparc64
gmp-devel-4.1.4-3sparc.sparc
gmp-devel-4.1.4-3sparc.sparc64
kernel-2.6.16-1.2241sp1.sparc64
kernel-devel-2.6.16-1.2241sp1.sparc64
libgcc-3.4.2-6.fc3.sparc
libgcc-3.4.2-6.fc3.sparc64
libgcj-3.4.2-6.fc3.sparc
libgcj-devel-3.4.2-6.fc3.sparc
libstdc++-3.4.2-6.fc3.sparc
libstdc++-3.4.2-6.fc3.sparc64
libstdc++-devel-3.4.2-6.fc3.sparc
libstdc++-devel-3.4.2-6.fc3.sparc64
make-3.80-5.sparc
nptl-devel-2.3.6-0.fc3.1.sparcv9
tcl-8.4.7-2.sparc

LAST_UPDATED: Thu Jun 22 17:11:44 UTC 2006 (revision 114896)

Platform: sparc64-unknown-linux-gnu
configure flags: --enable-__cxa_atexit --enable-shared --with-cpu=v7
--enable-languages=c,ada,c++,fortran,java,objc,obj-c++,treelang

I get a lot of gfortran testsuite failures like this:

PASS: gfortran.dg/append-1.f90  -Os  execution test
Executing on host:
/usr/local/src/trunk/objdir/gcc/testsuite/gfortran/../../gfortran
-B/usr/local/src/trunk/objdir/gcc/testsuite/gfortran/../../
/usr/local/src/trunk/gcc/gcc/testsuite/gfortran.dg/array-1.f90   -O0
-pedantic-errors
-L/usr/local/src/trunk/objdir/sparc64-unknown-linux-gnu/64/libgfortran/.libs
-L/usr/local/src/trunk/objdir/sparc64-unknown-linux-gnu/64/libgfortran/.libs
-L/usr/local/src/trunk/objdir/sparc64-unknown-linux-gnu/64/libiberty
-lm   -m64 -o ./array-1.exe(timeout = 1800)
/tmp/ccwsoiqs.o: In function `MAIN__':
array-1.f90:(.text+0x33c): undefined reference to `_gfortran_reshape_r8'
collect2: ld returned 1 exit status
compiler exited with status 1
output is:
/tmp/ccwsoiqs.o: In function `MAIN__':
array-1.f90:(.text+0x33c): undefined reference to `_gfortran_reshape_r8'
collect2: ld returned 1 exit status

FAIL: gfortran.dg/array-1.f90  -O0  (test for excess errors)
Excess errors:
array-1.f90:(.text+0x33c): undefined reference to `_gfortran_reshape_r8'

WARNING: gfortran.dg/array-1.f90  -O0  compilation failed to produce executable

Any ideas?

The FAILS were not present in my last test suite run...
http://gcc.gnu.org/ml/gcc-testresults/2006-06/msg01081.html

Would you like me to file a bug?

--
Cheers,

/ChJ




RE: Intermixing powerpc-eabi and powerpc-linux C code

2006-06-23 Thread Meissner, Michael
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
> Ron McCall
> Sent: Thursday, June 01, 2006 2:33 PM
> To: gcc@gcc.gnu.org
> Subject: Intermixing powerpc-eabi and powerpc-linux C code
> 
> Hi!
> 
> Does anyone happen to know if it is possible to link
> (and run) C code compiled with a powerpc-eabi targeted
> gcc with C code compiled with a powerpc-linux targeted
> gcc?  The resulting program would be run on a PowerPC
> Linux system (ELDK 4.0).

When I last played with the powerpc many years ago, the main differences
between Linux and eabi were some details that you may or may not run into
(note these are from memory, so you probably need to check what the
current reality is):
1) eabi had different stack alignments than Linux;
2) eabi uses 2 small data registers (r2, r13) and Linux only 1 (r13?).
3) There are eabi relocations not officially in Linux and vice versa,
but the GNU linker should support any relocations the compiler uses.
4) eabi can be little endian, Linux is only big endian.
5) different system libraries were linked in by default. 

--
Michael Meissner
AMD, MS 83-29
90 Central Street
Boxborough, MA 01719





Project RABLET

2006-06-23 Thread Andrew MacLeod

Last fall I produced the RABLE document which described the approach I
thought should be taken to write a new register allocator for GCC.

A new register allocator written from scratch is a very long term
project (measured in years), and there is no guarantee after all that
work that we'd end up with something which is remarkably better. One
would hope that it is a lot more maintainable, but the generated code is
a crapshoot. It will surely look better but will it really run faster?
The current plate of spaghetti we call the register allocator has had a
lot of fine tuning go into it over the years, and it generally generates
pretty darn good code IF it doesn't have to spill much, which is much of
the time.

This describes my current work-in-progress, RABLET, which stands for
RABLE-Themes, and conveniently implies something smaller. Rather than
write a new allocator, I think there are things we can do that are a lot
less work which will reap many of the benefits. This works from the
premise that we generate good code when we don't have to spill, so try
to detect early that we are going to spill and do something about it at
a point where we have good analysis.

THEMES
------

1 - One of the core themes in RABLE was very early selection of
instructions from patterns.  RTL patterns are initially chosen by the
EXPAND pass. EXPAND tends to generate better rtl patterns by being
handed complex trees which it can process and get better combinations.

  When TREE-SSA was first implemented, we got very poor RTL because
expand was seeing very small trees.  TER (Temporary Expression
Replacement) was invented, which mashed any single-def/single-use
ssa_names together into more complex trees. This gave expand a better
chance of selecting better instructions, and made a huge difference.
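
The idea behind TER can be sketched in a few lines of C++. This is a toy model under stated assumptions, not GCC's implementation: statements are plain `lhs = a op b` triples, the names `Stmt` and `ter_sketch` are invented here, and the checks a real pass needs (for example that nothing between the definition and the use invalidates the expression) are left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A toy three-address statement: lhs = a op b.
struct Stmt {
    std::string lhs;
    std::string a, op, b;
};

// Fold every name that has exactly one definition and exactly one use
// into the expression that uses it, so later consumers see larger trees.
inline std::vector<std::string> ter_sketch(const std::vector<Stmt>& stmts) {
    std::map<std::string, int> defs, uses;
    for (const Stmt& s : stmts) {
        assert(!s.lhs.empty());
        ++defs[s.lhs];
        ++uses[s.a];
        ++uses[s.b];
    }

    std::map<std::string, std::string> pending;  // temporary -> expression text
    std::vector<std::string> out;

    // Return the operand itself, or its parenthesized defining expression
    // if the operand is a temporary waiting to be substituted.
    auto operand = [&pending](const std::string& name) -> std::string {
        auto it = pending.find(name);
        if (it == pending.end()) return name;
        return "(" + it->second + ")";
    };

    for (const Stmt& s : stmts) {
        std::string expr = operand(s.a) + " " + s.op + " " + operand(s.b);
        if (defs[s.lhs] == 1 && uses[s.lhs] == 1)
            pending[s.lhs] = expr;                // substituted at its single use
        else
            out.push_back(s.lhs + " = " + expr);  // kept as a real statement
    }
    return out;
}
```

For the input `t1 = a + b; t2 = t1 * c; x = t2 - d` the sketch emits the single statement `x = ((a + b) * c) - d`, which is the kind of larger tree expand benefits from.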

2 - Rematerialization/register pressure reduction was another major
component.  The tree-ssa optimizers pay no attention to potential
register pressure (which is as it should be). This sometimes results in
very high register pressure entering the back end.  RABLE defined passes
performing expression rematerialization and other register pressure
reducing techniques to reduce excessive register pressure before
actually trying to do allocation.  If you have 120 registers live at some
point, and only 16 physical registers, the allocator is clearly going to
have a difficult time. Try to present it with something more manageable.
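
As a rough illustration of what "register pressure" means here, the following C++ sketch computes the maximum number of simultaneously live values for straight-line code. It is a simplification under assumptions: live ranges are given as half-open intervals of instruction indices, there is a single register class, and the names are invented for this example. A real compiler derives liveness from dataflow over the control flow graph.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// A live range as a half-open interval [start, end) of instruction indices.
struct LiveRange {
    int start;
    int end;
};

// Maximum number of ranges that overlap at any point.
inline int max_register_pressure(const std::vector<LiveRange>& ranges) {
    std::vector<std::pair<int, int>> events;  // (position, +1 start / -1 end)
    for (const LiveRange& r : ranges) {
        assert(r.start <= r.end);
        events.push_back({r.start, +1});
        events.push_back({r.end, -1});
    }
    // Pairs sort by position first; at equal positions -1 sorts before +1,
    // so a range ending at p does not overlap one starting at p.
    std::sort(events.begin(), events.end());

    int live = 0, peak = 0;
    for (const auto& e : events) {
        live += e.second;
        peak = std::max(peak, live);
    }
    return peak;
}

// True if the peak pressure exceeds the available physical registers,
// i.e. the allocator would be forced to spill somewhere.
inline bool exceeds_registers(const std::vector<LiveRange>& ranges,
                              int physical_registers) {
    return max_register_pressure(ranges) > physical_registers;
}
```

A pressure-reduction pass would try to shorten or split ranges (for example by rematerializing a value near its use) until the peak is closer to the number of physical registers.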

RABLET CORE
-----------

On to the meat of the subject. RABLET involves reworking the out-of-ssa
pass, rewriting expand such that it is tightly integrated with the
out-of-ssa pass, and adding some new work to reduce register pressure.
That's a lot less work than a whole new register allocator.

If we look at the RABLE architecture, the passes before global
allocation are as follows:

Live-range-disjointer
instruction-selection
Register coalescing
register pressure reduction
<.. then regular register allocation activity ...>

In the context of RABLET:

-Out of SSA naturally performs live range disjointing.
-Initial RTL pattern selection is performed by expand.
-Register coalescing -  ssa_name coalescing is done by out-of-ssa, and
ssa_name == register.
-Register pressure reduction – This is new work which would be
implemented.

If we create a new black box super pass called
out-of-ssa-expand-register-pressure-reducer,  (ewww,  let's just refer to
it as ssa-to-rtl :-),  then we'd have something close to the first part
of RABLE. Not exactly of course, because 'instruction selection' means
something slightly different.  In RABLE it means choosing an instruction
alternative from an RTL pattern; in RABLET it simply means "choose good
RTL patterns".

“But wait...” I hear some clever person saying... “A lot of things
happen between ssa-to-rtl and global register allocation.”  Hold that
thought until I get through describing ssa-to-rtl, since there are some
important considerations which affect that discussion.

SSA-TO-RTL
----------

When out-of-ssa was originally written, tree-ssa was still evolving. We
didn't know exactly what was going to be expected of it, so it was
written to be flexible and had a lot of gorp added after the fact.
TREE-SSA is now mature and we understand much more fully what is
expected of translation out of ssa. I have begun rewriting it to be
smaller and faster and to eliminate all the unused features which were
initially provided.  It is also being rewritten with RABLET in mind.  

The new generation out-of-ssa will eliminate PHIs, much as it does
today. This is done by coalescing together ssa_names connected by copies
(and PHIs are just copies), and issuing copies on edges required by the
PHIs.  Instead of  then mapping these back to VAR_DECLs and writing this
all back to the trees, it will simply be maintained in a partition list
map.  This is *key*. At this point, we have a mapping of what ssa_names
have been coalesced together, and the copies required to perform this.
The trees themselves are unaltere
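
A partition map of the kind described above can be modelled with a union-find structure. The C++ sketch below is only an assumption about the general shape, not the data structure GCC uses: ssa_names are represented by small integer version numbers, the class name `PartitionMap` is invented here, and the live-range conflict check that decides whether two names may be coalesced at all is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Union-find over ssa_name version numbers 0..n-1.
class PartitionMap {
public:
    explicit PartitionMap(int n) : parent_(static_cast<std::size_t>(n)) {
        std::iota(parent_.begin(), parent_.end(), 0);  // each name starts alone
    }

    // Representative of the partition containing v (with path halving).
    int find(int v) {
        assert(v >= 0 && static_cast<std::size_t>(v) < parent_.size());
        while (parent_[v] != v) {
            parent_[v] = parent_[parent_[v]];
            v = parent_[v];
        }
        return v;
    }

    // Record that a and b are connected by a copy or PHI and share a partition.
    void coalesce(int a, int b) { parent_[find(a)] = find(b); }

    bool same_partition(int a, int b) { return find(a) == find(b); }

private:
    std::vector<int> parent_;
};
```

For a PHI such as `x_3 = PHI <x_1, x_2>`, calling `coalesce(3, 1)` and `coalesce(3, 2)` puts all three names in one partition, so no copy is needed on either incoming edge. Where coalescing is not possible, a copy on the edge would be recorded instead.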

Re: Boehm-gc performance data

2006-06-23 Thread David Nicol

On 6/23/06, Laurynas Biveinis <[EMAIL PROTECTED]> wrote:

What do you think?


Is it possible to turn garbage collection totally off for a null-case
run-time comparison or would that cause thrashing except for very
small jobs?

--
David L Nicol
"if life were like Opera, this would probably
have poison in it" -- Lyric Opera promotional
coffee cup sleeve from Latte Land


Re: Project RABLET

2006-06-23 Thread Robert Dewar

Andrew MacLeod wrote:


A new register allocator written from scratch is a very long term
project (measured in years), and there is no guarantee after all that
work that we'd end up with something which is remarkably better. One
would hope that it is a lot more maintainable, but the generated code is
a crapshoot. It will surely look better but will it really run faster?
The current plate of spaghetti we call the register allocator has had a
lot of fine tuning go into it over the years, and it generally generates
pretty darn good code IF it doesn't have to spill much, which is much of
the time.


If you are starting from scratch would it not be better to adopt
the approach of combining register allocation and scheduling.
Significant progress has been made in this area in recent years.



Visibility and C++ Classes/Templates

2006-06-23 Thread Jason Merrill
I'm currently working on a massive overhaul of the visibility code to 
make it play nice with C++.  One of the issues I've run into is the 
question of priority of #pragma visibility versus other sources of 
visibility information.


Consider:

  #pragma GCC visibility push(hidden)
  class __attribute((visibility("default"))) A
  {
    void f ();
  };

  void A::f() { }

Here I think we'd all agree that f should get default visibility.

  class A
  {
    void f ();
  };

  #pragma GCC visibility push(hidden)
  void A::f() { }
  #pragma GCC visibility pop

This case is less clear; A does not have a specified visibility, but the 
context of f's definition does.  However, we don't want to encourage 
this kind of code; the visibility should be specified as early as 
possible so that callers use the right calling convention.  Waiting 
until the definition to specify visibility is bad practice.  Also, the 
status quo is that f gets A's visibility.  I would preserve that and 
possibly give a warning to tell the user that they might want to add 
__attribute((visibility)) to the declaration of f in A.


Now, templates:

  template <class T> __attribute((visibility ("hidden"))) T f(T);
  #pragma GCC visibility push(default)
  extern template int f(int);
  #pragma GCC visibility pop

This could really go either way.  It could be considered similar to the 
above case in that f<int> is in a way "part" of f<T>, but there isn't 
the same scoping relationship.  Also, there isn't the 
declaration/definition problem, as the extern template directive is the 
first declaration of the instantiation.  In this case I am inclined to 
respect the #pragma rather than the attribute on the template.


Using an attribute would be less ambiguous:

  extern template __attribute ((visibility ("default"))) int f(int);

In a PR Geoff asked if we really want to allow different visibility for 
different instantiations.  I think we do; perhaps one instantiation is 
part of the interface of an exported class, but we want other 
instantiations to be generated locally in each shared object.


Jason


Re: Project RABLET

2006-06-23 Thread Andrew MacLeod
On Fri, 2006-06-23 at 15:29 -0400, Robert Dewar wrote:
> Andrew MacLeod wrote:
> 
> > A new register allocator written from scratch is a very long term
> > project (measured in years), and there is no guarantee after all that
> > work that we'd end up with something which is remarkably better. One
> > would hope that it is a lot more maintainable, but the generated code is
> > a crapshoot. It will surely look better but will it really run faster?
> > The current plate of spaghetti we call the register allocator has had a
> > lot of fine tuning go into it over the years, and it generally generates
> > pretty darn good code IF it doesn't have to spill much, which is much of
> > the time.
> 
> If you are starting from scratch would it not be better to adopt
> the approach of combining register allocation and scheduling.
> Significant progress has been made in this area in recent years.
> 

I am personally not a believer in combining register allocation and
scheduling. They are two different problems, and although there is some
interaction, I am still in the "keep them separate" camp. 

However, RABLET is not writing a register allocator so it's moot
anyway :-).

Andrew



Re: Project RABLET

2006-06-23 Thread Robert Dewar

Andrew MacLeod wrote:


I am personally not a believer in combining register allocation and
scheduling. They are two different problems, and although there is some
interaction, I am still in the "keep them separate" camp. 


I disagree, there is in fact much more than "some interaction", there
is a very strong interaction between scheduling and register allocation,
particularly on modern machines like the typical x86 chips (which have
only a fraction of their registers directly nameable). The research
results on combining the two steps look very promising to me.


However, RABLET is not writing a register allocator so it's moot
anyway :-).


indeed, moot = discussable, undecided, so here we are discussing
(or if you like to use the verb, mooting) the issue.


Andrew





Re: Project RABLET

2006-06-23 Thread Daniel Jacobowitz
On Fri, Jun 23, 2006 at 04:30:01PM -0400, Robert Dewar wrote:
> >However, RABLET is not writing a register allocator so it's moot
> >anyway :-).
> 
> indeed, moot = discussable, undecided, so here we are discussing
> (or if you like to use the verb, mooting) the issue.

Please try the other definition, which he clearly meant:

 2. Of purely theoretical or academic interest; having no
practical consequence; as, the team won in spite of the
bad call, and whether the ruling was correct is a moot
question.

-- 
Daniel Jacobowitz
CodeSourcery


Re: Project RABLET

2006-06-23 Thread Robert Dewar

Daniel Jacobowitz wrote:


Please try the other definition, which he clearly meant:

 2. Of purely theoretical or academic interest; having no
practical consequence; as, the team won in spite of the
bad call, and whether the ruling was correct is a moot
question.


Well I am not sure what he meant, but for sure it is not the
case that optimal register allocation and scheduling is of only
theoretical or academic interest with no practical consequences!



Re: Project RABLET

2006-06-23 Thread Steven Bosscher

On 6/23/06, Robert Dewar <[EMAIL PROTECTED]> wrote:

Well I am not sure what he meant, but for sure it is not the
case that optimal register allocation and scheduling is of only
theoretical or academic interest with no practical consequences!


Thanks for making that point.

Now, what do you think about this RABLET idea, which has nothing to do
with either register allocation or scheduling? ;-)
Gr.
Steven


Re: Project RABLET

2006-06-23 Thread Robert Dewar

Steven Bosscher wrote:


Now, what do you think about this RABLET idea, which has nothing to do
with either register allocation or scheduling? ;-)


Well I would not say that it has nothing to do with register allocation!
But indeed this seems a promising approach. The real question in my mind
is whether it can be done in a way that simplifies and clarifies rather
than adding to what is now very complex code to follow. I think the 
answer to that is probably yes.




Fortran Compiler

2006-06-23 Thread hector riojas roldan

Hello, I would like to know if there is a Fortran compiler that runs
on 64-bit AMD processors. I have installed SUSE Linux 10.1 on my
computer, and I would really appreciate your help. I heard that your
compiler also supports C and C++.
Thank you very much. I am writing to you from Argentina,
Héctor Riojas Roldan


RFC: __cxa_atexit for mingw32

2006-06-23 Thread Danny Smith
Hello,

One of the things the mingw32 C runtime lacks is an implementation of
__cxa_atexit.
However, as explained in the comment below, some of the behaviour of
__cxa_atexit is already in the  C runtime  atexit implementation.

Adding the object below to libstdc++ or libgcc.a and configuring with
__cxa_atexit enabled produces PASSES for the three __cxa_atexit
dependent testcases. It works fine in tests with destruction of objects
in dll's too, whether these dlls are unloaded at process exit or by
earlier calls to FreeLibrary. (No, it doesn't allow exceptions to be
thrown and caught across dll boundaries -- that's another story for gcc
4.3 -- but it removes one obstacle.)

Although this  keeps the changes local to mingw32 code, I don't really
like adding a fake __cxa_atexit to a runtime lib. So, the other option
would be to add 'if (flag_use_dllonexit)' code to cp/decl.c and
decl2.c to parallel flag_use_cxa_atexit.

Adding a real __cxa_atexit to mingw runtime is of course also possible,
but I thought I'd attempt the easy options first.

I would appreciate any comments.


Danny

/* mingw32-cxa_atexit.c
   Contributed by Danny Smith ([EMAIL PROTECTED])
   Copyright (C) 2006   Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2, or (at your option) any later
version.

GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING.  If not, write to
the Free Software Foundation, 51 Franklin Street, Fifth Floor,
Boston, MA 02110-1301, USA.  */
 
/* On mingw32, each dll has its own on-exit table, which is initialized
   on dll load. Calls to atexit or onexit will register functions in the
   on-exit table of the containing module. Each dll-specific on-exit table
   runs when that dll unloads. The calls to atexit from the main app (ie,
   including all static libs) are finalized at process exit.

   cc1plus currently ignores the argument to __cxa_atexit-registered
   functions.   If that changes, we will need to replace this with a real
   __cxa_atexit implementation in mingw runtime.  */

#include <stdlib.h>

/* We don't need an explicit dll handle. The handle is always 'this'. */
void* __dso_handle = NULL;

int __mingw_cxa_atexit (void (*)(void *), void *, void *);

int __mingw_cxa_atexit (void (*func)(void *),
                        void *arg __attribute__((unused)),
                        void *d __attribute__((unused)))
{
  return atexit ((void (*) (void)) func);
}


int __cxa_atexit (void (*)(void *), void *, void *)
  __attribute__ ((alias ("__mingw_cxa_atexit")));
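
For comparison, the bookkeeping a "real" __cxa_atexit would need (keeping the argument and the dso handle, and running handlers per module in reverse order of registration) might look roughly like the C++ sketch below. This is an illustration only, not the mingw runtime's or any libc's code; the namespace and function names are invented so that they do not clash with the real symbols, and thread safety is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace exit_sketch {

// One registered handler: call func(arg) when module `dso` is finalized.
struct Entry {
    void (*func)(void*);
    void* arg;
    void* dso;
    bool done;
};

inline std::vector<Entry>& table() {
    static std::vector<Entry> entries;
    return entries;
}

// Counterpart of __cxa_atexit(func, arg, dso): remember all three values.
// Returns 0 on success, following the atexit convention.
inline int register_exit(void (*func)(void*), void* arg, void* dso) {
    assert(func != nullptr);
    table().push_back(Entry{func, arg, dso, false});
    return 0;
}

// Counterpart of __cxa_finalize(dso): run the pending handlers of one
// module, or of every module if dso is null, most recent first.
inline void finalize(void* dso) {
    std::vector<Entry>& t = table();
    for (std::size_t i = t.size(); i-- > 0;) {
        if (t[i].done) continue;
        if (dso != nullptr && t[i].dso != dso) continue;
        t[i].done = true;
        Entry e = t[i];  // copy first: the handler may register new entries
        e.func(e.arg);
    }
}

}  // namespace exit_sketch
```

Unlike the atexit-based shim above, this version actually passes `arg` to the handler, which is the part that would matter if cc1plus started relying on it.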





Re: Project RABLET

2006-06-23 Thread Ian Lance Taylor
Andrew MacLeod <[EMAIL PROTECTED]> writes:

> This describes my current work-in-progress, RABLET, which stands for
> RABLE-Themes, and conveniently implies something smaller.

Thanks for this proposal.


> ssa-to-rtl
> spill cost analysis
> global allocation
> spiller
> spill location optimizer
> instruction rewriter.

You omitted the RTL loop optimizer passes, which still do quite a bit
of work despite the tree-ssa loop passes.  Also if-conversion and some
minor passes, though they are less relevant.


> If expand is made much smarter, I would argue that much of GCSE and CSE
> isn't needed.  We've already performed those optimizations at  a high
> level, and we can hopefully do a lot of the factoring and things on
> addressing registers exposed during expand.  I'm sure there are other
> things to do, but I would argue that they are significantly less than a
> "general purpose" CSE and GCSE pass. And in the cases of high register
> pressure, how much would you want them to do anyway?  Its really these
> high register pressure areas that RABLET is attacking anyway.

Here I think you are waving your hands a little too hard.  RTL level
CSE is significant for handling common expressions exposed by address
calculations and by DImode (and larger) computations.  On some
processors giving up CSE on address calculations would be very
painful.  There needs to be a plan to handle that.

Also at present many vector calculations are not exposed at the tree
level--they are hidden inside builtin functions until they are
expanded--and vector heavy code can also have a lot of common
subexpressions.


> If I recall, scheduling is register pressure aware and normally doesn't
> increase register pressure dramatically. If it does increase pressure,
> well, this won't solve every problem after all.

Unfortunately, scheduling is currently not register pressure aware at
all.  The scheduler will gleefully increase register pressure.  That's
why we don't even run the scheduler before register allocation on x86.


Modulo the above comments, I don't see anything wrong with your basic
idea.  But I also wonder whether you couldn't get a similar effect by
forcing instruction selection to occur before register allocation.  If
that is done well, reload will have much less work to do.

One of the basic issues with the current code is not that we do
register allocation well or poorly, but that reload takes the output
of the register allocator and munges it unpredictably.  That's going
to happen with your proposal as well.  It doesn't mean that your
proposal won't improve things.  But no register allocator can do a
good job when it can't make the final decisions.

Ian


Re: Visibility and C++ Classes/Templates

2006-06-23 Thread Mark Mitchell
Jason Merrill wrote:

Nice to see this stuff getting improved!

>   #pragma GCC visibility push(hidden)
>   class __attribute((visibility("default"))) A
>   {
>     void f ();
>   };
> 
>   void A::f() { }
> 
> Here I think we'd all agree that f should get default visibility.

Agreed.

>   class A
>   {
>     void f ();
>   };
> 
>   #pragma GCC visibility push(hidden)
>   void A::f() { }
>   #pragma GCC visibility pop
> 
> This case is less clear; A does not have a specified visibility, but the
> context of f's definition does.  However, we don't want to encourage
> this kind of code; the visibility should be specified as early as
> possible so that callers use the right calling convention.  Waiting
> until the definition to specify visibility is bad practice.  Also, the
> status quo is that f gets A's visibility.  I would preserve that and
> possibly give a warning to tell the user that they might want to add
> __attribute((visibility)) to the declaration of f in A.

Agreed.

> Now, templates:
> 
>   template <class T> __attribute((visibility ("hidden"))) T f(T);
>   #pragma GCC visibility push(default)
>   extern template int f(int);
>   #pragma GCC visibility pop
> 
> This could really go either way.  It could be considered similar to the
> above case in that f<int> is in a way "part" of f<T>, but there isn't
> the same scoping relationship.  Also, there isn't the
> declaration/definition problem, as the extern template directive is the
> first declaration of the instantiation.  In this case I am inclined to
> respect the #pragma rather than the attribute on the template.

I'd tend to say that the attribute wins, and that if you want to specify
the visibility on the template instantiation, you must use the attribute
on the instantiation, as you suggest:

> Using an attribute would be less ambiguous:
> 
>   extern template __attribute ((visibility ("default"))) int f(int);
> 
> In a PR Geoff asked if we really want to allow different visibility for
> different instantiations.  I think we do; perhaps one instantiation is
> part of the interface of an exported class, but we want other
> instantiations to be generated locally in each shared object.

Agreed.

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: Visibility and C++ Classes/Templates

2006-06-23 Thread Ian Lance Taylor
Mark Mitchell <[EMAIL PROTECTED]> writes:

> Jason Merrill wrote:
> > Now, templates:
> > 
> >   template <class T> __attribute((visibility ("hidden"))) T f(T);
> >   #pragma GCC visibility push(default)
> >   extern template int f(int);
> >   #pragma GCC visibility pop
> > 
> > This could really go either way.  It could be considered similar to the
> > above case in that f<int> is in a way "part" of f<T>, but there isn't
> > the same scoping relationship.  Also, there isn't the
> > declaration/definition problem, as the extern template directive is the
> > first declaration of the instantiation.  In this case I am inclined to
> > respect the #pragma rather than the attribute on the template.
> 
> I'd tend to say that the attribute wins, and that if you want to specify
> the visibility on the template instantiation, you must use the attribute
> on the instantiation, as you suggest:

Don't you still have to deal with this case?

#pragma GCC visibility push(hidden)
template <class T> T f(T);
#pragma GCC visibility pop
...
#pragma GCC visibility push(default)
extern template int f(int);
#pragma GCC visibility pop

Personally I wouldn't mind saying that the attribute always beats the
pragma, but it seems to me that there is still the potential for
ambiguity.

Ian


gcc-4.1-20060623 is now available

2006-06-23 Thread gccadmin
Snapshot gcc-4.1-20060623 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.1-20060623/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.1 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_1-branch 
revision 114953

You'll find:

gcc-4.1-20060623.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.1-20060623.tar.bz2       C front end and core compiler
gcc-ada-4.1-20060623.tar.bz2        Ada front end and runtime
gcc-fortran-4.1-20060623.tar.bz2    Fortran front end and runtime
gcc-g++-4.1-20060623.tar.bz2        C++ front end and runtime
gcc-java-4.1-20060623.tar.bz2       Java front end and runtime
gcc-objc-4.1-20060623.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.1-20060623.tar.bz2  The GCC testsuite

Diffs from 4.1-20060616 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.1
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Project RABLET

2006-06-23 Thread Seongbae Park

On 6/23/06, Andrew MacLeod <[EMAIL PROTECTED]> wrote:
...

> 1 - One of the core themes in RABLE was very early selection of
> instructions from patterns.  RTL patterns are initially chosen by the
> EXPAND pass. EXPAND tends to generate better rtl patterns by being
> handed complex trees which it can process and get better combinations.
>
>   When TREE-SSA was first implemented, we got very poor RTL because
> expand was seeing very small trees.  TER (Temporary Expression
> Replacement) was invented, which mashed any single-def/single-use
> ssa_names together into more complex trees. This gave expand a better
> chance of selecting better instructions, and made a huge difference.


Have you considered using BURG/IBURG style tree pattern matching
instruction selection ?

http://www.cs.princeton.edu/software/iburg/

That approach can certainly provide a low register pressure
high quality instruction selection.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
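
To make the suggestion concrete, here is a heavily simplified sketch of cost-driven tree pattern matching in the BURG/IBURG spirit. It is not iburg itself: there is a single nonterminal ("reg"), the rules and unit costs are invented for illustration, and all names (`isel_sketch`, `Node`, `cost`) are hypothetical. Real tools generate such a matcher from a machine grammar.

```cpp
#include <algorithm>
#include <cassert>

// Simplified illustration only: one nonterminal ("reg") and made-up unit
// costs. Real BURG/IBURG tools generate the matcher from a grammar.
namespace isel_sketch {

enum class Op { Reg, Const, Add, Load };

struct Node {
    Op op;
    const Node* left;
    const Node* right;
};

// Minimum cost of computing the tree rooted at n into a register.
int cost(const Node* n) {
    switch (n->op) {
    case Op::Reg:
        return 0;  // already in a register
    case Op::Const:
        return 1;  // load-immediate
    case Op::Add: {
        // Rule: reg <- Add(reg, reg)
        int best = cost(n->left) + cost(n->right) + 1;
        // Rule: reg <- Add(reg, Const), an add-immediate
        if (n->right->op == Op::Const)
            best = std::min(best, cost(n->left) + 1);
        return best;
    }
    case Op::Load: {
        // Rule: reg <- Load(reg)
        int best = cost(n->left) + 1;
        // Rule: reg <- Load(Add(reg, Const)), base+offset addressing
        const Node* a = n->left;
        if (a->op == Op::Add && a->right->op == Op::Const)
            best = std::min(best, cost(a->left) + 1);
        return best;
    }
    }
    return 0;  // not reached
}

// Whole tree Load(Add(Reg, Const)) handed to the matcher at once.
int example_cost() {
    Node base{Op::Reg, nullptr, nullptr};
    Node off{Op::Const, nullptr, nullptr};
    Node addr{Op::Add, &base, &off};
    Node load{Op::Load, &addr, nullptr};
    return cost(&load);
}

// Same computation seen as two small trees: the address first, then a
// load from the temporary register that holds it.
int example_cost_split() {
    Node base{Op::Reg, nullptr, nullptr};
    Node off{Op::Const, nullptr, nullptr};
    Node addr{Op::Add, &base, &off};
    Node tmp{Op::Reg, nullptr, nullptr};
    Node load{Op::Load, &tmp, nullptr};
    return cost(&addr) + cost(&load);
}

}  // namespace isel_sketch
```

With these invented costs the whole tree is covered by one base+offset load, while the split version needs an add followed by a load. That is the same effect the quoted text attributes to TER handing expand larger trees.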


sh3e opcodes in sh2e's crt1.o?

2006-06-23 Thread DJ Delorie

It looks like crt1.asm unconditionally includes an sh3e opcode (stc
spc,r1) which causes problems trying to build an sh2a-single-only
executable, which falls back to sh2e but doesn't have this sh3e
opcode.  Comments?


 1091                   ! Here handler available, call it.
 1092                   /* Now call the trap handler with as much of the context unchanged as possible.
 1093                      Move trapping address into PR to make it look like the trap point */
 1094 052a 0142         stc     spc, r1
 1095 052c 412A         lds     r1, pr


unable to detect exception model

2006-06-23 Thread Jack Howarth
 I have run into a build problem with tonight's gcc trunk on MacOS X
which didn't exist in yesterday's svn pull. The gcc trunk build on
MacOS X 10.4.6 fails with...

checking how to run the C++ preprocessor... /sw/src/fink.build/gcc4-4.1.999-20060623/darwin_objdir/./gcc/xgcc -shared-libgcc -B/sw/src/fink.build/gcc4-4.1.999-20060623/darwin_objdir/./gcc -nostdinc++ -L/sw/src/fink.build/gcc4-4.1.999-20060623/darwin_objdir/powerpc-apple-darwin8/libstdc++-v3/src -L/sw/src/fink.build/gcc4-4.1.999-20060623/darwin_objdir/powerpc-apple-darwin8/libstdc++-v3/src/.libs -B/sw/lib/gcc4/powerpc-apple-darwin8/bin/ -B/sw/lib/gcc4/powerpc-apple-darwin8/lib/ -isystem /sw/lib/gcc4/powerpc-apple-darwin8/include -isystem /sw/lib/gcc4/powerpc-apple-darwin8/sys-include -E
loading cache ./config.cache within ltconfig
checking host system type... powerpc-apple-darwin8
checking build system type... powerpc-apple-darwin8
checking for objdir... .libs
checking for /sw/src/fink.build/gcc4-4.1.999-20060623/darwin_objdir/./gcc/xgcc option to produce PIC... -fno-common -DPIC
checking if /sw/src/fink.build/gcc4-4.1.999-20060623/darwin_objdir/./gcc/xgcc PIC flag -fno-common -DPIC works... yes
checking if /sw/src/fink.build/gcc4-4.1.999-20060623/darwin_objdir/./gcc/xgcc static flag -static works... no
finding the maximum length of command line arguments... (cached) 196608
checking if /sw/src/fink.build/gcc4-4.1.999-20060623/darwin_objdir/./gcc/xgcc supports -c -o file.o... (cached) yes
checking if /sw/src/fink.build/gcc4-4.1.999-20060623/darwin_objdir/./gcc/xgcc supports -fno-rtti -fno-exceptions ... yes
checking whether the linker (/sw/src/fink.build/gcc4-4.1.999-20060623/darwin_objdir/./gcc/collect-ld) supports shared libraries...
checking how to hardcode library paths into programs... unsupported
checking whether stripping libraries is possible... no
checking dynamic linker characteristics... darwin8 dyld
checking command to parse /sw/src/fink.build/gcc4-4.1.999-20060623/darwin_objdir/./gcc/nm output... failed
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
appending configuration tag "CXX" to libtool
checking for exception model to use... configure: error: unable to detect exception model
make[1]: *** [configure-target-libstdc++-v3] Error 1
make: *** [all] Error 2
### execution of /var/tmp/tmp.2.jzc1x4 failed, exit code 2
Failed: phase compiling: gcc4-4.1.999-20060623 failed

Any idea where this is coming from?
 Jack


Re: Project RABLET

2006-06-23 Thread Daniel Berlin
Ian Lance Taylor wrote:
> Andrew MacLeod <[EMAIL PROTECTED]> writes:
> 
>> This describes my current work-in-progress, RABLET, which stands for
>> RABLE-Themes, and conveniently implies something smaller.
> 
> Thanks for this proposal.
> 
> 
>> ssa-to-rtl
>> spill cost analysis
>> global allocation
>> spiller
>> spill location optimizer
>> instruction rewriter.
> 
> You omitted the RTL loop optimizer passes, which still do quite a bit
> of work despite the tree-ssa loop passes.  Also if-conversion and some
> minor passes, though they are less relevant.
> 
> 
>> If expand is made much smarter, I would argue that much of GCSE and CSE
>> isn't needed.  We've already performed those optimizations at  a high
>> level, and we can hopefully do a lot of the factoring and things on
>> addressing registers exposed during expand.  I'm sure there are other
>> things to do, but I would argue that they are significantly less than a
>> "general purpose" CSE and GCSE pass. And in the cases of high register
>> pressure, how much would you want them to do anyway?  Its really these
>> high register pressure areas that RABLET is attacking anyway.
> 
> Here I think you are waving your hands a little too hard.  RTL level
> CSE is significant for handling common expressions exposed by address
> calculations and by DImode (and larger) computations.  On some
> processors giving up CSE on address calculations would be very
> painful.  There needs to be a plan to handle that.

I agree with Ian completely.
Also, after having stared at and worked on df in the backend with Kenny and
watched the amount of work that has had to be done, I think you may be
underestimating the complexity of what is really going on in the backend
right now.

Not that I wouldn't love to see our backend become simpler and have a
bunch of relatively non-complex df based passes, because I would, but I
*also* don't think RABLET is going to enable that (or the removal of
CSE/GCSE) through smarter expand.  It's possible you'd remove GCSE, but
only because the last time I remember someone looking (stevenb, I
think), it wasn't doing all *that* much.

Again, like Ian, I'd argue you'd need to do real instruction selection
before register allocation before that can happen.  Luckily, these days,
BURG based instruction selection has become production usable, so that
task isn't as horrid as it used to be.

--Dan


Re: sh3e opcodes in sh2e's crt1.o?

2006-06-23 Thread Joern Rennecke
> It looks like crt1.asm unconditionally includes an sh3e opcode (stc
> spc,r1) which causes problems trying to build an sh2a-single-only
> executable, which falls back to sh2e but doesn't have this sh3e
> opcode.  Comments?

It's not actually unconditional, but the condition it depends on is set
conditionally with a flawed condition.  Please try the attached patch.





Re: sh3e opcodes in sh2e's crt1.o?

2006-06-23 Thread DJ Delorie

> It's not actually unconditional, but the condition it depends on is set
> conditionally with a flawed condition.  Please try the attached patch.

That seems to fix it, although I only tested a simple hello.c program.
Thanks!


Re: Project RABLET

2006-06-23 Thread Andrew MacLeod
On Fri, 2006-06-23 at 23:08 -0400, Daniel Berlin wrote:
> Ian Lance Taylor wrote:

> > 
> > Here I think you are waving your hands a little too hard.  RTL level
> > CSE is significant for handling common expressions exposed by address
> > calculations and by DImode (and larger) computations.  On some
> > processors giving up CSE on address calculations would be very
> > painful.  There needs to be a plan to handle that.
> 
> I agree with Ian completely.
> Also, after having stared at and worked on df in the backend with Kenny and
> watched the amount of work that has had to be done, I think you may be
> underestimating the complexity of what is really going on in the backend
> right now.
> 
> Not that I wouldn't love to see our backend become simpler and have a
> bunch of relatively non-complex df based passes, because I would, but I
> *also* don't think RABLET is going to enable that (or the removal of
> CSE/GCSE) through smarter expand.  It's possible you'd remove GCSE, but
> only because the last time I remember someone looking (stevenb, I
> think), it wasn't doing all *that* much.
> 
> Again, like Ian, I'd argue you'd need to do real instruction selection
> before register allocation before that can happen.  Luckily, these days,
> BURG based instruction selection has become production usable, so that
> task isn't as horrid as it used to be.

It occurs to me that there is a misunderstanding here. Perhaps I
didn't communicate this well enough, or perhaps I got a little carried
away trying to make RABLET look like RABLE.

I'm not actually proposing that RABLET will enable the backend to
suddenly become simple... The initial impact of RABLET is to simply
remove some of the onus of dealing with excessive register pressure from
the register allocator.

RABLET will really do nothing when register pressure is not high; things
would be pretty much exactly as they are today.

When register pressure is high, many of the things the RTL optimizations
I mentioned do really become irrelevant (I think), since they increase
register pressure more, and cause more spilling. This generally offsets
whatever good they do. I was trying to claim that some level of this
work can be done in expand and in *this* circumstance, that's all that
needs to be done, making the resulting model look somewhat like RABLE
and simplifying the view of the RTL optimizations.

It's possible that some of this work can simplify the RTL optimizations
in other cases, perhaps not. If we can simplify anything, that's great.
I'd love to see it, and I hope some of it is possible. I do see
possibilities that will hopefully pan out.

Ultimately, RABLET simply tries to present the backend with code that
looks more like low register pressure code which the current backend is
pretty good at handling. Anything else we can get from it is a bonus.

For the work involved in RABLET vs. the work involved in a new allocator
like RABLE, I think RABLET is well worth doing. (months vs. years). I
think RABLET will show a significant benefit. (famous last words, ho
ho :-)

Andrew
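
As a rough illustration of the "register pressure" argument above, the sketch below measures pressure as the largest number of values live at the same time. The model (live ranges as closed intervals of instruction indices) and the names `pressure_sketch`, `Range` and `max_pressure` are assumptions made for this example and are not taken from GCC.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

namespace pressure_sketch {

// A live range [def, last_use] in instruction indices (hypothetical model).
using Range = std::pair<int, int>;

// Register pressure here means the maximum number of ranges live at once.
int max_pressure(const std::vector<Range>& ranges) {
    std::vector<std::pair<int, int>> events;  // (position, +1 or -1)
    for (const Range& r : ranges) {
        events.push_back({r.first, +1});
        events.push_back({r.second + 1, -1});  // dead after the last use
    }
    // At equal positions the -1 events sort first, so a range that has
    // ended does not count against one starting right after it.
    std::sort(events.begin(), events.end());
    int live = 0, best = 0;
    for (const auto& e : events) {
        live += e.second;
        best = std::max(best, live);
    }
    return best;
}

}  // namespace pressure_sketch
```

For example, a value computed twice with short ranges `{0,1}` and `{8,9}` around two other values live over `{2,7}` and `{3,6}` needs two registers at most. If CSE merges the two computations into one range `{0,9}`, the same code needs three, which is the kind of increase the paragraph above is worried about when registers are already scarce.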



Re: Visibility and C++ Classes/Templates

2006-06-23 Thread Mark Mitchell
Ian Lance Taylor wrote:

> Don't you still have to deal with this case?
> 
> #pragma GCC visibility push(hidden)
> template <class T> T f(T);
> #pragma GCC visibility pop
> ...
> #pragma GCC visibility push(default)
> extern template int f(int);
> #pragma GCC visibility pop
> 
> Personally I wouldn't mind saying that the attribute always beats the
> pragma, but it seems to me that there is still the potential for
> ambiguity.

I would treat that case as if the template had the attribute, and,
therefore, ignore the pragma at the point of instantiation.

My concern here is that template instantiation can happen "at any time".
 I'm sure we all agree that the pragma should not affect *implicit*
instantiations; if you happened to say:

#pragma GCC visibility push(default)
int i = f(int);
#pragma GCC visibility pop

we wouldn't want the visibility of "i" to affect "f<int>".  But, an
explicit instantiation:

  template int f(int);

should really behave just like an implicit instantiation; it's just a
manual way of saying instantiate here.  And, "extern template" is a GNU
extension which says "there's an explicit instantiation elsewhere; you
needn't bother implicitly instantiating here".

I'm just not comfortable with the idea of #pragmas affecting
instantiations.  (I'm OK with them affecting specializations, though; in
that case, the original template has basically no impact, so I think
it's fine to treat the specialization case as if it were any other
function.)

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713
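
The situations distinguished above can be sketched in compilable form as follows. The names `vis_demo2` and `ident` are hypothetical, and the comments restate the position argued in this message. They are not a statement of what any particular GCC release does.

```cpp
#include <cassert>

// Hypothetical names: `vis_demo2` and `ident` are illustrations only.
namespace vis_demo2 {

#pragma GCC visibility push(hidden)
template <class T> T ident(T x) { return x; }
#pragma GCC visibility pop

#pragma GCC visibility push(default)
// ident<int> is implicitly instantiated here as a side effect of
// initializing `i`. The pragma is presumably meant for `i`, and the
// argument above is that it should not leak onto ident<int>.
int i = ident<int>(3);

// An explicit specialization is a separate definition, so treating it
// like any other function declared in this region seems reasonable.
template <> long ident<long>(long x) { return x + 1; }
#pragma GCC visibility pop

}  // namespace vis_demo2
```

The point of the sketch is only that the first use of `ident<int>` may occur anywhere, including inside an unrelated pragma region, whereas a specialization is written deliberately at one place.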


Re: Project RABLET

2006-06-23 Thread Andrew MacLeod
On Fri, 2006-06-23 at 15:07 -0700, Ian Lance Taylor wrote:

> You omitted the RTL loop optimizer passes, which still do quite a bit
> of work despite the tree-ssa loop passes.  Also if-conversion and some
> minor passes, though they are less relevant.

Which brings up a good discussion. I presume the rtl loop optimizers see
things exposed by addressing modes which aren't seen in the higher level
code. I wonder what the "big gains" are here... and if they are
detectable at expansion time...   

In general, I didn't mention anything that tends not to increase register
pressure, at least not in any significant manner as far as RABLET is
concerned.

> 
> > If expand is made much smarter, I would argue that much of GCSE and CSE
> > isn't needed.  We've already performed those optimizations at  a high
> > level, and we can hopefully do a lot of the factoring and things on
> > addressing registers exposed during expand.  I'm sure there are other
> > things to do, but I would argue that they are significantly less than a
> > "general purpose" CSE and GCSE pass. And in the cases of high register
> > pressure, how much would you want them to do anyway?  Its really these
> > high register pressure areas that RABLET is attacking anyway.
> 
> Here I think you are waving your hands a little too hard.  RTL level
> CSE is significant for handling common expressions exposed by address
> calculations and by DImode (and larger) computations.  On some
> processors giving up CSE on address calculations would be very
> painful.  There needs to be a plan to handle that.
> 

Yes, there is some hand waving, mostly because I haven't gotten that far
in details yet. I expect to be able to do some of this type of commoning
at rtl generation time as things are generated. (much like RABLE's
spiller reuses spill loads nearby). That may turn out to be more
difficult than I anticipate however. Pain is in the implementation :-)

I am not proposing that CSE necessarily be eliminated *all* the time,
but in cases when register pressure is already excessively high, is
further commoning of DImode values going to make things better? It's
really this case I'm interested in evaluating, since this is the case
where we already have problems. If we don't spill, RABLET would
effectively do nothing.

Clearly there will be a lot of further investigation required once
implementation reaches this point. Ultimately CSE and all RTL
optimizations can be re-evaluated to see if things can be simplified.

> Also at present many vector calculations are not exposed at the tree
> level--they are hidden inside builtin functions until they are
> expanded--and vector heavy code can also have a lot of common
> subexpressions.
> 

I have no plan at the moment for vector operations :-). That could change,
but for now we'll have to keep whatever we do today for those.

> 
> > If I recall, scheduling is register pressure aware and normally doesn't
> > increase register pressure dramatically. If it does increase pressure,
> > well, this won't solve every problem after all.
> 
> Unfortunately, scheduling is currently not register pressure aware at
> all.  The scheduler will gleefully increase register pressure.  That's
> why we don't even run the scheduler before register allocation on x86.
> 

Hum, too bad. For some reason I was under the impression that it at
least tried not to increase register pressure when it was above a
certain threshold value. Not running it at least means it won't increase
register pressure, so that works :-)
  
> 
> Modulo the above comments, I don't see anything wrong with your basic
> idea.  But I also wonder whether you couldn't get a similar effect by
> forcing instruction selection to occur before register allocation.  If
> that is done well, reload will have much less work to do.
> 

That was one of the premises of RABLE. Since out of ssa needs some TLC
and TER has been a wart for years, this seems like a good way of dealing
with those issues, and perhaps dealing with some significant RA issues
at the same time. (Anything to avoid actually rewriting RA eh!)


> One of the basic issues with the current code is not that we do
> register allocation well or poorly, but that reload takes the output
> of the register allocator and munges it unpredictably.  That's going
> to happen with your proposal as well.  It doesn't mean that your
> proposal won't improve things.  But no register allocator can do a
> good job when it can't make the final decisions.
> 
Truer words have never been spoken. RABLET makes no attempt to do
anything about reload. It simply attempts to present the backend with
code that isn't full of excessive register pressure. If it turns out to
be something reload screws up today, it will continue to be screwed up.
I suspect a lot of the time we do have excessive spill, RABLET will show
benefit. 

It's clearly not as good as a new register allocator would be, but the
effort to benefit ratio ought to be a lot higher for RABLET than for a
register allo

Re: Project RABLET

2006-06-23 Thread Andrew Pinski


On Jun 23, 2006, at 9:39 PM, Andrew MacLeod wrote:


> On Fri, 2006-06-23 at 15:07 -0700, Ian Lance Taylor wrote:
>
> > You omitted the RTL loop optimizer passes, which still do quite a bit
> > of work despite the tree-ssa loop passes.  Also if-conversion and some
> > minor passes, though they are less relevant.
>
> Which brings up a good discussion. I presume the rtl loop optimizers see
> things exposed by addressing modes which aren't seen in the higher level
> code. I wonder what the "big gains" are here... and if they are
> detectable at expansion time...

The one rtl loop optimizer which has nothing to do with addressing modes
and loops is the doloop optimizer, which is most likely possible to do at
expansion time and is one of the few loop optimizers which lower register
pressure.  The reason why it lowers register pressure is that it makes
use of a special register for loops (at least on PowerPC).

Thanks,
Andrew Pinski 
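
A source-level analogy of what the doloop transformation does, as a sketch only: the real pass works on RTL, and the function names here are hypothetical. The second loop tests nothing but the remaining trip count, which a target with a dedicated count register (on PowerPC, the CTR register used by `bdnz`) can keep out of the general registers.

```cpp
#include <cassert>
#include <cstddef>

namespace doloop_sketch {

// Conventional form: the induction variable i is compared against n on
// every iteration, so i and n both stay live across the loop.
long sum_up(const int* a, std::size_t n) {
    long s = 0;
    for (std::size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

// Count-down form: only the remaining trip count is tested. On a target
// with a dedicated count register this test can become a single
// decrement-and-branch instruction.
long sum_doloop(const int* a, std::size_t n) {
    long s = 0;
    for (std::size_t left = n; left != 0; --left)
        s += *a++;
    return s;
}

}  // namespace doloop_sketch
```

Both functions compute the same sum. The difference is only in which values have to be kept in general-purpose registers while the loop runs.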


Re: Project RABLET

2006-06-23 Thread Ian Lance Taylor
Andrew MacLeod <[EMAIL PROTECTED]> writes:

> On Fri, 2006-06-23 at 15:07 -0700, Ian Lance Taylor wrote:
> 
> > You omitted the RTL loop optimizer passes, which still do quite a bit
> > of work despite the tree-ssa loop passes.  Also if-conversion and some
> > minor passes, though they are less relevant.
> 
> Which brings up a good discussion. I presume the rtl loop optimizers see
> things exposed by addressing modes which aren't seen in the higher level
> code. I wonder what the "big gains" are here... and if they are
> detectable at expansion time...   

One obvious gain is hoisting constants exposed by address expansion
out of loops.  Also once addressing modes are expanded, there are new
IVs.
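
A small source-level sketch of both effects, with hypothetical function names. In the first function the row offset is hidden inside the index expression. The second shows what becomes visible once the address arithmetic is written out: an invariant base that can be hoisted, and a pointer that acts as a new induction variable.

```cpp
#include <cassert>
#include <cstddef>

namespace hoist_sketch {

// Source-level view: the row offset r * cols is hidden inside the index.
int row_sum(const int* m, std::size_t cols, std::size_t r) {
    int s = 0;
    for (std::size_t c = 0; c < cols; ++c)
        s += m[r * cols + c];
    return s;
}

// After address expansion: the invariant part (m + r * cols) is hoisted,
// and the pointer p is a new induction variable stepping by one element.
int row_sum_hoisted(const int* m, std::size_t cols, std::size_t r) {
    const int* p = m + r * cols;  // loop-invariant, computed once
    const int* end = p + cols;
    int s = 0;
    for (; p != end; ++p)
        s += *p;
    return s;
}

}  // namespace hoist_sketch
```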


> I am not proposing that CSE necessarily be eliminated *all* the time,
> but in cases when register pressure is already excessively high, is
> further commoning of DImode values going to make things better? Its
> really this case I'm interested in evaluating since this is the case we
> already have problems. if we don't spill, RABLET would effectively do
> nothing.

I think that even when pressure is high, it helps a lot to do CSE
after DImode values have been split up, as will be the case even today
for, e.g., DImode bitwise operations.  It tends to reduce register
pressure if anything.
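
One way to picture this, as a sketch under assumptions: a 64-bit value is modelled as two 32-bit words, the way a 32-bit target sees it after splitting. The scenario (two values that share a high word, ANDed with the same mask) and all names are invented for illustration. At the 64-bit level the two ANDs look unrelated. After the split, the high-word AND is visibly common.

```cpp
#include <cassert>
#include <cstdint>

namespace dimode_sketch {

struct Halves { std::uint32_t lo, hi; };  // a 64-bit value as two words

Halves split(std::uint64_t v) {
    return { static_cast<std::uint32_t>(v),
             static_cast<std::uint32_t>(v >> 32) };
}

std::uint64_t join(Halves h) {
    return (static_cast<std::uint64_t>(h.hi) << 32) | h.lo;
}

// Two 64-bit ANDs against the same mask, written word by word. The two
// inputs share their high word, so the high-word AND is a common
// subexpression that only becomes visible after the split.
void and_both(std::uint32_t shared_hi, std::uint32_t a_lo, std::uint32_t b_lo,
              Halves mask, Halves& ra, Halves& rb) {
    std::uint32_t hi = shared_hi & mask.hi;  // computed once, used twice
    ra = { a_lo & mask.lo, hi };
    rb = { b_lo & mask.lo, hi };
}

}  // namespace dimode_sketch
```

Here the commoned high word occupies one register instead of two, which is in line with the remark that CSE after splitting tends to reduce pressure.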


As you say, none of these are arguments that RABLET is a bad idea; they
are just arguments that we can't expect to remove the RTL passes
without a lot more work, whether or not they increase register
pressure.

One thing we could perhaps consider would be expanding addressing mode
calculations at the tree level.

Ian