Re: [RFC] Migrate pointers to members to the middle end

2007-08-17 Thread Richard Guenther
On 8/9/07, Ollie Wild [EMAIL PROTECTED] wrote:
 On 8/8/07, Daniel Berlin [EMAIL PROTECTED] wrote:
  I also haven't necessarily said what Ollie has proposed is a bad idea.
   I have simply said the way he has come up with what he proposed is
  not the way we should go about this.  It may turn out he has come up
  with exactly the representation we want (though I doubt this, for
  various reasons).The specification given also doesn't even explain
  where/how these operations can occur in GIMPLE, and what they do other
  than a C++ something something.
 
  Also given that someone already wrote a type-based devirtualizer that
  worked fine, and i don't see how a points-to one is much work, I'd
  like to see more justification for things like PTRMEM_PLUS_EXPR than
  hey, the C++ FE generates this internally.

 OK.  It sounds like I need to go into a lot more detail.  The new
 nodes I've proposed aren't actually motivated by the C++ front end,
 but rather by a consideration of the semantics dictated by the C++
 standard.  Naturally, this gives rise to some similarity, but for
 instance, there is no PTRMEM_PLUS_EXPR in the C++ front end, and my
 definition of PTRMEM_CST disagrees with the current node of the same
 name.

 Let's walk through them:

[...]

Just to throw in something late from the side.  Can you try to model
things in such a way that representing TLS accesses on the tree level
would be possible with whatever new infrastructure you come up with?
(It's the only thing we keep libcalls for if I remember correctly)

Thx,
Richard.


Re: [RFC] Migrate pointers to members to the middle end

2007-08-10 Thread Mark Mitchell
Ollie Wild wrote:

 Offhand, I don't remember what happened with the various other cases,
 but my testing at the time wasn't particularly thorough.  The feedback
 I've gotten so far seems overwhelmingly negative, so I think the next
 step is to revisit the lowering approach, exercise the hell out of it,
 and see what, if any, limitations pop up.

Yes, I agree.  Again, thank you for being patient with the process.

Let me know when you're at the point where you'd like me to review the
front-end lowering patch again; send me a URL, and I'll be happy to do so.

Thanks,

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: [RFC] Migrate pointers to members to the middle end

2007-08-10 Thread Tom Tromey
 Dan == Daniel Berlin [EMAIL PROTECTED] writes:

Dan Just to be clear, we *already* have the class hierarchies in the
Dan middle end.

Dan They have been there for a few years now :)

Good point, thanks.

I don't think that is enough though, because I don't think the BINFO
slots mean the same thing in g++ and gcj.

Anyway, I don't want to derail this conversation.  If we really want
to strength reduce interface dispatch to virtual dispatch in LTO then
we'll need to find some relatively language neutral way to express
that.

Tom


Re: [RFC] Migrate pointers to members to the middle end

2007-08-10 Thread Michael Matz

Hi,

On Thu, 9 Aug 2007, Tom Tromey wrote:


Michael Yes, devirtualization.  But I wonder if you really need class
Michael hierarchies for this (actually I'm fairly sure you don't).

However, I'm not sure I agree with the above assertion.  Specifically, 
for Java I think it is sometimes possible to strength reduce interface 
calls to virtual calls, but I don't see how this could be done without 
class hierarchy information.


Okay, I suppose there are transformations that could make use of class 
hierarchies.  Luckily we do have that via the BINFO machinery already.



Ciao,
Michael.


Re: [RFC] Migrate pointers to members to the middle end

2007-08-09 Thread Ollie Wild
On 8/8/07, Daniel Berlin [EMAIL PROTECTED] wrote:
 I also haven't necessarily said what Ollie has proposed is a bad idea.
  I have simply said the way he has come up with what he proposed is
 not the way we should go about this.  It may turn out he has come up
 with exactly the representation we want (though I doubt this, for
 various reasons).The specification given also doesn't even explain
 where/how these operations can occur in GIMPLE, and what they do other
 than a C++ something something.

 Also given that someone already wrote a type-based devirtualizer that
 worked fine, and i don't see how a points-to one is much work, I'd
 like to see more justification for things like PTRMEM_PLUS_EXPR than
 hey, the C++ FE generates this internally.

OK.  It sounds like I need to go into a lot more detail.  The new
nodes I've proposed aren't actually motivated by the C++ front end,
but rather by a consideration of the semantics dictated by the C++
standard.  Naturally, this gives rise to some similarity, but for
instance, there is no PTRMEM_PLUS_EXPR in the C++ front end, and my
definition of PTRMEM_CST disagrees with the current node of the same
name.

Let's walk through them:


PTRMEM_TYPE

Contains the types of the member (TREE_TYPE) and class
(TYPE_PTRMEM_BASETYPE) of this pointer to member.  This is hopefully
self-explanatory.  In the language of the C++ standard, it is the type
of a pointer to member of class TYPE_PTRMEM_BASETYPE of type
TREE_TYPE.  This is the type of PTRMEM_CST's, PTRMEM_PLUS_EXPR's, and
various variable types (VAR_DECL, FIELD_DECL, PARM_DECL, etc.).


PTRMEM_CST

The C++ front end already has a PTRMEM_CST node.  However, the
existing node only contains a class (PTRMEM_CST_CLASS) and member
(PTRMEM_CST_MEMBER), and is unable to represent an arbitrary pointer
to member value.  This is especially evident when dealing with
multiple inheritance.  Consider the following example:

  struct B { int f (); };
  struct L : B {};
  struct R : B {};;
  struct D : L, R {};

  int (B::*pb)() = B::f;
  int (L::*pl)() = pb;
  int (R::*pr)() = pb;
  int (D::*pd[2])() = { pl, pr };

In this case, pd[0] and pd[1] both have the same type and point to the
same member of the same class (B::f), but they point to different base
class instances of B.  To represent this, we need an offset.  Now, one
might argue that rather than a numeric offset, we should point to the
_DECL of the base class subobject, but that breaks down because the
following is also legal:

  struct B {};
  struct D : B { int f (); };

  int (D::*pd)() = D::f;
  int (B::*pb)() = static_castint (B::*)()(pd);

In this case, pb points to D::f in the derived class.  Since there is
no subobject to point to, we see that a numeric offset representation
is required.

This leads to the definition of PTRMEM_CST which I have adopted.
Since the class type is already provided in its type, we store the
member (TREE_PTRMEM_CST_MEMBER) and numeric offset
(TREE_PTRMEM_CST_OFFSET).  The member is one of NULL (for NULL
pointers to members), a FIELD_DECL (for non-NULL pointers to data
members), or a FUNCTION_DECL (for non-NULL pointers to member
functions).  I've chosen the offset value according to convenience.
For NULL pointers to members, it's irrelevant.  For pointers to data
members, it's the offset of the member relative to the current class
(as determined by any type conversions).  For pointers to member
functions, it's the offset to the this pointer which must be passed to
the function.

PTRMEM_PLUS_EXPR

From the discussion above, it's clear that type conversions on
pointers to members require adjustments to the offsets (to fields or
this pointers).  We could handle this via CONVERT_EXPRs, but that has
two shortcomings: (1) it requires the middle end to compute offsets to
base class subobjects, and (2) as the first code example above
illustrates, multiple CONVERT_EXPRs cannot be folded together.  To
work around these issues, I've implemented the PTRMEM_PLUS_EXPR.  It's
a binary expression which takes two arguments, a PTRMEM_TYPE object,
and an integral offset expression.  These can be nicely constant
folded, either with other PTRMEM_PLUS_EXPRs or with PTRMEM_CSTs.

There's also an added benefit when dealing with NULL pointers to
members.  Consider the following code:

  struct B { int a; };
  struct L : B {};
  struct R : B {};;
  struct D : L, R {};

  int B::*pb = NULL;
  int L::*pl = pb;
  int R::*pr = pb;
  int D::*pd[2] = { pl, pr };

The C++ standard states that pd[0] == pd[1] since all NULL pointers to
members of the same type compare equal.  However, the current GCC
implementation gets this wrong because the C++ front end implements
pointer to data member via simple addition.  In practice, it needs to
check for NULL first.  However, folding stacked conversions then
requires optimizing code like:

  if (d != NULL_MARKER) d += offset1;
  if (d != NULL_MARKER) d += offset2;
  if (d != NULL_MARKER) d += offset3;

to

  if (d!= NULL_MARKER) d += 

Re: [RFC] Migrate pointers to members to the middle end

2007-08-09 Thread Daniel Berlin
On 8/8/07, Ollie Wild [EMAIL PROTECTED] wrote:
 On 8/8/07, Daniel Berlin [EMAIL PROTECTED] wrote:
  I also haven't necessarily said what Ollie has proposed is a bad idea.
   I have simply said the way he has come up with what he proposed is
  not the way we should go about this.  It may turn out he has come up
  with exactly the representation we want (though I doubt this, for
  various reasons).The specification given also doesn't even explain
  where/how these operations can occur in GIMPLE, and what they do other
  than a C++ something something.
 
  Also given that someone already wrote a type-based devirtualizer that
  worked fine, and i don't see how a points-to one is much work, I'd
  like to see more justification for things like PTRMEM_PLUS_EXPR than
  hey, the C++ FE generates this internally.

 OK.  It sounds like I need to go into a lot more detail.  The new
 nodes I've proposed aren't actually motivated by the C++ front end,
 but rather by a consideration of the semantics dictated by the C++
 standard.  Naturally, this gives rise to some similarity, but for
 instance, there is no PTRMEM_PLUS_EXPR in the C++ front end, and my
 definition of PTRMEM_CST disagrees with the current node of the same
 name.

Okay, so what exactly does all of the below complexity buy us over
lowering like Michael suggests, given that we can do devirtualization
(which seemed to be the motivating optimiation) without any of this?
We generally only add things to GIMPLE when there is a really
compelling reason.  Other than it makes us not have to do
optimizations we are not going to stop doing anyway (IE addition of
constants like you give below).

Certainly, absolutely none of this helps type based devirtualization,
since it wouldn't care about any of the below except PTRMEM_REF.

As the points-to person, I can tell you none of this will give
points-to based devirtualization any benefit over completely lowering
it unless you were to implement flow-sensitive, context-sensitive
points-to (which is at least O(n^6) time, and currently, more than
that space wise).

We would see the lowered form as additions into a constant table, and
figure out the offset into that table, just as you are doing here.

To me, the only benefit is that you expose a higher level concept, but
i don't see any optimization potential except possible value numbering
of PTRMEM_CST nodes, which we get even on the lowered representation.

I'm really not trying to rain on your parade, I just don't see why
it's a win to do this.  I'd be very convinced, however, if there was
something spectacular we could do that we can't do now.  IE if there
are times we will never be able to recover the information, and it
enabled us to do something cool.  I'd also be convinced if there were
tons of passes that could use this info to do something, and it was
non-trivial to reconstruct it (this is why i've always been in favor
of having a point in the compiler were array_ref was allowed on
pointers, or at least, we had the index info for indirect_ref's :P)

Right now, as Mark says, it seems the information is all still there,
even if it is a bit obfuscated, and I don't see why that many things
are going to care about digging it out.

I'd love to hear other opinions on the matter, of course :)



I
 Let's walk through them:


 PTRMEM_TYPE

 Contains the types of the member (TREE_TYPE) and class
 (TYPE_PTRMEM_BASETYPE) of this pointer to member.  This is hopefully
 self-explanatory.  In the language of the C++ standard, it is the type
 of a pointer to member of class TYPE_PTRMEM_BASETYPE of type
 TREE_TYPE.  This is the type of PTRMEM_CST's, PTRMEM_PLUS_EXPR's, and
 various variable types (VAR_DECL, FIELD_DECL, PARM_DECL, etc.).


 PTRMEM_CST

 The C++ front end already has a PTRMEM_CST node.  However, the
 existing node only contains a class (PTRMEM_CST_CLASS) and member
 (PTRMEM_CST_MEMBER), and is unable to represent an arbitrary pointer
 to member value.  This is especially evident when dealing with
 multiple inheritance.  Consider the following example:

   struct B { int f (); };
   struct L : B {};
   struct R : B {};;
   struct D : L, R {};

   int (B::*pb)() = B::f;
   int (L::*pl)() = pb;
   int (R::*pr)() = pb;
   int (D::*pd[2])() = { pl, pr };

 In this case, pd[0] and pd[1] both have the same type and point to the
 same member of the same class (B::f), but they point to different base
 class instances of B.  To represent this, we need an offset.  Now, one
 might argue that rather than a numeric offset, we should point to the
 _DECL of the base class subobject, but that breaks down because the
 following is also legal:

   struct B {};
   struct D : B { int f (); };

   int (D::*pd)() = D::f;
   int (B::*pb)() = static_castint (B::*)()(pd);

 In this case, pb points to D::f in the derived class.  Since there is
 no subobject to point to, we see that a numeric offset representation
 is required.

 This leads to the definition of PTRMEM_CST which I have 

Re: [RFC] Migrate pointers to members to the middle end

2007-08-09 Thread Tom Tromey
 Ollie == Ollie Wild [EMAIL PROTECTED] writes:

Ollie 1.  Is pointer to member migration worthwhile?  Can other languages
Ollie besides C++ benefit from this?

Not Java.  You might ask Andrea about CLR though.

Ollie 4.  Is a migration of virtual functions and virtual function tables
Ollie required?  Are the various language implementations sufficiently
Ollie similar to be handled by a common infrastructure?

Java has 2 kinds of indirect function call (to be precise, this is the
C++ ABI.  The binary compatibility ABI is different -- but also
isn't open to this kind of optimization).  Virtual calls are
implemented in a way that is compatible with C++.  Interface calls are
implemented in a different way.

Tom


Re: [RFC] Migrate pointers to members to the middle end

2007-08-09 Thread Tom Tromey
 Michael == Michael Matz [EMAIL PROTECTED] writes:

Michael Yes, devirtualization.  But I wonder if you really need class
Michael hierarchies for this (actually I'm fairly sure you don't).

I'm generally in favor of what you talked about in this note and
others, and also Danny's overall approach of trying to design trees as
a language-neutral IR.

However, I'm not sure I agree with the above assertion.  Specifically,
for Java I think it is sometimes possible to strength reduce interface
calls to virtual calls, but I don't see how this could be done without
class hierarchy information.

Also in Java it is possible to devirtualize calls in some situations
where only a bound on the type is known.  For instance at a call site
we might know that all possible targets are derived from a class where
the virtual method is final.

I have no idea whether these cases matter, but they do exist.  There
are also type-related optimizations that can be done on Java that
don't involve devirtualization.

We have some code kicking around that does devirtualization and some
other things for Java.  We never put this in since it has a huge hack:
the new passes aren't language-neutral and are added at runtime.

Tom


Re: [RFC] Migrate pointers to members to the middle end

2007-08-09 Thread Joe Buck
On Thu, Aug 09, 2007 at 02:29:28PM -0600, Tom Tromey wrote:
 Also in Java it is possible to devirtualize calls in some situations
 where only a bound on the type is known.  For instance at a call site
 we might know that all possible targets are derived from a class where
 the virtual method is final.

Likewise in C++ we might know that a method is effectively final because
no class overrides it.


Re: [RFC] Migrate pointers to members to the middle end

2007-08-09 Thread Joe Buck
On Thu, Aug 09, 2007 at 02:11:34PM -0700, Joe Buck wrote:
 On Thu, Aug 09, 2007 at 02:29:28PM -0600, Tom Tromey wrote:
  Also in Java it is possible to devirtualize calls in some situations
  where only a bound on the type is known.  For instance at a call site
  we might know that all possible targets are derived from a class where
  the virtual method is final.

I wrote:
 Likewise in C++ we might know that a method is effectively final because
 no class overrides it.

Whoops, that didn't come out the way that I intended.  I meant to say that
we might know that a given method (member function) call is effectively
final because no class that the pointer/reference can refer to overrides
the call.


Re: [RFC] Migrate pointers to members to the middle end

2007-08-09 Thread Mark Mitchell
Daniel Berlin wrote:

 This is the source of my current woes, as this may involve virtual
 function resolution, which can't be done with the information
 currently available to the middle end.

Ollie, IIRC, you posted an example where, together with your front-end
lowering patch (i.e., with the strategy that Daniel is advocating), we
don't do some optimization that we could in theory do.  Have you worked
out why not?  Perhaps that would shed some light on whether or not it's
tractable to recover this information from our current IR.

Thanks,

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: [RFC] Migrate pointers to members to the middle end

2007-08-09 Thread Daniel Berlin
On 8/9/07, Ollie Wild [EMAIL PROTECTED] wrote:
 On 8/9/07, Mark Mitchell [EMAIL PROTECTED] wrote:
  Daniel Berlin wrote:
 
   This is the source of my current woes, as this may involve virtual
   function resolution, which can't be done with the information
   currently available to the middle end.
 
  Ollie, IIRC, you posted an example where, together with your front-end
  lowering patch (i.e., with the strategy that Daniel is advocating), we
  don't do some optimization that we could in theory do.  Have you worked
  out why not?  Perhaps that would shed some light on whether or not it's
  tractable to recover this information from our current IR.

 The example I posted before indicated that the check to determine
 whether or not the member function is virtual wasn't optimized out.
 It didn't know that (fn_ptr  1) == 0.  That should be easy to fix by
 incorporating pointer alignment requirements in fold_binary().

 Offhand, I don't remember what happened with the various other cases,
 but my testing at the time wasn't particularly thorough.  The feedback
 I've gotten so far seems overwhelmingly negative, so I think the next
 step is to revisit the lowering approach, exercise the hell out of it,
 and see what, if any, limitations pop up.

 Daniel mentioned something about a pre-existing type-based
 devirtualizer.  I'd be interested to see how it works.  From what I've
 been able to gleam from the GCC code, it seems that such an approach
 would need to hook back into the front end, which is a showstopper
 from an LTO perspective.  It's entirely possible that I'm missing
 something, though.

See
http://gcc.gnu.org/ml/gcc-patches/2006-02/msg00744.html

It only calls into the frontend to
1. Fold obj_type_ref_nodes (which would require *lowering* or defining
the semantics of it,  not raising)
2. Determine what is a ctor/dtor (something we do not expose to the middle-end)



 Ollie



Re: [RFC] Migrate pointers to members to the middle end

2007-08-09 Thread Ollie Wild
On 8/9/07, Mark Mitchell [EMAIL PROTECTED] wrote:
 Daniel Berlin wrote:

  This is the source of my current woes, as this may involve virtual
  function resolution, which can't be done with the information
  currently available to the middle end.

 Ollie, IIRC, you posted an example where, together with your front-end
 lowering patch (i.e., with the strategy that Daniel is advocating), we
 don't do some optimization that we could in theory do.  Have you worked
 out why not?  Perhaps that would shed some light on whether or not it's
 tractable to recover this information from our current IR.

The example I posted before indicated that the check to determine
whether or not the member function is virtual wasn't optimized out.
It didn't know that (fn_ptr  1) == 0.  That should be easy to fix by
incorporating pointer alignment requirements in fold_binary().

Offhand, I don't remember what happened with the various other cases,
but my testing at the time wasn't particularly thorough.  The feedback
I've gotten so far seems overwhelmingly negative, so I think the next
step is to revisit the lowering approach, exercise the hell out of it,
and see what, if any, limitations pop up.

Daniel mentioned something about a pre-existing type-based
devirtualizer.  I'd be interested to see how it works.  From what I've
been able to gleam from the GCC code, it seems that such an approach
would need to hook back into the front end, which is a showstopper
from an LTO perspective.  It's entirely possible that I'm missing
something, though.

Ollie


Re: [RFC] Migrate pointers to members to the middle end

2007-08-09 Thread Daniel Berlin
On 8/9/07, Joe Buck [EMAIL PROTECTED] wrote:
 On Thu, Aug 09, 2007 at 02:29:28PM -0600, Tom Tromey wrote:
  Also in Java it is possible to devirtualize calls in some situations
  where only a bound on the type is known.  For instance at a call site
  we might know that all possible targets are derived from a class where
  the virtual method is final.

 Likewise in C++ we might know that a method is effectively final because
 no class overrides it.



Just to be clear, we *already* have the class hierarchies in the middle end.

They have been there for a few years now :)


Re: [RFC] Migrate pointers to members to the middle end

2007-08-08 Thread Michael Matz
Hi,

On Tue, 7 Aug 2007, Ollie Wild wrote:

 In response to a suggestion from Mark Mitchell, I've been attempting to 
 migrate pointers to members to the GCC middle end.  The goal of this is 
 twofold: (a) to enable conversion of pointer to member dereferences to 
 direct function calls and member accesses when analysis determines this 
 is unambiguous and (b) to obsolete the need for the expand_constant 
 language hook.
 
 Under my current approach, I've added the following new nodes to 
 gcc/tree.def:
 
   DEFTREECODE (PTRMEM_TYPE, ptrmem_type, tcc_type, 0)
   DEFTREECODE (PTRMEM_CST, ptrmem_cst, tcc_constant, 0)
   DEFTREECODE (PTRMEM_PLUS_EXPR, ptrmem_plus_expr, tcc_binary, 2)
   DEFTREECODE (PTRMEM_REF, ptrmem_ref, tcc_reference, 2)
 
 I then modify the C++ front end to instantiate the new nodes, expand
 them inside expand_expr_real_1 and output_constant, and perform
 folding in the various fold-const functions.

So those tree expressions would live throughout the middle-end and only 
then become lowered to RTL directly?  I'm not sure that's worthwhile.  
E.g. I'm not sure why there's a need to really get rid of the 
expand_constant langhook.  It's only important that it isn't called too 
late, i.e. ideally during gimplification.  It seems it only makes use of 
type information which should be available at that time, so if it 
currently is called too late (interfering with LTO in the future) it 
should be possible to move it earlier.

I have a conceptual problem with moving pointer to members into the 
middle-end: my mental model of what the middle-end should be concerned 
about is complete expressions/constants/types, like adding two numbers, 
accessing an integer two words away from that address (i.e. you see I 
already sort of decompose structures in my mental model).  Pointers to 
members is a very different beast: they can't be accessed without a real 
object, yet they can be stored into objects themself (sort of an 
incomplete memory reference).  If anything they simply resemble offsets 
(perhaps variable ones), so you might perhaps model them as such.  
Conversions between them sometimes requires adjustments to 'this', 
resulting in real operations (the delta field of the struct, how pointer 
to member values are currently modelled).  IMHO it would be wrong if we 
wouldn't make those adjustments explicit in the middle end.

So, why do you think you need the PTRMEM_TYPE in the middle end?  And why 
the PTRMEM_CST (i.e. why couldn't it be lowered to some explicit constant 
during gimplificaton)?  Same for PTRMEM_PLUS_EXPR, why is (PTR_)PLUS_EXPR 
not enough, if the semantic is only to add the integer argument to the 
pointer argument (is that even an operation which can be done to pointers 
to members?)?  Also PTRMEM_REF seems to equivalent to a normal 
COMPONENT_REF, just that the second operand is a funny offset 
specification instead of a simple field decl.

 However, pointers to virtual functions are turning out to be 
 problematic.  As far I can tell, the middle end has no concept of 
 virtual functions and virtual function tables: they appear to be 
 implemented solely in the C++ front end.  This suggests that a migration 
 of the virtual function machinery is a necessary precondition to pointer 
 to member migration.

Ugh, I wouldn't like that either.  I have the feeling that it would drag 
too much specifics of C++ into the middle end.  After all e.g. the virtual 
tables have to follow a certain layout according to the C++ ABI, which 
needn't be the right one for other languages.  I think you need only one 
feature, namely given a definite class type and an offset into the 
vtable, what definite FUNCTION_DECL that corresponds too.  I can't think 
of many places where you'd like to have this information, as the most 
interesting user of it would be the inliner.  There aren't that many 
transformations which make a former indefinite class type definite, and 
most of them can be done when the C++ frontend is still around to ask.

If you were to implement something like virtual functions into the middle 
end, it should be expressed in a fairly low level way IMHO.  E.g. a 
virtual table simply being a vector of pointers to function decls (which 
we can express already just fine).  That way they could also be written 
out for LTO and read back in, and the question what function decl is 
connected to what slot can also be answered trivially.  Then definite 
class type merely has the characteristic that they can point to such a 
function table, whereas indefinite class types (i.e. those whose runtime 
type can be any derived one) can not.  E.g. I wouldn't try to model the 
inheritance relationship.

But even with that I don't really see the need for new tree nodes.  
Pointer to members are a fancy offset, so why not model them as such?  
It's obviously possible I'm missing something, in that case, please 
educate me where the problems are ... :-)


Ciao,
Michael.


Re: [RFC] Migrate pointers to members to the middle end

2007-08-08 Thread Ian Lance Taylor
Michael Matz [EMAIL PROTECTED] writes:

 If you were to implement something like virtual functions into the middle 
 end, it should be expressed in a fairly low level way IMHO.  E.g. a 
 virtual table simply being a vector of pointers to function decls (which 
 we can express already just fine).  That way they could also be written 
 out for LTO and read back in, and the question what function decl is 
 connected to what slot can also be answered trivially.  Then definite 
 class type merely has the characteristic that they can point to such a 
 function table, whereas indefinite class types (i.e. those whose runtime 
 type can be any derived one) can not.  E.g. I wouldn't try to model the 
 inheritance relationship.

There is some advantage to knowing class heirarchy relationships in
LTO.  Some C++ programs implement different virtual subclasses in
different files.  LTO can put those together.  When the compiler can
then determine that a variable definitely has a particular subclass,
it can devirtualize the virtual calls, turning an indirect function
call into a direct function calls, also exposing inlining
opportunities.

I don't know how important an optimization this is, but it seems like
a real one, and one which is only available if the LTO middle-end
knows something about class relationships.

Ian


Re: [RFC] Migrate pointers to members to the middle end

2007-08-08 Thread Daniel Berlin
On 08 Aug 2007 17:36:43 -0700, Ian Lance Taylor [EMAIL PROTECTED] wrote:
 Michael Matz [EMAIL PROTECTED] writes:

  If you were to implement something like virtual functions into the middle
  end, it should be expressed in a fairly low level way IMHO.  E.g. a
  virtual table simply being a vector of pointers to function decls (which
  we can express already just fine).  That way they could also be written
  out for LTO and read back in, and the question what function decl is
  connected to what slot can also be answered trivially.  Then definite
  class type merely has the characteristic that they can point to such a
  function table, whereas indefinite class types (i.e. those whose runtime
  type can be any derived one) can not.  E.g. I wouldn't try to model the
  inheritance relationship.

 There is some advantage to knowing class heirarchy relationships in
 LTO.  Some C++ programs implement different virtual subclasses in
 different files.  LTO can put those together.  When the compiler can
 then determine that a variable definitely has a particular subclass,
 it can devirtualize the virtual calls, turning an indirect function
 call into a direct function calls, also exposing inlining
 opportunities.

 I don't know how important an optimization this is, but it seems like
 a real one, and one which is only available if the LTO middle-end
 knows something about class relationships.

I believe we do want to have this info in the middle end, but i don't
necessarily believe what Ollie's current approach is the best one.

It looks like all the code is still C++ specific, and it's semantics
are only defined by how they get generated in cp/*

I'd rather see us go the route of deciding what the semantics *should
be*, which of these tree codes are *actually necessary*, then make
cp/* gimplify it to what we've got.  The semantics should also be
completely documented independently of the C++ FE (even if we were
decide to give the middle end codes the exact same semantics as C++)

For example, I see no reason for PTRMEM_PLUS_EXPR.

To put it concretely, i think the proposal has gone backwards, and
worked from we have an implementation of something in the C++ FE,
let's move it to the middle end instead of we have a design for
something we want to do in the middle end, let's make the C++
FE/gimplifier do it.

It's not at all clear to me from the current proposal where/when/how
these tree codes occur in the normal GIMPLE datastream, how I would
handle them in points-to analysis, etc.

Note that to directly adress your point, the middle end *already does*
have a notion of class relationships.  The BINFO_* (base class
information) is already common to the middle-end.  This is how the
ipa-cha stuff that was submitted a while ago knows how to find base
classes.

Besides type-based devirt, you will also get a lot out of points-to
based devirt, because it can do without the whole program, whereas
type-based devirt cannot.

--Dan


Re: [RFC] Migrate pointers to members to the middle end

2007-08-08 Thread Michael Matz

Hi,

On Thu, 8 Aug 2007, Ian Lance Taylor wrote:

those whose runtime type can be any derived one) can not.  E.g. I 
wouldn't try to model the inheritance relationship.


There is some advantage to knowing class heirarchy relationships in LTO. 
Some C++ programs implement different virtual subclasses in different 
files.  LTO can put those together.  When the compiler can then 
determine that a variable definitely has a particular subclass, it can 
devirtualize the virtual calls, turning an indirect function call into a 
direct function calls, also exposing inlining opportunities.


Yes, devirtualization.  But I wonder if you really need class hierarchies 
for this (actually I'm fairly sure you don't).  In effect you only need to 
determine where this virtual call, when you know the definite runtime type 
of the object pointer, points to, i.e. to which function decl.  For that 
you don't need the class hierarchy (i.e. the edges in the inheritance 
graph, or in fact any information about what base classes might or might 
exist or about other classes), but simply a list of all slot-number - 
function-decls mappings for that type, i.e. the vtable (but let's call it 
different to not confuse it with the C++-ABI thingy which actually is 
written to the .o file).  As the middle end should have all thunks already 
also 'this' pointer adjusting virtual calls should be taken care of (i.e. 
the slot-function mapping should already have the thunks referenced).


Then you just need a way to get from a definite type to that 
slot-function mapping.  As the LTO frontend needs to emit something 
similar anyway somewhere (as it needs to express all C++ types in some 
lowered form appropriate for the LTO reader), it will be available 
somehow, presumably hanging off the RECORD_TYPE.  If you have the definite 
runtime type of the pointer, you also have that RECORD_TYPE, hence the 
slot-function mapping, the slot number from the virtual call itself, and 
ergo the finally called function_decl.  No need for hierarchies.


There were also other patches already floating around which (tried to) 
implement devirtualization (via profile feedback testing at runtime if a 
pointer was of certain type), which didn't need real inheritance 
hierarchies in the middle-end, so it can be done.


We also need to make sure to not munge together too many of these not 
entirely trivial topics.  We have pointer to members (IMHO just fancy 
offsets) and virtual function calls (for devirtualization) up to now. 
They only relate via pointer to virtual member functions, which still are 
only fancy offsets (referring to a slot number, not a byte offset).  So 
IMHO the C++ frontend should lower all these constructs as much as 
possible and try to express them in basic types and expressions, instead 
of pulling the whole hair into the middle end.  If something is right now 
not possible in the middle-end, then we should try to carefully add the 
necessary information (and only that) to enable whatever we want.  For 
devirtualization I think I pointed out one possibility.  I'm not sure what 
else we want.  Surely I've seen nothing which would make me think hell, 
yes, let's pull pointer to members into the middle end, and as we are at 
it, let's add virtual functions right away too ;-)



Ciao,
Michael.

PS: From time to time I'm forced to dive into either of these areas of 
cp/*.[ch] and it has some very complex code.  Most of it really front-end 
related I know, but intertwined with the code generation and layouting 
code.  I'm fairly certain that this all should stay in cp/ .


Re: [RFC] Migrate pointers to members to the middle end

2007-08-08 Thread Ollie Wild
On 8/8/07, Michael Matz [EMAIL PROTECTED] wrote:
 So those tree expressions would live throughout the middle-end and only
 then become lowered to RTL directly?  I'm not sure that's worthwhile.
 E.g. I'm not sure why there's a need to really get rid of the
 expand_constant langhook.  It's only important that it isn't called too
 late, i.e. ideally during gimplification.  It seems it only makes use of
 type information which should be available at that time, so if it
 currently is called too late (interfering with LTO in the future) it
 should be possible to move it earlier.

You're correct.  It is possible to remove the expand_constant language
hook without supporting pointers to members in the middle end.  In
fact, I submitted such a patch back in March
(http://gcc.gnu.org/ml/gcc-patches/2007-03/msg01819.html).  The
various follow-up emails illustrate some of the reasons why migrating
pointers to members to the middle end is a good thing (at least for
C++).

It would have been more accurate for me to describe the language hooks
removal as a jumping off point.  Strictly speaking, the middle end
migration is only necessary for language hook removal if the C++
maintainers won't approve my patch.  :)

That said, I think there is real value in moving pointers to members
to the middle end.  Keep reading.

 I have a conceptual problem with moving pointer to members into the
 middle-end: my mental model of what the middle-end should be concerned
 about is complete expressions/constants/types, like adding two numbers,
 accessing an integer two words away from that address (i.e. you see I
 already sort of decompose structures in my mental model).  Pointers to
 members is a very different beast: they can't be accessed without a real
 object, yet they can be stored into objects themself (sort of an
 incomplete memory reference).  If anything they simply resemble offsets
 (perhaps variable ones), so you might perhaps model them as such.
 Conversions between them sometimes requires adjustments to 'this',
 resulting in real operations (the delta field of the struct, how pointer
 to member values are currently modelled).  IMHO it would be wrong if we
 wouldn't make those adjustments explicit in the middle end.

I think the primary purpose of the middle end is to provide a
representation which captures the semantics of a program at a
sufficiently high level to enable efficient optimization.  COMPLEX_CST
and COMPLEX_TYPE are a good example.  In theory, the middle end has
enough information to optimize complex arithmetic based solely on the
constituent operations on real and imaginary components, but it's
easier to deal with the complex number as an atomic unit.

Similarly, consider the following code fragment:

  struct S { virtual void f(); };
  typedef void (S::*P)(void);
  const P p = S::f, NULL;
 void g(S s) {
   (s.*p)();
 }

GCC should be able to optimize g() to call s.S::f() directly.  In
theory, it can optimize out the null pointer to member check, the
virtual bit check, and the vtable lookup, but that's a lot of work.
Right now, GCC can't do it.

 So, why do you think you need the PTRMEM_TYPE in the middle end?  And why
 the PTRMEM_CST (i.e. why couldn't it be lowered to some explicit constant
 during gimplificaton)?  Same for PTRMEM_PLUS_EXPR, why is (PTR_)PLUS_EXPR
 not enough, if the semantic is only to add the integer argument to the
 pointer argument (is that even an operation which can be done to pointers
 to members?)?  Also PTRMEM_REF seems to equivalent to a normal
 COMPONENT_REF, just that the second operand is a funny offset
 specification instead of a simple field decl.

The implementation is certainly negotiable.  That's part of why I sent
out this email.

For pointers to data members, PTRMEM_CST doesn't give much information
that isn't already provided by integers of OFFSET_TYPE (It does
explicitly indicate NULL pointers to members, which offsets do not.
In fact, GCC currently handles NULL pointer to member casts
incorrectly, as it fails to preserve NULLs).  However, for pointers to
member functions, it bypasses the need to decode an
architecture-dependent virtual bit, and replaces the virtual function
offset with FUNCTION_DECL, which means we don't need to decode vtable
lookups in order to inline function calls.

PTRMEM_PLUS_EXPR is supposed to represent casts (CAST_EXPR didn't seem
appropriate, but I could be convinced otherwise).  It expands to a
conditional which propagates NULL or increments the offset.  I don't
think PTR_PLUS_EXPR checks for NULL, and it's designed to increment
pointers.  A pointer to data member expands to an offset, and a
pointer to member function expands to a fairly complex structure.  In
fact, for some architectures, the offset field is shifted, so
PTRMEM_PLUS_EXPR (p, 1), would actually have to add 2 to the offset.

Take a look at the expand_ptrmemfunc_cst() and build_ptrmemfunc1()
calls inside cplus_expand_constant().  Now reverse that to see the
amount of work that 

Re: [RFC] Migrate pointers to members to the middle end

2007-08-08 Thread Mark Mitchell
Ollie Wild wrote:
 On 8/8/07, Michael Matz [EMAIL PROTECTED] wrote:

Ollie, thanks for patiently trying out different approaches.

 I think the primary purpose of the middle end is to provide a
 representation which captures the semantics of a program at a
 sufficiently high level to enable efficient optimization.  COMPLEX_CST
 and COMPLEX_TYPE are a good example.  In theory, the middle end has
 enough information to optimize complex arithmetic based solely on the
 constituent operations on real and imaginary components, but it's
 easier to deal with the complex number as an atomic unit.

I agree.

When I discussed this with Ollie, my feeling was that, yes, it's
possible to express all this stuff in terms of GIMPLE (in fact, we
already do, aside from PTRMEM_CST itself!), but we're essentially
obfuscating information and then trying to get it back.  Creating
RECORD_TYPE instances to represent pointers-to-member-functions and then
hoping that the middle end will scalarize the struct, fold all the tests
for virtual-ness, dereference the virtual table entry, etc. seems like a
lot of work to do de-virtualization.  But, the information is in fact
there, so I guess we could go that way.

Michael and Danny have expressed opinions; does anyone else have one?

Thanks,

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: [RFC] Migrate pointers to members to the middle end

2007-08-08 Thread Daniel Berlin
On 8/8/07, Mark Mitchell [EMAIL PROTECTED] wrote:
 Ollie Wild wrote:
  On 8/8/07, Michael Matz [EMAIL PROTECTED] wrote:

 Ollie, thanks for patiently trying out different approaches.

  I think the primary purpose of the middle end is to provide a
  representation which captures the semantics of a program at a
  sufficiently high level to enable efficient optimization.  COMPLEX_CST
  and COMPLEX_TYPE are a good example.  In theory, the middle end has
  enough information to optimize complex arithmetic based solely on the
  constituent operations on real and imaginary components, but it's
  easier to deal with the complex number as an atomic unit.

 I agree.

 When I discussed this with Ollie, my feeling was that, yes, it's
 possible to express all this stuff in terms of GIMPLE (in fact, we
 already do, aside from PTRMEM_CST itself!), but we're essentially
 obfuscating information and then trying to get it back.

This is true of all the lowering we perform (loops-gotos, for example)

  Creating
 RECORD_TYPE instances to represent pointers-to-member-functions and then
 hoping that the middle end will scalarize the struct, fold all the tests
 for virtual-ness, dereference the virtual table entry, etc. seems like a
 lot of work to do de-virtualization.
  But, the information is in fact
 there, so I guess we could go that way.

We have no need for scalarization to occur to perform points-to *or*
type based devirtualization.  Dereferencing is just a call, it is
always there anyway.
Folding is something we have to be doing anyway.
I just don't see this as as much work as you do.  Both the type-based
that was posted before, and the points-to based one, are just not that
large in terms of code.

I also haven't necessarily said what Ollie has proposed is a bad idea.
 I have simply said the way he has come up with what he proposed is
not the way we should go about this.  It may turn out he has come up
with exactly the representation we want (though I doubt this, for
various reasons).The specification given also doesn't even explain
where/how these operations can occur in GIMPLE, and what they do other
than a C++ something something.

Also given that someone already wrote a type-based devirtualizer that
worked fine, and i don't see how a points-to one is much work, I'd
like to see more justification for things like PTRMEM_PLUS_EXPR than
hey, the C++ FE generates this internally.

So i have no real objection to either the type or the CST, I just want
to see semantics that are actually described in terms of our language
independent intermediate language, not in terms what the C++ FE
does.

Example:
* A pointer-to-member constant.
TREE_PTRMEM_CST_MEMBER is the _DECL for the member.
TREE_PTRMEM_CST_OFFSET takes on different interpretations depending on
the type of the member.  If the member is NULL, it is ignored.  If the
member is a FIELD_DECL it refers to the field offset.  Otherwise, it refers
to the offset of the this pointer passed to methods.  */

No justification was given why there is a member and an offset, what
is allowed to be the _DECL for the member, why member is allowed to be
null, why member would sometimes be a FIELD_DECL, sometimes NULL, and
apparently sometimes neither NULL nor a FIELD_DECL, how to find the
this pointer offset refers to (only sometimes, apparently!), etc.

As I said, i'm really not necessarily opposed to seeing pointer to
members in the middle end, i'd just rather not see us underspecify it
and just take what the C++ FE currently does as the word of the
lord.

 Michael and Danny have expressed opinions; does anyone else have one?

 Thanks,

 --
 Mark Mitchell
 CodeSourcery
 [EMAIL PROTECTED]
 (650) 331-3385 x713



[RFC] Migrate pointers to members to the middle end

2007-08-07 Thread Ollie Wild
Hi,

In response to a suggestion from Mark Mitchell, I've been attempting
to migrate pointers to members to the GCC middle end.  The goal of
this is twofold: (a) to enable conversion of pointer to member
dereferences to direct function calls and member accesses when
analysis determines this is unambiguous and (b) to obsolete the need
for the expand_constant language hook.

Under my current approach, I've added the following new nodes to gcc/tree.def:

  /* A C++ pointer-to-member type.
 The TREE_TYPE field is the member type.
 The TYPE_PTRMEM_BASETYPE points to the node for the class to which the
 member belongs.  */
  DEFTREECODE (PTRMEM_TYPE, ptrmem_type, tcc_type, 0)

  /* A pointer-to-member constant.
 TREE_PTRMEM_CST_MEMBER is the _DECL for the member.
 TREE_PTRMEM_CST_OFFSET takes on different interpretations depending on
 the type of the member.  If the member is NULL, it is ignored.  If the
 member is a FIELD_DECL it refers to the field offset.  Otherwise, it refers
 to the offset of the this pointer passed to methods.  */
  DEFTREECODE (PTRMEM_CST, ptrmem_cst, tcc_constant, 0)

  /* Pointer to member addition.  This is used in type conversions to adjust the
 pointer to member offset.  The first operand is a pointer to
member, and the
 second operand is an integer of type sizetype.  */
  DEFTREECODE (PTRMEM_PLUS_EXPR, ptrmem_plus_expr, tcc_binary, 2)

  /* A C++ pointer-to-member dereference.
 Operand 0 is a class expression.
 Operand 1 is a pointer-to-member.  */
  DEFTREECODE (PTRMEM_REF, ptrmem_ref, tcc_reference, 2)

I then modify the C++ front end to instantiate the new nodes, expand
them inside expand_expr_real_1 and output_constant, and perform
folding in the various fold-const functions.

However, pointers to virtual functions are turning out to be
problematic.  As far I can tell, the middle end has no concept of
virtual functions and virtual function tables: they appear to be
implemented solely in the C++ front end.  This suggests that a
migration of the virtual function machinery is a necessary
precondition to pointer to member migration.

Some questions for the GCC community:

1.  Is pointer to member migration worthwhile?  Can other languages
besides C++ benefit from this?

2.  Do the tree nodes I've discussed make sense?  Is there a better
way to represent pointer to member objects and operations?

3.  What can I do to handle virtual functions?  Is it possible to
handle this without migrating virtual functions to the middle end?

4.  Is a migration of virtual functions and virtual function tables
required?  Are the various language implementations sufficiently
similar to be handled by a common infrastructure?

Any and all feedback is appreciated.

Thanks,
Ollie