Re: Cgraph Modification Plan

2012-09-13 Thread Michael Matz
Hi,

On Tue, 11 Sep 2012, Lawrence Crowl wrote:

 change
 node-symbol.whatever
 to
 node-ref_symbol().whatever

Please don't uglify the compiler with this.  If GTY deficiencies force you 
to do that, then hold off on it until GTY doesn't force you anymore.


Ciao,
Michael.


Re: Cgraph Modification Plan

2012-09-12 Thread Diego Novillo

On 2012-09-11 16:22 , Lawrence Crowl wrote:

We do not yet seem to have consensus on a long term plan.
Would it be reasonable to start on short term prepatory work?

In particular, I was think we could do

Add converters and testers.
Change callers to use those.

and maybe

Change callers to use type-safe parameters.

Where those mean what I earlier stated.

Comments?


Seems straightforward enough.  And it doesn't seem like it will affect 
the longer term conversion.



Diego.


Re: Cgraph Modification Plan

2012-09-12 Thread Jan Hubicka
 We do not yet seem to have consensus on a long term plan.
 Would it be reasonable to start on short term prepatory work?
 
 In particular, I was think we could do
 
Add converters and testers.
Change callers to use those.
 
 and maybe
 
Change callers to use type-safe parameters.
 
 Where those mean what I earlier stated.
 
 Comments?

Yes, it seems sane.
 
 CONVERTERS AND TESTERS ###
 
 add
 symtab_node_base symtab_node_def::ref_symbol()
   { return symbol; }
 symtab_node_base cgraph_node::ref_symbol()
   { return symbol; }
 symtab_node_base varpool_node::ref_symbol()
   { return symbol; }
 
 change
 node-symbol.whatever
 to
 node-ref_symbol().whatever

I still do not gather why simple inheritance can't let me to write
node-whatever again...

If that is GTY limitation, I have no problem having this in short term and
update it later.  symbol_node is really base of functions/variables and new
beasts we are going to get soon.
I implemented it as union only to make GTY happy.
 
 
 
 should not need to add these
 
 cgraph_node symtab_node_def::ref_cgraph()
   { gcc_assert (symbol.type == SYMTAB_FUNCTION);

I think checking_assert is enough in these cases. We are getting more
conversions back and forth and we don't really want to have asserts all around
code.

 return x_function; }
 varpool_node symtab_node_def::ref_varpool()
   { gcc_assert (symbol.type == SYMTAB_VARIABLE);
 return x_variable; }
 
 
 
 add
 symtab_node_base *symtab_node_def::try_symbol()
   { return symbol; }
 cgraph_node *symtab_node_def::try_cgraph()
   { return symbol.type == SYMTAB_FUNCTION ? x_function : NULL; }
 varpool_node *symtab_node_def::try_varpool()
   { return symbol.type == SYMTAB_VARIABLE ? x_variable : NULL; }

I think try_function/try_variable reads better than cgraph/varpool especially 
with the
longer term plan to drop these names when cgraph/varpool no longer exists and 
we have
symbol table.

Honza


Re: Cgraph Modification Plan

2012-09-12 Thread Lawrence Crowl
On 9/12/12, Jan Hubicka hubi...@ucw.cz wrote:
 We do not yet seem to have consensus on a long term plan.
 Would it be reasonable to start on short term prepatory work?

 In particular, I was think we could do

Add converters and testers.
Change callers to use those.

 and maybe

Change callers to use type-safe parameters.

 Where those mean what I earlier stated.

 Comments?

 Yes, it seems sane.


 CONVERTERS AND TESTERS ###

 add
 symtab_node_base symtab_node_def::ref_symbol()
  { return symbol; }
 symtab_node_base cgraph_node::ref_symbol()
  { return symbol; }
 symtab_node_base varpool_node::ref_symbol()
  { return symbol; }

 change
 node-symbol.whatever
 to
 node-ref_symbol().whatever

 I still do not gather why simple inheritance can't let me to write
 node-whatever again...

 If that is GTY limitation, I have no problem having this in
 short term and update it later.  symbol_node is really base of
 functions/variables and new beasts we are going to get soon.
 I implemented it as union only to make GTY happy.

It is a GTY limitation right now.  We can do most of this work before
gengtype supports inheritance, and for that we need ref_symbol.


 

 should not need to add these

 cgraph_node symtab_node_def::ref_cgraph()
  { gcc_assert (symbol.type == SYMTAB_FUNCTION);

 I think checking_assert is enough in these cases. We are getting
 more conversions back and forth and we don't really want to have
 asserts all around code.

The are untested conversions from base to derived.  We should not
be using them, and hence they should be rare.  I mention them only
because there might be places in the code where we get backed into
a corner.  I've changed my notes to use gcc_checking_assert, and
we can revisit the decision whenever we need to.

return x_function; }
 varpool_node symtab_node_def::ref_varpool()
  { gcc_assert (symbol.type == SYMTAB_VARIABLE);
return x_variable; }

 

 add
 symtab_node_base *symtab_node_def::try_symbol()
  { return symbol; }
 cgraph_node *symtab_node_def::try_cgraph()
  { return symbol.type == SYMTAB_FUNCTION ? x_function : NULL; }
 varpool_node *symtab_node_def::try_varpool()
  { return symbol.type == SYMTAB_VARIABLE ? x_variable : NULL; }

 I think try_function/try_variable reads better than cgraph/varpool
 especially with the longer term plan to drop these names when
 cgraph/varpool no longer exists and we have symbol table.

Okay, I will do that.

-- 
Lawrence Crowl


Re: Cgraph Modification Plan

2012-09-11 Thread Lawrence Crowl
We do not yet seem to have consensus on a long term plan.
Would it be reasonable to start on short term prepatory work?

In particular, I was think we could do

   Add converters and testers.
   Change callers to use those.

and maybe

   Change callers to use type-safe parameters.

Where those mean what I earlier stated.

Comments?

CONVERTERS AND TESTERS ###

add
symtab_node_base symtab_node_def::ref_symbol()
{ return symbol; }
symtab_node_base cgraph_node::ref_symbol()
{ return symbol; }
symtab_node_base varpool_node::ref_symbol()
{ return symbol; }

change
node-symbol.whatever
to
node-ref_symbol().whatever



should not need to add these

cgraph_node symtab_node_def::ref_cgraph()
{ gcc_assert (symbol.type == SYMTAB_FUNCTION);
  return x_function; }
varpool_node symtab_node_def::ref_varpool()
{ gcc_assert (symbol.type == SYMTAB_VARIABLE);
  return x_variable; }



add
symtab_node_base *symtab_node_def::try_symbol()
{ return symbol; }
cgraph_node *symtab_node_def::try_cgraph()
{ return symbol.type == SYMTAB_FUNCTION ? x_function : NULL; }
varpool_node *symtab_node_def::try_varpool()
{ return symbol.type == SYMTAB_VARIABLE ? x_variable : NULL; }

change
if (symtab_function_p (node)  cgraph (node)-analyzed)
  return cgraph (node);
to
if (cgraph_node *p = node-try_cgraph())
  if (p-analyzed)
return p;

change
if (symtab_function_p (node)  cgraph (node)-callers)
  
to
if (cgraph_node *p = node-try_cgraph())
  if (p-callers)


change
if (symtab_function_p (node))
  {
struct cgraph_node *cnode = cgraph (node);

to
if (cgraph_node *cnode = node-try_cgraph ())
  {



likewise symtab_variable_p (node) and varpool (node)



If there are any symtab_function_p (node) expressions left,

add
bool symtab_node_def::is_cgraph()
{ return symbol.type == SYMTAB_FUNCTION; }
bool symtab_node_def::is_varpool()
{ return symbol.type == SYMTAB_VARIABLE; }

change
symtab_function_p (node)
to
node-is_cgraph ()

likewise symtab_variable_p (node)




Though we would like to avoid doing so,
if there are any cgraph (node) or varpool (node) expressions left,

add

symtab_node_base *symtab_node_def::ptr_symbol()
{ return symbol; }
cgraph_node *symtab_node_def::ptr_cgraph()
{ gcc_assert (symbol.type == SYMTAB_FUNCTION);
{ return x_function; }
varpool_node *symtab_node_def::ptr_varpool()
{ gcc_assert (symbol.type == SYMTAB_VARIABLE);
{ return x_variable; }

change
cgraph (node) = node-ptr_cgraph()

likewise varpool (node)


TYPE SAFETY ###

If a function asserts that its symtab_node parameter is symtab_function_p,
then convert the function to take a cgraph_node*
and change the callers to convert as above.

-- 
Lawrence Crowl


Re: Cgraph Modification Plan

2012-09-07 Thread Jan Hubicka
 Sorry to interrupt here, but please finish the existing partial C++ 
 transitions
 instead of starting to work on new ones.  Current stage1 will not last forever
 (stage1 is usually 6 months, so its natural end would be end of September).
 I'd rather have the current transition to a symbol table finished than having
 that half-way done and half-way done in C++.  Please.

I am focusing on getting the transition finished for 4.8. Some discussion about
C++ transition (I would like to see done for 4.8 at least in the basic and
obvious way) and problems people have with the current design seems good to me.
We do not need to implement everything discussed in this thread immediately,
but it is good to have the discussion.
 
 Btw, I also think the current symtab hierarchy is somewhat flawed.  At
 the core a symtab entry should just be the symbol name and a list
 of entities associated with it (much similar to the LTO symtab stuff).

We do have that. There is list of symtab entries sharing the same symtab hash 
entry
(assembler name). I have WIP patch to make lto-symtab to use these instead of 
its
own hashes. They need more testing and bit of cleanup though.

 Entities then are callgraph nodes, varpool nodes or alias nodes (or other

Having alias nodes separate from callgrap/varpool is difficult thing. The 
problem
is that one realy wants to handle function aliases as functions and variable 
aliases as
variables (i.e. one want to have call to function alias, or read from variable 
alias but
not other ways around). Thus we currently have aliases inherited from 
functions/variables
both on tree declaration level and symbol table level.

I have patches to commonize more the alias handling that is now duplicated over
former cgraph/vaprool in queue though.
 stuff).  Thus, the current symtab_node_base is too fat, and decls,
 instead of having DECL_ASSEMBLER_NAME should have a pointer
 to the (new) symtab node they are associated with.

Yes, it is the longer term goal (or rather getting rid of assembler name in 
favor
of real symbol name + flags that are currently encoded in assembler name)
At the moment it does seem to make sense though when everything is tied to 
declarations
and eclarations have their own assembler name.
I will slowly work on this, but after 4.8.

Honza
 
 Thanks,
 Richard.


Re: Cgraph Modification Plan

2012-09-06 Thread Xinliang David Li
On Wed, Sep 5, 2012 at 10:14 PM, Jan Hubicka hubi...@ucw.cz wrote:
 On Wed, Sep 5, 2012 at 5:41 PM, Lawrence Crowl cr...@googlers.com wrote:
  On 9/5/12, Xinliang David Li davi...@google.com wrote:
  On Sep 5, 2012 Jan Hubicka hubi...@ucw.cz wrote:
   OK, the basic idea is that symtab_node is basetype of
   cgraph_node and varpool_node.  We may want to drop the historica
   cgraph/varpool names here, since function_node/variable_node
   would sound better. Cgraph still exists, but varpool is more
   or less historical relic.
 
  The cgraph redesign probably deserves more discussion.
 
  Hence, the post!
 
  1) It may be worthwhile to abstract the graph manipulation code
  into a utility class which is templatized.
 
  graphT, nodeT with node inheriting from T.
 
  While I won't dispute the statement, getting from here to there
  will likely involve some intermediate steps.

 Yes. My point is that the intermediate steps should be driven by the
 final goal. It is important to first have  the design of final class
 hierarchy and interfaces nailed and then the intermediate steps
 defined instead of the other way around.

 Yes, it seems resonable to me. We just need to keep in mind the specifics
 callgraph has (i.e. many types of edges that are relevant in different
 contextes and have good reason for different kind of in memory 
 reppresentation)
 There are also different types of nodes - functions, variables, aliases. I 
 plan
 to add labels in 4.8 timeframe, too).

Areas that are confusing and need clean up (IMO) include:
1) handling of aliases and clones
2) reachability, needed, analyzed bits.  The needed bit is not in sync
with the call to check if a function is needed.  Bits and attributes
used in analysis should be isolated out from the core data structure.
3) ipa references -- should they be modeled as special edges?


 
  2) Introduce a global symbol table containing a function table
  and a global variable table. The function table should replace
  the current cgraph node link list, and the variable table replaces
  the varpool.  The symbol table should provide basic interfaces to
  do named based lookup, traversal, alias handling etc.  I noticed
  trunk already has some of that -- but it seems more abstraction
  is needed.
 
  Do you mean moving away from a pointer-based approach?

 See above. I mean it is important to have well defined symtab
 interfaces that hide implementation as much as possible. It will make
 the interfaces more stable. It is currently quite difficult for
 cgraph/varpool related changes in gcc branches to keep up with trunk
 without stable core APIs.

 Well, the APIs did not changed much over years (LTO brought in some more busy
 changes, but many were just hacks that needed redesign anyway), in 4.8 we
 changed quite a lot because of introduction of the symbol table. It would be
 really nice to get as much as C++ conversion into 4.8, too, since we broke 
 most
 of existing patches anyway, to get more stability frm 4.9 up.

  3) it seems natural to drop 'node' in the naming scheme
 
  symbol (symtab_entry) -- base class;
  function -- derived from symbol;  (It conflicts with the existing
  struct function though);
  variable -- derived from symbol;
 
  typedef nodefunction cnode;
 
  A nodefunction is not a derived class of nodesymbol even when
  function is derived from symbol.  That property is helpful in
  ensuring usable type safety.


 We don't need nodesymbol -- only nodefunction is needed, and it is
 derived from function and function is derived from symbol.

 Yes, this is where I expected to land.  I expect to see more complex 
 inheritance
 tree.  Currently we have:

 symbol
   |
   +-function (cgraph)
   |
   +-variable (varpool)

 While function can have five types - just external declarations, actual
 definitions with body (analyzed flag is set), thunk, alias and virtual clone

 Similarly variables are declarations, ones with body (analyzed flag is set),
 and aliases.

 We do not represent labels and const_decl references. We ought: those are also
 symbols and they are important for partitioning.

 We probably want rither structure here in longer term:

 symbol
   |
   +-function declaration
   | |
   | +-function definition
   | |
   | +-function alternative entry point (thunk)
   | |
   | +-function alias
   | |
   | +-virtual clone
   |
   +-variable declaration
   | |
   | +-variable definition (possibly constant pool references via const_decl 
 here, too)
   | |
   | +-variable alias
   |
   +-function label definition

 In particular we do not want to store all the stuff we store for function
 definitions for declarations.

True, but that is at the cost of too complex class hierarchy. Is it
enough to have 'has_body()' to differentiate def vs declarations (the
meaning of 'is_external' is overloaded)? The additional information
can be accessed via indirection (to additional data structure).

For  aliases, why can't those nodes be merged into one node (mapped
from 

Re: Cgraph Modification Plan

2012-09-06 Thread Jan Hubicka
 
 Areas that are confusing and need clean up (IMO) include:
 1) handling of aliases and clones

I am slowly cleaning up alias stuff, it had major reorg in 4.7 and further
cleanups in 4.8.  Do you have more specific suggestions?

 2) reachability, needed, analyzed bits.  The needed bit is not in sync
 with the call to check if a function is needed.  Bits and attributes
 used in analysis should be isolated out from the core data structure.

This is mostly done. Reachable and needed flags are removed. We still have
analyzed flag, but it really has meaning is definition these days except for
early cgraph construction phase, so I intend to rename it (and make cgraph
construction to use its own bitmap).

 3) ipa references -- should they be modeled as special edges?

They are special edges. Different in memory from cgraph edges though.  I did
some estimates how much memory would be needed for representing them by the
cgraph_edge way and it was quite steep for Mozilla LTO (over a gig, as compared
to about 200MB we need now).

We could also rework cgraph edges into vector represnetaiton instead of doubly
linked lists to save some memory, not sure how fruitful that would be.

Still different sizes of in memory representations and thus different vectors
is desirable here.
 
 
  
   2) Introduce a global symbol table containing a function table
   and a global variable table. The function table should replace
   the current cgraph node link list, and the variable table replaces
   the varpool.  The symbol table should provide basic interfaces to
   do named based lookup, traversal, alias handling etc.  I noticed
   trunk already has some of that -- but it seems more abstraction
   is needed.
  
   Do you mean moving away from a pointer-based approach?
 
  See above. I mean it is important to have well defined symtab
  interfaces that hide implementation as much as possible. It will make
  the interfaces more stable. It is currently quite difficult for
  cgraph/varpool related changes in gcc branches to keep up with trunk
  without stable core APIs.
 
  Well, the APIs did not changed much over years (LTO brought in some more 
  busy
  changes, but many were just hacks that needed redesign anyway), in 4.8 we
  changed quite a lot because of introduction of the symbol table. It would be
  really nice to get as much as C++ conversion into 4.8, too, since we broke 
  most
  of existing patches anyway, to get more stability frm 4.9 up.
 
   3) it seems natural to drop 'node' in the naming scheme
  
   symbol (symtab_entry) -- base class;
   function -- derived from symbol;  (It conflicts with the existing
   struct function though);
   variable -- derived from symbol;
  
   typedef nodefunction cnode;
  
   A nodefunction is not a derived class of nodesymbol even when
   function is derived from symbol.  That property is helpful in
   ensuring usable type safety.
 
 
  We don't need nodesymbol -- only nodefunction is needed, and it is
  derived from function and function is derived from symbol.
 
  Yes, this is where I expected to land.  I expect to see more complex 
  inheritance
  tree.  Currently we have:
 
  symbol
|
+-function (cgraph)
|
+-variable (varpool)
 
  While function can have five types - just external declarations, actual
  definitions with body (analyzed flag is set), thunk, alias and virtual clone
 
  Similarly variables are declarations, ones with body (analyzed flag is set),
  and aliases.
 
  We do not represent labels and const_decl references. We ought: those are 
  also
  symbols and they are important for partitioning.
 
  We probably want rither structure here in longer term:
 
  symbol
|
+-function declaration
| |
| +-function definition
| |
| +-function alternative entry point (thunk)
| |
| +-function alias
| |
| +-virtual clone
|
+-variable declaration
| |
| +-variable definition (possibly constant pool references via const_decl 
  here, too)
| |
| +-variable alias
|
+-function label definition
 
  In particular we do not want to store all the stuff we store for function
  definitions for declarations.
 
 True, but that is at the cost of too complex class hierarchy. Is it
 enough to have 'has_body()' to differentiate def vs declarations (the
 meaning of 'is_external' is overloaded)? The additional information
 can be accessed via indirection (to additional data structure).

We can, not sure how effective those sparse tables will be in practice however,
Some of this stuff is quite critical during WPA stage.
 
 For  aliases, why can't those nodes be merged into one node (mapped
 from all assembler names in the symtab)? aliases will just be
 attribute of the symbol).

I attempted to merge aliases into one node for cgraph/varpool, but this did
not really work - the problem is that different symbols do have different
visibilities (one is overwritable one not).  Consequentely you need to track to
which specific 

Re: Cgraph Modification Plan

2012-09-06 Thread Richard Guenther
On Thu, Sep 6, 2012 at 12:47 AM, Jan Hubicka hubi...@ucw.cz wrote:
 What do you think of the following plan for turning cgraph into
 a class hierarchy?  We cannot finish it until we have gengtype
 understanding single inheritance, but we can start changing APIs
 in preparation.

 Good you told me, I was about trying that myself. Did not know gengtype
 do not understand inheritance yet.

 APPROACH ###

 Add converters and testers.
 Change callers to use those.
 Change callers to use type-safe parameters.
 Change implementation to class hierarchy.
 Add accessors.

 Sounds good to me.

Sorry to interrupt here, but please finish the existing partial C++ transitions
instead of starting to work on new ones.  Current stage1 will not last forever
(stage1 is usually 6 months, so its natural end would be end of September).
I'd rather have the current transition to a symbol table finished than having
that half-way done and half-way done in C++.  Please.

Btw, I also think the current symtab hierarchy is somewhat flawed.  At
the core a symtab entry should just be the symbol name and a list
of entities associated with it (much similar to the LTO symtab stuff).
Entities then are callgraph nodes, varpool nodes or alias nodes (or other
stuff).  Thus, the current symtab_node_base is too fat, and decls,
instead of having DECL_ASSEMBLER_NAME should have a pointer
to the (new) symtab node they are associated with.

Thanks,
Richard.


Re: Cgraph Modification Plan

2012-09-06 Thread Martin Jambor
Hi,

(I'm CCing the gcc mailing list too since I suppose it is an accident
that it wasn't in the message I'm replying to)

On Thu, Sep 06, 2012 at 09:22:27AM +0200, Jan Hubicka wrote:
  On Thu, Sep 6, 2012 at 12:08 AM, Jan Hubicka hubi...@ucw.cz wrote:
Consequentely you need to track to
which specific alias call/reference is going.  So we ended up doing 
this only
for same body aliases
  
   I was actually referring to same-body aliases which are more common in
   C++.  Weak aliases look like different animals.
  
   The handling of aliases was unified in 4.7, hope you will find it 
   convenient.
   I think it is improvement over what we have before.
  
   and eventually I got convinced to reorg the code other
way (in 4.7) so the aliases are all explicitely referenced.
   
Except for need to walk the alises to real node and work out the 
visibility,
there is not much to worry about. Sadly I do not think aliases can be 
fully
hidden in the abstraction - you need to actually think of them when 
doing many
of IPA transforms.
   
   
Thunks are only referenced from vtables. Is there a need to create
cgraph nodes for them?
   
This is not really true. thunks may get into code via constant folding 
or LTO
unit merging. We used to disallow direct calls to thunks in 4.6 and 
earlier but
decided to lift this restriction in 4.7.
  
   Why is that? Should the thunk code be inline expanded (e.g, in profile
   guided icall promotion)? There is no real need for thunk direct
   references, right?
  
   Well, I was trying to go down this route, too.  We discussed it for a 
   long while and eventually
   decided to give it up.  There are many interesting side cases.
   For example we may devirtualize call in one unit before LTO streaming and 
   this devirtualized
   call may turn out to be tunk in other unit at WPA time when we can not 
   easilly inline the
   thunk code.
  
  I don't quite get this. Is the discussion thread available?
 
 I think it went mostly on the IRC. Richard/Martin?

Well, great deal of it on IRC too but the most infamous relevant bug
was PR 48674 (you might also want to look at PRs 45605 and 48661 to
get the bigger picture).

It was gross, the world is much better with explicit representation of
thunks.

Martin


Re: Cgraph Modification Plan

2012-09-06 Thread Richard Guenther
On Thu, 6 Sep 2012, Martin Jambor wrote:

 Hi,
 
 (I'm CCing the gcc mailing list too since I suppose it is an accident
 that it wasn't in the message I'm replying to)
 
 On Thu, Sep 06, 2012 at 09:22:27AM +0200, Jan Hubicka wrote:
   On Thu, Sep 6, 2012 at 12:08 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Consequentely you need to track to
 which specific alias call/reference is going.  So we ended up doing 
 this only
 for same body aliases
   
I was actually referring to same-body aliases which are more common in
C++.  Weak aliases look like different animals.
   
The handling of aliases was unified in 4.7, hope you will find it 
convenient.
I think it is improvement over what we have before.
   
and eventually I got convinced to reorg the code other
 way (in 4.7) so the aliases are all explicitely referenced.

 Except for need to walk the alises to real node and work out the 
 visibility,
 there is not much to worry about. Sadly I do not think aliases can 
 be fully
 hidden in the abstraction - you need to actually think of them when 
 doing many
 of IPA transforms.


 Thunks are only referenced from vtables. Is there a need to create
 cgraph nodes for them?

 This is not really true. thunks may get into code via constant 
 folding or LTO
 unit merging. We used to disallow direct calls to thunks in 4.6 and 
 earlier but
 decided to lift this restriction in 4.7.
   
Why is that? Should the thunk code be inline expanded (e.g, in profile
guided icall promotion)? There is no real need for thunk direct
references, right?
   
Well, I was trying to go down this route, too.  We discussed it for a 
long while and eventually
decided to give it up.  There are many interesting side cases.
For example we may devirtualize call in one unit before LTO streaming 
and this devirtualized
call may turn out to be tunk in other unit at WPA time when we can not 
easilly inline the
thunk code.
   
   I don't quite get this. Is the discussion thread available?
  
  I think it went mostly on the IRC. Richard/Martin?
 
 Well, great deal of it on IRC too but the most infamous relevant bug
 was PR 48674 (you might also want to look at PRs 45605 and 48661 to
 get the bigger picture).
 
 It was gross, the world is much better with explicit representation of
 thunks.

Yes, and related to thunks we also discussed to explicitely represent
multiple entry points (which thunks are).

Richard.


Re: Cgraph Modification Plan

2012-09-06 Thread Lawrence Crowl
On 9/6/12, Richard Guenther richard.guent...@gmail.com wrote:
 Sorry to interrupt here, but please finish the existing
 partial C++ transitions instead of starting to work on new ones.
 Current stage1 will not last forever (stage1 is usually 6 months,
 so its natural end would be end of September).  I'd rather have
 the current transition to a symbol table finished than having
 that half-way done and half-way done in C++.  Please.

There are often periods of time where I am waiting on discussions
or approvals.  To make good use of my time, I need to be able to
work on more than one thing at once.  In particular, the discussion
on cgraph can go on in parallel with finishing up double_int.

-- 
Lawrence Crowl


Re: Cgraph Modification Plan

2012-09-05 Thread Jan Hubicka
 What do you think of the following plan for turning cgraph into
 a class hierarchy?  We cannot finish it until we have gengtype
 understanding single inheritance, but we can start changing APIs
 in preparation.

Good you told me, I was about trying that myself. Did not know gengtype
do not understand inheritance yet.
 
 APPROACH ###
 
 Add converters and testers.
 Change callers to use those.
 Change callers to use type-safe parameters.
 Change implementation to class hierarchy.
 Add accessors.

Sounds good to me.
 
 
 CONVERTERS AND TESTERS ###
 
 add
 symtab_node_base symtab_node_def::ref_symbol()
   { return symbol; }
 symtab_node_base cgraph_node::ref_symbol()
   { return symbol; }
 symtab_node_base varpool_node::ref_symbol()
   { return symbol; }
 
 change
 node-symbol.whatever
 to
 node-ref_symbol().whatever

OK, the basic idea is that symtab_node is basetype of cgraph_node and
varpool_node.  We may want to drop the historica cgraph/varpool names here,
since function_node/variable_node would sound better. Cgraph still exists, but
varpool is more or less historical relic.

I would expect inheritance to allow me to write node-whatever and to have
casting operators to convert things back and forth, so we only need to add
testers?  Probably is_function/is_variable  try_function/try_variable sounds
more readable to me.
What do you think? (I just arrived from China, so will take it more tought
once unjetlagged)

Honza
 
 
 
 should not need to add these
 
 cgraph_node symtab_node_def::ref_cgraph()
   { gcc_assert (symbol.type == SYMTAB_FUNCTION);
 return x_function; }
 varpool_node symtab_node_def::ref_varpool()
   { gcc_assert (symbol.type == SYMTAB_VARIABLE);
 return x_variable; }
 
 
 
 add
 symtab_node_base *symtab_node_def::try_symbol()
   { return symbol; }
 cgraph_node *symtab_node_def::try_cgraph()
   { return symbol.type == SYMTAB_FUNCTION ? x_function : NULL; }
 varpool_node *symtab_node_def::try_varpool()
   { return symbol.type == SYMTAB_VARIABLE ? x_variable : NULL; }
 
 change
 if (symtab_function_p (node)  cgraph (node)-analyzed)
   return cgraph (node);
 to
 if (cgraph_node *p = node-try_cgraph())
   if (p-analyzed)
 return p;
 
 change
 if (symtab_function_p (node)  cgraph (node)-callers)
   
 to
 if (cgraph_node *p = node-try_cgraph())
   if (p-callers)
 
 
 change
 if (symtab_function_p (node))
   {
 struct cgraph_node *cnode = cgraph (node);
 
 to
 if (cgraph_node *cnode = node-try_cgraph ())
   {
 
 
 
 likewise symtab_variable_p (node) and varpool (node)
 
 
 
 If there are any symtab_function_p (node) expressions left,
 
 add
 bool symtab_node_def::is_cgraph()
 { return symbol.type == SYMTAB_FUNCTION; }
 bool symtab_node_def::is_varpool()
 { return symbol.type == SYMTAB_VARIABLE; }
 
 change
 symtab_function_p (node)
 to
 node-is_cgraph ()
 
 likewise symtab_variable_p (node)
 
 
 
 
 Though we would like to avoid doing so,
 if there are any cgraph (node) or varpool (node) expressions left,
 
 add
 
 symtab_node_base *symtab_node_def::ptr_symbol()
   { return symbol; }
 cgraph_node *symtab_node_def::ptr_cgraph()
   { gcc_assert (symbol.type == SYMTAB_FUNCTION);
   { return x_function; }
 varpool_node *symtab_node_def::ptr_varpool()
   { gcc_assert (symbol.type == SYMTAB_VARIABLE);
   { return x_variable; }
 
 change
 cgraph (node) = node-ptr_cgraph()
 
 likewise varpool (node)
 
 
 TYPE SAFETY ###
 
 If a function asserts that its symtab_node parameter is symtab_function_p,
 then convert the function to take a cgraph_node*
 and change the callers to convert as above.
 
 
 PROPOSED HIERARCHY ###
 
 enum symtab_type { SYMTAB_SYMBOL, SYMTAB_FUNCTION, SYMTAB_VARIABLE };
 
 struct symtab_node_base /* marked SYMTAB_SYMBOL */
   embeds symtab_type
 
 cgraph_node : symtab_node_base /* marked SYMTAB_FUNCTION */
 
 varpool_node : symtab_node_base /* marked SYMTAB_VARIABLE */
 
 changing
 typedef union symtab_node_def *symtab_node;
 typedef const union symtab_node_def *const_symtab_node;
 to
 typedef symtab_node_base *symtab_node;
 typedef const symtab_node_base *const_symtab_node;
 
 changing used of
 symtab_node_def
 to
 symtab_node_base
 
 -- 
 Lawrence Crowl


Re: Cgraph Modification Plan

2012-09-05 Thread Xinliang David Li
On Wed, Sep 5, 2012 at 3:47 PM, Jan Hubicka hubi...@ucw.cz wrote:
 What do you think of the following plan for turning cgraph into
 a class hierarchy?  We cannot finish it until we have gengtype
 understanding single inheritance, but we can start changing APIs
 in preparation.

 Good you told me, I was about trying that myself. Did not know gengtype
 do not understand inheritance yet.

 APPROACH ###

 Add converters and testers.
 Change callers to use those.
 Change callers to use type-safe parameters.
 Change implementation to class hierarchy.
 Add accessors.

 Sounds good to me.


 CONVERTERS AND TESTERS ###

 add
 symtab_node_base symtab_node_def::ref_symbol()
   { return symbol; }
 symtab_node_base cgraph_node::ref_symbol()
   { return symbol; }
 symtab_node_base varpool_node::ref_symbol()
   { return symbol; }

 change
 node-symbol.whatever
 to
 node-ref_symbol().whatever

 OK, the basic idea is that symtab_node is basetype of cgraph_node and
 varpool_node.  We may want to drop the historica cgraph/varpool names here,
 since function_node/variable_node would sound better. Cgraph still exists, but
 varpool is more or less historical relic.

The cgraph redesign probably deserves more discussion.

1) It may be worthwhile to abstract the graph manipulation code into a
utility class which is templatized.

  graphT, nodeT with node inheriting from T.

2) Introduce a global symbol table containing a function table and a
global variable table. The function table should replace the current
cgraph node link list, and the variable table replaces the varpool.
The symbol table should provide basic interfaces to do named based
lookup, traversal, alias handling etc.  I noticed trunk already has
some of that -- but it seems more abstraction is needed.

3) it seems natural to drop 'node' in the naming scheme

symbol (symtab_entry) -- base class;
function -- derived from symbol;  (It conflicts with the existing
struct function though);
variable -- derived from symbol;

typedef nodefunction cnode;

4) coding convention is needed for functions that do 'casting' and
'trial casting'.

thanks,

David



 I would expect inheritance to allow me to write node-whatever and to have
 casting operators to convert things back and forth, so we only need to add
 testers?  Probably is_function/is_variable  try_function/try_variable sounds
 more readable to me.
 What do you think? (I just arrived from China, so will take it more tought
 once unjetlagged)

 Honza

 

 should not need to add these

 cgraph_node symtab_node_def::ref_cgraph()
   { gcc_assert (symbol.type == SYMTAB_FUNCTION);
 return x_function; }
 varpool_node symtab_node_def::ref_varpool()
   { gcc_assert (symbol.type == SYMTAB_VARIABLE);
 return x_variable; }

 

 add
 symtab_node_base *symtab_node_def::try_symbol()
   { return symbol; }
 cgraph_node *symtab_node_def::try_cgraph()
   { return symbol.type == SYMTAB_FUNCTION ? x_function : NULL; }
 varpool_node *symtab_node_def::try_varpool()
   { return symbol.type == SYMTAB_VARIABLE ? x_variable : NULL; }

 change
 if (symtab_function_p (node)  cgraph (node)-analyzed)
   return cgraph (node);
 to
 if (cgraph_node *p = node-try_cgraph())
   if (p-analyzed)
 return p;

 change
 if (symtab_function_p (node)  cgraph (node)-callers)
   
 to
 if (cgraph_node *p = node-try_cgraph())
   if (p-callers)
 

 change
 if (symtab_function_p (node))
   {
 struct cgraph_node *cnode = cgraph (node);
 
 to
 if (cgraph_node *cnode = node-try_cgraph ())
   {
 


 likewise symtab_variable_p (node) and varpool (node)

 

 If there are any symtab_function_p (node) expressions left,

 add
 bool symtab_node_def::is_cgraph()
 { return symbol.type == SYMTAB_FUNCTION; }
 bool symtab_node_def::is_varpool()
 { return symbol.type == SYMTAB_VARIABLE; }

 change
 symtab_function_p (node)
 to
 node-is_cgraph ()

 likewise symtab_variable_p (node)


 

 Though we would like to avoid doing so,
 if there are any cgraph (node) or varpool (node) expressions left,

 add

 symtab_node_base *symtab_node_def::ptr_symbol()
   { return symbol; }
 cgraph_node *symtab_node_def::ptr_cgraph()
   { gcc_assert (symbol.type == SYMTAB_FUNCTION);
   { return x_function; }
 varpool_node *symtab_node_def::ptr_varpool()
   { gcc_assert (symbol.type == SYMTAB_VARIABLE);
   { return x_variable; }

 change
 cgraph (node) = node-ptr_cgraph()

 likewise varpool (node)


 TYPE SAFETY ###

 If a function asserts that its symtab_node parameter is symtab_function_p,
 then convert the function to take a cgraph_node*
 and change the callers to convert as above.


 PROPOSED HIERARCHY 

Re: Cgraph Modification Plan

2012-09-05 Thread Lawrence Crowl
On 9/5/12, Jan Hubicka hubi...@ucw.cz wrote:
 CONVERTERS AND TESTERS ###

 add
 symtab_node_base symtab_node_def::ref_symbol()
  { return symbol; }
 symtab_node_base cgraph_node::ref_symbol()
  { return symbol; }
 symtab_node_base varpool_node::ref_symbol()
  { return symbol; }

 change
 node-symbol.whatever
 to
 node-ref_symbol().whatever

 OK, the basic idea is that symtab_node is basetype of
 cgraph_node and varpool_node.

Yes.

 We may want to drop the historica cgraph/varpool names here,
 since function_node/variable_node would sound better.  Cgraph
 still exists, but varpool is more or less historical relic.

I have a general preference for making name changes separate from
function changes when doing so makes svn diffs better reflect the
actual changes.  That may or may not be useful here, but I am happy
to use those names for the final product.

 I would expect inheritance to allow me to write node-whatever
 and to have casting operators to convert things back and forth,
 so we only need to add testers?

The inheritance allows implicit derived to base conversion,
which is type-safe.  With inheritance, the converting functions
are needed only for the conversion from base to derived, which
generally requires dynamic information to be safe.

 Probably is_function/is_variable  try_function/try_variable
 sounds more readable to me.  What do you think?

Fine with me.

 (I just arrived from China, so will take it more tought once
 unjetlagged)

-- 
Lawrence Crowl


Re: Cgraph Modification Plan

2012-09-05 Thread Lawrence Crowl
On 9/5/12, Xinliang David Li davi...@google.com wrote:
 On Sep 5, 2012 Jan Hubicka hubi...@ucw.cz wrote:
  OK, the basic idea is that symtab_node is basetype of
  cgraph_node and varpool_node.  We may want to drop the historica
  cgraph/varpool names here, since function_node/variable_node
  would sound better. Cgraph still exists, but varpool is more
  or less historical relic.

 The cgraph redesign probably deserves more discussion.

Hence, the post!

 1) It may be worthwhile to abstract the graph manipulation code
 into a utility class which is templatized.

 graphT, nodeT with node inheriting from T.

While I won't dispute the statement, getting from here to there
will likely involve some intermediate steps.

 2) Introduce a global symbol table containing a function table
 and a global variable table. The function table should replace
 the current cgraph node link list, and the variable table replaces
 the varpool.  The symbol table should provide basic interfaces to
 do named based lookup, traversal, alias handling etc.  I noticed
 trunk already has some of that -- but it seems more abstraction
 is needed.

Do you mean moving away from a pointer-based approach?

 3) it seems natural to drop 'node' in the naming scheme

 symbol (symtab_entry) -- base class;
 function -- derived from symbol;  (It conflicts with the existing
 struct function though);
 variable -- derived from symbol;

 typedef nodefunction cnode;

A nodefunction is not a derived class of nodesymbol even when
function is derived from symbol.  That property is helpful in
ensuring usable type safety.

 4) coding convention is needed for functions that do 'casting'
 and 'trial casting'.

The functions I have suggestion are ptr_... and try_... respectively.
Are you suggesting amending the coding conventions?

-- 
Lawrence Crowl


Re: Cgraph Modification Plan

2012-09-05 Thread Jan Hubicka
 On 9/5/12, Xinliang David Li davi...@google.com wrote:
  On Sep 5, 2012 Jan Hubicka hubi...@ucw.cz wrote:
   OK, the basic idea is that symtab_node is basetype of
   cgraph_node and varpool_node.  We may want to drop the historica
   cgraph/varpool names here, since function_node/variable_node
   would sound better. Cgraph still exists, but varpool is more
   or less historical relic.
 
  The cgraph redesign probably deserves more discussion.
 
 Hence, the post!
 
  1) It may be worthwhile to abstract the graph manipulation code
  into a utility class which is templatized.
 
  graphT, nodeT with node inheriting from T.
 
 While I won't dispute the statement, getting from here to there
 will likely involve some intermediate steps.

Commonizing some of cfg/cgraph/ssa graph handling would not hurt, but
there is not _that_ much code in common - we do some strongly connected
components discovery on all three kinds of graphs and basic traversals.
 
  2) Introduce a global symbol table containing a function table
  and a global variable table. The function table should replace
  the current cgraph node link list, and the variable table replaces
  the varpool.  The symbol table should provide basic interfaces to
  do named based lookup, traversal, alias handling etc.  I noticed
  trunk already has some of that -- but it seems more abstraction
  is needed.
 
 Do you mean moving away from a pointer-based approach?

I would say that trunk now have what is described here.  I still plan to unify
some of code (like alias handling) that is split for historical reasons, but
those are mostly cleanups still intended for 4.8.

Honza


Re: Cgraph Modification Plan

2012-09-05 Thread Jan Hubicka
 
 The cgraph redesign probably deserves more discussion.
 
 1) It may be worthwhile to abstract the graph manipulation code into a
 utility class which is templatized.
 
   graphT, nodeT with node inheriting from T.
 
 2) Introduce a global symbol table containing a function table and a
 global variable table. The function table should replace the current
 cgraph node link list, and the variable table replaces the varpool.
 The symbol table should provide basic interfaces to do named based
 lookup, traversal, alias handling etc.  I noticed trunk already has
 some of that -- but it seems more abstraction is needed.
 
 3) it seems natural to drop 'node' in the naming scheme
 
 symbol (symtab_entry) -- base class;
 function -- derived from symbol;  (It conflicts with the existing
 struct function though);
 variable -- derived from symbol;
 
 typedef nodefunction cnode;

We could drop node, yes. The conflict with current function is more
interesting.  Basically symbol table function is that we need for IPA
optimizations, while struct function is what is load when function body is
needed. Those are different.

Not sure if we don't want to have symtab_ prefix for all symbol table stuff or
if we want to rename struct function into something more fit.

Honza


Re: Cgraph Modification Plan

2012-09-05 Thread Jan Hubicka
  On 9/5/12, Xinliang David Li davi...@google.com wrote:
   On Sep 5, 2012 Jan Hubicka hubi...@ucw.cz wrote:
OK, the basic idea is that symtab_node is basetype of
cgraph_node and varpool_node.  We may want to drop the historica
cgraph/varpool names here, since function_node/variable_node
would sound better. Cgraph still exists, but varpool is more
or less historical relic.
  
   The cgraph redesign probably deserves more discussion.
  
  Hence, the post!
  
   1) It may be worthwhile to abstract the graph manipulation code
   into a utility class which is templatized.
  
   graphT, nodeT with node inheriting from T.
  
  While I won't dispute the statement, getting from here to there
  will likely involve some intermediate steps.
 
 Commonizing some of cfg/cgraph/ssa graph handling would not hurt, but
 there is not _that_ much code in common - we do some strongly connected
 components discovery on all three kinds of graphs and basic traversals.

Also note that symbol table realy is an oriented multigraph with different
types of edges (i.e. calls and references and eventualy we will have indirect
calls with mutiple possible destinations derrived from PTA and devirtualization
logic). 

Honza


Re: Cgraph Modification Plan

2012-09-05 Thread Xinliang David Li
On Wed, Sep 5, 2012 at 5:41 PM, Lawrence Crowl cr...@googlers.com wrote:
 On 9/5/12, Xinliang David Li davi...@google.com wrote:
 On Sep 5, 2012 Jan Hubicka hubi...@ucw.cz wrote:
  OK, the basic idea is that symtab_node is basetype of
  cgraph_node and varpool_node.  We may want to drop the historica
  cgraph/varpool names here, since function_node/variable_node
  would sound better. Cgraph still exists, but varpool is more
  or less historical relic.

 The cgraph redesign probably deserves more discussion.

 Hence, the post!

 1) It may be worthwhile to abstract the graph manipulation code
 into a utility class which is templatized.

 graphT, nodeT with node inheriting from T.

 While I won't dispute the statement, getting from here to there
 will likely involve some intermediate steps.

Yes. My point is that the intermediate steps should be driven by the
final goal. It is important to first have  the design of final class
hierarchy and interfaces nailed and then the intermediate steps
defined instead of the other way around.


 2) Introduce a global symbol table containing a function table
 and a global variable table. The function table should replace
 the current cgraph node link list, and the variable table replaces
 the varpool.  The symbol table should provide basic interfaces to
 do named based lookup, traversal, alias handling etc.  I noticed
 trunk already has some of that -- but it seems more abstraction
 is needed.

 Do you mean moving away from a pointer-based approach?

See above. I mean it is important to have well defined symtab
interfaces that hide implementation as much as possible. It will make
the interfaces more stable. It is currently quite difficult for
cgraph/varpool related changes in gcc branches to keep up with trunk
without stable core APIs.



 3) it seems natural to drop 'node' in the naming scheme

 symbol (symtab_entry) -- base class;
 function -- derived from symbol;  (It conflicts with the existing
 struct function though);
 variable -- derived from symbol;

 typedef nodefunction cnode;

 A nodefunction is not a derived class of nodesymbol even when
 function is derived from symbol.  That property is helpful in
 ensuring usable type safety.


We don't need nodesymbol -- only nodefunction is needed, and it is
derived from function and function is derived from symbol.


 4) coding convention is needed for functions that do 'casting'
 and 'trial casting'.

 The functions I have suggestion are ptr_... and try_... respectively.
 Are you suggesting amending the coding conventions?


Is this something agreed upon in previous discussions or something
newly brought up? How about using  template function in base to do
casting:

trial cast:

templatetypename T T* symbol::cast_to(symbol* p) {
   if (p-isT())
  return static_castT*(p);
   return 0;
 }

cast:

templatetypename T T symbol:as(symbol* p) {
   assert(p-isT())
   return static_castT(*p);

 }


David

 --
 Lawrence Crowl


Re: Cgraph Modification Plan

2012-09-05 Thread Xinliang David Li
On Wed, Sep 5, 2012 at 9:32 PM, Jan Hubicka hubi...@ucw.cz wrote:
 On 9/5/12, Xinliang David Li davi...@google.com wrote:
  On Sep 5, 2012 Jan Hubicka hubi...@ucw.cz wrote:
   OK, the basic idea is that symtab_node is basetype of
   cgraph_node and varpool_node.  We may want to drop the historica
   cgraph/varpool names here, since function_node/variable_node
   would sound better. Cgraph still exists, but varpool is more
   or less historical relic.
 
  The cgraph redesign probably deserves more discussion.

 Hence, the post!

  1) It may be worthwhile to abstract the graph manipulation code
  into a utility class which is templatized.
 
  graphT, nodeT with node inheriting from T.

 While I won't dispute the statement, getting from here to there
 will likely involve some intermediate steps.

 Commonizing some of cfg/cgraph/ssa graph handling would not hurt, but
 there is not _that_ much code in common - we do some strongly connected
 components discovery on all three kinds of graphs and basic traversals.


Once we have the common infrastructure for graphs, new uses will be
easily created (e.g, dynamic callgraph in profile-gen runtime
analysis, module graph, etc) -- graph is one of the most basic data
structures for compilers.

David

  2) Introduce a global symbol table containing a function table
  and a global variable table. The function table should replace
  the current cgraph node link list, and the variable table replaces
  the varpool.  The symbol table should provide basic interfaces to
  do named based lookup, traversal, alias handling etc.  I noticed
  trunk already has some of that -- but it seems more abstraction
  is needed.

 Do you mean moving away from a pointer-based approach?

 I would say that trunk now have what is described here.  I still plan to unify
 some of code (like alias handling) that is split for historical reasons, but
 those are mostly cleanups still intended for 4.8.

 Honza


Re: Cgraph Modification Plan

2012-09-05 Thread Xinliang David Li
On Wed, Sep 5, 2012 at 9:38 PM, Jan Hubicka hubi...@ucw.cz wrote:
  On 9/5/12, Xinliang David Li davi...@google.com wrote:
   On Sep 5, 2012 Jan Hubicka hubi...@ucw.cz wrote:
OK, the basic idea is that symtab_node is basetype of
cgraph_node and varpool_node.  We may want to drop the historica
cgraph/varpool names here, since function_node/variable_node
would sound better. Cgraph still exists, but varpool is more
or less historical relic.
  
   The cgraph redesign probably deserves more discussion.
 
  Hence, the post!
 
   1) It may be worthwhile to abstract the graph manipulation code
   into a utility class which is templatized.
  
   graphT, nodeT with node inheriting from T.
 
  While I won't dispute the statement, getting from here to there
  will likely involve some intermediate steps.

 Commonizing some of cfg/cgraph/ssa graph handling would not hurt, but
 there is not _that_ much code in common - we do some strongly connected
 components discovery on all three kinds of graphs and basic traversals.

 Also note that symbol table realy is an oriented multigraph with different
 types of edges (i.e. calls and references and eventualy we will have indirect
 calls with mutiple possible destinations derrived from PTA and 
 devirtualization
 logic).

yes, but additional edge properties can also be abstracted away by
using template parameters.

David


 Honza


Re: Cgraph Modification Plan

2012-09-05 Thread Jan Hubicka
 On Wed, Sep 5, 2012 at 5:41 PM, Lawrence Crowl cr...@googlers.com wrote:
  On 9/5/12, Xinliang David Li davi...@google.com wrote:
  On Sep 5, 2012 Jan Hubicka hubi...@ucw.cz wrote:
   OK, the basic idea is that symtab_node is basetype of
   cgraph_node and varpool_node.  We may want to drop the historica
   cgraph/varpool names here, since function_node/variable_node
   would sound better. Cgraph still exists, but varpool is more
   or less historical relic.
 
  The cgraph redesign probably deserves more discussion.
 
  Hence, the post!
 
  1) It may be worthwhile to abstract the graph manipulation code
  into a utility class which is templatized.
 
  graphT, nodeT with node inheriting from T.
 
  While I won't dispute the statement, getting from here to there
  will likely involve some intermediate steps.
 
 Yes. My point is that the intermediate steps should be driven by the
 final goal. It is important to first have  the design of final class
 hierarchy and interfaces nailed and then the intermediate steps
 defined instead of the other way around.

Yes, it seems resonable to me. We just need to keep in mind the specifics
callgraph has (i.e. many types of edges that are relevant in different
contextes and have good reason for different kind of in memory reppresentation)
There are also different types of nodes - functions, variables, aliases. I plan
to add labels in 4.8 timeframe, too).
 
 
  2) Introduce a global symbol table containing a function table
  and a global variable table. The function table should replace
  the current cgraph node link list, and the variable table replaces
  the varpool.  The symbol table should provide basic interfaces to
  do named based lookup, traversal, alias handling etc.  I noticed
  trunk already has some of that -- but it seems more abstraction
  is needed.
 
  Do you mean moving away from a pointer-based approach?
 
 See above. I mean it is important to have well defined symtab
 interfaces that hide implementation as much as possible. It will make
 the interfaces more stable. It is currently quite difficult for
 cgraph/varpool related changes in gcc branches to keep up with trunk
 without stable core APIs.

Well, the APIs did not changed much over years (LTO brought in some more busy
changes, but many were just hacks that needed redesign anyway), in 4.8 we
changed quite a lot because of introduction of the symbol table. It would be
really nice to get as much as C++ conversion into 4.8, too, since we broke most
of existing patches anyway, to get more stability frm 4.9 up.

  3) it seems natural to drop 'node' in the naming scheme
 
  symbol (symtab_entry) -- base class;
  function -- derived from symbol;  (It conflicts with the existing
  struct function though);
  variable -- derived from symbol;
 
  typedef nodefunction cnode;
 
  A nodefunction is not a derived class of nodesymbol even when
  function is derived from symbol.  That property is helpful in
  ensuring usable type safety.
 
 
 We don't need nodesymbol -- only nodefunction is needed, and it is
 derived from function and function is derived from symbol.

Yes, this is where I expected to land.  I expect to see more complex inheritance
tree.  Currently we have:

symbol
  |
  +-function (cgraph)
  |
  +-variable (varpool)

While function can have five types - just external declarations, actual
definitions with body (analyzed flag is set), thunk, alias and virtual clone

Similarly variables are declarations, ones with body (analyzed flag is set),
and aliases.

We do not represent labels and const_decl references. We ought: those are also
symbols and they are important for partitioning.

We probably want rither structure here in longer term:

symbol
  |
  +-function declaration
  | |
  | +-function definition
  | |
  | +-function alternative entry point (thunk)
  | |
  | +-function alias
  | |
  | +-virtual clone
  |
  +-variable declaration
  | |
  | +-variable definition (possibly constant pool references via const_decl 
here, too)
  | |
  | +-variable alias
  |
  +-function label definition

In particular we do not want to store all the stuff we store for function
definitions for declarations.

variable/function aliases do have a lot in common (similarly for
function/variable definitions), too. The reason why I did not introduce
separate alias_node type is that function aliases are different from variable
aliases in several ways. For example, they can not be destinations of function
calls.
 
 
  4) coding convention is needed for functions that do 'casting'
  and 'trial casting'.
 
  The functions I have suggestion are ptr_... and try_... respectively.
  Are you suggesting amending the coding conventions?
 
 
 Is this something agreed upon in previous discussions or something
 newly brought up? How about using  template function in base to do
 casting:
 
 trial cast:
 
 templatetypename T T* symbol::cast_to(symbol* p) {
if (p-isT())
   return static_castT*(p);
return 0;
  }
 
 cast: