Re: Cgraph Modification Plan
Hi, On Tue, 11 Sep 2012, Lawrence Crowl wrote: change node-symbol.whatever to node-ref_symbol().whatever Please don't uglify the compiler with this. If GTY deficiencies force you to do that, then hold off on it until GTY doesn't force you anymore. Ciao, Michael.
Re: Cgraph Modification Plan
On 2012-09-11 16:22 , Lawrence Crowl wrote: We do not yet seem to have consensus on a long term plan. Would it be reasonable to start on short term prepatory work? In particular, I was think we could do Add converters and testers. Change callers to use those. and maybe Change callers to use type-safe parameters. Where those mean what I earlier stated. Comments? Seems straightforward enough. And it doesn't seem like it will affect the longer term conversion. Diego.
Re: Cgraph Modification Plan
We do not yet seem to have consensus on a long term plan. Would it be reasonable to start on short term prepatory work? In particular, I was think we could do Add converters and testers. Change callers to use those. and maybe Change callers to use type-safe parameters. Where those mean what I earlier stated. Comments? Yes, it seems sane. CONVERTERS AND TESTERS ### add symtab_node_base symtab_node_def::ref_symbol() { return symbol; } symtab_node_base cgraph_node::ref_symbol() { return symbol; } symtab_node_base varpool_node::ref_symbol() { return symbol; } change node-symbol.whatever to node-ref_symbol().whatever I still do not gather why simple inheritance can't let me to write node-whatever again... If that is GTY limitation, I have no problem having this in short term and update it later. symbol_node is really base of functions/variables and new beasts we are going to get soon. I implemented it as union only to make GTY happy. should not need to add these cgraph_node symtab_node_def::ref_cgraph() { gcc_assert (symbol.type == SYMTAB_FUNCTION); I think checking_assert is enough in these cases. We are getting more conversions back and forth and we don't really want to have asserts all around code. return x_function; } varpool_node symtab_node_def::ref_varpool() { gcc_assert (symbol.type == SYMTAB_VARIABLE); return x_variable; } add symtab_node_base *symtab_node_def::try_symbol() { return symbol; } cgraph_node *symtab_node_def::try_cgraph() { return symbol.type == SYMTAB_FUNCTION ? x_function : NULL; } varpool_node *symtab_node_def::try_varpool() { return symbol.type == SYMTAB_VARIABLE ? x_variable : NULL; } I think try_function/try_variable reads better than cgraph/varpool especially with the longer term plan to drop these names when cgraph/varpool no longer exists and we have symbol table. Honza
Re: Cgraph Modification Plan
On 9/12/12, Jan Hubicka hubi...@ucw.cz wrote: We do not yet seem to have consensus on a long term plan. Would it be reasonable to start on short term prepatory work? In particular, I was think we could do Add converters and testers. Change callers to use those. and maybe Change callers to use type-safe parameters. Where those mean what I earlier stated. Comments? Yes, it seems sane. CONVERTERS AND TESTERS ### add symtab_node_base symtab_node_def::ref_symbol() { return symbol; } symtab_node_base cgraph_node::ref_symbol() { return symbol; } symtab_node_base varpool_node::ref_symbol() { return symbol; } change node-symbol.whatever to node-ref_symbol().whatever I still do not gather why simple inheritance can't let me to write node-whatever again... If that is GTY limitation, I have no problem having this in short term and update it later. symbol_node is really base of functions/variables and new beasts we are going to get soon. I implemented it as union only to make GTY happy. It is a GTY limitation right now. We can do most of this work before gengtype supports inheritance, and for that we need ref_symbol. should not need to add these cgraph_node symtab_node_def::ref_cgraph() { gcc_assert (symbol.type == SYMTAB_FUNCTION); I think checking_assert is enough in these cases. We are getting more conversions back and forth and we don't really want to have asserts all around code. The are untested conversions from base to derived. We should not be using them, and hence they should be rare. I mention them only because there might be places in the code where we get backed into a corner. I've changed my notes to use gcc_checking_assert, and we can revisit the decision whenever we need to. return x_function; } varpool_node symtab_node_def::ref_varpool() { gcc_assert (symbol.type == SYMTAB_VARIABLE); return x_variable; } add symtab_node_base *symtab_node_def::try_symbol() { return symbol; } cgraph_node *symtab_node_def::try_cgraph() { return symbol.type == SYMTAB_FUNCTION ? x_function : NULL; } varpool_node *symtab_node_def::try_varpool() { return symbol.type == SYMTAB_VARIABLE ? x_variable : NULL; } I think try_function/try_variable reads better than cgraph/varpool especially with the longer term plan to drop these names when cgraph/varpool no longer exists and we have symbol table. Okay, I will do that. -- Lawrence Crowl
Re: Cgraph Modification Plan
We do not yet seem to have consensus on a long term plan. Would it be reasonable to start on short term prepatory work? In particular, I was think we could do Add converters and testers. Change callers to use those. and maybe Change callers to use type-safe parameters. Where those mean what I earlier stated. Comments? CONVERTERS AND TESTERS ### add symtab_node_base symtab_node_def::ref_symbol() { return symbol; } symtab_node_base cgraph_node::ref_symbol() { return symbol; } symtab_node_base varpool_node::ref_symbol() { return symbol; } change node-symbol.whatever to node-ref_symbol().whatever should not need to add these cgraph_node symtab_node_def::ref_cgraph() { gcc_assert (symbol.type == SYMTAB_FUNCTION); return x_function; } varpool_node symtab_node_def::ref_varpool() { gcc_assert (symbol.type == SYMTAB_VARIABLE); return x_variable; } add symtab_node_base *symtab_node_def::try_symbol() { return symbol; } cgraph_node *symtab_node_def::try_cgraph() { return symbol.type == SYMTAB_FUNCTION ? x_function : NULL; } varpool_node *symtab_node_def::try_varpool() { return symbol.type == SYMTAB_VARIABLE ? x_variable : NULL; } change if (symtab_function_p (node) cgraph (node)-analyzed) return cgraph (node); to if (cgraph_node *p = node-try_cgraph()) if (p-analyzed) return p; change if (symtab_function_p (node) cgraph (node)-callers) to if (cgraph_node *p = node-try_cgraph()) if (p-callers) change if (symtab_function_p (node)) { struct cgraph_node *cnode = cgraph (node); to if (cgraph_node *cnode = node-try_cgraph ()) { likewise symtab_variable_p (node) and varpool (node) If there are any symtab_function_p (node) expressions left, add bool symtab_node_def::is_cgraph() { return symbol.type == SYMTAB_FUNCTION; } bool symtab_node_def::is_varpool() { return symbol.type == SYMTAB_VARIABLE; } change symtab_function_p (node) to node-is_cgraph () likewise symtab_variable_p (node) Though we would like to avoid doing so, if there are any cgraph (node) or varpool (node) expressions left, add symtab_node_base *symtab_node_def::ptr_symbol() { return symbol; } cgraph_node *symtab_node_def::ptr_cgraph() { gcc_assert (symbol.type == SYMTAB_FUNCTION); { return x_function; } varpool_node *symtab_node_def::ptr_varpool() { gcc_assert (symbol.type == SYMTAB_VARIABLE); { return x_variable; } change cgraph (node) = node-ptr_cgraph() likewise varpool (node) TYPE SAFETY ### If a function asserts that its symtab_node parameter is symtab_function_p, then convert the function to take a cgraph_node* and change the callers to convert as above. -- Lawrence Crowl
Re: Cgraph Modification Plan
Sorry to interrupt here, but please finish the existing partial C++ transitions instead of starting to work on new ones. Current stage1 will not last forever (stage1 is usually 6 months, so its natural end would be end of September). I'd rather have the current transition to a symbol table finished than having that half-way done and half-way done in C++. Please. I am focusing on getting the transition finished for 4.8. Some discussion about C++ transition (I would like to see done for 4.8 at least in the basic and obvious way) and problems people have with the current design seems good to me. We do not need to implement everything discussed in this thread immediately, but it is good to have the discussion. Btw, I also think the current symtab hierarchy is somewhat flawed. At the core a symtab entry should just be the symbol name and a list of entities associated with it (much similar to the LTO symtab stuff). We do have that. There is list of symtab entries sharing the same symtab hash entry (assembler name). I have WIP patch to make lto-symtab to use these instead of its own hashes. They need more testing and bit of cleanup though. Entities then are callgraph nodes, varpool nodes or alias nodes (or other Having alias nodes separate from callgrap/varpool is difficult thing. The problem is that one realy wants to handle function aliases as functions and variable aliases as variables (i.e. one want to have call to function alias, or read from variable alias but not other ways around). Thus we currently have aliases inherited from functions/variables both on tree declaration level and symbol table level. I have patches to commonize more the alias handling that is now duplicated over former cgraph/vaprool in queue though. stuff). Thus, the current symtab_node_base is too fat, and decls, instead of having DECL_ASSEMBLER_NAME should have a pointer to the (new) symtab node they are associated with. Yes, it is the longer term goal (or rather getting rid of assembler name in favor of real symbol name + flags that are currently encoded in assembler name) At the moment it does seem to make sense though when everything is tied to declarations and eclarations have their own assembler name. I will slowly work on this, but after 4.8. Honza Thanks, Richard.
Re: Cgraph Modification Plan
On Wed, Sep 5, 2012 at 10:14 PM, Jan Hubicka hubi...@ucw.cz wrote: On Wed, Sep 5, 2012 at 5:41 PM, Lawrence Crowl cr...@googlers.com wrote: On 9/5/12, Xinliang David Li davi...@google.com wrote: On Sep 5, 2012 Jan Hubicka hubi...@ucw.cz wrote: OK, the basic idea is that symtab_node is basetype of cgraph_node and varpool_node. We may want to drop the historica cgraph/varpool names here, since function_node/variable_node would sound better. Cgraph still exists, but varpool is more or less historical relic. The cgraph redesign probably deserves more discussion. Hence, the post! 1) It may be worthwhile to abstract the graph manipulation code into a utility class which is templatized. graphT, nodeT with node inheriting from T. While I won't dispute the statement, getting from here to there will likely involve some intermediate steps. Yes. My point is that the intermediate steps should be driven by the final goal. It is important to first have the design of final class hierarchy and interfaces nailed and then the intermediate steps defined instead of the other way around. Yes, it seems resonable to me. We just need to keep in mind the specifics callgraph has (i.e. many types of edges that are relevant in different contextes and have good reason for different kind of in memory reppresentation) There are also different types of nodes - functions, variables, aliases. I plan to add labels in 4.8 timeframe, too). Areas that are confusing and need clean up (IMO) include: 1) handling of aliases and clones 2) reachability, needed, analyzed bits. The needed bit is not in sync with the call to check if a function is needed. Bits and attributes used in analysis should be isolated out from the core data structure. 3) ipa references -- should they be modeled as special edges? 2) Introduce a global symbol table containing a function table and a global variable table. The function table should replace the current cgraph node link list, and the variable table replaces the varpool. The symbol table should provide basic interfaces to do named based lookup, traversal, alias handling etc. I noticed trunk already has some of that -- but it seems more abstraction is needed. Do you mean moving away from a pointer-based approach? See above. I mean it is important to have well defined symtab interfaces that hide implementation as much as possible. It will make the interfaces more stable. It is currently quite difficult for cgraph/varpool related changes in gcc branches to keep up with trunk without stable core APIs. Well, the APIs did not changed much over years (LTO brought in some more busy changes, but many were just hacks that needed redesign anyway), in 4.8 we changed quite a lot because of introduction of the symbol table. It would be really nice to get as much as C++ conversion into 4.8, too, since we broke most of existing patches anyway, to get more stability frm 4.9 up. 3) it seems natural to drop 'node' in the naming scheme symbol (symtab_entry) -- base class; function -- derived from symbol; (It conflicts with the existing struct function though); variable -- derived from symbol; typedef nodefunction cnode; A nodefunction is not a derived class of nodesymbol even when function is derived from symbol. That property is helpful in ensuring usable type safety. We don't need nodesymbol -- only nodefunction is needed, and it is derived from function and function is derived from symbol. Yes, this is where I expected to land. I expect to see more complex inheritance tree. Currently we have: symbol | +-function (cgraph) | +-variable (varpool) While function can have five types - just external declarations, actual definitions with body (analyzed flag is set), thunk, alias and virtual clone Similarly variables are declarations, ones with body (analyzed flag is set), and aliases. We do not represent labels and const_decl references. We ought: those are also symbols and they are important for partitioning. We probably want rither structure here in longer term: symbol | +-function declaration | | | +-function definition | | | +-function alternative entry point (thunk) | | | +-function alias | | | +-virtual clone | +-variable declaration | | | +-variable definition (possibly constant pool references via const_decl here, too) | | | +-variable alias | +-function label definition In particular we do not want to store all the stuff we store for function definitions for declarations. True, but that is at the cost of too complex class hierarchy. Is it enough to have 'has_body()' to differentiate def vs declarations (the meaning of 'is_external' is overloaded)? The additional information can be accessed via indirection (to additional data structure). For aliases, why can't those nodes be merged into one node (mapped from
Re: Cgraph Modification Plan
Areas that are confusing and need clean up (IMO) include: 1) handling of aliases and clones I am slowly cleaning up alias stuff, it had major reorg in 4.7 and further cleanups in 4.8. Do you have more specific suggestions? 2) reachability, needed, analyzed bits. The needed bit is not in sync with the call to check if a function is needed. Bits and attributes used in analysis should be isolated out from the core data structure. This is mostly done. Reachable and needed flags are removed. We still have analyzed flag, but it really has meaning is definition these days except for early cgraph construction phase, so I intend to rename it (and make cgraph construction to use its own bitmap). 3) ipa references -- should they be modeled as special edges? They are special edges. Different in memory from cgraph edges though. I did some estimates how much memory would be needed for representing them by the cgraph_edge way and it was quite steep for Mozilla LTO (over a gig, as compared to about 200MB we need now). We could also rework cgraph edges into vector represnetaiton instead of doubly linked lists to save some memory, not sure how fruitful that would be. Still different sizes of in memory representations and thus different vectors is desirable here. 2) Introduce a global symbol table containing a function table and a global variable table. The function table should replace the current cgraph node link list, and the variable table replaces the varpool. The symbol table should provide basic interfaces to do named based lookup, traversal, alias handling etc. I noticed trunk already has some of that -- but it seems more abstraction is needed. Do you mean moving away from a pointer-based approach? See above. I mean it is important to have well defined symtab interfaces that hide implementation as much as possible. It will make the interfaces more stable. It is currently quite difficult for cgraph/varpool related changes in gcc branches to keep up with trunk without stable core APIs. Well, the APIs did not changed much over years (LTO brought in some more busy changes, but many were just hacks that needed redesign anyway), in 4.8 we changed quite a lot because of introduction of the symbol table. It would be really nice to get as much as C++ conversion into 4.8, too, since we broke most of existing patches anyway, to get more stability frm 4.9 up. 3) it seems natural to drop 'node' in the naming scheme symbol (symtab_entry) -- base class; function -- derived from symbol; (It conflicts with the existing struct function though); variable -- derived from symbol; typedef nodefunction cnode; A nodefunction is not a derived class of nodesymbol even when function is derived from symbol. That property is helpful in ensuring usable type safety. We don't need nodesymbol -- only nodefunction is needed, and it is derived from function and function is derived from symbol. Yes, this is where I expected to land. I expect to see more complex inheritance tree. Currently we have: symbol | +-function (cgraph) | +-variable (varpool) While function can have five types - just external declarations, actual definitions with body (analyzed flag is set), thunk, alias and virtual clone Similarly variables are declarations, ones with body (analyzed flag is set), and aliases. We do not represent labels and const_decl references. We ought: those are also symbols and they are important for partitioning. We probably want rither structure here in longer term: symbol | +-function declaration | | | +-function definition | | | +-function alternative entry point (thunk) | | | +-function alias | | | +-virtual clone | +-variable declaration | | | +-variable definition (possibly constant pool references via const_decl here, too) | | | +-variable alias | +-function label definition In particular we do not want to store all the stuff we store for function definitions for declarations. True, but that is at the cost of too complex class hierarchy. Is it enough to have 'has_body()' to differentiate def vs declarations (the meaning of 'is_external' is overloaded)? The additional information can be accessed via indirection (to additional data structure). We can, not sure how effective those sparse tables will be in practice however, Some of this stuff is quite critical during WPA stage. For aliases, why can't those nodes be merged into one node (mapped from all assembler names in the symtab)? aliases will just be attribute of the symbol). I attempted to merge aliases into one node for cgraph/varpool, but this did not really work - the problem is that different symbols do have different visibilities (one is overwritable one not). Consequentely you need to track to which specific
Re: Cgraph Modification Plan
On Thu, Sep 6, 2012 at 12:47 AM, Jan Hubicka hubi...@ucw.cz wrote: What do you think of the following plan for turning cgraph into a class hierarchy? We cannot finish it until we have gengtype understanding single inheritance, but we can start changing APIs in preparation. Good you told me, I was about trying that myself. Did not know gengtype do not understand inheritance yet. APPROACH ### Add converters and testers. Change callers to use those. Change callers to use type-safe parameters. Change implementation to class hierarchy. Add accessors. Sounds good to me. Sorry to interrupt here, but please finish the existing partial C++ transitions instead of starting to work on new ones. Current stage1 will not last forever (stage1 is usually 6 months, so its natural end would be end of September). I'd rather have the current transition to a symbol table finished than having that half-way done and half-way done in C++. Please. Btw, I also think the current symtab hierarchy is somewhat flawed. At the core a symtab entry should just be the symbol name and a list of entities associated with it (much similar to the LTO symtab stuff). Entities then are callgraph nodes, varpool nodes or alias nodes (or other stuff). Thus, the current symtab_node_base is too fat, and decls, instead of having DECL_ASSEMBLER_NAME should have a pointer to the (new) symtab node they are associated with. Thanks, Richard.
Re: Cgraph Modification Plan
Hi, (I'm CCing the gcc mailing list too since I suppose it is an accident that it wasn't in the message I'm replying to) On Thu, Sep 06, 2012 at 09:22:27AM +0200, Jan Hubicka wrote: On Thu, Sep 6, 2012 at 12:08 AM, Jan Hubicka hubi...@ucw.cz wrote: Consequentely you need to track to which specific alias call/reference is going. So we ended up doing this only for same body aliases I was actually referring to same-body aliases which are more common in C++. Weak aliases look like different animals. The handling of aliases was unified in 4.7, hope you will find it convenient. I think it is improvement over what we have before. and eventually I got convinced to reorg the code other way (in 4.7) so the aliases are all explicitely referenced. Except for need to walk the alises to real node and work out the visibility, there is not much to worry about. Sadly I do not think aliases can be fully hidden in the abstraction - you need to actually think of them when doing many of IPA transforms. Thunks are only referenced from vtables. Is there a need to create cgraph nodes for them? This is not really true. thunks may get into code via constant folding or LTO unit merging. We used to disallow direct calls to thunks in 4.6 and earlier but decided to lift this restriction in 4.7. Why is that? Should the thunk code be inline expanded (e.g, in profile guided icall promotion)? There is no real need for thunk direct references, right? Well, I was trying to go down this route, too. We discussed it for a long while and eventually decided to give it up. There are many interesting side cases. For example we may devirtualize call in one unit before LTO streaming and this devirtualized call may turn out to be tunk in other unit at WPA time when we can not easilly inline the thunk code. I don't quite get this. Is the discussion thread available? I think it went mostly on the IRC. Richard/Martin? Well, great deal of it on IRC too but the most infamous relevant bug was PR 48674 (you might also want to look at PRs 45605 and 48661 to get the bigger picture). It was gross, the world is much better with explicit representation of thunks. Martin
Re: Cgraph Modification Plan
On Thu, 6 Sep 2012, Martin Jambor wrote: Hi, (I'm CCing the gcc mailing list too since I suppose it is an accident that it wasn't in the message I'm replying to) On Thu, Sep 06, 2012 at 09:22:27AM +0200, Jan Hubicka wrote: On Thu, Sep 6, 2012 at 12:08 AM, Jan Hubicka hubi...@ucw.cz wrote: Consequentely you need to track to which specific alias call/reference is going. So we ended up doing this only for same body aliases I was actually referring to same-body aliases which are more common in C++. Weak aliases look like different animals. The handling of aliases was unified in 4.7, hope you will find it convenient. I think it is improvement over what we have before. and eventually I got convinced to reorg the code other way (in 4.7) so the aliases are all explicitely referenced. Except for need to walk the alises to real node and work out the visibility, there is not much to worry about. Sadly I do not think aliases can be fully hidden in the abstraction - you need to actually think of them when doing many of IPA transforms. Thunks are only referenced from vtables. Is there a need to create cgraph nodes for them? This is not really true. thunks may get into code via constant folding or LTO unit merging. We used to disallow direct calls to thunks in 4.6 and earlier but decided to lift this restriction in 4.7. Why is that? Should the thunk code be inline expanded (e.g, in profile guided icall promotion)? There is no real need for thunk direct references, right? Well, I was trying to go down this route, too. We discussed it for a long while and eventually decided to give it up. There are many interesting side cases. For example we may devirtualize call in one unit before LTO streaming and this devirtualized call may turn out to be tunk in other unit at WPA time when we can not easilly inline the thunk code. I don't quite get this. Is the discussion thread available? I think it went mostly on the IRC. Richard/Martin? Well, great deal of it on IRC too but the most infamous relevant bug was PR 48674 (you might also want to look at PRs 45605 and 48661 to get the bigger picture). It was gross, the world is much better with explicit representation of thunks. Yes, and related to thunks we also discussed to explicitely represent multiple entry points (which thunks are). Richard.
Re: Cgraph Modification Plan
On 9/6/12, Richard Guenther richard.guent...@gmail.com wrote: Sorry to interrupt here, but please finish the existing partial C++ transitions instead of starting to work on new ones. Current stage1 will not last forever (stage1 is usually 6 months, so its natural end would be end of September). I'd rather have the current transition to a symbol table finished than having that half-way done and half-way done in C++. Please. There are often periods of time where I am waiting on discussions or approvals. To make good use of my time, I need to be able to work on more than one thing at once. In particular, the discussion on cgraph can go on in parallel with finishing up double_int. -- Lawrence Crowl
Re: Cgraph Modification Plan
What do you think of the following plan for turning cgraph into a class hierarchy? We cannot finish it until we have gengtype understanding single inheritance, but we can start changing APIs in preparation. Good you told me, I was about trying that myself. Did not know gengtype do not understand inheritance yet. APPROACH ### Add converters and testers. Change callers to use those. Change callers to use type-safe parameters. Change implementation to class hierarchy. Add accessors. Sounds good to me. CONVERTERS AND TESTERS ### add symtab_node_base symtab_node_def::ref_symbol() { return symbol; } symtab_node_base cgraph_node::ref_symbol() { return symbol; } symtab_node_base varpool_node::ref_symbol() { return symbol; } change node-symbol.whatever to node-ref_symbol().whatever OK, the basic idea is that symtab_node is basetype of cgraph_node and varpool_node. We may want to drop the historica cgraph/varpool names here, since function_node/variable_node would sound better. Cgraph still exists, but varpool is more or less historical relic. I would expect inheritance to allow me to write node-whatever and to have casting operators to convert things back and forth, so we only need to add testers? Probably is_function/is_variable try_function/try_variable sounds more readable to me. What do you think? (I just arrived from China, so will take it more tought once unjetlagged) Honza should not need to add these cgraph_node symtab_node_def::ref_cgraph() { gcc_assert (symbol.type == SYMTAB_FUNCTION); return x_function; } varpool_node symtab_node_def::ref_varpool() { gcc_assert (symbol.type == SYMTAB_VARIABLE); return x_variable; } add symtab_node_base *symtab_node_def::try_symbol() { return symbol; } cgraph_node *symtab_node_def::try_cgraph() { return symbol.type == SYMTAB_FUNCTION ? x_function : NULL; } varpool_node *symtab_node_def::try_varpool() { return symbol.type == SYMTAB_VARIABLE ? x_variable : NULL; } change if (symtab_function_p (node) cgraph (node)-analyzed) return cgraph (node); to if (cgraph_node *p = node-try_cgraph()) if (p-analyzed) return p; change if (symtab_function_p (node) cgraph (node)-callers) to if (cgraph_node *p = node-try_cgraph()) if (p-callers) change if (symtab_function_p (node)) { struct cgraph_node *cnode = cgraph (node); to if (cgraph_node *cnode = node-try_cgraph ()) { likewise symtab_variable_p (node) and varpool (node) If there are any symtab_function_p (node) expressions left, add bool symtab_node_def::is_cgraph() { return symbol.type == SYMTAB_FUNCTION; } bool symtab_node_def::is_varpool() { return symbol.type == SYMTAB_VARIABLE; } change symtab_function_p (node) to node-is_cgraph () likewise symtab_variable_p (node) Though we would like to avoid doing so, if there are any cgraph (node) or varpool (node) expressions left, add symtab_node_base *symtab_node_def::ptr_symbol() { return symbol; } cgraph_node *symtab_node_def::ptr_cgraph() { gcc_assert (symbol.type == SYMTAB_FUNCTION); { return x_function; } varpool_node *symtab_node_def::ptr_varpool() { gcc_assert (symbol.type == SYMTAB_VARIABLE); { return x_variable; } change cgraph (node) = node-ptr_cgraph() likewise varpool (node) TYPE SAFETY ### If a function asserts that its symtab_node parameter is symtab_function_p, then convert the function to take a cgraph_node* and change the callers to convert as above. PROPOSED HIERARCHY ### enum symtab_type { SYMTAB_SYMBOL, SYMTAB_FUNCTION, SYMTAB_VARIABLE }; struct symtab_node_base /* marked SYMTAB_SYMBOL */ embeds symtab_type cgraph_node : symtab_node_base /* marked SYMTAB_FUNCTION */ varpool_node : symtab_node_base /* marked SYMTAB_VARIABLE */ changing typedef union symtab_node_def *symtab_node; typedef const union symtab_node_def *const_symtab_node; to typedef symtab_node_base *symtab_node; typedef const symtab_node_base *const_symtab_node; changing used of symtab_node_def to symtab_node_base -- Lawrence Crowl
Re: Cgraph Modification Plan
On Wed, Sep 5, 2012 at 3:47 PM, Jan Hubicka hubi...@ucw.cz wrote: What do you think of the following plan for turning cgraph into a class hierarchy? We cannot finish it until we have gengtype understanding single inheritance, but we can start changing APIs in preparation. Good you told me, I was about trying that myself. Did not know gengtype do not understand inheritance yet. APPROACH ### Add converters and testers. Change callers to use those. Change callers to use type-safe parameters. Change implementation to class hierarchy. Add accessors. Sounds good to me. CONVERTERS AND TESTERS ### add symtab_node_base symtab_node_def::ref_symbol() { return symbol; } symtab_node_base cgraph_node::ref_symbol() { return symbol; } symtab_node_base varpool_node::ref_symbol() { return symbol; } change node-symbol.whatever to node-ref_symbol().whatever OK, the basic idea is that symtab_node is basetype of cgraph_node and varpool_node. We may want to drop the historica cgraph/varpool names here, since function_node/variable_node would sound better. Cgraph still exists, but varpool is more or less historical relic. The cgraph redesign probably deserves more discussion. 1) It may be worthwhile to abstract the graph manipulation code into a utility class which is templatized. graphT, nodeT with node inheriting from T. 2) Introduce a global symbol table containing a function table and a global variable table. The function table should replace the current cgraph node link list, and the variable table replaces the varpool. The symbol table should provide basic interfaces to do named based lookup, traversal, alias handling etc. I noticed trunk already has some of that -- but it seems more abstraction is needed. 3) it seems natural to drop 'node' in the naming scheme symbol (symtab_entry) -- base class; function -- derived from symbol; (It conflicts with the existing struct function though); variable -- derived from symbol; typedef nodefunction cnode; 4) coding convention is needed for functions that do 'casting' and 'trial casting'. thanks, David I would expect inheritance to allow me to write node-whatever and to have casting operators to convert things back and forth, so we only need to add testers? Probably is_function/is_variable try_function/try_variable sounds more readable to me. What do you think? (I just arrived from China, so will take it more tought once unjetlagged) Honza should not need to add these cgraph_node symtab_node_def::ref_cgraph() { gcc_assert (symbol.type == SYMTAB_FUNCTION); return x_function; } varpool_node symtab_node_def::ref_varpool() { gcc_assert (symbol.type == SYMTAB_VARIABLE); return x_variable; } add symtab_node_base *symtab_node_def::try_symbol() { return symbol; } cgraph_node *symtab_node_def::try_cgraph() { return symbol.type == SYMTAB_FUNCTION ? x_function : NULL; } varpool_node *symtab_node_def::try_varpool() { return symbol.type == SYMTAB_VARIABLE ? x_variable : NULL; } change if (symtab_function_p (node) cgraph (node)-analyzed) return cgraph (node); to if (cgraph_node *p = node-try_cgraph()) if (p-analyzed) return p; change if (symtab_function_p (node) cgraph (node)-callers) to if (cgraph_node *p = node-try_cgraph()) if (p-callers) change if (symtab_function_p (node)) { struct cgraph_node *cnode = cgraph (node); to if (cgraph_node *cnode = node-try_cgraph ()) { likewise symtab_variable_p (node) and varpool (node) If there are any symtab_function_p (node) expressions left, add bool symtab_node_def::is_cgraph() { return symbol.type == SYMTAB_FUNCTION; } bool symtab_node_def::is_varpool() { return symbol.type == SYMTAB_VARIABLE; } change symtab_function_p (node) to node-is_cgraph () likewise symtab_variable_p (node) Though we would like to avoid doing so, if there are any cgraph (node) or varpool (node) expressions left, add symtab_node_base *symtab_node_def::ptr_symbol() { return symbol; } cgraph_node *symtab_node_def::ptr_cgraph() { gcc_assert (symbol.type == SYMTAB_FUNCTION); { return x_function; } varpool_node *symtab_node_def::ptr_varpool() { gcc_assert (symbol.type == SYMTAB_VARIABLE); { return x_variable; } change cgraph (node) = node-ptr_cgraph() likewise varpool (node) TYPE SAFETY ### If a function asserts that its symtab_node parameter is symtab_function_p, then convert the function to take a cgraph_node* and change the callers to convert as above. PROPOSED HIERARCHY
Re: Cgraph Modification Plan
On 9/5/12, Jan Hubicka hubi...@ucw.cz wrote: CONVERTERS AND TESTERS ### add symtab_node_base symtab_node_def::ref_symbol() { return symbol; } symtab_node_base cgraph_node::ref_symbol() { return symbol; } symtab_node_base varpool_node::ref_symbol() { return symbol; } change node-symbol.whatever to node-ref_symbol().whatever OK, the basic idea is that symtab_node is basetype of cgraph_node and varpool_node. Yes. We may want to drop the historica cgraph/varpool names here, since function_node/variable_node would sound better. Cgraph still exists, but varpool is more or less historical relic. I have a general preference for making name changes separate from function changes when doing so makes svn diffs better reflect the actual changes. That may or may not be useful here, but I am happy to use those names for the final product. I would expect inheritance to allow me to write node-whatever and to have casting operators to convert things back and forth, so we only need to add testers? The inheritance allows implicit derived to base conversion, which is type-safe. With inheritance, the converting functions are needed only for the conversion from base to derived, which generally requires dynamic information to be safe. Probably is_function/is_variable try_function/try_variable sounds more readable to me. What do you think? Fine with me. (I just arrived from China, so will take it more tought once unjetlagged) -- Lawrence Crowl
Re: Cgraph Modification Plan
On 9/5/12, Xinliang David Li davi...@google.com wrote: On Sep 5, 2012 Jan Hubicka hubi...@ucw.cz wrote: OK, the basic idea is that symtab_node is basetype of cgraph_node and varpool_node. We may want to drop the historica cgraph/varpool names here, since function_node/variable_node would sound better. Cgraph still exists, but varpool is more or less historical relic. The cgraph redesign probably deserves more discussion. Hence, the post! 1) It may be worthwhile to abstract the graph manipulation code into a utility class which is templatized. graphT, nodeT with node inheriting from T. While I won't dispute the statement, getting from here to there will likely involve some intermediate steps. 2) Introduce a global symbol table containing a function table and a global variable table. The function table should replace the current cgraph node link list, and the variable table replaces the varpool. The symbol table should provide basic interfaces to do named based lookup, traversal, alias handling etc. I noticed trunk already has some of that -- but it seems more abstraction is needed. Do you mean moving away from a pointer-based approach? 3) it seems natural to drop 'node' in the naming scheme symbol (symtab_entry) -- base class; function -- derived from symbol; (It conflicts with the existing struct function though); variable -- derived from symbol; typedef nodefunction cnode; A nodefunction is not a derived class of nodesymbol even when function is derived from symbol. That property is helpful in ensuring usable type safety. 4) coding convention is needed for functions that do 'casting' and 'trial casting'. The functions I have suggestion are ptr_... and try_... respectively. Are you suggesting amending the coding conventions? -- Lawrence Crowl
Re: Cgraph Modification Plan
On 9/5/12, Xinliang David Li davi...@google.com wrote: On Sep 5, 2012 Jan Hubicka hubi...@ucw.cz wrote: OK, the basic idea is that symtab_node is basetype of cgraph_node and varpool_node. We may want to drop the historica cgraph/varpool names here, since function_node/variable_node would sound better. Cgraph still exists, but varpool is more or less historical relic. The cgraph redesign probably deserves more discussion. Hence, the post! 1) It may be worthwhile to abstract the graph manipulation code into a utility class which is templatized. graphT, nodeT with node inheriting from T. While I won't dispute the statement, getting from here to there will likely involve some intermediate steps. Commonizing some of cfg/cgraph/ssa graph handling would not hurt, but there is not _that_ much code in common - we do some strongly connected components discovery on all three kinds of graphs and basic traversals. 2) Introduce a global symbol table containing a function table and a global variable table. The function table should replace the current cgraph node link list, and the variable table replaces the varpool. The symbol table should provide basic interfaces to do named based lookup, traversal, alias handling etc. I noticed trunk already has some of that -- but it seems more abstraction is needed. Do you mean moving away from a pointer-based approach? I would say that trunk now have what is described here. I still plan to unify some of code (like alias handling) that is split for historical reasons, but those are mostly cleanups still intended for 4.8. Honza
Re: Cgraph Modification Plan
The cgraph redesign probably deserves more discussion. 1) It may be worthwhile to abstract the graph manipulation code into a utility class which is templatized. graphT, nodeT with node inheriting from T. 2) Introduce a global symbol table containing a function table and a global variable table. The function table should replace the current cgraph node link list, and the variable table replaces the varpool. The symbol table should provide basic interfaces to do named based lookup, traversal, alias handling etc. I noticed trunk already has some of that -- but it seems more abstraction is needed. 3) it seems natural to drop 'node' in the naming scheme symbol (symtab_entry) -- base class; function -- derived from symbol; (It conflicts with the existing struct function though); variable -- derived from symbol; typedef nodefunction cnode; We could drop node, yes. The conflict with current function is more interesting. Basically symbol table function is that we need for IPA optimizations, while struct function is what is load when function body is needed. Those are different. Not sure if we don't want to have symtab_ prefix for all symbol table stuff or if we want to rename struct function into something more fit. Honza
Re: Cgraph Modification Plan
On 9/5/12, Xinliang David Li davi...@google.com wrote: On Sep 5, 2012 Jan Hubicka hubi...@ucw.cz wrote: OK, the basic idea is that symtab_node is basetype of cgraph_node and varpool_node. We may want to drop the historica cgraph/varpool names here, since function_node/variable_node would sound better. Cgraph still exists, but varpool is more or less historical relic. The cgraph redesign probably deserves more discussion. Hence, the post! 1) It may be worthwhile to abstract the graph manipulation code into a utility class which is templatized. graphT, nodeT with node inheriting from T. While I won't dispute the statement, getting from here to there will likely involve some intermediate steps. Commonizing some of cfg/cgraph/ssa graph handling would not hurt, but there is not _that_ much code in common - we do some strongly connected components discovery on all three kinds of graphs and basic traversals. Also note that symbol table realy is an oriented multigraph with different types of edges (i.e. calls and references and eventualy we will have indirect calls with mutiple possible destinations derrived from PTA and devirtualization logic). Honza
Re: Cgraph Modification Plan
On Wed, Sep 5, 2012 at 5:41 PM, Lawrence Crowl cr...@googlers.com wrote: On 9/5/12, Xinliang David Li davi...@google.com wrote: On Sep 5, 2012 Jan Hubicka hubi...@ucw.cz wrote: OK, the basic idea is that symtab_node is basetype of cgraph_node and varpool_node. We may want to drop the historica cgraph/varpool names here, since function_node/variable_node would sound better. Cgraph still exists, but varpool is more or less historical relic. The cgraph redesign probably deserves more discussion. Hence, the post! 1) It may be worthwhile to abstract the graph manipulation code into a utility class which is templatized. graphT, nodeT with node inheriting from T. While I won't dispute the statement, getting from here to there will likely involve some intermediate steps. Yes. My point is that the intermediate steps should be driven by the final goal. It is important to first have the design of final class hierarchy and interfaces nailed and then the intermediate steps defined instead of the other way around. 2) Introduce a global symbol table containing a function table and a global variable table. The function table should replace the current cgraph node link list, and the variable table replaces the varpool. The symbol table should provide basic interfaces to do named based lookup, traversal, alias handling etc. I noticed trunk already has some of that -- but it seems more abstraction is needed. Do you mean moving away from a pointer-based approach? See above. I mean it is important to have well defined symtab interfaces that hide implementation as much as possible. It will make the interfaces more stable. It is currently quite difficult for cgraph/varpool related changes in gcc branches to keep up with trunk without stable core APIs. 3) it seems natural to drop 'node' in the naming scheme symbol (symtab_entry) -- base class; function -- derived from symbol; (It conflicts with the existing struct function though); variable -- derived from symbol; typedef nodefunction cnode; A nodefunction is not a derived class of nodesymbol even when function is derived from symbol. That property is helpful in ensuring usable type safety. We don't need nodesymbol -- only nodefunction is needed, and it is derived from function and function is derived from symbol. 4) coding convention is needed for functions that do 'casting' and 'trial casting'. The functions I have suggestion are ptr_... and try_... respectively. Are you suggesting amending the coding conventions? Is this something agreed upon in previous discussions or something newly brought up? How about using template function in base to do casting: trial cast: templatetypename T T* symbol::cast_to(symbol* p) { if (p-isT()) return static_castT*(p); return 0; } cast: templatetypename T T symbol:as(symbol* p) { assert(p-isT()) return static_castT(*p); } David -- Lawrence Crowl
Re: Cgraph Modification Plan
On Wed, Sep 5, 2012 at 9:32 PM, Jan Hubicka hubi...@ucw.cz wrote: On 9/5/12, Xinliang David Li davi...@google.com wrote: On Sep 5, 2012 Jan Hubicka hubi...@ucw.cz wrote: OK, the basic idea is that symtab_node is basetype of cgraph_node and varpool_node. We may want to drop the historica cgraph/varpool names here, since function_node/variable_node would sound better. Cgraph still exists, but varpool is more or less historical relic. The cgraph redesign probably deserves more discussion. Hence, the post! 1) It may be worthwhile to abstract the graph manipulation code into a utility class which is templatized. graphT, nodeT with node inheriting from T. While I won't dispute the statement, getting from here to there will likely involve some intermediate steps. Commonizing some of cfg/cgraph/ssa graph handling would not hurt, but there is not _that_ much code in common - we do some strongly connected components discovery on all three kinds of graphs and basic traversals. Once we have the common infrastructure for graphs, new uses will be easily created (e.g, dynamic callgraph in profile-gen runtime analysis, module graph, etc) -- graph is one of the most basic data structures for compilers. David 2) Introduce a global symbol table containing a function table and a global variable table. The function table should replace the current cgraph node link list, and the variable table replaces the varpool. The symbol table should provide basic interfaces to do named based lookup, traversal, alias handling etc. I noticed trunk already has some of that -- but it seems more abstraction is needed. Do you mean moving away from a pointer-based approach? I would say that trunk now have what is described here. I still plan to unify some of code (like alias handling) that is split for historical reasons, but those are mostly cleanups still intended for 4.8. Honza
Re: Cgraph Modification Plan
On Wed, Sep 5, 2012 at 9:38 PM, Jan Hubicka hubi...@ucw.cz wrote: On 9/5/12, Xinliang David Li davi...@google.com wrote: On Sep 5, 2012 Jan Hubicka hubi...@ucw.cz wrote: OK, the basic idea is that symtab_node is basetype of cgraph_node and varpool_node. We may want to drop the historica cgraph/varpool names here, since function_node/variable_node would sound better. Cgraph still exists, but varpool is more or less historical relic. The cgraph redesign probably deserves more discussion. Hence, the post! 1) It may be worthwhile to abstract the graph manipulation code into a utility class which is templatized. graphT, nodeT with node inheriting from T. While I won't dispute the statement, getting from here to there will likely involve some intermediate steps. Commonizing some of cfg/cgraph/ssa graph handling would not hurt, but there is not _that_ much code in common - we do some strongly connected components discovery on all three kinds of graphs and basic traversals. Also note that symbol table realy is an oriented multigraph with different types of edges (i.e. calls and references and eventualy we will have indirect calls with mutiple possible destinations derrived from PTA and devirtualization logic). yes, but additional edge properties can also be abstracted away by using template parameters. David Honza
Re: Cgraph Modification Plan
On Wed, Sep 5, 2012 at 5:41 PM, Lawrence Crowl cr...@googlers.com wrote: On 9/5/12, Xinliang David Li davi...@google.com wrote: On Sep 5, 2012 Jan Hubicka hubi...@ucw.cz wrote: OK, the basic idea is that symtab_node is basetype of cgraph_node and varpool_node. We may want to drop the historica cgraph/varpool names here, since function_node/variable_node would sound better. Cgraph still exists, but varpool is more or less historical relic. The cgraph redesign probably deserves more discussion. Hence, the post! 1) It may be worthwhile to abstract the graph manipulation code into a utility class which is templatized. graphT, nodeT with node inheriting from T. While I won't dispute the statement, getting from here to there will likely involve some intermediate steps. Yes. My point is that the intermediate steps should be driven by the final goal. It is important to first have the design of final class hierarchy and interfaces nailed and then the intermediate steps defined instead of the other way around. Yes, it seems resonable to me. We just need to keep in mind the specifics callgraph has (i.e. many types of edges that are relevant in different contextes and have good reason for different kind of in memory reppresentation) There are also different types of nodes - functions, variables, aliases. I plan to add labels in 4.8 timeframe, too). 2) Introduce a global symbol table containing a function table and a global variable table. The function table should replace the current cgraph node link list, and the variable table replaces the varpool. The symbol table should provide basic interfaces to do named based lookup, traversal, alias handling etc. I noticed trunk already has some of that -- but it seems more abstraction is needed. Do you mean moving away from a pointer-based approach? See above. I mean it is important to have well defined symtab interfaces that hide implementation as much as possible. It will make the interfaces more stable. It is currently quite difficult for cgraph/varpool related changes in gcc branches to keep up with trunk without stable core APIs. Well, the APIs did not changed much over years (LTO brought in some more busy changes, but many were just hacks that needed redesign anyway), in 4.8 we changed quite a lot because of introduction of the symbol table. It would be really nice to get as much as C++ conversion into 4.8, too, since we broke most of existing patches anyway, to get more stability frm 4.9 up. 3) it seems natural to drop 'node' in the naming scheme symbol (symtab_entry) -- base class; function -- derived from symbol; (It conflicts with the existing struct function though); variable -- derived from symbol; typedef nodefunction cnode; A nodefunction is not a derived class of nodesymbol even when function is derived from symbol. That property is helpful in ensuring usable type safety. We don't need nodesymbol -- only nodefunction is needed, and it is derived from function and function is derived from symbol. Yes, this is where I expected to land. I expect to see more complex inheritance tree. Currently we have: symbol | +-function (cgraph) | +-variable (varpool) While function can have five types - just external declarations, actual definitions with body (analyzed flag is set), thunk, alias and virtual clone Similarly variables are declarations, ones with body (analyzed flag is set), and aliases. We do not represent labels and const_decl references. We ought: those are also symbols and they are important for partitioning. We probably want rither structure here in longer term: symbol | +-function declaration | | | +-function definition | | | +-function alternative entry point (thunk) | | | +-function alias | | | +-virtual clone | +-variable declaration | | | +-variable definition (possibly constant pool references via const_decl here, too) | | | +-variable alias | +-function label definition In particular we do not want to store all the stuff we store for function definitions for declarations. variable/function aliases do have a lot in common (similarly for function/variable definitions), too. The reason why I did not introduce separate alias_node type is that function aliases are different from variable aliases in several ways. For example, they can not be destinations of function calls. 4) coding convention is needed for functions that do 'casting' and 'trial casting'. The functions I have suggestion are ptr_... and try_... respectively. Are you suggesting amending the coding conventions? Is this something agreed upon in previous discussions or something newly brought up? How about using template function in base to do casting: trial cast: templatetypename T T* symbol::cast_to(symbol* p) { if (p-isT()) return static_castT*(p); return 0; } cast: