On Nov 8, 2006, at 5:59 AM, Doug Gregor wrote:
However, this approach could have some odd side effects when there are
multiple mappings within one context. For instance, we could have
something like:
typedef int foo_t;
typedef int bar_t;
foo_t* x = strlen("oops");
x is a decl, the decl has a type, the context of that instance of the
type is x.
map(int,x) == foo_t.
It is this because we know that foo_t was used to create x, and we
set map(int,x) equal to foo_t as x is created.
It can never be wrong, because any use of the
type that would have had the wrong value comes from a specific
context, and for that specific context map(type,context) can be set to
_any_ value, including the right value. Put another way, any time
one carries a type around outside of a specific context, one also
needs to carry around the context the type came from.
The error message that pops out would likely reference "bar_t *"
map(int,x) doesn't yield bar_t.
This approach wouldn't help with the implementation of concepts,
because we need to be able to take two distinct types (say, template
type parameters T and U) and consider them equivalent in the type
system.
I'd need to see more specifics, but from just the above: any data
you need that would make them different, you put into
map(type,context); we're not restricted to just the typedef name. Once
you do that, you discover that what's left is identical, and
since the remainders are identical, they have the same address, and
the same address makes them the same type.
The two things this doesn't work on are two different notions of
equality (my scheme, unaltered, can only handle one definition of
equality) and some of the temporal aspects, like: we didn't know T1
and T2 were the same before, but now we do, because they are both
int. The latter case I'm expecting not to be an issue, as to form
the type, you do the substitution, and after you do it, you replace
T1 with int (creating data in map(int,context) if you later need to
know this was a T1 for any reason, such as debugging or error
messages). These bubble up and one is left with the real type, and
then equality remains fast post-substitution. Reasoning about type
equality pre-substitution remains slow.
You can even get fast unsubstituted comparisons for a particular
definition of equality. You boost the substitution bits out as
variants, and notice that you then have nothing left, and nothing is
nothing, so the values wind up being the same again. Now to get
comptypes to work, you just have to add code to compare the boosted
variants at the top of comptypes. Before you say that that is as bad
as what we had before, no, it isn't. If the canonical types are
unequal, you can immediately fail the comparison; this takes care of
90% of the calls. After that you check the variants for equality and
return that. The one address compare doesn't hit memory and can
answer most of the questions by itself. The variants are all on one
cache line, and if the cost to compare them is cheap, it is just two
memory hits.
We can't literally combine T and U into a single canonical
type node, because they start out as different types.
?
Granted, we could layer a union-find implementation (that better
supports concepts) on top of this approach.
Ah, but once you break the fundamental property that different
addresses imply different types, you limit things to structural
equality, and that is slow.
type = type_representative (TREE_TYPE (exp));
if (TREE_CODE (type) == REFERENCE_TYPE)
type = TREE_TYPE (type);
We could find all of these places by "poisoning" TREE_CODE for
TYPE_ALIAS_TYPE nodes, then patch up the compiler to make the
appropriate type_representative calls. We'd want to save the original
type for diagnostics.
Or, you can just save the context the type came from:
type = TREE_TYPE (exp);
type_context = &TREE_TYPE (exp);
same amount of work on the use side, but much faster equality checking.
An alternative to poisoning TREE_CODE would be to have TREE_TYPE do
the mapping itself and have another macro to access the original
(named) type:
#define TREE_TYPE(NODE) type_representative ((NODE)->common.type)
#define TREE_ORIGINAL_TYPE(NODE) ((NODE)->common.type)
Likewise, given those, we could do:
#define TREE_TYPE(NODE) ((NODE)->common.type)
#define TREE_ORIGINAL_TYPE(NODE) \
  (map ((NODE)->common.type, &(NODE)->common.type) \
   ? map ((NODE)->common.type, &(NODE)->common.type) \
   : (NODE)->common.type)
and remain fast for equality.
Since we know that type canonicalization is incremental, could we
work toward type canonicalization in the GCC 4.3 time frame?
If by "we" you mean you, I don't see why that would be a bad
idea. :-) The risk is that if one invests all this effort, and the
win turns out to be < 1% on real code and 10x on benchmark code, one
feels bad.
ConceptGCC has hit the point where compile times have gotten
prohibitive, and we need to illustrate that concepts can be compiled
efficiently. So, I'm stuck working toward type canonicalization in
ConceptGCC for performance reasons. Since type comparison is literally
50% of my compile time now, I'm sure to win. I just don't know if
what I come up with will be a win when concepts aren't present.
:-) At 50%, I think it is clear that if you require types to be
fully canonical, type comparisons trivially go from linear to
constant time, where the constant is around one machine instruction
that doesn't touch memory, not one byte. Trivially, this gets a 2x
faster compiler. We really, really want a 2x faster compiler.
Now, what's the cost? In normal C++, debug information needs to know
the original names, error messages need to know them, and we have to
run a hash for them. The cost is log n for each, instead of
constant. I'd like to think this cost is paid for by the improvement
from linear-time type checking, as error messages are either never
printed or we don't care about compilation speed when they are, and
debug information is only emitted once, if at all. Type comparisons
are done often.
The big question is whether I end up doing this work in ConceptGCC
(only) or also in FSF GCC, and if anyone is willing to help with the
FSF GCC version.
I'd endorse putting type canonicalization into mainline if it were
the same speed or faster. I think/hope it would be a speed win.
However, I don't know that it would be; that'd be the risk of doing
the work. A bad hash could easily kill performance. For the 50%
case, I'm fairly sure it would be a speed win. I think for
template-heavy code, it would be a speed win.