> >   2) As we query the type_hash while we are rewritting the types,
> >      we run into instability of the hashtable. This manifests itself
> >      as an ICE when one adds sanity check that while merging function
> >      types their arg types are equivalent, too.
> >      This ICEs compiling i.e. sqlite but I did not really managed to
> >      reduce this.  I tracked it down to the argument type being inserted
> >      into gimple_type_hash but at the time we query the new argument type,
> >      the original is no longer found despite their hashes are equivalent.
> >      The problem is hidden when things fit into the leader cache,
> >      so one needs rather big testcase.
> 
> Ugh.  For reduction you can disable those caches though.  The above
> means there is a disconnect between hashing and comparing.
> Maybe it's something weird with the early out
> 
>       if (TYPE_ARG_TYPES (t1) == TYPE_ARG_TYPES (t2))
>         goto same_types;
> ?

Well, the problem goes away when you process all types before changing, so I
think it really is instability of hash table computation. But I am not sure how
to test for it.
Even disabling the caching and recomputing after gimple_register_type leads
to different results.
> 
> > So I tried to register all gimple types first.  Use TREE_VISITED within
> > the merging code to mark that type is not a leader and then TREE_CHAIN
> > to point to the leader.  This avoids need to re-query the hashtable
> > from the later fixups.  We only look for types with TREEE_VISITED
> > and replace them by TREE_CHAIN.
> 
> TREE_CHAIN is unused for types?  But we probably shouldn't add a new
> use ...

It is used, but unused for type merging.  
 /* Nodes are chained together for many purposes.
   Types are chained together to record them for being output to the debugger
   (see the function `chain_type'). */

We know that types that lost merging will not be used later, so we can
overwrite pointers we don't need.

When one removes the type from variant list during registering, one can
also use TYPE_MAIN_VARIANT, for example.
> 
> > This has two passes.  First we compute the main variants and mark
> > field_decls and type_decls for merging and in last pass we finally do
> > fixup on what remained in the table.
> >
> > This allows me to poison pointers in the removed types in a way
> > so the GGC would ICE if they stayed reachable.
> > I however need the extra pass because
> >  1) I can not update the type_decls/field_decls while registering
> >     types or I run into the hash table problems
> >  2) I can not merge the second two passes because at the time
> >     I find type/field decls equialent there may be earlier pointers
> >     to them.
> 
> You need to "merge" all trees reachable from the one you start at once
> (what I'm working on from time to time - work per tree "SCC", in a DFS
> walk).

Yep, doing things per-SCC is definitely good idea. 

It will also give a chance to improve the hash itself.  If you process in SCC
order you know that all references outside SCC have already leaders set and you
can hash their addresses rather than using the weak hash.

I would really love to see this done.  After updating Mozilla we now need 10GB
of RAM and about 18 minutes for merging (they merged in new JIT that aparently
plays badly with our types). This makes any development/testing difficult.

Honza

Reply via email to