On Tue, 2010-11-09 at 00:41 -0800, Roland McGrath wrote: > > Another idea would be to use not use a "perfect hash" for reference > > attributes. Allow some equal reference attributes to not hash the same. > > I don't see how that would fly, unless, as you say, we do it only > temporarily and then replace them with perfect ones later. The matching > hashes is the core of the whole plan for duplicate elimination.
It would mean "take our losses" and not be perfect. Some duplication would exist that we don't detect. I am just suggesting that to get at least some results (that is if the dwarf comparator wouldn't be too slow also, then we could plug in a dwarf writer and see how far we get). > > The nice thing about most of the hashing done in dwarf_output is that it > > "uniqifies" value/die groups and so creates a perfect unique hash. This > > can be done with ordered sets of values or die trees. But the reference > > attributes introduce arbitrary (possibly circular) graphs between > > elements. We get in trouble because we want to cut the circularity, but > > don't know "the start of the circle". > > It might be possible to choose a canonical ordering so that we always > start at the same step in the cycle. I haven't thought this through, > but maybe it is a line worth considering. For example, say we chose a > canonical ordering of the direct children of a CU. Then, among all CUs > so canonicalized, there would be a single choice for the first DIE in > the depth-first ordering of the whole CU tree that is the start of a > cycle, the same choice for all instances of an identical cycle. > Is that true? That takes advantage of the fact that we know we can ignore the order of the children of a top-level CU, but I think we cannot do that easily for any deeper nested DIE children. Unless we somehow encode when it is allowed to ignore the order of of children for each DIE tag. The same issue can show up under a DW_TAG_namespace for example. And how would we decide the correct canonical order? If we look at something like this, then it looks like we should pick pointer_types first and then variables. <compile_unit> <structure_type ref="ref2" name="list"> <member name="next" type="#ref1"/> </structure_type> <pointer_type ref="ref1" type="#ref2"/> <variable name="var" type="#ref3"/> <pointer_type ref="ref3" type="#ref2"/> </compile_unit> <compile_unit> <structure_type ref="ref2" name="list"> <member name="next" type="#ref1"/> </structure_type> <pointer_type ref="ref1" type="#ref2"/> <variable name="var" type="#ref1"/> </compile_unit> Since if we start with the pointer_types, we are fine since we can collapse ref1 and ref3 immediate. But things seem to go wrong if we happen to pick the variables to resolve first, because then the circularity seems to look different in both CUs, because we haven't collapsed the pointer_types yet. Is this a solid rule, or does it just happen to fall out of this particular example and can you construct a counter example that reverses the "correct" order? Cheers, Mark _______________________________________________ elfutils-devel mailing list [email protected] https://fedorahosted.org/mailman/listinfo/elfutils-devel
