Re: dwarf_edit_output testcase sanity checking

Mark Wielaard Tue, 09 Nov 2010 07:14:35 -0800

On Tue, 2010-11-09 at 00:41 -0800, Roland McGrath wrote:
> > Another idea would be to use not use a "perfect hash" for reference
> > attributes. Allow some equal reference attributes to not hash the same.
> 
> I don't see how that would fly, unless, as you say, we do it only
> temporarily and then replace them with perfect ones later.  The matching
> hashes is the core of the whole plan for duplicate elimination.


It would mean "take our losses" and not be perfect. Some duplication
would exist that we don't detect. I am just suggesting that to get at
least some results (that is if the dwarf comparator wouldn't be too slow
also, then we could plug in a dwarf writer and see how far we get).

> > The nice thing about most of the hashing done in dwarf_output is that it
> > "uniqifies" value/die groups and so creates a perfect unique hash. This
> > can be done with ordered sets of values or die trees. But the reference
> > attributes introduce arbitrary (possibly circular) graphs between
> > elements. We get in trouble because we want to cut the circularity, but
> > don't know "the start of the circle".
> 
> It might be possible to choose a canonical ordering so that we always
> start at the same step in the cycle.  I haven't thought this through,
> but maybe it is a line worth considering.  For example, say we chose a
> canonical ordering of the direct children of a CU.  Then, among all CUs
> so canonicalized, there would be a single choice for the first DIE in
> the depth-first ordering of the whole CU tree that is the start of a
> cycle, the same choice for all instances of an identical cycle.  
> Is that true?

That takes advantage of the fact that we know we can ignore the order of
the children of a top-level CU, but I think we cannot do that easily for
any deeper nested DIE children. Unless we somehow encode when it is
allowed to ignore the order of of children for each DIE tag. The same
issue can show up under a DW_TAG_namespace for example.

And how would we decide the correct canonical order? If we look at
something like this, then it looks like we should pick pointer_types
first and then variables.

 <compile_unit>
  <structure_type ref="ref2" name="list">
   <member name="next" type="#ref1"/>
  </structure_type>
  <pointer_type ref="ref1" type="#ref2"/>
  <variable name="var" type="#ref3"/>
  <pointer_type ref="ref3" type="#ref2"/>
 </compile_unit>

 <compile_unit>
  <structure_type ref="ref2" name="list">
   <member name="next" type="#ref1"/>
  </structure_type>
  <pointer_type ref="ref1" type="#ref2"/>
  <variable name="var" type="#ref1"/>
 </compile_unit>

Since if we start with the pointer_types, we are fine since we can
collapse ref1 and ref3 immediate. But things seem to go wrong if we
happen to pick the variables to resolve first, because then the
circularity seems to look different in both CUs, because we haven't
collapsed the pointer_types yet. Is this a solid rule, or does it just
happen to fall out of this particular example and can you construct a
counter example that reverses the "correct" order?

Cheers,

Mark

_______________________________________________
elfutils-devel mailing list
[email protected]
https://fedorahosted.org/mailman/listinfo/elfutils-devel

Re: dwarf_edit_output testcase sanity checking

Reply via email to