new dwarf_output algorithm

Mark Wielaard Wed, 05 Jan 2011 12:18:02 -0800

Hi,

After rereading some of the discussions we had last year on the
dwarf_output algorithm, here is a recap and a new description of how I
think we should go forward with it.


The goal of dwarf_output is given an existing dwarf data structure to
provide a dwarf data structure where as many items as possible are
shared because they end up being (semantically) equal in the original
dwarf structure.

To that end the dwarf_output::copier goes over the CU and puts elements
into the dwarf_output_collector, which makes sure every element is
uniquely represented. To put an element into the collector we need to
make sure it is "complete", which also means it has a stable hash that
can be used for quick comparisons.

The copier goes over the DIEs keeping track of each DIE reference to
know its state (undefined - not seen yet, pending - depends on non-final
references, final - all references also final, placed - done).

The problem we had came from the fact that if taking the DIE children
and DIE attribute references together we got a full graph which could
contain cycles. So the hash of an attribute reference was defined as the
identify (pointer) of the DIE it was pointing to, or zero if it was part
of a circular chain. But only the reference that was detected as
"causing" the chain. It was that last issue that caused problems, since
in identical structures different references could be pin-pointed as
"causing" the cycle.

Just hashing all references to zero that are part of a cycle makes too
many structures hash identical, and cycle detection in arbitrary graphs
is not fun. So I was proposing a way to define categorizing/hashing the
DIEs based on properties that don't involve arbitrary cycles.

The algorithm depends on the fact that you can see the CU as two
different trees/forests without cycles in themselves. First there is the
DIE trees where we ignore the attribute references and only look at the
children of a DIE. Secondly there is the attribute reference tree
(attributes reference other DIEs to describe properties, but these
properties itself aren't cyclic or self-referential (this seems
semantically true for all current dwarf reference attribute types, but
structurally dwarf seems to allow DIEs with reference attributes that
would allow backreferences, so we would need to detect this as a sanity
check).

The original proposal was worded as walking the DIE tree multiple times.
https://fedorahosted.org/pipermail/elfutils-devel/2010-November/001699.html
But it might be easier to describe/fit into the current copier design by
describing it as when elements can be finalized into the collector
and/or how hashes are calculated.

In the below I assume we will keep two hashes per element. The
"children-hash" and the "references-hash". An element can be
finalized/put in the collector if both are fully defined. We can
probably keep one hash, but it seemed simpler to describe using two.
(These two are different from the use pointer as hash recently discussed
on the list.)

- The children-hash of a DIE without children is calculated from its
attributes but only using the values of basic attributes (ignoring the
value of reference attributes).

- The children-hash of a DIE with children is the same as one without,
combined with all children-hashes of its children.

- The references-hash of a DIE without reference attributes is equal to
its children-hash.

- The references-hash of a DIE with reference attributes is the
combination of the children DIE of the DIE itself, combined with all
reference-hashes of all DIEs that are referenced by the reference
attributes.

- For an attribute set to be finalized all DIEs pointed to by referece
attributes should have a reference-hash set. The hash of an attribute
set then is the combination of simple attribute values hashes combined
with the reference-hashes of all those DIEs.

- For a vector of DIEs (children of another DIE or the CU) to be placed
in the collector all DIEs need to have reference-hashes. The hash of a
children vector is the combination of all DIE reference-hashes.

- For a DIE to be finalized, the attribute set and the children vector
need to have been finalized and it has to have its reference-hash set.

So by calculating the children-hash of a DIE always before the
reference-hash of the DIE, we make sure we are chasing down one "tree"
at a time.

I think the above should fit into the current copier code setup. But I
am sure I forgot something. It will come up when actually trying to
implement it this way. Missing is the sanity check to make sure that a
reference attribute never references a DIE for which its reference hash
is currently being resolved (which shouldn't ever happen, but might be
caused by funky dwarf).

Cheers,

Mark


_______________________________________________
elfutils-devel mailing list
[email protected]
https://fedorahosted.org/mailman/listinfo/elfutils-devel

new dwarf_output algorithm

Reply via email to