Hi, After rereading some of the discussions we had last year on the dwarf_output algorithm, here is a recap and a new description of how I think we should go forward with it.
The goal of dwarf_output is given an existing dwarf data structure to provide a dwarf data structure where as many items as possible are shared because they end up being (semantically) equal in the original dwarf structure. To that end the dwarf_output::copier goes over the CU and puts elements into the dwarf_output_collector, which makes sure every element is uniquely represented. To put an element into the collector we need to make sure it is "complete", which also means it has a stable hash that can be used for quick comparisons. The copier goes over the DIEs keeping track of each DIE reference to know its state (undefined - not seen yet, pending - depends on non-final references, final - all references also final, placed - done). The problem we had came from the fact that if taking the DIE children and DIE attribute references together we got a full graph which could contain cycles. So the hash of an attribute reference was defined as the identify (pointer) of the DIE it was pointing to, or zero if it was part of a circular chain. But only the reference that was detected as "causing" the chain. It was that last issue that caused problems, since in identical structures different references could be pin-pointed as "causing" the cycle. Just hashing all references to zero that are part of a cycle makes too many structures hash identical, and cycle detection in arbitrary graphs is not fun. So I was proposing a way to define categorizing/hashing the DIEs based on properties that don't involve arbitrary cycles. The algorithm depends on the fact that you can see the CU as two different trees/forests without cycles in themselves. First there is the DIE trees where we ignore the attribute references and only look at the children of a DIE. Secondly there is the attribute reference tree (attributes reference other DIEs to describe properties, but these properties itself aren't cyclic or self-referential (this seems semantically true for all current dwarf reference attribute types, but structurally dwarf seems to allow DIEs with reference attributes that would allow backreferences, so we would need to detect this as a sanity check). The original proposal was worded as walking the DIE tree multiple times. https://fedorahosted.org/pipermail/elfutils-devel/2010-November/001699.html But it might be easier to describe/fit into the current copier design by describing it as when elements can be finalized into the collector and/or how hashes are calculated. In the below I assume we will keep two hashes per element. The "children-hash" and the "references-hash". An element can be finalized/put in the collector if both are fully defined. We can probably keep one hash, but it seemed simpler to describe using two. (These two are different from the use pointer as hash recently discussed on the list.) - The children-hash of a DIE without children is calculated from its attributes but only using the values of basic attributes (ignoring the value of reference attributes). - The children-hash of a DIE with children is the same as one without, combined with all children-hashes of its children. - The references-hash of a DIE without reference attributes is equal to its children-hash. - The references-hash of a DIE with reference attributes is the combination of the children DIE of the DIE itself, combined with all reference-hashes of all DIEs that are referenced by the reference attributes. - For an attribute set to be finalized all DIEs pointed to by referece attributes should have a reference-hash set. The hash of an attribute set then is the combination of simple attribute values hashes combined with the reference-hashes of all those DIEs. - For a vector of DIEs (children of another DIE or the CU) to be placed in the collector all DIEs need to have reference-hashes. The hash of a children vector is the combination of all DIE reference-hashes. - For a DIE to be finalized, the attribute set and the children vector need to have been finalized and it has to have its reference-hash set. So by calculating the children-hash of a DIE always before the reference-hash of the DIE, we make sure we are chasing down one "tree" at a time. I think the above should fit into the current copier code setup. But I am sure I forgot something. It will come up when actually trying to implement it this way. Missing is the sanity check to make sure that a reference attribute never references a DIE for which its reference hash is currently being resolved (which shouldn't ever happen, but might be caused by funky dwarf). Cheers, Mark _______________________________________________ elfutils-devel mailing list [email protected] https://fedorahosted.org/mailman/listinfo/elfutils-devel
