> I think I (once again) forgot about the importance of the fact that when > a reference attribute points to a DIE the context of that DIE always > matters. That makes any comparison of DIEs not very "simple enough".
Ah, yes. I managed to forget about this when reviewing your plan, too. > So then you get to drag in the reference tracker and use dwarf_comparator > for correctness. But I would really like to have the dwarf_output > algorithm be independent from the dwarf_comparator implementation. Not > just because we then have an independent checker. But also because it > feels that what dwarf_output should be simpler and more concrete. I agree with those sentiments. But I'm not sure what it means in practice. > Since in the proposed algorithm the first pass does a full walk over the > DIE tree already, it should be simple to just store the full context of > every DIE. Then in the second pass when we hash all reference attributes > we can hash in this context. Which should reduce collisions again. This does complicate things somewhat, since we have always said before that we want to use an "equal enough" predicate for context matching, rather than an "identical" predicate. It so happens the first cut at "equal enough" is to compare basic attributes and ignore the references. But we want to leave the door open to tweak that predicate later on to improve sharing. (In fact, I am fairly sure we'll need to tweak it to get sufficient sharing for C++ because of how the namespace declarations are done in the libstdc++ headers.) So in the first pass, we'll have two different hash functions. To begin with, "context hash" can be just the combination of the basic-attributes-only hash (the part of the first pass hash that we compute before looking at children) with the parent's context hash. Later on, we'll most likely want to replace that with a heuristic that excludes certain basic attributes on certain tags. > I am pondering if we could just rely on the fact that a pure reference > chain (ignoring child DIEs) will never loop. I am not sure what to do > if we ever encounter a real reference chain loop. But it should be easy > to detect. But if we can then we would have for each DIE a full hash for > the whole "child chain" and for the whole "reference chain", which > should give us "perfect" hashes, since they would describe each DIE > completely. I don't think we should have the plan of just blowing up when there is such a loop. But I don't think we'll encounter one in the real world. So it should be fine to start out with detecting it and barfing coherently with an "implement me" error. Of course this won't ever be a "perfect hash" in the normal sense of that term, since it's still a combinatorial function that can always produce the same hash for unrelated inputs if the bits happen that way. > But I am note sure we can get rid of ignoring hash-collisions though. > Because I don't see how you ever make a distinction between a hash > collision because two DIEs are indeed completely identical and when > there is false collision because the hash function isn't perfect. I don't see how it could ever be avoided completely, except for the cryptographic hash idea (which I don't especially like). Thanks, Roland _______________________________________________ elfutils-devel mailing list [email protected] https://fedorahosted.org/mailman/listinfo/elfutils-devel
