On Mon, 2010-11-15 at 18:16 -0800, Roland McGrath wrote: > This plan of attack hinges on the "simple enough to be sure it's > correct" comparison implementation not being unusably slow on our > real-world data. I'm not really very sure about that at all. An > alternative plan is to stub it out to assume that all hash collision > comparisons are false, and try proceed in parallel on those other > things from there. But that is going to conflate all a ** with b ** > and suchlike, so it seems unlikely to be very usable with real data. > > What do you think?
I think I (once again) forgot about the importance of the fact that when a reference attribute points to a DIE the context of that DIE always matters. That makes any comparison of DIEs not very "simple enough". So then you get to drag in the reference tracker and use dwarf_comparator for correctness. But I would really like to have the dwarf_output algorithm be independent from the dwarf_comparator implementation. Not just because we then have an independent checker. But also because it feels that what dwarf_output should be simpler and more concrete. Since in the proposed algorithm the first pass does a full walk over the DIE tree already, it should be simple to just store the full context of every DIE. Then in the second pass when we hash all reference attributes we can hash in this context. Which should reduce collisions again. I am pondering if we could just rely on the fact that a pure reference chain (ignoring child DIEs) will never loop. I am not sure what to do if we ever encounter a real reference chain loop. But it should be easy to detect. But if we can then we would have for each DIE a full hash for the whole "child chain" and for the whole "reference chain", which should give us "perfect" hashes, since they would describe each DIE completely. But I am note sure we can get rid of ignoring hash-collisions though. Because I don't see how you ever make a distinction between a hash collision because two DIEs are indeed completely identical and when there is false collision because the hash function isn't perfect. Cheers, Mark _______________________________________________ elfutils-devel mailing list [email protected] https://fedorahosted.org/mailman/listinfo/elfutils-devel
