> > 1. basic writer data structure building (designed for the eventual fancy
> > plans)

DwarfOutput is a sketchy new wiki page about the writer work. I didn't post because I hadn't gotten it more together. But I should have been posting partial incoherencies earlier rather than waiting.

> What are these "fancy plans"?

That was an oblique reference to the "combined debug archive" idea. I wrote some wiki stuff about an earlier version of that idea, but I haven't really written down my current thinking in detail (or hashed it out).

The core issue of "eventual fancy" is to combine multiple .debug objects together into an ar archive or something similar (maybe its own format akin to locale-archive), where we write the DWARF in the constituent files using low-level format extensions to permit sharing of data across files inside the archive.

For today, the only thing to consider about that is that it motivates the separation of dwarf_output_collector from dwarf_output. If we do the combined-archive output mode, there will still be one dwarf_output object per logical .debug file, but all will be built using a single dwarf_output_collector object. For this reason, some essential writer work resides in the collector. That will include the core means of identifying duplicates, and format output stuff like abbrev generation.

> DWARF compression (as in zlib) was mentioned a couple times,

We have never mentioned the zlib stuff, though binutils does now support it. The term "DWARF compression" is often used and is very ambiguous. The plain section data-compression stuff does not serve the interests that motivate our DWARF size reduction work (download size of aggregate packages that already use data compression). It is trivial to support, but possibly not even desirable (CPU/memory cost vs direct shareable mmap from files).

> I expect that the "semantic compression" will be built on top of this,
> as an application rather than intrinsic feature of the library.

No, the DWARF-level size reduction will be in the writer.
Part of that is optimal choice of "invisible" format details like abbrevs and form selection. But if you look at any large object, all those other sections are dwarfed in size (no pun intended) by .debug_info. What we expect to be the most effective "semantic compression" is consolidation of identical DIE subtrees. The writer will (optionally) do this automagically.

> Also, what level do you want to write the writer on? C? There is "C++
> interface for writer" item on your list of tasks, but it's not clear
> whether C++ is the place where the writer will be implemented, or rather
> a place where it would be wrapped.

The writer will be in pure C++, no real programmatic C interface to it. That is the reason for the whole focus on the C++ layer for the reader. (Eventually some high-level C wrappers + shared argp stuff for "just do it" transformation uses.)

> I'm thinking that .debug_abbrev is one approach to compression that
> isn't currently held back by absence of reference equality component.

Most of the work really is not held back by that issue (and it also should not be so huge of an issue). We can do a lot of the work in parallel.

The "peephole" optimization of .debug_abbrev is IIRC what nickc started on (in C) on his branch some time ago. That alone really does not have enough payoff to warrant spending time on it; .debug_abbrev size is not really the problem. The whole-writer approach intrinsically includes doing optimal abbrev generation (and aranges et al), though obviously it is a holistic approach that takes a long time to get from zero to useful.

So, let's get into it. I haven't written the plan at all coherently, and clearly failed utterly heretofore even to communicate the outlines of it to you. I started writing a little bit of code, and we can start talking about that. It's very unfinished, with not even its structure entirely figured out, and it sits on git branch roland/dwarf-collector.
(I don't intend to make that a real branch, just parked temp commits before I have anything worth committing for real.)

As dwarfcmp.cc test_writer code constructs dwarf_edit from dwarf, so it will construct dwarf_output from dwarf (parameterized by a collector object). The dwarf_output is usable for read like a dwarf or dwarf_edit object, but immutable like a dwarf object (and unlike a dwarf_edit). The construction of the dwarf_output will collect in the collector everything needed to write the output, and consolidate all duplication on the way.

The part I've started tackling but am not even close to finishing is the collector data structures to hold and de-duplicate all the same data that a dwarf/dwarf_edit holds today. I think I have some of the containers kind of sane, but I need to figure out how to organize the constructor code.

Once a construct-only dwarf_output works like a dwarf/dwarf_edit (e.g. dwarfcmp -T test), we get to the first real writer step. That is abbrev generation, which really pulls in form selection too.


Thanks,
Roland
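
To make the collector/output split a bit more concrete, here is a minimal sketch of how one shared collector could consolidate identical DIE subtrees across several output objects. Everything in it is an assumption for illustration: the class names `collector` and `output`, the `intern` and `add` functions, and the representation of attribute values as strings are hypothetical stand-ins, not the code on the branch. It also ignores reference attributes entirely, so it says nothing about the reference equality question. Only the DW_TAG/DW_AT numeric values come from the DWARF spec.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// All names below are hypothetical stand-ins, not the elfutils classes.
using die_id = std::size_t;
using attr_list = std::vector<std::pair<int, std::string>>;

constexpr int DW_TAG_member = 0x0d;
constexpr int DW_TAG_structure_type = 0x13;
constexpr int DW_AT_name = 0x03;

// A DIE whose children have already been interned, so two nodes describe
// identical subtrees exactly when tag, attributes and child ids all match.
struct die_node {
  int tag;
  attr_list attrs;               // (attribute code, value rendered as text)
  std::vector<die_id> children;  // ids handed out by the collector

  bool operator<(const die_node &o) const {
    return std::tie(tag, attrs, children) <
           std::tie(o.tag, o.attrs, o.children);
  }
};

// Stand-in for dwarf_output_collector: stores each unique subtree once.
class collector {
public:
  die_id intern(die_node n) {
    auto found = ids_.find(n);
    if (found != ids_.end())
      return found->second;      // duplicate subtree: reuse the existing id
    die_id id = nodes_.size();
    nodes_.push_back(n);
    ids_.emplace(std::move(n), id);
    return id;
  }
  std::size_t unique_count() const { return nodes_.size(); }

private:
  std::map<die_node, die_id> ids_;
  std::vector<die_node> nodes_;
};

// Stand-in for dwarf_output: one per logical .debug file, built bottom-up,
// with every DIE routed through the shared collector.
class output {
public:
  explicit output(collector &c) : collector_(c) {}
  die_id add(int tag, attr_list attrs, std::vector<die_id> children) {
    die_id id = collector_.intern(
        die_node{tag, std::move(attrs), std::move(children)});
    dies_.push_back(id);
    return id;
  }
  std::size_t die_count() const { return dies_.size(); }

private:
  collector &collector_;
  std::vector<die_id> dies_;
};

// Adds "struct point { x; y; }" to one output and returns the struct's id.
inline die_id add_point_struct(output &out) {
  die_id x = out.add(DW_TAG_member, {{DW_AT_name, "x"}}, {});
  die_id y = out.add(DW_TAG_member, {{DW_AT_name, "y"}}, {});
  return out.add(DW_TAG_structure_type, {{DW_AT_name, "point"}}, {x, y});
}

// Builds the same struct in two "files" and reports how many nodes the
// shared collector actually had to store.
inline std::size_t sketch_usage() {
  collector shared;
  output a(shared), b(shared);
  die_id pa = add_point_struct(a);
  die_id pb = add_point_struct(b);
  assert(pa == pb);                                  // same subtree, same id
  assert(a.die_count() == 3 && b.die_count() == 3);  // each file sees 3 DIEs
  return shared.unique_count();
}
```

Calling `sketch_usage()` from a `main` shows the point of the split: each output still sees its own three DIEs, but the collector holds only three nodes in total, because the second file's subtree matches the first one node for node. Interning children before parents is what lets a whole-subtree comparison reduce to comparing a tag, an attribute list and a short vector of ids.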
