Re: next tasks

Roland McGrath Wed, 17 Jun 2009 14:18:20 -0700

> > 1. basic writer data structure building (designed for the eventual fancy 
> > plans)


DwarfOutput is a sketchy new wiki page about the writer work.  I didn't
post about it because I hadn't gotten it more together.  But I should
have been posting partial incoherencies earlier rather than waiting.

> What are these "fancy plans"?  

That was an oblique reference to the "combined debug archive" idea.
I wrote some wiki stuff about an earlier version of that idea, but I
haven't really written down (or hashed out) my current thinking in
detail.  The core of the "eventual fancy" plan is to combine multiple
.debug objects into an ar archive or something similar (maybe its own
format akin to locale-archive), where we write the DWARF in the
constituent files using low-level format extensions that permit sharing
of data across files inside the archive.

For today, the only thing to consider about that is that it motivates
the separation of dwarf_output_collector from dwarf_output.  If we do
the combined-archive output mode, there will still be one dwarf_output
object per logical .debug file, but all will be built using a single
dwarf_output_collector object.  For this reason, some essential writer
work resides in the collector.  That will include the core means of
identifying duplicates, and format output stuff like abbrev generation.
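
A rough sketch of that split, just to make the shape concrete.  The
names collector_sketch and output_sketch are made up for illustration,
and plain strings stand in for DWARF data; this is not the code on the
branch, only one way the "many outputs, one collector" relationship
could look.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for dwarf_output_collector: owns everything
// that may be shared, and hands out one id per distinct value.
class collector_sketch
{
public:
  std::size_t intern (const std::string &value)
  {
    // insert() leaves an existing entry alone, so a duplicate value
    // gets back the id assigned the first time it was seen.
    auto result = ids_.insert (std::make_pair (value, ids_.size ()));
    return result.first->second;
  }

  std::size_t unique_count () const { return ids_.size (); }

private:
  std::map<std::string, std::size_t> ids_;
};

// Hypothetical stand-in for dwarf_output: one per logical .debug file,
// holding only references into the shared collector.
class output_sketch
{
public:
  explicit output_sketch (collector_sketch &c) : collector_ (c) {}

  void add (const std::string &value)
  {
    refs_.push_back (collector_.intern (value));
  }

  const std::vector<std::size_t> &refs () const { return refs_; }

private:
  collector_sketch &collector_;
  std::vector<std::size_t> refs_;
};

// Two outputs built on one collector share the entry for "int".
inline std::size_t demo_shared_collector ()
{
  collector_sketch collector;
  output_sketch a (collector), b (collector);
  a.add ("int");
  b.add ("int");
  b.add ("char");
  assert (a.refs ()[0] == b.refs ()[0]);
  return collector.unique_count ();
}
```

In the single-file case there would simply be one output per collector;
the archive mode only changes how many outputs point at it.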

> DWARF compression (as in zlib) was mentioned a couple times, 

We have never mentioned the zlib stuff, though binutils does now support
it.  The term "DWARF compression" is often used and is very ambiguous.
The plain section data-compression stuff does not serve the interests
that motivate our DWARF size reduction work (download size of aggregate
packages that already use data compression).  It is trivial to support,
but possibly not even desirable (CPU/memory cost vs direct shareable
mmap from files).

> I expect that the "semantic compression" will be built on top of this,
> as an application rather than intrinsic feature of the library.

No, the DWARF-level size reduction will be in the writer.  Part of that
is optimal choice of "invisible" format details like abbrevs and form
selection.  But if you look at any large object, all those other
sections are dwarfed in size (no pun intended) by .debug_info.  What we
expect to be the most effective "semantic compression" is consolidation
of duplicate identical DIE subtrees.  The writer will (optionally) do
this automagically.
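
To illustrate what "consolidation of duplicate identical DIE subtrees"
means, here is a minimal sketch that interns subtrees bottom-up, so two
structurally identical subtrees end up with the same id.  The types
die_sketch and subtree_pool are hypothetical, attributes are reduced to
string pairs, and reference attributes (one DIE pointing at another) are
ignored entirely, which the real writer cannot do.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, std::string>> attrs_sketch;

// Hypothetical, heavily simplified DIE: a tag, attributes, children.
struct die_sketch
{
  std::string tag;
  attrs_sketch attrs;
  std::vector<die_sketch> children;
};

class subtree_pool
{
public:
  // Intern the children first, then key this node on its own data
  // plus the ids of its children.  Equal subtrees produce equal keys.
  std::size_t intern (const die_sketch &die)
  {
    std::vector<std::size_t> child_ids;
    for (const die_sketch &child : die.children)
      child_ids.push_back (intern (child));

    key_type key (die.tag, die.attrs, child_ids);
    auto result = ids_.insert (std::make_pair (key, ids_.size ()));
    return result.first->second;
  }

  std::size_t unique_count () const { return ids_.size (); }

private:
  typedef std::tuple<std::string, attrs_sketch,
                     std::vector<std::size_t>> key_type;
  std::map<key_type, std::size_t> ids_;
};

// The same struct as it might appear in two different CUs.
inline die_sketch make_point_struct ()
{
  die_sketch x, y, s;
  x.tag = "member";
  x.attrs.emplace_back ("name", "x");
  y.tag = "member";
  y.attrs.emplace_back ("name", "y");
  s.tag = "structure_type";
  s.attrs.emplace_back ("name", "point");
  s.children.push_back (x);
  s.children.push_back (y);
  return s;
}

inline bool demo_duplicate_subtrees ()
{
  subtree_pool pool;
  std::size_t first = pool.intern (make_point_struct ());
  std::size_t second = pool.intern (make_point_struct ());
  assert (first == second);
  return pool.unique_count () == 3;  // two members plus the struct
}
```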

> Also, what level do you want to write the writer on?  C?  There is "C++ 
> interface for writer" item on your list of tasks, but it's not clear 
> whether C++ is the place where the writer will be implemented, or rather 
> a place where it would be wrapped.

The writer will be in pure C++, no real programmatic C interface to it.
That is the reason for the whole focus on the C++ layer for the reader.
(Eventually some high-level C wrappers + shared argp stuff for "just do
it" transformation uses.)

> I'm thinking that .debug_abbrev is one approach to compression that 
> isn't currently held back by absence of reference equality component. 

Most of the work really is not held back by that issue (and it also
should not be so huge of an issue).  We can do a lot of the work in
parallel.

The "peephole" optimization of .debug_abbrev is IIRC what nickc started
on (in C) on his branch some time ago.  That alone really does not have
enough payoff to warrant spending time on it--.debug_abbrev size is not
really the problem.  The whole-writer approach intrinsically includes
doing optimal abbrev generation (and aranges et al), though obviously it
is a holistic approach that takes a long time to get from zero to useful.


So, let's get into it.  I haven't written the plan up at all coherently,
and have clearly failed so far even to communicate the outlines of it to
you.  I started writing a little bit of code, and we can start talking
about that.  It's on git branch roland/dwarf-collector; it is very
unfinished, and not even its structure is entirely figured out.  (I
don't intend to make that a real branch, it's just parked temp commits
until I have anything worth committing for real.)

Just as the dwarfcmp.cc test_writer code constructs a dwarf_edit from a
dwarf, it will construct a dwarf_output from a dwarf (parameterized by
a collector object).  The dwarf_output is usable for reading like a
dwarf or dwarf_edit object, but is immutable like a dwarf object (and
unlike a dwarf_edit).  Constructing the dwarf_output will collect in
the collector everything needed to write the output, and consolidate
all duplication on the way.

The part I've started tackling but am not even close to finishing is the
collector data structures to hold and de-duplicate all the same data
that a dwarf/dwarf_edit holds today.  I think I have some of the
containers kind of sane, but I need to figure out how to organize the
constructor code.

Once a construct-only dwarf_output works like a dwarf/dwarf_edit
(e.g. dwarfcmp -T test), we get to the first real writer step.
That is abbrev generation, which really pulls in form selection too.
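
Form selection comes along because an abbrev records the form of each
attribute, so two DIEs can share an abbrev only if they agree on tag,
has-children flag, and every attribute/form pair.  A minimal sketch of
that coupling follows.  All the names here (form_sketch, abbrev_shape,
abbrev_table) are invented for illustration, only the fixed-size
constant forms are considered (a real writer would also weigh the
LEB128 forms), and nothing is actually encoded.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the DW_FORM_data1/2/4/8 constant forms.
enum form_sketch { form_data1, form_data2, form_data4, form_data8 };

// Pick the narrowest fixed-size form that can hold VALUE.
inline form_sketch pick_const_form (std::uint64_t value)
{
  if (value <= 0xff)
    return form_data1;
  if (value <= 0xffff)
    return form_data2;
  if (value <= 0xffffffffu)
    return form_data4;
  return form_data8;
}

// Everything an abbrev entry describes about a DIE's layout.
struct abbrev_shape
{
  std::string tag;
  bool has_children;
  std::vector<std::pair<std::string, form_sketch>> attrs;

  bool operator< (const abbrev_shape &other) const
  {
    return std::tie (tag, has_children, attrs)
           < std::tie (other.tag, other.has_children, other.attrs);
  }
};

class abbrev_table
{
public:
  // Codes start at 1 because code 0 marks a null entry in DWARF.
  unsigned code_for (const abbrev_shape &shape)
  {
    unsigned next = static_cast<unsigned> (codes_.size ()) + 1;
    auto result = codes_.insert (std::make_pair (shape, next));
    return result.first->second;
  }

  std::size_t size () const { return codes_.size (); }

private:
  std::map<abbrev_shape, unsigned> codes_;
};

inline abbrev_shape base_type_shape (std::uint64_t byte_size)
{
  abbrev_shape shape;
  shape.tag = "base_type";
  shape.has_children = false;
  shape.attrs.emplace_back ("byte_size", pick_const_form (byte_size));
  return shape;
}

// Sizes 4 and 8 both fit data1 and share an abbrev; 1000 needs data2.
inline std::size_t demo_abbrev_codes ()
{
  abbrev_table table;
  assert (table.code_for (base_type_shape (4)) == 1);
  assert (table.code_for (base_type_shape (8)) == 1);
  assert (table.code_for (base_type_shape (1000)) == 2);
  return table.size ();
}
```

The point of the sketch is only that the form choice changes the abbrev
key, so the two decisions have to be made together.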


Thanks,
Roland
_______________________________________________
elfutils-devel mailing list
[email protected]
https://fedorahosted.org/mailman/listinfo/elfutils-devel
