Re: RFC: DWARF Extensions for Separate Debug Info Files ("Fission")

2011-10-20 Thread Tom Tromey
> "Cary" == Cary Coutant  writes:

Cary> At Google, we've found that the cost of linking applications with
Cary> debug info is much too high.
[...]

Cary> * .debug_macinfo - Macro information, unaffected by this design.

There is also the new .debug_macro section.  This section refers to
.debug_str, so it will need some updates for your changes there.

Tom


Re: [Dwarf-Discuss] RFC: DWARF Extensions for Separate Debug Info Files ("Fission")

2011-09-23 Thread John DelSignore
Hi Jason,

Jason Molenda wrote:
> On Sep 23, 2011, at 10:58 AM, Cary Coutant wrote:
> 
>>> The compiler puts DWARF in the .o file, the linker adds some records in the 
>>> executable which help us to understand where files/function/symbols landed 
>>> in the final executable[1].
>> Did you intend to add a footnote?
> 
> Yeah, I realized after I sent the email - it didn't seem interesting enough 
> to warrant a separate followup.
> 
> The records that our linker puts in the executable are in the form of stabs 
> entries.  There are a handful of stabs records created - file start, file 
> end, function start, function end, symbol, pointer to a .o file, maybe one or 
> two others.  We chose that format because it was trivial to support and we 
> already had tools for stripping these records out of the executable once the 
> dSYM had been created.

I don't remember the exact details, but the problem I recall with the Darwin 
scheme is that it builds an incomplete index in the Mach-O symbol table. IIRC, 
it was missing things that a user might want to lookup by-name in the debugger, 
like static functions or variables, and type names with external linkage. 
Without a reasonably complete index, the debugger can't know where to find the 
definitions of certain things, and that forces the user to navigate using other 
information, like source file name or global function definitions to force the 
debug information in the object to be read.

Of course, the current DWARF indexes (like pubnames/pubtypes) have the same 
problem, and some compilers do a really bad job at generating those sections. 
But at least when there's a single .debug_info section, the debugger can decide 
to ignore the indexes and "skim" the full debug information. The compilers on 
IRIX did a better job at generating indexes, so the debugger could find by-name 
static functions/objects.

> Once a dSYM has been created with all of the DWARF collected in a single 
> file, our DWARF is parseable by any debug info consumer with minimal changes 
> -- they need to know to look in a separate file for the DWARF from the main 
> executable, but the format itself is unchanged.  Supporting the 
> debug-information-in-.o-files is more involved, I don't know if any of the 
> third-party debuggers on our platform work with it.

TotalView supports debug information in .o files on Darwin, and has since day 
one. Perhaps you recall all those email exchanges you and I had several years 
back. It was a modest amount of work, given that we already supported debug 
information in .o files on the Sun and HP platforms.

I seem to recall one of the sore spots for us on Darwin was getting good 
address information for certain DWARF location operations, like DW_OP_addr. 
Fortran was particularly messy because some compilers didn't supply a linkage 
name attribute, so the debugger had to make several guesses at the name, and 
look things up by trial and error.

Cheers, John D.

>> We're trying to achieve something very similar, but we have the
>> additional goal of separating the info from the .o files because of
>> our distributed build environment. I also wanted to attempt to
>> standardize the approach, instead of having each vendor go in separate
>> directions.
> 
> 
> Yeah, if your regular build environment involves distributed compilation, and 
> the .o files need to be copied to a central system for the linker, then I can 
> see why you're pursuing this approach.  For us, the most common usage is 
> single-computer compilation & linking -- where the linker never pages in the 
> debug info sections from the .o files so their size is not particularly 
> important.
> 
> J


Re: RFC: DWARF Extensions for Separate Debug Info Files ("Fission")

2011-09-23 Thread Jason Molenda

On Sep 23, 2011, at 10:58 AM, Cary Coutant wrote:

>> The compiler puts DWARF in the .o file, the linker adds some records in the 
>> executable which help us to understand where files/function/symbols landed 
>> in the final executable[1].
> 
> Did you intend to add a footnote?

Yeah, I realized after I sent the email - it didn't seem interesting enough to 
warrant a separate followup.

The records that our linker puts in the executable are in the form of stabs 
entries.  There are a handful of stabs records created - file start, file end, 
function start, function end, symbol, pointer to a .o file, maybe one or two 
others.  We chose that format because it was trivial to support and we already 
had tools for stripping these records out of the executable once the dSYM had 
been created.

Once a dSYM has been created with all of the DWARF collected in a single file, 
our DWARF is parseable by any debug info consumer with minimal changes -- they 
need to know to look in a separate file for the DWARF from the main executable, 
but the format itself is unchanged.  Supporting the 
debug-information-in-.o-files is more involved, I don't know if any of the 
third-party debuggers on our platform work with it.


> We're trying to achieve something very similar, but we have the
> additional goal of separating the info from the .o files because of
> our distributed build environment. I also wanted to attempt to
> standardize the approach, instead of having each vendor go in separate
> directions.


Yeah, if your regular build environment involves distributed compilation, and 
the .o files need to be copied to a central system for the linker, then I can 
see why you're pursuing this approach.  For us, the most common usage is 
single-computer compilation & linking -- where the linker never pages in the 
debug info sections from the .o files so their size is not particularly important.

J


Re: RFC: DWARF Extensions for Separate Debug Info Files ("Fission")

2011-09-23 Thread Cary Coutant
> The Apple approach has both the features of the Sun/HP implementation as well 
> as the ability to create a standalone debug info file.

Thanks for the clarifications. I based my comments on a description
you sent me a couple of years ago, and I apologize for any
oversimplifications I introduced.

> The compiler puts DWARF in the .o file, the linker adds some records in the 
> executable which help us to understand where files/function/symbols landed in 
> the final executable[1].

Did you intend to add a footnote?

>  If the user runs our gdb or lldb on one of these binaries, the debugger will 
> read the DWARF directly out of the .o files on the fly.  Because the linker 
> doesn't need to copy around/update/modify the DWARF, link times are very 
> fast.  If the developer decides to debug the program, no extra steps are 
> required - the debugger can be started up & used with the debug info still in 
> the .o files.

We're trying to achieve something very similar, but we have the
additional goal of separating the info from the .o files because of
our distributed build environment. I also wanted to attempt to
standardize the approach, instead of having each vendor go in separate
directions.

Thanks,

-cary


Re: RFC: DWARF Extensions for Separate Debug Info Files ("Fission")

2011-09-23 Thread Cary Coutant
>> * .debug_pubtypes - Public types for use in building the
>>   .gdb_index section at link time. This section will have an
>>   extended format to allow it to represent both types in the
>>   .debug_dwo_info section and type units in .debug_types.
>    ^^^
>    = .dwo_info , maybe both .debug_info and .dwo_info
>
>
>> * .dwo_abbrev - Defines the abbreviation codes used by the
>>   .debug_dwo_info section.
>    ^^^
>    = .dwo_info

Thanks, I've fixed the wiki page.

> I find this .dwo_* setup is great for rapid development rebuilds but it should
> remain optional as the currently used DWARF final separate .debug info file is
> smaller than all the .dwo files together.  In the case of the final linked
> .debug builds (rpm/deb/...) one does not consider the build speed as 
> important.
> It probably does not make sense to merge + convert .dwo files back to a single
> .debug file for the rpm/deb/... build performance reasons.

Yes, we'll definitely make this a compile-time option.

While I haven't finished designing the package format for collecting
all the .dwo files, I do plan on having the packaging tool do at least
duplicate type elimination to reduce the size of the package file.

-cary


Re: RFC: DWARF Extensions for Separate Debug Info Files ("Fission")

2011-09-23 Thread Jan Kratochvil
On Fri, 23 Sep 2011 02:21:44 +0200, Cary Coutant wrote:
> * .debug_pubtypes - Public types for use in building the
>   .gdb_index section at link time. This section will have an
>   extended format to allow it to represent both types in the
>   .debug_dwo_info section and type units in .debug_types.
^^^
= .dwo_info , maybe both .debug_info and .dwo_info


> * .dwo_abbrev - Defines the abbreviation codes used by the
>   .debug_dwo_info section.
^^^
= .dwo_info


I find this .dwo_* setup is great for rapid development rebuilds but it should
remain optional as the currently used DWARF final separate .debug info file is
smaller than all the .dwo files together.  In the case of the final linked
.debug builds (rpm/deb/...) one does not consider the build speed as important.
It probably does not make sense to merge + convert .dwo files back to a single
.debug file for the rpm/deb/... build performance reasons.


Thanks,
Jan


Re: RFC: DWARF Extensions for Separate Debug Info Files ("Fission")

2011-09-22 Thread Paul Pluzhnikov
On Thu, Sep 22, 2011 at 6:35 PM, Jason Molenda  wrote:

> Because the linker doesn't need to copy around/update/modify the DWARF,
> link times are very fast.

AFAIU, the link times are fast only if all the files are local to the
developers' machine.

They will not be fast (and the .o files *will* need to be copied) if a
distributed compilation system (a build farm) is used (as is the case here).

Thanks,
-- 
Paul Pluzhnikov


Re: RFC: DWARF Extensions for Separate Debug Info Files ("Fission")

2011-09-22 Thread Jason Molenda
Hi Cary, just one quick clarification -

On Sep 22, 2011, at 5:21 PM, Cary Coutant wrote:

> Previous Implementations of Separate Debug Information
> ======================================================
> 
> In the Sun and HP implementations, the debug information in the
> relocatable objects still requires relocation at debug time, and
> the debugger must read the summary information from the
> executable file in order to map symbols and sections to the
> output file when processing and applying the relocations. The
> Apple implementation avoids this cost at debug time, but at the
> cost of having a separate link step for the debug information.


The Apple approach has both the features of the Sun/HP implementation as well 
as the ability to create a standalone debug info file. 

The compiler puts DWARF in the .o file, the linker adds some records in the 
executable which help us to understand where files/function/symbols landed in 
the final executable[1].  If the user runs our gdb or lldb on one of these 
binaries, the debugger will read the DWARF directly out of the .o files on the 
fly.  Because the linker doesn't need to copy around/update/modify the DWARF, 
link times are very fast.  If the developer decides to debug the program, no 
extra steps are required - the debugger can be started up & used with the debug 
info still in the .o files.

Clearly this is only viable if you have the .o files on your computer so we 
added a command, "dsymutil", which links the DWARF from the .o files into a 
single standalone ".dSYM" file.  The executable file and the dSYM file have a 
shared 128-bit number to ensure that the debug info and the executable match; 
the debugger will ignore a dSYM with a non-matching UUID for a given 
executable.  A developer will typically create a dSYM when they are sending a copy 
of the binary to someone and want to provide debug information, or they are 
archiving a released binary, or they want to debug it on another machine (where 
the .o files will not be in place.)
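
The matching rule described above can be sketched in a few lines. This is only an illustration of the idea, not Apple's implementation: the function names and the (path, UUID) pair representation are hypothetical, and the 16-byte identifiers are assumed to have been read from the two files already.

```python
def dsym_matches(executable_uuid: bytes, dsym_uuid: bytes) -> bool:
    """Return True only if both identifiers are 128 bits long and equal."""
    if len(executable_uuid) != 16 or len(dsym_uuid) != 16:
        return False
    return executable_uuid == dsym_uuid


def pick_debug_info(executable_uuid: bytes, candidates):
    """Return the path of the first candidate whose UUID matches, else None.

    `candidates` is an iterable of (path, uuid_bytes) pairs (a hypothetical
    representation chosen for this sketch). Non-matching debug files are
    ignored, mirroring the behaviour described for the debugger.
    """
    for path, candidate_uuid in candidates:
        if dsym_matches(executable_uuid, candidate_uuid):
            return path
    return None
```

A stale dSYM left over from an earlier build would simply be skipped, and the debugger would fall back to whatever other source of debug info it has.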

In practice people create dSYMs rarely -- when they are doing iterative 
development on their computer, all of the DWARF sits in the .o files 
unprocessed unless they launch a debugger, so link times are fast.


As a minor detail, the dSYM is just another executable binary image on our 
system (Mach-O file format), sans any of the text or data of the real binary 
file, with only the debug_info, etc. sections.  The name "dSYM" was a little 
joke based on the CodeWarrior "xSYM" debug info format.

J


RFC: DWARF Extensions for Separate Debug Info Files ("Fission")

2011-09-22 Thread Cary Coutant
At Google, we've found that the cost of linking applications with
debug info is much too high. A large C++ application that might be,
say, 200MB without debug info, is somewhere around 1GB with debug
info, and the total size of the object files that we send to the
linker is around 5GB (and that's with compressed debug sections).
We've come to the conclusion that the most promising solution is to
eliminate the debug info from the link step. I've had direct
experience with HP's approach to this, and I've looked into Sun's and
Apple's approaches, but none of those three approaches actually
separates the debug info from the non-debug info at the object file
(.o) level. I know we're not alone in having concerns about the size
of debug info, so we've developed the following proposal to extend the
DWARF format and produce separate .o and ".dwo" (DWARF object) files
at the compilation step. Our plan is to develop the gcc and gdb
changes on new upstream branches.

After we get the basics working and have some results to show
(assuming it all works out and proves worthwhile), I'll develop this
into a formal proposal to the DWARF committee.

I've also posted this proposal on the GCC wiki:

   http://gcc.gnu.org/wiki/DebugFission

We've named the project "Fission."

I'd appreciate any comments.

-cary


DWARF Extensions for Separate Debug Information Files

September 22, 2011


Problems with Size of the Debug Information
===========================================

Large applications compiled with debug information experience
slow link times, possible out-of-memory conditions at link time,
and slow gdb startup times. In addition, they can contribute to
significant increases in storage requirements, and additional
network latency when transferring files in a distributed build
environment.

* Out-of-memory conditions: When the total size of the input
  files is large, the linker may exceed its total memory
  allocation during the link and may get killed by the operating
  system. As a rule of thumb, the link job total memory
  requirements can be estimated at about 200% of the total size
  of its input files.

* Slow link times: Link times can be frustrating when recompiling
  only a small source file or two. Link times may be aggravated
  when linking on a machine that has insufficient RAM, resulting
  in excessive page thrashing.

* Slow gdb startup times: The debugger today performs a partial
  scan of the debug information in order to build its internal
  tables that allow it to map names and addresses to the debug
  information. This partial scan was designed to improve startup
  performance, and avoids a full scan of the debug information,
  but for large applications, it can still take a minute or more
  before the debugger is ready for the first command. The
  debugger now has the ability to save a ".gdb_index" section in
  the executable and the gold linker now supports a --gdb-index
  option to build this index at link time, but both of these
  options still require the initial partial scan of the debug
  information.
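
As a rough illustration of the rule of thumb in the out-of-memory item above, the estimate can be written as a small helper. This is a sketch only: the function names are hypothetical, and the factor of 2.0 is just the "about 200%" figure quoted above, not a measured constant.

```python
GIB = 1024 ** 3


def estimated_link_memory(input_sizes_bytes, factor=2.0):
    """Estimate linker memory as `factor` times the total input size.

    The default factor of 2.0 encodes the "about 200% of the total size
    of its input files" rule of thumb; it is an approximation.
    """
    return factor * sum(input_sizes_bytes)


def link_may_run_out_of_memory(input_sizes_bytes, available_bytes, factor=2.0):
    """Return True if the estimate exceeds the memory available to the link."""
    return estimated_link_memory(input_sizes_bytes, factor) > available_bytes
```

For example, with the roughly 5GB of object files mentioned in the cover message, the estimate comes to about 10GB, so a link job limited to 8GB would be flagged as at risk.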

These conditions are largely a direct result of the amount of
debug information generated by the compiler. In a large C++
application compiled with -O2 and -g, the debug information
accounts for 87% of the total size of the object files sent as
inputs to the link step, and 84% of the total size of the output
binary.

Recently, the -Wa,--compress-debug-sections option has been made
available. This option reduces the total size of the object files
sent to the linker by more than a third, so that the debug
information now accounts for 70-80% of the total size of the
object files. The output file is unaffected: the linker
decompresses the debug information in order to link it, and
outputs the uncompressed result (there is an option to recompress
the debug information at link time, but this step would only
reduce the size of the output file without improving link time or
memory usage).
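
The two percentages above can be checked against each other with a little arithmetic. The sketch below assumes that compression shrinks only the debug sections and leaves the rest of each object file unchanged; the function name is hypothetical.

```python
def debug_share_after_compression(debug_share, total_reduction):
    """Return the debug-info share of object size after compression.

    debug_share:     fraction of the uncompressed object size that is
                     debug info (0.87 in the text above).
    total_reduction: fraction by which the total object size shrinks.
    Assumes only the debug sections get smaller.
    """
    non_debug = 1.0 - debug_share      # unchanged by compression
    new_total = 1.0 - total_reduction  # relative to the original total
    if new_total <= non_debug:
        raise ValueError("reduction is larger than the debug info itself")
    return (new_total - non_debug) / new_total
```

Starting from an 87% debug share, a reduction of exactly one third in total size leaves debug info at about 80.5% of the total, and a reduction of one half leaves it at 74%, which is consistent with "more than a third" producing the 70-80% range.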


What's All That Space Being Used For?
=====================================

The debugging information in the relocatable object files sent to
the linker consists of a number of separate tables (percentages
are for uncompressed debug information relative to the total
object file size):

* Debug Information Entries - .debug_info (11%): This table
  contains the debug info for subprograms and variables defined
  in the program, and many of the trivial types used.

* Type Units - .debug_types (12%): This table contains the debug
  info for most of the non-trivial types (e.g., structs and
  classes, enums, typedefs), keyed by a hashed type signature so
  that duplicate type definitions can be eliminated by the
  linker. During the link, about 85% of this data is discarded as
  duplicate. These sections have the same structure as the
  .debug_info sections.

* Strings - .debug_str (25%): This table contains strings that
  are not placed inline in the .debug_info and .debug_types
  sections. The linker merges the string ta