Re: RFC: DWARF Extensions for Separate Debug Info Files ("Fission")
> "Cary" == Cary Coutant writes: Cary> At Google, we've found that the cost of linking applications with Cary> debug info is much too high. [...] Cary> * .debug_macinfo - Macro information, unaffected by this design. There is also the new .debug_macro section. This section refers to .debug_str, so it will need some updates for your changes there. Tom
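Some background on the dependency Tom mentions. An attribute in the DW_FORM_strp form stores only a byte offset into .debug_str, not the string itself, and .debug_macro can refer to .debug_str strings in the same way. The Python sketch below assumes the section has already been read into a bytes object, and its contents are made up for illustration. It shows why moving the strings to a different table means every producer of those offsets has to change as well.

```python
def read_strp(debug_str: bytes, offset: int) -> str:
    """Return the NUL-terminated string that starts at `offset` in .debug_str."""
    end = debug_str.index(b"\0", offset)
    return debug_str[offset:end].decode("utf-8")

# Hypothetical .debug_str contents: three strings packed back to back.
debug_str = b"main\0argc\0argv\0"
# An attribute or macro entry stores only the offset (here 5 -> "argc").
print(read_strp(debug_str, 5))  # argc

# If the strings are moved into a differently laid-out table, the same
# offset now names a different string, so the offsets must be rewritten.
relocated = b"argv\0main\0argc\0"
print(read_strp(relocated, 5))  # main
```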
Re: [Dwarf-Discuss] RFC: DWARF Extensions for Separate Debug Info Files ("Fission")
Hi Jason, Jason Molenda wrote: > On Sep 23, 2011, at 10:58 AM, Cary Coutant wrote: > >>> The compiler puts DWARF in the .o file, the linker adds some records in the >>> executable which help us to understand where files/function/symbols landed >>> in the final executable[1]. >> Did you intend to add a footnote? > > Yeah, I realized after I sent the email - it didn't seem interesting enough > to warrant a separate followup. > > The records that our linker puts in the executable are in the form of stabs > entries. There are a handful of stabs records created - file start, file > end, function start, function end, symbol, pointer to a .o file, maybe one or > two others. We chose that format because it was trivial to support and we > already had tools for stripping these records out of the executable once the > dSYM had been created. I don't remember the exact details, but the problem I recall with the Darwin scheme is that it builds an incomplete index in the Mach-O symbol table. IIRC, it was missing things that a user might want to look up by name in the debugger, like static functions or variables, and type names with external linkage. Without a reasonably complete index, the debugger can't know where to find the definitions of certain things, and that forces the user to navigate using other information, such as source file names or global function definitions, to force the debug information in the object to be read. Of course, the current DWARF indexes (like pubnames/pubtypes) have the same problem, and some compilers do a really bad job of generating those sections. But at least when there's a single .debug_info section, the debugger can decide to ignore the indexes and "skim" the full debug information. The compilers on IRIX did a better job of generating indexes, so the debugger could find static functions and objects by name.
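The trade-off John describes (trust a by-name index such as .debug_pubnames when it has the answer, otherwise skim all of the debug information) can be modeled in a few lines of Python. This is a simplified sketch, not any debugger's actual code: the `CompileUnit` class and its `names` set are hypothetical stand-ins for a parsed compilation unit and the names it defines.

```python
from dataclasses import dataclass, field

@dataclass
class CompileUnit:
    # Hypothetical stand-in for one compilation unit's debug info.
    name: str
    names: set[str] = field(default_factory=set)  # everything the CU defines

def find_definition(name, index, units):
    """Look `name` up in a by-name index, falling back to a full scan.

    `index` maps names to CU names, like pubnames/pubtypes, and may be
    incomplete (e.g. missing static functions or type names).
    """
    cu_name = index.get(name)
    if cu_name is not None:
        return cu_name
    # Index miss: scan every unit. This fallback is only practical when all
    # the debug info is reachable, e.g. in one .debug_info section.
    for cu in units:
        if name in cu.names:
            return cu.name
    return None

units = [
    CompileUnit("a.c", {"main", "helper"}),      # helper is static
    CompileUnit("b.c", {"api_call", "Widget"}),  # Widget is a type
]
index = {"main": "a.c", "api_call": "b.c"}  # incomplete, like many pubnames
print(find_definition("helper", index, units))  # a.c, found only by scanning
```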
> Once a dSYM has been created with all of the DWARF collected in a single > file, our DWARF is parseable by any debug info consumer with minimal changes > -- they need to know to look in a separate file for the DWARF from the main > executable, but the format itself is unchanged. Supporting the > debug-information-in-.o-files is more involved; I don't know if any of the > third-party debuggers on our platform work with it. TotalView supports debug information in .o files on Darwin, and has since day one. Perhaps you recall all those email exchanges you and I had several years back. It was a modest amount of work, given that we already supported debug information in .o files on the Sun and HP platforms. I seem to recall one of the sore spots for us on Darwin was getting good address information for certain DWARF location operations, like DW_OP_addr. Fortran was particularly messy because some compilers didn't supply a linkage name attribute, so the debugger had to make several guesses at the name and look things up by trial and error. Cheers, John D. >> We're trying to achieve something very similar, but we have the >> additional goal of separating the info from the .o files because of >> our distributed build environment. I also wanted to attempt to >> standardize the approach, instead of having each vendor go in separate >> directions. > > > Yeah, if your regular build environment involves distributed compilation, and > the .o files need to be copied to a central system for the linker, then I can > see why you're pursuing this approach. For us, the most common usage is > single-computer compilation & linking -- where the linker never pages in the > debug info sections from the .o files, so their size is not particularly > important. > > J > ___ > Dwarf-Discuss mailing list > dwarf-disc...@lists.dwarfstd.org > http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org >
Re: RFC: DWARF Extensions for Separate Debug Info Files ("Fission")
On Sep 23, 2011, at 10:58 AM, Cary Coutant wrote: >> The compiler puts DWARF in the .o file, the linker adds some records in the >> executable which help us to understand where files/function/symbols landed >> in the final executable[1]. > > Did you intend to add a footnote? Yeah, I realized after I sent the email - it didn't seem interesting enough to warrant a separate followup. The records that our linker puts in the executable are in the form of stabs entries. There are a handful of stabs records created - file start, file end, function start, function end, symbol, pointer to a .o file, maybe one or two others. We chose that format because it was trivial to support and we already had tools for stripping these records out of the executable once the dSYM had been created. Once a dSYM has been created with all of the DWARF collected in a single file, our DWARF is parseable by any debug info consumer with minimal changes -- they need to know to look in a separate file for the DWARF from the main executable, but the format itself is unchanged. Supporting the debug-information-in-.o-files is more involved; I don't know if any of the third-party debuggers on our platform work with it. > We're trying to achieve something very similar, but we have the > additional goal of separating the info from the .o files because of > our distributed build environment. I also wanted to attempt to > standardize the approach, instead of having each vendor go in separate > directions. Yeah, if your regular build environment involves distributed compilation, and the .o files need to be copied to a central system for the linker, then I can see why you're pursuing this approach. For us, the most common usage is single-computer compilation & linking -- where the linker never pages in the debug info sections from the .o files, so their size is not particularly important. J
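Here is a rough Python sketch of how a debugger can use records like these to work with unlinked DWARF. Each function record ties a function's address in a .o file to the address where the linker placed it, so an address read from that .o file's DWARF (a DW_AT_low_pc, say) can be shifted by that function's displacement. The `FunctionRecord` layout and the sample addresses are hypothetical simplifications. They are not the actual stabs encoding or any debugger's implementation.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass(frozen=True)
class FunctionRecord:
    # Hypothetical summary of a "function start" record plus its size.
    obj_file: str
    name: str
    obj_addr: int  # address of the function inside the .o file
    exe_addr: int  # address where the linker placed it
    size: int

def build_map(records):
    """Group records per object file, sorted by .o address."""
    by_obj = {}
    for r in records:
        by_obj.setdefault(r.obj_file, []).append(r)
    for lst in by_obj.values():
        lst.sort(key=lambda r: r.obj_addr)
    return by_obj

def obj_to_exe(debug_map, obj_file, obj_addr):
    """Translate an address taken from a .o file's DWARF; None if unmapped."""
    recs = debug_map.get(obj_file, [])
    i = bisect_right([r.obj_addr for r in recs], obj_addr) - 1
    if i < 0:
        return None
    r = recs[i]
    if obj_addr >= r.obj_addr + r.size:
        return None  # outside every recorded function
    return r.exe_addr + (obj_addr - r.obj_addr)

debug_map = build_map([
    FunctionRecord("foo.o", "foo", 0x00, 0x1000, 0x40),
    FunctionRecord("foo.o", "bar", 0x40, 0x2000, 0x20),
])
print(hex(obj_to_exe(debug_map, "foo.o", 0x44)))  # 0x2004
```

Per-function records matter because the linker need not keep a .o file's functions contiguous or in their original order. A single offset per object file would not be enough.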
Re: RFC: DWARF Extensions for Separate Debug Info Files ("Fission")
> The Apple approach has both the features of the Sun/HP implementation as well > as the ability to create a standalone debug info file. Thanks for the clarifications. I based my comments on a description you sent me a couple of years ago, and I apologize for any oversimplifications I introduced. > The compiler puts DWARF in the .o file, the linker adds some records in the > executable which help us to understand where files/function/symbols landed in > the final executable[1]. Did you intend to add a footnote? > If the user runs our gdb or lldb on one of these binaries, the debugger will > read the DWARF directly out of the .o files on the fly. Because the linker > doesn't need to copy around/update/modify the DWARF, link times are very > fast. If the developer decides to debug the program, no extra steps are > required - the debugger can be started up & used with the debug info still in > the .o files. We're trying to achieve something very similar, but we have the additional goal of separating the info from the .o files because of our distributed build environment. I also wanted to attempt to standardize the approach, instead of having each vendor go in separate directions. Thanks, -cary
Re: RFC: DWARF Extensions for Separate Debug Info Files ("Fission")
>> * .debug_pubtypes - Public types for use in building the >> .gdb_index section at link time. This section will have an >> extended format to allow it to represent both types in the >> .debug_dwo_info section and type units in .debug_types. > ^^^ > = .dwo_info, maybe both .debug_info and .dwo_info > > >> * .dwo_abbrev - Defines the abbreviation codes used by the >> .debug_dwo_info section. > ^^^ > = .dwo_info Thanks, I've fixed the wiki page. > I find this .dwo_* setup great for rapid development rebuilds, but it should > remain optional, as the currently used final separate DWARF .debug info file is > smaller than all the .dwo files together. For the final linked .debug builds > (rpm/deb/...), build speed is not considered as important. > For rpm/deb/... builds, it probably does not make sense, for build performance > reasons, to merge and convert .dwo files back into a single .debug file. Yes, we'll definitely make this a compile-time option. While I haven't finished designing the package format for collecting all the .dwo files, I do plan on having the packaging tool do at least duplicate type elimination to reduce the size of the package file. -cary
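A packaging tool could eliminate duplicate types the same way the linker handles .debug_types: each type unit is keyed by a 64-bit type signature, and only the first unit seen for a given signature is kept. The Python sketch below assumes each .dwo file has already been parsed into a list of (signature, type-unit bytes) pairs. The input data is invented. Real signatures are derived from an MD5 hash over a flattened encoding of the type defined in the DWARF 4 specification, and this sketch does not reproduce that step.

```python
def merge_type_units(dwo_files):
    """Keep one copy of each type unit across many .dwo files.

    `dwo_files` is a list of lists of (signature, unit_bytes) pairs.
    Returns (kept, discarded): `kept` maps signature -> unit bytes, and
    `discarded` counts the duplicate units dropped.
    """
    kept = {}
    discarded = 0
    for units in dwo_files:
        for sig, body in units:
            if sig in kept:
                discarded += 1  # same signature: assumed identical type
            else:
                kept[sig] = body
    return kept, discarded

# Three hypothetical .dwo files that include some of the same headers.
dwo_files = [
    [(0x1111, b"<std::string>"), (0x2222, b"<Widget>")],
    [(0x1111, b"<std::string>"), (0x3333, b"<Gadget>")],
    [(0x1111, b"<std::string>"), (0x2222, b"<Widget>")],
]
kept, discarded = merge_type_units(dwo_files)
print(len(kept), discarded)  # 3 3
```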
Re: RFC: DWARF Extensions for Separate Debug Info Files ("Fission")
On Fri, 23 Sep 2011 02:21:44 +0200, Cary Coutant wrote: > * .debug_pubtypes - Public types for use in building the > .gdb_index section at link time. This section will have an > extended format to allow it to represent both types in the > .debug_dwo_info section and type units in .debug_types. ^^^ = .dwo_info, maybe both .debug_info and .dwo_info > * .dwo_abbrev - Defines the abbreviation codes used by the > .debug_dwo_info section. ^^^ = .dwo_info I find this .dwo_* setup great for rapid development rebuilds, but it should remain optional, as the currently used final separate DWARF .debug info file is smaller than all the .dwo files together. For the final linked .debug builds (rpm/deb/...), build speed is not considered as important. For rpm/deb/... builds, it probably does not make sense, for build performance reasons, to merge and convert .dwo files back into a single .debug file. Thanks, Jan
Re: RFC: DWARF Extensions for Separate Debug Info Files ("Fission")
On Thu, Sep 22, 2011 at 6:35 PM, Jason Molenda wrote: > Because the linker doesn't need to copy around/update/modify the DWARF, > link times are very fast. AFAIU, the link times are fast only if all the files are local to the developer's machine. They will not be fast (and the .o files *will* need to be copied) if a distributed compilation system (a build farm) is used (as is the case here). Thanks, -- Paul Pluzhnikov
Re: RFC: DWARF Extensions for Separate Debug Info Files ("Fission")
Hi Cary, just one quick clarification - On Sep 22, 2011, at 5:21 PM, Cary Coutant wrote: > Previous Implementations of Separate Debug Information > == > > In the Sun and HP implementations, the debug information in the > relocatable objects still requires relocation at debug time, and > the debugger must read the summary information from the > executable file in order to map symbols and sections to the > output file when processing and applying the relocations. The > Apple implementation avoids this cost at debug time, but at the > cost of having a separate link step for the debug information. The Apple approach has both the features of the Sun/HP implementation as well as the ability to create a standalone debug info file. The compiler puts DWARF in the .o file, the linker adds some records in the executable which help us to understand where files/function/symbols landed in the final executable[1]. If the user runs our gdb or lldb on one of these binaries, the debugger will read the DWARF directly out of the .o files on the fly. Because the linker doesn't need to copy around/update/modify the DWARF, link times are very fast. If the developer decides to debug the program, no extra steps are required - the debugger can be started up & used with the debug info still in the .o files. Clearly this is only viable if you have the .o files on your computer, so we added a command, "dsymutil", which links the DWARF from the .o files into a single standalone ".dSYM" file. The executable file and the dSYM file share a 128-bit number (a UUID) to ensure that the debug info and the executable match; the debugger will ignore a dSYM with a non-matching UUID for a given executable. A developer will typically create a dSYM when they are sending a copy of the binary to someone and want to provide debug information, when they are archiving a released binary, or when they want to debug it on another machine (where the .o files will not be in place).
In practice, people rarely create dSYMs -- when they are doing iterative development on their computer, all of the DWARF sits unprocessed in the .o files unless they launch a debugger, and link times are fast. As a minor detail, the dSYM is just another executable binary image on our system (Mach-O file format), sans any of the text or data of the real binary file, with only the debug_info, etc. sections. The name "dSYM" was a little joke based on the CodeWarrior "xSYM" debug info format. J
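The UUID check Jason describes can be sketched in Python. The sketch assumes 64-bit little-endian Mach-O images, where the 128-bit UUID is stored in an LC_UUID load command (command number 0x1b) that follows the 32-byte mach_header_64. It does not handle fat (universal) binaries or 32-bit headers. The `make_macho` helper is a hypothetical demo aid that builds a minimal synthetic header. Its output is not a complete, valid binary.

```python
import struct
import uuid

MH_MAGIC_64 = 0xFEEDFACF
LC_UUID = 0x1B
HEADER_64 = struct.Struct("<IiiIIIII")  # mach_header_64 (32 bytes)
LOAD_CMD = struct.Struct("<II")         # cmd, cmdsize

def macho_uuid(data: bytes):
    """Return the UUID from the LC_UUID load command, or None if absent."""
    magic, _cpu, _sub, _ftype, ncmds, _size, _flags, _res = HEADER_64.unpack_from(data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a 64-bit little-endian Mach-O image")
    offset = HEADER_64.size
    for _ in range(ncmds):
        cmd, cmdsize = LOAD_CMD.unpack_from(data, offset)
        if cmd == LC_UUID:
            # uuid_command: cmd, cmdsize, then 16 UUID bytes
            return uuid.UUID(bytes=data[offset + 8 : offset + 24])
        offset += cmdsize
    return None

def dsym_matches(executable: bytes, dsym: bytes) -> bool:
    """Accept the dSYM only if both images carry the same UUID."""
    exe_id = macho_uuid(executable)
    return exe_id is not None and exe_id == macho_uuid(dsym)

def make_macho(u: uuid.UUID) -> bytes:
    """Hypothetical demo helper: a header plus a single LC_UUID command."""
    cmd = LOAD_CMD.pack(LC_UUID, 24) + u.bytes
    return HEADER_64.pack(MH_MAGIC_64, 0, 0, 0, 1, len(cmd), 0, 0) + cmd

build = uuid.uuid4()
print(dsym_matches(make_macho(build), make_macho(build)))  # True
```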
RFC: DWARF Extensions for Separate Debug Info Files ("Fission")
At Google, we've found that the cost of linking applications with debug info is much too high. A large C++ application that might be, say, 200MB without debug info is somewhere around 1GB with debug info, and the total size of the object files that we send to the linker is around 5GB (and that's with compressed debug sections). We've come to the conclusion that the most promising solution is to eliminate the debug info from the link step. I've had direct experience with HP's approach to this, and I've looked into Sun's and Apple's approaches, but none of those three approaches actually separates the debug info from the non-debug info at the object file (.o) level.

I know we're not alone in having concerns about the size of debug info, so we've developed the following proposal to extend the DWARF format and produce separate .o and ".dwo" (DWARF object) files at the compilation step. Our plan is to develop the gcc and gdb changes on new upstream branches. After we get the basics working and have some results to show (assuming it all works out and proves worthwhile), I'll develop this into a formal proposal to the DWARF committee.

I've also posted this proposal on the GCC wiki:

http://gcc.gnu.org/wiki/DebugFission

We've named the project "Fission." I'd appreciate any comments.

-cary

DWARF Extensions for Separate Debug Information Files
September 22, 2011

Problems with Size of the Debug Information
===========================================

Large applications compiled with debug information experience slow link times, possible out-of-memory conditions at link time, and slow gdb startup times. In addition, they can contribute to significant increases in storage requirements and additional network latency when transferring files in a distributed build environment.

* Out-of-memory conditions: When the total size of the input files is large, the linker may exceed its total memory allocation during the link and may get killed by the operating system.
  As a rule of thumb, the link job's total memory requirement can be estimated at about 200% of the total size of its input files.

* Slow link times: Link times can be frustrating when recompiling only a small source file or two. Link times may be aggravated when linking on a machine that has insufficient RAM, resulting in excessive page thrashing.

* Slow gdb startup times: The debugger today performs a partial scan of the debug information in order to build its internal tables that allow it to map names and addresses to the debug information. This partial scan was designed to improve startup performance and avoids a full scan of the debug information, but for large applications it can still take a minute or more before the debugger is ready for the first command. The debugger now has the ability to save a ".gdb_index" section in the executable, and the gold linker now supports a --gdb-index option to build this index at link time, but both of these options still require the initial partial scan of the debug information.

These conditions are largely a direct result of the amount of debug information generated by the compiler. In a large C++ application compiled with -O2 and -g, the debug information accounts for 87% of the total size of the object files sent as inputs to the link step, and 84% of the total size of the output binary.

Recently, the -Wa,--compress-debug-sections option has been made available. This option reduces the total size of the object files sent to the linker by more than a third, so that the debug information now accounts for 70-80% of the total size of the object files. The output file is unaffected: the linker decompresses the debug information in order to link it, and outputs the uncompressed result (there is an option to recompress the debug information at link time, but this step would only reduce the size of the output file without improving link time or memory usage).

What's All That Space Being Used For?
=====================================

The debugging information in the relocatable object files sent to the linker consists of a number of separate tables (percentages are for uncompressed debug information relative to the total object file size):

* Debug Information Entries - .debug_info (11%): This table contains the debug info for subprograms and variables defined in the program, and many of the trivial types used.

* Type Units - .debug_types (12%): This table contains the debug info for most of the non-trivial types (e.g., structs and classes, enums, typedefs), keyed by a hashed type signature so that duplicate type definitions can be eliminated by the linker. During the link, about 85% of this data is discarded as duplicate. These sections have the same structure as the .debug_info sections.

* Strings - .debug_str (25%): This table contains strings that are not placed inline in the .debug_info and .debug_types sections. The linker merges the string ta