On Thu, Apr 30, 2015 at 5:21 PM, Adrian Prantl <[email protected]> wrote:
>
> > On Apr 30, 2015, at 4:55 PM, David Blaikie <[email protected]> wrote:
> >
> > On Thu, Apr 30, 2015 at 4:31 PM, Adrian Prantl <[email protected]> wrote:
> >>
> >> > On Mar 19, 2015, at 5:37 PM, David Blaikie <[email protected]> wrote:
> >> >
> >> > On Thu, Mar 19, 2015 at 5:24 PM, Adrian Prantl <[email protected]> wrote:
> >> >>
> >> >> > On Mar 16, 2015, at 2:55 PM, David Blaikie <[email protected]> wrote:
> >> >> >
> >> >> >> On Mon, Mar 16, 2015 at 2:45 PM, Robinson, Paul <[email protected]> wrote:
> >> >> > Beyond the above (that using a new tag would mean this would go from 'free' to 'not free' for GDB) having a new top level tag is pretty substantial (we only have two at the moment, and with our talk of modules being a "bag of dwarf" might go back to having one top level tag? (it's not clear to me from DWARF4 whether DW_TAG_module is currently a top-level tag, I don't think it is?))
> >> >> >
> >> >> >> The .debug_info section contains one or more compilation units, partial units, or in DWARF 5, type units. DW_TAG_module isn't a unit, if you want it to be handled independently then it would need to be wrapped in a DW_TAG_partial_unit. You would probably then use DW_TAG_imported_unit to refer to it, rather than DW_TAG_imported_module.
> >> >> >>
> >> >> >
> >> >> > This makes a fair bit of sense - though the terminology's never going to quite line up with modules, I suspect, and this would still require modifying existing consumers (well, GDB) that can handle split-dwarf today, I suspect (not sure how it'd handle partial_unit - maybe that does work? - and still don't know how existing consumers would handle imported_unit either - could be worth some testing, as it sounds sort of right out of several less right options).
> >> >>
> >> >> Thanks for all the input so far!
> >> >> To make this end of the discussion more concrete, let’s sketch some DWARF showing how this could look in practice.
> >> >>
> >> >> ELF (no imports)
> >> >> ----------------
> >> >>
> >> >> On ELF or COFF a foo.c referencing types from the module Foundation looks like this:
> >> >>
> >> >> .debug_info:
> >> >>   DW_TAG_compile_unit
> >> >>     DW_AT_name("foo.c")
> >> >>
> >> >> .debug_info.dwo (on ELF: group 0x1234ABCDE, comdat)
> >> >>   DW_TAG_partial_unit
> >> >
> >> > For now I'd suggest we use compile_unit - that way it'll just work with existing split-dwarf consumers. We can see about standardizing a top-level DW_TAG_module or using DW_TAG_partial_unit here later, perhaps? I'm not sure.
> >> >
> >> >>     DW_AT_dwo_name("/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm")
> >> >>     DW_AT_dwo_id("0x1234ABCDE")
> >> >>
> >> >> Side question: Is .debug_info.dwo the right section to put the module skeleton in, or should it be a .debug_info section like normal fission skeletons?
> >> >
> >> > Skeletons go in .debug_info, the dwo sections are just for the .dwo file (or the module file, in our new case - the extension isn't actually important).
> >> >
> >> > It might be worth you compiling an example or two of split-dwarf to see how this all works hands-on.
> >> >
> >> >> Mach-O (no comdat, no imports)
> >> >> ------------------------------
> >> >>
> >> >> Mach-O doesn’t do comdat, so with -split-dwarf=Disable (not sure if that option is the best discriminator) this could look like:
> >> >>
> >> >> .debug_info:
> >> >>   DW_TAG_compile_unit
> >> >>     DW_AT_name("foo.c")
> >> >>   DW_TAG_partial_unit
> >> >>     DW_AT_dwo_name("/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm")
> >> >>     DW_AT_dwo_id("0x1234ABCDE")
> >> >>
> >> >> Mach-O (no comdat, with imports)
> >> >> --------------------------------
> >> >>
> >> >> If we add the module import information to this, we get:
> >> >>
> >> >> .debug_info:
> >> >>   DW_TAG_compile_unit
> >> >>     DW_AT_name("foo.c")
> >> >>     DW_TAG_imported_module
> >> >>       DW_AT_import(DW_FORM_ref_addr 0x10)
> >> >
> >> > Since we went down the tangent of explaining split-dwarf many emails ago, I've forgotten (& can't readily find) what we were discussing about the ways the imported_module could work.
> >> >
> >> > The simplest representation I can think of would be to have it reference, by signature, the module unit (whatever tag it uses) - DW_FORM_ref_sig8 seems the simplest thing to do.
> >> >
> >> >>   DW_TAG_partial_unit
> >> >>     DW_AT_dwo_name("/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm")
> >> >>     DW_AT_dwo_id("0x1234ABCDE")
> >> >>
> >> >> 0x10:
> >> >
> >> > This is inside the partial unit? I figured we'd just put these attributes on the top level (compile_unit, or whatever it might be later) - potentially conditionalized on platform, sure.
> >> >
> >> >>     DW_TAG_module
> >> >>       DW_AT_name("Foundation")
> >> >>       DW_AT_LLVM_sysroot("/")
> >> >>       DW_AT_LLVM_include_dir("")
> >> >>       DW_AT_LLVM_macros("-DNDEBUG")
> >> >>       ...
> >> >>
> >> >> ELF (comdat, with imports)
> >> >> --------------------------
> >> >>
> >> >> But now let’s go back to ELF.
> >> >> Since the skeleton with the partial unit is comdat'd, I assume that this breaks the FORM_ref_addr used in the DW_AT_import. We could reuse the module hash as a signature for the module:
> >> >>
> >> >> .debug_info:
> >> >>   DW_TAG_compile_unit
> >> >>     DW_AT_name("foo.c")
> >> >>     DW_TAG_imported_module
> >> >>       DW_AT_import(DW_FORM_ref_addr 0x1234ABCDE)
> >> >
> >> > Still only really need these imported_modules for lldb, right? I'd consider having them off-by-default for non-darwin, but I'm not strictly wedded to that notion. Wouldn't mind seeing size impact numbers of some kind - if it's really fractional % increase & GDB doesn't fall over when it sees them (in whatever FORM/tag/etc we decide on) then that's not the end of the world.
> >> >
> >> > Just seems nice if the default mode is the nice, standard, split-dwarf output. Doesn't need anything fancy.
> >> >
> >> >> .debug_info.dwo (group 0x1234ABCDE, comdat)
> >> >>   DW_TAG_partial_unit
> >> >>     DW_AT_dwo_name("/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm")
> >> >>     DW_AT_dwo_id("0x1234ABCDE")
> >> >>
> >> >>     DW_TAG_module
> >> >>       DW_AT_signature("0x1234ABCDE")
> >> >>       DW_AT_name("Foundation")
> >> >
> >> > The thing you haven't covered is the actual .dwo sections (.debug_info.dwo (we'll probably need a simple stub compile_unit to make this correct split-dwarf) and .debug_types.dwo being important - but all the supporting .dwo sections will be necessary) that go in the module file.
> >> >
> >> >> This is bending the definition of DW_AT_signature, but I guess it could be made to work. Or we could say that for now, users have to choose between the comdat optimization and having the module imports recorded in Dwarf, since GDB wouldn’t know what to do with that information anyway.
> >>
> >> Sorry for the long delay. Here’s a more complete example that should include all the suggestions made so far.
> >> For context I also included external type references in the example, although admittedly this is a bit out of scope for this thread:
> >>
> >> ELF (typeunits, comdats, with imports)
> >> --------------------------------------
> >>
> >> On ELF or COFF a bar.c referencing type Foo from the module FooLib looks like this:
> >>
> >> bar.o
> >> ~~~~~
> >>
> >> // To keep this example focussed/readable, I'm assuming that bar.o itself was not compiled with fission.
> >> .debug_info:
> >>   DW_TAG_compile_unit
> >>     DW_AT_name("bar.c")
> >>     ...
> >>
> >>     DW_TAG_imported_module // <- This could be optional on ELF.
> >>       DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234)
> >>
> >>     DW_TAG_variable
> >>       DW_AT_name("MyFoo")
> >>       DW_AT_type [DW_FORM_ref4] 0x20
> >> 0x20:
> >>     DW_TAG_structure_type
> >>       DW_AT_declaration (true)
> >>       DW_AT_signature [DW_FORM_ref_sig8] (0xF00)
> >>
> >>   // Split DWARF skeleton CU for the module Foo.
> >>   DW_TAG_compile_unit
> >>     DW_AT_dwo_name("/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm")
> >>     DW_AT_dwo_id("0xFEDB9876")
> >>     ...
> >>
> >> // Comdat’d partial unit containing the optional module descriptor.
> >> .debug_info, group 0xABCD1234, comdat
> >>   DW_TAG_partial_unit
> >>     DW_TAG_module
> >>       DW_AT_name("FooLib")
> >>       DW_AT_LLVM_sysroot("/")
> >>       DW_AT_LLVM_include_dirs("-I/path")
> >>       DW_AT_LLVM_macros("-DNDEBUG")
> >>       ...
> >>
> >> FooLib-XYZ.pcm
> >> ~~~~~~~~~~~~~~
> >>
> >> .debug_info.dwo
> >>   DW_TAG_compile_unit
> >>     DW_AT_dwo_id("0xFEDB9876")
> >>     ...
> >>
> >> // Type unit for the type Foo.
> >> .debug_types.dwo, group 0xF00, comdat
> >>   DW_TAG_type_unit
> >>     DW_TAG_structure_type
> >>       DW_AT_name ("Foo")
> >>       ...
> >>
> >> I think it's awkward to have both the skeleton compile_unit in .debug_info and the partial_unit containing the TAG_module.
> >> Personally I’d prefer putting the TAG_module into the skeleton CU and then just refer to it via a FORM_ref_addr; but if we want to put the TAG_module into a comdat section, it looks like that’s what’s necessary.
> >
> > It's been a while & I've probably lost all the context, but I think my original theory was to have the skeleton compile_unit be comdat'd so they'd deduplicate on linking (so we'd only have one reference to the module.dwo in the linked binary). I don't recall there being a need for a separate partial_unit - I imagine we'd just put the LLDB/LLVM extension attributes on the skeleton compile_unit and expect debuggers that didn't understand them to ignore them.
> >
> > Was there some reason this didn't work/make sense? Because you need a DW_TAG_module to import with DW_TAG_imported_module?
>
> Using DW_TAG_module was the best practice that was recommended on dwarf-discuss.

Did they have any ideas on how to reference it without duplicating it in every CU?

Once we've got the "Bag O Dwarf" stuff (rather than the narrower type units) this would be easier. (I suppose we could do a partial solution/abuse of type units - use a type unit header (perhaps with Eric's merged type/compile unit work) and a DW_FORM_ref_sig8 value for the DW_AT_module in the DW_TAG_imported_module.) Though I suppose if we're going to have DW_TAG_imported_module in every CU that references a module, it might not be that big of a deal to include the DW_TAG_module itself there too...

While I don't care about this scheme immediately, Google's growing LLDB investment in various platforms, so I am vaguely concerned about getting this right & it's not immediately obvious to me what that right answer is.

> If it turns out that's the right way to get a target for the imported_module, we could put both the skeleton CU and the partial unit in the same comdat and dedup them both together.
>
> I think this works as long as we only have one TAG_module per .pcm file (because we need to refer to it via signature).

Not quite following here - why would we have more than one module per pcm - a pcm is a module, right?

> But if we don’t mind having duplicate dwo_* references in the same .o file this would also work with more than one TAG_module (or submodules).
>
> .debug_info:
>   DW_TAG_compile_unit
>     DW_AT_name("bar.c")
>     ...
>
>     DW_TAG_imported_module // <- This could be optional on ELF.
>       DW_AT_import [DW_FORM_ref_sig8] (0xFEDB9876)
>
>     ...
>
> // Comdat’d split DWARF skeleton CU for the module Foo.
> .debug_info, group 0xFEDB9876, comdat
>   DW_TAG_compile_unit
>     DW_AT_dwo_name("/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm")
>     DW_AT_dwo_id("0xFEDB9876")
>     ...
>
>     DW_TAG_module
>       DW_AT_name("FooLib")
>       DW_AT_LLVM_sysroot("/")
>       DW_AT_LLVM_include_dirs("-I/path")
>       DW_AT_LLVM_macros("-DNDEBUG")
>       ...
>
> > But this gets into complicated territory when the original binary is built with fission... which will be relevant for modules on ELF with LLDB. Hmm, maybe it's not too complicated - the partial_unit would end up in the .dwo file (maybe we'd have to teach the .dwo file to deduplicate these too - the same way it does for type units... - might require a new header to include the hash, etc :/)... would be tricky to have the dwp tool resolve the relocations to these things. Cross-unit references as you've got there aren't something that every DWARF consumer is totally cool with, I don't think?
>
> Ah. I thought the deduplication happens because all ELF sections sharing the same group are uniqued based on the group id.
COMDAT groups deduplicate for a normal non-fission build, but fission linking doesn't require the .dwo file to use/contain COMDATs as it uses a DWARF-aware tool (so you don't bother putting the type units in COMDAT groups, for example - the fission linker knows how to parse debug_types, find the type unit headers and their hashes and deduplicates them that way).

> It certainly would be nice if we could avoid introducing a new .debug_info header...
> >
> > Sort of inclined to have the imported module stuff just for LLDB, but I've lost some of the context for that in the ensuing weeks.
>
> -- adrian
>
> >>
> >> MachO (no typeunits, no comdats, with imports)
> >> ----------------------------------------------
> >>
> >> Since we don’t have comdat sections in Mach-O and we don’t have the tool support for type units, the way that external types can be referenced necessarily needs to be a bit different. The design that Greg and I came up with for Mach-O relies on llvm-dsymutil to fix up the DWARF for non-module-aware consumers. Just as ELF DWARF consumers need not be able to tell the difference between module debugging and split DWARF, on Mach-O the .dSYM bundle generated by llvm-dsymutil looks like traditional DWARF.
> >>
> >> There are three differences in the DWARF output that make this possible:
> >> - Refer to external types by UID rather than by type signature.
> >>   (This doubles as the key that allows a debugger to import the type directly from the AST and protects us against hash collisions)
> >> - Add an index to the .o file that maps UID -> module file.
> >>   (Fast lookup + UIDs for C and ObjC are only unique within a module)
> >> - Add an entry for each type’s UID to the types accelerator table.
> >>   (Fast lookup)
> >>
> >> bar.o
> >> ~~~~~
> >>
> >> .debug_info:
> >>   DW_TAG_compile_unit
> >>     DW_AT_name("bar.c")
> >>     DW_TAG_imported_module
> >>       DW_AT_import(DW_FORM_ref_addr 0x40)
> >>
> >>     DW_TAG_variable
> >>       DW_AT_name("MyFoo")
> >>       DW_AT_type [DW_FORM_strp] ("_ZTS3Foo") // We could use a custom FORM here
> >>
> >>   // Skeleton unit.
> >>   DW_TAG_compile_unit
> >>     DW_AT_dwo_name("/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm")
> >>     DW_AT_dwo_id("0xFEDB9876")
> >>     ...
> >> 0x40:
> >>     DW_TAG_module
> >>       DW_AT_name("FooLib")
> >>       DW_AT_LLVM_sysroot("/")
> >>       DW_AT_LLVM_include_dirs("-I/path")
> >>       DW_AT_LLVM_macros("-DNDEBUG")
> >>
> >> // This index uses the usual accelerator table format.
> >> .apple_exttypes:
> >>   { "_ZTS3Foo" => debug_str offset of "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm" }
> >>
> >> FooLib-XYZ.pcm
> >> ~~~~~~~~~~~~~~
> >>
> >> .debug_info
> >>   DW_TAG_compile_unit
> >>     DW_AT_dwo_id("0xFEDB9876")
> >>
> >> 0x80:
> >>     DW_TAG_structure_type
> >>       DW_AT_name ("Foo")
> >>       DW_AT_signature
> >>       ...
> >>
> >> // In addition to the entry for “Foo”, there is also an entry for the type’s UID “_ZTS3Foo” pointing to the type definition DIE.
> >> .apple_types
> >>   { "Foo" => 0x80 }
> >>   { "_ZTS3Foo" => 0x80 }
> >>
> >> When the debug info linker (llvm-dsymutil) is run, it first pulls in the .debug_info section from the clang module and fixes up all the DW_FORM_strp external type references by turning them into a DW_FORM_ref_addr that references the type in the DW_TAG_compile_unit pulled in from the module. To find the correct type DIE it looks up the UID in the .apple_exttypes index, finds the module, looks up the UID in the regular .apple_types accelerator table and replaces the temporary DW_FORM_strp with a DW_FORM_ref_addr (which incidentally takes up the same amount of space in the DIE).
> >>
> >> Thoughts?
> >> --
> >> adrian
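
For readers who want the Mach-O fix-up spelled out, here is a rough Python sketch of the two-step lookup described in the quoted text (UID -> module via .apple_exttypes, then UID -> type DIE via that module's .apple_types). It is only an illustration, not llvm-dsymutil code: TypeRef, fix_up_external_types and the module_base table (where each module's .debug_info is assumed to land in the linked output) are hypothetical names, and the 0x1000 base offset is made up for the example.

```python
from dataclasses import dataclass


@dataclass
class TypeRef:
    """One DW_AT_type attribute value (hypothetical model, not an LLVM type)."""
    form: str      # "DW_FORM_strp" before fix-up, "DW_FORM_ref_addr" after
    value: object  # UID string before fix-up, absolute DIE offset after


def fix_up_external_types(refs, exttypes, module_types, module_base):
    """Turn DW_FORM_strp external type references into DW_FORM_ref_addr.

    exttypes:     UID -> module file path (models .apple_exttypes)
    module_types: module path -> {name or UID -> DIE offset within the module}
                  (models each module's .apple_types)
    module_base:  module path -> offset of the module's .debug_info in the
                  linked output (assumed bookkeeping, not from the thread)
    """
    for ref in refs:
        if ref.form != "DW_FORM_strp":
            continue  # not an external type reference, leave untouched
        uid = ref.value
        module = exttypes[uid]                  # step 1: UID -> module file
        die_offset = module_types[module][uid]  # step 2: UID -> type DIE
        ref.form = "DW_FORM_ref_addr"
        ref.value = module_base[module] + die_offset
    return refs


pcm = "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm"
refs = [TypeRef("DW_FORM_strp", "_ZTS3Foo")]
fixed = fix_up_external_types(
    refs,
    exttypes={"_ZTS3Foo": pcm},
    module_types={pcm: {"Foo": 0x80, "_ZTS3Foo": 0x80}},
    module_base={pcm: 0x1000},  # made-up placement of the module's DIEs
)
print(fixed[0].form, hex(fixed[0].value))  # DW_FORM_ref_addr 0x1080
```

The sketch uses the values from the bar.o / FooLib-XYZ.pcm example, so "_ZTS3Foo" resolves to the DIE at 0x80 inside the module; a real linker would also have to handle UIDs missing from either table, which is left out here.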
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
