On Fri, May 1, 2015 at 9:52 AM, Adrian Prantl <[email protected]> wrote:
> > On May 1, 2015, at 9:23 AM, David Blaikie <[email protected]> wrote: > > > > On Thu, Apr 30, 2015 at 5:21 PM, Adrian Prantl <[email protected]> wrote: > >> >> > On Apr 30, 2015, at 4:55 PM, David Blaikie <[email protected]> wrote: >> > >> > >> > >> > On Thu, Apr 30, 2015 at 4:31 PM, Adrian Prantl <[email protected]> >> wrote: >> >> >> >> > On Mar 19, 2015, at 5:37 PM, David Blaikie <[email protected]> >> wrote: >> >> > >> >> > >> >> > >> >> > On Thu, Mar 19, 2015 at 5:24 PM, Adrian Prantl <[email protected]> >> wrote: >> >> >> >> >> >> > On Mar 16, 2015, at 2:55 PM, David Blaikie <[email protected]> >> wrote: >> >> >> > >> >> >> > >> >> >> > >> >> >> >> On Mon, Mar 16, 2015 at 2:45 PM, Robinson, Paul < >> [email protected]> wrote: >> >> >> > Beyond the above (that using a new tag would mean this would go >> from 'free' to 'not free' for GDB) having a new top level tag is pretty >> substantial (we only have two at the moment, and with our talk of modules >> being a "bag of dwarf" might go back to having one top level tag? (it's not >> clear to me from DWARF4 whether DW_TAG_module is currently a top-level tag, >> I don't think it is?) >> >> >> > >> >> >> >> The .debug_info section contains one or more compilation units, >> partial units, or in DWARF 5, type units. DW_TAG_module isn't a unit, if >> you want it to be handled independently then it would need to be wrapped in >> a DW_TAG_partial_unit. You would probably then use DW_TAG_imported_unit to >> refer to it, rather than DW_TAG_imported_module. >> >> >> >> >> >> >> > >> >> >> > This makes a fair bit of sense - though the terminology's never >> going to quite line up with modules, I suspect, and this would still >> require modifying existing consumers (well, GDB) that can handle >> split-dwarf today, I suspect (not sure how it'd handle partial_unit - maybe >> that does work? 
- and still don't know how existing consumers would handle imported_unit either - could be worth some testing, as it sounds sort of right out of several less right options).
>> >> >>
>> >> >> Thanks for all the input so far!
>> >> >> To make this end of the discussion concrete, let’s sketch some DWARF showing how this could look in practice.
>> >> >>
>> >> >> ELF (no imports)
>> >> >> ----------------
>> >> >>
>> >> >> On ELF or COFF a foo.c referencing types from the module Foundation looks like this:
>> >> >>
>> >> >> .debug_info:
>> >> >>   DW_TAG_compile_unit
>> >> >>     DW_AT_name(“foo.c”)
>> >> >>
>> >> >> .debug_info.dwo (on ELF: group 0x1234ABCDE, comdat)
>> >> >>   DW_TAG_partial_unit
>> >> >
>> >> > For now I'd suggest we use compile_unit - that way it'll just work with existing split-dwarf consumers. We can see about standardizing a top-level DW_TAG_module or using DW_TAG_partial_unit here later, perhaps? I'm not sure.
>> >> >
>> >> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>> >> >>
>> >> >> Side question: Is .debug_info.dwo the right section to put the module skeleton in, or should it be a .debug_info section like normal fission skeletons?
>> >> >
>> >> > Skeletons go in .debug_info, the dwo sections are just for the .dwo file (or the module file, in our new case - the extension isn't actually important).
>> >> >
>> >> > It might be worth you compiling an example or two of split-dwarf to see how this all works hands-on.
>> >> >
>> >> >> Mach-O (no comdat, no imports)
>> >> >> ------------------------------
>> >> >>
>> >> >> Mach-O doesn’t do comdat, so with -split-dwarf=Disable (not sure if that option is the best discriminator) this could look like:
>> >> >>
>> >> >> .debug_info:
>> >> >>   DW_TAG_compile_unit
>> >> >>     DW_AT_name(“foo.c”)
>> >> >>   DW_TAG_partial_unit
>> >> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>> >> >>
>> >> >>
>> >> >> Mach-O (no comdat, with imports)
>> >> >> ------------------------------
>> >> >>
>> >> >> If we add the module import information to this, we get:
>> >> >>
>> >> >> .debug_info:
>> >> >>   DW_TAG_compile_unit
>> >> >>     DW_AT_name(“foo.c”)
>> >> >>   DW_TAG_imported_module
>> >> >>     DW_AT_import(DW_FORM_ref_addr 0x10)
>> >> >
>> >> > Since we went down the tangent of explaining split-dwarf many emails ago, I've forgotten (& can't readily find) what we were discussing about the ways the imported_module could work.
>> >> >
>> >> > The simplest representation I can think of would be to have it reference, by signature, the module unit (whatever tag it uses) - DW_FORM_ref_sig8 seems the simplest thing to do.
>> >> >
>> >> >>
>> >> >>   DW_TAG_partial_unit
>> >> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>> >> >>
>> >> >> 0x10:
>> >> >
>> >> > This is inside the partial unit? I figured we'd just put these attributes on the top level (compile_unit, or whatever it might be later) - potentially conditionalized on platform, sure.
>> >> >
>> >> >>   DW_TAG_module
>> >> >>     DW_AT_name(“Foundation”)
>> >> >>     DW_AT_LLVM_sysroot(“/“)
>> >> >>     DW_AT_LLVM_include_dir(“”)
>> >> >>     DW_AT_LLVM_macros(“-DNDEBUG”)
>> >> >>     ...
>> >> >>
>> >> >>
>> >> >> ELF (comdat, with imports)
>> >> >> --------------------------
>> >> >>
>> >> >> But now let’s go back to ELF.
Since the skeleton with the partial >> unit is comdat'd, I assume that this breaks the FORM_ref_addr used in the >> DW_AT_import. We could reuse the module hash as a signature for the module: >> >> >> >> >> >> .debug_info: >> >> >> DW_TAG_compile_unit >> >> >> DW_AT_name(“foo.c”) >> >> >> DW_TAG_imported_module >> >> >> DW_AT_import(DW_FORM_ref_addr 0x1234ABCDE) >> >> > >> >> > Still only really need these imported_modules for lldb, right? I'd >> consider having them off-by-default for non-darwin, but I'm not strictly >> wedded to that notion. Wouldn't mind seeing size impact numbers of some >> kind - if it's really fractional % increase & GDB doesn't fall over when it >> sees them (in whatever FORM/tag/etc we decide on) then that's not the end >> of the world. >> >> > >> >> > Just seems nice if the default mode is the nice, standard, >> split-dwarf output. Doesn't need anything fancy. >> >> > >> >> > >> >> >> .debug_info.dwo (group 0x1234ABCDE, comdat) >> >> >> DW_TAG_partial_unit >> >> >> >> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”) >> >> >> DW_AT_dwo_id(“0x1234ABCDE”) >> >> >> >> >> >> DW_TAG_module >> >> >> DW_AT_signature(“0x1234ABCDE”) >> >> >> DW_AT_name(“Foundation”) >> >> > >> >> > >> >> > The thing you haven't covered is the actual .dwo sections >> (.debug_info.dwo (we'll probably need a simple stub compile_unit to make >> this correct split-dwarf) and .debug_types.dwo being important - but all >> the supporting .dwo sections will be necessary) that go in the module file. >> >> > >> >> >> This is bending the definition of DW_AT_signature, but I guess it >> could be made to work. Or we could say that for now, users have to choose >> between the comdat optimization and having the module imports recorded in >> Dwarf, since GDB wouldn’t know what to do with that information anyway. >> >> >> >> Sorry for the long delay. Here’s a more complete example that should >> include all the suggestions made so far. 
For context I also included >> external type references in the example although admittedly this is a bit >> out of scope for this thread: >> >> >> >> ELF (typeunits, comdats, with imports) >> >> -------------------------------------- >> >> >> >> On ELF or COFF a bar.c referencing type Foo from the module FooLib >> looks like this: >> >> >> >> bar.o >> >> ~~~~~ >> >> >> >> // To keep this example focussed/readable, I'm assuming that bar.o >> itself was not compiled with fission. >> >> .debug_info: >> >> DW_TAG_compile_unit >> >> DW_AT_name(“bar.c”) >> >> ... >> >> >> >> DW_TAG_imported_module // <- This could be optional on ELF. >> >> DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234) >> >> >> >> DW_TAG_variable >> >> DW_AT_name(“MyFoo”) >> >> DW_AT_type [DW_FORM_ref4] 0x20 >> >> 0x20: >> >> DW_TAG_structure_type >> >> DW_AT_declaration (true) >> >> DW_AT_signature [DW_FORM_ref_sig8] (0xF00) >> >> >> >> >> >> // Split DWARF skeleton CU for the module Foo. >> >> DW_TAG_compile_unit >> >> >> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”) >> >> DW_AT_dwo_id(“0xFEDB9876”) >> >> ... >> >> >> >> // Comdat’d partial unit containing the optional module descriptor. >> >> .debug_info, group 0xABCD1234, comdat >> >> DW_TAG_partial_unit >> >> DW_TAG_module >> >> DW_AT_name(“FooLib”) >> >> DW_AT_LLVM_sysroot(“/“) >> >> DW_AT_LLVM_include_dirs(“-I/path”) >> >> DW_AT_LLVM_macros(“-DNDEBUG”) >> >> ... >> >> >> >> FooLib-XYZ.pcm >> >> ~~~~~~~~~~~~~~ >> >> >> >> .debug_info.dwo >> >> DW_TAG_compile_unit >> >> DW_AT_dwo_id(“0xFEDB9876”) >> >> ... >> >> >> >> // Type unit for the type Foo. >> >> .debug_types.dwo, group 0xF00, comdat >> >> DW_TAG_type_unit >> >> DW_TAG_structure_type >> >> DW_AT_name (“Foo”) >> >> ... >> >> >> >> >> >> I think it awkward to have both the skeleton compile_unit in >> .debug_info and the partial_unit containing the TAG_module. 
Personally I’d >> prefer putting the TAG_module into the skeleton CU and then just refer to >> it via a FORM_ref_addr; but if we want to put the TAG_module into a comdat >> section, it looks like that’s what’s necessary. >> > >> > It's been a while & I've probably lost all the context, but I think my >> original theory was to have the skeleton compile_unit be comdat'd so they'd >> deduplicate on linking (so we'd only have one reference to the module.dwo >> in the linked binary). I don't recall there being a need for a separate >> partial_unit - I imagine we'd just put the LLDB/LLVM extension attributes >> on the skeleton compile_unit and expect debuggers that didn't understand >> them, to ignore them. >> > >> > Was there some reason this didn't work/make sense? Because you need a >> DW_TAG_module to import with DW_TAG_imported_module? >> Using DW_TAG_module was the best practice that was recommended on >> dwarf-discuss. >> > > Did they have any ideas on how to reference it without duplicating it in > every CU? > > > We didn’t touch the deduplication issue. > > Once we've got the "Bag O Dwarf" stuff (rather than the narrower type > units) this would be easier - (I suppose we could do a partial > solution/abuse of type units - use a type unit header (perhaps with Eric's > merged type/compile unit work) and a DW_FORM_ref_sig8 value for the > DW_AT_module in the DW_TAG_imported_module. > > Though I suppose if we're going to have DW_TAG_imported_module in every CU > that references a module, it might not be that big of a deal to include the > DW_TAG_module itself there too... while I don't care about this scheme > immediately, Google's growing LLDB investment in various platforms, so I am > vaguely concerned about getting this right & it's not immediately obvious > to me what that right answer is. 
> > > Maybe the best path forward is to stage this by initially putting the > DW_TAG_module into the main CU and leave the deduplication as an > optimization to be implemented once the bag’o dwarf is more fleshed out. > This way we won’t do anything that would confuse consumers (assuming they > ignore unknown tags) and the extra overhead is likely not even going to be > noticeable, since all the string attributes inside the TAG_module can > already be deduplicated by traditional means. > > > >> > If it turns out that's the right way to get a target for the >> imported_module, we could put both the skeleton CU and the partial unit in >> the same comdat and dedup them both together. >> >> I think this works as long as we only have one TAG_module per .pcm file >> (because we need to refer to it via signature). > > > Not quite following here - why would we have more than one module per pcm > - a pcm is a module, right? > > > Clang modules may have submodules and a compile unit could import two > submodules that live in the same .pcm file. For example on Darwin there is > a module Darwin.pcm that contains a submodule “C" that contains the > submodule “stdio". > > > >> But if we don’t mind having duplicate dwo_* references in the same .o >> file this would also work with more than one TAG_module (or submodules). > > >> >> .debug_info: >> DW_TAG_compile_unit >> DW_AT_name(“bar.c”) >> ... >> >> DW_TAG_imported_module // <- This could be optional on ELF. >> DW_AT_import [DW_FORM_ref_sig8] (0xFEDB9876) >> >> ... >> >> // Comdat’d split DWARF skeleton CU for the module Foo. >> .debug_info, group 0xFEDB9876, comdat >> DW_TAG_compile_unit >> >> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”) >> DW_AT_dwo_id(“0xFEDB9876”) >> ... >> >> DW_TAG_module >> DW_AT_name(“FooLib”) >> DW_AT_LLVM_sysroot(“/“) >> DW_AT_LLVM_include_dirs(“-I/path”) >> DW_AT_LLVM_macros(“-DNDEBUG”) >> ... 
>> >> >> > >> > But this gets into complicated territory when the original binary is >> built with fission... which will be relevant for modules on ELF with LLDB. >> Hmm, maybe it's not too complicated - the partial_unit would end up in the >> .dwo file (maybe we'd have to teach the .dwo file to deduplicate these too >> - the same way it does for type units... - might require a new header to >> include the hash, etc :/)... would be tricky to have the dwp tool resolve >> the relocations to these things. Cross-unit references as you've got there >> aren't something that every DWARF consumer is totally cool with, I don't >> think? >> >> Ah. I thought the deduplication happens because all ELF sections sharing >> the same group are uniqued based on the group id. > > > COMDAT groups deduplicate for a normal non-fission build, but fission > linking doesn't require the .dwo file to use/contain COMDATs as it uses a > DWARF-aware tool (so you don't bother putting the type units in COMDAT > groups, for example - the fission linker knows how to parse debug_types, > find the type unit headers and their hashes and deduplicates them that way). > > > Ok that makes sense. > > -- adrian > > > >> It certainly would be nice if we could avoid introducing a new >> .debug_info header... > > >> > >> > Sort of inclined to have the imported module stuff just for LLDB, but >> I've lost some of the context for that in the ensuing weeks. >> >> -- adrian >> >> > >> >> >> >> >> >> >> >> >> >> MachO (no typeunits, no comdats, with imports) >> >> ---------------------------------------------- >> >> >> >> Since we don’t have comdat sections in Mach-O and we don’t have the >> tool support for type units, the way that external types can be referenced >> necessarily needs to be a bit different. The design that Greg and I came up >> with for Mach-O relies on llvm-dsymutil to fix up the DWARF for >> non-module-aware consumers. 
Just as ELF DWARF consumers need not be able to tell the difference between module debugging and split DWARF, on Mach-O the .dSYM bundle generated by llvm-dsymutil looks like traditional DWARF.
>> >>
>> >> There are three differences in the DWARF output that make this possible:
>> >> - Refer to external types by UID rather than by type signature.
>> >>   (This doubles as the key that allows a debugger to import the type directly from the AST and protects us against hash collisions)
>> >> - Add an index to the .o file that maps UID -> module file.
>> >>   (Fast lookup + UIDs for C and ObjC are only unique within a module)
>> >> - Add an entry for each type’s UID to the types accelerator table.
>> >>   (Fast lookup)
>> >>
>> >> bar.o
>> >> ~~~~~
>> >>
>> >> .debug_info:
>> >>   DW_TAG_compile_unit
>> >>     DW_AT_name(“bar.c”)
>> >>   DW_TAG_imported_module
>> >>     DW_AT_import(DW_FORM_ref_addr 0x40)
>> >>
>> >>   DW_TAG_variable
>> >>     DW_AT_name(“MyFoo”)
>> >>     DW_AT_type [DW_FORM_strp] (“_ZTS3Foo”) // We could use a custom FORM here
>> >>
>> >>   // Skeleton unit.
>> >>   DW_TAG_compile_unit
>> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>> >>     DW_AT_dwo_id(“0xFEDB9876”)
>> >>     ...
>> >> 0x40:
>> >>   DW_TAG_module
>> >>     DW_AT_name(“FooLib”)
>> >>     DW_AT_LLVM_sysroot(“/“)
>> >>     DW_AT_LLVM_include_dirs(“-I/path”)
>> >>     DW_AT_LLVM_macros(“-DNDEBUG”)
>> >>
>> >> // This index uses the usual accelerator table format.
>> >> .apple_exttypes:
>> >>   { “_ZTS3Foo” => debug_str offset of ”/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm” }
>> >>
>> >> FooLib-XYZ.pcm
>> >> ~~~~~~~~~~~~~~
>> >>
>> >> .debug_info
>> >>   DW_TAG_compile_unit
>> >>     DW_AT_dwo_id(“0xFEDB9876”)
>> >>
>> >> 0x80:
>> >>   DW_TAG_structure_type
>> >>     DW_AT_name (“Foo”)
>> >>     DW_AT_signature
>> >>     ...
>> >>
>> >> // In addition to the entry for “Foo”, there is also an entry for the type’s UID “_ZTS3Foo” pointing to the type definition DIE.
>> >> .apple_types
>> >>   { “Foo” => 0x80 }
>> >>   { “_ZTS3Foo” => 0x80 }
>> >>
>> >> When the debug info linker (llvm-dsymutil) is run, it first pulls in the .debug_info section from the clang module and fixes up all the DW_FORM_strp external type references by turning them into a DW_FORM_ref_addr that references the type in the DW_TAG_compile_unit pulled in from the module. To find the correct type DIE it looks up the UID in the .apple_exttypes index, finds the module, looks up the UID in the regular .apple_types accelerator table and replaces the temporary DW_FORM_strp with a DW_FORM_ref_addr (which incidentally takes up the same amount of space in the DIE).
>> >>
>> >> Thoughts?
>> >> -- adrian
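For anyone who wants to see the two-step lookup spelled out, here is a rough Python sketch of the fix-up described above. It is not how llvm-dsymutil is implemented; it only models the idea with dictionaries. The names TypeRef, fix_up_external_types, and module_base (the offset at which a module's .debug_info ends up in the linked output) are hypothetical and were made up for illustration, as is the 0x1000 base offset.

```python
from dataclasses import dataclass


@dataclass
class TypeRef:
    """A DW_AT_type attribute value (hypothetical model, not an LLVM type)."""
    form: str      # "DW_FORM_strp" before the fix-up, "DW_FORM_ref_addr" after
    value: object  # UID string before the fix-up, absolute DIE offset after


def fix_up_external_types(type_refs, exttypes, apple_types, module_base):
    """Rewrite UID-based type references into direct DIE references.

    exttypes:    UID -> module file (models the .apple_exttypes index in the .o)
    apple_types: module file -> {name or UID -> DIE offset inside that module}
    module_base: module file -> assumed offset of the module's .debug_info
                 in the linked output (an assumption of this sketch)
    """
    for ref in type_refs:
        if ref.form != "DW_FORM_strp":
            continue  # only external (UID-based) references need fixing
        uid = ref.value
        module = exttypes[uid]                 # step 1: UID -> module file
        die_offset = apple_types[module][uid]  # step 2: UID -> type DIE
        ref.form = "DW_FORM_ref_addr"
        ref.value = module_base[module] + die_offset
    return type_refs


# Data taken from the bar.o / FooLib-XYZ.pcm example in this thread.
PCM = "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm"
exttypes = {"_ZTS3Foo": PCM}
apple_types = {PCM: {"Foo": 0x80, "_ZTS3Foo": 0x80}}
module_base = {PCM: 0x1000}  # made-up placement of the module's unit

refs = fix_up_external_types(
    [TypeRef("DW_FORM_strp", "_ZTS3Foo")], exttypes, apple_types, module_base
)
```

After the call, the MyFoo reference carries DW_FORM_ref_addr and points at the Foo definition DIE at the module's base plus 0x80. A reference that already uses another form is left untouched.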
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
