On Wed, Mar 18, 2015 at 5:21 PM, Adrian Prantl <[email protected]> wrote:
>
> On Mar 18, 2015, at 5:03 PM, David Blaikie <[email protected]> wrote:
>
> On Wed, Mar 18, 2015 at 4:53 PM, Adrian Prantl <[email protected]> wrote:
>
>> On Mar 18, 2015, at 4:41 PM, David Blaikie <[email protected]> wrote:
>>
>> On Wed, Mar 18, 2015 at 4:31 PM, Adrian Prantl <[email protected]> wrote:
>>
>>> On Mar 18, 2015, at 4:02 PM, David Blaikie <[email protected]> wrote:
>>>
>>> On Wed, Mar 18, 2015 at 3:50 PM, Adrian Prantl <[email protected]> wrote:
>>>
>>>> On Mar 17, 2015, at 6:44 PM, David Blaikie <[email protected]> wrote:
>>>>
>>>> On Tue, Mar 17, 2015 at 3:47 PM, Adrian Prantl <[email protected]> wrote:
>>>>
>>>>> > On Mar 17, 2015, at 10:03 AM, Greg Clayton <[email protected]> wrote:
>>>>> >
>>>>> >> On Mar 17, 2015, at 9:46 AM, David Blaikie <[email protected]> wrote:
>>>>> >>
>>>>> >> On Tue, Mar 17, 2015 at 9:42 AM, Greg Clayton <[email protected]> wrote:
>>>>> >>
>>>>> >>> On Mar 16, 2015, at 6:47 PM, David Blaikie <[email protected]> wrote:
>>>>> >>>
>>>>> >>> On Mon, Mar 16, 2015 at 5:14 PM, Adrian Prantl <[email protected]> wrote:
>>>>> >>>
>>>>> >>> Thanks for the explanation David, I missed that it is entirely the linker's (or some dwarf post-processor's) responsibility to find the module files and link in the debug info from the .pcm files, so the debugger doesn’t notice a difference.
>>>>> >>>
>>>>> >>> I think there's still some confusion here. Sorry if I'm rehashing something, but I'll try to explain how this all works.
>>>>> >>>
>>>>> >>> Normal split DWARF:
>>>>> >>>
>>>>> >>> Compiler generates two files: .o and .dwo.
>>>>> >>> .dwo has static, non-relocatable debug info.
>>>>> >>> .o has a skeleton compile_unit that has the name of the .dwo file and a hash to verify that the .dwo file isn't stale when the debugger reads it.
>>>>> >>> The .o files are all linked together, the .dwo files stay where they are.
>>>>> >>> The debugger reads the linked executable, finds the skeleton compile_units contained therein, and finds/loads the .dwo files
>>>>> >>>
>>>>> >>> The scenario I have in mind for module debug info is this:
>>>>> >>> Module is compiled as an object file with debug info (this file is actually a .dwo file, even if it has some other extension - it has the non-relocatable debug info in it)
>>>>> >>> .o file has a comdat'd skeleton compile_unit describing the .dwo/module file
>>>>> >>> <from here on no extra work is required, the linker and debugger just act as normal>
>>>>> >>> The .o files are linked together, the skeleton compile_units get deduplicated by the linker (comdat sections)
>>>>> >>
>>>>> >> One issue I can think of is we will need to figure out a way to make COMDAT work with mach-o. COMDAT requires a large number of sections and mach-o can only have 255.
>>>>> >>
>>>>> >> Ah, fair enough - how does MachO handle inline functions (the most common use of comdat) currently, then?
>>>>> >
>>>>> > Currently mach-o relies on symbols in the symbol table being marked as weak and I believe the data for these symbols is in special sections that are marked as containing items that can be coalesced.
>>>>> >
>>>>> That’s not necessarily an issue that needs to be solved on Darwin, or am I maybe missing something? The linker leaves all debug info in the .o (as it currently does) and llvm-dsymutil is resolving all the external module type references while creating the .dSYM bundle.
>>>>>
>>>>
>>>> Yeah, with a debug aware linker (or in the case of dsymutil, a debug-only linker) you would just know that since you're looking at object files, module references will be redundant across objects and should be deduplicated (by the dwo hash, most likely).
>>>>
>>>> If you're not teaching your debugger to read modules, and want to link the debug info in from the .dwos - at that point you can probably drop the skeleton stuff entirely (you'd still need to teach your debugger about .dwo sections and some of the esoteric things there - like str_index and the extra/special line table just for file names (decl_file, etc, uses this)) and just put the contents of the module debug info straight in the dsym. It'd be a bit weird, but do-able without too much work, I'd imagine. You could move them back into the original sections, if you wanted to avoid the weird .dwo + non-.dwo sections together... *shrug* not sure what exactly you'd want there.
>>>>
>>>> My plan was to have -gmodules behave like the latter variant unless -gsplit-dwarf is also present; this way there wouldn't be any weird Darwin-specific code paths.
>>>>
>>>
>>> Not sure I quite follow (mostly my fault given the rambling paragraph up there) - given the lack of a dsymutil-like tool on other platforms as part of the common tool path for debug info, I'm not sure module debug info without split dwarf is viable in that world. There's no tool to read these extra files at any point.
>>>
>>> In theory someone could port llvm-dsymutil to a different platform, but that scenario is a little far-fetched. I’m not sure what will happen if LLDB is presented with linked, non-split debug info that contains module references.
>>>
>>
>> Linked non-split debug info should come out for free - all the debug info would be is a bunch of TUs in a single comdat - no skeleton CU, nothing else. It would look just like normal DWARF, except with one comdat instead of multiple, for each set of types from a module.
>> (& there would be no real size gains - since you'd be redundantly including all the type information in every object file)
>>
>>>
>>> I suppose we could be creating one giant comdat for the module's debug info (no skeleton unit, no distinct type unit comdats, just one big comdat). But we'd probably want/need a tool to do the merging at compile time (like the objcopy feature for split-dwarf, but in reverse - we'd compile, then run a tool to smoosh all the comdats from the modules onto the object we just generated). It wouldn't provide much in the way of space savings, a little less stress on the linker (fewer comdats to handle), etc. Not sure if there's a default mode of objcopy that would cope with this straight out, or whether we'd need a new feature there (which wouldn't be a priority for Google to implement, since we use fission, nor a priority for you to implement since you have dsymutil, etc - so I'm not sure anyone would bother)
>>>
>>> Long story short: maybe just error on -gmodules if -gsplit-dwarf isn't specified or the platform isn't darwin? (& if it's darwin, dsymutil could read the module skeletons to find which modules to link into the .dSYM?)
>>>
>>> That’s reasonable, too :-)
>>> The plan is for llvm-dsymutil to follow the references in the module skeletons, copy the module CUs
>>>
>>
>> TUs for now
>>
>>> into the .dSYM, and fixup the external type references to become DW_FORM_ref_addrs.
>>>
>>
>> Sounds good for you guys - the fixup work will be a bit non-trivial, since it'll need to remove the type skeletons in the CUs, move all the extra members from the skeletons into the type unit (& resolve any duplicates), etc... - does that make sense?
>> (otherwise I can provide some DWARF snippets to explain better)
>>
>> Or we use a weird Darwin-specific code path to not emit the modules with -generate-type-units in the first place (bag of DWARF+index mapping hash to DIE),
>>
>
> bag-o-dwarf still doesn't address all the issues with type member merging I described above. Certain things can't go in the type in the module because they depend on context - most importantly/obviously, implicit special members and member function template instantiations.
>
> I suppose you could still have type references reference the type in the bag-o-dwarf/type unit directly (DW_AT_type with DW_FORM_ref_sig8) while having the partial type (the type declaration with its extra CU-specific members) which would simplify the dwarf in the easy cases.
>
> Yes, something along these lines would make a good first iteration.
>
>> which would make dsymutil's job really easy. As much as I’d like to get rid of platform-specific behavior, due to the automatic way that modules are generated on Darwin I don’t see an elegant way of making this switchable by the user.
>>
>
> Not sure I quite follow here how implicit modules impact this functionality. We can still have a flag that you pass to the compiler that dictates how debug info in modules is created/what schema we use.
>
> The problem is the combination of implicit generation and a global module cache. I guess we could treat a module with the wrong kind of debug info as out of date, but I’m not excited.
>

I'm assuming the global module cache already has to factor in command line arguments to the compiler (things as simple as configuration macros, for example) - so this would be another property of the module cache key.

>
> -- adrian
>
> - David
>
>> -- adrian
>>
>>> -- adrian
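
To make the cache-key point concrete, here's a rough sketch of what I mean by the debug info schema being one more property of the key. This is purely illustrative and not how clang actually computes its module hash: the function name, the choice of SHA-256, and the "none"/"type-units" labels are all made up for the example.

```python
import hashlib


def module_cache_key(module_name, config_macros, debug_info_kind):
    """Hypothetical cache key: NOT clang's real module hash.

    Mixes the module name, the configuration macros, and a label for the
    debug info schema into one digest, so that a module built with a
    different debug info schema lands in a different cache slot.
    """
    h = hashlib.sha256()
    h.update(module_name.encode("utf-8"))
    h.update(b"\0")
    # Sort so the key does not depend on the order macros were passed in.
    for macro in sorted(config_macros):
        h.update(macro.encode("utf-8"))
        h.update(b"\0")
    # The debug info schema is just one more input to the key.
    h.update(b"debug=" + debug_info_kind.encode("utf-8"))
    return h.hexdigest()


# Same module and macros, different (hypothetical) debug info schemas.
plain = module_cache_key("Foundation", ["NDEBUG=1"], "none")
typed = module_cache_key("Foundation", ["NDEBUG=1"], "type-units")
again = module_cache_key("Foundation", ["NDEBUG=1"], "none")
```

With something along these lines, a module built with the "wrong" kind of debug info never looks out of date; it simply lives under a different key, next to the other variant.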
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
