> On Mar 16, 2015, at 6:47 PM, David Blaikie <[email protected]> wrote: > > > > On Mon, Mar 16, 2015 at 5:14 PM, Adrian Prantl <[email protected] > <mailto:[email protected]>> wrote: > > Thanks for the explanation David, I missed that it is entirely the linker's > (or some dwarf post-processor's) responsibility to find the module files and > link in the debug info from the .pcm files, so debugger doesn’t notice a > difference. > > I think there's still some confusion here. Sorry if I'm rehashing something, > but I'll try to explain how this all works.
thanks! > Normal split DWARF: > > Compiler generates two files: .o and .dwo. > .dwo has static, non-relocatable debug info. > .o has a skeleton compile_unit that has the name of the .dwo file and a hash > to verify that the .dwo file isn't stale when the debugger reads it. > The .o files are all linked together, the .dwo files stay where they are. > The debugger reads the linked executable, finds the skeleton compile_units > contained therein, and find/loads the .dwo files That makes total sense. Now, to eliminate the last remaining misconception: Does LLVM actually emit the separate .dwo file currently? From looking at testcases like DebugInfo/X86/fission-cu.ll it appears as if the relocatable and the non-relocatable output both end up in the .o file. This is where I got the impression that there was another tool involved that extracted the non-relocateable content from the .o into a .dwo file, but maybe that’s just something we do for testing? -- adrian > > The scenario I have in mind for module debug info is this: > Module is compiled as an object file with debug info (this file is actually a > .dwo file, even if it has some other extension - it has the non-relocatable > debug info in it) > .o file has a comdat'd skeleton compile_unit describing the .dwo/module file > <from here on no extra work is required, the linker and debugger just act as > normal> > The .o files are linked together, the skeleton compile_units get deduplicated > by the linker (comdat sections) > The debugger reads the linked executable, finds the skeleton compile_units > contained therein, and find/loads the module files just as .dwo files. > > There's no need for a debug-aware linker or any DWARF post-processing so far > as I understand it. No module-linking is required. Debugger reads the modules > directly, just as if they were .dwo files - they're just object files in the > filesystem like any other (that they have a different extension isn't too > important). > > Does this make sense? > > > >> On Mar 16, 2015, at 2:55 PM, David Blaikie <[email protected] >> <mailto:[email protected]>> wrote: >> >> >> >> On Mon, Mar 16, 2015 at 2:45 PM, Robinson, Paul >> <[email protected] >> <mailto:[email protected]>> wrote: >> Beyond the above (that using a new tag would mean this would go from 'free' >> to 'not free' for GDB) having a new top level tag is pretty substantial (we >> only have two at the moment, and with our talk of modules being a "bag of >> dwarf" might go back to having one top level tag? (it's not clear to me from >> DWARF4 whether DW_TAG_module is currently a top-level tag, I don't think it >> is?) >> >> >> The .debug_info section contains one or more compilation units, partial >> units, or in DWARF 5, type units. DW_TAG_module isn't a unit, if you want >> it to be handled independently then it would need to be wrapped in a >> DW_TAG_partial_unit. You would probably then use DW_TAG_imported_unit to >> refer to it, rather than DW_TAG_imported_module. <> >> >> This makes a fair bit of sense - though the terminology's never going to >> quite line up with modules, I suspect, and this would still require >> modifying existing consumers (well, GDB) that can handle split-dwarf today, >> I suspect (not sure how it'd handle partial_unit - maybe that does work? - >> and still don't know how existing consumers would handle imported_unit >> either - could be worth some testing, as it sounds sort of right out of >> several less right options). > > The standard specifically recommends DW_TAG_partial_unit for #include > directives so that sounds like a comparatively good match. Partial units were > already introduced in DWARF3 so maybe GDB supports them. But even if it > doesn’t this shouldn’t necessarily be a problem (unless it crashes). The > DW_TAG_imported_unit since this is primarily useful for AST-based debuggers > that know how to import a module before expression evaluation. > > -- adrian > >> - David >> (Sorry about the top-quoting but Outlook can't handle HTML editing properly.) >> > > Unfortunately the gmail client somewhat forces a thread to HTML — gmail > quotation markers mysteriously disappear in the plain text version displayed > by other mail clients. > >> --paulr >> >> >> >> From: David Blaikie [mailto:[email protected] <mailto:[email protected]>] >> Sent: Monday, March 16, 2015 1:36 PM >> To: Adrian Prantl >> Cc: Richard Smith; Eric Christopher; llvm cfe; Greg Clayton; Robinson, Paul >> Subject: Re: [PATCH] Have clang list the imported modules in the debug info >> >> >> >> >> >> >> >> On Mon, Mar 16, 2015 at 1:24 PM, Adrian Prantl <[email protected] >> <mailto:[email protected]>> wrote: >> >> >> >> On Mar 10, 2015, at 12:10 PM, David Blaikie <[email protected] >> <mailto:[email protected]>> wrote: >> >> >> >> >> >> >> >> On Tue, Mar 10, 2015 at 12:05 PM, Adrian Prantl <[email protected] >> <mailto:[email protected]>> wrote: >> >> >> >> On Mar 9, 2015, at 5:16 PM, David Blaikie <[email protected] >> <mailto:[email protected]>> wrote: >> >> >> >> >> >> >> >> On Mon, Mar 9, 2015 at 5:07 PM, Adrian Prantl <[email protected] >> <mailto:[email protected]>> wrote: >> >> >> >> On Mar 9, 2015, at 2:14 PM, David Blaikie <[email protected] >> <mailto:[email protected]>> wrote: >> >> >> >> >> >> >> >> On Mon, Mar 9, 2015 at 1:52 PM, Adrian Prantl <[email protected] >> <mailto:[email protected]>> wrote: >> >> >> >> On Feb 24, 2015, at 3:06 PM, David Blaikie <[email protected] >> <mailto:[email protected]>> wrote: >> >> On Tue, Feb 24, 2015 at 2:56 PM, Adrian Prantl <[email protected] >> <mailto:[email protected]>> wrote: >> >> On Feb 24, 2015, at 2:36 PM, David Blaikie <[email protected] >> <mailto:[email protected]>> wrote: >> >> On Mon, Feb 23, 2015 at 3:45 PM, Adrian Prantl <[email protected] >> <mailto:[email protected]>> wrote: >> >> On Feb 23, 2015, at 3:37 PM, David Blaikie <[email protected] >> <mailto:[email protected]>> wrote: >> >> On Mon, Feb 23, 2015 at 3:32 PM, Adrian Prantl <[email protected] >> <mailto:[email protected]>> wrote: >> >> On Feb 23, 2015, at 3:14 PM, David Blaikie <[email protected] >> <mailto:[email protected]>> wrote: >> >> On Mon, Feb 23, 2015 at 3:08 PM, Adrian Prantl <[email protected] >> <mailto:[email protected]>> wrote: >> >> On Feb 23, 2015, at 2:59 PM, David Blaikie <[email protected] >> <mailto:[email protected]>> wrote: >> >> On Mon, Feb 23, 2015 at 2:51 PM, Adrian Prantl <[email protected] >> <mailto:[email protected]>> wrote: >> >> >> > On Jan 20, 2015, at 11:07 AM, David Blaikie <[email protected] >> > <mailto:[email protected]>> wrote: >> > >> > My vague recollection from the previous design discussions was that these >> > module references would be their own 'unit' COMDAT'd so that we don't end >> > up with the duplication of every module reference in every unit linked >> > together when linking debug info? >> > >> > I think in my brain I'd been picturing this module reference as being an >> > extended fission reference (fission skeleton CU + extra fields for users >> > who want to load the Clang AST module directly and skip the split CU). >> >> Apologies for letting this rest for so long. >> >> Your memory was of course correct and I didn’t follow up on this because I >> had convinced myself that the fission reference would be completely >> sufficient. Now that I’ve been thinking some more about it, I don’t think >> that it is sufficient in the LTO case. >> >> Here is the example from >> thehttp://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html >> <http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html>: >> >> foo.o: >> .debug_info.dwo >> DW_TAG_compile_unit >> // For DWARF consumers >> DW_AT_dwo_name ("/path/to/module-cache/MyModule.pcm") >> DW_AT_dwo_id ([unique AST signature]) >> >> .debug_info >> DW_TAG_compile_unit >> DW_TAG_variable >> DW_AT_name "x" >> DW_AT_type (DW_FORM_ref_sig8) ([hash for MyStruct]) >> >> In this example it is clear that foo.o imported MyModule because its DWO >> skeleton is there in the same object file. But if we deal with the result of >> an LTO compilation we will end up with many compile units in the same >> .debug_info section, plus a bunch of skeleton compile units for _all_ >> imported modules in the entire project. We thus loose the ability to >> determine which of the compile units imported which module. >> >> >> Why would we need to know which CU imported which modules? (I can imagine >> some possible reasons, but wondering what you have in mind) >> >> >> >> When the debugger is stopped at a breakpoint and the user wants to evaluate >> an expression, it should import the modules that are available at this >> location, so the user can write the expression from within the context of >> the breakpoint (e.g., without having to fully qualify each type, etc). >> >> >> I'm not sure how much current debuggers actually worry about that - (& this >> may differ from lldb to gdb to other things, of course). I'm pretty sure at >> least for GDB, a context in one CU is as good as one in another (at least >> without split-dwarf, type units, etc - with those sometimes things end up >> overly restrictive as the debugger won't search everything properly). >> >> eg: if you have a.cpp: int main() { }, b.cpp: void func() { } and you run >> 'start' in gdb (which breaks at the beginning of main) you can still run 'p >> func()' to call the func, even though there's no declaration of it in a.cpp, >> etc. >> >> >> >> LLDB would definitely care (as it is using clang for the expression >> evaluation supporting these kinds of features is really straightforward >> there). By importing the modules (rather than searching through the DWARF), >> the expression evaluator gains access to additional declarations that are >> not there in the DWARF, such as templates. But since clang modules are not >> namespaces, we can’t generally "import the world” as a debugger would >> usually do. >> >> >> Sorry, not sure I understand this last sentence - could you explain further? >> >> I imagine it would be rather limiting for the user if they could only use >> expressions that are valid in this file from the file - it wouldn't be >> uncommon to want to call a function from another module/file/etc to aid in >> debugging. >> >> >> >> Usually LLDB’s expression evaluator works by creating a clang AST type out >> of a DWARF type and inserting it into its AST context. We could pre-polulate >> it with the definitions from the imported modules (with all sorts of >> benefits as described above), but that only works if no two modules >> conflict. If the declaration can’t be found in any imported module, LLDB >> would still import it from DWARF in the “traditional” fashion. >> >> >> But it would import it from DWARF in other TUs rather than use the module >> info just because the module wasn't directly referenced from this TU? That >> would seem strange to me. (you would lose debug info fidelity (by falling >> back to DWARF even though there are modules with the full fidelity info) >> unnecessarily, it sounds like) >> >> >> >> I think it’s reasonable to expect full fidelity for everything that is >> available in the current TU, and having the normal DWARF-based debugging >> capabilities for everything beyond that. But we can only ever provide full >> fidelity if we have the list of imports for the current TU. >> >> >> >> Would it be reasonable to use the accelerator table/index to lookup the >> types, then if the type is in the module you could use the module rather >> than the DWARF stashed alongside it? (so the comdat'd split-dwarf skeleton >> CU for the module would have an index to tell you what names are inside it, >> but if you got an index hit you'd just look at the module instead of loading >> the split-dwarf debug info in the referenced file) >> >> >> >> I don’t think this approach would work for templates and enumerator values; >> >> >> Not sure why enumerator values are an issue - but templates (& all manner of >> other things that don't make it into the index, unfortunately), sure. >> >> >> they aren’t in the accelerator tables to begin with. It would also be slower >> if the declaration is available in a module. >> >> >> Though you're rapidly going to end up loading a lot of modules in (as you go >> up & down a stack printing various things you'll cross into other TUs & load >> more modules). >> >> For a standard DWARF consumer, it seems fine to just have a comdat'd >> skeleton CU for a module without the need for other CUs to mention which >> module CUs they reference (but I could be wrong here) & that's the design we >> originally discussed. >> >> It would seem unfortunate to bloat every CU with a non-deduplicable list of >> every module it references, but if that's necessary for a serialized AST >> aware debugger, it might be fine to have it as an option (so long as it can >> be turned off) & may still benefit from that list not being the >> authoritative module reference, but a /very/ terse reference to it so all >> the extra flags & stuff can be in the deduplicable comdat (& to keep it as >> consistent as possible between the flag (on/off) codepaths for this extra >> data). Maybe a FORM_block (?) of fixed-size hashes of all the modules >> back-to-back, so it's as small as possible? >> >> But I wouldn't mind spending some more time discussing whether there's a >> better way to keep these things streamlined/symmetric/the same between >> modular and non-modular debug info. >> >> Sure! >> >> Now that we established that recording the list of imported modules for >> every CU is useful for an AST-based debugger, >> >> >> +Richard, just to see if he's got some ideas about how a debugger might >> efficiently use modules to support debugger scenarios and whether or not >> having a list of which modules are referenced from which contexts is >> valuable in that. >> >> It still concerns me that this would create something of a >> regression/oddity/difference between AST-based debug info (you wouldn't be >> able to handle expressions referencing things in other TUs) and non-AST >> based debug info (where I think the average user is used to not worrying >> about what headers are included in the current file they're debugging when >> they try to use a type or other identifier) >> >> >> >> If I understood you correctly, this is not actually the case. The list of >> imported modules allows the AST-based debugger to import all the modules >> that were imported by the CU that the current frame is in. This enables the >> user to, e.g., type "p myVector->size()" even though >> std::vector<MyClass>::size() was not used by the CU and is thus not >> available in DWARF. >> >> If the user types “p foo” even though foo was not defined in any imported >> module the debugger can — after failing to import foo via clang — still fall >> back to looking up foo in DWARF and do what it always did. >> >> >> If you do the DWARF fallback then you'll get a pretty clear inconsistency >> between templates and non-templates. If I have a function foo and a function >> template foo_tmpl in one file, and I'm debugging in another file I'll be >> able to call 'foo' (normal DWARF fallback/search) but not foo_tmpl (if I'm >> calling a new instantiation of foo_tmpl - if I'm calling an existing >> instantiation presumably the fallback would catch me). Seems >> unfortunate/confusing, perhaps. >> >> >> >> Good point, but it my guess is that this wouldn’t be any worse than the “why >> can’t I print the size() of this vector!?”-situations we have at the moment. >> >> >> Sure - it's strictly better in the sense that there are strictly more >> expressions that can be evaluated, but seems incomplete is my point, and >> maybe worth considering alternative designs that might be more-betterer. >> >> >> In certain situations (i.e., non-templates) the debugger could use the DWARF >> in the modules to print a message about which module to import. >> >> >> >> >> >> >> >> >> >> let’s talk about how to most efficiently represent this information. >> >> >> >> In the CU, using DW_TAG_imported_module appears to be the most appropriate >> choice, even though there is some room for confusion since C++ using >> declarations are also represented this way. Inside the >> DW_TAG_imported_module, we could use >> >> (1) a DW_AT_import that references the skeleton (I hope that is the right >> terminology) CU for the module, the idea being that the skeleton CU would >> contain all the details (flags, name, include dirs, hash, ...) and be in a >> comdat'ed section. >> >> >> I'd be concerned about overloading the terminology & confusing other >> debuggers - they might try to follow the DW_AT_import and be surprised that >> it doesn't refer to a DW_TAG_namespace tag. >> >> >> >> That’s a valid concern, and we probably should not be emitting this if we >> have any evidence of, e.g., gdb crashes when encountering such a construct. >> Then again, we would be using a DW_TAG_imported_module to express what it is >> meant to express according to the DWARF spec (namely importing a module)... >> but I admit that the tag also does have a very specific meaning for C++, >> which we maybe shouldn’t overload. >> >> >> That's my concern, yes. >> >> >> The right thing here is probably to put aside my personal sense of >> aesthetics and use a private _LLVM_ namespace for all new additions, and >> then attempt to standardize an official DWARF version once we know what is >> really needed and what isn't. >> >> >> I'd prefer this, yes. I mean the usual bar we use for language features is >> that they're at least proposed for standardization before we adopt them in >> clang - I wouldn't mind a similar bar here. If you want to bring up this use >> of DW_TAG_imported_module with the DWARF committee & see if it sounds >> reasonable (& test/inquire about GDB's behavior here). >> >> >> >> I started a thread on dwarf-discuss to this end >> (http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org >> <http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org>, the >> list archives are only visible to subscribers, but anyone can subscribe). >> >> >> Cool cool >> >> >> >> To paraphrase the replies that my question solicited: We are, perhaps not >> very surprising, encouraged to follow the standard and use a >> DW_TAG_imported_module that references a DW_TAG_module. If we, however, >> choose to describe the module by using a skeleton DW_TAG_compile_unit, we >> should be careful (my own words) about using a DW_TAG_imported_module until >> that use is sanctioned by the standard. >> >> >> >> I see two possible ways to proceed in this spirit: >> >> a) Rename the module skeleton DW_TAG_compile_units to DW_TAG_module, but >> keep all the comdat/split dwarf goodness from the original proposal [1]. My >> understanding is that even though we are making clever use of the split >> DWARF features, GDB would still need to be taught to follow references to >> external files, >> >> >> Not sure what you're referring to here, perhaps a misunderstanding about how >> split DWARF works. >> >> To the best of my knowledge, what we've talked about for module DWARF debug >> info is actually just split-dwarf, no extra work required by DWARF >> consumers*. >> >> * It's, admittedly, a little tricksy to include type unit references in an >> object file that doesn't include the type unit at all - relying on it being >> linked into the final executable. But DWARF doesn't really talk about >> objects versus executables, etc - so, so long as the type unit is there in >> the end, it's valid DWARF no matter how it got there (& should work fine for >> existing consumers - they can't tell if the type unit was in every object >> file that referenced the type or not once it's been linked and deduplicated). >> >> >> so having it recognize a new tag in this context doesn’t appear to be much >> additional effort (but others may provide more insight here). >> >> >> Beyond the above (that using a new tag would mean this would go from 'free' >> to 'not free' for GDB) having a new top level tag is pretty substantial (we >> only have two at the moment, and with our talk of modules being a "bag of >> dwarf" might go back to having one top level tag? (it's not clear to me from >> DWARF4 whether DW_TAG_module is currently a top-level tag, I don't think it >> is?) >> >> >> b) Emit an LLVM-specific DW_AT_LLVM_import attribute inside the >> DW_TAG_imported_module (or vice versa) that refers to the skeleton >> DW_TAG_compile_unit. >> >> >> >> I think that option (a) is a bit more elegant and it is bending the dwarf >> standard not quite as much and will make the dwarf output a bit more >> readable. >> >> >> >> -- adrian >> >> >> >> [1] Module debugging proposal for reference: >> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html >> <http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html> >> >> >> >> - David >> >> >> >> >> -- adrian >> >> >> >> >> >> But extension tags seems like the conservatively correct option (not sure >> what GDB does on tags it doesn't recognize - I forget if it warns or just >> completely ignores them, hopefully the latter) >> >> >> >> (2) David’s suggestion of using a custom form that records the module hash >> directly is quite space-efficient, but it has the drawback of not being >> resilient against small changes to the imported module >> >> >> That's going to be true of the normal fission info here (the skeleton CU and >> the full CU in the .dwo file (or module) are associated by hash) - granted, >> in the "loading an AST" mode, you can ignore those hashes and rely on your >> custom attributes instead. >> >> >> , since clang’s module hash changes each time the module is being rebuilt. >> >> >> Clang's module hash only changes if the DWARF contents change - it doesn't >> use a timestamp or anything. It seems like actually you're going to want to >> fail to load even more aggressively - there are ways the AST might've >> changed that the debug info doesn't reflect but are still important (a type >> unreferenced in this module, but built into some other code that is not >> built with debug info changes - no hash changes because the debug info for >> that type is unreferenced here, but if you try to use it you could have an >> incompatible layout, etc). >> >> >> >> Agreed: If the module contents changed the debugger needs to display a big >> flashing "here be dragons" warning. >> >> >> >> >> This is less of an issue if the hash is referring to a skeleton CU in the >> same file, which contains all the detailed information. >> >> >> >> Personally I’d prefer option 1 because mostly uses the existing mechanisms >> from DWARF. Here’s a visual guide to the options on the table: >> >> >> >> (1) >> >> foo.o (compiled with, let’s call it .. "-gmodule-imports”) >> >> ----- >> >> .debug_info: >> >> DW_TAG_compile_unit >> >> DW_AT_name(“foo.c”) >> >> DW_TAG_imported_module >> >> DW_AT_import(DW_FORM_ref_addr 0x123) // Could be a FORM_ref_sig8 >> 0x1234ABCDE as well. >> >> DW_TAG_imported_module >> >> DW_AT_import(...) >> >> >> >> .debug_info.dwo: >> >> // Skeleton CUs for modules imported by foo.o. >> >> 0x123: >> >> DW_TAG_compile_unit >> >> // Used by split-dwarf debuggers to find external type definitions. >> >> >> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”) >> >> DW_AT_dwo_id(“0x1234ABCDE”) >> >> >> >> // Used by AST-based debuggers to import the module. >> >> DW_AT_name(“Foundation”) >> >> >> (side notes: the mixed indentation here makes it a bit hard to read this >> example, and I'd make sure /all/ the extended attributes (including the name >> here) use custom attribute names, not standard ones) >> >> >> >> Agreed. >> >> >> >> >> >> >> DW_AT_LLVM_sysroot(“/“) >> >> DW_AT_LLVM_include_dir(“”) >> >> DW_AT_LLVM_macros(“-DNDEBUG”) >> >> >> >> (2) >> >> .debug_info.dwo: >> >> (As above.) >> >> >> >> .debug_info: >> >> DW_TAG_compile_unit >> >> DW_AT_name(“foo.c”) >> >> DW_AT_LLVM_imported_modules(DW_FORM_block 0x1234ABCDE 0xDEADBEEF 0x....) >> >> >> >> Now I’m curious what option (3) will look like; the one that we’ll actually >> implement! >> >> >> ;) >> >> >> >> >> -- adrian >> >> >> >> >> >> >> >> >> >> >> >> >> >> > >
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
