> On Mar 9, 2015, at 5:16 PM, David Blaikie <[email protected]> wrote: > > > > On Mon, Mar 9, 2015 at 5:07 PM, Adrian Prantl <[email protected] > <mailto:[email protected]>> wrote: > >> On Mar 9, 2015, at 2:14 PM, David Blaikie <[email protected] >> <mailto:[email protected]>> wrote: >> >> >> >> On Mon, Mar 9, 2015 at 1:52 PM, Adrian Prantl <[email protected] >> <mailto:[email protected]>> wrote: >> >>> On Feb 24, 2015, at 3:06 PM, David Blaikie <[email protected] >>> <mailto:[email protected]>> wrote: >>> On Tue, Feb 24, 2015 at 2:56 PM, Adrian Prantl <[email protected] >>> <mailto:[email protected]>> wrote: >>> On Feb 24, 2015, at 2:36 PM, David Blaikie <[email protected] >>> <mailto:[email protected]>> wrote: >>>> On Mon, Feb 23, 2015 at 3:45 PM, Adrian Prantl <[email protected] >>>> <mailto:[email protected]>> wrote: >>>> On Feb 23, 2015, at 3:37 PM, David Blaikie <[email protected] >>>> <mailto:[email protected]>> wrote: >>>>> On Mon, Feb 23, 2015 at 3:32 PM, Adrian Prantl <[email protected] >>>>> <mailto:[email protected]>> wrote: >>>>> On Feb 23, 2015, at 3:14 PM, David Blaikie <[email protected] >>>>> <mailto:[email protected]>> wrote: >>>>>> On Mon, Feb 23, 2015 at 3:08 PM, Adrian Prantl <[email protected] >>>>>> <mailto:[email protected]>> wrote: >>>>>> On Feb 23, 2015, at 2:59 PM, David Blaikie <[email protected] >>>>>> <mailto:[email protected]>> wrote: >>>>>>> On Mon, Feb 23, 2015 at 2:51 PM, Adrian Prantl <[email protected] >>>>>>> <mailto:[email protected]>> wrote: >>>>>>> >>>>>>> > On Jan 20, 2015, at 11:07 AM, David Blaikie <[email protected] >>>>>>> > <mailto:[email protected]>> wrote: >>>>>>> > >>>>>>> > My vague recollection from the previous design discussions was that >>>>>>> > these module references would be their own 'unit' COMDAT'd so that we >>>>>>> > don't end up with the duplication of every module reference in every >>>>>>> > unit linked together when linking debug info? >>>>>>> > >>>>>>> > I think in my brain I'd been picturing this module reference as being >>>>>>> > an extended fission reference (fission skeleton CU + extra fields for >>>>>>> > users who want to load the Clang AST module directly and skip the >>>>>>> > split CU). >>>>>>> >>>>>>> Apologies for letting this rest for so long. >>>>>>> >>>>>>> Your memory was of course correct and I didn’t follow up on this >>>>>>> because I had convinced myself that the fission reference would be >>>>>>> completely sufficient. Now that I’ve been thinking some more about it, >>>>>>> I don’t think that it is sufficient in the LTO case. >>>>>>> >>>>>>> Here is the example from the >>>>>>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html >>>>>>> <http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html>: >>>>>>> >>>>>>> foo.o: >>>>>>> .debug_info.dwo >>>>>>> DW_TAG_compile_unit >>>>>>> // For DWARF consumers >>>>>>> DW_AT_dwo_name ("/path/to/module-cache/MyModule.pcm") >>>>>>> DW_AT_dwo_id ([unique AST signature]) >>>>>>> >>>>>>> .debug_info >>>>>>> DW_TAG_compile_unit >>>>>>> DW_TAG_variable >>>>>>> DW_AT_name "x" >>>>>>> DW_AT_type (DW_FORM_ref_sig8) ([hash for MyStruct]) >>>>>>> >>>>>>> In this example it is clear that foo.o imported MyModule because its >>>>>>> DWO skeleton is there in the same object file. But if we deal with the >>>>>>> result of an LTO compilation we will end up with many compile units in >>>>>>> the same .debug_info section, plus a bunch of skeleton compile units >>>>>>> for _all_ imported modules in the entire project. We thus loose the >>>>>>> ability to determine which of the compile units imported which module. >>>>>>> >>>>>>> Why would we need to know which CU imported which modules? (I can >>>>>>> imagine some possible reasons, but wondering what you have in mind) >>>>>> >>>>>> When the debugger is stopped at a breakpoint and the user wants to >>>>>> evaluate an expression, it should import the modules that are available >>>>>> at this location, so the user can write the expression from within the >>>>>> context of the breakpoint (e.g., without having to fully qualify each >>>>>> type, etc). >>>>>> >>>>>> I'm not sure how much current debuggers actually worry about that - (& >>>>>> this may differ from lldb to gdb to other things, of course). I'm pretty >>>>>> sure at least for GDB, a context in one CU is as good as one in another >>>>>> (at least without split-dwarf, type units, etc - with those sometimes >>>>>> things end up overly restrictive as the debugger won't search everything >>>>>> properly). >>>>>> >>>>>> eg: if you have a.cpp: int main() { }, b.cpp: void func() { } and you >>>>>> run 'start' in gdb (which breaks at the beginning of main) you can still >>>>>> run 'p func()' to call the func, even though there's no declaration of >>>>>> it in a.cpp, etc. >>>>> >>>>> LLDB would definitely care (as it is using clang for the expression >>>>> evaluation supporting these kinds of features is really straightforward >>>>> there). By importing the modules (rather than searching through the >>>>> DWARF), the expression evaluator gains access to additional declarations >>>>> that are not there in the DWARF, such as templates. But since clang >>>>> modules are not namespaces, we can’t generally "import the world” as a >>>>> debugger would usually do. >>>>> >>>>> Sorry, not sure I understand this last sentence - could you explain >>>>> further? >>>>> >>>>> I imagine it would be rather limiting for the user if they could only use >>>>> expressions that are valid in this file from the file - it wouldn't be >>>>> uncommon to want to call a function from another module/file/etc to aid >>>>> in debugging. >>>> >>>> Usually LLDB’s expression evaluator works by creating a clang AST type out >>>> of a DWARF type and inserting it into its AST context. We could >>>> pre-polulate it with the definitions from the imported modules (with all >>>> sorts of benefits as described above), but that only works if no two >>>> modules conflict. If the declaration can’t be found in any imported >>>> module, LLDB would still import it from DWARF in the “traditional” fashion. >>>> >>>> But it would import it from DWARF in other TUs rather than use the module >>>> info just because the module wasn't directly referenced from this TU? That >>>> would seem strange to me. (you would lose debug info fidelity (by falling >>>> back to DWARF even though there are modules with the full fidelity info) >>>> unnecessarily, it sounds like) >>> >>> I think it’s reasonable to expect full fidelity for everything that is >>> available in the current TU, and having the normal DWARF-based debugging >>> capabilities for everything beyond that. But we can only ever provide full >>> fidelity if we have the list of imports for the current TU. >>>> >>>> Would it be reasonable to use the accelerator table/index to lookup the >>>> types, then if the type is in the module you could use the module rather >>>> than the DWARF stashed alongside it? (so the comdat'd split-dwarf skeleton >>>> CU for the module would have an index to tell you what names are inside >>>> it, but if you got an index hit you'd just look at the module instead of >>>> loading the split-dwarf debug info in the referenced file) >>> >>> I don’t think this approach would work for templates and enumerator values; >>> >>> Not sure why enumerator values are an issue - but templates (& all manner >>> of other things that don't make it into the index, unfortunately), sure. >>> >>> they aren’t in the accelerator tables to begin with. It would also be >>> slower if the declaration is available in a module. >>> >>> Though you're rapidly going to end up loading a lot of modules in (as you >>> go up & down a stack printing various things you'll cross into other TUs & >>> load more modules). >>> >>> For a standard DWARF consumer, it seems fine to just have a comdat'd >>> skeleton CU for a module without the need for other CUs to mention which >>> module CUs they reference (but I could be wrong here) & that's the design >>> we originally discussed. >>> >>> It would seem unfortunate to bloat every CU with a non-deduplicable list of >>> every module it references, but if that's necessary for a serialized AST >>> aware debugger, it might be fine to have it as an option (so long as it can >>> be turned off) & may still benefit from that list not being the >>> authoritative module reference, but a /very/ terse reference to it so all >>> the extra flags & stuff can be in the deduplicable comdat (& to keep it as >>> consistent as possible between the flag (on/off) codepaths for this extra >>> data). Maybe a FORM_block (?) of fixed-size hashes of all the modules >>> back-to-back, so it's as small as possible? >>> >>> But I wouldn't mind spending some more time discussing whether there's a >>> better way to keep these things streamlined/symmetric/the same between >>> modular and non-modular debug info. >> >> Sure! >> Now that we established that recording the list of imported modules for >> every CU is useful for an AST-based debugger, >> >> +Richard, just to see if he's got some ideas about how a debugger might >> efficiently use modules to support debugger scenarios and whether or not >> having a list of which modules are referenced from which contexts is >> valuable in that. >> >> It still concerns me that this would create something of a >> regression/oddity/difference between AST-based debug info (you wouldn't be >> able to handle expressions referencing things in other TUs) and non-AST >> based debug info (where I think the average user is used to not worrying >> about what headers are included in the current file they're debugging when >> they try to use a type or other identifier) > > If I understood you correctly, this is not actually the case. The list of > imported modules allows the AST-based debugger to import all the modules that > were imported by the CU that the current frame is in. This enables the user > to, e.g., type "p myVector->size()" even though std::vector<MyClass>::size() > was not used by the CU and is thus not available in DWARF. > If the user types “p foo” even though foo was not defined in any imported > module the debugger can — after failing to import foo via clang — still fall > back to looking up foo in DWARF and do what it always did. > > If you do the DWARF fallback then you'll get a pretty clear inconsistency > between templates and non-templates. If I have a function foo and a function > template foo_tmpl in one file, and I'm debugging in another file I'll be able > to call 'foo' (normal DWARF fallback/search) but not foo_tmpl (if I'm calling > a new instantiation of foo_tmpl - if I'm calling an existing instantiation > presumably the fallback would catch me). Seems unfortunate/confusing, perhaps.
Good point, but it my guess is that this wouldn’t be any worse than the “why can’t I print the size() of this vector!?”-situations we have at the moment. In certain situations (i.e., non-templates) the debugger could use the DWARF in the modules to print a message about which module to import. > > >> >> let’s talk about how to most efficiently represent this information. >> >> In the CU, using DW_TAG_imported_module appears to be the most appropriate >> choice, even though there is some room for confusion since C++ using >> declarations are also represented this way. Inside the >> DW_TAG_imported_module, we could use >> (1) a DW_AT_import that references the skeleton (I hope that is the right >> terminology) CU for the module, the idea being that the skeleton CU would >> contain all the details (flags, name, include dirs, hash, ...) and be in a >> comdat'ed section. >> >> I'd be concerned about overloading the terminology & confusing other >> debuggers - they might try to follow the DW_AT_import and be surprised that >> it doesn't refer to a DW_TAG_namespace tag. > > That’s a valid concern, and we probably should not be emitting this if we > have any evidence of, e.g., gdb crashes when encountering such a construct. > Then again, we would be using a DW_TAG_imported_module to express what it is > meant to express according to the DWARF spec (namely importing a module)... > but I admit that the tag also does have a very specific meaning for C++, > which we maybe shouldn’t overload. > > That's my concern, yes. > > The right thing here is probably to put aside my personal sense of aesthetics > and use a private _LLVM_ namespace for all new additions, and then attempt to > standardize an official DWARF version once we know what is really needed and > what isn't. > > I'd prefer this, yes. I mean the usual bar we use for language features is > that they're at least proposed for standardization before we adopt them in > clang - I wouldn't mind a similar bar here. If you want to bring up this use > of DW_TAG_imported_module with the DWARF committee & see if it sounds > reasonable (& test/inquire about GDB's behavior here). I started a thread on dwarf-discuss to this end (http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org <http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org>, the list archives are only visible to subscribers, but anyone can subscribe). -- adrian > > But extension tags seems like the conservatively correct option (not sure > what GDB does on tags it doesn't recognize - I forget if it warns or just > completely ignores them, hopefully the latter) >> >> (2) David’s suggestion of using a custom form that records the module hash >> directly is quite space-efficient, but it has the drawback of not being >> resilient against small changes to the imported module >> >> That's going to be true of the normal fission info here (the skeleton CU and >> the full CU in the .dwo file (or module) are associated by hash) - granted, >> in the "loading an AST" mode, you can ignore those hashes and rely on your >> custom attributes instead. >> >> , since clang’s module hash changes each time the module is being rebuilt. >> >> Clang's module hash only changes if the DWARF contents change - it doesn't >> use a timestamp or anything. It seems like actually you're going to want to >> fail to load even more aggressively - there are ways the AST might've >> changed that the debug info doesn't reflect but are still important (a type >> unreferenced in this module, but built into some other code that is not >> built with debug info changes - no hash changes because the debug info for >> that type is unreferenced here, but if you try to use it you could have an >> incompatible layout, etc). > > Agreed: If the module contents changed the debugger needs to display a big > flashing "here be dragons" warning. >> >> This is less of an issue if the hash is referring to a skeleton CU in the >> same file, which contains all the detailed information. >> >> Personally I’d prefer option 1 because mostly uses the existing mechanisms >> from DWARF. Here’s a visual guide to the options on the table: >> >> (1) >> foo.o (compiled with, let’s call it .. "-gmodule-imports”) >> ----- >> .debug_info: >> DW_TAG_compile_unit >> DW_AT_name(“foo.c”) >> DW_TAG_imported_module >> DW_AT_import(DW_FORM_ref_addr 0x123) // Could be a FORM_ref_sig8 >> 0x1234ABCDE as well. >> DW_TAG_imported_module >> DW_AT_import(...) >> >> .debug_info.dwo: >> // Skeleton CUs for modules imported by foo.o. >> 0x123: >> DW_TAG_compile_unit >> // Used by split-dwarf debuggers to find external type definitions. >> >> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”) >> DW_AT_dwo_id(“0x1234ABCDE”) >> >> // Used by AST-based debuggers to import the module. >> DW_AT_name(“Foundation”) >> >> (side notes: the mixed indentation here makes it a bit hard to read this >> example, and I'd make sure /all/ the extended attributes (including the name >> here) use custom attribute names, not standard ones) > > Agreed. > >> >> DW_AT_LLVM_sysroot(“/“) >> DW_AT_LLVM_include_dir(“”) >> DW_AT_LLVM_macros(“-DNDEBUG”) >> >> (2) >> .debug_info.dwo: >> (As above.) >> >> .debug_info: >> DW_TAG_compile_unit >> DW_AT_name(“foo.c”) >> DW_AT_LLVM_imported_modules(DW_FORM_block 0x1234ABCDE 0xDEADBEEF 0x....) >> >> Now I’m curious what option (3) will look like; the one that we’ll actually >> implement! >> >> ;) >> > > -- adrian > >
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
