On Mon, Mar 16, 2015 at 1:24 PM, Adrian Prantl <[email protected]> wrote:
> > On Mar 10, 2015, at 12:10 PM, David Blaikie <[email protected]> wrote: > > > > On Tue, Mar 10, 2015 at 12:05 PM, Adrian Prantl <[email protected]> wrote: > >> >> On Mar 9, 2015, at 5:16 PM, David Blaikie <[email protected]> wrote: >> >> >> >> On Mon, Mar 9, 2015 at 5:07 PM, Adrian Prantl <[email protected]> wrote: >> >>> >>> On Mar 9, 2015, at 2:14 PM, David Blaikie <[email protected]> wrote: >>> >>> >>> >>> On Mon, Mar 9, 2015 at 1:52 PM, Adrian Prantl <[email protected]> wrote: >>> >>>> >>>> On Feb 24, 2015, at 3:06 PM, David Blaikie <[email protected]> wrote: >>>> On Tue, Feb 24, 2015 at 2:56 PM, Adrian Prantl <[email protected]> >>>> wrote: >>>> >>>>> On Feb 24, 2015, at 2:36 PM, David Blaikie <[email protected]> wrote: >>>>> >>>>> On Mon, Feb 23, 2015 at 3:45 PM, Adrian Prantl <[email protected]> >>>>> wrote: >>>>> >>>>>> On Feb 23, 2015, at 3:37 PM, David Blaikie <[email protected]> >>>>>> wrote: >>>>>> >>>>>> On Mon, Feb 23, 2015 at 3:32 PM, Adrian Prantl <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> On Feb 23, 2015, at 3:14 PM, David Blaikie <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>> On Mon, Feb 23, 2015 at 3:08 PM, Adrian Prantl <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> On Feb 23, 2015, at 2:59 PM, David Blaikie <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>> On Mon, Feb 23, 2015 at 2:51 PM, Adrian Prantl <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> > On Jan 20, 2015, at 11:07 AM, David Blaikie <[email protected]> >>>>>>>>> wrote: >>>>>>>>> > >>>>>>>>> > My vague recollection from the previous design discussions was >>>>>>>>> that these module references would be their own 'unit' COMDAT'd so >>>>>>>>> that we >>>>>>>>> don't end up with the duplication of every module reference in every >>>>>>>>> unit >>>>>>>>> linked together when linking debug info? >>>>>>>>> > >>>>>>>>> > I think in my brain I'd been picturing this module reference as >>>>>>>>> being an extended fission reference (fission skeleton CU + extra >>>>>>>>> fields for >>>>>>>>> users who want to load the Clang AST module directly and skip the >>>>>>>>> split CU). >>>>>>>>> >>>>>>>>> Apologies for letting this rest for so long. >>>>>>>>> >>>>>>>>> Your memory was of course correct and I didn’t follow up on this >>>>>>>>> because I had convinced myself that the fission reference would be >>>>>>>>> completely sufficient. Now that I’ve been thinking some more about >>>>>>>>> it, I >>>>>>>>> don’t think that it is sufficient in the LTO case. >>>>>>>>> >>>>>>>>> Here is the example from the >>>>>>>>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html >>>>>>>>> : >>>>>>>>> >>>>>>>>> foo.o: >>>>>>>>> .debug_info.dwo >>>>>>>>> DW_TAG_compile_unit >>>>>>>>> // For DWARF consumers >>>>>>>>> DW_AT_dwo_name ("/path/to/module-cache/MyModule.pcm") >>>>>>>>> DW_AT_dwo_id ([unique AST signature]) >>>>>>>>> >>>>>>>>> .debug_info >>>>>>>>> DW_TAG_compile_unit >>>>>>>>> DW_TAG_variable >>>>>>>>> DW_AT_name "x" >>>>>>>>> DW_AT_type (DW_FORM_ref_sig8) ([hash for MyStruct]) >>>>>>>>> >>>>>>>>> In this example it is clear that foo.o imported MyModule because >>>>>>>>> its DWO skeleton is there in the same object file. But if we deal >>>>>>>>> with the >>>>>>>>> result of an LTO compilation we will end up with many compile units >>>>>>>>> in the >>>>>>>>> same .debug_info section, plus a bunch of skeleton compile units for >>>>>>>>> _all_ >>>>>>>>> imported modules in the entire project. We thus loose the ability to >>>>>>>>> determine which of the compile units imported which module. >>>>>>>>> >>>>>>>> >>>>>>>> Why would we need to know which CU imported which modules? (I can >>>>>>>> imagine some possible reasons, but wondering what you have in mind) >>>>>>>> >>>>>>>> >>>>>>>> When the debugger is stopped at a breakpoint and the user wants to >>>>>>>> evaluate an expression, it should import the modules that are >>>>>>>> available at >>>>>>>> this location, so the user can write the expression from within the >>>>>>>> context >>>>>>>> of the breakpoint (e.g., without having to fully qualify each type, >>>>>>>> etc). >>>>>>>> >>>>>>> >>>>>>> I'm not sure how much current debuggers actually worry about that - >>>>>>> (& this may differ from lldb to gdb to other things, of course). I'm >>>>>>> pretty >>>>>>> sure at least for GDB, a context in one CU is as good as one in another >>>>>>> (at >>>>>>> least without split-dwarf, type units, etc - with those sometimes things >>>>>>> end up overly restrictive as the debugger won't search everything >>>>>>> properly). >>>>>>> >>>>>>> eg: if you have a.cpp: int main() { }, b.cpp: void func() { } and >>>>>>> you run 'start' in gdb (which breaks at the beginning of main) you can >>>>>>> still run 'p func()' to call the func, even though there's no >>>>>>> declaration >>>>>>> of it in a.cpp, etc. >>>>>>> >>>>>>> >>>>>>> LLDB would definitely care (as it is using clang for the expression >>>>>>> evaluation supporting these kinds of features is really straightforward >>>>>>> there). By importing the modules (rather than searching through the >>>>>>> DWARF), >>>>>>> the expression evaluator gains access to additional declarations that >>>>>>> are >>>>>>> not there in the DWARF, such as templates. But since clang modules are >>>>>>> not >>>>>>> namespaces, we can’t generally "import the world” as a debugger would >>>>>>> usually do. >>>>>>> >>>>>> >>>>>> Sorry, not sure I understand this last sentence - could you explain >>>>>> further? >>>>>> >>>>>> I imagine it would be rather limiting for the user if they could only >>>>>> use expressions that are valid in this file from the file - it wouldn't >>>>>> be >>>>>> uncommon to want to call a function from another module/file/etc to aid >>>>>> in >>>>>> debugging. >>>>>> >>>>>> >>>>>> Usually LLDB’s expression evaluator works by creating a clang AST >>>>>> type out of a DWARF type and inserting it into its AST context. We could >>>>>> pre-polulate it with the definitions from the imported modules (with all >>>>>> sorts of benefits as described above), but that only works if no two >>>>>> modules conflict. If the declaration can’t be found in any imported >>>>>> module, >>>>>> LLDB would still import it from DWARF in the “traditional” fashion. >>>>>> >>>>> >>>>> But it would import it from DWARF in other TUs rather than use the >>>>> module info just because the module wasn't directly referenced from this >>>>> TU? That would seem strange to me. (you would lose debug info fidelity (by >>>>> falling back to DWARF even though there are modules with the full fidelity >>>>> info) unnecessarily, it sounds like) >>>>> >>>>> >>>>> I think it’s reasonable to expect full fidelity for everything that is >>>>> available in the current TU, and having the normal DWARF-based debugging >>>>> capabilities for everything beyond that. But we can only ever provide full >>>>> fidelity if we have the list of imports for the current TU. >>>>> >>>>> >>>>> Would it be reasonable to use the accelerator table/index to lookup >>>>> the types, then if the type is in the module you could use the module >>>>> rather than the DWARF stashed alongside it? (so the comdat'd split-dwarf >>>>> skeleton CU for the module would have an index to tell you what names are >>>>> inside it, but if you got an index hit you'd just look at the module >>>>> instead of loading the split-dwarf debug info in the referenced file) >>>>> >>>>> >>>>> I don’t think this approach would work for templates and enumerator >>>>> values; >>>>> >>>> >>>> Not sure why enumerator values are an issue - but templates (& all >>>> manner of other things that don't make it into the index, unfortunately), >>>> sure. >>>> >>>> >>>>> they aren’t in the accelerator tables to begin with. It would also be >>>>> slower if the declaration is available in a module. >>>>> >>>> >>>> Though you're rapidly going to end up loading a lot of modules in (as >>>> you go up & down a stack printing various things you'll cross into other >>>> TUs & load more modules). >>>> >>>> For a standard DWARF consumer, it seems fine to just have a comdat'd >>>> skeleton CU for a module without the need for other CUs to mention which >>>> module CUs they reference (but I could be wrong here) & that's the design >>>> we originally discussed. >>>> >>>> It would seem unfortunate to bloat every CU with a non-deduplicable >>>> list of every module it references, but if that's necessary for a >>>> serialized AST aware debugger, it might be fine to have it as an option (so >>>> long as it can be turned off) & may still benefit from that list not being >>>> the authoritative module reference, but a /very/ terse reference to it so >>>> all the extra flags & stuff can be in the deduplicable comdat (& to keep it >>>> as consistent as possible between the flag (on/off) codepaths for this >>>> extra data). Maybe a FORM_block (?) of fixed-size hashes of all the modules >>>> back-to-back, so it's as small as possible? >>>> >>>> But I wouldn't mind spending some more time discussing whether there's >>>> a better way to keep these things streamlined/symmetric/the same between >>>> modular and non-modular debug info. >>>> >>>> Sure! >>>> Now that we established that recording the list of imported modules for >>>> every CU is useful for an AST-based debugger, >>>> >>> >>> +Richard, just to see if he's got some ideas about how a debugger might >>> efficiently use modules to support debugger scenarios and whether or not >>> having a list of which modules are referenced from which contexts is >>> valuable in that. >>> >>> It still concerns me that this would create something of a >>> regression/oddity/difference between AST-based debug info (you wouldn't be >>> able to handle expressions referencing things in other TUs) and non-AST >>> based debug info (where I think the average user is used to not worrying >>> about what headers are included in the current file they're debugging when >>> they try to use a type or other identifier) >>> >>> >>> If I understood you correctly, this is not actually the case. The list >>> of imported modules allows the AST-based debugger to import all the modules >>> that were imported by the CU that the current frame is in. This enables the >>> user to, e.g., type "p myVector->size()" even though >>> std::vector<MyClass>::size() was not used by the CU and is thus not >>> available in DWARF. >>> >> If the user types “p foo” even though foo was not defined in any imported >>> module the debugger can — after failing to import foo via clang — still >>> fall back to looking up foo in DWARF and do what it always did. >>> >> >> If you do the DWARF fallback then you'll get a pretty clear inconsistency >> between templates and non-templates. If I have a function foo and a >> function template foo_tmpl in one file, and I'm debugging in another file >> I'll be able to call 'foo' (normal DWARF fallback/search) but not foo_tmpl >> (if I'm calling a new instantiation of foo_tmpl - if I'm calling an >> existing instantiation presumably the fallback would catch me). Seems >> unfortunate/confusing, perhaps. >> >> >> Good point, but it my guess is that this wouldn’t be any worse than the >> “why can’t I print the size() of this vector!?”-situations we have at the >> moment. >> > > Sure - it's strictly better in the sense that there are strictly more > expressions that can be evaluated, but seems incomplete is my point, and > maybe worth considering alternative designs that might be more-betterer. > > >> In certain situations (i.e., non-templates) the debugger could use the >> DWARF in the modules to print a message about which module to import. >> >> >> >>> >>> >>> >>>> let’s talk about how to most efficiently represent this information. >>>> >>>> In the CU, using DW_TAG_imported_module appears to be the most >>>> appropriate choice, even though there is some room for confusion since C++ >>>> using declarations are also represented this way. Inside the >>>> DW_TAG_imported_module, we could use >>>> (1) a DW_AT_import that references the skeleton (I hope that is the >>>> right terminology) CU for the module, the idea being that the skeleton CU >>>> would contain all the details (flags, name, include dirs, hash, ...) and be >>>> in a comdat'ed section. >>>> >>> >>> I'd be concerned about overloading the terminology & confusing other >>> debuggers - they might try to follow the DW_AT_import and be surprised that >>> it doesn't refer to a DW_TAG_namespace tag. >>> >>> >>> That’s a valid concern, and we probably should not be emitting this if >>> we have any evidence of, e.g., gdb crashes when encountering such a >>> construct. Then again, we would be using a DW_TAG_imported_module to >>> express what it is meant to express according to the DWARF spec (namely >>> importing a module)... but I admit that the tag also does have a very >>> specific meaning for C++, which we maybe shouldn’t overload. >>> >> >> That's my concern, yes. >> >> >>> The right thing here is probably to put aside my personal sense of >>> aesthetics and use a private _LLVM_ namespace for all new additions, and >>> then attempt to standardize an official DWARF version once we know what is >>> really needed and what isn't. >>> >> >> I'd prefer this, yes. I mean the usual bar we use for language features >> is that they're at least proposed for standardization before we adopt them >> in clang - I wouldn't mind a similar bar here. If you want to bring up this >> use of DW_TAG_imported_module with the DWARF committee & see if it sounds >> reasonable (& test/inquire about GDB's behavior here). >> >> >> I started a thread on dwarf-discuss to this end ( >> http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org, the >> list archives are only visible to subscribers, but anyone can subscribe). >> > > Cool cool > > > To paraphrase the replies that my question solicited: We are, perhaps not > very surprising, encouraged to follow the standard and use a > DW_TAG_imported_module that references a DW_TAG_module. If we, however, > choose to describe the module by using a skeleton DW_TAG_compile_unit, we > should be careful (my own words) about using a DW_TAG_imported_module until > that use is sanctioned by the standard. > > I see two possible ways to proceed in this spirit: > a) Rename the module skeleton DW_TAG_compile_units to DW_TAG_module, but > keep all the comdat/split dwarf goodness from the original proposal [1]. My > understanding is that even though we are making clever use of the split > DWARF features, GDB would still need to be taught to follow references to > external files, > Not sure what you're referring to here, perhaps a misunderstanding about how split DWARF works. To the best of my knowledge, what we've talked about for module DWARF debug info is actually just split-dwarf, no extra work required by DWARF consumers*. * It's, admittedly, a little tricksy to include type unit references in an object file that doesn't include the type unit at all - relying on it being linked into the final executable. But DWARF doesn't really talk about objects versus executables, etc - so, so long as the type unit is there in the end, it's valid DWARF no matter how it got there (& should work fine for existing consumers - they can't tell if the type unit was in every object file that referenced the type or not once it's been linked and deduplicated). > so having it recognize a new tag in this context doesn’t appear to be much > additional effort (but others may provide more insight here). > Beyond the above (that using a new tag would mean this would go from 'free' to 'not free' for GDB) having a new top level tag is pretty substantial (we only have two at the moment, and with our talk of modules being a "bag of dwarf" might go back to having one top level tag? (it's not clear to me from DWARF4 whether DW_TAG_module is currently a top-level tag, I don't think it is?) > b) Emit an LLVM-specific DW_AT_LLVM_import attribute inside the > DW_TAG_imported_module (or vice versa) that refers to the skeleton > DW_TAG_compile_unit. > > I think that option (a) is a bit more elegant and it is bending the dwarf > standard not quite as much and will make the dwarf output a bit more > readable. > > -- adrian > > [1] Module debugging proposal for reference: > http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html > > > - David > > >> >> -- adrian >> >> >> But extension tags seems like the conservatively correct option (not sure >> what GDB does on tags it doesn't recognize - I forget if it warns or just >> completely ignores them, hopefully the latter) >> >>> >>> >>>> (2) David’s suggestion of using a custom form that records the module >>>> hash directly is quite space-efficient, but it has the drawback of not >>>> being resilient against small changes to the imported module >>>> >>> >>> That's going to be true of the normal fission info here (the skeleton CU >>> and the full CU in the .dwo file (or module) are associated by hash) - >>> granted, in the "loading an AST" mode, you can ignore those hashes and rely >>> on your custom attributes instead. >>> >>> >>>> , since clang’s module hash changes each time the module is being >>>> rebuilt. >>>> >>> >>> Clang's module hash only changes if the DWARF contents change - it >>> doesn't use a timestamp or anything. It seems like actually you're going to >>> want to fail to load even more aggressively - there are ways the AST >>> might've changed that the debug info doesn't reflect but are still >>> important (a type unreferenced in this module, but built into some other >>> code that is not built with debug info changes - no hash changes because >>> the debug info for that type is unreferenced here, but if you try to use it >>> you could have an incompatible layout, etc). >>> >>> >>> Agreed: If the module contents changed the debugger needs to display a >>> big flashing "here be dragons" warning. >>> >>> >>> >>>> This is less of an issue if the hash is referring to a skeleton CU in >>>> the same file, which contains all the detailed information. >>>> >>>> Personally I’d prefer option 1 because mostly uses the existing >>>> mechanisms from DWARF. Here’s a visual guide to the options on the table: >>>> >>>> (1) >>>> foo.o (compiled with, let’s call it .. "-gmodule-imports”) >>>> ----- >>>> .debug_info: >>>> DW_TAG_compile_unit >>>> DW_AT_name(“foo.c”) >>>> DW_TAG_imported_module >>>> DW_AT_import(DW_FORM_ref_addr 0x123) // Could be a FORM_ref_sig8 >>>> 0x1234ABCDE as well. >>>> DW_TAG_imported_module >>>> DW_AT_import(...) >>>> >>>> .debug_info.dwo: >>>> // Skeleton CUs for modules imported by foo.o. >>>> 0x123: >>>> DW_TAG_compile_unit >>>> // Used by split-dwarf debuggers to find external type definitions. >>>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/ >>>> Foundation.pcm”) >>>> DW_AT_dwo_id(“0x1234ABCDE”) >>>> >>>> // Used by AST-based debuggers to import the module. >>>> DW_AT_name(“Foundation”) >>>> >>> >>> (side notes: the mixed indentation here makes it a bit hard to read this >>> example, and I'd make sure /all/ the extended attributes (including the >>> name here) use custom attribute names, not standard ones) >>> >>> >>> Agreed. >>> >>> >>> >>>> DW_AT_LLVM_sysroot(“/“) >>>> DW_AT_LLVM_include_dir(“”) >>>> DW_AT_LLVM_macros(“-DNDEBUG”) >>>> >>>> (2) >>>> .debug_info.dwo: >>>> (As above.) >>>> >>>> .debug_info: >>>> DW_TAG_compile_unit >>>> DW_AT_name(“foo.c”) >>>> DW_AT_LLVM_imported_modules(DW_FORM_block 0x1234ABCDE 0xDEADBEEF >>>> 0x....) >>>> >>>> Now I’m curious what option (3) will look like; the one that we’ll >>>> actually implement! >>>> >>> >>> ;) >>> >>> >>> >>> -- adrian >>> >>> >> >> > >
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
