Re: [PATCH] Have clang list the imported modules in the debug info

Adrian Prantl Mon, 16 Mar 2015 17:16:00 -0700

Thanks for the explanation David, I missed that it is entirely the linker's (or 
some dwarf post-processor's) responsibility to find the module files and link 
in the debug info from the .pcm files, so debugger doesn’t notice a difference.


> On Mar 16, 2015, at 2:55 PM, David Blaikie <[email protected]> wrote:
> 
> 
> 
> On Mon, Mar 16, 2015 at 2:45 PM, Robinson, Paul 
> <[email protected] 
> <mailto:[email protected]>> wrote:
> Beyond the above (that using a new tag would mean this would go from 'free' 
> to 'not free' for GDB) having a new top level tag is pretty substantial (we 
> only have two at the moment, and with our talk of modules being a "bag of 
> dwarf" might go back to having one top level tag? (it's not clear to me from 
> DWARF4 whether DW_TAG_module is currently a top-level tag, I don't think it 
> is?)
> 
> 
> The .debug_info section contains one or more compilation units, partial 
> units, or in DWARF 5, type units.  DW_TAG_module isn't a unit, if you want it 
> to be handled independently then it would need to be wrapped in a 
> DW_TAG_partial_unit.  You would probably then use DW_TAG_imported_unit to 
> refer to it, rather than DW_TAG_imported_module. <>
> 
> This makes a fair bit of sense - though the terminology's never going to 
> quite line up with modules, I suspect, and this would still require modifying 
> existing consumers (well, GDB) that can handle split-dwarf today, I suspect 
> (not sure how it'd handle partial_unit - maybe that does work? - and still 
> don't know how existing consumers would handle imported_unit either - could 
> be worth some testing, as it sounds sort of right out of several less right 
> options).

The standard specifically recommends DW_TAG_partial_unit for #include 
directives so that sounds like a comparatively good match. Partial units were 
already introduced in DWARF3 so maybe GDB supports them. But even if it doesn’t 
this shouldn’t necessarily be a problem (unless it crashes). The 
DW_TAG_imported_unit since this is primarily useful for AST-based debuggers 
that know how to import a module before expression evaluation.

-- adrian

> - David 
> (Sorry about the top-quoting but Outlook can't handle HTML editing properly.)
> 

Unfortunately the gmail client somewhat forces a thread to HTML — gmail 
quotation markers mysteriously disappear in the plain text version displayed by 
other mail clients.

> --paulr
> 
>  
> 
> From: David Blaikie [mailto:[email protected] <mailto:[email protected]>] 
> Sent: Monday, March 16, 2015 1:36 PM
> To: Adrian Prantl
> Cc: Richard Smith; Eric Christopher; llvm cfe; Greg Clayton; Robinson, Paul
> Subject: Re: [PATCH] Have clang list the imported modules in the debug info
> 
>  
> 
>  
> 
>  
> 
> On Mon, Mar 16, 2015 at 1:24 PM, Adrian Prantl <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>  
> 
> On Mar 10, 2015, at 12:10 PM, David Blaikie <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>  
> 
>  
> 
>  
> 
> On Tue, Mar 10, 2015 at 12:05 PM, Adrian Prantl <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>  
> 
> On Mar 9, 2015, at 5:16 PM, David Blaikie <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>  
> 
>  
> 
>  
> 
> On Mon, Mar 9, 2015 at 5:07 PM, Adrian Prantl <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>  
> 
> On Mar 9, 2015, at 2:14 PM, David Blaikie <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>  
> 
>  
> 
>  
> 
> On Mon, Mar 9, 2015 at 1:52 PM, Adrian Prantl <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>  
> 
> On Feb 24, 2015, at 3:06 PM, David Blaikie <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> On Tue, Feb 24, 2015 at 2:56 PM, Adrian Prantl <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> On Feb 24, 2015, at 2:36 PM, David Blaikie <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> On Mon, Feb 23, 2015 at 3:45 PM, Adrian Prantl <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> On Feb 23, 2015, at 3:37 PM, David Blaikie <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> On Mon, Feb 23, 2015 at 3:32 PM, Adrian Prantl <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> On Feb 23, 2015, at 3:14 PM, David Blaikie <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> On Mon, Feb 23, 2015 at 3:08 PM, Adrian Prantl <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> On Feb 23, 2015, at 2:59 PM, David Blaikie <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> On Mon, Feb 23, 2015 at 2:51 PM, Adrian Prantl <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> 
> > On Jan 20, 2015, at 11:07 AM, David Blaikie <[email protected] 
> > <mailto:[email protected]>> wrote:
> >
> > My vague recollection from the previous design discussions was that these 
> > module references would be their own 'unit' COMDAT'd so that we don't end 
> > up with the duplication of every module reference in every unit linked 
> > together when linking debug info?
> >
> > I think in my brain I'd been picturing this module reference as being an 
> > extended fission reference (fission skeleton CU + extra fields for users 
> > who want to load the Clang AST module directly and skip the split CU).
> 
> Apologies for letting this rest for so long.
> 
> Your memory was of course correct and I didn’t follow up on this because I 
> had convinced myself that the fission reference would be completely 
> sufficient. Now that I’ve been thinking some more about it, I don’t think 
> that it is sufficient in the LTO case.
> 
> Here is the example from 
> thehttp://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html 
> <http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html>:
> 
> foo.o:
> .debug_info.dwo
>   DW_TAG_compile_unit
>      // For DWARF consumers
>      DW_AT_dwo_name ("/path/to/module-cache/MyModule.pcm")
>      DW_AT_dwo_id   ([unique AST signature])
> 
> .debug_info
>   DW_TAG_compile_unit
>     DW_TAG_variable
>       DW_AT_name "x"
>       DW_AT_type (DW_FORM_ref_sig8) ([hash for MyStruct])
> 
> In this example it is clear that foo.o imported MyModule because its DWO 
> skeleton is there in the same object file. But if we deal with the result of 
> an LTO compilation we will end up with many compile units in the same 
> .debug_info section, plus a bunch of skeleton compile units for _all_ 
> imported modules in the entire project. We thus loose the ability to 
> determine which of the compile units imported which module.
> 
> 
> Why would we need to know which CU imported which modules? (I can imagine 
> some possible reasons, but wondering what you have in mind)
> 
>  
> 
> When the debugger is stopped at a breakpoint and the user wants to evaluate 
> an expression, it should import the modules that are available at this 
> location, so the user can write the expression from within the context of the 
> breakpoint (e.g., without having to fully qualify each type, etc).
> 
> 
> I'm not sure how much current debuggers actually worry about that - (& this 
> may differ from lldb to gdb to other things, of course). I'm pretty sure at 
> least for GDB, a context in one CU is as good as one in another (at least 
> without split-dwarf, type units, etc - with those sometimes things end up 
> overly restrictive as the debugger won't search everything properly).
> 
> eg: if you have a.cpp: int main() { }, b.cpp: void func() { } and you run 
> 'start' in gdb (which breaks at the beginning of main) you can still run 'p 
> func()' to call the func, even though there's no declaration of it in a.cpp, 
> etc.
> 
>  
> 
> LLDB would definitely care (as it is using clang for the expression 
> evaluation supporting these kinds of features is really straightforward 
> there). By importing the modules (rather than searching through the DWARF), 
> the expression evaluator gains access to additional declarations that are not 
> there in the DWARF, such as templates. But since clang modules are not 
> namespaces, we can’t generally "import the world” as a debugger would usually 
> do.
> 
> 
> Sorry, not sure I understand this last sentence - could you explain further?
> 
> I imagine it would be rather limiting for the user if they could only use 
> expressions that are valid in this file from the file - it wouldn't be 
> uncommon to want to call a function from another module/file/etc to aid in 
> debugging.
> 
>  
> 
> Usually LLDB’s expression evaluator works by creating a clang AST type out of 
> a DWARF type and inserting it into its AST context. We could pre-polulate it 
> with the definitions from the imported modules (with all sorts of benefits as 
> described above), but that only works if no two modules conflict. If the 
> declaration can’t be found in any imported module, LLDB would still import it 
> from DWARF in the “traditional” fashion.
> 
> 
> But it would import it from DWARF in other TUs rather than use the module 
> info just because the module wasn't directly referenced from this TU? That 
> would seem strange to me. (you would lose debug info fidelity (by falling 
> back to DWARF even though there are modules with the full fidelity info) 
> unnecessarily, it sounds like)
> 
>  
> 
> I think it’s reasonable to expect full fidelity for everything that is 
> available in the current TU, and having the normal DWARF-based debugging 
> capabilities for everything beyond that. But we can only ever provide full 
> fidelity if we have the list of imports for the current TU.
> 
> 
> 
> Would it be reasonable to use the accelerator table/index to lookup the 
> types, then if the type is in the module you could use the module rather than 
> the DWARF stashed alongside it? (so the comdat'd split-dwarf skeleton CU for 
> the module would have an index to tell you what names are inside it, but if 
> you got an index hit you'd just look at the module instead of loading the 
> split-dwarf debug info in the referenced file)
> 
>  
> 
> I don’t think this approach would work for templates and enumerator values;
> 
> 
> Not sure why enumerator values are an issue - but templates (& all manner of 
> other things that don't make it into the index, unfortunately), sure.
>  
> 
> they aren’t in the accelerator tables to begin with. It would also be slower 
> if the declaration is available in a module.
> 
> 
> Though you're rapidly going to end up loading a lot of modules in (as you go 
> up & down a stack printing various things you'll cross into other TUs & load 
> more modules).
> 
> For a standard DWARF consumer, it seems fine to just have a comdat'd skeleton 
> CU for a module without the need for other CUs to mention which module CUs 
> they reference (but I could be wrong here) & that's the design we originally 
> discussed.
> 
> It would seem unfortunate to bloat every CU with a non-deduplicable list of 
> every module it references, but if that's necessary for a serialized AST 
> aware debugger, it might be fine to have it as an option (so long as it can 
> be turned off) & may still benefit from that list not being the authoritative 
> module reference, but a /very/ terse reference to it so all the extra flags & 
> stuff can be in the deduplicable comdat (& to keep it as consistent as 
> possible between the flag (on/off) codepaths for this extra data). Maybe a 
> FORM_block (?) of fixed-size hashes of all the modules back-to-back, so it's 
> as small as possible?
> 
> But I wouldn't mind spending some more time discussing whether there's a 
> better way to keep these things streamlined/symmetric/the same between 
> modular and non-modular debug info.
> 
> Sure!
> 
> Now that we established that recording the list of imported modules for every 
> CU is useful for an AST-based debugger,
> 
> 
> +Richard, just to see if he's got some ideas about how a debugger might 
> efficiently use modules to support debugger scenarios and whether or not 
> having a list of which modules are referenced from which contexts is valuable 
> in that.
> 
> It still concerns me that this would create something of a 
> regression/oddity/difference between AST-based debug info (you wouldn't be 
> able to handle expressions referencing things in other TUs) and non-AST based 
> debug info (where I think the average user is used to not worrying about what 
> headers are included in the current file they're debugging when they try to 
> use a type or other identifier)
> 
>  
> 
> If I understood you correctly, this is not actually the case. The list of 
> imported modules allows the AST-based debugger to import all the modules that 
> were imported by the CU that the current frame is in. This enables the user 
> to, e.g., type "p myVector->size()" even though std::vector<MyClass>::size() 
> was not used by the CU and is thus not available in DWARF. 
> 
> If the user types “p foo” even though foo was not defined in any imported 
> module the debugger can — after failing to import foo via clang — still fall 
> back to looking up foo in DWARF and do what it always did.
> 
> 
> If you do the DWARF fallback then you'll get a pretty clear inconsistency 
> between templates and non-templates. If I have a function foo and a function 
> template foo_tmpl in one file, and I'm debugging in another file I'll be able 
> to call 'foo' (normal DWARF fallback/search) but not foo_tmpl (if I'm calling 
> a new instantiation of foo_tmpl - if I'm calling an existing instantiation 
> presumably the fallback would catch me). Seems unfortunate/confusing, perhaps.
> 
>  
> 
> Good point, but it my guess is that this wouldn’t be any worse than the “why 
> can’t I print the size() of this vector!?”-situations we have at the moment.
> 
> 
> Sure - it's strictly better in the sense that there are strictly more 
> expressions that can be evaluated, but seems incomplete is my point, and 
> maybe worth considering alternative designs that might be more-betterer.
>  
> 
> In certain situations (i.e., non-templates) the debugger could use the DWARF 
> in the modules to print a message about which module to import.
> 
> 
>  
> 
> 
> 
> 
>  
> 
> let’s talk about how to most efficiently represent this information.
> 
>  
> 
> In the CU, using DW_TAG_imported_module appears to be the most appropriate 
> choice, even though there is some room for confusion since C++ using 
> declarations are also represented this way. Inside the 
> DW_TAG_imported_module, we could use 
> 
> (1) a DW_AT_import that references the skeleton (I hope that is the right 
> terminology) CU for the module, the idea being that the skeleton CU would 
> contain all the details (flags, name, include dirs, hash, ...) and be in a 
> comdat'ed section.
> 
> 
> I'd be concerned about overloading the terminology & confusing other 
> debuggers - they might try to follow the DW_AT_import and be surprised that 
> it doesn't refer to a DW_TAG_namespace tag.
> 
>  
> 
> That’s a valid concern, and we probably should not be emitting this if we 
> have any evidence of, e.g., gdb crashes when encountering such a construct. 
> Then again, we would be using a DW_TAG_imported_module to express what it is 
> meant to express according to the DWARF spec (namely importing a module)... 
> but I admit that the tag also does have a very specific meaning for C++, 
> which we maybe shouldn’t overload.
> 
> 
> That's my concern, yes.
>  
> 
> The right thing here is probably to put aside my personal sense of aesthetics 
> and use a private _LLVM_ namespace for all new additions, and then attempt to 
> standardize an official DWARF version once we know what is really needed and 
> what isn't.
> 
> 
> I'd prefer this, yes. I mean the usual bar we use for language features is 
> that they're at least proposed for standardization before we adopt them in 
> clang - I wouldn't mind a similar bar here. If you want to bring up this use 
> of DW_TAG_imported_module with the DWARF committee & see if it sounds 
> reasonable (& test/inquire about GDB's behavior here).
> 
>  
> 
> I started a thread on dwarf-discuss to this end 
> (http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org 
> <http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org>, the list 
> archives are only visible to subscribers, but anyone can subscribe).
> 
> 
> Cool cool
> 
>  
> 
> To paraphrase the replies that my question solicited: We are, perhaps not 
> very surprising, encouraged to follow the standard and use a 
> DW_TAG_imported_module that references a DW_TAG_module. If we, however, 
> choose to describe the module by using a skeleton DW_TAG_compile_unit, we 
> should be careful (my own words) about using a DW_TAG_imported_module until 
> that use is sanctioned by the standard.
> 
>  
> 
> I see two possible ways to proceed in this spirit:
> 
> a) Rename the module skeleton DW_TAG_compile_units to DW_TAG_module, but keep 
> all the comdat/split dwarf goodness from the original proposal [1]. My 
> understanding is that even though we are making clever use of the split DWARF 
> features, GDB would still need to be taught to follow references to external 
> files,
> 
> 
> Not sure what you're referring to here, perhaps a misunderstanding about how 
> split DWARF works.
> 
> To the best of my knowledge, what we've talked about for module DWARF debug 
> info is actually just split-dwarf, no extra work required by DWARF consumers*.
> 
> * It's, admittedly, a little tricksy to include type unit references in an 
> object file that doesn't include the type unit at all - relying on it being 
> linked into the final executable. But DWARF doesn't really talk about objects 
> versus executables, etc - so, so long as the type unit is there in the end, 
> it's valid DWARF no matter how it got there (& should work fine for existing 
> consumers - they can't tell if the type unit was in every object file that 
> referenced the type or not once it's been linked and deduplicated).
>  
> 
> so having it recognize a new tag in this context doesn’t appear to be much 
> additional effort (but others may provide more insight here).
> 
> 
> Beyond the above (that using a new tag would mean this would go from 'free' 
> to 'not free' for GDB) having a new top level tag is pretty substantial (we 
> only have two at the moment, and with our talk of modules being a "bag of 
> dwarf" might go back to having one top level tag? (it's not clear to me from 
> DWARF4 whether DW_TAG_module is currently a top-level tag, I don't think it 
> is?)
>  
> 
> b) Emit an LLVM-specific DW_AT_LLVM_import attribute inside the 
> DW_TAG_imported_module (or vice versa) that refers to the skeleton 
> DW_TAG_compile_unit.
> 
>  
> 
> I think that option (a) is a bit more elegant and it is bending the dwarf 
> standard not quite as much and will make the dwarf output a bit more readable.
> 
>  
> 
> -- adrian
> 
>  
> 
> [1] Module debugging proposal for reference: 
> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html 
> <http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html>
> 
> 
> 
> - David
>  
> 
>  
> 
> -- adrian
> 
> 
> 
> 
> 
> But extension tags seems like the conservatively correct option (not sure 
> what GDB does on tags it doesn't recognize - I forget if it warns or just 
> completely ignores them, hopefully the latter)
> 
>  
> 
> (2) David’s suggestion of using a custom form that records the module hash 
> directly is quite space-efficient, but it has the drawback of not being 
> resilient against small changes to the imported module
> 
> 
> That's going to be true of the normal fission info here (the skeleton CU and 
> the full CU in the .dwo file (or module) are associated by hash) - granted, 
> in the "loading an AST" mode, you can ignore those hashes and rely on your 
> custom attributes instead.
>  
> 
> , since clang’s module hash changes each time the module is being rebuilt.
> 
> 
> Clang's module hash only changes if the DWARF contents change - it doesn't 
> use a timestamp or anything. It seems like actually you're going to want to 
> fail to load even more aggressively - there are ways the AST might've changed 
> that the debug info doesn't reflect but are still important (a type 
> unreferenced in this module, but built into some other code that is not built 
> with debug info changes - no hash changes because the debug info for that 
> type is unreferenced here, but if you try to use it you could have an 
> incompatible layout, etc).
> 
>  
> 
> Agreed: If the module contents changed the debugger needs to display a big 
> flashing "here be dragons" warning.
> 
> 
>  
> 
> This is less of an issue if the hash is referring to a skeleton CU in the 
> same file, which contains all the detailed information.
> 
>  
> 
> Personally I’d prefer option 1 because mostly uses the existing mechanisms 
> from DWARF. Here’s a visual guide to the options on the table:
> 
>  
> 
> (1)
> 
> foo.o (compiled with, let’s call it .. "-gmodule-imports”)
> 
> -----
> 
> .debug_info:
> 
>   DW_TAG_compile_unit
> 
>     DW_AT_name(“foo.c”)
> 
>     DW_TAG_imported_module
> 
>       DW_AT_import(DW_FORM_ref_addr 0x123)  // Could be a FORM_ref_sig8 
> 0x1234ABCDE as well.
> 
>     DW_TAG_imported_module
> 
>       DW_AT_import(...)
> 
>  
> 
> .debug_info.dwo:
> 
> // Skeleton CUs for modules imported by foo.o.
> 
> 0x123:
> 
>   DW_TAG_compile_unit
> 
>     // Used by split-dwarf debuggers to find external type definitions.
> 
>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
> 
>     DW_AT_dwo_id(“0x1234ABCDE”)
> 
>  
> 
>     // Used by AST-based debuggers to import the module.
> 
>     DW_AT_name(“Foundation”)
> 
> 
> (side notes: the mixed indentation here makes it a bit hard to read this 
> example, and I'd make sure /all/ the extended attributes (including the name 
> here) use custom attribute names, not standard ones)
> 
>  
> 
> Agreed.
> 
> 
> 
> 
>  
> 
>     DW_AT_LLVM_sysroot(“/“)
> 
>     DW_AT_LLVM_include_dir(“”)
> 
>     DW_AT_LLVM_macros(“-DNDEBUG”)
> 
>  
> 
> (2)
> 
> .debug_info.dwo:
> 
> (As above.)
> 
>  
> 
> .debug_info:
> 
>   DW_TAG_compile_unit
> 
>     DW_AT_name(“foo.c”)
> 
>     DW_AT_LLVM_imported_modules(DW_FORM_block 0x1234ABCDE 0xDEADBEEF 0x....)
> 
>  
> 
> Now I’m curious what option (3) will look like; the one that we’ll actually 
> implement!
> 
> 
> ;)
>  
> 
>  
> 
> -- adrian
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
>

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [PATCH] Have clang list the imported modules in the debug info

Reply via email to