Re: [PATCH] Have clang list the imported modules in the debug info

Adrian Prantl Tue, 10 Mar 2015 12:13:27 -0700

> On Mar 9, 2015, at 5:16 PM, David Blaikie <[email protected]> wrote:
> 
> 
> 
> On Mon, Mar 9, 2015 at 5:07 PM, Adrian Prantl <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>> On Mar 9, 2015, at 2:14 PM, David Blaikie <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> 
>> 
>> On Mon, Mar 9, 2015 at 1:52 PM, Adrian Prantl <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>>> On Feb 24, 2015, at 3:06 PM, David Blaikie <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> On Tue, Feb 24, 2015 at 2:56 PM, Adrian Prantl <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> On Feb 24, 2015, at 2:36 PM, David Blaikie <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>>> On Mon, Feb 23, 2015 at 3:45 PM, Adrian Prantl <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> On Feb 23, 2015, at 3:37 PM, David Blaikie <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>>> On Mon, Feb 23, 2015 at 3:32 PM, Adrian Prantl <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>> On Feb 23, 2015, at 3:14 PM, David Blaikie <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>>> On Mon, Feb 23, 2015 at 3:08 PM, Adrian Prantl <[email protected] 
>>>>>> <mailto:[email protected]>> wrote:
>>>>>> On Feb 23, 2015, at 2:59 PM, David Blaikie <[email protected] 
>>>>>> <mailto:[email protected]>> wrote:
>>>>>>> On Mon, Feb 23, 2015 at 2:51 PM, Adrian Prantl <[email protected] 
>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>> 
>>>>>>> > On Jan 20, 2015, at 11:07 AM, David Blaikie <[email protected] 
>>>>>>> > <mailto:[email protected]>> wrote:
>>>>>>> >
>>>>>>> > My vague recollection from the previous design discussions was that 
>>>>>>> > these module references would be their own 'unit' COMDAT'd so that we 
>>>>>>> > don't end up with the duplication of every module reference in every 
>>>>>>> > unit linked together when linking debug info?
>>>>>>> >
>>>>>>> > I think in my brain I'd been picturing this module reference as being 
>>>>>>> > an extended fission reference (fission skeleton CU + extra fields for 
>>>>>>> > users who want to load the Clang AST module directly and skip the 
>>>>>>> > split CU).
>>>>>>> 
>>>>>>> Apologies for letting this rest for so long.
>>>>>>> 
>>>>>>> Your memory was of course correct and I didn’t follow up on this 
>>>>>>> because I had convinced myself that the fission reference would be 
>>>>>>> completely sufficient. Now that I’ve been thinking some more about it, 
>>>>>>> I don’t think that it is sufficient in the LTO case.
>>>>>>> 
>>>>>>> Here is the example from the 
>>>>>>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html 
>>>>>>> <http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html>:
>>>>>>> 
>>>>>>> foo.o:
>>>>>>> .debug_info.dwo
>>>>>>>   DW_TAG_compile_unit
>>>>>>>      // For DWARF consumers
>>>>>>>      DW_AT_dwo_name ("/path/to/module-cache/MyModule.pcm")
>>>>>>>      DW_AT_dwo_id   ([unique AST signature])
>>>>>>> 
>>>>>>> .debug_info
>>>>>>>   DW_TAG_compile_unit
>>>>>>>     DW_TAG_variable
>>>>>>>       DW_AT_name "x"
>>>>>>>       DW_AT_type (DW_FORM_ref_sig8) ([hash for MyStruct])
>>>>>>> 
>>>>>>> In this example it is clear that foo.o imported MyModule because its 
>>>>>>> DWO skeleton is there in the same object file. But if we deal with the 
>>>>>>> result of an LTO compilation we will end up with many compile units in 
>>>>>>> the same .debug_info section, plus a bunch of skeleton compile units 
>>>>>>> for _all_ imported modules in the entire project. We thus loose the 
>>>>>>> ability to determine which of the compile units imported which module.
>>>>>>> 
>>>>>>> Why would we need to know which CU imported which modules? (I can 
>>>>>>> imagine some possible reasons, but wondering what you have in mind)
>>>>>> 
>>>>>> When the debugger is stopped at a breakpoint and the user wants to 
>>>>>> evaluate an expression, it should import the modules that are available 
>>>>>> at this location, so the user can write the expression from within the 
>>>>>> context of the breakpoint (e.g., without having to fully qualify each 
>>>>>> type, etc).
>>>>>> 
>>>>>> I'm not sure how much current debuggers actually worry about that - (& 
>>>>>> this may differ from lldb to gdb to other things, of course). I'm pretty 
>>>>>> sure at least for GDB, a context in one CU is as good as one in another 
>>>>>> (at least without split-dwarf, type units, etc - with those sometimes 
>>>>>> things end up overly restrictive as the debugger won't search everything 
>>>>>> properly).
>>>>>> 
>>>>>> eg: if you have a.cpp: int main() { }, b.cpp: void func() { } and you 
>>>>>> run 'start' in gdb (which breaks at the beginning of main) you can still 
>>>>>> run 'p func()' to call the func, even though there's no declaration of 
>>>>>> it in a.cpp, etc.
>>>>> 
>>>>> LLDB would definitely care (as it is using clang for the expression 
>>>>> evaluation supporting these kinds of features is really straightforward 
>>>>> there). By importing the modules (rather than searching through the 
>>>>> DWARF), the expression evaluator gains access to additional declarations 
>>>>> that are not there in the DWARF, such as templates. But since clang 
>>>>> modules are not namespaces, we can’t generally "import the world” as a 
>>>>> debugger would usually do.
>>>>> 
>>>>> Sorry, not sure I understand this last sentence - could you explain 
>>>>> further?
>>>>> 
>>>>> I imagine it would be rather limiting for the user if they could only use 
>>>>> expressions that are valid in this file from the file - it wouldn't be 
>>>>> uncommon to want to call a function from another module/file/etc to aid 
>>>>> in debugging.
>>>> 
>>>> Usually LLDB’s expression evaluator works by creating a clang AST type out 
>>>> of a DWARF type and inserting it into its AST context. We could 
>>>> pre-polulate it with the definitions from the imported modules (with all 
>>>> sorts of benefits as described above), but that only works if no two 
>>>> modules conflict. If the declaration can’t be found in any imported 
>>>> module, LLDB would still import it from DWARF in the “traditional” fashion.
>>>> 
>>>> But it would import it from DWARF in other TUs rather than use the module 
>>>> info just because the module wasn't directly referenced from this TU? That 
>>>> would seem strange to me. (you would lose debug info fidelity (by falling 
>>>> back to DWARF even though there are modules with the full fidelity info) 
>>>> unnecessarily, it sounds like)
>>> 
>>> I think it’s reasonable to expect full fidelity for everything that is 
>>> available in the current TU, and having the normal DWARF-based debugging 
>>> capabilities for everything beyond that. But we can only ever provide full 
>>> fidelity if we have the list of imports for the current TU.
>>>> 
>>>> Would it be reasonable to use the accelerator table/index to lookup the 
>>>> types, then if the type is in the module you could use the module rather 
>>>> than the DWARF stashed alongside it? (so the comdat'd split-dwarf skeleton 
>>>> CU for the module would have an index to tell you what names are inside 
>>>> it, but if you got an index hit you'd just look at the module instead of 
>>>> loading the split-dwarf debug info in the referenced file)
>>> 
>>> I don’t think this approach would work for templates and enumerator values;
>>> 
>>> Not sure why enumerator values are an issue - but templates (& all manner 
>>> of other things that don't make it into the index, unfortunately), sure.
>>>  
>>> they aren’t in the accelerator tables to begin with. It would also be 
>>> slower if the declaration is available in a module.
>>> 
>>> Though you're rapidly going to end up loading a lot of modules in (as you 
>>> go up & down a stack printing various things you'll cross into other TUs & 
>>> load more modules).
>>> 
>>> For a standard DWARF consumer, it seems fine to just have a comdat'd 
>>> skeleton CU for a module without the need for other CUs to mention which 
>>> module CUs they reference (but I could be wrong here) & that's the design 
>>> we originally discussed.
>>> 
>>> It would seem unfortunate to bloat every CU with a non-deduplicable list of 
>>> every module it references, but if that's necessary for a serialized AST 
>>> aware debugger, it might be fine to have it as an option (so long as it can 
>>> be turned off) & may still benefit from that list not being the 
>>> authoritative module reference, but a /very/ terse reference to it so all 
>>> the extra flags & stuff can be in the deduplicable comdat (& to keep it as 
>>> consistent as possible between the flag (on/off) codepaths for this extra 
>>> data). Maybe a FORM_block (?) of fixed-size hashes of all the modules 
>>> back-to-back, so it's as small as possible?
>>> 
>>> But I wouldn't mind spending some more time discussing whether there's a 
>>> better way to keep these things streamlined/symmetric/the same between 
>>> modular and non-modular debug info.
>> 
>> Sure!
>> Now that we established that recording the list of imported modules for 
>> every CU is useful for an AST-based debugger,
>> 
>> +Richard, just to see if he's got some ideas about how a debugger might 
>> efficiently use modules to support debugger scenarios and whether or not 
>> having a list of which modules are referenced from which contexts is 
>> valuable in that.
>> 
>> It still concerns me that this would create something of a 
>> regression/oddity/difference between AST-based debug info (you wouldn't be 
>> able to handle expressions referencing things in other TUs) and non-AST 
>> based debug info (where I think the average user is used to not worrying 
>> about what headers are included in the current file they're debugging when 
>> they try to use a type or other identifier)
> 
> If I understood you correctly, this is not actually the case. The list of 
> imported modules allows the AST-based debugger to import all the modules that 
> were imported by the CU that the current frame is in. This enables the user 
> to, e.g., type "p myVector->size()" even though std::vector<MyClass>::size() 
> was not used by the CU and is thus not available in DWARF. 
> If the user types “p foo” even though foo was not defined in any imported 
> module the debugger can — after failing to import foo via clang — still fall 
> back to looking up foo in DWARF and do what it always did.
> 
> If you do the DWARF fallback then you'll get a pretty clear inconsistency 
> between templates and non-templates. If I have a function foo and a function 
> template foo_tmpl in one file, and I'm debugging in another file I'll be able 
> to call 'foo' (normal DWARF fallback/search) but not foo_tmpl (if I'm calling 
> a new instantiation of foo_tmpl - if I'm calling an existing instantiation 
> presumably the fallback would catch me). Seems unfortunate/confusing, perhaps.


Good point, but it my guess is that this wouldn’t be any worse than the “why 
can’t I print the size() of this vector!?”-situations we have at the moment. In 
certain situations (i.e., non-templates) the debugger could use the DWARF in 
the modules to print a message about which module to import.
>  
> 
>>  
>> let’s talk about how to most efficiently represent this information.
>> 
>> In the CU, using DW_TAG_imported_module appears to be the most appropriate 
>> choice, even though there is some room for confusion since C++ using 
>> declarations are also represented this way. Inside the 
>> DW_TAG_imported_module, we could use 
>> (1) a DW_AT_import that references the skeleton (I hope that is the right 
>> terminology) CU for the module, the idea being that the skeleton CU would 
>> contain all the details (flags, name, include dirs, hash, ...) and be in a 
>> comdat'ed section.
>> 
>> I'd be concerned about overloading the terminology & confusing other 
>> debuggers - they might try to follow the DW_AT_import and be surprised that 
>> it doesn't refer to a DW_TAG_namespace tag.
> 
> That’s a valid concern, and we probably should not be emitting this if we 
> have any evidence of, e.g., gdb crashes when encountering such a construct. 
> Then again, we would be using a DW_TAG_imported_module to express what it is 
> meant to express according to the DWARF spec (namely importing a module)... 
> but I admit that the tag also does have a very specific meaning for C++, 
> which we maybe shouldn’t overload.
> 
> That's my concern, yes.
>  
> The right thing here is probably to put aside my personal sense of aesthetics 
> and use a private _LLVM_ namespace for all new additions, and then attempt to 
> standardize an official DWARF version once we know what is really needed and 
> what isn't.
> 
> I'd prefer this, yes. I mean the usual bar we use for language features is 
> that they're at least proposed for standardization before we adopt them in 
> clang - I wouldn't mind a similar bar here. If you want to bring up this use 
> of DW_TAG_imported_module with the DWARF committee & see if it sounds 
> reasonable (& test/inquire about GDB's behavior here).

I started a thread on dwarf-discuss to this end 
(http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org 
<http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org>, the list 
archives are only visible to subscribers, but anyone can subscribe).

-- adrian

> 
> But extension tags seems like the conservatively correct option (not sure 
> what GDB does on tags it doesn't recognize - I forget if it warns or just 
> completely ignores them, hopefully the latter)
>>  
>> (2) David’s suggestion of using a custom form that records the module hash 
>> directly is quite space-efficient, but it has the drawback of not being 
>> resilient against small changes to the imported module
>> 
>> That's going to be true of the normal fission info here (the skeleton CU and 
>> the full CU in the .dwo file (or module) are associated by hash) - granted, 
>> in the "loading an AST" mode, you can ignore those hashes and rely on your 
>> custom attributes instead.
>>  
>> , since clang’s module hash changes each time the module is being rebuilt.
>> 
>> Clang's module hash only changes if the DWARF contents change - it doesn't 
>> use a timestamp or anything. It seems like actually you're going to want to 
>> fail to load even more aggressively - there are ways the AST might've 
>> changed that the debug info doesn't reflect but are still important (a type 
>> unreferenced in this module, but built into some other code that is not 
>> built with debug info changes - no hash changes because the debug info for 
>> that type is unreferenced here, but if you try to use it you could have an 
>> incompatible layout, etc).
> 
> Agreed: If the module contents changed the debugger needs to display a big 
> flashing "here be dragons" warning.
>>  
>> This is less of an issue if the hash is referring to a skeleton CU in the 
>> same file, which contains all the detailed information.
>> 
>> Personally I’d prefer option 1 because mostly uses the existing mechanisms 
>> from DWARF. Here’s a visual guide to the options on the table:
>> 
>> (1)
>> foo.o (compiled with, let’s call it .. "-gmodule-imports”)
>> -----
>> .debug_info:
>>   DW_TAG_compile_unit
>>     DW_AT_name(“foo.c”)
>>     DW_TAG_imported_module
>>       DW_AT_import(DW_FORM_ref_addr 0x123)  // Could be a FORM_ref_sig8 
>> 0x1234ABCDE as well.
>>     DW_TAG_imported_module
>>       DW_AT_import(...)
>> 
>> .debug_info.dwo:
>> // Skeleton CUs for modules imported by foo.o.
>> 0x123:
>>   DW_TAG_compile_unit
>>     // Used by split-dwarf debuggers to find external type definitions.
>>     
>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>>     DW_AT_dwo_id(“0x1234ABCDE”)
>> 
>>     // Used by AST-based debuggers to import the module.
>>     DW_AT_name(“Foundation”)
>> 
>> (side notes: the mixed indentation here makes it a bit hard to read this 
>> example, and I'd make sure /all/ the extended attributes (including the name 
>> here) use custom attribute names, not standard ones)
> 
> Agreed.
> 
>>  
>>     DW_AT_LLVM_sysroot(“/“)
>>     DW_AT_LLVM_include_dir(“”)
>>     DW_AT_LLVM_macros(“-DNDEBUG”)
>> 
>> (2)
>> .debug_info.dwo:
>> (As above.)
>> 
>> .debug_info:
>>   DW_TAG_compile_unit
>>     DW_AT_name(“foo.c”)
>>     DW_AT_LLVM_imported_modules(DW_FORM_block 0x1234ABCDE 0xDEADBEEF 0x....)
>> 
>> Now I’m curious what option (3) will look like; the one that we’ll actually 
>> implement!
>> 
>> ;)
>>  
> 
> -- adrian
> 
>

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [PATCH] Have clang list the imported modules in the debug info

Reply via email to