> On Feb 24, 2015, at 2:36 PM, David Blaikie <[email protected]> wrote:
> 
> 
> 
> On Mon, Feb 23, 2015 at 3:45 PM, Adrian Prantl <[email protected]> wrote:
> 
>> On Feb 23, 2015, at 3:37 PM, David Blaikie <[email protected]> wrote:
>> 
>> 
>> 
>> On Mon, Feb 23, 2015 at 3:32 PM, Adrian Prantl <[email protected]> wrote:
>> 
>>> On Feb 23, 2015, at 3:14 PM, David Blaikie <[email protected]> wrote:
>>> 
>>> 
>>> 
>>> On Mon, Feb 23, 2015 at 3:08 PM, Adrian Prantl <[email protected]> wrote:
>>> 
>>>> On Feb 23, 2015, at 2:59 PM, David Blaikie <[email protected]> wrote:
>>>> 
>>>> 
>>>> 
>>>> On Mon, Feb 23, 2015 at 2:51 PM, Adrian Prantl <[email protected]> wrote:
>>>> 
>>>> > On Jan 20, 2015, at 11:07 AM, David Blaikie <[email protected]> wrote:
>>>> >
>>>> > My vague recollection from the previous design discussions was that 
>>>> > these module references would be their own 'unit' COMDAT'd so that we 
>>>> > don't end up with the duplication of every module reference in every 
>>>> > unit linked together when linking debug info?
>>>> >
>>>> > I think in my brain I'd been picturing this module reference as being an 
>>>> > extended fission reference (fission skeleton CU + extra fields for users 
>>>> > who want to load the Clang AST module directly and skip the split CU).
>>>> 
>>>> Apologies for letting this rest for so long.
>>>> 
>>>> Your memory was of course correct and I didn’t follow up on this because I 
>>>> had convinced myself that the fission reference would be completely 
>>>> sufficient. Now that I’ve been thinking some more about it, I don’t think 
>>>> that it is sufficient in the LTO case.
>>>> 
>>>> Here is the example from 
>>>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html:
>>>> 
>>>> foo.o:
>>>> .debug_info.dwo
>>>>   DW_TAG_compile_unit
>>>>      // For DWARF consumers
>>>>      DW_AT_dwo_name ("/path/to/module-cache/MyModule.pcm")
>>>>      DW_AT_dwo_id   ([unique AST signature])
>>>> 
>>>> .debug_info
>>>>   DW_TAG_compile_unit
>>>>     DW_TAG_variable
>>>>       DW_AT_name "x"
>>>>       DW_AT_type (DW_FORM_ref_sig8) ([hash for MyStruct])
>>>> 
>>>> In this example it is clear that foo.o imported MyModule because its DWO 
>>>> skeleton is there in the same object file. But if we deal with the result 
>>>> of an LTO compilation we will end up with many compile units in the same 
>>>> .debug_info section, plus a bunch of skeleton compile units for _all_ 
>>>> imported modules in the entire project. We thus lose the ability to 
>>>> determine which of the compile units imported which module.
>>>> 
>>>> Why would we need to know which CU imported which modules? (I can imagine 
>>>> some possible reasons, but wondering what you have in mind)
>>> 
>>> When the debugger is stopped at a breakpoint and the user wants to evaluate 
>>> an expression, it should import the modules that are available at this 
>>> location, so the user can write the expression from within the context of 
>>> the breakpoint (e.g., without having to fully qualify each type, etc).
>>> 
>>> I'm not sure how much current debuggers actually worry about that - (& this 
>>> may differ from lldb to gdb to other things, of course). I'm pretty sure at 
>>> least for GDB, a context in one CU is as good as one in another (at least 
>>> without split-dwarf, type units, etc - with those sometimes things end up 
>>> overly restrictive as the debugger won't search everything properly).
>>> 
>>> eg: if you have a.cpp: int main() { }, b.cpp: void func() { } and you run 
>>> 'start' in gdb (which breaks at the beginning of main) you can still run 'p 
>>> func()' to call the func, even though there's no declaration of it in 
>>> a.cpp, etc.
>> 
>> LLDB would definitely care (since it uses clang for expression 
>> evaluation, supporting these kinds of features is really straightforward 
>> there). By importing the modules (rather than searching through the DWARF), 
>> the expression evaluator gains access to additional declarations that are 
>> not there in the DWARF, such as templates. But since clang modules are not 
>> namespaces, we can’t generally "import the world” as a debugger would 
>> usually do.
>> 
>> Sorry, not sure I understand this last sentence - could you explain further?
>> 
>> I imagine it would be rather limiting for the user if they could only 
>> write expressions that are valid in the current file - it wouldn't be 
>> uncommon to want to call a function from another module/file/etc to aid in 
>> debugging.
> 
> Usually LLDB’s expression evaluator works by creating a clang AST type out of 
> a DWARF type and inserting it into its AST context. We could pre-populate it 
> with the definitions from the imported modules (with all sorts of benefits as 
> described above), but that only works if no two modules conflict. If the 
> declaration can’t be found in any imported module, LLDB would still import it 
> from DWARF in the “traditional” fashion.
> 
> But it would import it from DWARF in other TUs rather than use the module 
> info just because the module wasn't directly referenced from this TU? That 
> would seem strange to me. (you would lose debug info fidelity (by falling 
> back to DWARF even though there are modules with the full fidelity info) 
> unnecessarily, it sounds like)

I think it’s reasonable to expect full fidelity for everything that is 
available in the current TU, and having the normal DWARF-based debugging 
capabilities for everything beyond that. But we can only ever provide full 
fidelity if we have the list of imports for the current TU.
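
To make that two-tier policy concrete, here is a minimal sketch of the
lookup order described above. All class and function names are invented for
illustration; this is not LLDB's actual API:

```python
# Hypothetical sketch of the lookup policy discussed above: prefer the
# clang modules imported by the *current* TU (full fidelity: templates,
# enumerators, ...), and fall back to "traditional" DWARF reconstruction
# for everything beyond that.  All names here are invented.

class Module:
    """Stand-in for a loaded clang module (.pcm)."""
    def __init__(self, decls):
        self.decls = decls

    def lookup(self, name):
        return self.decls.get(name)

class TranslationUnit:
    """Stand-in for a CU plus the list of modules it imported."""
    def __init__(self, imported_modules):
        self.imported_modules = imported_modules

def resolve_type(name, current_tu, dwarf_index):
    # 1. Full-fidelity path: consult only the modules this TU imported;
    #    since clang modules are not namespaces, we can't safely
    #    "import the world".
    for module in current_tu.imported_modules:
        decl = module.lookup(name)
        if decl is not None:
            return decl
    # 2. Fallback: rebuild the type from DWARF found anywhere in the
    #    program's debug info.
    return dwarf_index.get(name)
```

The point of the sketch is just the ordering: the per-TU import list is what
makes step 1 possible at all.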
> 
> Would it be reasonable to use the accelerator table/index to lookup the 
> types, then if the type is in the module you could use the module rather than 
> the DWARF stashed alongside it? (so the comdat'd split-dwarf skeleton CU for 
> the module would have an index to tell you what names are inside it, but if 
> you got an index hit you'd just look at the module instead of loading the 
> split-dwarf debug info in the referenced file)

I don’t think this approach would work for templates and enumerator values; 
they aren’t in the accelerator tables to begin with. It would also be slower 
than importing the module directly when the declaration is available there.
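
For reference, the encoding proposed in the original mail (quoted below)
might look roughly like the following, in the same style as the earlier
example; the DW_AT_LLVM_* attribute spellings are illustrative placeholders,
not settled names:

```
.debug_info
  DW_TAG_compile_unit
    DW_TAG_module
      DW_AT_name                ("MyModule")
      DW_AT_LLVM_include_path   ("/path/to/umbrella/dir")   // umbrella directory
      DW_AT_LLVM_config_macros  ("-DNDEBUG")                // macros affecting the module
      DW_AT_LLVM_pcm_file       ("/path/to/module-cache/MyModule.pcm")
    DW_TAG_imported_module
      DW_AT_import              (reference to the DW_TAG_module above)
```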

-- adrian

> 
> - David
> 
> 
>  
> 
> -- adrian
> 
>>  
>> 
>> -- adrian
>>>>  
>>>> I think it really is necessary to put the info about the module imported 
>>>> into the compile unit that imported it. Or is there a way to do this using 
>>>> the fission capabilities that I’m not aware of?
>>>> 
>>>> -- adrian
>>>> 
>>>> >
>>>> > [rambling a bit more along those lines:
>>>> > This would work fine in the case of the module (now an object file) 
>>>> > containing all the static debug info
>>>> > The future step, when we put IR/object code in a module to be linked 
>>>> > into the final binary, we could put the skeleton CU in that object file 
>>>> > that's being linked in (then we wouldn't need to COMDAT it) or, 
>>>> > optionally, link in the debug info itself (skipping the indirection 
>>>> > through the external file) if a standalone debug info executable was 
>>>> > desired]
>>>> 
>>>> 
>>>> 
>>>> >
>>>> > On Tue, Jan 20, 2015 at 9:39 AM, Adrian Prantl <[email protected]> wrote:
>>>> > As a complementary part of the module debugging story, here is a 
>>>> > proposal to list the imported modules in the debug info. This patch is 
>>>> > not about efficiency, but rather enables a cool debugging feature:
>>>> >
>>>> > Record the clang modules imported by the current compile unit in the 
>>>> > debug info. This allows a module-aware debugger (such as LLDB) to 
>>>> > @import all modules visible in the current context before evaluating an 
>>>> > expression, thus making available all declarations in the current 
>>>> > context (that originate from a module) and not just the ones that were 
>>>> > actually used by the program.
>>>> >
>>>> > This implementation uses existing DWARF mechanisms as much as possible 
>>>> > by emitting a DW_TAG_imported_module that references a DW_TAG_module, 
>>>> > which contains the information necessary for the debugger to rebuild the 
>>>> > module. This is similar to how C++ using declarations are encoded in 
>>>> > DWARF, with the difference that we're importing a module instead of a 
>>>> > namespace.
>>>> > The information stored for a module includes the umbrella directory, any 
>>>> > config macros passed in via the command line that affect the module, and 
>>>> > the filename of the raw .pcm file. Why include all these parameters when 
>>>> > we have the .pcm file? Apart from module cache volatility, there is no 
>>>> > guarantee that the debugger was linked against the same version of clang 
>>>> > that generated the .pcm, so it may need to regenerate the module while 
>>>> > importing it.
>>>> >
>>>> > Let me know what you think!
>>>> > -- adrian
>>>> >
>>>> >
>>>> >

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
