> On Dec 1, 2014, at 10:32 AM, Adrian Prantl <[email protected]> wrote:
>
>
>> On Dec 1, 2014, at 10:27 AM, Ben Langmuir <[email protected]> wrote:
>>
>>
>>> On Nov 25, 2014, at 5:25 PM, Adrian Prantl <[email protected]> wrote:
>>>
>>>>
>>>> On Nov 24, 2014, at 4:55 PM, Richard Smith <[email protected]> wrote:
>>>>
>>>> On Fri, Nov 21, 2014 at 5:52 PM, Adrian Prantl <[email protected]> wrote:
>>>> Plans for module debugging
>>>> ==========================
>>>>
>>>> I recently had a chat with Eric Christopher and David Blaikie to discuss
>>>> ideas for debug info for Clang modules and this is what we came up with.
>>>>
>>>> Goals
>>>> -----
>>>>
>>>> Clang modules [1] (and their siblings, C++ modules and precompiled header
>>>> files) are a method for improving compile time by making the serialized
>>>> AST for commonly used header files directly available to the compiler.
>>>>
>>>> Currently, debug info is totally oblivious to this: when the developer
>>>> compiles a file that uses a type from a module, clang simply emits a copy
>>>> of the full definition (some exceptions apply for C++) of this type in
>>>> DWARF into the debug info section of the resulting object file. That's a
>>>> lot of copies.
>>>>
>>>> The key idea is to emit DWARF for types defined in modules only once, and
>>>> then only emit references to these types in all the individual compile
>>>> units that import this module. We are going to build on the split DWARF
>>>> and type unit facilities provided by DWARF for this. DWARF consumers can
>>>> follow the type references into the module debug info section, much as
>>>> they resolve types in external type units today. Additionally, the
>>>> format will allow consumers that support clang modules natively (such as
>>>> LLDB) to directly look up types in the module, without having to go
>>>> through the usual translation from AST to DWARF and back to AST.
>>>>
>>>> The primary benefit from doing all this is performance. This change is
>>>> expected to reduce the size of the debug info in object files
>>>> significantly by
>>>> - emitting only references to the full types and thus
>>>> - implicitly uniquing types that are defined in modules.
>>>> The smaller object files will result in faster compile times and faster
>>>> llvm::Module load times when doing LTO. The type uniquing will also result
>>>> in significantly smaller debug info for the finished executables,
>>>> especially for C and Objective-C, which do not support ODR-based type
>>>> uniquing. This comes at the price of longer initial module build times, as
>>>> debug info is emitted alongside the module.
>>>>
>>>> Design
>>>> ------
>>>>
>>>> Clang modules are designed to be ephemeral build artifacts that live in a
>>>> shared module cache. Compiling a source file that imports `MyModule`
>>>> results in `MyModule.pcm` being generated in the module cache directory,
>>>> which contains the serialized AST of the declarations found in the header
>>>> files that comprise the module.
>>>>
>>>> We will change the binary clang module format to become a container (ELF
>>>> or Mach-O, depending on the platform). Inside the container there will be
>>>> multiple sections: one containing the serialized AST, and ones containing
>>>> DWARF5 split debug type information for all types defined in the module
>>>> that can be encoded in DWARF. By virtue of using type units, each type is
>>>> emitted into its own type unit which can be identified via a unique type
>>>> signature. DWARF consumers can use the type signatures to look up type
>>>> definitions in the module debug info section. For module-aware consumers
>>>> (LLDB), we will add an index that maps type signatures directly to an
>>>> offset in the AST section.
>>>>
>>>> For an object file that was built using modules, we need to record the
>>>> fact that a module has been imported. To this end, we add a
>>>> DW_TAG_compile_unit into a COMDAT .debug_info.dwo section that references
>>>> the split DWARF inside the module. Similar to split DWARF objects, the
>>>> module will be identified by its filename and a checksum. The imported
>>>> unit also contains a couple of extra attributes holding all the
>>>> information necessary to recreate the module in case the module cache has
>>>> been flushed.
>>>>
>>>> How does the debugging experience work in this case? When do you trigger
>>>> the (possibly-lengthy) rebuild of the source in order to recreate the
>>>> DWARF for the module (is it possible to delay that until the information
>>>> is needed)?
>>>
>>> The module debugging scenario is primarily aimed at providing a
>>> better/faster edit-compile-debug cycle. In this scenario, the module would
>>> most likely still be in the cache. In a case where the binary was built so
>>> long ago that the module cache has since been flushed, it is generally more
>>> likely that the user also used a DWARF linking step (such as dsymutil on
>>> Darwin, and maybe dwz on Linux?) because they did a release/archive build
>>> which would just copy the DWARF out of the module and store it alongside
>>> the binary. For this reason I’m not very concerned about the time necessary
>>> for rebuilding the module. But this is all very platform-specific, and
>>> different platforms may need different defaults.
>>
>> This description is in terms of building a module that has gone missing, but
>> just to be clear: a modules-aware debugger probably also needs to rebuild
>> modules that have gone out of date, such as when one of their headers is
>> modified.
>
> In the case where the module is out of date, the debugger should probably
> fall back to the DWARF types, because it cannot guarantee that the
> modifications to the header files did not change the types it wants to look
> up.
Sorry, I just realized that this doesn’t make any sense if the DWARF is stored
in the module. The behavior should be:
1. If the module is missing, recreate the module.
2. If the module signature does not match the signature in the .o file, either
print a large warning that types from that module may be bogus, or
categorically refuse to use them.
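
A rough sketch of that policy, in Python purely for illustration. All names
here (`ModuleRef`, `resolve_module`, and the `read_signature` and `rebuild`
callbacks) are hypothetical and are not LLDB or clang APIs; `pcm_path` and
`signature` merely stand in for DW_AT_dwo_name and DW_AT_dwo_id.

```python
import os
import warnings
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ModuleRef:
    """What the .o file records about an imported module (hypothetical)."""
    pcm_path: str   # stands in for DW_AT_dwo_name
    signature: int  # stands in for DW_AT_dwo_id


def resolve_module(
    ref: ModuleRef,
    read_signature: Callable[[str], int],
    rebuild: Callable[[ModuleRef], None],
    strict: bool = False,
) -> Optional[str]:
    """Return the path of a usable module, or None if its types are refused."""
    # 1. If the module is missing, recreate it.
    if not os.path.exists(ref.pcm_path):
        rebuild(ref)

    # 2. If the signature does not match the one in the .o file, either
    #    warn loudly or categorically refuse to use the module's types.
    if read_signature(ref.pcm_path) != ref.signature:
        if strict:
            return None
        warnings.warn(
            f"{ref.pcm_path}: signature mismatch; "
            "types from this module may be bogus"
        )
    return ref.pcm_path
```

Whether a mismatch should warn or refuse is left as the `strict` switch, since
the right default is probably platform-specific.
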
For long-term debugging, users are expected to use a DWARF linker (dsymutil,
dwz), which archives all types in a future-proof format (DWARF).
-- adrian
>
>>
>>> Delaying the module DWARF output until needed (maybe even by the debugger!)
>>> is an interesting idea. We should definitely measure how expensive it is to
>>> emit DWARF for an entire module full of types to see if this is worthwhile.
>>>
>>>> How much knowledge does the debugger have/need of Clang's modules to do
>>>> this? Are we just embedding an arbitrary command that can be run to
>>>> rebuild the .dwo if it's missing? And if so, how do we make that safe when
>>>> (say) root attaches a debugger to an arbitrary process?
>>>
>>> I think it is reasonable to assume that a consumer that can make use of
>>> clang modules also knows how to rebuild clang modules, which is why the
>>> example only contained the name of the module, sysroot, include path, and
>>> defines; not an arbitrary command. On platforms where the debugger does not
>>> understand clang modules, the whole problem can be dodged by treating the
>>> modules as explicit build artifacts.
>>
>> You are probably already aware, but you will need a bunch more information
>> (language options, target options, header search options) to rebuild a
>> module.
>
> Thanks, language options and target options were absent from the list
> previously!
>
> -- adrian
>>
>>>
>>>>
>>>> Platforms that treat modules as an explicit build artifact do not have
>>>> this problem. In the .debug_info section all types that are defined in the
>>>> module are referenced via their unique type signature using
>>>> DW_FORM_ref_sig8, just as they would be if this were types from a regular
>>>> DWARF type unit.
>>>>
>>>> Example
>>>> -------
>>>>
>>>> Let's say we have a module `MyModule` that defines a type `MyStruct`::
>>>>   $ cat foo.c
>>>>   #include <MyModule.h>
>>>>   MyStruct x;
>>>>
>>>> when compiling `foo.c` like this::
>>>>   clang -fmodules -gmodules foo.c -c
>>>>
>>>> clang produces `foo.o` and an ELF or Mach-O container for the module::
>>>>   /path/to/module-cache/MyModule.pcm
>>>>
>>>> In the module container, we have a section for the serialized AST and
>>>> split DWARF sections for the debug type info. The exact format is likely
>>>> still going to evolve a little, but this should give a rough idea::
>>>>
>>>> MyModule.pcm:
>>>>   .debug_info.dwo:
>>>>     DW_TAG_compile_unit
>>>>       DW_AT_dwo_name ("/path/to/MyModule.pcm")
>>>>       DW_AT_dwo_id ([unique AST signature])
>>>>
>>>>     DW_TAG_type_unit ([hash for MyStruct])
>>>>       DW_TAG_structure_type
>>>>         DW_AT_signature ([hash for MyStruct])
>>>>         DW_AT_name "MyStruct"
>>>>         ...
>>>>
>>>>   .debug_abbrev.dwo:
>>>>     // abbrevs referenced by .debug_info.dwo
>>>>   .debug_line.dwo:
>>>>     // filenames referenced by .debug_info.dwo
>>>>   .debug_str.dwo:
>>>>     // strings referenced by .debug_info.dwo
>>>>
>>>>   .ast
>>>>     // Index at the top of the AST section sorted by hash value.
>>>>     [hash for MyStruct] -> [offset for MyStruct in this section]
>>>>     ...
>>>>     // Serialized AST follows
>>>>     ...
>>>>
>>>> The debug info in foo.o will look like this::
>>>>
>>>> .debug_info.dwo
>>>>   DW_TAG_compile_unit
>>>>     // For DWARF consumers
>>>>     DW_AT_dwo_name ("/path/to/module-cache/MyModule.pcm")
>>>>     DW_AT_dwo_id ([unique AST signature])
>>>>
>>>>     // For LLDB / dsymutil so they can recreate the module
>>>>     DW_AT_name "MyModule"
>>>>     DW_AT_LLVM_system_root "/"
>>>>     DW_AT_LLVM_preprocessor_defines "-DNDEBUG"
>>>>     DW_AT_LLVM_include_path "/path/to/MyModule.map"
>>>>
>>>> .debug_info
>>>>   DW_TAG_compile_unit
>>>>     DW_TAG_variable
>>>>       DW_AT_name "x"
>>>>       DW_AT_type (DW_FORM_ref_sig8) ([hash for MyStruct])
>>>>
>>>>
>>>> Type signatures
>>>> ---------------
>>>>
>>>> We are going to deviate from the DWARF spec by using a more efficient
>>>> hashing function that uses the type's unique mangled name and the name of
>>>> the module as input.
>>>>
>>>> Why do you need/want the name of the module here? Modules are not a
>>>> namespacing mechanism. How would you compute this name when the same type
>>>> is defined in multiple imported modules?
>>>
>>> Great point! I’m mostly concerned about non-ODR languages ...
>>>>
>>>> For languages that do not have mangled type names or an ODR,
>>>>
>>>> The people working on C modules have expressed an intent to apply the ODR
>>>> there too, so it's not clear that Clang modules will support any such
>>>> language in the longer term.
>>>
>>> ... and this may be the answer to the question!
>>>
>>> +Doug: do Objective-C modules have an ODR?
>>>
>>>>
>>>> we will use the unique identifiers produced by the clang indexer (USRs) as
>>>> input instead.
>>>>
>>>> Extension: Replacing type units with a more efficient storage format
>>>> --------------------------------------------------------------------
>>>>
>>>> As an extension to this proposal, we are thinking of replacing the type
>>>> units within the module debug info with a more efficient format: Instead
>>>> of emitting each type into its own type unit (complete with its entire
>>>> declcontext), it would be much more efficient to emit one large bag
>>>> of DWARF together with an index that maps hash values (type signatures) to
>>>> DIE offsets.
>>>>
>>>> Next steps
>>>> ----------
>>>>
>>>> In order to implement this, the next steps would be as follows:
>>>> 1. Change the clang module format to be an ELF/Mach-O container.
>>>> 2. Teach clang to emit debug info for module types (e.g., by passing an
>>>> empty compile unit with retained types to LLVM) into the module container.
>>>> 3a. Add a -gmodules switch to clang that triggers the emission of type
>>>> signatures for types coming from a module.
>>>>
>>>> Can you clarify what this flag would do? Does this turn on adding DWARF to
>>>> the .pcm file? Does it turn off generating DWARF for imported modules in
>>>> the current IR module? Both?
>>>
>>> It would emit references to the types from imported modules instead of the
>>> types themselves.
>>> Since the module cache is shared, we could (depending on just how expensive
>>> this is) turn on DWARF generation for .pcm files by default. I’d like to
>>> measure this first, though.
>>>
>>>>
>>>> I assume this means that the default remains that we build debug
>>>> information for modules as if we didn't have modules (that is, put
>>>> complete DWARF with the object code). Do you think that's the right
>>>> long-term default? I think it's possibly not.
>>>
>>> I think you’re absolutely right about the long term. In the short term, it
>>> may be better to have compatibility by default, but I don’t know what the
>>> official LLVM policy on new features is, if there is one.
>>>
>>>>
>>>> How does this interact with explicit module builds? Can I use a module
>>>> built without -g in a compile that uses -g? And if I do, do I get complete
>>>> debug information, or debug info just for the parts that aren't in the
>>>> module? Does -gmodules let me choose between these?
>>>
>>> Personally I would expect old-style (full copy of the types) debug
>>> information if I build against a module that does not have embedded debug
>>> information.
>>>
>>> thanks,
>>> adrian
>>>>
>>>> 3b. Implement type-signature-based lookup in llvm-dsymutil and lldb.
>>>> 4a. Emit an index that maps type signatures to AST section offsets into
>>>> the module container.
>>>> 4b. Implement direct loading of AST types in lldb.
>>>> 5a. Improve the efficiency by replacing type units in the module debug info
>>>> with a lookup table that maps type signatures to DIE offsets.
>>>> 5b. Support this format in lldb and llvm-dsymutil.
>>>>
>>>> Let me know what you think!
>>>>
>>>> cheers,
>>>> Adrian
>>>>
>>>> [1] For more details about clang modules see
>>>> http://clang.llvm.org/docs/Modules.html and
>>>> http://clang.llvm.org/docs/PCHInternals.html
>>>>
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> [email protected]
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
_______________________________________________
lldb-dev mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev