> On Dec 1, 2014, at 10:57 AM, Adrian Prantl <[email protected]> wrote:
>
>>
>> On Dec 1, 2014, at 10:50 AM, Ben Langmuir <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>
>>> On Dec 1, 2014, at 10:41 AM, Adrian Prantl <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>
>>>>
>>>> On Dec 1, 2014, at 10:32 AM, Adrian Prantl <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>>
>>>>
>>>>> On Dec 1, 2014, at 10:27 AM, Ben Langmuir <[email protected]
>>>>> <mailto:[email protected]>> wrote:
>>>>>
>>>>>
>>>>>> On Nov 25, 2014, at 5:25 PM, Adrian Prantl <[email protected]
>>>>>> <mailto:[email protected]>> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Nov 24, 2014, at 4:55 PM, Richard Smith <[email protected]
>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>
>>>>>>> On Fri, Nov 21, 2014 at 5:52 PM, Adrian Prantl <[email protected]
>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>> Plans for module debugging
>>>>>>> ==========================
>>>>>>>
>>>>>>> I recently had a chat with Eric Christopher and David Blaikie to
>>>>>>> discuss ideas for debug info for Clang modules and this is what we came
>>>>>>> up with.
>>>>>>>
>>>>>>> Goals
>>>>>>> -----
>>>>>>>
>>>>>>> Clang modules [1] (and their siblings, C++ modules and precompiled
>>>>>>> header files) are a method for improving compile time by making the
>>>>>>> serialized AST for commonly used header files directly available to
>>>>>>> the compiler.
>>>>>>>
>>>>>>> Currently debug info is totally oblivious to this: when the developer
>>>>>>> compiles a file that uses a type from a module, clang simply emits a
>>>>>>> copy of the full definition (some exceptions apply for C++) of this
>>>>>>> type in DWARF into the debug info section of the resulting object file.
>>>>>>> That's a lot of copies.
>>>>>>>
>>>>>>> The key idea is to emit DWARF for types defined in modules only once,
>>>>>>> and then only emit references to these types in all the individual
>>>>>>> compile units that import this module. We are going to build on the
>>>>>>> split DWARF and type unit facilities provided by DWARF for this. DWARF
>>>>>>> consumers can follow the type references into the module debug info
>>>>>>> section, much as they resolve types in external type units today.
>>>>>>> Additionally, the format will allow consumers that support clang
>>>>>>> modules natively (such as LLDB) to directly look up types in the
>>>>>>> module, without having to go through the usual translation from AST to
>>>>>>> DWARF and back to AST.
>>>>>>>
>>>>>>> The primary benefit from doing all this is performance. This change is
>>>>>>> expected to reduce the size of the debug info in object files
>>>>>>> significantly by
>>>>>>> - emitting only references to the full types and thus
>>>>>>> - implicitly uniquing types that are defined in modules.
>>>>>>> The smaller object files will result in faster compile times and faster
>>>>>>> llvm::Module load times when doing LTO. The type uniquing will also
>>>>>>> result in significantly smaller debug info for the finished
>>>>>>> executables, especially for C and Objective-C, which do not support
>>>>>>> ODR-based type uniquing. This comes at the price of longer initial
>>>>>>> module build times, as debug info is emitted alongside the module.
>>>>>>>
>>>>>>> Design
>>>>>>> ------
>>>>>>>
>>>>>>> Clang modules are designed to be ephemeral build artifacts that live in
>>>>>>> a shared module cache. Compiling a source file that imports `MyModule`
>>>>>>> results in `MyModule.pcm` being generated in the module cache directory,
>>>>>>> which contains the serialized AST of the declarations found in the
>>>>>>> header files that comprise the module.
>>>>>>>
>>>>>>> We will change the binary clang module format to become a container
>>>>>>> (ELF, Mach-O, depending on the platform). Inside the container there
>>>>>>> will be multiple sections: one containing the serialized AST, and ones
>>>>>>> containing DWARF5 split debug type information for all types defined in
>>>>>>> the module that can be encoded in DWARF. By virtue of using type units,
>>>>>>> each type is emitted into its own type unit which can be identified via
>>>>>>> a unique type signature. DWARF consumers can use the type signatures to
>>>>>>> look up type definitions in the module debug info section. For
>>>>>>> module-aware consumers (LLDB), we will add an index that maps type
>>>>>>> signatures directly to an offset in the AST section.
>>>>>>>
>>>>>>> For an object file that was built using modules, we need to record the
>>>>>>> fact that a module has been imported. To this end, we add a
>>>>>>> DW_TAG_compile_unit into a COMDAT .debug_info.dwo section that
>>>>>>> references the split DWARF inside the module. Similar to split DWARF
>>>>>>> objects, the module will be identified by its filename and a checksum.
>>>>>>> The imported unit also contains a couple of extra attributes holding
>>>>>>> all the information necessary to recreate the module in case the module
>>>>>>> cache has been flushed.
>>>>>>>
>>>>>>> How does the debugging experience work in this case? When do you
>>>>>>> trigger the (possibly-lengthy) rebuild of the source in order to
>>>>>>> recreate the DWARF for the module (is it possible to delay that until
>>>>>>> the information is needed)?
>>>>>>
>>>>>> The module debugging scenario is primarily aimed at providing a
>>>>>> better/faster edit-compile-debug cycle. In this scenario, the module
>>>>>> would most likely still be in the cache. In a case where the binary was
>>>>>> built so long ago that the module cache has since been flushed, it is
>>>>>> generally more likely that the user also used a DWARF linking step (such
>>>>>> as dsymutil on Darwin, and maybe dwz on Linux?) because they did a
>>>>>> release/archive build which would just copy the DWARF out of the module
>>>>>> and store it alongside the binary. For this reason I’m not very
>>>>>> concerned about the time necessary for rebuilding the module. But this
>>>>>> is all very platform-specific, and different platforms may need
>>>>>> different defaults.
>>>>>
>>>>> This description is in terms of building a module that has gone missing,
>>>>> but just to be clear: a modules-aware debugger probably also needs to
>>>>> rebuild modules that have gone out of date, such as when one of their
>>>>> headers is modified.
>>>>
>>>> In the case where the module is out of date, the debugger should probably
>>>> fall back to the DWARF types, because it cannot guarantee that the
>>>> modifications to the header files did not change the types it wants to
>>>> look up.
>>>
>>> Sorry, I just realized that this doesn’t make any sense if the DWARF is
>>> stored in the module. The behavior should be:
>>> 1. If the module is missing, recreate the module.
>>> 2. If the module signature does not match the signature in the .o file,
>>> either print a large warning that types from that module may be bogus, or
>>> categorically refuse to use them.
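The two rules above can be sketched as a small decision function. This is only an illustration: `ModuleAction` and `classifyModule` are hypothetical names, not existing clang or LLDB API, and the signature is assumed to be a 64-bit value such as the DW_AT_dwo_id from the proposal.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical names throughout; none of this is existing LLDB or clang API.
enum class ModuleAction {
  UseModule,      // signatures match: types from the module can be used
  RebuildModule,  // rule 1: module missing from the cache, recreate it
  WarnOrRefuse    // rule 2: signature mismatch, types may be bogus
};

// `expected` is the signature recorded in the .o file. `found` is the
// signature of the module in the cache, or empty if the module is missing.
ModuleAction classifyModule(std::uint64_t expected,
                            std::optional<std::uint64_t> found) {
  if (!found)
    return ModuleAction::RebuildModule;
  if (*found != expected)
    return ModuleAction::WarnOrRefuse;
  return ModuleAction::UseModule;
}
```

A rebuilt module would presumably go through the same check again, so the rebuild only helps if it reproduces the recorded signature.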
>>
>> Maybe this is described elsewhere, but what is the “signature” being used
>> here? Assuming it depends on the detailed contents of the serialized AST:
>> currently ASTWriter output is nondeterministic and things like the ID#s for
>> identifiers, types, etc. will change every time you build the module; until
>> that gets fixed, we would always hit case (2).
>
> I was actually hoping that we could rely on deterministic output from clang.
> If it is infeasible to make ASTWriter output deterministic, we can fall back to
> something like the DWARF dwo_id signature here.
I think everyone agrees that deterministic output is a good idea. Last I
heard, Richard had indicated some interest in tackling this problem.
Ben
>
> -- adrian
>
>>
>>>
>>> For long-term debugging users are expected to use a DWARF linker (dsymutil,
>>> dwz), which archives all types in a future-proof format (DWARF).
>>>
>>> -- adrian
>>>
>>>>
>>>>>
>>>>>> Delaying the module DWARF output until needed (maybe even by the
>>>>>> debugger!) is an interesting idea. We should definitely measure how
>>>>>> expensive it is to emit DWARF for an entire module full of types to see
>>>>>> if this is worthwhile.
>>>>>>
>>>>>>> How much knowledge does the debugger have/need of Clang's modules to do
>>>>>>> this? Are we just embedding an arbitrary command that can be run to
>>>>>>> rebuild the .dwo if it's missing? And if so, how do we make that safe
>>>>>>> when (say) root attaches a debugger to an arbitrary process?
>>>>>>
>>>>>> I think it is reasonable to assume that a consumer that can make use of
>>>>>> clang modules also knows how to rebuild clang modules, which is why the
>>>>>> example only contained the name of the module, sysroot, include path,
>>>>>> and defines; not an arbitrary command. On platforms where the debugger
>>>>>> does not understand clang modules, the whole problem can be dodged by
>>>>>> treating the modules as explicit build artifacts.
>>>>>
>>>>> You are probably already aware, but you will need a bunch more
>>>>> information (language options, target options, header search options) to
>>>>> rebuild a module.
>>>>
>>>> Thanks, language options and target options were absent from the list
>>>> previously!
>>>>
>>>> -- adrian
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Platforms that treat modules as an explicit build artifact do not have
>>>>>>> this problem. In the .debug_info section all types that are defined in
>>>>>>> the module are referenced via their unique type signature using
>>>>>>> DW_FORM_ref_sig8, just as they would be if these were types from a
>>>>>>> regular DWARF type unit.
>>>>>>>
>>>>>>> Example
>>>>>>> -------
>>>>>>>
>>>>>>> Let's say we have a module `MyModule` that defines a type `MyStruct`::
>>>>>>> $ cat foo.c
>>>>>>> #include <MyModule.h>
>>>>>>> MyStruct x;
>>>>>>>
>>>>>>> when compiling `foo.c` like this::
>>>>>>> clang -fmodules -gmodules foo.c -c
>>>>>>>
>>>>>>> clang produces `foo.o` and an ELF or Mach-O container for the module::
>>>>>>> /path/to/module-cache/MyModule.pcm
>>>>>>>
>>>>>>> In the module container, we have a section for the serialized AST and
>>>>>>> split DWARF sections for the debug type info. The exact format is
>>>>>>> likely still going to evolve a little, but this should give a rough
>>>>>>> idea::
>>>>>>>
>>>>>>> MyModule.pcm:
>>>>>>> .debug_info.dwo:
>>>>>>> DW_TAG_compile_unit
>>>>>>> DW_AT_dwo_name ("/path/to/MyModule.pcm")
>>>>>>> DW_AT_dwo_id ([unique AST signature])
>>>>>>>
>>>>>>> DW_TAG_type_unit ([hash for MyStruct])
>>>>>>> DW_TAG_structure_type
>>>>>>> DW_AT_signature ([hash for MyStruct])
>>>>>>> DW_AT_name "MyStruct"
>>>>>>> ...
>>>>>>>
>>>>>>> .debug_abbrev.dwo:
>>>>>>> // abbrevs referenced by .debug_info.dwo
>>>>>>> .debug_line.dwo:
>>>>>>> // filenames referenced by .debug_info.dwo
>>>>>>> .debug_str.dwo:
>>>>>>> // strings referenced by .debug_info.dwo
>>>>>>>
>>>>>>> .ast
>>>>>>> // Index at the top of the AST section sorted by hash value.
>>>>>>> [hash for MyStruct] -> [offset for MyStruct in this section]
>>>>>>> ...
>>>>>>> // Serialized AST follows
>>>>>>> ...
>>>>>>>
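Because the index at the top of the .ast section is sorted by hash value, a module-aware consumer could find a type with a binary search. The following is a minimal sketch under assumptions: `IndexEntry` and `lookupType` are hypothetical names, and the in-memory layout shown here is invented for illustration, since the proposal does not specify the on-disk format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical in-memory form of one index entry; the real layout is
// not specified in the proposal.
struct IndexEntry {
  std::uint64_t signature;  // type signature (hash) of the type
  std::uint64_t astOffset;  // offset of the type within the AST section
};

// Binary search over an index that is sorted by signature.
// Returns the AST offset, or nothing if the module does not define the type.
std::optional<std::uint64_t>
lookupType(const std::vector<IndexEntry> &index, std::uint64_t signature) {
  auto it = std::lower_bound(
      index.begin(), index.end(), signature,
      [](const IndexEntry &e, std::uint64_t sig) { return e.signature < sig; });
  if (it == index.end() || it->signature != signature)
    return std::nullopt;
  return it->astOffset;
}
```

The same shape of lookup would apply to an index that maps type signatures to DIE offsets instead of AST offsets.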
>>>>>>> The debug info in foo.o will look like this::
>>>>>>>
>>>>>>> .debug_info.dwo
>>>>>>> DW_TAG_compile_unit
>>>>>>> // For DWARF consumers
>>>>>>> DW_AT_dwo_name ("/path/to/module-cache/MyModule.pcm")
>>>>>>> DW_AT_dwo_id ([unique AST signature])
>>>>>>>
>>>>>>> // For LLDB / dsymutil so they can recreate the module
>>>>>>> DW_AT_name "MyModule"
>>>>>>> DW_AT_LLVM_system_root "/"
>>>>>>> DW_AT_LLVM_preprocessor_defines "-DNDEBUG"
>>>>>>> DW_AT_LLVM_include_path "/path/to/MyModule.map"
>>>>>>>
>>>>>>> .debug_info
>>>>>>> DW_TAG_compile_unit
>>>>>>> DW_TAG_variable
>>>>>>> DW_AT_name "x"
>>>>>>> DW_AT_type (DW_FORM_ref_sig8) ([hash for MyStruct])
>>>>>>>
>>>>>>>
>>>>>>> Type signatures
>>>>>>> ---------------
>>>>>>>
>>>>>>> We are going to deviate from the DWARF spec by using a more efficient
>>>>>>> hashing function that uses the type's unique mangled name and the name
>>>>>>> of the module as input.
>>>>>>>
>>>>>>> Why do you need/want the name of the module here? Modules are not a
>>>>>>> namespacing mechanism. How would you compute this name when the same
>>>>>>> type is defined in multiple imported modules?
>>>>>>
>>>>>> Great point! I’m mostly concerned about non-ODR languages ...
>>>>>>>
>>>>>>> For languages that do not have mangled type names or an ODR,
>>>>>>>
>>>>>>> The people working on C modules have expressed an intent to apply the
>>>>>>> ODR there too, so it's not clear that Clang modules will support any
>>>>>>> such language in the longer term.
>>>>>>
>>>>>> ... and this may be the answer to the question!
>>>>>>
>>>>>> +Doug: do Objective-C modules have an ODR?
>>>>>>
>>>>>>>
>>>>>>> we will use the unique identifiers produced by the clang indexer (USRs)
>>>>>>> as input instead.
>>>>>>>
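To make the signature idea concrete, here is a rough sketch of deriving a 64-bit signature from a module name and a unique type name. The proposal does not name the hash function, so 64-bit FNV-1a is used here purely as a stand-in, and `fnv1a64` and `typeSignature` are hypothetical helpers. `uniqueName` stands for the mangled type name, or the USR for languages without mangled names.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// 64-bit FNV-1a, used only as a stand-in hash (an assumption, not part of
// the proposal). `h` lets a caller continue hashing from a previous state.
inline std::uint64_t fnv1a64(std::string_view s,
                             std::uint64_t h = 14695981039346656037ull) {
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ull;
  }
  return h;
}

// Hypothetical helper: hash the module name, a NUL separator, and the
// type's unique name (mangled name or USR) into one 64-bit signature.
inline std::uint64_t typeSignature(std::string_view moduleName,
                                   std::string_view uniqueName) {
  std::uint64_t h = fnv1a64(moduleName);
  h = fnv1a64(std::string_view("\0", 1), h);  // keep the two inputs apart
  return fnv1a64(uniqueName, h);
}
```

With the module name as an input, a type defined in two different modules gets two different signatures, which is exactly the concern raised in the reply above. Dropping the `moduleName` parameter would give one signature per unique name.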
>>>>>>> Extension: Replacing type units with a more efficient storage format
>>>>>>> --------------------------------------------------------------------
>>>>>>>
>>>>>>> As an extension to this proposal, we are thinking of replacing the type
>>>>>>> units within the module debug info with a more efficient format:
>>>>>>> Instead of emitting each type into its own type unit (complete with its
>>>>>>> entire declcontext), it would be much more efficient to emit one
>>>>>>> large bag of DWARF together with an index that maps hash values (type
>>>>>>> signatures) to DIE offsets.
>>>>>>>
>>>>>>> Next steps
>>>>>>> ----------
>>>>>>>
>>>>>>> In order to implement this, the next steps would be as follows:
>>>>>>> 1. Change the clang module format to be an ELF/Mach-O container.
>>>>>>> 2. Teach clang to emit debug info for module types (e.g., by passing an
>>>>>>> empty compile unit with retained types to LLVM) into the module
>>>>>>> container.
>>>>>>> 3a. Add a -gmodules switch to clang that triggers the emission of type
>>>>>>> signatures for types coming from a module.
>>>>>>>
>>>>>>> Can you clarify what this flag would do? Does this turn on adding DWARF
>>>>>>> to the .pcm file? Does it turn off generating DWARF for imported
>>>>>>> modules in the current IR module? Both?
>>>>>>
>>>>>> It would emit references to the type from imported modules instead of
>>>>>> the types themselves.
>>>>>> Since the module cache is shared, we could (depending on just how
>>>>>> expensive this is) turn on DWARF generation for .pcm files by default.
>>>>>> I’d like to measure this first, though.
>>>>>>
>>>>>>>
>>>>>>> I assume this means that the default remains that we build debug
>>>>>>> information for modules as if we didn't have modules (that is, put
>>>>>>> complete DWARF with the object code). Do you think that's the right
>>>>>>> long-term default? I think it's possibly not.
>>>>>>
>>>>>> I think you’re absolutely right about the long term. In the short term,
>>>>>> it may be better to have compatibility by default, but I don’t know what
>>>>>> the official LLVM policy on new features is, if there is one.
>>>>>>
>>>>>>>
>>>>>>> How does this interact with explicit module builds? Can I use a module
>>>>>>> built without -g in a compile that uses -g? And if I do, do I get
>>>>>>> complete debug information, or debug info just for the parts that
>>>>>>> aren't in the module? Does -gmodules let me choose between these?
>>>>>>
>>>>>> Personally I would expect old-style (full copy of the types) debug
>>>>>> information if I build against a module that does not have embedded
>>>>>> debug information.
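The behavior discussed so far for -gmodules could be summarized as the following sketch. It is not clang code: `TypeEmission`, `TypeOrigin`, and `chooseEmission` are hypothetical names, and the fallback for modules without debug info reflects only the expectation stated above, not a settled design.

```cpp
#include <cassert>

// Hypothetical names; a sketch of the discussed policy, not clang code.
enum class TypeEmission {
  FullDefinition,     // old style: full copy of the type in the object file
  SignatureReference  // DW_FORM_ref_sig8 reference into the module
};

struct TypeOrigin {
  bool fromModule;          // the type is defined in an imported module
  bool moduleHasDebugInfo;  // that module's container carries DWARF
};

// Emit a reference only when -gmodules is on and the module can actually
// supply the type's debug info; otherwise fall back to a full definition.
TypeEmission chooseEmission(bool gmodules, const TypeOrigin &origin) {
  if (gmodules && origin.fromModule && origin.moduleHasDebugInfo)
    return TypeEmission::SignatureReference;
  return TypeEmission::FullDefinition;
}
```

Falling back to full definitions keeps the debug info complete when a module was built without -g, at the cost of the size savings.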
>>>>>>
>>>>>> thanks,
>>>>>> adrian
>>>>>>>
>>>>>>> 3b. Implement type-signature-based lookup in llvm-dsymutil and lldb.
>>>>>>> 4a. Emit an index that maps type signatures to AST section offsets into
>>>>>>> the module container.
>>>>>>> 4b. Implement direct loading of AST types in lldb.
>>>>>>> 5a. Improve the efficiency by replacing type units in the module debug
>>>>>>> info with a lookup table that maps type signatures to DIE offsets.
>>>>>>> 5b. Support this format in lldb and llvm-dsymutil.
>>>>>>>
>>>>>>> Let me know what you think!
>>>>>>>
>>>>>>> cheers,
>>>>>>> Adrian
>>>>>>>
>>>>>>> [1] For more details about clang modules see
>>>>>>> http://clang.llvm.org/docs/Modules.html
>>>>>>> <http://clang.llvm.org/docs/Modules.html> and
>>>>>>> http://clang.llvm.org/docs/PCHInternals.html
>>>>>>> <http://clang.llvm.org/docs/PCHInternals.html>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> cfe-dev mailing list
>>>>>>> [email protected] <mailto:[email protected]>
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>>>>> <http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev>
_______________________________________________
lldb-dev mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev