> On May 5, 2015, at 3:14 PM, David Blaikie <dblai...@gmail.com> wrote: > > > > On Tue, May 5, 2015 at 3:01 PM, Adrian Prantl <apra...@apple.com > <mailto:apra...@apple.com>> wrote: > > > On May 5, 2015, at 9:59 AM, David Blaikie <dblai...@gmail.com > > <mailto:dblai...@gmail.com>> wrote: > > > > > > > > On Mon, May 4, 2015 at 4:46 PM, Adrian Prantl <apra...@apple.com > > <mailto:apra...@apple.com>> wrote: > > > >>> On May 4, 2015, at 1:31 PM, David Blaikie <dblai...@gmail.com > >>> <mailto:dblai...@gmail.com>> wrote: > >>> > >> ... > >>>> >>> So you're going to need to implement fission (to at least some > >>>> >>> degree) support in LLDB, then? (to support the case where you > >>>> >>> haven't linked debug info with llvm-dsymutil, but you've hit one of > >>>> >>> these lookup problems where you need to cross possibly-conflicting > >>>> >>> modules) > >>>> >> > >>>> >> Yes. Specifically, it won’t support type units, and it will look up > >>>> >> types by name rather than by signature. (cf. the second part of > >>>> >> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html > >>>> >> > >>>> >> <http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html>) > >>>> > > >>>> > How are you going to reference the types in the module's fission CU > >>>> > without type units/signatures? Are you going to emit type declarations > >>>> > into the normal CU and rely on the debugger to know that these > >>>> > declarations can be resolved by looking elsewhere? (just without the > >>>> > benefit of constraining that search to just looking for a matching TU?) > >>>> > >>>> If you look at the example in > >>>> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html > >>>> > >>>> <http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html>, > >>>> there will be an external type index (using the usual accelerator table > >>>> format) that maps an external type’s UID to a pcm. In the pcm there is > >>>> an extra accelerator table entry that maps UID to DIE offset. > >>> > >>> I mean I guess that's up to you, but seems like a relatively large > >>> workaround compared to supporting type units... (I mean certainly seems > >>> like strictly less work to do the workaround than implementing type units > >>> in LLDB, but a relatively large amount of work to do/throw away > >>> eventually once LLDB supports type units) > >> > >> It’s not primarily meant to be a workaround so LLDB doesn’t need to > >> implement type units; the UIDs double as the key (decl context + name) to > >> import types directly from the AST. The other advantage is that we won’t > >> have to worry about MD5 hash collisions, but that’s more of a theoretical > >> advantage. > > > > If there's a concern about collisions, we should fix that regardless > > (because we'll have the same problem with normal type units or modularized > > type units). > > My understanding is that split DWARF is disambiguating hash collisions by > adding the full decl context and name to both the definition and the forward > declaration: > > No, that's not a requirement. It's just an aspect of our implementation so we > can put entities in the type when needed (such as for putting member function > declarations in there to refererence from out of line member function > definitions) > > In the simplest case, you can reference a type unit directly: > > DW_TAG_variable > DW_AT_type [DW_FORM_ref_sig8] ....
You’re right. The standard allows this, although all the examples in the appendix show the long form that LLVM also emits. > > Without any DW_TAG_structure_type/namespaces/etc. Once we have Bag O' DWARF, > we'd move to that representation almost entirely. (the signature we're using > for this is the mangled name of the type so it includes the decl context in > the hash anyway) .. which, incidentally, will look a lot like this UID example, right? > (for completeness, the UID type reference would look like this: > DW_TAG_variable > DW_AT_name “myC” > DW_AT_type [DW_FORM_strp] “_ZTN1AS1BS1C” ) > Oh I see, you mean hashing the mangled name and using it as a signature rather than emitting the name directly. But then, to disambiguate, it would still be beneficial to have the mangled name available. > > > So I think that hash collisions are actually handled for normal fission and > we don’t need to worry about them. The bag o’dwarf will still contain the > full decl context in the DWO, so it will be fine there, too. > > > > > If we've already got (& have to use in the case of DWARF fallback for > > modules) to support a hash or other numeric identifier, it might be good to > > use that for everything rather than having two mechanisms? > > Generally, one way to do it is of course better than maintaining two :-) > But using the UIDs in the .o file allows us to elide the the forward > declarations from the .o file > > It doesn't, though - because we need the forward declarations to put members > into (such as member function definitions - this'll come up even for modules > quite commonly with any inline member function, for example - its definition > will appear in the .o file and the definition DW_TAG_subprogram will need to > DW_AT_specification pointing to the declaration of the member function - > which is a DW_TAG_subprogram inside the DW_TAG_class that is a declaration > with a signature that refers to the type unit). Couldn’t we make the DW_AT_specification an external type reference pointing to the full type in the module? > > Only once we've got Bag O' DWARF do we remove the need for these declarations. > > Granted we can get rid of them in /some/ cases today - when there's no need > to reference members in the type (or nest other members - like a nested > class, for example - though we could possibly workaround even that with Bag O > DWARF by having the type in the type unit have a declaration of the nested > type - then reference that from the nested type definition in another type > unit or .o file, etc). > > I don't think there's anything you can leverage here that's different from > what type units need. If you're going to do anything different from the way > type units work today, I'd really like to discuss that further and understand > what's different here that benefits from/allows for that difference that > doesn't apply to type units themselves. > > (granted the UIDs carry the same information, so it’s not actually as small > as it looks like) so all that dsymutil needs to do is replace the string > references with FORM_ref_addrs, no further stripping needed. Good point. I don’t have any other secret plans up my sleeve apart from what I already outlined in this thread :-) I see the UID type references for LLDB like a bag o’dwarf version 0.5. Any lessons learned there (like the hash collisions) should definitely be addressed in the final bag o’dwarf design. > > > > > >> > >>> > >>> > >>> >> > >>> >> > >>> >>> > >>> >>> OK, so I think it's probably reasonable for now to just add > >>> >>> DW_TAG_modules to the CU for each referenced module (or does it have > >>> >>> to be each referenced submodule? (can two submodules within a single > >>> >>> module be contradictory/conflicting?)). Since we don't have any good > >>> >>> way to reference the module is a foreign unit while deduplicating > >>> >>> that unit... there's not much point having the imported_module - but > >>> >>> if you think it adds anything, I'm open to ideas. > >>> >> It could help keeping things simpler. > >>> >> Emitting it doesn’t add much semantic value because module imports > >>> >> always occur at the top level, but it will make the transition to the > >>> >> deduplicated TAG_modules easier — It could be easier to teach > >>> >> consumers once about imported_module({ref to TAG_module}) rather than > >>> >> having them also recognize top-level TAG_modules as an intermediate > >>> >> step. It’s also slightly easier to implement in LLVM because the > >>> >> imported_module allows us to anchor the TAG_module in the CU, but > >>> >> that’s not a very strong argument. > >>> > > >>> > Agreed on all counts (not a strong argument, but convenient enough, > >>> > etc, etc). > >>> > > >>> > I'm still not entirely sure what the right answer is here, though, > >>> > which is why I'm hesitant to bake anything in too strongly. > >>> > > >>> > To come back to one of the outstanding questions: Do you need submodule > >>> > import information, or just module level (if modules cannot have > >>> > internal conflicts and you can't avoid cross-module conflicts just by > >>> > lack of visibility (I have no idea if either of those things are true) > >>> > then you may just need per-module not per-submodule info)? > >>> > >>> At the moment I do not think that it makes sense for two submodules to > >>> conflict, but there is nothing in the clang documentation that explicitly > >>> forbids this. With this in mind, I think it is reasonable to not support > >>> submodules (at least initially) and always emit an import for the parent > >>> module. > >>> Thats what I wanted to write ... but I as I’m browsing through our > >>> documentation, > >>> http://clang.llvm.org/docs/Modules.html#conflict-declarations > >>> <http://clang.llvm.org/docs/Modules.html#conflict-declarations> > >>> explicitly gives an example of two conflicting submodules, so maybe this > >>> is not a reasonable simplification after all. On the other hand, a quick > >>> grep over all system module maps on OS X doesn’t show a single conflict > >>> declaration. > >>> > >>> I still believe we do not need to support submodules right from the > >>> start, but we should have a story for getting there if we need to. > >>> > >>> Given the simple example that demonstrates the possibility, it seems fair > >>> to have a story for what that looks like, yes - even if a first > >>> pass/prototype doesn't support it. > >> > >> Sure. > >>> > >>> > >>> > > >>> > Also, does each submodule need different special attributes/flags? If > >>> > the special codegen attributes you want are at the module level, it'd > >>> > probably be best to keep those on the Skeleton CU for the module (that > >>> > will be comdat folded, etc, on ELF - and they could be DWARF-aware > >>> > deduplicated by llvm-dsymutil) so they're not duplicated. The > >>> > DW_TAG_module would then just have a DW_AT_signature attribute or > >>> > something similarly small/trivial to point to the skeleton CU. > >>> > >>> The attributes are derived from cc1 command line arguments. Not two > >>> submodules imported by one CU can have different attributes. All > >>> submodules in a pcm also share their attributes. Putting them into the > >>> skeleton CU appears to be the most efficient place to put them, though > >>> perhaps not the most logical one. > >>> > >>> Why not the most logical? It'd be nice if it were a DW_TAG_module instead > >>> of a DW_TAG_compile_unit - but given the limited vocabulary we have in > >>> DWARF top level tags, it seems as good as we can have. > >> > >> I tend to view the module configuration (include path, isysroot, > >> configuration macros) to be a part of the module and not a part of the > >> skeleton that points to the split debug info for that module. > > > > Sure, it's a workaround for the lack of Bag O' DWARF for now, one way or > > another. Either way the debugger's going to have to jump the > > [there’s something missing towards the end] > No it’s not a workaround. Having the full module configuration is what allows > LLDB to rebuild the module if the module cache has been cleared. > > > >> A module is uniquely identified by name + configuration. That’s why I feel > >> it should be part of the tag that also holds the name. > >> > >>> > >>> I would prefer to stick the attributes on the (top-level) DW_TAG_module > >>> and later deduplicate the attributes together with the DW_TAG_module. > >>> Sticking them on the skeleton won’t save any space in the .o files and > >>> would save 3*4-8=4 bytes (3x FORM_strp for include, macro, and isysroot - > >>> 1x FORM_ref_sig_8) per CU and imported module. > >>> > >>> Seems nicer not to duplicate them, especially since not everyone will be > >>> using a debug-aware linker like llvm-dsymutil (LLDB on Windows or Linux > >>> won't have that convenience). Eventually we can use Bag O' DWARF for the > >>> skeleton CU, make it a DW_TAG_module (with more DWARF changes to allow > >>> that as a top-level tag, if desired/useful - I'm not sure it adds a lot) > >>> and have the imported_module reference it that way. > >>> (DW_TAG_imported_module, DW_AT_import, DW_FORM_ref_sig8) > >>> > >>> I'm not /hugely/ invested in this, but we do have people caring about > >>> LLDB on Linux and Windows, so avoiding tying the LLDB story to MachO and > >>> dsymutil, etc, seems valuable. > >> > >> I think that this would be an unnecessary intermediate step that we > >> eventually want to migrate away from anyway. We already identified that > >> the good solution for deduplication is going to be a skeleton TAG_module, > >> so my view is that it is not worth the trouble adding a temporary > >> indirection (and a new attribute name) > > > > New attribute name? > > What attribute would we put into the TAG_module to refer to the skeleton CU? > > I imagine we'd just match by name in that instance. That doesn’t generally work. I could have a.c import Foo and b.c import Foo -DNDEBUG. If the debugger loads the debug info from the linked .dSYM bundle it wouldn’t be able to tell which version of Foo to import. Actually, that’s a really good argument why the module configuration needs to sit inside the TAG_module. > > (& eventually use a sig8, etc) Unless we refer to it via ref_sig8 (and disregard collisions, which I think might be statistically fine for module configurations), which we already agreed to leave for the next revision. -- adrian > > > > > >> to save 4 bytes in the intermediate step. > > > > The debugger's going to need to resolve the skeleton anyway (in the case of > > non-AST based debugging) so I'm not sure how much it's an extra step. > > My “intermediate step” was referring to the non-dedup’ed TAG_module that is > not (yet) part of the dedup’ed skeleton CU (but will be in the next > iteration). I just didn’t think that it is worth to optimize a representation > that will be changed soon anyway. > > It doesn't seem to cost much, but I'll give up on this. > > > > > >> I don’t actually think there is anything about the TAG_module design tying > >> this to either MachO or dsymutil, but let me know if you feel otherwise. > > > > Sorry, what I was getting at was that with the Mach0/dsymutil/lldb story > > you probably don't need to consult the skeleton debug info (actually I > > forgot, you do - in the case where you need a name from a module that might > > be incompatible (it's not referenced directly in this CU)) - pre-dsymutil, > > you'd use the ASTs directly, post-dsymutil I expect you'll have inlined all > > the debug info so there are no skeletons, etc, if I'm understanding your > > design correctly. > > Right, we need all of it :-) > > -- adrian > > > > > - David > > > >> > >> -- adrian > >>> > >>> > > >>> > If you need submodule import lists, then each DW_AT_module representing > >>> > a submodule would have a name (anything else?) and the signature > >>> > refering to its module skeleton CU. > >>> > >>> What I’m envisioning is > >>> > >>> .debug_info: > >>> DW_TAG_compile_unit > >>> ... > >>> DW_TAG_imported_module > >>> // import FooSubA > >>> DW_AT_import [DW_FORM_ref4] (0x60) > >>> > >>> DW_TAG_module > >>> DW_AT_name(“FooLib”) > >>> DW_AT_LLVM_sysroot(“/“) > >>> DW_AT_LLVM_include_dirs(“-I/path”) > >>> DW_AT_LLVM_macros(“-DNDEBUG”) > >>> 0x60: > >>> DW_TAG_module > >>> DW_AT_name(“FooSubA”) > >>> // need not be emitted if not referenced. > >>> DW_TAG_module > >>> DW_AT_name(“FooSubASubA”) > >>> > >>> // need not be emitted if not referenced. > >>> DW_TAG_module > >>> DW_AT_name(“FooSubB”) > >>> > >>> > >>> > >>> -- adrian > >> > >> > > >
_______________________________________________ cfe-commits mailing list cfe-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits