dblaikie added inline comments.
================ Comment at: lib/CodeGen/CGDebugInfo.cpp:812-814 + // CodeView types with C++ mangling need a type identifier. + if (CGM.getCodeGenOpts().EmitCodeView) + return true; ---------------- rnk wrote: > dblaikie wrote: > > bwyma wrote: > > > dblaikie wrote: > > > > rnk wrote: > > > > > rnk wrote: > > > > > > dblaikie wrote: > > > > > > > Just came across this code today - it's /probably/ a problem for > > > > > > > LTO. LLVM IR linking depends on the identifier to determine if > > > > > > > two structs are the same for linkage purposes - so if an > > > > > > > identifier is added for a non-linkage (local/not externally > > > > > > > visible) type, LLVM will consider them to be the same even though > > > > > > > they're unrelated: > > > > > > > ``` > > > > > > > namespace { struct t1 { int i; int j; }; } > > > > > > > t1 v1; > > > > > > > void *v3 = &v1; > > > > > > > ``` > > > > > > > ``` > > > > > > > namespace { struct t1 { int i; }; } > > > > > > > t1 v1; > > > > > > > void *v2 = &v1; > > > > > > > ``` > > > > > > > ``` > > > > > > > $ clang++-tot -emit-llvm -S a.cpp b.cpp -g -gcodeview && > > > > > > > ~/dev/llvm/build/default/bin/llvm-link -o ab.bc a.ll b.ll && > > > > > > > ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat ab.ll | grep > > > > > > > "\"t1\"" > > > > > > > !8 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: > > > > > > > "t1", scope: !9, file: !3, line: 1, size: 64, flags: > > > > > > > DIFlagTypePassByValue, elements: !10, identifier: > > > > > > > "_ZTSN12_GLOBAL__N_12t1E") > > > > > > > $ clang++-tot -emit-llvm -S a.cpp b.cpp -g && > > > > > > > ~/dev/llvm/build/default/bin/llvm-link -o ab.bc a.ll b.ll && > > > > > > > ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat ab.ll | grep > > > > > > > "\"t1\"" > > > > > > > !8 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: > > > > > > > "t1", scope: !9, file: !3, line: 1, size: 64, flags: > > > > > > > DIFlagTypePassByValue, elements: !10) > > > > > > > !21 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: > > > > > > > "t1", scope: !9, file: !17, line: 1, size: 32, flags: > > > > > > > DIFlagTypePassByValue, elements: !22) > > > > > > > ``` > > > > > > > > > > > > > > So in linking, now both `a.cpp` and `b.cpp` refer to a single > > > > > > > `t1` type (in this case, it looks like the first one - the 64 bit > > > > > > > wide one). > > > > > > > > > > > > > > If CodeView actually can't represent these two distinct types > > > > > > > with the same name in the same object file, so be it? But this > > > > > > > looks like it's likely to cause problems for consumers/users. > > > > > > If you use the MSVC ABI, we will assign unique identifiers to these > > > > > > two structs: > > > > > > ``` > > > > > > !9 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: > > > > > > "t1", scope: !10, file: !7, line: 1, size: 64, flags: > > > > > > DIFlagTypePassByValue, elements: !11, identifier: > > > > > > ".?AUt1@?A0xF964240C@@") > > > > > > ``` > > > > > > > > > > > > The `A0xF964240C` is set up here, and it is based on the source > > > > > > file path (the hash will only be unique when source file paths are > > > > > > unique across the build): > > > > > > https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/MicrosoftMangle.cpp#L464 > > > > > > ``` > > > > > > // It's important to make the mangled names unique because, when > > > > > > CodeView > > > > > > // debug info is in use, the debugger uses mangled type names to > > > > > > distinguish > > > > > > // between otherwise identically named types in anonymous > > > > > > namespaces. > > > > > > ``` > > > > > > > > > > > > Maybe this should use a "is MSVC ABI" check instead, since that is > > > > > > what controls whether the identifier will be unique, and the > > > > > > unique-ness is what matters for LTO and PDBs. > > > > > After thinking about it some more, see the comment I added here: > > > > > rG59320fc9d78237a5f9fdd6a5030d6554bb6976ce > > > > > > > > > > ``` > > > > > // If the type is not externally visible, it should be unique to the > > > > > current TU, > > > > > // and should not need an identifier to participate in type > > > > > deduplication. > > > > > // However, when emitting CodeView, the format internally uses these > > > > > // unique type name identifers for references between debug info. For > > > > > example, > > > > > // the method of a class in an anonymous namespace uses the identifer > > > > > to refer > > > > > // to its parent class. The Microsoft C++ ABI attempts to provide > > > > > unique names > > > > > // for such types, so when emitting CodeView, always use identifiers > > > > > for C++ > > > > > // types. This may create problems when attempting to emit CodeView > > > > > when the MS > > > > > // C++ ABI is not in use. > > > > > ``` > > > > > > > > > > I think type identifiers are pretty crucial for CodeView > > > > > functionality. The debugger won't be able to link a method to its > > > > > class without them. Therefore, I think it's better to leave this as > > > > > it is. The bad experience of duplicate type identifiers is better > > > > > than the lack of functionality from not emitting identifiers at all > > > > > for non-externally visible types. > > > > Yeah, thanks for explaining/adding the comment - that about covers it. > > > > > > > > > > > > Hmm, do these identifiers just need to only be unique within a single > > > > object file to provide the name referencing mechanics? Perhaps we could > > > > generate them later to ensure they're unique - rather than risk > > > > collapsing them while linking? > > > > > > > > (at least here's an obscure case where they seem to collide: Compiling > > > > the same file with different `-D` values, I guess the hash is only for > > > > the filename, not for other properties that might impact the output - > > > > so this ends up with two types with the same mangled name even on MSVC > > > > ABI: > > > > ``` > > > > namespace { struct t1 { int i; > > > > #ifdef SECOND > > > > int j; > > > > #endif > > > > }; } > > > > t1 v1; > > > > void* > > > > #ifdef SECOND > > > > v2 > > > > #else > > > > v3 > > > > #endif > > > > = &v1; > > > > ``` > > > > ``` > > > > clang++-tot -emit-llvm -S a.cpp -g -gcodeview -target x86_64-windows > > > > && clang++-tot -emit-llvm -S a.cpp -DSECOND -g -gcodeview -o b.ll > > > > -target x86_64-windows && ~/dev/llvm/build/default/bin/llvm-link -o > > > > ab.bc a.ll b.ll && ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat > > > > ab.ll | grep "\"t1\"" > > > > ``` > > > > ) > > > As Reid already noted, the unique id is required in CodeView to match > > > type definitions to forward references. Unfortunately, the unique id must > > > align across translation units. Compiling the same file multiple times > > > with preprocessor conditionals will result in two different types with > > > the same unique id. You'll see the same behavior with Visual Studio 2019. > > I think what I was getting at is that in the case of mangled names for > > internal linkage things - presumably there's no ABI compatibility concern? > > It's not like something compiled with MSVC or another version of Clang has > > to use the same naming scheme for these local entities, so long as the > > naming scheme doesn't collide? > > > > So at least for LTO if we generated the name later (in the CodeView > > backend) it could be more unique - an LTO build wouldn't collapse the two > > types together. It could still have an identifier (generated by the > > CodeView backend) that was used to reference decls and defs within that > > file. > Yes, we could calculate the unique identifier during LTO. However, that seems > like a very marginal improvement in the final user experience since most > users don't use LTO for incremental development, which is when type > information is most useful. > > On the other hand, if we want to establish the invariant that all > DICompositeType identifiers are unique within a compile unit, we could > probably go ahead and do that without impacting the user experience. We > already have renaming logic like this for `llvm::GlobalValue`s, and frontends > are check for existing values of that name if they don't want renaming to > occur. > if we want to establish the invariant that all DICompositeType identifiers > are unique within a compile unit, we could probably go ahead and do that > without impacting the user experience. We already have renaming logic like > this for llvm::GlobalValues, and frontends are check for existing values of > that name if they don't want renaming to occur. That invariant exists - that's the original purpose of the identifiers - LLVM IR linking deduplicates on the basis of the identifier. So we can't/don't rename them (because the point is to deduplicate based on them) - we keep the first one we see and drop the rest. Any idea what happens when non-LTO with MSVC and you end up with duplicate identifiers? Are both versions kept, but references end up ambiguous? Is the second version seen dropped as being a duplicate/unneeded? If non-LTO MSVC behavior drops later instances of the identifier as duplicate, then at least the current LTO behavior is no worse than that - though there'd be some opportunity to make at least LTO better, but yeah, marginally beneficial. Repository: rC Clang CHANGES SINCE LAST ACTION https://reviews.llvm.org/D45438/new/ https://reviews.llvm.org/D45438 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits