dblaikie added inline comments.

================
Comment at: lib/CodeGen/CGDebugInfo.cpp:812-814
+  // CodeView types with C++ mangling need a type identifier.
+  if (CGM.getCodeGenOpts().EmitCodeView)
+    return true;
----------------
rnk wrote:
> dblaikie wrote:
> > bwyma wrote:
> > > dblaikie wrote:
> > > > rnk wrote:
> > > > > rnk wrote:
> > > > > > dblaikie wrote:
> > > > > > > Just came across this code today - it's /probably/ a problem for 
> > > > > > > LTO. LLVM IR linking depends on the identifier to determine if 
> > > > > > > two structs are the same for linkage purposes - so if an 
> > > > > > > identifier is added for a non-linkage (local/not externally 
> > > > > > > visible) type, LLVM will consider them to be the same even though 
> > > > > > > they're unrelated:
> > > > > > > ```
> > > > > > > namespace { struct t1 { int i; int j; }; }
> > > > > > > t1 v1;
> > > > > > > void *v3 = &v1;
> > > > > > > ```
> > > > > > > ```
> > > > > > > namespace { struct t1 { int i; }; }
> > > > > > > t1 v1;
> > > > > > > void *v2 = &v1;
> > > > > > > ```
> > > > > > > ```
> > > > > > > $ clang++-tot -emit-llvm -S a.cpp b.cpp -g -gcodeview  && 
> > > > > > > ~/dev/llvm/build/default/bin/llvm-link -o ab.bc a.ll b.ll && 
> > > > > > > ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat ab.ll | grep 
> > > > > > > "\"t1\""
> > > > > > > !8 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: 
> > > > > > > "t1", scope: !9, file: !3, line: 1, size: 64, flags: 
> > > > > > > DIFlagTypePassByValue, elements: !10, identifier: 
> > > > > > > "_ZTSN12_GLOBAL__N_12t1E")
> > > > > > > $ clang++-tot -emit-llvm -S a.cpp b.cpp -g && 
> > > > > > > ~/dev/llvm/build/default/bin/llvm-link -o ab.bc a.ll b.ll && 
> > > > > > > ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat ab.ll | grep 
> > > > > > > "\"t1\""
> > > > > > > !8 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: 
> > > > > > > "t1", scope: !9, file: !3, line: 1, size: 64, flags: 
> > > > > > > DIFlagTypePassByValue, elements: !10)
> > > > > > > !21 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: 
> > > > > > > "t1", scope: !9, file: !17, line: 1, size: 32, flags: 
> > > > > > > DIFlagTypePassByValue, elements: !22)
> > > > > > > ```
> > > > > > > 
> > > > > > > So in linking, now both `a.cpp` and `b.cpp` refer to a single 
> > > > > > > `t1` type (in this case, it looks like the first one - the 64 bit 
> > > > > > > wide one).
> > > > > > > 
> > > > > > > If CodeView actually can't represent these two distinct types 
> > > > > > > with the same name in the same object file, so be it? But this 
> > > > > > > looks like it's likely to cause problems for consumers/users.
> > > > > > If you use the MSVC ABI, we will assign unique identifiers to these 
> > > > > > two structs:
> > > > > > ```
> > > > > > !9 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: 
> > > > > > "t1", scope: !10, file: !7, line: 1, size: 64, flags: 
> > > > > > DIFlagTypePassByValue, elements: !11, identifier: 
> > > > > > ".?AUt1@?A0xF964240C@@")
> > > > > > ```
> > > > > > 
> > > > > > The `A0xF964240C` is set up here, and it is based on the source 
> > > > > > file path (the hash will only be unique when source file paths are 
> > > > > > unique across the build):
> > > > > > https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/MicrosoftMangle.cpp#L464
> > > > > > ```
> > > > > >   // It's important to make the mangled names unique because, when 
> > > > > > CodeView
> > > > > >   // debug info is in use, the debugger uses mangled type names to 
> > > > > > distinguish
> > > > > >   // between otherwise identically named types in anonymous 
> > > > > > namespaces.
> > > > > > ```
> > > > > > 
> > > > > > Maybe this should use a "is MSVC ABI" check instead, since that is 
> > > > > > what controls whether the identifier will be unique, and the 
> > > > > > unique-ness is what matters for LTO and PDBs.
> > > > > After thinking about it some more, see the comment I added here: 
> > > > > rG59320fc9d78237a5f9fdd6a5030d6554bb6976ce
> > > > > 
> > > > > ```
> > > > > // If the type is not externally visible, it should be unique to the 
> > > > > current TU,
> > > > > // and should not need an identifier to participate in type 
> > > > > deduplication.
> > > > > // However, when emitting CodeView, the format internally uses these
> > > > > // unique type name identifers for references between debug info. For 
> > > > > example,
> > > > > // the method of a class in an anonymous namespace uses the identifer 
> > > > > to refer
> > > > > // to its parent class. The Microsoft C++ ABI attempts to provide 
> > > > > unique names
> > > > > // for such types, so when emitting CodeView, always use identifiers 
> > > > > for C++
> > > > > // types. This may create problems when attempting to emit CodeView 
> > > > > when the MS
> > > > > // C++ ABI is not in use.
> > > > > ```
> > > > > 
> > > > > I think type identifiers are pretty crucial for CodeView 
> > > > > functionality. The debugger won't be able to link a method to its 
> > > > > class without them. Therefore, I think it's better to leave this as 
> > > > > it is. The bad experience of duplicate type identifiers is better 
> > > > > than the lack of functionality from not emitting identifiers at all 
> > > > > for non-externally visible types.
> > > > Yeah, thanks for explaining/adding the comment - that about covers it.
> > > > 
> > > > 
> > > > Hmm, do these identifiers just need to only be unique within a single 
> > > > object file to provide the name referencing mechanics? Perhaps we could 
> > > > generate them later to ensure they're unique - rather than risk 
> > > > collapsing them while linking?
> > > > 
> > > > (at least here's an obscure case where they seem to collide: Compiling 
> > > > the same file with different `-D` values, I guess the hash is only for 
> > > > the filename, not for other properties that might impact the output - 
> > > > so this ends up with two types with the same mangled name even on MSVC 
> > > > ABI:
> > > > ```
> > > > namespace { struct t1 { int i;
> > > > #ifdef SECOND
> > > >   int j;
> > > > #endif
> > > > }; }
> > > > t1 v1;
> > > > void*
> > > > #ifdef SECOND
> > > > v2
> > > > #else
> > > > v3
> > > > #endif
> > > > = &v1;
> > > > ```
> > > > ```
> > > > clang++-tot -emit-llvm -S a.cpp -g -gcodeview  -target x86_64-windows 
> > > > && clang++-tot -emit-llvm -S a.cpp -DSECOND -g -gcodeview -o b.ll 
> > > > -target x86_64-windows && ~/dev/llvm/build/default/bin/llvm-link -o 
> > > > ab.bc a.ll b.ll && ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat 
> > > > ab.ll | grep "\"t1\""
> > > > ```
> > > > )
> > > As Reid already noted, the unique id is required in CodeView to match 
> > > type definitions to forward references. Unfortunately, the unique id must 
> > > align across translation units. Compiling the same file multiple times 
> > > with preprocessor conditionals will result in two different types with 
> > > the same unique id. You'll see the same behavior with Visual Studio 2019.
> > I think what I was getting at is that in the case of mangled names for 
> > internal linkage things - presumably there's no ABI compatibility concern? 
> > It's not like something compiled with MSVC or another version of Clang has 
> > to use the same naming scheme for these local entities, so long as the 
> > naming scheme doesn't collide?
> > 
> > So at least for LTO if we generated the name later (in the CodeView 
> > backend) it could be more unique - an LTO build wouldn't collapse the two 
> > types together. It could still have an identifier (generated by the 
> > CodeView backend) that was used to reference decls and defs within that 
> > file.
> Yes, we could calculate the unique identifier during LTO. However, that seems 
> like a very marginal improvement in the final user experience since most 
> users don't use LTO for incremental development, which is when type 
> information is most useful.
> 
> On the other hand, if we want to establish the invariant that all 
> DICompositeType identifiers are unique within a compile unit, we could 
> probably go ahead and do that without impacting the user experience. We 
> already have renaming logic like this for `llvm::GlobalValue`s, and frontends 
> are check for existing values of that name if they don't want renaming to 
> occur.
>  if we want to establish the invariant that all DICompositeType identifiers 
> are unique within a compile unit, we could probably go ahead and do that 
> without impacting the user experience. We already have renaming logic like 
> this for llvm::GlobalValues, and frontends are check for existing values of 
> that name if they don't want renaming to occur.

That invariant exists - that's the original purpose of the identifiers - LLVM 
IR linking deduplicates on the basis of the identifier. So we can't/don't 
rename them (because the point is to deduplicate based on them) - we keep the 
first one we see and drop the rest.

Any idea what happens when non-LTO with MSVC and you end up with duplicate 
identifiers? Are both versions kept, but references end up ambiguous? Is the 
second version seen dropped as being a duplicate/unneeded?

If non-LTO MSVC behavior drops later instances of the identifier as duplicate, 
then at least the current LTO behavior is no worse than that - though there'd 
be some opportunity to make at least LTO better, but yeah, marginally 
beneficial.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D45438/new/

https://reviews.llvm.org/D45438

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to