aeubanks added inline comments.

================
Comment at: clang/include/clang/Basic/AttrDocs.td:1416-1417
+not significant. This allows global constants with the same contents to be
+merged. This can break global pointer identity, i.e. two different globals have
+the same address.
+
----------------
rnk wrote:
> aaron.ballman wrote:
> > aeubanks wrote:
> > > rnk wrote:
> > > > erichkeane wrote:
> > > > > aeubanks wrote:
> > > > > > erichkeane wrote:
> > > > > > > aeubanks wrote:
> > > > > > > > aaron.ballman wrote:
> > > > > > > > > aeubanks wrote:
> > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > aeubanks wrote:
> > > > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > > > What happens for tentative definitions where the 
> > > > > > > > > > > > > value isn't known? e.g.,
> > > > > > > > > > > > > ```
> > > > > > > > > > > > > [[clang::unnamed_addr]] int i1, i2;
> > > > > > > > > > > > > ```
> > > > > > > > > > > > > 
> > > > > > > > > > > > > What happens if the types are similar but not the 
> > > > > > > > > > > > > same?
> > > > > > > > > > > > > ```
> > > > > > > > > > > > > [[clang::unnamed_addr]] signed int i1 = 32;
> > > > > > > > > > > > > [[clang::unnamed_addr]] unsigned int i2 = 32;
> > > > > > > > > > > > > ```
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Should we diagnose taking the address of such an 
> > > > > > > > > > > > > attributed variable so users have some hope of 
> > > > > > > > > > > > > spotting the non-conforming situations?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Does this attribute have impacts across translation 
> > > > > > > > > > > > > unit boundaries (perhaps only when doing LTO) or only 
> > > > > > > > > > > > > within a single TU?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > What does this attribute do in C++ in the presence of 
> > > > > > > > > > > > > constructors and destructors? e.g.,
> > > > > > > > > > > > > ```
> > > > > > > > > > > > > struct S {
> > > > > > > > > > > > >   S();
> > > > > > > > > > > > >   ~S();
> > > > > > > > > > > > > };
> > > > > > > > > > > > > 
> > > > > > > > > > > > > [[clang::unnamed_addr]] S s1, s2; // Are these merged 
> > > > > > > > > > > > > and there's only one ctor/dtor call?
> > > > > > > > > > > > > ```
> > > > > > > > > > > > globals are only mergeable if they're known to be 
> > > > > > > > > > > > constant and have the same value/size. this can be done 
> > > > > > > > > > > > at compile time only if the optimizer can see the 
> > > > > > > > > > > > constant values, or at link time
> > > > > > > > > > > > 
> > > > > > > > > > > > so nothing would happen in any of the cases you've 
> > > > > > > > > > > > given.
> > > > > > > > > > > > 
> > > > > > > > > > > > but yeah that does imply that we should warn when the 
> > > > > > > > > > > > attribute is used on non const, non-POD globals. I'll 
> > > > > > > > > > > > update this patch to do that
> > > > > > > > > > > > 
> > > > > > > > > > > > as mentioned in the description, we actually do want to 
> > > > > > > > > > > > take the address of these globals for table-driven 
> > > > > > > > > > > > parsing, but we don't care about identity equality
> > > > > > > > > > > > globals are only mergeable if they're known to be 
> > > > > > > > > > > > constant and have the same value/size. this can be done 
> > > > > > > > > > > > at compile time only if the optimizer can see the 
> > > > > > > > > > > > constant values, or at link time
> > > > > > > > > > > >
> > > > > > > > > > > > so nothing would happen in any of the cases you've 
> > > > > > > > > > > > given.
> > > > > > > > > > > 
> > > > > > > > > > > Ahhhh that's good to know. So I assume we *will* merge 
> > > > > > > > > > > these?
> > > > > > > > > > > 
> > > > > > > > > > > ```
> > > > > > > > > > > struct S {
> > > > > > > > > > >   int i, j;
> > > > > > > > > > >   float f;
> > > > > > > > > > > };
> > > > > > > > > > > 
> > > > > > > > > > > [[clang::unnamed_addr]] const S s1 = { 1, 2, 3.0f };
> > > > > > > > > > > [[clang::unnamed_addr]] const S s2 = { 1, 2, 3.0f };
> > > > > > > > > > > [[clang::unnamed_addr]] const S s3 = s2;
> > > > > > > > > > > ```
> > > > > > > > > > > 
> > > > > > > > > > > > but yeah that does imply that we should warn when the 
> > > > > > > > > > > > attribute is used on non const, non-POD globals. I'll 
> > > > > > > > > > > > update this patch to do that
> > > > > > > > > > > 
> > > > > > > > > > > Thank you, I think that will be more user-friendly
> > > > > > > > > > > 
> > > > > > > > > > > > as mentioned in the description, we actually do want to 
> > > > > > > > > > > > take the address of these globals for table-driven 
> > > > > > > > > > > > parsing, but we don't care about identity equality
> > > > > > > > > > > 
> > > > > > > > > > > Yeah, I still wonder if we want to diagnose just the same 
> > > > > > > > > > > -- if the address is never taken, there's not really a 
> > > > > > > > > > > way to notice the optimization, but if the address is 
> > > > > > > > > > > taken, you basically get UB (and I think we should 
> > > > > > > > > > > explicitly document it as such). Given how easy it is to 
> > > > > > > > > > > accidentally take the address of something (like via a 
> > > > > > > > > > > reference in C++), I think we should warn by default, but 
> > > > > > > > > > > still have a warning group for folks who want to live 
> > > > > > > > > > > life dangerously.
> > > > > > > > > > > > globals are only mergeable if they're known to be 
> > > > > > > > > > > > constant and have the same value/size. this can be done 
> > > > > > > > > > > > at compile time only if the optimizer can see the 
> > > > > > > > > > > > constant values, or at link time
> > > > > > > > > > > >
> > > > > > > > > > > > so nothing would happen in any of the cases you've 
> > > > > > > > > > > > given.
> > > > > > > > > > > 
> > > > > > > > > > > Ahhhh that's good to know. So I assume we *will* merge 
> > > > > > > > > > > these?
> > > > > > > > > > > 
> > > > > > > > > > > ```
> > > > > > > > > > > struct S {
> > > > > > > > > > >   int i, j;
> > > > > > > > > > >   float f;
> > > > > > > > > > > };
> > > > > > > > > > > 
> > > > > > > > > > > [[clang::unnamed_addr]] const S s1 = { 1, 2, 3.0f };
> > > > > > > > > > > [[clang::unnamed_addr]] const S s2 = { 1, 2, 3.0f };
> > > > > > > > > > > [[clang::unnamed_addr]] const S s3 = s2;
> > > > > > > > > > > ```
> > > > > > > > > > yeah those are merged even just by clang
> > > > > > > > > > 
> > > > > > > > > > ```
> > > > > > > > > > struct S {
> > > > > > > > > >   int i, j;
> > > > > > > > > >   float f;
> > > > > > > > > > };
> > > > > > > > > > 
> > > > > > > > > > [[clang::unnamed_addr]] const S s1 = { 1, 2, 3.0f };
> > > > > > > > > > [[clang::unnamed_addr]] const S s2 = { 1, 2, 3.0f };
> > > > > > > > > > [[clang::unnamed_addr]] const S s3 = s2;
> > > > > > > > > > 
> > > > > > > > > > const void * f1() {
> > > > > > > > > >   return &s1;
> > > > > > > > > > }
> > > > > > > > > > 
> > > > > > > > > > const void * f2() {
> > > > > > > > > >   return &s2;
> > > > > > > > > > }
> > > > > > > > > > 
> > > > > > > > > > const void * f3() {
> > > > > > > > > >   return &s3;
> > > > > > > > > > }
> > > > > > > > > > 
> > > > > > > > > > $ ./build/rel/bin/clang++ -S -emit-llvm -o - -O2 /tmp/a.cc
> > > > > > > > > > ```
> > > > > > > > > > > 
> > > > > > > > > > > > but yeah that does imply that we should warn when the 
> > > > > > > > > > > > attribute is used on non const, non-POD globals. I'll 
> > > > > > > > > > > > update this patch to do that
> > > > > > > > > > > 
> > > > > > > > > > > Thank you, I think that will be more user-friendly
> > > > > > > > > > > 
> > > > > > > > > > > > as mentioned in the description, we actually do want to 
> > > > > > > > > > > > take the address of these globals for table-driven 
> > > > > > > > > > > > parsing, but we don't care about identity equality
> > > > > > > > > > > 
> > > > > > > > > > > Yeah, I still wonder if we want to diagnose just the same 
> > > > > > > > > > > -- if the address is never taken, there's not really a 
> > > > > > > > > > > way to notice the optimization, but if the address is 
> > > > > > > > > > > taken, you basically get UB (and I think we should 
> > > > > > > > > > > explicitly document it as such). Given how easy it is to 
> > > > > > > > > > > accidentally take the address of something (like via a 
> > > > > > > > > > > reference in C++), I think we should warn by default, but 
> > > > > > > > > > > still have a warning group for folks who want to live 
> > > > > > > > > > > life dangerously.
> > > > > > > > > > 
> > > > > > > > > > I don't understand how there would be any UB. Especially if 
> > > > > > > > > > the user only dereferences them, and isn't comparing 
> > > > > > > > > > pointers, which is the requested use case.
> > > > > > > > > > yeah those are merged even just by clang
> > > > > > > > > 
> > > > > > > > > Nice, thank you for the confirmation!
> > > > > > > > > 
> > > > > > > > > > I don't understand how there would be any UB. Especially if 
> > > > > > > > > > the user only dereferences them, and isn't comparing 
> > > > > > > > > > pointers, which is the requested use case.
> > > > > > > > > 
> > > > > > > > > That's just it -- nothing prevents the user from taking the 
> > > > > > > > > address and comparing the pointers, which is no longer 
> > > > > > > > > defined behavior with this attribute. It would require a 
> > > > > > > > > static analysis check to catch this problem unless the 
> > > > > > > > > compiler statically warns on taking the address in the first 
> > > > > > > > > place (IMO, we shouldn't assume users will use the attribute 
> > > > > > > > > properly and thus need no help to do so). I was also thinking 
> > > > > > > > > about things like accidental sharing across thread boundaries 
> > > > > > > > > -- but perhaps that's fine because the data is constant. I 
> > > > > > > > > was also thinking that this potentially breaks `restrict` 
> > > > > > > > > semantics but on reflection... that seems almost intentional 
> > > > > > > > > given the goal of the attribute. But things along these lines 
> > > > > > > > > are what have me worried -- the language assumes unique 
> > > > > > > > > locations for objects, so I expect there's going to be 
> > > > > > > > > fallout when object locations are no longer unique. If we can 
> > > > > > > > > remove sharp edges for users without compromising the utility 
> > > > > > > > > of the attribute, I think that's beneficial. Or are you 
> > > > > > > > > saying that warning like this would basically compromise the 
> > > > > > > > > utility?
> > > > > > > > > > I don't understand how there would be any UB. Especially if 
> > > > > > > > > > the user only dereferences them, and isn't comparing 
> > > > > > > > > > pointers, which is the requested use case.
> > > > > > > > > 
> > > > > > > > > That's just it -- nothing prevents the user from taking the 
> > > > > > > > > address and comparing the pointers, which is no longer 
> > > > > > > > > defined behavior with this attribute. It would require a 
> > > > > > > > > static analysis check to catch this problem unless the 
> > > > > > > > > compiler statically warns on taking the address in the first 
> > > > > > > > > place (IMO, we shouldn't assume users will use the attribute 
> > > > > > > > > properly and thus need no help to do so). I was also thinking 
> > > > > > > > > about things like accidental sharing across thread boundaries 
> > > > > > > > > -- but perhaps that's fine because the data is constant. I 
> > > > > > > > > was also thinking that this potentially breaks `restrict` 
> > > > > > > > > semantics but on reflection... that seems almost intentional 
> > > > > > > > > given the goal of the attribute. But things along these lines 
> > > > > > > > > are what have me worried -- the language assumes unique 
> > > > > > > > > locations for objects, so I expect there's going to be 
> > > > > > > > > fallout when object locations are no longer unique. If we can 
> > > > > > > > > remove sharp edges for users without compromising the utility 
> > > > > > > > > of the attribute, I think that's beneficial. Or are you 
> > > > > > > > > saying that warning like this would basically compromise the 
> > > > > > > > > utility?
> > > > > > > > 
> > > > > > > > when you say "undefined behavior" do you mean "it's unspecified 
> > > > > > > > what happens" or literally the C/C++ "undefined behavior" where 
> > > > > > > > the compiler can assume it doesn't happen?
> > > > > > > > 
> > > > > > > > I don't think there's any UB in the C/C++ "undefined behavior" 
> > > > > > > > sense, we're just dropping a C/C++ guarantee of unique pointer 
> > > > > > > > identity for certain globals.
> > > > > > > > 
> > > > > > > > Yes I believe the warning would compromise the utility since 
> > > > > > > > the underlying request behind this is a case where the user 
> > > > > > > > explicitly wants to take the address of these globals for table 
> > > > > > > > driven parsing but does not care about unique global identity. 
> > > > > > > > i.e. it's fine if we have duplicate addresses in the table as 
> > > > > > > > long as each entry points to the proper data.
> > > > > > > I think this IS causing undefined behavior, any program that 
> > > > > > > assumes the addresses aren't the same (such as inserting 
> > > > > > > addresses into a map, explicitly destructing/initializing/etc), 
> > > > > > > or are comparing addresses are now exhibiting UB (in the purest 
> > > > > > > of C++ senses).  It isn't the 'taking the address' that is UB, it 
> > > > > > > is comparing them, but unfortunately we don't have a good way to 
> > > > > > > diagnose THAT.  I believe what Aaron is getting at is that the 
> > > > > > > taking of the addresses should be diagnosed, particularly if we 
> > > > > > > end up taking said address less-obviously.
> > > > > > > 
> > > > > > > It DOES have to be a Static Analysis type diagnostic however, 
> > > > > > > since I don't think it is accurate enough.
> > > > > > > 
> > > > > > perhaps I'm arguing too literally (or am just wrong), but even if 
> > > > > > the program behaves incorrectly due to comparisons of addresses now 
> > > > > > giving different results, that's not necessarily UB. even if a 
> > > > > > program were using pointers as a key into a map and two globals 
> > > > > > that previously were separate entries are now one, that's not 
> > > > > > necessarily UB, that's just a program behaving incorrectly.
> > > > > > 
> > > > > > unless there's a specific C/C++ rule that you have in mind that I'm 
> > > > > > missing?
> > > > > > 
> > > > > > I'm happy to add a warning that we can turn off internally if 
> > > > > > people agree that taking the address of an `unnamed_addr` global is 
> > > > > > dangerous in general, not sure if that should be default on or off
> > > > > The simplest one I can think of is something like a :
> > > > > 
> > > > > ```swap_containers(global1, global2); //swap that takes by ref, but 
> > > > > doesn't check addresses```
> > > > > 
> > > > > Warnings are obviously trivially disable-able, and I'd be fine with 
> > > > > making it a really fine-grained warning group, but I think the 
> > > > > dangers of this attribute are significant.  MANY operations can have 
> > > > > a precondition of "my parameters are different objects" that this can 
> > > > > now cause you to trivially violate in a way that was never a problem 
> > > > > before (since we give them different names!).
> > > > I think, to Aaron's earlier question, warning on taking the address of 
> > > > such a global kind of defeats the point of the attribute. If you 
> > > > *don't* take the address of the constant global, LLVM will already mark 
> > > > it with `unnamed_addr` or `local_unnamed_addr`, and the attribute is 
> > > > usually unnecessary. You would only reach for this attribute if you are 
> > > > already taking the address of the global.
> > > > 
> > > > And, to try to elaborate more on the UB situation, I believe LLVM does 
> > > > perform optimizations like `&gv1 == &gv2` -> `false`:
> > > > https://gcc.godbolt.org/z/3f5qd8vKr
> > > > This does present a soundness issue, but it's already a bit of an 
> > > > existing issue, since there are other ways to make globals alias, such 
> > > > as aliases.
> > > ah I misunderstood how ICF worked in regards to `local_unnamed_addr`. 
> > > right now we will ICF globals that don't have their address taken across 
> > > any translation unit.
> > > 
> > > ```
> > > $ cat /tmp/a.c
> > > extern const int h;
> > > extern const int g;
> > > const int h = 42;
> > > const int g = 42;
> > > 
> > > int f() {
> > >   return h;
> > > }
> > > 
> > > $ cat /tmp/b.c
> > > extern const int g;
> > > 
> > > int f();
> > > 
> > > int main() {
> > >   return g + f();
> > > }
> > > 
> > > $ clang -o /tmp/a /tmp/a.c /tmp/b.c -O2 -fdata-sections 
> > > -ffunction-sections -fuse-ld=lld -Wl,--icf=safe -Wl,--print-icf-sections 
> > > -Wl,--verbose
> > > ...
> > > ld.lld: ICF needed 2 iterations
> > > selected section 
> > > /usr/local/google/home/aeubanks/tmp/a-da430f.o:(.rodata.h)
> > >   removing identical section 
> > > /usr/local/google/home/aeubanks/tmp/a-da430f.o:(.rodata.g)
> > > ...
> > > ```
> > > 
> > > so yeah +1 to @rnk's comment. this change would only benefit globals that 
> > > have their address taken.
> > > I think, to Aaron's earlier question, warning on taking the address of 
> > > such a global kind of defeats the point of the attribute. If you *don't* 
> > > take the address of the constant global, LLVM will already mark it with 
> > > unnamed_addr or local_unnamed_addr, and the attribute is usually 
> > > unnecessary. You would only reach for this attribute if you are already 
> > > taking the address of the global.
> > 
> > Thanks for the information! That somehow makes me even *less* comfortable 
> > with the attribute because the only times you'd reach for it are precisely 
> > the times when it can bite you. Having some sort of static analysis check 
> > paired with it to help give it some guard rails for problematic uses would 
> > make me more comfortable. However, we certainly have other attributes that 
> > are "use at your own peril" without such guard rails, so I don't insist 
> > (but I still strongly encourage), but then I would insist on more 
> > comprehensive documentation calling out what misuse looks like and what 
> > potential problems come from misuse.
> > 
> > > perhaps I'm arguing too literally (or am just wrong), but even if the 
> > > program behaves incorrectly due to comparisons of addresses now giving 
> > > different results, that's not necessarily UB. even if a program were 
> > > using pointers as a key into a map and two globals that previously were 
> > > separate entries are now one, that's not necessarily UB, that's just a 
> > > program behaving incorrectly.
> > 
> > The reason why I believe it is UB is because everything else in the 
> > language model believes object addresses are distinct. So there's no way to 
> > determine what the impact is of two objects whose addresses which should be 
> > distinct but aren't, when that's observed by the program. It's undefined 
> > because we have no definition of what happens in this case. The program 
> > could be incorrect and just behave poorly, crash, cause data loss, etc. (As 
> > a concrete example, the `restrict` keyword effectively boils down to 
> > reasoning about whether two pointers overlap; if you pass the address of 
> > two distinct objects, they won't overlap but if they've been merged with 
> > this attribute, now they will overlap and the `restrict` requirements are 
> > broken.)
> > 
> > Regardless, I wasn't trying to get us hung up on UB as a standards term of 
> > art. I am worried about the user experience with the attribute. 
> > Optimization attributes that silently break program correctness do not 
> > provide a particularly good user experience and so if there's a practical 
> > way we can help users out without negatively impacting correct uses of the 
> > attribute, I think we should implement it. C and C++ already get a bad rap 
> > for safety and security because tooling is unhelpful, so I think we need to 
> > do better than we've done in the past in terms of diagnostic behavior.
> So, I think we disagree on the substance of the issue here, and perhaps we 
> should try to get more opinions on Discourse. The feature request really is 
> to have an optimization attribute to mark global constant data as mergeable 
> by the linker, and for the user to promise that the address of the global is 
> not significant (named).
I'll post something to discourse


================
Comment at: clang/lib/Sema/SemaDecl.cpp:14524
+      VD->hasAttr<UnnamedAddrAttr>() &&
+      (!VD->getType().isPODType(getASTContext()) ||
+       !VD->getType().isConstQualified())) {
----------------
rnk wrote:
> I don't think `isPODType` really describes what we want. I think we want 
> `isConstantStorage` which @dblaikie just added, which means essentially, can 
> this global be placed in a readonly section.
@dblaikie is 
[this](https://github.com/llvm/llvm-project/blob/08d7377b67358496a409080fac22f3f7c077fb63/clang/lib/AST/Type.cpp#L124)
 missing a check for a trivial constructor? using `isConstantStorage` the test 
is failing on `const Foo` by warning there when it shouldn't be


================
Comment at: clang/lib/Sema/SemaDecl.cpp:14526
+       !VD->getType().isConstQualified())) {
+    Diag(VD->getLocation(), diag::warn_attribute_unnamed_addr);
+  }
----------------
rnk wrote:
> Should we ignore (drop) the attribute when we encounter this situation? Is it 
> safe to mark mutable data `unnamed_addr`, or does it just do nothing?
it should be safe since merging mutable globals would be illegal even if the 
address is insignificant


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158223/new/

https://reviews.llvm.org/D158223

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to