aeubanks added inline comments.

================
Comment at: clang/include/clang/Basic/AttrDocs.td:1416-1417
+not significant. This allows global constants with the same contents to be
+merged. This can break global pointer identity, i.e. two different globals have
+the same address.
+
----------------
rnk wrote:
> erichkeane wrote:
> > aeubanks wrote:
> > > erichkeane wrote:
> > > > aeubanks wrote:
> > > > > aaron.ballman wrote:
> > > > > > aeubanks wrote:
> > > > > > > aaron.ballman wrote:
> > > > > > > > aeubanks wrote:
> > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > What happens for tentative definitions where the value 
> > > > > > > > > > isn't known? e.g.,
> > > > > > > > > > ```
> > > > > > > > > > [[clang::unnamed_addr]] int i1, i2;
> > > > > > > > > > ```
> > > > > > > > > > 
> > > > > > > > > > What happens if the types are similar but not the same?
> > > > > > > > > > ```
> > > > > > > > > > [[clang::unnamed_addr]] signed int i1 = 32;
> > > > > > > > > > [[clang::unnamed_addr]] unsigned int i2 = 32;
> > > > > > > > > > ```
> > > > > > > > > > 
> > > > > > > > > > Should we diagnose taking the address of such an attributed 
> > > > > > > > > > variable so users have some hope of spotting the 
> > > > > > > > > > non-conforming situations?
> > > > > > > > > > 
> > > > > > > > > > Does this attribute have impacts across translation unit 
> > > > > > > > > > boundaries (perhaps only when doing LTO) or only within a 
> > > > > > > > > > single TU?
> > > > > > > > > > 
> > > > > > > > > > What does this attribute do in C++ in the presence of 
> > > > > > > > > > constructors and destructors? e.g.,
> > > > > > > > > > ```
> > > > > > > > > > struct S {
> > > > > > > > > >   S();
> > > > > > > > > >   ~S();
> > > > > > > > > > };
> > > > > > > > > > 
> > > > > > > > > > [[clang::unnamed_addr]] S s1, s2; // Are these merged and 
> > > > > > > > > > there's only one ctor/dtor call?
> > > > > > > > > > ```
> > > > > > > > > globals are only mergeable if they're known to be constant 
> > > > > > > > > and have the same value/size. this can be done at compile 
> > > > > > > > > time only if the optimizer can see the constant values, or at 
> > > > > > > > > link time
> > > > > > > > > 
> > > > > > > > > so nothing would happen in any of the cases you've given.
> > > > > > > > > 
> > > > > > > > > but yeah that does imply that we should warn when the 
> > > > > > > > > attribute is used on non const, non-POD globals. I'll update 
> > > > > > > > > this patch to do that
> > > > > > > > > 
> > > > > > > > > as mentioned in the description, we actually do want to take 
> > > > > > > > > the address of these globals for table-driven parsing, but we 
> > > > > > > > > don't care about identity equality
> > > > > > > > > globals are only mergeable if they're known to be constant 
> > > > > > > > > and have the same value/size. this can be done at compile 
> > > > > > > > > time only if the optimizer can see the constant values, or at 
> > > > > > > > > link time
> > > > > > > > >
> > > > > > > > > so nothing would happen in any of the cases you've given.
> > > > > > > > 
> > > > > > > > Ahhhh that's good to know. So I assume we *will* merge these?
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > struct S {
> > > > > > > >   int i, j;
> > > > > > > >   float f;
> > > > > > > > };
> > > > > > > > 
> > > > > > > > [[clang::unnamed_addr]] const S s1 = { 1, 2, 3.0f };
> > > > > > > > [[clang::unnamed_addr]] const S s2 = { 1, 2, 3.0f };
> > > > > > > > [[clang::unnamed_addr]] const S s3 = s2;
> > > > > > > > ```
> > > > > > > > 
> > > > > > > > > but yeah that does imply that we should warn when the 
> > > > > > > > > attribute is used on non const, non-POD globals. I'll update 
> > > > > > > > > this patch to do that
> > > > > > > > 
> > > > > > > > Thank you, I think that will be more user-friendly
> > > > > > > > 
> > > > > > > > > as mentioned in the description, we actually do want to take 
> > > > > > > > > the address of these globals for table-driven parsing, but we 
> > > > > > > > > don't care about identity equality
> > > > > > > > 
> > > > > > > > Yeah, I still wonder if we want to diagnose just the same -- if 
> > > > > > > > the address is never taken, there's not really a way to notice 
> > > > > > > > the optimization, but if the address is taken, you basically 
> > > > > > > > get UB (and I think we should explicitly document it as such). 
> > > > > > > > Given how easy it is to accidentally take the address of 
> > > > > > > > something (like via a reference in C++), I think we should warn 
> > > > > > > > by default, but still have a warning group for folks who want 
> > > > > > > > to live life dangerously.
> > > > > > > > > globals are only mergeable if they're known to be constant 
> > > > > > > > > and have the same value/size. this can be done at compile 
> > > > > > > > > time only if the optimizer can see the constant values, or at 
> > > > > > > > > link time
> > > > > > > > >
> > > > > > > > > so nothing would happen in any of the cases you've given.
> > > > > > > > 
> > > > > > > > Ahhhh that's good to know. So I assume we *will* merge these?
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > struct S {
> > > > > > > >   int i, j;
> > > > > > > >   float f;
> > > > > > > > };
> > > > > > > > 
> > > > > > > > [[clang::unnamed_addr]] const S s1 = { 1, 2, 3.0f };
> > > > > > > > [[clang::unnamed_addr]] const S s2 = { 1, 2, 3.0f };
> > > > > > > > [[clang::unnamed_addr]] const S s3 = s2;
> > > > > > > > ```
> > > > > > > yeah those are merged even just by clang
> > > > > > > 
> > > > > > > ```
> > > > > > > struct S {
> > > > > > >   int i, j;
> > > > > > >   float f;
> > > > > > > };
> > > > > > > 
> > > > > > > [[clang::unnamed_addr]] const S s1 = { 1, 2, 3.0f };
> > > > > > > [[clang::unnamed_addr]] const S s2 = { 1, 2, 3.0f };
> > > > > > > [[clang::unnamed_addr]] const S s3 = s2;
> > > > > > > 
> > > > > > > const void * f1() {
> > > > > > >   return &s1;
> > > > > > > }
> > > > > > > 
> > > > > > > const void * f2() {
> > > > > > >   return &s2;
> > > > > > > }
> > > > > > > 
> > > > > > > const void * f3() {
> > > > > > >   return &s3;
> > > > > > > }
> > > > > > > 
> > > > > > > $ ./build/rel/bin/clang++ -S -emit-llvm -o - -O2 /tmp/a.cc
> > > > > > > ```
> > > > > > > > 
> > > > > > > > > but yeah that does imply that we should warn when the 
> > > > > > > > > attribute is used on non const, non-POD globals. I'll update 
> > > > > > > > > this patch to do that
> > > > > > > > 
> > > > > > > > Thank you, I think that will be more user-friendly
> > > > > > > > 
> > > > > > > > > as mentioned in the description, we actually do want to take 
> > > > > > > > > the address of these globals for table-driven parsing, but we 
> > > > > > > > > don't care about identity equality
> > > > > > > > 
> > > > > > > > Yeah, I still wonder if we want to diagnose just the same -- if 
> > > > > > > > the address is never taken, there's not really a way to notice 
> > > > > > > > the optimization, but if the address is taken, you basically 
> > > > > > > > get UB (and I think we should explicitly document it as such). 
> > > > > > > > Given how easy it is to accidentally take the address of 
> > > > > > > > something (like via a reference in C++), I think we should warn 
> > > > > > > > by default, but still have a warning group for folks who want 
> > > > > > > > to live life dangerously.
> > > > > > > 
> > > > > > > I don't understand how there would be any UB. Especially if the 
> > > > > > > user only dereferences them, and isn't comparing pointers, which 
> > > > > > > is the requested use case.
> > > > > > > yeah those are merged even just by clang
> > > > > > 
> > > > > > Nice, thank you for the confirmation!
> > > > > > 
> > > > > > > I don't understand how there would be any UB. Especially if the 
> > > > > > > user only dereferences them, and isn't comparing pointers, which 
> > > > > > > is the requested use case.
> > > > > > 
> > > > > > That's just it -- nothing prevents the user from taking the address 
> > > > > > and comparing the pointers, which is no longer defined behavior 
> > > > > > with this attribute. It would require a static analysis check to 
> > > > > > catch this problem unless the compiler statically warns on taking 
> > > > > > the address in the first place (IMO, we shouldn't assume users will 
> > > > > > use the attribute properly and thus need no help to do so). I was 
> > > > > > also thinking about things like accidental sharing across thread 
> > > > > > boundaries -- but perhaps that's fine because the data is constant. 
> > > > > > I was also thinking that this potentially breaks `restrict` 
> > > > > > semantics but on reflection... that seems almost intentional given 
> > > > > > the goal of the attribute. But things along these lines are what 
> > > > > > have me worried -- the language assumes unique locations for 
> > > > > > objects, so I expect there's going to be fallout when object 
> > > > > > locations are no longer unique. If we can remove sharp edges for 
> > > > > > users without compromising the utility of the attribute, I think 
> > > > > > that's beneficial. Or are you saying that warning like this would 
> > > > > > basically compromise the utility?
> > > > > > > I don't understand how there would be any UB. Especially if the 
> > > > > > > user only dereferences them, and isn't comparing pointers, which 
> > > > > > > is the requested use case.
> > > > > > 
> > > > > > That's just it -- nothing prevents the user from taking the address 
> > > > > > and comparing the pointers, which is no longer defined behavior 
> > > > > > with this attribute. It would require a static analysis check to 
> > > > > > catch this problem unless the compiler statically warns on taking 
> > > > > > the address in the first place (IMO, we shouldn't assume users will 
> > > > > > use the attribute properly and thus need no help to do so). I was 
> > > > > > also thinking about things like accidental sharing across thread 
> > > > > > boundaries -- but perhaps that's fine because the data is constant. 
> > > > > > I was also thinking that this potentially breaks `restrict` 
> > > > > > semantics but on reflection... that seems almost intentional given 
> > > > > > the goal of the attribute. But things along these lines are what 
> > > > > > have me worried -- the language assumes unique locations for 
> > > > > > objects, so I expect there's going to be fallout when object 
> > > > > > locations are no longer unique. If we can remove sharp edges for 
> > > > > > users without compromising the utility of the attribute, I think 
> > > > > > that's beneficial. Or are you saying that warning like this would 
> > > > > > basically compromise the utility?
> > > > > 
> > > > > when you say "undefined behavior" do you mean "it's unspecified what 
> > > > > happens" or literally the C/C++ "undefined behavior" where the 
> > > > > compiler can assume it doesn't happen?
> > > > > 
> > > > > I don't think there's any UB in the C/C++ "undefined behavior" sense, 
> > > > > we're just dropping a C/C++ guarantee of unique pointer identity for 
> > > > > certain globals.
> > > > > 
> > > > > Yes I believe the warning would compromise the utility since the 
> > > > > underlying request behind this is a case where the user explicitly 
> > > > > wants to take the address of these globals for table driven parsing 
> > > > > but does not care about unique global identity. i.e. it's fine if we 
> > > > > have duplicate addresses in the table as long as each entry points to 
> > > > > the proper data.
> > > > I think this IS causing undefined behavior, any program that assumes 
> > > > the addresses aren't the same (such as inserting addresses into a map, 
> > > > explicitly destructing/initializing/etc), or are comparing addresses 
> > > > are now exhibiting UB (in the purest of C++ senses).  It isn't the 
> > > > 'taking the address' that is UB, it is comparing them, but 
> > > > unfortunately we don't have a good way to diagnose THAT.  I believe 
> > > > what Aaron is getting at is that the taking of the addresses should be 
> > > > diagnosed, particularly if we end up taking said address less-obviously.
> > > > 
> > > > It DOES have to be a Static Analysis type diagnostic however, since I 
> > > > don't think it is accurate enough.
> > > > 
> > > perhaps I'm arguing too literally (or am just wrong), but even if the 
> > > program behaves incorrectly due to comparisons of addresses now giving 
> > > different results, that's not necessarily UB. even if a program were 
> > > using pointers as a key into a map and two globals that previously were 
> > > separate entries are now one, that's not necessarily UB, that's just a 
> > > program behaving incorrectly.
> > > 
> > > unless there's a specific C/C++ rule that you have in mind that I'm 
> > > missing?
> > > 
> > > I'm happy to add a warning that we can turn off internally if people 
> > > agree that taking the address of an `unnamed_addr` global is dangerous in 
> > > general, not sure if that should be default on or off
> > The simplest one I can think of is something like a :
> > 
> > ```swap_containers(global1, global2); //swap that takes by ref, but doesn't 
> > check addresses```
> > 
> > Warnings are obviously trivially disable-able, and I'd be fine with making 
> > it a really fine-grained warning group, but I think the dangers of this 
> > attribute are significant.  MANY operations can have a precondition of "my 
> > parameters are different objects" that this can now cause you to trivially 
> > violate in a way that was never a problem before (since we give them 
> > different names!).
> I think, to Aaron's earlier question, warning on taking the address of such a 
> global kind of defeats the point of the attribute. If you *don't* take the 
> address of the constant global, LLVM will already mark it with `unnamed_addr` 
> or `local_unnamed_addr`, and the attribute is usually unnecessary. You would 
> only reach for this attribute if you are already taking the address of the 
> global.
> 
> And, to try to elaborate more on the UB situation, I believe LLVM does 
> perform optimizations like `&gv1 == &gv2` -> `false`:
> https://gcc.godbolt.org/z/3f5qd8vKr
> This does present a soundness issue, but it's already a bit of an existing 
> issue, since there are other ways to make globals alias, such as aliases.
ah I misunderstood how ICF worked in regards to `local_unnamed_addr`. right now 
we will ICF globals that don't have their address taken across any translation 
unit.

```
$ cat /tmp/a.c
extern const int h;
extern const int g;
const int h = 42;
const int g = 42;

int f() {
  return h;
}

$ cat /tmp/b.c
extern const int g;

int f();

int main() {
  return g + f();
}

$ clang -o /tmp/a /tmp/a.c /tmp/b.c -O2 -fdata-sections -ffunction-sections 
-fuse-ld=lld -Wl,--icf=safe -Wl,--print-icf-sections -Wl,--verbose
...
ld.lld: ICF needed 2 iterations
selected section /usr/local/google/home/aeubanks/tmp/a-da430f.o:(.rodata.h)
  removing identical section 
/usr/local/google/home/aeubanks/tmp/a-da430f.o:(.rodata.g)
...
```

so yeah +1 to @rnk's comment. this change would only benefit globals that have 
their address taken.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158223/new/

https://reviews.llvm.org/D158223

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to