[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2024-01-12 Thread Sam James via cfe-commits

thesamesam wrote:

> I put union TBAA under the option that is disabled by default. I am also 
> considering to put pointer TBAA under option too, if it raises enough 
> concerns too.

Let's please file a bug once this is merged so we remember to come back to that.

https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2024-01-11 Thread Vlad Serebrennikov via cfe-commits


@@ -1,7 +1,7 @@
-// RUN: %clang_cc1 -triple x86_64-linux -std=c++98 %s -O3 -disable-llvm-passes 
-pedantic-errors -emit-llvm -o - | FileCheck %s
-// RUN: %clang_cc1 -triple x86_64-linux -std=c++11 %s -O3 -disable-llvm-passes 
-pedantic-errors -emit-llvm -o - | FileCheck %s
-// RUN: %clang_cc1 -triple x86_64-linux -std=c++14 %s -O3 -disable-llvm-passes 
-pedantic-errors -emit-llvm -o - | FileCheck %s
-// RUN: %clang_cc1 -triple x86_64-linux -std=c++1z %s -O3 -disable-llvm-passes 
-pedantic-errors -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-linux -std=c++98 %s -O3 -pedantic-errors 
-emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-linux -std=c++11 %s -O3 -pedantic-errors 
-emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-linux -std=c++14 %s -O3 -pedantic-errors 
-emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-linux -std=c++1z %s -O3 -pedantic-errors 
-emit-llvm -o - | FileCheck %s
 
 // dr158: yes

Endilll wrote:

I'm sorry I missed that you're not writing a new test.

https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2024-01-11 Thread Bushev Dmitry via cfe-commits


@@ -105,13 +105,34 @@ static bool isValidBaseType(QualType QTy) {
 if (RD->hasFlexibleArrayMember())
   return false;
 // RD can be struct, union, class, interface or enum.
-// For now, we only handle struct and class.
-if (RD->isStruct() || RD->isClass())
+if (RD->isStruct() || RD->isClass() ||
+(RD->isUnion() && CodeGenOpts.UnionTBAA))
   return true;
   }
   return false;
 }
 
+// Give unique tag for compatible types.
+std::string CodeGenTBAA::getPointeeName(const Type *Ty) {

dybv-sc wrote:

Changed function to append to raw_ostream. 
`Ty` here is coming from `getPointeeType()` so it may be not canonical, right?

https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2024-01-11 Thread Bushev Dmitry via cfe-commits


@@ -1,7 +1,7 @@
-// RUN: %clang_cc1 -triple x86_64-linux -std=c++98 %s -O3 -disable-llvm-passes 
-pedantic-errors -emit-llvm -o - | FileCheck %s
-// RUN: %clang_cc1 -triple x86_64-linux -std=c++11 %s -O3 -disable-llvm-passes 
-pedantic-errors -emit-llvm -o - | FileCheck %s
-// RUN: %clang_cc1 -triple x86_64-linux -std=c++14 %s -O3 -disable-llvm-passes 
-pedantic-errors -emit-llvm -o - | FileCheck %s
-// RUN: %clang_cc1 -triple x86_64-linux -std=c++1z %s -O3 -disable-llvm-passes 
-pedantic-errors -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-linux -std=c++98 %s -O3 -pedantic-errors 
-emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-linux -std=c++11 %s -O3 -pedantic-errors 
-emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-linux -std=c++14 %s -O3 -pedantic-errors 
-emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-linux -std=c++1z %s -O3 -pedantic-errors 
-emit-llvm -o - | FileCheck %s
 
 // dr158: yes

dybv-sc wrote:

As I see it is already mentioned:
https://github.com/llvm/llvm-project/blob/114e6d7ba02f090117f2cb1ffeb9027cf80f335b/clang/test/CXX/drs/dr1xx.cpp#L792C1-L792C25


https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2023-12-28 Thread John McCall via cfe-commits


@@ -4598,8 +4602,7 @@ LValue CodeGenFunction::EmitLValueForField(LValue base,
   if (base.getTBAAInfo().isMayAlias() ||
   rec->hasAttr() || FieldType->isVectorType()) {
 FieldTBAAInfo = TBAAAccessInfo::getMayAliasInfo();
-  } else if (rec->isUnion()) {
-// TODO: Support TBAA for unions.
+  } else if (rec->isUnion() && !CGM.getCodeGenOpts().UnionTBAA) {

rjmccall wrote:

Well, I understand your intent and was asking whether you've put any effort in 
analyzing whether this rule actually matches the standard's aliasing rules.  
For what it's worth, I think it does, at least as currently written; IIRC, the 
committee has considered various weaker rules that permit union members to 
alias.  But it's also true that this is going to make us enforce a stricter 
rule than we have been.

This patch is both implementing the stricter rules and enabling them by 
default, right?  I think we should probably stage those separately, i.e. land 
this as an opt-in feature and then try turning it on by default in a  follow-up 
commit.  That way, if we see miscompiles from this, we can (temporarily) revert 
the small commit changing the default while we're analyzing that and not have 
to revert the whole implementation.

https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2023-12-28 Thread John McCall via cfe-commits


@@ -105,13 +105,34 @@ static bool isValidBaseType(QualType QTy) {
 if (RD->hasFlexibleArrayMember())
   return false;
 // RD can be struct, union, class, interface or enum.
-// For now, we only handle struct and class.
-if (RD->isStruct() || RD->isClass())
+if (RD->isStruct() || RD->isClass() ||
+(RD->isUnion() && CodeGenOpts.UnionTBAA))
   return true;
   }
   return false;
 }
 
+// Give unique tag for compatible types.
+std::string CodeGenTBAA::getPointeeName(const Type *Ty) {

rjmccall wrote:

Since this is always being used to build a larger string, could you make this 
append to a `raw_ostream`?

I guess `Ty` *is* guaranteed to be a canonical type here, because that's 
apparently a precondition of `getTypeInfoHelper`.  Could you assert that?

https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2023-12-28 Thread John McCall via cfe-commits


@@ -184,13 +199,24 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type 
*Ty) {
 return getChar();
 
   // Handle pointers and references.
-  // TODO: Implement C++'s type "similarity" and consider dis-"similar"
-  // pointers distinct.
-  if (Ty->isPointerType() || Ty->isReferenceType())
-return createScalarTypeNode("any pointer", getChar(), Size);
+  // Pointer types never alias if their pointee type is distinct.
+  if ((Ty->isPointerType() || Ty->isReferenceType())) {
+llvm::MDNode *AnyPtr = createScalarTypeNode("any pointer", getChar(), 
Size);
+if (!CodeGenOpts.PointerTBAA)
+  return AnyPtr;
+unsigned PtrDepth = 0;
+do {
+  PtrDepth++;
+  Ty = Ty->getPointeeType().getTypePtr();
+} while (!Ty->getPointeeType().isNull());

rjmccall wrote:

You're right, you do need to look through member pointers here, although this 
means you're also treating member-pointer structure as if it were just pointer 
structure.  I guess that's okay for now.

https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2023-12-25 Thread Bushev Dmitry via cfe-commits

dybv-sc wrote:

> (As usual, please make any LLVM changes separately from Clang changes, 
> especially if they affect IR design.)

I splitted commit into 2 parts for llvm and clang. Should I make separate PRs 
for them as well?

https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2023-12-22 Thread Bushev Dmitry via cfe-commits


@@ -105,13 +105,28 @@ static bool isValidBaseType(QualType QTy) {
 if (RD->hasFlexibleArrayMember())
   return false;
 // RD can be struct, union, class, interface or enum.
-// For now, we only handle struct and class.
-if (RD->isStruct() || RD->isClass())
+if (RD->isStruct() || RD->isClass() ||
+(RD->isUnion() && CodeGenOpts.UnionTBAA))
   return true;
   }
   return false;
 }
 
+std::string CodeGenTBAA::getPointeeName(const Type *Ty) {
+  if (isa(Ty)) {
+llvm::MDNode *ScalarMD = getTypeInfoHelper(Ty);
+auto  = ScalarMD->getOperand(CodeGenOpts.NewStructPathTBAA ? 2 : 0);
+assert(isa(Op) && "Expected MDString operand");
+return cast(Op)->getString().str();
+  }
+
+  if (Ty->isIncompleteType())
+return "";

dybv-sc wrote:

Removed. I was uncertain in a moment what to do in that case, but after 
revisiting C/C++ standard I learned that there is no difference between 
complete and incomplete types when considering their similarity/compatibility.

https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2023-12-22 Thread Bushev Dmitry via cfe-commits

dybv-sc wrote:

@rjmccall , I updated part of commit that handles pointers. Added comment with 
C/C++ standard references to explain my decisions. Could you please review it 
again?

https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2023-12-21 Thread John McCall via cfe-commits

rjmccall wrote:

> From all of above, the only way void* and char* alias if they are compatible, 
> but they are not:

Ah, you're right, I had misremembered this.  6.2.5p31 requires them to have the 
same representation as each other, and there's a footnote about this being 
meant to imply "interchangeability as arguments to functions, return values 
from functions, and members of unions", but it doesn't go so far as to make 
them formally compatible.  And even if we wanted to read that as implying a 
sort of semi-compatibility that ought to affect aliasing, the same paragraph 
goes on to say that all pointers to structs must have the same representation, 
and presumably we wouldn't want *that* to affect aliasing.

Alright, I withdraw that comment; it's not formally required.  We may need to 
be more conservative about the aliasing of `void*` l-values on practical 
grounds, but we can let that be informed by results on the ground.

https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2023-12-21 Thread Bushev Dmitry via cfe-commits


@@ -4598,8 +4602,7 @@ LValue CodeGenFunction::EmitLValueForField(LValue base,
   if (base.getTBAAInfo().isMayAlias() ||
   rec->hasAttr() || FieldType->isVectorType()) {
 FieldTBAAInfo = TBAAAccessInfo::getMayAliasInfo();
-  } else if (rec->isUnion()) {
-// TODO: Support TBAA for unions.
+  } else if (rec->isUnion() && !CGM.getCodeGenOpts().UnionTBAA) {

dybv-sc wrote:

The goal of that patch is to re-utilize existing struct path TBAA metadata and 
amend it functionality to work for union types. Here I allow for that metadata 
to be generated for union members accesses, so yes there will be tbaa struct 
records with all members with offset 0. Later in code I adapted current 
struct-path-tbaa tree walkers to handle such situations where multiple fields 
have same offset.  

https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2023-12-20 Thread John McCall via cfe-commits


@@ -216,6 +216,9 @@ ENUM_CODEGENOPT(StructReturnConvention, 
StructReturnConventionKind, 2, SRCK_Defa
 CODEGENOPT(RelaxAll  , 1, 0) ///< Relax all machine code instructions.
 CODEGENOPT(RelaxedAliasing   , 1, 0) ///< Set when -fno-strict-aliasing is 
enabled.
 CODEGENOPT(StructPathTBAA, 1, 0) ///< Whether or not to use struct-path 
TBAA.
+CODEGENOPT(UnionTBAA, 1, 0) ///< Whether or not to use struct-path TBAA on 
unions.
+CODEGENOPT(PointerTBAA, 1, 0) ///< Whether or not to generate TBAA on 
pointers.
+CODEGENOPT(ArrayTBAA, 1, 0) ///< Whether or not to generate TBAA on arrays.

rjmccall wrote:

Please match the style of surrounding lines, in which the commas are aligned.

https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2023-12-20 Thread John McCall via cfe-commits


@@ -184,13 +199,24 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type 
*Ty) {
 return getChar();
 
   // Handle pointers and references.
-  // TODO: Implement C++'s type "similarity" and consider dis-"similar"
-  // pointers distinct.
-  if (Ty->isPointerType() || Ty->isReferenceType())
-return createScalarTypeNode("any pointer", getChar(), Size);
+  // Pointer types never alias if their pointee type is distinct.
+  if ((Ty->isPointerType() || Ty->isReferenceType())) {
+llvm::MDNode *AnyPtr = createScalarTypeNode("any pointer", getChar(), 
Size);
+if (!CodeGenOpts.PointerTBAA)
+  return AnyPtr;
+unsigned PtrDepth = 0;
+do {
+  PtrDepth++;
+  Ty = Ty->getPointeeType().getTypePtr();
+} while (!Ty->getPointeeType().isNull());

rjmccall wrote:

`getPointeeType()` will look through a lot of types that you probably don't 
want to look through, including member pointers.  You should write this to 
specifically look for pointers.  (References can't occur in nested positions.)

https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2023-12-20 Thread John McCall via cfe-commits


@@ -105,13 +105,28 @@ static bool isValidBaseType(QualType QTy) {
 if (RD->hasFlexibleArrayMember())
   return false;
 // RD can be struct, union, class, interface or enum.
-// For now, we only handle struct and class.
-if (RD->isStruct() || RD->isClass())
+if (RD->isStruct() || RD->isClass() ||
+(RD->isUnion() && CodeGenOpts.UnionTBAA))
   return true;
   }
   return false;
 }
 
+std::string CodeGenTBAA::getPointeeName(const Type *Ty) {
+  if (isa(Ty)) {
+llvm::MDNode *ScalarMD = getTypeInfoHelper(Ty);
+auto  = ScalarMD->getOperand(CodeGenOpts.NewStructPathTBAA ? 2 : 0);
+assert(isa(Op) && "Expected MDString operand");
+return cast(Op)->getString().str();
+  }
+
+  if (Ty->isIncompleteType())
+return "";

rjmccall wrote:

Why are incomplete types treated differently here?

https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2023-12-20 Thread John McCall via cfe-commits


@@ -184,13 +199,24 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type 
*Ty) {
 return getChar();
 
   // Handle pointers and references.
-  // TODO: Implement C++'s type "similarity" and consider dis-"similar"
-  // pointers distinct.
-  if (Ty->isPointerType() || Ty->isReferenceType())
-return createScalarTypeNode("any pointer", getChar(), Size);
+  // Pointer types never alias if their pointee type is distinct.

rjmccall wrote:

This comment both is incorrect and doesn't match the code.  What you seem to be 
implementing here is the C++ similar-type rule, in which what matters is 
whether the pointer type sub-structure matches while ignoring qualifiers.

Unfortunately, this rule seems to treat `void*` and `char*` as different types. 
 That is wrong in C because those types are compatible, and we probably ought 
to use the C rule even in C++.

Also, in general, I would suggest that you write `getPointeeName` much more 
conservatively rather than assuming that you can just render an arbitrary type 
and it's going to be okay to treat different renderings as distinct for TBAA 
purposes.  For example, this is going to treat pointers to different vector 
types as non-aliasing, which worries me a lot because vector programmers are 
often pretty fast-and-loose.  You're also going to be stricter about ObjC than 
I'm comfortable with.  I would strongly suggest just doing this rendering for 
specific kinds of types, like records, and otherwise having some kind of 
fallback to the any-pointer metadata.

https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2023-12-20 Thread John McCall via cfe-commits


@@ -4080,7 +4080,11 @@ LValue CodeGenFunction::EmitArraySubscriptExpr(const 
ArraySubscriptExpr *E,
 E->getType(), !getLangOpts().isSignedOverflowDefined(), SignedIndices,
 E->getExprLoc(), , E->getBase());
 EltBaseInfo = ArrayLV.getBaseInfo();
-EltTBAAInfo = CGM.getTBAAInfoForSubobject(ArrayLV, E->getType());
+// If array is member of some aggregate, keep struct path TBAA information
+// about it.
+EltTBAAInfo = isa(Array) && CGM.getCodeGenOpts().ArrayTBAA
+  ? ArrayLV.getTBAAInfo()
+  : CGM.getTBAAInfoForSubobject(ArrayLV, E->getType());

rjmccall wrote:

Hmm.  Okay, so if I understand correctly, the basic idea here is that TBAA for 
array types is just TBAA for the underlying element types, so if we have TBAA 
for an array l-value, whether it's struct-path TBAA or not, subscripting into 
the array can just preserve that TBAA onto the element.  And then that gets 
complicated by the fact that we apparently actually use *char* as the TBAA for 
array types unless we're doing struct-path TBAA, so it's quite important that 
we actually override that or else we basically lose TBAA completely for these 
subscripts.

At the very least, this needs to be reflected in the comment; the overall 
situation is very non-obvious locally.  But I would actually prefer that we 
just unconditionally change the TBAA we use for array types, because it seems 
unjustifiable.  And as far as I can see, that should be an NFC refactor because 
it's not actually possible to do accesses directly of array type: arrays just 
decay into pointers in every context that would otherwise cause an access, and 
that decay ends up changing the TBAA we'd use anyway.

That is, I think you should consider just doing the array part of this patch 
unconditionally.  The union and pointer changes are real increases in precision 
/ risk, though, and should continue to be guarded with flags.

https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

2023-12-20 Thread John McCall via cfe-commits


@@ -4598,8 +4602,7 @@ LValue CodeGenFunction::EmitLValueForField(LValue base,
   if (base.getTBAAInfo().isMayAlias() ||
   rec->hasAttr() || FieldType->isVectorType()) {
 FieldTBAAInfo = TBAAAccessInfo::getMayAliasInfo();
-  } else if (rec->isUnion()) {
-// TODO: Support TBAA for unions.
+  } else if (rec->isUnion() && !CGM.getCodeGenOpts().UnionTBAA) {

rjmccall wrote:

You've checked that this is the right rule for unions?  We can just record the 
union as a containing aggregate that happens to have all the different union 
members at offset 0?

https://github.com/llvm/llvm-project/pull/75177
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits